~ 3 min read
Enabling slideshows in Drupal by converting PPT and PDFs
One of the user stories we’ve been busy with at work was to enable a service similar to slideshare, where users are able to upload their presentations and we’ll create a browser slideshow for it. This means that we accept various formats, like the popular .ppt presentation files, and locally convert it to something we can work with and display on standards web browsers like images. Doing this with online services like slideshare is definitely possible using their APIs but we need to keep these stuff in-house due to company policy and such, so my colleague David Madar had put together a list of open source tools to get this job done.
Installing software
The solution includes the following software stack:
- Gearman as a job server for running background tasks
- Openoffice, a Python script and
ghostscript
, all for the purpose of accepting one input format and converting it to an output format as we desire (images for now).
Installing Gearman
In most cases I’d just grab the gearman
sources with the required headers and supporting libraries but if you’re working with CentOS then most packages are plain old, and secondly a pain to install if there’s no RPM for it (yes, it’s a rant). So if you’re one of the luckiest men on earth to be given CentOS 5 to work with (unfortunately 5.3 to be exact) for this little experiment you can do:
yum install gearmand.x86_64 wget http://pecl.php.net/package/gearman/0.8.3 && pecl install gearman-0.8.3.tar.gz
and this will give you gearman server with the required libraries as well as the gearman php extension. Now you are the proud owner of gearman server version 0.14, which was released back in 2010 while there’s already 1.x last released on 2013, but hey, at least your using RH/CentOS. Next up, installing the rest of software stack:
yum install openoffice.org-pyuno openoffice.org-headless.x86_64 openoffice.org-draw.x86_64 openoffice.org-graphicfilter.x86_64 openoffice.org-headless.x86_64 openoffice.org-impress.x86_64 openoffice.org-calc.x86_64 ghostscript
And finally getting unoconv, which is yet again not a member of the default RH5.8 repository packages so next is obtaining the RPM for unoconv from http://pkgs.org/centos-5-rhel-5/repoforge-i386/unoconv-0.5-1.el5.rf.noarch.rpm.html
and installing it.
wget http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/unoconv-0.5-1.el5.rf.noarch.rpmrpm -Uvh unoconv-0.5-1.el5.rf.noarch.rpm --nodeps
It’s important to test unoconv
and make sure it works. Give it a PPT file and convert it to PDF as follows:
/usr/bin/unconv -f pdf <inputfile.ppt>
Integrate with Drupal
It really depends on how you want to do things. To create a slideshow which doesn’t depend on any plugin you may want to convert everything ultimately to images which give you more room to play with, such as creating your own views based slideshow gallery, javascript to manipulate the navigation, etc. So if you’re getting a PPT file you want to convert that to PDF and then to break the PDF pages each into an image. If you’re getting a PDF that’s half the job already done for you. While you can create and manage all of the conversion process right-after the upload and as part of your content type creation submit handler you will probably find it a better user experience to process the conversion task as an offline, batch job. We chose to go with Gearman but there are ways to do this like hook into cron, etc. To make drush gearman-worker
command run in the background as a daemon without losing mysql connectivity a colleague found it effective to re-connect to the mysql database every time the worker function is called. This is accomplished this way:
function your_module_gearman_worker($work) {
// Reset the connection to the DB global $db_url, $active_db;
if (is_array($db_url)) {
$connect_url = array_shift(array_values($db_url));
} else {
$connect_url = $db_url;
}
$active_db = db_connect($connect_url);
// rest of your code goes here...
}