~ 3 min read

Enabling slideshows in Drupal by converting PPT and PDFs

share this story on
Using Gearman as a job server to run background scripts that convert media payload like PowerPoint and PDF files into Slideshows hosted on a Drupal site

One of the user stories we’ve been busy with at work was to enable a service similar to slideshare, where users are able to upload their presentations and we’ll create a browser slideshow for it. This means that we accept various formats, like the popular .ppt presentation files, and locally convert it to something we can work with and display on standards web browsers like images. Doing this with online services like slideshare is definitely possible using their APIs but we need to keep these stuff in-house due to company policy and such, so my colleague David Madar had put together a list of open source tools to get this job done.

Installing software

The solution includes the following software stack:

  1. Gearman as a job server for running background tasks
  2. Openoffice, a Python script and ghostscript, all for the purpose of accepting one input format and converting it to an output format as we desire (images for now).

Installing Gearman

In most cases I’d just grab the gearman sources with the required headers and supporting libraries but if you’re working with CentOS then most packages are plain old, and secondly a pain to install if there’s no RPM for it (yes, it’s a rant). So if you’re one of the luckiest men on earth to be given CentOS 5 to work with (unfortunately 5.3 to be exact) for this little experiment you can do:

 yum install gearmand.x86_64 wget http://pecl.php.net/package/gearman/0.8.3 && pecl install gearman-0.8.3.tar.gz

and this will give you gearman server with the required libraries as well as the gearman php extension. Now you are the proud owner of gearman server version 0.14, which was released back in 2010 while there’s already 1.x last released on 2013, but hey, at least your using RH/CentOS. Next up, installing the rest of software stack:

yum install openoffice.org-pyuno openoffice.org-headless.x86_64 openoffice.org-draw.x86_64 openoffice.org-graphicfilter.x86_64 openoffice.org-headless.x86_64 openoffice.org-impress.x86_64 openoffice.org-calc.x86_64 ghostscript

And finally getting unoconv, which is yet again not a member of the default RH5.8 repository packages so next is obtaining the RPM for unoconv from http://pkgs.org/centos-5-rhel-5/repoforge-i386/unoconv-0.5-1.el5.rf.noarch.rpm.html and installing it.

wget http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/unoconv-0.5-1.el5.rf.noarch.rpmrpm -Uvh unoconv-0.5-1.el5.rf.noarch.rpm --nodeps

It’s important to test unoconv and make sure it works. Give it a PPT file and convert it to PDF as follows:

/usr/bin/unconv -f pdf <inputfile.ppt>

Integrate with Drupal

It really depends on how you want to do things. To create a slideshow which doesn’t depend on any plugin you may want to convert everything ultimately to images which give you more room to play with, such as creating your own views based slideshow gallery, javascript to manipulate the navigation, etc. So if you’re getting a PPT file you want to convert that to PDF and then to break the PDF pages each into an image. If you’re getting a PDF that’s half the job already done for you. While you can create and manage all of the conversion process right-after the upload and as part of your content type creation submit handler you will probably find it a better user experience to process the conversion task as an offline, batch job. We chose to go with Gearman but there are ways to do this like hook into cron, etc. To make drush gearman-worker command run in the background as a daemon without losing mysql connectivity a colleague found it effective to re-connect to the mysql database every time the worker function is called. This is accomplished this way:

function your_module_gearman_worker($work) { 
  // Reset the connection to the DB global $db_url, $active_db; 
  if (is_array($db_url)) {
    $connect_url = array_shift(array_values($db_url)); 
  } else {
    $connect_url = $db_url;

  $active_db = db_connect($connect_url); 

  // rest of your code goes here...