A Drupal YouTube Site Recipe

Please see the Media Mover announcement and check out the Media Mover module suite on Drupal.org.

Updated for DrupalCon | Download the .odp presentation

This post describes a Drupal project to create a YouTube-style video sharing site. Two requirements stretched the project beyond what Drupal offered out of the box:

  • converting users' uploaded files into a multi-platform format (FLV Flash videos)
  • hosting the uploaded content with Amazon's S3 service
    The cross-platform requirement is an obvious target. Flash does pose some problems for Linux users, but for our client Flash was deemed the best solution. Amazon's S3 offered low-cost bandwidth and storage with a high degree of scalability, and we have not found a competitive alternative. Because Amazon's fees are flat, usage-based rates that appear to scale extremely well, our client didn't have to invest in a media-serving infrastructure. We considered writing a script that would publish to YouTube. However, the client had specific rules about who could participate, so we decided not to use existing video services.

    Both of these requirements called for software outside the Drupal framework, because no existing Drupal module handled either one. To implement them, we separated the upload process from the conversion and media-hosting processes. Drupal already does a good job of getting media onto the server, so to conserve hardware cycles we moved the conversion into a separate script run by cron. This script queries the Drupal database to find the files it needs to process, converts them, archives the originals, pushes the converted files to the S3 servers, and then updates the Drupal database with URLs to the media.

    This project had an extremely tight turnaround: two weeks from beginning to launch. Keeping custom functionality, and therefore the number of custom programming points, to a minimum was a priority.

    The Drupal Video Module
    When we first started the project, we considered extending the Drupal video module. It does a good job of embedding media in the page and is quick to install. We decided to write custom software to handle our media instead, for a few reasons:

    1) we needed to convert the uploaded media
    2) we wanted to host the media in a different location
    3) we wanted to auto generate a thumbnail from video upload
    4) we were concerned about the overhead on Drupal if the conversion and management were done inside Drupal
    5) we were concerned about the amount of data that the video module requires.
    6) we initially thought we might use an additional machine to process media

    All of these reasons steered us toward writing our own module to meet our project's needs.

    Using CCK
    CCK gave us nearly everything we needed to get our data set off the ground. We needed the data set to be small: title, description, tags, the video file, and agreement with the site's rules. CCK provided all of this by tying together taxonomy and file attachments, which gave users a simple form for uploading video.

    Some Customization
    We did end up writing a small custom module specifically to adjust some presentation aspects of the end-user experience. Using hook_form_alter, we removed the fields that store the URL of the video file on Amazon and the URL of the thumbnail we generate from the video. We also changed the file upload fieldset so that it is always open and displays different text, which the admin can customize.
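    The form_alter tweak could look something like the minimal sketch below. It assumes the Drupal 5 hook signature hook_form_alter($form_id, &$form), a hypothetical module named videosite, and a hypothetical admin setting called videosite_upload_help. The CCK field names are the ones the node template in this post reads. The 'attachments' key is assumed to be where the upload module places its fieldset.

```php
<?php
/**
 * Sketch of the form tweaks (Drupal 5 hook signature).
 * "videosite" and "videosite_upload_help" are hypothetical names.
 */
function videosite_form_alter($form_id, &$form) {
  // Only touch the node form for the content_video type.
  if ($form_id != 'content_video_node_form') {
    return;
  }

  // Remove the fields that only the media script should fill in.
  unset($form['field_amazon_flv_file_url'], $form['field_thumbnail_path']);

  // Keep the upload fieldset open and show admin-editable help text.
  if (isset($form['attachments'])) {
    $form['attachments']['#collapsed'] = FALSE;
    $default = 'Upload your video file here.';
    // variable_get() only exists inside a bootstrapped Drupal.
    $form['attachments']['#description'] = function_exists('variable_get')
      ? variable_get('videosite_upload_help', $default)
      : $default;
  }
}
```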

    We also made a small tweak to how user registration works. We use the profile module to collect some additional data when the user registers. The client had an unusual request: anyone should be able to register, but only people with complete profile data should be able to post video. So the profile data wasn't required for all users, but it was required for uploading. We intercepted the node/add/content_video form and checked whether the user had filled out all the necessary profile data. If not, we redirected the user to their profile edit screen.
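    The profile check could be built around a small helper like the one in this sketch. The helper name and the profile field names are hypothetical. The sketch assumes that, in Drupal 5, the profile module exposes profile fields as properties of the user object after profile_load_profile() has run. The Drupal calls are left in comments so that the helper can be tested on its own.

```php
<?php
/**
 * Hypothetical helper: TRUE when every required profile field on the
 * account object has a non-blank value.
 */
function videosite_profile_is_complete($account, array $required_fields) {
  foreach ($required_fields as $field) {
    if (!isset($account->$field) || trim((string) $account->$field) === '') {
      return FALSE;
    }
  }
  return TRUE;
}

// Inside Drupal, the check on node/add/content_video might read:
//   global $user;
//   profile_load_profile($user);
//   if (!videosite_profile_is_complete($user, array('profile_name', 'profile_city'))) {
//     drupal_goto('user/' . $user->uid . '/edit/' . $profile_category);  // category is hypothetical
//   }
```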

    The Media Mover Script
    Since we were using CCK to get the content into Drupal, we decided to write an external script to be responsible for the conversion and movement of the files to Amazon. We wanted to do this because:

    1) we could run the conversion process in a queue, making it easy to stop if the load on the machine is high.
    2) we avoid the additional load on Drupal and possibilities for timeouts.

    This introduces a few issues:
    1) we have to be careful not to convert or move data that has already been converted or moved.
    2) conversion and/or moving processes could overrun themselves.

    We wanted to run the script with straightforward, easily accessible tools. So we wrote the script in PHP, which lets users without low-level server access run it easily with cron. We also came up with some process-flow conditions that let us meet these challenges without too much difficulty.

    We decided that every video uploaded to the site would start out unpublished and outside the moderation queue. After the media conversion, the node is set to published and placed in the moderation queue. These flags gave us a way to pick out which files needed processing, and a mechanism to re-run the processing. This could be done with just the published flag, but in our case we needed moderation.

    Interact with the Drupal Database
    It's fairly simple to generate a list of nodes and their files if these rules are maintained:

    $query = "SELECT files.filepath, files.nid FROM files " .
    "LEFT JOIN node ON files.nid = node.nid ".
    "WHERE ((node.type = '". $drupal_cck_content_type ."') AND (node.status != 1) AND (node.moderate != 1))";

    There are a number of ways to do this. In our case, we could have used the Amazon URL field and the thumbnail path field, because both are empty for an unprocessed node. However, we wanted an easy way for the client to re-run the processing, so we chose the status flags.
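    The same selection could be written as a prepared statement, which keeps the content type out of the SQL string. This sketch also adds a hypothetical requeue helper. It shows how resetting both flags would make the next cron run reprocess a node. It assumes that $db is a PDO connection to the Drupal database and that there is no Drupal table prefix. The INNER JOIN behaves the same as the original LEFT JOIN, because the WHERE clause already requires a matching node row.

```php
<?php
// List files whose nodes have not been processed yet (unpublished, not in moderation).
function media_mover_pending_files(PDO $db, $content_type) {
  $stmt = $db->prepare(
    "SELECT files.filepath AS filepath, files.nid AS nid FROM files " .
    "INNER JOIN node ON files.nid = node.nid " .
    "WHERE node.type = ? AND node.status != 1 AND node.moderate != 1"
  );
  $stmt->execute(array($content_type));
  return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Hypothetical helper: clear both flags so the next cron run picks the node up again.
function media_mover_requeue(PDO $db, $nid) {
  $stmt = $db->prepare("UPDATE node SET status = 0, moderate = 0 WHERE nid = ?");
  return $stmt->execute(array((int) $nid));
}
```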

    Process the Files
    Once we have a list of files, we just need to start processing them with ffmpeg. Note that ffmpeg should be compiled with --enable-mp3lame and --enable-faad support to convert AVI and MOV files into Flash video. You may find this page and this one helpful if you need to install ffmpeg on a Debian or Ubuntu server.

    We also need to be careful about the file that the user uploads. Because we're running shell commands from PHP, we have to make sure the user can't craft a file name that lets them execute code on the server. The file should probably be renamed entirely, using a hash of the node ID and the file name.

    // replace characters that could be unsafe in a file name or on a shell command line
    $pattern = "/[^a-zA-Z0-9_.]/";
    $flv_output = $output_path . preg_replace($pattern, "_", basename($file_path)) . ".flv";

    // escapeshellarg() quotes each argument so a crafted file name cannot inject commands
    $command = $path_to_ffmpeg . " -i " . escapeshellarg($file_path) .
      " -acodec mp3 -ar 22050 -ab 32k -vcodec flv -s " . $output_width . "x" . $output_height .
      " " . escapeshellarg($flv_output);
    exec($command, $data, $return_code);
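    The renaming suggested above could be as simple as hashing the node ID and the original file name. This sketch uses hypothetical function names. The md5() output contains only hex characters, and escapeshellarg() still quotes every argument as a second layer of defence.

```php
<?php
// Hypothetical: derive the output name from the node id and original name only.
function media_mover_safe_name($nid, $file_path, $extension) {
  // md5() returns 32 lowercase hex characters, which are always shell-safe.
  return md5($nid . ':' . basename($file_path)) . '.' . $extension;
}

// Hypothetical: build the same ffmpeg conversion command with every argument quoted.
function media_mover_flv_command($ffmpeg, $input, $output, $width, $height) {
  return $ffmpeg . ' -i ' . escapeshellarg($input) .
    ' -acodec mp3 -ar 22050 -ab 32k -vcodec flv -s ' . (int) $width . 'x' . (int) $height .
    ' ' . escapeshellarg($output);
}
```

    With this scheme, the same node and file name always produce the same output name, so re-running the processing overwrites the earlier FLV rather than leaving a duplicate.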

    Once a file has been converted, our script moves the original file to an archive directory to keep the Drupal files directory manageable.
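    Here is a sketch of the archiving step. The helper name and the archive directory are hypothetical. The helper returns the new path, which is the value the database update later writes to the files table.

```php
<?php
// Hypothetical helper: move an uploaded original into the archive directory.
// Returns the new path, or FALSE if the directory or the move fails.
function media_mover_archive($file_path, $archive_dir) {
  if (!is_dir($archive_dir) && !mkdir($archive_dir, 0755, true)) {
    return FALSE;
  }
  $target = rtrim($archive_dir, '/') . '/' . basename($file_path);
  // rename() overwrites an existing file with the same name in the archive.
  return rename($file_path, $target) ? $target : FALSE;
}
```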

    We should also create a thumbnail for this video at the same time:

    $command = $path_to_ffmpeg . " -y -i " . escapeshellarg($file_path) .
      " -vframes 1 -ss " . escapeshellarg($thumb_time) . " -an -vcodec mjpeg -f rawvideo -s " .
      $thumb_width . "x" . $thumb_height . " " . escapeshellarg($thumb_path);
    exec($command, $data);

    Moving to S3
    Now we need to move the FLV file to Amazon's S3. I used the storage3 PHP library, which made this straightforward:

    $s3=new storage3($myAccessKeyId, $mySecretAccessKey, $url);
    // put file on amazon
    $s3->putFile($file_path, $bucket, $file_name);
    // set the ACL
    $s3->setACL($bucket, $file_name);
    return "http://s3.amazonaws.com/" . $bucket . "/" . $file_name;
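    This post does not show what a library like storage3 does under the hood. The sketch below illustrates the request signing an S3 client performs, using AWS's legacy Signature Version 2 (HMAC-SHA1) for a PUT. AWS now recommends Signature Version 4, so treat this as an explanation, not production code. The function name is hypothetical. The sketch sends the x-amz-acl: public-read header on the PUT itself, which makes the object readable without a separate ACL request. It also assumes the object key contains only URL-safe characters.

```php
<?php
// Hypothetical: headers for a signed SigV2 PUT of $key into $bucket.
function s3_put_headers($access_key, $secret_key, $bucket, $key, $content_type, $date = null) {
  $date = $date ?: gmdate('D, d M Y H:i:s \G\M\T');
  $acl = 'public-read';
  // StringToSign: verb, Content-MD5 (empty here), Content-Type, Date,
  // canonicalized x-amz-* headers, then the /bucket/key resource.
  $string_to_sign = "PUT\n\n" . $content_type . "\n" . $date . "\n" .
    "x-amz-acl:" . $acl . "\n" .
    "/" . $bucket . "/" . $key;
  $signature = base64_encode(hash_hmac('sha1', $string_to_sign, $secret_key, true));
  return array(
    'Date' => $date,
    'Content-Type' => $content_type,
    'x-amz-acl' => $acl,
    'Authorization' => 'AWS ' . $access_key . ':' . $signature,
  );
}
```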

    Updating Drupal
    Now the Drupal database needs to be updated with the Amazon URL and the thumbnail path. This is straightforward, because each entry $file in our $files array holds all the new data:

    // $db is assumed to be a PDO connection to the Drupal database
    $nid = (int) $file['nid'];
    $queries = array(
      "UPDATE files SET filepath = " . $db->quote($file['archived_file']) . " WHERE nid = $nid",
      "UPDATE $drupal_cck_content_table SET $drupal_cck_amazon_url_field = " . $db->quote($file['amazon_url']) . " WHERE nid = $nid",
      "UPDATE $drupal_cck_content_table SET $drupal_cck_thumbnailpath_field = " . $db->quote($file['thumb']) . " WHERE nid = $nid",
      "UPDATE node SET status = 1, moderate = 1 WHERE nid = $nid",  // publish and queue for moderation
      "TRUNCATE cache",
    );
    foreach ($queries as $query) {
      $db->exec($query);
    }

    Unfortunately, we're truncating the entire cache here. It is probably possible to clear only the cache entries related to this specific node, but we chose the brute-force method, and this should be fixed. We run this script from cron every five minutes, and a lock file prevents the script from overrunning itself.
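    One way to implement such a lock file is with flock(), as in the sketch below. The function names and lock path are hypothetical, and the post does not say how its own lock file works. A second copy of the script started while the first is still running fails to get the lock and can exit immediately. The operating system releases a flock() lock when the process exits, so a crashed run does not leave a stale lock behind, as a plain "file exists" check can.

```php
<?php
// Returns an open handle holding an exclusive lock, or false if another run holds it.
function media_mover_acquire_lock($lock_path) {
  $handle = fopen($lock_path, 'c');  // create if missing, never truncate
  if ($handle === false) {
    return false;
  }
  if (!flock($handle, LOCK_EX | LOCK_NB)) {  // non-blocking: fail instead of waiting
    fclose($handle);
    return false;
  }
  return $handle;  // keep this handle open for the whole run
}

function media_mover_release_lock($handle) {
  flock($handle, LOCK_UN);
  fclose($handle);
}
```

    The cron entry can then start with `$lock = media_mover_acquire_lock('/tmp/media_mover.lock'); if (!$lock) exit;` and call media_mover_release_lock($lock) at the end of the run.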

    At this point the admin can use the moderation queue to make the new content visible.

    Theming the Node
    We used the swfObject library and the Flash FLV player to give each video node an IE-compatible Flash player. A custom template themes the specifics of the video nodes:

    $the_path_to_player = base_path() . path_to_theme() . "/flash_flv_player/flvplayer.swf";
    // pass the FLV URL, autostart flag and thumbnail image to the player
    $the_movie = $the_path_to_player . "?file=" . $node->field_amazon_flv_file_url[0]['value'] .
      "&autostart=true&image=" . base_path() . file_directory_path() . "/" . $node->field_thumbnail_path[0]['value'];

    $params = array(
      "allowScriptAccess" => "sameDomain",
      "quality" => "high",
      "height" => "240",
      "width" => "320",
      "movie" => $the_movie,
    );

    drupal_add_js(drupal_get_path('module', 'swfobject') . "/swfobject.js");
    print swfobject_create($the_movie, $params);
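    The template builds the player URL by plain concatenation. A value containing '&' or '?', such as an unusual S3 key, would then break the file, autostart and image parameters. This hypothetical helper uses http_build_query() to URL-encode each value. Flash Player normally decodes these values before the player reads them.

```php
<?php
// Hypothetical helper: build the player URL with each parameter URL-encoded.
function videosite_player_url($player_path, $flv_url, $thumb_url) {
  return $player_path . '?' . http_build_query(array(
    'file' => $flv_url,
    'autostart' => 'true',
    'image' => $thumb_url,
  ));
}
```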

    Once you stitch all this together, you get a pretty nice system.

    Drupal modules in use

  • CCK
  • Views
  • Voting API
  • UserReview
  • swfObject
  • Upload
  • a custom module that modifies the CCK fields for uploading

    References

  • ffmpeg
  • php s3 library
  • flash flv player
  • Video Blogging using Django and Flash(tm) Video (FLV)