Saturday, June 30, 2012

Approaching automated images optimization in Plone

State of a naked Plone
It's no news that images commonly account for a large percentage of a page's total size (and probably also for a good part of the information it provides).
You can obviously configure your Web server to leverage the browser cache and apply a lot of other tricks, but this probably solves only some of the problems (mind you: you must do it... it's important!).

There are sites where images are changed (or added) frequently, maybe every day or every hour. You could still think about using only small images, but this is not always easy to do (for a lot of users) or even applicable.

Let's think about a Plone site whose main page is a collection that shows news.

[Screenshot: what a Plone collection of news looks like]
The screenshot above is taken from a basic Plone 4.2 site where I simply used a collection of news with a modified version of the folder_summary_view template (showing the 400x400 resized format called image_preview).

The original sizes of the images above are not huge, but not tiny either: the first news item has no image, the second has a 168KB image, the third a 266KB image and the fourth a 328KB image, for a total of 762KB.

Now, don't forget one of the most useful Plone features: the integration with PIL.
Using the PIL (or Pillow ;-)) library, Plone automatically resizes your image on the server side when you ask for a resized version of it (off-topic: when I explain Plone features to users, this is still one of their favorites).
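To make the idea of a named scale like image_preview concrete, here is a minimal Python sketch of how a "fit inside a 400x400 box" size can be computed. It assumes the scale keeps the aspect ratio and never upscales (the way PIL's `Image.thumbnail` behaves); `fit_within` is a hypothetical helper, not Plone's actual scaling code, and real implementations may round a pixel differently.

```python
def fit_within(width: int, height: int, max_w: int = 400, max_h: int = 400) -> tuple:
    """Return (w, h) fitting inside max_w x max_h, keeping aspect ratio.

    The ratio is capped at 1.0, so images smaller than the box are never enlarged.
    """
    ratio = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * ratio)), max(1, round(height * ratio))


print(fit_within(3000, 2000))  # (400, 267): a landscape photo shrinks to the box width
print(fit_within(300, 200))    # (300, 200): already small enough, left untouched
```

Since the browser only downloads the scaled version, the bytes transferred shrink roughly with the pixel count, which is where most of PIL's savings come from.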

So PIL is doing a really good job here: instead of downloading 762KB of images that the browser would simply resize after the download, we download resized versions of them.

You can see this in the "Document Size" report taken from the Web Developer Toolbar:
[Screenshot: Document size: thanks to PIL]
Now you may ask: isn't the 226KB of JavaScript the real size problem of the page?
Not exactly:
  • JavaScript sources can be gzip-compressed by the Web server (like an Apache in front of Plone)
  • JavaScript sources are probably also cached by the browser: even if you update your site's contents frequently (as said above, you add news every hour), the JavaScript stays the same. So it's easier to exploit the browser cache with JavaScript than with images.
About the last note above: apart from layout images (logo, icons, ...), you must not forget that in Plone images are contents.
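A quick way to see why gzip helps JavaScript but not images is to compare compression ratios. This Python sketch uses only the standard library; the repeated JavaScript line and the random bytes (standing in for already-compressed JPEG/PNG data) are illustrative assumptions, not measurements from the page above.

```python
import gzip
import os


def gzip_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload, compresslevel=level)) / len(payload)


# Repetitive text, such as JavaScript source, compresses very well...
js_source = b"function add(a, b) { return a + b; }\n" * 500
# ...while random bytes (a stand-in for already-compressed image data) do not.
image_like = os.urandom(10_000)

print(f"JavaScript: {gzip_ratio(js_source):.2f}")
print(f"image-like: {gzip_ratio(image_like):.2f}")
```

Real JavaScript is less repetitive than this toy input, but the gap between text and already-compressed binary data holds in general.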
About image optimization
I remember a very interesting chapter about image optimization in one of the last books I read (Even Faster Web Sites, by Steve Souders): I learned a lot about the different image formats and the problems of using them.

The main topic of the chapter is lossless image optimization.

When you use images on the Web, you are often wasting bytes that you could save instead.
I'm not talking about compressing images while losing information, giving your users uglier images just to save some kilobytes: I'm talking about saving bytes while keeping the same level of visual information.

The book above talks about a lot of command-line tools that do the trick:
What they do is optimize the image when possible and remove image metadata.
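To illustrate only the "remove image metadata" half, here is a minimal pure-Python sketch that drops text, timestamp and EXIF chunks from a PNG. It is not how any of those tools is implemented and it does not recompress the pixel data; the chunk names come from the PNG specification, while the function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks that carry only metadata (text, timestamps, EXIF):
# dropping them leaves the decoded pixels unchanged.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME", b"eXIf"}


def iter_chunks(data: bytes):
    """Yield (chunk_type, raw_chunk_bytes) for every chunk after the signature."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 (length) + 4 (type) + payload + 4 (CRC)
        yield chunk_type, data[pos:end]
        pos = end


def strip_png_metadata(data: bytes) -> bytes:
    """Return the same PNG without metadata-only chunks (lossless)."""
    kept = [raw for ctype, raw in iter_chunks(data) if ctype not in METADATA_CHUNKS]
    return PNG_SIGNATURE + b"".join(kept)


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build a chunk: length, type, payload, then CRC-32 over type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)
```

`make_chunk` is there only to build test images: a 1x1 grayscale PNG with a `tEXt` comment comes back smaller and with just its `IHDR`, `IDAT` and `IEND` chunks.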

Some weeks before I read the book, Denys Mishunov showed us a cool tool for Mac front-end developers: ImageOptim. What this tool does is nothing more than running all of the tools above (and some others) on the images the user provides.

So front-end developers must take care of providing the best image compression and optimization they can. Tools like YSlow or Google PageSpeed can easily help you find the images you need to optimize.
This should also help your site's search engine optimization.

Let's go back to the example page above: I will probably need to run image optimization once on all my Plone theme images but, as you can see above, layout images are a minimal part of the total page size.

What I really can't do is force my users to optimize images before uploading them!

What we can do in Plone
My first idea:
As all the tools above are command-line tools, why not use them inside Plone? Why not call them as external processes before storing data in Plone?
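Here is a minimal Python sketch of that idea: write the uploaded bytes to a temporary file, run each external tool on it, and keep the result only when it is smaller. The `"{path}"` placeholder convention and `run_optimizers` are hypothetical; it assumes tools that rewrite the file in place (as `optipng -o2 file.png` does), while tools that write elsewhere would need a different wrapper.

```python
import os
import subprocess
import tempfile
from typing import List, Sequence


def run_optimizers(data: bytes, suffix: str,
                   commands: Sequence[List[str]], timeout: float = 30.0) -> bytes:
    """Run each command on a temp copy of `data`; keep only smaller outputs.

    Each command is an argv list in which the literal "{path}" is replaced
    by the temporary file path, e.g. ["optipng", "-o2", "{path}"].
    """
    best = data
    for argv in commands:
        fd, path = tempfile.mkstemp(suffix=suffix)
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(best)
            args = [path if arg == "{path}" else arg for arg in argv]
            try:
                # The calling thread blocks here until the tool exits.
                result = subprocess.run(args, capture_output=True, timeout=timeout)
            except (OSError, subprocess.TimeoutExpired):
                continue  # tool missing or too slow: keep the previous version
            if result.returncode != 0:
                continue  # tool failed: keep the previous version
            with open(path, "rb") as fh:
                candidate = fh.read()
            if 0 < len(candidate) < len(best):
                best = candidate
        finally:
            os.remove(path)
    return best
```

Note that `subprocess.run` blocks the caller until the tool finishes, which matters inside a Zope request thread.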
I'm not the first one to think about this: Jon Stahl wrote a couple of articles about Plone and image optimization two years ago.
In the article you can read this sentence: "Doing a mediocre job on this would probably be pretty easy, but it will take some focused effort to really nail the details that will make this sing"...
I quickly understood that my idea was exactly the kind of mediocre job Jon was talking about :-), and he's right.
Let's move on to understand why.

I put my idea into an alpha product for Plone: collective.optimage.
Do not use it in production until you have carefully read the documentation and know what it does.
What the product gives you is:
  • it reacts when a new image file is provided to IATBlobImage contents
  • it takes the mimetype of the file and runs all the registered optimization handlers configured for it on the blob file
  • it substitutes the original blob with the optimized one
The product can also be configured to use some image optimization tools while ignoring others. Indeed, out of the box the product does nothing: you must manually register additional ZCML, one registration for every external tool.
You can also easily provide your own.

Let's repeat the test with the same page. I added collective.optimage to the buildout, then uploaded all the images inside the news items again:
[Screenshot: Document size: thanks to collective.optimage]
We saved 7 kilobytes on the page, but a lot more if we look at the full-size images:
  • 147KB instead of 168KB for the image of the second news item
  • 94KB instead of 266KB for the image of the third news item
  • 299KB instead of 328KB for the image of the fourth news item
Total size now: 540KB instead of 762KB.
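The same numbers expressed as relative savings, computed in Python from the sizes listed above (no new measurements):

```python
# (before, after) sizes in KB, taken from the list above.
sizes_kb = {
    "second news": (168, 147),
    "third news": (266, 94),
    "fourth news": (328, 299),
}


def saving_pct(before: float, after: float) -> float:
    """Percentage of bytes saved."""
    return 100.0 * (before - after) / before


for name, (before, after) in sizes_kb.items():
    print(f"{name}: {saving_pct(before, after):.1f}% smaller")
# second news: 12.5% smaller
# third news: 64.7% smaller
# fourth news: 8.8% smaller

total_before = sum(b for b, _ in sizes_kb.values())
total_after = sum(a for _, a in sizes_kb.values())
print(f"total: {total_before}KB -> {total_after}KB, "
      f"{saving_pct(total_before, total_after):.1f}% smaller")
# total: 762KB -> 540KB, 29.1% smaller
```

The spread (from under 9% to almost 65%) shows how much the gain depends on how each original file was produced.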

Problems of the approach
Apart from some technological choices I made (just to avoid being forced to monkey-patch Plone code), the main problem is low performance.

When you use collective.optimage, your Zope thread runs an external process (if you configure more than one provider for the same kind of image, it runs all of them) and waits for the execution to end.
Depending on the image, its format and the tool used, this can be a long task: saving the image can become 2, 5 or 10 seconds slower.
Some big images require a lot of time, and some tools are slower than others (for example, I provided an optimization handler for pngout, but it's disabled because I found it really slow).
This is not what you need if your Plone site can host a lot of concurrent editors.

Other approaches
Why not run this task as a scheduled job during the night? Why not simply try to optimize all the image blob files when the server is not working at 100% (as you would probably do with a static HTML site)?
This can also be done offline (after all, with blob support, images are simply files on the filesystem).
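A nightly batch job could look roughly like this Python sketch: walk a directory tree, detect PNG/JPEG files by their magic bytes (blob files in a ZODB blobstorage carry a `.blob` extension, not an image one), optimize each and swap in the smaller version. `optimize_tree` and its callback are hypothetical. Since the ZODB treats committed blob files as immutable, rewriting them behind its back is something to try on a copy first.

```python
import os
from typing import Callable, Iterator, Optional, Tuple

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_mimetype(path: str) -> Optional[str]:
    """Guess the image type from the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(PNG_MAGIC):
        return "image/png"
    if head.startswith(JPEG_MAGIC):
        return "image/jpeg"
    return None


def iter_images(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, mimetype) for every PNG/JPEG file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mimetype = sniff_mimetype(path)
            if mimetype is not None:
                yield path, mimetype


def optimize_tree(root: str, optimize: Callable[[bytes, str], bytes]) -> int:
    """Optimize every image file in place; return the number of bytes saved."""
    saved = 0
    for path, mimetype in list(iter_images(root)):  # snapshot before writing
        with open(path, "rb") as fh:
            data = fh.read()
        optimized = optimize(data, mimetype)
        if 0 < len(optimized) < len(data):
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as fh:
                fh.write(optimized)
            # Swap in the smaller file (atomic on POSIX within one filesystem).
            os.replace(tmp_path, path)
            saved += len(data) - len(optimized)
    return saved
```

The `optimize` callback could wrap external tools; the batch loop itself does not care how the bytes get smaller.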

This is a possible solution, but your users will get the unoptimized version of your images while they are "new":
  • the editor saves the news item
  • first-day visitors download the unoptimized version
  • the nightly job optimizes the image
  • all other visitors get the optimized image from then on, until the end of time
This can be enough if your Plone site is an image archive, but not if your main need is to optimize news or other images that expire quickly. A news item image on a production site can be downloaded thousands of times until another news item takes its place. Who cares about old news?

A middle-ground approach could be to delay the job for some seconds or minutes, putting the task in a queue of "images to be optimized". This can probably be achieved quickly using (so I bet on this as the "best solution"), but note that it doesn't completely remove the delay before the optimized image is available to visitors (although in most cases this delay can be really short, like a few seconds).
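The queue idea can be sketched with Python's standard `queue` and `threading` modules: saving just enqueues the image and returns, while a background worker optimizes it after an optional delay. This is a hypothetical in-process stand-in, not the tool the post has in mind; in a real Zope deployment the worker would need its own ZODB connection and transaction, and an in-memory queue is lost on restart.

```python
import queue
import threading
import time
from typing import Callable


class OptimizationQueue:
    """Hypothetical in-process queue of "images to be optimized"."""

    def __init__(self, optimize: Callable[[str], None], delay: float = 0.0) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._optimize = optimize
        self._delay = delay  # grace period (seconds) before each optimization
        threading.Thread(target=self._worker, daemon=True).start()

    def enqueue(self, image_id: str) -> None:
        """Called when an image is saved: returns immediately."""
        self._queue.put(image_id)

    def join(self) -> None:
        """Block until every queued image has been processed."""
        self._queue.join()

    def _worker(self) -> None:
        while True:
            image_id = self._queue.get()
            try:
                time.sleep(self._delay)
                self._optimize(image_id)
            except Exception:
                pass  # a real worker would log the failure and maybe retry
            finally:
                self._queue.task_done()
```

Usage: `OptimizationQueue(my_optimizer, delay=5).enqueue("news-item-1/image")` returns at once, so the editor's save is no longer slowed down by the external tools.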

Stop! Why is collective.optimage not working on my News Item?
Because I found out (and I was stuck when I discovered this) that right now the images of the News Item content type in Plone are not stored in blobs. The product only works for the File and Image content types. I hope this will change in the future...

Instead of also supporting non-blob images, I preferred to test these features on a patched version that supports News Items too.
You can try my fork of