Thursday, November 17, 2016

Versioning static files with S3 buckets

Although there's a trend toward building single-page applications whose frontend static files are managed separately from the backend API, the need to manage static files hasn't gone away just yet. Everyone who does web development knows the common issues that come with it, and probably the most common one is browser caching: files get cached in the user's browser and keep being used even after you have changed and redeployed them. Cache-Control headers can help somewhat, but not much. That's why cache-busting techniques are usually a must.

There are many ways to do cache busting. Usually it involves adding some version info to every static URL (e.g. /style.css becomes /style-13.29.css or /13.29/style.css), which is why it's often called "versioning". If you use a Django app to manage your static files (compressing, combining, minifying, etc.), it can often provide a solution for you. Use it; it's probably reliable and easy. This proposal, however, is cool if you happen to use Amazon S3 for your static hosting (directly, not behind CloudFront). And yes, I'm mostly referring to Django in this post because that's what I use, but the general principles apply anywhere.
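For comparison, here's a minimal sketch of the two path-based rewrites described above, using only the standard library. The function names are hypothetical, and the version string is whatever your deployment tooling provides.

```python
import posixpath


def versioned_filename(path: str, version: str) -> str:
    """Embed the version in the file name: /style.css -> /style-13.29.css."""
    directory, name = posixpath.split(path)
    stem, ext = posixpath.splitext(name)
    return posixpath.join(directory, f"{stem}-{version}{ext}")


def versioned_directory(path: str, version: str) -> str:
    """Prefix the path with a version directory: /style.css -> /13.29/style.css."""
    return "/" + posixpath.join(version, path.lstrip("/"))
```

Django's own ManifestStaticFilesStorage does something similar to the first form, except that it uses a hash of each file's content rather than a deployment-wide version.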

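What exactly am I proposing? A URL has two parts: the domain and the path. S3 buckets cost nothing to create (the only restriction is a per-account quota on how many you can have, and since old buckets get destroyed you only need a couple at a time), so you can create a bucket for every deployment with the version in its name. That way, the version becomes part of the domain of your static URLs instead of their path.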
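To make the idea concrete, here's a sketch that turns a project name and a version into a bucket name and a virtual-hosted-style static URL. The "<project>-static-<version>" naming scheme is a hypothetical choice, and using the global s3.amazonaws.com endpoint is an assumption (region-specific endpoints exist too). Dots in the version are replaced with hyphens because a bucket name containing dots breaks HTTPS certificate validation for virtual-hosted-style URLs. Only the main S3 naming rules are enforced, not the full list.

```python
import re

# Anything outside lowercase letters, digits and hyphens gets replaced.
_INVALID = re.compile(r"[^a-z0-9-]+")


def versioned_bucket_name(project: str, version: str) -> str:
    """Build a per-deployment bucket name, e.g. ("mysite", "13.29") -> "mysite-static-13-29".

    Enforces the main S3 rules: lowercase letters, digits and hyphens only,
    starts and ends with a letter or digit, 3 to 63 characters long.
    """
    raw = f"{project}-static-{version}".lower()
    name = _INVALID.sub("-", raw).strip("-")
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name {name!r} must be 3-63 characters long")
    return name


def static_url(bucket: str, path: str) -> str:
    """Virtual-hosted-style URL: the version lives in the domain, not the path."""
    return f"https://{bucket}.s3.amazonaws.com/{path.lstrip('/')}"
```

For example, `static_url(versioned_bucket_name("mysite", "13.29"), "/style.css")` gives `https://mysite-static-13-29.s3.amazonaws.com/style.css`, while the path stays exactly `/style.css`.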

I won't include detailed examples in this post, but the basic workflow looks like this: always populate the AWS_STATIC_STORAGE_BUCKET_NAME setting from an environment variable or some similar source; create a bucket when a deployment starts; make sure the new version (it could be a git hash or anything else, just like with other tools) is available as an environment variable, and that the previous one is somehow available too; run collectstatic (it will upload to the new bucket, while the currently running application keeps using the old one); reload the application when that's done; and destroy the old bucket once every host has been reloaded (if you run on more than one server). Multi-server environments will probably need some way for hosts to coordinate, so that an old bucket is destroyed only once no host uses it anymore, but that's beyond the scope of this short post. Other than that, all you need is some way of shuffling two environment variables (or something similar), a couple of settings, two very short custom management commands (one for creating buckets and one for destroying them), and an IAM role with an appropriate policy for the instance it's running on.
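Purely as an illustration, the two management commands could delegate to helpers like the ones below. This sketch assumes a boto3 S3 client is passed in (e.g. `boto3.client("s3")`) and that the buckets don't have object versioning enabled; the function names are hypothetical, and wrapping them in Django `BaseCommand` classes is left out to keep it short.

```python
def create_version_bucket(s3, name, region=None):
    """Create the bucket for a new deployment, before running collectstatic.

    `s3` is expected to be a boto3 S3 client. Outside us-east-1, S3 requires
    an explicit LocationConstraint when creating a bucket.
    """
    kwargs = {"Bucket": name}
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)


def destroy_version_bucket(s3, name):
    """Delete the previous deployment's bucket once every host is reloaded.

    S3 refuses to delete a non-empty bucket, so objects are removed first.
    Each list_objects_v2 page holds at most 1000 keys, which is also the
    maximum delete_objects accepts in one call.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=name, Delete={"Objects": keys})
    s3.delete_bucket(Bucket=name)
```

A "create" command would call `create_version_bucket` with the bucket name derived from the new version's environment variable, and a "destroy" command would call `destroy_version_bucket` with the name derived from the previous version. If versioning were enabled on the buckets, old object versions would also have to be removed before `delete_bucket` succeeds.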

Is this much better than using a directory in a single S3 bucket? No, not much. URLs can be somewhat shorter, since the version moves into the bucket name (and because bucket names must be globally unique, putting the version in them may make it easier to avoid clashing with other users' names). Garbage collection is also easier: you remove the whole bucket instead of picking one version's files out from among the others (S3 does require a bucket to be emptied before it can be deleted, but that's a blanket delete, not a selective cleanup). That's about it, though. The IAM policy will be a bit more complicated, you'll need a little additional code, and there's no way to use the same bucket for static and media files (which might be a bad idea anyway, but still). Overall, I don't recommend this approach to anyone who doesn't understand everything in this post; if you use it, do so at your own risk. Still, I personally find the idea pretty neat.
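To show why the IAM policy gets more complicated, here's a sketch of a policy document for the deploying instance, built as a Python dict. With a single bucket you'd only need object-level permissions on one fixed ARN; with per-version buckets the instance also needs bucket-level `s3:CreateBucket` and `s3:DeleteBucket` on a wildcard ARN. The bucket prefix is hypothetical, `s3:PutObjectAcl` is only needed if your storage backend sets ACLs on upload, and making each new bucket publicly readable is a separate configuration step not covered here.

```python
def deployment_policy(bucket_prefix: str) -> dict:
    """IAM policy document covering every bucket whose name starts with the prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket_prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: create and destroy per-version buckets,
                # and list their contents so they can be emptied before deletion.
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:DeleteBucket", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: collectstatic uploads, cleanup deletes.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
```

`json.dumps(deployment_policy("mysite-static-"), indent=2)` produces the JSON you would attach to the instance's role.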