Friday, July 10, 2015

Using Docker: Is it worth it?

Many companies these days are starting to use Docker in their workflow: big shots and tiny startups alike, and even open-source projects, which can be somewhat conservative when it comes to tools. I'm not trying to provide instructions here, but rather to raise the questions you should ask yourself before following, or not following, their lead. I'm concentrating on the context of web services using Django and Python, but most of this should apply to more or less anything.

I must admit that Docker is cool. Linux containers are fascinating in themselves, and Docker is a great tool for simplifying their use for whatever purpose. Images that are reusable and simple to build and share, plus an easy way to launch those images and do stuff with them, are a must if you use containers. But should you containerize everything? It might over-complicate your process without any significant benefit. So the first question is: do you even need containers, or would you be better off without them, using more traditional tools? Or maybe you should use them for some things and not for others?

For example, you could build images for continuous integration and deployment while doing local development the old-fashioned way. This is actually a big point, because whenever I tried Docker for local development, it wasn't at its best.

Compare the two processes for the same Django project. You have the usual requirements.txt file describing what you should install (usually in a virtualenv), and probably some additional requirements that are not needed outside your own environment (like django-debug-toolbar). You have Django's development web server (./manage.py runserver), which seamlessly reloads every time you change the code. You have some RDBMS (it might just be SQLite, but usually it's the same one you use in production) with some test data in it to make development easier. And you have some system requirements that you install once before you start developing (the version of your DBMS is sometimes important, everything else usually less so).

Traditionally, once you have installed the system stuff, you need just two or three commands to set up your virtualenv, and then you launch the dev server and start hacking. With Docker you have to build an image and launch a container, often you have to launch another container for the DBMS too (that image can often just be pulled), and when you change your code, you rebuild and relaunch.

Do you need that rebuild after every change? Can't you just mount your code into the container? You can (look up volumes in Docker's documentation), but it makes things even more complicated: the process in the container starts creating files in your local filesystem as a different user, making a mess and opening security holes there. I won't even start explaining what kind of mess you get with boot2docker. It would also complicate deployment, as you'd have to match images with code revisions, and so on.

The point is, if you need to generate some data and keep it, containers might not be the best way of doing it. Imagine generating, say, a migration inside a container, then copying it out with `docker cp`, then building some special container for generating another thing, because inside your usual container you don't have the data for it, and outside it you don't have the tools or some other data...
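
As a rough illustration of the volume-mount approach discussed above, here is a minimal Python sketch that assembles a `docker run` command mounting the project directory into the container. The image name `myproject-dev`, the `/code` mount point, and the helper name `dev_run_command` are all hypothetical. Passing `--user` with your own uid and gid is one common way to keep files created in the container from ending up owned by a different user; treat it as an assumption that may not fit every setup.

```python
import os
import shlex


def dev_run_command(project_dir, image="myproject-dev", uid=None, gid=None):
    """Build a `docker run` argv that mounts the source tree into the container.

    `image` and the `/code` mount point are placeholders, not real names.
    """
    # Default to the calling user's ids so files written through the
    # mount (migrations, *.pyc) are not owned by a different user.
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    project_dir = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm", "-it",
        "--user", f"{uid}:{gid}",        # run the process as this uid:gid
        "-v", f"{project_dir}:/code",    # mount the code instead of rebuilding
        "-w", "/code",                   # working directory inside the container
        "-p", "8000:8000",               # publish the dev server port
        image,
        "python", "manage.py", "runserver", "0.0.0.0:8000",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(dev_run_command(".")))
```

Compare the length of that one command with simply running `./manage.py runserver` in a virtualenv, and keep in mind that it still says nothing about the database container.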

Regardless of whether you like rebuilding containers every time you change a line of code, should you use Docker for testing (especially continuous integration) and deployment? Isn't it great to have exactly the same image in your CI service and on your production server? While that does sound appealing, remember that half of the stuff you need for testing is not needed in production, and half of the production stuff is too complicated to be used for testing. That doesn't mean you can't do it, and personally I like how it makes concurrent builds easy: no conflicts over a database, a port, or any other resource. But if you try to use a paid service, you might get stuck with an outdated version of Docker that can't do many of the more advanced things and has no caching for images at all, even for images from the official registry (a true story from one of the popular services). Are you comfortable setting up Jenkins for your CI needs, or does your service have an up-to-date version of Docker?
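
To make the point about concurrent builds a bit more concrete, here is a small Python sketch that derives per-build Docker resource names, so that two CI jobs never share a database container or a host port. Everything named here is hypothetical: the `build_id` values, the `myproject-test` and `myproject-db` images, and the `DATABASE_HOST` variable. It also assumes a Docker version with user-defined networks (`docker network create`), which an outdated hosted service may well lack.

```python
def ci_commands(build_id, image="myproject-test", db_image="myproject-db"):
    """Return the docker commands for one isolated CI build.

    Image names and DATABASE_HOST are placeholders for illustration.
    """
    network = f"ci-{build_id}"
    db_name = f"db-{build_id}"
    return [
        # A private network per build; nothing is published on the host.
        ["docker", "network", "create", network],
        # The database is reachable by container name inside that network.
        ["docker", "run", "-d", "--name", db_name, "--network", network, db_image],
        # Run the test suite in a throwaway container on the same network.
        ["docker", "run", "--rm", "--network", network,
         "-e", f"DATABASE_HOST={db_name}", image, "python", "manage.py", "test"],
        # Clean up so the next build starts from scratch.
        ["docker", "rm", "-f", db_name],
        ["docker", "network", "rm", network],
    ]


if __name__ == "__main__":
    for command in ci_commands("101"):
        print(" ".join(command))
```

Because every name is derived from the build identifier and no port is bound on the host, two builds can run side by side on the same machine without stepping on each other.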

And now to the last important question. Docker is usually used to avoid problems arising from differences between environments. Are your environments actually different? If your CI server, production server, staging server, and your own computer all run the same LTS version of the same Linux distribution, are there many differences? Are the other developers using the same distribution? (Yes, we all know which distribution it is.) Answer without counting non-Linux users: they can't run Docker natively anyway, and since they need virtualization, they might just as well use a virtual machine for testing and the like, without the additional layer of Docker and containers. And if you have many non-Linux users, maybe you just need a virtualization solution (there are several easy-to-use tools that run on top of VirtualBox or similar).
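
One way to answer "are your environments different?" with data rather than a guess is to collect a small fingerprint on each machine and compare the results. The sketch below is only an illustration: the helper names are made up, and the list of packages to check (`django` here) is an assumption you would replace with whatever is in your own requirements.txt.

```python
import platform
from importlib import metadata


def fingerprint(packages=("django",)):
    """Collect a few facts that commonly differ between environments."""
    info = {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[f"pkg:{name}"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[f"pkg:{name}"] = None  # not installed on this machine
    return info


def differences(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    import json
    # Save this output on each machine, then feed two of them to differences().
    print(json.dumps(fingerprint(), indent=2))
```

If `differences()` comes back empty, or nearly so, for your laptop, CI, staging, and production, that is a hint that Docker would be solving a problem you don't really have.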

As I said, Docker is a great tool, but you should not use a microscope for hammering nails just because it's a cool tool or because some other guys do it. Use the tools that are great for your particular tasks.