Unikernels, Docker, and Why You Should Care
Docker's recent acquisition of Unikernel Systems has sent pulses racing in the microservice world. At the same time, many people have no clue what to make of it, so here's a quick explanation of why this move is a good thing.
Although you may not be involved in building or maintaining microservice-based software, you certainly use it. Many popular Web sites and services, such as Netflix, eBay and PayPal, are powered by microservices. Microservice architectures lend themselves to cloud computing and "scale on demand", so you're sure to see more of them in the future.
Better tools for microservices are good news for developers, but they have a benefit for users too. When developers are better supported, they make better software. Ultimately, that means more features and fewer bugs for everyone else. Of course, that's a rather lazy argument, so here's a more detailed description of Docker and unikernels.
Docker is a tool that allows developers to wrap their software in a container that provides a completely predictable runtime environment.
To appreciate containers fully, it's necessary to understand virtual machines. A virtual machine is pretty much what it sounds like: a non-actual machine--a simulation, if you will. In other words, it acts like a single computer complete with hardware, a filesystem, an operating system, services and application software. Because it's simulated, you can run several of them on the same machine.
Why would you do such a thing? There are a few reasons why it's a good idea.
The first reason is to run software that is built for a different operating system. For instance, if you are developing an Android app on your Ubuntu laptop, you can use a virtual machine to test that the app is working properly. Or, if you can't get your Windows programs to run with Wine, you can run Windows in VirtualBox. In these examples, VMs spare you the pain of switching operating systems or devices.
VMs have become essential in the high-volume world of enterprise computing. Before VMs became popular, physical servers often would run a single application or service, which was a really inefficient way of using physical resources. Most of the time, only a small percentage of the box's memory, CPU and bandwidth was actually in use. Scaling up meant buying a new box--and that's expensive.
VMs meant that multiple servers could run on the same box at the same time. This ensured that the expensive physical resources were put to use.
VMs are also a solution to a problem that has plagued developers for years: the so-called "it works on my machine" problem that occurs when the development environment differs from the production environment. This happens very often. It shouldn't, but it does. It's normal to find different versions of software running on different machines. Programs can be very sensitive to their runtime environment, especially when it comes to their dependencies. A small difference in a library or package can break code that works on the developer's machine.
Of course, employers and clients aren't impressed with the "it works on my laptop" argument. They want it to work on their machines too.
If the development machine and the production machine use identical VMs, the code should run perfectly in both environments. Using the abstraction of a virtual machine, you can exercise a great deal of control over your runtime environment.
Although VMs solve a lot of problems, they aren't without some shortcomings of their own. For one thing, there's a lot of duplication.
Imagine you have two CentOS VMs running together on a server. Both of them contain complete CentOS installations, from the kernel through the complete suite of GNU apps and utilities, standard services, language runtimes, software packages and scripts. The only difference between the VMs is the specific application code, its data files and its dependencies.
Container platforms, such as Docker, offer a more lightweight alternative to full-blown VMs. In many ways, containers are very similar to virtual machines: they provide a mostly self-contained environment for running code. The big difference is that containers reduce duplication by sharing. To start with, they share the host environment's Linux kernel. They also can share much of the rest of the operating system.
In fact, they can share everything except for the application code and data. For instance, I could run two WordPress blogs on the same physical machine using containers. Both containers could be set up to share everything except for the template files, media uploads and database.
With some sophisticated filesystem tricks, each container can "think" that it has a dedicated filesystem. Docker does this with union (overlay) filesystems: the shared, read-only image layers sit underneath a thin writable layer for each container, and copy-on-write keeps each container's changes to itself. The details are too involved to cover here, but trust me when I tell you that it's wicked cool.
Containers are much lighter and have lower overhead compared to complete VMs. Docker makes it relatively easy to work with these containers, so developers and operations can work with identical code. And, containers lend themselves to cloud computing too.
So what about microservices and unikernels?
Microservices are a new idea--or an old idea, depending on your perspective.
The concept is that instead of building a big "monolithic" application, you decompose your app into multiple services that talk to each other through a messaging system--a well-defined interface. Each microservice is designed with a single responsibility. It's focused on doing a single simple task well.
If that sounds familiar to you as an experienced Linux user, it should. It's an extension of some of the main tenets of the UNIX Philosophy. Programs should focus on doing one thing and doing it well, and software should be composed of simple parts that are connected by well-defined interfaces.
Microservices typically run in their own containers. They usually communicate over TCP, either with other containers on the same host or across a network.
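To make that concrete, here's a minimal sketch of a single-responsibility service, written in OCaml with the Lwt library. Everything specific to it--the port number and the uppercasing "task"--is invented for the example:

    (* upcase.ml: a microservice that does exactly one thing: read lines
       over TCP and send them back in uppercase.
       Build (assuming lwt is installed via opam):
         ocamlfind ocamlopt -package lwt.unix -linkpkg upcase.ml -o upcase *)
    open Lwt.Infix

    (* Handle a single client connection. *)
    let handle _client_addr (ic, oc) =
      let rec loop () =
        Lwt_io.read_line_opt ic >>= function
        | None -> Lwt.return_unit  (* client hung up *)
        | Some line ->
            Lwt_io.write_line oc (String.uppercase_ascii line) >>= loop
      in
      loop ()

    let () =
      let addr = Unix.(ADDR_INET (inet_addr_loopback, 9000)) in
      Lwt_main.run
        (Lwt_io.establish_server_with_client_address addr handle
         >>= fun _server -> fst (Lwt.wait ()) (* serve forever *))

You can poke at it with telnet localhost 9000. The code itself isn't the point; the point is that the whole service is one small program with one narrow interface.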
The advantage of building software from microservices is that the code is very loosely coupled. If you need to fix a bug or add a feature, you need to make changes in only a few places. With a monolithic app, the same change probably would touch several pieces of code scattered across the application.
What's more, with a microservice architecture, you can scale up specific microservices that are feeling strain. You don't have to replicate the entire application.
Using containers to develop and deploy a microservice architecture supports the goal of scalability, but it also introduces some drawbacks.
For one thing, each container consumes more resources than it will ever need. Each one is essentially a complete GNU/Linux system, but each microservice uses only a few features of the underlying operating system. Every service running inside the container consumes memory and CPU cycles, and many of those services are completely unnecessary.
Consider the secure shell (SSH) service. It may be useful for microservices that administrators will interface with directly, but it's expensive baggage for microservices that expose a simple TCP interface. And SSH is only one of many features these microservices don't need.
Efficiency isn't the only concern here. When containers share the same Linux kernel, that opens the door to a set of security exploits. Malicious code that exploits a kernel weakness could potentially affect other containers running on the same machine.
Linux is a "kitchen sink" system--it includes everything needed for most multi-user environments. It has drivers for the most esoteric hardware combinations known to man.
But in the world of microservices, that level of support is strictly overkill. There's no need for a complete collection of services, and the host hypervisor will expose a minimal set of virtual devices, removing the need for an extensive collection of device drivers. Even with clever container tricks, such as sharing files and code between containers, there is still a lot of wastage.
Unikernels are a lighter alternative that is well suited to microservices. A unikernel is a self-contained environment that contains only the low-level features that a microservice needs to function. And, that includes kernel features.
This is possible because the environment uses a "library operating system". In other words, each kernel feature--the network stack, the filesystem, device drivers and so on--is implemented as a low-level library. When the microservice code is compiled, it is linked together with the features it needs, and the general features it doesn't use are stripped away.
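That's the approach taken by MirageOS, the library operating system that the folks at Unikernel Systems work on. As a taste, here's a sketch based on MirageOS's classic "hello world"; the exact module names and device signatures have changed between MirageOS releases, so treat it as illustrative rather than definitive:

    (* unikernel.ml: the entire "application". It's a functor: the console
       "device" is just a library that the mirage tool wires in at build time. *)
    module Hello (C : Mirage_console.S) = struct
      let start console = C.log console "hello from a unikernel"
    end

    (* config.ml: declares which device libraries this unikernel needs.
       Anything not declared here simply isn't linked into the image. *)
    open Mirage

    let main = foreign "Unikernel.Hello" (console @-> job)
    let () = register "hello" [ main $ default_console ]

Running the mirage tool over config.ml and then make produces the image: an ordinary UNIX binary when you target unix for development, or a standalone bootable image when you target a hypervisor. Either way, the network stack, filesystems and drivers are linked in only if the configuration asks for them.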
The resulting bundle is much, much smaller than a dedicated VM or container. Instead of bundling up gigabytes of generic code and features, a unikernel can ship a complete microservice in a few hundred kilobytes. That means unikernels are very fast to boot, and more of them can run at the same time on the same physical box.
Unikernels also are naturally more secure than containers or VMs. If attackers are able to gain access to a container or VM, they have an entire Linux installation to exploit. A unikernel, on the other hand, has only a few features to exploit, and this seriously restricts the havoc unauthorized users can wreak.
Unikernels are great, but in the past they have been hard for developers to work with. Docker, meanwhile, is a tool that makes it easy to containerize applications and microservices.
Docker's acquisition of Unikernel Systems means it will be extending that support to unikernels, making them easier to use in real development and production environments. Considering the wide range of benefits unikernels offer to modern architectures, that's exciting news.