What's new in ContainerPilot 3.0

June 16, 2017 - by Tim Gross

This week we've released a new major version of ContainerPilot: version 3.0. This version addresses a number of lessons we've learned about the many different ways our users operate their containerized applications. We've also changed the configuration to make it easier to understand the lifecycle of the application and to provide better primitives for the application developer to manage complex behaviors. The design for version 3.0 was first presented in RFD86. Version 2 will continue to be supported until September 30, 2017. In the meantime, let's take a look at what's changed.

ContainerPilot is an init system

In the original design for ContainerPilot, we "shimmed" a single application: ContainerPilot spun up concurrent threads to perform the various lifecycle hooks and blocked until this main application exited. ContainerPilot was originally dedicated to the control of a single persistent application, and this differentiated it from a typical init system like sysvinit, upstart, or systemd.

But as new uses appeared in the community, we expanded the polling behaviors of health checks and onChange handlers to include periodic tasks. Correctly supporting Consul or etcd on Triton (or any CaaS/PaaS) requires running an agent inside the container, so we also expanded to include persistent coprocesses. That added up to a lot of moving parts, and the interactions among them have proven to be restrictive.

In v3 we've thrown away the concept of a "main" application and embraced the notion that ContainerPilot is an init system for running multiple processes inside containers. Each process has its own health check, dependencies, run and restart behavior ("run forever", "run once", "run every N seconds"), and lifecycle hooks for startup and shutdown. We've rolled all the various behavior hooks into a single abstraction called a job.

A v3 job can be a persistent application that restarts whenever it stops, a task that runs every few seconds, or a one-time task that runs when another job starts, stops, or changes status. The end user is empowered to take control over the entire lifecycle of their container by building a chain of dependencies out of the jobs inside it.
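
To make those three shapes concrete, here is a minimal sketch with one job of each kind. The job names and commands are hypothetical, and the fragment assumes that omitting `restarts` means "run once" and that a `when` block with an `interval` produces a periodic task; treat both as assumptions to check against the v3 documentation.

```json5
jobs: [
  {
    // one-time task: no "when" field, so it starts first, and no
    // "restarts" field, which is assumed here to mean "run once"
    name: "migrate",              // hypothetical job name
    exec: "/bin/migrate.sh"       // hypothetical command
  },
  {
    // persistent application: waits for "migrate" to succeed, then
    // is restarted whenever it exits
    name: "app",
    exec: "/bin/app",
    restarts: "unlimited",
    when: {
      source: "migrate",
      once: "exitSuccess"
    }
  },
  {
    // periodic task: assumed to run every 10 seconds
    name: "cleanup",
    exec: "/bin/cleanup.sh",
    when: {
      interval: "10s"
    }
  }
]
```

The `when` block on `app` is the smallest possible dependency chain: one job waiting on an event from another.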

New API

Previously, users controlled ContainerPilot by firing signals from their handlers. But only a limited number of Unix signals are available, and a signal can only start or toggle a behavior asynchronously, without any feedback. In v3 we've added a new HTTP control plane. Your job behaviors can now either use ContainerPilot subcommands or send HTTP requests to a socket inside the container to update environment variables, toggle maintenance mode, record metrics for the Prometheus endpoint, or reload the ContainerPilot configuration.
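
As an illustration of driving the control plane from jobs, the sketch below toggles maintenance mode based on the health of a watched service. Everything specific in it is an assumption rather than something stated in this post: the `control.socket` field and its path, the `-maintenance enable` and `-maintenance disable` subcommand flags, the `healthy` and `unhealthy` watch events, and the hypothetical `db` service. Verify each against the v3 documentation before using it.

```json5
{
  consul: "localhost:8500",
  // assumed field and path for the control socket
  control: {
    socket: "/var/run/containerpilot.socket"
  },
  jobs: [
    {
      // hypothetical job: enter maintenance mode whenever the
      // watched "db" service becomes unhealthy
      name: "maintenance-on",
      exec: ["containerpilot", "-maintenance", "enable"],
      when: {
        source: "watch.db",
        each: "unhealthy"
      }
    },
    {
      // hypothetical job: leave maintenance mode once "db" recovers
      name: "maintenance-off",
      exec: ["containerpilot", "-maintenance", "disable"],
      when: {
        source: "watch.db",
        each: "healthy"
      }
    }
  ],
  watches: [
    {
      name: "db",       // hypothetical upstream service
      interval: 5
    }
  ]
}
```

Unlike a signal, a subcommand exits with a status, so the calling job can tell whether the request was accepted.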

New configuration language

By consolidating the various behavior hooks, we've greatly simplified the configuration options. At the same time, job behaviors can be configured at a much finer grain than in v2. Every job emits events associated with the lifecycle of its process. Any job can react to the events emitted by any other job (or even its own events) via an optional when configuration field.

Suppose we want our Nginx job to start only after we've started a Consul agent and written configuration data from Consul. We might have a jobs configuration that looks like the following:

consul: "localhost:8500"
jobs: [
  {
    // note that without a 'port' field this job won't be registered
    // with service discovery, which is what we want.
    name: "consul-agent",
    exec: ["consul", "agent", "-config=/etc/consul/",
           "-retry-join", '{{ .CONSUL | default "consul" }}'],
    restarts: "unlimited",
    health: {
      // this health check will fire events inside this container
      // but not update the external service catalog
      exec: "consul info | grep leader"
    }
  },
  {
    name: "setup",
    exec: ["consul-template", "-once",
           "/etc/nginx/nginx.ctmpl:/etc/nginx/nginx.conf"],
    when: {
      source: "consul-agent",
      once: "healthy",
      timeout: "60s"
    }
  },
  {
    name: "nginx",
    exec: ["nginx", "-g", "daemon off;"],
    port: 80,
    when: {
      source: "setup",
      once: "exitSuccess",
      timeout: "60s"
    },
    health: {
      // because we have a 'port' field set above for Nginx, this
      // health check will cause this job to be registered with the
      // service catalog
      exec: ["curl", "--fail", "-o", "/dev/null",
             "http://localhost:80/status"],
      interval: 5,
      ttl: 10
    },
    restarts: "unlimited"
  }
]

You can see in the configuration above that the Nginx job depends on the "setup" job to exit with success, which in turn depends on the "consul-agent" job to be marked healthy. You might also notice that the configuration language is no longer JSON. ContainerPilot now uses the more human-friendly JSON5. We chose this because it's easier for users to write and understand, and because JSON5 is a superset of JSON, it stays backwards compatible for users who generate JSON configurations in code.
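
A small sketch of what that compatibility means in practice: the two jobs below sit in the same file, one written the way a JSON generator would emit it and one using JSON5 conveniences. The job names and commands are hypothetical.

```json5
{
  consul: "localhost:8500",
  jobs: [
    // JSON style: quoted keys, double-quoted strings, no trailing
    // commas inside the object. Every JSON document is valid JSON5.
    {
      "name": "generated-job",
      "exec": ["/bin/task.sh", "arg"]
    },
    // JSON5 style: comments, unquoted keys, single-quoted strings,
    // and trailing commas are all allowed
    {
      name: 'handwritten-job',
      exec: ['/bin/task.sh', 'arg'],
    },
  ],
}
```

Mixing the two styles in one file is legal, though picking one per file keeps it readable.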

Consul only

HashiCorp's Consul provides a number of higher-level capabilities than a simple KV store like etcd or ZooKeeper. Exposing these capabilities in ContainerPilot while supporting several backends has meant taking a least-common-denominator approach, adding complex provider-specific configuration options (for tagging interfaces, providing a secure connection to the backend, and faster deregistration on shutdown, among others), or both.

So in v3 we've decided to double down on Consul and drop support for etcd. Keep an eye out in the coming months for new features that take advantage of Consul's stronger abstractions as well as its multi-datacenter support.

Migrating to v3

You can upgrade to v3 incrementally. Because ContainerPilot only coordinates with other containers via Consul, you don't need to upgrade all your application containers at once. We've updated our application blueprints for Nginx, Consul, and MySQL to demonstrate how v3 works, and we'll be updating many of our other Autopilot Pattern example applications in the coming weeks.

Let's walk through how you might port an application from ContainerPilot v2 to ContainerPilot v3. The autopilotpattern/hello-world example has been ported to v3. Below is how the ContainerPilot configuration looked in v2:

{
  "consul": "consul:8500",
  "preStart": "/bin/reload-nginx.sh preStart",
  "logging": {
    "level": "DEBUG",
    "format": "text"
  },
  "services": [
    {
      "name": "nginx",
      "port": 80,
      "interfaces": ["eth1", "eth0"],
      "health": "/usr/bin/curl -o /dev/null --fail -s http://localhost/health",
      "poll": 10,
      "ttl": 25
    }
  ],
  "backends": [
    {
      "name": "hello",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    },
    {
      "name": "world",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    }
  ]
}

In the original we had a preStart configuration, which will now become its own job. And look, with JSON5 we can have comments in our configuration!

{
  // without a "when" field this will start first
  name: "preStart",
  exec: "/bin/reload-nginx.sh preStart"
},

We want the Nginx job to wait until that job is done, so we'll add a when field to the Nginx job. The configuration for health checks has more options now, so we'll need to change the health check configuration as well:

{
  name: "nginx",
  port: 80,
  interfaces: ["eth1", "eth0"],
  exec: "nginx",
  when: {
    source: "preStart",
    once: "exitSuccess"
  },
  health: {
    exec: "/usr/bin/curl -o /dev/null --fail -s http://localhost/health",
    interval: 10,
    ttl: 25
  }
}

The last thing to change is the backends. In v3 these have become watches, but you'll see that the onChange behaviors have been moved into their own jobs. This lets you have fine-grained control over how you want ContainerPilot to respond to a change. For example, maybe you want a job to respond only if the watched service is unhealthy, or maybe you need to have two different actions run in parallel when there's a change. In the configuration below we've kept the exact same behavior as before:

jobs: [
  {
    name: "onchange-hello",
    exec: "/bin/reload-nginx.sh onChange",
    when: {
      source: "watch.hello",
      each: "changed"
    }
  },
  {
    name: "onchange-world",
    exec: "/bin/reload-nginx.sh onChange",
    when: {
      source: "watch.world",
      each: "changed"
    }
  }
],
watches: [
  {
    name: "hello",
    interval: 3
  },
  {
    name: "world",
    interval: 3
  }
]
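
The two variations mentioned above could look roughly like the following sketch. It assumes that watches emit an `unhealthy` event alongside `changed`, and the alerting and logging scripts are hypothetical; only `/bin/reload-nginx.sh` comes from the example being ported.

```json5
jobs: [
  {
    // responds only when the watched "hello" service is unhealthy
    // (assumes watches emit an "unhealthy" event)
    name: "alert-hello",
    exec: "/bin/alert.sh hello",        // hypothetical script
    when: {
      source: "watch.hello",
      each: "unhealthy"
    }
  },
  {
    // first of two actions triggered by the same change to "world"
    name: "reload-on-world",
    exec: "/bin/reload-nginx.sh onChange",
    when: {
      source: "watch.world",
      each: "changed"
    }
  },
  {
    // second action for the same event; because it is a separate
    // job, it does not wait for the reload job to finish
    name: "log-on-world",
    exec: "/bin/log-change.sh world",   // hypothetical script
    when: {
      source: "watch.world",
      each: "changed"
    }
  }
],
watches: [
  {
    name: "hello",
    interval: 3
  },
  {
    name: "world",
    interval: 3
  }
]
```

Because each reaction is its own job, adding or removing one does not touch the watch definition itself.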

The whole configuration can be found on GitHub, and you can follow along with PR #22 to see how the other container images were updated.

Try it!

As you can tell from all the above, we've made some major changes to the way ContainerPilot works in v3. We think these changes reflect the best practices we've seen in the community and will position ContainerPilot as a helpful tool for a wide variety of use cases.

Version 2 will continue to be supported until September 30, 2017. Paid Joyent support customers will have access to extended support (talk to your solutions engineer or sales rep). We'll make bug fixes and backport security fixes to v2, but v2 won't get any new features.

Check out the new ContainerPilot documentation, join the ContainerPilot community on GitHub, and try ContainerPilot v3 for yourself!