Docker Swarm Basics

Docker Engine comes with a ready-to-use cluster management and orchestration feature called "Swarm mode". Docker Swarm comes with a lot of features like load balancing, service discovery, multi-host overlay networking, scaling, rolling updates and declarative service definitions, which let you define volumes, secrets, configs, networks, etc. in docker-compose files.

You can define your project components/services in a single docker-compose file and deploy it as a stack. This makes it very easy to integrate automatic deployment services like GitLab CI/CD for all of your services.

In addition to that, you can define replicas (the number of containers you want to run under a service definition), and Docker Swarm can distribute these replicas over multiple Swarm nodes automatically if you have enabled Docker Swarm on multiple servers.
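For example, enabling Swarm mode on multiple servers only requires initializing a manager and joining the other nodes to it. A minimal sketch, where the IP address and the join token are placeholders:

# On the first server, initialize the manager node
docker swarm init --advertise-addr 192.168.1.10

# "swarm init" prints a join command with a token; run it on each additional server
docker swarm join --token <worker-join-token> 192.168.1.10:2377

# On the manager, verify that all nodes have joined the cluster
docker node ls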

Reverse Proxy and Auto Scaling problems

There are two main things missing in Docker Swarm:

  1. Auto Scaling
  2. The ability to use different domain names on the load balancer to serve different projects over the same external port.

If you are looking for auto-scaling behavior, you should consider using the managed services/products of cloud providers, or you should consider setting up Kubernetes, which can be very time-consuming to adopt in the initial phase of your migration if you don't already have expertise in it. Another option is manually writing an auto-scaling algorithm based on your service metrics, which can automatically scale the number of containers in your service definition up or down without any downtime, as sketched below.
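Such an algorithm is essentially a script that reacts to your metrics by changing the replica count; the actual scaling action is a single command (the service and stack names below are just examples):

# Scale the "api" service of "myProjectStack" to 5 replicas
docker service scale myProjectStack_api=5

# Equivalent form using service update
docker service update --replicas 5 myProjectStack_api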

Regarding the second issue, there are some good solutions. The first one is Traefik Proxy. Traefik is a reverse proxy and load balancer like Nginx or HAProxy. The difference is that Traefik uses service discovery to dynamically configure itself and distribute the incoming traffic to the specific services running behind the proxy. In other words, you set up Traefik once and it receives all the incoming requests. When you add a service to Docker Swarm, you add some configuration parameters to the service definition to enable Traefik for that service, so you don't have to change the configuration of Traefik itself to set up reverse-proxy rules for your service.
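As an illustration, with Traefik v2 and its Docker Swarm provider these parameters are usually labels placed under the deploy section of the service. The domain, router name and port below are assumptions for the sketch, and traefik-public is assumed to be an overlay network shared with the Traefik service:

services:
  api:
    image: myDockerImage:latest
    networks:
      - traefik-public
    deploy:
      labels:
        # Tell Traefik to expose this service and where to route the traffic
        - traefik.enable=true
        - traefik.http.routers.api.rule=Host(`api.example.com`)
        - traefik.http.services.api.loadbalancer.server.port=8000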

Traefik also provides SSL termination and can automatically generate SSL certificates (for example via Let's Encrypt) for the domains of your service.

An alternative to Traefik is the project called "Nginx-Proxy". It also automatically creates proxy configurations based on service definitions, but this works out of the box only for single-host Docker Swarm environments. If you have multiple hosts, you should check the project owner's blog post on how to set up multi-host service discovery with Nginx-Proxy.

The Nginx-Proxy project supports additional backend protocols like uWSGI and FastCGI, which can be important features for some Python applications or PHP projects.
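As a sketch, Nginx-Proxy discovers services through environment variables instead of labels; the host name, port and protocol values below are examples:

services:
  api:
    image: myDockerImage:latest
    environment:
      - VIRTUAL_HOST=api.example.com
      - VIRTUAL_PORT=8000
      # For a uWSGI or FastCGI backend you would set VIRTUAL_PROTO accordingly
      # - VIRTUAL_PROTO=uwsgi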

Lastly, to solve the same problem, you can check out "Docker Flow Proxy", which uses HAProxy under the hood. There are several tutorials on its website about setting it up with Docker Swarm. It also solves the reverse proxy problem and provides similar features to Traefik and Nginx-Proxy.

Deployment and zero downtime

If you are new to containers and have tried to deploy your project as a single Docker image to a production environment, you quickly realize you need something for easy deployment. Running docker run ... commands in production is not a good approach. At that step, you definitely need something like AWS Elastic Beanstalk, Fargate, GCloud Cloud Run or any similar service/product to handle all the deployment headaches for you.

And not all projects can be a single Docker image that runs independently. Eventually, you have to combine a few containers (database, api, frontend, etc.) to actually define your project as a combination of services. At that stage, you probably start using docker-compose files. But running docker-compose up is not much different than running docker run .... Even if you set up CI/CD to run these commands for you automatically, you will probably face downtime while updating services. (Cloud products can also solve this, of course.)

Docker Swarm solves this deployment problem automatically. You can create service definitions as separate docker-compose files or you can combine all the services in a single docker-compose.yml. In both cases, by using Docker Swarm you can define a "stack" which contains all the services/containers of your project and deploy any of them by using the same stack name.

docker stack deploy -c docker-compose.yml myProjectStack

If you have, for example, 3 different services like api, postgres and frontend, each having a separate docker-compose file (api.yml, postgres.yml, frontend.yml), you can deploy them separately to the same stack:

docker stack deploy -c api.yml myProjectStack
docker stack deploy -c postgres.yml myProjectStack
docker stack deploy -c frontend.yml myProjectStack

This gives you the freedom to update and manage services belonging to the same project in different git repositories with separate CI/CD pipelines, for example with a small deployment job like the one sketched below.
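For instance, a deployment job in the .gitlab-ci.yml of the api repository could be as small as the following sketch; the stage, branch name and the way the runner reaches the Swarm manager are assumptions that depend on your setup:

deploy:
  stage: deploy
  image: docker:latest
  script:
    # Re-deploy only this repository's service into the shared stack
    - docker stack deploy -c api.yml myProjectStack
  only:
    - master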

In addition to this, it is possible to have almost zero downtime while updating a service at the stack deploy stage by defining an update policy like this:

deploy:
  replicas: 2
  update_config:
    delay: 10s
    failure_action: rollback
    order: start-first

According to this configuration, the number of containers that must be running for the service is 2 (replicas: 2). When an update is received, Docker Swarm will start the new containers first (order: start-first) before removing the existing running containers. If a failure occurs during the update, Docker Swarm will roll back to the previous containers. You can find more options and configuration details in the official documentation.

This configuration not only provides nearly zero downtime, it also provides a rollback option. You can also control how many replicas should be updated at once by specifying the parallelism value, as in the sketch below.
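For example, the same update_config block with parallelism added would replace one replica at a time (the values are illustrative):

deploy:
  replicas: 4
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first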

Restarting containers automatically when a container exits with an error

If the application inside a container exits with an error, the container must either be removed from the load balancer and replaced as soon as possible, or restarted, if restarting can be a solution (at least temporarily).

Docker Swarm addresses this problem with the ability to define a restart policy to restart containers when a failure occurs:

deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
    window: 120s

With this configuration, Docker Swarm can restart containers if a failure occurs. It waits 5 seconds between restart attempts, tries up to 3 times, and uses a 120-second window to decide whether a restart has succeeded. You can check the official documentation for more details.

Managing configuration files

One of the problems of deploying containers to production is shipping external configuration files which are mapped as bind mounts in docker-compose files. A simple solution might be creating a custom Docker image with a Dockerfile that includes those configuration files. But in some cases you might want to use official Docker images, and building a custom image on every configuration change might not be a good solution.

Docker Swarm mode supports Docker configs, which were introduced with Docker Engine v17.06. In your docker-compose file, just as you define volumes, you can define configs:

configs:
  myConfig:
    file: ./myConfig.conf

While deploying a stack in a CI/CD pipeline, if the ./myConfig.conf file is present, Docker takes the content of the file as named config data. You can then map the defined configs as files inside the container in your docker-compose service definition:

    configs:
      - source: myConfig
        target: /etc/myServerApp/config.conf

So, in Swarm mode, you don't have to build custom Docker images if you only need to ship some config files with your docker-compose file. Each Docker config definition can hold up to 500 KB of data.

This can also be useful for initializing a predefined database schema for your database service in Docker Swarm, as in the sketch below.
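As a sketch of that idea, the official postgres image executes .sql files found under /docker-entrypoint-initdb.d on the first start with an empty data directory, so a schema file can be shipped as a config (the file and config names here are examples):

services:
  postgres:
    image: postgres:13
    configs:
      - source: initSchema
        target: /docker-entrypoint-initdb.d/schema.sql

configs:
  initSchema:
    file: ./schema.sql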

Managing sensitive data

Another issue for container deployments is managing sensitive data like passwords, encryption keys, etc. Docker has a feature named Docker Secrets which stores sensitive data safely and can distribute it securely over multiple Docker Swarm nodes. Each secret can hold up to 500 KB of data, the same as configs. The differences are:

  • Docker configs are not encrypted at rest, while Docker secrets are.
  • Docker configs can be mounted to any target path.
  • Docker secrets are mounted under /run/secrets and only their file name can be changed.

For example, a secret can be defined and used in a docker-compose file like this:

version: "3.8"

services:
  myservice:
    image: myDockerImage:latest
    deploy:
      replicas: 1
    secrets:
      - source: myKey
        target: sshKey
secrets:
  myKey:
    file: ./myKey.pem

In this example, the content of ./myKey.pem will be available to the container at /run/secrets/sshKey.

During deployment in a CI/CD pipeline, sensitive data files can be created dynamically from environment variables or any other method. There are many options depending on which CI/CD tool is used; a minimal example follows.
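For example, in a CI/CD job the key file referenced by the compose file could be written from a protected pipeline variable just before the deployment; the variable and file names are assumptions:

# Recreate the secret file from a CI/CD environment variable, deploy, then clean up
echo "$SSH_PRIVATE_KEY" > myKey.pem
docker stack deploy -c docker-compose.yml myProjectStack
rm myKey.pem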

Other features

Docker Swarm also has network-related features which might help you isolate projects from each other for better security. You can create internal overlay networks so that the services of a project are fully isolated from the outside and from other projects.
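A minimal sketch of such isolation: the backend overlay network is marked internal, so only the proxy service is reachable from outside while the api stays fully isolated (the image and network names are illustrative):

services:
  proxy:
    image: nginx:alpine
    ports:
      - "80:80"
    networks:
      - frontend
      - backend
  api:
    image: myDockerImage:latest
    networks:
      - backend

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true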

Volumes are also available in Swarm mode to store persistent data created by containers. Volumes persist even if the related container, service or stack is deleted, so you can continue where you left off after deleting and re-creating a stack or service.
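A small sketch of a named volume for a database service (the names are illustrative):

services:
  postgres:
    image: postgres:13
    volumes:
      # Keep the database files in a named volume that outlives the service
      - dbData:/var/lib/postgresql/data

volumes:
  dbData:

Note that with the default local volume driver the data stays on the node where the container runs, so multi-node setups usually need a shared storage driver or placement constraints.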

Although Docker Swarm can be used with multiple hosts to distribute the workload, there are usually multiple services running on any host machine at the same time. In that case, you might want to place resource constraints on a service's CPU or memory usage. This can be done by specifying resource constraints for services:

deploy:
  resources:
    limits:
      cpus: '0.50'
      memory: 50M
    reservations:
      cpus: '0.25'
      memory: 20M

Lastly, in Swarm mode, service health checks can also be used to identify and recover from problems. Health checks are also useful for reporting health status to an external monitoring tool or service so that you can receive notifications about it.
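A minimal health check sketch in the compose file could look like this; the endpoint is an assumption and curl must be available inside the image:

services:
  api:
    image: myDockerImage:latest
    healthcheck:
      # Mark the container unhealthy if the endpoint stops responding
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s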
