Why Ansible?

Ansible is a configuration management tool to automate tasks in your IT infrastructure. It offers a rather low barrier of entry, when compared to other tools. A local Ansible installation (i.e. on your machine) with SSH access to the infrastructure you want to manage is sufficient for getting started. Meaning, it requires no substantial additions to existing infrastructure (e.g. management servers or agents to install). Ansible also ships with an extensive standard library and has a large selection of modules to extend functionality.

Why Continuous Deployment?

Once a simple Ansible setup is up & running and things start to scale to more contributors, servers or services, it is usually necessary to automate the integration of code changes. By creating one or more central Ansible repositories, we create a single source of truth for our infrastructure. We shift to continuous integration, start testing and verifying changes to the code base.

The next logical step is then to use automate the deployment of this single source of truth, to make sure changes are applied in a timely/consistent manner. Infrastructure code that is not deployed on a regular basis tends to become riskier to deploy each day, since it’s better to discover errors promptly so that they can be traced back to recent code changes; and we all know that people make undocumented hand-crafted changes that are then overwritten and all goes up in flames. Thus we want shorter, more frequent cycles and consistent deployments to avoid our infrastructure code becoming stale and risky.

Why not AWX/Tower/AAP?

AWX (aka. Tower, now Ansible Automation Platform) aims to provide a continuous deployment experience for Ansible. Quote:

Ansible Tower(formerly ‚AWX‘) is a web-based solution that makes Ansible even more easy to use for IT teams of all kinds.

It offers a wide array of features for all your ‚Ansible at scale‘ needs, however it comes with some strings attached. Namely, it involves management overhead for smaller environments as it introduces yet another tool to install, learn, update and manage throughout its life cycle. Not only that but from version 18.0 onward the preferred way to install AWX is the AWX (Kubernetes) Operator. Meaning – preferably – we would need a Kubernetes instance laying around somewhere. Of course, there is always the option to use „unorchestrated“ Containers as an alternative, but that comes with its own obstacles.

Installation and management aside, there is also Red Hat’s upstream first approach to consider. Meaning, AWX is the upstream project of Ansible Tower and thus it might not be as ’stable‘. Furthermore, Red Hat does not recommend AWX for production environments. Quote:

The AWX team currently plans to release new builds approximately every 2 weeks. The AWX team will flag certain builds as “stable” at their discretion. Note that the term “stable” does not imply fitness for production usage or any kind of warranty whatsoever.

Obviously, there are alternatives to AWX/Ansible Tower. Rundeck allows for predefined workflows, these jobs can then be triggered from a Web GUI, API, CLI, or by schedule and works not just with Ansible. Semaphore offers a simple UI for Ansible to manage projects (environments, inventories, repositories, etc.) and includes an API for launching tasks. Puppet aficionados may already know Foreman, which is a great and battle-tested tool for provisioning machines. You can use the „Foreman Remote Execution“ to run your Playbooks and use Ansible callbacks to register new machines in Foreman. Here are some recommended videos on this topic:

– FOSDEM 2020, Foreman meets Ansible: https://www.youtube.com/watch?v=PQYCiJlnpHM
– OSCamp 2019, Ansible automation for Foreman (hosts): https://www.youtube.com/watch?v=Lt0MksAIYuQ

That being said, the premise was to avoid substantially extending any existing infrastructure. Any of the mentioned tools need at least an external database service (e.g. MariaDB, MySQL or PostgreSQL). With that in mind, this article will now describe alternative solutions for continuous deployment without AWX/Ansible Tower. It will show examples using the GitLab CI, however, the presented solutions should be adaptable to various CI/CD solutions.

Ansible Continuous Deployment via the Pipeline

For this article, we will assume a central Ansible Repository on an existing GitLab Server with some GitLab CI Pipeline already in place. Meaning, we might also have some experience with CI jobs in Containers.

Many (if not all) CI/CD solutions feature isolated jobs within Containers, which enables us to quickly spin up predefined execution environments for these jobs (e.g. pre-installed with various tools for testing). Furthermore, it is possible to use specific machines for specific jobs, or place certain machine in different network zones (e.g. a node that triggers something in production environment could be isolated from the rest).

Given this setup we will now explore two scenarios for Ansible Continuous Deployment via pipeline jobs. One based on SSH and the other based on HTTP (Webhooks).

The example Ansible repository follows a standard pattern and is safely stored in a git repository:

git clone git@git.my-example-company.com:ansible/ansible-configuration.git
cd ansible-configuration/

ls -l
ansible.cfg
playbooks/
roles/
inventory/
collections/
site.yml
requirements.yml

SSH

Since the basis for all Ansible deployments is SSH we will leverage this protocol to deploy our code. Fundamentally, there are two options to achieve this:

– Connect from a pipeline job to a central machine with Ansible already installed, download the code changes there and trigger a playbook
– Run an Ansible playbook directly in a pipeline job (i.e. a Container)

For this example we will generate a specific SSH Keypair that is then used in the pipeline. The public key needs to be added to the `authorized_keys` of any machines we want to connect to. Secrets such as the SSH private key can be managed directly in GitLab (CI Variables) or be stored in an external secret management tool (e.g. Hashicorp Vault). Don’t hardcode secrets in the Ansible code base or CI configuration.

# -t keytype (preferably use ed25519 whenever possible)
# -f output file
# -N passphrase
# -C comment

ssh-keygen -t ed25519 -f ansible-deployment -N '' -C 'Ansible-Deployment-Key'

Option A: via an Ansible machine

In this scenario, we connect from a CI job in the pipeline to a machine with Ansible already installed. On this machine we will clone the Ansible configuration and trigger a playbook. This article will refer to this machine as ‚central Ansible node‘, obviously a more complex infrastructure might need more of these machines (i.e. per network zone).

First, we need to copy the previously generated SSH Key onto the central Ansible node, so that we connect from the GitLab CI job. Second, we require a working Ansible setup on this node. Please note, that a detailed installation process will not be explained in this article, since the focus lies on the CI/CD part. We assume that this node has a dedicated user for Ansible is be able to successfully run the Ansible code.

# Copy public key for deployment on the central Ansible node
scp ansible-deployment.pub ansible@central-ansible-node.local
ssh ansible@central-ansible-node.local

# Authorize the public key for outside connections
cat 'ansible-deployment.pub' >> ~/.ssh/authorized_keys

# Install Ansible
pip3 install --user ansible # or ansible==version
# Further setup like inventory creation or dependency installation happens here...

At this point we assume, we can connect to our infrastructure and run Ansible playbooks at our leisure. Next we will create a GitLab CI job which do the following:

  • Retrieve the previously generated SSH private key from our secrets, so that we can connect to the central Ansible node
  • Connect to the central Ansible node and clone the repository there. We will use the GitLab’s CI job tokens for this
  • Create a temporary directory to isolate each pipeline job
  • Run a playbook via SSH on the central Ansible node
---
stages:
- deploy

variables:
CENTRAL_ANSIBLE_NODE: central-ansible-node.local
# Or you can provide a ssh_known_hosts file
ANSIBLE_HOST_KEY_CHECKING=False

deploy-ansible:
image: docker.io/busybox:latest
stage: deploy
before_script:
- mkdir -p ~/.ssh
# SSH_KNOWN_HOSTS is a CI variable to make sure we connect to the correct node
- echo $SSH_KNOWN_HOSTS ~/.ssh/known_hosts
# The SSH private key is a CI variable
- echo $SSH_PRIVATE_KEY > id_ed25519
- chmod 400 id_ed25519
script:
- TMPDIR=$(ssh -i id_ed25519 $CENTRAL_ANSIBLE_NODE "mktemp -d")
- ssh -i id_ed25519 $CENTRAL_ANSIBLE_SERVER "git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@git.my-example-company.com:ansible/ansible-configuration.git $TMPDIR"
- ssh -i id_ed25519 $CENTRAL_ANSIBLE_SERVER "ansible-playbook $TMPDIR/site.yaml"

This basic example can be extended in many ways. For example, CI variables could be used to control which Ansible playbook is executed, change which hosts or tags are included. Furthermore GitLab can also trigger jobs on a schedule. Some of the benefits of this approach are that it is rather easy to set up since it mirrors the local execution workflow, plus the deployment can be debugged and triggered on the central Ansible node.

However, we now have a central Ansible node to manage and we might need several in different network zones. Additionally the `mktemp` solution is a bit hacky and might need a garbage collection job (e.g. `tmpreaper`). The next solution will alleviate some of these issues.

Option B: directly via a Pipeline

In this scenario, Ansible executed directly in the CI pipeline job (i.e. a Container). It is recommended to use a custom pre-build Ansible Container Image, to make the jobs faster and more consistent. This Image may contain a specific Ansible version and further tools required for the given code. The Image can be stored in the GitLab Container Registry. Building and storing Container Images is outside the scope of this article. Here’s a small example of how it might look like:

cat Dockerfile.ansible.example

FROM docker.io/python:3-alpine
RUN pip install --no-cache-dir ansible
# ... Install further tools or infrastructure specifics here
# The image will be stored at registry.my-example-company.com:ansible/ansible-configuration/runner:latest

cat .gitlab-ci.yaml
---
stages:
- deploy

variables:
# Or you can provide a ssh_known_hosts file
ANSIBLE_HOST_KEY_CHECKING=False

deploy-ansible:
image: registry.my-example-company.com:ansible/ansible-configuration/runner:latest
stage: deploy
before_script:
# The SSH private key is a CI variable
- echo $SSH_PRIVATE_KEY > id_ed25519
- chmod 400 id_ed25519
script:
- ansible-playbook --private-key id_ed25519 site.yaml

This removes the need for a central Ansible node and the need for external garbage collection, since these CI jobs are ephemeral by default. That being said, if we have a more complex network setup we might need runners in these zones and a way to control which job is executed where.

HTTP (Webhooks)

In this scenario, we setup another central Ansible node that will run the playbooks, however, there won’t be a SSH connection from the CI job. Instead we will trigger a webhook on this central Ansible node. While this scenario is more complex it offers some benefits when compared to previously discussed options.

Since there are several ways to implement incoming webhooks, we will not view a specific implementation but discuss the concept. Interestingly enough, a webhook-based feature is currently in developer preview to be provided by Ansible. Event-Based-Ansible provides a webhook service that can trigger Playbooks.

In this example we have a service providing webhooks running on central-ansible-node.local on port 8080. This service is configured to run Ansible with various options which we can pass via a HTTP POST request. This request will certain data that controls the Ansible playbook.

cat trigger-site-yaml.json
{
"token": "$WEBHOOK_TOKEN",
"playbook": "site.yaml",
"limit": "staging"
}

cat .gitlab-ci.yaml
---
stages:
- deploy

variables:
CENTRAL_ANSIBLE_NODE: central-ansible-node.local:8080

deploy-ansible:
image: docker.io/alpine:latest
stage: deploy
before_script:
- apk add curl gettext
script:
# Replace the $WEBHOOK_TOKEN placeholder in the file with the real value from the CI variables
- envsubst < trigger-site-yaml.json > trigger-site-yaml.run.json
- curl -X POST -H "Content-Type:application/json" -d @trigger-site-yaml.run.json $CENTRAL_ANSIBLE_NODE

From a security standpoint we remove the need for reachable SSH ports, the central Ansible node now just accepts HTTP (or specific HTTP methods) secured by Tokens. Furthermore there now is a layer between our CI jobs and the Ansible playbooks which can be used to validate requests.

That being said, this extra layer could also be seen as a hurdle that might break. And beside the central Ansible node we now need to manage a service that provides these webhooks. However, in the future Event-Based-Ansible might alleviate some of these issues.

Conclusions

Deploying Ansible is quite flexible due to its simple operational model based on SSH. As we have seen, there are some low-effort alternatives to AWX/Tower that can be applied in various use cases. However, at some point there is a maintainability tradeoff. Meaning, even though AWX/Tower might not appear as stable or is sometimes tricky to operate, once an environment is large enough it might be a better option than custom creations. Probably not a satisfying conclusion for an article named „without AWX/Tower“, I agree.

Foreman presents an interesting alternative due to its myriad of other features that you get with an installation. Finally, Event-Based-Ansible could be very promising webhook-based solution when it comes to automated deployments. Starting simple and then pivoting to a more complex system is always an option.

References

Markus Opolka
Markus Opolka
Consultant

Markus war nach seiner Ausbildung als Fachinformatiker mehrere Jahre als Systemadministrator tätig und hat währenddessen ein Master-Studium Linguistik an der FAU absolviert. Seit 2022 ist er bei NETWAYS als Consultant tätig. Hier kümmert er sich um die Themen Container, Kubernetes, Puppet und Ansible. Privat findet man ihn auf dem Fahrrad, dem Sofa oder auf GitHub.