Seite wählen

NETWAYS Blog

Ansible Continuous Deployment without AWX/Tower/AAP

Why Ansible?

Ansible is a configuration management tool to automate tasks in your IT infrastructure. It offers a rather low barrier of entry, when compared to other tools. A local Ansible installation (i.e. on your machine) with SSH access to the infrastructure you want to manage is sufficient for getting started. Meaning, it requires no substantial additions to existing infrastructure (e.g. management servers or agents to install). Ansible also ships with an extensive standard library and has a large selection of modules to extend functionality.

Why Continuous Deployment?

Once a simple Ansible setup is up & running and things start to scale to more contributors, servers or services, it is usually necessary to automate the integration of code changes. By creating one or more central Ansible repositories, we create a single source of truth for our infrastructure. We shift to continuous integration, start testing and verifying changes to the code base.

The next logical step is then to use automate the deployment of this single source of truth, to make sure changes are applied in a timely/consistent manner. Infrastructure code that is not deployed on a regular basis tends to become riskier to deploy each day, since it’s better to discover errors promptly so that they can be traced back to recent code changes; and we all know that people make undocumented hand-crafted changes that are then overwritten and all goes up in flames. Thus we want shorter, more frequent cycles and consistent deployments to avoid our infrastructure code becoming stale and risky.

Why not AWX/Tower/AAP?

AWX (aka. Tower, now Ansible Automation Platform) aims to provide a continuous deployment experience for Ansible. Quote:

Ansible Tower(formerly ‚AWX‘) is a web-based solution that makes Ansible even more easy to use for IT teams of all kinds.

It offers a wide array of features for all your ‚Ansible at scale‘ needs, however it comes with some strings attached. Namely, it involves management overhead for smaller environments as it introduces yet another tool to install, learn, update and manage throughout its life cycle. Not only that but from version 18.0 onward the preferred way to install AWX is the AWX (Kubernetes) Operator. Meaning – preferably – we would need a Kubernetes instance laying around somewhere. Of course, there is always the option to use „unorchestrated“ Containers as an alternative, but that comes with its own obstacles.

Installation and management aside, there is also Red Hat’s upstream first approach to consider. Meaning, AWX is the upstream project of Ansible Tower and thus it might not be as ’stable‘. Furthermore, Red Hat does not recommend AWX for production environments. Quote:

The AWX team currently plans to release new builds approximately every 2 weeks. The AWX team will flag certain builds as “stable” at their discretion. Note that the term “stable” does not imply fitness for production usage or any kind of warranty whatsoever.

Obviously, there are alternatives to AWX/Ansible Tower. Rundeck allows for predefined workflows, these jobs can then be triggered from a Web GUI, API, CLI, or by schedule and works not just with Ansible. Semaphore offers a simple UI for Ansible to manage projects (environments, inventories, repositories, etc.) and includes an API for launching tasks. Puppet aficionados may already know Foreman, which is a great and battle-tested tool for provisioning machines. You can use the „Foreman Remote Execution“ to run your Playbooks and use Ansible callbacks to register new machines in Foreman. Here are some recommended videos on this topic:

– FOSDEM 2020, Foreman meets Ansible: https://www.youtube.com/watch?v=PQYCiJlnpHM
– OSCamp 2019, Ansible automation for Foreman (hosts): https://www.youtube.com/watch?v=Lt0MksAIYuQ

That being said, the premise was to avoid substantially extending any existing infrastructure. Any of the mentioned tools need at least an external database service (e.g. MariaDB, MySQL or PostgreSQL). With that in mind, this article will now describe alternative solutions for continuous deployment without AWX/Ansible Tower. It will show examples using the GitLab CI, however, the presented solutions should be adaptable to various CI/CD solutions.

Ansible Continuous Deployment via the Pipeline

For this article, we will assume a central Ansible Repository on an existing GitLab Server with some GitLab CI Pipeline already in place. Meaning, we might also have some experience with CI jobs in Containers.

Many (if not all) CI/CD solutions feature isolated jobs within Containers, which enables us to quickly spin up predefined execution environments for these jobs (e.g. pre-installed with various tools for testing). Furthermore, it is possible to use specific machines for specific jobs, or place certain machine in different network zones (e.g. a node that triggers something in production environment could be isolated from the rest).

Given this setup we will now explore two scenarios for Ansible Continuous Deployment via pipeline jobs. One based on SSH and the other based on HTTP (Webhooks).

The example Ansible repository follows a standard pattern and is safely stored in a git repository:

git clone git@git.my-example-company.com:ansible/ansible-configuration.git
cd ansible-configuration/

ls -l
ansible.cfg
playbooks/
roles/
inventory/
collections/
site.yml
requirements.yml

SSH

Since the basis for all Ansible deployments is SSH we will leverage this protocol to deploy our code. Fundamentally, there are two options to achieve this:

– Connect from a pipeline job to a central machine with Ansible already installed, download the code changes there and trigger a playbook
– Run an Ansible playbook directly in a pipeline job (i.e. a Container)

For this example we will generate a specific SSH Keypair that is then used in the pipeline. The public key needs to be added to the `authorized_keys` of any machines we want to connect to. Secrets such as the SSH private key can be managed directly in GitLab (CI Variables) or be stored in an external secret management tool (e.g. Hashicorp Vault). Don’t hardcode secrets in the Ansible code base or CI configuration.

# -t keytype (preferably use ed25519 whenever possible)
# -f output file
# -N passphrase
# -C comment

ssh-keygen -t ed25519 -f ansible-deployment -N '' -C 'Ansible-Deployment-Key'

Option A: via an Ansible machine

In this scenario, we connect from a CI job in the pipeline to a machine with Ansible already installed. On this machine we will clone the Ansible configuration and trigger a playbook. This article will refer to this machine as ‚central Ansible node‘, obviously a more complex infrastructure might need more of these machines (i.e. per network zone).

First, we need to copy the previously generated SSH Key onto the central Ansible node, so that we connect from the GitLab CI job. Second, we require a working Ansible setup on this node. Please note, that a detailed installation process will not be explained in this article, since the focus lies on the CI/CD part. We assume that this node has a dedicated user for Ansible is be able to successfully run the Ansible code.

# Copy public key for deployment on the central Ansible node
scp ansible-deployment.pub ansible@central-ansible-node.local
ssh ansible@central-ansible-node.local

# Authorize the public key for outside connections
cat 'ansible-deployment.pub' >> ~/.ssh/authorized_keys

# Install Ansible
pip3 install --user ansible # or ansible==version
# Further setup like inventory creation or dependency installation happens here...

At this point we assume, we can connect to our infrastructure and run Ansible playbooks at our leisure. Next we will create a GitLab CI job which do the following:

  • Retrieve the previously generated SSH private key from our secrets, so that we can connect to the central Ansible node
  • Connect to the central Ansible node and clone the repository there. We will use the GitLab’s CI job tokens for this
  • Create a temporary directory to isolate each pipeline job
  • Run a playbook via SSH on the central Ansible node
---
stages:
- deploy

variables:
CENTRAL_ANSIBLE_NODE: central-ansible-node.local
# Or you can provide a ssh_known_hosts file
ANSIBLE_HOST_KEY_CHECKING=False

deploy-ansible:
image: docker.io/busybox:latest
stage: deploy
before_script:
- mkdir -p ~/.ssh
# SSH_KNOWN_HOSTS is a CI variable to make sure we connect to the correct node
- echo $SSH_KNOWN_HOSTS ~/.ssh/known_hosts
# The SSH private key is a CI variable
- echo $SSH_PRIVATE_KEY > id_ed25519
- chmod 400 id_ed25519
script:
- TMPDIR=$(ssh -i id_ed25519 $CENTRAL_ANSIBLE_NODE "mktemp -d")
- ssh -i id_ed25519 $CENTRAL_ANSIBLE_SERVER "git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@git.my-example-company.com:ansible/ansible-configuration.git $TMPDIR"
- ssh -i id_ed25519 $CENTRAL_ANSIBLE_SERVER "ansible-playbook $TMPDIR/site.yaml"

This basic example can be extended in many ways. For example, CI variables could be used to control which Ansible playbook is executed, change which hosts or tags are included. Furthermore GitLab can also trigger jobs on a schedule. Some of the benefits of this approach are that it is rather easy to set up since it mirrors the local execution workflow, plus the deployment can be debugged and triggered on the central Ansible node.

However, we now have a central Ansible node to manage and we might need several in different network zones. Additionally the `mktemp` solution is a bit hacky and might need a garbage collection job (e.g. `tmpreaper`). The next solution will alleviate some of these issues.

Option B: directly via a Pipeline

In this scenario, Ansible executed directly in the CI pipeline job (i.e. a Container). It is recommended to use a custom pre-build Ansible Container Image, to make the jobs faster and more consistent. This Image may contain a specific Ansible version and further tools required for the given code. The Image can be stored in the GitLab Container Registry. Building and storing Container Images is outside the scope of this article. Here’s a small example of how it might look like:

cat Dockerfile.ansible.example

FROM docker.io/python:3-alpine
RUN pip install --no-cache-dir ansible
# ... Install further tools or infrastructure specifics here
# The image will be stored at registry.my-example-company.com:ansible/ansible-configuration/runner:latest

cat .gitlab-ci.yaml
---
stages:
- deploy

variables:
# Or you can provide a ssh_known_hosts file
ANSIBLE_HOST_KEY_CHECKING=False

deploy-ansible:
image: registry.my-example-company.com:ansible/ansible-configuration/runner:latest
stage: deploy
before_script:
# The SSH private key is a CI variable
- echo $SSH_PRIVATE_KEY > id_ed25519
- chmod 400 id_ed25519
script:
- ansible-playbook --private-key id_ed25519 site.yaml

This removes the need for a central Ansible node and the need for external garbage collection, since these CI jobs are ephemeral by default. That being said, if we have a more complex network setup we might need runners in these zones and a way to control which job is executed where.

HTTP (Webhooks)

In this scenario, we setup another central Ansible node that will run the playbooks, however, there won’t be a SSH connection from the CI job. Instead we will trigger a webhook on this central Ansible node. While this scenario is more complex it offers some benefits when compared to previously discussed options.

Since there are several ways to implement incoming webhooks, we will not view a specific implementation but discuss the concept. Interestingly enough, a webhook-based feature is currently in developer preview to be provided by Ansible. Event-Based-Ansible provides a webhook service that can trigger Playbooks.

In this example we have a service providing webhooks running on central-ansible-node.local on port 8080. This service is configured to run Ansible with various options which we can pass via a HTTP POST request. This request will certain data that controls the Ansible playbook.

cat trigger-site-yaml.json
{
"token": "$WEBHOOK_TOKEN",
"playbook": "site.yaml",
"limit": "staging"
}

cat .gitlab-ci.yaml
---
stages:
- deploy

variables:
CENTRAL_ANSIBLE_NODE: central-ansible-node.local:8080

deploy-ansible:
image: docker.io/alpine:latest
stage: deploy
before_script:
- apk add curl gettext
script:
# Replace the $WEBHOOK_TOKEN placeholder in the file with the real value from the CI variables
- envsubst < trigger-site-yaml.json > trigger-site-yaml.run.json
- curl -X POST -H "Content-Type:application/json" -d @trigger-site-yaml.run.json $CENTRAL_ANSIBLE_NODE

From a security standpoint we remove the need for reachable SSH ports, the central Ansible node now just accepts HTTP (or specific HTTP methods) secured by Tokens. Furthermore there now is a layer between our CI jobs and the Ansible playbooks which can be used to validate requests.

That being said, this extra layer could also be seen as a hurdle that might break. And beside the central Ansible node we now need to manage a service that provides these webhooks. However, in the future Event-Based-Ansible might alleviate some of these issues.

Conclusions

Deploying Ansible is quite flexible due to its simple operational model based on SSH. As we have seen, there are some low-effort alternatives to AWX/Tower that can be applied in various use cases. However, at some point there is a maintainability tradeoff. Meaning, even though AWX/Tower might not appear as stable or is sometimes tricky to operate, once an environment is large enough it might be a better option than custom creations. Probably not a satisfying conclusion for an article named „without AWX/Tower“, I agree.

Foreman presents an interesting alternative due to its myriad of other features that you get with an installation. Finally, Event-Based-Ansible could be very promising webhook-based solution when it comes to automated deployments. Starting simple and then pivoting to a more complex system is always an option.

References

Markus Opolka
Markus Opolka
Senior Consultant

Markus war nach seiner Ausbildung als Fachinformatiker mehrere Jahre als Systemadministrator tätig und hat währenddessen ein Master-Studium Linguistik an der FAU absolviert. Seit 2022 ist er bei NETWAYS als Consultant tätig. Hier kümmert er sich um die Themen Container, Kubernetes, Puppet und Ansible. Privat findet man ihn auf dem Fahrrad, dem Sofa oder auf GitHub.

Migration von GitLab mit Upgrade auf EE

Wer GitLab CE produktiv im Einsatz hat und mit den zusätzlichen Features der EE Version liebäugelt, der wird sich beim Umstieg zwangsläufig mit den Migrationsschritten auseinandersetzen, sofern der GitLab Server nicht von einem Hoster betreut wird. In diesem Post zeige ich, wie der Wechsel inklusive Migration auf einen anderen Server gelingen kann und beziehe mich dabei auf die Omnibus Version basierend auf Ubuntu 18.04. Der Ablauf ist gar nicht so kompliziert. Bügelt man die EE-Version einfach über die aktuelle CE Version, dann hat man nur zwei Schritte zu beachten. Wenn man allerdings einen EE Server parallel hochzieht, um dann auf diesen zu migrieren, so kommen ein paar mehr Schritte hinzu.
Deshalb zeige ich im Folgenden wie man per Backup und Restore auch zum Ziel kommt. Die ersten Schritte sind ziemlich identisch mit dem, was GitLab vorgibt.

  1. Zuerst erstellt man sich ein Backup:
    gitlab-rake gitlab:backup:create STRATEGY=copy

    Einfach um sich den aktuellen Stand vor der Migration zu sichern.
    Hier kann nicht viel schief gehen. Man sollte aber darauf achten, dass genügend Speicherplatz unter /var/opt/gitlab/backups zur Verfügung steht. Es sollten mindestens noch zwei Drittel des Speicherplatzes frei sein. Das resultierende Tar-Archiv sollte man sich anschließend weg kopieren, da im weiteren Verlauf ein weitere Backup erstellt wird.

  2.  Nun führt man das Script von GitLab aus, das die apt-Sourcen hinzufügt und ein paar benötigte Pakete vorinstalliert:
    curl https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh | sudo bash
  3. Jetzt checkt man noch kurz welche Versionsnummer von GitLab CE momentan installiert ist:
    dpkg -l |grep gitlab-ce
  4. Danach installiert man die GitLab EE Version. Dabei wird die CE Version gelöscht und eine Migration durchgeführt. Um den Versionsstand kompatibel zu halten, verwendet man die gleiche Versionsnummer wie aus dem vorherigen Schritt. Lediglich der teil ‚ce‘ wird zu ‚ee‘ abgeändert:
    apt-get update && sudo apt-get install gitlab-ee=12.1.6-ee.0
  5. Nun erstellt man erneut ein Backup:
    gitlab-rake gitlab:backup:create STRATEGY=copy
  6. Das neu erstellte Backup transferiert man einschließlich der Dateien /etc/gitlab/gitlab-secrets.json und /etc/gitlab/gitlab.rb auf den EE Server. Das lässt sich z.B. per scp bewerkstelligen:
    scp /var/opt/gitlab/backups/*ee_gitlab_backup.tar ziel-server:./
    scp /etc/gitlab/gitlab-secrets.json ziel-server:./
    scp /etc/gitlab/gitlab.rb ziel-server:./
  7. Auf dem Zielserver sollte man natürlich GitLab EE in der entsprechenden Version installieren. Hier hält man sich am besten an die offizielle Anleitung. Nicht vergessen bei der Installation die Versionsnummer anzugeben.
  8. Jetzt verschiebt man die Dateien die man per scp transferiert hat und setzt die Dateiberechtigungen:
    mv ~/*ee_gitlab_backup.tar /var/opt/gitlab/backups
    mv ~/gitlab-secrets.json ~/gitlab.rb /etc/gitlab/
    chown root:root /etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab.rb
    chown git:git /var/opt/gitlab/backups/*ee_gitlab_backup.tar
  9. Dann startet man den Restore:
    gitlab-rake gitlab:backup:restore

    Man quittiert die einzelnen Abfragen mit ‚yes‘. Hat sich die URL für den neuen EE Server geändert, dann sollte man das in der /etc/gitlab.rb anpassen. In diesem Fall sind auch Änderungen an den GitLab Runnern vorzunehmen. Es reicht dann allerdings wenn man auf dem jeweiligen Runner in der Datei config.toml die URL in der [[runners]] Sektion anpasst, da der Token gleich bleibt.

Troubleshooting:

Es ist allerdings auch möglich, dass es zu Problemen mit den Runnern kommt. Dies zeigt sich z.B. dadurch, dass der Runner in seinen Logs 500er-Fehler beim Verbindungsversuch zum GitLab meldet. In diesem Fall sollte man zuerst versuchen den Runner neu zu registrieren. Falls die Fehler bestehen bleiben, ist es möglich, dass diese durch einen CI-Job verursacht werden, der während der Migration noch lief. So war es zumindest bei mir der Fall. Mit Hilfe der Anleitung zum Troubleshooting und den Infos aus diesem Issue kam ich dann zum Ziel:

gitlab-rails dbconsole
UPDATE ci_builds SET token_encrypted = NULL WHERE status in ('created', 'pending');

Wenn man sich das alles sparen möchte, dann lohnt es sich einen Blick auf unsere GitLab EE Angebote der NETWAYS Web Services zu werfen.

Gabriel Hartmann
Gabriel Hartmann
Senior Systems Engineer

Gabriel hat 2016 als Auszubildender Fachinformatiker für Systemintegrator bei NETWAYS angefangen und 2019 die Ausbildung abgeschlossen. Als Mitglied des Web Services Teams kümmert er sich seither um viele technische Themen, die mit den NETWAYS Web Services und der Weiterentwicklung der Plattform zu tun haben. Aber auch im Support engagiert er sich, um den Kunden von NWS bei Fragen und Problemen zur Seite zu stehen.

GitLab Security Update Reviewed

NETWAYS schreibt die Sicherheit ihrer gehosteten Kundenumgebungen groß – daher kamen auch wir nicht um das Sicherheitsupdate in den GitLab Community Edition und Enterprise Edition Versionen herum.
GitLab machte Mitte März öffentlich, dass man auf eine Sicherheitslücke sowohl in der Community als auch in der Enterprise Edition gestoßen sei. Dabei soll es sich um sogenannte Server Side Request Forgery (SSRF) handeln, was Angreifern unter anderem den Zugriff auf das lokale Netzwerk ermöglich kann. GitLab löste dieses Problem nun durch ein Software Update und den Einbau der Option „Allow requests to the local network from hooks and services„, die per default deaktiviert ist und somit den Zugriff der Software auf das lokale Netz unterbindet.
Das Update auf eine neuere Version ist für viele Nutzer eine gute Lösung – allerdings nur, wenn diese keine Webhooks oder Services, die das lokale Netz als Ziel haben, nutzen. Denn wenn plötzlich die Webhooks und Services nicht mehr funktionieren und weder der Admin noch der User weiß, dass man bei der obigen Option einen Haken setzen muss, dann beginnt erst mal die Fehlersuche.
Fazit: Wer unbedingt auf Webhooks und ähnliches angewiesen ist, muss wohl oder übel vorerst mit der Sicherheitslücke leben.
Eingebaut wurde der Fix in folgende GitLab CE und EE Versionen: 10.5.6 / 10.4.6 / 10.3.9. Eine vollständige Übersicht an Releases findet man hier: GitLab Release

Managed Hosting bei NETWAYSGitLab CE und GitLab EE

NETWAYS Web Services – 30 Tage kostenfreies Testen von GitLab CE und GitLab EE

NETWAYS Web Services: Connect to your own Domain!

Our team has continued to improve the NETWAYS Web Services products for providing more comfort to our customers. Now any app can be run under its own Domain Name in combination with its own SSL certificate. This option is available for the following products:

The implementation within the product is quite simple. After your app has been created successfully, you will find a new webform in your app’s Access tab. Here is an example of a Request Tracker app:

As the webform shows, customers simply have to enter a registered Domain Name and their SSL Certificate as well as their SSL Key. The implementation in the app will be done by our NWS platform fully automated. Customers only need to take care about the quality and correctness of the certificate and to make sure they enter the DNS record correctly on their Domain Name Server. The IP address needed will be indicated underneath the webform in the information section. Furthermore, it is still possible to set an additional CName for your app. This means that your customized Domain Name and the CName can be used in parallel. Furthermore, the platform generated standard URL will stay valid and customers can always go back to the initial settings by removing their entries from the webform.
After clicking the save button, the app will be restarted and all changes will be taken into production immediately.
The bonus of this option is clear: Anybody working with your apps will be glad to use easy to read and memorize URLs. Furthermore, company identity and culture is even more important today than ever. So why not also provide your SuiteCRM, Rocket.Chat or Nextcloud with a well branded URL?
More information can be found on our NWS homepage, in any of our product sections or by contacting us via the NWS livechat.
Important note: All NWS products are up for a 30 day free trial!

Monthly Snap April > NETWAYS Web Services, Braintower, OSBConf, Teamweekend, AKCP, OSDC, GitLab CE, Galera, GitHub, Puppet

In April, Isabel startet with announcing a new Software Update for Braintower and Martin K. continued with a new NETWAYS Web Services tool: Rocket.Chat Hosting.
Then Julia announced the call for papers for the Open Source Backup Conference 2017 in Cologne while Marius wrote about the end of Ubuntu 12.04 LTS.
Later in April, Marius H. gave some practical tips for the Galera Cluster and Dirk wrote about contributing to projects on GitHub.
Then Martin K. presented the next NETWAYS Web Services App, GitLab CE Hosting, and Julia told about the latest OSDC-News.
Furthermore, Catharina reviewed the team event of the commercial departments while Noah told us how to block some Google searches in Google Chrome.
Lennart gave an insight in the monitoring project at htp GmbH and Daniel introduced himself.
Towards the end of April, Isabel told about the latest news concerning the AKCP sensorProbe 2+ and Martin S. explained external monitoring by the Icinga2 satellites at NETWAYS Web Services.
Last but not least, Jean reported on monitoring powershell scripts with Icinga 2, while Lennart went on with part 2 of automated monitoring with Puppet.