A Gitops Based Ansible QA Process in the Hybrid-Cloud

Mentat Collective
Dec 15, 2022

Updated: Dec 26, 2022

At Mentat we like making things simple, secure, and scalable. So, we decided to create a fully automated Ansible QA process based on Gitops.

Introduction

Quality and consistency of delivered services are arguably the biggest differentiators between automation teams that succeed and automation teams that fail. Testing Ansible playbooks in feature branches can be difficult, unreliable, and daunting for teams adopting Ansible. This is where a mature quality assurance testing harness becomes critical to the success of your DevOps team. However, there is no simple way to truly test Ansible playbooks that modify operating systems and applications. I am sure at this point some might say, "Hey, we can add simple Ansible tasks to test our Ansible code." That is great, but using a tool to validate itself is not a scientific test.

So, what to do? Driven by growing client demand for consistent testing of Ansible playbooks within complex hybrid-cloud environments, we created a Gitops-based approach to playbook QA. It automatically deploys new virtual machines in all major clouds, immediately runs Ansible on every node, and reports the results back to a central pipeline platform.

Solution Tech Stack:

Ansible - The Code.
Ansible Lint - The Code Defender.
Git - The Code Holder.
Azure DevOps - The Orchestration Engine.
Terraform - The Builder.
Azure Key Vault - The Secret Keeper.

Technical Summary

We started with the main tool of our platform, Azure DevOps (ADO), which provided the testing harness we needed. We leveraged Git and ADO's don't break the build (DBTB) feature to set up a pipeline that executes automatically every time a user submits a pull/merge request (PR/MR) from a feature branch to the develop branch, and we made a successful pipeline run a hard requirement for merging code into develop. To access this functionality in ADO, open your project, navigate to Project Settings, select Repositories, pick your repo, and then select the branch you would like to set up don't break the build on. In the ADO UI this is configured as a build validation policy on the branch. Ours looks like this:

[Screenshot: DBTB settings on our develop branch]

We created three pipelines, one for each cloud, and attached them to the develop branch so that they execute automatically each time someone submits a PR/MR. I know what you are thinking! What's in the pipeline? What's in the pipeline? Ok, I could not help the Seven movie reference.

Example DBTB Pipeline:

---
variables:
  - name: provider
    value: vmware
  - name: ansible_playbook
    value: vmware_base.yml
  - name: repo_branch
    ${{ if ne(parameters.repo_branch, '') }}:
      value: ${{ parameters.repo_branch }}

  - template: vars/vars-shared.yaml
  - template: vars/vars-linux.yaml
  - template: vars/vars-vmware.yaml

parameters:
  - name: repo_branch
    displayName: Ansible Repo Branch Name
    type: string
    default: ''

resources:
  repositories:
    - repository: templates
      type: git
      name: platform/pipeline-templates
      ref: develop

# CI Triggers
trigger:
  branches:
    exclude:
      - '*'

pool: $(pool)

# Release Stages
stages:
  - stage: Variables
    jobs:
      - job:
        steps:
          - template: steps/terragrunt-vmware.yaml@templates
          - template: ../common-pipeline-yamls/terraform-tfvars-${{ variables.provider }}.yaml
          - task: PublishPipelineArtifact@1
            inputs:
              targetPath: $(tf_directory)/terraform.tfvars
              artifactName: terraform_parameter_vars_$(tf_module)
  - template: terragrunt/terragrunt-apply.yaml@templates
    parameters:
      ado_environment: no_approval_required
      dont_break_the_build: true
  - template: ansible/ansible-lint.yaml@templates
    parameters:
      ado_environment: no_approval_required
      dont_break_the_build: true
  - template: ansible/ansible-apply.yaml@templates
    parameters:
      dont_break_the_build: true
  - template: terragrunt/terragrunt-destroy.yaml@templates
    parameters:
      ado_environment: no_approval_required
      dont_break_the_build: true

There is not much to see in the pipeline itself, but you should get the general idea from its contents. We leverage Terraform and Terragrunt to dynamically build Linux virtual machines on VMware, which are then used to run our Ansible playbook. Each of the three pipelines looks similar to this one, except that we pass in cloud-specific details to build machines on the other platforms. This lets the Ansible team test their playbooks against clean images, built with Packer, on every platform. There are many advantages to this type of workflow, but the biggest is that every test runs on consistent, standard VMs. And by making all of the DBTB pipelines a hard requirement for merging code from a feature branch into develop, we ensure that production systems are not negatively impacted by untested or faulty code.

Let's walk through what is happening in each stage of the pipeline. The first step here:

- template: steps/terragrunt-vmware.yaml@templates

We first execute one of our pipeline templates to get the build agents ready to deploy a machine to VMware. This requires reaching out to Azure Key Vault to retrieve the secrets needed to authenticate to VMware. These values are held only in memory, so no one can cat them out or see them in the pipeline execution output.
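As a rough illustration of the Key Vault retrieval described above, a steps template could look something like the sketch below. This is not the contents of our actual template: the service connection name, vault name, and secret names are all hypothetical. The AzureKeyVault@2 task and its inputs are standard ADO, and VSPHERE_USER and VSPHERE_PASSWORD are the environment variables the Terraform vSphere provider reads.

```yaml
# Hypothetical sketch of a steps template that fetches VMware credentials.
steps:
  - task: AzureKeyVault@2
    displayName: Fetch VMware credentials from Key Vault
    inputs:
      azureSubscription: my-azure-service-connection  # hypothetical service connection
      KeyVaultName: my-key-vault                      # hypothetical vault name
      SecretsFilter: vsphere-user,vsphere-password    # hypothetical secret names
      RunAsPreJob: false

  # Secret variables are not exposed to scripts automatically.
  # They must be mapped into the environment of each step that needs them.
  - bash: echo "VMware credentials are available to this step"
    displayName: Use the credentials
    env:
      VSPHERE_USER: $(vsphere-user)
      VSPHERE_PASSWORD: $(vsphere-password)
```

Because the values arrive as secret variables, ADO masks them if they ever show up in the log output.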

Next we generate a dynamic TFVars file from the ADO pipeline web form the user submitted. The form's parameters are written to a TFVars file on the agent, which is placed in the same directory as the Terraform module that we are executing.

- template: ../common-pipeline-yamls/terraform-tfvars-${{ variables.provider }}.yaml

Here we are referencing some common variables shared across teams and users.
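To make the TFVars generation a little more concrete, here is a minimal sketch of what such a steps template might contain. The parameter names (vm_count, vm_name_prefix) are made up for illustration, and our real template may source its values differently. Only $(tf_directory) comes from the pipeline shown earlier.

```yaml
# Hypothetical steps template; the parameter names are illustrative only.
parameters:
  - name: vm_count
    type: number
    default: 1
  - name: vm_name_prefix
    type: string
    default: dbtb

steps:
  # Template expressions are expanded before the script runs, so the
  # heredoc contains plain values by the time bash sees it.
  - bash: |
      cat > "$(tf_directory)/terraform.tfvars" <<EOF
      vm_count       = ${{ parameters.vm_count }}
      vm_name_prefix = "${{ parameters.vm_name_prefix }}"
      EOF
    displayName: Generate terraform.tfvars
```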

Next we run a Terragrunt apply from a pipeline template. Pipeline templates operate like functions: they execute steps in a specific, repeatable order, so every Terraform build runs the same way no matter who starts it.

- template: terragrunt/terragrunt-apply.yaml@templates
  parameters:
    ado_environment: no_approval_required
    dont_break_the_build: true

As part of this pipeline template we also pass in a couple of parameters. The first, ado_environment, is provided to bypass our approval system, because we automatically destroy the machines after the Ansible job completes.
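For readers who have not written a stage template before, the sketch below shows one way a template like this could consume those parameters. It is an assumption about the general shape, not our actual template: the stage and job names and the default values are hypothetical. The key idea is that an ADO deployment job targets an environment, and approval checks are attached to environments, so pointing the job at an environment with no checks skips the approval gate.

```yaml
# Hypothetical sketch of a stage template that applies Terragrunt.
parameters:
  - name: ado_environment
    type: string
    default: approval_required   # hypothetical default environment
  - name: dont_break_the_build
    type: boolean
    default: false

stages:
  - stage: TerragruntApply       # hypothetical stage name
    jobs:
      - deployment: apply
        # Approval checks live on the environment, not on the job.
        environment: ${{ parameters.ado_environment }}
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out the repo by default.
                - checkout: self
                - bash: terragrunt apply -auto-approve
                  workingDirectory: $(tf_directory)
                  displayName: Terragrunt apply
```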

Next we run Ansible. Every Ansible DBTB has two parts. First we run Ansible Lint against the feature branch requested to be merged. Linting is incredibly important at scale. I cannot tell you the number of developers who have argued with us that linting is not important and just a headache, only to run into all sorts of issues months later as they attempt to scale out or upgrade their code. Eventually every single developer has come around and recognized the necessity of properly formatted code. The downside of adopting linting late is that you are then stuck going through possibly hundreds of warnings and errors flagged by the linter. So it is better to start linting code from day one instead of playing catch-up.

  - template: ansible/ansible-lint.yaml@templates
    parameters:
      ado_environment: no_approval_required
      dont_break_the_build: true
  - template: ansible/ansible-apply.yaml@templates
    parameters:
      dont_break_the_build: true

If the code lints cleanly, the stage passes and the pipeline moves on to the actual execution of the Ansible playbook against all VMs on all compute providers simultaneously. This stage simply collects all necessary secrets from Azure Key Vault and runs the playbook.
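A stripped-down sketch of the lint-then-apply flow might look like the following. The stage names and the inventory_file variable are hypothetical, the Key Vault retrieval is left out for brevity, and the sketch assumes ansible-lint and ansible-playbook are already installed on the build agent. Only $(ansible_playbook) comes from the pipeline shown earlier.

```yaml
# Hypothetical sketch of the two Ansible stages.
stages:
  - stage: AnsibleLint
    jobs:
      - job: lint
        steps:
          # ansible-lint exits non-zero on violations, which fails the stage.
          - bash: ansible-lint "$(ansible_playbook)"
            displayName: Lint the playbook

  - stage: AnsibleApply
    # Runs only if the lint stage succeeded (the default stage condition).
    dependsOn: AnsibleLint
    jobs:
      - job: apply
        steps:
          - bash: ansible-playbook -i "$(inventory_file)" "$(ansible_playbook)"
            displayName: Run the playbook
```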

The last stage is an auto-destroy phase that always runs. This way we can ensure that we clean up after ourselves and do not leave lingering machines running and costing cash.
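One way to get the "always runs" behavior in ADO is a stage-level condition. The sketch below is a guess at the general pattern rather than our actual template, and the stage names are hypothetical.

```yaml
# Hypothetical sketch of an always-run destroy stage.
stages:
  - stage: TerragruntDestroy
    dependsOn: AnsibleApply      # hypothetical name of the preceding stage
    # always() runs this stage even when an earlier stage failed or was canceled.
    condition: always()
    jobs:
      - job: destroy
        steps:
          - bash: terragrunt destroy -auto-approve
            workingDirectory: $(tf_directory)
            displayName: Terragrunt destroy
```

Without the condition, a failed lint or playbook run would skip the destroy stage and leave the test VMs running.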

Outcomes:

Gitops automated quality assurance testing harness
Better quality code
Near elimination of breaking code changes as issues are automatically caught
Faster delivery of features
More secure code, as the linting stage checks for possible security gaps with Ansible Lint and gitleaks

P.S.

This was a short one. :) So, check back in a week or two to see some of the other interesting things we have been doing with Gitops, Terraform, ADO, and Ansible.
