AwesomeOps Ansible Drift Control

Welcome back to our continuing series on AwesomeOps with Ansible and Windows Server!

In this post we will discuss one of our favorite topics: drift control! Why do we call it drift control and not drift prevention? Simple: complete prevention is impossible due to the fluid nature of systems. Systems need to be used, updated, patched, debugged, etc. So instead of fighting this impossibility, it is wiser to set up automation systems that adapt to the ever-shifting sands of hosts in the wild. Just like the Giphy of Han rocking a controlled drift in a heavily modified RX-7 with a VeilSide Fortune body kit. Yes, we are a collection of nerds that enjoy sports, cars, and tech! Let's get into it.

Golden Goose = Golden Inventory!

The most important part of managing Ansible across large sets of hosts is a well-maintained, well-grouped Golden Inventory. This can be a daunting task, but with a few tips and tricks you will be on your way to automation bliss in no time. The first key is to fully integrate your golden inventory into the pipeline build process that we covered in parts 1-3 of AwesomeOps (Here they are). With the magic of Git we can create a dynamic inventory management system, replete with hosts sorted into as many groups as you like! How does this work? We are glad you asked! The first thing you need to do is create a pipeline template that clones all necessary repositories. Other pipelines will call this template like a function:

---
parameters:
  - name: inventory_action
    type: string
    default: add
  - name: dont_break_the_build
    type: string
    default: 'false'

steps:
  - task: AzureCLI@2
    displayName: Git Clone Ansible Repositories
    condition: ne(variables.ansible_os, '')
    inputs:
      scriptType: bash
      scriptLocation: inlineScript
      azureSubscription: $(tf_azure_service_connection)
      addSpnToEnvironment: true
      inlineScript: |
        set -eo pipefail
        cd "ansible-temp-$(Build.BuildId)"

        REPO="ansible-infrastructure-common"
        (set -x; git clone --branch "develop" git@ssh.dev.azure.com:v3/your-ado-org-here/your-project-here/${REPO})

        REPO="ansible-inventory-common"
        (set -x; git clone --branch "develop" git@ssh.dev.azure.com:v3/your-ado-org-here/your-project-here/${REPO})

        REPO="ansible-win-base"
        (set -x; git clone --branch "develop" git@ssh.dev.azure.com:v3/your-ado-org-here/your-project-here/${REPO})
        ls

Simple enough. With a little Git magic this step clones all the repos we need onto our AKS agent. Next you will need to add a pipeline task to update your inventory:

  - task: AzureCLI@2
    displayName: Update Ansible Inventories
    condition: ne(variables.ansible_os, '')
    inputs:
      scriptType: bash
      scriptLocation: inlineScript
      azureSubscription: $(tf_azure_service_connection)
      addSpnToEnvironment: true
      inlineScript: |
        set -eo pipefail
        if [ '${{ parameters.dont_break_the_build }}' == 'true' ] && [ '${{ parameters.inventory_action }}' == 'add' ]; then
          echo "changing directories"
          cd "ansible-temp-$(Build.BuildId)"
          chmod 777 -R ansible-inventory-common
          cd ansible-inventory-common
          git pull
          echo 'Creating branch on inventory for dont break the build'
          git checkout -b "feature/dbtb-$(Build.BuildId)"
          echo '${{ parameters.inventory_action }} to $(provider) inventory with ansible-inventory.sh $(provider)-inventory.inv "${{ parameters.inventory_action }}" "$(VM)" "$(IP)" "$(environment)_$(ansible_os)"'
          ./ansible-inventory.sh $(provider)-inventory.inv '${{ parameters.inventory_action }}' '$(VM)' '$(IP)' "$(environment)_$(ansible_os)"
        elif [ '${{ parameters.dont_break_the_build }}' == 'false' ]; then
          echo "changing directories"
          cd "ansible-temp-$(Build.BuildId)"
          chmod 777 -R ansible-inventory-common
          cd ansible-inventory-common
          git pull
          echo '${{ parameters.inventory_action }} to $(provider) inventory with ansible-inventory.sh $(provider)-inventory.inv "${{ parameters.inventory_action }}" "$(VM)" "$(IP)" "$(ansible_os) $(environment)_$(ansible_os)" "$(provider)"'
          ./ansible-inventory.sh $(provider)-inventory.inv '${{ parameters.inventory_action }}' '$(VM)' '$(IP)' '"$(ansible_os)" "$(environment)_$(ansible_os)"' "$(provider)"
        else
          echo 'No changes to Ansible inventories'
        fi

It is a lot simpler than it looks. This is one bash if statement that runs some checks to determine what to do based on the conditions of the run. Because we have a lot of cool features baked into our platform, we need to check whether someone is attempting to merge a feature branch into a mainline branch or whether someone is running a production build somewhere. But what is up with the provider variable? Great question! Because we build into AWS, Azure, Azure-Gov, and VMware, we decided to group hosts into separate provider-specific files. It looks something like this:

[Image: grouped Ansible inventories, one inventory file per provider]

Now, within each of the files above we group our assets by environment, operating system, location, and application stack. This may seem trivial, but creating grouped assets is just as important as the golden inventory itself. When you group assets you enable your teams to target all hosts within a group to apply a change with Ansible, or even combine groups and tags to do targeted Ansible updates. For example, suppose a manager asks you to update only the Windows Servers in the DMZ that are hosting one particular application. Cool, no need to sort through long lists of hosts, no need to decipher crazy naming conventions, no need to worry. You simply pass your groups into your playbook and you are golden. No pun intended. Or did I intend that? Either way, moving on.
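To make the DMZ example concrete, here is a minimal sketch of building an Ansible host pattern that only matches hosts sitting in every one of the listed groups. The ":&" intersection operator is standard Ansible pattern syntax. The group names (prod_windows, dmz, app_billing), the helper function name, and the azure-inventory.inv file name are assumptions for illustration only. The command is printed rather than executed so it is safe to try.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Join group names into one Ansible host pattern that matches only hosts
# present in every listed group (":&" means intersection).
intersect_groups() {
  local pattern="$1"
  shift
  local group
  for group in "$@"; do
    pattern="${pattern}:&${group}"
  done
  printf '%s\n' "$pattern"
}

# Hypothetical group names; substitute the groups from your own inventory.
limit="$(intersect_groups prod_windows dmz app_billing)"

# Print the command instead of running it. The pattern is single-quoted
# because "&" is a shell metacharacter.
printf "ansible-playbook -i %s %s --limit '%s'\n" \
  azure-inventory.inv ansible_win_base.yml "$limit"
```

Swap the printf for the real call once the pattern matches the hosts you expect. Running ansible-playbook with --list-hosts first is a cheap way to confirm that.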

Next you will need to add a new step to merge your inventory changes:

  - task: AzureCLI@2
    displayName: Git Merge Inventory Changes
    condition: ne(variables.ansible_os, '')
    inputs:
      scriptType: bash
      scriptLocation: inlineScript
      azureSubscription: $(tf_azure_service_connection)
      addSpnToEnvironment: true
      inlineScript: |
        echo "Push and merge changes to ansible-inventory-common"
        echo "changing directories"
        cd "ansible-temp-$(Build.BuildId)"
        cd ansible-inventory-common
        git pull
        git add --all
        git commit -m "Changes to Ansible inventories - Build $(Build.BuildNumber)"
        if [ '${{ parameters.dont_break_the_build }}' == 'true' ]; then
          git push origin "feature/dbtb-$(Build.BuildId)"
        else
          git push origin develop
        fi

See that! A few simple git commands and all of your changes will get merged into your common inventory repo. To be complete, you will want to make the inverse of these steps so that when your user pipelines run the destroy step, the corresponding hosts will get removed from your golden inventory. Once that is in place you will have the following golden inventory setup:

Git based golden inventory
Hosts are dynamically added to the inventory as part of the build process
Hosts are dynamically removed from the inventory as part of the destroy process
Hosts are dynamically grouped into logical groupings that make sense for your organization
Full lifecycle management of inventory hosts replete with historical change log tracking provided by git
The ability to schedule Ansible Windows Server baseline playbooks targeting one or many groups
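The ansible-inventory.sh script called by the pipeline is not shown in this post. As a rough illustration only, here is a minimal sketch of what add and remove operations against an INI-style inventory file could look like. The function names, the ansible_host line format, and the sample hosts and groups are all assumptions, not the real script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Add "host ansible_host=ip" under [group], creating the group if missing.
# Does nothing when the host is already listed in that group.
inventory_add() {
  local file="$1" host="$2" ip="$3" group="$4"
  touch "$file"
  if awk -v g="[$group]" -v h="$host" '
       /^\[/ { in_group = ($0 == g) }
       in_group && $1 == h { found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  if ! grep -qxF "[$group]" "$file"; then
    printf '\n[%s]\n' "$group" >> "$file"
  fi
  # Insert the host line directly after the group header.
  awk -v g="[$group]" -v line="$host ansible_host=$ip" '
    { print }
    $0 == g { print line }' "$file" > "$file.tmp"
  mv "$file.tmp" "$file"
}

# Remove the host from every group by dropping lines whose first field matches.
inventory_remove() {
  local file="$1" host="$2"
  awk -v h="$host" '$1 != h' "$file" > "$file.tmp"
  mv "$file.tmp" "$file"
}

# Demo against a throwaway file with hypothetical hosts and groups.
inv="$(mktemp)"
inventory_add "$inv" web01 10.0.0.4 windows
inventory_add "$inv" web01 10.0.0.4 prod_windows
inventory_add "$inv" web02 10.0.0.5 windows
inventory_remove "$inv" web01
cat "$inv"
```

Keeping add idempotent matters here because a pipeline can be re-run for the same host, and a duplicate entry would show up as noise in the Git history.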

Now that you have a golden inventory setup to dynamically manage hosts, let's move on to drift control with Ansible.

Ansible Drift Control

Imagine all configuration settings, from operating system to application, being automatically maintained. A place where SREs get a restful night's sleep while on call. A place where junior devs break things and changes are reverted soon after!

It is possible; it just requires proper configuration management architecture and tooling, and most importantly buy-in from your C-Suite. But how, pray tell, does someone start? We are glad you asked! All we need to get started is two cups of Ansible, a dash of tags, and a sprinkle of scheduled pipelines.

Two Cups of Ansible

Regardless of whether you are running Ansible in a small mom-and-pop shop or an enterprise environment, it is important to first perform an assessment of your Windows Server environment to define the layers of Ansible that you will need to lay down. Below is a common set of layers that we have found inside of enterprises:

Core infrastructure settings: Active Directory domain join, NTP, DNS, Key Management Service (KMS), WinRM configuration, registry entries, proxy configuration, Certificate Authority (CA) and certificates in general, timezone, BGInfo, network settings, etc. This is your Base Layer. It will function as your foundational layer.
Define all agents that need to be configured on your Windows Servers. These can be things like: Centrify, McAfee, Filebeat, CrowdStrike, etc. This is your Second Layer. It is the command and control layer that ensures users do not do anything nefarious.
Dedicated log files and location, look and feel of your GUI, and common Windows Server features that will be added. This is your Third Layer.
Application stack configurations. This is your Fourth Layer, and is commonly referred to as your Services Layer. Your team will provide the infrastructure (hypervisor/cloud) and the baseline configuration of your operating system, while developers will deploy onto your fully configured environment.

We will give you some time to think about this and then we will continue.

Ok, that should be long enough. :) The next thing you will want to do is take some time and think about the visual layout of how you will structure your Ansible playbook/role, as this can help determine what parts of each layer go where. This might seem trivial, but after decades of experience with AwesomeOps, we promise all of the little things really really matter because they add up over time and you will eventually find yourself in the middle of a lake of spaghetti code. That, we promise, is not a fun place. You will quickly get tired of eating spaghetti; delicious though spaghetti may be. All of that being said, we recommend keeping the structure simple to read and logical. Below is a sample directory outline of what our Ansible Windows Server Base configuration repository looks like.
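A layout along these lines can be scaffolded with a few shell commands. The task file names come from the main.yml shown later in this post. The tasks/ directory follows the usual Ansible role convention, and the top-level ansible-win-base directory name is an assumption borrowed from the repository name, so treat this as a sketch rather than our exact tree.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create an empty role skeleton under the given parent directory.
scaffold_role() {
  local parent="$1"
  local tasks_dir="$parent/ansible-win-base/tasks"
  local name
  mkdir -p "$tasks_dir"
  for name in main baseline_config users server2k22 bg_info crowd_strike iis_setup cleanup; do
    touch "$tasks_dir/$name.yml"
  done
}

# Demo in a throwaway directory, then list the result.
demo_dir="$(mktemp -d)"
scaffold_role "$demo_dir"
(cd "$demo_dir" && find ansible-win-base -type f | sort)
```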

Nice and tidy! But what is going on here? We decided to treat each agent as a separate service file. We created a baseline_config.yml file where all of our foundational layer configurations go, a calling main.yml file to run each service yml, and a cleanup.yml step to remove anything created during the configuration process that should not be on the system. How does the phrase go, something like: Cleanliness is close to AwesomeOps? Well, something like that :). Again, visual layout might seem trivial, but remember we are visual animals. If you make it super simple and appeal to our primary sense of vision, understanding the code becomes much easier. Here is a sample of our main.yml file, which will lead us into the next part of the blog: Ansible Tags.

---
- name: Gather Facts About Host
  ansible.builtin.setup:
  tags: 
    - always
    - gather_facts

- name: Windows Configuration Tasks
  block:
    - name: Include Baseline Config
      import_tasks: baseline_config.yml
      tags:
        - packer
        - drift_control
        - baseline_config

    - name: Include User Tasks
      import_tasks: users.yml
      tags:
        - packer
        - drift_control
        - users

    - name: Include 2022 Tasks
      import_tasks: server2k22.yml
      when:
        - "'10.0.20348' in ansible_distribution_version"
        - ansible_os_family == 'Windows'
      tags:
        - drift_control
        - version2022
    
    - name: Include BgInfo Tasks
      import_tasks: bg_info.yml
      tags:
        - packer
        - drift_control
        - bg_info

    - name: Include Crowd Strike Tasks
      import_tasks: crowd_strike.yml
      tags:
        - packer
        - drift_control
        - crowd_strike

    - name: Include IIS Tasks
      import_tasks: iis_setup.yml
      when: win_iis_service == 'Yes'
      tags:
        - iis
    
    - name: Run Cleanup Tasks
      import_tasks: cleanup.yml
      tags: 
        - packer
        - drift_control

This main.yml is our calling file that runs each of our service ymls. You will also notice that we use tags extensively.

Dash of Tags

Tags will level up your Ansible game significantly (more on tags here), and they are easy to use and implement. But, just like everything else in the world of automation, the first thing we recommend doing is planning out what tags to use, when they will be used, and what part of your stack they will be used in.

Tags are a key component in controlling drift within your organization. As you may have noticed above, we have a tag called drift_control. This tag is used in combination with our scheduled pipelines to run the same Ansible Windows Server baseline code that is used in Packer and during the Build process. So how do you run Ansible playbooks with tags? We are glad you asked:

ansible-playbook ansible_win_base.yml --tags "drift_control,other_tags_here"

OR

ansible-playbook ansible_win_base.yml --tags all

Samir and Michael Bolton said it best. No, not that Michael Bolton of "How Am I Supposed To Live Without You?", the Michael Bolton from Office Space. Again, it is all the little things that matter most. With Ansible tags, not only can you create some Windows Server drift_control within your organization, but you also get targeted deployments of different sections of your Ansible role. This has a number of benefits:

Faster deployment times
Easier troubleshooting
Ability to mix and match tags
Ability to map layers of Ansible code to layers of your build

What we did with our drift_control tag is target layers 1-3 described above, and tie these runs into our pipeline platform with Azure DevOps to control drift.
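Before letting a tagged run change anything, it can help to see what drift exists. Here is a minimal sketch that builds the ansible-playbook command for either a normal run or a dry run. The --check and --diff flags are standard ansible-playbook options. The drift_cmd helper and the azure-inventory.inv file name are assumptions for illustration. How useful the dry run is depends on each module's check mode support, so tasks that shell out may simply be skipped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build an ansible-playbook command for a tagged run.
# $1 = inventory file, $2 = comma-separated tags, $3 = "check" for a dry run.
drift_cmd() {
  local inventory="$1" tags="$2" mode="${3:-apply}"
  local cmd=(ansible-playbook -i "$inventory" ansible_win_base.yml --tags "$tags")
  if [ "$mode" = "check" ]; then
    # --check reports what would change without changing it;
    # --diff prints the before/after detail where modules support it.
    cmd+=(--check --diff)
  fi
  printf '%s\n' "${cmd[*]}"
}

# Print both variants; nothing is executed against real hosts.
drift_cmd azure-inventory.inv drift_control check
drift_cmd azure-inventory.inv drift_control
```

Running ansible-playbook with --list-tags against the same playbook is a quick way to confirm which tags exist before scheduling anything.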

Sprinkle of Pipelines

Scheduling your tagged Ansible configuration to run multiple times a day will keep the baseline configuration of all hosts in your environment in compliance with organization standards. Almost any platform that has a scheduled pipeline feature can handle this. We use Azure DevOps in most of our engagements, so below are a few helpful pipeline settings to get you started.

We could spend hours talking about Azure Pipelines, but we won't. We will just cover some of the basics here to get you on your way with Ansible drift control. First up, you will need to define your schedule.

# Scheduled Triggers
schedules:
  - cron: '0 12 * * 0'
    displayName: Weekly on Sunday
    branches:
      include:
        - master
    always: true

As you can see, the syntax is a simple cron expression. This little section tells Azure DevOps to run the pipeline weekly on Sunday. You can get really creative with your schedules to meet the demands of your organization. Here is a simple example of a more advanced schedule.

schedules:
- cron: '0 3 * * Mon-Fri'
  displayName: M-F 3:00 AM (UTC) daily build
  branches:
    include:
    - /releases/*
- cron: '0 3 * * Sun'
  displayName: Sunday 3:00 AM (UTC) weekly latest version build
  branches:
    include:
    - /releases/lastversion
  always: true

To learn more about Azure pipeline schedules, check out this helpful Microsoft link https://learn.microsoft.com/en-us/azure/devops/pipelines/process/scheduled-triggers?view=azure-devops&tabs=yaml.

Once you have your schedule defined, you will want to think about the use of the pipeline. Do you want to turn this pipeline into a utility that your DevOps team can use to do targeted deployments, or will it be left alone to run four times a day? If you choose to use this as a utility pipeline, you will want to consider your variables and parameters. We chose to have our drift control pipeline serve both purposes. This gives us the ability, directly within Azure DevOps, to use dropdown boxes to run sections of our Ansible Windows Server Base, or to run all code on a small subset of hosts.
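One way to let a single pipeline serve both purposes is a small wrapper that falls back to the scheduled behavior when no inputs are given. The sketch below is an assumption about how such a wrapper might look, not our actual pipeline script. The build_run name and the prod_windows group are hypothetical, and the allowed tag list is copied from the main.yml shown earlier. It validates the requested tags and prints the resulting command rather than running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Tags a dropdown is allowed to pass through (taken from main.yml).
ALLOWED_TAGS="drift_control packer baseline_config users version2022 bg_info crowd_strike iis"

# $1 = comma-separated tags (default: drift_control)
# $2 = host pattern or group (default: all)
# Prints the ansible-playbook command, or fails on an unknown tag.
build_run() {
  local tags="${1:-drift_control}" limit="${2:-all}"
  local tag
  local IFS=','
  for tag in $tags; do
    case " $ALLOWED_TAGS " in
      *" $tag "*) ;;
      *) echo "unknown tag: $tag" >&2; return 1 ;;
    esac
  done
  echo "ansible-playbook ansible_win_base.yml --tags $tags --limit $limit"
}

# Scheduled run: no inputs, so every host gets the drift_control layers.
build_run
# Utility run: two sections of the role against one hypothetical group.
build_run users,bg_info prod_windows
```

Rejecting unknown tags up front is a deliberate choice: a mistyped tag would otherwise match no tasks and the run would look successful while changing nothing.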

The End

Well folks, that is all for today. Next time on AwesomeOps we will cover some of the lower level Ansible Windows Server tasks!

AwesomeOps Part 4: Ansible Drift Control On Windows