Hello again from the world of AwesomeOps! We are back at it again, and this time we focus on how to mitigate fear of the audit by getting ahead of the game and securing your DevOps systems. The specter of an audit looms large over anyone who has been in IT long enough. Yet fear not, AwesomeOps is here to quell those fears!
Why Secure Before The Audit?
In the annals of AwesomeOps, it is written that an ounce of prevention is worth a pound of cure. Practically, this means that as DevOps engineers we must proactively shield our code from the myriad threats that besiege organizations on a daily basis. There are plenty of marketing terms for this, like "DevSecOps" and "shifting left", but in reality DevOps has always included security at its core. However, while security is usually baked into automation processes, teams do not typically have a simple way to show that they are secure. The ability to visualize and track security within your automation system is imperative not only for audits, but also so that your team can decide how best to improve. Remember, the audit is simply a mirror reflecting the state of your system defenses. That means the simplest way to be proactive and get ahead of the audit is to build the mirror yourself, so both your team and the auditor are 100% on the same page.
Build the Mirror: Dashboards Galore
There are loads of tools on the market that can store and visualize data, so do your research before you pick one and deploy. We use a standardized stack of services to accomplish this:
Azure DevOps (ADO) - Orchestration layer providing pipelines, code storage, and general automation.
Kafka - Scalable and durable message queuing system.
Elasticsearch - Data storage and search.
Kibana - Visualization engine that hooks up to Elasticsearch.
Terraform and/or Terraform Enterprise (TFE) - Infrastructure build automation.
Ansible and/or Ansible Automation Platform (AAP) - Configuration management.
The simple end-to-end overview is: pipelines run in ADO and call the tool of choice for the job at hand. Within each stage of a pipeline run, data about the run is pushed to Kafka. Kafka is configured to connect to Elasticsearch, so data shipped to it eventually makes its way into Elastic. Once the data lands, Kibana is used to visualize it (a minimal sketch of the shipping step follows below). That is all well and good, but you may be asking: what data should I visualize? Great question! Let's get into it.
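To make the shipping step concrete, here is a minimal, hedged sketch of what a per-stage push to Kafka might look like. It assumes the kafka-python client, and the broker address, topic name, and field names are illustrative placeholders rather than a prescription for your stack.

```python
# Sketch: push per-stage run metadata to Kafka at the end of a pipeline stage.
# Assumes the kafka-python client; broker, topic, and field names are illustrative.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers=["kafka.example.internal:9092"],  # assumption: your broker list
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

stage_event = {
    "pipeline": "network-baseline",          # illustrative names throughout
    "stage": "apply",
    "identity": "svc-ado-automation",
    "executed_at": datetime.now(timezone.utc).isoformat(),
    "tool": "terraform",
    "pipeline_url": "https://dev.azure.com/org/project/_build/results?buildId=123",
}

# Ship the event; the Kafka -> Elasticsearch hop (e.g. a sink connector) takes it from here.
producer.send("pipeline-run-events", value=stage_event)
producer.flush()
```

In an ADO pipeline this would typically run as a small script task at the end of each stage, while the hop from Kafka into Elasticsearch is usually handled by a sink connector or a small consumer.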
Basic Dashboard Data Visualizations:
Regardless of the particular regulatory body or compliance requirements your team needs to work with, there are some basics that you will need to track.
Identity that executed the pipeline - the user or service account. If the run was automated, make sure to add that attribute to the data.
Time of execution - when the run was executed. This is important to track in the event your system is distributed across multiple time zones.
Systems built - all metadata about the services or VMs built.
Systems modified - make sure that you ingest all of the metadata related to the system or service the code modified. For example, for a VM: datacenter, hostname, IP, domain, operating system, OS version, application, project, and change number (such as a ServiceNow change record).
Secrets checked out (NOT the actual secret).
Tool used in the execution.
Ports used to connect to systems.
What was changed. Think Ansible's task statuses: ok, changed, failed, skipped.
Pipeline link. This is super helpful. Adding a clickable link back to the pipeline creates an easy audit trail to follow. Links are also helpful when troubleshooting.
ADO organization and project name.
Repository name, link, and commit SHA.
With the above data points you will have a solid baseline to start building out your auditing dashboard; a sketch of how those fields might be mapped in Elasticsearch follows below.
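Once you settle on the fields, it pays to give Elasticsearch an explicit mapping so Kibana can filter and aggregate on them cleanly (keyword types for exact-match fields, date for timestamps). Below is a hedged sketch using the standard index template API; the index pattern, field names, and URL are illustrative assumptions, not prescriptions.

```python
# Sketch: register an index template so pipeline-run documents get sensible field types.
# Assumes Elasticsearch 7.8+ (composable index templates) and the `requests` library;
# the URL, index pattern, and field names are illustrative.
import requests

ES_URL = "http://elasticsearch.example.internal:9200"  # assumption

template = {
    "index_patterns": ["pipeline-runs-*"],
    "template": {
        "mappings": {
            "properties": {
                "identity":            {"type": "keyword"},  # user or service account
                "executed_at":         {"type": "date"},
                "tool":                {"type": "keyword"},  # terraform, ansible, ...
                "ado_org":             {"type": "keyword"},
                "ado_project":         {"type": "keyword"},
                "repo":                {"type": "keyword"},
                "commit_sha":          {"type": "keyword"},
                "pipeline_url":        {"type": "keyword"},
                "hostname":            {"type": "keyword"},
                "ip":                  {"type": "ip"},
                "os":                  {"type": "keyword"},
                "ansible_status":      {"type": "keyword"},  # ok, changed, failed, skipped
                "change_number":       {"type": "keyword"},  # e.g. a ServiceNow change record
                "secrets_checked_out": {"type": "keyword"},  # names/references only, never values
            }
        }
    },
}

resp = requests.put(f"{ES_URL}/_index_template/pipeline-runs", json=template, timeout=10)
resp.raise_for_status()
```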
Once your data has landed in Elastic, you are ready to start creating dashboards. The basics of a dashboard are:
Keep it simple. For actionable dashboards it is important to walk the line between usability and the necessity of the data being visualized. Displaying all of the data on a single dashboard makes the visualization useless: eyes glaze over and people stop using the dashboard.
Have controls that filter data based on what you want/need to see. This is particularly important in larger organizations. Think filtering by location (country, region within the country, etc.), line of business (LOB), or compliance requirement.
Combine simple donut visuals with raw data. Donut visuals are great for representing percentages, while raw data is great for engineers who need to troubleshoot. Example: you notice that 10% of your Red Hat hosts failed to install a required package; you can click the failed 10% within the donut, which filters all of the raw data. A sketch of the query behind that breakdown follows this list.
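For the donut in that last example, the underlying data is just a terms aggregation over a status field. Here is a hedged sketch of the query Kibana effectively runs, issued directly against Elasticsearch; the index pattern and field names (`os`, `ansible_status`) are assumptions carried over from the mapping sketch above.

```python
# Sketch: status breakdown for Red Hat hosts -- the numbers behind a donut visual.
# Assumes the `requests` library and the illustrative index/field names used earlier.
import requests

ES_URL = "http://elasticsearch.example.internal:9200"  # assumption

query = {
    "size": 0,  # we only want the aggregation, not the raw hits
    "query": {"bool": {"filter": [{"term": {"os": "RedHat"}}]}},
    "aggs": {"status_breakdown": {"terms": {"field": "ansible_status"}}},
}

resp = requests.post(f"{ES_URL}/pipeline-runs-*/_search", json=query, timeout=10)
resp.raise_for_status()

for bucket in resp.json()["aggregations"]["status_breakdown"]["buckets"]:
    print(f'{bucket["key"]}: {bucket["doc_count"]}')
```

Clicking a slice in Kibana simply adds the equivalent term filter, which is why the raw-data table underneath narrows to just the failed hosts.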
Now that you have a basic mirror of your automation platform in place, it is important to layer in results from one or more security tools. Doing this gives your team and auditors an additional degree of confidence in the security of your platform.
Secure Coding & Static Analysis
When in doubt, follow the market leaders in secure coding best practices. We follow the Microsoft Security Development Lifecycle (SDL) when building automated systems. The basics are: security training for the entire team, gathering and implementing security requirements, defining metrics and compliance visuals, managing third-party tool risk, using approved tools, static code analysis, and more. If you have not heard about the Microsoft SDL before, definitely read and re-read their process here: https://www.microsoft.com/en-us/securityengineering/sdl/practices
The link above provides fantastic insights on process, but if you want to talk practical tooling, there are many security tools that can scan your Terraform or Ansible code and flag deviations from best practices and security guidance. The key is to implement a suite of tools that target different types of vulnerabilities.
Layer one is the pre-commit hook (check out our Terraform Security Blog Here). This is the most important item to implement, as it acts as the first line of defense by keeping secrets and generally bad code from landing in your source control system. Next, make sure you have gitleaks or some other secrets-scanning tool that can identify potential secrets committed to code. This tool should run as part of the pre-commit hook, run nightly across all repos, and push its results to your data visualization system; a sketch of the nightly run is below.
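As a sketch of that nightly run, the wrapper below scans one already-cloned repo with gitleaks, counts the findings, and indexes a summary document into Elasticsearch. The CLI flags shown are current for gitleaks v8 (verify them against your installed version), and the paths, index name, and URL are illustrative assumptions.

```python
# Sketch: nightly gitleaks scan of a cloned repo, with a summary pushed to Elasticsearch.
# Assumes gitleaks v8 on the PATH and the `requests` library; names and paths are illustrative.
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

import requests

ES_URL = "http://elasticsearch.example.internal:9200"  # assumption
REPO_PATH = Path("/srv/scans/my-repo")                 # assumption: repo already cloned
REPORT = REPO_PATH / "gitleaks-report.json"

# gitleaks exits non-zero when leaks are found, so do not treat that as a crash.
subprocess.run(
    [
        "gitleaks", "detect",
        "--source", str(REPO_PATH),
        "--report-format", "json",
        "--report-path", str(REPORT),
    ],
    check=False,
)

findings = json.loads(REPORT.read_text()) if REPORT.exists() else []

summary = {
    "repo": REPO_PATH.name,
    "scanned_at": datetime.now(timezone.utc).isoformat(),
    "tool": "gitleaks",
    "finding_count": len(findings),
    # Never ship the matched secrets themselves -- counts and rule IDs are enough.
    "rules_hit": sorted({f.get("RuleID", "unknown") for f in findings}),
}

resp = requests.post(f"{ES_URL}/security-scans/_doc", json=summary, timeout=10)
resp.raise_for_status()
```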
The next layer is a set of tools that analyze your code for implementation best practices and/or security. We typically implement TFSec and/or Checkov and Ansible Lint. These tools are helpful in pointing out things like an S3 bucket or Azure Storage account that a DevOps engineer set to public when it should have been private.
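In the same spirit, here is a hedged sketch of running Checkov against a Terraform directory and shipping the pass/fail summary to the same index. The `-d` and `-o json` flags are standard Checkov options, but the exact JSON shape varies by Checkov version and by how many frameworks were scanned, hence the defensive parsing; the directory, index name, and URL are illustrative.

```python
# Sketch: run Checkov over a Terraform directory and push a pass/fail summary to Elasticsearch.
# Assumes Checkov on the PATH and the `requests` library; names and paths are illustrative.
import json
import subprocess
from datetime import datetime, timezone

import requests

ES_URL = "http://elasticsearch.example.internal:9200"  # assumption
TF_DIR = "/srv/scans/my-terraform"                     # assumption

# Checkov exits non-zero when checks fail; capture stdout instead of raising.
proc = subprocess.run(
    ["checkov", "-d", TF_DIR, "-o", "json"],
    capture_output=True, text=True, check=False,
)

output = json.loads(proc.stdout)
# Single framework -> dict, multiple frameworks -> list of dicts; normalize to a list.
reports = output if isinstance(output, list) else [output]

doc = {
    "target": TF_DIR,
    "scanned_at": datetime.now(timezone.utc).isoformat(),
    "tool": "checkov",
    "passed": sum(r.get("summary", {}).get("passed", 0) for r in reports),
    "failed": sum(r.get("summary", {}).get("failed", 0) for r in reports),
    "skipped": sum(r.get("summary", {}).get("skipped", 0) for r in reports),
}

resp = requests.post(f"{ES_URL}/security-scans/_doc", json=doc, timeout=10)
resp.raise_for_status()
```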
All of these tools should be integrated into pre-commit hooks, PR/MR checks, and CI/CD pipelines. Remember, security is about layers, so make sure to implement checks at multiple layers of your automation stack: check security before code is committed, check it again when code is committed, push security reports to a central dashboard, tie notification triggers to your dashboard to alert you when a probable violation occurs, and run nightly checks to catch those stragglers who work late. The point is, turning on one tool and calling it a day is a recipe for a security disaster, so make sure that when you implement and manage automation security you do it in layers and visualize it alongside your other automation data.
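Notification triggers do not have to be fancy to be useful. As a minimal, hedged sketch, the snippet below asks Elasticsearch how many repos had secret findings in the last 24 hours and posts a message to a chat webhook if any did; the index, field names, and webhook URL are all illustrative assumptions, and a Kibana alerting rule can do the same job natively.

```python
# Sketch: nightly check -- alert a chat webhook if any secret findings landed in the last 24h.
# Assumes the `requests` library; the index, field names, and webhook URL are illustrative.
import requests

ES_URL = "http://elasticsearch.example.internal:9200"      # assumption
WEBHOOK_URL = "https://chat.example.com/hooks/awesomeops"   # assumption: Teams/Slack-style webhook

query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"tool": "gitleaks"}},
                {"range": {"scanned_at": {"gte": "now-24h"}}},
                {"range": {"finding_count": {"gt": 0}}},
            ]
        }
    },
}

resp = requests.post(f"{ES_URL}/security-scans/_search", json=query, timeout=10)
resp.raise_for_status()
repos_with_findings = resp.json()["hits"]["total"]["value"]

if repos_with_findings > 0:
    requests.post(
        WEBHOOK_URL,
        json={"text": f"Probable secret leak: {repos_with_findings} repo(s) flagged in the last 24h."},
        timeout=10,
    )
```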
The End, For Now:
We will be back soon with another edition of AwesomeOps! We hope that you and/or your team find this article helpful and interesting as you prepare for an audit. If you enjoy our blogs, let us know! Add a comment, contact us on LinkedIn, or reach out to learn more about Mentat.