Not All Clouds Are Created Equal: Why We Moved Our AI Security SaaS Service to Azure
- Mentat Collective
- Apr 16
- 4 min read
In the beginning there was Amazon Web Services (AWS) and all was right with the world of information technology. Gone were the days of needing to capacity plan for physical datacenters. Apps could be reliably deployed to a highly available compute platform, developers fell in love with the APIs and speed to deployment, and companies were thrilled with the near-infinite and cheap storage of S3. Then, with a big bang of sorts, the market was suddenly flooded with enterprise-grade cloud platforms offering similar services.
While Microsoft and Google threw their respective hats into the ring, their services were just not as good as AWS, and the market had already followed AWS as the first mover. But over time Microsoft began to catch up. Satya Nadella's new executive guard replaced the old ways of Ballmerism, and the transition was a boon for Microsoft and Azure. Until recently, if anyone asked for advice on the best cloud platform, my default response was "AWS, but Azure and Google are close behind." However, as Heraclitus once said, "Everything flows. You cannot step in the same river twice" (i.e., the only constant in life is change).
I remember the old days of AWS, and they were glorious. I used to write code all day and night that would consistently and dynamically spin up complex AWS services consumed by applications and services for Fortune 500 companies and large public sector entities. For a while it was easy, fast, cheap, and fun to deploy to AWS. So when my team and I set out to write new code to deploy SaaS infrastructure for a new AI security venture we are working on, our fond memories of deploying to AWS back in the day guided the decision to start with AWS.
The initial code took only 4 hours to bang out, and it got the entire SaaS infrastructure orchestrated with Terraform in pipelines. Amazing, right? We could spin up the entire environment in less than 10 minutes with everything accessible on the internet. We thought, man, AWS is just like we remembered. And then we deployed the application onto the infrastructure and quickly realized AWS was not going to work. In fact, AWS has declined dramatically. I am not going to go into all of the issues we encountered, as that would end up as a book! Below are some of the many, many issues:
The portal is bad. I have tried to be as objective as possible about this, but the reality is the portal is bad.
Refresh widgets are everywhere and they stink. Depending on the service you are using, the refresh polling rates seem to differ, and they are inaccurate... a lot!
Lack of consistency in destroy operations across services. Sometimes you type "destroy," or the name of the service and "delete," or "delete" plus a check box, and the list goes on. I found maybe 6-8 different variations.
Build and destroy are no longer consistent. There were at least a dozen instances where a service component was deployed and the destroy failed to clean up. In addition, when logged into the portal as a top-level account admin, I did not have permissions to destroy. In fact, I had to generate an IAM role, attach it to my user, wait 5 minutes, and only then was I allowed to destroy. I promise you it was not a user error, and not a code issue. :)
Service redeployments sporadically failed the entire stack no matter what was updated. This was the most infuriating. I cannot stand inconsistency in deployments and destroy operations.
The region selector for services is actually a nightmare. I have definitely logged into AWS and been immediately terrified that all services were gone, only to realize the region had randomly changed from my default. This does happen, and it is not user error. I never changed regions a single time during the deployment, yet when I logged in again a few hours later I found the region had switched. It does not happen often, but it happens enough to be concerning.
Container services need some love. We ran into so many issues, it is actually too frustrating to type out.
And many, many, many more issues!
What is an engineer to do?! Well, the choice was simple: move cloud providers. So we spent a few hours and redesigned the entire infrastructure on Azure with Terraform. And I have to say, wow! Just wow! We ran into none of the issues we hit on AWS. Services spin up and down without issues, refresh rates in the portal are nice, the portal interface provides more functionality and usability than comparable AWS services, there are no permissions issues, Azure service-to-service authentication and communication is dynamic and functional, and more. This was legitimately surprising to me because I have been deploying Azure services since before it was AzureRM, and back then it was really bad. So it was astounding to see the improvements made to the entire Azure cloud, from when it was almost unusable to today, where it is measurably better than AWS.
Everything Flows:
In the end we decided to go all in with Azure. We can now deploy and destroy multiple SaaS environments in Azure without issue and make updates to running infrastructure without having to worry about outages like we did on AWS, and everything is deployed with Azure DevOps pipelines running Terraform and Terragrunt. To paraphrase Heraclitus: change is constant, and if you do not recognize this fundamental truth, you will get left behind.
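If you want to check build-and-destroy consistency on your own stack, here is a minimal Python sketch of the idea. It is not our pipeline code. The function names `run_cli` and `leftover_resources` and the directory `envs/demo` are hypothetical, and it assumes a Terraform version that supports the `-chdir` option. It runs a full apply and destroy cycle, then reports anything Terraform still tracks in state. A destroy that fails outright exits non-zero and raises an exception instead.

```python
import subprocess
from typing import Callable, List

# A runner takes a command as a list of arguments and returns its stdout.
Runner = Callable[[List[str]], str]


def run_cli(cmd: List[str]) -> str:
    """Run a command, raise CalledProcessError on non-zero exit, return stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def leftover_resources(workdir: str, runner: Runner = run_cli) -> List[str]:
    """Apply, destroy, then return the resource addresses still in state.

    An empty list means the destroy left nothing behind in Terraform state.
    Resources that exist in the cloud but are not tracked in state are not
    detected by this check.
    """
    base = ["terraform", f"-chdir={workdir}"]
    runner(base + ["init", "-input=false"])
    runner(base + ["apply", "-auto-approve", "-input=false"])
    runner(base + ["destroy", "-auto-approve", "-input=false"])
    listing = runner(base + ["state", "list"])
    return [line.strip() for line in listing.splitlines() if line.strip()]


if __name__ == "__main__":
    # "envs/demo" is a placeholder path to a Terraform root module.
    leftovers = leftover_resources("envs/demo")
    if leftovers:
        print("Destroy left resources in state:", ", ".join(leftovers))
    else:
        print("Clean build and destroy cycle.")
```

Passing the runner in as a parameter keeps the check easy to dry-run with a fake runner before pointing it at a real account, where every cycle creates billable resources.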