Why it even exists...
This whole journey started a long time ago in Azure, and it was a big challenge for me when I was trying to get to know the whole Azure estate. The number of resources created across Azure tenants and subscriptions was significant, and in many places it was obvious that there was no single standard for the resource creation process, which made the whole task much harder.
Even now, as we migrate to AWS and try to eliminate unnecessary resources in Azure to cut costs, it can be challenging because knowledge about resources created in multiple places is distributed across teams, or in some cases lost altogether.
Cloud resources in Azure have been managed in two ways:
- manually via Azure portal
- using Azure ARM scripts
Changes were made by:
- CSRE team
- Engineering teams
From our observations, this created some challenges for the engineering teams, as they were required to:
- know infrastructure management solutions (ARM in Azure and Terraform later on)
- know the details of the WR infrastructure to be able to deploy resources correctly (select networks, set up security groups, IP whitelists, IAM policies etc.)
Mainly because of this, many people simply used the copy/paste approach and/or created tickets asking the CSRE team for help.
To mitigate those challenges and allow engineering teams to focus on writing code instead of worrying about infrastructure, we started to think about how we could improve the situation.
We concluded that the most promising option was to create an abstraction between developers and cloud infrastructure: a simple meta-language that lets teams tell us what they need, while we handle the infrastructure details.
This is where Resource Manager came in...
Resource Manager provides:
- enforcement of standards
- enforcement of security rules
- management of the code that injects credentials into applications
- management of DBs (RDS, MongoDB Atlas)
- management of access to Aiven Kafka
- proper and unified tagging (important for cost management, resource tracking, resource decommissioning)
- least-privilege IAM roles based on the resources used by a service (engineers cannot manually override or loosen them)
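As a rough illustration of the tagging and least-privilege points above, here is a minimal Python sketch. It is not the actual Resource Manager code: the `build_tags` and `build_policy` helpers, the `ACTIONS_BY_KIND` mapping, the resource kinds and the example ARN are all assumptions made for this example. Only the metadata keys come from the manifests shown later in this article.

```python
from typing import Any, Dict, List

# Hypothetical mapping from a resource kind to the minimal AWS actions
# a service needs for it. The real rules are not described in this article.
ACTIONS_BY_KIND: Dict[str, List[str]] = {
    "sqs_queue": ["sqs:SendMessage", "sqs:ReceiveMessage", "sqs:DeleteMessage"],
    "s3_bucket": ["s3:GetObject", "s3:PutObject"],
}

REQUIRED_METADATA = ("name", "team", "project", "owner_contact")


def build_tags(metadata: Dict[str, str], environment: str) -> Dict[str, str]:
    """Derive one uniform tag set from the manifest metadata."""
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    if missing:
        raise ValueError(f"manifest metadata is missing: {', '.join(missing)}")
    tags = {key: metadata[key] for key in REQUIRED_METADATA}
    tags["environment"] = environment
    return tags


def build_policy(resources: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build an IAM policy document that covers only the declared resources."""
    statements = []
    for resource in resources:
        actions = ACTIONS_BY_KIND.get(resource["kind"])
        if actions is None:
            raise ValueError(f"unsupported resource kind: {resource['kind']}")
        statements.append(
            {"Effect": "Allow", "Action": actions, "Resource": resource["arn"]}
        )
    return {"Version": "2012-10-17", "Statement": statements}
```

Because the tags and the policy are both derived from the manifest, a service team has no place to add a broader permission or skip a tag by hand.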
Management of the Terraform environment:
- central place to control Terraform module versions
- central place to enforce Terraform upgrades
- central place to enforce Terraform module upgrades (for example adding monitoring or improvements to RDS)
- management of Terraform provider configuration
- management of Terraform state storage
- helps teams concentrate on code because they don't have to think about infrastructure details
- infrastructure code is managed by a team focused mainly on infrastructure
- infrastructure code managed and owned by one team is easier to reuse, so multiple teams avoid reinventing the wheel
- changes and standards are relatively easy and quick to introduce when infrastructure is managed from a central place/team
- whatever changes on the infrastructure side is handled by the owning team and does not require changes on the services side, nor does it require service teams to learn a new language. This might include:
  - migrating to a new way of managing resources (Azure ARM, Terraform, AWS CloudFormation etc.)
  - migrating to new Terraform versions
  - adding support for a new cloud or new methods of resource management
- adding new features or fixes to Terraform modules requires updates in a single place instead of multiple service repositories
Without a unified way of managing infrastructure, all of the above updates might be a long and painful process that requires a lot of synchronization across the teams that own particular service/resource code.
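To make the "single place" idea more concrete, the sketch below shows one way a central module version registry could look. The `MODULE_REGISTRY` contents, the module sources and the `module_block` helper are hypothetical and not taken from the real tool. The only thing assumed about Terraform itself is that a module call accepts `source` and `version` arguments for registry modules.

```python
from typing import Any, Dict

# Hypothetical central registry: module kind -> pinned source and version.
# The sources use the private registry address form <host>/<namespace>/<name>/<provider>.
MODULE_REGISTRY: Dict[str, Dict[str, str]] = {
    "rds": {"source": "example.com/platform/rds/aws", "version": "3.2.0"},
    "sqs": {"source": "example.com/platform/sqs/aws", "version": "1.4.1"},
}


def module_block(kind: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Return the arguments of a Terraform module call with the central pin applied."""
    if kind not in MODULE_REGISTRY:
        raise ValueError(f"no module registered for resource kind: {kind}")
    pin = MODULE_REGISTRY[kind]
    # Service-provided inputs never carry a version of their own here;
    # the pin always comes from the registry.
    return {"source": pin["source"], "version": pin["version"], **inputs}
```

With this shape, bumping `MODULE_REGISTRY["rds"]["version"]` once changes what every service gets on its next pipeline run, without touching any service repository.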
Resource Manager evolution...
Resource Manager (1.0)
At first, Resource Manager (1.0) was a simple templating tool. It had a few limitations:
- it supported Azure resources only
- it did not allow defining different groups of resources for separate environments (e.g. the dev/tst/ppd/prd envs had to use the same group of resources)
    ---
    version: 1.0
    metadata:
      name: service1                                  # Service name
      team: devops                                    # Team name
      project: test_project                           # Team project
      owner_contact: firstname.lastname@example.org   # Team contact email
    data:
      vault:
        secrets:
          - name: dbpassword
        common-secrets:
          - name: appdynamics-key
    resources:
      databases:              # Currently we support only mysql
        - name: db_name1      # Logical name of the database
        - name: db_name2      # Logical name of the database
      queues:                 # Only storage account queues are supported
        - name: queuename1    # Queue name; we return the endpoint and credentials
        - name: queuename2
      caches:
        - name: recipientservicecache
      topics:
        - name: topic
      subscriptions:
        - topic_name: topic_name          # Topic created by another service
          name: my-service-subscription   # Service subscription
Resource Manager 2.0
Once the decision to migrate to AWS had been made, we extended Resource Manager with a new set of features that allow it to handle multiple cloud solutions:
- added support for multiple cloud providers (e.g. AWS, Mongo Atlas, Aiven)
- added flexibility to define separate groups of resources for each environment if needed (e.g. deploy a DB only in dev for a POC/investigation)
    ---
    version: "2.0"
    metadata:                         # Tags applied to resources
      name: service1                  # Service name
      team: devops                    # Team name
      project: hello-world            # Team project
      owner_contact: email@example.com # Team contact email
    environments:
      - name: dev
        resources:                    # List of supported resources to be deployed to the cloud.
        templates:                    # List of configurations to process
      - name: tst
        resources:                    # List of supported resources to be deployed to the cloud.
        templates:                    # List of configurations to process
      - name: ppd
        resources:                    # List of supported resources to be deployed to the cloud.
        templates:                    # List of configurations to process
      - name: prd
        resources:                    # List of supported resources to be deployed to the cloud.
        templates:                    # List of configurations to process
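The per-environment layout of the 2.0 manifest can be pictured with a short Python sketch. The manifest is written here as an already parsed dictionary. The shape of the individual resource entries (`type` and `name`) and the `resources_for` helper are assumptions for illustration, since the example above leaves the `resources` lists empty.

```python
from typing import Any, Dict, List

# A parsed 2.0 manifest. The resource entry format is hypothetical.
MANIFEST: Dict[str, Any] = {
    "version": "2.0",
    "metadata": {"name": "service1", "team": "devops"},
    "environments": [
        # A DB deployed only in dev, for example for a POC.
        {"name": "dev", "resources": [{"type": "database", "name": "poc_db"}], "templates": []},
        # An empty YAML key parses as None, so both None and [] must be handled.
        {"name": "prd", "resources": None, "templates": None},
    ],
}


def resources_for(manifest: Dict[str, Any], environment: str) -> List[Dict[str, Any]]:
    """Return the resources requested for a single environment."""
    if str(manifest.get("version")) != "2.0":
        raise ValueError("this sketch only understands version 2.0 manifests")
    for env in manifest.get("environments", []):
        if env["name"] == environment:
            return list(env.get("resources") or [])
    raise KeyError(f"environment not defined in manifest: {environment}")
```

Calling `resources_for(MANIFEST, "dev")` returns the single POC database, while `resources_for(MANIFEST, "prd")` returns an empty list, which is the flexibility the 1.0 format lacked.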
How it works...
The flow below describes how Resource Manager works in the pipeline.
- [CI] Resource Manager processes the resource.yaml manifest and any configuration template files defined by the service
- [CI] Resource Manager reads the necessary configuration for each environment where Terraform changes will be deployed (2/3/4/5)
- [CI] Resource Manager generates Terraform code with the requested resources (6)
- [CD] Pipeline retrieves the Terraform scripts for deployment (7)
- [CD] Pipeline executes Terraform with the scripts for the selected environment (8)
- [CD] Pipeline deploys Kubernetes manifests (9)
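The CI half of this flow can be sketched in Python as follows. This is an illustration under stated assumptions, not the real implementation: the function names, the resource entry format, the `./modules/<type>` source path and the `vpc_id` setting are invented for the example. The sketch relies on two Terraform features: configuration can be written in JSON syntax in `*.tf.json` files, and the CLI accepts a `-chdir` option.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def generate_terraform(
    metadata: Dict[str, str], environment: Dict[str, Any], env_config: Dict[str, str]
) -> Dict[str, Any]:
    """[CI] Turn one environment's requested resources into a Terraform JSON document."""
    modules: Dict[str, Any] = {}
    for resource in environment.get("resources") or []:
        modules[f"{resource['type']}_{resource['name']}"] = {
            "source": f"./modules/{resource['type']}",  # hypothetical module path
            "name": resource["name"],
            "vpc_id": env_config["vpc_id"],  # hypothetical per-environment detail
            "tags": {**metadata, "environment": environment["name"]},
        }
    return {"module": modules}


def run_ci(
    manifest: Dict[str, Any], env_configs: Dict[str, Dict[str, str]], out_dir: Path
) -> List[Path]:
    """[CI] Write one main.tf.json per environment for the CD stage to pick up."""
    written = []
    for environment in manifest["environments"]:
        config = env_configs[environment["name"]]
        document = generate_terraform(manifest["metadata"], environment, config)
        target = out_dir / environment["name"] / "main.tf.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(document, indent=2, sort_keys=True))
        written.append(target)
    return written


def cd_command(out_dir: Path, environment: str) -> List[str]:
    """[CD] Build (but do not run) the apply command; `terraform init` must run first."""
    return ["terraform", f"-chdir={out_dir / environment}", "apply", "-auto-approve"]


def demo() -> Dict[str, Any]:
    """Run the CI steps against a tiny manifest and return the generated document."""
    manifest = {
        "metadata": {"name": "service1", "team": "devops"},
        "environments": [
            {"name": "dev", "resources": [{"type": "database", "name": "poc_db"}]}
        ],
    }
    env_configs = {"dev": {"vpc_id": "vpc-dev-example"}}
    with tempfile.TemporaryDirectory() as tmp:
        paths = run_ci(manifest, env_configs, Path(tmp))
        return json.loads(paths[0].read_text())
```

Generating one directory per environment keeps the CD stage simple: it only needs to know which environment it deploys to, not what the service asked for.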
Resource Manager offers a simplified way of managing resources for services, mainly by allowing teams to use predefined resources and start working quickly without waiting for the DevOps team to deploy them. It was a key tool that allowed us to expedite onboarding new services and moving to Kubernetes.
It is also worth mentioning that the cost of adding support for new resources is relatively small.