Ways we make changes to the Application
There are a few main ways that we can make changes to our application:
Updating an upstream package, which can then be referenced in our applications and approved once integration tests have passed.
Small changes to the application, such as quality-of-life features and minor bug fixes, that can be rolled out quickly with no issues.
Large changes, such as rearchitecting an application, which may or may not be breaking for downstream dependencies. These usually require a lot of planning and can interfere with the day-to-day use of a development environment.
The issues we used to face
The final point on large changes was our biggest reason for a dedicated development environment: some changes required time to be tested and iterated on until the team was happy to release them, or to try out alternative solutions in parallel. However, there were a lot of problems that came with using a single development environment:
Every change was pushed to the development environment, ranging from small changes that could be rolled out immediately to large breaking changes that needed a lot of testing. We could not roll out our small changes to production without either rolling back the breaking changes or waiting for the team to get them working properly.
With many independent changes pushed into one environment, confidence in each deployment decreased as the number of changes grew. Deployments usually required operations approval and became a big process that teams needed to dedicate time for, usually near the end of a sprint.
In the event of a rollback - which became more likely the longer we held off production deployments - every other change had to be rolled back as well. This was a loss for the customer, as much-needed features were delayed until some unrelated issue was fixed.
Our solution centered on the idea that a developer could launch their own development environment from a feature branch, run end-to-end tests on it, and then, when satisfied, merge into master, which would deploy to production.
The idea is simple; however, getting our application and infrastructure to the point they are at now took a lot of effort from all areas of our Platform Engineering team. It would have been straightforward if the Q-CTRL platform did not have so many coupled microservices and package dependencies. We structure it this way because different engineering teams are responsible for different parts of the platform, and because a microservice architecture brings many benefits for scalability and code separation.
We started with an ideal workflow to work out our requirements:
Developer modifies code on their branch for their service.
Pushes the code to git and creates a pull request.
During the CI process, the Docker image is built and then deployed to a platform alongside all of the other application components. A developer can also modify different parts of the stack and include them in their environment - for example, a front-end and a back-end change together.
Once the environment has been deployed, end-to-end tests run on the stack to ensure that nothing has been broken.
When the end-to-end tests pass, the developer can merge their change into the master branch, which would kick off a production deployment.
The temporary environment then gets cleaned up to save costs.
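The steps above can be sketched as a single CI script. This is a minimal sketch, not our actual pipeline: the registry, image names, and the branch-to-namespace convention are illustrative, and the real flow is split across CI jobs.

```shell
#!/bin/sh
# Sketch of the per-branch preview flow described above. All names are
# illustrative; the real pipeline is split across separate CI jobs.

# Derive a DNS- and namespace-safe name from the git branch.
namespace_for_branch() {
  echo "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9\n-' '-' | sed 's/-*$//'
}

deploy_preview() {
  ns=$(namespace_for_branch "$1")
  # Build and push the image for this branch (registry name is hypothetical).
  docker build -t "registry.example.com/api:$ns" .
  docker push "registry.example.com/api:$ns"
  # Pin the new image in the shared manifests and deploy into a fresh namespace
  # (the overlay path is hypothetical).
  (cd manifests/overlay && kustomize edit set image "api=registry.example.com/api:$ns")
  kubectl create namespace "$ns"
  kustomize build manifests/overlay | kubectl apply -n "$ns" -f -
  # Block on the in-cluster E2E job, then tear the whole environment down.
  kubectl wait -n "$ns" --for=condition=complete job/e2e --timeout=20m
  kubectl delete namespace "$ns"
}

# deploy_preview is not invoked here, as it needs cluster credentials and the
# docker/kustomize/kubectl CLIs; the name derivation runs anywhere:
namespace_for_branch "Feature/ADD-oauth2_proxy"   # → feature-add-oauth2-proxy
```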
While the workflow itself is not complicated, there are many things that need to be taken into consideration:
How do we create DNS entries for new environments?
How do we handle secrets?
How do we run our end-to-end tests and how can we update them?
How do we clean up everything at the end?
And most importantly, how can we ensure this environment matches our production environment?
There is no perfect solution to this: some of it lies with the choice of technologies, some with the way the application works and the number of dependencies on external services, and some with the way the infrastructure stack itself is designed.
The core of our temporary testing environments is that our whole stack is created from Kubernetes manifests. All of our services, applications and configurations are set within the context of a Kubernetes cluster. This ensures that our development environments are provisioned in much the same way as our production services are configured.
As shown by the diagram, a repository maintains the definitions of the Kubernetes manifests used by any application repository. Once an application has finished building its new image, the environment is created in a new Kubernetes namespace, and the image is set using the `kustomize edit set image` command.
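As an example of how that image pinning works, suppose the shared manifests repository contains a kustomize overlay like the one below. The `qctrl/api` image name, the `../base` path, and the pull-request tag are all hypothetical:

```shell
# Write a minimal overlay as it might look in the shared manifests repository
# (resource paths and image names are illustrative).
mkdir -p overlay
cat > overlay/kustomization.yaml <<'EOF'
resources:
  - ../base
images:
  - name: qctrl/api
    newTag: latest
EOF

# In CI, after the image build, the pull-request tag would be pinned with:
#   (cd overlay && kustomize edit set image qctrl/api=qctrl/api:pr-123-abc1234)
# which updates the `images:` entry in kustomization.yaml to the new tag.
```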
Once the applications have been deployed successfully, an E2E job runs inside the cluster to verify that the environment is working. The definition for this E2E task is a package located in another repository; in the context of our environment, the E2E test is just another application component in Kubernetes.
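In Kubernetes terms, such an E2E run can be expressed as a Job in the environment's namespace. The manifest below is an illustrative sketch, not our actual definition; the image name and target URL are assumptions:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: e2e
spec:
  backoffLimit: 0            # a failed suite should fail the pipeline, not retry
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: e2e
          image: registry.example.com/e2e-tests:latest   # hypothetical image
          env:
            - name: TARGET_URL
              # The tests hit the in-cluster service, not the public DNS name.
              value: http://api
```

CI can then block on something like `kubectl wait --for=condition=complete job/e2e -n <namespace>` before allowing the merge.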
Once we are finished with an environment, we can simply delete its namespace, which deletes every resource that was just created. Using Cluster Autoscaler, the cluster scales up or down as demand changes.
Just creating Kubernetes manifests is not enough for a fully fledged application, especially if you want to automate other steps such as DNS creation and secrets management.
While making our stack deployments as automated as possible, we also prefer not to reinvent the wheel or rely on proprietary solutions, so we have opted to use the great open-source technologies available in the community.
We make extensive use of external-dns to automate the creation of our DNS records. Each namespace name is used as a prefix for all of our domains: for example, api.q-ctrl.com is deployed as <branch_name>.api.q-ctrl.com. With external-dns, you set a hostname annotation on each of your ingresses, which then creates a DNS record in your desired DNS service, for example Cloudflare or Route53.
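With external-dns watching the cluster, an ingress only needs the hostname annotation; external-dns creates the record in the configured provider. The branch name and service details below are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    # external-dns picks this up and creates the record in e.g. Cloudflare.
    external-dns.alpha.kubernetes.io/hostname: my-branch.api.q-ctrl.com
spec:
  rules:
    - host: my-branch.api.q-ctrl.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
```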
A question that may already come to mind is how we share our Docker registry secrets and certificates across multiple namespaces. Our answer to this is Kubed, which allows you to create a secret and sync it across any namespaces you wish. The advantage of this approach is that we don’t have to create a secret every time we deploy a new environment, as Kubed syncs it to the new namespace automatically.
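With Kubed, the source secret carries a sync annotation, and Kubed copies it into every namespace matching the annotation's label selector. The names, namespace label, and placeholder payload below are illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
  namespace: kube-system        # the single source of truth
  annotations:
    # Kubed copies this secret into every namespace carrying the
    # `environment=preview` label (an empty value would mean all namespaces).
    kubed.appscode.com/sync: "environment=preview"
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: eyJhdXRocyI6e319   # base64 of {"auths":{}} - placeholder
```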
Nginx Ingress Controller + OAuth2 Proxy
Along with our application, we deploy some other services, such as pgAdmin for viewing our database tables, RedisInsight for monitoring performance on our Redis cache, and a service for retrieving temporary credentials for our environments. As these services are more sensitive, we want an easy way to put authentication in front of any service we choose. OAuth2 Proxy can be integrated with the Nginx Ingress Controller so that requests are always redirected back to our authentication service. Some more information around the implementation can be found here. Doing it this way means we don’t have to configure custom authentication for each service; we can let the ingress handle it through Kubernetes definitions.
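With the Nginx Ingress Controller, protecting a service comes down to two annotations pointing at the OAuth2 Proxy endpoints. The auth host and service names below are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pgadmin
  annotations:
    # Nginx sends a subrequest here for every incoming request; a 2xx response
    # lets the request through, anything else triggers the sign-in redirect.
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.com/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.com/oauth2/start?rd=$scheme://$host$request_uri"
spec:
  rules:
    - host: pgadmin.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pgadmin
                port:
                  number: 80
```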
As a result of creating dynamic environments on each pull request, developers now get feedback on their changes much more quickly and can view their code in a real cloud environment that is almost identical to production. It gives them the ability to try experimental changes without breaking a shared environment and blocking other developers, to collaborate with the DevOps team on infrastructure changes through Kubernetes manifests, and to have their code deployed to production when it’s ready, without worrying that some other code change will break their feature.
At Q-CTRL, we have moved from a giant deployment at the end of each sprint that everyone prayed would work to multiple production deployments per day with very high confidence. Harnessing both good engineering practices and the power of Kubernetes, we can confidently roll a deployment back to its last revision while other developers continue to build features in their own environments. Another unintended benefit is that we can also create separate environments for performance and load testing: something that would usually require a new environment and a bunch of engineering work is now something we do on an hourly basis with no intervention from the operations team.