Elon Musk famously says that the real challenge Tesla faces isn't building cars but building and running the factories that build cars at scale. He refers to the factories as "the machine that builds the machine", and while those factories still aren't fully autonomous and can't build cars without human labor, the humans' productivity is multiplied manyfold through highly automated and integrated production processes. The same applies to software development – the code isn't going to write itself anytime soon (although we're getting closer to that step by step), but with the help of integrated and automated infrastructure, developers can write code more efficiently, faster, and with more certainty.
At the core of every development team's workflow there is git (other version control systems exist of course, but in reality almost everyone uses git these days). And while git's cheap and simple branching model (admittedly not everyone agrees it's simple) makes it easy for developers to code away in their own branches as well as rebase and merge those branches back together, the real challenge teams face is orchestrating exactly that: making sure one person's changes propagate to the codebases others are working on, and that merged code ends up in the production system fast, predictably, and with certainty.
Most teams rely on one of two techniques for managing branches and getting merged code deployed to production:
- A `main` branch plus feature branches: once a pull request is merged, the new version of the `main` branch is deployed to production automatically.
- A `development` branch with feature branches branched off of it: pull requests are merged into `development`, and that branch is merged into `production` in intervals following some sort of schedule or process.
While in reality there are thousands of variations of the latter approach, they
are essentially all the same. The goal of those multi-branch setups is to add a
safety net between a pull request being merged and the code being deployed to
the production system. Whenever "enough" changes have accumulated in the
development branch (or a scheduled release date has been reached), those
changes would typically go through some kind of testing process where they would
be checked for correctness (and potentially fixed) before eventually making
their way to the production system(s).
That results in all kinds of inefficiencies though:
- When bugs show up in the `development` branch, it's often unclear where they originate; all kinds of unrelated changes have been merged together in the branch, so it's unclear whether a bug originates in the pull request that implemented the respective feature or whether it is only caused by the combination of those changes with others.
- Authors of pull requests are interrupted long after they considered their work done, because of the delay between a pull request being merged into the `development` branch and the testing/validation being performed on that branch. They have to get back to something they considered done already and rebuild all the context in their minds while at the same time leaving whatever they are working on now behind, potentially causing problems for others that might be dependent on that work – so that in the worst case whole cascades of focus and context switches are triggered.
In fact, these branching models and the inefficient workflows they force developer teams into are almost always only necessary due to a lack of powerful infrastructure. If that infrastructure is in place, with proper automation and integration, teams can adopt a much simpler model and workflow:
A single `main` branch with auto-deployment

A branching model with a single `main` branch and feature branches that are branched off of it and merged right back into it is obviously much simpler conceptually. Furthermore, deploying all changes that are merged back into the `main` branch immediately and automatically dramatically improves the workflow.
Of course, the challenge is to do all the testing (and QA in the wider sense) that happens based on some sort of schedule or process in multi-branch models, for every single pull request – and ideally for multiple pull requests in parallel to achieve high throughput. This is where infrastructure and automation come in.
The main building block of any effective developer infrastructure is of course a good Continuous Integration (CI) system, whose most basic task is to run automated checks on a set of code changes to establish baseline correctness.
Typically the foundation of those checks is some sort of unit tests (or whatever concept the language/framework of choice uses) that ensure the code in fact does what it is supposed to do. They also help catch regressions early on in cases where a change to one feature causes another, seemingly unrelated one to break. Good test coverage and a fast and stable CI system that runs the tests are an absolute requirement for any development team to be successful. While that's nothing new or controversial in our industry, there's more than just unit tests that can be leveraged to ensure a set of changes is correct and doesn't lead to regressions.
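As a minimal illustration, here is what such a unit test might look like in TypeScript – assuming a Jest-style test runner and a hypothetical `formatPrice` helper, neither of which is prescribed by anything above:

```ts
// price.ts – a hypothetical helper, used for illustration only
export function formatPrice(cents: number, currency: string = "EUR"): string {
  return `${(cents / 100).toFixed(2)} ${currency}`;
}

// price.test.ts – a Jest-style unit test guarding against regressions
import { formatPrice } from "./price";

describe("formatPrice", () => {
  it("formats cents as a decimal amount with currency", () => {
    expect(formatPrice(1999)).toBe("19.99 EUR");
  });

  it("supports other currencies", () => {
    expect(formatPrice(500, "USD")).toBe("5.00 USD");
  });
});
```

If a later change breaks the formatting logic, the CI run fails before the pull request can be merged.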
- Custom static analysis rules are another example: they can forbid patterns like accessing `document.cookie` in case you don't want your web app to be required to render a cookie banner (a minimal sketch of such a rule follows below). There are countless opportunities here, and I personally believe there's still a lot to do in that area that could have huge positive impacts on developer teams' efficiency.
This list isn't nearly complete. Carefully analyzing any system and its history of issues usually reveals countless opportunities to automate checks that would have prevented those issues or can help prevent other issues in the future.
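As a sketch of what the `document.cookie` check mentioned above could look like – assuming an ESLint setup with flat config, which is just one possible choice – the built-in `no-restricted-properties` rule can make the CI run fail whenever that property is accessed:

```ts
// eslint.config.ts – a sketch; assumes ESLint flat config and TypeScript sources
export default [
  {
    files: ["src/**/*.ts"],
    rules: {
      // fail the lint step whenever document.cookie is accessed, so no code
      // path can introduce a cookie-banner requirement unnoticed
      "no-restricted-properties": [
        "error",
        {
          object: "document",
          property: "cookie",
          message: "Direct cookie access is not allowed in this project.",
        },
      ],
    },
  },
];
```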
All of these techniques test one (sub)system in isolation. However, many systems today aren't built as monoliths but as networks of multiple, distributed systems – e.g. single page apps (SPAs) with their respective server backends or microservice architectures. In order to be able to auto-deploy any of the individual subsystems of such architectures, it's critical to validate they operate correctly in combination with all other subsystems.
The key technique for testing a multitude of subsystems together of course is end-to-end testing (sometimes also referred to as "integration" or "acceptance" testing – the terminology is a bit unclear in practice, asking four different people would typically result in five different opinions on the exact meaning of each of these terms). For a proper end-to-end test, a pull request that changes the code of one subsystem is tested together with the respective deployed revisions of all other subsystems. That allows catching situations where changes to the one subsystem, while completely consistent and correct within that subsystem, cause problems when interfacing with other subsystems. Typical examples for such situations would be backward-incompatible API changes that would cause errors for any client of the API that hasn't been updated yet.
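To make this concrete, here is a sketch of what such an end-to-end test could look like with Playwright (one possible tool choice among several), exercising a hypothetical signup flow that spans the frontend app and its API backend; the URL and UI labels are made up for illustration:

```ts
// signup.e2e.ts – a sketch assuming Playwright and a hypothetical signup flow
import { test, expect } from "@playwright/test";

test("a visitor can sign up", async ({ page }) => {
  // the frontend app is served at this base URL; it talks to the real API backend
  await page.goto("http://localhost:4200/signup");

  await page.getByLabel("Email").fill("jane@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Sign up" }).click();

  // this assertion only passes if the frontend *and* the backend behaved correctly
  await expect(page.getByText("Welcome, jane@example.com")).toBeVisible();
});
```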
Running such tests requires the ability to spin up a complete system, including all of its subsystems, on demand. Typically that is achieved by containerizing all of the (sub)systems so that a set of interconnected containers can be started up on the CI server. In the case of a web app, that would mean serving the frontend app as well as running the API server in two containers, then running a headless browser that sends requests to the frontend app (which in turn makes requests against the backend) and asserting on the responses.
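One way to wire that up from within the test suite itself – a docker-compose file started by the CI job works equally well – is sketched below using the Testcontainers library for Node; the image names `app-api` and `app-frontend` are made up for illustration:

```ts
// e2e-setup.ts – a sketch using Testcontainers; image names are hypothetical
import { GenericContainer, Network, StartedTestContainer } from "testcontainers";

export async function startSystem(): Promise<StartedTestContainer[]> {
  // a shared network so the containers can reach each other by alias
  const network = await new Network().start();

  // the API backend, including whatever data store it needs internally
  const api = await new GenericContainer("app-api:latest")
    .withNetwork(network)
    .withNetworkAliases("api")
    .withExposedPorts(3000)
    .start();

  // the frontend, pointed at the backend via its network alias
  const frontend = await new GenericContainer("app-frontend:latest")
    .withNetwork(network)
    .withNetworkAliases("frontend")
    .withEnvironment({ API_URL: "http://api:3000" })
    .withExposedPorts(4200)
    .start();

  // the headless browser reaches the frontend via frontend.getMappedPort(4200)
  return [api, frontend];
}
```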
Besides the simple ability to run these containers at any time, another aspect of this is maintaining an example data set to load into those containers so that it can be used in the end-to-end tests. Such data sets are typically maintained in the form of seed scripts that generate a number of well-defined resources. This is particularly hard to retrofit later in a project when there is already a plethora of different resource types and data stores – considering such a setup early on and maintaining and evolving the data set along with the code is much easier and more efficient.
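A seed script along these lines might look like the following – a sketch that assumes a relational database accessed through Prisma (one option among many) and hypothetical `User` and `Project` models:

```ts
// seed.ts – a sketch; Prisma and the User/Project models are assumptions
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function seed(): Promise<void> {
  // a well-defined user the end-to-end tests can rely on
  const jane = await prisma.user.create({
    data: { email: "jane@example.com", name: "Jane Doe" },
  });

  // ...and a project owned by that user, referenced by the tests
  await prisma.project.create({
    data: { name: "Example Project", ownerId: jane.id },
  });
}

seed()
  .catch((error) => {
    console.error(error);
    process.exit(1);
  })
  .finally(() => prisma.$disconnect());
```

Because the script lives in the repository, the example data evolves in the same pull requests that change the code and schema it depends on.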
End-to-end tests aren't the only valuable thing that is enabled by the ability to spin up instances of the system on demand, though.
With all this infrastructure in place, it's possible to move all of the testing and validation that's done en bloc after a whole bunch of pull requests have been merged in a multi-branch model to the point before every individual pull request is merged. Once a pull request passes all these checks, it can be merged into the main branch and auto-deployed to production with confidence. In fact, this process can even lead to increased confidence compared to scheduled big releases, since every single deployment is now much smaller in scope, which in itself reduces risk.
With all that testing and automation in place, it is still possible for things to blow up in production of course. Besides having error tracking with tools like Bugsnag or others in place, the ideal infrastructure also includes a process for running automated smoke tests against the production system after every single deployment.
These are quite similar in nature to the end-to-end tests, with the main difference that they run against the production system. Typically those tests would focus on the main flows of an application that also have the highest relevance for the business – signing up, logging in, checking out, and the like.
One concern when running anything automated against the production system – potentially many times per day – is the amount of test data that is produced in the process and that could interfere with analytics or show up for real users. One way to address that (in the case of web applications) is to set a custom header that identifies the client as a test client, so that the server can schedule the generated data for deletion later on or otherwise filter it out of anything real users can see.
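A smoke test along those lines could look like this – again sketched with Playwright, where the header name `X-Smoke-Test` is purely an assumption and the server would need a matching check on its side:

```ts
// smoke.spec.ts – a sketch; Playwright and the "X-Smoke-Test" header are assumptions
import { test, expect } from "@playwright/test";

test.use({
  baseURL: "https://app.example.com",
  // mark every request so the server can flag the resulting data for cleanup
  extraHTTPHeaders: { "X-Smoke-Test": "true" },
});

test("the checkout flow works in production", async ({ page }) => {
  await page.goto("/products/example-product");
  await page.getByRole("button", { name: "Add to cart" }).click();
  await page.getByRole("link", { name: "Checkout" }).click();

  await expect(page.getByText("Order summary")).toBeVisible();
});
```

The server side then recognizes the header, excludes the request from analytics, and schedules any created records for deletion.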
An efficient workflow based on effective infrastructure as described in this post undoubtedly raises teams to new levels of productivity. Admittedly, it takes time and effort to set it all up, but the productivity gains easily outweigh the cost. In particular, when considered early on in a project, that cost isn't even as substantial as it might seem. The cost of not having infrastructure like this once it's absolutely needed, however – which is when trying to scale a team up without its relative velocity dropping at the same time – is certainly much higher.
If you're interested in development workflows and ways to increase efficiency, also have a look at my talk on the topic in which I also touch on some process aspects:
simplabs is a digital product development consultancy that helps teams ship better software faster, more predictably, and with higher quality. If you're interested in how we could help improve your infrastructure and workflow, schedule a call with us.