A Checklist of Dev-Ops Disciplines
I have worked on a lot of programming projects in my time, and while I was a programming consultant I have worked in a lot of different corporate environments. At some of them, it was easy to be concretely productive: I was able to contribute immediately, and at a rapid rate. At others, actual useful contributions would be impossible until I had a month or more of experience with a codebase, and even then every change would be a long slog. The difference can be overwhelming and palpable.
The biggest contributors to this difference wasn’t what programming language was chosen (though I do care a lot about that), nor how well the code was factored (though that’s also very important), but rather the organizational structure that surrounded the code: the build system, the repo configuration, the tests, the documentation, the ticketing system – the stuff outside of the code itself that was essential to how programmers interacted with the code. Most, but not quite all, of what I’m talking about falls under the header of Dev Ops.
After having read (some of) Code That Fits in Your Head, I have come to believe in the importance of check-lists, so I’ll share with you my personal check-list of important dev-ops and dev-ops adjacent considerations when setting up a new project, so that developers can work rapidly, effectively, and with fewer mistakes.
The stakes are high – if it takes forever to make a change, if the process between modifying your code and running your code is too long, programmers won’t be able to work unless they’re much more confident, biasing them towards overly simple fixes and against more complicated refactors. If there’s no tests, programmers will be overly careful modifying the code to avoid breaking things, and so the code won’t be able to evolve. New team members will take much longer to gear up, and everyone will be much less productive.
The worst thing is, managers are liable to dismiss developers' complaints, and developers are unlikely to have the confidence to raise them. It’s easy to be unsympathetic to the complaint that a job is tedious or inconvenient. It sounds to many programmers and managers alike like laziness, and the obvious answer is “Well, that’s why we pay you the big bucks.” Obviously development at these shops is still possible, and the old hands at the company, who are used to whatever system’s in place, have accepted the costs already.
But make no mistake: Developer convenience and happiness is closely connected to developer productivity and accuracy. So let’s discuss how to make a development environment convenient for a developer.
So here’s how we can make programming convenient, as a check-list with some explanation for each item. Many of these items I learned from colleagues and leaders along the way in my programming career; this is my first attempt to collect all of them.
- Let programmers use their own preferred development environment
Many developers have life-long habits and long-accumulated configurations for their favorite editors. I know I do! Standardizing IDEs or even operating systems can be tempting, but in general it isn’t worth it. Programming includes a lot of little steps, and making all of them take longer by changing a programmer’s environment can destroy momentum.
- Provide standards for developer workstations
This might seem to contradict the previous point, but I honestly think you need both. It should be super easy to figure out what kind of operating system requirements and dependencies are necessary to build all the projects, because, as we’ll get to soon, developers should be able to build projects locally.
For example, most Linux distributions are customizable enough that programmers will be able to find a development environment within that distribution that suits them. The dependencies of a project can then be specified as a package list within that distribution, but the developers can then customize the rest of their interface. Commonly used distributions should be preferred if developers are doing their own IT, so that they can easily find help online.
You’ve changed a line of code. Congratulations! Now how long will it be until you can see the results of your change in action? How many steps do you have to take to see if it fixed your issue? If it broke compilation? If it passes tests?
If the amount of time or number of steps is low, then people will be able to try out various solutions, use trace statements to debug issues, and otherwise interact with their code like a live system. If it’s high, they have to rely more on their own reasoning, which is fallible, more likely to lead to bugs, and more likely to lead to timid, overly-conservative changes, that work around problems rather than addressing them.
So how do we accomplish this?
- Projects should run natively and directly on developer workstations
In my mind, this is almost a deal-breaker for development. If you have to deploy to a dev environment or install on a physical piece of embedded hardware to test your software, your dev cycle will be far too long. Dev environments and physical hardware are of course essential for testing, but using them for absolutely all development introduces resource constraints where there don’t have to be, and lengthens dev cycles.
Even if the local dev environment is different from the prod environment, that’s fine. Even if some of the code won’t run and make sense, it’s still important to be able to run the rest of code locally. Even if it’s running on an embedded platform and operating system with no proper simulator, some of the code will work on Linux or macOS. Those components should be testable on the developer workstation itself.
- Building a project locally should be a single command
- Building and running automated tests for a project should be a single command
When I say a single command, I mean it. Exactly one. Two is far too many. Once you try it, you’ll never go back. If your workplace doesn’t do this, write a script. Check the script in.
Of course, if different developers have different computers, this might be difficult, but if you assume a standard set of dependencies, (or use a reproducible build system like NixOS), this command can just be an invocation of the build system.
In situations where it’s more complicated, a shell script should be
written to encapsulate the complexity. This shell script should be included
in the repo and maintained and checked by CI along with the other code,
so that it always works. The exact invocation of the command should
be completely invariate, and documented in the projects
Programmers don’t need to be distracted by complicated multi-part instructions that haven’t worked exactly right in years, or that work on some machines and not others or by twiddling with their Docker settings. They should be focused on actually improving and fixing code.
- Building and running a project locally should be a single command
This is similar to the above but might require sample configuration to be checked in along with the repo.
- Builds should be reasonably fast
Developers should program on sufficiently powerful computers for their
builds. Build scripts should use options like
-j and if helpful send
builds seamlessly to build farms (the seamlessness is important; it
should still be a single command for the developer and result in a local
build and run). Private caches should be set up, if this is possible
with your build system.
If programming in C or C++, header file hygiene can be an important consideration in build speeds – invest time into it. Use incrementality features of your build systems rather than having scripts that clean every time. Structure the code so that incremental builds are possible.
If necessary, allow developers to build only part of the project (while still making it simple to build the entire project).
- Use version control for all projects
This is hopefully obvious to all modern teams, but I wanted to make sure I said it anyway to talk some about why it’s important.
The first and more obvious upshot of version control is being able to undo and research mistakes. If the code changed how it works, developers should be able to ask “when did it break” before asking “how did it break.” If the changes in the log are fine-grained enough, this might prevent the need for investigating the “how.” (Note that bisecting often requires fast dev turn-around as well – these are interconnected.) Version control should always be used. Even informal, one-person projects, such as writing test programs to try out APIs, should exist within a version-controlled repo.
The second upshot is that it enables collaboration. This also makes it important even for very small projects, because it enables you to easily ask your colleagues for help, and your colleagues can then look at the code with their own preferred development environment and try out fixes on their own machine.
- Developers should be proficient in Git
It’s not enough to cargo cult Git knowledge or focus on that “one guy who understands Git.” Everyone should put the effort in to be that “one guy.” If you don’t know what “reflog” means or how rebasing differs from merging or how to edit commits deep in the history, you’re not a sufficiently proficient git user. Many, perhaps even most, programmers aren’t.
- Use and Enforce a Branching Discipline
Even on relatively small projects, no one should be committing and pushing
directly to the
master branch. If people push directly
master, every commit is automatically collaborative. This will make
developers commit less frequently than they otherwise should, and will
decrease the effectiveness of version control by having fewer versions
to go back to.
It will also, obviously, lead to people accidentally “breaking the build”
as projects get bigger. Committing a small change and merging that
develop should be two different actions. The
first should be done extremely often, and the second should only be
allowed if a certain number of hoops have been jumped through.
- Enforce CI
Before code can be merged into master, it should build. By default, merging
into master should be impossible unless the repository has verified the
build with CI. This is where we can easily test that it builds in a
deployment setting in addition to a development setting, where
artifacts can be created to deploy to servers or embedded devices
(though this should also be possible to do locally) and where
we can run automated tests. Coding standards should be enforced here,
cargo fix are great tools for Rust.
Ideally, your CI scripts should be checked into the same repo as the systems
they test, as is supported by GitLab with its
- Have tests in the repo
This is related. I’m not going to go into how to write tests and test coverage and all of that here; that’s again a separate topic for many many books. But there should be tests, and the important tests should be in the repo, and they should automatically be run by CI.
Remember: Tests aren’t just a tool for making sure the developers didn’t mess up after a fact. They’re there so developers can make sweeping changes with confidence.
- Avoid mono repos
This one’s simple: The git log is too spammy and CI for the whole thing
takes too long to run. Also, we have the technology of submodules, or,
if on Nix,
- Require code review
This should be enforced by your Git system. As for how to actually do code review, this is a big enough topic to be its own section, which is coming up.
The main point of code review is not to make sure bugs don’t get into the code, although it helps with that. The main point of code review is to mitigate bus factor, that is, to make sure there’s more than one person who is ready to maintain the code. All other guidance flows from here.
- At least the person who maintains the code should also review
If the MR is written by the primary maintainer of the codebase, it should reviewed by whoever would have to step up if they were abruptly “hit by a bus.”
This ensures that everyone maintaining the code is in agreement with not just style and correctness concerns, but in the general design, architecture, and organization of the code.
- The standard should be “Would I take responsibility to maintain this?”
If the answer is no, why not? Asking myself this question motivates me
to make more suggestions about how the code should be factored, so I
can jump in and make changes easily like I can with my own codebases,
rather than just simply verify that it looks like it works and doesn’t
This question leads to some natural sub-questions:
- How hard is it to find bugs in?
It shouldn’t just not have bugs, it should be obvious it doesn’t have bugs. This way, when a bug is actually discovered, code that isn’t buggy but is complicated won’t distract the poor developer trying to find the cause.
- How hard is it to modify to do something else?
- How easy is it to mess up?
This is where DRY (don’t repeat yourself) comes in. If I repeat the same pattern of code more than 2 times, and someone modifies it, they might only modify some of the instances of the pattern. This can also be mitigated not through abstraction but by putting all the instances next to each other, which is sometimes appropriate.
The code, however, should also not do premature abstraction, because then it will be impossible to find issues among all the spaghetti of function calls and variable references, so this is a balancing act.
- If a bug is found to be caused by this change, will we know which part to revert?
Remember, programmers should be able to bisect instead of having to read an entire codebase when they want to find a bug. If you found out that the bug was caused by this change set, would you be relieved to know or would you still have a lot of work ahead of you?
Last but not least, documentation.
- Documentation should say how to build the project
It should, as mentioned, be one command, and it should not depend on very much set up beyond “having a standard development workstation.”
- Documentation should say how to run the project
What flags or configuration does it take? How do you tell it to re-read the configuration? Does it use any environment variables?
- Documentation should say what the project is for
This should be before how to build it and run it, and should explain who might want to run it and where it fits into the broader organization, and the first things a programmer might want to know before looking at it. This will help people understand the stakes of modifying it, and where to start looking for features. This should be covered in the lede paragraph.
Which leads me to:
- There should be a lede paragraph
This should introduce the repo to someone who’s never heard of it and doesn’t have any context for what they’ve stumbled across. It should include its role in the company’s tech stack, its status, and what technologies it uses.
Here’s some examples:
This is the main repo for our flagship product, and it is one of our few repos that is not open source. Customers use it directly to control the widget machines, which it contains all the drivers for, and also Node.js code to serve the user-facing web interface.
This is run as a twice-daily batch job to automate pruning the widget description files. It is run on customer machines, and is open source as local administrators might want to customize it. It is written entirely in Perl 4 except for one module that is written in APL. Sometimes, it doesn’t work correctly, and we have to manually run an earlier version written in JCL and Cobol (link).
This implements the new DSL for widget description. Currently, it only supports translation to old widget descriptions, but it is hoped that it will eventually be integrated into the main repo. It is a research project still under active development. It is written in Haskell and Idris, and contains, as a component, a custom Prolog interpreter.
- Documentation should be discoverable
It should either be in the
README.md of the relevant repo or linked
to directly from there.
I guess I lied when I said that documentation was last. Project management is, I think, a topic for a different blog post, but what I wanted to say about this is: It should be very easy to add a new TODO item that the programmer doesn’t have to remember anymore. If it takes too long to make a ticket, developers will lose their flow on the project they were trying to work on, or will produce fewer tickets, in a bad way.
Ideal is “type a single sentence and press a single button” either in web or (preferably) command line. The resultant TODO items can then be fleshed out in a separate grooming meeting.
Paying attention to these things is a bigger multiplier on developer productivity than finding “10x developers,” and is essential for attracting and retaining good developers. Improving these things is hard, especially at organizations that are set in their ways, but it is far more important than it might look. Dedicated dev-ops professionals are essential in such things.
NewsletterFind out via e-mail when I make new posts!