How we made unit testing part of our engineering culture

Duncan Edwards
Instawork Engineering
Feb 17, 2024


Unit testing is an important practice in software development that involves testing individual units or components of code in isolation to ensure they function as intended. By writing comprehensive unit tests, Instawork engineers improve code quality, increase confidence in their software and reduce bugs for our users. Our Continuous Integration (CI) pipeline executes more than 1 million tests every day because our engineers understand the benefits of writing meaningful tests for their code.

For Instawork, the result is a robust platform where high-quality code translates into seamless experiences for our users. Every test we write builds a safety net of confidence around our software that enables our engineering team to make bigger and bolder improvements to the platform.

In this blog post, we will explore the importance of unit testing, describe how we ensure changes are sufficiently unit tested, provide tips for writing effective unit tests, and delve into techniques we use to eliminate flakiness from our tests.

Why we write unit tests

Software testing and QA are necessary practices in software engineering for many reasons. Unit tests, however, offer benefits that go beyond standard testing:

  1. Regression prevention: Unit tests help prevent regression bugs by re-checking the behavior of code components whenever the codebase changes. This ensures that previously working features continue to function as intended, including behaviors that testing user-facing functionality alone may not catch.
  2. Code documentation: When you are unsure how a particular class or function in your codebase behaves, its unit tests are often a quicker guide than the implementation itself. Unit tests provide great insight into the expected behavior of individual code components, and we find they also help new engineers onboard at Instawork.
  3. Code maintenance: Refactoring or building upon existing code is often a more difficult job than writing it in the first place. Engineers need to consider whether any existing behaviors might change unintentionally with the modifications they are making. It’s much easier to make changes with the safety net of comprehensive unit tests to ensure these behaviors remain unchanged.
  4. Avoiding flakiness: Instawork’s business logic is often tied to dates and times because of the features we provide for our users. One way we’ve strengthened our unit tests is by running any new or modified tests at particular “dangerous” days and times, such as daylight saving changeovers and New Year’s. More on how we do this later.

At Instawork, our engineers have written over 30,000 Python unit tests because everyone understands the benefits we gain from them. However, we don’t rely on culture alone to ensure unit tests are consistently added and maintained.

How we ensure unit tests are added with changes

The most effective way to keep unit tests up to date is to make them an important part of engineering culture. At Instawork, all engineers understand the importance of unit tests and stay up to date with the techniques and tools we use for them. Our company value “Act like an owner” highlights the importance of strategic work such as writing unit tests, and means we all recognize the value it brings to building a high-quality product.

Beyond relying on people, we also measure the unit test coverage of our codebase using pytest-cov and SonarCloud. SonarCloud is particularly useful for measuring coverage on every PR and decorating the PR with a comment reporting the coverage of new and modified lines of code. Combined with a culture that values unit testing, this gives engineers the tools they need to ensure their changes are properly tested.

A PR comment by SonarCloud with the unit test coverage of the changes

Some tips that we have found useful for setting up this workflow with SonarCloud:

  • Export the coverage report to a standard Cobertura-format XML file before running the SonarCloud analysis.
  • SonarScanner needs the pull request key and branch passed to it in order to decorate the PR correctly. We use CircleCI to automate these processes and find its built-in environment variables useful for this.
  • To reduce the impact on CI run time (particularly for large codebases), it helps to measure coverage only from tests related to the changes being made. Consider a tool like pytest-testmon for this.
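A minimal sketch of the second tip, assuming SonarScanner is invoked from a CircleCI job: it reads CircleCI’s built-in CIRCLE_PULL_REQUEST (the PR URL) and CIRCLE_BRANCH variables and turns them into SonarScanner -D parameters. The sonar_pr_args helper name, the "main" base branch and the coverage.xml path are assumptions for illustration; coverage.xml stands for the Cobertura XML that pytest --cov=<package> --cov-report=xml writes.

```python
import os
from typing import List, Mapping, Optional


def sonar_pr_args(
    env: Mapping[str, str],
    base_branch: str = "main",  # assumption: CircleCI does not expose the PR's base branch
    coverage_report: str = "coverage.xml",  # Cobertura XML, e.g. from pytest-cov
) -> List[str]:
    """Build SonarScanner -D arguments for PR decoration (hypothetical helper).

    Returns only the coverage argument when the build is not tied to a PR.
    """
    args = [f"-Dsonar.python.coverage.reportPaths={coverage_report}"]
    pr_url: Optional[str] = env.get("CIRCLE_PULL_REQUEST")
    if not pr_url:
        # Plain branch build: there is no PR to decorate.
        return args
    # CIRCLE_PULL_REQUEST is a URL such as https://github.com/org/repo/pull/123,
    # so the PR number (SonarCloud's pull request key) is its last path segment.
    pr_key = pr_url.rstrip("/").rsplit("/", 1)[-1]
    args += [
        f"-Dsonar.pullrequest.key={pr_key}",
        f"-Dsonar.pullrequest.branch={env['CIRCLE_BRANCH']}",
        f"-Dsonar.pullrequest.base={base_branch}",
    ]
    return args


if __name__ == "__main__":
    # Print the arguments so a CI step can splice them into the scanner command.
    print(" ".join(sonar_pr_args(os.environ)))
```

If saved as a hypothetical sonar_args.py, a CircleCI step could then run something like sonar-scanner $(python sonar_args.py). Parsing the PR number out of the URL keeps the job independent of CIRCLE_PR_NUMBER, which CircleCI only sets for forked PRs.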

Our tips for writing an effective unit test

  1. Keep them simple: Sometimes this is easier said than done. A couple of ways to keep your unit tests small and concise are:
    - Use factories to create common objects without lots of boilerplate: in Python we use factory_boy, and most languages have at least one library for writing test factories
    - Avoid duplicating implementation logic: your test should not re-implement the code’s transformation logic to compute expected outputs from randomized inputs; instead, use constant inputs whose outputs you already know
  2. Use AI to help with an initial implementation: At Instawork, we have found that GitHub Copilot can save time when writing effective tests. Just name your test and let it suggest the implementation, then make changes as needed to test extra assumptions or change inputs. This works best when combined with the next tip.
  3. Give meaningful names to your tests: When checking test results, no engineer wants to see test_save or test_validation in the list of failures, because that means opening the tests to see exactly what they check. Try more verbose names that indicate what would have gone wrong, for example test_save_fires_creation_event or test_email_address_validation.
  4. Make sure your tests are readable: Experienced engineers and newcomers alike often read tests to learn about the interface and behavior of a piece of code. If it’s not clear what a test is validating, it may need refactoring.
  5. Mock external interfaces, but try to run tests against real internal components: At Instawork, we use pytest-xdist to run tests in parallel, which works with Django to give each worker process its own isolated test database. We also created a simple pytest fixture so that tests could use OpenSearch indices in isolation.
  6. Tests should be deterministic: Flaky tests are a source of pain for all engineers. This became such a concern at Instawork that we started to address it more directly with a special check.
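The sketch below applies tips 1, 3, 5 and 6 to a hypothetical PayoutService; every name in it is invented for illustration. To keep it runnable with the standard library alone, the plain function make_shift stands in for a factory_boy factory, and unittest.mock.Mock replaces an assumed external payments client while the pay calculation itself runs for real.

```python
from dataclasses import dataclass
from decimal import Decimal
from unittest.mock import Mock


@dataclass
class Shift:
    hours: Decimal
    hourly_rate: Decimal


def make_shift(**overrides) -> Shift:
    """Stand-in for a factory_boy factory: sensible defaults, override only what a test cares about."""
    fields = {"hours": Decimal("8"), "hourly_rate": Decimal("20.00")}
    fields.update(overrides)
    return Shift(**fields)


class PayoutService:
    """Hypothetical unit under test: computes pay and reports it to an external payments API."""

    def __init__(self, payments_client):
        self.payments_client = payments_client  # external interface, mocked in tests

    def pay(self, worker_id: str, shift: Shift) -> Decimal:
        amount = (shift.hours * shift.hourly_rate).quantize(Decimal("0.01"))
        self.payments_client.send(worker_id=worker_id, amount=amount)
        return amount


def test_pay_returns_hours_times_rate_rounded_to_cents():
    # Constant input with a known answer, rather than recomputing hours * rate in the test.
    service = PayoutService(payments_client=Mock())
    assert service.pay("w-1", make_shift(hours=Decimal("7.5"))) == Decimal("150.00")


def test_pay_sends_amount_to_payments_client():
    client = Mock()
    PayoutService(payments_client=client).pay("w-1", make_shift(hourly_rate=Decimal("18.25")))
    # Only the external boundary is mocked; the pay calculation itself runs for real.
    client.send.assert_called_once_with(worker_id="w-1", amount=Decimal("146.00"))
```

Running pytest on this file collects both test_ functions, and if either fails, its name alone says which behavior broke. Because inputs are constants and nothing depends on the clock or randomness, the tests are also deterministic.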

Reducing flakiness

One of the main headaches for large engineering teams is the inevitable appearance of flaky tests. In a CI pipeline where unit tests must pass before changes can be merged and deployed, flaky tests can grind development productivity to a halt if left unaddressed!

Many organizations choose to address flakiness by re-running test failures multiple times in all pipelines, but we find this isn’t sufficient at Instawork because:

  • This does not help with flakiness caused by date/time problems in tests, such as tests that fail at particular times of day or on particular days of the week, because a re-run minutes later typically hits the same condition
  • Flaky tests may be an indication of flaky code that needs to actually be addressed by the engineering team and could impact the quality of the product
  • Ignoring flakiness leads to a culture of poor unit tests and accepted problems in the development cycle. Eventually pipelines need re-running because of flakiness anyway, and developers can’t tell whether their changes are breaking tests or the tests themselves are just poorly written.

One approach we tried was checking for general and time-based flakiness by running new and modified tests at key dates and times:

  • December 31 this year at 12am
  • January 1 next year at 6am
  • The day before the next daylight saving changeover at 12pm
  • The day of the next daylight saving changeover at 6pm
A high level view of our PR test pipeline
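The four instants in the list above can be computed rather than hard-coded. This is a minimal sketch assuming the application’s reference time zone is America/Los_Angeles (substitute your own); it finds the next changeover by scanning forward for the first day whose UTC offset at midnight differs from the offset at the following midnight. next_dst_changeover and dangerous_datetimes are hypothetical helper names, and the code needs Python 3.9+ (zoneinfo) with a time zone database available.

```python
from datetime import date, datetime, time, timedelta
from typing import List
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Los_Angeles")  # assumption: the app's reference time zone


def next_dst_changeover(start: date, tz: ZoneInfo = TZ) -> date:
    """Return the first day on or after `start` during which the UTC offset changes."""
    day = start
    for _ in range(366):
        midnight = datetime.combine(day, time(0), tzinfo=tz)
        next_midnight = datetime.combine(day + timedelta(days=1), time(0), tzinfo=tz)
        if midnight.utcoffset() != next_midnight.utcoffset():
            return day
        day += timedelta(days=1)
    raise ValueError("no DST changeover within a year (zone may not observe DST)")


def dangerous_datetimes(today: date, tz: ZoneInfo = TZ) -> List[datetime]:
    """The four key instants from the list: New Year's Eve, New Year's Day, and around DST."""
    changeover = next_dst_changeover(today, tz)
    return [
        datetime(today.year, 12, 31, 0, 0, tzinfo=tz),         # Dec 31 this year, 12am
        datetime(today.year + 1, 1, 1, 6, 0, tzinfo=tz),       # Jan 1 next year, 6am
        datetime.combine(changeover - timedelta(days=1), time(12), tzinfo=tz),  # day before, 12pm
        datetime.combine(changeover, time(18), tzinfo=tz),     # changeover day, 6pm
    ]
```

Each returned datetime could then be handed to a time-freezing tool such as freezegun while the new and modified tests are re-run.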

While building this job, we noticed some ways it could incorrectly identify tests as flaky:

  • Rather than freezing time completely, we needed the clock to tick on from the frozen date, which freezegun supports through its tick parameter. Some tests rely on the created/modified dates of objects for ordering, and a fully frozen clock gives those objects identical timestamps.
  • Be careful when using factories, as their attribute values can be evaluated before the pytest fixture that freezes time runs; we found this was the case with factory_boy. We used its LazyAttribute class to fix this issue in our factories.
  • Freezing time can cause issues with Python’s threading and multiprocessing libraries because of how they are implemented, so we do not freeze time for them.

This did reduce flakiness, but running the unit tests several extra times on every PR push was costly, so we now run the check only when it is requested on a PR rather than as part of our usual CI process. If you are having unexpected issues with flaky tests around dates and times that are special to your application, you may want to consider a similar approach.

Unit testing is a critically important practice for Engineering at Instawork

At Instawork, we believe in comprehensive and robust unit testing to build reliable software. Writing unit tests gives our engineers confidence that they are avoiding regressions, insight into units of code without reading their implementation, and a simpler process for refactoring and improving code.

We take pride in maintaining a high level of code coverage, and keep it a focus of our engineering process through culture and by measuring the coverage of new code. Unit testing for us is a testament to our commitment to the enduring success of the business.

Unit testing is an investment in the long-term success of our codebase. We recommend making unit testing part of an engineering team’s culture as a necessary and helpful exercise: consider tests not as a burdensome task tacked onto the end of a user story, but as an integral and rewarding part of software development.
