On March 21st, a patch was pushed to production which included changes that resulted in temporary loss of access of a large swath of our contributor users, as well as unnecessary email traffic to our committer base in the wake of the fix for the issue. To better anticipate and prevent these issues, we should introduce proper unit testing through mocking of the Github API and test these scripts end to end as part of our CI builds. While these mocks won't be perfect they should provide a much higher confidence in the results without having to run the script against production Github with development stubs.
Designs
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
I agree that we must improve our test coverage for the sync script. We need to do a better job at detecting regressions before they make it to production.
However, I'm hesitant about the necessity and sustainability of mocking the GitHub API. While mocking can aid in testing, it may not fully replicate the real API's behaviour.
Moreover, maintaining mock responses could introduce overhead and complexity. We should carefully evaluate whether the benefits outweigh the costs before committing to maintaining a GH API mock server.
Would you agree that today's issue could have been identified via a simple unit test without a mock server? If not, could you please elaborate on why this bug couldn't have been detected with a basic unit test?
When I speak of mocking, I don't mean necessarily a mock server, but something like in the Hubspot sync script where we mock the API code in script for unit tests. This will replace the actual caller code and replace it with something where we can track the inputs + outputs and stop it from calling the upstream server. This still has the problem of not fully replicating the API behaviour, but allows us to implement unit tests to get better insight into how this code acts.
Since the Github code doesn't transform data at all, but instead is just a persistence layer, it can still provide a ton of value to do these tests. Mocking these calls would have minimal impact on the actual sync operations, as we just want to see if the script sent data in the format we expected.
We could put in place a pre-prod project with known list of commiters/contributors and run end-to-end tests on that project. If the lists of users are carefully designed, we should catch most of such issues.
We have some small tests like that, but it has a lot of drawbacks. The output isn't easily verifiable, it requires having a prod key in the dev instance, and provides a very narrow validation compared to the real script. In the end it wasn't very useful in testing and more often than not isn't used because of the limitations, and would likely be a false sense of security over something like unit testing.
@cguindon and I spoke about this in a call and we've come up with a more concrete plan for improving testing around the Github sync operations. As a change of process, we will endeavor to write unit tests as we develop logic going forward, as it is a gap in how we develop that cannot be replaced by manually testing additions in external environments. This will also include testing regressions in logic like the one yesterday.
For short term changes, we will update the GH sync.js file to be more testable through setting up a class around the functions and exporting properly, and write tests for pure logic functions that don't interact with Github. This should give us more immediate results without having to do mocking of complex methods immediately. This may include some minor refactoring to pull logic into their own methods to be properly tested in places where it is embedded in the methods.
For longer term plans, we will slowly implement mocking methods for the Github API and test the code that interacts with Github and has logical components interwoven. The goal isn't 100% coverage, but to cover any code that experiences regression or provides critical logic to the script. These may be done in parallel with the short term goals, but this plan is about getting value for the testing quickly with progressing levels of effort in writing the tests.
1
Martin Lowechanged title from Implement unit testing and mocking around the Github API to Implement unit testing and mocking around octokit
changed title from Implement unit testing and mocking around the Github API to Implement unit testing and mocking around octokit
As an update, the issue that caused the regression now has testing in the main branch that will be done before releases can go out. Additionally, I've made another patch to report on code coverage in testing, with our current coverage being ~15% across all files.
The biggest gaps in coverage are the wrapper classes around the GitLab/GitHub client libraries for caching + error checking. Adding coverage to those wrappers would massively boost our % coverage though are complicated to address.
Our goal with this issue is to attempt to hit at least 50% code coverage for all scripts, as well as overall coverage.