Improving Profile API, ECA and Sync Script Reliability
We recently had an incident in which the Drupal account website returned a 403 error because a server disk was accidentally removed. Our Quarkus Profile API treated the 403 as a 404 (user not found). As a result, the sync script assumed the affected accounts were inactive and mistakenly removed 66 users from their GitHub repositories.
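To prevent this, we need to improve how our API and sync script handle these scenarios. Specifically, when the Drupal site returns an unexpected response such as a 403, the API should return a 5xx error instead of a 404. A 404 should be reserved for the case where the user genuinely does not exist, so that the sync script can tell an upstream failure apart from a missing account and knows not to take action.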
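As a starting point for discussion, the intended behaviour could be sketched as follows. This is an illustration in Python, not the actual Quarkus or sync script code: the function names, the `Decision` enum, and the choice of 502 as the specific 5xx status are all assumptions.

```python
from enum import Enum


class Decision(Enum):
    KEEP = "keep"      # user exists, leave access as it is
    REMOVE = "remove"  # user confirmed missing, access may be removed
    ABORT = "abort"    # state unknown, stop without changing anything


def map_upstream_status(upstream_status: int) -> int:
    """Status the Profile API could return for a given Drupal response status.

    Hypothetical mapping: only an upstream 404 is passed through as 404.
    Anything unexpected (403, 5xx, ...) becomes 502 (assumed choice of 5xx).
    """
    if 200 <= upstream_status < 300:
        return 200
    if upstream_status == 404:
        return 404
    return 502


def decide(api_status: int) -> Decision:
    """What the sync script could do for a given Profile API status."""
    if api_status == 200:
        return Decision.KEEP
    if api_status == 404:
        return Decision.REMOVE
    # Any other status means we cannot trust the answer.
    return Decision.ABORT


if __name__ == "__main__":
    # The incident scenario: Drupal answers 403.
    print(decide(map_upstream_status(403)))
```

The key design choice is that removal requires a positive confirmation (an explicit 404), and every other outcome falls through to the safe default of doing nothing.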
I’d also like us to think creatively about making the system more reliable. For example, we could implement a safeguard that stops the sync script if it starts removing an unusually large number of users, whether in a row or within a single run.
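One possible shape for such a safeguard is sketched below in Python. It is a variant of the idea above: rather than counting consecutive removals, it checks the whole planned batch before anything is removed. The function name, the exception, and both threshold values are hypothetical and would need tuning against real membership numbers.

```python
class TooManyRemovalsError(RuntimeError):
    """Raised when a sync run plans more removals than the safeguard allows."""


def check_removal_budget(
    member_count: int,
    removal_count: int,
    max_removals: int = 10,      # assumed absolute cap per run
    max_fraction: float = 0.05,  # assumed cap as a share of all members
) -> None:
    """Raise TooManyRemovalsError if the planned removals look suspicious.

    Intended to run after the removal list is computed and before any
    user is actually removed.
    """
    if removal_count > max_removals:
        raise TooManyRemovalsError(
            f"{removal_count} removals planned, limit is {max_removals}"
        )
    if member_count > 0 and removal_count / member_count > max_fraction:
        raise TooManyRemovalsError(
            f"{removal_count} of {member_count} members would be removed, "
            f"limit is {max_fraction:.0%}"
        )


if __name__ == "__main__":
    # 66 is the number of users removed in the incident;
    # the total of 1000 members is made up for this example.
    try:
        check_removal_budget(member_count=1000, removal_count=66)
    except TooManyRemovalsError as err:
        print(f"Sync aborted: {err}")
```

With a check like this, a run that trips the limit would stop and wait for a human to confirm, instead of applying the removals automatically.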
Let’s explore all possible solutions and improve the overall robustness of the service.