Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the Breakpad crash reporter asks the user if the user would like to send a crash report. If the user answers "yes!", then the Breakpad crash reporter collects data related to the crash, generates a crash report, and submits that crash report as an HTTP POST to Socorro--specifically the Socorro collector.
The Socorro collector is one of several components that comprise Socorro. Each of the components has different uptime requirements and different security risk profiles. However, all the code is maintained in a single repository and we deploy everything every time we do a deploy. This is increasingly inflexible and makes it difficult for us to make architectural changes to Socorro without affecting everything and incurring uptime risk for components that have high uptime requirements.
Because of that, in early 2016, we embarked on a rearchitecture to split out some components of Socorro into separate services. The first component to get split out was the Socorro collector since it needs has the highest uptime requirements of all the Socorro components, but rarely changes, so it'd be a lot easier to meet those requirements if it was separate from the rest of Socorro.
Thus I was tasked with splitting out the Socorro collector and this blog post covers that project. It's a bit stream-of-consciousness, because I think there's some merit to explaining the thought process behind how I did the work over the course of the project for other people working on projects.