Volunteer Responsibility Amnesty Day: 06-2022

Summary

Back in June, I saw a note about Volunteer Responsibility Amnesty Day in Sumana's Changeset Consulting newsletter. The idea really struck a chord with me, and I wondered whether running an event like this at work would help. With that, I coordinated and ran an event, and this is the blog post summarizing how it went.

The context

As people leave Mozilla, the libraries, processes, services, and other responsibilities (hidden and visible) all suddenly become unowned. In some cases, these things get passed to teams and individuals and there's a clear handoff. In a lot of cases, stuff just gets dropped on the floor.

Some of these things should remain on the floor--we shouldn't maintain all the things forever. Sometimes things get maintained because of inertia rather than actual need. Letting these drop and decay over time is fine.

Some of these things turn out to be critical cogs in the machinations of complex systems. Letting these drop and decay over time can sometimes lead to a huge emergency involving a lot of unscheduled scrambling to fix. That's bad. No one likes that.

In the last year, I had picked up a bunch of stuff from people who had left, and it was increasingly hard to juggle it all. Thus taking a day to audit all the things on my plate and figure out which ones I don't want to do anymore seemed really helpful.

Further, even without people leaving, new projects show up, pipelines are added, new services are stood up--there's more stuff running and more stuff to do to keep it all running.

Thus I wondered, what if other people in Data Org at Mozilla had similar issues? What if there were tasks and responsibilities that we had accumulated over the years that, if we stepped back and looked at them, didn't really need to be done anymore? What if there were people who had too many things on their plate and people who had a lot of space? Maybe an audit would surface this and let us collectively shuffle some things around.

Setting it up

In that context, I decided to coordinate a Volunteer Responsibility Amnesty Day for Data Org.

I decided to structure it a little differently because I wanted to run something that people could participate in regardless of what time zone they were in. I wanted it to produce an output that individuals could talk with their managers about--something they could use to take stock of where things were at, surface work individuals were doing that managers may not know about, and provide a punch list of actions to fix any problems that came up.

I threw together a Google doc that summarized the goals, provided a template for the audit, and included next steps, which were pretty much "tell us on Slack" and "bring it up with your manager in your next 1:1". Here's the doc:

https://docs.google.com/document/d/19NF69uavGXii_DEkRpQsJuklHxWoPTWwxBp_ucRela4/edit#

I talked to my manager about it. I mentioned it in meetings and in various channels on Slack.

On the actual day, I posted a few reminders in Slack.

How'd it go?

I figured it was worth doing once. Maybe it would be helpful? Maybe not? Maybe it would help us reduce the amount of stuff we're doing solely out of inertia?

I didn't get a lot of signal about how it went, though.

I know chutten participated and the audit was helpful for him. He has a ton of stuff on his plate.

I know Jan-Erik participated. I don't know if it was helpful for him.

I heard that Alessio decided to do this with his team every 6 months or so.

While I did organize the event, I actually didn't participate. I forget what happened, but something came up and I was bogged down with that.

That's about all I know. I think there are specific people who have a lot of stuff on their plate and this was helpful, but generally either people didn't participate (Maybe they were bogged down like me? Maybe they don't have much they're juggling?) or I never found out they participated.

Epilog

I think it was useful to do. It was a very low-effort experiment to see if something like this would be helpful. If people had a lot on their plates, it seems like this would have surfaced a bunch of things, allowing us to improve people's work lives. I think for specific people who have a lot on their plate, it was a helpful exercise.

I didn't get enough signal to make me want to spend the time to run it again in December.

Given that:

  1. I think it's good to run individually. If you're feeling overwhelmed with stuff, an audit is a great place to start figuring out how to fix that.

  2. It might be good to run in a small team as an exercise in taking stock of what's going on and rebalancing things.

  3. It's probably not helpful to run across an entire org, where it ends up being more bookkeeping work than it's worth.

Dennis v1.0.0 released! Retrospective! Handing it off!

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions

  • a template linter for finding problems in strings in .pot files that make translators' lives difficult

  • a statuser for seeing the high-level translation/error status of your .po files

  • a translator for strings in your .po files to make development easier
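
To give a quick sense of the command line, usage looks roughly like this. The lint subcommand appears later in this post; the status and translate invocations, flags, and paths here are illustrative sketches from memory, so check dennis-cmd --help for the real details:

$ dennis-cmd lint locale/fr/LC_MESSAGES/messages.po
$ dennis-cmd status locale/fr/LC_MESSAGES/messages.po
$ dennis-cmd translate --pipeline=pirate locale/fr/LC_MESSAGES/messages.po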

v1.0.0 released!

It's been 5 years since I released Dennis v0.9. That's a long time.

This release brings several minor improvements and cleanups. Also, I transferred the repository from "willkg" to "mozilla" on GitHub.

  • b38a678 Drop Python 3.5/3.6; add Python 3.9/3.10 (#122, #123, #124, #125)

  • b6d34d7 Redo tarrminal printin' and colorr (#71)

    There's an additional backwards-incompatible change here in which we drop the --color and --no-color arguments from dennis-cmd lint.

  • 658f951 Document dubstep (#74)

  • adb4ae1 Rework CI so it uses a matrix

  • transfer project from willkg to mozilla for ongoing maintenance and support

Retrospective

I worked on Dennis for 9 years.

It was incredibly helpful! It eliminated an entire class of bugs we were plagued with for critical Mozilla sites like AMO, MDN, SUMO, Input [1], and others. It did it in a way that supported and was respectful of our localization community.

It was pretty fun! The translation transforms are incredibly helpful for fixing layout issues. Some of them also produce hilarious results:

[Image: SUMO in dubstep]

[Image: SUMO in Pirate]

[Image: SUMO in Zombie]

There were a variety of Dennis recipes, including using it in a commit hook to translate commit messages: https://github.com/mozilla/dennis/commits/main

I enjoyed writing silly things at the bottom of all the release blog posts.

I learned a lot about gettext, localization, and languages! Learning about the nuances of plurals was fascinating.

The code isn't great. I wish I had redone the tokenization pipeline. I wish I had gotten around to adding support for other gettext variable formats.

Regardless, this project had a significant impact on Mozilla sites which I covered briefly in my Dennis Retrospective (2013).

Handing it off

It's been 6 years since I worked on sites that have localization, so I haven't really used Dennis in a long time and I'm no longer a stakeholder for it.

I need to reduce my maintenance load, so I looked into whether to end this project altogether. Several Mozilla projects still use it for linting PO files for deploys, so I decided not to end the project, but instead hand it off.

Welcome @diox and @akatsoulas who are picking it up!

Where to go for more

For more specifics on this release, see here: https://dennis.readthedocs.io/en/latest/changelog.html#version-1-0-0-june-10th-2022

Documentation and quickstart here: https://dennis.readthedocs.io/en/latest/

Source code and issue tracker here: https://github.com/mozilla/dennis

39 of 7,952,991,938 people were aware that Dennis existed but tens--nay, hundreds!--of millions were affected by it.

Project audit experiences

Back in January 2020, I wrote How to pick up a project with an audit. I received some comments about it over the last couple of years, but I don't think I really did anything with them. Then Sumana sent an email asking whether I'd blogged about my experiences auditing projects and estimating how long it takes and things like that.

That got me to re-reading the original blog post and it was clear it needed an update, so I did that. One thing I focused on was differentiating between "service" and "non-service" projects. The post feels better now.

But that's not this post! This post is about my experiences with auditing. What happened in that Summer of 2019 which formed the basis of that blog post? What were those 5 [1] fabled projects? How did those audits go? Where are those projects now?

Read more…

Everett v3.0.0 released!

What is it?

Everett is a configuration library for Python apps.

Goals of Everett:

  1. flexible configuration from multiple configured environments

  2. easy testing with configuration

  3. easy automated documentation of configuration for users

From that, Everett has the following features:

  • is flexible for your configuration environment needs and supports process environment, env files, dicts, INI files, YAML files, and writing your own configuration environments

  • facilitates helpful error messages for users trying to configure your software

  • has a Sphinx extension for documenting configuration including autocomponentconfig and automoduleconfig directives for automatically generating configuration documentation

  • facilitates testing of configuration values

  • supports parsing values of a variety of types like bool, int, lists of things, classes, and others and lets you write your own parsers (see the sketch after this list)

  • supports key namespaces

  • supports component architectures

  • works with whatever you’re writing–command line tools, web sites, system daemons, etc
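
Here's a minimal sketch of the parsing support. I'm pulling parse_bool and ListOf from everett.manager; double-check the docs for the exact names and signatures:

    from everett.manager import ConfigManager, ListOf, parse_bool

    config = ConfigManager.basic_config()

    # DEBUG=true in the process environment parses to the boolean True
    debug = config("debug", default="false", parser=parse_bool)

    # ALLOWED_HOSTS=a.example.com,b.example.com parses to a list of strings
    hosts = config("allowed_hosts", default="", parser=ListOf(str))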

v3.0.0 released!

This is a major release that sports three things:

  • Adjustments in Python support.

    Everett 3.0.0 drops support for Python 3.6 and picks up support for Python 3.10.

  • Reworked namespaces so they work better with Everett components.

    Previously, you couldn't apply a namespace after binding the configuration to a component. Now you can.

    This handles situations like this component:

    from everett.manager import ConfigManager, Option

    class MyComponent:
        class Config:
            http_host = Option(default="localhost")
            http_port = Option(default="8000", parser=int)
    
            db_host = Option(default="localhost")
            db_port = Option(default="5432", parser=int)
    
    config = ConfigManager.basic_config()
    
    # Bind the configuration to a specific component so you can only use
    # options defined in that component
    component_config = config.with_options(MyComponent)
    
    # Apply a namespace which acts as a prefix for options defined in
    # the component
    http_config = component_config.with_namespace("http")
    
    db_config = component_config.with_namespace("db")
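
    With that in place, the namespace acts as a prefix on lookups. A quick sketch of what that buys you (assuming Everett's usual convention of upper-cased, underscore-joined environment variable names):

    # With HTTP_HOST=example.com set in the process environment:
    http_config("host")  # -> "example.com"

    # DB_PORT isn't set anywhere, so the parsed default applies:
    db_config("port")    # -> 5432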
    
  • Overhauled Sphinx extension.

    This is the new thing that I'm most excited about. This fixes a lot of my problems with documenting configuration.

    Everett now lets you:

    • document options and components:

      Example option:

      .. everett:option:: SOME_OPTION
         :parser: int
         :default: "5"
      
         Here's some option.
      

      Example component:

      .. everett:component:: SOME_COMPONENT
      
         .. rubric:: Options
      
         .. everett:option:: SOME_OPTION
            :parser: int
            :default: "5"
      
            Here's some option.
      
    • autodocument all the options defined in a Python class

      Example autocomponentconfig:

      .. autocomponentconfig:: myproject.module.MyComponent
         :show-table:
         :case: upper
      
    • autodocument all the options defined in a Python module

      Example automoduleconfig:

      .. automoduleconfig:: mydjangoproject.settings._config
         :hide-name:
         :show-table:
         :case: upper
      

    This works much better with configuration in Django settings modules. This works with component architectures. This works with centrally defining configuration with a configuration class.

    Further, all options and components are added to the index, have unique links, and are easier to link to in your documentation.

    I updated the Antenna (Mozilla crash ingestion collector) docs:

    https://antenna.readthedocs.io/en/latest/configuration.html

    I updated the Eliot (Mozilla Symbolication Service) docs:

    https://tecken.readthedocs.io/en/latest/configuration.html#symbolication-service-configuration-eliot

Why you should take a look at Everett

Everett makes it easy to:

  1. deal with different configurations between local development and server environments

  2. write tests for configuration values

  3. document configuration

  4. debug configuration issues

First-class docs. First-class configuration error help. First-class testing. This is why I created Everett.

If this sounds useful to you, take it for a spin. It's almost a drop-in replacement for python-decouple and for the os.environ.get('CONFIGVAR', 'default_value') style of configuration, so it's easy to test out.
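
For instance, here's a minimal sketch of what that swap looks like (the variable names are made up for illustration):

    import os

    # The os.environ.get style:
    host = os.environ.get("HOST", "localhost")

    # The roughly equivalent Everett style:
    from everett.manager import ConfigManager

    config = ConfigManager.basic_config()
    host = config("host", default="localhost")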

Where to go for more

For more specifics on this release, see here: https://everett.readthedocs.io/en/latest/history.html#january-13th-2022

Documentation and quickstart here: https://everett.readthedocs.io/

Source code and issue tracker here: https://github.com/willkg/everett

Kent v0.1.0 released! And the story of Kent in the first place....

What is it?

Before explaining what it is, I want to talk about Why.

A couple of years ago, we migrated from the Raven Sentry client (Python) to sentry-sdk. One of the things we did was implement our own sanitization code which removed personally identifiable information and secret information (as best as possible) from error reports.

I find the documentation for writing sanitization filters really confusing. before_send? before_breadcrumb? When do those hooks kick off? What does an event look like? There's a link to a page that describes an event, but there's a lot of verbiage and no schema, so it's not at all clear what the errors my application is sending look like. [1]
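
For what it's worth, the hook itself is small--the hard part is knowing what's actually in the event. A minimal before_send sketch looks something like this; the event fields I'm scrubbing are illustrative, not a documented schema:

    import sentry_sdk

    def sanitize_event(event, hint):
        # Scrub a sensitive header if it's present. Returning None here
        # would drop the event entirely instead.
        event.get("request", {}).get("headers", {}).pop("Cookie", None)
        return event

    sentry_sdk.init(
        dsn="https://key@example.com/1",  # illustrative DSN
        before_send=sanitize_event,
    )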

Anyhow, so when we switched to sentry-sdk, we implemented some sanitization code because while Raven had some code, sentry-sdk did not. Then at some point between then and now, the sanitization code stopped working. It's my fault probably. I bet something changed in the sentry-sdk and I didn't notice.

Why didn't I notice? Am I a crappy engineer? Sure, but the problem here is that the sanitization code runs in the context of handling an unhandled error. In handling the unhandled error, Sentry passes the event through our broken sanitization code, which throws an exception. Nothing gets sent to Sentry--neither the original error nor the sanitization error.

Once I realized there were errors, I looked in the logs and could see the original errors--but not the sanitization errors.

"You should test your sanitization code!" you say! Right on! That's what we should be doing! We have unit tests but they run with ficticious data in a pocket dimension. So they passed wonderfully despite the issue!

What we needed was a few things:

  1. I needed to be able to run a fake Sentry service that I could throw errors at and use to debug the sanitization code in my local environment without having to spin up a real Sentry instance.

  2. I needed to be able to see exactly what is in the error payloads for my application.

  3. I needed something I could use for integration tests with the sentry-sdk.

That's how I ended up putting aside all the things I needed to do and built Kent.

So what is Kent?

Kent is a fake Sentry service. You can run it, set the Sentry DSN of your application to something like http://public@localhost:8000/1, and then Kent will capture Sentry error reports.
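
With the Python sentry-sdk, that's a one-liner (assuming Kent is listening on localhost port 8000):

    import sentry_sdk

    # Send events to the local Kent instance instead of a real Sentry service
    sentry_sdk.init(dsn="http://public@localhost:8000/1")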

Kent takes 2 seconds to set up. You can run it locally:

$ pip install kent
$ kent-server run

You can run it in a Docker container. There's a sample Dockerfile in the repo.

It doesn't require databases, credentials, caching, or any of that stuff.

Kent stores things in-memory. You don't have to clean up after it.

Kent has a website letting you view errors with your browser.

Kent has an API letting you build integration tests that create the errors and then fetch them and assert things against them.
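
To give a flavor of that, an integration test might look something like this sketch. The endpoint path is hypothetical--the real API is documented in the Kent repo:

    import requests
    import sentry_sdk

    def test_error_is_captured():
        sentry_sdk.init(dsn="http://public@localhost:8000/1")

        try:
            raise ValueError("kaboom")
        except ValueError:
            sentry_sdk.capture_exception()

        # Make sure the event is sent before querying Kent
        sentry_sdk.flush()

        # Hypothetical endpoint for listing captured errors
        errors = requests.get("http://localhost:8000/api/errorlist/").json()
        assert len(errors) == 1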

What questionable architectural decisions did you make?

I built it with Flask. Flask is great for stuff like this--that part is fine.

The part that's less fine is that I decided to put the least amount of effort into standing it up as a service and putting it behind a real WSGI server, so I'm (ab)using Flask's CLI and monkeypatching werkzeug to not print out "helpful" (but in this case unhelpful) messages to the console.

I used pico.css because I read about it like yesterday and it seemed easier to use that than to go fiddling with CSS frameworks to get a really lovely looking site for a fake Sentry service.

I may replace that at some point with something that involves less horizontal space.

I only wrote one test. I have testing set up, but only wrote one test to make sure it's minimally viable. I may write more at some point.

I only tested with Python sentry-sdk. I figure if other people need it, they can let me know what else it works with and we can fix any issues that come up.

I decided to store errors in memory rather than persist things to disk. That was easy to do and seems like the right move. Maybe we'll hit something that requires us to do something different.

I named it Kent. I like short names. Friends suggested I name it Caerbannog because it was a sentry of a sort. I love that name, but I can't reliably spell it.

0.1.0 released!

I thought about making this 1.0.0, but then decided to put it into the world and use it for a bit and fix any issues that come up and then release 1.0.0.

Initial release with minimally viable feature set.

  • capture errors and keep them in memory

  • API endpoint to list errors

  • API endpoint to fetch error

Where to go for more

History of releases: https://github.com/willkg/kent/blob/main/HISTORY.rst

Source code, issue tracker, documentation, and quickstart here: https://github.com/willkg/kent

Let me know how this helps you!

I say that in a lot of my posts. "Let me know how this helps you!" or "Comment by sending me an email!" or something like that. I occasionally get a response--usually from Sumana--but most often, it's me talking to the void. I do an awful lot of work that theoretically positively affects thousands of people to be constantly talking to the void.

Let me know if you have positive or negative feelings about Kent by:

  1. clicking on this link: https://github.com/willkg/kent/issues/3

  2. adding a reaction to the description (which should be like two clicks)

Eliot: retrospective (2021)

Project

time: 1 year

impact:
  • reduced risk of Mozilla Symbols Server outage which affects symbol uploads from the build system

  • improves maintainability of the symbolication service by offloading parsing of Breakpad symbols files and symbol lookups to an external library that's used by other tools in the crash reporting ecosystem at Mozilla

  • opens up possible futures around supporting inline functions and using other debug file types

Problem statement

Tecken is the project for the Mozilla Symbols Service. This service manages several things:

  • symbol upload API for uploading and storing debugging symbols generated by build systems for products like Firefox, Fenix, etc

  • download API for downloading symbols which could be located in a variety of different places supporting tools like Visual Studio, stackwalkers, profilers, symbolicators, etc

  • symbolication API for finding symbols for memory addresses

It also has a webapp for querying symbols, debugging symbols problems, managing API tokens, and granting permissions for uploading symbols.

All of those functions are currently handled by a single webapp service.

There are a few problems here.

First, we want to reduce the risk of an outage for uploading symbols. When we have service outages, the build systems can't upload symbols. The build system retries really hard, so this increases the build times for Firefox and other products. On top of that, if the build system doesn't successfully upload symbols, any crashes in tests or channels result in unsymbolicated stacks, which obscures the details of the crash.

There are several projects waiting in the wings to dramatically increase their use of the symbolication API, which increases the likelihood of an outage with the service that affects symbol uploads.

Second, the existing symbolication API implementation is an independent implementation of sym file parsing, lookups, and symbolication. Whenever we make adjustments to how sym files are built or structured, or to the lookup algorithms, we have to additionally update the symbolication API code.

Mozilla is in the process of rewriting crash reporting related code in Rust. It behooves us greatly to switch from our independent implementation to a shared library.

Third, the symbolication API is missing some critical features like support for line numbers and inline functions. The existing code can't be extended to support either line numbers or inline functions--we need to rewrite it.

In September of 2020, I embarked on a project to break out the symbolication API as a separate microservice and implement it using the Symbolic library. That had the following effects:

  1. reduces the risk of an outage due to increasing usage of the symbolication API,

  2. adds support for line numbers and sets us up for adding support for inline functions, and

  3. reduces the maintenance work because we'll be using a library used by other parts of the crash reporting ecosystem

This post covers that project.

Read more…