Socorro: March 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in March.

Read more…

Code of conduct: supporting in projects

CODE_OF_CONDUCT.md

This week, Mozilla added PRs to all of its repositories on GitHub that aren't forks or part of Servo or Rust. The PRs add a CODE_OF_CONDUCT.md file and include some instructions on what projects can do with it. This standardizes inclusion of the code of conduct text across all projects.

I'm a proponent of codes of conduct. I think they're really important. When I was working on Bleach with Greg, we added code of conduct text in September of 2017. We spent a bunch of time thinking about how to do that effectively and all the places that users might encounter Bleach.

I spent some time this week trying to figure out how to do what we did with Bleach in the context of the Mozilla standard. This blog post covers those thoughts.

This blog post covers Python-centric projects. Hopefully, some of this applies to other project types, too.

What we did in Bleach in 2017 and why

In September of 2017, Greg and I spent some time thinking about all the places the code of conduct text needs to show up and how to implement the text to cover as many of those as possible for Bleach.

PR #314 added two things:

  • a CODE_OF_CONDUCT.rst file

  • a copy of the text to the README

In doing this, the code of conduct shows up in the following places:

  • in the repository itself

  • in the source tarball

  • on the PyPI page for Bleach (the long description in setup.py includes the README)

  • on the front page of the project documentation (the Sphinx docs include the README)

In this way, users can discover Bleach in a variety of ways, and it's very likely they'll see the code of conduct text before they interact with the Bleach community.

The Mozilla standard

The Mozilla standard applies to all repositories in Mozilla spaces on GitHub and is covered in the Repository Requirements wiki page.

It explicitly requires that you add a CODE_OF_CONDUCT.md file with the specified text in it to the root of the repository.

This makes sure that all Mozilla repositories have a code of conduct specified. It also simplifies the work Mozilla needs to do to enforce the requirement and update the text over time.

This week, a bot added PRs to all repositories that didn't have this file. Going forward, the bot will continue to notify repositories that are missing the file and will update the file whenever the standard text changes.

How to work with the Mozilla standard

Let's go back and talk about Bleach. We added a file and a blurb to the README and that covered the following places:

  • the repository itself

  • the source tarball

  • the PyPI page (the long description in setup.py includes the README)

  • the front page of the project documentation (the Sphinx docs include the README)

With the new standard, we only get this:

  • the repository itself

  • the source tarball (maybe)

To get the file into the source tarball, you have to add it yourself--the bot doesn't make any changes to fix this. You can use check-manifest to help verify that it's included. You might have to adjust your MANIFEST.in file or something else in your build pipeline--hence the "maybe".
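
For a setuptools-based project, getting the file into the sdist is usually a one-line addition to MANIFEST.in. Here's a sketch--this isn't copied from Bleach's actual MANIFEST.in:

# Ship the code of conduct in the source tarball.
include CODE_OF_CONDUCT.md

After that, running check-manifest in the repository root will complain if the files under version control and the files in the sdist ever drift apart.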

The Mozilla standard says the text of the CODE_OF_CONDUCT.md file may change over time, so copying the contents of the file around your repository is a maintenance nightmare--that idea is out.

It's hard to include .md files in reStructuredText contexts. You can't just add the file to the long description in setup.py and you can't include it in a Sphinx project.

Greg and I chatted about this a bit, and I think the best solution is to add minimal text to the README that points to the CODE_OF_CONDUCT.md file on GitHub. Something like this:

Code of Conduct
===============

This project and repository is governed by Mozilla's code of conduct and
etiquette guidelines. For more details please see the `CODE_OF_CONDUCT.md
file <https://github.com/mozilla/bleach/blob/master/CODE_OF_CONDUCT.md>`_.

In Bleach, the long description set in setup.py includes the README:

import codecs

from setuptools import setup


def get_long_desc():
    # The long description shown on PyPI is the README plus the changelog.
    desc = codecs.open('README.rst', encoding='utf-8').read()
    desc += '\n\n'
    desc += codecs.open('CHANGES', encoding='utf-8').read()
    return desc

...

setup(
    name='bleach',
    version=get_version(),
    description='An easy safe-list-based HTML-sanitizing tool.',
    long_description=get_long_desc(),
    ...

In Bleach, the index.rst of the docs also includes the README:

.. include:: ../README.rst

Contents
========

.. toctree::
   :maxdepth: 2

   clean
   linkify
   goals
   dev
   changes


Indices and tables
==================

* :ref:`genindex`
* :ref:`search`

In this way, the README continues to have text about the code of conduct, and the link goes to the file, which is maintained by the bot. The README is included in the long description in setup.py, so the code of conduct text shows up on the PyPI page. The README is included in the Sphinx docs, so the code of conduct text shows up on the front page of the project documentation.

So now we've got code of conduct text pointing to the CODE_OF_CONDUCT.md file in all these places:

  • the repository itself

  • the source tarball

  • the PyPI page (via the long description)

  • the front page of the project documentation

Plus the text will get updated automatically by the bot as changes are made.

Excellent!

Future possibilities

GitHub has a Community Insights page for each project. This is the one for Bleach. There's a section for "Code of conduct", but you only get a green checkmark if you use one of GitHub's pre-approved code of conduct files.

There's a discussion about that in their forums.

Is this checklist helpful to people? Does it mean something to have all these items checked off? Is there someone checking for this sort of thing? If so, then maybe we should get the Mozilla text approved?

Hope this helps!

I hope to roll this out for the projects I maintain on Monday.

I hope this helps you!

Socorro: February 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in February.

Read more…

Bleach: stepping down as maintainer

What is it?

Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.
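
Roughly, usage looks something like this (output shown is for recent Bleach versions with the default settings):

import bleach

# Tags that aren't in the allowed list get escaped.
bleach.clean('some <script>evil()</script> text')
# 'some &lt;script&gt;evil()&lt;/script&gt; text'

# Things that look like URLs get turned into links.
bleach.linkify('see http://example.com for details')
# 'see <a href="http://example.com" rel="nofollow">http://example.com</a> for details'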

I'm stepping down

In October 2015, I had a conversation with James Socol that resulted in me picking up Bleach maintenance from him. That was a little over 3 years ago. In that time, I:

  • did 12 releases

  • improved the tests; switched from nose to pytest, added test coverage for all supported versions of Python and html5lib, added regression tests for the XSS strings in the OWASP Testing Guide 4.0 appendix

  • worked with Greg to add browser testing for cleaned strings

  • improved documentation; added docstrings, added lots of examples, added automated testing of examples, improved copy

  • worked with Jannis to implement a security bug disclosure policy

  • improved performance (Bleach v2.0 released!)

  • switched to semver so the version number was more meaningful

  • did a rewrite to work with the extensive html5lib API changes

  • spent a couple of years dealing with the regressions from the rewrite

  • stepped up as maintainer for html5lib and did a 1.0 release

  • added support for Python 3.6 and 3.7

I accomplished a lot.

A retrospective on OSS project maintenance

I'm really proud of the work I did on Bleach. I took a great project and moved it forward in important and meaningful ways. Bleach is used by a ton of projects in the Python ecosystem. You have likely benefitted from my toil.

While I used Bleach on projects like SUMO and Input years ago, I wasn't really using Bleach on anything while I was a maintainer. I picked up maintenance of the project because I was familiar with it, James really wanted to step down, and Mozilla was using it on a bunch of sites--I picked it up because I felt an obligation to make sure it didn't drop on the floor and I knew I could do it.

I never really liked working on Bleach. The problem domain is a total fucking pain-in-the-ass. Parsing HTML like a browser--oh, but not exactly like a browser because we want the output of parsing to be as much like the input as possible, but as safe. Plus, have you seen XSS attack strings? Holy moly! Ugh!

Anyhow, so I did a bunch of work on a project I don't really use, but felt obligated to make sure it didn't fall on the floor, that has a pain-in-the-ass problem domain. I did that for 3+ years.

Recently, I had a conversation with Osmose that made me rethink that. Why am I spending my time and energy on this?

Does it further my career? I don't think so. Time will tell, I suppose.

Does it get me fame and glory? No.

Am I learning while working on this? I learned a lot about HTML parsing. I have scars. It's so nuts what browsers are doing.

Is it a community through which I'm meeting other people and creating friendships? Sort of. I like working with James, Jannis, and Greg. But I interact and work with them on non-Bleach things, too, so Bleach doesn't help here.

Am I getting paid to work on it? Not really. I did some of the work on work-time, but I should have been using that time to improve my skills and my career. So, yes, I spent some work-time on it, but it's not a project I've been tasked to work on. For the record, I work on Socorro, which is the Mozilla crash-ingestion pipeline. I don't use Bleach on that.

Do I like working on it? No.

Seems like I shouldn't be working on it anymore.

I moved Bleach forward significantly. I did a great job. I don't have any half-finished things to do. It's at a good stopping point. It's a good time to thank everyone and get off the stage.

What happens to Bleach?

I'm stepping down without working on what comes next. I think Greg is going to figure that out.

Thank you!

Jannis was a co-maintainer at the beginning because I didn't want to maintain it alone. Jannis stepped down and Greg joined. Both Jannis and Greg were a tremendous help and fantastic people to work with. Thank you!

Sam Snedders helped me figure out a ton of stuff with how Bleach interacts with html5lib. Sam was kind enough to deputize me as a temporary html5lib maintainer to get 1.0 out the door. I really appreciated Sam putting faith in me. Conversations about the particulars of HTML parsing--I'll miss those. Thank you!

While James wasn't maintaining Bleach anymore, he always took the time to answer questions I had. His historical knowledge, guidance, and thoughtfulness were crucial. James was my manager for a while. I miss him. Thank you!

There were a handful of people who contributed patches, too. Thank you!

Thank your maintainers!

My experience from 20 years of OSS projects is that many people are in similar situations: continuing to maintain something out of internal obligation long after they've stopped getting any value from the project.

Take care of the maintainers of the projects you use! You can't thank them enough for their time, their energy, their diligence, their help! Not just the big successful projects, but the one-person projects, too.

Shout-out for PyCon 2019 maintainers summit

Sumana mentioned that PyCon 2019 has a maintainers summit. That looks fantastic! If you're in the doldrums of maintaining an OSS project, definitely go if you can.

Changes to this blog post

Update March 2, 2019: I completely forgot to thank Sam Snedders, which is a really horrible omission. Sam's the best!

Socorro: January 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

January was a good month. This blog post summarizes activities.

Read more…

Everett v1.0.0 released!

What is it?

Everett is a configuration library for Python apps.

Goals of Everett:

  1. flexible configuration from multiple configured environments

  2. easy testing with configuration

  3. easy documentation of configuration for users

From that, Everett has the following features:

  • is composable and flexible

  • makes it easier to provide helpful error messages for users trying to configure your software

  • supports auto-documentation of configuration with a Sphinx autocomponent directive

  • has an API for testing configuration variations in your tests

  • can pull configuration from a variety of specified sources (environment, INI files, YAML files, dict, write-your-own)

  • supports parsing values (bool, int, lists of things, classes, write-your-own)

  • supports key namespaces

  • supports component architectures

  • works with whatever you're writing--command line tools, web sites, system daemons, etc.
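
Here's a rough sketch of what basic usage looks like--the names here are made up for illustration, pulling configuration from the process environment with a dict of fallback defaults:

from everett.manager import ConfigDictEnv, ConfigManager, ConfigOSEnv

# Sources are checked in order: the process environment first, then
# the dict of defaults.
config = ConfigManager(
    environments=[
        ConfigOSEnv(),
        ConfigDictEnv({'HOST': 'localhost', 'DEBUG': 'false'}),
    ],
    doc='See the project documentation for configuration help.',
)

host = config('host')
debug = config('debug', parser=bool)
db_port = config('port', namespace='db', default='5432', parser=int)

The doc bit gets folded into configuration error messages, which is a big part of the helpful-error-messages goal.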

v1.0.0 released!

This release fixes many sharp edges, adds a YAML configuration environment, and fixes Everett so that it has no dependencies unless you want to use YAML or INI.

It also drops support for Python 2.7--Everett no longer supports Python 2.

Why you should take a look at Everett

At Mozilla, I'm using Everett for Antenna, which is the edge collector for the crash ingestion pipeline for Mozilla products including Firefox and Fennec. It's been in production for a little under a year now and is doing super. Using Everett makes it much easier to:

  1. deal with different configurations between local development and server environments

  2. test different configuration values

  3. document configuration options

It's also used in a few other places and I plan to use it for the rest of the components in the crash ingestion pipeline.

First-class docs. First-class configuration error help. First-class testing. This is why I created Everett.

If this sounds useful to you, take it for a spin. It's almost a drop-in replacement for python-decouple and the os.environ.get('CONFIGVAR', 'default_value') style of configuration.
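
For the os.environ.get case, the switch looks roughly like this (a sketch with made-up names):

import os

from everett.manager import ConfigManager, ConfigOSEnv

# What you might have now:
host = os.environ.get('HOST', 'localhost')

# Roughly the same thing with Everett:
config = ConfigManager(environments=[ConfigOSEnv()])
host = config('host', default='localhost')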

Enjoy!

Thank you!

Thank you to Paul Jimenez, who helped fix issues and provided thoughtful insight on API ergonomics!

Where to go for more

For more specifics on this release, see here: https://everett.readthedocs.io/en/latest/history.html#january-7th-2019

Documentation and quickstart here: https://everett.readthedocs.io/en/latest/

Source code and issue tracker here: https://github.com/willkg/everett

Socorro in 2018

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

2018 was a big year for Socorro. In this blog post, I opine about our accomplishments.

Read more…

Socorro: December 2018 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

At Mozilla, December is a rough month to get anything done, but we accomplished a bunch anyways!

Read more…

Socorro: migrating to Python 3: retrospective (2018)

Project

scope: Socorro and Mozilla sites

period: 2 years

impact: migrated Socorro to Python 3 and paved the way for other Mozilla sites

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the Breakpad crash reporter asks the user if the user would like to send a crash report. If the user answers "yes!", then the Breakpad crash reporter collects data related to the crash, generates a crash report, and submits that crash report as an HTTP POST to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post talks about the project to migrate Socorro to Python 3. It covers the incremental steps we took and why we chose that path, plus some of the technical problems we hit.

Read more…

Socorro: November 2018 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the Breakpad crash reporter asks the user if the user would like to send a crash report. If the user answers "yes!", then the Breakpad crash reporter collects data related to the crash, generates a crash report, and submits that crash report as an HTTP POST to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

November was another busy month! This blog post covers what happened.

Read more…