html5lib-python 1.0 released!

, | Tweet this

html5lib-python v1.0 released!

Yesterday, Geoffrey released html5lib 1.0 [1]! The changes aren't wildly interesting.

The more interesting part for me is how the release happened. I'm going to spend the rest of this post talking about that.

[1] Technically there was a 1.0 release followed by a 1.0.1 release because the 1.0 release had issues.

The story of Bleach and html5lib

I work on Bleach which is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML. It relies heavily on another library called html5lib-python. Most of the work that I do on Bleach consists of figuring out how to make html5lib do what I need it to do.

Over the last few years, maintainers of the html5lib library have been working towards a 1.0. Those well-meaning efforts got them into a versioning model which had some unenthusing properties. I would often talk to people about how I was having difficulties with Bleach and html5lib 0.99999999 (8 9s) and I'd have to mentally count how many 9s I had said. It was goofy [2].

In an attempt to deal with the effects of the versioning, there's a parallel set of versions that start with 1.0b. Because there are two sets of versions, it was a total pain in the ass to correctly specify which versions of html5lib that Bleach worked with.

While working on Bleach 2.0, I bumped into a few bugs and upstreamed a patch for at least one of them. That patch sat in the PR queue for months. That's what got me wondering--is this project dead?

I tracked down Geoffrey and talked with him a bit on IRC. He seems to be the only active maintainer. He was really busy with other things, html5lib doesn't pay at all, there's a ton of stuff to do, he's burned out, and recently there have been spats of negative comments in the issues and PRs. Generally the project had a lot of stop energy.

Some time in August, I offered to step up as an interim maintainer and shepherd html5lib to 1.0. The goals being:

  1. land or close as many old PRs as possible
  2. triage, fix, and close as many issues as possible
  3. clean up testing and CI
  4. clean up documentation
  5. ship 1.0 which ends the versioning issues
[2] Many things in life are goofy.

Thoughts on being an interim maintainer

I see a lot of open source projects that are in trouble in the sense that they don't have a critical mass of people and energy. When the sole part-time volunteer maintainer burns out, the project languishes. Then the entitled users show up, complain, demand changes, and talk about how horrible the situation is and everyone should be ashamed. It's tough--people are frustrated and then do a bunch of things that make everything so much worse. How do projects escape the raging inferno death spiral?

For a while now, I've been thinking about a model for open source projects where someone else pops in as an interim maintainer for a short period of time with specific goals and then steps down. Maybe this alleviates users' frustrations? Maybe this gives the part-time volunteer burned-out maintainer a breather? Maybe this can get the project moving again? Maybe the temporary interim maintainer can make some of the hard decisions that a regular long-term maintainer just can't?

I wondered if I should try that model out here. In the process of convincing myself that stepping up as an interim maintainer was a good idea [3], I looked at projects that rely on html5lib [4]:

  • pip vendors it
  • Bleach relies upon it heavily, so anything that uses Bleach uses html5lib (jupyter, hypermark, readme_renderer, tensorflow, ...)
  • most web browsers (Firefox, Chrome, servo, etc) have it in their repositories because web-platform-tests uses it

I talked with Geoffrey and offered to step up with these goals in mind.

I started with cleaning up the milestones in GitHub. I bumped everything from the 0.9999999999 (10 9s) milestone which I determined will never happen into a 1.0 milestone. I used this as a bucket for collecting all the issues and PRs that piqued my interest.

I went through the issue tracker and triaged all the issues. I tried to get steps to reproduce and any other data that would help resolve the issue. I closed some issues I didn't think would ever get resolved.

I triaged all the pull requests. Some of them had been open for a long time. I apologized to people who had spent their time to upstream a fix that sat around for years. In some cases, the changes had bitrotted severely and had to be redone [5].

Then I plugged away at issues and pull requests for a couple of months and pushed anything out of the milestone that wasn't well-defined or something we couldn't fix in a week.

At the end of all that, Geoffrey released version 1.0 and here we are today!

[3] I have precious little free time, so this decision had sweeping consequences for my life, my work, and people around me.
[4] Recently, I discovered's pretty amazing project. They have a page for html5lib. I had written a (mediocre) tool that does vaguely similar things.
[5] This is what happens on projects that don't have a critical mass of energy/people. It sucks for everyone involved.

Conclusion and thoughts

I finished up as interim maintainer for html5lib. I don't think I'm going to continue actively as a maintainer. Yes, Bleach uses it, but I've got other things I should be doing.

I think this was an interesting experiment. I also think it was a successful experiment in regards to achieving my stated goals, but I don't know if it gave the project much momentum to continue forward.

I'd love to see other examples of interim maintainers stepping up, achieving specific goals, and then stepping down again. Does it bring in new people to the community? Does it affect the raging inferno death spiral at all? What kinds of projects would benefit from this the most? What kinds of projects wouldn't benefit at all?

Want to comment? Send an email to willkg at bluesock dot org. Include the url for the blog entry in your comment so I have some context as to what you're talking about.