Dennis v0.5 released! New lint rules, new template linter, bunch of fixes, and now a service!

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions

  • a template linter for finding problems in strings in .pot files that make translators' lives difficult

  • a statuser for seeing the high-level translation/error status of your .po files

  • a translator for strings in your .po files to make development easier

v0.5 released!

Since the last release announcement, there have been a handful of new lint rules added:

  • W301: Translation consists of just white space

  • W302: The translation is the same as the original string

  • W303: There are discrepancies in the HTML between the original string and the translated string
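To make those three rules a bit more concrete, here's a rough sketch of the kind of checks they describe. This is not Dennis's code: the function names are made up for illustration, and the HTML comparison just compares tag names, which is an assumption about what counts as a discrepancy.

```python
import re

# Hypothetical sketch; names and logic are illustrative, not Dennis's API.
HTML_TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)")


def html_tags(text):
    """Return the sorted list of HTML tag names found in text."""
    return sorted(tag.lower() for tag in HTML_TAG_RE.findall(text))


def lint_translation(msgid, msgstr):
    """Return warning codes for one original/translated string pair."""
    if not msgstr:
        # Untranslated strings are not linted in this sketch.
        return []
    warnings = []
    if not msgstr.strip():
        warnings.append("W301")  # just white space
    if msgstr == msgid:
        warnings.append("W302")  # same as the original
    if html_tags(msgid) != html_tags(msgstr):
        warnings.append("W303")  # HTML tags differ
    return warnings
```

For example, lint_translation("<b>Hello</b>", "<i>Hola</i>") flags W303 because the tag names differ.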

Additionally, there's a new template linter for your .pot files which can catch things like:

  • W500: Strings with variable names like o, O, 0, l, 1 which can be hard to read and are often replaced with a similar looking letter by the translator.

  • W501: One-character variable names which don't give translators enough context about what's being translated.

  • W502: Multiple unnamed variables which can't be reordered because the order the variables are expanded is specified outside of the string.
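Here's a similar rough sketch for the template rules. Again, this is an illustration and not how Dennis does it: it only understands Python %-style variables such as %(count)s and %s, and the function name is made up.

```python
import re

# Hypothetical sketch. Matches named variables like %(count)s and
# unnamed ones like %s; real strings can use other formats too.
VAR_RE = re.compile(r"%(?:\((?P<name>[^)]*)\))?[sdifr]")

HARD_TO_READ = {"o", "O", "0", "l", "1"}


def lint_template_string(msgid):
    """Return warning codes for one string from a .pot file."""
    names = [match.group("name") for match in VAR_RE.finditer(msgid)]
    named = [name for name in names if name is not None]
    unnamed = [name for name in names if name is None]

    warnings = []
    if any(name in HARD_TO_READ for name in named):
        warnings.append("W500")  # easily confused variable name
    if any(len(name) == 1 for name in named):
        warnings.append("W501")  # one-character variable name
    if len(unnamed) > 1:
        warnings.append("W502")  # multiple unnamed variables
    return warnings
```

In this sketch a variable named o trips both W500 and W501, while "%s of %s" trips W502 because a translator can't swap the two variables.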

Dennis in action

Want to see Dennis in action, but don't want to install Dennis? I threw it up as a service, though it's configured for SUMO: http://dennis-sumo.paas.allizom.org/

Where to go for more

For more specifics on this release, see here: http://dennis.readthedocs.org/en/latest/changelog.html#version-0-5-august-24th-2014

Documentation and quickstart here: http://dennis.readthedocs.org/en/v0.5/

Source code and issue tracker here: https://github.com/willkg/dennis

Source code and issue tracker for Denise (Dennis-as-a-service): https://github.com/willkg/denise

3 out of 5 summer interns use Dennis to improve their posture while pranking their mentors.

Input status: August 19th, 2014

Development

High-level summary:

It's been a slower two weeks than normal, but we still accomplished some interesting things:

  • L Guruprasad finished cleaning up the Getting Started guide--that work helps all future contributors. He did a really great job with it. Thank you!

  • Landed a minor rewrite to rate-limiting/throttling.

  • Redid the Elasticsearch indexing admin page.

  • Fixed some Heartbeat-related things.

Landed and deployed:

  • cf2e0e2 [[bug 948954]] Redo index admin

  • f917d41 Update Getting Started guide to remove submodule init (L. Guruprasad)

  • 5eb6d6d Merge pull request #329 from lgp171188/peepify_submodule_not_required_docs

  • c168a5b Update peep from v1.2 to v1.3

  • adf7361 [[bug 1045623]] Overhaul rate limiting and update limits

  • 7647053 Fix response view

  • f867a2d Fix rulename

  • 8f0c36e [[bug 1051214]] Clean up DRF rate limiting code

  • 0f0b738 [[bug 987209]] Add django-waffle (v0.10)

  • b52362a Make peep script executable

  • 461c503 Improvie Heartbeat API docs

  • 8f0ccd3 [[bug 1052460]] Add heartbeat view

  • d1604f0 [[bug 1052460]] Add missing template

Landed, but not deployed:

  • ed2923f [[bug 1015788]] Cosmetic: flake8 fixes (analytics)

  • afdfc6a [[bug 1015788]] Cosmetic: flake8 fixes (base)

  • 05e0a33 [[bug 1015788]] Cosmetic: flake8 fixes (feedback)

  • 2d9bc26 [[bug 1015788]] Cosmetic: flake8 fixes (heartbeat)

  • dc6e990 Add anonymize script

Current head: dc6e990

Rough plan for the next two weeks

  1. Working on Dashboards-for-everyone bits. Documenting the GET API. Making it a bit more functional. Writing up some more examples. (https://wiki.mozilla.org/Firefox/Input/Dashboards_for_Everyone)

  2. Update Input to ElasticUtils v0.10 ([bug 1055520])

  3. Land all the data retention policy work ([bug 946456])

  4. Gradients (https://wiki.mozilla.org/Firefox/Input/Gradient_Sentiment)

  5. Product administration views ([bug 965796])

Most of that is in some state of half-done, so we're going to spend the next couple of weeks focusing on finishing things.

What I need help with

  1. (django) Update to django-rest-framework 2.3.14 ([bug 934979]) -- I think this is straightforward. We'll know if it isn't if the tests fail.

  2. (django, cookies, debugging) API response shouldn't create anoncsrf cookie ([bug 910691]) -- I have no idea what's going on here because I haven't looked into it much.

  3. (html) Fixing the date picker in Chrome ([bug 1012965]) -- The issue is identified. Someone just needs to do the fixing.

For details, see our GetInvolved page:

https://wiki.mozilla.org/Webdev/GetInvolved/input.mozilla.org

If you're interested in helping, let me know! We hang out on #input on irc.mozilla.org and there's the input-dev mailing list.

Additional thoughts

We're in the process of doing a Personally Identifiable Information audit on Input, the systems it's running on and the processes that touch and move data around. This covers things like "what data are we storing?", "where is the data stored?", "who/what has access to that data?", "does that data get copied/moved anywhere?", "who/what has access to where the data gets copied/moved to?", etc.

I think we're doing pretty well. However, during the course of the audit, we identified a few things we should be doing better. Some of them already have bugs, one of them is being worked on already and the others need to be written up.

Some time this week, I'll turn that into a project and write up missing bugs.

That's about it!

Input status: August 4th, 2014

Summary

This is the status report for development on Input. I publish a status report to the input-dev mailing list every couple of weeks or so covering what was accomplished and by whom and also what I'm focusing on over the next couple of weeks. I sometimes ruminate on some of my concerns. I think one time I told a joke.

Work summary

  • Continued work on the development environment with awesome help from L. Guruprasad

  • Implement the bits required to support Loop

  • Add smoketest coverage for Firefox OS feedback form

  • Fix some technical debt issues

Development

Landed and deployed:

  • 6097571 [[bug 1040919]] Change version to only have one dot

  • dcf8f91 [[bug 1040919]] Fix .0.0 versions to .0

  • 4012933 Remove CEF-related things

  • 4887c49 [[bug 1030905]] Add product/version tests

  • 1bdd3a8 [[bug 1042222]] update Vagrantfile to use 14.04 LTS (L. Guruprasad)

  • c1bdfbb Add L. Guruprasad to CONTRIBUTORS

  • 3e607a5 [[bug 1042560]] Remove unused packages in dev VM (L. Guruprasad)

  • c0759d1 Update django-mozilla-product-details

  • bcf8ec8 Update product details

  • 58579ab [[bug 987801]] peep-ify requirements

  • 248b0ab [[bug 987801]] Exit if requirements are mismatched

  • 5db57bc [[bug 987801]] Redo how we figure out whether to use vendor/

  • df8400d Fix sampledata generation to use bulk create

  • fa0caac [[bug 1041622]] Capture querystring slop

  • 98745b5 [[bug 1021155]] Add basic FirefoxOS smoketest

  • 6cf9e39 [[bug 1041664]] Capture slop in Input API

  • ff4b711 Show context in response view

  • 976ebbf [[bug 1045942]] Add category to response table

  • 34622ad Add docs for category and extra context for Input API

  • d1a8dd5 Add an additional note about keys

  • fec1319 [[bug 998726]] Quell Django warnings

  • 288afe4 Rename fjord/manage.py to fjord/manage_utils.py

  • d40e8a9 Update dennis to v0.4.3

Landed, but not deployed:

Current head: 28cd90f

Over the next two weeks

  1. Keep an eye out for any Loop or Heartbeat related work--that's top priority.

  2. Work on gradient support and product picker support

What I need help with

  1. (Google chrome, JavaScript, CSS, HTML) Investigate what's wrong with the date picker in Chrome and fix it [[bug 1012965]]

  2. Test out the Getting Started instructions. We've had a few people go through these already, but it's definitely worth having more eyes. http://fjord.readthedocs.org/en/latest/getting_started.html

If you're interested in helping, let me know! We hang out on #input on irc.mozilla.org and there's the input-dev mailing list.

That's it!

Input status: July 20th, 2014

Summary

This is the status report for development on Input. I publish a status report to the input-dev mailing list every couple of weeks or so covering what was accomplished and by whom and also what I'm focusing on over the next couple of weeks. I sometimes ruminate on some of my concerns. I think one time I told a joke.

Last status report was at the end of June. This status report covers the last few things we landed in 2014q2 as well as everything we've done so far in 2014q3.

Development

Landed and deployed:

  • 6ecd0ce [[bug 1027108]] Change default doc theme to mozilla sphinx (Anna Philips)

  • 070f992 [[bug 1030526]] Add cors; add api feedback get view

  • f6f5bc9 [[bug 1030526]] Explicitly declare publicly-visible fields

  • c243b5d [[bug 1027280]] Add GengoHumanTranslater.translate; cleanup

  • 3c9cdd1 [[bug 1027280]] Add human tests; overhaul Gengo tests

  • ff39543 [[bug 1027280]] Add support for the Gengo sandbox

  • 258c0b5 [[bug 1027280]] Add test for get_balance

  • 44dd8e5 [[bug 1027280]] Implement Gengo Human push_translations

  • 35ae6ec [[bug 1027280]] Clean up API code

  • a7bf90a [[bug 1027280]] Finish pull_translations and tests

  • c9db147 [[bug 1027286]] Gengo translation system status

  • f975f3f [[bug 1027291]] Implement spot Gengo human translation

  • f864b6b [[bug 1027295]] Add translation_sync cron job

  • c58fd44 [[bug 1032226]] en-GB should copyover, too

  • 7480f87 [[bug 1032226]] Tweak the code to be more defensive

  • 7ac1114 [[bug 1032571]] CSRF exempt the API

  • ac856eb [[bug 1032571]] Fix tests to catch csrf issues in the api

  • 74e8e09 [[bug 1032967]] Handle unsupported language pairs

  • 74a409e [[bug 1026503]] First pass at vagrantification

  • a7a440f Continued working on docs; ditched hacking howto

  • 44e702b [[bug 1018727]] Backfill translations

  • 69f9b5b Fix date_end issue

  • e59d4f6 [[bug 1033852]] Better handle unsupported src languages

  • cc3c4d7 Add list of unsupported languages to admin

  • 32e7434 [[bug 1014874]] Fix translate ux

  • 672abba [[bug 1038774]] Hide responses from hidden products

  • e23eca5 Fix a goof in the last commit

  • 6f78e2e [[bug 947767]] Nix authentication for API stuff

  • a9f2179 Fix response view re: non-existent products

  • e4c7c6c [Bug 1030905] fjord feedback api tests for dates (Ian Kronquist)

  • 0d8e024 [[bug 935731]] Add FactoryBoy

  • 646156f Minor fixes to the existing API docs

  • f69b58b [[bug 1033419]] Heartbeat backend prototype

  • f557433 [[bug 1033419]] Add docs for heartbeat posting

Landed, but not deployed:

  • 7c7009b [[bug 935731]] Switch all tests to use FactoryBoy

  • 2351fb5 Generate locales so ubuntu will quite whining (Ian Kronquist)

Current head: 7ea9fc3

High-level

At a high level, this is:

  1. Landed automated Gengo human translation and a bunch of minor fixes to make it work more smoothly.

  2. Reworked how we build development environments to use vagrant. This radically simplifies the instructions and should make it a lot easier for contributors to build a development environment. This in turn should lead to more people working on Input.

  3. Fixed a bug where products marked as "hidden" were still showing up in the dashboard.

  4. Implemented a GET API for Input responses. (https://wiki.mozilla.org/Firefox/Input/Dashboards_for_Everyone)

  5. Implemented the backend for the Heartbeat prototype. (https://wiki.mozilla.org/Firefox/Input/Heartbeat)

  6. Also, I'm fleshing out the Input section in the wiki complete with project plans. (https://wiki.mozilla.org/Firefox/Input)

Over the next two weeks

  1. Continue fleshing out project plans for in-progress projects on the wiki.

  2. Gradient sentiment and product picker work.

What I need help with

  1. We have a new system for setting up development environments. I've tested it on Linux. Ian has, too (pretty sure he's using Linux). We could use some help testing it on Windows and Mac OSX.

Do the instructions work on Windows? Do the instructions work on Mac OSX? Are there important things the instructions don't cover? Is there anything confusing?

http://fjord.readthedocs.org/en/latest/getting_started.html

  2. I'm changing the way I'm managing Fjord development. All project plans will be codified in the wiki. A rough roadmap of which projects are on the drawing board, in-progress, completed, etc is also on the wiki. I threw together a structure for all of this that I think is good, but it could use some review.

Do these project plans provide useful information? Are there important questions that need answering that the plans do not answer?

https://wiki.mozilla.org/Firefox/Input

If you're interested in helping, let me know! We hang out on #input on irc.mozilla.org and there's the input-dev mailing list.

I think that covers it!

Input: 2014q2 post-mortem

I'm going to start doing quarterly post-mortems for Input development. The goal is to be more communicative about what happened, why, what's in the works and what I need more help with.

NB: "Fjord" is the name of the codebase that runs Input.

Bug and git stats

Bugzilla
========

Bugs created:        63
Bugs fixed:          54

git
===

Total commits: 151

      Will Kahn-Greene :   143  (+15123, -4602, files 446)
        ossreleasefeed :     3  (+197, -42, files 9)
          Joshua Smith :     2  (+65, -31, files 5)
          Anna Philips :     1  (+367, -3, files 12)
     Swarnava Sengupta :     1  (+2, -2, files 1)
         Ricky Rosario :     1  (+0, -0, files 0)

Total lines added: 15754
Total lines deleted: 4680
Total files changed: 473

We added a lot of lines of code this quarter:

  • April 1st, 2014: 15195 total, 6953 Python

  • July 1st, 2014: 20456 total, 9247 Python

That's a pretty big jump in LOC. I think a bunch of that is the translation-related changes.

Contributor stats

5 non-core people contributed to Fjord development.

I spent some time over the weekend finishing up the Vagrant provisioning script and rewriting the docs. I'm planning to spend some more time in 2014q3 reducing the complexity and barriers for setting up a Fjord development environment to the point where someone can contribute.

Additionally, I'm planning to create more bugs that are contributor-friendly. I started doing that in the last week. I think a good goal for Input is to have around 20 contributor-y bugs hanging around at any given time.

Accomplishments

Site health dashboard: I wrote a mediocre site health dashboard that's good enough to give me a feel for how the site is performing before and after a deployment. This still needs some work, but I'll schedule that for a rainy day.

Client side smoke tests: I wrote smoke tests for the client side. I based it on the defunct input-tests code that QA was maintaining up until we rewrote Input. There are still a bunch of tests that I want to write to get better coverage, but having something is way better than nothing. I'm hoping the smoke tests will reduce the amount of manual testing I'm doing, too.

Vagrant: I took some inspiration from Erik Rose and DXR and wrote a Vagrant provisioning shell script. This includes a docs overhaul as well. This work is almost done, but needs some more testing and will probably land in the next week or two. This will make people's lives easier.

Automated translation system (human and machine): I wrote an automated translation system. It's generalized so that it isn't model/field specific. It's also generalized so that we can add plugins for other translation systems. It's currently got plugins for Dennis, Gengo machine translation and Gengo human translation. I turned the automated human translation on yesterday and it seems to be working well. That was a HUGE project. I'm glad it's done.

One thing it includes is a lot of auditing and metrics gathering. This will make it possible for me to go back in time and look at how the translation system worked on various Input feedback responses and hone the system going forward to reduce the number of human translations we're doing and also reduce the number of problems we have doing them.

Better query syntax: We were upgraded to Elasticsearch 0.90.10. I switched the query syntax for the dashboard search field to use Elasticsearch simple_query_string. That allows users to express search queries they weren't previously able to express.

utm_source and utm_campaign handling: I finished the support for handling utm_source and utm_campaign querystring parameters. This allows us to differentiate between organic feedback and non-organic feedback.
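As a rough illustration of the idea (not the code in Fjord), pulling those two parameters out of a URL might look like the sketch below. The function name, the example URL, and the rule that feedback with neither parameter counts as organic are all assumptions.

```python
from urllib.parse import parse_qs, urlparse


def feedback_source(url):
    """Pull utm_source and utm_campaign out of a feedback form URL."""
    params = parse_qs(urlparse(url).query)
    source = params.get("utm_source", [""])[0]
    campaign = params.get("utm_campaign", [""])[0]
    return {
        "source": source,
        "campaign": campaign,
        # Assumption: feedback with neither parameter is organic.
        "organic": not (source or campaign),
    }
```

So a hypothetical link like https://example.com/feedback?utm_source=email&utm_campaign=spring would be recorded as non-organic feedback from the "email" source.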

More like this: I added a "more like this" section to the response view. This makes it possible for UA analyzers to look at a response and see other responses that are similar.

Dashboards for you, dashboards for everyone!

I'm putting this in its own section because it's intriguing. I'll write another blog post about it later in July as things gel.

On Thursday, a couple of days after d3 training that Matt organized, I threw together a better GET API for Input feedback responses. It's not documented, it probably has some bugs and it's probably going to change a bit, but the gist of it is that it lets you more easily build a dashboard that meets your needs against live Input data.

Here's a proof-of-concept:

http://bl.ocks.org/willkg/c4d5a272f86ae4510750

That's looking at live Input data using the new GET API. The code is in a GitHub gist. It auto-updates every 2 minutes.

The problem is that I've got a ton of Input work to do and I just can't write dashboard code on Input fast enough. Further, of the people I've talked to that use the front page dashboard, they all have really different questions they're asking of the data. I'm hoping this alleviates that bottleneck by letting you and everyone else write dashboards that meet your needs.

I encourage you to take my proof-of-concept, fork the gist, tweak it, use bl.ocks.org or something to "host" the gist. Build the dashboard that answers your questions. Share it with other people. Plus, let me know about it. If you have issues with the API, submit a bug and tell me.

If this scratches the itch I think needs scratching, it should result in a bunch of interesting dashboards. If that happens, I'll write some code in Input to create a curated list of them so people can find them more easily.

Summary

This was a really nuts quarter and parts of it really sucked, but we got a lot accomplished and we laid some groundwork for some really interesting things for 2014q3.

Update April 21st, 2015

LGuruprasad found a bug in the script that caused commits-by-author information to be wrong. Fixed the script and updated the stats!

Input status: June 23rd, 2014

I publish a status report to the input-dev mailing list every couple of weeks or so covering what was accomplished and by whom and also what I'm focusing on over the next couple of weeks. I sometimes ruminate on some of my concerns. I think one time I told a joke.

Since the last report:

Landed and deployed:

Landed, but not deployed:

  • c348989 Add bug triaging for new contributors section

  • 5b7dc67 Add Gengo API tests and skip_if infrastructure

  • 98d30fb [[bug 1026131]] Add Gengo human translations bookkeeping

  • 38d8584 [[bug 1026131]] Rework translations system logging code

  • 1d9e67a [[bug 1027293]] Add audit records to response view

HEAD: 1d9e67a

Mostly I spent the last couple of weeks working on automated Gengo human translation support. This involved some infrastructure rewriting plus some additional infrastructure so that when we push all this out, we can see what's going on as it is happening.

Additionally, I went through and updated the mentor metadata for mentored bugs, added a bunch of new mentored bugs and worked with two potential contributors on them.

Over the next week (last week in 2014q2):

  1. finish up automated Gengo human translation work

First thing in 2014q3, I'll spend some time "opening up" the development side of the project. This will make it easier/possible to follow and participate in development. I'm still figuring out some of the details and it's likely I'll continue to change how things work over the course of the quarter, but I plan to follow advice from the Community Building team and Erik Rose, who seems to be doing really super with DXR.

(updated) Using the bug_mentor field with the Bugzilla REST API to get mentored bugs

In my previous post, I mentioned how Bugzilla grew a mentor field and had some code on how to query Bugzilla using the old and new APIs for the list of mentored bugs. This updates that information.

Gerv and Byron pointed out there's an isnotempty operator that's better to use than the way I cobbled together to query for bugs that have data in the bug_mentor field.

Thus, the code should look like this:

import requests

# Using the old BzAPI: https://wiki.mozilla.org/Bugzilla:REST_API
r = requests.get(
    'https://api-dev.bugzilla.mozilla.org/latest' +
    '/bug?product=Input&f1=bug_mentor&o1=isnotempty'
)
data = r.json()
print(len(data['bugs']))  # Prints 9


# Using the new BMO API. https://wiki.mozilla.org/BMO/REST
r = requests.get(
    'https://bugzilla.mozilla.org/rest' +
    '/bug?product=Input&f1=bug_mentor&o1=isnotempty'
)
data = r.json()
print(len(data['bugs']))  # Prints 9
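If you want the same query for other products, one option is to build the URL from its parts so the values get escaped properly. This is just a sketch: the helper name is made up, and it only builds the URL for the new BMO API endpoint used above without making the request.

```python
from urllib.parse import urlencode

BMO_REST = "https://bugzilla.mozilla.org/rest"


def mentored_bugs_url(product, base=BMO_REST):
    """Build the search URL for bugs whose bug_mentor field is not empty."""
    query = urlencode([
        ("product", product),
        ("f1", "bug_mentor"),
        ("o1", "isnotempty"),
    ])
    return base + "/bug?" + query
```

mentored_bugs_url("Input") gives the same URL as the second request above, and it can be passed straight to requests.get.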

Using the bug_mentor field with the Bugzilla REST API to get mentored bugs

Updated June 23rd, 2014:

There's a better way to query the data. See the update blog post.

Bugzilla grew a mentor field recently. This is really fantastic as it solves some interesting problems and makes it easier to track various aspects of mentoring which have been previously difficult to track. Yay to everyone involved in making that happen!

Migrating from the old way (sticking [mentor=xxx] in the whiteboard field) to the new way caused a problem that I spent a while working on today. I heard reports of other people having the same problem, hence this blog post.

There are a bunch of Bugzilla-symbiotic systems which would show a list of mentored bugs by checking to see if the string mentor= was in the whiteboard field. That no longer works. Instead we have to check to see if the bug_mentor field has something in it. However, this is difficult to express with the old Bugzilla REST API (BzAPI).

The bug_mentor field is unique in that it holds email addresses which have the @ in them. So we can (ab)use this property by seeing if the bug_mentor field contains the @ character.

I did this with the GetInvolved/input.mozilla.org page. Here is the diff in case that's helpful.

Here's some Python that shows this with the old BzAPI and the new BMO API which pulls mentored bugs for Input:

import requests

# Using the old BzAPI: https://wiki.mozilla.org/Bugzilla:REST_API
r = requests.get(
    'https://api-dev.bugzilla.mozilla.org/latest' +
    '/bug?product=Input&bug_mentor_type=contains&bug_mentor=@'
)
data = r.json()
print(len(data['bugs']))  # Prints 9


# Using the new BMO API. https://wiki.mozilla.org/BMO/REST
r = requests.get(
    'https://bugzilla.mozilla.org/rest' +
    '/bug?product=Input&bug_mentor_type=substring&bug_mentor=@'
)
data = r.json()
print(len(data['bugs']))  # Prints 9

Fiddling with Kibana

I just kicked off a script that's going to take around 4 hours to complete mostly because the API it's running against doesn't want me doing more than 60 requests/minute. Given I've got like 13k requests to do, that takes a while.
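The back-of-the-envelope math, plus one simple way to stay under a limit like that, can be sketched as follows. This isn't the script I'm running; the function names are made up and 13,000 is just a round stand-in for "like 13k".

```python
import time


def estimated_minutes(total_requests, per_minute):
    """Minutes needed to make total_requests at per_minute requests/minute."""
    return total_requests / per_minute


def throttled(items, per_minute, sleep=time.sleep):
    """Yield items no faster than per_minute items per minute."""
    delay = 60.0 / per_minute
    for index, item in enumerate(items):
        if index:
            # Wait between items, but not before the first one.
            sleep(delay)
        yield item
```

estimated_minutes(13000, 60) comes out to roughly 217 minutes, which is a bit over three and a half hours before counting the time the requests themselves take.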

I'm (ab)using Elasticsearch to store the data from my script so that I can analyze it more easily--terms facet is pretty handy here.
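For reference, a terms facet request body in the Elasticsearch versions of that era looks roughly like the sketch below. The helper name and the field name in the example are hypothetical, and newer Elasticsearch versions use aggregations instead of facets.

```python
import json


def terms_facet_body(field, size=10):
    """Build a search body that counts the most common values of field."""
    return {
        "query": {"match_all": {}},
        # Skip the hits themselves; only the facet counts are wanted.
        "size": 0,
        "facets": {
            field + "_counts": {"terms": {"field": field, "size": size}},
        },
    }


# "status" is a made-up field name for illustration.
print(json.dumps(terms_facet_body("status"), indent=2))
```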

Given that I've got some free time now, I spent 5 minutes setting up Kibana.

Steps:

  1. download the tarball

  2. untar it into a directory

  3. edit kibana-3.0.1/config.js to point to my local Elasticsearch cluster (the defaults were fine, so I could have skipped this step)

  4. cd kibana-3.0.1/ and run python -m SimpleHTTPServer 5000 (I'm using a Python-y thing here, but you can use any web-server)

  5. point my browser to http://localhost:5000

Now I'm using Kibana.

Now that I've got it working, first thing I do is click on the cog in the upper right hand corner, click on the Index tab and change the index to the one I wanted to look at. Now I'm looking at the data my script is producing.

The Kibana site says Kibana excels at timestamped data, but I think it's helpful for what I'm looking at now despite it not being timestamped. I get immediate terms facets on the fields for the doc type I'm looking at. I can run queries, pick specific columns, reorder, do graphs, save my dashboard to look at later, etc.

If you're doing Elasticsearch stuff, it's worth looking at if only to give you another tool to look at data with.

Input: changed query syntax across the site

Better search syntax is here!

Yesterday I landed the changes for bug 986589 which affects all the search boxes and search feeds on Input. Now they use the Elasticsearch simple_query_string query instead of the hand-rolled query parser I wrote.

This was only made possible in the last month after we were updated from Elasticsearch 0.20.6 (or whatever it was) to 0.90.10.

Tell me more about this ... syntax.

I'm pretty psyched! It's pretty much the minimum required syntax for useful searching. It's kind of silly it took a year to get to this point, but so it goes.

To quote the Elasticsearch 0.90 documentation:

+  signifies AND operation
|  signifies OR operation
-  negates a single token
"  wraps a number of tokens to signify a phrase for searching
*  at the end of a term signifies a prefix query
( and )  signify precedence

Negation and prefix were the two operators my hand-rolled query parser didn't have.
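To show what that looks like on the Elasticsearch side, here's a minimal sketch of a simple_query_string request body. It's not the code in Input: the helper name is made up and "description" is an assumed field name.

```python
def simple_query_body(text, fields):
    """Build a search body using the simple_query_string query."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
            }
        }
    }


# A phrase plus a negated term, then a prefix query.
phrase_and_negation = simple_query_body('"slow startup" -android', ["description"])
prefix = simple_query_body("crash*", ["description"])
```

The first example uses a phrase and a negation, and the second uses a prefix, which are the things the old parser couldn't do.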

What does this mean for you?

It means that you need to use the new syntax for searches on the dashboard and other parts of the site.

Further, this affects feeds, so if you're using the Atom feed, you'll probably need to update the search query there, too.

Also, we added a ? next to search boxes which links to a wiki page that documents the syntax with examples. It's a wiki page, so if the documentation is subpar or it's missing examples, feel free to let me know or fix it yourself.