Google Summer of Code 2007: PyBlosxom ... finale

Thursday August 23, 2007 20:32, Will Kahn-Greene

I mentored Z, who was working on pyblosxom-webfront, a web-based interface to your PyBlosxom blog.

Overall I'm pretty happy with the project. I had a pretty crazy May and June that definitely affected the first half of the project, but Z and I had a few chats just before the second half and ironed most of the issues out.

While he was working on webfront, I rolled out PyBlosxom 1.4 (and 1.4.1 and 1.4.2) which have support for Paste. Paste makes writing plugins and testing them _so_ much easier. Z also worked out some problems with making complex plugins. We should look into refactoring the comments plugin accordingly.

From here Z says he'd like to continue working on webfront and maintaining it. There are a bunch of things that it's missing, but it's a good platform to build on and it was a good experience to work with him to get there.

Thank you Google!

Migrating tickets in Trac to bugs in Bugzilla

Thursday August 23, 2007 20:25, Will Kahn-Greene

I spent a large portion of the last few weeks at PCF building a script to migrate tickets from our Trac instance to bugs in our Bugzilla instance.

I started writing SQL scripts, but then it got too hairy because there are a bunch of Trac ticket fields that have no constraints and translating them to Bugzilla equivalents required mappings and temp tables ... I abandoned that approach pretty quickly and wrote the migration script in Python.
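
To give a feel for the kind of mapping involved, here's a minimal sketch. It is not the actual migration script: the priority values, the default, and the map_field and convert_ticket helpers are all made up for illustration, and the real mappings depend on whatever ended up in your Trac data.

```python
# Hypothetical mapping from free-form Trac priority values to a fixed set of
# Bugzilla priorities. The values on both sides are illustrative assumptions.
PRIORITY_MAP = {
    "highest": "P1",
    "high": "P2",
    "normal": "P3",
    "low": "P4",
    "lowest": "P5",
}


def map_field(value, mapping, default):
    """Normalize an unconstrained Trac value and look up its equivalent."""
    if value is None:
        return default
    key = value.strip().lower()
    return mapping.get(key, default)


def convert_ticket(ticket):
    """Turn a Trac ticket (a dict) into a dict of Bugzilla-style bug fields."""
    return {
        # field names here are assumptions for the sketch
        "short_desc": ticket.get("summary") or "(no summary)",
        "priority": map_field(ticket.get("priority"), PRIORITY_MAP, "P3"),
    }


bug = convert_ticket({"summary": "Crash on startup", "priority": " High "})
```

Doing the normalization (strip, lowercase, fall back to a default) in Python is a lot less painful than doing the same thing with temp tables in SQL.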

The outcome of the migration is pretty decent. We've spent time fixing the data in Bugzilla after the migration, but I don't think there's a way to do a perfect migration because of the nature of the two bug systems.

I thought the project was interesting and mentioned it to a few people. The most common response when people heard I was migrating our bug data from Trac to Bugzilla was, "What??? WHY?!?!" with their eyes open wide in shock. I think Bugzilla has an undeserved bad rap.

The scripts are here (participatoryculture.org) if anyone else with similar plans is looking for them.

As a side note, the Python Database API specification PEP is fantastic--anyone who contributed to it should get a gold star.
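
For anyone who hasn't used it, here's a small sketch of what working with the DB-API looks like. It uses the sqlite3 module from the standard library as a stand-in for whatever databases you're working with, and the ticket table and its columns are made up for the example.

```python
import sqlite3


def fetch_open_tickets(conn):
    """Return (id, summary) rows for tickets that are not closed."""
    cur = conn.cursor()
    # sqlite3 uses the "qmark" parameter style; other DB-API modules may use
    # a different placeholder (check the module's paramstyle attribute).
    cur.execute(
        "SELECT id, summary FROM ticket WHERE status != ? ORDER BY id",
        ("closed",),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


# Build a throwaway in-memory database with a hypothetical ticket table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, summary TEXT, status TEXT)")
cur.executemany(
    "INSERT INTO ticket (id, summary, status) VALUES (?, ?, ?)",
    [
        (1, "crash on start", "new"),
        (2, "typo in docs", "closed"),
        (3, "slow search", "assigned"),
    ],
)
conn.commit()
cur.close()

open_tickets = fetch_open_tickets(conn)
conn.close()
```

The nice part is that connect, cursor, execute, and fetchall look the same across database modules, so reading from one database and writing to another mostly comes down to swapping the module and the placeholder style.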

PyPI, Cheesecake, and such

Monday July 23, 2007 20:50, Will Kahn-Greene

I'm really late to the PyPI party. Both of the projects I've worked on over the years are in PyPI, but neither is up to date and neither supports easy_install.

Figured I'd update them now. Three things I bumped into while working on Lyntin:

the PyPI tutorial is pretty good and walks you through what you need to know,
the list of classifiers is here, and
the Cheesecake folks have scores for all the PyPI modules here

That last one is kind of interesting. You can see the scores and the Cheesecake log output.

PyBlosxom 1.4 released

Monday July 02, 2007 15:29, Will Kahn-Greene

It's been 17 months or so since the last PyBlosxom release which for a small project is probably too long a period of time. Real life is a harsh mistress sometimes.

Changes:

bunch of bug-fixes
converted the documentation from docbook to restructured text
updates to the documentation
code overhaul
added the beginnings of unit/functional testing using nose
better WSGI support
Paste support

w00t!

GPL 3 released

Friday June 29, 2007 14:17, Will Kahn-Greene

I went to the FSF GPL 3 release event and it was really interesting. There's been a lot of discussion and heated exchanges over the GPL 3, but I was in grad school and wasn't really paying much attention. Listening to the discussions at the event about GPL version 3 was really enlightening.

Personally, I'm excited about the license. It allays some fears I've always had as a developer. Going forward, I'll be putting the majority of my work under the GPL version 3 license.

PyBlosxom status: 06/20/2007

Wednesday June 20, 2007 21:39, Will Kahn-Greene

I quietly released an RC2 for PyBlosxom 1.4 today after merging in Yury's patch for Paste support and merging in Steven's work for WSGI support. It's really awesome to be able to do paster serve blog.ini and have things work. At a minimum, it'll be a lot easier to test PyBlosxom--no mucking about with web-server configurations needed anymore.

We've finished moving all the documentation from docbook to reST format. It still needs a lot of work, but we're making measurable progress which is really cool. Also, the documentation is easier to work with and maybe that reduces the amount of energy it takes for people to help out.

PyBlosxom 1.4 should be released soon. Very soon. Will it be perfect? No, but it's a huge milestone for the project. That's pretty exciting in the grand scheme of things.

06/24/2007 - Fixed "server" to "serve". I keep typing paster server ... which is wrong.

Using exiftool and Python to fix photos (edit: to order them)

Wednesday June 20, 2007 10:50, Will Kahn-Greene

S and I decided to get a wedding photographer in addition to allowing our guests to take as many photos as they wanted of all aspects of our wedding (except when we were getting dressed [and ... undressed]). There were a few reasons for this, one of which was the several horror stories we've heard about people's digital media dying, causing them to lose all pictures of their wedding. Ick.

The problem is that there are a fajillion pictures and it's really hard to order them into a single consistent timeline. The wedding photographer we had [1] had four cameras and took some 800 pictures. My dad took another 100 or so. Other people took a bunch, too. Right now I'm working with 1200+ pictures, all of which are pretty big (between 5 MB and 10 MB each). It's not feasible to tweak them all by hand to order them. I didn't want to leave them unordered--my soul shudders at that thought. I needed a way to do batch processing to reorder pictures from a bunch of cameras into a nice timeline.

More after the break...


First thing I did was put all the pictures that had no EXIF information to the side. There weren't many of them and I don't see any way to batch process them.

We're now left with 1000+ pictures all of which have EXIF information that we can work with. Interesting properties of the problem:

all of the pictures have DateTimeOriginal and SerialNumber headers in the EXIF data
pictures from a single camera have a consistent timeline in the sense that if picture 2 from camera A comes before picture 3 from camera A by 14 seconds, then that's what happened
the timestamps on all the pictures from camera A are at a constant offset from the timestamps on all the pictures from camera B

This is all pretty obvious and there's nothing exciting here, but it does reduce the problem to picking a camera as a baseline and then figuring out how far off in seconds and minutes the other cameras are from the baseline. That's pretty easy.
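
If two cameras happened to catch the same moment, the offset falls out of a simple subtraction. Here's a minimal sketch of that arithmetic; the camera_offset helper and the two timestamps are made up for illustration, and it assumes the baseline is the camera whose clock is furthest behind so that every offset comes out positive.

```python
import datetime


def camera_offset(baseline_time, other_time):
    """Return (minutes, seconds) that other_time is ahead of baseline_time.

    Both arguments are datetimes read from two photos of the same moment,
    one taken with the baseline camera and one with the other camera.
    """
    delta = other_time - baseline_time
    total = int(delta.total_seconds())
    if total < 0:
        raise ValueError("this camera is behind the baseline; pick a different baseline")
    # split the total number of seconds into whole minutes and leftover seconds
    return divmod(total, 60)


# Hypothetical timestamps for the same moment as seen by two cameras.
baseline = datetime.datetime(2007, 5, 26, 16, 30, 0)
other = datetime.datetime(2007, 5, 26, 16, 31, 11)
offset = camera_offset(baseline, other)
```

With those two timestamps the other camera is 1 minute and 11 seconds ahead of the baseline. In practice I found the offsets by eyeballing thumbnails rather than computing them, so treat this as a starting guess to tweak.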

Next I got exiftool and ran it like this to rename all the files according to their timestamp and serial number:

  exiftool '-FileName

Then I copied all the files into a thumbs directory and, in the thumbs directory, used mogrify (which comes with ImageMagick) to create thumbnails:

  for i in *.jpg; do mogrify -quality 65 -geometry 150 "$i"; done


Then I did this to figure out all the serial numbers of the cameras that took these photos:

  exiftool -SerialNumber *.jpg | sort -u


Then I wrote a Python script (any language will do) to build an index of the images using timeline offsets:

import os, time, datetime

# serial number -> (offset(minute, second), color)
OFFSETS = { "1020415017": ((0, 37), "#ff5555"), # red
            "1420918126": ((1, 11), "#ff55ff"), # purple
            "1621009923": ((1, 9), "#55ff55"),  # green
            "620306618": ((0, 0), "#5555ff") }  # blue (baseline)

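def getinfo(fn):
   # filenames look like <prefix>-<HHMMSS>_<serial number>.jpg
   t = fn.split("-", 1)[1]
   t = t.split("_", 1)

   # serial number is whatever follows the "_" minus the extension
   cam = t[1]
   cam = cam.split(".")[0]

   t = t[0]

   # debugging output: filename and the raw HHMMSS portion
   print(fn, t)

   t = time.strptime(t, "%H%M%S")

   # hardcoded year, month, and day
   t = datetime.datetime(2007, 5, 26, t[3], t[4], t[5])

   offset = OFFSETS[cam][0]

   # subtract this camera's (minute, second) offset to get baseline time
   t = t - datetime.timedelta(minutes=offset[0], seconds=offset[1])
   return (t, cam, fn)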

def build():
   files = os.listdir(os.getcwd())
   files = [f for f in files if f.endswith(".jpg")]

   pics = [ getinfo(fn) for fn in files ]

   pics.sort()

   out = open("index.html", "w")
   out.write("<html><body><table border=\"1\">\n")

   for t, cam, fn in pics:
      out.write("""
<tr><td>
  <table>
    <tr><td bgcolor="#aaaaaa">name</td><td>%s</td></tr>
    <tr><td bgcolor="#aaaaaa">camera</td><td bgcolor="%s">%s</td></tr>
    <tr><td bgcolor="#aaaaaa">datestamp</td><td>%s</td></tr>
  </table>
</td><td><img src="%s"></td></tr>""" % (fn, OFFSETS[cam][1], cam, repr(t), fn))

   out.write("</table></body></html>")
   out.close()

if __name__ == "__main__":
   build()


Note that I color-code the cameras. I find this makes it really easy to eyeball the timeline without trying to distinguish between similar-looking serial numbers.


I run that in my thumbs directory and it builds an index.html page that I can look at with a web-browser. The index.html has the offsets factored in. I look through the pictures and tweak the offsets until all the cameras are consistent with one timeline. Once I have a final set of offsets, I go through the pictures for each camera and (very carefully) do this:

  exiftool "-AllDates-=0:0:0 0:M:S" *SN.jpg
                               ^ ^   ^^


replacing:

M with the minute offset,
S with the second offset, and
SN with the camera serial number

Then you do another pass at renaming files and the files should then be in the same consistent timeline and in alphabetical order by filename.
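
Typing those commands by hand is where mistakes creep in, so it can help to generate them. Here's a minimal sketch that only builds the command strings (it doesn't run anything). The FINAL_OFFSETS name and the two helper functions are made up for this example; the serial numbers and offsets are the ones from the script above.

```python
# serial number -> (minutes, seconds) the camera is ahead of the baseline;
# same numbers as the OFFSETS table in the index-building script
FINAL_OFFSETS = {
    "1020415017": (0, 37),
    "1420918126": (1, 11),
    "1621009923": (1, 9),
    "620306618": (0, 0),  # baseline, nothing to shift
}


def shift_command(serial, minutes, seconds):
    """Build the exiftool command that shifts one camera's dates back."""
    return 'exiftool "-AllDates-=0:0:0 0:%d:%d" *%s.jpg' % (minutes, seconds, serial)


def shift_commands(offsets):
    """Return commands for every camera with a non-zero offset."""
    return [
        shift_command(serial, minutes, seconds)
        for serial, (minutes, seconds) in sorted(offsets.items())
        if (minutes, seconds) != (0, 0)
    ]


for command in shift_commands(FINAL_OFFSETS):
    print(command)
```

I'd still read each command before pasting it into a shell, and keep a backup of the originals, since the shift modifies the photos' metadata.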

After I worked out the process it took a couple of hours. We had the advantage of having a couple of points during the wedding where a lot of photographs were taken and it was obvious as to what order they needed to be in.

[1] - http://www.jillgoldman.com/ -- Jill is awesome.

06/21/2007 - Changed the title to something more appropriate. I was thinking "fix" because I was modifying the EXIF metadata for each photo to put them in the correct order, but Konquest makes a good point. I also fixed one of the command lines.

06/22/2007 - bockris said in the reddit.com comments:
This is a good idea. I've had to fix the EXIF data my photos before because I changed the batteries and didn't reset the date but I've never tried to sync up multiple cameras after the fact.

If I'm ever in a similar situation as the OP, I think I have everyone take a picture of a clock with a second hand at some period during the event. That would let you easily get the time difference among all cameras.

Using register allocation algorithms for determining table layout

Tuesday June 12, 2007 14:43, Will Kahn-Greene

S and I decided to assign tables for our guests during the wedding reception. There were a bunch of good reasons for doing this which I'm not going to go into here. However, assigning 100 or so guests to 10 or so tables while maximizing "goodness" and minimizing "badness" isn't trivial to do on paper by hand. At some point during wedding planning, I decided we could do table layout with a modified register allocation algorithm. This is a quick summary of translating register allocation into table layout along with some commentary on how this works nicely and also where it doesn't quite work.

More after the jump....


Register allocation is typically done using graph-coloring with some additional heuristics that sidestep the NP-complete problem of coloring the graph exactly and instead produce a good approximation in roughly linear time. Andrew Appel talks about register allocation in chapter 11 of his book on compilers, Modern Compiler Implementation in ML. It involves a graph of nodes representing temps connected by edges that represent interference between the temps. There's also a list of move-related nodes, and nodes related by move-edges are coalesced if possible, leading the connected temps to be colored with the same register.

In table layout, there are groups of people who can't sit at the same table with other groups of people. This could be due to family issues, differences in politics, past history, ... Additionally, there are groups of people you want to sit at the same table with other groups if possible.

Let's do some substitution. We'll substitute tables for registers, people for temps, pairs of people who can't sit at the same table for interference edges, and pairs of people who would be good sitting together for move-related edges, and we have mapped the table layout problem onto the register allocation problem.
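
Written down as data, the substitution might look something like this. It's a minimal sketch: the guest names, the relationships, and the pair and check_layout helpers are all made up for illustration.

```python
def pair(a, b):
    """An undirected edge between two guests."""
    return frozenset((a, b))


# Hypothetical guests and relationships, purely for illustration.
guests = ["Ann", "Bob", "Cat", "Dan"]                     # temps
tables = ["table 1", "table 2"]                           # registers (colors)
interference = {pair("Ann", "Dan")}                       # can't share a table
move_related = {pair("Ann", "Bob"), pair("Cat", "Dan")}   # should share a table


def check_layout(layout, interference, move_related):
    """Return (violations, satisfied) for a guest -> table assignment."""
    # an interference edge is violated when both ends got the same table
    violations = [e for e in interference if len({layout[g] for g in e}) == 1]
    # a move-related edge is satisfied when both ends got the same table
    satisfied = [e for e in move_related if len({layout[g] for g in e}) == 1]
    return violations, satisfied


layout = {"Ann": "table 1", "Bob": "table 1", "Cat": "table 2", "Dan": "table 2"}
violations, satisfied = check_layout(layout, interference, move_related)
```

A layout is a valid coloring when the violations list is empty, and the more move-related edges it satisfies, the more "goodness" it has.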

I'm not going to go into the details of register allocation--Appel talks about the George and Appel algorithm from "Iterated register coalescing" (1996) for 25 pages and it'd be hard to summarize that into a blog entry.

There are a few things that don't translate well from register allocation to table layout, like spilling. In register allocation, if you can't get it to work, you spill a temp into memory and then start over. I suppose you could uninvite particularly ornery people; that's one possible mapping for spilled temps. Another possibility is that you look for the least-worst pairing and remove that edge from the graph. A third possibility is to break up a larger table into two smaller tables, giving you an extra color to work with.

Speaking of tables, there's one big difference between tables and registers: a register can be assigned to an unlimited number of temps so long as they don't interfere with one another. A table has a limited number of seats. So if you were to use a register allocation algorithm, you have the additional constraint that a limited number of people can be assigned to each table. If the register allocation implementation that you use is deterministic, this constraint could cause your situation to be unsolvable without backtracking. It would be a good idea to introduce a random element that causes assignments to occur in different orders between iterations.
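
Here's a rough sketch of what the seat limit plus a random element could look like. To be clear, this is a plain greedy coloring with random restarts, not the George and Appel algorithm, and the assign_tables function and its parameters are made up for this example.

```python
import random


def assign_tables(guests, conflicts, num_tables, seats, tries=200, seed=None):
    """Greedy assignment with random restarts.

    conflicts is a set of frozenset pairs that must not share a table.
    Returns a guest -> table index dict, or None if every try failed.
    """
    rng = random.Random(seed)
    order = list(guests)
    for _ in range(tries):
        rng.shuffle(order)  # a different assignment order on every try
        layout = {}
        occupancy = [0] * num_tables
        for guest in order:
            for table in range(num_tables):
                if occupancy[table] >= seats:
                    continue  # table is full
                clash = any(
                    frozenset((guest, other)) in conflicts
                    for other, t in layout.items()
                    if t == table
                )
                if not clash:
                    layout[guest] = table
                    occupancy[table] += 1
                    break
            else:
                break  # nowhere to seat this guest; give up on this order
        if len(layout) == len(order):
            return layout
    return None
```

Shuffling the order is a cheap substitute for real backtracking: an order that paints itself into a corner is thrown away and another one is tried.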

The move-related edges (which represent people who should be sitting together) could be prioritized and that priority could be used for selecting moves to coalesce. For example, you might want to keep couples together and perhaps families, too. Perhaps you want to put all the children at a single table that you can put close to the bathroom.
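
A sketch of prioritized coalescing, under some loud assumptions: the coalesce function, the priority numbers, and the guest names are invented for this example, and the merge test (no conflicting pair, group still fits at one table) is much simpler than the conservative coalescing tests Appel describes.

```python
def coalesce(guests, wishes, conflicts, seats):
    """Merge guests into groups, highest-priority wishes first.

    wishes is a list of (priority, guest_a, guest_b); higher priority wins.
    A merge is skipped if the combined group would not fit at one table or
    would contain a conflicting pair.
    """
    group_of = {g: frozenset([g]) for g in guests}
    for _, a, b in sorted(wishes, key=lambda w: -w[0]):
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already in the same group
        merged = ga | gb
        if len(merged) > seats:
            continue  # group would not fit at a single table
        if any(frozenset((x, y)) in conflicts for x in ga for y in gb):
            continue  # someone in one group can't sit with someone in the other
        for g in merged:
            group_of[g] = merged
    return set(group_of.values())


# Hypothetical guests: a couple, two kids, and one more friend.
guests = ["Ann", "Bob", "Kid1", "Kid2", "Eve"]
wishes = [
    (10, "Ann", "Bob"),    # couples first
    (5, "Kid1", "Kid2"),   # then the kids' table
    (1, "Bob", "Eve"),
    (1, "Ann", "Kid1"),
]
conflicts = {frozenset(("Ann", "Eve"))}
groups = coalesce(guests, wishes, conflicts, seats=3)
```

Each resulting group can then be treated as a single node when assigning tables, as long as the seat count for the whole group is taken into account.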

Theoretically, this sounds like it would work pretty well for many cases, at a minimum returning a layout that can be tinkered with. We never used this method--by the time I had finished grad school, S was already past most of the table layout issues.

It's possible the constraint on people per table handicaps the register allocation algorithm to such a degree that it would be better to use a more general purpose constraint satisfaction problem solver instead. This requires further study.
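
For comparison, here's roughly what the constraint-satisfaction view looks like as a tiny backtracking search. It's a sketch only: the solve function is made up for this example, it handles just the hard constraints (conflicts and seat limits) and ignores seating preferences, and it would need smarter ordering to cope with a large guest list.

```python
def solve(guests, conflicts, num_tables, seats):
    """Backtracking search; returns a guest -> table index dict or None."""
    layout = {}
    occupancy = [0] * num_tables

    def fits(guest, table):
        """True if the table has a free seat and nobody there conflicts."""
        if occupancy[table] >= seats:
            return False
        return not any(
            frozenset((guest, other)) in conflicts
            for other, t in layout.items()
            if t == table
        )

    def place(i):
        """Try to seat guests[i:] given the guests already seated."""
        if i == len(guests):
            return True
        guest = guests[i]
        for table in range(num_tables):
            if fits(guest, table):
                layout[guest] = table
                occupancy[table] += 1
                if place(i + 1):
                    return True
                # undo the choice and try the next table
                del layout[guest]
                occupancy[table] -= 1
        return False

    return dict(layout) if place(0) else None


# A case where seating guests first-fit in order gets stuck (c and d end up
# together) but backing up and moving b fixes it.
result = solve(["a", "b", "c", "d"], {frozenset(("c", "d"))}, num_tables=2, seats=2)
```

Unlike the coloring heuristics, this either finds a layout that satisfies every hard constraint or reports that none exists, at the cost of potentially exploring a lot of dead ends.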

Google Summer of Code 2007: PyBlosxom

Saturday May 19, 2007 12:55, Will Kahn-Greene

A couple of months ago, Blake poked me via email and suggested I put together some ideas for a PyBlosxom GSoC project under the PSF umbrella. It was a hectic time, but I threw some ideas together based on items we had in the TODO list (many of which are pretty stale at this point). I'm happy to say that we had a great proposal for building a web front end for PyBlosxom--a tool that I think a good portion of PyBlosxom users would be happy to have.

I'll be mentoring this project over the course of the summer. Z has already started working on things and I think this will turn out nicely.

As a side note, this is a huge motivator towards finishing up a release and getting out a new version of the PyBlosxom manual. On the flip side, I'm getting married in a week so I'm finding it difficult to allocate time to get the work done. Wedding planning is intense. They should use wedding planning to teach project management courses--talk about shifting requirements and general project insanity. ;)

Cleaning Up PyBlosxom Using Cheesecake on Reddit

Saturday May 19, 2007 12:47, Will Kahn-Greene

Someone posted my article Cleaning Up PyBlosxom Using Cheesecake onto programming.reddit, which was a bit of a surprise to me.

I'll keep track of the entry and the comments that it generates in case there's any feedback that would be useful for making the article better.