Jeff Balogh — GeistHaus

Jan 22, 2013 Updated Jan 22, 2013

One of the most fun projects I worked on at Mozilla was glow.mozilla.org. It was a couple weeks before the launch of Firefox 4 and my frontend buddy @potch and I decided we should make a map to track downloads in real-time. This was back when Firefox releases were a big deal. It turned out to be a really compelling visual. We saw a bunch of tweets about people loving the map, it was featured in Flowing Data and nytimes.com, and somebody even made a video about it.

Potch started working on the frontend map with SVG and canvas while I figured out how to serve the download data; we would meet in the middle. We didn’t have much experience with node.js and didn’t know what kind of traffic we’d see, so I decided to play it safe and use static json data files for a near-real-time experience. Each minute of downloads was stored in a separate file in a predictable location. When the page loaded it looked for the data file two minutes before the current time, since we knew (hoped) that it would be written and ready to serve.

Our script loaded the record of the downloads and started dropping dots on the world map wherever the download came from (according to IP addresses). No one knew the data was two minutes old and our web servers were pretty good at serving static files.

The download data was already flowing into HBase (set up by someone else), so I wrote a Python script that connected to HBase every minute and retrieved the latest data. I massaged it into a structure easier for the js to consume and wrote it all out to a per-minute json file. That script ran on a cron job since we like old reliable technology.
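The per-minute file scheme can be sketched in a few lines of Python. The directory layout and the function names here are assumptions for illustration; the real glow paths aren't shown in this post.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical layout; the actual file locations are not given in the post.
PATH_FORMAT = "data/%Y/%m/%d/%H/%M.json"


def minute_path(moment):
    """Return the data file path for the minute containing `moment`."""
    return moment.replace(second=0, microsecond=0).strftime(PATH_FORMAT)


def path_to_load(now, delay_minutes=2):
    """Return the file a freshly loaded page should request.

    Looking a couple of minutes back gives the cron job time to finish
    writing the file before anyone asks for it.
    """
    return minute_path(now - timedelta(minutes=delay_minutes))
```

Because the path depends only on the clock, the page and the cron job never need to talk to each other; they just agree on the naming rule.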

The dots were animated on a transparent canvas overlaid on the SVG world map. We had a lot of trouble making the animation performant since we were working with old browsers, Firefox 3.6 and Chrome 10. Everything worked, but it was super slow.

The Sunday before the release, I curled up with the Chrome profiler and optimized the script. The original version was thrashing the GC by removing items in the middle of our large array of data points, so I rewrote it to sort the data first and use a moving cursor to traverse the array, animating dots as it went along. Our only garbage collection came when we were done with a minute’s worth of data. I made some more tweaks to fix slow canvas and DOM interactions and we were good to go. We went from slowing down around 500 points per minute to slowing down at around 70,000 points per minute. At peak we saw 6,500 downloads per minute, so the page worked beautifully.
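The sort-then-cursor idea looks roughly like this. It's a Python sketch rather than the JavaScript the page used, and the class and method names are invented for illustration.

```python
class DotQueue:
    """Hands out points in time order without deleting from the middle."""

    def __init__(self, points):
        # Each point is a (timestamp, latitude, longitude) tuple.
        # Sort once up front so the cursor only ever moves forward.
        self.points = sorted(points)
        self.cursor = 0

    def due(self, now):
        """Return the points with timestamp <= now and advance the cursor."""
        start = self.cursor
        while (self.cursor < len(self.points)
               and self.points[self.cursor][0] <= now):
            self.cursor += 1
        return self.points[start:self.cursor]

    def finished(self):
        """True once every point has been handed out."""
        return self.cursor == len(self.points)
```

Nothing is removed from the list while it's in use; once finished() is true the whole list can be dropped in one go, which matches the single collection per minute described above.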

http://jbalogh.me/2013/01/22/the-story-of-glow

How to Add Push Notifications to your Website

Apr 5, 2012 Updated Apr 5, 2012

This week I released https://github-notifications.herokuapp.com/, a site that demos push notifications in Firefox. It uses Github web hooks to send notifications when there’s a commit to one of your repositories. Push notifications are currently implemented in Firefox as an experimental add-on.

This post shows the code I used to send push notifications from the site.

Get a Push URL

When you give a website permission to send push notifications, Firefox asks the Push Notification Service to create a URL for the site to contact you. That URL is returned through the mozNotification javascript API.

var notification = navigator.mozNotification;
if (notification && notification.requestRemotePermission) {
  var request = notification.requestRemotePermission();
  request.onsuccess = function() {
    var url = request.result.url;
    jQuery.post("/add-push-url", {"push-url": url});
  }
}

This code checks that navigator.mozNotification exists and then asks for permission to send notifications using requestRemotePermission(). When the onsuccess event fires, the callback POSTs your push URL back to the server.

(You can play with the mozNotification API by installing the add-on.)

Save the Push URL

The backend of my site is a simple Flask app. The User model stores the username and push URL:

class User(Model, db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)
    push_url = db.Column(db.String(256), nullable=True)

Users are created with an empty push_url after they connect to Github through OAuth. The push_url is filled in by calling the /add-push-url view:

@app.route('/add-push-url', methods=['POST'])
def add_push_url():
    username = session['username']
    user = User.query.filter_by(username=username).first_or_404()
    user.push_url = request.form['push-url']
    db.session.add(user)
    db.session.commit()
    notify(user.push_url,
           'Welcome to Github Notifications!',
           'So glad to have you %s.' % user.username)
    return ''

Send a Notification

Sending a notification is as easy as POSTing to the push URL.

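import requests

def notify(push_url, title, body, action_url=None):
    msg = {'title': title,
           'body': body}
    # actionUrl is only included when an action URL is given.
    if action_url:
        msg['actionUrl'] = action_url
    # A dict passed as data is sent as url-encoded form parameters.
    requests.post(push_url, data=msg)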

Using the requests library, notify sends a message back to the user’s push URL as a string of url-encoded parameters. After that, the push notification system takes care of getting the message to the user’s browser.
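To see what "url-encoded parameters" means on the wire, here's a small sketch using only the standard library. The build_message helper is hypothetical and exists just for this example; the encoded string is the same form encoding requests applies to a dict passed as data.

```python
from urllib.parse import urlencode


def build_message(title, body, action_url=None):
    """Return the form fields for a notification as a dict."""
    msg = {'title': title, 'body': body}
    if action_url:
        msg['actionUrl'] = action_url
    return msg


# The dict encoded the way an HTML form submission would be.
encoded = urlencode(build_message('New commit', 'jbalogh pushed to master'))
```

Spaces become plus signs and reserved characters in the action URL are percent-escaped, so the message survives the trip as a plain POST body.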

That’s all the code needed to send push notifications to Firefox; we’re trying to keep the system as simple as possible for developers. In the coming months the add-on will be integrated into the browser and our Push Notification Service will go live. If you have questions, feel free to contact me over email or twitter.

http://jbalogh.me/2012/04/05/how-to-add-push-notifications-to-your-site

Introducing Push Notifications

Jan 30, 2012 Updated Jan 30, 2012

Push notifications are a way for websites to send small messages to users when the user is not on the site. iOS and Android devices already support their own push notification services, but we want to make notifications available to the whole web. We’re making prototypes and designing the API right now and want to share our progress.

How it works

The website gets a URL where it can send notifications to the user. The URL points to the Notification Service, and is a secret between the user and the website.
The site sends a notification to the Notification Service.
The Notification Service delivers the message to Firefox on the desktop, on Android, on Boot to Gecko, or on iOS through Firefox Home; we’ll find the right place to deliver the message.

To start sending push notifications, a website needs to ask the user for permission. Here’s some example code:

var notification = navigator.mozNotification;
if (notification && notification.requestRemotePermission) {
  // Ask the user to allow notifications.
  var request = notification.requestRemotePermission();
  request.onsuccess = function() {
    var url = request.result;
    console.log('New push URL: ' + url);
    // We got a new push URL, store it on the server.
    jQuery.post('/push-urls/', {url: url});
  };
}

The notification API will live at navigator.mozNotification until it gets standardized. First we get the API object and check that it exists. If it’s there we ask the user for permission to send notifications using notification.requestRemotePermission(), which returns an object we use to watch for events.

If the user grants permission, the browser will talk to the Notification Service and grab a new URL that links our site to the user. Every site/user pair gets a unique URL.

The URL is available in the onsuccess callback as request.result and should be sent back to the server and stored for future use.

On the Server

Now that we have a URL, we can send messages from our servers to the Notification Service.

POST /some-queue-url HTTP/1.1
Content-Type: application/json

{"iconUrl": "http://example.com/shipped.png",
 "title": "Your package has shipped.",
 "body": "We shipped your package at 10am this morning.",
 "actionUrl": "http://example.com/order-status",
 "replaceId": "order-status"}

iconUrl: URL of the icon to be shown with this notification.
title: Primary text of the notification.
body: Secondary text of the notification.
actionUrl: URL to be opened if the user clicks on the notification.
replaceId: A string which identifies a group of like messages. If the user is offline, only the last message with the same replaceId will be sent when the user comes back online.
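The replaceId behavior can be modeled in a few lines. This is only an illustration of the rule described above, not the Notification Service's code, and the coalesce function is a made-up name.

```python
def coalesce(queued):
    """Keep only the newest queued message for each replaceId.

    `queued` is a list of message dicts in arrival order. Messages
    without a replaceId are never merged with anything.
    """
    latest = {}  # replaceId -> index of the newest message with that id
    for index, msg in enumerate(queued):
        replace_id = msg.get('replaceId')
        if replace_id is not None:
            latest[replace_id] = index
    return [msg for index, msg in enumerate(queued)
            if msg.get('replaceId') is None
            or latest[msg['replaceId']] == index]
```

With this rule, a user who was offline while an order moved from "received" to "shipped" only sees the "shipped" message when they come back.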

Once the notification is in the system, we’ll deliver it to the recipient on all the devices they have Firefox installed, but we’ll try not to show duplicate notifications on different devices.

For a more detailed description, please check out our wiki page.

Update: If you’d like to give feedback please email me, find me on twitter, or reply to this post on mozilla.dev.platform.

http://jbalogh.me/2012/01/30/push-notifications

Cache Machine: Automatic caching for your Django models

Feb 9, 2010 Updated Feb 9, 2010

Cache Machine hooks into Django’s ORM to provide easy object caching and invalidation.

One of our primary goals with the rewrite of addons.mozilla.org was to improve our cache management. Large sites like AMO rely on layers of caching to stay afloat, and caching database queries in memcached is one of our favorite tools.

AMO heavily favors reads over writes, so we have great cache performance; the hit rate ranges from 90%-98%. However, once something is in the cache, it’s stuck there until timeout (60 minutes). Combined with front-end caching, this can mean it’s a couple of hours before add-on developers see their changes roll out to the site. We don’t like that.

For zamboni, our Django-based rewrite, seamless object caching and invalidation was my first project. Today we released Cache Machine as a drop-in library for use in any Django application. The package is available on pypi and the code is on github.

Usage

Here’s a cache-enabled model:

from django.db import models

import caching.base

class Zomg(caching.base.CachingMixin, models.Model):
    val = models.IntegerField()

    objects = caching.base.CachingManager()

The first step is to inherit from CachingMixin. This mixin gives your model post_save and post_delete signal handlers that will flush the object from cache. It also adds a cache_key property that helps invalidate the object properly.

Then you replace the default manager with CachingManager. Instead of a normal QuerySet, this manager returns CachingQuerySets which try to pull objects from cache before performing a database lookup.

How it works

Cache Machine knows how to cache normal QuerySets and RawQuerySets. Each QuerySet is keyed by the active locale and the SQL of the underlying query. The CachingQuerySet wraps around QuerySet.iterator() to check cache before hitting the database.
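As a rough sketch of keying a query by locale and SQL: the key format and the query_key name below are assumptions for illustration, and Cache Machine's real keys may be built differently.

```python
import hashlib


def query_key(locale, sql):
    """Build a cache key from the active locale and the query's SQL."""
    raw = '%s:%s' % (locale, sql)
    # Hash so the key stays short and free of spaces, which memcached
    # keys may not contain.
    return 'q:' + hashlib.md5(raw.encode('utf-8')).hexdigest()
```

The same SQL under two locales gets two separate cache entries, so localized results never leak across languages.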

Invalidation is the interesting part. As we iterate through a set of objects in a database result, we create a “flush list” for each object. The flush list maps each object to a list of cached queries it’s a member of. When an object is invalidated in the post_save signal, all of the queries it was a part of are immediately invalidated.
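Here's a toy version of flush lists, with plain dicts standing in for memcached. The function names are hypothetical and this isn't the library's internals, just the shape of the idea.

```python
from collections import defaultdict

cache = {}                      # stand-in for memcached: key -> cached value
flush_lists = defaultdict(set)  # object key -> keys of queries containing it


def cache_query(qkey, object_keys):
    """Store a query result and record it in each object's flush list."""
    cache[qkey] = object_keys
    for obj_key in object_keys:
        flush_lists[obj_key].add(qkey)


def invalidate(obj_key):
    """Drop every cached query the object was a member of."""
    for qkey in flush_lists.pop(obj_key, set()):
        cache.pop(qkey, None)
```

Calling invalidate from a save or delete signal handler is what keeps stale query results from lingering until the timeout.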

Parent and child foreign-key relationships are also tracked in the flush lists. If a parent object changes, its flush list will be invalidated along with all children that point to it, and vice versa.

Issues

Only the memcached and locmem backends are supported. Cache Machine relies on infinite cache timeouts for storing flush lists, but none of Django’s builtin backends support this (even though the memcached server does). We wrap the memcached and locmem backends to fix the infinite timeout issue, but file and database backends aren’t implemented since they’re not useful to us.

Cache Machine does not cache values() and values_list() calls. Since these methods don’t return full objects, we can’t know how to invalidate them properly. They could be overridden to do normal lookups and then pull out the results, but I haven’t gotten around to that yet.

count() queries will not be cached. These can’t be invalidated efficiently. I recommend denormalizing your tables and adding a count field if you need to access it often. Update: limited count caching was enabled in this commit.

Cache Machine has a few log.debug calls in its caching and invalidation internals. These work fine with zamboni, since we set up our logging on startup. I don’t know if these calls will be problematic without our logging config. Let me know.

http://jbalogh.me/2010/02/09/cache-machine

Highlights from DjangoCon 2009

Oct 20, 2009 Updated Oct 20, 2009

Long (long!) overdue, here’s a bunch of links that point to the good things I learned at DjangoCon.

Restful Ponies

Started with a basic REST overview, gets interesting when it shows how you can easily expose resources using Piston
django-roa builds on Piston to give models a basic REST API for free
remoteobjects is an object-restational mapper that maps objects to REST apis on the web. It’s really cool and I’m playing with it for the new Bugzilla rest view.

Deploying Django

Talks about how they’re doing “repeatable, automated, isolated” deployments using Python tools. Don’t you wish we were rocking this on AMO?

virtualenv and pip to keep the deployment environment isolated
fabric to script vcs checkout and installation
using lightweight tags in git to mark deployment versions
up and down migrations with South to stay safe
mod_wsgi daemon mode is preferred, fastcgi is cool too

Scaling Django

Pownce was serving hundreds of requests/sec and thousands of db ops/sec, so it can be done. It’s simple to do automatic caching and invalidation when your queries are going through the ORM, since Django’s signals decouple the invalidation process from your update code. Using multiple databases with Django is not straightforward, but it’s getting easier with an SoC project that’s ready to be merged in.

sqlparse pretty prints SQL queries

The Realtime Web

Showed how they created an IRC client in the browser over comet with Django on the backend. The browser and the app keep “persistent” connections to an orbited server that handles all the messy details so the app can pretend it’s really connected directly to the browser.

This coincided with the release of FriendFeed’s Tornado web server, which led to an interesting focus on async during the conference.

Using Django in Non-Standard Ways

Covers a lot of little hurdles that people might consider show-stoppers when using Django, and how to overcome them. WSGI middleware is fun:

repoze.bitblt: automatically scales images
repoze.squeeze: Merges JS/CSS automatically based on statistical analysis
repoze.profile: Aggregates Python profiling data across all requests, and provides an html frontend for viewing the data
repoze.slicer: extract/filter pieces of an html response

Pinax Tutorial

Pinax tries to be a collection of reusable apps that work well together, but it looks like you spend more time trying to configure things than actually making useful apps.

http://jbalogh.me/2009/10/20/djangocon-2009

The worst schema versioning system, ever?

May 23, 2009 Updated May 23, 2009

schematic talks to your database over stdin on the command line. Thus, it supports all DBMSs that have a command line interface and doesn’t care what programming language you worship. Win!

It only looks for files in the same directory as itself so you should put this script, settings.py, and all migrations in the same directory.

Configuration is done in settings.py, which should look something like:

# How to connect to the database
db = 'mysql --silent -p blam -D pow'
# The table where version info is stored.
table = 'schema_version'

It’s python so you can do whatever crazy things you want, and it’s a separate file so you can keep local settings out of version control.

Migrations are just sql in files whose names start with a number, like 001-adding-awesome.sql. They’re matched against '^\d+' so you can put zeros in front to keep ordering in ls happy, and whatever you want after the migration number, such as text describing the migration.

schematic creates a table (named in settings.py) with one column, that holds one row, which describes the current version of the database. Any migration file with a number greater than the current version will be applied to the database and the version tracker will be upgraded. The migration and version bump are performed in a transaction.

The version-tracking table will initially be set to 0, so the 0th migration could be a script that creates all your tables (for reference). Migration numbers are not required to increase linearly.
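The selection rule is simple enough to sketch. This is a minimal illustration of the matching and ordering described above, not schematic's actual source; pending_migrations is a made-up name.

```python
import re

# Same pattern the text mentions: a filename that starts with digits.
MIGRATION_RE = re.compile(r'^\d+')


def pending_migrations(filenames, current_version):
    """Return (number, filename) pairs newer than current_version, in order."""
    found = []
    for name in filenames:
        match = MIGRATION_RE.match(name)
        if match:
            # int() drops leading zeros, so 001 and 1 are the same version.
            number = int(match.group())
            if number > current_version:
                found.append((number, name))
    return sorted(found)
```

Gaps in the numbering don't matter because only the comparison against the stored version is used.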

schematic doesn’t pretend to be intelligent. Running migrations manually without upgrading the version tracking will throw things off.

Tested on sqlite and mysql.

NOTE: any superfluous output, like column headers, will cause an error. On mysql, this is fixed by using the --silent parameter.

Things that might be nice: downgrades, running python files.

http://jbalogh.me/2009/05/23/schematic

Introducing poboy

May 14, 2009 Updated May 14, 2009

I’d be surprised if poboy is useful to anyone I don’t work with, but I wrote a README, so that should be shared with the internet.

Finds all the gettext calls that have an inline fallback and moves that fallback into the messages.po file. Thus, you can use ___('msgid', 'msgstr') when you’re writing new code and use this script to clean up afterwards.

poboy won’t edit any code files. Instead, it prints out a unified diff that you can check for correctness and send to patch. I didn’t want to deal with rewriting files safely.
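The print-a-diff approach is easy to reproduce with difflib. This sketch is not poboy's code: the regex only handles simple single-quoted strings, and it assumes the cleaned-up call keeps just the msgid.

```python
import difflib
import re

# Hypothetical pattern: matches ___('msgid', 'fallback') with simple
# single-quoted strings and captures the msgid.
FALLBACK_RE = re.compile(r"___\('([^']*)',\s*'[^']*'\)")


def cleanup_patch(filename, source):
    """Return a unified diff that strips inline fallbacks from `source`."""
    cleaned = FALLBACK_RE.sub(r"___('\1')", source)
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        cleaned.splitlines(keepends=True),
        fromfile=filename, tofile=filename)
    return ''.join(diff)
```

The function never touches the file on disk; the caller reviews the diff and feeds it to patch, which is the safe part of the design.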

How I use it

Find all the strings that have a fallback:

poboy locale/en_US/LC_MESSAGES/messages.po --find

Find the strings with a fallback that aren’t already in messages.po:

poboy locale/en_US/LC_MESSAGES/messages.po -an

That’s -a for --add (to the .po file) and -n for --dry_run.

Show the strings that will be added and the cleanup patch:

poboy locale/en_US/LC_MESSAGES/messages.po -n

And the fun one, add the strings to messages.po and generate a cleanup patch:

poboy locale/en_US/LC_MESSAGES/messages.po > poboy.patch

http://jbalogh.me/2009/05/14/poboy

pyquery: a jquery-like library for python

Mar 24, 2009 Updated Mar 24, 2009

pyquery is a fantastic little library for dealing with XML and HTML documents. It brings the power and ease of jQuery into Python, letting you deal with CSS selectors and functions instead of a clunky DOM. I try to avoid dealing with XML as much as possible, but slinging around pyquery almost makes XML fun.

Building lxml

The hardest part of working with pyquery is getting it installed. pyquery gets all of its XML power from lxml, which has a reputation for being difficult. Ian Bicking mentioned that lxml 2.2 has become much easier to install by providing an option to compile the troublesome C libs as static libraries, which has avoided any problems for me. All you need to do is define STATIC_DEPS=true in the build environment:

STATIC_DEPS=true pip install pyquery

This has worked for me on OS X with pip, easy_install, buildout, and probably anything else based on distutils.

Web Scraping

Web scraping is ridiculously easy with pyquery. Grabbing a Shakespearean insult from the web is as simple as

import pyquery

p = pyquery.PyQuery('http://www.pangloss.com/seidel/Shaker/')
insult = p('font').text()

Finding the insult on that page is aided by the author’s semantic font tag.

Testing

I like to make sure that my views are working correctly, another task in which I’m finding pyquery indispensable. I’ve seen regexen used for the same task, but examining a real DOM is much more resilient than trying to pick out pieces by matching strings. Testing views is especially useful when dealing with template systems like Django’s and Jinja’s which silently hide errors instead of raising exceptions.

assert d('#stats').text() == '5 tests: +2 -3'

I’ve noticed that testing the HTML in this manner has improved my semantic markup. Pulling out and testing pieces of the page forces me to add meaningful ids and classes to the elements.

Bonus

For extra HTML goodness, the tests submit response pages to the w3c Validator using this multipart form encoder. Then, of course, I use pyquery to make sure all is well.

validator = post_multipart('validator.w3.org', '/check',
                           {'fragment': response.data})
assert pyquery.PyQuery(validator)('#congrats')

http://jbalogh.me/2009/03/24/pyquery

Nose Test Runner for Django

Nov 2, 2008 Updated Nov 2, 2008

Update: you can now find django-nose on pypi and github with much better documentation.

I am not a big fan of Python’s unittest library. The Java-inspired API and the difficulty of running tests are too much for me to deal with. That’s why I love nose: I can use regular asserts (or the Pythonic helpers in nose.tools) and running all my tests is as simple as calling nosetests from the command line. On top of that, nose also supports cool plugins like generating coverage reports and running tests interactively, test fixtures at any granularity level, and simple selection of tests to run, making me a happy tester.

Which is why I wrote a custom test runner as soon as I started working on basie. Django provides its own test runner framework, but it’s far less advanced than nose.

I haven’t packaged it up for PyPI yet, but you can download nose_runner.py from our repository. Here’s the documentation:

Django test runner that invokes nose.

Usage:
    ./manage.py test DJANGO_ARGS -- NOSE_ARGS

The 'test' argument, and any other args before '--', will not be passed to
nose, allowing django args and nose args to coexist.

You can use

    NOSE_ARGS = ['list', 'of', 'args']

in settings.py for arguments that you always want passed to nose.
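The argument splitting comes down to finding the first '--'. Here's a small sketch of that step; split_args is a hypothetical name and this isn't copied from nose_runner.py.

```python
def split_args(argv):
    """Split argv into (django_args, nose_args) at the first '--'."""
    if '--' in argv:
        index = argv.index('--')
        # Everything before '--' stays with Django, the rest goes to nose.
        return argv[:index], argv[index + 1:]
    return argv, []
```

For example, ./manage.py test --verbosity=2 -- -x --pdb would give Django the verbosity flag and hand -x and --pdb to nose.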

http://jbalogh.me/2008/11/02/nose-test-runner-for-django