acb's technical journal

Coding the Australian senate preference visualisation

In the run-up to the recent Australian federal election, a record number of candidates lodged preference tickets for the Senate. To help untangle the preferences, I decided to grab the data, analyse it and write an interactive web-based visualisation, using Python to obtain and process the data and the D3 JavaScript library to code the visualisation. Here are some details of how I did this.

Collecting and processing the preference data

I found the data on the ABC's election site, where it was available for public viewing. Unfortunately, while it was browseable, it was only available as HTML tables rather than in a more machine-friendly form (such as JSON or CSV) suitable for immediate number-crunching, so it needed some work to convert.

There is a Python library named BeautifulSoup, which excels at parsing and scraping HTML; with it, one can write a script that traverses an HTML document and extracts order from its structure. So I wrote a script, extract_prefs.py, which fetches the state-specific pages from the ABC's site, goes through the sections for the party-specific voting tickets, extracts the figures from the HTML tables they are presented in and coalesces them into a set of dictionaries, each summarising the average preference which some party or ticket A gives to the candidates of a party or ticket B, and finally writes the whole lot to a JSON file.
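
To give a flavour of the traversal involved, here is a minimal sketch. It assumes a page laid out as <h2> group headings, each followed by a table whose rows hold a candidate, a party and a preference number. BeautifulSoup isn't part of Python's standard library, so this sketch swaps in the built-in html.parser module instead; TicketTableParser, read_tickets and the sample markup are all hypothetical, not the actual contents of extract_prefs.py.

```python
from html.parser import HTMLParser


class TicketTableParser(HTMLParser):
    """Collect the <td> rows of each table, grouped under the preceding <h2>."""

    def __init__(self):
        super().__init__()
        self.tickets = {}      # heading text -> list of row tuples
        self._heading = None   # heading currently in scope (None before the first)
        self._in_h2 = False
        self._row = None       # cells of the row being read, or None
        self._cell = None      # text of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H2> arrives as "h2".
        if tag == "h2":
            self._in_h2 = True
            self._heading = ""
        elif tag == "tr" and self._heading is not None:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = ""

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
            self._heading = self._heading.strip()
            self.tickets.setdefault(self._heading, [])
        elif tag == "td" and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows (<th> cells only) leave this empty
                self.tickets.setdefault(self._heading, []).append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        if self._in_h2:
            self._heading += data
        elif self._cell is not None:
            self._cell += data


def read_tickets(html):
    """Return {heading: [(candidate, party, preference), ...]} for one page."""
    parser = TicketTableParser()
    parser.feed(html)
    parser.close()
    return parser.tickets


SAMPLE = """
<h2>Future Party (Ticket 1 of 2)</h2>
<table>
  <tr><th>Candidate</th><th>Party</th><th>Preference</th></tr>
  <tr><td>A. Smith</td><td>Future Party</td><td>1</td></tr>
  <tr><td>B. Jones</td><td>The Greens</td><td>2</td></tr>
</table>
"""
```

Calling read_tickets(SAMPLE) yields one entry keyed by the heading text, holding the two data rows as string tuples; converting the preference strings to numbers is left to the next stage.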

The script itself is fairly straightforward. The function readstate fetches a state's data from the ABC website and parses it with BeautifulSoup; the page consists of a set of <H2> headings containing party ticket names, each followed by a table containing that party's preferences (each row naming a candidate, a party and an order of preference). readstate returns a dictionary mapping each group name to a list of table row data. We then need to process this data to transform the group names into party names (where present); this is slightly complicated by the fact that a party may lodge several alternative preference tickets. We deal with this by finding headings containing text like “(Ticket 1 of 3)”, stripping this text out and averaging the party's tickets together. We also clean up the party names somewhat, using a hand-assembled dictionary to coalesce minor variations (e.g., “The Greens”, “Greens” and “Australian Greens”) into the same name. Finally, we crunch the rows in the tickets, summing and averaging the preferences for each party's candidates, and write the results out to our JSON file. For the purposes of the visualisation, we also keep track of which states each party has a presence in.
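
The ticket-merging and averaging steps might look something like the following sketch. The names clean_name, average_prefs and to_long_form, the alias entries and the "given"/"received" keys are all hypothetical. It assumes every ticket numbers every candidate, so that pooling the rows of a party's tickets gives the same averages as averaging each ticket separately and then averaging those.

```python
import re
from collections import defaultdict

# Strips a suffix such as "(Ticket 1 of 3)" from a group heading.
TICKET_SUFFIX = re.compile(r"\s*\(Ticket \d+ of \d+\)\s*$")

# Illustrative alias table; a real hand-assembled one would be much longer.
ALIASES = {
    "The Greens": "Australian Greens",
    "Greens": "Australian Greens",
}


def clean_name(name):
    name = TICKET_SUFFIX.sub("", name.strip())
    return ALIASES.get(name, name)


def average_prefs(tickets):
    """tickets: heading -> [(candidate, party, preference), ...].

    Returns giver -> {recipient: average preference}. A party's alternative
    tickets clean to the same name, so their rows are pooled together.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for heading, rows in tickets.items():
        giver = clean_name(heading)
        for _candidate, party, pref in rows:
            recipient = clean_name(party)
            totals[giver][recipient] += float(pref)
            counts[giver][recipient] += 1
    return {
        giver: {r: totals[giver][r] / counts[giver][r] for r in totals[giver]}
        for giver in totals
    }


def to_long_form(avg):
    """Build 'given' and 'received' tables of {"pref", "name"} objects."""
    given = {
        giver: sorted(({"pref": p, "name": r} for r, p in prefs.items()),
                      key=lambda d: d["pref"])
        for giver, prefs in avg.items()
    }
    received = defaultdict(list)
    for giver, prefs in avg.items():
        for recipient, p in prefs.items():
            received[recipient].append({"pref": p, "name": giver})
    for entries in received.values():
        entries.sort(key=lambda d: d["pref"])
    return {"given": given, "received": dict(received)}
```

For example, two tickets from one party that rank its own two candidates 1 and 2, and a Greens candidate 3 on one ticket and 5 on the other, average out to 1.5 for the party itself and 4.0 for the Greens.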

To be precise, extract_prefs.py creates two JSON files. The first, named avgprefs.json, holds the data in long form: it consists of two dictionaries (one indexed by the party giving the preferences and one by the party receiving them), each mapping a party to an array of (party name, average preference) objects, like so:

"Future Party": [
  {"pref": 1.5, "name": "Future Party"}, 
  {"pref": 3.5, "name": "The Wikileaks Party"}, 
  {"pref": 5.5, "name": "Building Australia Party"}, 
  {"pref": 7.5, "name": "Help End Marijuana Prohibition"}, 
  {"pref": 9.5, "name": "Secular Party of Australia"}, 
  {"pref": 11.5, "name": "Bullet Train for Australia"},
  . . .
Of course, this is a somewhat inefficient way of sending data to a web browser: the parties' names vary considerably in length, and each one would appear roughly 200 times. As such, we take one additional step to compress the file for online transmission: we make a sorted list of the party names, put that in the JSON object, and everywhere else replace each party name with its index in the list. This more concise file is written to prefdata.json, and is about 2/3 the size of the uncompressed data. (We keep the uncompressed file because Python scripts used elsewhere in the process read it, and I was too lazy to update them.)
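
The name-to-index compression can be sketched as below, assuming the long-form data is a dictionary of tables like the snippet above. The "names" key and the compress, expand and to_json helpers are hypothetical; expand shows the inverse mapping the browser-side code would have to perform, written in Python here purely so the round trip can be checked.

```python
import json


def compress(long_form):
    """Replace every party name with its index in a sorted list of names."""
    names = set()
    for table in long_form.values():
        for party, entries in table.items():
            names.add(party)
            names.update(e["name"] for e in entries)
    ordered = sorted(names)
    index = {name: i for i, name in enumerate(ordered)}
    packed = {"names": ordered}
    for key, table in long_form.items():
        # JSON object keys must be strings, so the indices become "0", "1", ...
        packed[key] = {
            str(index[party]): [{"pref": e["pref"], "name": index[e["name"]]}
                                for e in entries]
            for party, entries in table.items()
        }
    return packed


def expand(packed):
    """Inverse of compress(): turn indices back into party names."""
    names = packed["names"]
    return {
        key: {names[int(k)]: [{"pref": e["pref"], "name": names[e["name"]]}
                              for e in entries]
              for k, entries in table.items()}
        for key, table in packed.items() if key != "names"
    }


def to_json(long_form):
    """Serialise the compressed form without optional whitespace."""
    return json.dumps(compress(long_form), separators=(",", ":"))
```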

An aside: order of key parties

It is at this stage that we can have some fun with the data. While there are roughly 100 parties (varying by state), one can isolate a handful of parties which people know and often have strong opinions on, and use the order in which they appear in another party's preferences as a proxy for that party's ideological affinity or worldview. While the precise basket of marker parties is somewhat subjective, I chose the three major parties (the ALP and the two Coalition parties), the Greens and One Nation, and wrote a script to go through the preference data and categorise all the parties by the order in which these five markers appear in their preferences. The result, along with some analysis and speculation, appears in a post on my other blog. The latest version of the interactive visualisation duplicates this functionality somewhat by allowing the user to limit the preferences shown to a basket of user-selected parties.
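
A minimal sketch of the categorisation, assuming the giver-to-recipient average-preference dictionaries described earlier. The marker labels are placeholders rather than the exact names used in the ABC data, and marker_order and group_by_marker_order are hypothetical names. Because Python's sort is stable, ties are broken by the order of MARKERS.

```python
from collections import defaultdict

# Hypothetical labels for the five marker parties.
MARKERS = ("ALP", "Liberal", "National", "Greens", "One Nation")


def marker_order(prefs, markers=MARKERS):
    """Markers present in `prefs`, sorted from most to least favoured."""
    present = [m for m in markers if m in prefs]
    return tuple(sorted(present, key=lambda m: prefs[m]))


def group_by_marker_order(avg, markers=MARKERS):
    """avg: giver -> {recipient: average preference}.

    Returns {marker ordering: [parties sharing that ordering]}.
    """
    groups = defaultdict(list)
    for party, prefs in avg.items():
        groups[marker_order(prefs, markers)].append(party)
    return dict(groups)
```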

Visualising the data

After this, I started building the visualisation, which was designed as follows:

  1. Most of the screen would be taken up by a two-dimensional space in which parties would be plotted as points, with their proximity to one another reflecting how favourably they preferenced each other. The hope was that parties bound by affinities or backroom deals would end up clustering together in telltale clumps of ideology or self-interest. The points would be annotated with abbreviated party names.
  2. Clicking on a party would bring up a detailed view showing which states it is running in, its given and received preferences displayed as bar charts, and its “preference buddies” (the parties it is closest to) and “adversaries” (the parties it is furthest from).

Requirement 1 above is a force-directed graph, which can be laid out algorithmically. The general algorithm involves creating a graph whose nodes represent the parties and whose edges (lines between the nodes) represent the connections between them, with the strength of each connection inversely proportional to the average preference one party gives the other (so a low, i.e. favourable, preference number makes a strong connection). The nodes are then placed at random positions and the algorithm runs repeatedly, adjusting their positions according to the attractive forces of the edges and a repulsive force between the nodes, until the whole thing more or less stabilises (which is why the graph spins when the page loads). Luckily, D3 has code specifically implementing force-directed graphs, which does most of the work for you; you just need to provide the lists of nodes and edges.
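
To make the algorithm concrete, here is a naive spring-electrical layout in plain Python. It is not D3's implementation, which is far more efficient and configurable; the function name, force constants and per-step movement cap are assumptions chosen only so that a tiny graph settles reliably. Links attract in proportion to their strength and length, and every pair of nodes repels with an inverse-square force, so strongly linked parties end up closer together.

```python
import math
import random


def force_layout(n, links, iterations=500, charge=1.0, step=0.02,
                 max_move=0.1, seed=1):
    """Spring-electrical layout of n nodes.

    links: (i, j, strength) triples; strength would be 1 / average preference.
    Returns a list of (x, y) positions.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n)]
        # Every pair of nodes repels with an inverse-square "charge".
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                d = math.sqrt(d2)
                fx = charge / d2 * dx / d
                fy = charge / d2 * dy / d
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        # Each link pulls its ends together in proportion to strength * length.
        for i, j, s in links:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            force[i][0] += s * dx
            force[i][1] += s * dy
            force[j][0] -= s * dx
            force[j][1] -= s * dy
        # Move each node along its net force, capping the step for stability.
        for p, (fx, fy) in zip(pos, force):
            mx, my = step * fx, step * fy
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            p[0] += mx
            p[1] += my
    return [tuple(p) for p in pos]
```

With three nodes, a link of strength 1 between the first two and links of strength 0.1 from each of them to the third, the first two settle markedly closer to each other than either does to the third.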

The nodes and edges, incidentally, are created by another Python script, makefdg.py, which parses the avgprefs.json file created earlier and produces an fdg.json file containing the party nodes, the links between them and the colour of each node.
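
The layout of fdg.json isn't described here, so the following sketch of the core of such a script is guesswork. It assumes D3-style node and link lists in which links refer to nodes by index, and it symmetrises each pair of parties by averaging the two directions' preferences where both exist. make_fdg and the field names are hypothetical, and colours are left to a separate step. Preferences start at 1, so 1 / preference never divides by zero.

```python
def make_fdg(avg):
    """Build hypothetical {"nodes": [...], "links": [...]} graph data.

    avg: giver -> {recipient: average preference}, with preferences >= 1.
    """
    names = sorted(set(avg) | {r for prefs in avg.values() for r in prefs})
    index = {name: i for i, name in enumerate(names)}
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # each unordered pair once, no self-links
            both_ways = (avg.get(a, {}).get(b), avg.get(b, {}).get(a))
            prefs = [p for p in both_ways if p is not None]
            if prefs:
                mean = sum(prefs) / len(prefs)
                links.append({
                    "source": index[a],
                    "target": index[b],
                    # Favourable (low) preferences give strong links.
                    "strength": 1.0 / mean,
                })
    return {"nodes": [{"name": n} for n in names], "links": links}
```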

How are the nodes coloured? Well, the process is partially subjective and partially algorithmic, and has several stages:

  • A handful of parties have (typically bold, primary) colours associated with them. These parties keep those colours.
  • For each other party, the colour is initially set to the sum of these parties' colours, each scaled by its proximity (in terms of the square root of average preference) to the party in question.
  • Additionally, keywords in party names affect the hue of the colour; e.g., parties whose names contain the word “socialist” are given a reddish hue, while those containing “Christian” are given a bluish hue.
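
The three stages above could be sketched like this. The anchor parties, their RGB values, the keyword hues and the pull factor are all placeholders, as is the name blend_colour. One deliberate deviation: the weighted colours are divided by the total weight (a weighted average rather than a raw sum) so that each component stays within 0 to 1.

```python
import colorsys
import math

# Hypothetical anchor parties and colours (RGB components in 0..1).
ANCHORS = {"Party Red": (0.9, 0.1, 0.1), "Party Blue": (0.1, 0.2, 0.9)}
# Hypothetical keyword -> hue table (hue in 0..1 around the colour wheel).
KEYWORD_HUES = {"socialist": 0.0, "christian": 0.62}


def blend_colour(party, prefs, anchors=ANCHORS, keyword_hues=KEYWORD_HUES,
                 pull=0.5):
    """Colour a party from its average preferences `prefs` for the anchors."""
    if party in anchors:
        return anchors[party]
    # Weight each anchor by 1 / sqrt(average preference): closer counts more.
    weights = {a: 1.0 / math.sqrt(prefs[a]) for a in anchors if a in prefs}
    total = sum(weights.values())
    if total == 0:
        return (0.5, 0.5, 0.5)  # no anchor preferences: neutral grey
    # Normalised weighted sum, so each component stays within 0..1.
    rgb = [sum(w * anchors[a][k] for a, w in weights.items()) / total
           for k in range(3)]
    h, l, s = colorsys.rgb_to_hls(*rgb)
    for word, hue in keyword_hues.items():
        if word in party.lower():
            # Move the hue part-way towards the keyword's hue, taking the
            # shorter way round the colour wheel.
            delta = (hue - h + 0.5) % 1.0 - 0.5
            h = (h + pull * delta) % 1.0
    return colorsys.hls_to_rgb(h, l, s)
```

A party that preferences the red anchor 1st and the blue one 100th comes out mostly red; one that places them equally gets a purple blend, which the “socialist” keyword then pushes towards red.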

Finally, I made an editorial decision to darken the colours of parties in proportion to their affinity to two openly racist/xenophobic parties: one synonymous with opposition to Asian immigration over the past decade or two, and one with roots in neo-Nazi groups. I justify this decision on the grounds that people of almost all political persuasions would either find these parties objectionable or would pretend to do so in polite company. This was the only such editorial decision I made in the colouring.
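
The darkening step might be as simple as scaling each RGB component towards black. In this sketch, which is an assumption rather than the actual formula, affinity is 1 when the two marker parties receive preference 1 and 0 when they come last; the function name darken, the marker names and the strength factor are placeholders.

```python
def darken(rgb, prefs, markers, n_parties, strength=0.6):
    """Scale an RGB colour towards black by affinity to the marker parties.

    prefs: this party's average preference for each party (1 = first).
    markers: hypothetical names of the parties whose affinity darkens.
    """
    ranks = [prefs[m] for m in markers if m in prefs]
    if not ranks or n_parties < 2:
        return tuple(rgb)
    mean_rank = sum(ranks) / len(ranks)
    # 1.0 if the markers are preferenced first, 0.0 if they come last.
    affinity = 1.0 - (mean_rank - 1) / (n_parties - 1)
    affinity = max(0.0, min(1.0, affinity))
    factor = 1.0 - strength * affinity
    return tuple(c * factor for c in rgb)
```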

The JavaScript/D3 visualisation code

The code which handles the visualisation is all in affinity.html, along with the HTML it works with. The document itself is fairly simple, consisting of a header, an <svg> element containing the force-directed graph, a sidebar containing one of a number of panes, and a footer. The sidebar initially contains introductory text; when the user selects a party, this is replaced by a party information pane, which is filled in with that party's details.

On load, the JavaScript code fetches the two JSON documents from the server, decoding and storing the preference data (used in the party information pane when a party is selected) and using the force-directed graph data to draw the parties in the left pane. When the user selects a party, the preference data is used to draw the bar charts on the right. (The number of bars depends on the number of parties in the selected party's preferences and on the user's filter settings, with the bars scaled to fit.) As bars may be too narrow to display a party name, hovering the mouse over a bar displays a pop-over with the party's name and preference information.
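
The bar layout itself is simple arithmetic. This Python sketch of it (the real code is JavaScript/D3) divides the chart width equally among the bars that survive the user's filter; layout_bars, the gap parameter and the choice to make bar height proportional to the preference number are hypothetical.

```python
def layout_bars(prefs, width, height, keep=None, gap=1.0):
    """Lay out bars for prefs, a list of {"name", "pref"} dicts.

    keep: optional set of party names to show (the user's filter).
    Returns one dict per bar with its x offset, width and height.
    """
    shown = [p for p in prefs if keep is None or p["name"] in keep]
    if not shown:
        return []
    band = width / len(shown)  # the bars share the chart width equally
    top = max(p["pref"] for p in shown)
    return [
        {
            "name": p["name"],
            "x": i * band,
            "width": max(band - gap, 0.0),  # leave a small gap between bars
            "height": height * p["pref"] / top,
        }
        for i, p in enumerate(shown)
    ]
```

Filtering a four-bar chart down to two parties doubles each bar's width, which is the "scaled to fit" behaviour described above.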

The bar charts in the party information pane are encapsulated in a PrefChartObject object in the code. This object has a constructor, which takes an SVG element selector and a list of parties and initialises an empty bar chart, and an updateData method, which accepts new data (e.g., that for a newly selected party, or with new filters applied), replaces the chart's existing data with it and animates the bars into their new positions. Using an object was a useful technique for compartmentalising and duplicating the functionality of the bar chart component.
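
At the heart of an updateData method like this is D3's keyed data join, which sorts new data into entering, updating and exiting items. The Python sketch below mimics that bookkeeping; PrefChart and data_join are hypothetical stand-ins rather than a port of the actual JavaScript, and the animation itself is left out.

```python
def data_join(old, new, key=lambda d: d["name"]):
    """Split an update into D3-style enter / update / exit lists by key."""
    old_keys = {key(d) for d in old}
    new_keys = {key(d) for d in new}
    return {
        "enter": [d for d in new if key(d) not in old_keys],
        "update": [d for d in new if key(d) in old_keys],
        "exit": [d for d in old if key(d) not in new_keys],
    }


class PrefChart:
    """Hypothetical stand-in for PrefChartObject's data handling."""

    def __init__(self, parties):
        self.parties = set(parties)  # every party the chart may show
        self.data = []               # starts as an empty chart

    def update_data(self, new_data):
        """Replace the chart's data, returning what would need animating."""
        new_data = [d for d in new_data if d["name"] in self.parties]
        change = data_join(self.data, new_data)
        self.data = new_data
        return change
```

Keying on the party name is what lets a bar for the same party slide to its new position rather than being destroyed and redrawn.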

For more information on the techniques used, I recommend the examples and tutorials on the D3 website. Also, Mike Dewar's Getting Started with D3 is an excellent guide to D3 and its uses in web-based data visualisation.
