Friday 31 December 2010

2010: So, that happened

What did I do in 2010? I barely remember.

I went to Edinburgh and climbed a mountain. Although Wikipedia claims it's just a big hill.

I moved out of Westminster
The Palace of Westminster

and back to Docklands.
45: Brand New Day

I wrote in the Wandering Book.

I got this cool new job which enabled me to host London's very first HackCamp. I'm still not sure I didn't imagine the spontaneous singing along to The Lion Sleeps Tonight by The Tokens.
52.0: HackCampers

I got to see what the 4th of July looks like in Boston.
Independence Day

Chris Chabot and I went to Stockholm where we ran a Buzz hackathon.

I gave a presentation about TDD on AppEngine at EuroPython.

I ran a Buzz hackathon in Lisbon.

A group of us entered a team in a photography contest held at the National Portrait Gallery. Improbably enough one of our photos (not the one below) won and we got to see it projected onto the walls of the gallery.
The Three D's

I attended OpenTech. I think I may be one of the few people who has managed to attend every single OpenTech. Even back when it wasn't called OpenTech.
Bits, atoms, photons & words

I went back to the US in the autumn and got to meet a lot of the ex-Thoughtworks crowd who are now at DRW Trading.
DRW Sunset

Straight after that the Google Developer Days took me across Europe including a walk through Red Square at night.

I got back from that just in time to help run the world's hardest raffle at XPDay.
The Google Raffle

This was followed by the chance to see Gogol Bordello live at the Kentish Town Forum.
Gogol Bordello

This takes us till Christmas and the end of this year.

Next year is another game and it's too early to predict the outcome.
Adversarial contrast

Add comments on Buzz

Monday 2 August 2010

On translation

People are often surprised when they discover that Dave Hoover and I have only met twice. Yet, somehow, over the course of 4 and a half years we put together this book. People seem to like it.

Dave and I have never met Yoshiki Shibata but somehow we all put together this book. I hope people like it.

As part of the translation process I wrote a new foreword and last week I got to see it in print for the first time. Since I can't even guess at the meaning of Kanji symbols I had to infer the meaning from memory. Seeing my own words in a language I couldn't understand generates the strangest feeling of alienation.

I thought I'd share something of that feeling. You can read the English and Japanese versions of the foreword below. Hopefully those of you who can read both languages will better appreciate the craft of translation after seeing both versions.

The craft of translation
This book you're holding in your hand is about what it takes to do skilled work. In the months we've worked with Yoshiki Shibata I've come to appreciate the skill involved in translating a book and how lucky we are to have a translator who shares so many of our values. Translation is not a matter of mechanically converting words any more than a programmer's job consists of mindlessly converting requirements into code. In both disciplines we must add something of ourselves in order to transmute the raw materials into something that reflects our skill and our spirit.

One of the lessons I've learned in the five years since we started working on Apprenticeship Patterns is that it's really easy to assume that other people share a lot of your obscure and tacit knowledge. Our original version of the book is full of little bits of tacit knowledge that we've extracted from the experiences of our interviewees and Shibata-san has helped us make that material clearer. In some places he has gone beyond mere translation and helped us see our words in new ways.

As Dave has blogged: this book "won’t teach you how to be a great programmer, it will teach you how to learn to be a great programmer." That phrase was directly inspired by our conversations with Shibata-san.

I hope that this book will inspire you to become part of the Software Craftsmanship community. I hope that it will inspire you to improve your skills and I hope that it will guide you on to the path of apprenticeship. After all, that's the long road we're all walking.

The new foreword


Tuesday 29 June 2010

The spectrum of networks

In August I'm going to be on a panel at the BlogTalk conference where we shall be discussing the differences between social and conversational networks. This blog post is an attempt to clarify my thinking and get some definitions out there.

We all have a rough idea of what makes a social network. For me the important elements are:
  • Symmetric relationships
  • Expressing connections with people you already know
  • Messages default to private
  • A strong sense of who can see your data and in what context
On the other hand a conversational network is primarily based upon:
  • Asymmetric relationships
  • Following people you find interesting even if you don't know them
  • Messages default to public
  • Your data ends up in various different contexts and may be aggregated/remixed/reused all over the web.
This isn't a dichotomy but a spectrum. Services occupy points along this spectrum and often move across it as they add or remove features over time.

Neither extreme is better than the other and individual users may need to use a service in ways that defy the expectations of the service's creators. This leads to the situation where a conversational network like Twitter has protected accounts and a social network like Facebook lets you publish status updates that the whole world can see.

This is just a model, a way of looking at the world, but it has interesting implications.

For example the model shows me that I use Flickr more than PicasaWeb or Facebook photos because I mostly desire a conversational photo-sharing experience. I want people to aggregate/reuse/remix my photos so I use a Creative Commons license and join interesting groups that juxtapose my photos with other people's work or encourage blogs (like Global Nerdy or Londonist) to embed them.

Another interesting implication is that whilst most of the interesting people and most of the web's creativity are at the conversational end of the spectrum, most of the people are at the social end of the spectrum. This intriguing contrast was first raised by my colleague Paul Adams. He pointed out that the vast majority of people don't want to be on public display and this 1% rule leads to services where only a few create content which a lot more share/curate and the vast majority consume.
In fact this isn't a weakness of the model but a strength because that's the world we live in.

References and inspirations:

  • The phrase "conversational network" comes from this Jaiku thread.
  • A Buzz thread where Jonas Nockert points out that, given time, conversational networks drift towards the social side of the spectrum.
  • Fambit is an example of a service that occupies the social end of the spectrum.
  • A Buzz comment wherein Brian Cronmiller independently discovers the same phenomenon
  • Results from a South Korean study which point out that the Twitter network isn't structured like a conventional social network


Sunday 30 May 2010

The Wandering Book

A while ago Enrique set up the Wandering Book as a means of capturing the zeitgeist of the software craftsmanship movement. The idea is that a Moleskine notebook wanders between people who think of themselves as members of this community. These people then have a week to contribute some useful insight before passing it on.

I'm guilty of taking significantly longer than a week before passing it on. My contribution is below.

What have you made recently?

Whenever software craftspeople gather that's one of the questions I'd like us to ask each other. I'd also like us to ask:
  • what are you making?
  • what do you want to make next?
  • what have you learned from the things you made?

These are some of the questions that get to the heart of what we do.

We make software: code, databases, user interfaces, etc. We do it all. We may not be able to match the experts in each domain but we can make complete software all on our own.

I'm not talking about the artefacts of your day job or the things your team built. I'm talking about things that mattered enough to you that you created them in your own time and for your own reasons. These things you choose to make define the borders of your craft.

Even though I firmly believe that deliberate practice builds skill I don't think it's sufficient unless you also make things. In the same way I think that our current idea of software craftsmanship is insufficient if we're going to create a healthy community rather than another hollow buzzword.

Recently I've been thinking about the idea of a "generative community." This is a group of people united by overlapping values that lead them to create things that affect the real world (this may be software, devices, conferences, websites, etc) rather than just talk and think about making things.

I'd like our little community to be generative in the same way that Christopher Alexander wanted patterns to be generative. And I'd like you, the reader, to help make this happen.


50: A place called home


Monday 24 May 2010

Joining the Social Web team

What has two thumbs and is joining the Social Web team at Google? Me.

I'm going to be one of the Developer Advocates based in the London office. I'll be looking after all things related to 'social' and the social web in Europe, the Middle East and Africa (EMEA).

I plan on spending a lot of my time listening to and learning from people outside the company. In fact when I try to describe all the facets of this job I tend to point people to Christian Heilman's book or Dion Almaer's blog post or Simone Brunozzi's blog post.

The social web is bigger than any one product or company. That's why my job is going to be as much about helping to grow the social web as it will be about helping developers to use Google APIs. So if you're doing something interesting with the social web in EMEA and you think Google can help then send me an email. I'm ade at

I'm also, to quote John Panzer, "a cluster of heterogeneous identifiers." You can follow most of them on Buzz:


49: On-Call Sam


Thursday 20 May 2010

Fiddling with Google Buzz

I woke up this morning and, inspired by Ian Bicking's post, thought I'd take a look at showing my last N Buzz posts on my website:

I started with the example code from here: which makes a request for a JSON object representing all of a user's public posts. Then I tweaked it a little so that it uses my numeric identifier rather than my username. This is in order to avoid leaking my email address. I also changed it so it only shows the last 5 items. I then added a little bit of code to extract the link for each item.

Working out how to traverse the JSON object was made easier thanks to DeWitt's JSON indent project:

It meant that I only had to work out how to read this: rather than:

After that I only had to tweak the appearance to fit in with the rest of my, rather old-fashioned, website. Hopefully someone will take this code and turn it into a proper widget that can easily be re-used.
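For anyone who wants to do the same traversal server-side, here's a rough Python sketch of the idea. The `data.items` / `links.alternate` shape is an assumption modelled on the JSON the Buzz API returned at the time, so check it against a real response; the sample below is canned rather than fetched.

```python
import json

def last_n_links(feed, n=5):
    """Given a parsed feed dict, return (title, link) pairs
    for the last n items."""
    items = feed.get("data", {}).get("items", [])[:n]
    results = []
    for item in items:
        # Each item carries a list of alternate links; take the first one.
        links = item.get("links", {}).get("alternate", [{}])
        results.append((item.get("title", ""), links[0].get("href", "")))
    return results

# On the live site this JSON would come from the Buzz API via an HTTP
# request; here we parse a canned response instead.
sample = json.loads("""
{"data": {"items": [
  {"title": "First post",
   "links": {"alternate": [{"href": "http://example.com/1"}]}},
  {"title": "Second post",
   "links": {"alternate": [{"href": "http://example.com/2"}]}}
]}}
""")
print(last_n_links(sample))
```

The same defensive `.get()` chaining is what DeWitt's JSON indent output makes easy to work out by eye.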


Monday 3 May 2010

Apprenticeship Patterns is now Creative Commons licensed

Just over 5 years ago Dave and I started Apprenticeship Patterns on a wiki. We used that wiki to organize the stories we found as we went around the world asking people how they became skilled software developers. When O'Reilly approached us about turning our wiki into an actual book printed on dead trees we were delighted but we also emphasised our desire to share the ideas with the widest possible audience. Fortunately O'Reilly are an incredibly enlightened publishing house and they were already thinking about ways to get their books into the Creative Commons.

Just as we were one of the first O'Reilly books to experiment with using a wiki to get early feedback during the writing process, we're also one of the first O'Reilly books to experiment with publishing our material under a Creative Commons license. Starting from today the book is now available here:

We're using O'Reilly's experimental Open Feedback Publishing system which lets people, after registering, attach comments to any section of the book. If there's ever a second edition your feedback will be an essential part of it so please don't be shy.

Communicating with atoms

A few weeks ago I attended an Open Source Jam where the topic was "building blocks." I gave a lightning talk about why the combination of Atom and Webhooks is changing the way web applications interoperate. In this set of blog posts I'd like to flesh out that 5 minute presentation and explain how Atom is potentially a universal payload format for the web in the same way that byte streams are a universal payload format for Unix.
  • Part 1: Communicating with atoms [1]
  • Part 2: Webhooks
  • Part 3: JSONistas and XMLheads
Here's the problem. I have N different web apps that I want to connect to each other in arbitrary ways. Some of these web apps don't exist yet and some of them will be written by people I don't know who won't ask permission before they connect their web apps to mine.

In an enterprise environment we'd solve this using some kind of common messaging infrastructure and/or a universal data format. We would then deal with the inevitable pain as the evolution of the different web applications broke compatibility or locked the entire system into moving at the pace of the slowest team.

On the web we don't have that luxury. But Atom can help.

The Atom specification includes a number of simple-looking features which solve complex problems and open up the possibility of using Atom entries as a universal payload format for interoperability across the web.

The basic structure of an Atom document requires either an atom:feed or atom:entry as the top-level element. Apart from that you must have an atom:id, an atom:author, an atom:link, an atom:title and an atom:updated [2]. Everything else is optional.

Since Atom is based on XML it has support for namespaces. This makes it possible for you to take an atom:entry and enrich it with new tags that only make sense in the context of your application. For instance the Activity Streams specification adds lots of new tags. However it also uses lots of tags from other specifications. It extends Atom and re-uses Atom Threading Extensions, Atom Media Extensions, xCal, PortableContacts and GeoRSS. The feed ends up looking like this:

But what if you just want to treat that ActivityStream or any other Atom extension as if it were a simple Atom feed?

Atom Processors that encounter foreign markup in a location that is legal according to this specification MUST NOT stop processing or signal an error. It might be the case that the Atom Processor is able to process the foreign markup correctly and does so. Otherwise, such markup is termed "unknown foreign markup".

When unknown foreign markup is encountered as a child of atom:entry, atom:feed, or a Person construct, Atom Processors MAY bypass the markup and any textual content and MUST NOT change their behavior as a result of the markup's presence.

When unknown foreign markup is encountered in a Text Construct or atom:content element, software SHOULD ignore the markup and process any text content of foreign elements as though the surrounding markup were not present.

Well, the Atom specification insists that an Atom processor, like your web app, 'must ignore' foreign markup in an Atom feed unless it is sure it knows what to do with it. This means that we can have multiple different but interoperable versions of Atom or any other XML language floating around at the same time. Tim Bray called this "an unstated axiom of the World Wide Web" and I agree with him that this simple rule allows "multidirectional growth" since anyone can extend Atom without asking permission from a central authority or worrying too much about versioning. I also agree with Sam Ruby when he says that "90% of all namespaces are crap" but since we can't tell which namespaces will become popular and which will be ignored we should let anybody and everybody have a go.
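Here's a small sketch of the must-ignore rule in action using nothing beyond the Python standard library. The `ex:mood` element and its namespace are invented for illustration; the point is that a processor which only understands plain Atom keeps working.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# An atom:entry enriched with an element from a foreign (invented) namespace.
entry_xml = """
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:ex="http://example.com/ns/mood">
  <id>tag:example.com,2010:entry-1</id>
  <title>Hello, world</title>
  <updated>2010-05-03T00:00:00Z</updated>
  <ex:mood>cheerful</ex:mood>
</entry>
"""

entry = ET.fromstring(entry_xml)

# A processor that only understands plain Atom reads the elements it knows
# about and silently skips ex:mood -- it neither stops nor changes behaviour.
title = entry.find(ATOM + "title").text
updated = entry.find(ATOM + "updated").text
print(title, updated)
```

An extension-aware processor could look up `ex:mood` explicitly; everyone else just ignores it, which is exactly what lets multiple versions of the format coexist.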

The above makes it sound like we're entering into a happy, fun world of atomic interoperability and distributed extensibility. However you will eventually want to re-publish an atom:entry from another feed. Then you'll realise that the compulsory elements I listed above aren't enough. If you're building a tool like Planet Venus or Streamer you're going to need to generate an Atom feed containing entries from other websites. These sites may be using various different extensions and id generation schemes. The spec says that you should preserve the atom:feed's original atom:id inside the atom:source if the atom:feed was the top-level element.

This way you can have an atom:id that points to your copy of the atom:entry and an atom:source which points to the original atom:entry so that if someone then re-publishes part of your feed we can still find out the provenance of the atom:entry.
The atom:source 'points' to the original source of the atom:entry and instead of making your own atom:id you use the "permanent, universally unique identifier" that the author(s) of the atom:entry assigned.
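A rough sketch of that re-publishing step, assuming only the standard library: copy the entry and attach an atom:source built from the original feed's top-level metadata. The ids and titles below are invented and this ignores the many other elements a spec-complete aggregator would carry over.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def with_source(entry, original_feed):
    """Copy an atom:entry, attaching an atom:source built from the
    top-level metadata of the feed it originally came from."""
    republished = copy.deepcopy(entry)
    source = ET.SubElement(republished, f"{{{ATOM_NS}}}source")
    for tag in ("id", "title", "updated"):
        elem = original_feed.find(f"{{{ATOM_NS}}}{tag}")
        if elem is not None:
            source.append(copy.deepcopy(elem))
    return republished

feed = ET.fromstring("""
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2010:feed</id>
  <title>Original feed</title>
  <updated>2010-05-01T00:00:00Z</updated>
  <entry>
    <id>tag:example.org,2010:entry-1</id>
    <title>An entry</title>
    <updated>2010-05-01T00:00:00Z</updated>
  </entry>
</feed>
""")

entry = feed.find(f"{{{ATOM_NS}}}entry")
republished = with_source(entry, feed)
print(republished.find(f"{{{ATOM_NS}}}source/{{{ATOM_NS}}}id").text)
```

The entry keeps its own atom:id while the atom:source records where it came from, so a downstream aggregator re-publishing your feed can still recover the point of origin.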

The above sounds complicated but it means we can treat Atom entries as a loosely joined chain of small pieces that can be piped, filtered and aggregated with web-based equivalents of the standard Unix tools. By simply publishing a feed containing these pieces I know that other people's tools can consume, re-mix, compose and syndicate my content in ways I can't even think of.

The simple blog aggregators we've been building are just the beginning and in part 2 I'll talk about how Webhooks will let us go beyond feeds and start thinking in terms of low-latency streams.

OSJAM PanoPhoto by

[1] Thanks to Sheila Thomson for the title.
[2] These tags may be optional or compulsory depending on which other tags you're using.

Updated 2010/05/03: Bob Wyman has pointed out some mistakes in the original version of this article. Firstly atom:source elements may contain elements from the atom:feed not the atom:entry. Secondly atom:id elements are permanent and universally unique identifiers which don't have to be URLs so it's misleading to talk about them pointing anywhere. Finally atom:source is not about the provenance of an atom:entry, since that would require tracking all the locations it went through before you saw it, but about the point of origin. See here for more details from Bob.

46: Shake hands forever


45: Brand New Day


Wednesday 3 March 2010

Mapping personal practices

A long time ago Joe Walnes ran a session at the Extreme Tuesday Club where he encouraged us all to draw maps of our personal practices. As he put it:
We want your personal practices that you find important. Different people work in different ways, so we thought it would be interesting to discuss this.

In your own time, make a list of your 10 most important practices for coding and design. These do not have to be XP related and should be the most important things in your mind. For example: Separate interfaces from implementation, mock objects, follow the Law of Demeter, test driven development. Controversy encouraged - everyone's different.
Try to determine any relationships that may exist between these. Specifically, which practices support other practices. For example: unit testing supports refactoring and test driven design.
Draw a map of the relationships - like this:

Sadly the old XTC wiki has gone to that great stand-up in the sky but I was able to use the Wayback Machine's copy to rescue some people's maps.

Tuesday 16 February 2010

Pirate testing

This is an example of pirate testing. It's using a test-suite consisting of language-neutral data. There's also a Ruby implementation.

So what's pirate testing? It's a form of data-driven testing. The tests are specified in a language-neutral format (XML, JSON, YAML, whatever) and then various test harnesses for different languages are written. These ensure that the different implementations all provide the same functionality.

Let me rephrase that in dictionary-speak:
Pirate Testing is a family of techniques for creating functional tests that are independent of a language or implementation. This usually means that the tests are data-driven or they're written in a neutral language that can be invoked from multiple other languages. The easiest way to think about this is that there is a DSL used for writing the tests (this can be a data format or a programming language) and one or more general purpose languages used for implementing functionality that satisfies the tests.

This is an idea that goes back to the days of Jon Bentley's 'little languages' but the name originated in this post by Sam Ruby. In his case he was literally dealing with the tests for a virtual machine called Pirate.

This: is an implementation of pirate testing. Some things you should note. I dynamically build a PyUnit TestSuite which has one TestCase per record in the testdata.json file. This ensures that the tests have all the standard benefits (shared setup, shared teardown, independence, reporting, etc) that come from using a *Unit testing framework.
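As a sketch of the shape of that code (the function under test and the records below are invented stand-ins, not the real testdata.json), dynamically generating one test method per record looks something like this:

```python
import json
import unittest

def normalise(text):
    """The (made-up) function under test: collapse whitespace, lowercase."""
    return " ".join(text.split()).lower()

# Stand-in for testdata.json: language-neutral records that harnesses in
# any language could be checked against.
TEST_DATA = json.loads("""
[{"name": "collapses_whitespace", "input": "Hello   World", "expected": "hello world"},
 {"name": "strips_edges", "input": "  hi  ", "expected": "hi"}]
""")

class PirateTests(unittest.TestCase):
    pass

def make_test(record):
    def test(self):
        # Echo the whole record on failure so the diagnostics stay useful.
        self.assertEqual(normalise(record["input"]), record["expected"],
                         msg="failing record: %r" % (record,))
    return test

# One TestCase method per record: shared setup/teardown, independence and
# per-test reporting all come for free from the unittest framework.
for record in TEST_DATA:
    setattr(PirateTests, "test_" + record["name"], make_test(record))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(PirateTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each record becomes its own named test, one broken record doesn't stop the runner and failures are reported individually.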

It's vital to make sure your pirate tests are isolated from each other, that the test runner keeps going after it encounters a failure, and that you emit detailed diagnostic information when a test fails. One of my patches to the Feedparser project makes it dump out the entire environment when a test fails because it was proving difficult to debug problems. The implementation by Matt Sanford of the Twitter conformance tests currently doesn't do this.

Like any technique, pirate testing has benefits but it also has downsides. As it can really only be done with functional tests it means that low-level bugs can hide in individual implementations. To make matters worse these kinds of test suites tend to grow very large and eventually take a long time to run. This dissuades people from running them very often and as such they can easily end up in a state where a large percentage of the tests are always broken. However the most insidious danger is that when 7 tests fail you may just fix each one on its own rather than spotting the common element responsible for the break.

The error messages from pirate tests tend to be generic and unhelpful unless you take extra steps when writing the test harness. They don't replace the need for unit testing and keeping the test data readable is a challenge. Then there are the problems caused when someone decides to 'refactor' the test data to eliminate the inevitable duplication and they break several different implementations...That's assuming you can even get the various different implementors to agree on which set of pirate tests are the canonical ones.

So if they have all these problems why do people bother? The obvious reason is that it makes it easier to bootstrap new implementations of a tool. But that isn't the biggest reason. The really big benefit of this technique is that it aggregates the lessons learned by all the implementations in one machine-readable format. This means that compatibility, compliance with a specification and interoperability aren't topics for debate but empirical matters which can be settled by a carefully crafted test case.

Monday 1 February 2010

New Frontiers: TDD and Refactoring Workshop at Brunel University

At the first Software Craftsmanship Conference in London I met Steve Counsell. He's an academic at Brunel University with an interest in Object Orientation, Metrics and Refactoring.

A while ago Steve invited me to give a short presentation at one of a series of workshops he's running. These 'Reftest' workshops are aimed at bringing academia and industry together to share our insights and experiences with both Test Driven Development and Refactoring.

There were too many interesting presentations at last week's event for me to describe them all. But they should all eventually end up on the Reftest website so you can read them there. That site doesn't have a feed yet so I've set up this: which basically notifies you of any changes to the website.

Anyway, there was a lovely presentation by Charles Tolman from Quantel about some of the issues you run into when you have a 12-year-old codebase containing 17 million lines of C++ and Python. This led into an interesting discussion about detecting duplicate code. I maintain Same but nowadays I find that Eric Raymond's Comparator is a better solution for most people so I recommended that.

One of my favourite asides from Charles's 30 years in software (20 of them spent at Quantel) was "tools should enable the intelligence of the programmer rather than attempt to encapsulate the intelligence of the programmer."

The folks from the University of Kent showed some impressive results for their tools which provide:
  • model checking
  • property-based testing
  • interactive clone elimination

Best of all many of the principles behind them aren't restricted to Erlang. So these kinds of techniques and tools can be applied to mainstream languages. They just chose Erlang because the tools were built as part of the ProTest project they're doing with Ericsson.

My own presentation looked at some of the problems with TDD and Refactoring.

I discussed problems with teaching the cluster of tacit knowledge that we call TDD and covered issues like the convenient fictions we use, the misleading tutorials and the overly simplistic examples that involve starting with a blank slate. I also threw in a brief digression about the Norvig-Jeffries kerfuffle.

The section of my presentation on refactoring mostly involved going back to Opdyke's thesis and looking at some of the consequences of his ideas. One thing that struck me was how insightful Opdyke's work was when it came to issues like preserving class invariants and the subtle side-effects of most refactorings. Somewhere along the line, though, as we automated refactoring, we seem to have lost the heart of his ideas.

I'd expected the last section with its references to model checking, property-based testing tools like ScalaCheck, mutation testing and fuzz testing to be controversial but the discussion ended up being a very productive look at ways in which we could fix the problems. The whole day made me very hopeful about the prospects for significant progress in the tools and techniques for TDD and refactoring.

Updated 2010/02/06: Ben Stopford has written up his perspective on the workshop and uploaded his PDF slides.

Saturday 2 January 2010

What did I learn in 2009?

I attended both Software Craftsmanship conferences and as a result I now see software craftsmanship as a 'movement' in the same sense as Impressionism. In other words it's a group of very loosely affiliated people who are linked by overlapping values rather than rigid adherence to some set of rules. However its roots in the Agile movement mean it's likely to be subject to the same forces that took us from 'lightweight methodologies' for getting useful work done to a marketing brand with a formal alliance overseeing its usage. Large chunks of the Agile community seem to have become more interested in process improvement than in improving their skill or their techniques. Reading parody sites like this: and finally asking awkward questions about Scrum (it's a very bad sign when no-one will publicly answer a question like "is it possible to fail a Scrum course?") made me realise that 'Agile' has become a meaningless tag that I don't want to be associated with any more.

This was the year I started to really see the limits of my tools and techniques. This includes Java, TDD, the LAMP stack and the conventional approach to logging. As a result I'm doing more work with Python; thinking more in terms of invariants and algorithmic complexity as well as looking at tools like QuickCheck and SparseCheck; playing with App Engine and experimenting with the event sink approach that Nat Pryce and Steve Freeman describe in their book Growing Object Oriented Software.

The aforementioned limits (plus a discussion with Dan Creswell and Manik Surtani) led me to the Scalability Staircase and I realised that much of my 'experience' was now invalid because I'm not working in an enterprise environment. In fact many of those enterprise software environments have radically changed but the people working in them are only just starting to realise that they'll need different approaches to software development now that they're working at web scale.

I started to appreciate the benefits of feature segmentation/partitioning through flags and dependency injection. Flickr wrote a somewhat controversial article about their approach although many of the critics haven't spotted the connection with bucket testing/multivariate testing.
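The core of that idea fits in a few lines; the flag names and user buckets below are invented for illustration. The flag decides which implementation gets injected, so a new code path can be switched on for one bucket of users while everyone else keeps the old one.

```python
# Hypothetical flag registry: each flag names the users it's enabled for.
FLAGS = {"new_ranking": {"enabled_for": {"alice"}}}

def flag_on(name, user):
    flag = FLAGS.get(name, {})
    return user in flag.get("enabled_for", set())

def old_ranking(items):
    return sorted(items)

def new_ranking(items):
    return sorted(items, reverse=True)

def make_ranker(user):
    # Dependency injection: hand back whichever implementation
    # the flag selects for this user.
    return new_ranking if flag_on("new_ranking", user) else old_ranking

print(make_ranker("alice")([3, 1, 2]))  # [3, 2, 1]
print(make_ranker("bob")([3, 1, 2]))    # [1, 2, 3]
```

Because the choice is made at wiring time rather than scattered through the code, turning the feature off again (or widening the bucket) is a one-line change to the registry.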

In 2009 I found that I got better results by modelling people who were effective rather than brilliant. Effective people tended to have generated a lot of artefacts whilst the brilliant tended to agonise over a small set of projects and as such artificially constrained their potential impact. Perfectionism often got in the way of delivering results whilst focussing on getting a little better every day left with me lots of useful artefacts, lots of real-world feedback and more opportunities to take creative risks.

Getting in the habit of making improvements rather than offering criticism leads to lots of work but it also leads to things getting better. As a consequence I'm now spending more time helping out on open source projects. Another consequence was discovering that you're better off focussing on people's artefacts (their code, their data, the things they've made) rather than their opinions. Without that focus it becomes difficult to distinguish between people who have discovered better skills/techniques and those who just have a convincing argument.

One of the side-effects of my new-found appreciation for these 'productive communities' was that I attended more BarCamp-style events. At one of those I ran a session which gathered a group of people who understand the semantic web and Linked Data. Thanks to those people I now understand the appeal of Web Scale Identifiers. Basically the idea is that if your website is about something like books, music, musicians, etc. which already has a good identifier (e.g. ISBN, ASIN or MusicBrainz id) then you should re-use that identifier. Inventing your own identifier or exposing your database's primary key or using the simple name leads to a wide range of annoying problems for your users whilst re-using existing identifiers means you get to benefit from the work being done by the rest of the Web.
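The idea reduces to something like this sketch; the URL pattern is made up for illustration, and the point is simply that the public identifier is derived from an existing well-known one (here an ISBN-13) rather than from a private primary key or an invented slug.

```python
def book_url(isbn13):
    """Build a stable public URL from an existing well-known identifier
    instead of exposing a database primary key or an invented slug."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("not an ISBN-13: %r" % isbn13)
    return "http://books.example.com/isbn/" + digits

print(book_url("978-0-596-51838-7"))
```

Anyone else who knows the book's ISBN can now construct or recognise your URL without consulting you, which is exactly the benefit of re-using the wider Web's identifiers.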