Hack Manchester

Last month I attended Hack Manchester – a 24-hour coding event held at MOSI as part of the Manchester Science Festival. Having only arranged to team up with Mike, we ended up joining two of Shaf’s colleagues from the BBC, Jack and Tom, whom he introduced us to. The four of us formed a team, and after browsing the challenges set, we liked Intechnica’s Bacon Number problem the most. Rather than just solve the Bacon Number problem, though, we adapted the challenge and set off to build a tool to find the film set with the largest birthday party (the most common birthday among the actors in each film).

We decided the data provided was too poorly formatted, and because any alternatives (such as the Open Movie Database) required sign-up and prior approval, we ended up scraping IMDB for the actor birthday data. I wrote a Python script using Beautiful Soup, which worked really well: it hit the page for every day of the year on IMDB and stored each actor’s name in a MongoDB collection in the following format:

{
    birthday: '27-10',
    name: 'John Cleese'
}

I ran the script for the 1st Jan page to see that it worked – and I had a number of records stored, all with ‘01-01’ as the date, so it seemed to be working ok. I wrapped a loop around it to hit every day of every month and started running the script, with no idea how long it would take. I quit it quite early and added a print statement on every new month to indicate where it was up to. Just how I used to solve most of the number crunching Project Euler problems! I watched it run, and it seemed to take about a minute and a half per month, so it took about 20 mins to run in total (it also crashed out at one point when it got a 500 error from IMDB – I deleted all the May records from the collection, as the month was incomplete, and ran it from there again, so as not to get duplicates or miss any out!). Also, I should point out that we were having to run off tethering from my phone, because the Hack Manchester wifi only had 100 IPs to dish out (not ideal for a hackathon with 250 geeks with ~4 devices each!) – a real shame, as I’m sure the organisers did all they could to make clear to the providers that they would need a lot of connected devices. Quite a lot of data ran through my phone that night – hundreds of hits at IMDB, plus downloads of various packages and files (Beautiful Soup, the Mongo libraries, the IMDB text file data, etc.).
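For the curious, the shape of that loop can be sketched roughly as below. This is not the script from the night: `fetch_names` and `store` are hypothetical stand-ins for the Beautiful Soup scraping and the MongoDB insert, and the `start_month` parameter is my assumption about how a restart from May could be expressed.

```python
import calendar


def birthday_keys(start_month=1, year=2012):
    """Yield (month, 'DD-MM') for every day from start_month to December.

    The year only decides how many days February has; 2012 is a leap
    year, so 29 February is included.
    """
    for month in range(start_month, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            yield month, f"{day:02d}-{month:02d}"


def scrape_all(fetch_names, store, start_month=1):
    """Fetch every day page and store one record per actor.

    fetch_names(key) and store(record) are hypothetical callables standing
    in for the real scraper and the real database insert.
    """
    current_month = None
    for month, key in birthday_keys(start_month):
        if month != current_month:
            # Progress indicator: one line per new month.
            print(f"Starting month {month:02d}")
            current_month = month
        for name in fetch_names(key):
            store({"birthday": key, "name": name})


if __name__ == "__main__":
    records = []
    fake_pages = {"27-10": ["John Cleese"]}  # pretend scrape results
    scrape_all(lambda key: fake_pages.get(key, []), records.append, start_month=10)
    print(records)
```

Restarting after a crash is then a matter of clearing the incomplete month and calling `scrape_all` again with `start_month=5`.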

I sanity-checked this data by looking at the number of records held for each of the dates in the collection. I noticed that 1st January had significantly more records than all the other dates. I assumed I had left the data in from when I initially ran it on 1st Jan to test the script – although it was more than 2, even 3 (actually about 10) times all the other days. I deleted Jan 1st and ran it on that day again, and got the same number. I looked at the IMDB page for 1st Jan and there were genuinely a lot more than for any other day. I asked around my team mates for an idea – someone suggested that people aim for 1st January as a birth date, but I said it’s not distributed among nearby dates, and that didn’t really make sense anyway. Of course (you probably already deduced this – please excuse us – we were tired), it was that 1st January would have been the default value if no date was entered, or maybe this list included actors without a birth date given.
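That kind of sanity check is easy to automate. Here is a minimal sketch, assuming the records are plain dicts in the format above; the function name and the "more than three times the median" threshold are my own choices for illustration, not something we used on the night.

```python
from collections import Counter
from statistics import median


def suspicious_dates(records, factor=3):
    """Return birthdays whose record count exceeds factor x the median count."""
    counts = Counter(record["birthday"] for record in records)
    typical = median(counts.values())
    return sorted(date for date, n in counts.items() if n > factor * typical)


if __name__ == "__main__":
    # Made-up sample: ten placeholder records on 01-01, one on each other day.
    sample = [{"birthday": "01-01", "name": f"Actor {i}"} for i in range(10)]
    sample += [{"birthday": "02-01", "name": "Actor X"},
               {"birthday": "03-01", "name": "Actor Y"}]
    print(suspicious_dates(sample))
```

Using the median rather than the mean keeps one huge outlier (like 1st January) from dragging the "typical" count up with it.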

I committed this code at around 1:45am, and about 45 minutes later, while browsing the team’s work on GitHub, I noticed the commit times for some files. The times, given in a friendly, human-readable relative format, were:

30 minutes ago
in 16 minutes
27 minutes ago
an hour ago
5 hours ago

What’s that? I committed the file … in 16 minutes? As in, in the future? How is that so? Well of course, Hack Manchester happened overnight on the day of the year when daylight saving ends and we move from BST to GMT: at 2am, the clocks go back to 1am. So every ‘time’ between 01:00 and 01:59 happened twice. I thought this was rather amusing :)
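The doubled hour is easy to demonstrate. This is a small sketch, assuming Python 3.9+ with time zone data available and assuming the 2012 change on 28 October; it only shows that one wall-clock reading maps to two different instants, and says nothing about how GitHub actually computes its relative times.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")

# 01:45 happened twice on 28 October 2012. The `fold` attribute says
# which occurrence is meant: 0 is the first pass, 1 is the second.
first = datetime(2012, 10, 28, 1, 45, tzinfo=LONDON, fold=0)   # still BST
second = datetime(2012, 10, 28, 1, 45, tzinfo=LONDON, fold=1)  # now GMT

print(first.utcoffset())   # offset from UTC during the first pass
print(second.utcoffset())  # offset from UTC during the second pass

# Subtracting two datetimes that share a tzinfo ignores the offsets,
# so convert both to UTC first to get the real elapsed time.
gap = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)
print(gap)
```

Any tool that records or compares only the local clock reading during that hour cannot tell the two passes apart, which is the sort of thing that can make a commit look like it came from the future.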

We then had a searchable database of actors and their birthdays. Jack whipped up a Twitter Bootstrap web interface, into which I added some PHP code (using the PHP MongoDB library) to display a list of actors with a given birthday, or show a given actor’s birthday. At this point we had no movies stored, so we had limited functionality.

Meanwhile, Mike had been writing a bunch of PHP classes containing methods for looking up the data. He’d also started writing a Ruby script to extract film-actor data from some text files he found somewhere. He’d had real trouble extracting the data in a form that would be useful to us. It was tab-separated, referenced films by random alphanumeric IDs rather than film names, and also contained a ridiculous number of porno films. Later on, Jack and I adapted this code to try to get it to insert the data into our existing MongoDB collection. It was quite fiddly, and we weren’t really sure how accurately the data was being collated, but worth a try!

At this point we had a discussion about how we would store the data. Someone suggested:

We need another table to store the films, and another to store the film-actor relations

Erm, that’s not how Mongo works. I’m no expert, and my solution may not have been the best, or Mongo-est, but I know you can store lists as values (making multiple ‘tables’, or collections or whatever, unnecessary), so I suggested we would be fine to add a ‘movies’ field to the existing actor storage, which would be a list of films they’d been in, e.g:

{
    birthday: '18-12',
    name: 'Brad Pitt',
    movies: ['Fight Club', 'Inglourious Basterds']
}
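To make the idea concrete, here is a minimal in-memory sketch of building up that list. It uses a plain Python dict keyed by actor name as a stand-in for the collection, and `add_movie` is a hypothetical helper, not our actual code. In MongoDB itself, the `$push` update operator appends to an array field and `$addToSet` appends only if the value is not already present.

```python
def add_movie(actors, name, movie):
    """Append movie to the actor's 'movies' list, creating the list if needed.

    `actors` is a dict of documents keyed by actor name (a stand-in for
    the collection). Returns False if the actor is unknown, so mismatches
    between the two data sources can be counted rather than lost.
    """
    doc = actors.get(name)
    if doc is None:
        return False
    movies = doc.setdefault("movies", [])
    if movie not in movies:  # skip duplicates, similar in spirit to $addToSet
        movies.append(movie)
    return True


if __name__ == "__main__":
    actors = {"Brad Pitt": {"birthday": "18-12", "name": "Brad Pitt"}}
    add_movie(actors, "Brad Pitt", "Fight Club")
    add_movie(actors, "Brad Pitt", "Inglourious Basterds")
    add_movie(actors, "Brad Pitt", "Fight Club")  # ignored, already listed
    print(actors["Brad Pitt"])
```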

We managed to figure out how to add the movie field to an actor, and how to append a movie to a list already containing one, and we wrote this into the script and let it run. We left in a print statement to see what was happening, which obviously slowed the process down a lot. Think about how many movies there are, and think about how many actors there are. Now think about how many instances there are of an actor being in a movie. That’s a lot. It took bloody ages. And didn’t seem to work. We were out of time by this point (in fact time was almost up when the program started running). Doing this properly we’d have tested it better, and ensured all data was being entered correctly. We were just having a bash at getting it to work.
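For reference, the query we were ultimately aiming for could be sketched like this, assuming the actor documents have been loaded as a list of dicts in the format above. The function name and the `ignore` parameter (which skips the 1st January placeholder date by default) are my own additions for illustration, and the sample data is made up.

```python
from collections import Counter, defaultdict


def largest_birthday_party(actors, ignore=("01-01",)):
    """Return (film, birthday, count) for the biggest shared birthday in one film.

    Returns None if no actor has any film listed. Ties go to whichever
    film was seen first.
    """
    birthdays_by_film = defaultdict(Counter)
    for actor in actors:
        if actor["birthday"] in ignore:
            continue  # placeholder dates would swamp the real results
        for film in actor.get("movies", []):
            birthdays_by_film[film][actor["birthday"]] += 1

    best = None
    for film, counts in birthdays_by_film.items():
        birthday, count = counts.most_common(1)[0]
        if best is None or count > best[2]:
            best = (film, birthday, count)
    return best


if __name__ == "__main__":
    cast = [  # made-up actors and films
        {"name": "A", "birthday": "18-12", "movies": ["Film X", "Film Y"]},
        {"name": "B", "birthday": "18-12", "movies": ["Film X"]},
        {"name": "C", "birthday": "01-01", "movies": ["Film Y"]},
        {"name": "D", "birthday": "27-10"},  # no films stored yet
    ]
    print(largest_birthday_party(cast))
```

Grouping by film in memory like this sidesteps the need for a separate films collection, at the cost of reading every actor document once.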

All of us completely exhausted, we awaited the event closing and awards ceremony. Mike and I had stayed in the museum all night – each attempting a short nap on a couple of occasions, rather unsuccessfully in a room full of geeks bashing away at their respective keyboards. Tom had a prior engagement, so he shot off early evening, and Jack headed home later on due to problems with the wifi, and worked from home on setting us up an Amazon instance to host the project.

Among the hackers were many friends of mine – including a team consisting of Michael Heap and Tim Hastings; an MMU team with Farkie; a Manchester Girl Geeks team; a couple of Laterooms teams including Mark/Kirsty, Jim & Andy; a Thoughtworks team with Daley, and so on. I had plenty of people to chat to while taking breaks (I drank a lot of coffee) – and met a bunch of new people too.

It came to the closing, and the winners of each category were named and had a chance to give a short demo of their project. Some amazing stuff went on show – it was great to see so much innovation from so many teams. By chance, no-one else had chosen the Bacon Number challenge, so we won by default! A bit lame, I know, but the way I see it is that we weren’t so awful that they decided to withdraw! I count that as a win. And what was the prize? A brand new 512MB Raspberry Pi each! Can’t complain! Huge thanks to Intechnica for the prizes :)

Also a great big thanks to Gemma and Sean for putting the event on. It was fantastic! I will definitely enter events like this in future, even without a team – you can always group up with people and get something done. I was worried about working with people who used different languages or frameworks and that we wouldn’t be able to get things done, but we pooled ideas and skills together and managed to build some cool stuff! Also thanks to MOSI for the use of the space (all through the night!) during the science festival.

The code from our hack is available on GitHub – it may or may not get updated/fixed in future, but at the time of writing it is as we left it at the end of the event.

Also check out Farkie’s blog post on the Magma Digital blog – Hack Manchester 2012

Norwich City FC Angry at Fan For Leaking New Kit

I just read an article on BBC News. All quotes are taken directly from that article. The link is at the bottom of this post.

Norwich City kit published on internet by boy ahead of launch

When I read this I imagined the boy in question had illegally obtained pictures of the new kit, and posted them to Facebook/Twitter/blogs/etc. I read on.

A 17-year-old Norwich City fan has angered the club by leaking pictures of its new kit 12 hours before the official launch.

Yep. Sounds about right. I’m guessing he had access to the pictures, or maybe he stole them from somewhere, or broke into a place holding them and took pictures himself. I read on.

Norfolk Police were called in after IT student Chris Brown published images of the 2012-13 strip on the internet.

Ooh! The Police were called. This must have been serious. And he’s an IT student? That must be relevant. He must have hacked into something, maybe someone’s laptop, and stolen the pictures.

The teenager, from Norwich, managed to obtain the pictures from the club’s kit launch website as it was being updated.

Oh. So he hacked a web server and illegally obtained images from it, somehow?

Chris told the BBC he was able to take the images from a section of the site that was being worked on, finding them linked from a file within the website’s source code.
Any computer user can view this through their internet browser.

Wait, so he accessed their public website, right-clicked, went to view source, spotted a link in the HTML, went to the URL, chopped a filename off the end of it to view the directory listing, spotted an images folder, clicked on a few links to images which opened up in his browser, then right-clicked and saved the pictures? Pictures they uploaded to a public website? Right. A few thoughts came to mind:

1. That is not illegal

2. That is not morally wrong

3. That’s easy

4. They should have protected the directory listing if they didn’t want its contents made public (standard practice)

5. That is not hacking, he is not a hacker, he is not dangerous

6. They should not have called the Police

I read on.

The club’s chief executive David McNally said he had asked for a report into what happened.
He said: “We are the guardians of the football club whilst we’re here and so we will protect our property.
“And our property in the digital age involves our intellectual property, so we won’t allow anybody to come in and take it from us.”

7. Although they legally own the images, they chose to put them on the internet, making them publicly available (not viewable in the website but still easy to locate) and chose not to add even basic protection to them

8. Their webmasters/sysadmins are very amateur

9. The club officials taking this matter seriously are foolish for thinking he’s done something wrong, and they clearly don’t know how the internet works

10. He hasn’t done anything damaging to the club, merely leaked a “spoiler” of their new shirt before they had unveiled it

Copyright infringement is only a criminal offence if someone makes money from it or causes the copyright owner serious damage. If that is not the case, it would be up to the club to take action – not the police.
The club has not said if it will pursue the matter any further legally.

There is nothing wrong with what this boy did. He’s a long-term fan of the club – a season ticket holder. He was clearly just eager to know what the new kit was like, noticed they were building a new section of the website, and decided to have a snoop to see what he could find. He found what he was looking for (due to their incompetence) and wanted to share it to let others see what he had been so eager to see. Anyone in his position would have done exactly the same, whether for a football shirt, film, TV show, music album, comic book or book release. In fact most of those examples are likely to be much more damaging than his football shirt leak – people seeing it “early” will not make a difference to the number they sell, and he’s not offering a free or cheap alternative to buying the shirt through the pictures he’s shared. They’re pictures of a football shirt, not replicas of it, or a DIY make-your-own-shirt blueprint. The club making a big deal of this is a joke.

11. Oh, and, it’s a football shirt. This really isn’t a big deal.

BBC News Link: Norwich City kit published on internet by boy ahead of launch

Hack To The Future

The BBC wrote about Hack To The Future on their Research & Development blog, including a short video featuring their coverage of me explaining my nontransitive dice session! Also some screen time with Sam of Manchester Girl Geeks (who gave a brilliant keynote); Tom Crick (Cardiff Metropolitan University); and of course, organiser Alan O’Donahoe.

I had a great time at the event, and got to witness a real sense of excitement for computer science amongst children. It’s really encouraging that we’re finally harnessing that energy, and hopefully we will be able to give direction to the kids interested in taking up programming.

See the full article – Teaching coding to kids at Hack to the Future – BBC R&D