Remember that feeling of being a kid and building your first tree house with your mates? Remember the rough, moss-covered wood, the bent nails, that “borrowed” hammer from the shed, a bag of sweets to keep you all going and a couple of days’ hard work? Remember that feeling of success once you had finished! Remember standing back at the end of the day, looking over at your mates, all with smiles on your faces. Remember how good that felt!
Well, I have just surfaced from two days of the company-approved, fully grown-up, adult equivalent: a HACKATHON.
Seven people, over two days, took on the challenge of enhancing Solr, an open source enterprise search platform, to give it spatial / map-based search capabilities.
Key to Solr’s search capability is metadata. Without good quality metadata there was no way we would be able to query geo-spatial data. This became the focus, the gathering of the wood if you like. The problem was that we needed to decide how to get good metadata, and then what we would need out of it. After much deliberating with post-its on walls and glass dividers turned into whiteboards, it was decided that reading five main spatial data formats and getting bounding-box geometries was going to be all we achieved over the two days, and let’s be honest, our tree houses were never palaces!
To get this data we dived straight into Tika, a toolkit which “detects and extracts metadata and structured text content from various documents”. By simply producing some extra metadata parsers to support spatial types, we could easily analyse the files we wanted and get the geographic bounding box. Simples. The day progressed well: each guy trying to extract a different file type, coffees getting passed around, jokes being told, old videos from YouTube occasionally thrown onto the projector, and the glass divider slowly becoming less glass-like as more pen went on throughout the day. Until, finally, we had done it: files parsed, and a simple (minX,minY),(maxX,maxY) calculated from the extent of the geometries within.
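For anyone wondering what that (minX,minY),(maxX,maxY) calculation boils down to, here is a minimal sketch in Python. It is not the parser code we wrote over the two days (that plugged into Tika); the function name and the sample coordinates are made up purely for illustration, and it assumes the geometries have already been flattened into plain (x, y) pairs.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
BBox = Tuple[Point, Point]


def bounding_box(points: Iterable[Point]) -> BBox:
    """Return ((minX, minY), (maxX, maxY)) for a collection of (x, y) pairs."""
    pts = list(points)
    if not pts:
        # An empty file has no extent, so there is nothing sensible to return.
        raise ValueError("cannot compute a bounding box of no points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys)), (max(xs), max(ys))


# Hypothetical coordinates (x = longitude, y = latitude), for illustration only.
coords = [(-2.5, 51.4), (-1.9, 52.5), (-3.2, 51.5)]
print(bounding_box(coords))  # ((-3.2, 51.4), (-1.9, 52.5))
```

Whatever the file format, the idea is the same: pull out every coordinate, then keep only the smallest and largest value on each axis.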
At this point, like your mum calling you back for dinner, it was time for the pub and some well-earned food.
In fresh(ish) and all ready to go (coffee in hand), we started the second and final day. A 15-minute stand-up led the way to explain where we were and what we had left to do, more post-its went on the task board, and we were all off. Today we were going to focus on getting the new metadata into Solr, creating a data enumerator that would find, process and post all our data, and creating a simple front-end to demonstrate that the search was working.
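To give a flavour of the “find and process” half of that data enumerator, here is a rough Python sketch. It is a sketch under assumptions, not what we actually built: the file extensions, the field names (`id`, `min_x` and so on) and the `extract_bbox` callback are all hypothetical stand-ins, and the step that posts each document to Solr is deliberately left out.

```python
import os
from typing import Callable, Dict, Iterator, Optional, Tuple

BBox = Tuple[Tuple[float, float], Tuple[float, float]]

# Hypothetical set of spatial file extensions, chosen only for this example.
SPATIAL_EXTENSIONS = {".shp", ".kml", ".gml"}


def enumerate_documents(
    root: str,
    extract_bbox: Callable[[str], Optional[BBox]],
) -> Iterator[Dict[str, object]]:
    """Walk `root`, and yield one document per spatial file with a bounding box."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            extension = os.path.splitext(name)[1].lower()
            if extension not in SPATIAL_EXTENSIONS:
                continue  # not a file type we know how to parse
            path = os.path.join(dirpath, name)
            bbox = extract_bbox(path)
            if bbox is None:
                continue  # the parser found no geometry in this file
            (min_x, min_y), (max_x, max_y) = bbox
            # Field names are assumptions; a real index defines its own schema.
            yield {
                "id": path,
                "min_x": min_x,
                "min_y": min_y,
                "max_x": max_x,
                "max_y": max_y,
            }
```

Passing the extractor in as a callback keeps the directory walking separate from the parsing, so each file format can be handled by whichever parser understands it.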
Tasks shared, off we went. With more videos, more coffee, and the remainder of the donuts from yesterday’s three dozen, we were hacking again. Designs were getting passed about, ideas of how things could actually work were discussed, and there was the occasional silence as the now older grey matter worked that little bit harder on a single task. Then came the day’s first “done!”, as the developer proudly walked over to our makeshift kanban board and moved another task from “in progress” to “complete”, routing back via the donuts. We had all the metadata we needed. A quick meeting and, like trying to decide where the window should be, we decided that Drupal should be the front-end of choice. Tasks created, off we went again.
Hours passed and people still drove on. Not that we had a delivery to do, not that we had a customer waiting, not that we couldn’t have stopped, but there was something in all of us wanting to get the end-to-end example working in two days, almost to prove it could be done.
Completion: end-to-end data load, parse, index and search. We had a nice working example of a search which showed the areas of a map covered by our data, along with the ability to query Solr spatially [q=geo:"intersects(point(52,-2))"] and get accurate data back.
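Conceptually, an “intersects” query against our indexed bounding boxes is asking a very simple question: does this point fall inside that box? The Python sketch below shows just that test. It is not how Solr evaluates the query internally, the coverage names and boxes are invented for illustration, and it assumes the point in the query above is given as (latitude, longitude), so 52 becomes y and -2 becomes x.

```python
from typing import Tuple

Point = Tuple[float, float]
BBox = Tuple[Point, Point]


def intersects(bbox: BBox, point: Point) -> bool:
    """Return True if (x, y) lies inside or on the edge of the bounding box."""
    (min_x, min_y), (max_x, max_y) = bbox
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


# Hypothetical coverage boxes as ((minX, minY), (maxX, maxY)), x = longitude.
coverage = {
    "midlands": ((-3.0, 51.5), (-1.0, 53.0)),
    "highlands": ((-6.0, 56.5), (-3.0, 58.5)),
}

# Assumption: point(52,-2) means latitude 52, longitude -2.
query_point = (-2.0, 52.0)
hits = [name for name, box in coverage.items() if intersects(box, query_point)]
print(hits)  # ['midlands']
```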
Finally, our new tree house was built. The nails may have been half hammered in and bent, it may not have been pretty, it may not have been robust, but then again we didn’t have to sleep in our new tree house.