So I need to hang up some tools in my shed. I need some bent hook things – I think. Off to the hardware store, where I search for the fixings section. Following the signs hanging from the roof, my search is soon directed to a rack covered in lots of individual packets and I spot the thing I am looking for, but what’s this – they come in lots of different sizes. After a bit of localised searching I grab the size I need, but wait – in the next rack there are some specialised tool hanging devices. Square hooks, long hooks, double-prong hooks, spring clips, an amazing choice! Pleased with what I discovered and selected, I’m soon heading down the aisle when my attention is drawn to a display of shelving with hidden brackets – just the thing for under the TV in the lounge. I grab one of those and head for the checkout before my credit card regrets my discovering anything else.
We all know the library ‘browse’ experience. Head for a particular book, and come away with a different one on the same topic that just happened to be on a nearby shelf, or even a totally different one that you ‘found’ on the recently returned books shelf.
An ambition for the web is to reflect and assist what we humans do in the real world. Search has only brought us part of the way. By identifying key words in web page text, and links between those pages, it makes a reasonable stab at identifying things that might be related to the keywords we enter.
As I commented recently, Semantic Search messages coming from Google indicate that they are taking significant steps towards the ambition. By harvesting Schema.org described metadata embedded in html, by webmasters enticed by Rich Snippets, and building on the 12 million entity descriptions in Freebase they are amassing the fuel for a better search engine. A search engine [that] will better match search queries with a database containing hundreds of millions of “entities”—people, places and things.
How much closer will this better, semantic, search get to being able to replicate online the scenario I shared at the start of this post? It should do a better job of relating our keywords to the things that would be of interest, not just the pages about them. Having a better understanding of entities should help with the Paris Hilton problem, or at least help us navigate around such issues. That better understanding of entities, and related entities, should enable the return of related relevant results that did not contain our keywords.
But surely there is more to it than that. Yes there is, but it is not search – it is discovery. As in my scenario above, humans do not only search for things. We search to get ourselves to a start point for discovery. I searched for an item in the fixings section in the hardware store, or a book in the library. I then inspected related items on the rack and the shelf to discover if there was anything more appropriate for my needs nearby. By understanding things and the [semantic] relationships between them, systems could help us with that discovery phase. It is the search engine’s job to expose those relationships, but the prime benefit will emerge when the source web sites start doing it too.
Take what is still one of my favourite sites – BBC wildlife. Take a look at the Lion page, found by searching for lions in Google. Scroll down a bit and you will see listed the lion’s habitats and behaviours. These are all things or concepts related to the lion. Follow the link to the flooded grassland habitat, where you will find lists of the flora and fauna that live there, including the aardvark, which is nocturnal. Such follow-your-nose navigation around the site supports the discovery method of finding things that I describe. In such an environment serendipity is only a few clicks away.
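For the technically minded, that follow-your-nose walk can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the triples are hand-written to echo the lion example (the "pack hunter" behaviour and the relation names are made up for the sketch, not taken from the BBC site), and `build_index` and `discover` are hypothetical helper names, not anyone's real API. Search gets you to the start point; a breadth-first walk over the relationships then shows what is a few clicks away.

```python
from collections import deque

# Hypothetical, hand-written triples that echo the lion example.
# They are illustrative only and are not data published by the BBC.
TRIPLES = [
    ("lion", "lives_in", "flooded grassland"),
    ("lion", "behaviour", "pack hunter"),  # assumed for the sketch
    ("aardvark", "lives_in", "flooded grassland"),
    ("aardvark", "behaviour", "nocturnal"),
]


def build_index(triples):
    """Map each thing to its related things, in both directions."""
    index = {}
    for subject, relation, obj in triples:
        index.setdefault(subject, set()).add((relation, obj))
        # Links can be followed backwards too: a habitat page lists its animals.
        index.setdefault(obj, set()).add((relation + " (inverse)", subject))
    return index


def discover(index, start, max_clicks):
    """Breadth-first walk from `start`; returns {thing: clicks away}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_clicks:
            continue  # do not wander further than the click budget
        for _relation, neighbour in index.get(current, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    return seen


if __name__ == "__main__":
    index = build_index(TRIPLES)
    nearby = discover(index, "lion", max_clicks=3)
    for thing, clicks in sorted(nearby.items(), key=lambda item: item[1]):
        print(f"{clicks} click(s): {thing}")
```

Starting from "lion", the walk reaches the aardvark by way of the shared habitat, even though nobody searched for aardvarks. A real site would draw these relationships from its published Linked Data rather than a hard-coded list.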
There are two sides to the finding stuff coin – Search and Discovery. Humans naturally do both; systems and the web are only just starting to move beyond search alone. This move is being enabled by the constantly growing data that describes things and their relationships – Linked Data. That growth is stimulated by initiatives such as Schema.org, and by Google providing quick return incentives, such as Rich Snippets & SEO goodness, for folks to publish structured data for reasons other than a futuristic Semantic Web.