One of the more eagerly awaited presentations at the Semantic Tech & Business Conference in Berlin today was a late addition to the program from Denny Vrandecic. With the prominence of DBpedia in the Linked Open Data Cloud, anything new from Wikipedia with data in it was bound to attract attention, and we were not disappointed.
Denny started by telling us that from March he would be moving to Berlin to work for the Wikimedia Foundation on WikiData.
He then went on to explain that the rich Wikipedia resource may have much of the world’s information but does not have all the answers. There are vast differences in coverage between language versions, for instance. It is also not good at answering questions such as “What are the 10 largest cities with a female mayor?” You get some cities back, but most if not all of them do not have a female mayor. One way of addressing this issue that has proliferated in Wikipedia is lists. The problem with lists is that there are so many of them, in several languages, often with duplicates, and then there is the array of lists of lists.
We must accept Wikipedia doesn’t have all the answers: humans can read articles, but computers cannot understand the meaning. WikiData-created articles on a topic will point to the relevant Wikipedia articles in all languages.
DBpedia has been a great success at extracting information from Wikipedia info-boxes and publishing it as data, but it is not editable. WikiData will turn that model on its head by providing an editable environment for data that will then be used to automatically populate the info-boxes. WikiData will also reference secondary databases, for example by indicating that the CIA World Factbook provides a value for something.
WikiData will not define the truth; it will collect the references to the data.
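To make the “references, not truth” idea concrete, here is a minimal sketch in Python of how a data item might hold several values for the same property, each tied to the source that asserts it. This is my own illustration, not WikiData’s actual data model: the `Claim` and `Item` classes, the property name, the second source, and the placeholder values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Claim:
    """One asserted value for a property, plus the sources that assert it."""
    prop: str
    value: Any
    sources: List[str] = field(default_factory=list)


@dataclass
class Item:
    """A hypothetical data item: a label and a list of sourced claims."""
    label: str
    claims: List[Claim] = field(default_factory=list)

    def values_for(self, prop: str) -> List[Tuple[Any, List[str]]]:
        # Return every recorded value with its sources; no value is picked as "the truth".
        return [(c.value, c.sources) for c in self.claims if c.prop == prop]


# Placeholder item and numbers, invented purely for illustration.
item = Item("Exampleland")
item.claims.append(Claim("population", 1_000_000, ["CIA World Factbook"]))
item.claims.append(Claim("population", 1_050_000, ["Hypothetical national census"]))

for value, sources in item.values_for("population"):
    print(value, "according to", ", ".join(sources))
```

The point of the sketch is that conflicting values can sit side by side, and a consumer decides which source to trust.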
Denny listed the objectives of the WikiData project to be:
- Provide a database of the world’s knowledge that anyone can edit
- Collect references and quotes for millions of data items
- Engage a sustainable community that collects data from everywhere in a machine-readable way
- Increase the quality and lower the maintenance costs of Wikipedia and related projects
- Deliver software and community best practices enabling others to engage in projects of data collection and provisioning
WikiData phase 1 involves creating one WikiData page for each Wikipedia entity, which then lists that entity’s representations in each language. The individual language versions will then pull their language links from WikiData. This phase should be complete in the summer.
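As a rough sketch of what phase 1 implies, assume a central store that maps each item to its article title in every language, from which each language edition derives its own list of language links. The store layout, the `item-berlin` identifier, and the `language_links` function are hypothetical names of mine, not anything Denny presented.

```python
from typing import Dict

# Hypothetical central store: item id -> {language code -> article title}.
central_pages: Dict[str, Dict[str, str]] = {
    "item-berlin": {"de": "Berlin", "en": "Berlin", "es": "Berlín"},
}


def language_links(item_id: str, own_language: str) -> Dict[str, str]:
    """Links a given language edition would show: every language except its own."""
    titles = central_pages[item_id]
    return {lang: title for lang, title in titles.items() if lang != own_language}


# The German edition pulls its links from the central page instead of storing them itself.
print(language_links("item-berlin", "de"))
```

With this arrangement, adding a new language version means one edit to the central page rather than one edit in every existing language edition.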
The second phase will include the centralisation of data values for info-boxes, after which the Wikipedias will populate their info-boxes from WikiData.
The final phase will be to enable inline queries against WikiData to be made from Wikipedias with the results surfaced in several formats.
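No query syntax was shown, so the following is only an illustration of why structured data makes the earlier “largest cities with a female mayor” question answerable: once population and mayor are stored as data rather than prose, the question becomes a filter and a sort. The `City` record, its field names, and the placeholder cities are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class City:
    name: str
    population: int
    mayor_gender: str  # hypothetical field; real data would be modelled differently


def largest_with_female_mayor(cities: List[City], limit: int = 10) -> List[City]:
    """Filter to cities with a female mayor, then take the most populous ones."""
    matching = [c for c in cities if c.mayor_gender == "female"]
    return sorted(matching, key=lambda c: c.population, reverse=True)[:limit]


# Placeholder records, invented for illustration only.
cities = [
    City("City A", 900_000, "female"),
    City("City B", 2_500_000, "male"),
    City("City C", 1_400_000, "female"),
]

top = largest_with_female_mayor(cities)
print([c.name for c in top])
```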
Denny did not provide a schedule for the second and third phases.
This is all in addition to the ability to provide free, re-usable, machine-readable access to the world’s data.
The beginnings of an interesting project from Wikimedia that could radically influence the data landscape, and well worth watching as it progresses.