-
@yakagami There is a QLever instance for DBpedia on https://qlever.cs.uni-freiburg.de/dbpedia . It takes only around one hour to build. I have three questions:
-
Ah. I didn't think to check whether DBpedia was already available, only Wikipedia. Regarding 1 and 2: yes, I find their site very confusing. As another example, they have DBpedia Live, which has an endpoint here that has seemingly been down for over a year.
I will try to take a look sometime to see what is necessary to extract just the infobox data from Wikipedia and convert it to RDF.
-
Also note that the English Wikipedia is already part of our Wikidata instance. See "Index Information" on https://qlever.cs.uni-freiburg.de/wikidata . To show what's possible, here is an example query that finds all mentions of an astronaut from Wikidata in a sentence from Wikipedia: https://qlever.cs.uni-freiburg.de/wikidata/UmCdKa . You can combine arbitrary SPARQL searches on Wikidata with arbitrary keyword queries on Wikipedia that way. If you have particular use cases in mind, let us know.
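For reference, here is roughly what such a combined SPARQL+Text query looks like when sent programmatically. This is only a sketch, not the exact linked query: the keyword "moon" and the LIMIT are placeholders, and the API path and `ql:` prefix reflect my understanding of QLever's conventions.

```python
import requests

# QLever's Wikidata endpoint (assumed API path for the instance above).
ENDPOINT = "https://qlever.cs.uni-freiburg.de/api/wikidata"

# SPARQL + text search: find text records that mention an astronaut
# (Wikidata occupation Q11631) together with the word "moon".
# ql:contains-entity / ql:contains-word are QLever's text-search
# predicates; the keyword is just a placeholder.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ql: <http://qlever.cs.uni-freiburg.de/builtin-functions/>
SELECT ?astronaut ?text WHERE {
  ?astronaut wdt:P106 wd:Q11631 .
  ?text ql:contains-entity ?astronaut .
  ?text ql:contains-word "moon" .
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
for row in response.json()["results"]["bindings"]:
    print(row["astronaut"]["value"], "->", row["text"]["value"])
```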
-
This instance is updated once per week. But it indexes the text, not the infoboxes. Turning the infoboxes into triples looks rather straightforward to me. And in my understanding, it should be a (small) part of DBpedia. Maybe you can find out which DBpedia files cover this. If DBpedia is no longer updated frequently, it should also be relatively easy to extract the infoboxes from the Wikipedia articles oneself. Wikipedia is not that large (and much smaller than Wikidata, as far as sheer data size is concerned).
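To make "rather straightforward" concrete, here is a minimal sketch of that idea, assuming the third-party mwparserfromhell library and made-up example.org URIs (this is not DBpedia's actual template-to-ontology mapping):

```python
import mwparserfromhell

def infobox_to_triples(page_title, wikitext):
    """Yield crude (subject, predicate, object) triples from the first
    infobox template in an article's wikitext. Only a sketch: DBpedia's
    real extractor also normalizes units, dates, and links, and maps
    template fields onto a curated ontology."""
    code = mwparserfromhell.parse(wikitext)
    for template in code.filter_templates():
        if not str(template.name).strip().lower().startswith("infobox"):
            continue
        subject = f"<http://example.org/resource/{page_title.replace(' ', '_')}>"
        for param in template.params:
            key = str(param.name).strip().replace(" ", "_")
            value = param.value.strip_code().strip()
            if value:
                # Naive literal quoting; a real extractor would escape
                # quotes and type the literals.
                yield subject, f"<http://example.org/property/{key}>", f'"{value}"'
        break  # first infobox only

# Tiny usage example with an inline wikitext snippet:
sample = "{{Infobox person|name=Neil Armstrong|birth_date=August 5, 1930}}"
for s, p, o in infobox_to_triples("Neil Armstrong", sample):
    print(s, p, o, ".")
```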
-
Seems like https://github.com/dbpedia/extraction-framework/blob/587d999f1b92221605b3c27d9c930ef12ab4aed1/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxExtractor.scala is part of it. I will continue to look into this topic. Thanks for your input.
-
I think Wikipedia would be a great example for QLever. Besides unstructured text, Wikipedia has infoboxes that contain structured data according to categorical templates. One can extract this data into RDF records, similar to Wikidata. Currently DBpedia does this, and it takes "around 4-7 days" to convert the raw XML from the Wikipedia dumps to their RDF database. They have a SPARQL endpoint here. The database, as I understand it, is updated only every 3 months, whereas the Wikipedia dumps come out every two weeks. Text search is obviously slow/nonexistent. Adding this would certainly be more involved/difficult to set up, but it should pay off, both to demonstrate QLever's indexing and query speed and to provide a knowledge source for users.
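For a feel of what the infobox-derived RDF looks like on DBpedia's side today, here is a hedged sketch that asks their public endpoint for the raw infobox properties (the dbp: namespace, produced by the generic infobox extractor) of one arbitrary resource:

```python
import requests

# DBpedia's public Virtuoso endpoint. The http://dbpedia.org/property/
# namespace holds the "raw" key-value pairs lifted from infoboxes.
QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?property ?value WHERE {
  dbr:Neil_Armstrong ?property ?value .
  FILTER(STRSTARTS(STR(?property), "http://dbpedia.org/property/"))
}
LIMIT 20
"""

response = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": QUERY, "format": "application/sparql-results+json"},
)
for row in response.json()["results"]["bindings"]:
    print(row["property"]["value"], "=", row["value"]["value"])
```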