DBpedia Schema Queries

Overview

In this notebook, I begin the process of analyzing the schema of the DBpedia Ontology. This is a local notebook in which I load data from the filesystem into an in-memory graph, which is why it can serve as part of the unit tests for Gastrodon. This is feasible because the schema is much smaller than DBpedia as a whole.

The following diagram illustrates the relationships between the DBpedia Ontology, its parts, DBpedia, and the world it describes.

The numbers above are really rough (by as much as 30 orders of magnitude), but some key points are:

  • The DBpedia Ontology has its own Ontology, which is a subset of RDFS, OWL, Dublin Core, Prov and similar vocabularies.
  • The DBpedia Ontology is much smaller (thousands of times) than DBpedia itself.
  • DBpedia does not directly describe the external universe, but instead describes Wikipedia, which itself describes the universe.

It's important to keep these layers straight, because in this notebook, we are looking at a description of the vocabulary used in DBpedia that uses RDFS, OWL, etc. vocabulary. RDF is unusual among data representations in that schemas in RDF are themselves written in RDF, and can be joined together with the data they describe. In this case, however, I've separated out a small amount of schema data that I intend to use to control operations against a larger database, much like the program of a numerically controlled machine tool or the punched cards that control a Jacquard Loom.

This notebook is part of the test suite for the Gastrodon framework, and a number of bugs were squashed and API improvements made in the process of creating it. It will be of use to anyone who wants to better understand RDF, SPARQL, DBpedia, Pandas, and how to put it all together with Gastrodon.

Setup

As always, I import names from the Python packages I use:

In [1]:
%load_ext autotime
import sys
sys.path.append("../..")

%matplotlib inline
from rdflib import Graph,URIRef
from gastrodon import LocalEndpoint,one,QName
import gzip
import pandas as pd
pd.set_option("display.width",100)
pd.set_option("display.max_colwidth",80)

Loading the graph

I Zopfli compressed a copy of the DBpedia Ontology from version 2015-10, so I can load it like so:

In [2]:
g=Graph()
time: 1.99 ms
In [3]:
g.parse(gzip.open("data/dbpedia_2015-10.nt.gz"),format="nt")
Out[3]:
<Graph identifier=Nb3891148d36e4e0d8954e500c4bfbdeb (<class 'rdflib.graph.Graph'>)>
time: 4.57 s

Now it is loaded in memory in an RDF graph which I can do SPARQL queries on; think of it as a hashtable on steroids. I can get the size of the graph (number of triples) the same way I would get the size of any Python object:

In [4]:
len(g)
Out[4]:
30318
time: 3 ms
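
To make the "hashtable on steroids" picture concrete, here is a minimal sketch in plain Python of a graph as a collection of triples that can be matched against a pattern. The `toy_triples` list and the `match` helper are hypothetical and exist only for illustration; RDFLib uses indexes rather than the linear scan shown here.

```python
# Toy triples for illustration only; not taken from the DBpedia Ontology file.
toy_triples = [
    ("dbo:Person", "rdf:type", "owl:Class"),
    ("dbo:Person", "rdfs:label", "person"),
    ("dbo:birthDate", "rdfs:domain", "dbo:Person"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

print(len(toy_triples))                    # size of the toy "graph", like len(g)
print(match(toy_triples, s="dbo:Person"))  # everything said about dbo:Person
```

A SPARQL triple pattern such as `?s ?p ?o` corresponds to calling `match` with every position left as a wildcard.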

The Graph is supplied by RDFLib, but I wrap it in an Endpoint object supplied by Gastrodon. This provides a bridge between RDFLib and pandas and smooths away the difference between a local endpoint and a remote endpoint (a SPARQL database running in another process or on another computer).

In [5]:
e=LocalEndpoint(g)
time: 10 ms

Counting properties and discovering namespaces

Probably the best query to run on an unfamiliar database is to count the properties (predicates) used in it. Note that the predicates I'm working with at this stage are the predicates that make up the DBpedia Ontology; they are not the predicates that are used in the larger DBpedia database. I'll show you those later.

In [6]:
properties=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
      ?s ?p ?o .
   } GROUP BY ?p ORDER BY DESC(?cnt)
""")
properties
Out[6]:
cnt
p
rdfs:label 11645
rdf:type 6681
http://www.w3.org/ns/prov#wasDerivedFrom 3434
rdfs:range 2558
rdfs:domain 2407
rdfs:comment 1208
rdfs:subPropertyOf 971
rdfs:subClassOf 748
http://www.w3.org/2002/07/owl#equivalentClass 407
http://www.w3.org/2002/07/owl#equivalentProperty 222
http://www.w3.org/2002/07/owl#disjointWith 24
http://creativecommons.org/ns#license 2
http://purl.org/dc/terms/issued 1
http://purl.org/dc/terms/title 1
http://purl.org/dc/terms/description 1
http://purl.org/dc/terms/source 1
http://www.w3.org/2002/07/owl#versionInfo 1
http://purl.org/dc/terms/publisher 1
http://xmlns.com/foaf/0.1/homepage 1
http://purl.org/vocab/vann/preferredNamespaceUri 1
http://purl.org/dc/terms/creator 1
http://purl.org/vocab/vann/preferredNamespacePrefix 1
http://purl.org/dc/terms/modified 1
time: 4.98 s

Note that the leftmost column is bold; this is because gastrodon recognized that this query groups on the ?p variable and it made this an index of the pandas dataframe. Gastrodon uses the SPARQL parser from RDFLib to understand your queries to support you in writing and displaying them. One advantage of this is that if you want to make a plot from the above data frame (which I'll do in a moment after cleaning the data) the dependent and independent variables will be automatically determined and things will 'just work'.

Another thing to note is that the table shows short names such as rdfs:label as well as full URIs for predicates. The full URIs are tedious to work with, so I add a number of namespace declarations and make a new LocalEndpoint:

In [7]:
g.bind("prov","http://www.w3.org/ns/prov#")
g.bind("owl","http://www.w3.org/2002/07/owl#")
g.bind("cc","http://creativecommons.org/ns#")
g.bind("foaf","http://xmlns.com/foaf/0.1/")
g.bind("dc","http://purl.org/dc/terms/")
g.bind("vann","http://purl.org/vocab/vann/")
time: 3 ms
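
The effect of these bindings on display can be sketched with a small prefix table. This is a simplified illustration, not the logic RDFLib or Gastrodon actually use; the `shorten` function is hypothetical, and real QName handling applies stricter rules about which local names may be abbreviated.

```python
# Prefix table mirroring some of the g.bind(...) calls above.
prefixes = {
    "prov": "http://www.w3.org/ns/prov#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/terms/",
}

def shorten(uri, prefixes):
    """Return prefix:local if a bound namespace matches, else the URI unchanged."""
    # Try the longest namespace first so that nested namespaces resolve sensibly.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri

print(shorten("http://www.w3.org/ns/prov#wasDerivedFrom", prefixes))
print(shorten("http://example.org/unbound", prefixes))
```

URIs in a namespace that has not been bound are left in full, which is why the first property table mixes short names and full URIs.
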
In [8]:
e=LocalEndpoint(g)
properties=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
      ?s ?p ?o .
   } GROUP BY ?p ORDER BY DESC(?cnt)
""")
properties
Out[8]:
cnt
p
rdfs:label 11645
rdf:type 6681
prov:wasDerivedFrom 3434
rdfs:range 2558
rdfs:domain 2407
rdfs:comment 1208
rdfs:subPropertyOf 971
rdfs:subClassOf 748
owl:equivalentClass 407
owl:equivalentProperty 222
owl:disjointWith 24
cc:license 2
dc:issued 1
dc:title 1
dc:description 1
dc:source 1
owl:versionInfo 1
dc:publisher 1
foaf:homepage 1
vann:preferredNamespaceUri 1
dc:creator 1
vann:preferredNamespacePrefix 1
dc:modified 1
time: 3.48 s
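
For readers new to SPARQL, the GROUP BY query above amounts to a tally over the predicate position of every triple. A minimal sketch in plain Python, using a handful of made-up triples (the `triples` list is illustrative and not taken from the DBpedia Ontology):

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); not real data.
triples = [
    ("dbo:Person", "rdf:type", "owl:Class"),
    ("dbo:Person", "rdfs:label", "person"),
    ("dbo:Person", "rdfs:label", "Person"),
    ("dbo:birthDate", "rdfs:domain", "dbo:Person"),
    ("dbo:birthDate", "rdfs:label", "birth date"),
]

# In spirit: SELECT ?p (COUNT(*) AS ?cnt) { ?s ?p ?o } GROUP BY ?p ORDER BY DESC(?cnt)
predicate_counts = Counter(p for _, p, _ in triples).most_common()
print(predicate_counts)
```

Because every triple has exactly one predicate, the counts always add up to the total number of triples.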

Metadata about the DBpedia Ontology

I find it suspicious that so many properties occur only once, so I investigate:

In [9]:
single=e.select("""
   SELECT ?s {
      ?s dc:source ?o .
    }
""")
single
Out[9]:
s
0 http://dbpedia.org/ontology/
time: 56.1 ms

The one function will extract the single member of any list, iterable, DataFrame, or Series that has just one member.

In [10]:
ontology=one(single)
time: 1.5 ms
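
The idea behind such a helper can be sketched for plain Python iterables as follows. This is not Gastrodon's implementation: the name `only` is hypothetical, the sketch does not handle DataFrames, and raising an error when the input does not have exactly one member is my assumption about sensible behavior.

```python
def only(items):
    """Return the single element of an iterable, or raise if there is not exactly one."""
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("expected exactly one item, got none") from None
    try:
        next(iterator)
    except StopIteration:
        return first  # the iterator is exhausted, so there was exactly one item
    raise ValueError("expected exactly one item, got more than one")

print(only(["http://dbpedia.org/ontology/"]))
```

Failing loudly is the useful part: if a query that should return one row returns zero or several, the mistake surfaces immediately rather than later in the analysis.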

The select function can see variables in the stack frame that calls it. Simply put, if you use the ?_ontology variable in a SPARQL query, select will look for a Python variable called ontology, and substitute the value of ontology into ?_ontology. The underscore sigil prevents substitutions from happening by accident.

Here we see a number of facts about the DBpedia Ontology, that is, the data set we are working with:

In [11]:
meta=e.select("""
    SELECT ?p ?o {
        ?_ontology ?p ?o .
    } ORDER BY ?p
""")
meta
Out[11]:
p o
0 cc:license http://www.gnu.org/copyleft/fdl.html
1 cc:license http://creativecommons.org/licenses/by-sa/3.0/
2 dc:creator DBpedia Maintainers and Contributors
3 dc:description \n The DBpedia ontology provides the classes and properties use...
4 dc:issued 2008-11-17T12:00Z
5 dc:modified 2015-11-02T09:36Z
6 dc:publisher DBpedia Maintainers
7 dc:source http://mappings.dbpedia.org
8 dc:title The DBpedia Ontology
9 vann:preferredNamespacePrefix dbo
10 vann:preferredNamespaceUri http://dbpedia.org/ontology/
11 rdf:type http://purl.org/vocommons/voaf#Vocabulary
12 rdf:type owl:Ontology
13 rdfs:comment \n This ontology is generated from the manually created specifi...
14 owl:versionInfo 4.1-SNAPSHOT
15 foaf:homepage http://wiki.dbpedia.org/Ontology
time: 64.5 ms
In [12]:
ontology
Out[12]:
rdflib.term.URIRef('http://dbpedia.org/ontology/')
time: 2.51 ms
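
The substitution of ?_ontology described above can be pictured as a lookup in the caller's stack frame. The following is a rough sketch using only the standard library, not Gastrodon's actual implementation; the `substitute` function is hypothetical, and it simplifies matters by formatting every value as a URI reference.

```python
import re
import sys

def substitute(query):
    """Replace ?_name with the value of the caller's variable `name`."""
    caller = sys._getframe(1)        # frame of whoever called substitute()
    scope = caller.f_locals          # only the immediately enclosing scope

    def replace(match):
        name = match.group(1)
        if name not in scope:
            return match.group(0)    # leave unknown variables untouched
        return f"<{scope[name]}>"    # simplification: treat every value as a URI
    return re.sub(r"\?_(\w+)", replace, query)

def demo():
    ontology = "http://dbpedia.org/ontology/"
    return substitute("SELECT ?p ?o { ?_ontology ?p ?o . }")

print(demo())
```

Ordinary variables such as ?p and ?o lack the underscore, so the pattern never touches them.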

How Gastrodon handles URI References

Gastrodon tries to display things in a simple way while watching your back to prevent mistakes. One potential mistake is that RDF makes a distinction between a literal string such as "http://dbpedia.org/ontology/" and a URI reference such as <http://dbpedia.org/ontology/>. Use the wrong one and your queries won't work!

This could particularly be a problem with abbreviated names. For instance, let's look at the first predicate in the meta frame. When displayed as a result in a Jupyter notebook or in a Pandas DataFrame, the short name looks just like a string:

In [13]:
license=meta.at[0,'p']
license
Out[13]:
'cc:license'
time: 13.5 ms

That's because it is a string! It is more than a string, however: it is an instance of a class that is a subclass of str:

In [14]:
type(license)
Out[14]:
gastrodon.GastrodonURI
time: 21.5 ms

In fact, it has the full URI reference hidden away inside of it:

In [15]:
meta.at[0,'p'].to_uri_ref()
Out[15]:
rdflib.term.URIRef('http://creativecommons.org/ns#license')
time: 17 ms

When you access this value in a SPARQL query, the select function recognizes the type of the variable and automatically inserts the full URI reference:

In [16]:
e.select("""
    SELECT ?s ?o {
        ?s ?_license ?o .
    }
""")
Out[16]:
s o
0 http://dbpedia.org/ontology/ http://creativecommons.org/licenses/by-sa/3.0/
1 http://dbpedia.org/ontology/ http://www.gnu.org/copyleft/fdl.html
time: 41.5 ms
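
The idea of a string subclass that displays as a short name while carrying the full URI can be sketched in a few lines. The class name `ShortURI` and its `full_uri` attribute are hypothetical; Gastrodon's own class returns an RDFLib URIRef from to_uri_ref(), whereas this sketch returns a plain string to stay dependency-free.

```python
class ShortURI(str):
    """A str that displays as a short name but remembers the full URI."""

    def __new__(cls, short, full_uri):
        obj = super().__new__(cls, short)
        obj.full_uri = full_uri
        return obj

    def to_uri_ref(self):
        # Sketch only: return the stored URI as a plain string.
        return self.full_uri

license_name = ShortURI("cc:license", "http://creativecommons.org/ns#license")
print(license_name)
print(isinstance(license_name, str))
print(license_name.to_uri_ref())
```

Because the object is still a str, it sorts, prints and compares like any other string, and code that knows about the subclass can recover the unambiguous URI.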

Counting properties that are not about the Ontology

Since the metadata properties that describe this dataset really aren't part of it, it makes sense to remove them from the list so that we don't have so many properties that are used just once:

In [17]:
properties=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
      ?s ?p ?o .
      FILTER(?s!=?_ontology)
   } GROUP BY ?p ORDER BY DESC(?cnt)
""")
properties
Out[17]:
cnt
p
rdfs:label 11645
rdf:type 6679
prov:wasDerivedFrom 3434
rdfs:range 2558
rdfs:domain 2407
rdfs:comment 1207
rdfs:subPropertyOf 971
rdfs:subClassOf 748
owl:equivalentClass 407
owl:equivalentProperty 222
owl:disjointWith 24
time: 8.29 s

At this point it is about as easy to make a pie chart as it is with Excel. A pie chart is a good choice here because each fact has exactly one property in it:

In [18]:
properties["cnt"].plot.pie(figsize=(6,6)).set_ylabel('')
Out[18]:
<matplotlib.text.Text at 0x1af7e74f780>
time: 265 ms

My favorite method for understanding this kind of distribution is to sort the most common properties first and then compute the Cumulative Distribution Function, which is the percentage of facts that have used the predicates we've seen so far.

This is easy to compute with Pandas:

In [19]:
100.0*properties["cnt"].cumsum()/properties["cnt"].sum()
Out[19]:
p
rdfs:label                 38.429807
rdf:type                   60.471256
prov:wasDerivedFrom        71.803841
rdfs:range                 80.245528
rdfs:domain                88.188898
rdfs:comment               92.172134
rdfs:subPropertyOf         95.376543
rdfs:subClassOf            97.845027
owl:equivalentClass        99.188172
owl:equivalentProperty     99.920797
owl:disjointWith          100.000000
Name: cnt, dtype: float64
time: 8 ms

Note that this result looks different from the DataFrames you've seen so far because it is not a DataFrame. It is a Series, which has just one index column and one data column. It's possible to stick several Series together to make a DataFrame, however.

In [20]:
# DataFrame.from_items was removed in pandas 1.0; a dict of Series keeps the column order
pd.DataFrame({
    "count": properties["cnt"],
    "frequency": 100.0*properties["cnt"]/properties["cnt"].sum(),
    "distribution": 100.0*properties["cnt"].cumsum()/properties["cnt"].sum()
})
Out[20]:
count frequency distribution
p
rdfs:label 11645 38.429807 38.429807
rdf:type 6679 22.041449 60.471256
prov:wasDerivedFrom 3434 11.332585 71.803841
rdfs:range 2558 8.441687 80.245528
rdfs:domain 2407 7.943370 88.188898
rdfs:comment 1207 3.983235 92.172134
rdfs:subPropertyOf 971 3.204409 95.376543
rdfs:subClassOf 748 2.468484 97.845027
owl:equivalentClass 407 1.343146 99.188172
owl:equivalentProperty 222 0.732625 99.920797
owl:disjointWith 24 0.079203 100.000000
time: 25.5 ms

Unlike many graphical depictions, the above table is fair to both highly common and unusually rare predicates.
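
The frequency and distribution columns are simple arithmetic on the counts, which can be checked without Pandas. The sketch below copies the counts from the property table (with the ontology metadata excluded) and recomputes both columns using only the standard library.

```python
from itertools import accumulate

# Counts copied from the property table above, most common first.
counts = [11645, 6679, 3434, 2558, 2407, 1207, 971, 748, 407, 222, 24]

total = sum(counts)
# Share of all facts that use each predicate, in percent.
frequency = [100.0 * c / total for c in counts]
# Running share: facts covered by this predicate and all more common ones.
distribution = [100.0 * running / total for running in accumulate(counts)]

for f, d in zip(frequency, distribution):
    print(f"{f:10.6f} {d:11.6f}")
```

The last value of the distribution is always 100, since by then every fact has been counted.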

Languages

It makes sense to start with rdfs:label, which is the most common property in this database.

Unlike many data formats, RDF supports language tagging for strings. Objects (mainly properties and classes used in DBpedia) that are talked about in the DBpedia Ontology are described in multiple human languages, and counting the language tags involves a query that is very similar to the property counting query:

In [21]:
e.select("""
   SELECT (LANG(?label) AS ?lang) (COUNT(*) AS ?cnt) {
      ?s rdfs:label ?label .
   } GROUP BY LANG(?label) ORDER BY DESC(?cnt)
""")
Out[21]:
lang cnt
0 en 3953
1 de 2049
2 nl 1296
3 el 1227
4 fr 755
5 ga 469
6 ja 374
7 sr 259
8 es 256
9 it 244
10 ko 237
11 pt 221
12 pl 120
13 gl 39
14 tr 27
15 sl 26
16 ca 26
17 ru 19
18 bg 11
19 zh 11
20 id 6
21 ar 5
22 bn 4
23 be 4
24 eu 4
25 lv 1
26 cs 1
27 hy 1
time: 1.66 s

A detail you might notice is that the lang column is not bolded. Instead, a sequential numeric index was created when I made the data frame. This is because Gastrodon, at this moment, isn't smart enough to understand a function that appears in the GROUP BY clause.

This is easy to work around by assigning the output of this function to a variable in a BIND clause.

In [22]:
lang=e.select("""
   SELECT ?lang (COUNT(*) AS ?cnt) {
      ?s rdfs:label ?label .
      BIND (LANG(?label) AS ?lang)
   } GROUP BY ?lang ORDER BY DESC(?cnt)
""")
lang
Out[22]:
cnt
lang
en 3953
de 2049
nl 1296
el 1227
fr 755
ga 469
ja 374
sr 259
es 256
it 244
ko 237
pt 221
pl 120
gl 39
tr 27
sl 26
ca 26
ru 19
bg 11
zh 11
id 6
ar 5
bn 4
be 4
eu 4
lv 1
cs 1
hy 1
time: 2.38 s

One key to getting correct results in a data analysis is to test your assumptions. English is the most prevalent language by far, but can we assume that every object has an English name? There are 3953 objects with English labels, but

In [23]:
distinct_s=one(e.select("""
   SELECT (COUNT(DISTINCT ?s) AS ?cnt) {
      ?s rdfs:label ?o .
   }
"""))
distinct_s
Out[23]:
3954
time: 1.13 s

objects with labels overall, so there must be at least one object without an English label. SPARQL has negation operators so we can find objects like that:

In [24]:
black_sheep=one(e.select("""
   SELECT ?s {
      ?s rdfs:label ?o .
      FILTER NOT EXISTS {
          ?s rdfs:label ?o2 .
          FILTER(LANG(?o2)='en')
      }
   }
"""))
black_sheep
Out[24]:
rdflib.term.URIRef('http://dbpedia.org/ontology/hasSurfaceForm')
time: 20.3 s
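
The FILTER NOT EXISTS pattern behaves like a set difference: take every subject that has a label, then remove those that have an English one. A minimal sketch in plain Python follows; the `labels` dictionary is toy data, and the English label shown for dbo:birthDate is an assumption made for illustration.

```python
# Toy data: subject -> list of (label text, language tag).
labels = {
    "dbo:birthDate": [("birth date", "en"), ("Geburtsdatum", "de")],
    "dbo:hasSurfaceForm": [("επιφάνεια από", "el")],
}

with_label = set(labels)
with_english = {
    subject
    for subject, tagged in labels.items()
    if any(lang == "en" for _, lang in tagged)
}

# Subjects that have at least one label but no English label.
without_english = with_label - with_english
print(without_english)
```
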

Looking up all the facts for dbo:hasSurfaceForm (which is a property used in DBpedia) shows that it has a name in Greek, but not in any other language:

In [25]:
meta=e.select("""
    SELECT ?p ?o {
        ?_black_sheep ?p ?o .
    } ORDER BY ?p
""")
meta
Out[25]:
p o
0 rdf:type owl:DatatypeProperty
1 rdf:type rdf:Property
2 rdfs:comment Reserved for DBpedia.
3 rdfs:label επιφάνεια από
4 rdfs:range xsd:string
5 prov:wasDerivedFrom http://mappings.dbpedia.org/index.php/OntologyProperty:hasSurfaceForm
time: 37.5 ms

I guess that's the exception that proves the rule. Everything else has a name in English, about half of the schema objects have a name in German, and the percentage falls off pretty rapidly from there:

In [26]:
lang_coverage=100*lang["cnt"]/distinct_s
lang_coverage
Out[26]:
lang
en    99.974709
de    51.820941
nl    32.776935
el    31.031866
fr    19.094588
ga    11.861406
ja     9.458776
sr     6.550329
es     6.474456
it     6.170966
ko     5.993930
pt     5.589277
pl     3.034901
gl     0.986343
tr     0.682853
sl     0.657562
ca     0.657562
ru     0.480526
bg     0.278199
zh     0.278199
id     0.151745
ar     0.126454
bn     0.101163
be     0.101163
eu     0.101163
lv     0.025291
cs     0.025291
hy     0.025291
Name: cnt, dtype: float64
time: 6 ms

As the percentages add up to more than 100 (an object can have names in many languages), the pie chart would be a wrong choice, but a bar chart is effective.

In [27]:
lang_coverage.plot(kind="barh",figsize=(10,6))
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x1af7e64b208>
time: 502 ms

Classes used in the DBpedia Ontology

I use another GROUP BY query to count classes used in the DBpedia Ontology. In the name of keeping the levels of abstraction straight, I'll point out that there are eight classes in the DBpedia Ontology, but that the DBpedia Ontology describes 739 classes used in DBpedia.

In [28]:
types=e.select("""
   SELECT ?type (COUNT(*) AS ?cnt) {
      ?s a ?type .
   } GROUP BY ?type ORDER BY DESC(?cnt)
""")
types
Out[28]:
cnt
type
rdf:Property 2695
owl:DatatypeProperty 1734
owl:ObjectProperty 1099
owl:Class 739
rdfs:Datatype 382
owl:FunctionalProperty 30
owl:Ontology 1
http://purl.org/vocommons/voaf#Vocabulary 1
time: 672 ms

739 is really a lot of classes! You personally might be interested in some particular domain (say Pop Music), but to survey the whole thing, I need some way to pick out classes that are important.

If I had access to the whole DBpedia database, I could count how many instances of these classes occur, and that would be one measure of importance. (I have access to this database, as do you, but I'm not using it for this notebook because I want this notebook to be self-contained.)

As it is, one proxy for importance is how many properties apply to a particular class, or, in RDFS speak, how many properties have this class as the domain. The assumption here is that important classes are well documented, and we get a satisfying list of the top 20 classes this way:

In [29]:
types=e.select("""
   SELECT ?s (COUNT(*) AS ?cnt) {
      ?s a owl:Class .
      ?p rdfs:domain ?s .
   } GROUP BY ?s ORDER BY DESC(?cnt) LIMIT 20
""")
types
Out[29]:
cnt
s
http://dbpedia.org/ontology/Person 253
http://dbpedia.org/ontology/Place 183
http://dbpedia.org/ontology/PopulatedPlace 151
http://dbpedia.org/ontology/Athlete 94
http://dbpedia.org/ontology/Settlement 56
http://dbpedia.org/ontology/School 47
http://dbpedia.org/ontology/SpaceMission 43
http://dbpedia.org/ontology/Island 38
http://dbpedia.org/ontology/MilitaryUnit 29
http://dbpedia.org/ontology/Organisation 27
http://dbpedia.org/ontology/Species 27
http://dbpedia.org/ontology/Work 27
http://dbpedia.org/ontology/Planet 27
http://dbpedia.org/ontology/MeanOfTransportation 25
http://dbpedia.org/ontology/Broadcaster 25
http://dbpedia.org/ontology/Spacecraft 25
http://dbpedia.org/ontology/ArchitecturalStructure 24
http://dbpedia.org/ontology/Film 24
http://dbpedia.org/ontology/Artist 22
http://dbpedia.org/ontology/RouteOfTransportation 22
time: 605 ms

Adding another namespace binding makes the output more manageable:

In [30]:
g.bind("dbo","http://dbpedia.org/ontology/")
e=LocalEndpoint(g)
types=e.select("""
   SELECT ?s (COUNT(*) AS ?cnt) {
      ?s a owl:Class .
      ?p rdfs:domain ?s .
   } GROUP BY ?s ORDER BY DESC(?cnt) LIMIT 5
""")
types.head()
Out[30]:
cnt
s
dbo:Person 253
dbo:Place 183
dbo:PopulatedPlace 151
dbo:Athlete 94
dbo:Settlement 56
time: 448 ms

Common properties for People

To survey some important properties that apply to a dbo:Person I need some estimate of importance. I choose to count how many languages a property is labeled with as a proxy for importance -- after all, if a property is broadly interesting, it will be translated into many languages. The result is pretty satisfying.

In [31]:
person_types=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
       ?p rdfs:domain dbo:Person .
       ?p rdfs:label ?l .
   } GROUP BY ?p ORDER BY DESC(?cnt) LIMIT 30
""")
person_types
Out[31]:
cnt
p
dbo:birthDate 10
dbo:birthPlace 9
http://dbpedia.org/ontology/Person/height 8
http://dbpedia.org/ontology/Person/weight 8
dbo:knownFor 7
dbo:nationality 7
dbo:school 6
dbo:bloodType 6
dbo:achievement 6
dbo:deathPlace 6
dbo:placeOfBurial 6
dbo:child 6
dbo:birthName 6
dbo:residence 6
dbo:deathDate 6
dbo:eyeColor 6
dbo:relative 5
dbo:deathYear 5
dbo:spouse 5
dbo:education 5
dbo:parent 5
dbo:birthYear 5
dbo:waistSize 5
dbo:university 5
dbo:bustSize 5
dbo:college 5
dbo:hairColor 5
dbo:sibling 5
dbo:deathCause 4
dbo:hipSize 4
time: 339 ms

To make something that looks like a real report, I reach into my bag of tricks.

Since the predicate URI contains an English name for the predicate, I decide to show a label in German. The OPTIONAL clause is essential so that we don't lose properties that don't have a German label (there is exactly one in the list below). I use a subquery to compute the language count, and then filter for properties that are labeled in more than four languages.

In [32]:
e.select("""
   SELECT ?p ?range ?label ?cnt {
        ?p rdfs:range ?range .
        OPTIONAL { 
            ?p rdfs:label ?label .
             FILTER(LANG(?label)='de')
        }
        {
           SELECT ?p (COUNT(*) AS ?cnt) {
               ?p rdfs:domain dbo:Person .
               ?p rdfs:label ?l .
           } GROUP BY ?p ORDER BY DESC(?cnt)
        }
       FILTER(?cnt>4)
   } ORDER BY DESC(?cnt)
""")
Out[32]:
p range label cnt
0 dbo:birthDate xsd:date Geburtsdatum 10
1 dbo:birthPlace dbo:Place Geburtsort 9
2 http://dbpedia.org/ontology/Person/height http://dbpedia.org/datatype/centimetre Höhe (cm) 8
3 http://dbpedia.org/ontology/Person/weight http://dbpedia.org/datatype/kilogram Gewicht (kg) 8
4 dbo:nationality dbo:Country Nationalität 7
5 dbo:deathDate xsd:date Sterbedatum 6
6 dbo:residence dbo:Place Residenz 6
7 dbo:eyeColor xsd:string Augenfarbe 6
8 dbo:birthName rdf:langString Geburtsname 6
9 dbo:school dbo:EducationalInstitution schule 6
10 dbo:deathPlace dbo:Place Sterbeort 6
11 dbo:placeOfBurial dbo:Place Ort der Bestattung 6
12 dbo:child dbo:Person Kind 6
13 dbo:relative dbo:Person Verwandter 5
14 dbo:sibling dbo:Person Geschwister 5
15 dbo:waistSize xsd:double Taillenumfang (μ) 5
16 dbo:birthYear xsd:gYear Geburtsjahr 5
17 dbo:university dbo:EducationalInstitution Universität 5
18 dbo:hairColor xsd:string Haarfarbe 5
19 dbo:spouse dbo:Person Ehepartner 5
20 dbo:bustSize xsd:double None 5
21 dbo:parent dbo:Person Elternteil 5
22 dbo:deathYear xsd:gYear Sterbejahr 5
23 dbo:college dbo:EducationalInstitution College 5
time: 4.7 s
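
The role of OPTIONAL in this query is that of a left join: rows survive even when the optional part finds nothing. A small sketch in plain Python, using two rows copied from the table above (the variable names are mine):

```python
# Two properties from the table above, with their ranges.
ranges = {"dbo:birthDate": "xsd:date", "dbo:bustSize": "xsd:double"}
# German labels; dbo:bustSize has none in this data set.
german_labels = {"dbo:birthDate": "Geburtsdatum"}

# Without OPTIONAL (inner join): properties lacking a German label disappear.
inner = [(p, r, german_labels[p]) for p, r in ranges.items() if p in german_labels]

# With OPTIONAL (left join): every property is kept; a missing label becomes None.
left = [(p, r, german_labels.get(p)) for p, r in ranges.items()]

print(inner)
print(left)
```

The None in the left join corresponds to the None shown in the label column for dbo:bustSize.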

Towards a simple schema browser

You'd probably agree with me that the query above is getting to be a bit much, but now that I have it, I can bake it into a function which makes it easy to ask questions of the schema. The following function lets us make a similar report for any class and any language. (I use the German word for 'class' because in Python the English word class is a reserved word and the synonymous word type is the name of a built-in.)

In [33]:
def top_properties(klasse='dbo:Person',lang='de',threshold=4):
    klasse=QName(klasse)
    df=e.select("""
       SELECT ?p ?range ?label ?cnt {
            ?p rdfs:range ?range .
            OPTIONAL { 
                ?p rdfs:label ?label .
                 FILTER(LANG(?label)=?_lang)
            }
            {
               SELECT ?p (COUNT(*) AS ?cnt) {
                   ?p rdfs:domain ?_klasse .
                   ?p rdfs:label ?l .
               } GROUP BY ?p ORDER BY DESC(?cnt)
            }
           FILTER(?cnt>?_threshold)
       } ORDER BY DESC(?cnt)
    """)
    return df.style.highlight_null(null_color='red')
time: 5.5 ms

Note that the select here can see variables in the immediately enclosing scope, that is, the function definition. As it is inside a function definition, it does not see variables defined in the Jupyter notebook. The handling of missing values is a big topic in Pandas, so I take the liberty of highlighting the label that is missing in German.

In [34]:
top_properties()
Out[34]:
p range label cnt
0 dbo:birthDate xsd:date Geburtsdatum 10
1 dbo:birthPlace dbo:Place Geburtsort 9
2 http://dbpedia.org/ontology/Person/height http://dbpedia.org/datatype/centimetre Höhe (cm) 8
3 http://dbpedia.org/ontology/Person/weight http://dbpedia.org/datatype/kilogram Gewicht (kg) 8
4 dbo:nationality dbo:Country Nationalität 7
5 dbo:deathDate xsd:date Sterbedatum 6
6 dbo:residence dbo:Place Residenz 6
7 dbo:eyeColor xsd:string Augenfarbe 6
8 dbo:birthName rdf:langString Geburtsname 6
9 dbo:school dbo:EducationalInstitution schule 6
10 dbo:deathPlace dbo:Place Sterbeort 6
11 dbo:placeOfBurial dbo:Place Ort der Bestattung 6
12 dbo:child dbo:Person Kind 6
13 dbo:relative dbo:Person Verwandter 5
14 dbo:sibling dbo:Person Geschwister 5
15 dbo:waistSize xsd:double Taillenumfang (μ) 5
16 dbo:birthYear xsd:gYear Geburtsjahr 5
17 dbo:university dbo:EducationalInstitution Universität 5
18 dbo:hairColor xsd:string Haarfarbe 5
19 dbo:spouse dbo:Person Ehepartner 5
20 dbo:bustSize xsd:double None 5
21 dbo:parent dbo:Person Elternteil 5
22 dbo:deathYear xsd:gYear Sterbejahr 5
23 dbo:college dbo:EducationalInstitution College 5
time: 4.53 s

In Japanese, a different set of labels is missing. It's nice to see that Unicode characters outside the Latin-1 code page work just fine.

In [35]:
top_properties(lang='ja')
Out[35]:
p range label cnt
0 dbo:birthDate xsd:date 生年月日 10
1 dbo:birthPlace dbo:Place 生地 9
2 http://dbpedia.org/ontology/Person/height http://dbpedia.org/datatype/centimetre 身長 (cm) 8
3 http://dbpedia.org/ontology/Person/weight http://dbpedia.org/datatype/kilogram 体重 (kg) 8
4 dbo:nationality dbo:Country 国籍 7
5 dbo:deathDate xsd:date 没年月日 6
6 dbo:residence dbo:Place 居住地 6
7 dbo:eyeColor xsd:string None 6
8 dbo:birthName rdf:langString None 6
9 dbo:school dbo:EducationalInstitution None 6
10 dbo:deathPlace dbo:Place 死没地 6
11 dbo:placeOfBurial dbo:Place None 6
12 dbo:child dbo:Person 子供 6
13 dbo:relative dbo:Person 親戚 5
14 dbo:sibling dbo:Person 兄弟 5
15 dbo:waistSize xsd:double ウエスト (μ) 5
16 dbo:birthYear xsd:gYear 生年 5
17 dbo:university dbo:EducationalInstitution 大学 5
18 dbo:hairColor xsd:string None 5
19 dbo:spouse dbo:Person 配偶者 5
20 dbo:bustSize xsd:double バスト (μ) 5
21 dbo:parent dbo:Person 5
22 dbo:deathYear xsd:gYear 没年 5
23 dbo:college dbo:EducationalInstitution None 5
time: 5.23 s

And of course it can be fun to look at other classes and languages:

In [36]:
top_properties('dbo:SpaceMission',lang='fr',threshold=1)
Out[36]:
p range label cnt
0 dbo:spacecraft dbo:Spacecraft véhicule spatial 4
1 dbo:nextMission dbo:SpaceMission mision siguiente 3
2 http://dbpedia.org/ontology/SpaceMission/distanceTraveled http://dbpedia.org/datatype/kilometre None 3
3 dbo:distanceTraveled xsd:double None 3
4 http://dbpedia.org/ontology/SpaceMission/mass http://dbpedia.org/datatype/kilogram None 3
5 dbo:orbitalInclination xsd:float None 2
6 dbo:landingSite xsd:string None 2
7 dbo:missionDuration xsd:double None 2
8 dbo:crewSize xsd:nonNegativeInteger None 2
9 dbo:booster dbo:Rocket None 2
10 dbo:launchDate xsd:date None 2
11 dbo:spacewalkEnd xsd:date None 2
12 dbo:landingDate xsd:date None 2
13 dbo:spacestation dbo:SpaceStation None 2
14 dbo:lunarRover dbo:MeanOfTransportation None 2
15 dbo:spacewalkBegin xsd:date None 2
16 dbo:numberOfOrbits xsd:nonNegativeInteger None 2
17 dbo:launchPad dbo:LaunchPad None 2
18 http://dbpedia.org/ontology/SpaceMission/lunarOrbitTime http://dbpedia.org/datatype/hour None 2
19 dbo:launchSite dbo:Building None 2
20 dbo:crewMember dbo:Astronaut None 2
21 dbo:lunarOrbitTime xsd:double None 2
22 http://dbpedia.org/ontology/SpaceMission/missionDuration http://dbpedia.org/datatype/day None 2
23 dbo:previousMission dbo:SpaceMission None 2
24 dbo:lunarModule xsd:string None 2
time: 4.98 s

About "prov:wasDerivedFrom"

The prov:wasDerivedFrom property links properties and classes defined in the DBpedia Ontology to the places where they are defined on the mappings web site.

In [37]:
e.select("""
    SELECT ?s ?o {
       ?s prov:wasDerivedFrom ?o .
    } LIMIT 10
""")
Out[37]:
s o
0 dbo:colourName http://mappings.dbpedia.org/index.php/OntologyProperty:colourName
1 dbo:sharingOutPopulation http://mappings.dbpedia.org/index.php/OntologyProperty:sharingOutPopulation
2 dbo:daira http://mappings.dbpedia.org/index.php/OntologyProperty:daira
3 dbo:firstBroadcast http://mappings.dbpedia.org/index.php/OntologyProperty:firstBroadcast
4 dbo:place http://mappings.dbpedia.org/index.php/OntologyProperty:place
5 dbo:jutsu http://mappings.dbpedia.org/index.php/OntologyProperty:jutsu
6 dbo:numberOfSuites http://mappings.dbpedia.org/index.php/OntologyProperty:numberOfSuites
7 dbo:ChessPlayer http://mappings.dbpedia.org/index.php/OntologyClass:ChessPlayer
8 dbo:approvedByLowerParliament http://mappings.dbpedia.org/index.php/OntologyProperty:approvedByLowerParlia...
9 dbo:regency http://mappings.dbpedia.org/index.php/OntologyProperty:regency
time: 44.5 ms
In [38]:
_.at[0,'o']
Out[38]:
rdflib.term.URIRef('http://mappings.dbpedia.org/index.php/OntologyProperty:colourName')
time: 15 ms

Subclasses

Subclasses can be queried with queries like the following, which lists direct subtypes of dbo:Person.

In [39]:
e.select("""
   SELECT ?type {
      ?type rdfs:subClassOf dbo:Person .
   }
""")
Out[39]:
type
0 dbo:Producer
1 dbo:Criminal
2 dbo:MilitaryPerson
3 dbo:SportsManager
4 dbo:TheatreDirector
5 dbo:TelevisionDirector
6 dbo:Politician
7 dbo:OfficeHolder
8 dbo:Referee
9 dbo:Psychologist
10 dbo:Economist
11 dbo:Scientist
12 dbo:HorseTrainer
13 dbo:Celebrity
14 dbo:Coach
15 dbo:FictionalCharacter
16 dbo:OrganisationMember
17 dbo:Astronaut
18 dbo:PlayboyPlaymate
19 dbo:Orphan
20 dbo:Presenter
21 dbo:MovieDirector
22 dbo:Archeologist
23 dbo:Artist
24 dbo:Aristocrat
25 dbo:Farmer
26 dbo:Journalist
27 dbo:Ambassador
28 dbo:Royalty
29 dbo:BusinessPerson
30 dbo:Linguist
31 dbo:Athlete
32 dbo:Architect
33 dbo:Cleric
34 dbo:Noble
35 dbo:RomanEmperor
36 dbo:Engineer
37 dbo:Monarch
38 dbo:PoliticianSpouse
39 dbo:Model
40 dbo:TelevisionPersonality
41 dbo:Writer
42 dbo:Egyptologist
43 dbo:Judge
44 dbo:Religious
45 dbo:MemberResistanceMovement
46 dbo:Chef
47 dbo:Lawyer
48 dbo:Philosopher
49 dbo:BeautyQueen
time: 63 ms

SPARQL 1.1 has property path operators that will make the query engine recurse through multiple rdfs:subClassOf property links.

In [40]:
e.select("""
   SELECT ?type {
      ?type rdfs:subClassOf* dbo:Person .
   }
""")
Out[40]:
type
0 dbo:Person
1 dbo:Producer
2 dbo:Criminal
3 dbo:Murderer
4 dbo:SerialKiller
5 dbo:MilitaryPerson
6 dbo:SportsManager
7 dbo:SoccerManager
8 dbo:TheatreDirector
9 dbo:TelevisionDirector
10 dbo:Politician
11 dbo:Lieutenant
12 dbo:PrimeMinister
13 dbo:Deputy
14 dbo:Congressman
15 dbo:VicePresident
16 dbo:VicePrimeMinister
17 dbo:President
18 dbo:Mayor
19 dbo:Senator
20 dbo:Chancellor
21 dbo:Governor
22 dbo:MemberOfParliament
23 dbo:OfficeHolder
24 dbo:Referee
25 dbo:Psychologist
26 dbo:Economist
27 dbo:Scientist
28 dbo:Medician
29 dbo:Entomologist
... ...
154 dbo:Pope
155 dbo:Cardinal
156 dbo:ChristianBishop
157 dbo:Vicar
158 dbo:ChristianPatriarch
159 dbo:Saint
160 dbo:Priest
161 dbo:Noble
162 dbo:RomanEmperor
163 dbo:Engineer
164 dbo:Monarch
165 dbo:PoliticianSpouse
166 dbo:Model
167 dbo:TelevisionPersonality
168 dbo:Host
169 dbo:Writer
170 dbo:SongWriter
171 dbo:ScreenWriter
172 dbo:PlayWright
173 dbo:Historian
174 dbo:MusicComposer
175 dbo:Poet
176 dbo:Egyptologist
177 dbo:Judge
178 dbo:Religious
179 dbo:MemberResistanceMovement
180 dbo:Chef
181 dbo:Lawyer
182 dbo:Philosopher
183 dbo:BeautyQueen

184 rows × 1 columns

time: 113 ms

The previous queries work "down" from a higher-level class, but by putting a '^' before the property name, I can reverse the direction of traversal to find all classes that dbo:Painter is a subclass of.

In [41]:
e.select("""
   SELECT ?type {
      ?type ^rdfs:subClassOf* dbo:Painter .
   }
""")
Out[41]:
type
0 dbo:Painter
1 dbo:Artist
2 dbo:Person
3 dbo:Agent
4 owl:Thing
time: 42 ms
In [42]:
e.select("""
   SELECT ?type {
      dbo:Painter rdfs:subClassOf* ?type .
   }
""")
Out[42]:
type
0 dbo:Painter
1 dbo:Artist
2 dbo:Person
3 dbo:Agent
4 owl:Thing
time: 39.5 ms

The same outcome can be had by switching the subject and object positions in the triple, as in the query just above. Here is the same pattern applied to dbo:City:

In [43]:
e.select("""
   SELECT ?type {
      dbo:City rdfs:subClassOf* ?type .
   }
""")
Out[43]:
type
0 dbo:City
1 dbo:Settlement
2 dbo:PopulatedPlace
3 dbo:Place
4 owl:Thing
time: 29.5 ms
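To make the meaning of the '*' path operator concrete, here is a minimal sketch in plain Python of the walk it performs: start at one class and follow rdfs:subClassOf links zero or more times. The edge table is a toy, assumed from the query outputs above rather than read from the ontology file, and the function names are my own.

```python
# Toy subclass edges (child -> direct parents). These chains are assumed from
# the query outputs above; they are not loaded from the ontology file.
SUBCLASS_OF = {
    "dbo:Painter": ["dbo:Artist"],
    "dbo:Artist": ["dbo:Person"],
    "dbo:Person": ["dbo:Agent"],
    "dbo:Agent": ["owl:Thing"],
    "dbo:City": ["dbo:Settlement"],
    "dbo:Settlement": ["dbo:PopulatedPlace"],
    "dbo:PopulatedPlace": ["dbo:Place"],
    "dbo:Place": ["owl:Thing"],
}


def superclasses(start, edges=SUBCLASS_OF):
    """Return start plus everything reachable by following edges zero or more times."""
    seen = [start]      # zero steps: '*' always matches the starting node itself
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.append(parent)
                frontier.append(parent)
    return seen


def subclasses(start, edges=SUBCLASS_OF):
    """Walk the same edges backwards to collect everything below start."""
    inverted = {}
    for child, parents in edges.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return superclasses(start, inverted)


print(superclasses("dbo:Painter"))
print(sorted(subclasses("dbo:Person")))
```

Because the walk includes the zero-step case, the starting class always appears in its own result, which is why dbo:Painter and dbo:City show up in row 0 of the outputs above.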

Equivalent Classes

The DBpedia Ontology uses owl:equivalentClass to specify equivalency between DBpedia Ontology types and types used in other popular systems such as Wikidata and schema.org:

In [44]:
e.select("""
   SELECT ?a ?b {
    ?a owl:equivalentClass ?b .
   } LIMIT 10
""")
Out[44]:
a b
0 dbo:Activity http://www.wikidata.org/entity/Q1914636
1 dbo:Novel http://www.wikidata.org/entity/Q8261
2 dbo:Writer http://www.wikidata.org/entity/Q36180
3 dbo:Band http://www.wikidata.org/entity/Q215380
4 dbo:InformationAppliance http://www.wikidata.org/entity/Q1067263
5 dbo:SolarEclipse http://www.wikidata.org/entity/Q3887
6 dbo:Legislature http://www.wikidata.org/entity/Q11204
7 dbo:Automobile http://schema.org/Product
8 dbo:MusicGenre http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept
9 dbo:Food http://www.wikidata.org/entity/Q2095
time: 39 ms

Here are all of the equivalencies between the DBpedia Ontology and schema.org.

In [45]:
e.select("""
   SELECT ?a ?b {
    ?a owl:equivalentClass ?b .
    FILTER(STRSTARTS(STR(?b),"http://schema.org/"))
   }
""")
Out[45]:
a b
0 dbo:Automobile http://schema.org/Product
1 dbo:Aircraft http://schema.org/Product
2 dbo:Person http://schema.org/Person
3 dbo:Stadium http://schema.org/StadiumOrArena
4 dbo:College http://schema.org/CollegeOrUniversity
5 dbo:Lake http://schema.org/LakeBodyOfWater
6 dbo:Restaurant http://schema.org/Restaurant
7 dbo:SkiArea http://schema.org/SkiResort
8 dbo:Locomotive http://schema.org/Product
9 dbo:RadioStation http://schema.org/RadioStation
10 dbo:School http://schema.org/School
11 dbo:Canal http://schema.org/Canal
12 dbo:Ship http://schema.org/Product
13 dbo:SportsEvent http://schema.org/SportsEvent
14 dbo:EducationalInstitution http://schema.org/EducationalOrganization
15 dbo:AdministrativeRegion http://schema.org/AdministrativeArea
16 dbo:River http://schema.org/RiverBodyOfWater
17 dbo:Park http://schema.org/Park
18 dbo:Sea http://schema.org/SeaBodyOfWater
19 dbo:BodyOfWater http://schema.org/BodyOfWater
20 dbo:Arena http://schema.org/StadiumOrArena
21 dbo:HistoricPlace http://schema.org/LandmarksOrHistoricalBuildings
22 dbo:Language http://schema.org/Language
23 dbo:Airport http://schema.org/Airport
24 dbo:Organisation http://schema.org/Organization
25 dbo:City http://schema.org/City
26 dbo:Annotation http://schema.org/Comment
27 dbo:ShoppingMall http://schema.org/ShoppingCenter
28 dbo:Hotel http://schema.org/Hotel
29 dbo:Website http://schema.org/WebPage
30 dbo:Album http://schema.org/MusicAlbum
31 dbo:Painting http://schema.org/Painting
32 dbo:Hospital http://schema.org/Hospital
33 dbo:Place http://schema.org/Place
34 dbo:MilitaryVehicle http://schema.org/Product
35 dbo:Book http://schema.org/Book
36 dbo:TelevisionEpisode http://schema.org/TVEpisode
37 dbo:Event http://schema.org/Event
38 dbo:Film http://schema.org/Movie
39 dbo:Continent http://schema.org/Continent
40 dbo:Bank http://schema.org/BankOrCreditUnion
41 dbo:University http://schema.org/CollegeOrUniversity
42 dbo:Library http://schema.org/Library
43 dbo:TelevisionStation http://schema.org/TelevisionStation
44 dbo:Mountain http://schema.org/Mountain
45 dbo:HistoricBuilding http://schema.org/LandmarksOrHistoricalBuildings
46 dbo:MusicFestival http://schema.org/Festival
47 dbo:Country http://schema.org/Country
48 dbo:SportsTeam http://schema.org/SportsTeam
49 dbo:Sculpture http://schema.org/Sculpture
50 dbo:Work http://schema.org/CreativeWork
51 dbo:Song http://schema.org/MusicRecording
time: 161 ms

Many of these are as you would expect, but there are some that are not correct, given the definition of owl:equivalentClass from the OWL specification.

9.1.2 Equivalent Classes

An equivalent classes axiom EquivalentClasses( CE1 ... CEn ) states that all of the class expressions CEi, 1 ≤ i ≤ n, are semantically equivalent to each other. This axiom allows one to use each CEi as a synonym for each CEj — that is, in any expression in the ontology containing such an axiom, CEi can be replaced with CEj without affecting the meaning of the ontology. An axiom EquivalentClasses( CE1 CE2 ) is equivalent to the following two axioms:

SubClassOf( CE1 CE2 )
SubClassOf( CE2 CE1 )

Put differently, anything that is a member of one class is a member of the other class and vice versa. That's true for dbo:TelevisionEpisode and schema:TVEpisode, but not true for many cases involving schema:Product:

In [46]:
g.bind("schema","http://schema.org/")
e=LocalEndpoint(g)
e.select("""
   SELECT ?a ?b {
    ?a owl:equivalentClass ?b .
    FILTER(?b=<http://schema.org/Product>)
   }
""")
Out[46]:
a b
0 dbo:Automobile schema:Product
1 dbo:Aircraft schema:Product
2 dbo:Locomotive schema:Product
3 dbo:Ship schema:Product
4 dbo:MilitaryVehicle schema:Product
time: 285 ms

I think you'd agree that an Automobile is a Product, but that a Product is not necessarily an automobile. In these cases,

dbo:Automobile rdfs:subClassOf schema:Product .

is more accurate.
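The difference between the two axioms can be sketched with ordinary Python sets, treating each class as the set of its members. The member names below are invented purely for illustration, and comparing sets in one toy world is only an approximation of the OWL semantics.

```python
# Hypothetical class memberships, invented for illustration only.
automobiles = {"ex:ModelT", "ex:Beetle"}
products = {"ex:ModelT", "ex:Beetle", "ex:Toaster"}
dbo_tv_episodes = {"ex:Pilot"}
schema_tv_episodes = {"ex:Pilot"}


def is_subclass(a, b):
    """SubClassOf(A B): every member of A is also a member of B."""
    return a <= b


def is_equivalent(a, b):
    """EquivalentClasses(A B): SubClassOf(A B) together with SubClassOf(B A)."""
    return is_subclass(a, b) and is_subclass(b, a)


print(is_subclass(automobiles, products))       # automobiles are products
print(is_equivalent(automobiles, products))     # but a toaster is not an automobile
print(is_equivalent(dbo_tv_episodes, schema_tv_episodes))
```

In this toy world the subclass test succeeds for automobiles and products while the equivalence test fails, which is the situation described above.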

Let's take a look at external classes which aren't part of schema.org or Wikidata:

In [47]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentClass ?b .
        FILTER(!STRSTARTS(STR(?b),"http://schema.org/"))
        FILTER(!STRSTARTS(STR(?b),"http://www.wikidata.org/"))    
    }
""")
Out[47]:
a b
0 dbo:MusicGenre http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept
1 dbo:LegalCase http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Situation
2 dbo:Book http://purl.org/ontology/bibo/Book
3 dbo:Sales http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Situation
4 dbo:ChemicalSubstance http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#ChemicalObject
5 dbo:Place dbo:Location
6 dbo:List http://www.w3.org/2004/02/skos/core#OrderedCollection
7 dbo:TopicalConcept http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept
8 dbo:Holiday http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#TimeInterval
9 dbo:Year http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#TimeInterval
10 dbo:PenaltyShootOut http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Event
11 dbo:List http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Collection
12 dbo:Person http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#NaturalPerson
13 dbo:Unknown http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Entity
14 dbo:Film http://dbpedia.org/ontology/Wikidata:Q11424
15 dbo:MeanOfTransportation http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#DesignedArtifact
16 dbo:Organisation http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SocialPerson
17 dbo:Database http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#InformationObject
18 dbo:Document foaf:Document
19 dbo:Food http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#FunctionalSubstance
20 dbo:Agent http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent
21 dbo:Person foaf:Person
22 dbo:Monastery http://www.ontologydesignpatterns.org/ont/d0.owl#Location
23 dbo:Abbey dbo:Monastery
24 dbo:UnitOfWork http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Situation
25 dbo:Ideology http://www.ontologydesignpatterns.org/ont/d0.owl#CognitiveEntity
26 dbo:Annotation http://purl.org/ontology/bibo/Note
27 dbo:Tax http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Description
28 dbo:Activity http://www.ontologydesignpatterns.org/ont/d0.owl#Activity
29 dbo:GovernmentType http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept
30 dbo:Article http://purl.org/ontology/bibo/Article
31 dbo:Event http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Event
32 dbo:Polyhedron http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SpaceRegion
33 dbo:Project http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#PlanExecution
time: 248 ms

To keep track of them all, I add a few more namespace declarations.

In [48]:
g.bind("dzero","http://www.ontologydesignpatterns.org/ont/d0.owl#")
g.bind("dul","http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#")
g.bind("bibo","http://purl.org/ontology/bibo/")
g.bind("skos","http://www.w3.org/2004/02/skos/core#")
e=LocalEndpoint(g)
time: 3 ms
In [49]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentClass ?b .
        FILTER(!STRSTARTS(STR(?b),"http://schema.org/"))
        FILTER(!STRSTARTS(STR(?b),"http://www.wikidata.org/"))    
    }
""")
Out[49]:
a b
0 dbo:MusicGenre dul:Concept
1 dbo:LegalCase dul:Situation
2 dbo:Book bibo:Book
3 dbo:Sales dul:Situation
4 dbo:ChemicalSubstance dul:ChemicalObject
5 dbo:Place dbo:Location
6 dbo:List skos:OrderedCollection
7 dbo:TopicalConcept dul:Concept
8 dbo:Holiday dul:TimeInterval
9 dbo:Year dul:TimeInterval
10 dbo:PenaltyShootOut dul:Event
11 dbo:List dul:Collection
12 dbo:Person dul:NaturalPerson
13 dbo:Unknown dul:Entity
14 dbo:Film http://dbpedia.org/ontology/Wikidata:Q11424
15 dbo:MeanOfTransportation dul:DesignedArtifact
16 dbo:Organisation dul:SocialPerson
17 dbo:Database dul:InformationObject
18 dbo:Document foaf:Document
19 dbo:Food dul:FunctionalSubstance
20 dbo:Agent dul:Agent
21 dbo:Person foaf:Person
22 dbo:Monastery dzero:Location
23 dbo:Abbey dbo:Monastery
24 dbo:UnitOfWork dul:Situation
25 dbo:Ideology dzero:CognitiveEntity
26 dbo:Annotation bibo:Note
27 dbo:Tax dul:Description
28 dbo:Activity dzero:Activity
29 dbo:GovernmentType dul:Concept
30 dbo:Article bibo:Article
31 dbo:Event dul:Event
32 dbo:Polyhedron dul:SpaceRegion
33 dbo:Project dul:PlanExecution
time: 230 ms

The mapping from dbo:Film to <http://dbpedia.org/ontology/Wikidata:Q11424> is almost certainly a typo.
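The effect of binding those namespaces can be imitated in a few lines of plain Python. This is only a sketch of the idea, not how rdflib or Gastrodon actually implement it. The function name is my own, and the rule that a remainder containing '/' or ':' stays unabbreviated is an assumption chosen to mirror the outputs above.

```python
# Prefix table copied from the namespace bindings used in this notebook.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dul": "http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#",
    "dzero": "http://www.ontologydesignpatterns.org/ont/d0.owl#",
    "bibo": "http://purl.org/ontology/bibo/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def abbreviate(uri, prefixes=PREFIXES):
    """Return prefix:local for the longest matching namespace, else the URI unchanged."""
    best = None
    for name, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (name, namespace)
    if best is None:
        return uri                      # no bound namespace matches
    local = uri[len(best[1]):]
    # Assumption: leave the URI alone when the remainder contains '/' or ':',
    # mirroring how such URIs are displayed in the outputs above.
    if "/" in local or ":" in local:
        return uri
    return f"{best[0]}:{local}"


print(abbreviate("http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept"))
print(abbreviate("http://dbpedia.org/ontology/Wikidata:Q11424"))
```

The second call returns the full URI, which is one way a malformed name such as the dbo:Film mapping stands out in a table of otherwise abbreviated results.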

Disjoint Classes

Another bit of OWL vocabulary used in the DBpedia Ontology is owl:disjointWith:

In [50]:
e.select("""
    SELECT ?b ?a {
        ?a owl:disjointWith ?b .   
    } ORDER BY ?b
""")
Out[50]:
b a
0 dbo:Fish dbo:Mammal
1 dbo:HistoricalPeriod dbo:PrehistoricalPeriod
2 dbo:Person dbo:TimePeriod
3 dbo:Person dbo:Escalator
4 dbo:Person dbo:MovingWalkway
5 dbo:Person dbo:Activity
6 dbo:Person dbo:GeologicalPeriod
7 dbo:Person dbo:Mountain
8 dbo:Person dbo:PeriodOfArtisticStyle
9 dbo:Person dbo:On-SiteTransportation
10 dbo:Person dbo:MeanOfTransportation
11 dbo:Person dbo:ConveyorSystem
12 dbo:Person dbo:UnitOfWork
13 dbo:Person dbo:ProtohistoricalPeriod
14 dbo:Person dbo:Gate
15 dbo:Person dbo:Mine
16 dbo:Person dbo:Tower
17 dbo:Person dbo:Building
18 dbo:Person dbo:HistoricalPeriod
19 dbo:Person dbo:Event
20 dbo:Place dbo:Agent
21 http://dbpedia.org/ontology/wgs84_pos:SpatialThing dbo:Work
22 http://dbpedia.org/ontology/wgs84_pos:SpatialThing dbo:Organisation
23 http://dbpedia.org/ontology/wgs84_pos:SpatialThing dbo:ReligiousOrganisation
time: 155 ms

If two classes are disjoint, that means that nothing can be an instance of both. For instance, a Fish cannot be a Mammal, a Person is not a Building, etc. These sorts of facts are helpful for validation, but one should resist the impulse to make statements of disjointness which aren't strictly true. (For instance, it would be unlikely, but not impossible, to be the winner of both a Heisman Trophy and a Fields Medal, so these are not disjoint categories.)
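As a sketch of how disjointness helps validation, the following plain-Python check flags any resource that is typed with two classes declared disjoint. The disjoint pairs are taken from the table above; the instance data is hypothetical, and a fuller check would also expand each type to its superclasses first.

```python
from itertools import combinations

# A few disjoint pairs from the query result above.
DISJOINT = {
    frozenset({"dbo:Fish", "dbo:Mammal"}),
    frozenset({"dbo:Person", "dbo:Building"}),
    frozenset({"dbo:Place", "dbo:Agent"}),
}

# Hypothetical instance data, invented for illustration.
TYPES = {
    "ex:Flipper": {"dbo:Mammal"},
    "ex:Nemo": {"dbo:Fish", "dbo:Mammal"},
    "ex:EmpireState": {"dbo:Building", "dbo:Place"},
}


def violations(types=TYPES, disjoint=DISJOINT):
    """List (resource, class_a, class_b) wherever two asserted types are disjoint."""
    found = []
    for resource, classes in sorted(types.items()):
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                found.append((resource, a, b))
    return found


print(violations())
```

Only the resource typed as both a Fish and a Mammal is reported; a Building that is also a Place is fine because that pair was never declared disjoint.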

Datatypes

RDF not only has "types" (classes) that represent named concepts, but it also has literal datatypes. These include the standard datatypes from XML Schema such as xsd:integer and xsd:dateTime, but also derived types that specialize those types. This makes it possible to tag quantities in terms of physical units, currency units, etc.

In [51]:
e.select("""
    SELECT ?type {
        ?type a rdfs:Datatype .
    } LIMIT 10
""")
Out[51]:
type
0 http://dbpedia.org/datatype/unitedArabEmiratesDirham
1 http://dbpedia.org/datatype/milliampere
2 http://dbpedia.org/datatype/romanianNewLeu
3 http://dbpedia.org/datatype/squareInch
4 http://dbpedia.org/datatype/turkishLira
5 http://dbpedia.org/datatype/vanuatuVatu
6 http://dbpedia.org/datatype/omaniRial
7 http://dbpedia.org/datatype/indianRupee
8 http://dbpedia.org/datatype/kilobyte
9 http://dbpedia.org/datatype/hertz
time: 32.5 ms
In [52]:
g.bind("type","http://dbpedia.org/datatype/")
e=LocalEndpoint(g)
time: 6.5 ms
In [53]:
e.select("""
    SELECT ?type {
        ?type a rdfs:Datatype .
    } LIMIT 10
""")
Out[53]:
type
0 type:unitedArabEmiratesDirham
1 type:milliampere
2 type:romanianNewLeu
3 type:squareInch
4 type:turkishLira
5 type:vanuatuVatu
6 type:omaniRial
7 type:indianRupee
8 type:kilobyte
9 type:hertz
time: 26.5 ms

The information about these datatypes is currently sparse: type:lightYear, for example, has just a label and a type.

In [54]:
e.select("""
    SELECT ?p ?o {
        type:lightYear ?p ?o .
    }
""")
Out[54]:
p o
0 rdf:type rdfs:Datatype
1 rdfs:label lightYear
time: 42.5 ms

These turn out to be the only properties that any datatypes have; pretty clearly, datatypes are not labeled in the rich set of languages that properties and classes are labeled in. (Note that vocabulary exists in RDFS and OWL for describing datatypes more fully, such as specifying that type:lightYear would be represented as a number, or that a particular type of numeric value falls in a particular range.)

In [55]:
e.select("""
    SELECT ?p (COUNT(*) AS ?cnt) {
        ?s a rdfs:Datatype .
        ?s ?p ?o .
    } GROUP BY ?p
""")
Out[55]:
cnt
p
rdf:type 382
rdfs:label 382
time: 156 ms

Another approach is to look at how datatypes get used, that is, how frequently various datatypes are used as the range of a property.

In [56]:
e.select("""
    SELECT ?type (COUNT(*) AS ?cnt) {
        ?p rdfs:range ?type .
        ?type a rdfs:Datatype .
    } GROUP BY ?type ORDER BY DESC(?cnt)
""")
Out[56]:
cnt
type
xsd:string 779
xsd:nonNegativeInteger 265
xsd:double 187
xsd:date 146
xsd:gYear 60
xsd:integer 53
rdf:langString 49
type:millimetre 26
type:kilogram 25
xsd:float 25
xsd:positiveInteger 18
type:kilometre 15
type:kelvin 12
type:squareKilometre 10
type:metre 9
type:hour 6
type:day 6
xsd:boolean 5
type:inhabitantsPerSquareKilometre 4
type:cubicMetrePerSecond 4
xsd:dateTime 4
type:kilogramPerCubicMetre 3
type:cubicMetre 2
type:kilometrePerSecond 2
type:minute 2
type:cubicKilometre 2
type:cubicCentimetre 1
type:second 1
type:valvetrain 1
type:engineConfiguration 1
type:kilometrePerHour 1
type:centimetre 1
type:kilowatt 1
type:megabyte 1
type:squareMetre 1
type:gramPerKilometre 1
type:litre 1
type:newtonMetre 1
xsd:anyURI 1
type:fuelType 1
xsd:gYearMonth 1
time: 261 ms
In [57]:
len(_)
Out[57]:
41
time: 4.01 ms

Out of 382 datatypes, only 41 actually appear as the range of a property in the schema. Here are a few datatypes that are unused in the schema.

In [58]:
e.select("""
    SELECT ?type {
        ?type a rdfs:Datatype .
        MINUS { ?s ?p ?type }
    } LIMIT 20
""")
Out[58]:
type
0 type:unitedArabEmiratesDirham
1 type:milliampere
2 type:romanianNewLeu
3 type:squareInch
4 type:turkishLira
5 type:vanuatuVatu
6 type:omaniRial
7 type:indianRupee
8 type:kilobyte
9 type:hertz
10 type:microampere
11 type:millisecond
12 type:mauritanianOuguiya
13 type:congoleseFranc
14 type:milligramForce
15 type:newtonCentimetre
16 type:burundianFranc
17 type:squareMile
18 type:squareMillimetre
19 type:megametre
time: 11.8 s
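The MINUS clause in that query behaves like a set difference: take every declared datatype, then drop any that appears in the object position of some triple. A minimal sketch over a toy list of triples follows. The datatype names come from the outputs above, but which property points at which datatype is assumed, and the ex: property is hypothetical.

```python
# Toy triples; the range assertions are assumed for illustration.
TRIPLES = [
    ("type:kilogram", "rdf:type", "rdfs:Datatype"),
    ("type:kelvin", "rdf:type", "rdfs:Datatype"),
    ("type:squareInch", "rdf:type", "rdfs:Datatype"),
    ("type:hertz", "rdf:type", "rdfs:Datatype"),
    ("<Galaxy/mass>", "rdfs:range", "type:kilogram"),
    ("ex:someTemperature", "rdfs:range", "type:kelvin"),  # hypothetical property
]


def unused_datatypes(triples):
    """Datatypes that are declared but never appear as the object of a triple."""
    declared = {s for (s, p, o) in triples
                if p == "rdf:type" and o == "rdfs:Datatype"}
    used_as_object = {o for (_s, _p, o) in triples}
    return declared - used_as_object


print(sorted(unused_datatypes(TRIPLES)))
```

Appearing as the subject of its own rdf:type declaration does not count as a use, which is why the declared-only datatypes survive the subtraction.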

According to the DBpedia Ontology documentation, there are two kinds of datatype declarations in mappings. In some cases the unit is explicitly specified in the mapping field (ex. a field that contains a length is specified in meters) and in other cases, a particular datatype is specific to the field.

It turns out most of the knowledge about datatypes in the DBpedia Ontology system is hard coded into a Scala file; this file contains rich information that is not exposed in the RDF form of the Ontology, such as conversion factors, the fact that miles per hour is a speed, etc.

It is quite possible to encode datatypes directly into a fact, for example,

:Iron :meltsAt "1811 K"^^type:kelvin .


It is possible that such facts could be found in DBpedia or some other database, but I'm not going to check for that in this notebook, because this notebook is only considering facts that are in the ontology file supplied with this notebook.
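To illustrate what a unit-tagged literal buys you once conversion factors are available, here is a small sketch. The TypedLiteral class, the table, and the function are all hypothetical; the factors are ordinary metric conversions that I supply myself, since the RDF form of the ontology does not expose them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str   # the text between the quotes
    datatype: str  # abbreviated datatype URI, e.g. "type:kilometre"


# Assumed conversion table (to metres); not part of the ontology's RDF.
TO_METRES = {
    "type:metre": 1.0,
    "type:kilometre": 1000.0,
    "type:centimetre": 0.01,
    "type:millimetre": 0.001,
}


def length_in_metres(literal):
    """Convert a length literal to metres; raises KeyError for other datatypes."""
    factor = TO_METRES[literal.datatype]
    return float(literal.lexical) * factor


print(length_in_metres(TypedLiteral("2.5", "type:kilometre")))
```

A literal tagged with a datatype outside the table raises KeyError, which is a cheap way to notice that a value is not a length at all.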

Properties Measured in Kilograms

In [59]:
e.select("""
    SELECT ?p {
        ?p rdfs:range type:kilogram
    }
""")
Out[59]:
p
0 http://dbpedia.org/ontology/Spacecraft/dryCargo
1 http://dbpedia.org/ontology/MovingWalkway/mass
2 http://dbpedia.org/ontology/SpaceMission/lunarSampleMass
3 http://dbpedia.org/ontology/Spacecraft/cargoGas
4 http://dbpedia.org/ontology/MeanOfTransportation/mass
5 http://dbpedia.org/ontology/Escalator/mass
6 http://dbpedia.org/ontology/Spacecraft/totalMass
7 http://dbpedia.org/ontology/Rocket/lowerEarthOrbitPayload
8 http://dbpedia.org/ontology/ConveyorSystem/weight
9 http://dbpedia.org/ontology/On-SiteTransportation/mass
10 http://dbpedia.org/ontology/On-SiteTransportation/weight
11 http://dbpedia.org/ontology/SpaceMission/mass
12 http://dbpedia.org/ontology/Person/weight
13 http://dbpedia.org/ontology/Engine/weight
14 http://dbpedia.org/ontology/MeanOfTransportation/weight
15 http://dbpedia.org/ontology/Escalator/weight
16 http://dbpedia.org/ontology/Weapon/weight
17 http://dbpedia.org/ontology/Galaxy/mass
18 http://dbpedia.org/ontology/ConveyorSystem/mass
19 http://dbpedia.org/ontology/Spacecraft/totalCargo
20 http://dbpedia.org/ontology/MovingWalkway/weight
21 http://dbpedia.org/ontology/Spacecraft/cargoWater
22 http://dbpedia.org/ontology/Planet/mass
23 http://dbpedia.org/ontology/Spacecraft/cargoFuel
24 http://dbpedia.org/ontology/Rocket/mass
time: 39.5 ms

One unfortunate thing is that the DBpedia ontology sometimes composes property URIs by putting together the class (ex. "Galaxy") and the property (ex. "mass") with a slash between them. An unescaped slash is not allowed in a local name, which means that you can't write ontology:Galaxy/mass. You can write the full URI, or you could define a prefix Galaxy such that you can write Galaxy:mass. Yet another approach is to set the base URI to

http://dbpedia.org/ontology/

in which case you could write <Galaxy/mass>. I was tempted to do that for this notebook, but decided against it, because soon I will be joining the schema with more DBpedia data, where I like to set the base to

http://dbpedia.org/resource/

In a slightly better world, the property might be composed with a period, so that the URI is just "ontology:Galaxy.mass". (Hmm... Could Gastrodon do that for you?)
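As a sketch of what such a display convention might look like, the helper below splits a class-scoped property URI into its class and property parts and renders it with a period. This is purely hypothetical: it is not a Gastrodon feature, the function names are my own, and the result is a display string, not something a SPARQL parser would accept as a name for the original URI.

```python
ONTOLOGY = "http://dbpedia.org/ontology/"


def split_scoped(uri):
    """Split an ontology URI into (class, property); class is None if unscoped."""
    if not uri.startswith(ONTOLOGY):
        raise ValueError(f"not a DBpedia Ontology URI: {uri}")
    local = uri[len(ONTOLOGY):]
    owner, slash, name = local.partition("/")
    if not slash:
        return None, local      # ordinary property such as deathPlace
    return owner, name


def dotted_name(uri, prefix="dbo"):
    """Render a class-scoped property as prefix:Class.property for display."""
    owner, name = split_scoped(uri)
    if owner is None:
        return f"{prefix}:{name}"
    return f"{prefix}:{owner}.{name}"


print(dotted_name("http://dbpedia.org/ontology/Galaxy/mass"))
print(dotted_name("http://dbpedia.org/ontology/deathPlace"))
```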

Datatype properties vs Object Properties

RDFS has a single class to represent a property, rdf:Property; OWL makes it a little more complex by defining both owl:DatatypeProperty and owl:ObjectProperty. The difference between these two kinds of property is the range: a Datatype Property has a literal value (object), while an Object Property has a Resource (URI or blank node) as a value.

I'd imagine that every rdf:Property should be either an owl:DatatypeProperty or owl:ObjectProperty, so that the sums would match. I wouldn't take it for granted, so I'll check it:

In [60]:
counts=e.select("""
   SELECT ?type (COUNT(*) AS ?cnt) {
      ?s a ?type .
      FILTER (?type IN (rdf:Property,owl:DatatypeProperty,owl:ObjectProperty))
   } GROUP BY ?type ORDER BY DESC(?cnt)
""")["cnt"]
counts
Out[60]:
type
rdf:Property            2695
owl:DatatypeProperty    1734
owl:ObjectProperty      1099
Name: cnt, dtype: int64
time: 1.82 s
In [61]:
counts["rdf:Property"]
Out[61]:
2695
time: 3.5 ms
In [62]:
counts["owl:DatatypeProperty"]+counts["owl:ObjectProperty"]
Out[62]:
2833
time: 16.5 ms

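The sums don't match: the two OWL classes together account for 2833 declarations, 138 more than the 2695 instances of rdf:Property.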

I'd expect the two kinds of OWL properties to be disjoint, and they are: I can't find any property which is an instance of both.

In [63]:
e.select("""
   SELECT ?klasse {
      ?klasse a owl:DatatypeProperty .
      ?klasse a owl:ObjectProperty .
   }
""")
Out[63]:
klasse
time: 179 ms

However, there are cases where a property is registered as an OWL property but not as an RDFS property:

In [64]:
e.select("""
   SELECT ?klasse {
      ?klasse a owl:DatatypeProperty .
      MINUS {?klasse a rdf:Property}
   }
""")
Out[64]:
klasse
0 http://dbpedia.org/ontology/Galaxy/mass
1 http://dbpedia.org/ontology/Planet/maximumTemperature
2 http://dbpedia.org/ontology/Planet/meanTemperature
3 http://dbpedia.org/ontology/Galaxy/meanTemperature
4 http://dbpedia.org/ontology/Engine/cylinderBore
5 http://dbpedia.org/ontology/MovingWalkway/height
6 http://dbpedia.org/ontology/ConveyorSystem/diameter
7 http://dbpedia.org/ontology/Spacecraft/totalCargo
8 http://dbpedia.org/ontology/MovingWalkway/length
9 http://dbpedia.org/ontology/Planet/volume
10 http://dbpedia.org/ontology/Escalator/width
11 http://dbpedia.org/ontology/Engine/height
12 http://dbpedia.org/ontology/Planet/periapsis
13 http://dbpedia.org/ontology/Engine/width
14 http://dbpedia.org/ontology/Escalator/diameter
15 http://dbpedia.org/ontology/Galaxy/averageSpeed
16 http://dbpedia.org/ontology/School/campusSize
17 http://dbpedia.org/ontology/Weapon/length
18 http://dbpedia.org/ontology/PopulatedPlace/populationUrbanDensity
19 http://dbpedia.org/ontology/Engine/length
20 http://dbpedia.org/ontology/Canal/originalMaximumBoatLength
21 http://dbpedia.org/ontology/Spacecraft/totalMass
22 http://dbpedia.org/ontology/Canal/originalMaximumBoatBeam
23 http://dbpedia.org/ontology/On-SiteTransportation/width
24 http://dbpedia.org/ontology/Engine/powerOutput
25 http://dbpedia.org/ontology/Galaxy/density
26 http://dbpedia.org/ontology/SpaceMission/stationEvaDuration
27 http://dbpedia.org/ontology/Planet/averageSpeed
28 http://dbpedia.org/ontology/Galaxy/volume
29 http://dbpedia.org/ontology/GrandPrix/distance
... ...
108 http://dbpedia.org/ontology/SpaceMission/lunarSurfaceTime
109 http://dbpedia.org/ontology/Rocket/mass
110 http://dbpedia.org/ontology/LunarCrater/diameter
111 http://dbpedia.org/ontology/Engine/diameter
112 http://dbpedia.org/ontology/MovingWalkway/width
113 http://dbpedia.org/ontology/ConveyorSystem/width
114 http://dbpedia.org/ontology/Engine/co2Emission
115 http://dbpedia.org/ontology/MeanOfTransportation/diameter
116 http://dbpedia.org/ontology/Spacecraft/dockedTime
117 http://dbpedia.org/ontology/ConveyorSystem/length
118 http://dbpedia.org/ontology/PopulatedPlace/populationDensity
119 http://dbpedia.org/ontology/On-SiteTransportation/height
120 http://dbpedia.org/ontology/GrandPrix/course
121 http://dbpedia.org/ontology/GeopoliticalOrganisation/populationDensity
122 http://dbpedia.org/ontology/Planet/minimumTemperature
123 http://dbpedia.org/ontology/Infrastructure/length
124 http://dbpedia.org/ontology/Engine/acceleration
125 http://dbpedia.org/ontology/ConveyorSystem/weight
126 http://dbpedia.org/ontology/Spacecraft/cargoGas
127 http://dbpedia.org/ontology/Lake/areaOfCatchment
128 http://dbpedia.org/ontology/SpaceMission/lunarOrbitTime
129 http://dbpedia.org/ontology/Planet/temperature
130 http://dbpedia.org/ontology/GeopoliticalOrganisation/areaMetro
131 http://dbpedia.org/ontology/Stream/minimumDischarge
132 http://dbpedia.org/ontology/SpaceMission/mass
133 http://dbpedia.org/ontology/On-SiteTransportation/length
134 http://dbpedia.org/ontology/SpaceShuttle/distance
135 http://dbpedia.org/ontology/ChemicalSubstance/density
136 http://dbpedia.org/ontology/Galaxy/apoapsis
137 http://dbpedia.org/ontology/Weapon/diameter

138 rows × 1 columns

time: 31.1 s
In [65]:
e.select("""
   SELECT ?klasse {
      ?klasse a owl:ObjectProperty .
      MINUS {?klasse a rdf:Property}
   }
""")
Out[65]:
klasse
time: 18.1 s

However, there are no properties defined as an RDFS property that are not defined in OWL.

In [66]:
e.select("""
   SELECT ?p {
      ?p a rdf:Property .
      MINUS {
          { ?p a owl:DatatypeProperty }
          UNION
          { ?p a owl:ObjectProperty }
      }
   }
""")
Out[66]:
p
time: 45.2 s

Conclusion: to get a complete list of properties defined in the DBpedia Ontology, it is necessary and sufficient to use the OWL property declarations. The analysis above that uses rdf:Property should use the OWL property classes to get complete results.
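The numbers reported by the queries in this section can be cross-checked with a little arithmetic. The values below are copied from the outputs above, and the variable names are my own.

```python
# Counts copied from the query outputs in this section.
rdf_property = 2695
datatype_property = 1734
object_property = 1099
datatype_not_rdf = 138   # owl:DatatypeProperty without rdf:Property
object_not_rdf = 0       # owl:ObjectProperty without rdf:Property
rdf_not_owl = 0          # rdf:Property without either OWL class

# The two OWL classes share no members, so their counts simply add.
owl_total = datatype_property + object_property

# |OWL| - |RDF| must equal |OWL only| - |RDF only|.
gap = owl_total - rdf_property
explained = (datatype_not_rdf + object_not_rdf) - rdf_not_owl

print(owl_total, gap, explained)
```

The gap between the two totals is exactly the number of datatype properties that lack an rdf:Property declaration, so the three MINUS queries account for the whole mismatch.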

Subproperties

Subproperties are used in RDF to gather together properties that more or less say the same thing.

For instance, the mass of a galaxy is comparable (in principle) to the mass of objects like stars and planets that make it up. Thus in a perfect world, the mass of a galaxy would be related to a more general "mass" property that could apply to anything from coins to aircraft carriers.
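What a subproperty declaration buys you can be sketched in plain Python: whenever a fact uses a property, the same fact also holds for each of its superproperties. In the toy data below, ex:mass is a hypothetical general property, the resources and values are invented, and each property is assumed to have at most one direct superproperty to keep the sketch short.

```python
# Hypothetical superproperty links; ex:mass is not part of the DBpedia Ontology.
SUB_PROPERTY_OF = {
    "<Galaxy/mass>": "ex:mass",
    "<Planet/mass>": "ex:mass",
}

# Invented facts; "m1" and "m2" stand in for actual mass values.
FACTS = {
    ("ex:MilkyWay", "<Galaxy/mass>", "m1"),
    ("ex:Mars", "<Planet/mass>", "m2"),
}


def infer(facts, sub_property_of):
    """Add (s, q, o) for every superproperty q of p in each stated (s, p, o)."""
    inferred = set(facts)
    for s, p, o in facts:
        seen = {p}
        # Climb the chain of superproperties, guarding against cycles.
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred


all_facts = infer(FACTS, SUB_PROPERTY_OF)
print(sorted(f for f in all_facts if f[1] == "ex:mass"))
```

After inference, a single query on the general property returns both the galaxy and the planet.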

I go looking for one...

In [67]:
galaxyMass=URIRef("http://dbpedia.org/ontology/Galaxy/mass")
e.select("""
   SELECT ?p {
      ?_galaxyMass rdfs:subPropertyOf ?p .
   }
""")
Out[67]:
p
time: 19 ms

... and don't find it. That's not really a problem, because I can always add one by adding a few more facts to my copy of the DBpedia Ontology. Let's see what is really there...

In [68]:
e.select("""
   SELECT ?from ?to {
      ?from rdfs:subPropertyOf ?to .
   }
""")
Out[68]:
from to
0 dbo:trustee dul:sameSettingAs
1 dbo:stateOfOrigin dul:hasLocation
2 dbo:satScore dul:isClassifiedBy
3 dbo:parentOrganisation dul:sameSettingAs
4 dbo:routeEndLocation dul:hasCommonBoundary
5 dbo:soccerTournamentMostSuccesfull dul:isSettingFor
6 dbo:opponents dul:hasParticipant
7 dbo:nationalOlympicCommittee dul:hasParticipant
8 dbo:deathPlace dul:hasLocation
9 dbo:editing dul:coparticipatesWith
10 dbo:costumeDesigner dul:coparticipatesWith
11 dbo:principalArea dul:isLocationOf
12 dbo:child dul:sameSettingAs
13 dbo:mouthDistrict dul:hasLocation
14 dbo:highestMountain dul:isLocationOf
15 dbo:religiousHeadLabel dul:sameSettingAs
16 dbo:nisCode dul:isClassifiedBy
17 dbo:valvetrain dul:hasComponent
18 dbo:alias dbo:alternativeName
19 dbo:translator dul:coparticipatesWith
20 dbo:presenter dul:hasParticipant
21 dbo:iso6392Code dbo:LanguageCode
22 dbo:sales dul:hasSetting
23 dbo:executiveHeadteacher dul:coparticipatesWith
24 dbo:presidentGeneralCouncil dul:sameSettingAs
25 dbo:officialOpenedBy dul:hasParticipant
26 dbo:relatedPlaces dul:unifies
27 dbo:managementRegion dul:sameSettingAs
28 dbo:campus dul:hasLocation
29 dbo:championInMixedDouble dul:hasParticipant
... ... ...
941 dbo:associatedBand dul:isMemberOf
942 dbo:chiefEditor dul:coparticipatesWith
943 dbo:filmFareAward dul:coparticipatesWith
944 dbo:dutchPPNCode dbo:code
945 dbo:sportSpecialty dul:isParticipantIn
946 dbo:militaryUnit dul:isMemberOf
947 dbo:goldMedalist dul:hasParticipant
948 dbo:order dul:isSpecializedBy
949 dbo:projectCoordinator dul:isSettingFor
950 dbo:homeStadium dul:sameSettingAs
951 dbo:building dul:isLocationOf
952 dbo:politicalPartyOfLeader dul:hasPart
953 dbo:formerPartner dul:coparticipatesWith
954 dbo:highestState dul:isLocationOf
955 dbo:iso6393Code dbo:LanguageCode
956 dbo:picture dul:concretelyExpresses
957 dbo:protectionStatus dbo:Status
958 dbo:formerHighschool dul:isMemberOf
959 dbo:headChef dul:coparticipatesWith
960 dbo:deputy dul:coparticipatesWith
961 dbo:languageRegulator dul:sameSettingAs
962 dbo:capitalDistrict dul:hasLocation
963 dbo:manager dul:coparticipatesWith
964 dbo:southEastPlace dbo:closeTo
965 dbo:derivative dul:specializes
966 dbo:brand dul:coparticipatesWith
967 dbo:tenant dul:sameSettingAs
968 dbo:launchSite dul:hasParticipant
969 dbo:viceLeaderParty dul:sameSettingAs
970 dbo:chaplain dul:coparticipatesWith

971 rows × 2 columns

time: 297 ms

It looks like terms on the left are always part of the DBpedia Ontology:

In [69]:
e.select("""
   SELECT ?from ?to {
      ?from rdfs:subPropertyOf ?to .
      FILTER(!STRSTARTS(STR(?from),"http://dbpedia.org/ontology/"))
   }
""")
Out[69]:
from to
time: 326 ms

Terms on the right are frequently part of the DOLCE+DnS Ultralite (DUL) ontology (http://ontologydesignpatterns.org/wiki/Ontology:DOLCE%2BDnS_Ultralite) and are a way to explain the meaning of DBpedia Ontology terms in terms of DUL. Let's look at superproperties that aren't from the DUL ontology:

In [70]:
e.select("""
   SELECT ?from ?to {
      ?from rdfs:subPropertyOf ?to .
      FILTER(!STRSTARTS(STR(?to),"http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"))
   }
""")
Out[70]:
from to
0 dbo:alias dbo:alternativeName
1 dbo:iso6392Code dbo:LanguageCode
2 dbo:premiereDate dbo:releaseDate
3 dbo:codeMemorial dbo:Code
4 dbo:silverMedalist dbo:Medalist
5 dbo:championInDoubleMale dbo:championInDouble
6 dbo:bronzeMedalist dbo:Medalist
7 dbo:championInSingle dbo:champion
8 dbo:commandant dbo:keyPerson
9 dbo:playRole dbo:uses
10 dbo:awayColourHexCode dbo:colourHexCode
11 dbo:dutchMIPCode dbo:Code
12 dbo:codeMunicipalMonument dbo:code
13 dbo:muteCharacterInPlay dbo:characterInPlay
14 dbo:westPlace dbo:closeTo
15 dbo:isPartOfMilitaryConflict dbo:isPartOf
16 dbo:ngcName dbo:name
17 dbo:provinceIsoCode dbo:isoCode
18 dbo:locationCity dbo:location
19 dbo:averageDepth dbo:depth
20 dbo:northEastPlace dbo:closeTo
21 dbo:originalLanguage dbo:language
22 dbo:iso6391Code dbo:LanguageCode
23 dbo:subTribus dbo:Tribus
24 dbo:communityIsoCode dbo:isoCode
25 dbo:inseeCode dbo:codeSettlement
26 dbo:olympicOathSwornByAthlete dbo:olympicOathSwornBy
27 dbo:codeLandRegistry dbo:Code
28 dbo:northPlace dbo:closeTo
29 dbo:subClassis dbo:classis
... ... ...
46 dbo:isPartOfAnatomicalStructure dbo:isPartOf
47 dbo:codeStockExchange dbo:code
48 dbo:premiereYear dbo:releaseYear
49 dbo:nextMission dbo:followedBy
50 dbo:championInSingleFemale dbo:championInSingle
51 dbo:isPartOfWineRegion dbo:isPartOf
52 dbo:maximumDepth dbo:depth
53 dbo:eastPlace dbo:closeTo
54 dbo:dutchWinkelID dbo:code
55 dbo:ekatteCode dbo:codeSettlement
56 dbo:northWestPlace dbo:closeTo
57 dbo:southPlace dbo:closeTo
58 dbo:championInSingleMale dbo:championInSingle
59 dbo:codeProvincialMonument dbo:code
60 dbo:politicGovernmentDepartment dbo:Department
61 dbo:rankingWins dbo:Wins
62 dbo:chorusCharacterInPlay dbo:characterInPlay
63 dbo:codeIndex dbo:code
64 dbo:ofsCode dbo:isoCode
65 dbo:owningOrganisation dbo:owner
66 dbo:officialSchoolColour dbo:ColourName
67 dbo:zipCode dbo:postalCode
68 dbo:silCode dbo:LanguageCode
69 dbo:messierName dbo:name
70 dbo:otherWins dbo:Wins
71 dbo:senator dbo:MemberOfParliament
72 dbo:dutchPPNCode dbo:code
73 dbo:iso6393Code dbo:LanguageCode
74 dbo:protectionStatus dbo:Status
75 dbo:southEastPlace dbo:closeTo

76 rows × 2 columns

time: 370 ms

Out of those 76 relationships, I bet many of them point to the same superproperties:

In [71]:
e.select("""
   SELECT ?to (COUNT(*) AS ?cnt) {
      ?from rdfs:subPropertyOf ?to .
      FILTER(!STRSTARTS(STR(?to),"http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"))
   } GROUP BY ?to ORDER BY DESC(?cnt)
""")
Out[71]:
cnt
to
dbo:code 9
dbo:closeTo 8
dbo:LanguageCode 4
dbo:Code 3
dbo:Medalist 3
dbo:championInDouble 3
dbo:isPartOf 3
dbo:isoCode 3
dbo:champion 2
dbo:colourHexCode 2
dbo:characterInPlay 2
dbo:name 2
dbo:location 2
dbo:depth 2
dbo:Tribus 2
dbo:codeSettlement 2
dbo:olympicOathSwornBy 2
dbo:followedBy 2
dbo:championInSingle 2
dbo:Wins 2
dbo:alternativeName 1
dbo:releaseDate 1
dbo:keyPerson 1
dbo:uses 1
dbo:language 1
dbo:classis 1
dbo:administrativeHeadCity 1
dbo:genre 1
dbo:Distance 1
dbo:releaseYear 1
dbo:Department 1
dbo:owner 1
dbo:ColourName 1
dbo:postalCode 1
dbo:MemberOfParliament 1
dbo:Status 1
time: 285 ms

The most common superproperty is dbo:code, which represents identifying codes. For instance, this could be a postal code, UPC code, or a country or regional code. Unfortunately, only a small number of code-containing fields are so identified.
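The GROUP BY and COUNT in the query above amount to tallying the right-hand column of the subproperty table. A sketch with collections.Counter over a handful of rows copied from that table shows the idea; the sample is deliberately tiny, so its counts are not the real totals.

```python
from collections import Counter

# A few (subproperty, superproperty) rows copied from the table above.
PAIRS = [
    ("dbo:codeMunicipalMonument", "dbo:code"),
    ("dbo:codeStockExchange", "dbo:code"),
    ("dbo:dutchWinkelID", "dbo:code"),
    ("dbo:westPlace", "dbo:closeTo"),
    ("dbo:northPlace", "dbo:closeTo"),
    ("dbo:alias", "dbo:alternativeName"),
]

# Tally the superproperty column, then list the tallies from most to least common.
counts = Counter(to for (_from, to) in PAIRS)
ranking = counts.most_common()

print(ranking)
```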

In [72]:
e.select("""
   SELECT ?about ?from {
      ?from 
          rdfs:subPropertyOf dbo:code ;
          rdfs:domain ?about .
   }
""")
Out[72]:
about from
0 dbo:Place dbo:codeProvincialMonument
1 dbo:Place dbo:codeMunicipalMonument
2 dbo:Place dbo:codeNationalMonument
3 dbo:Company dbo:codeStockExchange
4 dbo:MemberResistanceMovement dbo:codeIndex
5 dbo:MemberResistanceMovement dbo:dutchNAIdentifier
6 dbo:UndergroundJournal dbo:dutchWinkelID
7 dbo:MemberResistanceMovement dbo:codeListOfHonour
8 dbo:WrittenWork dbo:dutchPPNCode
time: 31.5 ms

Looking at the superproperty dbo:closeTo, the subproperties represent (right-hand) locations that are adjacent to (left-hand) locations in the cardinal and ordinal directions.

In [73]:
e.select("""
   SELECT ?about ?from {
      ?from 
          rdfs:subPropertyOf dbo:closeTo ;
          rdfs:domain ?about .
   }
""")
Out[73]:
about from
0 dbo:Place dbo:southEastPlace
1 dbo:Place dbo:northWestPlace
2 dbo:Place dbo:northPlace
3 dbo:Place dbo:northEastPlace
4 dbo:Place dbo:southPlace
5 dbo:Place dbo:eastPlace
6 dbo:Place dbo:southWestPlace
7 dbo:Place dbo:westPlace
time: 40 ms

Looking at the superproperties in DUL, these look much like the kind of properties one would expect to be defined in an upper or middle ontology:

In [74]:
e.select("""
   SELECT ?to (COUNT(*) AS ?cnt) {
      ?from rdfs:subPropertyOf ?to .
      FILTER(STRSTARTS(STR(?to),"http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"))
   } GROUP BY ?to ORDER BY DESC(?cnt)
""")
Out[74]:
cnt
to
dul:coparticipatesWith 226
dul:sameSettingAs 169
dul:hasLocation 85
dul:hasParticipant 63
dul:isParticipantIn 42
dul:isPartOf 39
dul:isClassifiedBy 38
dul:hasCommonBoundary 33
dul:isMemberOf 31
dul:hasPart 26
dul:isLocationOf 24
dul:isSettingFor 15
dul:hasQuality 15
dul:hasMember 13
dul:isDescribedBy 9
dul:nearTo 8
dul:specializes 8
dul:hasComponent 7
dul:hasSetting 6
dul:isExpressedBy 6
dul:isSpecializedBy 6
dul:hasRole 4
dul:hasConstituent 3
dul:conceptualizes 3
dul:precedes 3
dul:unifies 2
dul:follows 2
dul:associatedWith 2
dul:hasRegion 2
dul:isAbout 2
dul:isRoleOf 1
dul:overlaps 1
dul:concretelyExpresses 1
time: 596 ms

A really common kind of property is a "part-of" relationship, known as meronymy if you like Greek.

In [75]:
e.select("""
   SELECT ?domain ?p ?range {
      ?p 
          rdfs:subPropertyOf dul:isPartOf ;
          rdfs:domain ?domain ;
          rdfs:range ?range .
   }
""")
Out[75]:
domain p range
0 dbo:Settlement dbo:federalState dbo:PopulatedPlace
1 dbo:Mountain dbo:mountainRange dbo:MountainRange
2 dbo:Settlement dbo:geolocDepartment dbo:PopulatedPlace
3 dbo:Island dbo:lowestState dbo:PopulatedPlace
4 dbo:PopulatedPlace dbo:department dbo:PopulatedPlace
5 dbo:Place dbo:province dbo:Province
6 dbo:WineRegion dbo:isPartOfWineRegion dbo:WineRegion
7 http://dbpedia.org/ontology/Diocese,_Parish dbo:deanery dbo:Deanery
8 dbo:PopulatedPlace dbo:lieutenancyArea dbo:PopulatedPlace
9 dbo:MilitaryConflict dbo:isPartOfMilitaryConflict dbo:MilitaryConflict
10 dbo:PopulatedPlace dbo:sheading dbo:PopulatedPlace
11 dbo:Department dbo:prefecture dbo:PopulatedPlace
12 dbo:PopulatedPlace dbo:oldDistrict dbo:PopulatedPlace
13 dbo:Brain dbo:isPartOfAnatomicalStructure dbo:AnatomicalStructure
14 dbo:PopulatedPlace dbo:councilArea dbo:PopulatedPlace
15 dbo:AnatomicalStructure dbo:organSystem dbo:AnatomicalStructure
16 dbo:Country dbo:continent dbo:Continent
17 dbo:Place dbo:provinceLink dbo:Province
18 dbo:Settlement dbo:jointCommunity dbo:PopulatedPlace
19 dbo:Place dbo:sovereignCountry dbo:PopulatedPlace
20 dbo:PopulatedPlace dbo:oldProvince dbo:PopulatedPlace
21 dbo:Settlement dbo:isoCodeRegion xsd:string
22 dbo:PopulatedPlace dbo:parish dbo:PopulatedPlace
23 dbo:Place dbo:district dbo:PopulatedPlace
24 http://dbpedia.org/ontology/Parish,_Deanery dbo:diocese dbo:Diocese
25 dbo:PopulatedPlace dbo:metropolitanBorough dbo:PopulatedPlace
26 dbo:Island dbo:governmentRegion dbo:PopulatedPlace
time: 86 ms

Equivalent Property

The case of "part of" properties is a good example of a subproperty relationship in that, say, "Mountain X is a part of Y Mountain range" is clearly a specialization of "X is a part of Y." That's different from the case where two properties mean exactly the same thing.
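To make the specialization concrete, here is a minimal sketch in plain Python (not Gastrodon or rdflib) of the RDFS rule that a statement made with a subproperty also holds for its superproperty. The schema link from dbo:mountainRange to dul:isPartOf comes from the query result above; the instance names ex:Matterhorn and ex:PennineAlps and the helper infer_superproperty_facts are hypothetical, chosen only for illustration.

```python
# Schema fragment taken from Out[75]: subproperty -> superproperty.
# Simplification: each property has at most one superproperty here.
SUB_PROPERTY_OF = {
    "dbo:mountainRange": "dul:isPartOf",
}

# Hypothetical instance data as (subject, predicate, object) triples.
facts = {
    ("ex:Matterhorn", "dbo:mountainRange", "ex:PennineAlps"),
}

def infer_superproperty_facts(facts, sub_property_of):
    """Apply the subPropertyOf rule repeatedly until no new triples appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent is not None and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred

closure = infer_superproperty_facts(facts, SUB_PROPERTY_OF)
for triple in sorted(closure):
    print(triple)
```

The inference runs in one direction only: the general "is part of" statement follows from the specific one, but not the reverse. That asymmetry is what separates a subproperty from an equivalent property.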

Let's take a look at equivalent properties defined in the DBpedia Ontology:

In [76]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentProperty ?b
    }
""")
Out[76]:
a b
0 dbo:isPartOf http://www.wikidata.org/entity/P361
1 dbo:league http://www.wikidata.org/entity/P118
2 dbo:compressionRatio http://www.wikidata.org/entity/P1247
3 dbo:inflow http://www.wikidata.org/entity/P200
4 dbo:author schema:author
5 dbo:ulanId http://www.wikidata.org/entity/P245
6 dbo:albumRuntime schema:duration
7 dbo:map schema:maps
8 dbo:owner http://www.wikidata.org/entity/P127
9 dbo:architecturalStyle http://www.wikidata.org/entity/P149
10 dbo:runtime schema:duration
11 dbo:imdbId http://www.wikidata.org/entity/P345
12 dbo:founder http://www.wikidata.org/entity/P112
13 dbo:nationality schema:nationality
14 dbo:foundingDate http://www.wikidata.org/entity/P571
15 dbo:duration schema:duration
16 dbo:ecNumber http://www.wikidata.org/entity/P591
17 dbo:residence http://www.wikidata.org/entity/P263
18 dbo:musicBy http://www.wikidata.org/entity/P86
19 dbo:kingdom http://www.wikidata.org/entity/P75
20 dbo:iso6392Code http://www.wikidata.org/entity/P219
21 dbo:architect http://www.wikidata.org/entity/P84
22 dbo:startDate schema:startDate
23 dbo:coordinates http://www.wikidata.org/entity/P625
24 dbo:birthDate schema:birthDate
25 dbo:author http://www.wikidata.org/entity/P50
26 dbo:killedBy http://www.wikidata.org/entity/P157
27 dbo:musicalArtist http://www.wikidata.org/entity/P175
28 dbo:bibsysId http://www.wikidata.org/entity/P1015
29 dbo:order http://www.wikidata.org/entity/P70
... ... ...
192 dbo:ethnicity http://www.wikidata.org/entity/P172
193 dbo:atomicNumber http://www.wikidata.org/entity/P1086
194 dbo:colour http://www.wikidata.org/entity/P462
195 dbo:homeport http://www.wikidata.org/entity/P504
196 dbo:genus http://www.wikidata.org/entity/P74
197 dbo:creator http://www.wikidata.org/entity/P170
198 dbo:deathDate http://www.wikidata.org/entity/P570
199 dbo:child http://www.wikidata.org/entity/P40
200 dbo:illustrator schema:illustrator
201 dbo:computingPlatform http://www.wikidata.org/entity/P400
202 dbo:selibrId http://www.wikidata.org/entity/P906
203 dbo:citizenship http://www.wikidata.org/entity/P27
204 dbo:license http://www.wikidata.org/entity/P275
205 dbo:cosparId http://www.wikidata.org/entity/P247
206 dbo:country http://www.wikidata.org/entity/P17
207 dbo:isbn http://www.wikidata.org/entity/P957
208 dbo:deathCause http://www.wikidata.org/entity/P509
209 dbo:apparentMagnitude http://www.wikidata.org/entity/P1215
210 dbo:fuelSystem http://www.wikidata.org/entity/P1211
211 dbo:okatoCode http://www.wikidata.org/entity/P721
212 dbo:formationDate http://www.wikidata.org/entity/P571
213 dbo:giniCoefficient http://www.wikidata.org/entity/P1125
214 dbo:bSide http://www.wikidata.org/entity/P1432
215 dbo:locatedInArea schema:containedIn
216 dbo:landingDate http://www.wikidata.org/entity/P620
217 dbo:relative schema:relatedTo
218 dbo:highestPoint http://www.wikidata.org/entity/P610
219 dbo:locatedInArea http://www.wikidata.org/entity/P131
220 dbo:genre schema:genre
221 dbo:episodeNumber schema:episodeNumber

222 rows × 2 columns

time: 112 ms

Many of these properties are from Wikidata, so it probably makes sense to bind a namespace for Wikidata.

In [77]:
g.bind("wikidata","http://www.wikidata.org/entity/")
e=LocalEndpoint(g)
time: 1.5 ms

This kind of equivalency with Wikidata is meaningful precisely because DBpedia and Wikidata are competitive (and cooperative) databases that cover the same domain. Let's take a look at equivalencies to databases other than Wikidata:

In [78]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentProperty ?b
        FILTER(!STRSTARTS(STR(?b),"http://www.wikidata.org/entity/"))
    }
""")
Out[78]:
a b
0 dbo:author schema:author
1 dbo:albumRuntime schema:duration
2 dbo:map schema:maps
3 dbo:runtime schema:duration
4 dbo:nationality schema:nationality
5 dbo:duration schema:duration
6 dbo:startDate schema:startDate
7 dbo:birthDate schema:birthDate
8 dbo:director schema:director
9 dbo:club dbo:team
10 dbo:firstPublisher schema:publisher
11 dbo:jureLanguage dbo:language
12 dbo:numberOfEpisodes schema:numberOfEpisodes
13 dbo:mediaType schema:bookFormat
14 dbo:landArea dbo:area
15 dbo:producer schema:producer
16 dbo:isbn schema:isbn
17 dbo:language schema:inLanguage
18 dbo:publisher schema:publisher
19 dbo:waterArea dbo:area
20 dbo:deFactoLanguage dbo:language
21 dbo:artist schema:byArtist
22 dbo:filmRuntime schema:duration
23 dbo:starring schema:actors
24 dbo:restingDate schema:deathDate
25 dbo:deathDate schema:deathDate
26 dbo:numberOfPages schema:numberOfPages
27 dbo:award schema:awards
28 dbo:parentOrganisation schema:branchOf
29 dbo:musicComposer schema:musicBy
30 dbo:picture schema:image
31 dbo:spouse schema:spouse
32 dbo:endDate schema:endDate
33 dbo:illustrator schema:illustrator
34 dbo:locatedInArea schema:containedIn
35 dbo:relative schema:relatedTo
36 dbo:genre schema:genre
37 dbo:episodeNumber schema:episodeNumber
time: 177 ms

The vast majority of those link to schema.org, except for a handful which link to other DBpedia Ontology properties.

In [79]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentProperty ?b
        FILTER(STRSTARTS(STR(?b),"http://dbpedia.org/ontology/"))
    }
""")
Out[79]:
a b
0 dbo:club dbo:team
1 dbo:jureLanguage dbo:language
2 dbo:landArea dbo:area
3 dbo:waterArea dbo:area
4 dbo:deFactoLanguage dbo:language
time: 326 ms

The quality of these equivalencies is questionable to me; for instance, in geography, people often publish separate "land area" and "water area" figures for a region. Still, out of 30,000 facts, I've seen fewer than 30 that looked obviously wrong. An error rate of 0.1% is not bad by some standards, but if we put these facts into a reasoning system, small errors in the schema can set off an avalanche of inferred facts and have a disproportionately large impact on results.
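A rough way to see how such an error spreads: owl:equivalentProperty is symmetric and transitive, so a reasoner effectively merges properties into equivalence classes. The sketch below is plain Python with a small union-find, not part of Gastrodon, and the names find, union, and equivalence_classes are my own. It groups the five intra-DBpedia pairs from the last query result.

```python
from collections import defaultdict

# The five intra-DBpedia equivalences listed in Out[79].
pairs = [
    ("dbo:club", "dbo:team"),
    ("dbo:jureLanguage", "dbo:language"),
    ("dbo:landArea", "dbo:area"),
    ("dbo:waterArea", "dbo:area"),
    ("dbo:deFactoLanguage", "dbo:language"),
]

parent = {}

def find(x):
    """Return the representative of x's class, registering x if unseen."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps chains short
        x = parent[x]
    return x

def union(a, b):
    """Merge the classes containing a and b."""
    parent[find(a)] = find(b)

for a, b in pairs:
    union(a, b)

# Collect the members of each class under its representative.
groups = defaultdict(set)
for prop in list(parent):
    groups[find(prop)].add(prop)

equivalence_classes = sorted(sorted(g) for g in groups.values())
for cls in equivalence_classes:
    print(cls)
```

Under this reading, dbo:landArea and dbo:waterArea land in the same class as dbo:area, so every land area value would also count as a water area value, and likewise for the de jure and de facto language properties.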

Namespaces

Rather than starting with a complete list of namespaces used in the DBpedia Ontology, I gradually added them as they turned up in queries. It would be nice to have a tool that automatically generates this kind of list, but for the time being, I am saving this list here for future reference.

In [80]:
e.namespaces()
Out[80]:
prefix namespace
bibo http://purl.org/ontology/bibo/
cc http://creativecommons.org/ns#
dbo http://dbpedia.org/ontology/
dc http://purl.org/dc/terms/
dul http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#
dzero http://www.ontologydesignpatterns.org/ont/d0.owl#
foaf http://xmlns.com/foaf/0.1/
owl http://www.w3.org/2002/07/owl#
prov http://www.w3.org/ns/prov#
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
schema http://schema.org/
skos http://www.w3.org/2004/02/skos/core#
type http://dbpedia.org/datatype/
vann http://purl.org/vocab/vann/
wikidata http://www.wikidata.org/entity/
xml http://www.w3.org/XML/1998/namespace
xsd http://www.w3.org/2001/XMLSchema#
time: 20 ms
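A tool of the kind wished for above could start from something like the following sketch, which guesses the namespace of each URI by cutting at the last '#' or '/' and counts how often each namespace occurs. The functions namespace_of and count_namespaces are hypothetical helpers, not part of Gastrodon or rdflib, and the sample URIs are a handful copied from the results in this notebook. In practice the URIs would probably be collected by iterating over the triples of the graph, a step left out here so the sketch runs without the ontology file.

```python
from collections import Counter

def namespace_of(uri):
    """Return the part of a URI up to and including the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1] if cut >= 0 else uri

def count_namespaces(uris):
    """Count how many of the given URIs fall into each guessed namespace."""
    return Counter(namespace_of(u) for u in uris)

# A few URIs that appear in the query results of this notebook.
sample = [
    "http://dbpedia.org/ontology/Place",
    "http://dbpedia.org/ontology/closeTo",
    "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
    "http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isPartOf",
    "http://www.wikidata.org/entity/P361",
]

namespace_counts = count_namespaces(sample)
for ns, n in namespace_counts.most_common():
    print(n, ns)
```

The cut-at-last-separator heuristic is only a guess at where a namespace ends, so the output would still need a human to pick prefixes and discard one-off entries.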

Conclusion and next steps

In this notebook, I've made a quick survey of the contents of the DBpedia Ontology. This data set is useful to build into the "local" tests for Gastrodon because it is small enough to work with in memory, but complex enough to be a real-life example. For other notebooks, I work over the queries and data repeatedly to eliminate imperfections that make the notebooks unclear, but here the data set is a fixed target, which makes it a good shakedown cruise for Gastrodon in which I was able to fix a number of bugs and make a number of improvements.

One longer-term goal is to explore data from DBpedia and use it as a basis for visualization and data analysis. The next step towards that is to gather data from the full DBpedia that will help prioritize the exploration (which properties really get used?) and answer some questions that are still unclear (what about the data types which aren't used in the schema?).

Another goal is to develop tools to further simplify the exploration of data sets and schemas. The top_properties function defined above is an example of the kind of function that could be built into a function library that would reduce the need to write so many SPARQL queries by hand.