Welcome¶
qwikidata is a Python package with tools that allow you to interact with Wikidata.
The package defines a set of classes that represent Wikidata entities in a Pythonic way. It also provides Pythonic access to three data sources: the linked data interface, the SPARQL query service, and the JSON dump.
Quick Install¶
Requirements¶
python >= 3.5
Install with pip¶
You can install the most recent version using pip:
pip install qwikidata
Quick Examples¶
Linked Data Interface¶
from qwikidata.entity import WikidataItem, WikidataLexeme, WikidataProperty
from qwikidata.linked_data_interface import get_entity_dict_from_api
# create an item representing "Douglas Adams"
Q_DOUGLAS_ADAMS = "Q42"
q42_dict = get_entity_dict_from_api(Q_DOUGLAS_ADAMS)
q42 = WikidataItem(q42_dict)
# create a property representing "subclass of"
P_SUBCLASS_OF = "P279"
p279_dict = get_entity_dict_from_api(P_SUBCLASS_OF)
p279 = WikidataProperty(p279_dict)
# create a lexeme representing "bank"
L_BANK = "L3354"
l3354_dict = get_entity_dict_from_api(L_BANK)
l3354 = WikidataLexeme(l3354_dict)
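Wikidata identifiers encode the entity type in their prefix: Q for items, P for properties, L for lexemes, which is why each identifier above is paired with the matching class. A small illustrative helper (not part of qwikidata) that maps an identifier to the class name used in the example:

```python
def entity_class_for_id(entity_id):
    """Map a Wikidata id prefix to the qwikidata class that represents it."""
    prefix_to_class = {
        "Q": "WikidataItem",
        "P": "WikidataProperty",
        "L": "WikidataLexeme",
    }
    if entity_id and entity_id[0] in prefix_to_class:
        return prefix_to_class[entity_id[0]]
    raise ValueError("unrecognized entity id: {}".format(entity_id))

entity_class_for_id("Q42")    # "WikidataItem"
entity_class_for_id("P279")   # "WikidataProperty"
entity_class_for_id("L3354")  # "WikidataLexeme"
```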
SPARQL Query Service¶
from qwikidata.sparql import (
    get_subclasses_of_item,
    return_sparql_query_results,
)
# send any sparql query to the wikidata query service and get full result back
# here we use an example that counts the number of humans
sparql_query = """
SELECT (COUNT(?item) AS ?count)
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q5 .
}
"""
res = return_sparql_query_results(sparql_query)
# use convenience function to get subclasses of an item as a list of item ids
Q_RIVER = "Q4022"
subclasses_of_river = get_subclasses_of_item(Q_RIVER)
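The query service replies in the standard SPARQL 1.1 JSON results format, so the count lives under `results` → `bindings`. A minimal sketch of extracting it, run against a hand-written result of the same shape (the sample value is illustrative, not a real query result):

```python
# A hand-written result in the SPARQL 1.1 JSON results format,
# the same shape returned by the Wikidata Query Service.
sample_res = {
    "head": {"vars": ["count"]},
    "results": {
        "bindings": [
            {
                "count": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "value": "8500000",
                }
            }
        ]
    },
}

def extract_count(res):
    """Pull the integer count out of a SPARQL JSON result."""
    return int(res["results"]["bindings"][0]["count"]["value"])

extract_count(sample_res)  # 8500000
```

Note that literal values arrive as strings, so numeric results need an explicit conversion.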
JSON Dump¶
import time
from qwikidata.entity import WikidataItem
from qwikidata.json_dump import WikidataJsonDump
from qwikidata.utils import dump_entities_to_json
P_OCCUPATION = "P106"
Q_POLITICIAN = "Q82955"
def has_occupation_politician(item: WikidataItem, truthy: bool = True) -> bool:
    """Return True if the Wikidata Item has occupation politician."""
    if truthy:
        claim_group = item.get_truthy_claim_group(P_OCCUPATION)
    else:
        claim_group = item.get_claim_group(P_OCCUPATION)

    occupation_qids = [
        claim.mainsnak.datavalue.value["id"]
        for claim in claim_group
        if claim.mainsnak.snaktype == "value"
    ]
    return Q_POLITICIAN in occupation_qids
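The function above walks qwikidata's claim objects (`mainsnak`, `snaktype`, `datavalue`). In the underlying Wikidata JSON, a single occupation claim looks roughly like the hand-written fragment below (field names follow the Wikidata data model; the values are illustrative), and the same membership test can be written against the raw dict:

```python
# Hand-written fragment mirroring one claim in the Wikidata JSON data model.
sample_claim = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P106",
        "datavalue": {
            "value": {"entity-type": "item", "id": "Q82955", "numeric-id": 82955},
            "type": "wikibase-entityid",
        },
    },
    "type": "statement",
    "rank": "normal",
}

# the same test has_occupation_politician applies, on the raw dict:
# only "value" snaks carry a datavalue, so check snaktype first
is_politician = (
    sample_claim["mainsnak"]["snaktype"] == "value"
    and sample_claim["mainsnak"]["datavalue"]["value"]["id"] == "Q82955"
)
is_politician  # True
```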
# create an instance of WikidataJsonDump
wjd_dump_path = "wikidata-20190401-all.json.bz2"
wjd = WikidataJsonDump(wjd_dump_path)
# create an iterable of WikidataItem representing politicians
politicians = []
t1 = time.time()
for ii, entity_dict in enumerate(wjd):
    if entity_dict["type"] == "item":
        entity = WikidataItem(entity_dict)
        if has_occupation_politician(entity):
            politicians.append(entity)

    if ii % 1000 == 0:
        t2 = time.time()
        dt = t2 - t1
        print(
            "found {} politicians among {} entities [entities/s: {:.2f}]".format(
                len(politicians), ii, ii / dt
            )
        )

    if ii > 10000:
        break
# write the iterable of WikidataItem to disk as JSON
out_fname = "filtered_entities.json"
dump_entities_to_json(politicians, out_fname)
wjd_filtered = WikidataJsonDump(out_fname)
# load filtered entities and create instances of WikidataItem
for ii, entity_dict in enumerate(wjd_filtered):
    item = WikidataItem(entity_dict)
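At the file level, the filter-and-dump cycle above is plain JSON in, JSON out. A minimal stdlib sketch of the same round trip using an in-memory buffer (the sample dicts are illustrative stand-ins, not full entity dicts):

```python
import io
import json

# hand-written stand-ins for the entity dicts written by dump_entities_to_json
entities = [{"id": "Q42", "type": "item"}, {"id": "Q1", "type": "item"}]

# serialize to a buffer, then read the entities back
buf = io.StringIO()
json.dump(entities, buf)
buf.seek(0)
loaded = json.load(buf)

loaded[0]["id"]  # "Q42"
```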
License¶
Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Copyright¶
Copyright 2019 Kensho Technologies, LLC.
Important Links¶
readthedocs | PyPI | github