mdaquin/KGProbSchema

KG Probabilistic Schema

Build a probabilistic schema for a subset of a knowledge graph. The subset is defined by a SPARQL graph pattern, and the schema characterises the properties and values of the matching entities: occurrence distributions, value-type breakdowns, numeric statistics, and top frequent values. It can be used on local RDF files (CLI only) and on SPARQL endpoints.

Disclaimer: This tool issues heavy SPARQL queries and was mostly tested against local SPARQL endpoints (i.e. endpoints of locally deployed triplestores). It will be very slow on files and, very likely, on remote SPARQL endpoints, where it might not complete at all. Since public SPARQL endpoints often have availability constraints, we discourage using it against them unless you are sure the endpoint you want to inspect can handle the load.

Installation

pip install rdflib pandas SPARQLWrapper flask

CLI

python build_pschema.py <source> "<pattern>" [--hops N] <output_file>
Argument        Description
<source>        Path to a local RDF file or URL of a SPARQL endpoint
"<pattern>"     SPARQL graph pattern using ?x as the main variable
--hops N        Number of hops to explore (default: 2)
<output_file>   Path for the output JSON file

Examples

Local RDF file, entities of type schema:Person:

python build_pschema.py data.ttl "?x a <http://schema.org/Person>" --hops 2 schema.json

Remote SPARQL endpoint, Wikidata humans:

python build_pschema.py https://query.wikidata.org/sparql \
    "?x wdt:P31 wd:Q5" --hops 1 schema.json
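
Given the disclaimer above, it can be worth checking how many entities a pattern matches before launching the full (expensive) analysis. The helper below is a hypothetical sketch, not part of the tool; it simply wraps the same graph pattern in a COUNT query that you could run against the endpoint first.

```python
# Hypothetical helper (not part of build_pschema.py): wrap a graph
# pattern in a COUNT query to gauge how many entities ?x matches.
def count_query(pattern: str) -> str:
    return f"SELECT (COUNT(DISTINCT ?x) AS ?n) WHERE {{ {pattern} }}"

q = count_query("?x wdt:P31 wd:Q5")
print(q)

# To execute it (network required), something along these lines:
# from SPARQLWrapper import SPARQLWrapper, JSON
# sw = SPARQLWrapper("https://query.wikidata.org/sparql")
# sw.setQuery(q)
# sw.setReturnFormat(JSON)
# n = sw.query().convert()["results"]["bindings"][0]["n"]["value"]
```

On endpoints such as Wikidata, prefixes like wdt: and wd: are predefined server-side; on other endpoints you may need to add PREFIX declarations to the pattern.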

Output format

The output is a JSON file produced by summarizeSchema. For each property and inverse property of the matching entities, it includes:

  • occurrences — frequency distribution of the number of values per entity (e.g. {"0": 0.05, "1": 0.88, "2": 0.07}), or descriptive statistics (avg, std, median, min, max) when cardinality is highly variable
  • types — frequency breakdown of value types (RDF class or XSD datatype)
  • values — one of:
    • {"type": "numeric", "avg": …, "std": …, "median": …, "min": …, "max": …}
    • {"type": "categorical", "top10": {"value": frequency, …}}
    • {"type": "high_cardinality"} — no grouping possible (most values are near-unique)
  • subschema — recursive schema for the neighbouring entities (up to --hops levels)
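
As an illustration of how this structure can be consumed, the sketch below walks a schema fragment and derives, per property, the fraction of entities that have at least one value. The fragment's keys and layout are assumptions based on the description above, not the tool's verbatim output.

```python
# Illustrative schema fragment following the structure described above;
# exact keys are assumptions based on this README, not verbatim output.
schema = {
    "http://schema.org/name": {
        "occurrences": {"0": 0.05, "1": 0.88, "2": 0.07},
        "types": {"xsd:string": 1.0},
        "values": {"type": "categorical", "top10": {"Alice": 0.02}},
    }
}

for prop, info in schema.items():
    # Entities with zero values have key "0"; everything else has >= 1.
    coverage = 1.0 - info["occurrences"].get("0", 0.0)
    print(f"{prop}: coverage={coverage:.2f}, values={info['values']['type']}")
```

In a real run you would load the file first, e.g. `schema = json.load(open("schema.json"))`, and recurse into each property's subschema as needed.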

Web interface

Start the server:

python app.py

Then open http://localhost:5000 in a browser.

Usage

Step 1 — Connect to a SPARQL endpoint by entering its URL and clicking Connect. The property list is fetched automatically. rdf:type is always included.

Step 2 — Define a pattern by selecting a property and a value. Both fields support free-text entry with autocomplete. The pattern used is ?x <property> <value> (or the appropriate literal form).
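
The mapping from the two Step 2 fields to a graph pattern can be sketched as follows. This is a hypothetical reconstruction of the behaviour described above (IRI values wrapped in angle brackets, other values treated as quoted literals); the app's actual logic may differ.

```python
import json

# Hypothetical sketch: turn a selected property and value into the
# "?x <property> <value>" pattern described above.
def make_pattern(prop: str, value: str) -> str:
    if value.startswith("http://") or value.startswith("https://"):
        obj = f"<{value}>"       # IRI value
    else:
        obj = json.dumps(value)  # plain literal, quoted and escaped
    return f"?x <{prop}> {obj}"

print(make_pattern("http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                   "http://schema.org/Person"))
# ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person>
```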

Step 3 — Build Schema runs the analysis (2 hops) and displays the summarized schema. Two views are available:

  • JSON — raw schema as formatted JSON, with an Export JSON button.
  • Diagram — UML-inspired graph where nodes represent entity sets and edges represent properties (solid arrows for outgoing, reverse arrows for incoming). Each node shows entity count, type distribution, and value statistics. Each edge shows the property name and occurrence distribution. Supports drag to pan and Ctrl+scroll to zoom. An Export SVG button downloads the diagram for use in documents.

Screenshots

Schema for entities of type Dataset in the DSKG knowledge graph.

JSON view

Diagram view
