Querying the Knowlege Graph
Before querying your knowledge graph, you must make sure its already modeled and mapped. If you haven't modeled your knowledge graph yet, its recommended you review first:
Summary of Timbr schemas
Timbr creates five virtual schemas to represent the ontology in a relational way. For more information about Timbr schemas please follow the links below:
timbr schema: Intrinsic definition - explicit ontology/relational mapping.
etimbr schema: Exhaustive definition - Timbr schema and implicit derived properties.
dtimbr schema: Dereferenced definition - etimbr schema with graph traversals.
vtimbr schema: A schema that expose views saved in a given ontology.
gtimbr schema: A schema to run Graph Algorithms on the ontology concepts.
The gtimbr schema is additional and is not included in the basic Timbr license.
Query Timbr schema
Used when no graph traversals or derived properties are required in the results. Direct and inherited properties of a concept are shown as table columns.
Who are all the soccer managers who were also soccer players?
The concept (represented as a virtual table) we are querying is the concept: timbr.person. We used the Timbr schema since we don’t do graph traversal or ask for properties defined by the concepts derived from person (like the concepts soccerplayer and soccermanager). The following properties (represented as columns) were used for this query:
entity_id – this is instance id (in our case the dbpedia id) of the person we are querying.
entity_label – represents the entity label (rdfs:label) of a person.
birthdate – represents the birthdate of a person.
entity_type – represents the entity type of a person (rdf:type) and we want to filter all the persons with type of soccermananger and soccerplayer.
Lastly we want to order by the birthdate in an ascending order.
The SQL query is as follows:
SELECT entity_id AS `Datasource Link`, entity_label AS `Full_Name`, birthdate
FROM `timbr`.`person`
WHERE entity_type = 'soccermanager,soccerplayer'
ORDER BY birthdate
The results are:
Which universities have the highest number of faculty members?
The concept (represented as a virtual table) we are querying is the concept: timbr.university. We used the schema of Timbr since we don’t do graph traversal or ask for properties defined by the concepts derived from university concept. The following properties (represented as columns) were used for this query:
entity_label – represents the entity label (rdfs:label) of the university.
facultysize – represents the faculty size in a university.
numberofstudents – represents the number of students in a university.
numberofundergraduatestudents – represents the number of undergraduate students in a university.
Lastly we want to order by the facultysize in a descending order.
The SQL query is as follows:
SELECT entity_label AS `University`,
facultysize AS `Faculty Size`,
numberofstudents AS `Number of Students`,
numberofundergraduatestudents AS `Number of Undergraduate Students`
FROM `timbr`.`university`
ORDER BY facultysize DESC
The results are:
What types of persons and how many of each exist in the DBpedia ontology?
We use Apache Spark as our back-end and we can enjoy the native Apache Spark SQL functions like format_number(), The concept we are querying is the concept: timbr.person. We used the schema of Timbr since we don’t do graph traversal or ask for properties defined by the concepts derived from person concept. The following properties (represented as columns) were used for this query:
entity_type – represents the entity type of a person (rdf:type).
This column represents the type of each instance (row)
We order by the COUNT(1) (number of people) in descending order.
The SQL query is as follows:
SELECT format_number(COUNT(1), 0) AS `Number of People`,
entity_type AS `Belong to Category (Concepts)`
FROM `timbr`.`person`
GROUP BY entity_type
ORDER BY COUNT(1) DESC
The results are:
Query etimbr schema
Used when properties from derived concepts are needed. The schema is used when querying a concept requiring properties that some of its derived concepts.
What are the things having a slogan which contain the word 'Best'?
The concept we are querying is the concept: etimbr.thing. Every concept inherits from the concept thing. Therefore when we query the concept using the etimbr schema, we subsequently query all the concepts and properties in the ontology. In this case, we query any concept containing the property slogan We used the schema of etimbr because we query for properties defined by the concepts derived from thing (concepts such as broadcastnetwork, media, company, and more). The following properties (represented as columns) were used for this query:
entity_label – represents the entity label (rdfs:label) of the thing we are querying.
entity_type – represents the entity type of the thing we are querying (rdf:type).
slogan – since the concept thing in the schema etimbr contains all the properties of all the concepts, we can filter on the slogan property to query for all concepts containing the property of slogan and filter them by the ones containing the word ‘Best’.
The SQL query is as follows:
SELECT entity_label AS `Label`,
entity_type AS `Category`,
slogan AS `Slogan`
FROM `etimbr`.`thing`
WHERE `slogan` LIKE '%Best%'
The results are:
What types of works contain the word 'Batman' (including the release date, ISBN, or IMDB-ID if available)?
The concept we are querying is the concept: etimbr.work When we query the concept work using the etimbr schema, we subsequently query all the concepts and properties which derived from work. In this case, we query the derived concepts from the concept work containing the properties:
releasedate – any derived concept with a property releasedate
imdbid – a property indicating the id in the IMDB website. This property usually applies to tv series and movies
isbn – which is a global identifier for books, novels, and other written works
The following properties (represented as columns) were used for this query:
entity_id – this is instance id (in our case the dbpedia id) of the work we are querying.
entity_label – represents the entity label (rdfs:label) of the work.
entity_type – represents the entity type (rdf:type) of the work.
releasedate – If a derived concept from work has a property of releasedate it will be shown, otherwise it will be null.
imdbid – If a derived concept from work has a property of imdbid it will be shown, otherwise it will be null.
isbn – If a derived concept from work has a property of isbn it will be shown, otherwise it will be null.
The SQL query is as follows:
SELECT entity_id AS `Datasource Link`,
entity_label AS `Title`,
entity_type AS `Category`,
releasedate AS `Release Date`,
imdbid AS `IMDB-ID`,
isbn AS `ISBN`
FROM `etimbr`.`work`
WHERE entity_label LIKE '%Batman%'
The results are:
What are all the things (any concept with a date) that happened on January 1st, 2000?
The concept we are querying is the concept: etimbr.thing. Every concept inherits from the concept thing. Therefore when we query the concept using the etimbr schema, we subsequently query all the concepts and properties in the ontology. In this case, we query any concept containing the property date. Note – as date is the property we want to query and also a reserved SQL word, we need to quote it with the ` sign (in our case ` is used for Apache Spark but it depends on the underlying engine) as `date`. as a best practice, its always good to encompass properties with the ` sign We used the schema of etimbr because we want any concept with the date property. The following properties were used for this query:
resource_id – this is the primary key we use in dbpedia (also represented as entity_id in the ontology) and represents the instance unique id (in our case the dbpedia id).
entity_label – represents the entity label (rdfs:label) of the thing we are querying.
entity_type – represents the entity type (rdf:type) of the thing we are querying.
date – any derived concept from thing that has a property (represented as a column) of date will return. Then we filter the date by requesting only the dates happening on the 1st of January 2000 in (ISO 8601 calendar date format, i.e. – “YYYY-MM-DD”).
The SQL query is as follows:
SELECT resource_id AS `Datasource Link`,
entity_type AS `Category`,
entity_label AS `Label`
FROM `etimbr`.`thing`
WHERE `date` = '2000-01-01'
The results are:
Query dtimbr schema
Used when we want to perform graph traversals between concepts (JOIN the data between concepts) The schema is for queries involving other concepts which are connected through relationships.
What entities are owners of banks in Europe?
The concept (represented as a virtual table) we are querying is the concept: dtimbr.bank We used the schema of dtimbr since we want to use the relationship between the concept of a bank to a concept of a country and subsequently to a concept of a continent in our query. The following properties (represented as columns) were used for this query:
entity_label – represents the entity label (rdfs:label) of the bank we are querying.
`owner[agent].entity_label` – represents the entity label (rdfs:label) of the owner of the bank we are querying. owner is a reference (ObjectProperty) to agent concept which may be a person, a company or an organisation.
`country[country].entity_label` – represents the entity label (rdfs:label) of the country of the bank we are querying. country is a reference (ObjectProperty) to country concept. in dtimbr we can use the country reference as a path in the graph.
`country[country].continent[continent].entity_label` – represents the entity label (rdfs:label) of the continent of the bank's country we are querying. continent is a path that jumps two hops. first from the current concept bank to the concept of country, then from the concept of country to the continent concept, and we request the entity_label (rdfs:label) of that continent concept.
We filter the results by which the continent of the company must be Europe. Finally, we order our results by owner in ascending order.
The SQL query is as follows:
select entity_label as `Bank`,
`owner[agent].entity_label` as `Owner`,
`country[country].entity_label` as `Bank_Country`,
`country[country].continent[continent].entity_label` as `Bank_Continent`
from dtimbr.bank
where `country[country].continent[continent].entity_label` = 'Europe'
order by `Owner`
The results are:
What companies in Europe or Asia have the highest operating income per employee?
We answered this query using a back-end of Apache Spark and therefore we enjoy native Apache Spark SQL functions like format_number(). The concept (virtual table) we are querying is the concept: dtimbr.company We used the schema of dtimbr since we want to use the relationship between the concept of a company to a concept of a country and subsequently to a concept of a continent in our query. The following properties (represented as columns) were used for this query:
entity_label – represents the entity label (rdfs:label) of the company we are querying.
operatingincome – represents the operatingincome of a company.
numberofemployees – represents the numberofemployees of a company.
`country[country].entity_label` – represents the entity label (rdfs:label) of the country of the company we are querying.
`country[country].continent[continent].entity_label` – represents the entity label (rdfs:label) of the continent of the company's country we are querying.
Note – we also select for a column which divides the operating income by the number of employees (operatingincome/numberofemployees) to answer our query
We filter the results by the following parameters:
the operatingincome of the company must be over 150,000,000 USD
the continent of the company must be Europe or Asia
Finally, we order our results by operatingincome in descending order.
The SQL query is as follows:
SELECT entity_label AS `Company Name`,
format_number(operatingincome, 0) AS `Operating Income`,
format_number(numberofemployees, 0) AS `Number of Employees`,
format_number(operatingincome/numberofemployees, 0) AS `Operating Income per Employee`,
`country[country].entity_label` AS `Country`,
`country[country].continent[continent].entity_label` AS `Continent`
FROM `dtimbr`.`company`
WHERE operatingIncome > 150000000
The results are:
What are all the companies that their parent company is owned by an organisation containing the word 'Warner'?
The concept we are querying is: dtimbr.organisation Company concept is derived from Organisation concept. Organisation concept is derived from Agent concept. Person concept is also derived from Agent concept. We used the schema of dtimbr since we want to use the relationship between the concept of an organisation to a concept of another organisation and subsequently to another concept of an agent in our query. The following properties (represented as columns) were used for this query:
entity_label – represents the entity label (rdfs:label) of the organisation we are querying.
`owningcompany[company].entity_label` – An inherited property of the concept organisation (which is inherited by the concept agent). represents the entity label (rdfs:label) of an agent concept which may be a person, a company, an organisation, or any other concept deriving from agent.
`owningcompany[company].owner[agent].entity_label` – An inherited property of the concept organisation (which is inherited by the concept agent). represents the entity label (rdfs:label) of the owner of the owningcompany. This owner of an owningcompany is an agent which may be a person, a company, an organisation, or any other concept deriving from agent.
entity_type – represents the entity type (rdf:type) of the organisation we are querying.
`country[country].entity_label` – represented a path in the graph to the country concept. we select the entity label (rdfs:label) of the country of the organisation we are querying.
We filter the results by the following parameters:
the continent of the company must be Europe
Finally, we order our results by owner in ascending order.
The SQL query is as follows:
select entity_label as `Company Name`,
`owningcompany[company].entity_label` as `Parent Company`,
`owningcompany[company].owner[agent].entity_label` as `Owner of Parent Company`,
entity_type as `Company Category`,
`country[country].entity_label` as `Company Country`
from dtimbr.organisation
where `owningcompany[company].owner[agent].entity_label` like '%Warner%'
The results are:
Who are the many callers and the recipinets of those calls?
We answered this query using the concept (virtual table): dtimbr.person. We used the schema of dtimbr since we want to use the Many-to-Many relationship between the many people who made calls, to the many people who received those calls. The following properties (represented as columns) were used for this query:
caller[person].person_id – represents the unique id of people who made calls to others.
caller[person].first_name – represents the name of people who made calls.
callee[person].person_id – represents the unique id of people who received calls from others.
callee[person].first_name – represents the name of people who received calls.
The SQL query is as follows:
SELECT `caller[person].person_id` as `caller_id`,
`caller[person].first_name` as `caller_name`,
`callee[person].person_id` as `callee_id`,
`callee[person].first_name` as `callee_name`
FROM dtimbr.person
The results are:
Who are the subsidiary companies of Intel as well as thier transitivty levels?
We answered this query using the concept (virtual table): dtimbr.company. We used the schema of dtimbr since we want to use the transitive relationship between the parnet company Intel and its subsidiaries. The Transitive relationship is used in the square parentheses by choosing the concept we want and conecting it to a * followed by the number of levels of transitivity we would like to see. In this query it will show not only who is part of the Intel hierarchy but also at what level in the hierarchy each company is, showing us up to four levels of transitivity as we declared in the query. The following properties (represented as columns) were used for this query:
`organization_id` – represents the unique id given to the main owner company Intel.
`organization_name` – represents the name of the owner company Intel.
`purchased[company*4].organization_id` – represents the unique id of up to 4 levels of subsidiary companies of Intel.
`purchased[company*4].organization_name` – represents the name of up to 4 levels of subsidiary companies of Intel.
`purchased[company*4]_transitivity_level` – represents the transitivity_level (hierarchy level) to up to 4 levels of subsidiary companies of Intel.
We filter the results by the following parameters:
the owning organization name must be Intel.
The SQL query is as follows:
SELECT `organization_id` as owner_id,
`organization_name` as owner_name,
`purchased[company*4].organization_id` as subsidiary_id,
`purchased[company*4].organization_name` as subsidiary_name,
`purchased[company*4]_transitivity_level` as company_transitivity_level
FROM dtimbr.company
WHERE `organization_name` = 'Intel'
The results are:
In order to query a relationship property in the SQL editor or from any other endpoint, users must use the dtimbr schema which exposes relationships. Once loading a concept's metadata, the relationship properties will appear with "_" (underscore) before the property name indicating that this is a property of the many-to-many relationship and not the property of the target concept which is identified with a dot before the property name.
The syntax of a relationship property appears as follows:
<relationship>[<target_concept>]_<relationship property>
Querying measures in your data model
Measures enable multi-dimensional analysis in timbr using standard SQL. Measures are properties with aggregate calculations like totals, counts, or sums once and reuse them in queries.
The measures metadata is exposed in the dtimbr
(denormalized) schema and when querying a concept with measures and properties you'll need to define a GROUP BY for the non-aggregated properties.
Measures already use aggregate functions so you can't include them in the GROUP BY
or WHERE
clause.
But they can be used in HAVING
clause like any other aggregate functions.
Example 1: Querying Measures with Aggregate Functions
SELECT `of_customer[customer].customer_segment` as customer_segment,
AGGREGATE(`measure.count_of_order`) as number_of_orders ,
AGGREGATE(`measure.count_of_customer`) as number_of_customers,
AGGREGATE(`measure.total_revenue`) as total_revenue,
AGGREGATE(`measure.total_sales`) as total_sales
FROM dtimbr.`order`
GROUP BY `of_customer[customer].customer_segment`
of_customer[customer].customer_segment
: This field represents the customer segment of a given customer in the order data. It categorizes customers into different segments (e.g., Corporate, Individual, etc.)AGGREGATE(
measure.count_of_order
): This measure field calculates the total number of orders. The AGGREGATE function in Timbr sums up the count_of_order measure for each customer segment.AGGREGATE(
measure.count_of_customer
): This field provides the total number of unique customers. Timbr aggregates the count of customers, which helps track how many customers placed orders in each segment.AGGREGATE(
measure.total_revenue
): This field aggregates the total revenue generated by each customer segment. It sums up the revenue from all orders placed by customers in each segment.AGGREGATE(
measure.total_sales
): This field computes the total sales for each customer segment. It sums up all sales figures within the specified customer segment, giving an overview of sales performance.
In timbr you can either use AGGREGATE
or SQL aggregate functions like COUNT
/SUM
/AVG
when referencing a measure in a query.
Timbr rewrites the SQL to use the aggregate function that was defined in the measure creation.
Every measure is prefixed with measure.
to allow users to differentiate between properties and measures when exploring the concept table metadata.
Example 2: Measures with SQL Aggregate Functions
Timbr support standrad SQL aggregate functions when using measures in order to support SQL clients that aren't aware of the AGGREGATE function.
SELECT `of_customer[customer].customer_segment` as customer_segment,
COUNT(`measure.count_of_order`) as number_of_orders,
COUNT(`measure.count_of_customer`) as number_of_customers,
SUM(`measure.total_revenue`) as total_revenue,
SUM(`measure.total_sales`) as total_sales
FROM dtimbr.`order`
GROUP BY `of_customer[customer].customer_segment`
There is not difference between COUNT(measure.count_of_order
) as number_of_orders or AGGREGATE(measure.count_of_order
) as number_of_orders as timbr will use the aggregate function that was defined in count_of_order
of_customer[customer].customer_segment
: As in the first example, this field represents the customer segment, helping categorize customers for analysis.COUNT(
measure.count_of_order
): This SQL aggregate function counts the total number of orders placed within each customer segment.COUNT(
measure.count_of_customer
): This counts the total number of unique customers per segment, helping you understand customer participation.SUM(
measure.total_revenue
): The SQL SUM function aggregates the total revenue generated from each customer segment.SUM(
measure.total_sales
): The SQL SUM function here adds up the total sales within each customer segment, showing overall sales contribution.
Query with filtering a measure after the GROUP BY (using HAVING
):
-- Use measure in HAVING
SELECT `of_customer[customer].customer_segment` as customer_segment,
AGGREGATE(`measure.count_of_order`) as number_of_orders
FROM dtimbr.`order`
GROUP BY `of_customer[customer].customer_segment`
HAVING AGGREGATE(`measure.count_of_order`) > 100000
-- Use alias in HAVING
SELECT `of_customer[customer].customer_segment` as customer_segment,
AGGREGATE(`measure.count_of_order`) as number_of_orders
FROM dtimbr.`order`
GROUP BY `of_customer[customer].customer_segment`
HAVING number_of_orders > 100000
Example 3: Measures with Dynamic Filters
You can filter a specific measure in timbr at runtime (filters can also be defined when creating the measure).
SELECT `of_customer[customer].customer_segment` as customer_segment,
AGGREGATE(`measure.count_of_order`) as number_of_orders,
AGGREGATE(`measure.total_revenue`) FILTER (where market = 'Europe') as total_sales_europe,
AGGREGATE(`measure.total_revenue`) FILTER (where market = 'US') as total_sales_us
FROM dtimbr.`order`
GROUP BY `of_customer[customer].customer_segment`
Dynamic filters can be applied on top of measures that already have a filter. Timbr will combine the filters at run-time:
-- Create a measure with filter
CREATE MEASURE total_revenue_europe decimal
AS SELECT total_sales - total_discount
WHERE market = 'Europe';
-- Add a filter ad-hoc in the query:
SELECT `of_customer[customer].customer_segment` as customer_segment,
AGGREGATE(`measure.total_revenue_europe`) FILTER (WHERE brand = 'MyBrand') as total_sales_europe_mybrand
FROM dtimbr.`order`
GROUP BY `of_customer[customer].customer_segment`;
Timbr translates the filter to CASE WHEN for the database in runtime:
of_customer[customer].customer_segment
: As previously, this field represents the customer segment.AGGREGATE(
measure.count_of_order
): This aggregates the total number of orders in each customer segment.AGGREGATE(
measure.total_revenue
) FILTER (where market = 'Europe'): This field sums up the total revenue from European markets. By applying a filter (market = 'Europe'), we narrow the aggregation to orders placed in Europe.AGGREGATE(
measure.total_revenue
) FILTER (where market = 'US'): Similarly, this field sums up the total revenue from US markets, filtering the total revenue to reflect only orders from the US.
Regardless of whether you use aggregate functions like AGGREGATE, SUM, or COUNT, the core logic in Timbr's calculations is defined by the measure itself. This means that the calculation you see is driven by how the user configured the measure. Whether you're aggregating data or counting values, it's the user-defined measure that dictates the behavior, ensuring consistency and flexibility in querying the data.
Querying multiple Knowledge Grpahs
Querying multiple Knowledge Graphs allows users to combine data from different data models (Ontologies) in a single query. This feature is especially useful in complex environments where data is distributed across various domains or datasets, enabling a holistic view and interaction between disparate data sources.
Querying Template
Required information for Querying multiple Knowledge Grpahs (the curly brackets {} should not be an input, they are used only as a variable substitution):
{ontology_name} - The ontology model of the knowldge graph you'd like to query
{timbr_schema} - The virtual Timbr schema you'd like to query
{concept_name/view_name} - The name of the concept or view you'd like to query
SELECT ... FROM {ontology_name}.{timbr_schema}.{concept_name/view_name}
Supported Timbr schemas: timbr, dtimbr, vtimbr.
Example using the timbr schema
The following example demonstrates how to query concepts from different ontologies using the timbr
schema.
SELECT name FROM calls.timbr.person
UNION ALL
SELECT customer_name FROM supply_chain.timbr.customer
Example using the dtimbr schema
This example shows how to perform a query involving the dtimbr
schema, allowing for graph traversals.
SELECT name FROM calls.dtimbr.person
WHERE `has_device[device].device_id` IN (
SELECT `made_order[order].includes_product[product].product_id` FROM supply_chain.dtimbr.customer
)
Requirements
- Single datasource, multiple knowledge graphs:
This setup works if the Knowledge Graphs share the same datasource (e.g., MySQL). If different datasources are queried simultaneously, one of the sources (which is not being used in the query) will return null values (not an error), indicating no data available from that particular source.
- Multiple datasources, multiple knowledge graphs:
When using a virtualization datasource (such as Spark or Databricks), it's possible to query multiple Knowledge Graphs and datasources, and recieve consolidated results, provided the user has the necessary access permissions to the different datasources and knowldge graph models.
It’s important to mention that when querying multiple ontologies, users can create an ontology view that queries multiple ontologies.