Skip to main content

Timbr SQLAlchemy Connector

This is a sample connector to Timbr using SQLAlchemy.

Dependencies

  • Access to a Timbr Server service

  • Python version 3.9.13 or newer

  • Support SQLAlchemy version 1.4.36 or newer. (version 2.x.x or newer is currently not supported).

  • For Linux based machines only install those dependencies first:

    • gcc
    • heimdal-dev
    • krb5
    • python-devel
    • python-dev
    • python-all-dev
    • libsasl2-dev
  • In Ubuntu run

    apt install gcc, heimdal-dev, krb5, python-devel, python-dev, python-all-dev, libsasl2-dev

Installation

  • Install as clone repository:

  • Install using pip and git:

    pip install git+https://github.com/WPSemantix/timbr_python_SQLAlchemy

  • Install using pip:

    pip install pytimbr-sqla

Sample usage

  • For an example of how to use the Python SQLAlchemy connector for timbr, follow this example file
  • For an example of how to use the Python SQLAlchemy connector with 'PyHive' as async query for timbr, follow this example file
  • For an example of how to use the Python SQLAlchemy connector with 'PyHive' as sync query for timbr, follow this example file

Connection parameters

General example

  hostname = '<TIMBR_IP/HOST>'
port = '<TIMBR_PORT>'
ontology = '<ONTOLOGY_NAME>'
protocol = '<http/https>'
username = '<TIMBR_USER/token>'
password = '<TIMBR_PASSWORD/TOKEN_VALUE>'

# hostname - The IP / Hostname of the Timbr server (not necessarily the hostname of the Timbr platform).
# port - The port to connect to in the Timbr server. Timbr's default port with enabled_ssl is 443 without SSL is 11000.
# ontology = The name of the ontology (knowledge graph) to connect.
# protocol - Connection protocol can be 'http' or 'https'.
# username - Use 'token' as the username when connecting using a Timbr token, otherwise use the user name.
# password - If using a token as a username then the pass is the token value, otherwise its the user's password.

HTTP example with dummy data

Username and password

  hostname = 'mytimbrenv.com'
port = '11000'
ontology = 'my_ontology'
protocol = 'http'
username = 'timbr'
password = 'StrongPassword'

Timbr token

  hostname = 'mytimbrenv.com'
port = '11000'
ontology = 'my_ontology'
protocol = 'http'
username = 'token'
password = '<TOKEN_VALUE>'

HTTPS example with dummy data

Username and password

  hostname = 'mytimbrenv.com'
port = '443'
ontology = 'my_ontology'
protocol = 'https'
username = 'timbr'
password = 'StrongPassword'

Timbr token

  hostname = 'mytimbrenv.com'
port = '443'
ontology = 'my_ontology'
protocol = 'https'
username = 'token'
password = '<TOKEN_VALUE>'

Connection options

1. Connect using pytimbr_sqla and SQLAlchemy packages

  from sqlalchemy import create_engine

# Declare the connection variables
# General example
hostname = '<TIMBR_IP/HOST>'
port = '<TIMBR_PORT>'
ontology = '<ONTOLOGY_NAME>'
protocol = '<http/https>'
username = '<TIMBR_USER/token>'
password = '<TIMBR_PASSWORD/TOKEN_VALUE>'

# hostname - The IP / Hostname of the Timbr server (not necessarily the hostname of the Timbr platform).
# port - The port to connect to in the Timbr server. Timbr's default port with enabled_ssl is 443 without SSL is 11000.
# ontology = The name of the ontology (knowledge graph) to connect.
# protocol - Connection protocol can be 'http' or 'https'.
# username - Use 'token' as the username when connecting using a Timbr token, otherwise use the user name.
# password - If using a token as a username then the pass is the token value, otherwise its the user's password.

# Create new sqlalchemy connection
engine = create_engine(f"timbr+{protocol}://{username}@{ontology}:{password}@{hostname}:{port}")

# Connect to the created engine
conn = engine.connect()

# Execute a query
query = "SHOW CONCEPTS"
res_obj = conn.execute(query)
results_headers = [(desc[0], desc[1]) for desc in res_obj.cursor.description]

# Optional Performance Tuning:
# By default, Timbr fetches results in batches of 10,000 rows at a time.
# You can increase this batch size for better performance with large datasets.
# Note: The maximum batch size is controlled by your Timbr server's
# "TIMBR_RESULTSET_MAX_FETCH_SIZE" configuration setting.
res_obj.cursor._arraysize = 20000

results = res_obj.fetchall()

# Print the columns name
for name, col_type in results_headers:
print(f"{name} - {col_type}")
# Print the results
for result in results:
print(result)

2. Connect using PyHive and SQLAlchemy packages

Attention

PyHive provides two operational modes: asynchronous and synchronous.

However, Timbr exclusively supports asynchronous query execution when using PyHive.

Connect using PyHive Asynchronous Mode

  from sqlalchemy import create_engine
from TCLIService.ttypes import TOperationState

# Declare the connection variables
# General example
hostname = '<TIMBR_IP/HOST>'
port = '<TIMBR_PORT>'
ontology = '<ONTOLOGY_NAME>'
protocol = '<http/https>'
username = '<TIMBR_USER/token>'
password = '<TIMBR_PASSWORD/TOKEN_VALUE>'
connect_args = {
'configuration': {
'set:hiveconf:hiveMetadata': 'true',
'set:hiveconf:active_datasource': '<datasource_name>',
'set:hiveconf:queryTimeout': '<TIMEOUT_IN_SECONDS>',
},
}

# hostname - The IP / Hostname of the Timbr server (not necessarily the hostname of the Timbr platform).
# port - The port to connect to in the Timbr server. Timbr's default port with enabled_ssl is 443 without SSL is 11000.
# ontology = The name of the ontology (knowledge graph) to connect.
# protocol - Connection protocol can be 'http' or 'https'.
# username - Use 'token' as the username when connecting using a Timbr token, otherwise use the user name.
# password - If using a token as a username then the pass is the token value, otherwise its the user's password.
# connect_args - The connection special arguments for extra customization. The only argument you must have is the first one (set:hiveconf:hiveMetadata) the others are optional.

# Create new sqlalchemy connection
engine = create_engine(f"hive+{protocol}://{username}@{ontology}:{password}@{hostname}:{port}", connect_args = connect_args)

# Connect to the created engine
conn = engine.connect()
dbapi_conn = engine.raw_connection()
cursor = dbapi_conn.cursor()

# Execute a query
query = "SHOW CONCEPTS"
cursor.execute(query)

# Check the status of this execution
status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
status = cursor.poll().operationState

# Get the results of the execution
results_headers = [(desc[0], desc[1]) for desc in cursor.description]
results = cursor.fetchall()

# Display the results of the execution
# Print the columns name
for name, col_type in results_headers:
print(f"{name} - {col_type}")
# Print the results
for result in results:
print(result)

Connect using PyHive Synchronous Mode

  from sqlalchemy import create_engine
from TCLIService.ttypes import TOperationState

# Declare the connection variables
# General example
hostname = '<TIMBR_IP/HOST>'
port = '<TIMBR_PORT>'
ontology = '<ONTOLOGY_NAME>'
protocol = '<http/https>'
username = '<TIMBR_USER/token>'
password = '<TIMBR_PASSWORD/TOKEN_VALUE>'
connect_args = {
'configuration': {
'set:hiveconf:async': 'false',
'set:hiveconf:hiveMetadata': 'true',
'set:hiveconf:active_datasource': '<datasource_name>',
'set:hiveconf:queryTimeout': '<TIMEOUT_IN_SECONDS>',
},
}

# hostname - The IP / Hostname of the Timbr server (not necessarily the hostname of the Timbr platform).
# port - The port to connect to in the Timbr server. Timbr's default port with enabled_ssl is 443 without SSL is 11000.
# ontology = The name of the ontology (knowledge graph) to connect.
# protocol - Connection protocol can be 'http' or 'https'.
# username - Use 'token' as the username when connecting using a Timbr token, otherwise use the user name.
# password - If using a token as a username then the pass is the token value, otherwise its the user's password.
# connect_args - The connection special arguments for extra customization. The only 2 arguments you must have are the first and the second one (set:hiveconf:async, set:hiveconf:hiveMetadata) the others are optional.

# Create new sqlalchemy connection
engine = create_engine(f"hive+{protocol}://{username}@{ontology}:{password}@{hostname}:{port}", connect_args = connect_args)

# Connect to the created engine
conn = engine.connect()

# Execute a query
query = "SHOW CONCEPTS"
res_obj = conn.execute(query)
results_headers = [(desc[0], desc[1]) for desc in res_obj.cursor.description]
results = res_obj.fetchall()

# Print the columns name
for name, col_type in results_headers:
print(f"{name} - {col_type}")
# Print the results
for result in results:
print(result)