Datasource
This section is about operating datasources in Timbr. Datasources are used as the execution engine for all the queries run on an ontology. Timbr connects to a datasource using a JDBC driver, and can theoretically connect to any datasource that has a JDBC driver. The types of datasources that are currently supported out-of-the-box are:
Supported Datasources
Amazon Athena
with driver:com.simba.athena.jdbc.Driver
Amazon Redshift
with driver:com.amazon.redshift.jdbc.Driver
Apache Drill
with driver:org.apache.drill.jdbc.Driver
Apache Hive
with driver:org.apache.hive.jdbc.HiveDriver
Apache Spark
with driver:org.apache.hive.jdbc.HiveDriver
Azure Blob Storage
with JSON connection templateAzure Datalake Storage
with driver:org.apache.drill.jdbc.Driver
Clickhouse
with driver:com.clickhouse.jdbc.ClickHouseDriver
Databricks
with driver:com.simba.spark.jdbc.Driver
Google BigQuery
with driver:com.simba.googlebigquery.jdbc42.Driver
Google Cloud Storage
with JSON connection templateMicrosoft SQL Server
with driver:com.microsoft.sqlserver.jdbc.SQLServerDriver
MySQL
with driver:com.mysql.jdbc.Driver
Oracle
with driver:oracle.jdbc.driver.OracleDriver
Oracle 11
with driver:oracle.jdbc.driver.OracleDriver
PostgreSQL
with driver:org.postgresql.Driver
Presto
with driver:com.facebook.presto.jdbc.PrestoDriver
S3
with driver:org.apache.drill.jdbc.Driver
SAP Hana
with driver:com.sap.db.jdbc.Driver
Snowflake
with driver:net.snowflake.client.jdbc.SnowflakeDriver
Trino
with driver:io.trino.jdbc.TrinoDriver
Vertica Analytics Platform
with driver:com.vertica.jdbc.Driver
Create a datasource
Required information when querying to create a new datasource (the curly
brackets {}
should not be an input, they are used only as a
variable substitution):
- {db_name} - The name of the datasource as it will be presented in Timbr
- {target_url} - The JDBC URL of the datasource, usually including a protocol, hostname, port and sometimes database/catalog
- {datasource_type} - The type of the datasource, as presented in the list of supported datasources above
- {user} - The username credentials for the datasource
- {password} - The password credentials for the datasource
To create a datasource you can run the command:
CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(URL '{target_url}', DRIVER '{datasource_driver}', TYPE '{datasource_type}', USER '{user}', PASSWORD '{password}') WITH VIRTUALIZATION
Optional keys are:
- DESCRIPTION '{description}' - Provide a string with a description for the datasource
- WITH VIRTUALIZATION - Works only in the cases where you have the Timbr Virtualization Service or if the datasource type is Apache Spark or Databricks. This setting indicates whether the datasource serves as a virtualization engine. If you are using the Timbr Virtualization Service then your datasource type is Apache Spark
Create a datasource based on a JSON string
Some datasources like S3, Azure Datalake Storage, Google Cloud Storage, and Azure Blob Storage require a JSON connection configuration template.
Required information when querying to create a new datasource (the curly brackets {}
should not be an input, they are used only as a variable substitution):
- {db_name} - The name of the datasource as it will be presented in Timbr
- {json} - A stringify version of the template JSON used for the connection
- {datasource_type} - The type of the datasource, as presented in the list of supported datasources above
- {json_secret} - The password credentials for the datasource
If you want to create such datasource requiring a JSON string, you can do it programmatically by running
CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(JSON '{json}', TYPE '{datasource_type}', PASSWORD '{json_secret}') WITH VIRTUALIZATION
Optional keys are:
- DESCRIPTION '{description}' - Provide a string with a description for the datasource
- WITH VIRTUALIZATION - Works only in the cases where you have the Timbr Virtualization Service or if the datasource type is Apache Spark or Databricks. This setting indicates whether the datasource serves as a virtualization engine. If you are using the Timbr Virtualization Service then your datasource type is Apache Spark
Duplicating a Datasource
Required information for duplicating a Datasource using the export command (the curly brackets
{}
should not be an input, they are used only as a variable
substitution):
{datasource_name} - The name of the datasource being duplicated.
{new_datasource_name} - The new name of the datasource being duplicated.
EXPORT DATASOURCE `{datasource_name}` TO `{new_datasource_name}`
Export Datasource between different environments
Exporting a datasource to a remote/external environment
Required information for exporting a datasource to an external instance (the curly brackets {}
should not be an input, they are used only as a variable substitution):
- {datasource_name} - The name of the datasource you want to export as presented in the current Timbr instance.
- {overwrite} - Whether or not it should overwrite an existing
datasource if one with the same name already exists. In order to
overwrite you should write
OVERWRITE
, otherwise you can leave it empty - {timbr_url} - The timbr_url in the Timbr environment.
- {remote_user_token} - The token value found in the Timbr environment.
EXPORT Datasource {datasource_name} {overwrite} TO EXTERNAL URL '{timbr_url}' USER 'token' PASSWORD '{remote_user_token}'
Show all datasources
Shows all the datasources associated with an ontology
SHOW DATASOURCES
Show SQL create datasources statement
Required information for show create datasource (the curly brackets {}
should not be
an input, they are used only as a variable substitution):
- {datasource_name} - The name of the datasource you want to show a create statement for.
SHOW CREATE DATASOURCE `{datasource_name}`
Change the description of a datasource
Required information when changing the description of a datasource (the curly brackets {}
should not be an input, they are used only as a variable substitution):
- {datasource} - The name of the datasource as it will be presented in Timbr
- {description} - The description given to the datasource
When connected to a current ontology that has the datasource set as active, you can run the command
ALTER DATASOURCE `{datasource}` SET DESCRIPTION = '{description}'
Setting the virtualization of a datasource on / off
If a datasource was created with a WITH VIRTUALIZATION option, you can set it ON
or OFF
Required information when setting the virtualization of a datasource (the curly brackets {}
should not be an input, they are used only as a variable substitution):
- {datasource} - The name of the datasource as it's presented in Timbr
- {value} -
1
will set it ON and0
will set it OFF
ALTER DATASOURCE `{datasource}` SET VIRTUALIZATION = {value}
Removing a datasource
Removes a datasource from Timbr.
In order to remove a datasource from timbr you need to make sure it is not associated/connected to any ontology. If the datasource is associated/connected to an ontology, you should first remove it from that ontology and then remove it from Timbr.
Required information for removing a datasource (the curly brackets {}
should not be an input, they are used only as a variable
substitution):
- {datasource} - The name of the datasource as it's presented in Timbr
DROP DATASOURCE `{datasource}`