Datasource

This section covers operating datasources in Timbr. A datasource serves as the execution engine for all queries run on an ontology. Timbr connects to a datasource through a JDBC driver, so it can in principle connect to any datasource that provides one. The following datasource types are supported out of the box:

Supported Datasources

  • Amazon Athena with driver: com.simba.athena.jdbc.Driver
  • Amazon Redshift with driver: com.amazon.redshift.jdbc.Driver
  • Apache Drill with driver: org.apache.drill.jdbc.Driver
  • Apache Hive with driver: org.apache.hive.jdbc.HiveDriver
  • Apache Spark with driver: org.apache.hive.jdbc.HiveDriver
  • Azure Blob Storage with JSON connection template
  • Azure Datalake Storage with driver: org.apache.drill.jdbc.Driver
  • Clickhouse with driver: com.clickhouse.jdbc.ClickHouseDriver
  • Databricks with driver: com.simba.spark.jdbc.Driver
  • Google BigQuery with driver: com.simba.googlebigquery.jdbc42.Driver
  • Google Cloud Storage with JSON connection template
  • Microsoft SQL Server with driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
  • MySQL with driver: com.mysql.jdbc.Driver
  • Oracle with driver: oracle.jdbc.driver.OracleDriver
  • Oracle 11 with driver: oracle.jdbc.driver.OracleDriver
  • PostgreSQL with driver: org.postgresql.Driver
  • Presto with driver: com.facebook.presto.jdbc.PrestoDriver
  • S3 with driver: org.apache.drill.jdbc.Driver
  • SAP Hana with driver: com.sap.db.jdbc.Driver
  • Snowflake with driver: net.snowflake.client.jdbc.SnowflakeDriver
  • Trino with driver: io.trino.jdbc.TrinoDriver
  • Vertica Analytics Platform with driver: com.vertica.jdbc.Driver

Create a datasource

Required information for creating a new datasource (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {db_name} - The name of the datasource as it will be presented in Timbr
  • {target_url} - The JDBC URL of the datasource, usually including a protocol, hostname, and port, and sometimes a database/catalog
  • {datasource_driver} - The JDBC driver class of the datasource, as listed next to its type in the supported datasources above
  • {datasource_type} - The type of the datasource, as presented in the list of supported datasources above
  • {user} - The username credentials for the datasource
  • {password} - The password credentials for the datasource

To create a datasource, run the following command:

CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(URL '{target_url}', DRIVER '{datasource_driver}', TYPE '{datasource_type}', USER '{user}', PASSWORD '{password}') WITH VIRTUALIZATION

Optional keys are:

  • DESCRIPTION '{description}' - Provide a string with a description for the datasource
  • WITH VIRTUALIZATION - Indicates that the datasource serves as a virtualization engine. This applies only when you have the Timbr Virtualization Service, or when the datasource type is Apache Spark or Databricks. If you are using the Timbr Virtualization Service, your datasource type is Apache Spark.
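
For example, a hypothetical PostgreSQL datasource named my_postgres could be created as follows. All values here are illustrative placeholders, and the TYPE string is assumed to match the datasource type name from the supported list above:

CREATE DATASOURCE `my_postgres` DESCRIPTION 'Sales reporting database' OPTIONS(URL 'jdbc:postgresql://db.example.com:5432/sales', DRIVER 'org.postgresql.Driver', TYPE 'PostgreSQL', USER 'timbr_user', PASSWORD 'my_password')

The optional WITH VIRTUALIZATION clause is omitted here because, per the note above, it applies only to the Timbr Virtualization Service, Apache Spark, or Databricks.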

Create a datasource based on a JSON string

Some datasources like S3, Azure Datalake Storage, Google Cloud Storage, and Azure Blob Storage require a JSON connection configuration template.

Required information for creating a new JSON-based datasource (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {db_name} - The name of the datasource as it will be presented in Timbr
  • {json} - A stringified version of the JSON connection template used for the connection
  • {datasource_type} - The type of the datasource, as presented in the list of supported datasources above
  • {json_secret} - The password credentials for the datasource

To create a datasource that requires a JSON string, run the following command:

CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(JSON '{json}', TYPE '{datasource_type}', PASSWORD '{json_secret}') WITH VIRTUALIZATION

Optional keys are:

  • DESCRIPTION '{description}' - Provide a string with a description for the datasource
  • WITH VIRTUALIZATION - Indicates that the datasource serves as a virtualization engine. This applies only when you have the Timbr Virtualization Service, or when the datasource type is Apache Spark or Databricks. If you are using the Timbr Virtualization Service, your datasource type is Apache Spark.
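
For illustration only, a hypothetical Azure Blob Storage datasource might be created as shown below. The JSON value and its field names are invented placeholders, not the actual connection template; the real fields must follow the JSON connection template for the specific datasource type:

CREATE DATASOURCE `my_blob_storage` DESCRIPTION 'Raw data files' OPTIONS(JSON '{"account": "myaccount", "container": "mycontainer"}', TYPE 'Azure Blob Storage', PASSWORD 'my_account_key')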

Duplicating a datasource

Required information for duplicating a datasource using the export command (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {datasource_name} - The name of the datasource being duplicated.
  • {new_datasource_name} - The name for the new copy of the datasource.

EXPORT DATASOURCE `{datasource_name}` TO `{new_datasource_name}`
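
For example, to duplicate a hypothetical datasource named my_postgres into a copy named my_postgres_copy:

EXPORT DATASOURCE `my_postgres` TO `my_postgres_copy`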

Export a datasource between different environments

Exporting a datasource to a remote/external environment

Required information for exporting a datasource to an external instance (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {datasource_name} - The name of the datasource you want to export, as presented in the current Timbr instance.
  • {overwrite} - Whether to overwrite a datasource if one with the same name already exists in the target environment. To overwrite, write OVERWRITE; otherwise leave it empty.
  • {timbr_url} - The URL of the target (remote) Timbr environment.
  • {remote_user_token} - The user token value from the target (remote) Timbr environment.

EXPORT DATASOURCE `{datasource_name}` {overwrite} TO EXTERNAL URL '{timbr_url}' USER 'token' PASSWORD '{remote_user_token}'
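
For example, to export a hypothetical datasource named my_postgres to a remote environment, overwriting any existing datasource with the same name (the URL and token values below are placeholders):

EXPORT DATASOURCE `my_postgres` OVERWRITE TO EXTERNAL URL 'https://remote-timbr.example.com' USER 'token' PASSWORD 'remote_user_token_value'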

Show all datasources

Shows all the datasources associated with an ontology.

SHOW DATASOURCES

Show the SQL create datasource statement

Required information for showing a create datasource statement (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {datasource_name} - The name of the datasource you want to show a create statement for.

SHOW CREATE DATASOURCE `{datasource_name}`

Change the description of a datasource

Required information for changing the description of a datasource (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {datasource} - The name of the datasource as it is presented in Timbr
  • {description} - The new description to assign to the datasource

When connected to an ontology that has the datasource set as active, you can run the command:

ALTER DATASOURCE `{datasource}` SET DESCRIPTION = '{description}'
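
For example, to change the description of a hypothetical datasource named my_postgres:

ALTER DATASOURCE `my_postgres` SET DESCRIPTION = 'Sales reporting database (production)'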

Setting the virtualization of a datasource on / off

If a datasource was created with the WITH VIRTUALIZATION option, you can turn its virtualization ON or OFF.

Required information for setting the virtualization of a datasource (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {datasource} - The name of the datasource as it is presented in Timbr
  • {value} - 1 sets virtualization ON; 0 sets it OFF

ALTER DATASOURCE `{datasource}` SET VIRTUALIZATION = {value}
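
For example, to turn virtualization ON for a hypothetical datasource named my_spark:

ALTER DATASOURCE `my_spark` SET VIRTUALIZATION = 1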

Removing a datasource

Removes a datasource from Timbr.

Important

To remove a datasource from Timbr, make sure it is not associated with or connected to any ontology. If the datasource is associated with an ontology, first remove it from that ontology and then remove it from Timbr.

Required information for removing a datasource (the curly brackets {} are not part of the input; they only mark placeholders for substitution):

  • {datasource} - The name of the datasource as it is presented in Timbr

DROP DATASOURCE `{datasource}`
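
For example, to remove a hypothetical datasource named my_postgres after it has been disconnected from any ontology:

DROP DATASOURCE `my_postgres`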