Datasource
This section describes how to operate datasources in Timbr. Datasources serve as the execution engine for all queries run on an ontology. Timbr connects to a datasource through a JDBC driver and can, in principle, connect to any datasource that provides one. The following datasource types are currently supported out-of-the-box:
Supported Datasources
- Amazon Athena with driver: com.simba.athena.jdbc.Driver
- Amazon Redshift with driver: com.amazon.redshift.jdbc.Driver
- Apache Drill with driver: org.apache.drill.jdbc.Driver
- Apache Hive with driver: org.apache.hive.jdbc.HiveDriver
- Apache Spark with driver: org.apache.hive.jdbc.HiveDriver
- Azure Blob Storage with JSON connection template
- Azure Datalake Storage with driver: org.apache.drill.jdbc.Driver
- Clickhouse with driver: com.clickhouse.jdbc.ClickHouseDriver
- Databricks with driver: com.simba.spark.jdbc.Driver
- Google BigQuery with driver: com.simba.googlebigquery.jdbc42.Driver
- Google Cloud Storage with JSON connection template
- NetSuite with driver: com.netsuite.jdbc.openaccess.OpenAccessDriver
- Microsoft SQL Server with driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
- MySQL with driver: com.mysql.jdbc.Driver
- Oracle with driver: oracle.jdbc.driver.OracleDriver
- Oracle 11 with driver: oracle.jdbc.driver.OracleDriver
- PostgreSQL with driver: org.postgresql.Driver
- Presto with driver: com.facebook.presto.jdbc.PrestoDriver
- S3 with driver: org.apache.drill.jdbc.Driver
- Salesforce with driver: com.salesforce.datacloud.jdbc.DataCloudJDBCDriver
- SAP Hana with driver: com.sap.db.jdbc.Driver
- Snowflake with driver: net.snowflake.client.jdbc.SnowflakeDriver
- Trino with driver: io.trino.jdbc.TrinoDriver
- Vertica Analytics Platform with driver: com.vertica.jdbc.Driver
Create a datasource
Required information when creating a new datasource (the curly brackets {} are not part of the input; they only mark variable substitution):
- {db_name} - The name of the datasource as it will be presented in Timbr
- {target_url} - The JDBC URL of the datasource, usually including a protocol, hostname, port and sometimes a database/catalog
- {datasource_type} - The type of the datasource, as presented in the list of supported datasources above
- {datasource_driver} - The JDBC driver class of the datasource, as presented in the list of supported datasources above
- {user} - The username used to connect to the datasource
- {password} - The password used to connect to the datasource
To create a datasource, run the following command (a complete example appears after the list of optional keys below):
CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(URL '{target_url}', DRIVER '{datasource_driver}', TYPE '{datasource_type}', USER '{user}', PASSWORD '{password}') WITH VIRTUALIZATION
Optional keys are:
- DESCRIPTION '{description}' - Provide a string with a description for the datasource
- WITH VIRTUALIZATION - Applies only when you have the Timbr Virtualization Service or when the datasource type is Apache Spark or Databricks. This option indicates that the datasource serves as a virtualization engine. If you are using the Timbr Virtualization Service, your datasource type is Apache Spark
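For illustration, here is how a hypothetical PostgreSQL datasource might be created. The name, URL, user and password are placeholder values, the DRIVER value is taken from the supported list above, and the TYPE string is assumed to match the datasource name in that list. WITH VIRTUALIZATION is omitted here because PostgreSQL is neither Apache Spark nor Databricks:
CREATE DATASOURCE `my_postgres` DESCRIPTION 'Sales reporting database' OPTIONS(URL 'jdbc:postgresql://db.example.com:5432/sales', DRIVER 'org.postgresql.Driver', TYPE 'PostgreSQL', USER 'timbr_reader', PASSWORD 'my_password')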
Create a datasource based on a JSON string
Some datasources like S3, Azure Datalake Storage, Google Cloud Storage, and Azure Blob Storage require a JSON connection configuration template.
Required information when creating such a datasource (the curly brackets {} are not part of the input; they only mark variable substitution):
- {db_name} - The name of the datasource as it will be presented in Timbr
- {json} - A stringified version of the JSON template used for the connection
- {datasource_type} - The type of the datasource, as presented in the list of supported datasources above
- {json_secret} - The secret/password used to connect to the datasource
To create such a datasource programmatically, run:
CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(JSON '{json}', TYPE '{datasource_type}', PASSWORD '{json_secret}') WITH VIRTUALIZATION
Optional keys are:
- DESCRIPTION '{description}' - Provide a string with a description for the datasource
- WITH VIRTUALIZATION - Applies only when you have the Timbr Virtualization Service or when the datasource type is Apache Spark or Databricks. This option indicates that the datasource serves as a virtualization engine. If you are using the Timbr Virtualization Service, your datasource type is Apache Spark
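As a sketch only, a hypothetical S3 datasource might be created like this. The name and secret are placeholders, and the JSON body is schematic, since the actual template structure depends on the datasource type:
CREATE DATASOURCE `my_s3` DESCRIPTION 'Data lake files on S3' OPTIONS(JSON '{...}', TYPE 'S3', PASSWORD 'my_json_secret')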
Duplicating a Datasource
Required information for duplicating a datasource using the export command (the curly brackets {} are not part of the input; they only mark variable substitution):
- {datasource_name} - The name of the datasource being duplicated.
- {new_datasource_name} - The new name for the duplicated datasource.
EXPORT DATASOURCE `{datasource_name}` TO `{new_datasource_name}`
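For example, a hypothetical datasource named my_postgres could be duplicated under the name my_postgres_copy:
EXPORT DATASOURCE `my_postgres` TO `my_postgres_copy`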
Export Datasource between different environments
Exporting a datasource to a remote/external environment
Required information for exporting a datasource to an external instance (the curly brackets {} are not part of the input; they only mark variable substitution):
- {datasource_name} - The name of the datasource you want to export as presented in the current Timbr instance.
- {overwrite} - Whether or not to overwrite an existing datasource if one with the same name already exists. To overwrite, write OVERWRITE; otherwise leave it empty.
- {timbr_url} - The URL of the remote Timbr environment.
- {remote_user_token} - The token value found in the remote Timbr environment.
EXPORT DATASOURCE `{datasource_name}` {overwrite} TO EXTERNAL URL '{timbr_url}' USER 'token' PASSWORD '{remote_user_token}'
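For example, a hypothetical datasource named my_postgres could be exported to a remote Timbr instance, overwriting any datasource of the same name there (the URL and token are placeholder values):
EXPORT DATASOURCE `my_postgres` OVERWRITE TO EXTERNAL URL 'https://my-remote-timbr.example.com' USER 'token' PASSWORD 'remote_token_value'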
Show all datasources
Shows all the datasources associated with an ontology
SHOW DATASOURCES
Show SQL create datasource statement
Required information for showing a datasource's create statement (the curly brackets {} are not part of the input; they only mark variable substitution):
- {datasource_name} - The name of the datasource you want to show a create statement for.
SHOW CREATE DATASOURCE `{datasource_name}`
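For example, to show the create statement of a hypothetical datasource named my_postgres:
SHOW CREATE DATASOURCE `my_postgres`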
Change the description of a datasource
Required information when changing the description of a datasource (the curly brackets {} are not part of the input; they only mark variable substitution):
- {datasource} - The name of the datasource as it's presented in Timbr
- {description} - The new description to give the datasource
When connected to an ontology that has the datasource set as active, you can run the command:
ALTER DATASOURCE `{datasource}` SET DESCRIPTION = '{description}'
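For example, to update the description of a hypothetical datasource named my_postgres:
ALTER DATASOURCE `my_postgres` SET DESCRIPTION = 'Sales reporting database'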
Setting the virtualization of a datasource on / off
If a datasource was created with the WITH VIRTUALIZATION option, you can set virtualization ON or OFF.
Required information when setting the virtualization of a datasource (the curly brackets {} are not part of the input; they only mark variable substitution):
- {datasource} - The name of the datasource as it's presented in Timbr
- {value} - 1 will set it ON and 0 will set it OFF
ALTER DATASOURCE `{datasource}` SET VIRTUALIZATION = {value}
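For example, to turn virtualization off for a hypothetical datasource named my_spark:
ALTER DATASOURCE `my_spark` SET VIRTUALIZATION = 0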
Removing a datasource
Removes a datasource from Timbr.
Before removing a datasource from Timbr, make sure it is not associated with or connected to any ontology. If it is, first remove it from that ontology and then remove it from Timbr.
Required information for removing a datasource (the curly brackets {} are not part of the input; they only mark variable substitution):
- {datasource} - The name of the datasource as it's presented in Timbr
DROP DATASOURCE `{datasource}`
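For example, to remove a hypothetical datasource named my_postgres after it has been detached from any ontology:
DROP DATASOURCE `my_postgres`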