Datasources
The Datasources component manages the datasources associated with knowledge graphs. Users can add, edit, or remove a datasource and define its settings. A user can also tag datasources, set their description, and, for supported datasources, enable virtualization and toggle it on or off. Timbr supports granting granular access to individual datasources based on users or roles.
If the datasource is Apache Spark or Databricks, it can also be used as a virtualization engine to run queries across two different datasources.
Supported Datasources:
Timbr supports full back-end integration with any relational database that is SQL / ANSI SQL compliant or can be queried in SQL. The connection can be established through either a JDBC or an ODBC connector.
The Timbr platform currently supports the following back-ends:
- Amazon Athena
- Amazon Redshift
- Amazon S3
- Snowflake
- Databricks
- SAP HANA
- Google BigQuery
- Google Cloud Storage
- MySQL
- Apache Spark
- Apache Drill
- Apache Hive
- Azure Blob Storage
- Azure Data Lake Storage
- Oracle
- Impala
- Microsoft SQL Server
- PostgreSQL
- Trino
- Presto
- Vertica Analytics Platform
- IBM DB2
More can be added upon request.
Manage Datasources
All the available datasources can be found in Manage Datasources, which can be accessed through the Manage tab by clicking on Datasources.
On the right of the screen, a search bar lets you filter the datasources, and a view option lets you display them either as boxes or as a list.
Adding a new Datasource
At the top right is Test connection, which opens a window with a list of datasources that can be selected in order to test their connection. Next to Test connection is Add New Datasource, which opens a pop-up for adding a new datasource to the environment.
Another way to add a new datasource is to click the large Add button beneath the knowledge graphs and datasources tabs.
When either Add New Datasource or the Add button is clicked, a pop-up window appears in which all the relevant connection details must be provided in order to connect the datasource to the environment.
Each back-end datasource has slightly different requirements in order to connect to it, but the following information is required for the vast majority of the back-end datasources:
- The hostname for accessing the back-end datasource server
- The port for accessing the back-end datasource server
- The username for the back-end datasource
- The password for the back-end datasource
Some back-end datasources, such as PostgreSQL, Athena, and SAP HANA, also require a database name in order to create a connection.
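As a rough illustration of the requirements above, the following sketch models the connection details and reports which required fields are still missing. The names `ConnectionDetails`, `missing_fields`, and `NEEDS_DATABASE` are hypothetical and are not part of Timbr; the set of back-ends that need a database name contains only the three examples named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Back-ends the text names as needing a database name (illustrative, not exhaustive).
NEEDS_DATABASE = {"postgresql", "athena", "sap hana"}


@dataclass
class ConnectionDetails:
    """Hypothetical container for the fields entered in the pop-up window."""
    backend: str
    hostname: str
    port: int
    username: str
    password: str
    database: Optional[str] = None


def missing_fields(details: ConnectionDetails) -> List[str]:
    """Return the names of required fields that are empty or invalid."""
    missing = []
    if not details.hostname:
        missing.append("hostname")
    if not 0 < details.port < 65536:
        missing.append("port")
    if not details.username:
        missing.append("username")
    if not details.password:
        missing.append("password")
    # Only some back-ends need a database name to create a connection.
    if details.backend.lower() in NEEDS_DATABASE and not details.database:
        missing.append("database")
    return missing
```

For example, `missing_fields(ConnectionDetails("PostgreSQL", "db.example.com", 5432, "user", "secret"))` reports that only the database name is missing, while the same details for a MySQL datasource would pass this check.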
Other back-end datasources, such as Google BigQuery, require different information in order to create a connection:
- The project ID as specified in the Google BigQuery project
- The email address with access to the Google BigQuery project specified in the project ID field
- A private key file in JSON format, as provided by Google BigQuery for authentication
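Before entering the BigQuery details, it can help to confirm that the three values agree with each other. The sketch below assumes the private key file is a standard Google Cloud service-account key, which stores the project and email under the `project_id` and `client_email` entries; the function name `check_bigquery_key` is hypothetical and this is not how Timbr itself validates the fields.

```python
import json
from typing import List


def check_bigquery_key(key_path: str, project_id: str, email: str) -> List[str]:
    """Compare the values to be entered with the contents of the JSON key file."""
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)

    problems = []
    # Assumption: the file is a service-account key, which carries these entries.
    if key.get("type") != "service_account":
        problems.append("key file is not a service-account key")
    if key.get("project_id") != project_id:
        problems.append("project id does not match the key file")
    if key.get("client_email") != email:
        problems.append("email does not match the key file")
    if not key.get("private_key"):
        problems.append("key file has no private_key entry")
    return problems
```

An empty list means the project ID and email match the key file; any mismatch is returned as a short message.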
If the back-end datasource supports them, optional additional parameters (Additional Params) can be included in the connection between the Knowledge Graph and the back-end datasource.
On the top right, the toggle can be switched from Simple to JDBC Url in order to connect the datasource using the relevant JDBC URL.
Once the details are entered, either with a username and password or via the JDBC URL, click Save to save the information and connect the datasource to the environment.
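For readers unfamiliar with the JDBC URL option, the sketch below shows how the same hostname, port, and database name map onto a JDBC URL for two back-ends whose drivers use a `host:port/database` layout (PostgreSQL and MySQL). The helper `build_jdbc_url` is hypothetical, other back-ends use different URL formats, and treating additional parameters as URL query parameters is an assumption about the driver rather than a description of Timbr's Additional Params field.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# URL prefixes for two drivers that share the host:port/database layout.
JDBC_PREFIXES = {
    "postgresql": "jdbc:postgresql://",
    "mysql": "jdbc:mysql://",
}


def build_jdbc_url(backend: str, hostname: str, port: int, database: str,
                   additional_params: Optional[Dict[str, str]] = None) -> str:
    """Assemble a JDBC URL from the simple connection fields."""
    prefix = JDBC_PREFIXES[backend.lower()]  # raises KeyError for other back-ends
    url = f"{prefix}{hostname}:{port}/{database}"
    if additional_params:
        # Driver-specific options are appended as a query string.
        url += "?" + urlencode(additional_params)
    return url


print(build_jdbc_url("PostgreSQL", "db.example.com", 5432, "sales", {"ssl": "true"}))
# jdbc:postgresql://db.example.com:5432/sales?ssl=true
```

The username and password are deliberately left out of the URL here; whether they belong in the URL or in separate fields depends on the driver.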
Timbr allows virtualization to be enabled for Apache Spark and Databricks datasources. When one of these datasources is selected, an Active Virtualization checkbox appears, indicating whether virtualization is on or off for that datasource. This option can also be switched on or off at a later time.
Editing an existing Datasource
Each datasource, in both box and list view, has three horizontal dots that, when clicked, offer the following options for the selected datasource:
Edit - Opens a window to edit the selected datasource and its different configurations.
Set description - Lets you add a description to the datasource to further explain what it represents.
Test connection - Executes a quick test to verify the datasource’s connectivity, checking if it is active and responsive.
Access Permissions - Opens a window to edit the Access permissions for the selected datasource.
Manage Tags - Opens a pop-up window to manage the existing tags given to the datasource, as well as to add new tags.
Set virtualization - Allows you to enable or disable virtualization for the selected datasource via a pop-up window. (This option is available only for datasources configured with virtualization capabilities as discussed below).
Delete - Deletes the selected datasource.
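When the Test connection option listed above reports a failure, a quick check from outside Timbr can help tell a network problem apart from a credentials problem. The sketch below only verifies that a TCP connection to the datasource's hostname and port can be opened; it is not Timbr's own test, it does not authenticate, and the function name `is_reachable` is hypothetical.

```python
import socket


def is_reachable(hostname: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to hostname:port opens within the timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and hostname resolution failures.
        return False
```

If `is_reachable` returns False for the hostname and port entered for the datasource, the server is likely down or blocked by a firewall; if it returns True, the problem is more likely in the credentials or database name.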
Timbr offers virtualization with Apache Spark and Databricks. In addition to the name given to the datasource, a small icon appears on every Apache Spark or Databricks datasource to signify that it is capable of virtualization. When virtualization is on, the icon is green; when it is off, the icon is black.
To turn virtualization on or off, click the three horizontal dots. There you will find Set virtualization, in addition to the six options that every datasource has: Edit, Set description, Test connection, Access Permissions, Manage Tags, and Delete.
Clicking Set virtualization opens a window containing the With virtualization toggle. To turn virtualization on, switch the toggle on and click Set virtualization at the bottom right; the icon will show green. To turn virtualization off, switch the toggle off and click Set virtualization at the bottom right; the icon will show black.