About data sources
From the Data Source section you can connect to a wide range of data platforms. This module supports integration with popular services such as MySQL, MS SQL Server, DB2, and Postgres databases through a user-friendly authentication and authorization workflow.
About pipelines
A data pipeline is a structured workflow that moves and transforms data from various data sources into your central warehouse or into another data source. A pipeline can be compared to a conveyor belt that picks up data, performs specific actions on it, and then deposits it in its designated location.
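The conveyor-belt idea can be illustrated with a minimal Python sketch. This is not DataGOL's implementation or API; `run_pipeline`, the transform functions, and the sample rows are hypothetical names chosen only to show data being picked up, acted on, and deposited.

```python
def strip_name(row):
    # Hypothetical transform: trim whitespace around the "name" field.
    return {**row, "name": row["name"].strip()}


def parse_amount(row):
    # Hypothetical transform: convert the "amount" field from text to int.
    return {**row, "amount": int(row["amount"])}


def run_pipeline(source_rows, transforms, destination):
    """Pick up each row, apply every transform in order, deposit the result."""
    loaded = 0
    for row in source_rows:
        for transform in transforms:
            row = transform(row)
        destination.append(row)
        loaded += 1
    return loaded


# Illustrative data only: a list stands in for the source and the warehouse.
source = [{"name": " Ada ", "amount": "10"}, {"name": "Grace", "amount": "32"}]
warehouse = []
loaded = run_pipeline(source, [strip_name, parse_amount], warehouse)
```

After the run, `warehouse` holds the cleaned rows and `loaded` is the number of rows deposited.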
Adding data sources
Only an Account admin or Lakehouse admin can add data sources.
Adding members to an orchestration
You can add users to an orchestration. If they do not already have access to the associated pipelines, you can grant them permissions. Only Lakehouse members can be added, and they receive creator permissions by default.
Creating orchestrations
Orchestration in DataGOL allows you to define and manage the execution order of multiple pipelines. This is crucial when you have dependencies between pipelines, where one pipeline needs to complete successfully before another can begin. Without orchestration, scheduling pipelines to run independently might lead to failures or unexpected results if a dependent pipeline runs before its required data is ready.
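As a rough illustration of dependency-ordered execution, the following Python sketch groups pipelines into stages so that a pipeline is scheduled only after everything it depends on has finished. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and is an assumption about the general technique, not a description of how DataGOL schedules orchestrations. The pipeline names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_stages(dependencies):
    """Group pipelines into stages; a stage starts only after earlier stages finish.

    `dependencies` maps each pipeline name to the set of pipelines it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # pipelines whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)  # mark the stage complete to unlock dependents
    return stages


# Hypothetical pipelines: the report needs both loads to complete first.
dependencies = {
    "load_orders": set(),
    "load_customers": set(),
    "build_sales_report": {"load_orders", "load_customers"},
}
stages = execution_stages(dependencies)
```

Here the two load pipelines land in the first stage and the report pipeline in the second, which is the ordering that independent schedules cannot guarantee.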
Deleting an orchestration
To delete an orchestration, do the following:
Designating a partition column
To designate a partition column, do the following:
Editing an orchestration
To edit an orchestration, do the following:
Generating column metadata catalog with AI
To generate the column metadata catalog using AI, do the following:
Jobs
Navigate to Jobs: Under the Lakehouse section, go to Jobs.
Managing data sources
- Data sources can be sorted by creation date, displaying either the latest or oldest entries.
Managing orchestrations
You can manage and monitor orchestrations from the Orchestrations page. The main Orchestrations page displays a list of all created orchestrations.
Running an orchestration
1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Orchestration. The list of created orchestrations is displayed.
Schema change detection
DataGOL's Schema Change Detection feature monitors the source tables involved in your pipelines for any modifications to their structure (e.g., new columns, deleted columns, data type changes). This helps you stay informed about potential impacts on your data pipelines and warehouse. When a schema change is detected, it is recorded, indicating the table, the modified columns, and the nature of the change (e.g., data type, size).
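The kind of comparison involved can be sketched in Python by diffing two snapshots of a table's columns. This is a simplified illustration under stated assumptions, not DataGOL's detection logic: the function name, the change labels, and the snapshot format (column name mapped to a type string) are all hypothetical.

```python
def detect_schema_changes(old, new):
    """Compare two {column: type} snapshots and list what changed.

    Returns tuples of (column, change, old_type, new_type).
    """
    changes = []
    for column in sorted(old.keys() | new.keys()):
        if column not in old:
            changes.append((column, "added", None, new[column]))
        elif column not in new:
            changes.append((column, "deleted", old[column], None))
        elif old[column] != new[column]:
            # Covers data type changes and size changes such as VARCHAR(50) -> VARCHAR(100).
            changes.append((column, "type_changed", old[column], new[column]))
    return changes


# Illustrative snapshots of one source table before and after a change.
previous = {"id": "INT", "name": "VARCHAR(50)", "legacy_code": "CHAR(4)"}
current = {"id": "BIGINT", "name": "VARCHAR(50)", "email": "VARCHAR(255)"}
changes = detect_schema_changes(previous, current)
```

Each entry names the column and the nature of the change, which mirrors the table, modified columns, and change type recorded for a detected schema change.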
Updating the orchestration settings
1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Orchestration.
Viewing pipeline details of an orchestration
From the Pipelines tab of an orchestration, you can view the details of each pipeline in that orchestration. Click a pipeline link to see its details, including its data source and destination. The tab also shows the pipeline's creator, its stage order, its status, and its job stage details. To explore the data lineage, scroll right and click the more options icon.
Viewing the orchestration graph
The Graphs tab within an orchestration provides a visual representation of the pipeline execution order, illustrating the sequence of stages. To view the orchestration graph, do the following:
Viewing the orchestration run details
The Runs tab lists each orchestration execution. For each run, you can see the run's name, the user who initiated it, the start and end times, the total duration, and the current status (e.g., completed, failed). To investigate data lineage, click the more options icon. Click the View Snapshot link for a detailed, point-in-time overview of that specific run.