About Lakehouse
Lakehouse provides a unified platform for storing diverse data (structured, semi-structured, unstructured) and performing advanced analytics. Key capabilities include:
A data pipeline is a structured workflow that moves and transforms data from various data sources into your central warehouse or into another data source. A pipeline can be compared to a conveyor belt that picks up data, performs specific actions on it, and then deposits it in its designated location.
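To make the conveyor-belt analogy concrete, here is a minimal Python sketch of the idea. It is not DataGOL's implementation or API. The function `run_pipeline`, the row format, and the two transformation steps are hypothetical names chosen for illustration. Each row is picked up from a source, passed through an ordered list of steps (any step may drop the row), and deposited in a destination.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]
# A step transforms one row, or returns None to drop it (hypothetical convention).
Step = Callable[[Row], Optional[Row]]


def run_pipeline(source: Iterable[Row], steps: List[Step], destination: List[Row]) -> int:
    """Pick up each row, apply every step in order, and deposit the survivors.

    Returns the number of rows written to the destination.
    """
    written = 0
    for row in source:
        current: Optional[Row] = dict(row)  # copy so the source is not mutated
        for step in steps:
            current = step(current)
            if current is None:  # a step filtered the row out
                break
        if current is not None:
            destination.append(current)
            written += 1
    return written


# Hypothetical transformation steps, for illustration only.
def drop_missing_amount(row: Row) -> Optional[Row]:
    return row if row.get("amount") is not None else None


def amount_to_cents(row: Row) -> Row:
    row["amount_cents"] = round(row.pop("amount") * 100)
    return row


if __name__ == "__main__":
    source = [
        {"id": 1, "amount": 12.5},
        {"id": 2, "amount": None},
        {"id": 3, "amount": 3.0},
    ]
    warehouse: List[Row] = []
    count = run_pipeline(source, [drop_missing_amount, amount_to_cents], warehouse)
    print(count, warehouse)
    # prints: 2 [{'id': 1, 'amount_cents': 1250}, {'id': 3, 'amount_cents': 300}]
```

In a real pipeline the source and destination would be connectors to external systems rather than in-memory lists; the lists here only keep the sketch self-contained.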
You can add users to an orchestration. If they do not already have access to the associated pipelines, they can be granted the required permissions. Only Lakehouse members can be added, and they receive creator permissions by default.
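The membership rules above can be modeled as a small Python sketch. This is a hypothetical model, not DataGOL's permission system: the `Orchestration` class, the `add_user` function, and the way pipeline access is represented are all assumptions. The documentation does not say whether missing pipeline permissions are granted automatically, so the sketch only reports which pipelines the new user still needs access to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Per the documentation, users added to an orchestration get creator permissions by default.
DEFAULT_ROLE = "creator"


@dataclass
class Orchestration:
    name: str
    pipelines: List[str]
    members: Dict[str, str] = field(default_factory=dict)  # user -> role


def add_user(
    orch: Orchestration,
    user: str,
    lakehouse_members: Set[str],
    pipeline_access: Dict[str, Set[str]],
) -> List[str]:
    """Add a user to the orchestration (hypothetical model).

    Raises PermissionError if the user is not a Lakehouse member.
    Returns the orchestration's pipelines the user cannot yet access,
    i.e. the ones that would need a permission grant.
    """
    if user not in lakehouse_members:
        raise PermissionError(f"{user} is not a Lakehouse member")
    orch.members[user] = DEFAULT_ROLE
    already_accessible = pipeline_access.get(user, set())
    return [p for p in orch.pipelines if p not in already_accessible]


if __name__ == "__main__":
    orch = Orchestration("nightly", ["ingest_orders", "build_sales_mart"])
    missing = add_user(orch, "priya", {"priya", "sam"}, {"priya": {"ingest_orders"}})
    print(orch.members, missing)  # {'priya': 'creator'} ['build_sales_mart']
```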
1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Pipelines.
Orchestration in DataGOL allows you to define and manage the execution order of multiple pipelines. This is crucial when you have dependencies between pipelines, where one pipeline needs to complete successfully before another can begin. Without orchestration, scheduling pipelines to run independently might lead to failures or unexpected results if a dependent pipeline runs before its required data is ready.
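The dependency problem that orchestration solves can be illustrated with a topological sort. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group pipelines into stages, where every pipeline in a stage depends only on pipelines in earlier stages, and then runs them so that a pipeline starts only if all of its prerequisites completed. This is a conceptual illustration under stated assumptions, not how DataGOL schedules runs: the pipeline names, the `run_pipeline` callback, and the "skipped" status are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def plan_stages(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group pipelines into stages; each stage depends only on earlier stages.

    `dependencies` maps a pipeline to the set of pipelines it waits for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every pipeline whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


def run_orchestration(
    dependencies: Dict[str, Set[str]], run_pipeline: Callable[[str], bool]
) -> Dict[str, str]:
    """Run pipelines stage by stage; skip any pipeline whose prerequisites did not complete."""
    status: Dict[str, str] = {}
    for stage in plan_stages(dependencies):
        for name in stage:
            prereqs = dependencies.get(name, set())
            if all(status[p] == "completed" for p in prereqs):
                status[name] = "completed" if run_pipeline(name) else "failed"
            else:
                status[name] = "skipped"  # hypothetical status for illustration
    return status


if __name__ == "__main__":
    deps = {
        "build_sales_mart": {"ingest_orders", "ingest_customers"},
        "refresh_dashboard": {"build_sales_mart"},
    }
    print(plan_stages(deps))
    # [['ingest_customers', 'ingest_orders'], ['build_sales_mart'], ['refresh_dashboard']]
    print(run_orchestration(deps, lambda name: name != "ingest_customers"))
```

The second call simulates `ingest_customers` failing: `build_sales_mart` and `refresh_dashboard` are then never started, which is exactly the situation that independent schedules cannot guard against.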
1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Pipelines.
To delete an orchestration, do the following:
To edit an orchestration, do the following:
Navigate to Jobs: Under the Lakehouse section, go to Jobs.
You can manage and monitor orchestrations from the Orchestrations page, which displays a list of all created orchestrations.
You can manage and monitor the pipelines from the Pipelines page.
Click a pipeline's link in the Pipelines list to open its detailed view.
While creating a pipeline, you can choose any of the following sync mode options:
1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Orchestration. The list of created orchestrations is displayed.
1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Orchestration.
From the Pipelines tab of an orchestration, you can view the details of each pipeline in the orchestration. Click a pipeline link to open its full details, including its data source and destination, its creator, and its designated stage order. The pipeline's status and job stage details are also shown. To explore the data lineage, scroll right and click the more options icon.
The Graphs tab within an orchestration provides a visual representation of the execution order of its pipelines, illustrating the sequence of stages. To view the orchestration graph, do the following:
The Runs tab provides a comprehensive view of each orchestration execution. Here, you can readily see details such as the run's name, the user who initiated it, the start and end times, the total duration, and the current status (e.g., completed, failed). To investigate data lineage, click the more options icon. Additionally, clicking the View Snapshot link offers a detailed, point-in-time overview of that specific run.
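The run details listed above can be pictured as a simple record. The Python sketch below is a hypothetical model of one row on the Runs tab, not a DataGOL data structure or export format: the field names are assumptions that mirror the columns described, and it shows that the total duration is simply the end time minus the start time (undefined while a run has no end time yet).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class OrchestrationRun:
    """Hypothetical model of one row on the Runs tab."""

    name: str
    initiated_by: str
    start: datetime
    end: Optional[datetime]  # None while the run is still in progress
    status: str  # e.g. "completed" or "failed"

    @property
    def duration(self) -> Optional[timedelta]:
        """Total duration: end time minus start time, or None if the run has not ended."""
        if self.end is None:
            return None
        return self.end - self.start


if __name__ == "__main__":
    run = OrchestrationRun(
        name="nightly-2024-05-01",
        initiated_by="priya",
        start=datetime(2024, 5, 1, 2, 0, 0),
        end=datetime(2024, 5, 1, 2, 17, 30),
        status="completed",
    )
    print(run.duration)  # 0:17:30
```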