
3 docs tagged with "Pipelines"


About orchestration

Orchestration in DataGOL allows you to define and manage the execution order of multiple pipelines. This is crucial when you have dependencies between pipelines, where one pipeline needs to complete successfully before another can begin. Without orchestration, scheduling pipelines to run independently might lead to failures or unexpected results if a dependent pipeline runs before its required data is ready.
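The ordering guarantee described above can be sketched with a dependency graph: each pipeline lists the pipelines that must finish before it starts, and the orchestrator derives a safe execution order. The pipeline names below are hypothetical and this is not DataGOL's actual API, just a minimal illustration of the concept using Python's standard-library topological sorter.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key runs only after all of
# its listed prerequisite pipelines have completed successfully.
dependencies = {
    "transform_orders": {"ingest_orders"},
    "build_report": {"transform_orders", "ingest_customers"},
}

def execution_order(deps):
    """Return a pipeline execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
# Every pipeline appears after all of its prerequisites, so a scheduler
# walking this list never starts a pipeline before its inputs are ready.
assert order.index("ingest_orders") < order.index("transform_orders")
assert order.index("transform_orders") < order.index("build_report")
```

Without this ordering (the "scheduling pipelines independently" case), `build_report` could run before `transform_orders` has produced its output, which is exactly the failure mode orchestration prevents.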

Lakehouse workflow

1. Connecting to data sources: The journey begins with establishing connections to various data sources. Think of these as the starting points where your raw data lives: relational databases such as SQL Server or PostgreSQL, cloud data warehouses such as Redshift, or file storage systems such as S3 or Azure Blob Storage. The crucial function here is to enable the Lakehouse to access the data that needs to be processed and analyzed.
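A connection definition for this step typically pairs a source name with its type and location. The sketch below is purely illustrative: the `SourceConnection` class, field names, and URIs are assumptions for the example, not DataGOL's actual configuration schema.

```python
from dataclasses import dataclass

# Hypothetical connection descriptors; fields are illustrative only.
@dataclass(frozen=True)
class SourceConnection:
    name: str   # label used to reference the source in pipelines
    kind: str   # e.g. "postgres", "redshift", "s3"
    uri: str    # where the raw data lives

sources = [
    SourceConnection("sales_db", "postgres", "postgresql://analytics-host:5432/sales"),
    SourceConnection("raw_files", "s3", "s3://example-bucket/raw/"),
]

def find_source(name: str) -> SourceConnection:
    """Look up a registered source by name, as a pipeline step might."""
    return next(s for s in sources if s.name == name)
```

Registering both database and object-store sources under one shape like this is what lets later Lakehouse steps treat heterogeneous inputs uniformly.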