18 docs tagged with "Connector"

About Data sources

Data sources are the systems where your raw data is stored. These range from traditional databases like PostgreSQL and MySQL to cloud warehouses such as Snowflake and object storage systems like Amazon S3. The data sources module lets you connect to a wide variety of systems, including REST APIs, MongoDB, and other platforms. Connecting requires only minimal credentials, and the structure of your data is discovered automatically as soon as you connect. Once connected, you can perform several key actions to get the most out of your data.

About pipelines

A data pipeline is a structured workflow that moves and transforms data from various data sources into your central warehouse or into another data source. A pipeline can be compared to a conveyor belt that picks up data, performs specific actions on it, and then deposits it in its designated location.
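The conveyor-belt comparison above can be sketched in a few lines of Python. This is only an illustration of the pick-up, act, deposit idea and not DataGOL's implementation; the names `run_pipeline`, `extract`, `transform`, and the in-memory `source_rows` and `destination` lists are all hypothetical.

```python
from typing import Callable, Iterable

Row = dict


def run_pipeline(
    extract: Callable[[], Iterable[Row]],
    transform: Callable[[Row], Row],
    load: Callable[[Row], None],
) -> int:
    """Move rows from a source to a destination, transforming each one on the way."""
    moved = 0
    for row in extract():       # pick up data from the source
        load(transform(row))    # act on it, then deposit it at the destination
        moved += 1
    return moved


# Hypothetical in-memory stand-ins for a real data source and warehouse table.
source_rows = [{"id": 1, "name": " Ada "}, {"id": 2, "name": "Grace"}]
destination: list = []


def extract() -> Iterable[Row]:
    return iter(source_rows)


def transform(row: Row) -> Row:
    # Example action: trim stray whitespace from the name column.
    return {**row, "name": row["name"].strip()}


moved = run_pipeline(extract, transform, destination.append)
```

Keeping the three steps as separate callables means the same loop could, in principle, serve any source and destination pair.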

Adding members to an orchestration

You can add users to an orchestration. If they do not already have access to the associated pipelines, you can grant them permissions. Only Lakehouse members can be added, and they receive creator permissions by default.

Agent Ops

Agent Ops is a feature that lets company administrators and super admins monitor, manage, and analyze the usage and cost associated with their company's large language model (LLM) providers (e.g., OpenAI, Anthropic, Gemini).

Creating orchestrations

Orchestration in DataGOL allows you to define and manage the execution order of multiple pipelines. This is crucial when you have dependencies between pipelines, where one pipeline needs to complete successfully before another can begin. Without orchestration, scheduling pipelines to run independently might lead to failures or unexpected results if a dependent pipeline runs before its required data is ready.
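To make the dependency problem described above concrete, here is a minimal sketch of dependency-ordered execution using Python's standard-library `graphlib`. It does not show how DataGOL schedules pipelines internally; the function `run_orchestration`, the pipeline names, and the `"completed"`, `"failed"`, and `"skipped"` labels are assumptions chosen for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_orchestration(
    dependencies: Dict[str, Set[str]],
    run_pipeline: Callable[[str], bool],
) -> Dict[str, str]:
    """Run pipelines so that each one starts only after its upstream pipelines succeed.

    `dependencies` maps a pipeline name to the set of pipelines it depends on.
    `run_pipeline` is a hypothetical callback that returns True on success.
    """
    status: Dict[str, str] = {}
    # static_order() yields every pipeline after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, set())
        if any(status[dep] != "completed" for dep in upstream):
            # Required data is not ready, so do not run the dependent pipeline.
            status[name] = "skipped"
            continue
        status[name] = "completed" if run_pipeline(name) else "failed"
    return status


# Hypothetical example: the report pipeline needs both load pipelines to finish first.
dependencies = {
    "load_orders": set(),
    "load_customers": set(),
    "build_sales_report": {"load_orders", "load_customers"},
}

# Simulate a failure in one upstream pipeline.
status = run_orchestration(dependencies, lambda name: name != "load_customers")
```

In this simulated run, `build_sales_report` is skipped rather than executed against incomplete data, which is the situation that independent scheduling cannot guard against.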

Deleting an orchestration

To delete an orchestration, do the following:

Designating a partition column

To designate a partition column, do the following:

Editing an orchestration

To edit an orchestration, do the following:

Generating column metadata catalog with AI

To generate the column metadata catalog using AI, do the following:

Jobs

Navigate to Jobs: Under the Lakehouse section, go to Jobs.

Managing Data sources

- Data sources can be sorted by creation date, displaying either the latest or oldest entries.

Managing orchestrations

You can manage and monitor orchestrations from the Orchestrations page. The main Orchestrations page displays a list of all created orchestrations.

Running an orchestration

1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Orchestration. The list of created orchestrations is displayed.

Schema change detection

DataGOL's Schema Change Detection feature monitors the source tables involved in your pipelines for any modifications to their structure (e.g., new columns, deleted columns, data type changes). This helps you stay informed about potential impacts on your data pipelines and warehouse. When a schema change is detected, it is recorded, indicating the table, the modified columns, and the nature of the change (e.g., data type, size).
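The kind of comparison described above can be approximated by diffing two snapshots of a table's columns. The sketch below is an assumption about the general technique, not DataGOL's actual detection logic; `detect_schema_changes`, the change labels, and the sample column types are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (column name, kind of change, old type, new type)
Change = Tuple[str, str, Optional[str], Optional[str]]


def detect_schema_changes(previous: Dict[str, str], current: Dict[str, str]) -> List[Change]:
    """Compare two {column: type} snapshots and list what changed between them."""
    changes: List[Change] = []
    for column in sorted(previous.keys() | current.keys()):
        if column not in previous:
            changes.append((column, "added", None, current[column]))
        elif column not in current:
            changes.append((column, "deleted", previous[column], None))
        elif previous[column] != current[column]:
            # Covers both data type changes and size changes such as VARCHAR length.
            changes.append((column, "type_changed", previous[column], current[column]))
    return changes


# Hypothetical snapshots of one source table taken before and after a change.
old_schema = {"id": "INTEGER", "email": "VARCHAR(100)", "age": "INTEGER"}
new_schema = {"id": "BIGINT", "email": "VARCHAR(255)", "signup_date": "DATE"}

changes = detect_schema_changes(old_schema, new_schema)
for column, kind, old_type, new_type in changes:
    print(f"{column}: {kind} ({old_type} -> {new_type})")
```

Each recorded change carries the column, the nature of the change, and the old and new types, mirroring the details the feature is described as recording.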

Updating the orchestration settings

1. On the DataGOL Home page, from the left navigation panel, click Lakehouse > Orchestration.

Viewing pipeline details of an orchestration

From the Pipelines tab of an orchestration, you can view the details of each pipeline in that orchestration. Click a pipeline link to see its full details, including its data source and destination. You can also see the pipeline's creator and its designated stage order, along with its status and job stage details. To explore the data lineage, scroll right and click the more options icon.

Viewing the orchestration graph

The Graphs tab within an orchestration provides a visual representation of the pipeline execution order, illustrating the sequence of stages. To view the orchestration graph, do the following:

Viewing the orchestration run details

The Runs tab provides a comprehensive view of each orchestration execution. Here, you can readily see details such as the run's name, the user who initiated it, the start and end times, the total duration, and the current status (e.g., completed, failed). To investigate data lineage, click the more options icon. Additionally, clicking the View Snapshot link offers a detailed, point-in-time overview of that specific run.