Skip to main content

Getting Started

Our goal

In this guide, our goal will be to stream information about flights from an imaginary airport to a file. The flight records contain just the airline name and the scheduled departure time.

Install Conduit

If you're using a macOS or Linux system, you can install Conduit with the following command:

$ curl https://conduit.io/install.sh | bash

If you're not using macOS or Linux system, you can still install Conduit following one of the different options provided in our installation page.

note

The Conduit binary contains both, the Conduit service and the Conduit CLI, with which you can interact with Conduit.

Initialize Conduit

Firs, let's initialize the working environment:

$ conduit init

Created directory: processors
Created directory: connectors
Created directory: pipelines
Configuration file written to conduit.yaml

Conduit has been initialized!

To quickly create an example pipeline, run 'conduit pipelines init'.
To see how you can customize your first pipeline, run 'conduit pipelines init --help'.

conduit init creates the directories where you can put your pipeline configuration files, connector binaries, and processor binaries. There's also a conduit.yaml that contains all the configuration parameters that Conduit supports.

In this guide, we'll only use the pipelines directory, since we won't need to install any additional connector nor to change Conduit's default configuration.

Build a pipeline

Next, we can use the Conduit CLI to build the example pipeline:

$ conduit pipelines init

conduit pipelines init builds an example that generates flight information from an imaginary airport every second. Use conduit pipelines init --help to learn how to customize the pipeline.

If the pipelines directory, you'll notice a new file, pipeline-generator-to-file.yaml that contains our pipeline's configuration:

version: "2.2"
pipelines:
- id: example-pipeline
status: running
name: "generator-to-file"
connectors:
- id: example-source
type: source
plugin: "generator"
settings:
# Generate field 'airline' of type string
# Type: string
# Optional
format.options.airline: 'string'
# Generate field 'scheduledDeparture' of type 'time'
# Type: string
# Optional
format.options.scheduledDeparture: 'time'
# The format of the generated payload data (raw, structured, file).
# Type: string
# Optional
format.type: 'structured'
# The maximum rate in records per second, at which records are
# generated (0 means no rate limit).
# Type: float
# Optional
rate: '1'
- id: example-destination
type: destination
plugin: "file"
settings:
# Path is the file path used by the connector to read/write records.
# Type: string
# Optional
path: './destination.txt'

The configuration above tells us some basic information about the pipeline (ID and name) and that we want Conduit to start the pipeline automatically ( status: running).

Then we see a source connector, that uses the generator plugin, which is a built-in plugin that can generate random data. The source connector's settings translate into: generate structured data, 1 record per second. Each generated record should contain an airline field (type: string) and a scheduledDeparture field (type: duration).

What follows is a destination connector where the data will be written to. It uses the file plugin, which is a built-in plugin that writes all the incoming data to a file. It has only one configuration parameter, which is the path to the file where the records will be written.

Run Conduit

With the pipeline configuration being ready, we can run Conduit:

$ conduit --pipelines.path pipelines/

Conduit is now running the pipeline. Let's check the contents of the destination.txt using:

tail -f destination.txt | jq

Every second, you should a JSON object like this:

{
"position": "MjU=",
"operation": "create",
"metadata": {
"conduit.source.connector.id": "example-pipeline:example-source",
"opencdc.createdAt": "1730801194148460912",
"opencdc.payload.schema.subject": "example-pipeline:example-source:payload",
"opencdc.payload.schema.version": "1"
},
"key": "cHJlY2VwdG9yYWw=",
"payload": {
"before": null,
"after": {
"airline": "wheelmaker",
"scheduledDeparture": "2024-11-05T10:06:34.148469Z"
}
}
}

The JSON object you see is the OpenCDC record that holds the data being streamed as well as other data and metadata. In the .payload.after field you will see the user data that was generated by the generator connector:

{
"airline": "wheelmaker",
"scheduledDeparture": "2024-11-05T10:06:34.148469Z"
}

The pipeline will keep streaming the data from the generator source connector to the file destination connector as long as Conduit is running. To stop Conduit, press Ctrl + C (on a Linux OS, or the equivalent on other operating systems). This will trigger a graceful shutdown that stops reads from source connectors and waits for records that are still in the pipeline to be acknowledged. The next time Conduit starts, it will start reading data from where it stopped.

What's next?

Now that you've got the basics of running Conduit and creating a pipeline covered, here are a few places to dive in deeper:

Or, if you want to experiment a bit more, check out the examples in our GitHub repository.

scarf pixel conduit-site-docs-getting-started