Getting Started
Our goal
In this guide, our goal will be to stream information about flights from an imaginary airport to a file. The flight records contain just the airline name and the scheduled departure time.
Install Conduit
If you're using a macOS or Linux system, you can install Conduit with the following command:
$ curl https://conduit.io/install.sh | bash
If you're not using macOS or Linux system, you can still install Conduit following one of the different options provided in our installation page.
The Conduit binary contains both, the Conduit service and the Conduit CLI, with which you can interact with Conduit.
Initialize Conduit
Firs, let's initialize the working environment:
$ conduit init
Created directory: processors
Created directory: connectors
Created directory: pipelines
Configuration file written to conduit.yaml
Conduit has been initialized!
To quickly create an example pipeline, run 'conduit pipelines init'.
To see how you can customize your first pipeline, run 'conduit pipelines init --help'.
conduit init
creates the directories where you can put your pipeline
configuration files, connector binaries, and processor binaries. There's also a
conduit.yaml
that contains all the configuration parameters that Conduit
supports.
In this guide, we'll only use the pipelines
directory, since we won't need to
install any additional connector nor to change Conduit's default configuration.
Build a pipeline
Next, we can use the Conduit CLI to build the example pipeline:
$ conduit pipelines init
conduit pipelines init
builds an example that generates flight information
from an imaginary airport every second. Use conduit pipelines init --help
to
learn how to customize the pipeline.
If the pipelines
directory, you'll notice a new file,
pipeline-generator-to-file.yaml
that contains our pipeline's configuration:
version: "2.2"
pipelines:
- id: example-pipeline
status: running
name: "generator-to-file"
connectors:
- id: example-source
type: source
plugin: "generator"
settings:
# Generate field 'airline' of type string
# Type: string
# Optional
format.options.airline: 'string'
# Generate field 'scheduledDeparture' of type 'time'
# Type: string
# Optional
format.options.scheduledDeparture: 'time'
# The format of the generated payload data (raw, structured, file).
# Type: string
# Optional
format.type: 'structured'
# The maximum rate in records per second, at which records are
# generated (0 means no rate limit).
# Type: float
# Optional
rate: '1'
- id: example-destination
type: destination
plugin: "file"
settings:
# Path is the file path used by the connector to read/write records.
# Type: string
# Optional
path: './destination.txt'
The configuration above tells us some basic information about the pipeline (ID
and name) and that we want Conduit to start the pipeline automatically (
status: running
).
Then we see a source connector, that uses the
generator
plugin,
which is a built-in plugin that can generate random data. The source connector's
settings translate into: generate structured data, 1 record per second. Each
generated record should contain an airline
field (type: string) and a
scheduledDeparture
field (type: duration).
What follows is a destination connector where the data will be written to. It
uses the file
plugin, which is a built-in plugin that writes all the incoming
data to a file. It has only one configuration parameter, which is the path to
the file where the records will be written.
Run Conduit
With the pipeline configuration being ready, we can run Conduit:
$ conduit --pipelines.path pipelines/
Conduit is now running the pipeline. Let's check the contents of the destination.txt
using:
tail -f destination.txt | jq
Every second, you should a JSON object like this:
{
"position": "MjU=",
"operation": "create",
"metadata": {
"conduit.source.connector.id": "example-pipeline:example-source",
"opencdc.createdAt": "1730801194148460912",
"opencdc.payload.schema.subject": "example-pipeline:example-source:payload",
"opencdc.payload.schema.version": "1"
},
"key": "cHJlY2VwdG9yYWw=",
"payload": {
"before": null,
"after": {
"airline": "wheelmaker",
"scheduledDeparture": "2024-11-05T10:06:34.148469Z"
}
}
}
The JSON object you see is the OpenCDC record that
holds the data being streamed as well as other data and metadata. In the
.payload.after
field you will see the user data that was generated by the
generator
connector:
{
"airline": "wheelmaker",
"scheduledDeparture": "2024-11-05T10:06:34.148469Z"
}
The pipeline will keep streaming the data from the generator source connector to
the file destination connector as long as Conduit is running. To stop Conduit,
press Ctrl + C
(on a Linux OS, or the equivalent on other operating systems). This will
trigger a graceful shutdown that stops reads from source connectors and waits
for records that are still in the pipeline to be acknowledged. The next time
Conduit starts, it will start reading data from where it stopped.
What's next?
Now that you've got the basics of running Conduit and creating a pipeline covered, here are a few places to dive in deeper:
Or, if you want to experiment a bit more, check out the examples in our GitHub repository.