Using the Standard Data Ingestion API
To push data to the Celonis Platform using the Standard Data Ingestion API, you'll need to either create a data pool or use an existing data pool. For more information about data pools, see: Creating and managing data pools.
With access to a data pool:
- From your data pool diagram, click Data Connections.
- Click Add Data Connection and then select Push data into Celonis.
- Define a connection name and the target table names you want to push data into. If you are using multiple tables, add all of them to the connection configuration. In the same step, define the data structure to use. There are two types:
- Flat: Select this option if the data you're pushing is in a flat format. If you proceed with the Flat type, you also need to define:
  - Primary key: This is used to properly handle delta loads and is similar to defining a primary key in a regular extraction. The primary key must be part of the pushed data every time, or the delivery will fail.
  - Age column: This column is used to guarantee that the most recent version of a record (identified by its primary key) is stored in Celonis. Usually this is a date column. If an incoming record has an age value smaller than the existing value, the update won't be applied; records with an equal or larger value are applied.
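The age-column rule described above can be illustrated with a small sketch. This is not how Celonis implements delta loads; it only simulates the stated rule on a stream of comma-separated rows, assuming a hypothetical layout of primary key, age value, payload, and assuming ISO 8601 timestamps so that plain string comparison matches chronological order. The function name apply_delta is made up for this example.

```shell
# Simulate the age-column rule: for each primary key, keep an incoming row
# only if its age value is equal to or larger than the one already stored.
# Assumed column layout: primary key, age column, payload.
apply_delta() {
  awk -F, '
    !($1 in age) || $2 >= age[$1] { age[$1] = $2; row[$1] = $0 }
    END { for (k in row) print row[k] }
  ' | sort
}

# Rows arrive in this order; the third row is older than the stored
# version of order 1001 and is therefore ignored.
printf '%s\n' \
  '1001,2024-03-01T10:00:00,open' \
  '1001,2024-03-02T09:00:00,shipped' \
  '1001,2024-03-01T12:00:00,stale' \
  '1002,2024-03-01T08:00:00,open' | apply_delta
```

The sketch prints one row per primary key: the newest version of order 1001 and the single row for order 1002.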
For example, if you are pushing data into one target table called order, which has order_id as its primary key and a created_at timestamp recording when each record was created, the final configuration uses order as the target table, order_id as the primary key, and created_at as the age column.
- Nested: Select this option when the data you want to push is in a nested format (e.g. JSON). In this case, you need to configure the table schema so that the nested tables can be constructed and the appropriate primary keys can be retrieved. To do this, click Options > Configure Table Schema.
To define the schema, we suggest copying one example record from your data and pasting it into the window on the left side.
In this example, we are using an order with some information such as ID, date, and price info, and one nested table for shipments. Once the example record is pasted into the left side of the window, the target table structure with the corresponding column names is automatically derived on the right side of the window.
If no primary key has been configured, a column called celonisid is added by default (indicated by a tooltip). However, in this example we define:
- The order_id to be the primary key by selecting the relevant checkbox. Once selected, two things will happen automatically:
  - The autogenerated celonisid will disappear from the configuration.
  - The order_id gets created as a foreign key on the child table so users can join these two tables later on.
- The order_date column as the age column.
As a result, the configuration will look like this:
If you click Finish, the Primary Key and Age column defined as part of the schema configuration will automatically be applied.
- Save the configuration and note down the Access Key and Access Secret. These are the credentials that the API request needs to authenticate to Celonis when pushing data.
- Go to your console and create an AWS profile via the following command:
aws configure --profile profileName
You will need to provide:
  - The Access Key and Access Secret retrieved in the previous step.
  - The AWS region. This depends on the cluster your Celonis tenant is running on. For example, for eu-1, this would be eu-central-1.
  - The default data format, which is always parquet.
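If you prefer to set up the profile without the interactive prompts, for example in a script, a minimal sketch could use the AWS CLI's aws configure set subcommand. The function name configure_celonis_profile, the AWS_CLI override variable, and the CELONIS_ACCESS_KEY and CELONIS_ACCESS_SECRET environment variables are assumptions made for this example. The sketch sets only the credentials and the region; it does not cover the data format prompt.

```shell
# Allow the CLI binary to be overridden (e.g. for a dry run); defaults to aws.
AWS_CLI="${AWS_CLI:-aws}"

# Write access key, secret, and region into the named AWS profile.
# Arguments: profile name, access key, access secret, AWS region.
configure_celonis_profile() {
  cfg_profile="$1"; cfg_key="$2"; cfg_secret="$3"; cfg_region="$4"
  "$AWS_CLI" configure set aws_access_key_id "$cfg_key" --profile "$cfg_profile" &&
  "$AWS_CLI" configure set aws_secret_access_key "$cfg_secret" --profile "$cfg_profile" &&
  "$AWS_CLI" configure set region "$cfg_region" --profile "$cfg_profile"
}

# Example call (hypothetical environment variables holding the credentials):
# configure_celonis_profile profileName "$CELONIS_ACCESS_KEY" "$CELONIS_ACCESS_SECRET" eu-central-1
```

Reading the credentials from environment variables rather than typing them inline keeps them out of your shell history.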
- To push data, you can call the AWS S3 API via the copy (cp) command and provide the file(s) you want to push.
A sample API call for pushing a single file called shipment.parquet into a target table called Orders would look like this:
aws s3 cp shipment.parquet --endpoint-url https://dev.eu-1.celonis.cloud/api/data-ingestion s3://continuous/connection/9c92424b-829d-4beb-a65c-411873b268f8/Orders/ --profile profileName
This command comprises:
- aws s3 cp: The standard AWS API call to copy objects/files to an S3 bucket.
- shipment.parquet: The name of the parquet file that gets copied to the bucket.
- --endpoint-url https://dev.eu-1.celonis.cloud/api/data-ingestion: The S3 bucket is behind a Celonis-specific URL that relates it to the Celonis team and cluster. In this case the team name is dev and it is on the eu-1 cluster. More generically, the option looks like this:
--endpoint-url https://teamName.cluster.celonis.cloud/api/data-ingestion
- s3://continuous/connection/9c92424b-829d-4beb-a65c-411873b268f8/Orders/: The S3 bucket that data gets pushed to, where:
  - The 32-digit ID is the connection ID, which can be found in the URL of the Data Connection you created earlier.
  - Orders is the target table into which the data should be pushed. If you configured multiple tables for the connection, you need a separate API call for each target table.
- --profile profileName: The AWS profile you configured with aws configure.
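To show how the parts listed above fit together, the following sketch assembles the command from its components and prints it instead of running it. The function name build_push_command and its argument order are assumptions made for this example; the URL and bucket patterns are the ones described above.

```shell
# Print the aws s3 cp command for a given team, cluster, connection ID,
# target table, file, and AWS profile. Nothing is uploaded.
build_push_command() {
  bp_team="$1"; bp_cluster="$2"; bp_connection="$3"
  bp_table="$4"; bp_file="$5"; bp_profile="$6"
  bp_endpoint="https://${bp_team}.${bp_cluster}.celonis.cloud/api/data-ingestion"
  bp_target="s3://continuous/connection/${bp_connection}/${bp_table}/"
  printf 'aws s3 cp %s --endpoint-url %s %s --profile %s\n' \
    "$bp_file" "$bp_endpoint" "$bp_target" "$bp_profile"
}

# Reproduces the sample call for team dev on cluster eu-1.
build_push_command dev eu-1 9c92424b-829d-4beb-a65c-411873b268f8 \
  Orders shipment.parquet profileName
```

Printing the command first makes it easy to check the team name, cluster, and target table before any data is sent.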
- After executing the API call, the AWS API returns a response indicating whether the upload was successful. If it was, the response for the sample file looks like this:
upload: ./shipment.parquet to s3://continuous/connection/9c92424b-829d-4beb-a65c-411873b268f8/Orders/shipment.parquet
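In a script, you can also rely on the exit status of the AWS CLI, which is non-zero when a copy fails. The sketch below pushes several files to one target table and stops at the first failure. The function name push_files, its argument order, and the AWS_CLI override variable are assumptions made for this example.

```shell
# Allow the CLI binary to be overridden (e.g. for a dry run); defaults to aws.
AWS_CLI="${AWS_CLI:-aws}"

# Push one or more parquet files into a single target table.
# Arguments: endpoint URL, connection ID, target table, AWS profile, files...
push_files() {
  pf_endpoint="$1"; pf_connection="$2"; pf_table="$3"; pf_profile="$4"
  shift 4
  for pf_file in "$@"; do
    if "$AWS_CLI" s3 cp "$pf_file" --endpoint-url "$pf_endpoint" \
         "s3://continuous/connection/${pf_connection}/${pf_table}/" \
         --profile "$pf_profile"; then
      echo "pushed: $pf_file"
    else
      # Stop at the first failed upload and report it on stderr.
      echo "failed: $pf_file" >&2
      return 1
    fi
  done
}

# Example call (file names are placeholders):
# push_files https://dev.eu-1.celonis.cloud/api/data-ingestion \
#   9c92424b-829d-4beb-a65c-411873b268f8 Orders profileName \
#   shipment_1.parquet shipment_2.parquet
```

To push into several target tables, call the function once per table, since each table has its own bucket path.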