Skip to content

Kafka Integration

Ontul supports Apache Kafka as both a streaming source and sink, enabling real-time data pipelines within the same cluster used for batch processing and interactive SQL.

Streaming Source

Ontul consumes messages from Kafka topics with support for multiple formats:

  • JSON: Parse JSON messages with automatic schema inference or a provided schema
  • Avro: Deserialize Avro messages using Schema Registry integration
  • Raw: Access raw key, value, topic, partition, offset, and timestamp fields

Each consumer group runs as an independent long-lived streaming job on a Worker node.

Streaming Sink

Processed data can be written back to Kafka topics:

  • Arrow rows are serialized to JSON and produced to the target topic
  • Optional key field for partitioning
  • Batched writes with LZ4 compression
  • Kafka producer properties are fully configurable

Use Cases

Real-Time Ingestion

Consume events from Kafka and write directly into Iceberg tables for near-real-time analytics:

Kafka Topic → Ontul Streaming Job → Iceberg Table

Stream Processing

Read from one Kafka topic, transform data with SQL or SDK operations, and write to another:

Kafka Input Topic → Ontul Processing → Kafka Output Topic

Job Management

Kafka streaming jobs are managed through the Admin UI or REST API:

  • Start and stop consumer groups
  • Monitor ingestion progress and throughput
  • View real-time job logs
  • Configure consumer properties (group ID, offset reset, poll timeout, batch size)