Iceberg Integration

NeorunBase integrates with Apache Iceberg and automatically synchronizes transactional data to an open lakehouse table format. Downstream analytics engines such as Apache Spark, Trino, and Apache Hive can then query NeorunBase data directly.

Automatic Data Synchronization

NeorunBase automatically syncs table data to Iceberg tables in the background:

  • Initial sync: A full snapshot of the table is exported as Parquet files to S3-compatible object storage and registered in the Iceberg catalog.
  • Incremental sync: After the initial sync, only changes (inserts, updates, and deletes) are synchronized, which keeps synchronization overhead low.
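
The two phases can be pictured with a small conceptual model: a full snapshot followed by a stream of row-level changes applied on top of it. The sketch below is illustrative only and is not NeorunBase's implementation; the `Change` type, the `apply_changes` function, and the use of an integer primary key are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """A hypothetical row-level change captured after the initial snapshot."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents; unused for deletes


def apply_changes(snapshot: Dict[int, dict], changes: Iterable[Change]) -> Dict[int, dict]:
    """Return the table state after applying changes to a full snapshot.

    The input snapshot is not modified; a shallow copy is updated instead.
    """
    table = dict(snapshot)
    for change in changes:
        if change.op in ("insert", "update"):
            if change.row is None:
                raise ValueError(f"{change.op} for key {change.key} has no row data")
            table[change.key] = change.row
        elif change.op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            table.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return table
```

Only the changed rows travel through `apply_changes`, which is the reason incremental sync is cheaper than re-exporting the whole table.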

Iceberg Catalog Support

NeorunBase connects to any Iceberg REST catalog (e.g., Polaris, Nessie) with support for:

  • OAuth2 client credentials authentication
  • Static bearer token authentication
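
To make the difference between the two authentication modes concrete, the sketch below builds the HTTP pieces a client would send in each case. It follows the standard OAuth2 client credentials grant (form-encoded `grant_type=client_credentials`); the function names and the default `scope` value are assumptions for illustration, and the token endpoint URL depends on the catalog deployment. This is not NeorunBase's configuration syntax.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def bearer_headers(token: str) -> Dict[str, str]:
    """Headers for a catalog request authenticated with a static bearer token."""
    return {"Authorization": f"Bearer {token}"}


def client_credentials_request(
    client_id: str,
    client_secret: str,
    scope: str = "catalog",  # assumed default; the required scope varies by catalog
) -> Tuple[Dict[str, str], str]:
    """Build headers and a form-encoded body for an OAuth2 token request.

    The caller POSTs these to the token endpoint. The access token in the
    response is then used exactly like a static token via bearer_headers().
    """
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    )
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return headers, body
```

With a static token the client skips the token request entirely; with client credentials it exchanges the ID and secret for a short-lived token first.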

Open Lakehouse Analytics

Once data is synced to Iceberg, it can be queried by any engine that supports the Iceberg table format:

  • Apache Spark: Batch and streaming analytics
  • Trino: Interactive SQL queries
  • Apache Hive: Data warehousing workloads
  • Apache Flink: Stream processing
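
As an example of the reader side, the sketch below assembles the Spark properties that point a Spark session at an Iceberg REST catalog. The property keys are the ones used by Iceberg's Spark integration; the helper name, the catalog name `lake`, and the URI are hypothetical and should be replaced with the values of your own catalog.

```python
from typing import Dict, Optional


def spark_rest_catalog_conf(
    catalog_name: str,
    uri: str,
    warehouse: Optional[str] = None,
) -> Dict[str, str]:
    """Return Spark properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }
    if warehouse is not None:
        conf[f"{prefix}.warehouse"] = warehouse
    return conf


# Hypothetical values for illustration only.
conf = spark_rest_catalog_conf("lake", "https://catalog.example.com/api/catalog")
```

Each key/value pair can be passed to `SparkSession.builder.config(key, value)` in PySpark, after which synced tables are addressable as `lake.<namespace>.<table>` in Spark SQL. The Iceberg Spark runtime JAR must be on the Spark classpath for this to work.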

External Iceberg Table Queries

NeorunBase can also read data from external Iceberg tables. You can query data stored in Iceberg (in Parquet, ORC, or Avro format) directly from NeorunBase using standard SQL, bridging the gap between transactional and analytical workloads.

S3-Compatible Storage

Iceberg data files are stored in S3-compatible object storage such as AWS S3, MinIO, or any other service that implements the S3 API.
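
Engines that read the synced files need matching storage settings. The sketch below builds the properties understood by Iceberg's S3FileIO for either AWS S3 or a self-hosted S3-compatible service. The helper name and the example endpoint are assumptions; enabling path-style access for custom endpoints is a common requirement for MinIO-style deployments, not a rule that applies to every service.

```python
from typing import Dict, Optional


def s3_file_io_properties(
    access_key_id: str,
    secret_access_key: str,
    endpoint: Optional[str] = None,
) -> Dict[str, str]:
    """Return Iceberg S3FileIO properties for AWS S3 or an S3-compatible store.

    With endpoint=None the client talks to AWS S3 using its default endpoint.
    """
    props = {
        "s3.access-key-id": access_key_id,
        "s3.secret-access-key": secret_access_key,
    }
    if endpoint is not None:
        props["s3.endpoint"] = endpoint
        # Assumption: custom endpoints are addressed as host/bucket/key
        # rather than bucket.host/key, as is typical for MinIO.
        props["s3.path-style-access"] = "true"
    return props


# Hypothetical local MinIO endpoint and credentials, for illustration only.
minio_props = s3_file_io_properties("minio-user", "minio-pass", "http://localhost:9000")
```

These properties are usually supplied as catalog properties, for example with the `spark.sql.catalog.<name>.` prefix in Spark.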