What is Chango?
Chango is a unified data lakehouse platform that solves the problems arising across your data landscape. It can be installed in online/public environments as well as in offline/disconnected (air-gapped) environments.
Chango bundles popular open source engines such as Spark, Trino, and Kafka, uses Iceberg as its lakehouse table format, and adds several Chango-specific components.
Chango Data Lakehouse Platform
In the Ingestion layer:
- Spark and Trino with Chango Query Exec are used as data integration tools.
- Kafka is used as the event streaming platform to handle streaming events.
- Chango Ingestion inserts incoming streaming events into Chango directly (see the sketch after this list).
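As a rough illustration of pushing a streaming event into Chango over REST, the sketch below uses Python's `requests`. The endpoint URL, payload shape, and auth scheme are assumptions for illustration only; the real contract comes from your Chango installation.

```python
import requests

# NOTE: endpoint URL, token, and payload shape are assumed placeholders,
# not the actual Chango Data API contract.
CHANGO_INGEST_URL = "https://chango.example.com/v1/ingest"  # hypothetical
TOKEN = "<your-chango-credential>"

event = {
    "event_time": "2024-01-01T00:00:00Z",
    "level": "INFO",
    "message": "user signed in",
}

resp = requests.post(
    CHANGO_INGEST_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=event,
    timeout=10,
)
resp.raise_for_status()  # fail loudly if ingestion was rejected
```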
In the Storage layer:
- Chango supports Apache Ozone as the default object storage, as well as external S3-compatible object storage such as AWS S3, MinIO, and OCI Object Storage.
- Iceberg is the data lakehouse table format in Chango (see the sketch after this list).
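A minimal sketch of pointing Spark at an Iceberg REST catalog backed by object storage is shown below. The Iceberg-on-Spark settings are the standard upstream ones; the catalog URI is a placeholder for what a Chango installation would provide, and the Iceberg Spark runtime jar must be on the classpath.

```python
from pyspark.sql import SparkSession

# Standard Iceberg-on-Spark configuration; the catalog URI below is a
# placeholder for the address of your Chango REST Catalog.
spark = (
    SparkSession.builder
    .appName("chango-iceberg-example")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "rest")
    .config("spark.sql.catalog.iceberg.uri",
            "https://chango.example.com/rest-catalog")  # hypothetical URI
    .getOrCreate()
)

# Create an Iceberg table in the configured catalog.
spark.sql(
    "CREATE TABLE IF NOT EXISTS iceberg.raw.events "
    "(id BIGINT, level STRING, event_time TIMESTAMP) USING iceberg"
)
```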
In the Transformation layer:
- Spark and Trino with Chango Query Exec are used to run ETL jobs (see the sketch after this list).
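To make the ETL flow concrete, here is a minimal Spark SQL transformation that appends a cleaned projection of raw events to a curated Iceberg table. Table names are illustrative, and the session is assumed to carry the catalog configuration from the storage sketch above.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg catalog settings from the storage sketch are in place.
spark = SparkSession.builder.appName("chango-etl-example").getOrCreate()

# Illustrative ETL step: normalize the log level and keep today's events.
spark.sql("""
    INSERT INTO iceberg.curated.daily_events
    SELECT id, lower(level) AS level, event_time
    FROM iceberg.raw.events
    WHERE event_time >= current_date()
""")
```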
In the Analytics layer:
- Trino is used as the query engine to explore all the data in Chango.
- BI tools such as Apache Superset connect to Trino through Chango Trino Gateway to run queries (see the sketch after this list).
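Any standard Trino client can be pointed at the Chango Trino Gateway; the sketch below uses the `trino` Python package. The gateway hostname, port, and credentials are placeholders. Superset connects the same way, with a SQLAlchemy URI of the form `trino://analyst@trino-gateway.example.com:443/iceberg` (hostname assumed).

```python
import trino

# Host, port, user, and password are placeholders for your gateway setup.
conn = trino.dbapi.connect(
    host="trino-gateway.example.com",  # hypothetical gateway address
    port=443,
    http_scheme="https",
    user="analyst",
    auth=trino.auth.BasicAuthentication("analyst", "<password>"),
    catalog="iceberg",
    schema="curated",
)

cur = conn.cursor()
cur.execute("SELECT level, count(*) FROM daily_events GROUP BY level")
for row in cur.fetchall():
    print(row)
```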
In the Management layer:
- Azkaban is used as the workflow engine. All batch jobs, such as ETL jobs, can be integrated with Azkaban.
- Chango REST Catalog is an Iceberg REST Catalog and serves as the data catalog in Chango.
- Chango supports storage security to control data access based on RBAC; Chango Authorizer is used for this.
- Chango Trino Gateway is an implementation of the Trino Gateway concept. It provides features such as authentication, authorization, smart query routing (routing to the least-exhausted Trino clusters), and Trino cluster activation/deactivation. For more details, see Chango Trino Gateway.
- Chango Spark SQL Runner exposes a REST API to which clients send Spark SQL queries for execution (see the sketch after this list).
- Chango Spark Thrift Server exposes a JDBC/Thrift endpoint to which clients send Spark SQL queries for execution.
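For the Spark SQL Runner, a client-side call might look like the sketch below. The endpoint path, request body, and response shape are assumptions for illustration; consult the Chango Spark SQL Runner documentation for the actual REST contract.

```python
import requests

# Endpoint, auth, and payload shape are assumed, not the documented API.
resp = requests.post(
    "https://chango.example.com/spark-sql-runner/v1/queries",  # hypothetical
    headers={"Authorization": "Bearer <token>"},
    json={"sql": "SELECT count(*) FROM iceberg.curated.daily_events"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # result shape depends on the actual API
```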
Chango Architecture from the Perspective of Use Cases
The picture above shows the Chango architecture from the perspective of data lakehouse use cases.
Data Exploration
Users can run Trino and Spark SQL queries, such as ETL and interactive queries, through Superset, which connects to Chango Trino Gateway and Chango Spark Thrift Server, as sketched below.
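Chango Spark Thrift Server exposes a JDBC/Thrift endpoint; assuming it is compatible with the upstream Spark Thrift Server (HiveServer2 protocol), a standard client such as PyHive can run interactive Spark SQL against it. Host, port, credentials, and table names below are placeholders.

```python
from pyhive import hive

# Host, port, and username are placeholders for your deployment.
conn = hive.connect(host="spark-thrift.example.com", port=10000, username="analyst")

cur = conn.cursor()
cur.execute("SELECT count(*) FROM curated.daily_events")
print(cur.fetchone())
```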
ETL Query Jobs with Workflow Engine
All ETL query jobs are integrated and scheduled with Azkaban. Trino and Spark SQL ETL query jobs are processed periodically by Azkaban: ETL queries are sent to Chango Query Exec through REST and are then executed by Trino through Chango Trino Gateway or by Spark through Chango Spark Thrift Server.
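One way to wire this together is an Azkaban command-type job that runs a small script like the sketch below, which submits an ETL query to Chango Query Exec over REST. The endpoint and payload shape are assumptions for illustration, not the documented Chango Query Exec API.

```python
#!/usr/bin/env python3
"""Script an Azkaban command-type job could run on a schedule.
The Chango Query Exec endpoint and payload shape are assumed placeholders."""
import sys
import requests

ETL_SQL = """
INSERT INTO iceberg.curated.daily_events
SELECT id, lower(level) AS level, event_time
FROM iceberg.raw.events
WHERE event_time >= current_date
"""

resp = requests.post(
    "https://chango.example.com/query-exec/v1/queries",  # hypothetical
    headers={"Authorization": "Bearer <token>"},
    json={"engine": "trino", "sql": ETL_SQL},
    timeout=300,
)
if not resp.ok:
    sys.exit(f"ETL query failed: {resp.status_code} {resp.text}")
```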
Realtime Analytics
- CDC data, for example PostgreSQL CDC data, is captured by Chango CDC, which sends it to Chango Streaming Ingestion (Chango Data API + Kafka + Chango Spark Streaming) through REST. Incoming streaming events are inserted into Iceberg tables.
- Log files are read by Chango Log, which sends them to Chango Streaming Ingestion through REST.
- Streaming events generated by applications are sent to Chango Streaming Ingestion through REST (see the sketch after this list).
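To picture what flows through this path, the sketch below posts a Debezium-style change event to Chango Streaming Ingestion. The endpoint and envelope fields are illustrative, not the actual Chango CDC wire format.

```python
import requests

# The envelope below mimics a generic CDC change event; field names and the
# endpoint are assumptions, not the documented Chango format.
change_event = {
    "op": "u",  # u = update; inserts and deletes would use other codes
    "source": {"db": "shop", "table": "orders"},
    "before": {"id": 42, "status": "pending"},
    "after": {"id": 42, "status": "shipped"},
    "ts_ms": 1700000000000,
}

resp = requests.post(
    "https://chango.example.com/streaming-ingestion/v1/events",  # hypothetical
    headers={"Authorization": "Bearer <token>"},
    json=change_event,
    timeout=10,
)
resp.raise_for_status()
```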