
File


Attribute | Apache Avro | Apache Hudi | Apache Iceberg | Apache ORC | Apache Parquet | CSV | Delta Lake
Description | Apache Avro is the leading serialization format for record data, and first choice for streaming data pipelines. | Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. It uses Parquet or ORC as its base file format. | Apache Iceberg is a high-performance format for huge analytic tables. It can store data in Parquet, Avro, or ORC files. | ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. | Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. | Comma-Separated Values (CSV) is a plain-text file format that uses commas to separate values. | Delta Lake is an open-source storage framework that enables building a Lakehouse architecture.
License | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 | N/A | Apache License 2.0
Source code | https://github.com/apache/avro | https://github.com/apache/hudi | https://github.com/apache/iceberg | https://github.com/apache/orc | https://github.com/apache/parquet-format | N/A | https://github.com/delta-io/delta
Website | https://avro.apache.org/ | https://hudi.apache.org/ | https://iceberg.apache.org/ | https://orc.apache.org/ | https://parquet.apache.org/ | https://www.rfc-editor.org/rfc/rfc4180.html | https://delta.io/
Year created | 2009 | 2016 | 2017 | 2013 | 2013 | N/A (RFC 4180 published 2005) | 2019
Company | Apache | Uber | Netflix | Hortonworks, Facebook | Twitter, Cloudera | N/A | Databricks
Language support Java, C++, C#, C, Python, JavaScript, Perl, Ruby, PHP, Rust Java, Scala, C++, Python Java, Scala, C++, Python, R, PHP Java, Scala, C++, Python, R, PHP, Go Scala, Java, Python, Rust
Use cases | Stream processing, Analytics, Efficient data exchange | Incremental data processing, Data upserts, Change Data Capture (CDC), ACID transactions | Write once, read many, Analytics, Efficient storage, ACID transactions | Write once, read many, Analytics, Efficient storage, ACID transactions | Write once, read many, Analytics, Efficient storage, Column-based queries | | Write once, read many, Analytics, Efficient storage, ACID transactions
Is human readable | no | no | no | no | no | yes | no
Orientation | row | column or row | column or row | column | column | row | column
Has type system | yes | yes | yes | yes | yes | no | yes
Has nested structure support | yes | yes | yes | yes | yes | no | yes
Has native compression | yes | yes | yes | yes | yes | no | yes
Has encoding support | yes | yes | yes | yes | yes | no | yes
Has constraint support | no | yes | no | no | no | no | yes
Has ACID support | no | yes | yes | no | no | no | yes
Has metadata | yes | yes | yes | yes | yes | no | yes
Has encryption support | no | maybe | maybe | yes | yes | no | maybe
Data processing framework support Apache Flink, Apache Gobblin, Apache NiFi, Apache Pig, Apache Spark, Apache Spark, Apache Flink, Apache Drill, Apache Flink, Apache Gobblin, Apache Pig, Apache Spark, Apache Flink, Apache Gobblin, Apache Hadoop, Apache NiFi, Apache Pig, Apache Spark, Apache Beam, Apache Drill, Apache Flink, Apache Spark, Apache Beam, Apache Drill, Apache Flink, Apache Gobblin, Apache Hive, Apache NiFi, Apache Pig, Apache Spark, Apache Drill, Apache Flink, Apache Spark
Analytics query support Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt, Apache Hive, Apache Impala, AWS Athena, BigQuery, ClickHouse, Presto, Trino, Apache Impala, Apache Druid, Apache Hive, AWS Athena, BigQuery, ClickHouse, Dremio, DuckDB, Presto, Trino, Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt, Presto, Trino, Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt, Apache Hive, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, Presto, Trino
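The "Is human readable" and "Has type system" rows for CSV come down to the same fact: CSV is plain text with no schema. A minimal Python sketch using only the standard library `csv` module shows that every field is read back as a string, so types have to be imposed by whoever reads the file. The `schema` mapping is a hypothetical, hand-written example, not part of any CSV standard.

```python
import csv
import io

# A small CSV document: plain text, so a person can read it directly.
text = "id,price,in_stock\n1,9.99,true\n2,15.50,false\n"
rows = list(csv.DictReader(io.StringIO(text)))

# CSV carries no type information: every field comes back as a string.
for row in rows:
    print({name: type(value).__name__ for name, value in row.items()})

# Types must be applied by the consumer, here via a hypothetical schema
# that maps each column name to a converter function.
schema = {"id": int, "price": float, "in_stock": lambda s: s == "true"}
typed = [{name: schema[name](value) for name, value in row.items()} for row in rows]
print(typed)
```

Formats marked "yes" in the type-system row store this schema alongside the data, so every reader interprets `9.99` the same way without out-of-band agreements.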
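To make the "Orientation" row concrete, here is a toy Python model of the same records stored row by row and column by column. It is not the on-disk layout of any of these formats; it only illustrates why column-oriented formats suit the analytic, write-once-read-many use cases in the table: an aggregate over one field reads one contiguous list.

```python
# Three example records (hypothetical data).
records = [
    {"id": 1, "city": "Oslo", "temp": 4.5},
    {"id": 2, "city": "Lima", "temp": 19.0},
    {"id": 3, "city": "Oslo", "temp": 5.5},
]

# Row-oriented (like Avro or CSV): each record's fields sit together.
row_layout = [tuple(r.values()) for r in records]

# Column-oriented (like Parquet or ORC): each column's values sit together.
column_layout = {name: [r[name] for r in records] for name in records[0]}

# "Average temperature" needs only one column. The columnar layout hands
# it over as one list; the row layout must visit every record and skip
# the id and city fields.
avg_columnar = sum(column_layout["temp"]) / len(column_layout["temp"])
avg_rowwise = sum(row[2] for row in row_layout) / len(row_layout)
print(avg_columnar, avg_rowwise)
```

The trade-off runs the other way for writes: appending one record is a single append in the row layout but touches every column list in the columnar one, which is one reason row-oriented Avro is common in streaming pipelines.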
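CSV's "no" for nested structure support means nested records must be flattened before they can be written as columns. The sketch below assumes a hypothetical dotted-name convention (CSV itself defines none). Lists and repeated fields have no such straightforward mapping, which is where formats with nested types, such as Avro, Parquet, and ORC, help.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names (hypothetical convention)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested record, extending the column name.
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


order = {"id": 7, "customer": {"name": "Ada", "address": {"city": "Oslo"}}}
print(flatten(order))
# {'id': 7, 'customer.name': 'Ada', 'customer.address.city': 'Oslo'}
```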
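Because CSV has no native compression, CSV files are typically compressed as a whole with an external codec, as in this sketch using Python's standard `gzip` module. Columnar formats such as Parquet and ORC instead compress data inside the file, column by column, so a reader can decompress only the columns it needs.

```python
import csv
import gzip
import io

# Build a repetitive CSV document in memory (hypothetical data).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "city"])
for i in range(1000):
    writer.writerow([i, "Oslo" if i % 2 else "Lima"])
raw = buffer.getvalue().encode("utf-8")

# Compress the whole file externally, as with a data.csv.gz file.
compressed = gzip.compress(raw)
print(len(raw), len(compressed))

# The gzip stream must be decompressed from the start before rows can be
# parsed; there is no way to jump to a single column.
restored = gzip.decompress(compressed).decode("utf-8")
rows = list(csv.reader(io.StringIO(restored)))
print(rows[:3])
```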
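The "Has encoding support" row refers to encodings applied to values before compression. Dictionary encoding and run-length encoding (RLE) are two techniques used by both Parquet and ORC; the toy Python version below shows the idea on a repetitive string column. It is a simplification: Parquet's actual RLE/bit-packing hybrid and ORC's integer encodings are more elaborate.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an integer index into a dictionary of distinct values."""
    dictionary, index_of, codes = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse runs of repeated codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


def decode(dictionary, runs):
    """Expand the runs and map codes back to the original values."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


city = ["Oslo", "Oslo", "Oslo", "Lima", "Lima", "Oslo"]
dictionary, codes = dictionary_encode(city)
runs = run_length_encode(codes)
print(dictionary)  # ['Oslo', 'Lima']
print(runs)        # [(0, 3), (1, 2), (0, 1)]
```

Six strings shrink to a two-entry dictionary plus three small integer pairs, and the more a column repeats, the bigger the saving.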
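The formats marked "yes" for ACID support (Hudi, Iceberg, Delta Lake) layer transactions over immutable data files. The sketch below is loosely modelled on Delta Lake's JSON transaction log, where each commit is a numbered log file and a writer wins a version only if it creates that file first. Iceberg, for example, instead commits by atomically swapping a pointer to a new metadata file. The function names, action format, and file names here are hypothetical, and real systems rely on storage-specific put-if-absent or locking guarantees; this version uses a hard link on a local POSIX file system.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Try to publish `actions` as log entry `version`; return False if it is taken."""
    final_path = os.path.join(log_dir, f"{version:020d}.json")
    # Write the whole entry to a temp file first so readers never see a partial commit.
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(actions, f)
        # os.link raises FileExistsError if final_path exists: put-if-absent.
        os.link(tmp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp_path)


def snapshot(log_dir):
    """Replay committed entries in version order to find the live data files."""
    live = set()
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    live.add(action["path"])
                elif action["op"] == "remove":
                    live.discard(action["path"])
    return live


with tempfile.TemporaryDirectory() as log_dir:
    first = commit(log_dir, 0, [{"op": "add", "path": "part-0.parquet"}])
    second = commit(log_dir, 1, [{"op": "remove", "path": "part-0.parquet"},
                                 {"op": "add", "path": "part-1.parquet"}])
    # A concurrent writer that also started from version 0 loses the race for
    # version 1 and would have to re-read the log and retry as version 2.
    lost_race = not commit(log_dir, 1, [{"op": "add", "path": "part-x.parquet"}])
    live_files = snapshot(log_dir)

print(lost_race, live_files)
```

Readers only ever see whole commits, and a losing writer's data files never become visible, which is the atomicity and isolation the ACID row refers to.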
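For the "Has metadata" row, Parquet is a useful illustration. A Parquet file starts and ends with the 4-byte magic `PAR1`, and just before the trailing magic sits a 4-byte little-endian length of the footer metadata, which records the schema and where each column chunk lives. The toy writer and reader below mimic that footer-based layout, using JSON instead of Parquet's Thrift encoding and a made-up `TOY1` magic; the output is not a valid Parquet file.

```python
import io
import json
import struct

MAGIC = b"TOY1"  # made-up marker; real Parquet files use b"PAR1"


def write_file(columns):
    """Write each column as its own chunk, then a footer describing the chunks."""
    out = io.BytesIO()
    out.write(MAGIC)
    footer = {"columns": []}
    for name, values in columns.items():
        chunk = json.dumps(values).encode("utf-8")
        # Record where this chunk starts before writing it.
        footer["columns"].append({"name": name, "offset": out.tell(), "length": len(chunk)})
        out.write(chunk)
    footer_bytes = json.dumps(footer).encode("utf-8")
    out.write(footer_bytes)
    out.write(struct.pack("<I", len(footer_bytes)))  # 4-byte little-endian length
    out.write(MAGIC)
    return out.getvalue()


def read_column(data, name):
    """Read the footer from the end of the file, then jump straight to one column."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a TOY1 file")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer = json.loads(data[-8 - footer_len:-8])
    for col in footer["columns"]:
        if col["name"] == name:
            start = col["offset"]
            return json.loads(data[start:start + col["length"]])
    raise KeyError(name)


data = write_file({"id": [1, 2, 3], "city": ["Oslo", "Lima", "Oslo"]})
print(read_column(data, "city"))  # ['Oslo', 'Lima', 'Oslo']
```

Putting the metadata at the end lets a writer stream column chunks without knowing their sizes in advance, while a reader needs only the last few bytes to locate any column.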