Apache Parquet vs. Apache Avro

| Attribute | Apache Parquet | Apache Avro |
| --- | --- | --- |
| Name | Apache Parquet | Apache Avro |
| Description | Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. | Apache Avro is the leading serialization format for record data, and first choice for streaming data pipelines. |
| License | Apache License 2.0 | Apache License 2.0 |
| Source code | https://github.com/apache/parquet-format | https://github.com/apache/avro |
| Website | https://parquet.apache.org/ | https://avro.apache.org/ |
| Year created | 2013 | 2009 |
| Company | Twitter, Cloudera | Apache |
| Language support | Java, Scala, C++, Python, R, PHP | Java, C++, C#, C, Python, JavaScript, Perl, Ruby, PHP, Rust |
| Use cases | Write once read many, Analytics, Efficient storage, Column based queries | Stream processing, Analytics, Efficient data exchange |
| Is human readable | no | no |
| Orientation | column | row |
| Has type system | yes | yes |
| Has nested structure support | yes | yes |
| Has native compression | yes | yes |
| Has encoding support | yes | yes |
| Has constraint support | no | no |
| Has ACID support | no | no |
| Has metadata | yes | yes |
| Has encryption support | yes | no |
| Data processing framework support | Apache Beam, Apache Drill, Apache Flink, Apache Spark | Apache Flink, Apache Gobblin, Apache NiFi, Apache Pig, Apache Spark |
| Analytics query support | Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt | Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt |
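
The "Orientation" row (column for Parquet, row for Avro) is the difference behind the "Column based queries" and "Stream processing" use cases. The sketch below is a toy illustration using only the Python standard library. It is not the Parquet or Avro on-disk format, and the sample records, field names, and helper functions are all hypothetical. It only shows why a query over one field touches less data when values are grouped by column.

```python
import json

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
    {"id": 3, "city": "Pune", "temp_c": 27.5},
]


def to_row_layout(rows):
    """Row orientation (Avro-like): one blob per record, all fields together."""
    return [json.dumps(row).encode("utf-8") for row in rows]


def to_column_layout(rows):
    """Column orientation (Parquet-like): one blob per field, all values together.

    Assumes every record has the same fields, as in the sample data.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: json.dumps(values).encode("utf-8") for name, values in columns.items()}


def mean_temp_from_rows(row_blobs):
    """Averaging one field still requires decoding every full record."""
    bytes_read = sum(len(blob) for blob in row_blobs)
    temps = [json.loads(blob)["temp_c"] for blob in row_blobs]
    return sum(temps) / len(temps), bytes_read


def mean_temp_from_columns(column_blobs):
    """Averaging one field only requires decoding that field's blob."""
    blob = column_blobs["temp_c"]
    temps = json.loads(blob)
    return sum(temps) / len(temps), len(blob)


row_mean, row_bytes = mean_temp_from_rows(to_row_layout(records))
col_mean, col_bytes = mean_temp_from_columns(to_column_layout(records))

print(f"row layout:    mean={row_mean}, bytes read={row_bytes}")
print(f"column layout: mean={col_mean}, bytes read={col_bytes}")
```

Both layouts return the same mean, but the column layout reads only the `temp_c` values. The row layout keeps each record contiguous, which suits appending or exchanging one record at a time, as in streaming pipelines.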