Apache Avro vs. Apache Parquet

| Attribute | Apache Avro | Apache Parquet |
| --- | --- | --- |
| Name | Apache Avro | Apache Parquet |
| Description | Apache Avro is the leading serialization format for record data, and first choice for streaming data pipelines. | Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. |
| License | Apache License 2.0 | Apache License 2.0 |
| Source code | https://github.com/apache/avro | https://github.com/apache/parquet-format |
| Website | https://avro.apache.org/ | https://parquet.apache.org/ |
| Year created | 2009 | 2013 |
| Company | Apache | Twitter, Cloudera |
| Language support | Java, C++, C#, C, Python, JavaScript, Perl, Ruby, PHP, Rust | Java, Scala, C++, Python, R, PHP |
| Use cases | Stream processing, Analytics, Efficient data exchange | Write once read many, Analytics, Efficient storage, Column-based queries |
| Is human readable | no | no |
| Orientation | row | column |
| Has type system | yes | yes |
| Has nested structure support | yes | yes |
| Has native compression | yes | yes |
| Has encoding support | yes | yes |
| Has constraint support | no | no |
| Has ACID support | no | no |
| Has metadata | yes | yes |
| Has encryption support | no | yes |
| Data processing framework support | Apache Flink, Apache Gobblin, Apache NiFi, Apache Pig, Apache Spark | Apache Beam, Apache Drill, Apache Flink, Apache Spark |
| Analytics query support | Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt | Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt |
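
The "Orientation" row is the main practical difference between the two formats, so a small illustration may help. The sketch below uses only plain Python structures and is not how either library stores bytes on disk: real Avro files group records into blocks, and real Parquet files split columns into row groups and pages. The sample records, field names, and helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical sample records; the field names are invented for this sketch.
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
    {"id": 3, "city": "Pune", "temp_c": 31.2},
]


def to_row_layout(rows):
    """Row orientation (Avro-like): all values of one record sit together."""
    return [tuple(r.values()) for r in rows]


def to_column_layout(rows):
    """Column orientation (Parquet-like): all values of one field sit together."""
    fields = list(rows[0].keys())
    return {f: [r[f] for r in rows] for f in fields}


row_layout = to_row_layout(records)
column_layout = to_column_layout(records)

# Appending one record touches a single place in the row layout,
# which suits streaming and record-at-a-time exchange.
row_layout.append((4, "Kyiv", 12.3))

# Aggregating one field reads a single list in the column layout,
# which suits analytical, column-based queries.
temps = column_layout["temp_c"]
mean_temp = sum(temps) / len(temps)
```

In this toy model, the row layout makes writes cheap and whole-record reads natural, while the column layout lets a query skip the fields it does not need. That trade-off is consistent with the use cases listed in the table for each format.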