Apache Hudi vs. Apache Parquet

Attribute | Apache Hudi | Apache Parquet
Name | Apache Hudi | Apache Parquet
Description | Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. It stores its base data files in either Parquet or ORC format. | Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
Source code | https://github.com/apache/hudi | https://github.com/apache/parquet-format
Website | https://hudi.apache.org/ | https://parquet.apache.org/
License | Apache License 2.0 | Apache License 2.0
Year created | 2016 | 2013
Company | Uber | Twitter, Cloudera
Use cases | Incremental data processing, data upserts, change data capture (CDC), ACID transactions | Write-once-read-many workloads, analytics, efficient storage, column-based queries
Language support | Java, Scala, C++, Python, R, PHP
Is human readable | no | no
Orientation | column or row | column
Has type system | yes | yes
Has nested structure support | yes | yes
Has native compression | yes | yes
Has encoding support | yes | yes
Has constraint support | yes | no
Has ACID support | yes | no
Has metadata | yes | yes
Has encryption support | partial (relies on Parquet modular encryption) | yes
Data processing framework support | Apache Spark, Apache Flink | Apache Beam, Apache Drill, Apache Flink, Apache Spark
Analytics query support | Apache Hive, Apache Impala, AWS Athena, BigQuery, ClickHouse, Presto, Trino | Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt
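The table credits Hudi with upserts, incremental processing and CDC, and marks it as ACID-capable while Parquet is not. The toy Python model below illustrates those ideas under several assumptions:

- `ToyKeyedTable`, `upsert`, `snapshot` and `incremental` are hypothetical names.
- Data lives in memory.
- Each batch counts as one commit on a timeline whose commit times must increase.

This is not Hudi's API or storage layout. Real Hudi tables are written through engines such as Apache Spark or Apache Flink.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Record:
    key: str
    commit_time: int
    data: Dict[str, Any]


@dataclass
class ToyKeyedTable:
    """Hypothetical in-memory keyed table; NOT Hudi's API or file layout."""

    records: Dict[str, Record] = field(default_factory=dict)
    timeline: List[int] = field(default_factory=list)

    def upsert(self, commit_time: int, rows: List[Dict[str, Any]], key_field: str = "id") -> None:
        """Apply one batch as a single commit: insert new keys, overwrite existing ones."""
        if self.timeline and commit_time <= self.timeline[-1]:
            raise ValueError("commit_time must increase monotonically")
        staged = dict(self.records)  # work on a copy so a failing batch changes nothing
        for row in rows:
            key = str(row[key_field])  # a missing key raises KeyError and aborts the batch
            staged[key] = Record(key, commit_time, dict(row))
        self.records = staged  # "commit": swap in the new state in one step
        self.timeline.append(commit_time)

    def snapshot(self) -> List[Dict[str, Any]]:
        """Latest version of every record, ordered by key."""
        return [r.data for _, r in sorted(self.records.items())]

    def incremental(self, since: int) -> List[Dict[str, Any]]:
        """Only records whose latest write landed in a commit after `since`."""
        return [r.data for _, r in sorted(self.records.items()) if r.commit_time > since]


if __name__ == "__main__":
    table = ToyKeyedTable()
    table.upsert(1, [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}])
    table.upsert(2, [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Pune"}])
    print(table.snapshot())
    print(table.incremental(since=1))
```

The sketch stages each batch on a copy and swaps it in only at the end, so a failed batch never leaves a half-applied commit. This is a rough analogue of atomicity only. A real system must also handle concurrent writers, durable storage and log compaction, all of which this sketch ignores.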
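The Orientation row captures the central difference between a row layout and Parquet's columnar layout. Below is a toy illustration in plain Python. It does not reproduce Parquet's on-disk format, which also splits data into row groups and column chunks and stores metadata. `to_columns` and `total` are hypothetical helpers. They pivot records into one list per field, so a query that sums one field only touches that field's values.

```python
from typing import Any, Dict, List

# Hypothetical sample data, stored row by row (one dict per record).
ROWS: List[Dict[str, Any]] = [
    {"user": "a", "country": "NO", "amount": 10},
    {"user": "b", "country": "PE", "amount": 25},
    {"user": "c", "country": "NO", "amount": 5},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one contiguous list per column.

    Assumes every record has the same fields as the first one.
    """
    columns: Dict[str, List[Any]] = {name: [] for name in rows[0]} if rows else {}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


def total(columns: Dict[str, List[Any]], name: str) -> Any:
    """A column-based query: read and aggregate a single column only."""
    return sum(columns[name])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(cols["country"])  # all values of one column, stored together
    print(total(cols, "amount"))
```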
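Both formats are listed as having encoding support. For Parquet, the parquet-format specification defines the encodings. One stores a column as a dictionary of distinct values plus integer indices, writing the indices with a run-length/bit-packing hybrid. The sketch below shows only the idea. `dictionary_encode`, `run_length_encode` and `decode` are hypothetical helpers, and bit-packing, pages and compression codecs are left out.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by the index of its first occurrence in a dictionary."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def run_length_encode(codes: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of repeated codes into (code, run_length) pairs."""
    return [(code, len(list(group))) for code, group in groupby(codes)]


def decode(dictionary: List[str], runs: List[Tuple[int, int]]) -> List[str]:
    """Expand the runs and map the codes back to the original values."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


if __name__ == "__main__":
    column = ["NO", "NO", "NO", "PE", "PE", "NO"]
    dictionary, codes = dictionary_encode(column)
    runs = run_length_encode(codes)
    print(dictionary, runs)
    assert decode(dictionary, runs) == column
```

Long runs of repeated values, which are common in sorted or low-cardinality columns, collapse to a few pairs. This is one reason column-oriented files tend to compress well.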