GeistHaus

Encodings

parquet.apache.org

Parquet encoding definitions

This file contains the specification of all supported encodings. Unless otherwise stated in page or encoding documentation, any encoding can be used with any page type.

Supported Encodings

For details on current implementation status, see the Implementation Status page.

Encoding type                             Encoding enum                      Supported Types
Plain                                     PLAIN = 0                          All Physical Types
Dictionary Encoding                       PLAIN_DICTIONARY = 2 (Deprecated)  All Physical Types
                                          RLE_DICTIONARY = 8
Run Length Encoding / Bit-Packing Hybrid  RLE = 3                            BOOLEAN, Dictionary Indices
Delta Encoding                            DELTA_BINARY_PACKED = 5            INT32, INT64
Delta-length byte array                   DELTA_LENGTH_BYTE_ARRAY = 6        BYTE_ARRAY
Delta Strings                             DELTA_BYTE_ARRAY = 7               BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY
Byte Stream Split                         BYTE_STREAM_SPLIT = 9              INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY

Deprecated Encodings

Encoding type            Encoding enum
Bit-packed (Deprecated)  BIT_PACKED = 4
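As a rough illustration, the encoding table above can be written down as a small lookup structure. The sketch below is not part of the Parquet specification or of any Parquet library: the names `Encoding`, `SUPPORTED_TYPES`, `DEPRECATED`, and `is_supported` are hypothetical, and the list of physical types is taken from the Parquet format definition rather than from this page. Only the enum values and the type restrictions come from the table.

```python
from enum import IntEnum

# Assumption: the eight Parquet physical types, which the table refers to
# collectively as "All Physical Types" without listing them.
ALL_PHYSICAL_TYPES = frozenset({
    "BOOLEAN", "INT32", "INT64", "INT96", "FLOAT", "DOUBLE",
    "BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY",
})


class Encoding(IntEnum):
    """Enum values as listed in the encoding table (hypothetical class name)."""
    PLAIN = 0
    PLAIN_DICTIONARY = 2          # deprecated
    RLE = 3
    BIT_PACKED = 4                # deprecated
    DELTA_BINARY_PACKED = 5
    DELTA_LENGTH_BYTE_ARRAY = 6
    DELTA_BYTE_ARRAY = 7
    RLE_DICTIONARY = 8
    BYTE_STREAM_SPLIT = 9


DEPRECATED = frozenset({Encoding.PLAIN_DICTIONARY, Encoding.BIT_PACKED})

# Physical types each encoding accepts, per the table. RLE is also used for
# dictionary indices, which are not a physical type and so are not modelled
# here. BIT_PACKED has no "Supported Types" entry in the table, so it is
# deliberately left out of this mapping.
SUPPORTED_TYPES = {
    Encoding.PLAIN: ALL_PHYSICAL_TYPES,
    Encoding.PLAIN_DICTIONARY: ALL_PHYSICAL_TYPES,
    Encoding.RLE_DICTIONARY: ALL_PHYSICAL_TYPES,
    Encoding.RLE: frozenset({"BOOLEAN"}),
    Encoding.DELTA_BINARY_PACKED: frozenset({"INT32", "INT64"}),
    Encoding.DELTA_LENGTH_BYTE_ARRAY: frozenset({"BYTE_ARRAY"}),
    Encoding.DELTA_BYTE_ARRAY: frozenset({"BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY"}),
    Encoding.BYTE_STREAM_SPLIT: frozenset(
        {"INT32", "INT64", "FLOAT", "DOUBLE", "FIXED_LEN_BYTE_ARRAY"}
    ),
}


def is_supported(encoding: Encoding, physical_type: str) -> bool:
    """Return True if the table lists physical_type for this encoding."""
    return physical_type in SUPPORTED_TYPES.get(encoding, frozenset())


if __name__ == "__main__":
    print(is_supported(Encoding.DELTA_BINARY_PACKED, "INT64"))
    print(is_supported(Encoding.DELTA_BINARY_PACKED, "DOUBLE"))
```

A writer choosing an encoding for a column could consult a mapping like this before emitting a page, and could check `DEPRECATED` to avoid producing encodings that new files should not use.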

8 pages link to this URL
Querying Parquet with Millisecond Latency

Note: this article was originally published on the InfluxData Blog.

We believe that querying data in Apache Parquet files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet's open format and broad ecosystem support make it the obvious choice for a wide class of data systems. In this article we explain several advanced techniques needed to query data stored in the Parquet format quickly that we implemented in the Apache Arrow Rust Parquet reader. Together these techniques make…
