GeistHaus
Querying Parquet with Millisecond Latency

arrow.apache.org

Note: this article was originally published on the InfluxData Blog. We believe that querying data in Apache Parquet files directly can achieve storage efficiency and query performance similar to or better than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet's open format and broad ecosystem support make it the obvious choice for a wide class of data systems. In this article we explain several advanced techniques for quickly querying data stored in the Parquet format, which we implemented in the Apache Arrow Rust Parquet reader. Together these techniques make…

4 pages link to this URL
Supercharging S3 Intelligent Tiering with Content Crush

Scribd and Slideshare have been using AWS S3 for almost twenty years and store hundreds of billions of objects making storage management quite a challenge. My focus at Scribd has generally been around data and storage but only in the past twelve months have I started to really focus on one of our hardest technology problems: cost-effective storage and availability for the hundreds of billions of objects that represent our content library.

3 inbound links article en CC BY-SA 4.0
2026 March: Recently Studied Stuff

Over the past week I have made a more conscious effort to keep track of some really interesting articles that came through my feed reader. I am a big fan of the open web and the power of RSS for disseminating interesting information from actual people. Below are some really interesting posts I have read recently!

0 inbound links article en rss arrow parquet rust