Talk "When Fire(bolt) meets Ice(berg)"
The Future is Fast, Open, and Deployable Anywhere: When Fire(bolt) meets Ice(berg) by Georg Kreuzmayr was presented at Munich Datageeks - May Edition 2025
Abstract
Analytics and AI-driven applications demand both scalable and low-latency data access. However, data infrastructure capable of meeting these requirements often suffers from prohibitive costs or complexity. Moreover, industry best practice dictates that companies should avoid vendor lock-in by adopting open table formats like Apache Iceberg and Delta Lake. Firebolt aims to address this challenge with Firebolt Core: a free, self-hosted version of our data warehouse, distributed as a lightweight Docker image. Deployable anywhere – from a single laptop to enterprise data centers – it is forever free to use. In this talk we take a look under the hood of Firebolt Core and dive into the challenges we had to solve to bring a free-to-use production-grade data warehouse to the market. Furthermore, you will learn about our efforts to support high-performance, low-latency analytics on the lakehouse architecture.
About the speaker
Georg Kreuzmayr is a Software Engineer at Firebolt. As part of the Query Processing team, he works on bringing Firebolt's low-latency capabilities to the open lakehouse architecture. Georg studied computer science at the Technical University of Munich, where exciting university courses and research projects inspired him to pursue a career in databases.
Transcript summary
Introduction and Speaker Background
The talk was presented by two Firebolt software engineers at a Munich Datageeks event. Damian completed his PhD at KIT (Karlsruhe Institute of Technology) before joining Firebolt, while Georg completed his bachelor's and master's degrees at TU Munich. The presentation focuses on Firebolt's integration with Apache Iceberg and their approach to solving low-latency data analytics challenges.
Core Problem: Low-Latency Analytics is Not Trivial
Building responsive applications that analyze data at low latency cannot be taken for granted. It is remarkably easy to build systems that fail to scale properly. The speakers emphasized that analyzing data quickly requires careful architectural decisions and optimization techniques.
Firebolt's Target Use Cases
Firebolt is specifically built for analytical queries with several key characteristics:
- Large-scale data analytics - Processing substantial volumes of data efficiently
- Repeated query patterns - Optimizing for queries that follow similar structures, such as user-facing applications with filter-based searches where the query pattern remains consistent even as specific parameters change
- High concurrency - Serving many queries simultaneously
- Selective data access - Scenarios where large datasets are stored but only small subsets are needed to answer specific queries
Firebolt's Technical Approach
Columnar Query Engine Firebolt employs a columnar storage and query engine, which is essential for analytical workloads.
Decoupled Storage and Compute Following modern architectural principles, storage and compute are separated into distinct components, enabling independent scaling and optimization.
Sub-result Caching A sophisticated caching system that can reuse parts of previously computed queries to accelerate subsequent queries with similar patterns. This is particularly effective for repeated query patterns.
Elasticity Across Multiple Dimensions
- Horizontal scaling (scale-out) - Using multiple servers simultaneously to compute queries
- Vertical scaling (scale-up) - Varying server sizes from small, cost-effective instances to the largest available AWS instances
- Multi-cluster operation - Running multiple clusters concurrently with automatic query routing to available clusters while maintaining ACID-compliant results
Indexing and Pruning Pre-computing parts of queries through indexing accelerates query execution. Pruning ensures that only necessary data is scanned, minimizing unnecessary work.
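The talk did not show implementation details, but min/max-based pruning can be sketched roughly as follows; the zone-map structure and all names below are illustrative assumptions, not Firebolt's actual code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical zone-map style metadata kept per data block (illustrative only).
struct BlockStats {
    int64_t min_value;
    int64_t max_value;
};

// Keep only the blocks whose [min, max] range could contain rows matching
// "column BETWEEN lo AND hi"; all other blocks are never scanned.
std::vector<size_t> prune_blocks(const std::vector<BlockStats>& stats,
                                 int64_t lo, int64_t hi) {
    std::vector<size_t> survivors;
    for (size_t i = 0; i < stats.size(); ++i) {
        if (stats[i].max_value >= lo && stats[i].min_value <= hi) {
            survivors.push_back(i);
        }
    }
    return survivors;
}
```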
Performance and Cost Advantages
Customer query benchmarks demonstrate that Firebolt achieves significantly lower latency compared to other systems. While competitors can sometimes reduce latency by scaling up their systems (BigQuery being an exception, as it scales differently), the critical differentiator is cost. Firebolt delivers the low latency needed for responsive applications at substantially lower costs.
Apache Iceberg Integration
Why Iceberg Matters Iceberg represents a paradigm shift by decoupling the writing system from the reading system. This separation allows specialized systems to focus on their strengths. For example, systems like RisingWave can concentrate exclusively on efficiently writing Iceberg tables and generating optimized Parquet files, while query engines like Firebolt can focus on reading this data efficiently.
Iceberg Fundamentals Iceberg solves several problems that arise when simply storing Parquet files in object storage:
- Schema evolution - Tables can change over time with columns being added or removed
- Time travel - Ability to query historical data states from specific points in time
- ACID transactions - Ensuring consistent, reliable data reads
Iceberg Architecture Iceberg uses hierarchical metadata files stored in object storage (such as S3). These files form a chain that ultimately points to the actual data files stored in Parquet format. This architecture enables Firebolt to allow customers to query existing Iceberg tables without migration effort.
The Low-Latency Challenge
Network Latency Fundamentals Object storage systems like S3, GCS, or Azure Blob Storage are accessed via HTTP connections. Moving data from object storage to query engines running on compute instances (such as EC2) introduces latency. Research has shown that latency varies significantly based on request block sizes:
- Requesting very small blocks (1 kilobyte) results in lower latency but fails to saturate available network bandwidth
- Requesting large blocks (1 megabyte) increases latency but better utilizes network capacity
- Modern EC2 instances can achieve network bandwidth of up to 400 Gbps
The optimization challenge is finding the right block size that minimizes latency while maximizing network bandwidth utilization, as network bandwidth has become the new bottleneck.
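A rough back-of-the-envelope calculation, using the figures mentioned in the talk purely for illustration, shows why a single stream of medium-sized requests cannot saturate such a link and why many requests must be kept in flight concurrently:

```latex
\text{single-stream throughput} = \frac{2\,\text{MB}}{100\,\text{ms}} = 20\,\text{MB/s} \approx 0.16\,\text{Gbps},
\qquad
\text{concurrent requests needed} \approx \frac{400\,\text{Gbps}}{0.16\,\text{Gbps}} \approx 2{,}500
```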
The Iceberg Latency Problem With block sizes around 1-2 megabytes providing approximately 100 milliseconds of latency, reading Iceberg's hierarchical metadata files sequentially (metadata file, then Avro files, then data files) accumulates latencies, potentially requiring half a second before any query can be answered. This latency is unacceptable for customers running queries that need to complete in around 10 milliseconds.
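Spelled out as a worked example (the exact number of hops depends on the table layout, so the breakdown below is illustrative):

```latex
t_{\text{total}} \approx
\underbrace{t_{\text{metadata.json}}}_{\approx 100\,\text{ms}}
+ \underbrace{t_{\text{manifest list}}}_{\approx 100\,\text{ms}}
+ \underbrace{t_{\text{manifest files}}}_{\approx 100\,\text{ms}}
+ \underbrace{t_{\text{Parquet footer}}}_{\approx 100\,\text{ms}}
+ \underbrace{t_{\text{data pages}}}_{\approx 100\,\text{ms}}
\approx 0.5\,\text{s}
```

Each read depends on the result of the previous one, so the latencies add up rather than overlap.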
Live Demonstration: Firebolt Core
Firebolt Core Introduction The speakers demonstrated Firebolt Core, a free offering that runs locally using Docker. This differentiates it from their managed SaaS platform and makes the technology accessible for local development and testing.
Demo Setup The demonstration ran on a MacBook, querying an Iceberg table stored in us-east-1 from Munich, Germany. They used the TPC-H benchmark schema, specifically the orders table.
Querying Iceberg Tables Firebolt provides a table-valued function called read_iceberg that treats Iceberg tables as standard SQL tables. The initial query selecting several columns took 1.58 seconds due to the geographic distance between Munich and us-east-1 (worse than the theoretical 0.5 seconds for co-located resources).
Caching Benefits Running the same query a second time executed in milliseconds, demonstrating the effectiveness of caching mechanisms. The speakers emphasized this was not simple result caching.
Complex Query Example A more sophisticated query joining orders, customers, and nation tables while computing aggregated statistics (sum of total order price and count per nation) was demonstrated. After the initial execution, adding a filter on order date still executed in approximately 40 milliseconds, proving that sub-result caching was reusing partial query results rather than just returning cached final results.
Sub-Result Caching Architecture
Caching Mechanisms Firebolt implements two primary sub-result caching strategies:
- Hash table caching for joins - Caching the hash tables built during join operations
- Maybe cache operator - A flexible caching operator that the query optimizer can place anywhere in the execution plan, with the runtime deciding whether to actually cache based on result size and utility
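A minimal sketch of such an operator is shown below; the pull-based interface, the row-count threshold, and all names are assumptions made for illustration rather than Firebolt's internal operator API:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Illustrative stand-ins; Firebolt's real operator interfaces are not public.
using Batch = std::vector<int64_t>;

struct Operator {
    virtual ~Operator() = default;
    virtual std::optional<Batch> next() = 0;  // pull-based, one batch at a time
};

// The "maybe cache" operator: the optimizer can place it anywhere in the plan,
// and only at runtime is it decided whether the sub-result is worth keeping.
class MaybeCache : public Operator {
public:
    MaybeCache(std::unique_ptr<Operator> child, size_t max_cached_rows)
        : child_(std::move(child)), max_cached_rows_(max_cached_rows) {}

    std::optional<Batch> next() override {
        auto batch = child_->next();
        if (batch && caching_) {
            rows_seen_ += batch->size();
            if (rows_seen_ > max_cached_rows_) {
                caching_ = false;             // too large to be useful: stop buffering
                buffered_.clear();
            } else {
                buffered_.push_back(*batch);  // keep a copy for the cache
            }
        }
        if (!batch && caching_) {
            // End of input and the sub-result fit the budget: at this point the
            // buffered batches would be handed to the cache (omitted here).
        }
        return batch;
    }

private:
    std::unique_ptr<Operator> child_;
    size_t max_cached_rows_;
    size_t rows_seen_ = 0;
    bool caching_ = true;
    std::vector<Batch> buffered_;
};
```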
Query Execution Flow Query execution is represented as a directed graph where:
- Scan operators retrieve data from tables
- Join operators combine data from multiple sources
- The right side of a join builds a hash table
- The left side of a join probes the hash table
- Aggregation operators compute final results
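As a minimal illustration of the build/probe pattern described above (a toy example using TPC-H-style names, not Firebolt's engine code):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Nation { int32_t nationkey; std::string name; };
struct Customer { int32_t custkey; int32_t nationkey; };

std::vector<std::pair<int32_t, std::string>>
join_customer_nation(const std::vector<Customer>& customers,
                     const std::vector<Nation>& nations) {
    // Build side (right input): hash table on the join key.
    std::unordered_map<int32_t, std::string> ht;
    for (const auto& n : nations) ht.emplace(n.nationkey, n.name);

    // Probe side (left input): stream rows through and look them up.
    std::vector<std::pair<int32_t, std::string>> out;
    for (const auto& c : customers) {
        if (auto it = ht.find(c.nationkey); it != ht.end()) {
            out.emplace_back(c.custkey, it->second);
        }
    }
    return out;
}
```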
Fire Cache Storage All cached results are stored in Firebolt's "Fire Cache" (maintaining the fire naming convention throughout their products).
Partial Query Reuse When a new query arrives, the system checks if it has seen the entire sub-plan before and whether the underlying tables have changed. If conditions are met:
- The cached result can be returned immediately without executing the sub-plan
- For queries with modifications (like additional filters), unchanged portions of the query (such as a customer-nation join) can still be retrieved from cache while only executing the changed portions
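One way to picture this is a cache keyed by a fingerprint of the sub-plan plus the snapshot versions of the tables it reads; the key layout below is a hypothetical sketch, not Firebolt's actual data structure:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Illustrative cache key: same sub-plan + same table versions -> safe to reuse.
struct SubPlanKey {
    std::string plan_fingerprint;         // e.g. canonicalized sub-plan text
    std::string table_snapshot_versions;  // e.g. "orders@42;customer@17"
    bool operator==(const SubPlanKey&) const = default;
};

struct SubPlanKeyHash {
    size_t operator()(const SubPlanKey& k) const {
        return std::hash<std::string>{}(k.plan_fingerprint) ^
               (std::hash<std::string>{}(k.table_snapshot_versions) << 1);
    }
};

struct CachedSubResult { /* columns, hash tables, ... */ };

// A hit means the whole sub-plan can be skipped and its result reused.
using SubResultCache =
    std::unordered_map<SubPlanKey, CachedSubResult, SubPlanKeyHash>;
```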
Iceberg-Specific Optimizations The caching mechanism works transparently with Iceberg tables. The key requirement was teaching the system to understand when Iceberg tables change to avoid returning stale results.
List Iceberg Files Caching Iceberg table reading involves two table-valued functions:
- list_iceberg_files - Retrieves the list of all files belonging to a snapshot from the table's metadata
- read_from_s3 - Downloads and reads the actual data files
Since list_iceberg_files involves the expensive sequential metadata file reads, each adding roughly 100 milliseconds of latency, inserting a maybe cache operator between these two functions allows the file list to be cached in memory. Subsequent queries on unchanged tables bypass the metadata traversal entirely, cutting out the roughly half a second of latency it would otherwise add before any data is read.
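A minimal sketch of such a file-list cache, keyed by table location and snapshot id (class and field names are assumptions for illustration):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct FileEntry { std::string path; int64_t size_bytes; };

// In-memory cache for the result of listing an Iceberg snapshot's files.
class IcebergFileListCache {
public:
    // Returns nullptr on a miss; the caller then walks the metadata chain once
    // and stores the result via put().
    const std::vector<FileEntry>* get(const std::string& table_location,
                                      int64_t snapshot_id) const {
        auto it = cache_.find(key(table_location, snapshot_id));
        return it == cache_.end() ? nullptr : &it->second;
    }

    void put(const std::string& table_location, int64_t snapshot_id,
             std::vector<FileEntry> files) {
        cache_[key(table_location, snapshot_id)] = std::move(files);
    }

private:
    static std::string key(const std::string& loc, int64_t snap) {
        return loc + "#" + std::to_string(snap);
    }
    std::unordered_map<std::string, std::vector<FileEntry>> cache_;
};
```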
Additional Optimizations
- The read_from_s3 operator caches downloaded files on local disk to avoid re-downloading them
- Metadata file lookup operations also incorporate caching to determine whether the data has changed
Scan Architecture: Reading from Object Storage
The read_from_s3 Operator This component retrieves data from S3 or other object storage, decodes it, and provides in-memory columns to the query engine core.
Design Goals The primary objective is fully utilizing available resources, particularly network bandwidth. Achieving full network bandwidth saturation without encountering CPU or memory bottlenecks is exceptionally challenging.
The Memory vs. Latency Dilemma Two extreme approaches both fail:
- Downloading everything - Requesting terabytes of data and downloading it all up front fails as soon as the data exceeds the available memory
- Sequential small chunks - Requesting 2-megabyte blocks, processing them, then fetching the next block results in constant 100-millisecond wait times between chunks
Requirements The solution must fetch data from S3 while simultaneously processing previously fetched data, ensuring memory limits are never exceeded despite the processing pipeline potentially consuming arbitrary amounts of memory.
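A common way to meet these requirements is a bounded buffer between fetching and decoding; the sketch below shows the idea, with a fixed capacity enforcing the memory limit (this is a generic sketch, not Firebolt's buffer manager):

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Fetch threads push blocks, decode threads pop them; the fixed capacity keeps
// memory bounded no matter how fast either side runs.
class BoundedBlockQueue {
public:
    explicit BoundedBlockQueue(size_t max_blocks) : max_blocks_(max_blocks) {}

    void push(std::vector<char> block) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [&] { return q_.size() < max_blocks_; });
        q_.push_back(std::move(block));
        not_empty_.notify_one();
    }

    std::vector<char> pop() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [&] { return !q_.empty(); });
        auto block = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return block;
    }

private:
    size_t max_blocks_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::deque<std::vector<char>> q_;
};
```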
Component Architecture
- Fetcher - Issues HTTP GET requests to object storage and delivers data blocks to the buffer manager
- Disk Manager - Handles writing cached data to local SSD and retrieving it when needed
- Buffer Manager - Manages individual blocks, tracking which S3 file each block belongs to and at what offset, coordinating with both the fetcher and disk manager
- Control Flow - Schedules operations at appropriate times, deciding whether to issue fetch/disk requests or decode previously fetched data from formats like Parquet
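The interfaces below sketch how these components might fit together; the signatures are assumptions made for illustration, not Firebolt's internal API:

```cpp
#include <cstdint>
#include <span>
#include <string>

// A fixed-size block is identified by the object it belongs to and its offset.
struct BlockId {
    std::string s3_file;
    uint64_t offset;
};

struct Fetcher {
    virtual ~Fetcher() = default;
    // Issue an HTTP GET range request; the result goes to the buffer manager.
    virtual void fetch(const BlockId& block) = 0;
};

struct DiskManager {
    virtual ~DiskManager() = default;
    virtual void write(const BlockId& block, std::span<const char> data) = 0;
    virtual bool read(const BlockId& block, std::span<char> out) = 0;
};

struct BufferManager {
    virtual ~BufferManager() = default;
    // Returns a pinned in-memory block, triggering a fetch or disk read if needed.
    virtual std::span<const char> pin(const BlockId& block) = 0;
    virtual void unpin(const BlockId& block) = 0;
};

struct ControlFlow {
    virtual ~ControlFlow() = default;
    // Decide what to do next: issue fetch/disk requests or decode ready data.
    virtual void schedule() = 0;
};
```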
Scan Architecture: Key Technical Innovations
Fixed-Size Block Chunking Files are divided into fixed-size blocks (around 1-2 megabytes), which ensures full network bandwidth utilization. When Parquet files contain many small row groups, fetching one row group at a time would underutilize network capacity. Fixed-size blocks are shared across multiple row groups when necessary.
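The mapping from a row group's byte range to fixed-size blocks is simple arithmetic; a small sketch (the block size and function name are illustrative):

```cpp
#include <cstdint>
#include <utility>

constexpr uint64_t kBlockSize = 2ull * 1024 * 1024;  // illustrative 2 MiB blocks

// Returns the inclusive range of block indices covering bytes
// [offset, offset + length); assumes length > 0.
std::pair<uint64_t, uint64_t> blocks_for_row_group(uint64_t offset,
                                                   uint64_t length) {
    uint64_t first = offset / kBlockSize;
    uint64_t last = (offset + length - 1) / kBlockSize;
    return {first, last};
}
// Example: a 10 MiB row group starting at offset 0 maps to blocks 0..4,
// i.e. five 2 MiB range requests that can be issued concurrently, while a
// single block can also serve several small row groups.
```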
C++ Coroutines The implementation leverages C++ coroutines, allowing functions to suspend and resume execution at arbitrary points. This enables writing sequential-looking code that transparently handles asynchronous operations like background data fetching. This is crucial because fixed-size blocks create non-contiguous buffers (a 10-megabyte row group split across five 2-megabyte blocks), and decoding from non-contiguous buffers while suspending at arbitrary points requires coroutines to avoid extremely complex manual state management.
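A minimal C++20 generator-style coroutine illustrates the principle: the decode loop reads straight through logically contiguous data, while every yield is a suspension point at which a scheduler could step in, for example to wait for the next block to arrive. This is a self-contained toy, not Firebolt's task type:

```cpp
#include <coroutine>
#include <exception>
#include <iostream>
#include <utility>
#include <vector>

// Minimal generator, just enough to show suspension and resumption.
template <typename T>
struct Generator {
    struct promise_type {
        T value;
        Generator get_return_object() {
            return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T v) noexcept { value = std::move(v); return {}; }
        void return_void() noexcept {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit Generator(std::coroutine_handle<promise_type> h) : h_(h) {}
    Generator(Generator&& other) noexcept : h_(std::exchange(other.h_, {})) {}
    Generator(const Generator&) = delete;
    ~Generator() { if (h_) h_.destroy(); }

    bool next() { h_.resume(); return !h_.done(); }
    const T& value() const { return h_.promise().value; }

private:
    std::coroutine_handle<promise_type> h_;
};

// Sequential-looking decode loop over non-contiguous fixed-size blocks:
// the compiler preserves the loop state across every suspension point.
Generator<char> decode(const std::vector<std::vector<char>>& blocks) {
    for (const auto& block : blocks) {
        for (char byte : block) {
            co_yield byte;  // suspension point; the caller decides when to resume
        }
    }
}

int main() {
    std::vector<std::vector<char>> blocks = {{'f', 'i', 'r', 'e'}, {'b', 'o', 'l', 't'}};
    auto gen = decode(blocks);
    while (gen.next()) std::cout << gen.value();
    std::cout << '\n';  // prints "firebolt"
}
```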
Dual-Bandwidth Caching Strategy Caching serves two purposes:
- Latency reduction - Avoiding repeated network round trips
- Bandwidth multiplication - Once network bandwidth is saturated, cached data can be read from disk simultaneously with network reads, effectively utilizing both network bandwidth and disk read bandwidth in parallel
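With purely illustrative numbers (400 Gbps of network bandwidth is roughly 50 GB/s; the disk figure is an assumption), the effect can be summarized as:

```latex
B_{\text{effective}} \approx B_{\text{network}} + B_{\text{disk}}
\approx 50\,\text{GB/s} + 8\,\text{GB/s} = 58\,\text{GB/s}
```

so warm data served from the local cache adds disk read bandwidth on top of an already saturated network link.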
Task Families for Parallelism Parquet row groups can be processed independently on separate CPU cores. The control flow scheduler operates on "task families" rather than individual tasks. Each task family comprises multiple tasks (such as decoding individual columns). This design allows:
- Multiple CPU cores to work on a single task family when it is the only one available in the system
- Each core to make independent progress without excessive lock contention, which is critical for scaling across multi-core architectures
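A sketch of the claiming logic: each worker thread grabs the next task index with a single atomic increment, so any number of cores can share one family without taking a lock (names and granularity are illustrative):

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// One family per scan; its tasks are, for example, the row groups or column
// chunks still to decode.
class TaskFamily {
public:
    explicit TaskFamily(size_t num_tasks) : num_tasks_(num_tasks) {}

    // Each worker thread calls this in a loop; std::nullopt means the family
    // is exhausted and the thread can move on to another family.
    std::optional<size_t> claim_next() {
        size_t idx = next_.fetch_add(1, std::memory_order_relaxed);
        if (idx >= num_tasks_) return std::nullopt;
        return idx;
    }

private:
    const size_t num_tasks_;
    std::atomic<size_t> next_{0};
};
```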
Conclusion
The presentation demonstrated how Firebolt addresses the challenges of low-latency analytical queries on Iceberg tables through a combination of sophisticated caching strategies, parallel processing architectures, and careful optimization of network and storage bandwidth utilization. The speakers expressed enthusiasm about continuing to solve these complex technical challenges and welcomed further discussion.