HeatWave Query Engine Internals — Joins, Aggregations, Hashing & Vectorization

   Mariusz Antonik    MySQL HeatWave

Introduction

HeatWave is designed for analytics at massive scale. Its performance comes from a combination of vectorized SQL operations, parallel execution, and distributed joins.

Let’s explore the internals.


1. Vectorized Execution

HeatWave processes queries using CPU vectorization:

  • SIMD instructions (Single Instruction, Multiple Data)

  • Processes several values per instruction (for example 8, 16, or 32), depending on register width and element size

  • Higher throughput than processing one value at a time
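Where the figures 8, 16, and 32 come from can be shown with simple arithmetic: the number of values handled by one SIMD instruction is the register width divided by the element width. The sketch below is only that calculation. The function name `simd_lanes` is made up for illustration, and the x86 register widths listed are general hardware facts, not a statement about which instruction set HeatWave actually uses.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements fit in one SIMD register.

    One SIMD instruction operates on all of these elements at once.
    """
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


# Common x86 register widths in bits (assumption: used here only as examples).
REGISTER_WIDTHS = {"SSE": 128, "AVX2": 256, "AVX-512": 512}

if __name__ == "__main__":
    for name, width in REGISTER_WIDTHS.items():
        lanes = {f"int{bits}": simd_lanes(width, bits) for bits in (8, 16, 32, 64)}
        print(name, lanes)
```

For example, a 256-bit register holds 8 values of 32 bits, while a 512-bit register holds 16 values of 32 bits or 32 values of 16 bits.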


2. Distributed Hash Joins

HeatWave supports three join strategies:

1. Broadcast Join

Small table → broadcast (copied) to all nodes
Large table → stays partitioned across the nodes

2. Partitioned Hash Join

Both tables are partitioned by the join key, so rows with matching keys land on the same node

3. Hybrid Join

A combination of both, chosen by the cost model
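The strategies above can be sketched in a few lines of single-process Python, with lists standing in for nodes. This is a minimal illustration, not HeatWave code: every function name is hypothetical, and `pick_strategy` is a toy cost model (rows moved over the network) assumed purely for the example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows):
    """Single-node hash join on the first column: build a table, then probe it."""
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)
    return [(key, b, p) for key, p in probe_rows for b in table.get(key, [])]


def partition(rows, num_nodes):
    """Route each row to a node by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[0]) % num_nodes].append(row)
    return parts


def broadcast_join(small, large, num_nodes):
    """Copy the small table to every node; the large table stays split."""
    # Round-robin slicing stands in for however the large table is distributed.
    large_parts = [large[i::num_nodes] for i in range(num_nodes)]
    out = []
    for part in large_parts:
        out.extend(hash_join(small, part))
    return out


def partitioned_join(left, right, num_nodes):
    """Partition both inputs by join key, then join node by node."""
    out = []
    for l_part, r_part in zip(partition(left, num_nodes), partition(right, num_nodes)):
        out.extend(hash_join(l_part, r_part))
    return out


def pick_strategy(left_size, right_size, num_nodes):
    """Toy cost model (assumption): compare the number of rows moved."""
    broadcast_cost = min(left_size, right_size) * num_nodes
    partition_cost = left_size + right_size
    return "broadcast" if broadcast_cost < partition_cost else "partitioned"


if __name__ == "__main__":
    customers = [(1, "Ann"), (2, "Bob")]
    orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]
    print(sorted(broadcast_join(customers, orders, 3)))
    print(sorted(partitioned_join(customers, orders, 3)))
    print(pick_strategy(len(customers), len(orders), 3))
```

Both join functions return the same rows. They differ only in how much data has to move: broadcasting copies the small table once per node, while partitioning moves each row of both tables at most once.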

HeatWave’s join engine uses:

  • Hash tables in RAM

  • Compressed data blocks

  • SIMD hash probes


3. Optimized Aggregations

HeatWave performs:

  • Local aggregation on each worker

  • Global merge on coordinator

This reduces network traffic, because each worker sends one partial result per group instead of every row.
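A minimal sketch of this two-phase pattern, assuming a `GROUP BY` query that computes `SUM`, `COUNT`, and `AVG`. The function names and the sample data are hypothetical; the point is only to show what each worker sends and how the coordinator combines it.

```python
def local_aggregate(rows):
    """Worker step: collapse local rows into one (sum, count) pair per group."""
    partial = {}
    for group, value in rows:
        running_sum, running_count = partial.get(group, (0, 0))
        partial[group] = (running_sum + value, running_count + 1)
    return partial


def global_merge(partials):
    """Coordinator step: combine the per-worker partials into final results."""
    merged = {}
    for partial in partials:
        for group, (part_sum, part_count) in partial.items():
            total_sum, total_count = merged.get(group, (0, 0))
            merged[group] = (total_sum + part_sum, total_count + part_count)
    return {
        group: {"sum": s, "count": c, "avg": s / c}
        for group, (s, c) in merged.items()
    }


if __name__ == "__main__":
    # Each inner list is the data held by one worker (made-up sample rows).
    workers = [
        [("EU", 10), ("US", 20), ("EU", 30)],
        [("US", 40), ("EU", 50)],
    ]
    partials = [local_aggregate(rows) for rows in workers]
    print(global_merge(partials))
```

Note that the workers ship sums and counts rather than averages. An average of per-worker averages would be wrong whenever workers hold different numbers of rows for a group, so `AVG` is derived only in the final merge.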


4. Bloom Filters

Bloom filters are used to prune rows that cannot match the join condition before they reach the join.
This reduces data movement and CPU load.
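The pruning idea can be sketched as follows: build a Bloom filter from the join keys of one side, then use it to drop rows from the other side that definitely have no match. This is a generic textbook Bloom filter, not HeatWave's implementation; the class, the `prefilter` helper, and the sizing parameters are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Bit array plus several hash functions; may report false positives,
    never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def prefilter(probe_rows, bloom):
    """Keep only rows whose join key might exist on the build side."""
    return [row for row in probe_rows if bloom.might_contain(row[0])]


if __name__ == "__main__":
    build_keys = [1, 2, 3]
    bloom = BloomFilter()
    for key in build_keys:
        bloom.add(key)

    probe_rows = [(k, f"row{k}") for k in range(1, 101)]
    survivors = prefilter(probe_rows, bloom)
    print(len(probe_rows), "rows before,", len(survivors), "after the Bloom filter")
```

Rows that pass the filter still have to be checked by the real join, because a Bloom filter can let through a few non-matching keys. It never drops a key that was added, so the join result is unchanged.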


5. Indexing

HeatWave does not use InnoDB secondary indexes.
Instead, it builds:

  • Vectorized sort-merge structures

  • Hash-based filtering structures

These are built in memory, either during query execution or at load time.


Conclusion

HeatWave’s execution engine is highly optimized for analytics, offering near column-store performance inside MySQL without external ETL or BI pipelines.