Introduction
HeatWave is designed for analytics at massive scale. Its performance comes from a combination of vectorized SQL operations, parallel execution, and distributed joins.
Let’s explore the internals.
1. Vectorized Execution
HeatWave processes queries using CPU vectorization:
- SIMD (Single Instruction, Multiple Data) instructions
- Each instruction processes several values at once (for example 8, 16, or 32, depending on register width and data type)
- Higher throughput than row-at-a-time processing
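To make the idea concrete, here is a minimal sketch of batch-at-a-time evaluation in plain Python. It does not use real SIMD instructions and is not HeatWave code; it only illustrates the shape of the computation, where a comparison is applied to a whole batch of column values and the result is reduced afterwards. The names `BATCH_SIZE`, `filter_sum_batched`, and the sample `prices` column are assumptions made for illustration.

```python
from array import array

# Hypothetical lane count; a real SIMD width depends on the CPU and data type.
BATCH_SIZE = 8


def filter_sum_row_at_a_time(prices, threshold):
    """Scalar baseline: one value is compared and added per iteration."""
    total = 0
    for p in prices:
        if p > threshold:
            total += p
    return total


def filter_sum_batched(prices, threshold, batch_size=BATCH_SIZE):
    """Batch-at-a-time: build a selection mask per batch, then reduce it."""
    total = 0
    for start in range(0, len(prices), batch_size):
        batch = prices[start:start + batch_size]
        # One comparison per lane; a SIMD unit would do these in one instruction.
        mask = [p > threshold for p in batch]
        total += sum(p for p, keep in zip(batch, mask) if keep)
    return total


# A small column of 64-bit integers stored contiguously, as a column store would.
prices = array("q", [5, 12, 7, 30, 18, 2, 25, 9, 40, 1])
```

Both functions return the same result; the batched form simply exposes the per-batch mask and reduce steps that vector hardware can execute together.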
2. Distributed Hash Joins
HeatWave supports three join strategies:
1. Broadcast Join
Small table → broadcast to all nodes
Large table → partitioned
2. Partitioned Hash Join
Both tables partitioned by join key
3. Hybrid Join
Combination of both, chosen by the cost model
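The choice between the strategies above can be sketched as a comparison of estimated network cost. This is a simplified, hypothetical model and not HeatWave's actual cost model: it counts only rows moved between nodes, and the function names and formulas are assumptions for illustration.

```python
def estimate_network_rows(left_rows, right_rows, num_nodes):
    """Rough count of rows sent over the network for each strategy."""
    smaller = min(left_rows, right_rows)
    # Broadcast: the small table is copied to every other node;
    # the large table stays where it is.
    broadcast = smaller * (num_nodes - 1)
    # Partitioned: both tables are reshuffled by join key; on average a row
    # moves unless it already lives on its target node.
    partitioned = (left_rows + right_rows) * (num_nodes - 1) / num_nodes
    return {"broadcast": broadcast, "partitioned": partitioned}


def choose_join_strategy(left_rows, right_rows, num_nodes):
    """Pick the strategy with the lower estimated network cost."""
    costs = estimate_network_rows(left_rows, right_rows, num_nodes)
    return min(costs, key=costs.get)
```

Under these assumptions, a 1,000-row dimension table joined to a 10-million-row fact table on 8 nodes favors a broadcast join, while two tables of similar size favor a partitioned hash join.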
HeatWave’s join engine uses:
- Hash tables in RAM
- Compressed data blocks
- SIMD hash probes
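As a rough illustration of a partitioned hash join with in-memory hash tables, the sketch below partitions both inputs by join key, then builds and probes a hash table per partition. It is a single-process simulation, not HeatWave's engine: compression and SIMD probing are omitted, and the helper names and the convention that the join key is the first field of each row are assumptions.

```python
from collections import defaultdict


def partition(rows, num_nodes, key_index=0):
    """Assign each row to a node by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    return parts


def hash_join(build_rows, probe_rows, key_index=0):
    """Build a hash table on one side, then probe it with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key_index]].append(row)
    joined = []
    for row in probe_rows:
        # .get avoids creating empty entries for keys that have no match.
        for match in table.get(row[key_index], ()):
            joined.append(match + row)
    return joined


def partitioned_hash_join(left, right, num_nodes=4):
    """Matching keys land in the same partition, so each node joins locally."""
    result = []
    for lp, rp in zip(partition(left, num_nodes), partition(right, num_nodes)):
        result.extend(hash_join(lp, rp))
    return result


customers = [(1, "Ada"), (2, "Lin"), (3, "Sam")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]
joined = sorted(partitioned_hash_join(customers, orders))
```

Because both sides use the same hash function on the join key, no partition ever needs rows from another partition to produce its share of the result.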
3. Optimized Aggregations
HeatWave performs:
- Local aggregation on each worker
- Global merge on coordinator
Because only partial aggregates, not raw rows, are sent to the coordinator, this reduces network traffic significantly.
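The two phases can be sketched as follows. This is an illustrative single-process simulation rather than HeatWave code: each worker reduces its rows to one (sum, count) pair per group, and the coordinator merges those pairs. The function names and the sample `workers` data are assumptions.

```python
def local_aggregate(rows):
    """Worker side: collapse (key, value) rows into one (sum, count) per key."""
    partial = {}
    for key, value in rows:
        s, c = partial.get(key, (0, 0))
        partial[key] = (s + value, c + 1)
    return partial


def global_merge(partials):
    """Coordinator side: combine the partial (sum, count) pairs per key."""
    merged = {}
    for partial in partials:
        for key, (s, c) in partial.items():
            ms, mc = merged.get(key, (0, 0))
            merged[key] = (ms + s, mc + c)
    return merged


# Rows held by two hypothetical workers: (region, amount).
workers = [
    [("eu", 10), ("us", 5), ("eu", 20)],
    [("us", 15), ("eu", 30)],
]
partials = [local_aggregate(rows) for rows in workers]
totals = global_merge(partials)
averages = {key: s / c for key, (s, c) in totals.items()}
```

Carrying (sum, count) instead of a finished average is what makes the merge correct: averages cannot be combined directly, but sums and counts can.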
4. Bloom Filters
Bloom filters are used to prune rows that cannot match a join condition before those rows are moved or probed.
This reduces data movement and CPU load.
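A minimal sketch of the technique, assuming a simple bit-array Bloom filter built over the join keys of one table and used to pre-filter the other. The class, its parameters (`num_bits`, `num_hashes`), and the SHA-256-based hashing are illustrative choices, not HeatWave's implementation.

```python
import hashlib


class BloomFilter:
    """Approximate set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with a seed prefix.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Build the filter from the join keys of the smaller table.
build_keys = [1, 3, 5, 7]
bloom = BloomFilter()
for k in build_keys:
    bloom.add(k)

# Prune probe rows whose key definitely has no match before joining.
probe_rows = [(k, f"row-{k}") for k in range(10)]
survivors = [row for row in probe_rows if bloom.might_contain(row[0])]
```

Rows that pass the filter still go through the real join, since a Bloom filter can let through some non-matching keys; its value is that every row it rejects is guaranteed not to match.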
5. Indexing
HeatWave does not use InnoDB secondary indexes.
Instead, it builds:
- Vectorized sort-merge structures
- Hash-based filtering structures
These are built in memory, either at load time or during query execution.
Conclusion
HeatWave’s execution engine is highly optimized for analytics, offering near column-store performance inside MySQL without external ETL or BI pipelines.