HeatWave Query Engine Internals | Joins, Hashing, SIMD Vectorization & Aggregations

Introduction

HeatWave is designed for analytics at massive scale. Its performance comes from a combination of vectorized SQL operations, parallel execution, and distributed joins.

Let’s explore the internals.

1. Vectorized Execution

HeatWave processes queries using CPU vectorization:

SIMD instructions (Single Instruction, Multiple Data)
Processes several values per instruction, for example 8, 16, or 32, depending on register width and data type
Large throughput gains over row-at-a-time processing
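
To make the "8, 16, or 32 values" figure concrete, here is a small Python sketch of the lane arithmetic. The register widths (256 and 512 bits) and element sizes are illustrative assumptions, not a statement about which instruction sets HeatWave actually targets, and the batch loop only emulates SIMD behaviour: plain Python does not issue vector instructions.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of values that fit in one SIMD register (one lane per value)."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


def add_constant_in_batches(values, constant, lanes):
    """Emulate lane-wise processing.

    Each loop iteration stands in for ONE vector instruction that
    handles up to `lanes` values at once.
    """
    out = []
    instructions = 0
    for start in range(0, len(values), lanes):
        batch = values[start:start + lanes]
        out.extend(v + constant for v in batch)
        instructions += 1
    return out, instructions


# Assumed, illustrative register/element combinations.
lanes_256_i32 = simd_lanes(256, 32)   # 8 lanes
lanes_512_i32 = simd_lanes(512, 32)   # 16 lanes
lanes_512_i16 = simd_lanes(512, 16)   # 32 lanes

# 64 values with 8 lanes need 8 emulated vector instructions
# instead of 64 scalar ones.
result, instructions = add_constant_in_batches(list(range(64)), 10, lanes_256_i32)
```

The point of the sketch is the ratio: the wider the register and the narrower the data type, the fewer instructions a column scan needs.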

2. Distributed Hash Joins

HeatWave supports three join strategies:

1. Broadcast Join

Small table → broadcast to all nodes
Large table → partitioned
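
A broadcast join can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions: the "nodes" are just list slices, the large table is split round-robin, and the function and column names (`broadcast_join`, `region_id`, and so on) are hypothetical. It does not reflect HeatWave's actual implementation.

```python
from collections import defaultdict


def broadcast_join(small, large, key, num_nodes):
    """Join `small` and `large` on `key`, simulating `num_nodes` workers."""
    # The large table is partitioned; round-robin is enough here because
    # every node will hold a full copy of the small table.
    partitions = [large[i::num_nodes] for i in range(num_nodes)]

    joined = []
    for partition in partitions:
        # Each node receives the whole small table and builds its own
        # in-memory hash table on the join key.
        table = defaultdict(list)
        for row in small:
            table[row[key]].append(row)

        # The node probes the hash table with its local slice only.
        for row in partition:
            for match in table.get(row[key], []):
                joined.append({**match, **row})
    return joined


regions = [
    {"region_id": 1, "region": "EU"},
    {"region_id": 2, "region": "US"},
]
orders = [{"order_id": i, "region_id": (i % 3) + 1} for i in range(6)]

joined = broadcast_join(regions, orders, "region_id", num_nodes=3)
```

Orders with `region_id` 3 have no matching region and drop out, so four of the six orders survive the join.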

2. Partitioned Hash Join

Both tables partitioned by join key
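
The idea behind a partitioned hash join is that rows with the same join key always land on the same node, so each node can join its own slices independently. The following Python sketch simulates that in one process; the helper names and the sample tables are hypothetical, and Python's built-in `hash` stands in for whatever partitioning function a real engine would use.

```python
def partition_by_key(rows, key, num_nodes):
    """Route every row to the node chosen by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def partitioned_hash_join(left, right, key, num_nodes):
    """Join two tables that are both partitioned on `key`."""
    left_parts = partition_by_key(left, key, num_nodes)
    right_parts = partition_by_key(right, key, num_nodes)

    joined = []
    # Node i only ever sees partition i of each table.
    for left_part, right_part in zip(left_parts, right_parts):
        table = {}
        for row in left_part:                      # build phase
            table.setdefault(row[key], []).append(row)
        for row in right_part:                     # probe phase
            for match in table.get(row[key], []):
                joined.append({**match, **row})
    return joined


customers = [{"customer_id": c, "name": f"c{c}"} for c in range(1, 5)]
orders = [{"order_id": o, "customer_id": o % 6} for o in range(10)]

joined = partitioned_hash_join(customers, orders, "customer_id", num_nodes=4)
```

Unlike the broadcast case, neither table is copied in full to every node, which is why this strategy suits joins where both inputs are large.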

3. Hybrid Join

Combination of both based on cost model
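
The post does not describe the cost model itself, so the following is only a toy illustration of how a cost-based choice between the two strategies could work. The cost formulas are assumptions made up for this sketch (rows moved over the network as the only cost), and the function name is hypothetical. A real hybrid strategy may also mix both approaches within one query, which this sketch does not attempt.

```python
def choose_join_strategy(small_rows: int, large_rows: int, num_nodes: int) -> str:
    """Pick the strategy that moves fewer rows, under a toy cost model."""
    # Assumption: broadcasting ships the small table to every node.
    broadcast_cost = small_rows * num_nodes
    # Assumption: repartitioning ships each row of both tables about once.
    partition_cost = small_rows + large_rows
    return "broadcast" if broadcast_cost < partition_cost else "partitioned"


# A tiny dimension table against a large fact table: broadcasting is cheap.
tiny_vs_huge = choose_join_strategy(1_000, 10_000_000, num_nodes=16)

# Two large tables: copying one of them 16 times costs more than a shuffle.
big_vs_big = choose_join_strategy(5_000_000, 10_000_000, num_nodes=16)
```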

HeatWave’s join engine uses:

Hash tables in RAM
Compressed data blocks
SIMD hash probes

3. Optimized Aggregations

HeatWave performs:

Local aggregation on each worker
Global merge on coordinator

This reduces network traffic significantly, because each worker sends compact partial results rather than raw rows.
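
The two-phase pattern can be sketched as follows. This is a single-process Python simulation with hypothetical names and generated sample data; each inner list plays the role of one worker's rows, and the partial state kept per group is a `[sum, count]` pair so that an average can be derived after the merge.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Phase 1, on each worker: reduce raw rows to one entry per group."""
    partial = defaultdict(lambda: [0, 0])          # group -> [sum, count]
    for group, amount in rows:
        partial[group][0] += amount
        partial[group][1] += 1
    return dict(partial)


def global_merge(partials):
    """Phase 2, on the coordinator: combine the workers' partial results."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (group_sum, group_count) in partial.items():
            total[group][0] += group_sum
            total[group][1] += group_count
    return {
        group: {"sum": s, "count": c, "avg": s / c}
        for group, (s, c) in total.items()
    }


groups = ["EU", "US", "APAC"]
# Three simulated workers with 1,000 rows each.
worker_rows = [
    [(groups[i % 3], i) for i in range(w * 1000, (w + 1) * 1000)]
    for w in range(3)
]

partials = [local_aggregate(rows) for rows in worker_rows]
merged = global_merge(partials)

raw_rows = sum(len(rows) for rows in worker_rows)          # 3000
entries_sent = sum(len(partial) for partial in partials)   # 9
```

In this simulation the coordinator receives 9 partial entries instead of 3,000 raw rows. Note that averages are merged through sums and counts; averaging the per-worker averages would give the wrong answer when workers hold different row counts.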

4. Bloom Filters

Bloom filters are used to prune rows that cannot match the join condition.
This reduces data movement and CPU load.
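
A minimal Bloom filter makes the pruning idea easier to see. The sketch below is a generic textbook version in Python, not HeatWave's implementation: the class name, the bit-array size, the number of hash functions, and the use of SHA-256 as the hash are all assumptions chosen for clarity.

```python
import hashlib


class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Build the filter from the join keys of the smaller (build) side.
build_keys = [3, 17, 42]
bloom = BloomFilter()
for k in build_keys:
    bloom.add(k)

# Prune the probe side before any rows are shipped or hashed.
probe_keys = list(range(1000))
survivors = [k for k in probe_keys if bloom.might_contain(k)]
```

Every key that really is on the build side passes the filter, so the join result stays correct. A few non-matching keys may slip through as false positives, and the join itself discards those.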

5. Indexing

HeatWave does not use InnoDB secondary indexes.
It builds:

Vectorized sort-merge structures
Hash-based filtering structures

These are built in memory, either during query execution or at load time.

Conclusion

HeatWave’s execution engine is highly optimized for analytics, offering near column-store performance inside MySQL without external ETL or BI pipelines.

About the Author

Mariusz Antonik
