Saturday 2 November 2024

Difference between Hash Aggregate and Stream Aggregate

Below are the differences between Hash Aggregate and Stream Aggregate:

| Sno | Feature | Hash Aggregate | Stream Aggregate |
|---|---|---|---|
| 1 | Input Requirement | Works on unsorted data. | Requires input that is sorted on the grouping columns. |
| 2 | Memory Usage | Uses more memory because it builds a hash table with one entry per group. Can spill to disk if the hash table outgrows the available memory. | Generally uses less memory because it processes rows in a streaming manner and only keeps the running aggregate for the current group. |
| 3 | Performance | Suitable for larger datasets where sorting is expensive or unnecessary. May be slower if data spills to disk. | More efficient when the data is pre-sorted, as it processes rows in a single pass without needing to buffer much data. |
| 4 | Scalability | Works well for large datasets that are not sorted, but memory usage is a concern. | Highly efficient for pre-sorted data and smaller datasets, but the cost of an added sort can dominate when the input is not already ordered. |
| 5 | Execution Plan | Builds a hash table in memory to group and aggregate rows. | Streams through the sorted input and emits a group's result as soon as it encounters a row that belongs to the next group. |
| 6 | Common Usage Scenarios | Used when sorting the input would be expensive or infeasible, e.g., large datasets with no useful indexes. | Used when the input data is already sorted or when there is an index that supports the required order. |
| 7 | Disk I/O | Can spill to disk if the hash table is too large to fit in memory, leading to slower performance. | Generally avoids disk I/O of its own since it processes rows one at a time, but if a sort has to be added to order the input, that sort adds cost. |
| 8 | Spill Risk | Higher risk of spilling to disk for large datasets with many distinct groups. | Lower risk of memory issues since it doesn't need to store much data at once. |
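
To make the comparison above more concrete, here is a minimal Python sketch of the two strategies for a query like SUM(value) GROUP BY key. It is only an illustration of the general idea, not how any particular database engine implements these operators. The function names, the (key, value) row shape, and the sample data are all assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple

# Hypothetical row shape for this sketch: (group key, value to sum).
Row = Tuple[Hashable, int]


def hash_aggregate(rows: Iterable[Row]) -> Dict[Hashable, int]:
    """Sum values per key using a hash table; input order does not matter."""
    totals: Dict[Hashable, int] = {}
    for key, value in rows:
        # One hash-table entry per distinct group, so memory grows with
        # the number of groups.
        totals[key] = totals.get(key, 0) + value
    return totals


def stream_aggregate(sorted_rows: Iterable[Row]) -> Iterator[Row]:
    """Sum values per key, assuming the input is already sorted by key."""
    rows = iter(sorted_rows)
    try:
        current_key, running = next(rows)
    except StopIteration:
        return  # empty input produces no groups
    for key, value in rows:
        if key == current_key:
            running += value
        else:
            # The key changed, so the previous group is complete.
            yield current_key, running
            current_key, running = key, value
    yield current_key, running


unsorted_rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

hash_result = hash_aggregate(unsorted_rows)
stream_result = list(stream_aggregate(sorted(unsorted_rows)))
# Feeding unsorted input to the streaming version splits groups apart.
fragmented = list(stream_aggregate(unsorted_rows))
```

The hash version holds every group in memory until the input is exhausted, while the streaming version only holds the current group. The `fragmented` result shows why the streaming approach depends on sorted input: without it, the same key is emitted more than once.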

 
