A multi-level caching architecture for stateful stream computation

Muhammed Tawfiqul Islam; Renata Borovica-Gajic; Shanika Karunasekera

doi:10.1145/3524860.3539803

What is it about?

Stream processing is used for real-time applications that deal with large volumes, velocities, and varieties of data. Stream processing frameworks discretize continuous data streams so that computations can be applied to smaller batches. For real-time stream-based data analytics algorithms, the intermediate states of computations might need to be retained in memory until the query is complete. These algorithms can therefore create a large surge in memory demand that must be met for them to run successfully. However, a worker/server node in a computing cluster may have limited memory capacity. In addition, multiple parallel processes might be running concurrently and sharing the primary memory. As a result, a streaming application might fail to run or complete due to a memory shortage. To mitigate this memory-limitation problem, we propose a scalable multi-level caching architecture to support the state management of complex streaming applications. We have developed a prototype API in Java that follows the proposed architecture and can be used as a caching library for developing stateful stream applications that benefit from in-memory caching. The caching API also supports various data structures for representing complex application states. It manages the application state and the cache levels transparently, so the application is unaware of real-time object evictions and loads between the different levels of memory.
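To make the idea of transparent evictions and loads more concrete, here is a minimal Java sketch of a two-level cache. It is not the prototype API from the paper: the class name `TwoLevelCache`, its methods, the least-recently-used eviction policy, and the use of a plain `HashMap` as a stand-in for a slower level are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-level cache for illustration; not the paper's API. */
class TwoLevelCache<K, V> {
    // Stand-in for a slower memory level (assumption: a real system might use
    // off-heap memory or disk here).
    private final Map<K, V> slowLevel = new HashMap<>();
    // Fast level, bounded in size, ordered from least to most recently used.
    private final LinkedHashMap<K, V> heapLevel;

    TwoLevelCache(final int heapCapacity) {
        // accessOrder = true makes get() and put() refresh an entry's position.
        this.heapLevel = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > heapCapacity) {
                    // Spill the least recently used entry instead of dropping it.
                    slowLevel.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        slowLevel.remove(key); // never keep a stale copy in the slow level
        heapLevel.put(key, value);
    }

    /** Returns the value from whichever level holds it, or null if absent. */
    V get(K key) {
        V value = heapLevel.get(key);
        if (value == null && slowLevel.containsKey(key)) {
            value = slowLevel.remove(key);
            heapLevel.put(key, value); // promote; may spill another entry
        }
        return value;
    }

    boolean inHeap(K key) { return heapLevel.containsKey(key); }

    boolean inSlowLevel(K key) { return slowLevel.containsKey(key); }

    public static void main(String[] args) {
        TwoLevelCache<String, Integer> cache = new TwoLevelCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3); // heap level is full, so "a" is spilled
        System.out.println(cache.inSlowLevel("a")); // prints true
        System.out.println(cache.get("a"));         // prints 1 ("a" is promoted)
        System.out.println(cache.inSlowLevel("b")); // prints true ("b" was spilled)
    }
}
```

The caller only ever uses `put` and `get`; which level currently holds an object is an internal detail, which is the sense in which the movement between levels is transparent to the application.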


Why is it important?

Stream computing has become popular due to the massive demand for analytics across various industries. When real-time streaming applications generate large amounts of data, that data needs to be processed quickly with a limited number of resources. However, for some applications, data need to be retained in main memory for longer because they may be needed in subsequent operations. This results in a memory-limitation problem that causes many stateful stream applications to either fail or suffer severe performance degradation. This work paves the way to tackle the problem by extending the available memory with a multi-level cache. Because a cache is employed, only frequently accessed objects are stored in the fast storage (the process heap), which makes the application faster. Only when the cache limit is reached are slower levels of memory used to back up objects, which can be loaded back into the process heap when needed.
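As a rough illustration of backing an object up to a slower level and loading it back into the heap, the Java sketch below writes serialized objects to temporary files. The summary does not say which storage the slower levels use, so local disk is an assumption here, as are the class name `DiskSpillStore`, its methods, and the choice of standard Java serialization.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical disk-backed level for illustration; not the paper's API. */
class DiskSpillStore {
    private final Path dir;
    private final Map<String, Path> files = new HashMap<>(); // key -> backing file
    private int nextId = 0;

    DiskSpillStore() {
        try {
            this.dir = Files.createTempDirectory("spill-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Serializes the value to a file so it no longer has to stay on the heap. */
    void backUp(String key, Serializable value) {
        Path file = files.get(key);
        if (file == null) {
            file = dir.resolve("obj-" + (nextId++) + ".bin");
            files.put(key, file);
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the value back into the heap and deletes its file; null if absent. */
    Object load(String key) {
        Path file = files.remove(key);
        if (file == null) {
            return null;
        }
        try {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                return in.readObject();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean contains(String key) { return files.containsKey(key); }

    public static void main(String[] args) {
        DiskSpillStore store = new DiskSpillStore();
        HashMap<String, Integer> counts = new HashMap<>();
        counts.put("clicks", 7);
        store.backUp("window-1", counts);            // state leaves the heap level
        Object restored = store.load("window-1");    // state is needed again
        System.out.println(counts.equals(restored)); // prints true
        System.out.println(store.contains("window-1")); // prints false
    }
}
```

Reading from disk is far slower than a heap lookup, which is why a design along these lines would only touch such a level once the heap cache limit has been reached.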

Perspectives

This paper tackles a critical problem: the memory limitation of stream computing applications. The proposed architecture and caching API can be used to develop scalable stateful streaming applications.
Dr Muhammed Tawfiqul Islam
University of Melbourne

This page is a summary of: A multi-level caching architecture for stateful stream computation, June 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3524860.3539803.

Resources

Video
DEBS 2022 Presentation
Paper Title: A Multi-level Caching Architecture for Stateful Stream Computation. Presented by Muhammed Tawfiqul Islam of the University of Melbourne at the 16th ACM Conference on Distributed and Event-based Systems (Copenhagen, Denmark), 2022.

Contributors

The following have contributed to this page

Dr Muhammed Tawfiqul Islam
University of Melbourne

Solving the memory limitation problem of stateful stream computing applications.