What is it about?
Stream processing is used in real-time applications that deal with high volumes, velocities, and varieties of data. Stream processing frameworks discretize continuous data streams so that computations can be applied to smaller batches. For real-time stream-based analytics algorithms, the intermediate states of computations may need to be retained in memory until the query completes, creating a massive surge in memory demand that must be satisfied for these algorithms to run successfully. However, a worker/server node in a computing cluster may have limited memory capacity, and multiple parallel processes may run on it concurrently, sharing the primary memory. As a result, a streaming application might fail to run or to complete due to a memory shortage.

To mitigate this memory-limitation problem, we proposed a scalable multi-level caching architecture to support the state management of complex streaming applications. Following this architecture, we developed a prototype API in Java that can be used as a caching library for building stateful stream applications that benefit from in-memory caching. The API also supports various data structures for representing complex application states, and it manages the application state and cache levels transparently, so the application is unaware of the real-time eviction and loading of objects between different levels of memory.
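The paper's actual API is not reproduced here, but the following minimal Java sketch illustrates the core idea of a transparent two-level cache. All names (TwoLevelCache, heapLevel, slowLevel) are hypothetical, and the slow level is modeled as an ordinary in-process map standing in for off-heap memory or disk:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative two-level cache: a bounded, access-ordered heap level
 * backed by a slower level. Here the slow level is just another
 * in-process map standing in for off-heap memory or disk.
 * All names are hypothetical, not the paper's actual API.
 */
public class TwoLevelCache<K, V> {
    private final Map<K, V> slowLevel = new HashMap<>(); // stand-in for a slower memory tier

    // Access order plus removeEldestEntry gives LRU-style demotion to the slow level.
    private final LinkedHashMap<K, V> heapLevel;

    public TwoLevelCache(int heapCapacity) {
        this.heapLevel = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > heapCapacity) {
                    slowLevel.put(eldest.getKey(), eldest.getValue()); // demote coldest object
                    return true; // evict it from the fast heap level
                }
                return false;
            }
        };
    }

    public void put(K key, V value) {
        slowLevel.remove(key);     // keep a single authoritative copy
        heapLevel.put(key, value); // may trigger a demotion
    }

    public V get(K key) {
        V value = heapLevel.get(key);
        if (value == null) {
            value = slowLevel.remove(key);
            if (value != null) {
                heapLevel.put(key, value); // transparently promote a hot object back
            }
        }
        return value;
    }
}
```

An access-ordered LinkedHashMap makes the least-recently-used object the natural candidate for demotion, so the fast heap level holds only recently used state while the application code sees a single map-like interface.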
Why is it important?
Stream computing has become popular due to the massive demand for analytics across various industries. Real-time streaming applications generate large amounts of data that must be processed quickly with limited resources. However, some applications need to retain data in main memory for longer because it may be needed by subsequent operations. This creates a memory-limitation problem that causes many stateful stream applications to either fail or suffer severe performance degradation. This work paves the way to tackling that problem by extending the available memory with a multi-level cache. With the cache in place, only frequently used objects are kept in fast storage (the process heap), keeping the application fast; only when the cache limit is reached are slower levels of memory used to back up objects, which can be loaded back into the process heap when needed, as the example below illustrates.
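To make the eviction/load behavior concrete, the sketch below drives the hypothetical TwoLevelCache from the previous example with a stateful word count, the kind of per-key state a streaming application would retain. The tiny capacity and the input stream are made-up values chosen only to force demotions:

```java
public class WordCountDemo {
    public static void main(String[] args) {
        // Deliberately tiny heap budget so evictions to the slow level actually happen.
        TwoLevelCache<String, Integer> counts = new TwoLevelCache<>(2);

        String[] stream = {"a", "b", "a", "c", "a", "b"};
        for (String word : stream) {
            Integer current = counts.get(word); // may transparently load from the slow level
            counts.put(word, current == null ? 1 : current + 1);
        }

        // "a" is accessed frequently, so it tends to stay in the fast heap level;
        // colder keys are demoted and reloaded only when accessed again.
        System.out.println("count(a) = " + counts.get("a")); // 3
        System.out.println("count(c) = " + counts.get("c")); // 1
    }
}
```

The application code never distinguishes between the memory levels: frequently touched keys stay on the process heap, while cold keys spill to the slower tier and are promoted back on access, which is the transparency the multi-level cache is meant to provide.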
Read the Original
This page is a summary of: A multi-level caching architecture for stateful stream computation, June 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3524860.3539803.