All Stories

  1. Accelerating Stream Processing Engines via Hardware Offloading
  2. Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
  3. KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models
  4. Scaling Up Memory Disaggregated Applications with SMART
  5. Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared Memory
  6. Falcon: Fast OLTP Engine for Persistent Cache and Non-Volatile Memory
  7. Efficiently Answering Path Queries on Evolving Graphs