All Stories

  1. Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures
  2. SpotVerse: Optimizing Bioinformatics Workflows with Multi-Region Spot Instances in Galaxy and Beyond
  3. SmartGraph: A Framework for Graph Processing in Computational Storage
  4. FAAStloop: Optimizing Loop-Based Applications for Serverless Computing
  5. Thorough Characterization and Analysis of Large Transformer Model Training At-Scale
  6. Minimizing Coherence Errors via Dynamic Decoupling
  7. Impact of Write-Allocate Elimination on Fujitsu A64FX
  8. MBFGraph: An SSD-based External Graph System for Evolving Graphs
  9. Hardware Support for Constant-Time Programming
  10. Architecture-Aware Currying
  11. Quantifying and Mitigating Cache Side Channel Leakage with Differential Set
  12. Optimizing CPU Performance for Recommendation Systems At-Scale
  13. EdgePC: Efficient Deep Learning Analytics for Point Clouds on Edge Devices
  14. Cypress
  15. Multi-resource fair allocation for consolidated flash-based caching systems
  16. Fine-Granular Computation and Data Layout Reorganization for Improving Locality
  17. An architecture interface and offload model for low-overhead, near-data, distributed accelerators
  18. Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping
  19. Pushing Point Cloud Compression to the Edge
  20. End-to-end Characterization of Game Streaming Applications on Mobile Platforms
  21. Memory Space Recycling
  22. Data Convection
  23. A Scheduling Framework for Decomposable Kernels on Energy Harvesting IoT Edge Nodes
  24. Kraken
  25. SpecSafe: detecting cache side channels in a speculative world
  26. Mix and Match: Reorganizing Tasks for Enhancing Data Locality
  27. Distance-in-time versus distance-in-space
  28. Fluid: a framework for approximate concurrency via controlled dependency relaxation
  29. Ghost Thread
  30. SplitServe
  31. Fifer
  32. Implications of Public Cloud Resource Heterogeneity for Inference Serving
  33. Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks
  34. Déjà View: Spatio-Temporal Compute Reuse for Energy-Efficient 360° VR Video Streaming
  35. Computing with Near Data
  36. Quantifying Data Locality in Dynamic Parallelism in GPUs
  37. Architecture-Aware Approximate Computing
  38. Co-optimizing memory-level parallelism and cache-level parallelism
  39. NEOFog
  40. ReveNAND
  41. Enhancing computation-to-core assignment with physical location information
  42. Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory
  43. A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
  44. Hardware-Software Co-design to Mitigate DRAM Refresh Overheads
  45. Exploiting Intra-Request Slack to Improve SSD Performance
  46. VIP
  47. A case for core-assisted bottleneck acceleration in GPUs
  48. Anatomy of GPU Memory System for Multi-Application Execution
  49. Optimizing off-chip accesses in multicores
  50. EECache
  51. Memory Row Reuse Distance and its Role in Optimizing Application Performance
  52. Network footprint reduction through data access and computation placement in NoC-based manycores
  53. TaPEr
  54. Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy
  55. Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications
  56. Trading cache hit rate for memory performance
  57. Orchestrated scheduling and prefetching for GPGPUs
  58. Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores
  59. Physically addressed queueing (PAQ)
  60. A compiler framework for extracting superword level parallelism