All Stories

  1. Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes
  2. GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure
  3. Using Additive Modifications in LU Factorization Instead of Pivoting
  4. A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
  5. Task-graph scheduling extensions for efficient synchronization and communication
  6. PLASMA
  7. Towards batched linear solvers on accelerated hardware platforms
  8. Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster
  9. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems
  10. Optimization for performance and energy for batched matrix computations on GPUs
  11. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting