All Stories

  1. GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure
  2. Using Additive Modifications in LU Factorization Instead of Pivoting
  3. A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
  4. Task-graph scheduling extensions for efficient synchronization and communication
  5. PLASMA
  6. Towards batched linear solvers on accelerated hardware platforms
  7. Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster
  8. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems
  9. Optimization for performance and energy for batched matrix computations on GPUs
  10. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting