Decoupling Compute and Memory for Async GPUs

8 points | by yiyingzhang 1 day ago

4 comments

  • bobbyzhu2008 1 day ago
    67% less kernel code is the more interesting number here — Hopper's async capabilities have been underutilized largely because the programming model is painful. Curious how it handles cases where compute and memory phases aren't cleanly separable.
  • jhap 1 day ago
    This seems like a better version of CUDA, for Hopper GPUs?