Graphics Programming Weekly 438


Cooperative Vectors Introduction

  • introduces Cooperative Vectors, a GPU API that enables hardware-accelerated vector–matrix multiplications inside shaders
  • explains the motivation from neural material and neural radiance caching use cases
  • covers practical inference using MulOptimal matrix layouts and training via OuterProductAccumulate and VectorAccumulate
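
As a rough CPU-side illustration of the math behind these operations (not the Cooperative Vectors API itself), the sketch below evaluates one small network layer as a matrix-vector product plus bias, and accumulates a weight gradient as an outer product. The function names `layer_forward` and `outer_product_accumulate` are hypothetical and exist only for this example; the ReLU activation is likewise an assumption.

```python
def layer_forward(weights, bias, x):
    """Return relu(W @ x + b) for a row-major weight matrix W."""
    out = []
    for row, b in zip(weights, bias):
        acc = b + sum(w * xi for w, xi in zip(row, x))
        out.append(max(0.0, acc))  # ReLU chosen for illustration only
    return out


def outer_product_accumulate(grad_weights, grad_out, x):
    """Add outer(grad_out, x) into grad_weights in place."""
    for i, g in enumerate(grad_out):
        for j, xi in enumerate(x):
            grad_weights[i][j] += g * xi


# Tiny example: a 2x2 layer applied to one input vector.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, 1.0]
x = [3.0, 1.0]
y = layer_forward(W, b, x)

# Weight gradient for one sample, given an upstream gradient.
grad_W = [[0.0, 0.0], [0.0, 0.0]]
outer_product_accumulate(grad_W, [1.0, 2.0], x)
```

On a GPU, each shader thread would evaluate such a layer for its own input vector; the sketch only shows the arithmetic that the hardware path accelerates.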


CuRast: Cuda-Based Software Rasterization for Billions of Triangles

  • a CUDA-based software rasterizer capable of rendering datasets with very large triangle counts without precomputed LODs or acceleration structures
  • uses a 3-stage pipeline: small triangles are rasterized by one thread, larger triangles by one warp, and the largest triangles are split and queued for a third stage
  • presents performance comparisons against Vulkan-based hardware rasterization
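
The size-based routing in the second bullet can be sketched as a simple classifier. This is a minimal illustration, assuming triangle size is measured by the pixel area of the screen-space bounding box; that criterion and both thresholds are placeholders and are not taken from the paper.

```python
import math
from enum import Enum


class Stage(Enum):
    THREAD = 1  # small: one thread rasterizes the whole triangle
    WARP = 2    # larger: one warp shares the triangle
    SPLIT = 3   # largest: split and queued for a third stage


# Hypothetical thresholds in bounding-box pixels; not from the paper.
SMALL_MAX_PIXELS = 32
WARP_MAX_PIXELS = 4096


def bbox_pixels(tri):
    """Pixel area of the screen-space bounding box of a triangle."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    width = math.floor(max(xs)) - math.floor(min(xs)) + 1
    height = math.floor(max(ys)) - math.floor(min(ys)) + 1
    return width * height


def classify(tri):
    """Pick the pipeline stage for a triangle given as three (x, y) points."""
    area = bbox_pixels(tri)
    if area <= SMALL_MAX_PIXELS:
        return Stage.THREAD
    if area <= WARP_MAX_PIXELS:
        return Stage.WARP
    return Stage.SPLIT
```

For example, `classify([(0, 0), (3, 0), (0, 3)])` covers a 4x4 bounding box and lands in the one-thread stage under these assumed thresholds.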


Micropolygon Rendering in Anvil

  • presentation about the micropolygon system implemented in Anvil
  • presents the whole pipeline from data preparation, through culling, to the actual rasterization step
  • shows how the system integrates with streaming and which optimizations were made across the system
  • additionally presents the challenges of getting the system to run on Switch 2


EA Sports F1 25: Path Tracing at 200mph

  • the presentation discusses the challenges of the path tracing mode in F1 25
  • discusses the combination of ReSTIR and ReGIR used
  • focuses on the challenges of night races with a large number of dynamic lights
  • additionally presents a look at the debug and development tools developed for the process


All Rays Lead to Rome: Next-Gen Graphics in Anno 117: Pax Romana

  • the talk discusses the rendering engine used for Anno 117
  • covers how objects are built from individual subobjects layered together, how meshes are adapted to the terrain, and interactions with the LOD system
  • covers procedural grass generation
  • presents the approach to ray tracing integration


How We Draw a 3D Sprite World: The Stylized Art of Never's End

  • talk from the technical artist summit track that focuses on rendering a 3D world with a stylized 2D sprite look
  • presents how to apply outline rendering, stylized shading, sky, and clouds
  • additionally presents challenges related to pixel snapping, depth sorting, and shadows


Metal Lossy Compression Format

  • reverse engineers Apple’s undocumented Metal Lossy Compression format
  • describes the memory layout: variable-size blocks packed within 128-byte tiles, with per-block metadata
  • details the multiple encoding modes supported


An interpreter for HLSL that can run shader code on the CPU.

  • an experimental C# library that interprets HLSL shader code on the CPU, enabling CPU-side execution of shaders
  • includes a built-in shader testing framework with HLSL-native test support
  • supports the majority of HLSL features, including wave intrinsics, divergent control flow, groupshared memory, and texture types


Single-pass palette refinement and ordered dithering

  • investigates using Bayer matrix traversal as the pixel visitation order in online k-means color quantization, resulting in an ordered dithering pattern
  • shows that omitting the final Euclidean pixel mapping in Bayer traversal creates visible dither patterns without extra dithering steps
  • shows that shuffling the block order within the Bayer matrix combines structure and randomness, yielding cleaner dithering than pure raster or random methods
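
To make the traversal idea concrete, here is a minimal sketch that builds a Bayer index matrix with the standard recursive construction, derives a pixel visitation order from it, and applies one online k-means update. The running-mean update rule and all function names are assumptions for illustration; the article's exact update and shuffling scheme may differ.

```python
def bayer_matrix(n):
    """Return the 2**n x 2**n Bayer index matrix as nested lists."""
    m = [[0]]
    for _ in range(n):
        size = len(m)
        nxt = [[0] * (2 * size) for _ in range(2 * size)]
        for y in range(size):
            for x in range(size):
                v = 4 * m[y][x]
                nxt[y][x] = v
                nxt[y][x + size] = v + 2
                nxt[y + size][x] = v + 3
                nxt[y + size][x + size] = v + 1
        m = nxt
    return m


def bayer_visit_order(n):
    """List the (x, y) cells of one tile, sorted by Bayer index."""
    m = bayer_matrix(n)
    cells = [(m[y][x], x, y) for y in range(len(m)) for x in range(len(m))]
    return [(x, y) for _, x, y in sorted(cells)]


def online_kmeans_step(palette, counts, pixel):
    """Move the nearest palette entry toward pixel (running mean)."""
    k = min(range(len(palette)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(palette[i], pixel)))
    counts[k] += 1
    rate = 1.0 / counts[k]  # assumed learning rate: running average
    palette[k] = [c + rate * (p - c) for c, p in zip(palette[k], pixel)]
    return k
```

Visiting pixels in `bayer_visit_order` instead of raster order spreads consecutive samples across the tile, which is the property the bullets above attribute to Bayer traversal.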


[video] Efficient Bloom with Gaussian Blur in OpenGL

  • presents how to implement bloom using OpenGL
  • explains the theory and shows the practical implementation walkthrough
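
For readers who want the core idea without the OpenGL setup, the following CPU-side sketch shows a common way to structure bloom: a bright pass, a separable Gaussian blur (horizontal, then vertical), and an additive combine. It assumes a single-channel image stored as nested lists and clamp-to-edge sampling; the video's shaders may be organized differently.

```python
import math


def gaussian_weights(radius, sigma):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def blur_1d(row, weights):
    """Blur one row, clamping reads at the edges."""
    radius = len(weights) // 2
    last = len(row) - 1
    return [sum(w * row[min(max(i + k - radius, 0), last)]
                for k, w in enumerate(weights))
            for i in range(len(row))]


def bloom(image, threshold, weights):
    """Bright pass, horizontal blur, vertical blur, then add back."""
    bright = [[v if v > threshold else 0.0 for v in row] for row in image]
    horiz = [blur_1d(row, weights) for row in bright]
    cols = [blur_1d(list(col), weights) for col in zip(*horiz)]
    blurred = [list(row) for row in zip(*cols)]
    return [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(image, blurred)]
```

Splitting the blur into two 1D passes reduces the per-pixel cost from (2r + 1)^2 samples to 2(2r + 1), which is the usual motivation for the two-pass approach.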


Thanks to Jasper Bekkers for support of this series.


Would you like to see your name here too? Become a patron of this series.