Graphics Programming Weekly 444


Multi-Layer Reservoir Splatting for Temporal Reuse under Disocclusion

  • introduces a multi-layer ReSTIR variant that preserves the sample history of surfaces while they are occluded
  • uses reservoir splatting and layered depth ranges to move samples between screen-space layers without tracing extra rays
  • demonstrates much lower disocclusion noise for temporally reused samples with only a small extra performance cost


Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization

  • presents KernelPro, an LLM-guided optimization system that uses profiling tools for feedback collection
  • combines bottleneck classification with kernel-level, instruction-level, and system-level profiling to generate precise tuning guidance
  • presents results for performance and energy efficiency


Adaptive Catmull-Clark Subdivision with Compute Tessellation

  • explains a compute-shader-based Catmull-Clark tessellation pipeline that adapts subdivision density to feature and screen-space needs
  • uses analytic bicubic B-spline evaluation for regular patches and a compact irregular-region scheme to avoid expensive quadtree traversal
  • reports visually similar results
  • presents an analysis of performance, quality, and mesh authoring tradeoffs


[video] Smooth-Maximum, the most useful function

  • explains how the maximum of two values can be expressed as a center point plus a distance term
  • derives smooth-maximum by replacing the absolute value with a square-root expression and adding a bias parameter for softer blending
  • demonstrates using smooth-maximum to patch and sculpt procedural surfaces, enabling natural transitions and rounded polyhedral forms
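
A minimal Python sketch of the idea summarized above, not code from the video: the hard maximum is written as the midpoint plus half the distance, then the absolute value is swapped for a square root with a bias added under the root. The function names and the exact placement of `bias` are assumptions made for illustration.

```python
import math


def hard_max(a, b):
    # Center point of the two values plus half the distance between them.
    return 0.5 * (a + b) + 0.5 * abs(a - b)


def smooth_max(a, b, bias=0.0):
    # abs(d) equals sqrt(d * d); adding a positive bias under the root
    # rounds off the corner where a == b (assumed form of the bias term).
    d = a - b
    return 0.5 * (a + b) + 0.5 * math.sqrt(d * d + bias)
```

With `bias=0.0` the result matches the hard maximum; larger values give a softer blend that always lies at or above the true maximum.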


Neural Texture Compression using Hypernetworks

  • proposes a hypernetwork that generates both latent texture features and the decoder MLP weights for neural texture compression
  • avoids per-material gradient descent tuning by learning a single model capable of producing high-quality compressed texture decoders
  • shows that the same network architecture can be used for texture upscaling


Procedural UV Derivatives Evaluation in SORT Renderer

  • explains why arbitrary procedural texture coordinates break traditional derivative pipelines in an offline renderer with a texture cache
  • builds a solution in SORT’s custom SSL shading language using forward-mode differentiation for screen-space UV partials
  • details how the renderer handles derivatives for hit points, function calls and control flow
  • validates the costs with compile-time and runtime measurements
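
As a rough illustration of forward-mode differentiation for screen-space UV partials, here is a small Python sketch using dual numbers that carry d/dx and d/dy alongside each value. It is not SORT's implementation or its shading language; `Dual2`, `dsin`, and `procedural_uv` are hypothetical names, and only addition, multiplication, and sine are covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual2:
    v: float          # value
    dx: float = 0.0   # partial derivative along screen x
    dy: float = 0.0   # partial derivative along screen y

    def __add__(self, o):
        o = _lift(o)
        return Dual2(self.v + o.v, self.dx + o.dx, self.dy + o.dy)

    __radd__ = __add__

    def __mul__(self, o):
        # Product rule applied to both partials.
        o = _lift(o)
        return Dual2(self.v * o.v,
                     self.dx * o.v + self.v * o.dx,
                     self.dy * o.v + self.v * o.dy)

    __rmul__ = __mul__


def _lift(x):
    # Plain numbers are constants, so their derivatives are zero.
    return x if isinstance(x, Dual2) else Dual2(float(x))


def dsin(a):
    # Chain rule: d(sin a) = cos(a) * da.
    c = math.cos(a.v)
    return Dual2(math.sin(a.v), c * a.dx, c * a.dy)


def procedural_uv(s, t):
    # Hypothetical procedural mapping; the partials come out automatically.
    u = s * s + 0.5 * t
    v = dsin(s * t)
    return u, v
```

Seeding `s = Dual2(2.0, 1.0, 0.0)` and `t = Dual2(3.0, 0.0, 1.0)` treats `s` as varying along screen x and `t` along screen y; the returned `u` and `v` then carry the partials a filtered texture lookup would need.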


[video] Writing GPU shaders in plain Rust (Firestar99 - Sebastian Sydow at RustWeek)

  • surveys Rust-GPU as a path to write shaders in plain Rust, compiling to SPIR-V for Vulkan and potentially WebGPU
  • covers shared data layouts via repr(C) types, no_std execution for GPU runtime constraints, and pointer emulation for Vulkan’s logical pointers


Computing Camera Rays

  • derives robust camera ray generation from world-to-clip matrices using clip-space plane intersections rather than far-plane subtraction
  • identifies numerical cancellation issues and resolves them with cross products of transformed clip-space plane normals
  • includes optimized GLSL shader code, support for far-plane ray length, and notes on Direct3D/HLSL conventions and renderer integration
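
The plane-intersection idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's GLSL: the matrix is taken to be row-major and applied to column vectors, clip-space w is assumed to grow along the viewing direction, and `ray_direction` is a hypothetical helper. A point projects to NDC (x, y) exactly when it lies on the two world-space planes `row_x - x * row_w` and `row_y - y * row_w`, so the ray direction is the cross product of their normals.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ray_direction(world_to_clip, ndc_x, ndc_y):
    row_x, row_y, _, row_w = world_to_clip
    # World-space planes on which clip.x == ndc_x * clip.w and
    # clip.y == ndc_y * clip.w respectively.
    plane_x = [row_x[i] - ndc_x * row_w[i] for i in range(4)]
    plane_y = [row_y[i] - ndc_y * row_w[i] for i in range(4)]
    # The ray lies in both planes, so it is perpendicular to both normals.
    d = cross(plane_x[:3], plane_y[:3])
    # Orient the ray so that clip-space w increases along it (assumption).
    if dot(d, row_w[:3]) < 0.0:
        d = tuple(-c for c in d)
    length = math.sqrt(dot(d, d))
    return tuple(c / length for c in d)
```

The direction uses only the first three columns of the matrix and never subtracts two nearly equal points.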


New AMD Radeon Developer Tool Suite update brings shader source code, Extended PIX Markers, and command-line capture

  • summarizes the updates across the Radeon Developer Tool Suite (RDTS)
  • notes that Radeon GPU Profiler adds DirectX 12 HLSL shader source display, average active lane divergence metrics, and Extended PIX Marker support with shader hash filtering


Thanks to Aras Pranckevičius for support of this series.


Would you like to see your name here too? Become a patron of this series.