Marlin CFD

Device-Independent Fourier Spectral & LBM Solver

C++20 MPI + LibTorch Multi-GPU

Formula 1 Aerodynamics Case Study

Extreme-Scale Turbulence

Simulating the chaotic wake of a Formula 1 car requires resolving eddies down to the sub-millimeter scale. Using the Smagorinsky Large Eddy Simulation (LES) model, Marlin captures the transient nature of flow separation and reattachment.

The solver runs natively on NVIDIA A100 GPUs via LibTorch tensors, with no hand-written CUDA kernels.

Run Statistics

  • Grid Size: 36.5 Million Cells
  • Compute: 1x NVIDIA A100
  • Memory: 38 GB VRAM
  • Drag Coeff: 0.933

Nuclear Safety

Coupled Thermo-Hydraulics in Containment Domes

[Figure panels: Temperature Field, Vorticity Field, Dome Geometry]

Thermal Plumes & Vortex Shedding

Inside a reactor containment dome, heat sources generate massive thermal plumes that drive complex circulation patterns. We use a Double Distribution Function (DDF) LBM approach: one distribution for fluid momentum ($f$) and another for internal energy ($g$).

Key Metrics

  • DOFs: 1.1 Billion
  • Simulated Time: 101 Mins
  • Output Size: 282 GB
  • Steps: 100,000

01. Technical Overview

The Methodology

Marlin solves the discretized Lattice Boltzmann Equation (LBE) using a tensor-based approach. Instead of looping over cells one at a time, the pattern that usually accompanies an Array-of-Structures layout, we store each population as a grid-sized tensor (Structure-of-Arrays) and express collision and streaming as whole-tensor and matrix operations.

$$ f_\alpha(\mathbf{x} + \mathbf{e}_\alpha \delta t, t + \delta t) = f_\alpha^{eq} + \tilde{f}_\alpha - \sum_\beta \left( \mathbf{M}^{-1} \mathbf{S} \mathbf{M} \right)_{\alpha, \beta} \tilde{f}_\beta. $$
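A quick consistency check on this operator, assuming $\tilde{f}_\alpha = f_\alpha - f_\alpha^{eq}$ denotes the non-equilibrium part (the page does not define it explicitly), $\mathbf{M}$ maps populations to moments, and $\mathbf{S}$ holds the relaxation rates: if every rate takes the same value $\omega$, then $\mathbf{S} = \omega \mathbf{I}$ and the multiple-relaxation-time form collapses to the familiar single-relaxation-time (BGK) update.

$$ \mathbf{M}^{-1} (\omega \mathbf{I}) \mathbf{M} = \omega \mathbf{I} \quad \Rightarrow \quad f_\alpha(\mathbf{x} + \mathbf{e}_\alpha \delta t, t + \delta t) = f_\alpha^{eq} + (1 - \omega) \tilde{f}_\alpha = f_\alpha - \omega \left( f_\alpha - f_\alpha^{eq} \right). $$

Choosing different rates on the diagonal of $\mathbf{S}$ relaxes each moment separately, which is what distinguishes the general form from BGK.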

Core Capabilities

  • Mid-grid Bounceback Boundaries
  • Smagorinsky & Vreman Turbulence Models
  • D2Q9, D3Q19, D3Q27 Stencils
  • Porous Media Flow

Hardware Agnostic

By targeting the LibTorch C++ API, the same Marlin source code runs on every backend that PyTorch supports:

  • NVIDIA CUDA
  • AMD ROCm
  • Intel XPU
  • CPU AVX512

What's Next?

Multiphase flow, reactive flow, and compressible regimes.