A monolithic three-tier fat tree buys the lowest switch count at every scale, because a folded Clos is mathematically minimal in switches. It then pays that saving back many times over in super-spine transceivers. The Fordist model replicates self-contained Dragonfly+ pods linked by passive, co-located DAC: more switches, but far fewer optics, zero east-west storage contention, and lower latency.
Prices are street estimates.
| Line item | Fat tree qty | Fordist (DF+) qty | Fat tree cost ($) | Fordist cost ($) | Winner |
|---|---|---|---|---|---|
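To see why the switch-minimal fat tree ends up optics-heavy, here is a rough bill-of-materials sketch for a non-blocking three-tier fat tree built from radix-k switches. It assumes every inter-switch link is optical with one transceiver per link end (ignoring twin-port OSFP modules and any in-rack DAC runs) and ignores pod-boundary rounding. `fat_tree_bom` and `FatTreeBOM` are hypothetical names, not part of any tool, and the sketch does not model the Fordist side.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeBOM:
    """Hypothetical bill of materials for a non-blocking 3-tier fat tree."""
    leaves: int
    spines: int
    super_spines: int
    leaf_spine_links: int
    spine_super_links: int

    @property
    def switches(self) -> int:
        return self.leaves + self.spines + self.super_spines

    @property
    def transceivers(self) -> int:
        # Assumption: every inter-switch link is optical, one transceiver per end.
        return 2 * (self.leaf_spine_links + self.spine_super_links)


def fat_tree_bom(endpoints: int, radix: int) -> FatTreeBOM:
    """Size a non-blocking 3-tier fat tree (folded Clos) for `endpoints` ports.

    Leaves and spines split their ports half down / half up; super-spines
    point all of their ports down. Pod-boundary rounding is ignored.
    """
    if radix % 2:
        raise ValueError("radix must be even")
    half = radix // 2
    if endpoints > radix ** 3 // 4:
        raise ValueError("exceeds the capacity of a 3-tier fat tree")
    leaves = math.ceil(endpoints / half)
    leaf_spine_links = leaves * half            # every leaf uplink is cabled
    spines = math.ceil(leaf_spine_links / half)
    spine_super_links = spines * half           # every spine uplink is cabled
    super_spines = math.ceil(spine_super_links / radix)
    return FatTreeBOM(leaves, spines, super_spines, leaf_spine_links, spine_super_links)


bom = fat_tree_bom(endpoints=16, radix=4)
print(bom.switches, bom.transceivers)  # 20 64
```

The k=4 case reproduces the textbook 5k²/4 = 20 switches, while every one of the 32 inter-switch links still needs optics at both ends. Swapping in real radix, endpoint count, and unit prices gives the same shape of trade-off the table above compares.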
VAST CNodes must terminate somewhere, and the comparison above already equalizes the storage optics on both sides. Where the monolith lands that storage, however, is a forced choice between two penalties, neither of which the Fordist design pays.
Monolith, converged: CNodes hang off the compute leaves, so synchronous, bursty checkpoint and dataset-staging I/O crosses the same super-spine core that carries the east-west all-reduce collectives. In a non-blocking three-level fat tree with 72 GPUs per rack on Q3400-RA leaves, this is not even possible: every leaf port is already in use, half facing the GPUs and half facing the spines.
→ tail-latency spikes, depressed MFU, idle GPU-hours

Monolith, separate storage fabric: stand up an entirely parallel network with its own VXLAN overlay, routing convention, and lossless PFC/ECN tuning, plus a second control plane that Slurm and Kubernetes must reconcile against the compute fabric.
→ another fabric, another failure domain, huge capex, permanent opex tax

Fordist: storage terminates inside the same InfiniBand fabric, on a separate DF+ group that doubles as the UGAL pressure-relief path. It soaks up otherwise idle bandwidth without contending with east-west training collectives.
→ one fabric, one convention, no east-west storm
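The "all leaf ports are full" claim for the converged option is simple port arithmetic. The sketch below assumes one leaf per rack, one leaf port per GPU, and a 144-port Q3400-RA leaf; the `LeafPortBudget` class and its `fits_storage` rule (each CNode port is another downlink and so needs its own uplink) are hypothetical illustrations, not vendor tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafPortBudget:
    """Port budget of one non-blocking fat-tree leaf (hypothetical helper)."""
    radix: int            # physical ports on the leaf switch
    gpu_downlinks: int    # assumption: one leaf port per GPU in the rack

    @property
    def uplinks(self) -> int:
        # Non-blocking: every downlink needs a matching uplink to the spine tier.
        return self.gpu_downlinks

    @property
    def free_ports(self) -> int:
        return self.radix - self.gpu_downlinks - self.uplinks

    def fits_storage(self, cnode_ports: int) -> bool:
        # A CNode port is one more downlink, so it also consumes one more
        # uplink if the leaf is to stay non-blocking.
        return self.free_ports >= 2 * cnode_ports


# Assumption: 144-port Q3400-RA leaf serving a 72-GPU rack.
leaf = LeafPortBudget(radix=144, gpu_downlinks=72)
print(leaf.uplinks, leaf.free_ports, leaf.fits_storage(1))  # 72 0 False
```

With 72 downlinks and 72 matching uplinks the leaf has no port left, so even a single CNode cannot land there without either oversubscribing the leaf or removing GPUs from the rack.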
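UGAL (Universal Globally-Adaptive Load-balanced routing), as proposed for Dragonfly-style topologies, sends a packet along the minimal path unless that path looks congested, in which case it detours through an intermediate group. The sketch below shows the textbook local-queue variant of that decision; in the Fordist layout the intermediate group would be the storage DF+ group. `PathEstimate`, `ugal_route`, and the hop counts and queue depths are hypothetical illustrations, and InfiniBand adaptive routing in switch firmware may weigh congestion differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathEstimate:
    hops: int          # switch hops along the candidate path
    queue_depth: int   # local occupancy of the path's first output queue


def ugal_route(minimal: PathEstimate, nonminimal: PathEstimate, threshold: int = 0) -> str:
    """Textbook UGAL-L rule: stay minimal unless its estimated delay
    (queue depth x hop count) exceeds the non-minimal path's by more than
    `threshold`."""
    minimal_cost = minimal.queue_depth * minimal.hops
    detour_cost = nonminimal.queue_depth * nonminimal.hops
    if minimal_cost <= detour_cost + threshold:
        return "minimal"
    return "nonminimal"


# Illustrative numbers only: a congested 3-hop minimal path versus a
# 5-hop detour through a lightly loaded intermediate (storage) group.
print(ugal_route(PathEstimate(hops=3, queue_depth=10), PathEstimate(hops=5, queue_depth=1)))  # nonminimal
```

The threshold biases traffic toward minimal paths, so packets only detour through the relief group under sustained congestion; raising it to 30 in the example above keeps that packet on the minimal path.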