Replicating computer-simulated science is harder than you think, as a group of aerospace researchers from George Washington University have found.
In fact, without decent versioning, documentation, publication of data and rigorous evidence standards, simulations that attract headlines in both academic and general media should probably be consigned to the lamented Journal of Irreproducible Results.
Olivier Mesnard and Lorena Barba of the university's mechanical and aerospace engineering school have just spent three years trying to replicate computational fluid dynamics (CFD) results first published in 2014, and have published their travails on arXiv.
CFD is vital to all kinds of engineering: as well as aeronautics, it helps make submarines silent and F1 cars faster.
In their original research, Mesnard and Barba wrote their own CFD software to model the aerodynamics of flying snakes.
Their next step, meant to make it easier for other researchers to replicate the results, was a replication study using other packages: the free OpenFOAM fluid modeller; the also-open IBAMR project from New York University; and a rewrite of their own code using the PETSc library for parallelism.
That's where things got painful, they write: getting to where they expected to be took “three years of dedicated work that encountered a dozen ways that things can go wrong, conquered one after another, to arrive finally at (approximately) the same findings and a whole new understanding of what it means to do ‘reproducible research’ in computational fluid dynamics.”
They turned up “vexing and unnerving” challenges in setting up their CFD runs on different programs, and “unexpected tricks of the trade” in software that only the authors of a project knew about.
Even more mystifying, the boffins write, their own software delivered different results, depending on whether it was running on GPUs (using the Cusp library) or parallel CPUs (using the PETSc library).
If you're not yet sympathetic with the researchers, there's one more wrinkle. With reproducibility in mind when they conducted their original study, the pair “adopted a set of practices years ago to make our research reproducible”. That included keeping their own software under version control and publishing their data, not just the paper, under an MIT license.
However, mere library version changes (for example, in CUDA and Cusp) produced different final results that needed a lot of work to fix.
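One common defence against this kind of drift is to store a record of the software environment next to every set of results. The sketch below is illustrative only and is not the researchers' own tooling: the function name `environment_manifest` and the package names passed to it are placeholders, and C++ libraries such as CUDA and Cusp would not show up in Python package metadata, so their versions would have to be captured some other way.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest(packages):
    """Collect interpreter, OS and package versions for one run.

    `packages` is a list of distribution names chosen by the caller;
    the names used below are placeholders, not the study's dependencies.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


# Hypothetical usage: write this out alongside the simulation output.
manifest = environment_manifest(["numpy", "scipy"])
print(json.dumps(manifest, indent=2, sort_keys=True))
```

A manifest like this does not prevent results from changing, but it makes it possible to tell afterwards which library, compiler or operating system version a given result came from.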
Different hardware, a newer operating system, and a newer compiler can all wreck reproducibility, they write: “In an iterative linear solver, any of these things could be related to lack of floating-point reproducibility. And in unsteady fluid dynamics, small floating-point differences can add up over thousands of time steps to eventually trigger a flow instability (like vortex merging).”
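Both halves of that claim can be seen in a toy example. The Python sketch below is not a CFD solver: it first shows that floating-point addition gives different answers depending on grouping (which is what a different library, compiler or parallel reduction order can change), then uses the chaotic logistic map as a stand-in for an unsteady flow to show how a tiny initial difference grows over repeated steps. The function name `max_gap`, the starting value and the perturbation size are arbitrary choices for illustration.

```python
# Part 1: floating-point addition is not associative, so summing the
# same numbers in a different order can give a different result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def max_gap(x0, eps, steps):
    """Iterate the logistic map x -> 4x(1-x) from x0 and from x0 + eps,
    and return the largest gap seen between the two trajectories.

    The logistic map is only a stand-in for a time-stepping simulation.
    """
    a, b = x0, x0 + eps
    largest = abs(a - b)
    for _ in range(steps):
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        largest = max(largest, abs(a - b))
    return largest


# Part 2: a perturbation of 1e-15 grows to order one within a few
# hundred steps of a chaotic iteration.
print(max_gap(0.3, 1e-15, 200))
```

The perturbation here is introduced by hand; in a real solver it would come from rounding differences like the one in the first part, accumulated inside the linear solver at every time step.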
As they note, “computational science and engineering lacks an accepted standard of evidence”.
“When large simulations run on specific hardware with one-off compute allocations, they are unlikely to be reproduced. In this case, it is even more important that researchers advance towards these HPC applications on a solid progression of fully reproducible research at the smaller scales.” ®