I was given poorly performing serial code for a Lattice-Boltzmann fluid dynamics computation. I improved its performance significantly by applying serial optimisations in C, tuning compiler options, and parallelising the code, first across shared-memory CPUs with OpenMP and then across distributed-memory CPUs with MPI.
I also wrote a 2,000-word report on the OpenMP section, describing the optimisations I made, explaining why they worked, and analysing the performance with tools such as Intel Advisor.
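To illustrate the kind of shared-memory parallelisation used in the OpenMP stage, here is a minimal sketch of a reduction over lattice cells. It is not the project's actual code: the `cell_t` layout, the nine-speed (D2Q9) assumption, and the `total_density` function are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell layout: one value per lattice direction (assumes D2Q9). */
#define NSPEEDS 9

typedef struct {
    float speeds[NSPEEDS];
} cell_t;

/* Sum the density over all cells.
 * Loop iterations are independent, so OpenMP can split them across threads.
 * reduction(+:total) gives each thread a private partial sum and combines
 * the partial sums at the end, which avoids a data race on total. */
double total_density(const cell_t *cells, size_t ncells)
{
    assert(cells != NULL || ncells == 0);
    double total = 0.0;

    #pragma omp parallel for reduction(+:total) schedule(static)
    for (long i = 0; i < (long)ncells; i++) {
        double local = 0.0;
        for (int k = 0; k < NSPEEDS; k++) {
            local += cells[i].speeds[k];
        }
        total += local;
    }
    return total;
}
```

With GCC this would be compiled with `-fopenmp`; without that flag the pragma is ignored and the loop simply runs serially, which makes it easy to compare the two versions for correctness.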