Wednesday, October 23, 2013

Project proposal

For the semester project, I intend to create a tool for measuring performance stability, built on the core of the 'particles' code developed in class and improved over the course of the semester.

The tool will run an automated set of tests across the nodes of a heterogeneous cluster. Using Open MPI and OpenMP, it will vary both the number of processes and the number of threads per process on each node. It will perform CPU configuration discovery on each node (using, at a minimum, /proc/cpuinfo) to maximize utility and minimize the user input required. It will measure the variability of the timing results from these tests to provide a visual representation of the stability and overall efficiency of the nodes in the cluster. The tool will also exchange MPI messages between nodes to measure the basic communication latencies of the cluster.
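As a rough illustration of the CPU discovery step, here is a minimal sketch that summarizes the contents of /proc/cpuinfo. The function name parse_cpuinfo and the returned keys are hypothetical, and the field names it looks for ("processor", "physical id", "core id", "model name") are the ones typically reported on x86 Linux; other architectures may omit some of them, in which case the sketch falls back to the logical CPU count.

```python
def parse_cpuinfo(text):
    """Summarize /proc/cpuinfo-style text (hypothetical helper).

    Assumes x86 Linux field names; returns counts of logical CPUs,
    sockets, and physical cores, plus the first model name seen.
    """
    logical = 0
    sockets = set()
    cores = set()
    model = None
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
            physical_id = None  # a new processor entry starts here
        elif key == "model name" and model is None:
            model = value
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            # A physical core is identified by (socket, core id).
            cores.add((physical_id, value))
    return {
        "logical_cpus": logical,
        "sockets": len(sockets) if sockets else 1,
        # Fall back to the logical count when core ids are not reported.
        "physical_cores": len(cores) if cores else logical,
        "model": model,
    }


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read and summarize the local node's CPU information."""
    with open(path) as handle:
        return parse_cpuinfo(handle.read())
```

On a node with hyperthreading, logical_cpus will exceed physical_cores, which is one way the tool could decide how many threads per process are worth testing.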

The tool will run jobs of various target durations, halving from ~32 seconds (~32, ~16, ~8, ~4, and so on) down to ~0.0625 seconds, so as to analyze the behavior of systems with both stable and unstable job timings.
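The halving sequence of durations can be sketched as follows. The function names and the idea of calibrating with a measured per-step time are assumptions made for illustration; the proposal does not say how the job lengths will be set.

```python
def target_durations(longest=32.0, shortest=0.0625):
    """Return target job lengths in seconds, halving from longest to shortest."""
    durations = []
    current = longest
    while current >= shortest:
        durations.append(current)
        current /= 2.0
    return durations


def steps_for(target_seconds, seconds_per_step):
    """Estimate how many simulation steps approximate a target duration.

    seconds_per_step is assumed to come from a short calibration run
    (hypothetical; not described in the proposal).
    """
    return max(1, int(round(target_seconds / seconds_per_step)))
```

With the defaults, target_durations() yields ten lengths, from 32 seconds down to 0.0625 seconds.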

The tool will output, at a minimum, the mean, minimum, maximum, standard deviation, and variance of the timings for all configurations. The tool will also attempt to fit the performance curves to various standard functions, and generate graphs and statistics to indicate the closeness of the fit for those models.
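A minimal sketch of the summary statistics and of one possible goodness-of-fit measure is shown below. It uses plain Python so that it stands alone; numpy and pandas provide equivalent routines. The function names are hypothetical, the choice of sample variance (n - 1 denominator) is an assumption, and a straight-line least-squares fit with R^2 is just one candidate among the "standard functions" the tool might try.

```python
import math


def summarize(timings):
    """Return mean, min, max, variance, and standard deviation of timings."""
    n = len(timings)
    mean = sum(timings) / float(n)
    # Sample variance (n - 1 denominator); an assumption, since the
    # proposal does not say whether sample or population variance is meant.
    if n > 1:
        variance = sum((t - mean) ** 2 for t in timings) / (n - 1)
    else:
        variance = 0.0
    return {
        "mean": mean,
        "min": min(timings),
        "max": max(timings),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; also returns R^2."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 compares residual error against the spread of the data.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared
```

Other model shapes can reuse the same routine by transforming the data first. For example, fitting the logarithm of the timings against the logarithm of the process count tests a power-law model, with the slope giving the exponent.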

The tool will require a standard C compiler and a Python 2.7+ environment with the statistics and graphing libraries pandas, numpy, and matplotlib available.