Tuesday, December 3, 2013

final project report and source

A benchmarking tool using OpenMP and MPI.

The report and source code can be downloaded from http://cosc462web.newton.utk.edu/pgaultne/finalProject, or, if that location is not accessible to you, from my personal web server at http://petergaultney.com/school/CS462/finalProject.


Wednesday, October 23, 2013

Project proposal

For the semester project, I intend to create a performance stability measuring tool using the core of the 'particles' code developed in class and improved throughout the course of the semester.

The tool will run an automated set of tests across the nodes of a heterogeneous cluster. It will run differing numbers of processes across the nodes, each with varying levels of threading, using Open MPI and OpenMP. It will perform CPU configuration discovery on each node (using, at a minimum, /proc/cpuinfo) to maximize utility and minimize the user input required. It will measure the variability of the timing results from these tests to provide a visual representation of the stability and overall efficiency of the nodes in the cluster. The tool will also perform MPI communications between nodes to measure the basic communication latencies of the cluster.
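
The CPU discovery step can be as simple as counting 'processor' entries in /proc/cpuinfo. A minimal C sketch, illustrative only and not necessarily the tool's final approach:

#include <stdio.h>
#include <string.h>

/* Count the logical CPUs a node reports by scanning /proc/cpuinfo
 * for lines beginning with "processor". Returns -1 if the file
 * cannot be opened (e.g., on a non-Linux node). */
int count_cpus(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int count = 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "processor", 9) == 0)
            count++;
    }
    fclose(f);
    return count;
}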

The tool will time runs of various processing lengths (halving from ~32 seconds down to ~0.0625 seconds) so as to analyze the behavior of systems with both stable and unstable job timings.

The tool will output, at a minimum, the mean, minimum, maximum, standard deviation, and variance of the timings for all configurations. The tool will also attempt to fit the performance curves to various standard functions, and generate graphs and statistics to indicate the closeness of the fit for those models.
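
The summary statistics are simple to compute in C. The sketch below is illustrative only (the struct and function names are mine, not the tool's interface), with the curve fitting and graphing left to the Python side:

#include <math.h>

/* Summary statistics over n >= 1 timing samples, in seconds.
 * This uses the population variance; the real tool may prefer
 * the sample (n-1) form. */
struct timing_stats { double mean, min, max, variance, stddev; };

struct timing_stats summarize(const double *t, int n)
{
    struct timing_stats s = { 0.0, t[0], t[0], 0.0, 0.0 };
    for (int i = 0; i < n; i++) {
        s.mean += t[i];
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
    }
    s.mean /= n;
    for (int i = 0; i < n; i++) {
        double d = t[i] - s.mean;
        s.variance += d * d;
    }
    s.variance /= n;
    s.stddev = sqrt(s.variance);
    return s;
}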

The tool will require a standard C compiler and a Python 2.7+ environment with the statistics and graphing libraries pandas, numpy, and matplotlib available.

Wednesday, September 4, 2013

a basic, O(N^2) program...

The purpose of this exercise is to write a program that demonstrates the need for parallel computing. In a basic particle simulation, with calculations of distance from each particle to every other particle, the problem is O(N^2). This naive, unparallelized program simply nests one loop over the set of particles inside another.

The program can be downloaded here.

The program computes the mean position of all of the particles, and the variance about that mean; these calculations are O(N). After computing all of the pairwise distances, the program computes the mean distance and its variance, as well as the mean inverse squared distance and its variance.
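
A minimal sketch of the O(N^2) core, not the downloadable program itself (the real code may visit each unordered pair only once, or accumulate the sums differently):

#include <math.h>

struct particle { double x, y; };

/* Accumulate the pairwise distance and inverse-squared-distance sums
 * used later for the mean and variance calculations. The nested loops
 * make this O(N^2) in the number of particles. */
void pairwise_sums(const struct particle *p, int n,
                   double *distSum, double *invSqSum)
{
    *distSum = 0.0;
    *invSqSum = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j)
                continue;
            double dx = p[i].x - p[j].x;
            double dy = p[i].y - p[j].y;
            double d2 = dx * dx + dy * dy;
            *distSum += sqrt(d2);    /* for the distance statistics */
            *invSqSum += 1.0 / d2;   /* for the 1/d^2 statistics */
        }
    }
}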

Because the program generates the particles uniformly at random, the mean position is expected to converge to (0,0) with larger numbers of particles. With a radius of 1, the per-coordinate variance is expected to converge to 0.333... for the square shape and 0.250 for the disk shape, since the disk excludes the corner regions of the square where the coordinates are largest.
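
For reference, these limits are just the variance of a single coordinate drawn uniformly from each shape (in LaTeX notation):

\mathrm{Var}_{\mathrm{square}}(x) = \frac{1}{2} \int_{-1}^{1} x^2 \, dx = \frac{1}{3} \approx 0.333\ldots

\mathrm{Var}_{\mathrm{disk}}(x) = \frac{1}{\pi} \int_{0}^{2\pi} \int_{0}^{1} (r\cos\theta)^2 \, r \, dr \, d\theta = \frac{1}{\pi} \cdot \pi \cdot \frac{1}{4} = 0.250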

RESULTS

numParticles 1000
shape square
radius 1.000000
xMean 0.003013
yMean -0.006152
xVariance 0.347908
yVariance 0.335383
distanceMean 1.055987
distanceVariance 1.367950
distanceInverseSquaredMean 10.803389
distanceInverseSquaredVariance 911364.212702

shape square radius 1 numParticles 1000         timeElapsed 0.16s

The values are mostly as expected, though the inverse squared distance variance is a very large number that is hard to verify intuitively: 1/d^2 blows up for particle pairs that happen to land very close together, so that statistic is dominated by a few near-coincident pairs.

Doubling the problem size does in fact quadruple (2^2) the runtime, as shown for numParticles = 2000 (0.66s vs. 0.16s, about 4.1x):

numParticles 2000
shape square
radius 1.000000
xMean -0.008348
yMean -0.011882
xVariance 0.341173
yVariance 0.339286
distanceMean 1.053602
distanceVariance 1.361599
distanceInverseSquaredMean 16.772375
distanceInverseSquaredVariance 59622542.750757

shape square radius 1 numParticles 2000         timeElapsed 0.66s

Again, the values are as expected, and the coordinate variances remain close to the expected 1/3. Doubling the problem size to 4000:

numParticles 4000
shape square
radius 1.000000
xMean -0.004678
yMean -0.009344
xVariance 0.334032
yVariance 0.334881
distanceMean 1.044886
distanceVariance 1.338162
distanceInverseSquaredMean 13.926066
distanceInverseSquaredVariance 22747124.295431

shape square radius 1 numParticles 4000         timeElapsed 2.63s

At this point, the mean and variance are clearly converging. Doubling again to 8000:

numParticles 8000
shape square
radius 1.000000
xMean 0.000013
yMean -0.005476
xVariance 0.329461
yVariance 0.333698
distanceMean 1.040121
distanceVariance 1.326485
distanceInverseSquaredMean 14.022405
distanceInverseSquaredVariance 67923729.057555

shape square radius 1 numParticles 8000         timeElapsed 10.5s

With the disk shape, we see similar results, but the variances are smaller:

numParticles 2000
shape disk
radius 1.000000
xMean -0.013676
yMean 0.003475
xVariance 0.255792
yVariance 0.248521
distanceMean 0.909900
distanceVariance 1.009132
distanceInverseSquaredMean 18.541516
distanceInverseSquaredVariance 57242207.267090

shape disk radius 1 numParticles 2000           timeElapsed 0.66s

The runtime with the disk shape is not significantly different from the runtime with the square, as the O(N^2) distance computation dominates the O(N) particle generation and the discarding of 'invalid' particles that fall outside the disk.
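
For context, generating a disk point by rejection looks roughly like the sketch below (an assumption about the approach, not the program's exact code):

#include <stdlib.h>

/* Rejection sampling for a uniform point in the unit disk: draw from
 * the bounding square and retry until the point falls inside the
 * circle. On average about 4/pi (~1.27) draws are needed, which is
 * why generation cost is negligible next to the O(N^2) distance loop. */
void random_disk_point(double *x, double *y)
{
    do {
        *x = 2.0 * rand() / (double)RAND_MAX - 1.0;
        *y = 2.0 * rand() / (double)RAND_MAX - 1.0;
    } while ((*x) * (*x) + (*y) * (*y) > 1.0);
}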

The full set of timing results for the disk shape shows the O(N^2) growth of the problem; going from 1000 to 8000 particles (8x) takes about 65x as long (10.48s vs. 0.16s), close to the expected 8^2 = 64x:

shape disk radius 1 numParticles 1000           timeElapsed 0.16s
shape disk radius 1 numParticles 2000           timeElapsed 0.66s
shape disk radius 1 numParticles 3000           timeElapsed 1.47s
shape disk radius 1 numParticles 4000           timeElapsed 2.63s
shape disk radius 1 numParticles 5000           timeElapsed 4.09s
shape disk radius 1 numParticles 6000           timeElapsed 5.92s
shape disk radius 1 numParticles 7000           timeElapsed 8.04s
shape disk radius 1 numParticles 8000           timeElapsed 10.48s


Thursday, August 22, 2013

raison d'être

In the eternal words of bounty hunter Jubal Early: "Well, here I am..."

This blog will serve as a catalog of the discoveries made while using two BeagleBone Blacks as a parallel computing platform.

The course being followed is CS462 at The University of Tennessee. I have yet to decide on my primary interfacing platform, but it will be either Mac OS X Lion or a newer version of Ubuntu, likely running in a VM. Hopefully one of these two environments will prove itself suitable for cross-compiling software onto the BeagleBones, as I'd expect compiling on the BeagleBones themselves to be a little suboptimal.