|
|
Project: Simulation in Titanium
Student Researchers: Szu-Huey Chuang, Carrie Fei, Ellen Tsai
Advisor: Katherine Yelick
Institution: University of California, Berkeley
Introduction
Biological research increasingly relies on high end computing technology
to aid in the modeling and simulation of biologic systems. For example,
simulation of three-dimensional models of biologic systems, such as the
human heart, requires large-scale parallel machines, which are difficult
to manage and program. To simplify programming, the Titanium group at
UC Berkeley has developed a high performance dialect of Java that uses
a single program multiple data (SPMD) parallelism model with a global
address space. Titanium offers a uniform programming model across single
processors and the more complicated shared and distributed memory parallel
machines. They have also developed a heart simulation in Titanium, using
the immersed boundary method developed by Peskin and McQueen. Titanium
uses a static compilation model and provides support for expressive multidimensional
arrays. The goal of our research is to evaluate the single node performance
of the Titanium compiler and of the Titanium array constructs, and to
investigate techniques for performance tuning in Titanium.
Benchmarking Process
Our benchmarking is done using the Scimark2 Benchmark suite. This suite
is a collection of single processor Java programs that run popular scientific
computations. We used the following benchmarks:
- Jacobi Successive Over-Relaxation (SOR)
- Sparse matrix multiply
- Dense LU matrix factorization
We used both a small and large input data set--the small data input is
designed to fit in cache, whereas the large one will reside in main memory.
We also varied the compiler flags, choosing the optimization level and
whether binds checking was turned off.
Our strategy was to start with the reference Java implementation and incrementally
refine it using Titanium features. We started by introducing Titanium
arrays, and then converted the loops to use Titaniums "foreach"
construct. Foreach loops are unique to Titanium because of
their unordered iterations through multidimensional arrays, which simplify
binds' checking and other optimizations. The most difficult task in our
project was to develop an understanding of Titanium language, including
both the semantics of the language extensions (e.g., when foreach loops
were legal) and the performance implications of each construct. In the
case of the SOR benchmark, we also changed the algorithm to use a checkerboard
ordering, which changed the numerical properties but is often considered
a better algorithm for both performance and numerical reasons.
We collected performance data from several versions of each of the benchmarks.
In this report, we highlight two comparisons:
- First, we compare the Titanium compiler to the Java compiler using
the reference Java implementation.
- Second, we compare the efficiency of Titanium arrays to Java arrays
by compiling both array types using the Titanium compiler. Wherever
possible, we replaced the for loops with foreach loops.
Results
A composite score of the Scimark benchmarks is shown below and is qualitatively
similar to the performance of individual benchmarks. The composite is
an average of the 5 Scimark benchmarks, the three described above as well
as a Monte Carlo algorithm and a FFT that we did not extensively optimize.
Using the Titanium compiler, the results shown below are with optimization
flag turned on and bound checking flag turned off. The results show two
main results: 1) Titaniums static compilation model produces better
code than the Java compiler; 2) Titanium arrays do not produce better
performance than Java arrays as originally expected, probably because
the expressive power of Titanium arrays, which allow for sub-arrays and
strided arrays, overwhelm any possible performance benefits from the specialized
compiler optimizations.
Our study was one important step in understanding the benefits of Titanium
language features and the Titanium compiler. Neither the Java compilers
nor the Titanium compiler are a fixed quantity, as optimizations are always
being added to both. Thus, we view this work as part of an ongoing process
to understand the single processor performance of Titanium and Java on
scientific applications.
Titanium research website:
http://titanium.cs.berkeley.edu
|