Project: Simulation in Titanium
Student Researchers: Szu-Huey Chuang, Carrie Fei, Ellen Tsai
Advisor: Katherine Yelick
Institution: University of California, Berkeley





Introduction

Biological research increasingly relies on high end computing technology to aid in the modeling and simulation of biologic systems. For example, simulation of three-dimensional models of biologic systems, such as the human heart, requires large-scale parallel machines, which are difficult to manage and program. To simplify programming, the Titanium group at UC Berkeley has developed a high performance dialect of Java that uses a single program multiple data (SPMD) parallelism model with a global address space. Titanium offers a uniform programming model across single processors and the more complicated shared and distributed memory parallel machines. They have also developed a heart simulation in Titanium, using the immersed boundary method developed by Peskin and McQueen. Titanium uses a static compilation model and provides support for expressive multidimensional arrays. The goal of our research is to evaluate the single node performance of the Titanium compiler and of the Titanium array constructs, and to investigate techniques for performance tuning in Titanium.

Benchmarking Process


Our benchmarking is done using the Scimark2 Benchmark suite. This suite is a collection of single processor Java programs that run popular scientific computations. We used the following benchmarks:

  • Jacobi Successive Over-Relaxation (SOR)
  • Sparse matrix multiply
  • Dense LU matrix factorization

We used both a small and large input data set--the small data input is designed to fit in cache, whereas the large one will reside in main memory. We also varied the compiler flags, choosing the optimization level and whether binds’ checking was turned off.

Our strategy was to start with the reference Java implementation and incrementally refine it using Titanium features. We started by introducing Titanium arrays, and then converted the loops to use Titanium’s "foreach" construct. ‘Foreach loops’ are unique to Titanium because of their unordered iterations through multidimensional arrays, which simplify binds' checking and other optimizations. The most difficult task in our project was to develop an understanding of Titanium language, including both the semantics of the language extensions (e.g., when foreach loops were legal) and the performance implications of each construct. In the case of the SOR benchmark, we also changed the algorithm to use a checkerboard ordering, which changed the numerical properties but is often considered a better algorithm for both performance and numerical reasons.

We collected performance data from several versions of each of the benchmarks. In this report, we highlight two comparisons:

  • First, we compare the Titanium compiler to the Java compiler using the reference Java implementation.
  • Second, we compare the efficiency of Titanium arrays to Java arrays by compiling both array types using the Titanium compiler. Wherever possible, we replaced the for loops with foreach loops.

Results

A composite score of the Scimark benchmarks is shown below and is qualitatively similar to the performance of individual benchmarks. The composite is an average of the 5 Scimark benchmarks, the three described above as well as a Monte Carlo algorithm and a FFT that we did not extensively optimize. Using the Titanium compiler, the results shown below are with optimization flag turned on and bound checking flag turned off. The results show two main results: 1) Titanium’s static compilation model produces better code than the Java compiler; 2) Titanium arrays do not produce better performance than Java arrays as originally expected, probably because the expressive power of Titanium arrays, which allow for sub-arrays and strided arrays, overwhelm any possible performance benefits from the specialized compiler optimizations.



Our study was one important step in understanding the benefits of Titanium language features and the Titanium compiler. Neither the Java compilers nor the Titanium compiler are a fixed quantity, as optimizations are always being added to both. Thus, we view this work as part of an ongoing process to understand the single processor performance of Titanium and Java on scientific applications.

Titanium research website:
http://titanium.cs.berkeley.edu