Language performance benchmarks

These are my own results from running Christopher W. Cowell-Shah's benchmarking code, available at http://www.cowell-shah.com/research/benchmark/code. He published his own results in an OSnews article.

My benchmarks are applicable to performance on AMD Athlon XP/MP systems. All binaries have been compiled to use the SSE extensions on these processors.

My binaries are available at http://fails.org/benchmark.

Tests were performed on a dual Athlon MP 2.0GHz system running Linux 2.6.0.

b-gcc was compiled with gcc 3.3 using these flags: -O3 -march=athlon -msse

b-icc was compiled with icc 8.0 using these flags: -tpp6 -xiMK -O3

b-icc-opt has been optimized with Profile Guided Optimization. First, Benchmark.c was compiled with -prof_gen to create an "instrument" executable. Next, the instrument executable was executed, and a run-time profile was generated (in the form of a .dyn file). Finally, b-icc-opt itself was compiled with -prof_use -tpp6 -xiMK -O3.

The Java code was compiled with the J2SE v 1.4.2_03 javac using -g:none and executed with java -server.

Results

Integer performance

Intel icc 8.0 (PGO) 6340 ms
gcc 3.3 6550 ms
Intel icc 8.0 6740 ms
Java 1.4 7271 ms

There wasn't as much variance here as in the other tests. Somewhat surprisingly, icc actually loses out to gcc unless profile guided optimization is used.

Floating point performance

Intel icc 8.0 (PGO) 5540 ms
Intel icc 8.0 5560 ms
gcc 3.3 6250 ms
Java 1.4 11501 ms

Here we see that icc is the clear winner and that Java is the clear loser.

64-bit integer performance

gcc 3.3 16760 ms
Java 1.4 23017 ms
Intel icc 8.0 27140 ms
Intel icc 8.0 (PGO) 27460 ms

Ouch! icc is the clear loser here. Not only was Profile Guided Optimization unable to help, but the binary compiled with PGO actually finished last, about 1% behind plain icc. Clearly Intel has not placed much effort into optimizing icc's 64-bit integer performance.

Trig performance

Intel icc 8.0 (PGO) 2430 ms
Intel icc 8.0 2510 ms
gcc 3.3 3640 ms
Java 1.4 77649 ms

Ouch indeed! Here we see that something is terribly wrong either with Java 1.4 or the Java implementation being used. The benchmark author noted the same behavior in the original OSnews article, and also noted that performance was significantly better on this benchmark with Java 1.3.

I/O performance

gcc 3.3 1090 ms
Intel icc 8.0 (PGO) 1190 ms
Intel icc 8.0 1230 ms
Java 1.4 3418 ms

Here Java takes roughly three times as long as the native binaries on what should be an I/O-bound operation. This is the second benchmark where gcc outperforms icc with PGO, the first being 64-bit integer performance.
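To put the five result tables on a common scale, here is a minimal sketch (in Python, and not part of the original benchmark suite) that divides each time by the fastest time in the same test. The dictionary layout and the function name relative_to_fastest are my own choices for illustration; the numbers are copied from the tables above.

```python
# Times in milliseconds, copied from the result tables above.
RESULTS = {
    "Integer": {"icc 8.0 (PGO)": 6340, "gcc 3.3": 6550,
                "icc 8.0": 6740, "Java 1.4": 7271},
    "Floating point": {"icc 8.0 (PGO)": 5540, "icc 8.0": 5560,
                       "gcc 3.3": 6250, "Java 1.4": 11501},
    "64-bit integer": {"gcc 3.3": 16760, "Java 1.4": 23017,
                       "icc 8.0": 27140, "icc 8.0 (PGO)": 27460},
    "Trig": {"icc 8.0 (PGO)": 2430, "icc 8.0": 2510,
             "gcc 3.3": 3640, "Java 1.4": 77649},
    "I/O": {"gcc 3.3": 1090, "icc 8.0 (PGO)": 1190,
            "icc 8.0": 1230, "Java 1.4": 3418},
}


def relative_to_fastest(times):
    """Return each time divided by the smallest time in the same test."""
    best = min(times.values())
    return {name: ms / best for name, ms in times.items()}


if __name__ == "__main__":
    for test, times in RESULTS.items():
        print(test)
        # Sort so the fastest entry (ratio 1.00) is listed first.
        ranked = sorted(relative_to_fastest(times).items(), key=lambda kv: kv[1])
        for name, ratio in ranked:
            print(f"  {name:<14} {ratio:5.2f}x")
```

On these numbers Java comes out at roughly 3.1 times the fastest binary on I/O and about 32 times on trig, while the spread on the integer test stays under 15%.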

Conclusion

Comparing my results to those of the original author, there are three possible explanations for the discrepancies between his findings and mine for gcc vs. Java performance.

The first is that the Cygwin-bundled gcc is somehow producing significantly worse code on Win32 than it is on Linux. My guess would be that this is not the case: the code produced should be more or less the same on all the mathematical benchmarks.

The second is that the JRE implementation on Windows is much better than on Linux. I would also guess that this is not the case, and would expect them to function with more or less equivalent performance.

That leaves the third explanation, which is my guess as to why Java performed so much better relative to gcc in the original author's benchmarks: I would assume that the HotSpot compiler produces much better SSE2-optimized code than gcc. The original tests ran on a Pentium 4, which supports SSE2; the Athlon MP used here supports SSE but not SSE2. Without SSE2, Java's performance is worse in every test here than native code compiled with gcc 3.3.
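As a quick check of the claim that Java trailed gcc 3.3 in every test on this machine, the following sketch (Python again, with names of my own choosing) computes Java's time divided by gcc's time for each test. Only the gcc 3.3 and Java 1.4 figures from the Results section are used.

```python
# (gcc 3.3 ms, Java 1.4 ms) per test, copied from the Results section.
GCC_VS_JAVA = {
    "Integer": (6550, 7271),
    "Floating point": (6250, 11501),
    "64-bit integer": (16760, 23017),
    "Trig": (3640, 77649),
    "I/O": (1090, 3418),
}


def java_slowdown(pairs):
    """Map each test to Java's time divided by gcc's time."""
    return {test: java / gcc for test, (gcc, java) in pairs.items()}


if __name__ == "__main__":
    slowdown = java_slowdown(GCC_VS_JAVA)
    for test, ratio in slowdown.items():
        print(f"{test:<15} Java/gcc = {ratio:.2f}")
    # True only if Java was slower than gcc in every single test.
    print("Java slower everywhere:", all(r > 1 for r in slowdown.values()))
```

The ratio is above 1 in all five tests, ranging from about 1.1 on the integer test to about 21 on trig.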

[Update 2004-01-09]: Several people have written in to suggest that Cygwin's POSIX compatibility layer might be responsible for the lower relative performance of gcc in the original benchmarks. If you read the original article you will see that the author was using the -mno-cygwin flag, which causes gcc to use the Windows API directly, so the compatibility layer is not the explanation.

Many other people have suggested my assumption about the relative performance of the JVM on Windows versus Linux is incorrect. While the general performance of the JVM on Windows may be significantly better than on Linux, my assumption was that the mechanisms responsible for executing the mathematical code, namely the HotSpot compiler, will operate more or less the same on both Windows and Linux. Perhaps I will do some benchmarks to verify this empirically.

As for the performance of the trig benchmark in Java, the original author has stated that his implementation is probably sub-par from a performance standpoint, and that since the StrictMath class was used (albeit indirectly), the results were both much more accurate and much more computationally intensive.

I didn't want this whole writeup to sound too critical of Java. Clearly, for a language designed to operate in a platform-agnostic manner, it's holding its own in what might be considered a somewhat nonstandard configuration (Linux/Athlon). On the Win32/P4 side of things, Java was mostly on par with native code and at one point surpassed it, and my best guess is that this was because of HotSpot's SSE2 backend.

- Bascule <bascule@dragon.atmos.colostate.edu>