The PathScale™ Compiler Suite consistently proves to be the highest-performing 64-bit compiler suite for AMD Opteron™-based systems running the Linux operating system. Relative to our competition, our performance advantages are clear.
At PathScale, we are focused on maximizing real-world application performance. With help from some of our partners, we have been working with a number of well-known application codes.
The PathScale Compiler Suite has demonstrated performance advantages of up to 40% over the best-known competing Opteron compiler.
As always, the results on your specific codes will vary, but we are confident that using the PathScale C, C++, and Fortran compilers will bring immediate benefits to Opteron and Athlon64 users.
Do you have benchmark results you would like to share with other PathScale compiler customers? Send email to support@pathscale.com and we will work to include them on this page.
POLYHEDRON
The PathScale Compiler Suite produces the fastest and most accurate
results for AMD Opteron systems for the Polyhedron 2004 Fortran 90 and Fortran
77 benchmarks. The Polyhedron web pages demonstrate this -- http://www.polyhedron.co.uk/compare/linux/f90bench_AMD.html, http://www.polyhedron.co.uk/compare/linux/f77bench_AMD.html --
as they show that PathScale is the only compiler vendor with no red squares for
64-bit results (meaning that on no code is 64-bit PathScale 50% or more
slower than the fastest compiler on a benchmark code).
Here are some comparisons obtained from those web pages:
Polyhedron 2004 F77 Benchmarks: 64-bit Compiler Comparisons

| Compiler | Geometric Mean Time (seconds) | PathScale % Faster |
|---|---|---|
| PathScale 2.1 | 19.16 | |
| Commercial Compiler A | 25.54 | +33% |
| Commercial Compiler B | 22.75 | +19% |
| Commercial Compiler C | 29.26 | +53% |
Polyhedron 2004 F90 Benchmarks: 64-bit Compiler Comparisons

| Compiler | Geometric Mean Time (seconds) | PathScale % Faster |
|---|---|---|
| PathScale 2.1 | 22.76 | |
| Commercial Compiler A | 26.25 | +15% |
| Commercial Compiler B | 28.41 | +25% |
| Commercial Compiler C | 35.30 | +55% |
Details on the test machine and OS are at the F77 and F90 URLs shown above.
Details on compiler versions are at http://www.polyhedron.com/compare/linux/version.html.
PathScale 2.1 64-bit optimization flags:
F77: -O3 -LNO:fu=9 -OPT:div_split:fast_math:fast_sqrt -IPA:plimit=3500
F90: -Ofast -OPT:fast_math=on -WOPT:if_conv=off -LNO:fu=9:full_unroll_size=7000
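The "PathScale % Faster" figures in the Polyhedron tables above are consistent with dividing each competitor's geometric mean time by the PathScale time and reporting the excess as a percentage. The sketch below reproduces that arithmetic for the F77 table. The function names are illustrative, and the formula is inferred from the published numbers rather than taken from Polyhedron's or PathScale's own scripts.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive run times (seconds)."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("times must be non-empty and positive")
    return math.exp(sum(math.log(t) for t in times) / len(times))


def percent_faster(baseline_time, other_time):
    """Percentage by which other_time exceeds baseline_time.

    Assumed definition (lower time is better): (other / baseline - 1) * 100.
    """
    return (other_time / baseline_time - 1.0) * 100.0


# Geometric mean times (seconds) copied from the F77 table above.
PATHSCALE_F77 = 19.16
COMPETITORS_F77 = {
    "Commercial Compiler A": 25.54,
    "Commercial Compiler B": 22.75,
    "Commercial Compiler C": 29.26,
}

for name, seconds in COMPETITORS_F77.items():
    print(f"{name}: +{percent_faster(PATHSCALE_F77, seconds):.0f}%")
```

A geometric mean is used for suites like this because it weights a given percentage change in any one benchmark equally, regardless of how long that benchmark runs.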
SPEC® CPU2000
The PathScale Compiler Suite enables the highest performance results for
both integer and floating point SPEC CPU2000 speed benchmarks for any
AMD64-based Linux® system. The best evidence for this is that from October
2004 through August 26, 2005, on AMD processors and Linux operating systems,
there have been 186 CPU2000 results published at www.spec.org using
PathScale compilers and none with other compilers.
Since no results with competitive compilers have been published recently on the
SPEC web site, we ran our own comparison against a competitive compiler, using the
latest version of each, with the following results:
| Compiler | SPECint®2000 | SPECfp®2000 |
|---|---|---|
| PathScale™ v2.2.1 | 1598 | 1984 |
| PGI® Workstation 6.0-5 | 1269 | 1779 |
| % Faster for PathScale | +26% | +12% |
Benchmarks were run on a 2.2 GHz 1-CPU system with DDR400/PC3200 memory. Full
details on the compiler flags and configuration used are available here. If
anyone can provide us with improved base or peak optimization flags for the
competitive compiler, we will be happy to use them and update these results.
Results Published by Our Partners Using the PathScale Compiler Suite -- Including Dual-Core Opteron Results
Recently, AMD and HP have chosen to submit SPEC CPU2000 results for dual-core
Opteron (for example, Opteron Models 275 and 875) systems with PathScale
Compilers.
Also, IBM, HP, Fujitsu-Siemens, Sun, and AMD continue to choose the PathScale Compiler Suite to get the highest level of performance from their
AMD64-based Linux® systems.
SPEC® OMP2001
"Sun tested 2-way Sun Fire V20z and 4-way Sun Fire V40z servers using multiple SPEC benchmarks,
including the SPEC® ompM2001 suite of OpenMP® benchmarks. The PathScale Compiler Suite
helped Sun's AMD® Opteron® processor-based servers set world records for SPEC ompM2001 on
two-processor and four-processor systems. The Sun/PathScale two-processor results were 29 percent
faster (footnote 1) than previous-best Linux ompM2001 benchmarks using non-PathScale Fortran and
C compilers. This 29 percent advantage, enabled in large part by PathScale compilers,
far exceeds the eight percent faster clock rate of the newer Sun systems."
Footnote 1:
(1) About the SPEC OMPM2001 Results Reported Above:
Two-Processor Results: The Sun V40z server with PathScale Compiler Suite
and 2.6 GHz AMD Opteron CPUs achieved a result of 6486 on a system with two
cores, two chips and two threads. This comparison is based on the best
performing two-processor Linux servers currently shipping, including previous
results with competitor's compiler on a 2.4 GHz Sun Java Workstation W2100z
system [SPECompM2001 5085, two cores, two chips, two threads].
Four-Processor Results: The Sun V40z server with PathScale Compiler
Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 11223 on a system with
four cores, four chips and four threads. This comparison is based on the best
performing four-processor Linux servers currently shipping, including previous
results on a 2.4 GHz Sun V40z system with a non-PathScale compiler [SPECompM2001
8694, four cores, four chips, four threads].
STREAM
The PathScale Compiler Suite produces the highest single-CPU and
OpenMP Parallel STREAM results for any system powered by AMD CPUs.
OpenMP
| Machine ID (higher results are better) | ncpus | COPY | SCALE | ADD | TRIAD |
|---|---|---|---|---|---|
| AMD_Opteron_848 (PathScale 2.0) | 4 | 15378 | 15845 | 15618 | 15921 |
| PathScale 2.2 | 4 | 16872 | 16932 | 16543 | 16545 |
| % Faster for PathScale 2.2 vs. 2.0 | | +10% | +7% | +6% | +4% |
Single CPU
The above results are for STREAM benchmarks run on Opteron 248 (2.2 GHz) machines with DDR400
memory and are posted at http://www.cs.virginia.edu/stream/standard/Bandwidth.html.
Results for both PathScale and our competitor are identified as
'ASUS_SK8N_Opteron248' and 'ASUS_SK8N_Opteron248 (1 CPU)'. Click on the data
link at the right of those lines for more details on the submission.
* Results with 'PathScale 2.1' are on the same system as the 2.0
results on the STREAM web site and use the
following optimization flags:
OpenMP: pathf90/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp
Serial: pathf90/pathcc -O3 -CG:use_prefetchnta
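For readers unfamiliar with the COPY, SCALE, ADD, and TRIAD columns above: each STREAM kernel is a simple vector loop, and the reported figure is the number of bytes the loop moves divided by its run time. The sketch below shows that accounting for 8-byte elements, using the STREAM convention that 1 MB is 10^6 bytes. It is written in Python purely for illustration, so the bandwidth it prints says nothing about compiled Fortran or C code; the function names and the array size are assumptions, not part of the STREAM source.

```python
import time

# Bytes moved per loop iteration with 8-byte (double precision) elements,
# following the STREAM counting convention: one read plus one write for
# COPY and SCALE, two reads plus one write for ADD and TRIAD.
BYTES_PER_ITERATION = {"COPY": 16, "SCALE": 16, "ADD": 24, "TRIAD": 24}


def bandwidth_mb_per_s(kernel, n, seconds):
    """Convert one timed kernel pass over n elements into MB/s (1 MB = 10**6 bytes)."""
    return BYTES_PER_ITERATION[kernel] * n / seconds / 1.0e6


def triad(a, b, c, q):
    """TRIAD kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


n = 200_000  # illustrative size; real STREAM runs use arrays far larger than cache
a, b, c = [0.0] * n, [1.0] * n, [2.0] * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start

print(f"TRIAD: {bandwidth_mb_per_s('TRIAD', n, elapsed):.1f} MB/s (interpreted Python, illustrative only)")
```

Because the kernels do almost no arithmetic per byte, the result is dominated by how well the generated code streams memory, which is why prefetch-related flags such as those listed above matter for this benchmark.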
HimenoBMT "The Performance Evaluation"
The PathScale Compiler Suite produces excellent single-CPU and
4-CPU OpenMP results on the popular Himeno benchmark: http://accc.riken.jp/HPC/HimenoBMT/index_e.html
Serial results on Opteron 2.2 GHz, PC3200
| Compiler | F77 (MFLOPS) | F90 (MFLOPS) | C (MFLOPS) |
|---|---|---|---|
| PathScale 2.0 | 1584 | 1189 | 267 |
| 64-bit Commercial Compiler | 1419 | 1125 | 141 |
| GNU compilers (3.4.3 & g95) | 1002 | 588 | 213 |
| PathScale advantage vs. 64-bit Commercial Compiler | +12% | +6% | +89% |
| PathScale advantage vs. GNU compilers (3.4.3 & g95) | +58% | +102% | +25% |
4-thread OpenMP Results on 4-CPU (Microway) 2.2 GHz Opteron, PC3200 server
| Code | Compiler | 4-thread MFLOPS | PathScale Advantage |
|---|---|---|---|
| Original Himeno F77 OpenMP code | PathScale 2.0 | 1969 | |
| Original Himeno F77 OpenMP code | Commercial 64-bit compiler | 1691 | +16% |
| PathScale-modified* Himeno F77 OpenMP code | PathScale 2.0 | 5155 | |
| PathScale-modified* Himeno F77 OpenMP code | Commercial 64-bit compiler | 4309 | +20% |
* _System & Compiler Flag & source code Details_
SPEC® and the benchmark names SPECfp® and
SPECint® are registered trademarks of the Standard Performance Evaluation
Corporation. AMD, AMD Opteron, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Linux is a registered trademark of Linus
Torvalds. All other trademarks and company names mentioned are the property of
their respective owners.
AM2 ATMOSPHERE MODEL CODE
This code is used at the University of Utah's Meteorology Department for climate research.
The code consists of several closely coupled modules and is parallelized with MPI.
It is written with Fortran 95 constructs.
Results for this benchmark were run independently at the University of Utah and published with their permission.
| Compiler | 1 CPU | 2 CPU | 4 CPU |
|---|---|---|---|
| PathScale v1.2 | 368.89 sec. | 201.88 sec. | 99.11 sec. |
| PGI v5.2 | 483.45 sec. | 253.38 sec. | 135.53 sec. |
| % Faster for PathScale | +31.1% | +25.5% | +36.7% |
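The AM2 timings above can also be read as a scaling measurement for the MPI-parallel code. The sketch below derives speedup and parallel efficiency from the published run times, taking the 1-CPU time as the baseline. This is standard arithmetic applied to the table, not an analysis published by the University of Utah, and the function names are illustrative.

```python
def speedup(t_one_cpu, t_n_cpu):
    """Speedup relative to the single-CPU run time."""
    return t_one_cpu / t_n_cpu


def efficiency(t_one_cpu, t_n_cpu, n_cpu):
    """Parallel efficiency: speedup divided by the number of CPUs."""
    return speedup(t_one_cpu, t_n_cpu) / n_cpu


# Run times in seconds, copied from the AM2 table above, keyed by CPU count.
AM2_TIMES = {
    "PathScale v1.2": {1: 368.89, 2: 201.88, 4: 99.11},
    "PGI v5.2": {1: 483.45, 2: 253.38, 4: 135.53},
}

for compiler, times in AM2_TIMES.items():
    for n_cpu in (2, 4):
        s = speedup(times[1], times[n_cpu])
        e = efficiency(times[1], times[n_cpu], n_cpu)
        print(f"{compiler}, {n_cpu} CPUs: speedup {s:.2f}, efficiency {e:.0%}")
```

Speedup compares each compiler only against its own 1-CPU time, so it complements the "% Faster" row rather than replacing it: a compiler can scale well and still be slower in absolute terms.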
QUANTUM MONTE CARLO
Monte Carlo methods are extremely important in computational physics and related applied fields, and have many diverse applications.
PathScale compilers do particularly well in Monte Carlo codes.
Results for this benchmark were run independently at Los Alamos National Laboratory and published with their permission.
| Compiler | Time (lower number is better) | PathScale % Faster |
|---|---|---|
| PathScale v1.0 | 78.08 sec. | |
| PGI v5.1 | 135.80 sec. | 73.93% |
| GCC v3.4.0 | 111.01 sec. | 42.20% |
Compiler settings for Quantum Monte Carlo C++ application
PathScale v1.0: pathCC -64 -ansiE -Ofast -ffast-math
PGI v5.1: pgCC -Kieee -fastsse -O3 -Minline=levels:10 -Msafeptr=global
-Mvect=sse -Mvect=assoc -Mvect=cachesize:1048576 -Mvect=prefetch
GCC v3.4.0: g++ -O3 -ffast-math -mtune=opteron -mfpmath=sse,387
-mieee-fp -m64
For more information on this benchmark go to: http://sourceforge.net/projects/qmcbeaver.