This repository has been archived by the owner on Aug 10, 2022. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
nega
committed
Nov 8, 2011
0 parents
commit 35fa9da
Showing
3,406 changed files
with
1,382,682 additions
and
0 deletions.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
|
||
Copyright 2009, 2010 The University of Texas at Austin. | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are | ||
met: | ||
|
||
1. Redistributions of source code must retain the above copyright | ||
notice, this list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright | ||
notice, this list of conditions and the following disclaimer in | ||
the documentation and/or other materials provided with the | ||
distribution. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT AUSTIN ``AS IS'' | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, | ||
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR | ||
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT | ||
AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, | ||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED | ||
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR | ||
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF | ||
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
|
||
The views and conclusions contained in the software and documentation | ||
are those of the authors and should not be interpreted as representing | ||
official policies, either expressed or implied, of The University of | ||
Texas at Austin. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
Optimized GotoBLAS2 libraries version 1.13 | ||
|
||
By Kazushige Goto <[email protected]> | ||
|
||
# This is the last update, made on 5th Feb. 2010. | ||
|
||
0. License | ||
|
||
See 00TACC_Research_License.txt. | ||
|
||
1. Supported OS | ||
|
||
Linux | ||
FreeBSD (may also work on NetBSD) | ||
OSX | ||
Solaris | ||
Windows 2k, XP, Server 2003 and 2008 (both 32bit and 64bit) | ||
AIX | ||
Tru64 UNIX | ||
|
||
2. Supported Architecture | ||
|
||
X86 : Pentium3 Katmai | ||
Coppermine | ||
Athlon (not well optimized, though) | ||
PentiumM Banias, Yonah | ||
Pentium4 Northwood | ||
Nocona (Prescott) | ||
Core 2 Woodcrest | ||
Core 2 Penryn | ||
Nehalem-EP Corei{3,5,7} | ||
Atom | ||
AMD Opteron | ||
AMD Barcelona, Shanghai, Istanbul | ||
VIA NANO | ||
|
||
X86_64: Pentium4 Nocona | ||
Core 2 Woodcrest | ||
Core 2 Penryn | ||
Nehalem | ||
Atom | ||
AMD Opteron | ||
AMD Barcelona, Shanghai, Istanbul | ||
VIA NANO | ||
|
||
IA64 : Itanium2 | ||
|
||
Alpha : EV4, EV5, EV6 | ||
|
||
POWER : POWER4 | ||
PPC970/PPC970FX | ||
PPC970MP | ||
CELL (PPU only) | ||
POWER5 | ||
PPC440 (QCDOC) | ||
PPC440FP2(BG/L) | ||
POWERPC G4(PPC7450) | ||
POWER6 | ||
|
||
SPARC : SPARC IV | ||
SPARC VI, VII (Fujitsu chip) | ||
|
||
MIPS64/32: Sicortex | ||
|
||
3. Supported compiler | ||
|
||
C compiler : GNU CC | ||
Cygwin, MinGW | ||
Other commercial compilers (especially for x86/x86_64) | ||
|
||
Fortran Compiler : GNU G77, GFORTRAN | ||
G95 | ||
Open64 | ||
Compaq | ||
F2C | ||
IBM | ||
Intel | ||
PathScale | ||
PGI | ||
SUN | ||
Fujitsu | ||
|
||
4. Supported precision | ||
|
||
The x86/x86_64 versions now support 80bit FP precision in addition to | ||
the usual double precision and single precision. Currently only | ||
gfortran supports 80bit FP, with "REAL*10". | ||
|
||
|
||
5. How to build the library? | ||
|
||
Please see 02QuickInstall.txt or just type "make". | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
Quick installation for GotoBLAS2 | ||
|
||
*************************************************************************** | ||
*************************************************************************** | ||
** ** | ||
** ** | ||
** Just type "make" <<return>>. ** | ||
** ** | ||
** If you're not satisfied with this library, ** | ||
** please read the following instructions and customize it. ** | ||
** ** | ||
** ** | ||
*************************************************************************** | ||
*************************************************************************** | ||
|
||
|
||
1. REALLY REALLY quick way to build library | ||
|
||
Type "make" or "gmake". | ||
|
||
$shell> make | ||
|
||
The script will detect the Fortran compiler, the number of cores and | ||
the architecture you're using. If the default gcc binary type is | ||
64bit, a 64bit library will be created. Otherwise a 32bit library | ||
will be created. | ||
|
||
After the compile finishes, you'll see a summary of the | ||
generated library. | ||
|
||
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= | ||
|
||
GotoBLAS2 build complete. | ||
|
||
OS ... Linux | ||
Architecture ... x86_64 | ||
BINARY ... 64bit | ||
C compiler ... GCC (command line : gcc) | ||
Fortran compiler ... PATHSCALE (command line : pathf90) | ||
Library Name ... libgoto_barcelonap-r1.27.a (Multi threaded; Max | ||
num-threads is 16) | ||
|
||
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= | ||
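As a rough way to predict which binary type the default build will pick, you can look at the target triple that gcc reports. The sketch below is an assumption made for illustration and is not how the GotoBLAS2 build script performs its detection. The helper name binary_type_from_machine is hypothetical, and its list of 64bit prefixes is deliberately short and incomplete.

```shell
#!/bin/sh
# Hypothetical helper: guess 32 or 64 from a target triple such as the
# one printed by "gcc -dumpmachine". The pattern list is an assumption
# and covers only a few common 64bit targets.
binary_type_from_machine() {
    case "$1" in
        x86_64-*|amd64-*|ia64-*|alpha*) echo 64 ;;
        *) echo 32 ;;
    esac
}

# Example: ask the default gcc, if one is installed.
if command -v gcc >/dev/null 2>&1; then
    echo "default gcc looks like a $(binary_type_from_machine "$(gcc -dumpmachine)")bit compiler"
fi
```

This is only a guess based on the triple; it ignores flags such as -m32 that may be part of CC.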
|
||
|
||
2. Specifying 32bit or 64bit library | ||
|
||
If you need 32bit binary, | ||
|
||
$shell> make BINARY=32 | ||
|
||
If you need 64bit binary, | ||
|
||
$shell> make BINARY=64 | ||
|
||
|
||
3. Specifying target architecture | ||
|
||
If you need a library for a different architecture, you can use the | ||
TARGET option. The currently available options are listed at the top | ||
of getarch.c. For example, if you need a library for the Intel Core 2 | ||
architecture, you'll find the FORCE_CORE2 option in getarch.c, so you | ||
can specify TARGET=CORE2 (drop the FORCE_ prefix) with make. | ||
|
||
$shell> make TARGET=CORE2 | ||
|
||
If you want GotoBLAS2 to support multiple architectures, | ||
|
||
$shell> make DYNAMIC_ARCH=1 | ||
|
||
All kernels will be included in the library, and the best one for | ||
the architecture in use is selected dynamically at run time. | ||
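The rule for TARGET names (take a FORCE_ option from getarch.c and drop the prefix) can be sketched as a small shell helper. This is a minimal sketch, assuming the options appear in getarch.c as identifiers spelled FORCE_ followed by capitals, digits or underscores. The helper name list_targets is hypothetical, and not every name it prints is guaranteed to be a valid TARGET.

```shell
#!/bin/sh
# Hypothetical helper: print candidate TARGET names found in a copy of
# getarch.c by collecting FORCE_<NAME> identifiers and dropping the prefix.
# Assumes the options are spelled FORCE_ plus capitals, digits or "_".
list_targets() {
    grep -o 'FORCE_[A-Z0-9_][A-Z0-9_]*' "$1" | sed 's/^FORCE_//' | sort -u
}

# Example: run from the top of the source tree, if getarch.c is present.
if [ -f getarch.c ]; then
    list_targets getarch.c
fi
```

Any name from the output can then be tried as "make TARGET=<name>".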
|
||
|
||
4. Enabling or disabling multi-threading | ||
|
||
The script will detect the number of cores and will build a | ||
multi-threaded library if the number of cores is more than two. If you | ||
still want to create a single-threaded library, | ||
|
||
$shell> make USE_THREAD=0 | ||
|
||
Or, to force a threaded library, | ||
|
||
$shell> make USE_THREAD=1 | ||
|
||
|
||
5. Specifying target OS | ||
|
||
The target architecture is determined by CC. If you specify a | ||
cross compiler for MIPS, you can create a library for the | ||
MIPS architecture. | ||
|
||
$shell> make CC=mips64el-linux-gcc TARGET=SICORTEX | ||
|
||
Or you can specify your favorite C compiler with an absolute path. | ||
|
||
$shell> make CC=/opt/intel/cc/32/10.0.026/bin/icc TARGET=BARCELONA | ||
|
||
The binary type (32bit/64bit) is determined by checking CC, so you | ||
can also control the binary type with this option. | ||
|
||
$shell> make CC="pathcc -m32" | ||
|
||
In this case, a 32bit library will be created. | ||
|
||
|
||
6. Specifying Fortran compiler | ||
|
||
If you need to support another Fortran compiler, you can specify it | ||
with the FC option. | ||
|
||
$shell> make FC=gfortran | ||
|
||
|
||
7. Other useful options | ||
|
||
You'll find other useful options in Makefile.rule. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
GotoBLAS2 FAQ | ||
|
||
1. General | ||
|
||
1.1 Q Are there any useful papers about GotoBLAS2? | ||
|
||
A See the following URL. | ||
|
||
http://www.cs.utexas.edu/users/flame/Publications/index.htm | ||
|
||
11. Kazushige Goto and Robert A. van de Geijn, "Anatomy of | ||
High-Performance Matrix Multiplication," ACM Transactions on | ||
Mathematical Software, accepted. | ||
|
||
15. Kazushige Goto and Robert van de Geijn, "High-Performance | ||
Implementation of the Level-3 BLAS." ACM Transactions on | ||
Mathematical Software, submitted. | ||
|
||
|
||
1.2 Q Does GotoBLAS2 work with Hyperthread (SMT)? | ||
|
||
A Yes, it will work. GotoBLAS2 detects Hyperthread and | ||
avoids scheduling threads on the same core. | ||
|
||
|
||
1.3 Q When I type "make", the following error occurs. What's wrong? | ||
|
||
$shell> make | ||
"./Makefile.rule", line 58: Missing dependency operator | ||
"./Makefile.rule", line 61: Need an operator | ||
... | ||
|
||
A This error occurs because you didn't use GNU make. Some binary | ||
packages install GNU make as "gmake", so it's worth trying that. | ||
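One way to check which command is GNU make before building is sketched below. It assumes that GNU make prints a first line starting with "GNU Make" when run with --version; the helper name pick_gnu_make is hypothetical and is not part of GotoBLAS2.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate command whose
# "--version" output identifies it as GNU Make; return 1 if none does.
pick_gnu_make() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1 &&
           "$cmd" --version </dev/null 2>/dev/null | head -n 1 | grep -q '^GNU Make'; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# Example: prefer "gmake", fall back to "make".
if MAKE_CMD=$(pick_gnu_make gmake make); then
    echo "GNU make found: $MAKE_CMD"
else
    echo "GNU make not found; install it before building" >&2
fi
```

If a command is found, use it in place of "make" in the build commands.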
|
||
|
||
1.4 Q Function "xxx" is slow. Why? | ||
|
||
A GotoBLAS2 has many well-optimized functions, but it's | ||
far from perfect. Level 1/2 function performance in particular | ||
depends on how you call BLAS. You should find out what happens | ||
between your function and GotoBLAS2 by using the profile-enabled | ||
version or hardware performance counters. Again, please | ||
don't regard GotoBLAS2 as a black box. | ||
|
||
|
||
1.5 Q I have a commercial C compiler and want to compile GotoBLAS2 with | ||
it. Is it possible? | ||
|
||
A All functions that affect performance are written in assembler; | ||
C code is only used for wrappers around assembler functions or for | ||
complicated functions. I also use many inline assembler functions, | ||
and unfortunately most commercial compilers can't handle inline | ||
assembler. Therefore you should use gcc. | ||
|
||
|
||
1.6 Q I use an OpenMP compiler. How can I use GotoBLAS2 with it? | ||
|
||
A Please understand that OpenMP is a compromise as a threading | ||
method. If you want to use OpenMP-based code with GotoBLAS2, you | ||
should enable "USE_OPENMP=1" in Makefile.rule. | ||
|
||
|
||
1.7 Q Could you tell me how to use the profiled library? | ||
|
||
A You need to build and link your application with the -pg | ||
option. After you run your application, "gmon.out" is | ||
generated in your current directory. | ||
|
||
$shell> gprof <your application name> gmon.out | ||
|
||
Each sample counts as 0.01 seconds. | ||
% cumulative self self total | ||
time seconds seconds calls Ks/call Ks/call name | ||
89.86 975.02 975.02 79317 0.00 0.00 .dgemm_kernel | ||
4.19 1020.47 45.45 40 0.00 0.00 .dlaswp00N | ||
2.28 1045.16 24.69 2539 0.00 0.00 .dtrsm_kernel_LT | ||
1.19 1058.03 12.87 79317 0.00 0.00 .dgemm_otcopy | ||
1.05 1069.40 11.37 4999 0.00 0.00 .dgemm_oncopy | ||
.... | ||
|
||
I think the profiled BLAS library is really useful for your | ||
research. Please find the bottleneck of your application and | ||
improve it. | ||
|
||
1.8 Q Is the number of threads limited? | ||
|
||
A Basically, there is no limit on the number of threads. You | ||
can specify as many threads as you want, but a larger | ||
number of threads will consume extra resources. I recommend | ||
specifying the minimum number of threads you need. | ||
|
||
|
||
2. Architecture-specific issues or implementation | ||
|
||
2.1 Q GotoBLAS2 seems to support any combination of OS and | ||
architecture. Is that possible? | ||
|
||
A Combinations are limited to existing OSes and architectures. For | ||
example, the combination of OSX with SPARC is impossible. But it | ||
would be possible with slight modifications if such a combination | ||
ever appeared. | ||
|
||
|
||
2.2 Q I have POWER architecture systems. Do I need extra work? | ||
|
||
A Although the POWER architecture defines a special instruction, | ||
like CPUID, to detect the exact architecture, it's privileged | ||
and can't be accessed by a user process. So you have to set | ||
your architecture manually in getarch.c. | ||
|
||
|
||
2.3 Q I can't create DLL on Cygwin (Error 53). What's wrong? | ||
|
||
A Make sure that lib.exe and mspdb80.dll from Microsoft Visual | ||
Studio are in your PATH. The easiest way to check is the 'which' command. | ||
|
||
$shell> which lib.exe | ||
/cygdrive/c/Program Files/Microsoft Visual Studio/VC98/bin/lib.exe |
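The same check can be scripted for both files at once. The following is a minimal sketch, not part of GotoBLAS2: the helper name missing_from_path is hypothetical, and it tests only whether a file with that name exists in some PATH directory, so it also covers a DLL that is not marked executable.

```shell
#!/bin/sh
# Hypothetical helper: print each given file name that is not present
# in any directory listed in PATH. It tests for existence only, so it
# also works for DLLs that are not marked executable.
missing_from_path() {
    for name in "$@"; do
        found=no
        old_ifs=$IFS
        IFS=:
        for dir in $PATH; do
            # An empty PATH entry means the current directory.
            if [ -e "${dir:-.}/$name" ]; then
                found=yes
                break
            fi
        done
        IFS=$old_ifs
        [ "$found" = yes ] || echo "$name"
    done
}

# Example: report which of the two required files cannot be found.
missing_from_path lib.exe mspdb80.dll
```

No output means both files were found; any name printed still needs its directory added to PATH.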
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
|
||
Quick guide to building the library for Windows 64bit. | ||
|
||
1. What you need | ||
|
||
a. Windows Server 2003 or later | ||
b. Cygwin environment (make, gcc, g77, perl, sed, wget) | ||
c. MinGW64 compiler | ||
d. Microsoft Visual Studio (lib.exe and mspdb80.dll are required to create the DLL) | ||
|
||
2. Run ./quickbuild.win64 | ||
|
||
Good luck |