This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

initial import
nega committed Nov 8, 2011
0 parents commit 35fa9da
Showing 3,406 changed files with 1,382,682 additions and 0 deletions.
Binary files added (contents not shown):
._01Readme.txt, ._02QuickInstall.txt, ._03FAQ.txt, ._04Windows64bit.txt,
._05LargePage, ._06WeirdPerformance, ._Makefile, ._Makefile.alpha,
._Makefile.generic, ._Makefile.getarch, ._Makefile.ia64, ._Makefile.mips64,
._Makefile.power, ._Makefile.rule, ._Makefile.sparc, ._Makefile.system,
._Makefile.tail, ._Makefile.x86, ._Makefile.x86_64, ._benchmark, ._c_check,
._cblas.h, ._common.h, ._common_alpha.h, ._common_c.h, ._common_d.h,
._common_ia64.h, ._common_interface.h, ._common_lapack.h, ._common_level1.h,
._common_level2.h, ._common_level3.h, ._common_linux.h, ._common_macro.h,
._common_mips64.h, ._common_param.h, ._common_power.h, ._common_q.h,
._common_reference.h, ._common_s.h, ._common_sparc.h, ._common_thread.h,
._common_x.h, ._common_x86.h, ._common_x86_64.h, ._common_z.h, ._cpuid.S,
._cpuid.h, ._cpuid_alpha.c, ._cpuid_ia64.c, ._cpuid_mips.c, ._cpuid_power.c,
._cpuid_sparc.c, ._cpuid_x86.c, ._ctest, ._ctest.c, ._ctest1.c, ._ctest2.c,
._driver, ._exports, ._f_check, ._ftest.f, ._ftest2.f, ._getarch.c,
._getarch_2nd.c, ._interface, ._kernel, ._l1param.h, ._l2param.h, ._lapack,
._make.inc, ._param.h, ._patch.for_lapack-3.1.1, ._quickbuild.32bit,
._quickbuild.64bit, ._quickbuild.win32, ._quickbuild.win64, ._reference,
._symcopy.h, ._test, ._version.h
32 changes: 32 additions & 0 deletions 00License.txt
@@ -0,0 +1,32 @@

Copyright 2009, 2010 The University of Texas at Austin.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.

THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT AUSTIN ``AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT
AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation
are those of the authors and should not be interpreted as representing
official policies, either expressed or implied, of The University of
Texas at Austin.
93 changes: 93 additions & 0 deletions 01Readme.txt
@@ -0,0 +1,93 @@
Optimized GotoBLAS2 libraries version 1.13

By Kazushige Goto <[email protected]>

# This is the last update, made on 5th Feb. 2010.

0. License

See 00License.txt.

1. Supported OS

Linux
FreeBSD (may also work on NetBSD)
OSX
Solaris
Windows 2k, XP, Server 2003 and 2008 (both 32-bit and 64-bit)
AIX
Tru64 UNIX

2. Supported architectures

X86   : Pentium3 Katmai
                 Coppermine
        Athlon (not well optimized, though)
        PentiumM Banias, Yonah
        Pentium4 Northwood
                 Nocona (Prescott)
        Core 2   Woodcrest
        Core 2   Penryn
        Nehalem-EP Core i{3,5,7}
        Atom
        AMD Opteron
        AMD Barcelona, Shanghai, Istanbul
        VIA NANO

X86_64: Pentium4 Nocona
        Core 2   Woodcrest
        Core 2   Penryn
        Nehalem
        Atom
        AMD Opteron
        AMD Barcelona, Shanghai, Istanbul
        VIA NANO

IA64  : Itanium2

Alpha : EV4, EV5, EV6

POWER : POWER4
        PPC970/PPC970FX
        PPC970MP
        CELL (PPU only)
        POWER5
        PPC440 (QCDOC)
        PPC440FP2 (BG/L)
        POWERPC G4 (PPC7450)
        POWER6

SPARC : SPARC IV
        SPARC VI, VII (Fujitsu chip)

MIPS64/32: SiCortex

3. Supported compilers

C compiler       : GNU CC
                   Cygwin, MinGW
                   Other commercial compilers (especially for x86/x86_64)

Fortran compiler : GNU G77, GFORTRAN
                   G95
                   Open64
                   Compaq
                   F2C
                   IBM
                   Intel
                   PathScale
                   PGI
                   SUN
                   Fujitsu

4. Supported precision

The x86/x86_64 version now supports 80-bit FP precision in addition
to the usual double and single precision. Currently only gfortran
supports 80-bit FP, via "REAL*10".
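To see what the extra precision means in practice, the C helpers below report the significand width of each floating type. This is a sketch assuming GCC on x86/x86_64, where long double is the 80-bit x87 extended format (a 64-bit significand, versus 53 bits for double); the function names are illustrative only and are not part of GotoBLAS2.

```c
#include <float.h>

/* Significand width, in bits, of each C floating type.
 * Assumption: GCC on x86/x86_64, where long double is the 80-bit
 * x87 extended format (64-bit significand), the same format gfortran
 * uses for REAL*10. Other toolchains may differ; MSVC, for example,
 * maps long double to the 64-bit double format. */
int float_bits(void)       { return FLT_MANT_DIG; }   /* 24 with IEEE 754 single */
int double_bits(void)      { return DBL_MANT_DIG; }   /* 53 with IEEE 754 double */
int long_double_bits(void) { return LDBL_MANT_DIG; }  /* 64 with x87 extended */
```

On a toolchain where long_double_bits() equals double_bits(), no 80-bit type is available from C.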


5. How to build the library

Please see 02QuickInstall.txt or just type "make".

118 changes: 118 additions & 0 deletions 02QuickInstall.txt
@@ -0,0 +1,118 @@
Quick installation for GotoBLAS2

***************************************************************************
***************************************************************************
** **
** **
** Just type "make" <<return>>. **
** **
** If you're not satisfied with this library, **
** please read the following instructions and customize it. **
** **
** **
***************************************************************************
***************************************************************************


1. The REALLY, REALLY quick way to build the library

Type "make" or "gmake".

$shell> make

The script detects the Fortran compiler, the number of cores and
the architecture you're using. If gcc produces 64-bit binaries by
default, a 64-bit library is created; otherwise a 32-bit library
is created.

When compilation finishes, a summary of the generated library is
printed.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

GotoBLAS2 build complete.

OS ... Linux
Architecture ... x86_64
BINARY ... 64bit
C compiler ... GCC (command line : gcc)
Fortran compiler ... PATHSCALE (command line : pathf90)
Library Name ... libgoto_barcelonap-r1.27.a (Multi threaded; Max num-threads is 16)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
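The two facts the script probes (the default binary type and the number of cores) can be approximated in C as below. This is a hypothetical sketch, not the logic of the actual build scripts: pointer width stands in for "default gcc binary type", and sysconf(_SC_NPROCESSORS_ONLN), a widely supported but non-POSIX extension, stands in for core detection.

```c
#include <unistd.h>

/* Hypothetical stand-ins for what the build script detects; this is
 * not the real detection code, only an illustration. */

/* 64 when the compiler targets a 64-bit ABI (e.g. plain gcc on
 * x86_64), 32 when it targets a 32-bit one (e.g. gcc -m32). */
int binary_bits(void) {
    return (int)(sizeof(void *) * 8);
}

/* Number of online processors, or 1 if it cannot be determined.
 * _SC_NPROCESSORS_ONLN is available on Linux, the BSDs and OS X,
 * but it is not required by POSIX. */
long online_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

Compiling the same file with gcc -m32 would make binary_bits() return 32, which mirrors how CC="pathcc -m32" yields a 32-bit library.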


2. Specifying a 32-bit or 64-bit library

If you need a 32-bit binary:

$shell> make BINARY=32

If you need a 64-bit binary:

$shell> make BINARY=64


3. Specifying target architecture

If you need a library for a different architecture, use the TARGET
option. The currently available targets are listed at the top of
getarch.c. For example, if you need a library for the Intel Core 2
architecture, you'll find the FORCE_CORE2 option in getarch.c, so
you can pass TARGET=CORE2 (dropping the FORCE_ prefix) to make.

$shell> make TARGET=CORE2
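The TARGET-to-FORCE_ mapping can be pictured as a preprocessor switch. In this toy sketch, make TARGET=CORE2 is assumed to end up passing -DFORCE_CORE2 to the compiler; FORCE_BARCELONA and the returned strings are illustrative, and the real list of targets lives at the top of getarch.c.

```c
/* Toy illustration of the FORCE_<TARGET> pattern (an assumption, not
 * the real getarch.c): a -DFORCE_xxx flag selects the target at
 * compile time instead of probing the CPU. */
#if defined(FORCE_CORE2)
#define SELECTED_TARGET "CORE2"
#elif defined(FORCE_BARCELONA)
#define SELECTED_TARGET "BARCELONA"
#else
#define SELECTED_TARGET "AUTODETECT" /* no FORCE_ symbol: probe the CPU */
#endif

/* Report which branch the preprocessor kept. */
const char *selected_target(void) {
    return SELECTED_TARGET;
}
```

For example, gcc -DFORCE_CORE2 -c target_sketch.c (a hypothetical file name) compiles the CORE2 branch; without any FORCE_ symbol the function reports AUTODETECT.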

If you want a single GotoBLAS2 library to support multiple architectures:

$shell> make DYNAMIC_ARCH=1

All kernels are included in the library, and the best one for the
running CPU is selected dynamically at run time.
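Run-time selection of this kind can be implemented with a table of function pointers chosen once at start-up. The sketch below shows that pattern with a hypothetical dot-product kernel per core type; the kernel names, the table layout and the detected-core string are assumptions, and in GotoBLAS2 the tuned kernels are hand-written assembler rather than C.

```c
#include <stddef.h>
#include <string.h>

/* Minimal sketch of run-time kernel dispatch in the spirit of
 * DYNAMIC_ARCH=1. All names here are hypothetical. */

typedef double (*dot_fn)(size_t n, const double *x, const double *y);

/* Portable reference kernel. */
static double dot_generic(size_t n, const double *x, const double *y) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Stand-in for a tuned kernel: 4-way unrolled with separate
 * accumulators (a real one would be written in assembler). */
static double dot_unrolled(size_t n, const double *x, const double *y) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}

struct kernel_table {
    const char *core;  /* core name reported by CPU detection */
    dot_fn dot;
};

static const struct kernel_table tables[] = {
    { "GENERIC",   dot_generic },
    { "CORE2",     dot_unrolled },
    { "BARCELONA", dot_unrolled },
};

/* Pick the table for the detected core, falling back to GENERIC. */
const struct kernel_table *select_kernels(const char *detected_core) {
    for (size_t i = 0; i < sizeof tables / sizeof tables[0]; i++)
        if (detected_core != NULL && strcmp(tables[i].core, detected_core) == 0)
            return &tables[i];
    return &tables[0];
}
```

Selecting the table once and calling through it keeps the per-call cost to a single indirect jump, which is negligible next to a BLAS kernel.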


4. Enabling multithreading

The script detects the number of cores and enables the
multithreaded library if there are two or more cores. If you still
want to create a single-threaded library:

$shell> make USE_THREAD=0

Or, to force a threaded library:

$shell> make USE_THREAD=1


5. Specifying target OS

The target architecture is determined by CC. If you specify a
cross compiler for MIPS, you can create a library for the MIPS
architecture.

$shell> make CC=mips64el-linux-gcc TARGET=SICORTEX

You can also specify your favorite C compiler by its absolute path.

$shell> make CC=/opt/intel/cc/32/10.0.026/bin/icc TARGET=BARCELONA

Since the binary type (32-bit/64-bit) is determined by checking CC,
you can also control it through this option.

$shell> make CC="pathcc -m32"

In this case, 32bit library will be created.


6. Specifying Fortran compiler

If you need to use a different Fortran compiler, specify it with
the FC option.

$shell> make FC=gfortran


7. Other useful options

You'll find other useful options in Makefile.rule.
119 changes: 119 additions & 0 deletions 03FAQ.txt
@@ -0,0 +1,119 @@
GotoBLAS2 FAQ

1. General

1.1 Q Are there any useful papers about GotoBLAS2?

A You may check the following URL.

http://www.cs.utexas.edu/users/flame/Publications/index.htm

11. Kazushige Goto and Robert A. van de Geijn, "Anatomy of
High-Performance Matrix Multiplication," ACM Transactions on
Mathematical Software, accepted.

15. Kazushige Goto and Robert A. van de Geijn, "High-Performance
Implementation of the Level-3 BLAS," ACM Transactions on
Mathematical Software, submitted.


1.2 Q Does GotoBLAS2 work with Hyper-Threading (SMT)?

A Yes. GotoBLAS2 detects Hyper-Threading and avoids scheduling
threads on the same physical core.
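Avoiding two threads on one core amounts to keeping one logical CPU per physical core. The C function below illustrates only that selection step; it assumes the logical-CPU-to-core mapping is already known (for example from CPUID or /proc/cpuinfo), and it is not GotoBLAS2's actual code.

```c
#include <stddef.h>

/* Illustration only: given core_of_cpu[i], the physical core id of
 * logical CPU i, write to out[] the first logical CPU seen for each
 * distinct core, so worker threads never share a core. out[] must
 * have room for ncpus entries. Returns the number of CPUs written. */
size_t one_cpu_per_core(const int *core_of_cpu, size_t ncpus, int *out) {
    size_t count = 0;
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        int seen = 0;
        for (size_t j = 0; j < count; j++) {
            if (core_of_cpu[out[j]] == core_of_cpu[cpu]) {
                seen = 1;   /* a sibling on this core is already used */
                break;
            }
        }
        if (!seen)
            out[count++] = (int)cpu;
    }
    return count;
}
```

With the mapping {0, 1, 0, 1}, a layout where sibling threads are numbered after all physical cores, it returns CPUs 0 and 1; with {0, 0, 1, 1} it returns CPUs 0 and 2.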


1.3 Q When I type "make", the following error occurs. What's wrong?

$shell> make
"./Makefile.rule", line 58: Missing dependency operator
"./Makefile.rule", line 61: Need an operator
...

A This error occurs because you are not using GNU make. Some binary
packages install GNU make as "gmake", so it's worth trying that.


1.4 Q Function "xxx" is slow. Why?

A GotoBLAS2 has many well-optimized functions, but it is far from
perfect. Level 1/2 performance in particular depends on how you
call BLAS. Use the profiling-enabled build or hardware performance
counters to understand what happens between your function and
GotoBLAS2. Again, please don't treat GotoBLAS2 as a black box.


1.5 Q I have a commercial C compiler and want to compile GotoBLAS2 with
it. Is that possible?

A All performance-critical functions are written in assembler; C
code is used only for wrappers around the assembler functions and
for complicated logic. I also use many inline assembler functions,
which most commercial compilers unfortunately can't handle.
Therefore you should use gcc.


1.6 Q I use an OpenMP compiler. How can I use GotoBLAS2 with it?

A Please understand that OpenMP is a compromise way of using
threads. If you want to use OpenMP-based code with GotoBLAS2,
enable "USE_OPENMP=1" in Makefile.rule.


1.7 Q Could you tell me how to use the profiled library?

A You need to build and link your application with the -pg
option. After your application runs, "gmon.out" is generated in
the current directory.

$shell> gprof <your application name> gmon.out

Each sample counts as 0.01 seconds.
   %  cumulative      self              self    total
  time   seconds   seconds    calls  Ks/call  Ks/call  name
 89.86    975.02    975.02    79317     0.00     0.00  .dgemm_kernel
  4.19   1020.47     45.45       40     0.00     0.00  .dlaswp00N
  2.28   1045.16     24.69     2539     0.00     0.00  .dtrsm_kernel_LT
  1.19   1058.03     12.87    79317     0.00     0.00  .dgemm_otcopy
  1.05   1069.40     11.37     4999     0.00     0.00  .dgemm_oncopy
....

I think the profiled BLAS library is really useful for your
research. Please find the bottlenecks in your application and
improve them.

1.8 Q Is the number of threads limited?

A Basically, there is no limit on the number of threads. You can
specify as many threads as you want, but a larger number of
threads consumes extra resources. I recommend specifying the
minimum number of threads you need.


2. Architecture-Specific Issues and Implementation

2.1 Q GotoBLAS2 seems to support any combination of OS and
architecture. Is that possible?

A Combinations are limited to those that existing OSes and
architectures allow. For example, OSX on SPARC is impossible. But
such a combination could be supported with slight modifications
if it ever appeared.


2.2 Q I have POWER architecture systems. Do I need to do extra work?

A Although the POWER architecture defines a special instruction,
similar to CPUID, for detecting the processor type, it is
privileged and can't be executed by a user process. So you have
to set your architecture manually in getarch.c.


2.3 Q I can't create a DLL on Cygwin (Error 53). What's wrong?

A Make sure that lib.exe and mspdb80.dll from Microsoft Visual
Studio are in your PATH. The easiest way to check is with the
'which' command.

$shell> which lib.exe
/cygdrive/c/Program Files/Microsoft Visual Studio/VC98/bin/lib.exe
13 changes: 13 additions & 0 deletions 04Windows64bit.txt
@@ -0,0 +1,13 @@

Quick guide to building the library for 64-bit Windows.

1. What you need

a. Windows Server 2003 or later
b. Cygwin environment (make, gcc, g77, perl, sed, wget)
c. MinGW64 compiler
d. Microsoft Visual Studio (lib.exe and mspdb80.dll are required to create the DLL)

2. Run ./quickbuild.win64

Good luck