*****************************************************************************
** TAU Portable Profiling Package **
** http://tau.uoregon.edu **
*****************************************************************************
** Copyright 1997-2025 **
** Department of Computer and Information Science, University of Oregon **
** Research Centre Juelich, Germany **
** Advanced Computing Laboratory, Los Alamos National Laboratory **
*****************************************************************************
Change log:
------------
Version 2.34 changes (from 2.33):
1. Added support for new LLVM compilers (-c++=parascc).
2. Updated support for ROCm 6.2.3.
3. Added ZeroSum and PerfStubs repos as git submodules.
Version 2.33 changes (from 2.32):
1. Added -dyninst=download for binary rewriting with DyninstAPI.
2. Added tracking syscalls (configure -syscall, tau_exec -syscall, see examples/syscall).
3. Added support for rocm_smi plugin for monitoring in TAU.
4. Updated -syscall support in TAU.
5. Updated bfd=download for aarch64.
6. Added initial support for Grace-Grace and Grace-Hopper platforms.
7. Added support for Score-P 8.3 using -scorep=download.
8. Added support for DyninstAPI 13.0.0 using -dyninst=download.
9. Added support for Score-P 8.4 using -scorep=download.
10. Added support for -no-pie on Ubuntu for instrumenting an application using TAU scripts.
11. Added support for ROCm 6.x with rocprofiler-sdk.
Version 2.32 changes (from 2.31):
1. Added support for riscv64 (-optCompInst, tau_exec -ebs, mpi).
2. Updated support for hipcc as C++ compiler with MVAPICH2 on Cray systems for OMPT.
3. Updated default MPI configuration.
4. Updated support for ROCm 5.2.1.
5. Refactored TAU to use a dynamic threading implementation instead of static arrays.
6. Updated MPI wrapper interposition library to not invoke PMPI_Type_size when the type is MPI_DATATYPE_NULL.
7. Updated MPI_Status_f2c handling when PMPI_Status_f2c (and c2f) are defined instead.
8. Updated LLVM plugin for new pass manager.
9. Added support for ROCm 5.6.0.
10. Added support for DyninstAPI 12.0.
11. Added support for updated binutils and libunwind external packages.
12. Fixed a trace buffer output bug on MPI rank 1.
13. Updated memory instrumentation for aarch64.
14. Updated TAU plugin examples.
15. Added support for TAU LLVM plugin environment variables
(e.g., TAU_COMPILER_MIN_INSTRUCTION_COUNT - see plugins/llvm/src/README).
Version 2.31 changes (from 2.30):
1. Updates for MPI on craycnl (CRAY_MPICH_PREFIX).
2. Updates for NEC VE for memory profiling.
3. Added support for PrgEnv-aocc and PrgEnv-nvidia on craycnl.
4. Added support for CUDA 11.4 and ROCm 4.3.
5. Updated TAU's LLVM selective instrumentation for -optCompInst. Supports C++.
6. Updated tau2slog2 for multi-threaded testcases.
7. Updated support for demangling C++ symbol names (TauBfd.cpp).
8. Added new AMD HIP examples (examples/gpu/hip) including MPI.
9. Updated TAU for Windows 11.
10. Added support for TAU_ANONYMIZE: creates tauprofile.xml and tau_anonymized_key.xml.
11. Added tau_join.sh, a tool to decrypt and join the files created by TAU_ANONYMIZE.
12. Added support for ROCm 4.5.2.
13. Added support for Cray PrgEnv-amd.
14. Fixed a bug with throttling and CUPTI timer stop.
15. Allows a user to specify different compilers on Cray systems.
16. Added support for TAU_COMPILER_SELECT_FILE for specifying selective instrumentation for hipcc/clang for C++ files.
17. Added support for PrgEnv-cray with -c++=hipcc -cc=clang for LLVM selective instrumentation plugin.
18. Added support for specifying full path of Fortran compiler (e.g., mpiifort) while building TAU.
19. Added support for using MPI compiler scripts with -c++=hipcc -cc=clang using mpiinc and mpilib on HPE/Cray.
20. Added initial support for configuring TAU with OMPT target offload with ROCm 5.1.0 (examples/openmp/mpi_target).
21. Added an example for Intel OpenMP target offload for Intel GPUs using icpx (examples/openmp/intelmpi_target).
Version 2.30 changes (from 2.29):
1. Added support for the Fugaku Fujitsu FX system with Fujitsu compilers (-mpi -bfd=download).
2. Added support for Intel OneAPI (TigerLake Intel GPUs) (-opencl=<dir> -level_zero=<dir>). See examples/gpu/oneapi.
3. Enhanced support for PAPI for NEC SX Aurora (-arch=nec-sx-aurora) using TAU_METRICS and tau_exec -ebs.
4. Added preliminary support for Apple macOS arm64_apple architecture (M1 Mac Mini).
5. Updated support for ROCm 3.9 and use of pthread with ROCm in system installed rocm (not spack).
6. Updated TAU to use PAPI from git repo on NEC SX Aurora.
7. Updated 3D support in ParaProf using new Jogl libs for Apple M1 macs (arch is arm64_apple).
8. Added support for demangling dpcpp kernel names using the -fno-sycl-unnamed-lambda compiler flag.
9. Added support for Cray CCE compilers on A64FX (aarch64) systems (Ookami) without libunwind.
10. Added support for flang v12 using gfortran as backend.
11. Added support for NVHPC (21.5) (-c++=nvc++ -cc=nvc -fortran=nvfortran).
12. Added support for Cray CNL (-arch=craycnl) with PrgEnv-gnu and AMD GPUs (Spock, ORNL).
13. Added initial support for a TAU LLVM plugin for supporting function level selective instrumentation.
14. Added support for PDT and C/C++ for IBM AIX 7.2 (ibm64).
15. Added support for TAU's LLVM selective instrumentation plugin (examples/plugin/llvm).
Version 2.29 changes (from 2.28):
1. Updated OMPT support.
2. Updated CUDA 10 support.
3. Updated Clang compiler-based instrumentation, OMPT, and HIP support.
4. Updated Paraprof support for 3D.
5. Updated OpenCL support for GPUs from multiple vendors.
6. Added support for NEC SX Aurora (-arch=nec-sx-aurora) platform.
7. Added support for ROCm 3.x using rocprofiler for AMD GPUs.
8. Added initial support for path based profiling (-PROFILEPATHS) for MVAPICH2 MPI_T with GPUs.
Version 2.28 changes (from 2.27):
1. Added support for AMD ROCm (configure -rocm[=/path]).
2. Updated OpenCL with pthread support.
3. Added TAU_EBS_RESOLUTION=file/function/line specification.
4. Added TAU_SET_NODE(=0) that may be used for instrumentation of serial codes with MPI configuration of TAU.
5. OpenSHMEM enhancements for supporting CUDA and threads.
6. Added -tau_python_interpreter to tau_python to specify a different interpreter.
7. Added -tau_ebs_resolution=[file|function|line] to tau_exec. Function is default.
8. Added OpenACC launch kernel and data transfer info with variables and line numbers.
9. Added support for AMD hipcc and hcc compilers with ROCm.
10. Added support for compiler-based instrumentation for hipcc and hcc.
11. Fixed a bug with TAU_SELECT_FILE.
12. Added support for -roctracer=<dir>. Use with tau_exec -T roctracer -rocm ./a.out.
13. Added support for the region push/pop interface of Kokkos Profiling API.
14. Added support for chrome trace viewer output from tau_trace2json.
15. Added support for Cray XC with ARM64 platform (-arch=craycnl).
16. Added support for Score-P 5.0.
17. Added support for downloading and using libelf with libdwarf.
18. Added support for a new TAU plugin interface.
19. Added support for both profiling and tracing with -roctracer for AMD GPUs.
20. Enabling tracing now disables sampling.
21. Added support for CUDA 10 and ROCm 2.6.
22. Added support for setting MPI_T cvars on a per-communicator basis (-mpit -PROFILECOMMUNICATORS and env var TAU_MPI_T_COMM_METRIC_VALUES).
23. Added support for OMPT 5.0 (Clang 8.x, Intel 19.0.4+) and OMPT TR7 (Intel 19.0.1).
24. Updated SOS (for online profiling) to use the TAU plugin interface (examples/sos).
25. Updated Opari to support Cray CNL.
26. Added support for flang.
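Items 3 and 7 above select the sampling resolution, with function as the default. A minimal sketch of how a launcher script might validate that setting before a run, assuming sampling is started with tau_exec -ebs as described elsewhere in this change log. The helper name ebs_launch is hypothetical and is not part of TAU.

```python
# Resolutions listed for TAU_EBS_RESOLUTION in item 3.
VALID_RESOLUTIONS = ("file", "function", "line")


def ebs_launch(app, resolution="function"):
    """Return (command, extra_env) for a sampled run of `app`.

    Hypothetical helper: it only assembles the command line and the
    environment variable; it does not run anything.
    """
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {VALID_RESOLUTIONS}, got {resolution!r}"
        )
    command = ["tau_exec", "-ebs", app]
    extra_env = {"TAU_EBS_RESOLUTION": resolution}
    return command, extra_env


command, extra_env = ebs_launch("./a.out", resolution="line")
print(" ".join(command), extra_env)
```

The returned pair could be handed to subprocess.run(command, env={**os.environ, **extra_env}) on a system where TAU is installed.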
Version 2.27 changes (from 2.26):
1. Updated JOGL in ParaProf to JOGL2 to support 64 bit 3D window displays.
2. Added support for Mac OS X with PDT, EBS, MPI, and Compiler based instrumentation.
3. Updated PEBIL support in TAU in tau_pebil_rewrite with PDT 3.25.
4. Added a pycoolr based GUI for online performance evaluation using BEACON and SOS Flow.
5. Fixed MPI bugs for collective operations.
6. Updated PMI support in TAU for topology displays for Cray.
7. Added support for the Caliper API in TAU.
8. Updated support for DyninstAPI 9.3.2 in TAU.
9. Updated pthread wrapper support in TAU for use with OpenMP.
10. Added support for LIKWID (configure -likwid=download; export TAU_METRICS=TIME,LIKWID_<event>).
11. Added preliminary support for CUDA 9.0 and NVLink.
12. Updated support for SGI MPT MPI wrapper.
13. Added support for compiler-based instrumentation for LLVM compilers and tested on IBM Power 8 Linux and Cray XC40 with KNL platforms.
14. Added support for flang compiler-based instrumentation.
15. Added support for IBM Power 9 MVAPICH2 compilers.
16. Added support for OMPT TR6 (configure -ompt=download-tr6 with export TAU_OMPT_SUPPORT_LEVEL=full).
17. Added support for LIKWID 4.3.2.
18. Added Python args support.
19. Added support for armflang and armclang/armclang++ compilers.
20. Added -dyninst=download option to simplify installation of DyninstAPI and use with TAU.
21. Updated support for TAU's OMPT TR6 (-download=ompt-tr6).
22. Updated TAU's ParaTools Threadspotter [www.threadspotter.com] support in tau_exec (-ptts).
23. Added support for IBM XL compilers with MVAPICH2 under IBM Power Linux Power 9 + GPU platform.
24. Updated MPC support in TAU.
25. Added preliminary support for NEC SX-Aurora TSUBASA (-arch=nec-sx-aurora -mpi).
26. Added support for TAU_EBS_RESOLUTION=function (disables line info) for tau_exec -ebs. TAU_LITE=1 produces line number of 0 for functions.
Version 2.26 changes (from 2.25):
1. Upgraded OMPT to LLVM-0.2 and introduced TAU_OPENMP_RUNTIME_EVENTS=0.
2. Updated tau_exec -ptts for ThreadSpotter.
3. Fixed OMPT with Cray (-arch=craycnl) instrumentation on threads 1,..N-1.
4. Updated APEX API.
5. Added PAPI support for multiplexing.
6. Updated MPI-T interface for MVAPICH2.
7. Updated MPC support.
8. Added support for OpenACC (PGI 16.5) and CUDA (8.0).
9. Updated OTF2 support.
10. Updated ParaProf.
11. Added support for TAU_CALLSITE=1 with profiling and tracing. Tested with OpenSHMEM.
12. Added support for TAU_PROFILE_FORMAT=merged for unifying performance data for OpenSHMEM.
13. Added support for Python 3.x. Need to specify -pythoninc=<dir> -pythonlib=<dir> while configuring.
14. Added support for PySpark. See examples/pyspark.
15. Added support for PGI compilers with CUDA.
16. Added support for compiler-based instrumentation with CUDA. Simply compile .cu files with tau_cc.sh.
17. Updated MPI-T support for MVAPICH2, MPICH, and OpenMPI.
18. Updated support for resolving symbols using an updated BFD package (2.27). ./configure -bfd=download
19. Updated APEX with tau_exec.
20. Added support for Kokkos profiling interface.
21. Added tau_spark-submit wrapper for Spark profiling to propagate TAU environment variables.
22. Added support for Co-array Fortran (caflauncher, tau_caf.sh, taucaf).
23. Added tau_coalesce, a tool to merge profiles and traces from multiple workflow directories in the current directory.
24. Added support for TAU_TRACK_LOAD=1 for periodically tracking system load.
25. TAU traces now have metadata at the end. User event triggered with 1.
26. Added tau_trace2json and tau_prof_to_json.py tools for JSON output.
27. Added support for TAU_METADATA env. var. (export TAU_METADATA="<Var1=Val1:Var2=Val2:Var3=Val3>")
28. Added support for PGI 17 compilers under Power 8 Linux.
29. Added initial support for compiler-based instrumentation for Clang/LLVM under Power 8 Linux.
30. Added support for excluding functions in selective instrumentation file with -optCompInst for GNU/Clang.
31. Added initial support for Intel PIN in JIT mode for GNU compilers (./configure -pin=download; tau_exec -pin ./a.out)
32. Reduced memory footprint and cost of callpath profiling.
33. Added support for SOSflow.
34. Added support for runtime selective instrumentation (TAU_SELECT_FILE) with exclude/include list of routines and files.
35. Added support for ADIOS profiling interface.
36. Turned off TAU_EVENT_THRESHOLD (max and min marker events) by default. Specifying a non-zero value enables these.
37. Added support for specifying unwind offset for TAU_CALLSITE=1 using the env. var. TAU_CALLSITE_OFFSET.
38. Added support for native OTF2 traces. Use configure -otf=download and then TAU_TRACE=1, TAU_TRACE_FORMAT=otf2 while running.
39. Added support for both OpenSHMEM and MPI simultaneously. With MPI, serial programs now emit a warning.
40. Added Pycoolr based GUI for online monitoring.
41. Added -DISABLE_MEMORY_MANAGER configuration option.
42. Added support for Cray PMI (Process Management Interface) to track topology info for paraprof (configure -PMI).
43. Added support for Clang/LLVM under -arch=craycnl (PrgEnv-llvm at ALCF).
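Items 27 and 38 above configure a run purely through environment variables. As a rough sketch (not part of TAU itself), a launcher could assemble them as follows. The helper names build_tau_metadata and tau_trace_env are hypothetical, and the metadata layout assumes colon-separated Key=Value pairs inside angle brackets, as the example in item 27 suggests.

```python
import os


def build_tau_metadata(fields):
    """Format a dict as a TAU_METADATA value such as "<K1=V1:K2=V2>".

    Assumption: pairs are Key=Value, separated by colons and wrapped in
    angle brackets, following the example in item 27.
    """
    for key, value in fields.items():
        # Reject characters that would be ambiguous in the assumed format.
        for text in (str(key), str(value)):
            if any(ch in text for ch in "=:<>"):
                raise ValueError(f"unsupported character in {text!r}")
    pairs = ":".join(f"{key}={value}" for key, value in fields.items())
    return f"<{pairs}>"


def tau_trace_env(fields, base_env=None):
    """Return a copy of the environment with OTF2 tracing and metadata set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TAU_TRACE"] = "1"             # item 38: enable tracing
    env["TAU_TRACE_FORMAT"] = "otf2"   # item 38: native OTF2 output
    env["TAU_METADATA"] = build_tau_metadata(fields)  # item 27
    return env


env = tau_trace_env({"Compiler": "gcc", "Dataset": "small"}, base_env={})
print(env["TAU_METADATA"])
```

The resulting dict could be passed as the env argument of subprocess.run when launching an application built against a TAU configured with -otf=download.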
Version 2.25 changes (from 2.24):
1. Support for new PDT parsers cxxparse4101, cparse4101, and gfparse48
2. Introducing APEX.
3. Added support for tau_exec -numa for tracking PAPI remote DRAM references.
4. Added support for recycling pthread ids.
5. Updates to TAU for OMPT support.
6. PerfExplorer fixes for mean calculation for runtime breakdown charts.
7. Added support for Argo ExaOSR BEACON backplane for event communication.
8. Added preliminary support for Power 8 and ARM64.
9. Added support for CUDA 7.5.
10. Added support for OpenACC with PGI 15.10.
11. Added support for new versions of MAQAO, PEBIL, and DyninstAPI 9.0.3.
12. Added support for SGI MPT 2.13 with PSHMEM interface.
13. Added support for PGI 16.x OpenACC compilers.
14. Added support for gfparse48 from PDT 3.22 in tau_compiler.sh.
15. Updated OpenCL support in TAU.
16. Improved callsite resolution.
17. Improved BFD support for Xeon Phi.
18. Added support for MPI-T interface (TAU_TRACK_MPI_T_PVARS, TAU_MPI_T_CVAR_METRICS, TAU_MPI_T_CVAR_VALUES).
19. When a file is excluded from instrumentation, it is not parsed or compiled with compInst.
20. Added APEX measurement library.
21. Added OpenSHMEM wrapper generator.
22. Added -scorep=download configuration option to build TAU with Score-P.
23. Updated CUDA support.
24. OMPT updates.
25. Added support for Cray KNL systems (module load craype-mic-knl).
26. Fixed Opari2 compilation issues on Cray.
27. Added support for launching ParaTools ThreadSpotter through tau_exec (-ptts).
Version 2.24 changes (from 2.23):
1. Reduced OpenMP profiling overhead and memory footprint.
2. Include lists now work with tau_rewrite and MAQAO.
3. Improved OMPT features with MPC.
4. Added support for 3D windows in ParaProf and PerfExplorer for Windows.
5. External package dependencies resolved in configure.
6. Added support for DyninstAPI 8.2.1 (-boost=dir).
7. Updated support for Fujitsu K computer.
8. Updated support for Score-P 1.3+ (--user).
9. Updated support for OpenSHMEM 1.0f.
10. Initial support for Android.
11. TAU Commander tool (tau) introduced.
12. Added support for tracking power (TAU_TRACK_POWER) with PAPI+PERF+RAPL.
13. Added support for OpenMP with GNU and Intel for MPC and TAU.
14. Added -optReuseFiles flag for hand-instrumentation, and speeding up header instrumentation.
15. Added support for TAU_TRACK_MEMORY.
16. Updated tau_instrumentor for use with Score-P.
17. Added tau_exec -ompt and -openacc (for PGI) flags. Uses PGI_ACC_PROF_LIB for OpenACC.
18. Added support for SGI_MPT_MPI for Fortran.
19. Added support for Cray XC40 with mic_linux. Configure using -arch=craycnl with MIC compilers (cc).
20. Added support for tracking Unified Memory using CUPTI 6.5+ using tau_exec -um -cupti.
21. Better support for Cray UPC (tested with PrgEnv-cray/5.2.40 with cce/8.3.7).
22. Added support for MPI_IN_PLACE and MPI_BOTTOM in the MPI wrapper interposition library.
23. Added initial support for TAU_METRICS=ENERGY and ACCEL_ENERGY (in Joules). Currently, it supports -arch=craycnl only.
24. Added support for TAU_TRACK_POWER with Cray /sys/cray/pm_counters interface (default for -arch=craycnl).
25. Added support for OpenACC with PGI (v14.9+) on Cray XC systems with GPUs (examples/openacc/README).
Version 2.23 changes (from 2.22):
1. Added support for OpenMP Tools API (OMPT).
2. Added support for markers (events triggered when atomic events increase or decrease beyond a threshold).
3. Increase and decrease in heap memory tracked with TAU_TRACK_HEAP=1.
4. ParaProf event window can be launched from a node view.
5. Better support for creating ppk files from tauprofile.xml.
6. TAU_BFD_LOOKUP=0 and tau_resolve_addresses.py with TAU_PROFILE_FORMAT=merged for offline address translation with TAU_SAMPLING=1.
7. Better configure step that builds TAU.h and Makefiles from a skeleton.
8. Lower OpenMP runtime overhead.
9. Added support for tracking power (TAU_TRACK_POWER()) using PAPI --with-component=rapl.
10. Added support for Intel OMPT (-ompt=download).
11. Performance enhancements for OpenMP measurement.
12. Added DrawMultiGraph ui element to PerfExplorer Scripting API
13. Allowed specification of alternate binutils directory at configuration
14. More intelligent management of ParaProf/PerfExplorer jar downloads
15. Added support for adding/removing metadata fields for TauDB trials in ParaProf
16. Added option to hide the legend in PerfExplorer charts
17. Sorted event lists in PerfExplorer custom chart view
18. Allow editing of existing TauDB views in ParaProf
19. Updated Cube reader allows reading metadata and atomic events from Cube profiles in ParaProf
20. Improvements to call-site tracking to work more smoothly with multi-threaded applications.
21. CUPTI counters are now assigned to the GPU kernels that produced them. If they cannot be attributed to a single kernel the values are automatically estimated given the structure of the application.
22. Added support for context events and markers in Score-P using SCOREP_PROFILING_FORMAT=CUBE_TUPLE.
23. Added TAU_OTF2_1_1 in tau2otf.
24. Added testing of MPICH3_CONST.
25. Added adapter-init options from scorep-config to generate TauScorePAdapterInit.c.
26. Added support for Intel MPC wrappers mpc_icc, mpc_icpc, mpc_ifort in configure.
27. Added support for Mellanox OpenSHMEM.
28. Added OpenCL support for Apple.
29. Added support for MPC's OMPT.
30. Added support for OpenMPI SHMEM.
31. Added support for Score-P 1.3.
32. Added support for OpenSSL in TAUdb.
33. Added support for Android.
34. Added support for DyninstAPI 8.2 with MPI.
Version 2.22 changes (from 2.21):
1. Added support for TAUdb (formerly PerfDMF).
2. Added support for GPI (www.gpi-site.com).
3. Enhanced ParaProf 3D window configuration for Mac OS X, AMD64 architectures.
4. Updated to OPARI2 1.0.6.
5. Updated Score-P's interface with TAU.
6. Updated communication matrix display for one-sided calls.
7. Improved TAU_SUMMARY=1 for tracking totals, max, and mean.
8. Improved support for topology visualization in ParaProf
9. Added CUPTI support for Cray OpenACC compilers.
10. Improved CUDA device to device communication profiling.
11. Added support for PEBIL based binary rewriter (tau_pebil_rewrite).
12. Added support for DyninstAPI 8.0.
13. Added support for runtime bounds checking (tau_exec -memory_debug, -optMemDbg).
14. Throttling overhead reduced for multi-threaded code.
15. TAU_TRACK_CUDA_INSTRUCTIONS=<instruction> added to track instructions by each GPU kernel. Uses -lineinfo passed to nvcc for enhanced source info.
16. ParaProf, PerfExplorer enhancements.
17. tau_show_libs lists extra static libs that are needed to link in TAU for tau_rewrite, tau_run.
18. OpenMP bug fixes.
Version 2.21 changes (from 2.20):
1. Added support for rewriting binaries using Maqao (tau_rewrite) using PDT 3.17+.
2. Added support for event based sampling based on unwinding (TAU_EBS_UNWIND=1).
3. Added support for OpenSHMEM.
4. Added support for context events and atomic events in Score-P.
5. Added support for H2 database in PerfDMF/PerfExplorer.
6. Added support for Eclipse remote component.
7. Added support for UPC instrumentation in TAU.
8. Added support for CUDA and CUPTI v4.1 for NVIDIA.
9. Added a new view for visualizing atomic and context events in 3D topology display in ParaProf.
10. Added Opari2 for OpenMP instrumentation (with support for OpenMP 3.0 constructs).
11. Added support for CUBE4 reader in ParaProf.
12. Added debugging support with TAU_TRACK_SIGNALS=1 to capture callstack as metadata at point of failure.
13. Added TAU_SUMMARY=1 and pprof -f profile.Min/Max with paraprof -dumpsummary to generate just min/max/stddev/mean statistics instead of per-node data.
14. Added support for systemwide tracking using tau.conf.
15. Added support for MINGW compiler to create binaries for Windows.
16. Added CUDA Occupancy calculator.
17. Added support for UPC compiler scripts (tau_upc.sh, tauupc).
18. Updated Opari2.
19. Added EBS based on callstack unwinding.
20. Added Cray DMAPP library wrapper (configure -dmapp, -optTrackDMAPP TAU_OPTIONS).
21. Added -useropt=-DTAU_SYSTEMWIDE_TRACK_MSG_SIZE_AS_CTX_EVENT for tracking message size with context events.
22. Added -optLinkOnly for compiling .o file without instrumentation, then linking in TAU libraries.
23. Added support for IBM BlueGene/Q (-arch=bgq in configure).
24. Added support for callsite profiling (TAU_CALLSITE=1 at runtime).
25. Added an updated Opari2.
25. Added support for OpenACC profiling using the updated PGI 12.3 runtime library instrumentation API.
26. CUPTI and OpenCL updates (tracking time spent queued on the GPU).
27. Updates to ParaProf for IBM BG/Q topology displays.
28. Updates to tau_rewrite for Maqao instrumentation.
29. Added tau_macro.sh tool used for pre-processing (-optPreProcess) for Opari2.
30. Added support for -arch=arm_linux.
31. Added jogl.jar and libjogl.* changes for IBM BG/Q and ARM Linux.
32. Enhancements for TAU sampling and TAU_TRACK_SIGNALS for IBM BG/Q.
33. Enhancements to TAU to support parameter and phase based profiling in Score-P.
34. Added support for Cray and Berkeley UPC runtime library wrapper using -optTrackUPCR TAU_OPTIONS.
35. Reduced MPI overhead using a hash table, fixed BGP & BGQTIMERS.
36. Added support for ARM (arm_linux) and Intel MIC.
37. Added support for OTF2 (tau2otf2 -> tau2otf).
38. Added support for UPC wrapper generator with communication tracking.
39. Improved ParaProf 3D displays with wraparound torus configurations.
40. Added support for Fujitsu FX10.
41. Added support for MPC.
42. Added support for LLVM.
43. Added support for TAU_LITE=1 for reduced overhead profiling.
44. Added -z to perfdmf_loadtrial to provide two ways to compute mean in PerfDMF. Preferences in ParaProf.
45. Opari support for multiple libraries in tau_compiler.sh (-optOpariLibs) for POMP2 registrations.
46. CUPTI updates.
47. Added support for GPI wrapper [www.gpi-site.com].
48. Added support for CUDA and CUPTI v5.0 for NVIDIA.
Version 2.20 changes (from 2.19):
1. Added support for GPGPUs using tau_exec -cuda and tau_exec -opencl. Configure with -cuda=<dir>.
2. Added support for 3D topology displays in paraprof.
3. Added support for tracking per-communicator performance data -PROFILECOMMUNICATORS.
4. Added support for binary rewriting of static executables using DyninstAPI 7.0.
5. Added support for derived metrics, cut, copy and paste of metrics in paraprof, perfexplorer.
6. Added support for sampling based measurements in tau_exec -ebs* (README.sampling).
7. Added support for ARMCI profiling (GA v5.0) in tau_exec -armci. Configure with -armci=<dir>.
8. Added support for loop level instrumentation in the binary rewriter (tau_run a.out -o a.i -f select.tau).
9. Added preliminary support for Eclipse PTP remote components.
10. Integrated tau_exec and tau_wrap (tau_exec -loadlib=<dir>/libwrapped.so) for instrumenting an external library.
11. Added support for pre-computing mean and std. dev. at the end of execution with TAU_PROFILE_FORMAT=merged.
12. Added support for compiler-based instrumentation for Cray CCE compilers.
13. TAU ported to Cray XE6 with updated default directories (gemini).
14. Improved support for Cray Shmem.
15. Improved support for thread based communication in tau2otf and ParaVer support in tau_convert -paraver.
16. Throttled functions are now marked explicitly in their names [THROTTLED].
17. Before reverting to compiler based instrumentation the user is prompted unless -optRevert is specified.
18. Updates to memory and I/O tracking support for Mac OS X (in tau_exec).
19. Added support for the Yorick programming language.
20. Added Java profiling support using JVMTI.
21. Added support for Score-P (www.score-p.org) measurement system. Configure with -scorep=<dir>.
22. Added support for importing Google perftools performance data (paraprof -f google).
23. Added support for profiling CUDA kernels on a separate thread.
24. Bug fixes (papi+tracing, profile merged with tau_exec -io, outer-loop level instrumentation DyninstAPI).
25. Added support for topology display with user specified topologies loadable from a text file in paraprof.
26. Added support for PGI 11.x compilers with support for accelerator primitives.
27. Added support for a scrollbar in the paraprof 3D window.
28. CUDA 4.0 support added and tracking of gpgpu thread execution is now supported.
29. Bug fixes for PGI -optCompInst for C++, tau_exec -memory for Intel -optCompInst.
30. PostgreSQL jar file support updated in PerfDMF.
31. Added support for Score-P LD_PRELOAD'ing using tau_exec and tau_run.
32. Added support for SOL2CC compiler for Solaris CC compiler.
33. Bug fixes for tau_instrumentor (single line Fortran DO loop instrumentation).
34. Added support for Cray CCE -optCompInst for OpenMP.
35. Bug fix for PAPI and TAU_TRACE=1 for initialization of papi.
36. Added support for demangling kernel names for CUDA executions.
37. Added a new tool tau_gen_wrapper <header> <lib> that generates wrappers.
38. Fixed papi initialization bug with C++ static ctors on Cray XE6.
39. Scrollbars in paraprof 3D window.
40. Support for NAG Fortran.
41. Update for tau_run rewriter for getting MPI rank from executable.
42. Sped up paraprof/perfexplorer's loading of trials and computing derived metrics.
43. Perfexplorer bugs fixed with custom charts. New charting option in custom charts for a single event in multiple experiments.
44. Collating performance data bug fixed.
45. Cray CCE -optCompInst bug fixed.
46. Added support for SHMEM communication tracking (profiling + tracing).
47. Added API in TAU to track one sided communication from a remote node.
48. Enhancements and bug fixes in tau_wrap.
49. EBS: Added profiling support for event based sampling.
50. Added support for system wide TAU configuration based on <taudir>/tau_system_defaults/tau.conf file. Support job id tracking in file name (PROFILEDIR).
51. Cleaned up GPGPU support with CUPTI.
52. Updated Paraprof 3D topology display with interval event and atomic event selection.
-------------------------------
Version 2.19 changes (from 2.18):
1. Added support for Chapel.
2. Added support for UPC (Berkeley upcc compiler, GASP, supports -optCompInst).
3. Added support for automatically generating LD_PRELOADable .so files from a .h file using tau_wrap (-r libname.so).
4. Derived metrics window in ParaProf and PerfExplorer.
5. -bfd=download now downloads binutils-2.20 instead of 2.18.
6. Added an aligned stacked bar chart in PerfExplorer that is similar to unchecking "Stack Bars Together" box in paraprof's options window.
7. Paraprof now automatically adjusts the memory used (tau_javamax.sh)
8. Added support for Intel compilers on Cray XT systems.
9. tau_validate now uses TAU_VALIDATE_PARALLEL and TAU_VALIDATE_SERIAL env vars to run the tests (see --help).
10. Added support for external tool configuration in PerfExplorer.
11. Updated PerfExplorer code to Weka 3.6.1.
12. Added support for DBSCAN clustering.
13. Updated Jython support to 2.5.1 that supports Python v2.5.
14. Created a utility to reconstruct a Paraver trace from TAU EBS samples.
15. Paraprof 3D communication matrix display has cross hairs and value boxes.
16. Enabled tree selection model for multi-selection.
17. New expression parsing window in Paraprof.
18. Paraprof 3D windows now work on IBM BG/P using ppc64 JOGL.
19. Group changer window in paraprof.
20. Added support for outer loop level instrumentation in tau_instrumentor's spec file mode.
21. When PDT based source instrumentation fails, compiler-based instrumentation is used as a fallback. Disable with -optNoCompInst in TAU_OPTIONS env. var.
22. Added support for PAPI-C v4.0 in TAU. Retains backward compatibility with earlier PAPI versions.
23. Added support for Cray CCE compilers on XT systems (module PrgEnv-cray).
24. Added support for tracking pthread barrier wait times.
25. Added support for TAU over MRNet.
26. Added a new tool (tau_exec) to evaluate I/O, memory and communication
27. PerfExplorer has a new derived metric pane and updates to configuration
28. tau_exec can also load wrapper libraries created by tau_wrap using -loadlib
29. Added support for sampling based profiling (README.sampling)
30. Added support for SHMEM wrappers for Cray XT
31. Added support for Cray XMT (-arch=crayxmt)
32. Added support for ScoreP (aka silc).
33. ParaProf 2D and 3D communication matrices show nodes and not threads.
34. Totals for context events and atomic events are now accessible in paraprof.
35. Refined the support for event based sampling (EBS).
Version 2.18 changes (from 2.17):
1. Added support for PIN based runtime instrumentation for Windows.
2. Added support for compiler based instrumentation for PGI and IBM compilers.
3. Added support for thread-safe throttling of events.
4. Added support for reading annotated snapshot perfsuite profiles generated by TAU.
5. Enhanced support for GNU based compiler instrumentation to support instrumented shared objects.
6. Added support for parsing C99 code using PDT.
7. Python API enhancements include support for setting node, data purging and exit.
8. tau_ompcheck now supports more OpenMP directives.
9. PerfExplorer now includes a CQoS classifier and a gap computing module.
10. Compiler based instrumentation supports selective instrumentation for the file level.
11. Added a -bfd=download configure option which will download and build binutils with -fPIC for compiler based instrumentation.
12. Added support for Intel v11 compilers.
13. Snapshot enhancements (read snapshot.*.*.* files, paraprof -f snapshot)
14. Better configuration support for SiCortex, IBM BG/P and Cray XT5 (PGI, Pathscale, GNU), Intel compilers for Apple.
15. Added support for preloading pthread calls; enable with -useropt=-DTAU_PTHREAD_PRELOAD.
16. Added support for -DISABLESHARED configure option that does not build libTAU.so.
17. Callpath profiling is now a runtime option enabled by setting the TAU_CALLPATH env var.
18. PGI Accelerator API support, pgi 7.x stl string fix for OpenMP, pgi 8.0 Fortran compiler based instrumentation fix.
19. Python C-API offers lower overhead than legacy ltau.py API. Now it is the default.
20. TAU_PROFILE_FORMAT "merged" generates merged profiles.
21. perfdmf_configure --create-default creates a Derby DB without any questions.
22. Eclipse plugin uses a new XML workflow for uploading performance data on disk to db.
23. 3D Communication Matrix in ParaProf.
24. TAU_CALLPATH_DEPTH of 0 and 1 for context events.
25. TAU_TRACK_HEAP and TAU_TRACK_HEADROOM environment variables - samples at function entry and exit, context events.
26. PerfExplorer has a working Derby database that can be used to load ppk files.
27. Default mode: TAU_TRACE=1 disables TAU_PROFILE. You may set it to get both.
Version 2.17 changes (from 2.16):
1. Added support for IBM BG/P (-arch=bgp).
2. Added a new tool for generating wrapper libraries, tau_wrap.
3. Improvements in Eclipse plugin for external tool support.
4. Improvements in paraprof and perfexplorer.
5. Improvements for SiCortex support and tauex.
6. Added support for atomic events in TAU library layered over VampirTrace.
7. Added a Posix I/O wrapper (-iowrapper) for tracking volume and bandwidth of I/O.
8. Added an MPI wrapper library for Windows Cluster 2003.
9. Added support for Scalasca 1.0. Works with both Kojak and Scalasca.
10. Added Opari in TAU.
11. Added IBM BG/P metadata for torus node information in profiles.
12. PerfExplorer adds support for user-defined events and improvements in custom charts.
13. Posix I/O tracking implemented without need for enabling profiling (for tracing).
14. Improved tau_inc.pl for generating include lists for Scalasca/Kojak based on callpath profiling.
15. Eclipse TAU plugin has support for two stage communication analysis.
16. Added -BGPTIMERS for IBM BG/P. Compatible with -BGLTIMERS.
17. Env vars TAU_VERBOSE, TAU_SYNCHRONIZE_CLOCKS, TAU_PROFILE_FORMAT (snapshot)
18. GCC 4.3.0 compatibility
19. Added bandwidth and bytes written info for MPI I/O write routines.
20. Added support for GNU, PathScale and PGI compilers on Cray XT systems [ORNL].
21. ParaProf can now generate selective instrumentation files.
22. TAU_THROTTLE = 0 disables throttling of events. Use TAU_VERBOSE=1 to see it.
23. perfdmf_configure now stores weka.jar files in ~/.ParaProf directory.
24. Added support for DyninstAPI 5.2.
25. Bug fixes for tau_instrumentor, context events, and tau2slog2.
26. Added support for pointer based profiling API (examples/profilercreate/README) [LSU].
27. -spec option for tau_instrumentor allows generic timer instrumentation support [FZJ].
28. Paraprof allows new windows for multiple metrics to compare data [SiCortex].
29. Posix I/O tracking now uses context events instead of user-defined events.
30. Added support for compiler based instrumentation for Intel 9.1, 10.x, GNU, and PathScale compilers.
31. Added extensions in PerfExplorer to support CQoS analysis, drawing charts from script [CCA].
32. Bug fixes in paraprof for selective instrumentation, printer support [NASA].
33. taucxx, taucc, tauf90 now use -optCompInst by default, tau_[cxx,cc,f90].sh use -optPDTInst by default.
34. Added -opari support in installtau.
35. Fixes for IBM BGL/BGP configuration.
36. Added support for tracking memory utilization and headroom in Python [ALCF].
Version 2.16 changes (from 2.15):
1. Added a new tool for correcting network time drifts in traces (tau_timecorrect).
2. Added support for an Eclipse analysis wizard and a graphical instrumentor.
3. Added support for 3D stereo visualizations in ParaProf.
4. Added support for a source browser in ParaProf.
5. Added support for generating source code information in tau_instrumentor.
6. Added support for Perflib based instrumentation and perf2tau [Jeff Brown, LANL].
7. Updated KTAU support in TAU for registering fork for kernel profiling [ANL].
8. Added tau_validate tool for checking if the TAU library is built correctly [UTK].
9. Added support for loading multiple ppk files in paraprof on the commandline [ORNL].
10. Enhancements in ParaProf and PerfExplorer for using metadata. [PERI]
11. Added support for capturing date and other cpu information in profiles. [LLNL]
12. Added support for Vampirtrace [LLNL].
13. Added support for Scalasca 0.5, and KOJAK 2.2 [FZJ, UTK].
14. Enhancements to Eclipse PTP plugin to support PAPI counter selection. [UTK]
15. Supports PDT v3.10 with EDG v3.8 C++/C parsers [LLNL].
16. Added support for application signatures [RENCI].
17. Added support for SiCortex Linux platform [SiCortex].
18. Added support for tracking leaks and dynamic memory allocation/deallocations in Fortran.
19. Improved tau memory tracking module to handle multi-line statements in Fortran.
20. Python profiler overhead is greatly reduced.
21. tauex script added for switching between libraries.
22. -optShared option added in tau_compiler.sh for linking in TAU's shared objs.
23. Easy to use TAU API (TAU_START("string"), TAU_STOP("string")) introduced.
24. Paraprof enhancements include support for Cube 3.
25. PAPI threads (configure -papithreads) and PAPI Domains added for x86 linux.
26. Clock synchronization in traces.
27. Metadata fields in ppk files.
28. Custom charts with XML metadata in PerfExplorer.
29. TAU portal scripts to upload data to perfdmf database.
30. Added support for persistent communication events in traces.
31. Added support for KTAU OS level shared counter coupling.
32. Eclipse/PTP updates for accessing TAU options and build configurations.
33. Added support for tracking Fortran I/O.
34. Added support for accessing multiple databases, configuration of databases,
and context event displays in paraprof.
35. Added support for -arch=mips32 for SiCortex 32 bit compilation.
36. Updates for Epilog on Cray XT3 support using TAU.
37. Updates for Lahey 64 bit Fortran under Linux.
38. Added support for generating and viewing profile snapshots.
39. Added support for specifying phases and timers (static/dynamic) in the
instrumentation specification file (see examples/timerphase).
40. Updates for Eclipse/PTP plugin for supporting external tools such as
VampirTrace, Kojak and Perfsuite using TAU's tool plugin.
41. Added Support for Cray Compute Node Kernel for XT4 (-arch=craycnl).
42. Updates for tauex to include tau_load.sh functionality for generating
MPI performance data for shared library MPI.
43. Added signal handlers (SIGUSR1 and SIGUSR2) to dump performance data and
toggle instrumentation (enable/disable instrumentation) respectively.
44. Full compiler names and the -show option are available for compiler scripts.
45. Added support for reading in OMPP profiles in paraprof.
46. Added support for Intel 10.x compilers, NAGWare Fortran, and g95 compilers.
Version 2.15 changes (from 2.14):
1. Added support for phase and comparative displays in ParaProf [UO]
2. Updated PerfExplorer [UO]
3. Added support for Eclipse CDT, FDT [LANL]
4. Added support for OTF (tau2otf) [LLNL]
5. Added support for runtime throttling of events (TAU_THROTTLE) [UCAR]
6. Added support for ORC Open64 compiler [U. Houston/NCSA]
7. Added support for Solaris on x86_64 (Opterons) [SUN]
8. Added support for nested OpenMP calls [SUN, Aachen]
9. Added support for Cray XT3 and SHMEM wrapper [PSC]
10. Added support for multi-platform traces and a trace writer library [UFL, ORNL]
11. Added support for top level timer in OpenMP [UCAR]
12. Added support for PAPI on BGL and XT3 [ANL, PSC]
13. Added support for converting TAU traces to profiles.
14. Added support for converting TAU callpath profiles to phase profiles [LLNL].
15. Enhancements to Paraprof.
16. Better support for Intel compilers for linking C and Fortran codes to TAU [NOAA].
17. Added support for FreeBSD [ARL].
18. Added support for Eclipse PTP [LANL].
19. Added support for scripting in Paraprof using Jython to create custom views [LLNL].
20. Added support for Python 2.4 with instrumentation for C calls [LLNL].
21. Added support for loop level instrumentation for C and C++ [UTK, LANL].
22. Added support for parameter based profiling (-PROFILEPARAM) [UTK].
23. Added support for tau_load.sh for runtime MPI library instrumentation [UTK].
24. Added support for outer-loop level instrumentation for Fortran [UTK, LLNL].
25. Added a new tool: tau_ompcheck that completes OpenMP Fortran directives [NCAR].
26. Added support for preprocessing Fortran sources in tau_compiler.sh (-optPreProcess) [GSFC].
27. Added support for invoking tau_ompcheck in tau_compiler.sh [NCAR].
28. Added support for DB2 and Derby in PerfDMF [UTK, LLNL].
29. Added support for Infiniband MPICH on Opterons [NERSC].
30. Added support for Cray XT3 Memory headroom information and Cray Timers [PSC].
31. Added support for GNU Gfortran parser in PDT for tau_compiler.sh [LANL].
32. Added support for parameter based profiling (-PROFILEPARAM) for workload characterization [UTK].
33. Added support for upgrading from one version of TAU to another (upgradetau) [NERSC].
34. Added support for automatic instrumentation of pthread programs using PDT [Walt Disney].
35. Added Java TAU trace writer library [U. Reading].
36. Added support for gotos in outer-loop level instrumentation [UTK].
37. Added support for automatic MPI library level instrumentation using tau_poe [UTK].
38. Updated tau_ompcheck [NCAR].
39. Better support for instrumentation and parsing of Fortran programs [Goddard].
40. PerfExplorer enhancements (normal probability plots, event data, distribution info of events)
41. Automatic memory leak detection (-optDetectMemoryLeaks) for C/C++ malloc/free [UTK].
42. TAU Portal (tau.nic.uoregon.edu) to access database.
Version 2.14 changes (from 2.13):
1. MPI-2 support and Fortran wrappers added.
2. Support for Oracle database in PerfDMF.
3. VTF support for multiple PAPI counters in Vampir/VTF format trace files.
4. Improvements in Paraprof displays and database connectivity.
5. Improvements in tau_compiler.sh to automatically instrument applications.
6. Added support for phase based profiling and dynamic timers.
7. Introduced vtf2profile tool to get profiles from VTF3 traces.
8. Added histograms, full callgraph, not-normalized displays to paraprof.
9. Added support for PathScale compilers and -exec-prefix option.
10. Improved support for locking of performance data in multi-threaded apps.
11. Added 3D displays in Paraprof.
12. Added support for SLOG2 traces (to use TAU with Jumpshot) [ANL].
13. Added better support for configuring for BG/L (-arch=bgl) [ANL].
14. Added support for depth limit profiling and tracing (-DEPTHLIMIT) [ORNL].
15. Changes to the MPI wrapper library (for S3D) [ORNL].
16. TAU_MPI_MESSAGE_SIZE now reports sizes for MPI_Send, Recv, Allreduce, etc. [ORNL].
17. Added support for Charm thread library [UIUC, LLNL].
18. Added support for gfortran compiler (-fortran=gfortran).
19. Added support for reverse callpaths in paraprof [LLNL].
20. Added support for storing trials in paraprof [UTK].
21. Added support for user defined context events (callpaths) [ANL].
21. Added support for measuring memory headroom available (-PROFILEHEADROOM, examples/headroom) [ANL].
22. Added tau2elg trace conversion tool to convert to Epilog trace format [UTK].
23. Added search options to paraprof windows [LLNL].
24. Added support for -MPITRACE option for Kojak [UTK].
25. Paraprof has text table window now for callpath profiles [LLNL].
26. Changes to TAU_COMPILER to support Opari in Kojak [UTK].
27. Fixed bugs in tau2elg to support Kojak v 2.1 and 2.1.1 [FZJ].
28. Fixed a bug in TAU_COMPILER (when opari is not used) [UTK].
29. Added support for cube (importer) in paraprof [UTK].
30. Added support for PGI v6.0 compilers.
31. Added Jumpshot/Slog2 package to TAU [ANL].
32. Added support for trace files > 2GB in TAU and VTF3 [TACC].
33. TAU no longer needs merged pdb files from PDT's F95 parser [UTK].
34. Enhancements in Paraprof to choose metrics for summary table, std. dev [LLNL].
35. TAU_COMPILER does not need -optReset for IBM xlf90 to eliminate -D* flags.
36. TAU scripts (tau_[cxx,cc,f90].sh) for use on commandline [UFL].
37. TAU Java Eclipse plugin [LANL].
38. Updated documentation.
39. Added PerfExplorer performance data mining and knowledge discovery framework [LLNL].
40. Enhancements in MPI libraries for scalability [LLNL].
41. Phase based profiling allows you to identify phases in paraprof.
42. Added tau_setup GUI for TAU installations [LANL].
Version 2.13 changes (from 2.12):
1. Paraprof enhancements.
2. TAU MPI wrapper library layer enhancements [CCA].
3. Better support for autoinstrumentation of F95 source code using PDT [LANL].
4. Support for autoinstrumentation of Java using JDK 1.3 and 1.4.x JVMPI.
5. Introduced the TAU Trace Input Library (TIL) [VNG, TUDresden].
6. Added support for detecting papi wallclock timer overflow [LLNL].
7. Added support for Power4 Linux 64 bit compilation (-arch=ibm64linux) [LLNL].
8. Paraprof enhancements for groups and multiple counters with multithreaded loading [LANL].
9. Added TAU Instrumentation Language for enhancing tau_reduce.
10. Added support for RTTI with g++ [ITT].
11. Added support for PAPI 3 so that TAU works with both PAPI 2 & 3 [UTK].
12. Paraprof enhancements for callpath profiling [LLNL].
13. Timer overhead measurements for callpath profiling.
14. Compensation of timing overhead introduced.
15. Malloc/free wrappers pinpoint memory allocation bugs (examples/malloc) [LLNL].
16. Added memory utilization tracking (examples/memory) [LLNL].
17. Added muse user defined events with TAU interrupt handlers [LANL].
18. Paraprof improvements (clickable callpaths, image, XML support) [LLNL].
19. Fuzzy matching of file names in tau_instrumentor (/home/foo.cpp ./foo.cpp) [TACC].
20. Added support for TAU_TRACK_MEMORY_HERE() [LLNL].
21. Improvements in PerfDMF and ParaProf's ability to connect to database [LLNL].
22. Added support for native PAPI events (setenv COUNTER1 PAPI_NATIVE_<nm>) [LLNL].
23. Added support for DyninstAPI v4.1 [UMD].
24. Added support for VTF3 binary trace generation library for Vampir.
25. Added hardware performance counters and other user defined events to trace.
26. Introduced hierarchical trace merging using tau_merge (both offline/online).
27. Added -PROFILEMEMORY option that tracks memory at each routine entry [LLNL].
28. Improved support for MySQL and PostgreSQL databases in PerfDMF.
29. Added automated trace merge/convert with tau2vtf using TAU_TRACEFILE env.
30. Added $(TAU_COMPILER) shell script/makefile variable for automatic instr.
Version 2.12 changes (from 2.11):
1. Enhancements in jracy for supporting multiple counter data [LLNL].
2. Improved memory handling and drawing speeds in jracy [LLNL].
3. Configuration changes for LAM MPI, PAPI, Tru64 [Utah, NCSA, LANL].
4. Added support for Python bindings [CACR, LLNL].
5. Added MPI shared library examples [CACR].
6. Added support for building multiple configurations (installtau) [LANL, LLNL].
7. Added support for Python under AIX and OSX [LLNL].
8. Bug fixes for IA-64 and Intel 7.1 compiler [NCSA].
9. Added TAU_CALLPATH_DEPTH env. variable specification for callpath profiling [LLNL].
10. Added support for -arch=ibm64. It supports PAPI 64 bit/Power4. [UTK]
11. Bug fixes for shared libraries with MPI, g++/KCC under AIX 5.1. [LLNL]
12. Introduced paraprof profile browser (jracy symlinks to paraprof). [ASCI]
13. Added support for dumping profiles in python using a prefix. [LLNL]
14. Added support for DyninstAPI 4.0 including binary rewriting. [U. Maryland]
15. Added support for KOJAK's implementation of Opari and EPILOG. [FZJ]
16. Added support for file level selective instrumentation (PDT, Dyninst). [Utah]
17. Fixed Apple's OS X sscanf bug for reading long doubles in pprof.
18. Added support for DyninstAPI under AIX. [NERSC]
19. Added support for Cray X1 and AMD Opteron (ASCI Red Storm). [Cray]
20. Added support for MAGNET/MUSE. [LANL]
21. Added support for Performance Database [ASCI].
22. Added support for Multiplecounters with CRAY_TIMERS, MUSE and message size [CCA].
Version 2.11 changes (from 2.10):
1. Added -i header option for tau_instrumentor [CASC].
2. Added -LINUXTIMERS option for low overhead Linux wallclock time [CACR].
3. Added -c|-c++|-fortran options to tau_instrumentor [CACR].
4. Lowered the overhead of timers of disabled profile groups [CACR].
5. Added support for PAPI v2.1 [CACR].
6. Updated PCL bindings.
7. Added support for selective instrumentation [CACR].
8. Added support for multiple counters [CACR].
9. Added support for Paraver trace visualizer (CEPBA) in tau_convert.
10. Opari and PDT related changes (examples/opari/pdt_f90) [FZJ].
11. Added support for online access to performance data [CACR].
12. Added support for LINUXTIMERS for PGI and other Linux compilers [FZJ].
13. Changes to online access API [CACR].
14. Improved jracy GUI [ALPS].
15. Added support for EPILOG tracing package [FZJ].
16. Added support for Hitachi SR8000 [FZJ].
17. Added support for browsing by profile groups in jracy [ALPS, LLNL].
18. Made some modifications to Paraver trace format conversion [CEPBA].
19. Added support for NEC SX-5 [HLRS].
20. Added support for -mpilibrary option [LLNL].
21. Added support for g++ 2.96/3.x for tau_merge/tau_convert [ST].
22. Fixed a problem with the MPI wrapper library for Intel IA-64 compilers [NCSA].
23. Added support for tracking message sizes using user defined events [Rutgers].
24. Added support for low overhead, high resolution timers under IA-64 Linux [NCSA].
25. Added support for alternative returns in PDT based C instrumentation [PETSc, ANL].
26. Added a new tool - tau_reduce for reducing instrumentation overhead.
27. Added support for callpath profiling.
28. Fixed pprof to support exclusive percentage in callpath profiling.
29. Changes for CCA, jracy & DyninstAPI on IRIX, Sun.
Version 2.10 changes (from 2.9):
1. Better support for C instrumentation [HDF5].
2. Fixes for IBM.
3. Added support for multiple instrumentation requests per line [CACR F90/C++].
4. Added support for detecting threaded versions of MPI at configuration.
5. Made some modifications for PDT v2.1 [CACR C++/C].
6. Added jracy, TAU's new Java based profile browser to replace racy.
7. Added support for specifying a fortran compiler during configuration.
8. Added support for auto-detection of mpi libs and include dirs (-mpi).
9. Added IBM specific libs for MPI so we don't have to use mpCC, mpKCC [CACR].
10. Added TAU_LDFLAGS to MPI Makefiles [CACR].
11. Added support for enabling/disabling profile groups at runtime [PDT, CACR].
Version 2.9 changes (from 2.8):
1. Better support for mixed model programming
2. Changes for KCC and KAP/Pro.
3. Added support for MPI with DyninstAPI.
4. Added support for selective profiling in Java (-XrunTAU:exclude=java,sun)
5. Java RMI support changes.
6. Introduced TAU Java source instrumentation API.
7. Added support for enabling and disabling group level instrumentation.
8. Added support for PCL 2.0.
9. Fixed tau_instrumentor for PDT 1.3 using SGI CC and examples.
10. Fixed F90 bug on string concatenation.
11. Changed TauGroup_t to 64 bits (unsigned long). [Mapping addresses].
12. Added TAU_SHLIBS so DSO's are created every time.
13. Support for incremental profile dumps.
14. PAPI on Solaris and other platforms requires linking with a static library.
15. Added support for Compaq Alpha (cxx, cc, f90).
16. Fix for MPT 1.4 under IRIX 6.5.
17. Changes in tau_merge to support Uintah.
18. Added support for SGI sproc threads.
19. Added support for dumping profile data in a consistent state (profile snapshot).
20. Added support for Opari OpenMP directive rewriting tool [EWOMP'01].
21. Improved MPI wrapper library support [Uintah].
22. Added support for gcc-3.0 (pprof).
23. Added a bug fix for Vampir (tau_convert -pv -longsymbolbugfix) [SAMRAI].
24. Changed Opari options (omperf to pomp name change).
25. Added support for dynamically assigning group names [SAMRAI].
26. Added support for evaluating perturbation of TAU_DB_DUMP() [Uintah].
27. Added support for C in tau_instrumentor.
28. Fixed RtsLayer bug for PDT based instrumentation of multi-threaded C++
applications.
29. Added -noinline flag to tau_instrumentor to suppress instrumentation of
inlined functions [POOMA].
30. Added support for F90 in tau_instrumentor.
31. Added support for abnormal exit in C [UPS].
32. Added support for Opari-1.1 [flush_enter/exit calls].
33. Added MPI wrapper layer for SGI Fortran [SAGE].
34. Made changes to SGI Fortran MPI layer [MPI_Init].
35. Added IA-64 support (threads, PDT, MPI ...) using RH 7.1 gcc 2.96.
Version 2.8 changes (from 2.7):
1. Added support for PAPI (Perf. API for accessing HW Perf. Counters).
2. Added better support for Dyninst.
3. Added support for CPUTIME (pthread/Linux).
4. Added support for multi-language programming for Java + C (JNI).
5. Added support for mpiJava.
6. Added support for tracing all MPI interprocess communication (incl. async.)
7. Added support for PAPIWALLCLOCK (with -papi=<...>) for low overhead timers.
8. Added support for PAPIVIRTUAL (with -papi=<...>) for user time using PAPI.
9. Added support for OpenMP and OpenMPI (PGI, KAP, IBM, SGI)
10. More compilers: IBM xlC, xlc, xlf90 on SP (See INSTALL file)
Version 2.7 changes (from 2.6):
1. Added Support for JAVA (JDK 1.2+).
2. Added support for DYNINST Dynamic Instrumentation Package from U. Maryland.
3. Added support for SUN 5.0 CC, F90 compilers
4. Added support for Microsoft Windows.
Version 2.6 changes (from 2.5):
1. TAU Mapping API introduced.
2. More platforms: Cray T3E with F90, Alpha/Linux, Intel/Linux
with PGI and Fujitsu compilers (C++/C/F90)
3. Added support for threadsafety in Fortran/C.
4. Added support for Program Database Toolkit for instrumenting C++
sources using tau_instrumentor
5. Added support for Performance Counter Library for accessing Hardware
Performance Counters on Cray, Intel, Alpha, UltraSparcs, MIPS, and
IBM Power platforms
6. TAU MPI wrapper library introduced for profiling MPI routines.
7. Added NAS Parallel Benchmark 2.3 LU & SP suites as Fortran90/MPI examples.
Version 2.5 changes (from 2.4):
1. Automatic instrumentation support using DUCTAPE.
2. Changes in directory structure and configuration.
3. Integrated with POOMA and SMARTS.
Version 2.4 changes (from 2.3):
1. Added support for SMARTS and Tulip user level threads.
2. Added support for Fortran and F90 API.
3. Added threadsafe user defined events.
4. Added threadsafe trace library.
Version 2.3 changes (from 2.2):
1. Added pthread support.
2. Added C-API support with the same lib/API.
3. Introduced User Events
Version 2.2 changes (from 2.1):
1. Added callstack profile viewing tool
2. Blitz++ compatibility changes.
Version 2.1 changes (from 2.0):
1. Better colors in racy
2. Support for T3E.
3. Support for Tcl/Tk 8.0 as the default.
4. Introduced Callstack profiling.
5. Blitz specific changes.
Version 2.0 changes (from 1.0):
1. Introduced Tracing.