$Id$
=============================================================================
=============================================================================
HPCToolkit/README.ReleaseNotes
=============================================================================
=============================================================================
Contents:
0. General Notes
1. Tool Notes
2. Revision History
=============================================================================
------------------------------------------------------------
0. General Notes
------------------------------------------------------------
----------------------------------------
0.1
----------------------------------------
- If you are having problems, please see "For More Information" in the
README file.
----------------------------------------
0.2. Unsupported experimental features
----------------------------------------
- HPCToolkit can be used to analyze the performance of code parallelized
using CilkPlus features available in Intel and GNU compilers. Without
runtime support, HPCToolkit shows an implementation-level view where
call stacks for worker threads contain whatever computation fragments
they stole rooted at the stolen frames.
- We are not yet supporting measurement and analysis of Pthreads lock
contention, though you are free to try it out. (PPoPP '10)
- We do not yet support MPI blame shifting on IBM BG/P. (SC '10)
- We have added prototype support for memory leak detection to hpcrun.
You can invoke this using the hpcrun options "-e MEMLEAK". The core
functionality is operational, but it will be faster and have
a better user interface in a forthcoming release.
=============================================================================
------------------------------------------------------------
1. Tool Notes
------------------------------------------------------------
----------------------------------------
1.1. hpcrun
----------------------------------------
1.1.1. Usage limitations:
- The user should choose a sampling period so that the total number of
samples is statistically significant, but so that the overhead is
not too high. Typically, this means targeting a sampling frequency
of hundreds of samples per second; thousands for short runs; tens
for longer runs.
- Unless you are using HPCToolkit to measure the performance of an
application running as a Cobalt or PBS batch job, HPCToolkit
will not place measurement data in a unique directory. Therefore,
when re-profiling an application, it is advisable to rename any
prior measurement directory.
- In a threaded program, hpcrun may cause some syscalls to return
EINTR. This happens only in a threaded program and mainly with
"slow" syscalls such as select(), poll() or sem_wait().
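As a rough illustration of the sampling-period guidance in the first
item above, the period for an event can be estimated from the rate at
which the event occurs and the sampling frequency you are targeting.
The sketch below is not part of hpcrun: the function names, the 2.5 GHz
event rate, and the target of 200 samples per second are all
hypothetical values chosen for the example.

```python
def sampling_period(event_rate_hz: float, target_samples_per_sec: float) -> int:
    """Return the number of events between samples (hypothetical helper).

    event_rate_hz: how often the event occurs per second, e.g. the core
        clock rate when sampling on cycles (an assumption for this example).
    target_samples_per_sec: desired sampling frequency.
    """
    if event_rate_hz <= 0 or target_samples_per_sec <= 0:
        raise ValueError("rates must be positive")
    # A period below 1 makes no sense, so clamp it.
    return max(1, round(event_rate_hz / target_samples_per_sec))


def expected_samples(run_seconds: float, event_rate_hz: float, period: int) -> int:
    """Estimate how many samples one thread collects over a run."""
    return int(run_seconds * event_rate_hz / period)


if __name__ == "__main__":
    period = sampling_period(2.5e9, 200)
    print("period:", period)
    print("samples in a 60 s run:", expected_samples(60, 2.5e9, period))
```

Whether the resulting sample count is statistically significant still
depends on how the samples are distributed over the code of interest.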
1.1.2. Measurement limitations:
- Tail calls and inlining. Both of these compiler optimizations avoid
creating procedure frames and thus generate run-time stacks that do
not correspond to the user-level context. hpcrun does not
reconstruct the user-level stack. Note that using hpcstruct's
structure information enables hpcprof to identify inlined frames
within the call path.
- There are currently two blind spots in hpcrun. First, static
initializers are not profiled because sampling begins just before
"main" is called. Second, sampling is turned off for OS calls.
- Debugging messages. When hpcrun is invoked with debugging options
turned on, it may print log messages not only when processing
samples, but also when threads are created or dynamic loading of
shared libraries occurs. Presently hpcrun's messaging subsystem
acquires a lock before writing to stderr or a log file to avoid
unwanted interleaving of output from different threads. If a thread
is hit with a sample event while it holds a messaging subsystem
lock, this can cause deadlock. This situation has never been
observed in practice, but it is possible. Presently, the messaging
system and the interaction between synchronous and asynchronous
entries into hpcrun's run-time library are being overhauled to
avoid the potential for such race conditions.
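The hazard described above arises whenever code that runs
asynchronously (such as a sample handler) tries to take a lock that
the interrupted thread already holds. One common mitigation, shown
here only as a sketch and not as a description of how hpcrun works or
will work, is for the asynchronous path to try the lock without
blocking and to drop the message if the lock is busy. The class and
method names are hypothetical.

```python
import threading


class MessageLog:
    """Toy message log with a blocking path and a non-blocking path."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, like a plain mutex
        self.lines = []
        self.dropped = 0

    def log_sync(self, msg: str) -> None:
        """Synchronous entry: safe to block while waiting for the lock."""
        with self._lock:
            self.lines.append(msg)

    def log_async(self, msg: str) -> bool:
        """Asynchronous entry: must never wait for the lock.

        If the lock is already held (possibly by the code that was
        interrupted), give up and count the message as dropped.
        The counter update is best-effort and not itself synchronized.
        """
        if not self._lock.acquire(blocking=False):
            self.dropped += 1
            return False
        try:
            self.lines.append(msg)
            return True
        finally:
            self._lock.release()
```

The trade-off is that a message may be lost, which is usually
preferable to a hung process in a debugging log.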
----------------------------------------
1.2. hpcstruct
----------------------------------------
1.2.1. Limitations
- Incorrect or erroneous debugging information. The most important
requirement for hpcstruct to generate good static structure is good
debugging information. Ensure that you compile with as much
debugging information as possible. Even then, some compilers
generate erroneous information. hpcstruct attempts to ignore such
information, but this is only successful if the bulk of the
information is correct.
- Inlining recovery: Given accurate debugging information, hpcstruct
can detect almost all inlining. However, it does not attempt to
recover call path relationships between the inlined code. For
example, if a call path consisting of two procedures (a calls b) is
inlined into a third procedure x, hpcstruct shows that two routines
are inlined into x, but does not indicate that a called b.
- Incomplete binary analysis. hpcstruct does not analyze indirect
jumps, which means it does not completely account for switch
statements. This is very rarely a problem.
----------------------------------------
1.3. hpcprof/hpcprof-mpi
----------------------------------------
1.3.1. Limitations
- Experiment databases can still be quite large.
- hpcprof currently does not generate 'metric databases' for hpcviewer
scatter plots.
1.3.2. Bugs
- hpcprof/mpi currently does not properly combine multiple trace
measurement databases.
- When processing measurements in parallel for a *dynamically linked
binary*, hpcprof-mpi may abort with an MPI error. The problem is
that GNU binutils does not always return the same debug information
for a given object code address. We are working on a patch.
Workarounds:
1) Use hpcprof or hpcprof-mpi with one MPI process.
2) Use a statically linked binary.
3) Supply all static structure to hpcprof-mpi (-S option) by running
hpcstruct on the application and all DSOs.
4) Run hpcprof-mpi in parallel only on one node. (Usually, GNU
binutils is deterministic during one execution on one node.)
----------------------------------------
1.4. hpcviewer
----------------------------------------
1.4.1. Bugs
- hpcviewer's attribution of inclusive costs in the caller's view
works for programs with simple recursion; however, costs can
be incorrect in the presence of mutual recursion.
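The general difficulty with recursion and inclusive costs can be seen
in a small sketch. This is not hpcviewer's algorithm; it only
illustrates why summing inclusive costs over every instance of a
procedure over-counts when instances are nested, and one way to avoid
that by counting only the outermost instance on each call path. The
Node class and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Node:
    """One calling-context-tree node: a procedure instance and its own cost."""
    proc: str
    exclusive: float
    children: List["Node"] = field(default_factory=list)


def inclusive(node: Node) -> float:
    """Cost of this node plus everything called beneath it."""
    return node.exclusive + sum(inclusive(c) for c in node.children)


def naive_totals(root: Node) -> Dict[str, float]:
    """Sum inclusive cost over every instance; nested instances are double-counted."""
    totals: Dict[str, float] = {}

    def visit(node: Node) -> None:
        totals[node.proc] = totals.get(node.proc, 0) + inclusive(node)
        for child in node.children:
            visit(child)

    visit(root)
    return totals


def outermost_totals(root: Node) -> Dict[str, float]:
    """Count an instance only if no ancestor on its path is the same procedure."""
    totals: Dict[str, float] = {}

    def visit(node: Node, active: FrozenSet[str]) -> None:
        if node.proc not in active:
            totals[node.proc] = totals.get(node.proc, 0) + inclusive(node)
        for child in node.children:
            visit(child, active | {node.proc})

    visit(root, frozenset())
    return totals


# Mutual recursion: main -> a -> b -> a -> b, with exclusive costs 1..5.
tree = Node("main", 1, [Node("a", 2, [Node("b", 3, [Node("a", 4, [Node("b", 5)])])])])
```

In this tree the whole program costs 15, yet the naive sum reports
more than that for both a and b, while the outermost-instance rule
keeps every procedure at or below the program total.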
=============================================================================
------------------------------------------------------------
2. Revision History
------------------------------------------------------------
----------------------------------------
HPCToolkit 2019.08 + some earlier
----------------------------------------
Enhancements
Build
Replaced hpctoolkit-externals with building prerequisites from spack.
See: http://hpctoolkit.org/software-instructions.html
or README.Install
Moved to using Dyninst 10 to provide better binary analysis used by
hpcstruct.
Hpcrun
Hpcrun can now tag some metrics to be hidden by default in
hpcviewer. This support was added so that the full accounting of
the myriad GPU instruction stall reasons can be hidden by default.
Launch scripts are more robust for finding the install directory, and
provide an option for identifying the git version (branch and commit).
Hpcstruct
Hpcstruct now exploits threaded parallelism to analyze large
application binaries quickly. Use 'hpcstruct -j num binary' to
specify number of threads.
Improved handling of irreducible loops.
Hpcviewer
Hide a metric column if it has no metric values.
Use raw metric values when copying to the clipboard and exporting to
CSV files, instead of the displayed format of the value.
Bug fixes
hpcprof:
Fixed bug so that inlined procedures are now merged in bottom-up and
flat views. Previously, an internal buffer overflow prevented fusion
of some instances of inlined procedures in bottom-up and flat views.
Hpcrun
Fixed handling of dynamic thread allocation.
Fixed handling of the precise ip modifier. Use the Linux perf convention
to force hpcrun to enable the precise-ip attribute (:p for slightly
precise, :pp for mostly precise, :ppp for exactly precise).
Fixed handling of memory barrier when manipulating buffer of events
recorded by Linux perf. Incorrect memory barrier handling caused some
event records to be handled incorrectly.
Fixed kernel blocking sample-source. Context-switch attribute is now
supported on Linux kernel 4.3 or newer. It is also available on RedHat
3.x or newer, which has a backport of the kernel support.
Hpcviewer
Fixed filtering nodes on Flat view.
----------
Merged the 'banal' branch into master. This fully replaces hpcstruct
and the banal code with the new inline tree format. However, the
format of the .hpcstruct file is unchanged.
The new banal code produces a better loop analysis (placement of loop
header) and has a smaller memory footprint. This also includes
support for displaying parseapi gaps (unclaimed code regions) in the
viewer.
Merged the 'perf-datacentric' branch into master. This adds support
for the perf family of events as a way of accessing the hardware
performance counters without using PAPI.
----------------------------------------
HPCToolkit Version 2017.06, June 2017
----------------------------------------
Updated the 'ompt' branch and merged master into ompt. This improves
scalability of the ompt support to large thread counts, as found on
KNL and Power8. The ompt branch, together with the LLVM OpenMP runtime
library, provides:
- support for attributing performance measurements to a global-view,
source-level program representation by unifying measurements of OpenMP
activity in worker threads with the main thread.
- support for attributing causes of idleness in OpenMP programs using
HPCToolkit event OMP_IDLE.
The LLVM OpenMP runtime can be used with Clang, Intel, and GNU compilers.
The LLVM OpenMP runtime with OMPT support for performance analysis
can be obtained as follows:
svn co http://llvm.org/svn/llvm-project/openmp/trunk openmp
To use the draft OMPT support present in LLVM's OpenMP runtime, one must
build the LLVM OpenMP runtime with LIBOMP_OMPT_SUPPORT=TRUE. This version
of the runtime is designed to be used with HPCToolkit's ompt branch.
Note: support for the OMPT tools interface in HPCToolkit and LLVM
currently follows a design outlined in TR2 at openmp.org. An improved
version of the OMPT interface was designed for OpenMP 5.0, which
is described in TR4 at openmp.org. At some point soon, both LLVM and
HPCToolkit will transition to support the new API; presently, both lack
support for the new API.
Merged the 'atomics' branch into master. This branch replaces the
atomic operations in hpcrun with C11 atomics through the stdatomic.h
header file.
Updated Dyninst to version 9.3.2 in externals, plus a patch for better
binary analysis of functions that use jump tables.
Updated hpcstruct to handle new ABI on Power/LE architectures, which
has both internal and external interfaces for functions.
Various bug fixes:
-- enhanced binary analysis for call stack unwinding on x86-64, including
an enhancement to analyze functions that perform stack alignment and
a bug fix to track stack frame allocation/deallocation of space for local
variables using the load effective address (LEA) instruction.
-- fixed bug in hpcrun to correct data reinitialization after fork(). This
bug prevented using hpcrun to profile programs launched with shell scripts.
-- fixed bug in hpcrun in binary analysis related to ld.so.
-- fixed bug in hpcstruct in getRealPath() that caused hpcstruct
to sometimes report incorrect file names.
Known issues:
When profiling optimized code with HPCToolkit, one may find that a program
generates a significant number of "partial unwinds" where the call stack
can't be unwound all the way up to main. This more commonly happens on
x86-64 architectures than on PowerPC and ARM. A large number of partial unwinds
may make it harder to use the top-down calling context view in hpcviewer,
which works best when call stacks unwind all the way up to main. Even
with significant numbers of partial unwinds, the bottom-up caller's view
and the flat view in hpcviewer can be used effectively for analyzing
performance. Ongoing work aims to improve call stack unwinding of
optimized code by employing compiler-generated unwinding information
where available in addition to using binary analysis to discover unwinding
recipes.
On x86-64, hpcfnbounds occasionally is too aggressive about inferring the
presence of stripped functions in optimized programs. We have noticed
this particularly for optimized Fortran. This can cause "partial unwinds",
where a call stack can't be unwound fully up to main. Improving this
analysis is the subject of ongoing work.
When the LLVM OpenMP runtime's OMPT support is used, measurements
of programs compiled with GCC using HPCToolkit's ompt branch sometimes
reveal implementation-level stack frames that belong to the OpenMP
runtime system. This will improve with the transition of HPCToolkit
and the LLVM OpenMP runtime to the new OMPT ABI designed for OpenMP
5.0. This transition should occur over the next 6 months. In the
meantime, there is nothing wrong with the quality of the information
collected. The only problem is HPCToolkit's measurements reveal more of
an implementation-level view of OpenMP than intended.
----------------------------------------
HPCToolkit Version 2016.12, Dec 2016
----------------------------------------
Added preliminary measurement, analysis, attribution, and GUI support
for Power8/LE.
Added preliminary measurement, analysis, and attribution support for
ARM64.
Added support for KNL, Knights Landing (configure as x86-64).
Overhauled data structures for managing shared state (binary analysis results)
in hpcrun to avoid mutual exclusion in the common case. This improves manycore
scalability.
Overhauled binary analysis to better attribute performance to highly-optimized
code that involves inlined functions, inlined templates, and outlined OpenMP
functions.
Removed dependence on a locally-modified copy of binutils and switched
to use binutils 2.27, which supports Power8/LE and ARM64.
Updated build infrastructure to use newer versions of autotools that
recognize the Power8 little-endian system type.
- autoconf 2.69.
- automake 1.15.
- libtool 2.4.6.
Use boost version 1.59.0.
Use dyninst version 9.3.0.
Use elfutils version 0.167.
Use libdwarf version 2016-11-24.
Use libmonitor version from Sept 15, 2016.
Use libunwind version from Feb 29, 2016.
Known problems:
HPCToolkit's GUIs are not yet available for ARM64 platforms.
Binary analysis on Power8/LE may fail to fully analyze routines that contain
switch tables. This has several effects.
(1) Samples attributed to code regions in a routine that are overlooked by
the binary analyzer will be attributed to the first source line of the
enclosing routine.
(2) Loops in code regions that are overlooked will not be reported in
hpcviewer.
----------------------------------------
HPCToolkit Version 5.4.x, Dec 2015
----------------------------------------
Merged the hpctoolkit-parseapi branch into trunk. Overhaul hpcstruct
and banal code to use ParseAPI to parse functions, blocks and loops.
Rework how hpcstruct identifies loop headers to use a top-down search
of the inline tree.
Partial merge of the non-openmp features from the hpctoolkit-ompt
branch into trunk. Support for name mappings for <program root>,
<thread root>, etc, removing redundant procedure names, and better
name demangling.
Support for filtering nodes in the viewer, which allows hiding a subset
of the CCT nodes according to some pattern.
Move repository to github.
----------------------------------------
HPCToolkit Version 5.3.2 [2012.09.21]
----------------------------------------
Fix a few show-stopper problems related to recent upgrades on jaguarpf
(interlagos) at ORNL since 5.3.1.
- hpcrun
- add support for decoding the AMD XOP instructions. this was
causing many unwind failures on jaguarpf.
- fixed a bug in hpcrun-flat that was breaking the build with
PAPI 5.0
- externals
- fixed symtabAPI to handle the new STT_GNU_IFUNC symbols in the
latest glibc. this was breaking hpclink on jaguarpf.
----------------------------------------
HPCToolkit Version 5.3.1 [2012.08.27]
----------------------------------------
- hpcrun
- add documentation about environment variable HPCTOOLKIT, needed for
use on Cray XE6/XK6
- add support to use HPCRUN_TMPDIR or TMPDIR. On Cray platforms,
/tmp is unavailable
- don't try to integrate calling context trees for threads into the
calling context tree for the main thread.
- miscellaneous changes to satisfy stricter dependence checking by gcc 4.7
and include files that omit unistd.h on Debian Linux
- ensure that partial unwinds are properly categorized
- address portability issues with idleness sample source, which supports
blame shifting for OpenMP
- hpcprof
- handle case when measurements directory and database directory are on
different file systems. in this case, files can't be simply unlinked and
relinked.
- hpctraceviewer
- add support for filtering out threads and processes that are not of
interest.
- add -j option to install script to enable installer to curry a path
for the java required into the hpctraceviewer script
- fix memory leak for SWT native objects
- hpcviewer
- support a mix of flattening and zooming in flat view
- add new formatting options for derived metrics
- add -j option to install script to enable installer to curry a path
for the java required into the hpctraceviewer script
- externals
- fixed problem with GNU binutils on 64-bit PowerPC that caused performance
data to be misattributed because function "D" symbols were overlooked
- added support to disable thread tracking (needed for CUDA support)
- update documentation
----------------------------------------
HPCToolkit Version 5.3.0 [2012.06.25]
----------------------------------------
- hpcrun
- add an IO sample source to count the number of bytes read and written
- add a Global Arrays sample source
- add an API for the application to start and stop sampling
- get the process rank for gasnet and dmapp programs (static only)
- have hpcrun compress recursive calls by default
- early support for a plugin feature
- strengthen the async blocks to better handle the case of running
sync and async sample sources in the same run
- add a check that hpctoolkit and externals use the same C/C++ compilers.
this should all but eliminate the 'GLIBCXX not found' errors
- externals
- binutils: put the line info table into a splay tree. for some
BlueGene binaries, this speeds up hpcstruct by as much as 20x
- libmonitor: update to rev 140. this better handles signals and
the side threads from the Cray UPC compiler
- open analysis: bug fix to avoid missing some loops
- symtabAPI: fix an off-by-one error that sometimes caused a
segfault (thanks to Gary Mohr at Bull)
- xerces: fix a compilation bug on BlueGene
----------------------------------------
HPCToolkit Version 5.2.1: [2011.12.30]
----------------------------------------
A performance enhancement and maintenance release.
- hpcrun
- instead of discarding partial unwinds, record them in a special subtree
- create option for compressing call paths with (simple) recursive
function invocations
- rework representation of call path profile metrics to support
collection of tens or hundreds of metrics per run
- program PAPI events to count during system calls
- correct several subtle monitoring problems
- improve performance of memory leak detection
- fix how hpcrun opens files to be robust on systems where gethostid
or getpid don't behave as expected
- hpcviewer includes key performance enhancements:
- rapidly render scatter plots of metric values over all "MPI ranks"
(or threads)
- several bug fixes
- hpctraceviewer
- significantly enhance rendering performance
- several bug fixes
----------------------------------------
HPCToolkit Version 5.1.0: [2011.05.27]
----------------------------------------
- Full support for analyzing 64-bit Linux-POWER7 binaries [hpcprof, hpcstruct]
- To profile/trace large-scale executions, add support for sampling
processes to measure. Experimental. [hpcrun]
- W.r.t. binary analysis to compute unwind information for call path
profiling: improvements computing function bounds on partially
stripped code (32-bit x86 and libc on Ubuntu) [hpcfnbounds]
- Monitoring threaded applications: fix a rare race condition tracking
thread creation/destruction [libmonitor]
- When analyzing multiple measurement databases: resolve certain bugs
[hpcprof/mpi]
- When analyzing measurement databases in parallel: resolve a bug that
could cause summary metrics for certain Calling Context Tree nodes
to be computed incorrectly. [hpcprof-mpi]
- Upgrade libraries for reading binaries [hpctoolkit-externals]
- upgrade to Symtab API 7.0.1
- upgrade to libelf 0.8.13
- upgrade to libdwarf-20110113
----------------------------------------
HPCToolkit Version 5.0.0: [2011.02.15]
----------------------------------------
- HPCToolkit supports Cray XT, IBM BG/P, generic Linux-x86_64 and Linux-x86
- Major overhaul of hpcrun:
- fully support dynamic loading
- fully support gathering thread creation contexts
- reenable support on x86 for tracking return counts
- use a CCT where a node keeps its children in a splay tree
- Rework post-mortem analysis tool, hpcprof:
- create scalable tool (hpcprof-mpi)
- unify hpcprof and hpcprof-mpi interface (and, when possible, internals)
- automatically compute summary metrics in both hpcprof/mpi
(using both incremental and 'standard' methods)
- rework how hpcprof/mpi searches for source files to 1) find files
more frequently; and 2) perform much more rapidly when multiple
source files must be found (the common case). No longer enter an
infinite loop if symbolic links create a cycle.
- support for --replace-path
- Rework hpcviewer:
- incrementally construct Callers View
- correctly construct Callers and Flat Views, even with summary
metrics (and without thread-level metrics)
- generate scatter plots of out-of-core thread-level metrics
- correctly expand hot path
- Rewrite hpcsummary
- Create (and recreate) a Users Manual
- Updates to HPCToolkit's externals
- full support for configuring/building from an arbitrary directory
and installing to an arbitrary --prefix.
- performance-patched binutils 2.20.1 (from 2.17)
- xerces 3.1.1 (from 2.8)
- libmonitor r116
----------------------------------------
HPCToolkit Version 4.9.9: [2009.08.29]
----------------------------------------
- Rewrite hpcrun (nee csprof)
- Support monitoring statically linked applications using hpclink
- Rename hpcrun to hpcrun-flat
- Rewrite hpcstruct (nee bloop)
- Rewrite hpcprof (nee xcsprof)
- Rewrite hpcprof-flat (nee hpcview, hpcquick, hpcprof)
- Rewrite hpcviewer
- Rewrite externals manager
- Move source code to SciDAC Outreach
- Web page: www.hpctoolkit.org
----------------------------------------
HPCToolkit Version 4.2.1: [2006.06.17]
----------------------------------------
- Use binutils 2.16.1
----------------------------------------
HPCToolkit Version 4.2.0: [2006.04.07]
----------------------------------------
- Add csprof to source tree
- Integrate separate version of xcsprof into source tree
- Integrate csprof viewer into source tree
----------------------------------------
HPCToolkit Version 4.1.3: [2006.01.09]
----------------------------------------
- Split code for hpcrun and hpcprof.
- Automake 1.9.5; Libtool 1.5.20 (still Autoconf 2.59)
----------------------------------------
HPCToolkit Version 4.1.2: [2005.08.15]
----------------------------------------
- Merge 'lump' (load module dumper) into source tree. Build when
configured with --enable-devtools.
- Automake 1.9.5; Libtool 1.5.18 (still Autoconf 2.59)
----------------------------------------
HPCToolkit Version 4.1.1: [2005.04.21]
----------------------------------------
- Add support for Group scopes
- hpcquick produces hpcviewer output by default
----------------------------------------
HPCToolkit Version 4.1.0: [2005.03.18]
----------------------------------------
- HPCToolkitRoot
- Improve sub-repository checkout
- Improve building by determining and propagating compiler
optimization and develop options when building external
repositories (OA, xerces, binutils)
- OpenAnalysis is now 'NewOA'
----------------------------------------
HPCToolkit Version 4.0.5: [2005.01.20]
----------------------------------------
- Binutils performance tuning:
- Replace binutils' DWARF2 linear-time line-lookup algorithm with
binary search.
- Use a one-element cache to drastically speed up ELF symbol table
function search
- Update libtool, autoconf and automake to fix build problems on
IRIX64 (linking templates), Tru64 (including templates in libtool
archives), and Linux (missing .so).
- Upgrade xercesc from 2.3.0 to 2.6.0
- hpcrun/hpcquick: minor fixes
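The two binutils tuning ideas listed for this release (binary search
in place of a linear scan, and a one-element cache of the last result)
can be sketched together as follows. This is only an illustration of
the techniques, not the binutils code: the LineTable class and its
row layout of (start_address, line_number) are assumptions, and the
notes above apply the cache to symbol-table search rather than to line
lookup.

```python
import bisect
from typing import List, Optional, Tuple


class LineTable:
    """Map an address to a source line using sorted rows and a one-entry cache."""

    def __init__(self, rows: List[Tuple[int, int]]):
        # Each row is (start_address, line_number); a row covers addresses
        # from its start up to the start of the next row. As a
        # simplification, the last row is treated as open-ended.
        self._rows = sorted(rows)
        self._addrs = [addr for addr, _ in self._rows]
        self._cached: Optional[Tuple[int, float, int]] = None  # (lo, hi, line)
        self.cache_hits = 0

    def lookup(self, addr: int) -> Optional[int]:
        # One-element cache: consecutive queries often fall in the same range.
        cached = self._cached
        if cached is not None and cached[0] <= addr < cached[1]:
            self.cache_hits += 1
            return cached[2]
        # Binary search for the last row whose start address is <= addr.
        i = bisect.bisect_right(self._addrs, addr) - 1
        if i < 0:
            return None  # address precedes the first row
        lo, line = self._rows[i]
        hi = self._addrs[i + 1] if i + 1 < len(self._addrs) else float("inf")
        self._cached = (lo, hi, line)
        return line
```

Each lookup costs O(log n) comparisons instead of O(n), and repeated
queries within the same address range skip the search entirely.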
----------------------------------------
HPCToolkit Version 4.0.0: [2004.11.06]
----------------------------------------
- Create HPCToolkitRoot, a shell repository to make obtaining sources,
building and installation easier.
- Autoconf HPCToolkitRoot and hpcviewer.
- HPCToolkit now uses libtool to build libraries.
- Revamp launching of hpcview et al. to not be dependent upon
Sourcemes. All needed environment variables are set dynamically.
- Overhaul hpcquick to canonicalize all performance data files into
PROFILE files before extracting metrics. Include support for processing
hpcrun files.
- Merge separate hpcrun/hpcprof into HPCToolkit code base.
- Extend hpcrun:
- introduce support for profiling statically linked applications
- create profiles of multiple PAPI or native events
- monitor POSIX threads
- follow forks
- profile through execs
- create WALLCLK profiles
- Fix bugs in hpcprof:
- When no line information could be found, samples were dropped
- Fix several type-size problems.
- bloop:
- processes DSOs
- recognize one-bundle loops on IA64 (PC-relative target is 0)
- classify return instructions in IA64, x86 and Sparc ISAs,
ensuring that in CFG construction no fallthru edge is placed
between the return and possible subsequent (error handling) code.
On Itanium with Intel's compiler, this can have drastic effects.
Make necessary changes to GNU binutils to propagate this information.
- Add options to treat irreducible intervals as loops and to turn
off potentially unsafe normalizations.
- hpcview and hpcquick handle multiple structure files
- Port HPCToolkit to opteron-Linux.
- Update documentation accordingly.
----------------------------------------
HPCToolkit Version 3.7.0: [2004.03.12]
----------------------------------------
- Replace make system with Autoconf/Automake make system
- Update code so it can be compiled by GCC 3.3.2
- hpcview:
- Use of hpcviewer is now preferred method for viewing data (as
opposed to static HTML database).
- bloop:
- Enable support for Sun's WorkShop/Forte/ONE compiler
- Revamp scope tree builder
- Rewrite key normalization routine (coalesce duplicate statements)
- Enable support of long option switches; improve option parsing.
- xprof:
- Enable reading of profile data from stdin or file
- Add basic support for processing Alpha relocated shared libraries
- Enable support of long option switches; improve option parsing.
----------------------------------------
HPCToolkit Version 3.6.0: [2003.07.05]
----------------------------------------
- Extend xprof to compute derived metrics from DCPI profiles. In the
process, significantly revise/rewrite most existing xprof code and
extend ISA and AlphaISA class.
- Significantly revise hpcquick to accept PROFILE files with -P option.
- Fix various hpcview bugs and use new xerces SAX interface.
- Revamp HPCToolkit make system (portions of the source tree can easily
be removed without breaking the build).
- Remove OpenAnalysis, binutils and xercesc from source tree
- Convert OpenAnalysis' make system to Autoconf/make.
- Add front end make system for binutils and xercesc.
- Update HPCToolkit tests for ISA changes and new xprof.
- Update to binutils 2.13.92 (snapshot) and then 2.14 (official release).
- Update to xercesc 2.3.0.
----------------------------------------
HPCToolkit Version 3.5.2: [2003.03.28]
----------------------------------------
- Rename from HPCTools to HPCToolkit to distinguish from others' use of
the name.
- Convert from RCS to CVS.
----------------------------------------
HPCTools Version 3.5.1: [2003.03.07]
----------------------------------------
- Update PGM and PROFILE formats to support load modules; other minor
tweaks. Update hpcview, bloop, xprof and ptran to use the new formats.
- Add initial DSO support to LoadModule classes.
- Test updates
- Add LoadModule library tester.
- Update support library tests.
- Add filter script for f90 modules (f90modfilt).
- Miscellaneous tweaks
- Fix strcpy bug in GetDemangledFuncName().
- Makefile tweaks
- Convert ArchIndType.h limits from const (statics) to #defines.
- Make trace a global variable so that tracing can be globally
switched on/off.
- Update alpha macros to support alpha GCC compiler
- hpcquick now supports recursive paths to option -I.
----------------------------------------
HPCTools Version 3.5.0: [2003.02.24]
----------------------------------------
- Merge HPCView 3.1 and HPCTools 1.20 into one distribution
- eliminate code duplication (support library, DTDs)
- port HPCView to ANSI/ISO style C includes (<cheader>; all functions in
std namespace)
- unify and improve documentation
- Improve make system (e.g., each library is now built in a separate
location).
- Improve code organization (rename 'libs' to 'lib'; rename and cleanup
bloop's scope tree files; move general types files to 'src/include')
- Improve and test hvprof with PAPI 2.3.1
[see below for HPCView revision history]
----------------------------------------
HPCTools Version 1.20: (bloop 1.20, xprof 1.20) [2002.10.11]
----------------------------------------
- Add xprof test suite.
- Make minor changes to support GCC 3.2.
- Rewrite GNU binutils patch (for dwarf2.c) that handles the
out-of-order line sequences of Intel's 6.0 compiler. (The patch is
faster, slightly more accurate, and makes GNU happy.)
- Change the method of preventing conflict between GNU and Sun's demangler.
----------------------------------------
HPCTools Version 1.10: (bloop 1.10, xprof 1.10) [2002.08.30]
----------------------------------------
- Allow for use of either old-style C headers (<header.h>) or new
ANSI/ISO style (<cheader>; all functions are in std namespace). The
new style is now default.
- Improve error and exception handling. Detect memory allocation errors.
- Fix bug in GNU-binutils ECOFF reader.
----------------------------------------
HPCTools Version 1.05: (bloop 1.05, xprof 1.05) [2002.08.23]
----------------------------------------
- Update to use binutils-2.13
- Extend VLIW interface throughout HPCTools. (Impose explicit pc +
operation index interface for instructions. Now, many comments are
not lies!)
- Miscellaneous fixes and cleanup.
----------------------------------------
HPCTools Version 1.00: (bloop 1.00, xprof 1.00) [2002.08.16]
----------------------------------------
- Major revisions:
- Replace EEL binary support with new binutils library (uses GNU's
binutils) and new ISA library
- bloop: Replace EEL analysis with OpenAnalysis
- bloop: Add two new targets (i686-Linux, ia64-Linux) and improve support
for existing targets (alpha-OSF1, mips-IRIX64, sparc-SunOS)
- Support 'cross target' processing
- Miscellaneous fixes and cleanup.
- bloop: Use both system and GNU demangler in demangling attempts
- Update to use binutils-2.12
- Update to read (abnormal) Compaq ECOFF debugging info.
- Update to read (abnormal) SGI -64 DWARF2 and g++ -64 DWARF2 debugging
info.
- Update to read (abnormal) Intel DWARF2 debugging info
- Misc. updates
- Reorganize HPCTools directory tree
- Remove (outdated) backwards compatibility for non-standard STL headers
- Add hvprof to HPCTools/tools/hvprof
----------------------------------------
HPCTools Version 0.90 (bloop 0.90, xprof 0.90) [2001.09]
----------------------------------------
- Port to alpha-OSF1
- Port EEL to alpha
- Fix binutils 2.10 ECOFF support
- Add xprof tool (beta) for processing Compaq dcpi output
- Replace EEL dominator analysis with tarjan analysis
- Improve PGM scope tree normalization
- Bring code into compliance with ANSI/ISO C++
- Remove STLPort and use standard STL
- Bug fixes
----------------------------------------
HPCTools Version 0.80 [2001.02]
----------------------------------------
- Port to mips-IRIX64
- Port EEL to IRIX64
- Fix binutils 2.10 DWARF2 support
- Bug fixes
----------------------------------------
HPCTools Initial Version
----------------------------------------
- Support for processing sparc-SunOS binaries compiled with GCC
- Use EEL to read binaries and find loops
- Update EEL to use binutils 2.10
----------------------------------------
HPCView Version 3.1: [2003.01]
----------------------------------------
- Port to ia64-Linux
- Generated HTML/CSS and JavaScript now work with
Internet Explorer 5.0+, Netscape 4.x, Netscape 6.2+, or Mozilla 1.0+.
With one minor exception (in order to support Netscape 4.x) hpcview
generates valid HTML 4.0 Transitional/Frameset and CSS1/2 style
sheets.
- Update to use Xerces 2.1.0
----------------------------------------
HPCView Version 3.0 []
----------------------------------------
- This is a major revision that adds the option of emitting a compact,
XML-formatted form of the internal data representation
("scope tree") so that it can be efficiently processed by
(1) HPCViewer, a stand-alone performance data browser (to be
released real soon now), or
(2) other down-stream analysis tools.
The original static HTML output is still supported.
- Almost all of the command line switches have changed. Type "hpcview"
(no arguments) for command line documentation.
- More run-time assertion exits have been replaced with informative
error messages.
- Update to use Xerces 1.7.0
----------------------------------------
HPCView Version 2.01 []
----------------------------------------
- Fixed bugs in support for flattening that caused some files to be omitted.
- Regularized error handling for metrics so that file and computed metric
errors are reported uniformly.
- Added -w option and omitted warnings for file synopsis generation at
default warning level.
- Fixed error handling so that errors caused by missing file metric data
or missing inputs to computed metrics (among other metric errors) cause
an error message to be printed and execution to halt.
- Separated out and removed dead code.
----------------------------------------
HPCView Version 2.0 []
----------------------------------------
- Added support for incorporating program structure information
generated by a binary analyzer. This enables the hierarchical scopes
display to show loop-level information. The binary analyzer is
language independent so this works for any programming language
whose compiler generates standard symbol table information.
- Hierarchical scopes display now supports "flattening" which allows one to
compare not only children of a scope, but grandchildren, great
grandchildren, etc. This allows one to view a program as composed of
a set of files, procedures, loops, or (when completely flattened) a
set of non-loop statements.
- Source pane selections now navigate hierarchical scopes display to
display the selection.
- Performance table eliminated because it is subsumed by the
hierarchical scopes display with flattening.
- Generated HTML and JavaScript now work with Internet Explorer 5.5
as well as Netscape Navigator 4.x.
- Sources adjusted to use standard STL.
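
The "flattening" of the hierarchical scopes display described above can be
illustrated with a minimal sketch. It is not taken from the HPCView sources:
the Scope class and the flatten_once, flatten_fully and build_example
functions are hypothetical names. It also assumes that one flattening step
replaces every non-leaf scope in the current view with its children and
leaves leaf scopes in place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scope:
    """Hypothetical scope-tree node: a file, procedure, loop, or statement."""
    name: str
    children: List["Scope"] = field(default_factory=list)


def flatten_once(scopes: List[Scope]) -> List[Scope]:
    """Assumed semantics of one flattening step: each non-leaf scope is
    replaced by its children; leaf scopes are kept unchanged."""
    result: List[Scope] = []
    for scope in scopes:
        if scope.children:
            result.extend(scope.children)
        else:
            result.append(scope)
    return result


def flatten_fully(scopes: List[Scope]) -> List[Scope]:
    """Flatten repeatedly until only leaf scopes (statements) remain."""
    while any(scope.children for scope in scopes):
        scopes = flatten_once(scopes)
    return scopes


def build_example() -> List[Scope]:
    """Build a small made-up scope tree: file -> procedures -> loop/statements."""
    loop = Scope("loop@11", [Scope("stmt@12")])
    proc_a = Scope("proc_a", [Scope("stmt@10"), loop])
    proc_b = Scope("proc_b", [Scope("stmt@20")])
    return [Scope("file.c", [proc_a, proc_b])]
```

Starting from a view that holds the single file scope, one step yields the
two procedures, so they can be compared side by side regardless of which
file they belong to. A second step exposes the loop next to the statements
that are not inside any loop, and full flattening leaves only statements.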