The Systemtap Translator - a tour on the inside
Outline:
- general principles
- main data structures
- pass 1: parsing
- pass 2: semantic analysis (parts 1, 2, 3)
- pass 3: translation (parts 1, 2)
- pass 4: compilation
- pass 5: run
------------------------------------------------------------------------
Translator general principles
- written in standard C++
- mildly O-O, sparing use of C++ features
- uses "visitor" concept for type-dependent (virtual) traversal
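A minimal sketch of the visitor idea, for orientation only. The class and
method names below are a hypothetical miniature, loosely modelled on the
naming style of the translator; they are not the actual declarations in
<staptree.h>. Each node type forwards to one type-specific method of the
visitor, so a new traversal is just a new visitor subclass.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature AST; illustrative names only.
struct literal_number;
struct binary_expression;

// One pure-virtual method per concrete node type.
struct visitor {
  virtual ~visitor() = default;
  virtual void visit_literal_number(literal_number* e) = 0;
  virtual void visit_binary_expression(binary_expression* e) = 0;
};

struct expression {
  virtual ~expression() = default;
  virtual void visit(visitor* v) = 0;  // dispatches on the node's dynamic type
};

struct literal_number : expression {
  long value;
  explicit literal_number(long v) : value(v) {}
  void visit(visitor* v) override { v->visit_literal_number(this); }
};

struct binary_expression : expression {
  std::string op;
  std::unique_ptr<expression> left, right;
  binary_expression(std::string o, std::unique_ptr<expression> l,
                    std::unique_ptr<expression> r)
    : op(std::move(o)), left(std::move(l)), right(std::move(r)) {}
  void visit(visitor* v) override { v->visit_binary_expression(this); }
};

// One traversal = one visitor subclass; no dynamic_cast<> needed.
struct printing_visitor : visitor {
  std::string out;
  void visit_literal_number(literal_number* e) override {
    out += std::to_string(e->value);
  }
  void visit_binary_expression(binary_expression* e) override {
    out += "(";
    e->left->visit(this);
    out += " " + e->op + " ";
    e->right->visit(this);
    out += ")";
  }
};

std::string print_expression(expression& e) {
  printing_visitor p;
  e.visit(&p);
  return p.out;
}

void visitor_example() {
  binary_expression sum("+", std::make_unique<literal_number>(1),
                        std::make_unique<literal_number>(2));
  assert(print_expression(sum) == "(1 + 2)");
}
```

Adding a node type forces every visitor to implement the new method, which
is what makes this approach hard to break silently.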
------------------------------------------------------------------------
Main data structures
- abstract syntax tree <staptree.h>
  - family of types and subtypes for language parts: expressions,
    literals, statements
  - includes outermost constructs: probes, aliases, functions
  - an instance of "stapfile" represents an entire script file
  - each annotated with a token (script source coordinates)
  - data persists throughout run
- session <session.h>
  - contains run-time parameters from command line
  - contains all globals
  - passed by reference to many functions
------------------------------------------------------------------------
Pass 1 - parsing
- hand-written recursive-descent <parse.cxx>
- language specified in man page <stap.1>
- reads user-specified script file
- also searches the tapset path for all <*.stp> files and parses them too
  => syntax errors anywhere in the tapset are caught immediately
- includes a small conditional preprocessor:
     probe kernel.
       %( kernel_v == "2.6.9" %? inline("foo") %: function("bar") %)
     { }
- enforces guru mode for embedded code %{ C %}
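To make the preprocessor conditional above concrete, here is a rough sketch
of expanding one such construct. It is an assumption-laden toy, not the code
in <parse.cxx>: the function name expand_conditional is hypothetical, only a
single conditional of the form kernel_v == "..." is handled, and matching is
done by plain substring search rather than on tokens.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Strip leading/trailing blanks from the selected branch.
static std::string trim(const std::string& s) {
  const auto b = s.find_first_not_of(" \t\n");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t\n");
  return s.substr(b, e - b + 1);
}

// Hypothetical helper: expands the first
//   %( kernel_v == "X" %? THEN %: ELSE %)
// in src, choosing THEN when kernel_v equals X and ELSE otherwise.
std::string expand_conditional(const std::string& src,
                               const std::string& kernel_v) {
  const auto npos = std::string::npos;
  const auto open = src.find("%(");
  if (open == npos) return src;  // nothing to expand

  const auto then_mark = src.find("%?", open);
  const auto else_mark = src.find("%:", then_mark);
  const auto close = src.find("%)", else_mark);
  if (then_mark == npos || else_mark == npos || close == npos)
    throw std::runtime_error("malformed conditional");

  // The quoted literal in the condition is the version to compare against.
  const auto q1 = src.find('"', open);
  const auto q2 = (q1 == npos) ? npos : src.find('"', q1 + 1);
  if (q2 == npos || q2 > then_mark)
    throw std::runtime_error("expected quoted version in condition");
  const std::string wanted = src.substr(q1 + 1, q2 - q1 - 1);

  const std::string branch =
      (kernel_v == wanted)
          ? src.substr(then_mark + 2, else_mark - then_mark - 2)
          : src.substr(else_mark + 2, close - else_mark - 2);
  return src.substr(0, open) + trim(branch) + src.substr(close + 2);
}

void preprocessor_example() {
  const std::string src =
      "probe kernel.%( kernel_v == \"2.6.9\" %? inline(\"foo\") "
      "%: function(\"bar\") %) { }";
  assert(expand_conditional(src, "2.6.9") == "probe kernel.inline(\"foo\") { }");
}
```

The point of doing this during parsing is that the rest of the translator
only ever sees the selected branch.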
------------------------------------------------------------------------
Pass 2 - semantic analysis - step 1: resolve symbols
- code in <elaborate.cxx>
- want to know all global and per-probe/function local variables
- one "vardecl" instance interned per variable
- fills in "referent" field in AST for nodes that refer to it
- collect "needed" probe/global/function list in session variable
- loop over file queue, starting with user script "stapfile"
  - add to "needed" list this file's globals, functions, probes
  - resolve any symbols used in this file (function calls, variables)
    against "needed" list
  - if not resolved, search through all tapset "stapfile" instances;
    add to file queue if matched
  - if still not resolved, create as local scalar, or signal an error
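The file-queue loop above can be sketched roughly as follows. This is a
simplified model, not the code in <elaborate.cxx>: mini_stapfile and
resolve_symbols are hypothetical names, only function calls are modelled
(no variables or probes), and an unresolved call is assumed to be the
error case.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed script file.
struct mini_stapfile {
  std::string name;
  std::vector<std::string> defines;  // functions defined in this file
  std::vector<std::string> calls;    // functions called from this file
};

static bool defines_function(const mini_stapfile& f, const std::string& fn) {
  for (const std::string& d : f.defines)
    if (d == fn) return true;
  return false;
}

// Returns the names of the files pulled in, in processing order.
std::vector<std::string> resolve_symbols(
    const mini_stapfile& user, const std::vector<mini_stapfile>& tapset) {
  std::set<std::string> needed;                  // functions known so far
  std::set<const mini_stapfile*> queued{&user};  // avoid queueing a file twice
  std::deque<const mini_stapfile*> file_queue{&user};
  std::vector<std::string> order;

  while (!file_queue.empty()) {
    const mini_stapfile* f = file_queue.front();
    file_queue.pop_front();
    order.push_back(f->name);
    needed.insert(f->defines.begin(), f->defines.end());

    for (const std::string& callee : f->calls) {
      if (needed.count(callee)) continue;  // resolved against "needed"
      const mini_stapfile* match = nullptr;
      for (const mini_stapfile& t : tapset)
        if (defines_function(t, callee)) { match = &t; break; }
      if (!match)
        throw std::runtime_error("unresolved function: " + callee);
      if (queued.insert(match).second) file_queue.push_back(match);
    }
  }
  return order;
}

void resolve_example() {
  mini_stapfile user{"user.stp", {"helper"}, {"helper", "log_it"}};
  std::vector<mini_stapfile> tapset{{"logging.stp", {"log_it"}, {}},
                                    {"unused.stp", {"never_called"}, {}}};
  std::vector<std::string> order = resolve_symbols(user, tapset);
  assert(order.size() == 2);  // unused.stp is never pulled in
}
```

Only tapset files that are actually referenced end up in the result, which
is why unused tapset code costs parse time but nothing later.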
------------------------------------------------------------------------
Pass 2 - semantic analysis - step 2: resolve types
- fills in "type" field in AST
- iterate along all probes and functions, until convergence
- infer types of variables from usage context / operators:
     a = 5          # a is a pe_long
     b["foo",a]++   # b is a pe_long array with indexes pe_string and pe_long
- loop until no further variable types can be inferred
- signal error if any still unresolved
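The "iterate until convergence" idea can be sketched as a small fixed-point
loop. This is a toy model under stated assumptions: statements are reduced
to plain assignments, the assignment struct and infer_types function are
hypothetical, and only the pe_unknown / pe_long / pe_string names are taken
from the notes above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum exp_type { pe_unknown, pe_long, pe_string };

// Hypothetical reduced statement: target = source_var, or target = literal.
struct assignment {
  std::string target;
  std::string source_var;  // empty when the right-hand side is a literal
  exp_type literal_type;   // used only when source_var is empty
};

std::map<std::string, exp_type> infer_types(
    const std::vector<assignment>& stmts) {
  std::map<std::string, exp_type> types;
  for (const assignment& s : stmts) {
    types.emplace(s.target, pe_unknown);
    if (!s.source_var.empty()) types.emplace(s.source_var, pe_unknown);
  }

  bool changed = true;
  while (changed) {  // loop until no further type can be inferred
    changed = false;
    for (const assignment& s : stmts) {
      const bool from_var = !s.source_var.empty();
      const exp_type rhs = from_var ? types[s.source_var] : s.literal_type;
      exp_type& lhs = types[s.target];
      if (lhs == pe_unknown && rhs != pe_unknown) {
        lhs = rhs;  // propagate right to left
        changed = true;
      } else if (from_var && rhs == pe_unknown && lhs != pe_unknown) {
        types[s.source_var] = lhs;  // propagate left to right
        changed = true;
      } else if (lhs != pe_unknown && rhs != pe_unknown && lhs != rhs) {
        throw std::runtime_error("type mismatch for " + s.target);
      }
    }
  }

  for (const auto& entry : types)
    if (entry.second == pe_unknown)
      throw std::runtime_error("unresolved type for " + entry.first);
  return types;
}

void inference_example() {
  // c = b; b = a; a = 5  -- needs several sweeps to reach c.
  std::vector<assignment> stmts{{"c", "b", pe_unknown},
                                {"b", "a", pe_unknown},
                                {"a", "", pe_long}};
  assert(infer_types(stmts).at("c") == pe_long);
}
```

Each sweep can only turn unknown types into known ones, so the loop is
guaranteed to terminate.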
------------------------------------------------------------------------
Pass 2 - semantic analysis - step 3: resolve probes
- probe points turned to "derived_probe" instances by code in <tapsets.cxx>
- derived_probes know how to talk to kernel API for registration/callbacks
- aliases get expanded at this point
- some probe points ("begin", "end", "timer*") are very simple
- dwarf ("kernel*", "module*") implementation very complicated
  - target-variables "$foo" expanded to getter/setter functions
    with synthesized embedded-C
------------------------------------------------------------------------
Pass 3 - translation - step 1: data
- <translate.cxx>
- we now know all types, all variables
- strings are everywhere copied by value (MAXSTRINGLEN bytes)
- emit data storage mega-struct "context" for all probes/functions
  - array instantiated per-CPU, per-nesting-level
  - can be pretty big static data
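One plausible shape for such a context structure is sketched below. The
layout is an assumption made for illustration (in particular the union of
per-probe/per-function locals, the member names, and all three constants);
it is not copied from the output of <translate.cxx>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// All three values are assumed for the sketch.
constexpr std::size_t MAXSTRINGLEN = 128;
constexpr std::size_t NR_CPUS_ASSUMED = 4;
constexpr std::size_t MAXNESTING_ASSUMED = 2;

// Hypothetical "context": storage for the locals of whichever probe or
// function is currently running in this slot.
struct context {
  bool busy;
  union {
    struct { long count; char name[MAXSTRINGLEN]; } probe_0;
    struct { char arg[MAXSTRINGLEN]; long retvalue; } function_greet;
  } locals;
};

// Static array: one slot per CPU and per nesting level.
static context contexts[NR_CPUS_ASSUMED][MAXNESTING_ASSUMED];

// Strings are fixed-size buffers and are copied by value.
void copy_string(char (&dst)[MAXSTRINGLEN], const char* src) {
  std::strncpy(dst, src, MAXSTRINGLEN - 1);
  dst[MAXSTRINGLEN - 1] = '\0';
}

void context_example() {
  context& c = contexts[0][0];
  c.busy = true;
  copy_string(c.locals.probe_0.name, "hello");
  assert(std::strcmp(c.locals.probe_0.name, "hello") == 0);
  // Total static footprint grows with CPUs x nesting levels x struct size.
  assert(sizeof(contexts) ==
         NR_CPUS_ASSUMED * MAXNESTING_ASSUMED * sizeof(context));
  c.busy = false;
}
```

Every string local costs MAXSTRINGLEN bytes in every slot, which is where
the "pretty big static data" comes from.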
------------------------------------------------------------------------
Pass 3 - translation - step 2: code
- map script functions to C functions taking a context pointer
- map probes to two C functions:
  - one to interface with the probe point infrastructure (kprobes,
    kernel timer): reserves per-cpu context
  - one to implement probe body, just like a script function
- emit global startup/shutdown routine to manage orderly
  registration/deregistration of probes
- expressions/statements emitted in "natural" evaluation sequence
- emit code to enforce activity-count limits, simple safety tests
- global variables protected by locks
     global k
     function foo () { k ++ }     # write lock around increment
     probe bar { if (k>5) ... }   # read lock around read
- same thing for arrays, except foreach/sort take longer-duration locks
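A user-space analogue of the locking pattern in the example above, as a
sketch only. The generated module would use kernel locking primitives;
std::shared_mutex (C++17) is swapped in here purely to show the
read-lock/write-lock split, and the function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Stand-ins for script global "k" and its lock.
static long global_k = 0;
static std::shared_mutex global_k_lock;

// function foo () { k ++ }  -> exclusive (write) lock around the increment
void function_foo() {
  std::unique_lock<std::shared_mutex> write_lock(global_k_lock);
  global_k++;
}

// probe bar { if (k>5) ... } -> shared (read) lock around the read
bool probe_bar_condition() {
  std::shared_lock<std::shared_mutex> read_lock(global_k_lock);
  return global_k > 5;
}

void locking_example() {
  global_k = 0;
  assert(!probe_bar_condition());
  for (int i = 0; i < 6; i++) function_foo();
  assert(probe_bar_condition());
}
```

Readers can proceed concurrently, while a writer excludes everyone else for
the duration of its update.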
------------------------------------------------------------------------
Pass 4 - compilation
- <buildrun.cxx>
- write out C code in a temporary directory
- call into kbuild makefile to build module
------------------------------------------------------------------------
Pass 5 - running
- run "staprun"
- clean up temporary directory
- nothing to it!
------------------------------------------------------------------------
Peculiarities
- We tend to use visitor idioms for polymorphic traversals of parse
  trees, in preference to dynamic_cast<> et al. The former is a
  little more future-proof and harder to break accidentally.
  {reinterpret,static}_cast<> should definitely be avoided.
- We use our interned_string type (a derivative of boost::string_ref)
  to hold shareable references to strings that may be duplicated
  many times. It can slide in for std::string most of the time. It
  can save RAM and maybe even CPU, if used judiciously: such as for
  frequently duplicated strings, duplicated strings, duplicated strings,
  duplicated.
  On the other hand, it costs CPU (for management of the interned
  string set, or if copied between std::string and interned_string
  unnecessarily) and RAM (2 pointers when empty, vs. 1 for
  std::string), and its instances are not modifiable, so tradeoffs
  must be confirmed with tools like memusage, massif, perf-stat, etc.
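The interning idea can be sketched with the C++17 standard library alone.
This is not the translator's interned_string class: the intern function and
its table are hypothetical, and std::string_view stands in for
boost::string_ref.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <string_view>

// Hypothetical interning table: each distinct string is stored once, and
// callers get back a cheap, non-owning, read-only view of the stored copy.
std::string_view intern(std::string_view s) {
  // std::less<> enables lookup by string_view without building a std::string.
  static std::set<std::string, std::less<>> table;
  auto it = table.find(s);
  if (it == table.end())
    it = table.emplace(s).first;  // first sighting: store one copy
  return *it;  // set nodes never move, so the view stays valid
}

void intern_example() {
  std::string a = "kernel.function";
  std::string b = "kernel.function";
  std::string_view x = intern(a);
  std::string_view y = intern(b);
  assert(x == y);
  assert(x.data() == y.data());  // both views share the single stored copy
}
```

The lookup on every call is the CPU cost mentioned above; the payoff only
appears when the same string really is interned many times.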