<?xml version="1.0"?>
<article xmlns:r="http://www.r-project.org"
xmlns:xi="http://www.w3.org/2003/XInclude"
xmlns:c="http://www.C.org"
xmlns:omg="http://www.omegahat.org"
xmlns:sh="http://www.shell.org">
<articleinfo>
<title></title>
<author><firstname>Duncan</firstname><surname>Temple Lang</surname>
<affiliation><orgname>University of California at Davis</orgname>
<orgdiv>Department of Statistics</orgdiv>
</affiliation>
</author>
</articleinfo>
<section>
<title></title>
<r:init>
library(RLLVMCompile)
</r:init>
<para>
Consider the task of interfacing <r/> to a library such as libcuda or
libcudart. We can do this manually by writing <c/> routines and <r/>
functions for each routine of interest in the library. We also have
to write <c/> and <r/> code to access, create, query and modify the data
structures, i.e. <c:struct/> types. Alternatively, we can use a
dynamic interface such as <r:pkg>rdyncall</r:pkg> or
<omg:pkg>Rffi</omg:pkg>. This avoids writing wrapper routines.
Instead, we need to be able to describe each of the routines and data
structures. With these descriptions, we can access the routines and
fields. In this document, we will explore a different approach that
remains dynamic, but avoids using
<r:pkg>rdyncall</r:pkg> or <omg:pkg>Rffi</omg:pkg>.
Instead, we will use the <omg:pkg>Rllvm</omg:pkg>
and <omg:pkg>RLLVMCompile</omg:pkg> packages
to dynamically generate machine code to invoke
native routines.
</para>
<para>
We'll start with a simple example.
Suppose there already exists a <c/> routine named
<c:func>fib</c:func> and it is available via a dynamic library
(DSO/DSL).
We can load it into <r/> with, say,
<r:code>
dyn.load("fib.so")
</r:code>
and access the <c/> routine as <c:func>fib</c:func>.
</para>
<para>
Now, let's think about an <r/> function that acts as a simple
proxy for this.
We'll call this <r:func>fib</r:func>. We know that the routine takes
a single 32-bit integer, i.e. an <r:var>Int32Type</r:var>.
We can write an <r/> function that interfaces to this with
<r:function eval="false"><![CDATA[
createProxy =
    # name: the name of the native routine
    # returnType, types: the LLVM return type and parameter types
function(name, returnType, types = list())
{
    # Create an LLVM module named after the routine.
    mod = Module(name)
}
]]></r:function>
<invisible>
<r:code eval="false">source("createProxy.R")</r:code>
library(RLLVMCompile)
</invisible>
</para>
<para>
How do we get the descriptions of the native routines? Of course, we
can do this by hand, i.e. by reading the header files. Alternatively,
we can do this programmatically via a number of facilities. gccxml is
an extension to the GNU compiler that provides information about
routines and data structures. The <r/> packages
<omg:pkg>RGCCTranslationUnit</omg:pkg> and
<omg:pkg>RCIndex</omg:pkg> provide alternative mechanisms to get
the information. Given the descriptions of the routines, we can generate
the code to invoke them.
In this way, the <omg:pkg>Rllvm</omg:pkg> package provides a
simplification of the <omg:pkg>Rffi</omg:pkg> and <r:pkg>rdyncall</r:pkg>
mechanism. However, it is much more powerful as it allows us to generate
code for more general situations, not just invoking existing routines.
In other words, we can compile <r/> code and other languages to native code.
</para>
<para>
The approach we use is quite simple, given the facilities in
the <omg:pkg>Rllvm</omg:pkg> and <omg:pkg>RLLVMCompile</omg:pkg> packages.
Essentially, we define a simple R function that is a direct proxy for the
native routine. It has the same parameters and the body merely calls
the native routine. In this sense, it is a true proxy function.
For example, consider the simple <c/> routine declared as
<c:decl>int fib(int)</c:decl>.
The <r/> proxy is simply
<r:function><![CDATA[
rfib=
function(n)
fib(n)
]]></r:function>
The function <r:func>mkProxyFn</r:func> creates this proxy function for us.
We can then pass this to <r:func>compileFunction</r:func> to
create the machine-level code to invoke the native routine.
We need to know the return type and the types of the parameters of
the native routine. Given these, we can make the <llvm/>
engine aware of this routine.
Then we can compile our proxy to invoke this.
</para>
<para>
We load the native code into R with
<r:code>
dyn.load("fib.so")
</r:code>
Next we can compile the machine code for our proxy
<r:code>
fi = createProxy("fib", Int32Type, list(n = Int32Type))
</r:code>
We use the same name as the native routine.
The names of the arguments are not important.
</para>
<para>
We can now invoke this function/routine with
<r:code>
.llvm(fi, 10)
</r:code>
</para>
</section>
<section>
<title>Programmatically Obtaining the Type Information</title>
<para>
We have shown how we can create the proxy routine
using the type information.
The next step is to automate obtaining this type information.
We have several possible mechanisms.
We can use gccxml to dump information about <c/> code into XML.
This is the approach <r:pkg>rdyncall</r:pkg> uses and provides.
I have two other approaches: <omg:pkg>RGCCTranslationUnit</omg:pkg>
and <omg:pkg>RCIndex</omg:pkg>.
</para>
<section r:eval="false">
<para>
To use <omg:pkg>RGCCTranslationUnit</omg:pkg>,
we have to compile our <c/> code to create a tu file.
We can do this with
<sh:code>
gcc -fdump-translation-unit -c fib.c -o /dev/null
</sh:code>
This creates <file>fib.c.001t.tu</file>.
We can then read this in <r/>
<r:code>
library(RGCCTranslationUnit)
tu = parseTU("fib.c.001t.tu")
</r:code>
<r:code>
r = getRoutines(tu)
rr = resolveType(r, tu)
</r:code>
<r:var>rr</r:var> is a list with just one element:
<r:code>
rr
<r:output><![CDATA[
$fib
[1] " int fib ( int n )"
attr(,"class")
[1] "ResolvedRoutineList"
]]></r:output>
</r:code>
</para>
<para>
This element contains information about the routine.
It is displayed here as a string, but in fact, is
much richer:
<r:code>
names(rr$fib)
<r:output><![CDATA[
[1] "parameters" "INDEX" "name" "node" "returnType"
[6] "pure" "virtual"
]]></r:output>
</r:code>
Importantly, we have the return type and the parameters
and their types, e.g.
<r:code>
rr$fib$returnType
<r:output><![CDATA[
An object of class "intType"
Slot "name":
[1] "int"
Slot "alias":
character(0)
Slot "qualifiers":
character(0)
Slot "scope":
NULL
]]></r:output>
</r:code>
This is a simple integer type (as we expected from the <c/> code).
This corresponds to <r:var>Int32Type</r:var> in <omg:pkg>Rllvm</omg:pkg>.
(See types.cc in <dir>examples</dir> in the <omg:pkg>RGCCTranslationUnit</omg:pkg> package for others.)
</para>
<para>
We can define a function that maps a type description from the
<omg:pkg>RGCCTranslationUnit</omg:pkg> package to an llvm type.
The basics of this are very simple, but there are a lot of details to deal
with for all the different sub-types.
<r:function><![CDATA[
library(Rllvm)
tuLLVMType =
function(type)
{
    # Compound types: recurse on the element or target type.
    if(is(type, "ArrayType"))
        return(arrayType(tuLLVMType(type@type), type@length))
    else if(is(type, "PointerType"))
        return(pointerType(tuLLVMType(type@type)))

    # An empty entry falls through to the next non-empty one.
    switch(class(type),
           intType = Int32Type,
           doubleType = DoubleType,
           shortUnsignedIntType = ,
           shortIntType = Int16Type,
           boolType = Int1Type,
           longUnsignedIntType = ,
           longIntType = Int32Type,   # assumes a 32-bit long
           charType = Int8Type,
           stop("no LLVM type mapping for ", class(type)))
}
]]></r:function>
This doesn't deal with extended or qualified types such as <c:type>long double</c:type>,
but could be easily modified to do so.
</para>
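<para>
The mapping relies on the fall-through behaviour of <r:func>switch</r:func>:
an entry with no value takes the value of the next non-empty entry, and a final
unnamed argument acts as the default. The following is a minimal sketch of just that
mechanism. It runs in base <r/> because character strings stand in for the
<omg:pkg>Rllvm</omg:pkg> type objects, and the name <r:func>mapTypeName</r:func>
is hypothetical, introduced only for this illustration.
<r:code eval="false"><![CDATA[
# Hypothetical helper: strings stand in for the Rllvm type objects.
mapTypeName =
function(cls)
{
    switch(cls,
           shortUnsignedIntType = ,      # empty: falls through to the next entry
           shortIntType = "Int16Type",
           intType = "Int32Type",
           stop("no mapping for ", cls)) # unnamed final argument is the default
}

mapTypeName("shortUnsignedIntType")   # same result as for "shortIntType"
mapTypeName("intType")
]]></r:code>
Calling it with a class name that is not listed raises an error rather than
silently returning <r:null/>.
</para>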
<para>
We can then use this to get the types for the <c:func>fib</c:func> routine
and create our proxy function with
<r:code>
fun = createProxy("fib", tuLLVMType(rr$fib$returnType),
lapply(rr$fib$parameters, function(x) tuLLVMType(x$type)))
</r:code>
This is the same as we obtained above where we explicitly specified the types.
We can invoke the routine with
<r:code>
.llvm(fun, 20)
</r:code>
</para>
<ignore>
<para>
See types.cc and types.R in RGCCTranslationUnit/examples/
<r:code eval="false">
sapply(rgvars, function(x) tuLLVMType(x@type))
</r:code>
</para>
</ignore>
</section>
</section>
<section>
<title>Using <omg:pkg>RCIndex</omg:pkg></title>
<para>
We'll now see how we can do the same thing with
<omg:pkg>RCIndex</omg:pkg>.
We can start by obtaining the translation unit.
We could read it and traverse it in different ways.
However, we only want the descriptions of the routines, so we can
use
<r:code>
library(RCIndex)
funs = getRoutines("fib.c")
</r:code>
We can specify flags for the compiler via the <r:arg>args</r:arg> parameter of <r:func>parseTU</r:func>.
</para>
<para>
This is slightly simpler using <omg:pkg>RCIndex</omg:pkg>
as we can do it entirely in <r/>.
However, we did have to install the clang libraries.
</para>
<note>
<para>
If we want to read the
<r:code eval="false">
tu = parseTU("fib.c")
</r:code>
Then we can use <r:func>visitTU</r:func> to traverse the nodes.
</para>
</note>
<para>
<r:var>funs</r:var> is a <r:list/> and we can see the names of the functions
with
<r:code>
names(funs)
<r:output><![CDATA[
[1] "fib"
]]></r:output>
f = funs$fib
</r:code>
This element is a <r:class>FunctionDecl</r:class> object.
This is currently an <s3/> class which is a list
with elements for the return type, parameters and the function definition
node. These are all <r:class>CXCursor</r:class> objects.
</para>
<para>
We can find the nature of the return type with
<r:code>
getTypeKind(f@returnType)
<r:output><![CDATA[
Int
17
]]></r:output>
</r:code>
This is an enumerated type, with the name of the element giving us some
indication.
We can compare it to the built-in values, which are prefixed by <r:var>CXType_</r:var>,
e.g. <r:var>CXType_Int</r:var>.
</para>
<para>
We can also ask many questions of the type, e.g. whether it is constant, its size, etc.
For example,
<r:code>
getSizeOf(f@returnType)
</r:code>
yields 4.
</para>
<para>
We can also query the function's parameters:
<r:code>
sapply(f@params, getTypeKind)
</r:code>
</para>
<para>
We can now write our function that maps a Clang/<omg:pkg>RCIndex</omg:pkg> type
to the corresponding LLVM type.
<r:function><![CDATA[
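clang2LLVMType =
function(type, kind = getTypeKind(type))
{
    # Array and pointer kinds are not handled yet.
    # names(kind) is the libclang kind name without the CXType_ prefix.
    switch(names(kind),
           Int = Int32Type,
           Double = DoubleType,
           UShort = ,
           Short = Int16Type,
           Bool = Int1Type,
           ULong = ,
           Long = Int32Type,    # assumes a 32-bit long
           SChar = Int8Type,
           stop("no LLVM type mapping for kind ", names(kind)))
}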
]]></r:function>
</para>
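<para>
As with the <omg:pkg>RGCCTranslationUnit</omg:pkg> version, these types can be
passed to <r:func>createProxy</r:func>. The following is an untested sketch. It
assumes that <r:func>createProxy</r:func> and <r:func>clang2LLVMType</r:func> are
defined as above, that <file>fib.so</file> has already been loaded, and that each
element of the <r:var>params</r:var> slot is accepted by
<r:func>getTypeKind</r:func>, as the <r:func>sapply</r:func> call above suggests.
The variable name <r:var>fun2</r:var> is introduced only for this illustration.
<r:code eval="false"><![CDATA[
# Sketch only: relies on createProxy() and clang2LLVMType() from this document.
fun2 = createProxy("fib", clang2LLVMType(f@returnType),
                   lapply(f@params, clang2LLVMType))

# Invoke the native routine through the compiled proxy.
.llvm(fun2, 20)
]]></r:code>
</para>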
</section>
</article>