-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving the interpreter #3
Comments
Regarding proposal 3: I think I've implemented a drop-in replacement for Runtime.Data.NativeRow that will store the elements in unboxed form, and will allow getting/setting their values using type-parameterized methods. |
Attempted proposal 1. Issue: NilException is really pervasive! There are many places where nil is a perfectly fine result of evaluation, and there are way too many places where Evaluate is called on a plan subnode, so it's hard (for me) to decide how to rearrange the code. Proposal 1 seems out of reach. |
Sorry for the delayed response, but regarding proposal 1, when we first built Dataphor, we had exceptions for accessing nils, so that nil was effectively treated just like a null in C#, but we found that it caused many more problems than it solved, and ended up allowing nil propagation in general. Having said that, I like the suggestion that an operator be able to inform the compiler about it's nil propagation behavior so that the nilable characteristic could be inferred more generally, but I'm not sure what would be gained by throwing exceptions when nils are encountered? We can already use constraints to prevent assignment of nil, what problem is being addressed by adding the NilException? Regards, |
Regarding proposals 2 and 3, I agree, that would be a useful change, and I have been working quite a bit in that area lately. I would definitely be interested in seeing what you are doing there. Specifically, I've been looking at dealing with all values natively, rather than having to have DataValues. To do this, I'm introducing interfaces to support access to data values of different categories of types. IDataValue being the most general, which only specifies an IDataType. Not that all native types would have to implement IDataValue, but if they did, the engine could take advantage of it. Also looking at being able to support IList directly (instead of ListValue), and introducing ITable and IRow to deal with Table and Row. I have committed some of these changes to trunk already as part of adding more RESTful functionality, and some integration work with FHIR. Regards, |
It is currently the case that the interpreter code is littered with null-checking. I was under the impression that in null-heavy programs, using exceptions instead of explicit branching was faster (no idea where I got that, but probably read in some paper...). Regarding the nil-propogation characteristic: if nils are properly accounted for by an operator (e.g., we can say that it can never return a nil if its inputs are non-nil), then we could use the native .NET representations for the argument and result types, it should supposedly help with performance of arithmetic and comparison operators. However, I have yet to implement this characteristic.
I'm basically following the paper Self-optimizing AST interpreters, in particular: subsection 5.3 (return type specialization). I'm planning to go a bit further: let every plan node implement static methods with the native input types and the result type, then we could generate some IL code at call sites to avoid boxing completely. Much of IL code generation would then consist of a generic "apply function to arguments" routine. |
Please find an example of proposal 2. Obviously it needs more polish (I don't follow coding conventions..), but it shows concretely what I am proposing to do. I would like to know what you think (and especially, if you foresee any issues with this approach that would limit its usefulness). A simple expression like
That would be great. For instance, row types could be generated at runtime, and if they support some interface (say IRowType), then it would make it much easier to use from non-generated code. Also, tables could use different runtime representations as long as they conform to an interface. |
Stumbled on a generic nil. In an expression like How is generic supposed to work? (We don't have any generic code in D4 at the moment.) Is it like a C# |
Nils are typed as generic, but when attempting to resolve an operator reference, the compiler will look for a "compatible" type, and a type that is only "generic" is considered compatible with any type, similar to the way an untyped null is handled in C#. So in that respect, generic types in D4 are more like parametric types, but without the ability to declare anything other than the generics: "generic", "row", "list", "table", and "cursor". |
Thanks for the note on generic. I've implemented some basic imperative node compilation. Currently using the interpreter stack (so every assignment and variable reference involve boxing/unboxing), but I'd like to change that (eventually). Some sample programs that work:
Planning to tackle operator compilation and CallNode next. |
Very cool, I like the approach. I am concerned about having to create .NET types for every row type though, but it does have a lot of advantages. I will definitely take a look and see if this approach could work for native row representation. |
Also very cool, and yes, I think the approach will work. We started on IL emission a long time ago, but never completed the effort. I am working on general memory and performance improvements now though, and IL compilation is close to the top of the list. |
Note that I've been working on this some more, and the progress so far is as follows.
BooleanLibrary
Currently, the expression like Issue: DDL nodes, like CreateOperatorNode, cannot be truly compiled, because these nodes rely on some member variables when executed. I haven't been able to find an easy way to access these variables from the generated code. I'm wondering if you have something to propose about this? Currently working on removing the use of interpreter stack in compiled operator bodies. These are the issues I'm planning to work on after that:
|
One fairly major concern I have with actually building .NET types for rows is that there is no way to unload assemblies (without unloading the entire app domain), so building types for each row type effectively constitutes an unbounded cache. The system could remember types to reduce the number of unnecessarily created new row types, but it would still represent a potential memory issue. At the same time, I love the idea that a row type is just a class with properties for each attribute of the row, so I'm torn here. |
I've picked the idea of canonicalizing row types from some paper on extensible records. Basically, |
I've just pushed an update: dropped the use of interpreter stack in compiled code. TestFramework.Coverage.Scripts.Timing runs in about 1 second on my machine (at 10000000 iterations). Quite sure it could be further improved. One realization I had while working on this is that D4 programs use an unnamed De Bruijn representation for variables. Then, it occured to me that we could still introduce some local variables as needed (e.g. for nil checking), provided that the VariableContext stack is maintained in 1:1 correspondence to the Symbol stack. The alternative would be to introduce a let-insertion phase in the D4 compiler, such a phase would name every subexpression with a fresh variable name. The input program like:
would get transformed into:
I guess that the let-insertion phase is much harder to implement. Not using Program.Stack at all breaks batches (since a variable declaration at top level is just another batch, it gets compiled to a meaningless static function with 1 local). Any ideas on solving this issue? |
Below are some thoughts on improving the interpreter. Posting mainly for discussion.
Proposal 1 (on nil handling in the interpreter):
begin Foo(); Bar() end
, where Foo() is a function, a call to Foo() is evaluated and results in a nil)The upside is that a lot of repetitive conditional logic is replaced by exception handling, which should also be faster at runtime.
Proposal 2 (avoiding scalar type boxing when calling operators):
The PlanNode.Execute() function works only on boxed types. Avoiding boxing of return types and of arguments should improve performance considerably (by lowering GC stress).
A (Program, A1,A2,...,An)
if the operator is nil-propagating, andMakeNullable<A> (Program, MakeNullable<A1>,MakeNullable<A2>,...,MakeNullable<An>)
if the operator is not nil-propagating, where MakeNullable is just like Nullable if T is a value type, and is just T otherwiseType _returnType
for the .NET type it's Evaluate method evaluates to (or, introduce TypedPlanNode)Note that we could probably dispense with the Nodes array in the plan node for making every node typed on its arguments. For instance, IntegerPowerNode will have members
_left
of typeTypedPlanNode<int>
and_right
of typeTypedPlanNode<int>
. There doesn't seem to be much code that goes over the Nodes array in some involved computation.Is there a characteristic for operators which could help in deciding if Program argument is necessary for the operator to evaluate?
Obviously, some unboxing/boxing will still be necessary (e.g. when reading from/storing into variables).
Proposal 3 (avoid boxing of rows):
This one I haven't figured out yet. Currently, all rows are generically boxed (see Runtime.Data.Row and Runtime.Data.NativeRow).
http://www.codeproject.com/Articles/13337/Introduction-to-Creating-Dynamic-Types-with-Reflec
I guess we could specialize some interpreter code with the exact row types?
The text was updated successfully, but these errors were encountered: