New compiler: support function overloading #2533
Comments
My first idea was using symbol table entry numbers, but just using symbol table entry numbers to identify the types won't be enough. The issue is, a Instead, we'd probably need to stringify the types in a unique way and use some sort of concatenation of this for the mangled function name. One could imagine mangled function names such as
That might make for very long mangled function names, but as far as I know, other ecosystems such as GNU's To stringify a type in a unique way, as a first approach we might linearize its composition of data components (and ignore the attributes and function components)
If you've already GOT a stringification for types as part of your RTTI scheme, let's take that. If the mangled function names become too long, we might hash them so that we can look them up first by hash, then by exact name. The hashing only needs to occur once, at the linking stage, so this will probably not slow down program execution. |
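To make the "look up by hash first, then by exact name" idea concrete, here is a minimal Python sketch. The class name `ExportTable`, the use of CRC32, and the example names such as "Square(int)" are assumptions made for illustration; none of them come from the AGS engine.

```python
import zlib
from collections import defaultdict


class ExportTable:
    """Hypothetical link-time table: bucket by hash, confirm by exact name."""

    def __init__(self):
        # hash value -> list of (mangled_name, address) pairs sharing that hash
        self._buckets = defaultdict(list)

    @staticmethod
    def _hash(mangled_name):
        # CRC32 is only a stand-in; any stable hash would serve here.
        return zlib.crc32(mangled_name.encode("utf-8"))

    def register(self, mangled_name, address):
        self._buckets[self._hash(mangled_name)].append((mangled_name, address))

    def resolve(self, mangled_name):
        # Step 1: narrow down by hash. Step 2: compare the full name,
        # so that hash collisions cannot produce a wrong match.
        for name, address in self._buckets.get(self._hash(mangled_name), []):
            if name == mangled_name:
                return address
        return None


table = ExportTable()
table.register("Square(int)", 0x10)
table.register("Square(float)", 0x20)
print(table.resolve("Square(float)"))
```

Since the hashing happens once per name while linking, long names would cost memory in the tables but not time in the running script.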
So, in RTTI there are local tables with local typeids, and a joint table created at runtime, which assigns global numeric ids to the types. These tables are bound together using a fully qualified type name (string). This is similar to how function fixups are done across scripts. I suppose that if you use numeric local type ids in this mangled name, then it will be possible to resolve them into global type ids (whether numeric or string) at the script linking stage in the engine. As for string names, RTTI currently does not have any "shortcut" names, only full names. It's possible to add "shortcuts" there though, generated using your proposed "stringification" rules, if using numeric typeids seems inconvenient. EDIT: |
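The local-to-global binding described above could be pictured roughly like this. This is a Python sketch under assumptions: `GlobalTypeTable` and `link_script` are hypothetical names, and the real RTTI tables in the engine hold far more than a name per type.

```python
class GlobalTypeTable:
    """Hypothetical joint table: fully qualified type name -> global numeric id."""

    def __init__(self):
        self._id_by_name = {}

    def global_id(self, full_name):
        # A name seen for the first time gets the next free id;
        # a known name keeps the id it already has.
        return self._id_by_name.setdefault(full_name, len(self._id_by_name))

    def link_script(self, local_table):
        """Map one script's local type ids to global ids.

        local_table: dict of local id -> fully qualified type name.
        """
        return {local_id: self.global_id(name)
                for local_id, name in local_table.items()}


joint = GlobalTypeTable()
script_a = joint.link_script({0: "int", 1: "MyStruct"})
script_b = joint.link_script({0: "MyStruct", 1: "float"})
print(script_a, script_b)
```

With a map like this, a mangled function name that embeds local type ids could be rewritten in terms of global ids when the script is linked.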
Hm. We might keep things simple. We don't need RTTI at link time, but when RTTI already has a way to name types uniquely in a string (long names), then we could use just this naming mechanism and name the functions with these long names, too. That is,
That makes for long function names, but AFAIK they are only used for linking and aren't usually shown to the programmer.
It seems that the engine does 2. and 3. already, so this might be a comparatively simple modification of pre-existing code. In communication with the programmer, IMO they need to be told what they need to be told, it can't be helped. So when there are several functions of the same name that have different parameter lists and an error message needs to refer to one specific function, it would call the function |
Are bool and enums assumed to be int in this overload idea? Or would they be their own types? I am curious whether this would add overhead to script function calls at runtime, or whether overloads would be resolved at compile time. Just to expand a bit: assuming we do have it and, let's say, we use it in Maths in the engine API to also support int, I imagine that in the AGS manual we would have all the overloaded methods in the same entry. |
Currently, the compiler doesn't know As concerns We don't have casts in the AGS language, neither the C++ casts nor the C casts. So the language kind of relies on being able to assign ints to enums and vice versa, |
In other words, in order to distinguish overloads with bools and different enums, the compiler would have to register
Function overloads are resolved at compile and linking time. |
I looked into this a little. Where I got stuck is the compiler part, and particularly the question of how to register different function variants in the symbol table. Just looking at options, having all variants under the same symbol does not look feasible, because a symbol must have a "Declared" location, "Scope", and "LifeScope" (that was added for the runtime reflection), and obviously overloaded functions will have different ones. So, they must be separate symbols. But in this case, their string "Name" must be different, or they would require some additional field to distinguish them when they are being looked up by name.
The Parser deals with Symbols, which are integer ids. The mapping between a Symbol and a Name is performed earlier by the Scanner, and the Scanner knows nothing about function overloading. OTOH there's already a separation between an "unqualified" symbol and a "qualified" symbol; currently it's used for things like struct member functions: a function may first be recognized by its "unqualified" symbol (matching the "funcname" string), and then registered with a "qualified" symbol (matching the "type::funcname" string). This is done by the Parser. So the Parser should be able to handle overloading as well, starting with an "unqualified" function symbol and generating a "qualified" function symbol which corresponds to a particular overload.
The question remains how to distinguish these overloads, other than by the symbol id itself. Should they have a different Name value, or another field that distinguishes them? Afaik Name is used mostly for reporting compiler messages and registering RTTI/TOC tables, but not in the parser logic (?). I don't have a full picture of this yet, but might research this further.
I suppose that the Parser would need a quick way of finding other function variants. Maybe the FunctionD member of a symbol entry could contain a vector of "overloads" with their symbols.
Similarly, a "Constructor" field in VartypeD, which I added recently, would be replaced with a vector of symbols, for multiple constructors. |
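As a rough picture of the "separate symbols plus a vector of overloads" idea, here is a Python sketch. Everything in it is an assumption for illustration: the class and field names only loosely mirror "Declared" and "FunctionD", and the "Square(int)" form of the qualified name is one possible choice, not something the compiler does today.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolEntry:
    name: str
    declared: int = -1          # source line of the declaration, if any
    param_types: tuple = ()     # only meaningful for a concrete overload
    overloads: list = field(default_factory=list)  # symbol ids of the variants


class SymbolTable:
    def __init__(self):
        self.entries = []       # a symbol id is an index into this list
        self._id_by_name = {}

    def find_or_add(self, name):
        if name not in self._id_by_name:
            self._id_by_name[name] = len(self.entries)
            self.entries.append(SymbolEntry(name))
        return self._id_by_name[name]

    def declare_overload(self, func_name, param_types, declared):
        # The unqualified symbol only collects the variants.
        base = self.find_or_add(func_name)
        qualified = f"{func_name}({','.join(param_types)})"
        if qualified in self._id_by_name:
            raise ValueError(f"'{qualified}' is already declared")
        sym = self.find_or_add(qualified)
        self.entries[sym].declared = declared
        self.entries[sym].param_types = tuple(param_types)
        self.entries[base].overloads.append(sym)
        return sym

    def find_overload(self, func_name, arg_types):
        base = self._id_by_name.get(func_name)
        if base is None:
            return None
        for sym in self.entries[base].overloads:
            if self.entries[sym].param_types == tuple(arg_types):
                return sym
        return None


table = SymbolTable()
table.declare_overload("Square", ["int"], declared=3)
table.declare_overload("Square", ["float"], declared=7)
print(table.find_overload("Square", ["float"]))
```

Each variant keeps its own declaration data, while the unqualified entry gives the parser one place to enumerate candidates. The sketch matches argument types exactly; implicit conversions (such as int to enum) would need extra rules.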
One way of having several parameter lists to one function name in the symbol table: Keep a vector of all the parameter lists.
Imported and exported functions would not be tracked by the name only, but by a "mangled name" that encodes the parameters, too.
We might do it so that a function This will still mean that for each parameter list, an offset into the code must be tracked where the function code begins, etc. How this would be done is a hassle, but it is doable. One way of doing it would be, have several entries in the symbol table:
|
Once we add functions that can have more than one parameter list, then our error messages will perhaps become less helpful to the programmers: Let's say that we have Now let's add Now when the compiler encounters So we might have to weigh what is more important to us: A language with more possibilities, or a restricted language with more specific help from the compiler. |
This sounds like a better plan.
|
Is function overloading something that assumes the return value is the same between functions? Like, could I have
// header file
import int Square (int value);
import float Square (float value);
// script file
int Square(int value)
{
    return value*value;
}
float Square(float value)
{
    return value*value;
} |
The traditional rule is that return types may be different, but the functions cannot differ only by return type. That is because the compiler would not be able to deduce which variant to call. Meaning, you can have:
But you cannot have
|
The proposal is to support script function overloading.
CC @fernewelten
Function overloading means that you may have multiple functions of identical name, but different prototype (return value and parameter list). For example:
EDIT: since we now have struct constructors (#2582), overloading should be supported for constructors too.
NOTE: overloading must have different argument list, it cannot support function variants that only differ in return type, because there will be no way to tell which of those variants is being called.
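The rule in this note can be illustrated with a small Python sketch. It assumes variants are keyed by their parameter type list alone; `OverloadSet` and its methods are hypothetical names, and the `Square` functions are borrowed from the question in the comments.

```python
class OverloadSet:
    """Hypothetical set of variants for one function name."""

    def __init__(self, name):
        self.name = name
        # The parameter type tuple is the key; the return type is not part of it.
        self._return_type_by_params = {}

    def declare(self, return_type, param_types):
        key = tuple(param_types)
        existing = self._return_type_by_params.get(key)
        if existing is not None and existing != return_type:
            # Same argument list, different return type: a call site
            # would have no way to pick between the two variants.
            raise TypeError(
                f"{self.name}({', '.join(key)}) already returns '{existing}'; "
                f"variants may not differ only in return type")
        self._return_type_by_params[key] = return_type

    def resolve(self, arg_types):
        # A call is matched on its argument types only.
        return self._return_type_by_params.get(tuple(arg_types))


square = OverloadSet("Square")
square.declare("int", ["int"])
square.declare("float", ["float"])      # accepted: the parameter lists differ
print(square.resolve(["float"]))
try:
    square.declare("float", ["int"])    # rejected: only the return type differs
except TypeError as err:
    print(err)
```

Repeating a declaration with the same return type is accepted in this sketch, which mirrors an import in a header followed by the function body in the script.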
In order to support this, function variants must be distinguished at both the compilation and linking stages. In other words, each function variant must be registered under a unique internal name. Right now AGS uses a "FUNC^N" notation for distinguishing imports with a different number of parameters (and afaik "FUNC$N" as the corresponding export name). This was done primarily to allow linking deprecated API functions in the engine (I think). But the number of parameters is not enough for overloading, as we would also need to differentiate variants with different return and argument types.
The first idea that comes to mind is to generate a second suffix which contains the encoded parameter types. Note that these do not have to be unique throughout the script or game: for the purpose of overloading itself, it is enough that the suffixes of one function's variants differ. But it may still be beneficial to have a strict rule for them, i.e. not random garbage, if only because this may be useful for debugging. There may also be additional uses found later, so it would be best not to block this opportunity.
Now, this is where it becomes a bit complicated. I can imagine that primitive types such as ints, floats, etc. could be identified by a single letter, like "i", "f", etc., but what about others? A single letter will not be suitable, and a full type name may make this internal name quite long. As a random idea, there could be a "compressed" name generated from the minimum number of characters needed to distinguish the type, maybe starting with 3 letters (unless the type name is shorter). This type name "shortcut" would then also be saved somewhere, like in the RTTI table, as a way to reference a type in case we need to quickly find that type's entry.
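A minimal Python sketch of the "shortcut" idea follows, under stated assumptions. The single-letter codes "i" and "f" come from the text above; the greedy prefix extension, the function names, and the ":" and "," separators appended after the existing "FUNC^N" form are all made up for illustration.

```python
# Single-letter codes for primitives; "i" and "f" are the examples from the text.
PRIMITIVE_CODES = {"int": "i", "float": "f"}


def build_shortcuts(type_names, min_len=3):
    """Give each type the shortest unused prefix of at least min_len characters.

    Assumes no user type is named exactly like a primitive code.
    """
    shortcuts = {}
    taken = set(PRIMITIVE_CODES.values())
    # Sorting makes the result independent of declaration order.
    for name in sorted(type_names):
        if name in PRIMITIVE_CODES:
            shortcuts[name] = PRIMITIVE_CODES[name]
            continue
        length = min(min_len, len(name))
        # Extend the prefix until it no longer clashes with an earlier one.
        while name[:length] in taken and length < len(name):
            length += 1
        shortcuts[name] = name[:length]
        taken.add(name[:length])
    return shortcuts


def mangle(func_name, param_types, shortcuts):
    # "FUNC^N" as today, plus a hypothetical second suffix with the type codes.
    codes = ",".join(shortcuts[t] for t in param_types)
    return f"{func_name}^{len(param_types)}:{codes}"


types = ["int", "float", "Character", "CharacterDirection", "Overlay"]
shortcuts = build_shortcuts(types)
print(shortcuts)
print(mangle("Say", ["Character", "int"], shortcuts))
```

One consequence of this scheme is that a shortcut depends on the full set of known types, so adding a type may change existing shortcuts. That is an argument for storing them in a shared place such as the RTTI table, as suggested above, rather than recomputing them per script.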
Are there any other visible options here?