ParseXS: refactor OUTPUT and RETVAL #22731

iabyn · 2024-11-12T14:45:27Z

This set of changes does not require a perldelta entry.

My previous merge commit, 5.41.4-108-g9621dfa822, added Node::Param objects - which store information about each parameter in an XSUB's signature - but updated with any further data from an INPUT line.

This branch extends that work by:

Also updating each Param object with any data from OUTPUT lines.
Adding a synthetic Param object for RETVAL, which means that in many places, RETVAL can now be handled just as another parameter (albeit one which is declared but not initialised, and which doesn't consume an arg). There's still plenty of special-casing code for RETVAL, but it now tends to be more in the places where it will have an effect (e.g. some special-handling in INPUT_handler() if RETVAL is seen in an INPUT section) rather than the effect being a side-effect of the way the code works elsewhere.
Does some general refactoring of the code responsible for emitting output code, e.g. in generate_output() etc. There's still more refactoring to be done in this area, but this is a start.
fixes CASE statements, which were partially broken by the previous merge commit.

iabyn · 2024-11-13T14:47:02Z

I've pushed 3 further commits which fix the build under old perls and couple of other minor things I noticed along the way.

tonycoz · 2024-11-14T00:44:42Z

dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

    $self->blurt("Can't determine output type for '$var'");
-    next;
+    return;
  }


On reading the commit message, my first impulse is to look for a new test that would have caught this.

I don't see any tests that check for this message.

This is more of "panic: can never happen" message, which can't (or at least shouldn't) be triggered by real-life XS code. I only spotted it during a mistake I made while refactoring, when I got an "exiting via next" warning along with the indication that something was up with types.

On further thought, it can be triggered by real XS code. I'll add some tests on my next set of commits.

A recent piece of refactoring by me, v5.41.4-92-g9c24c4ed29, moved some error-checking code from OUTPUT_handler() to generate_output(). The former is in a loop, the latter isn't. So in if (bad) { blurt(..); next } the 'next' needs changing to 'return'. It turns out no tests yet exercise this error condition, so it it wasn't spotted at the time.

During earlier refactoring, the captures of a particular match were assigned to variables; but a few lines further down, the code was still using '$1' rather than the variable which $1 had been assigned to. This commit makes it use the var instead. No functional changes.

There is an optimisation whereby the OP_ENTERSUB's TARG is sometimes used to return the result, rather than creating a new mortal. This optimisation has been present for many years, but was untested. This commit adds tests, including ones for custom typemap entries of the sort that can still be optimised. It also tests for the optimisation in conjunction with OUTLIST args. It also adds tests for when the output code is specified explicitly on the OUTPUT line rather than being retrieved from a typemap. Currently this isn't actually optimised.

The perl code which does the emitting of the C code for the RETVAL value return using TARG if possible, is in a different place than the general RETVAL code emitting. This commit moves them into the same function. For now this is a crude cut and paste: any simplifying of the resultant function will be done later. For now, it's good just that more C code generation functionality is concentrated in one place rather than being diffused around all parts of ParseXS.pm. In more detail: before this commit, the code looked like: sub process_file { ... if (RETVAL needs returning) { if (have template code from OUTPUT line) { # just directly emit the template code } elsif (can do TARG optimisation) { # directly emit code to do PUSHi() etc } else { $self->generate_output({ var => 'RETVAL', ...}); } } ... } sub generate_output { ... if ($var eq 'RETVAL') { # emit generic C code to return RETVAL to the caller } ... } After this commit, it looks like: sub process_file { ... if (RETVAL needs returning) { $self->generate_output({ var => 'RETVAL', ...}); } ... } sub generate_output { ... if ($var eq 'RETVAL') { if (have template code from OUTPUT line) { # just directly emit the template code return; if (can do TARG optimisation) { # directly emit code to do PUSHi() etc } else { # emit generic C code to return RETVAL to the caller } } ... }

In particular, test that the 'SETMAGIC: EN/DISABLE' modifiers work, and that duplicate vars are reported.

If RETVAL appears twice in the OUTPUT section, it will be reported as an error (using blurt()), but parsing will continue. During parsing of the second RETVAL, it is treated as an ordinary parameter rather than special-cased as RETVAL. This doesn't seem useful, although ultimately it doesn't matter, as the parse will be reported as failing at the end anyway, so it doesn't matter what code gets generated in the meantime. This commit makes RETVAL still be treated as RETVAL on second appearance in OUTPUT, which get makes the code slightly simpler. Also, change the dup-detecting lines above it from a postfix 'if' to a block 'if', so that the flow-control 'next' appears prominently on its own line.

If the parser sees RETVAL in a CODE section, but there isn't an OUTPUT section for the RETVAL, then it emits a warning. Test for this warning. Note that it currently warns only if the OUTPUT section is not present; it doesn't warn if the OUTPUT is present but doesn't contain RETVAL. The next commit will fix this.

ParseXS currently emits a warning if RETVAL is seen in a CODE section but there is no OUTPUT section. This commit extends this so it warns even if there *is* an OUTPUT section but that section doesn't contain RETVAL. To stop one of our existing tests giving a false positive warning, this commit now makes it ignore 'PERL_UNUSED_VAR(RETVAL)' in a CODE section in terms of 'have seen RETVAL in CODE', since this isn't a real use of the variable. Note that the original intent of adding this warning was that, without an OUTPUT RETVAL the emitted code would still do XSRETURN(1), but nothing would be stored at ST(0), so whatever was returned on the stack at that slot would be whatever was already at that position on the stack, which might be one of the passed args, or random garbage. This is true regardless of whether OUTPUT is completely absent, or just RETVAL missing from OUTPUT. So it's better to warn in both cases, not just the former. Doing this also means that the 'xsub_seen_OUTPUT' field member is no longer needed, simplifying the code.

If a var is listed in an OUTPUT section, but is not actually a parameter, then the parser outputs an error message: Error: OUTPUT foo not a parameter This commit: - Fixes a typo in the error message: it was previously "not an parameter". - Extends the warning to include RETVAL where the XSUB is void. Previously, it would plough on, and then give an error that it couldn't find 'void' in the typemap, which is less obvious what the problem is. - Adds tests - this warning previously had no tests for it.

Currently, an E::PXS::Node::Param object each created for each parameter declared in the XSUB's signature. This object is then updated with any extra data that appears in an INPUT line. This commit extends that so that the Param object is also updated with any extra data from an OUTPUT line. This allows us to remove the $self->{xsub_map_varname_to_seen_in_OUTPUT} field. The next few commits will build upon this to simplify various bits of code, now that the Param object contains OUTPUT state too.

The generate_output() method currently takes a hash of args. The fields of this hash are typically generated on the fly by each caller from the fields of an ExtUtils::ParseXS::Node::Param object. This commit makes it so that instead, that Param object is just passed directly to generate_output(). It simplifies the code, and is a step along the path to the eventual goal of making generate_output() be a method of ExtUtils::ParseXS::Node::Param rather than of ExtUtils::ParseXS. This commit causes one slight change in behaviour: set magic; and this is arguably a bug fix. There are two ways of specifying args that get updated on return (the XS equivalent of 'sub foo { $_[0]++ }'): either by declaring the parameter as 'OUT' in the signature: int foo(OUT int a) or by listing it on an OUTPUT line. For the latter, the arg has set magic called on it after the update by default, but this can be overridden with the SETMAGIC keyword: OUTPUT: arg1 // has a SETMAGIC() call SETMAGIC: DISABLE arg2 // doesn't have a SETMAGIC() call SETMAGIC: ENABLE arg3 // has a SETMAGIC() call On the other hand, OUT params have set magic by default *unless* there's an OUTPUT section with at least one SETMAGIC line, in which case the last such SETMAGIC line dictates whether the OUT arg gets set magic. This seems wrong, and this commit changes it so that OUT params *always* get set magic. If this turns out to be wrong, the line to look at is print "\tSvSETMAGIC($arg);\n" if !$param->{in_output} || $do_setmagic; which says to do magic always for OUT params, and to honour the SETMAGIC in force at the line for OUTPUT params.

The last couple of commits broke the unusual (and possibly not very sane) combination of a parameter declared both as IN_OUTLIST and listed in an OUTPUT section. This means that generate_output() is called twice for the same parameter, once to update the value of the arg's SV on the stack, and once to push its value onto the stack. So, distinguish between those two effects by whether the caller of generate_output() passed an extra arg (for OUTLIST) or not, rather than by whether the Param object has an in_out field of /OUTLIST$/. This was manifesting itself as an extra PUSHs and SvSETMAGIC appearing compared with 5.40.0. Fix and add tests.

A recent commit moved, from outside to inside generate_output(), the perl code which is responsible for generating the C code which handles the "use TARG if possible when returning RETVAL" optimisation. This commit now follows that up by simplifying and tidying that code. Now that it's within generate_output(), the setting of $type, looking up of the typemap etc, have all already been done at the start of the sub, so there's no need to duplicate that effort. Also, set {output_code} as a field value in the faked-up RETVAL param object, so that inside generate_output() it can just check for a generic $param->{output_code} field rather than the special-cased $self->{xsub_RETVAL_typemap_code} value.

OUTPUT sections can include explicit setting code to override the normal typemap. For example: OUTPUT: arg1 # use standard typemap entry arg2 my_setiv(ST(1), (IV)arg2); # override Formerly, the custom code was emitted directly during parsing by OUTPUT_handler(). This commit moves that emitting code into generate_output(), so that the emitting with/without override are done in the same place. This is a small part of an effort to concentrate all the code-emitting parts into a small number of well-defined places.

In earlier refactoring, I renamed an E::PXS object field to xsub_seen_NO_RETURN, when it should actually have been to xsub_seen_NO_OUTPUT. This commit renames the field again. Should be no functional changes.

Add a test for the specific: OUTPUT: RETVAL template_override_code

Normally RETVAL is an implicitly declared local var, but it is legal to also declare it as an explicit parameter. Add tests for the various permutations of this: - void and long XSUB return values; - no param type; int type in signature; int type in INPUT; - with and without an autocall. This isn't well documented as to how it should be interpreted, so these tests are more about checking current behaviour so that inadvertent changes are detected, rather than approving the current behaviour. Some of the current behaviour is less than satisfactory (generally marked with 'XXX' in the test code comments). In particular, a commit shortly is going to change how RETVAL is implemented internally. It also adds a few tests for NO_OUTPUT.

Currently, RETVAL is handled almost entirely using special-case code all over the place. For non-void XSUBs, this commit adds a Node::Param object with a name of RETVAL to the start of the list of Param objects in the Sig object - but flagged as synthetic and no_init (so it gets declared but not initialised, and doesn't consume an argument from the stack). This makes the emitting of the some_type RETVAL; declaration be part of general parameter processing, rather than being special-cased. Soon, this new object will also be used to make emitting of the code to return RETVAL be more regular; but for now just the initial declaration is handled in the new fashion. Note however, that things get complicated when RETVAL is also declared as a parameter, e.g. int foo(RETVAL) long RETVAL .... or int foo(long RETVAL) .... or even as a parameter but without a type: int foo(RETVAL) # no INPUT lines ... or even as an alien variable: int foo() long RETVAL = 99; Much of the behaviour in such cases is unspecified, and is somewhat emergent - i.e. it behaves that way due to the way the code happened to be structured, rather than due to any particular intent. This commit tries very hard to preserve the current behaviour - hence all the tests added in the previous commit. It has however changed behaviour in a couple of places where the old behaviour emitted broken C code. These changes are indicated by the updates to t/001-basic.t in this commit. To keep the backwards-compatibility behaviour, the basic approach of this commit has been to make the RETVAL object be in one of three main states: fully-synthetic What is initially pushed at the *start* of the param list. RETVAL hasn't yet appeared in a signature or INPUT. Its type is set to the XSUB's return type. semi-real Same as fully-synthetic, but with a defined argument number, and with an updated position within the parameter list. This signifies that a RETVAL has appeared in the signature, but without a type yet specified. It acts as a placeholder, causing one passed argument to be skipped, and to appear in usage() error messages etc. real The is_synthetic and no_init flags are turned off and its type comes from the signature or INPUT line. This is now just a normal parameter.

The previous commit added a synthetic RETVAL Node::Param object and used it for generating RETVAL declaration code rather than special-casing it. This commit extends this to use the object for output processing too, removing more special-case handling, and removing the need for some special state fields. The next few commits will remove more RETVAL special-handling code and generally simplify and clean up. Note that this commit actually introduces a bug which will be fixed in a few commits' time: the new code doesn't correctly handle multiple CASEs, each with a RETVAL output, such as: int foo(...) CASE: x ... OUTPUT: RETVAL CASE: y ... OUTPUT: RETVAL The code now sees this as a duplicate OUTPUT parameter. For now, a test in t/002-more.t and t/XSMore.xs has been disabled until this is fixed. (Actually, CASE with duplicate INPUT or OUTPUT sections has been broken for a few weeks - but this is the first commit that actually caused a test failure. All will be fixed soon...)

Following on from the previous commit, which uses the new RETVAL Node::Param object for generating output code, ensure that the return type used for returning the XSUB's value is the XSUB's type rather than RETVAL's type, where they differ. For example, with int foo(long RETVAL) RETVAL is is declared as long, but an int is returned to the caller. Add tests to confirm.

In an XSUB like NO_OUTPUT int foo(...) An 'int RETVAL;' declaration is emitted, but no code is emitted to return the value of RETVAL to the caller. Before this commit, the way this was handled was that, after all INPUT and CODE/PPCODE/autocall code had been emitted, ParseXS.pm did the following: ($implicit_OUTPUT_RETVAL, $self->{xsub_return_type}) = (0, 'void') if $self->{xsub_seen_NO_OUTPUT}; That is to say, halfway through parsing the XSUB, it suddenly pretends that the XSUB was declared void all along, and that there wasn't an autocall (which would give an implicit OUTPUT: RETVAL declaration). While this works, it is a confusing action-at-a-distance: at any particular point in the code, you don't easily know whether if ($self->{xsub_return_type} eq 'void) { ... } is dealing with a real void or fake void. This commit removes that halfway-along state modification, and instead handles NO_OUTPUT directly where needed: For $implicit_OUTPUT_RETVAL, it now just never sets it if NO_OUTPUT is present, rather than setting it and then later resetting it. For $self->{xsub_return_type}, this now always represents the actual declared return type of the XSUB, and whenever necessary, $self->{xsub_seen_NO_OUTPUT} is checked along side $self->{xsub_return_type} in various places. In addition, the $var_num variable in OUTPUT_handler() handler has been removed, as it was a vestigial value, now only used to determine whether to call $self->blurt("Error: OUTPUT $outarg not a parameter") which can be done directly as boolean expression. Several new NO_OUTPUT tests have been added.

3 commits ago I added a new scope. This commit re-indents that new block. Whitespace-only (apart from deleting a stray empty comment line too).

The $self->ST($num) method typically emits "ST(" . ($num-1) . ")" but it has special-case handling for $num being undef, or (in the case of output), being 0. The latter was a special-case for outputting RETVAL return code. Recent refactoring means that the ST() method is no longer called with $num == 0, so the method's code can be simplified (and the $is_output parameter eliminated). Should be no change in functionality.

For the specific case of RETVAL appearing in an OUTPUT section when NO_OUTPUT is in force, output this new more specific error message: can't use RETVAL in OUTPUT when NO_OUTPUT declared rather than the more generic (and in this case, slightly confusing) message: OUTPUT RETVAL not a parameter

My recent merge commit v5.41.4-108-g9621dfa822 partially broke the CASE: functionality; although I didn't spot it at first since the ParseXS test suite only has a single CASE: test, and none of the XS code bundled with perl core makes use of it. The issue was that an XSUB with multiple CASEs has a single signature, but potentially multiple sets of INPUT and OUTPUT sections. For example: int foo(abc, def) CASE: X int abc; int def; CODE: RETVAL = abc + def; OUTPUT: RETVAL CASE: Y long abc; long def; CODE: RETVAL = abc - def; OUTPUT: RETVAL Once my merge commit made the data from INPUT and OUTPUT sections be used to update each Node::Param object (which has been derived from the signature entry) then this starts to fall apart. It only started failing the single CASE test in t/002-more.t once *this* branch made RETVAL an object (rather than it being special-cased). The fix in this commit is a bit of hack. Once all of the XSUB's signature has been parsed into a list of Node::Param objects, a copy of this list is taken and kept as the original. Then each time the body is parsed (typically once, but multiple times for bodies with CASE keywords) that original list is copied and becomes the current parameter list. The objects in that list are updated by INPUT and OUTPUT sections as usual. Then for the start of the next CASE, the next copying blows away all those updates. If and when ParseXS takes on more of an AST-like appearance (with, hypothetically, Node::INPUT, Node::OUTPUT etc objects) then this approach should be revisited. Note that this commit restores the CASE test in t/002-more.t which I temporarily disabled a few commits ago.

My previous merge commit moved (and renamed) this function from ParseXS.pm to ParseXS/Node.pm. The former file has an indent of 2, while the latter has an indent of 4. When I moved the code across, I forgot to reindent it. So this commit does that now. Whitespace-only change, apart from rewrapping some code comment blocks.

A couple of newly-added regexes had (in part) /....\b*/, which gives the warning: \b* matches null string many times in regex so this commit removes the extraneous '*'

One line had: 'exists $param->{type}' as part of a condition. This worked fine when $param was a normal hash ref (created under 'use fields'), but the combination of: - a recent commit which copies those hashes to $sig->{old_params} then resets some of the fields (to allow for multiple CASEs); - older perls (in particular 5.8.9) which used pseudo-hashes to implement fields, was causing the field to spring into existence, even though it wasn't defined. So change the test from 'exists' to 'defined'.

A recent commit needed, at one point, to make a copy of a Node::Param object, and did so in a clunky way. But this can be simplified by instead using the new() method's built-in ability to take an initialisation hash ref as an arg. So this commit does that. However, it turns out that this new() method *shared* rather than copied that hash ref when running in an environment where 'use fields' isn't available - namely during the building of perl itself. So fix that too.

tonycoz reviewed Nov 14, 2024

View reviewed changes

tonycoz approved these changes Nov 17, 2024

View reviewed changes

iabyn added 27 commits November 18, 2024 10:12

ParseXS: bump version 3.55 => 3.56

9a13ff8

ParseXS: fix code comment typo

d4997b0

ParseXS: add more tests for OUTPUT keyword

0e9d090

In particular, test that the 'SETMAGIC: EN/DISABLE' modifiers work, and that duplicate vars are reported.

ParseXS: refactor: s/NO_RETURN/NO_OUTPUT/

9a19c04

In earlier refactoring, I renamed an E::PXS object field to xsub_seen_NO_RETURN, when it should actually have been to xsub_seen_NO_OUTPUT. This commit renames the field again. Should be no functional changes.

ParseXS: add test: OUTPUT RETVAL with code

f3e24d5

Add a test for the specific: OUTPUT: RETVAL template_override_code

ParseXS: refactor: reindent block

bbb9904

3 commits ago I added a new scope. This commit re-indents that new block. Whitespace-only (apart from deleting a stray empty comment line too).

iabyn added 4 commits November 18, 2024 10:12

ParseXS: t/001-basic.t: fixup regex

a67a29a

A couple of newly-added regexes had (in part) /....\b*/, which gives the warning: \b* matches null string many times in regex so this commit removes the extraneous '*'

iabyn force-pushed the davem/xs_refactor6 branch from 527a03f to a1fad1e Compare November 18, 2024 10:19

iabyn merged commit b0e9f2b into blead Nov 18, 2024
67 checks passed

iabyn deleted the davem/xs_refactor6 branch November 18, 2024 11:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ParseXS: refactor OUTPUT and RETVAL #22731

ParseXS: refactor OUTPUT and RETVAL #22731

iabyn commented Nov 12, 2024

iabyn commented Nov 13, 2024

tonycoz Nov 14, 2024

iabyn Nov 14, 2024

iabyn Nov 14, 2024

ParseXS: refactor OUTPUT and RETVAL #22731

ParseXS: refactor OUTPUT and RETVAL #22731

Conversation

iabyn commented Nov 12, 2024

iabyn commented Nov 13, 2024

tonycoz Nov 14, 2024

Choose a reason for hiding this comment

iabyn Nov 14, 2024

Choose a reason for hiding this comment

iabyn Nov 14, 2024

Choose a reason for hiding this comment