feat!: PP, modal scanner, #include #59

StoneyJackson · 2023-10-11T12:54:39Z

In "The PLCC Tool Set" section of the plcc.pithon.net repository, I have uploaded the latest versions of plcc.py (in the src directory) and Scan.java and Token.pattern (in the src/Std directory). These incorporate the changes we discussed:

1. Dropping the PP preprocessor option in plcc.py
2. Implementing scanner "line mode" toggling using '^^...' tokens
3. Implementing #include directives anywhere in the specification text

Tim

"line mode" in scanner

On a related issue, do you want me to implement the "lineMode" feature as I proposed earlier? That is, if a token definition looks like this:

token PCT3$$ '^%%%'

then the scanner will enter line mode whenever it sees the PCT3$$ token on input, and will exit line mode when it sees another matching PCT3$$ token. The "$$" at the end of the token name is what I used to toggle line mode. There are other ways we could use token names to trigger line mode -- my choice of "$$" is only a suggestion.

In line mode, the appearance of <> on the RHS matches an entire LINE of input, and it returns a token with the special token name of $LINE (which cannot conflict with user-defined token names) whose lexeme contains the entire line of input. This only works when the scanner is in line mode.

Here's an example of a PLCC language specification file whose implementation can process a PLCC specification file (but with no semantics):

skip WHITESPACE '\s*'
token PCT3$$ '^%%%'         # toggles line mode
token ANYTHING_ELSE '\S*'
%
<start> ::= <stuff>
<stuff>:NoLineMode  ::= <ANYTHING_ELSE>
<stuff>:LineMode    ::= PCT3$$ <lines> PCT3$$
<lines>             **= <>
%
# no semantics

The RHS element <> behaves as if it were <$LINE>, where $LINE is the special reserved token name for a line (which cannot be a user-defined token name). The scanner cannot return a $LINE "token" unless it's in line mode, which is toggled as described above.

In the above, once the scanner encounters the line mode toggle token (PCT3$$ in the above), it consumes the rest of the line containing the token and starts line mode processing beginning with the next input line. It then continues reading the input, line-by-line, until it encounters another instance of the same line mode toggle token, whereupon it returns to normal token processing.
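
To make the toggling behavior concrete, here is a minimal Python sketch of a line-mode scanner. It is not the code in Std/Scan.java; the function name `scan`, its parameters, and the choice to discard the rest of the toggle line on exit as well as on entry are assumptions made for illustration only.

```python
import re

def scan(lines, toggle_pattern=r'^%%%', token_pattern=r'\S+'):
    """Yield (token_name, lexeme) pairs from an iterable of input lines.

    Hypothetical sketch: the token names mirror the example specification
    above, but the function and its parameters are illustrative only.
    """
    toggle = re.compile(toggle_pattern)
    token = re.compile(token_pattern)
    line_mode = False
    for line in lines:
        line = line.rstrip('\n')
        m = toggle.match(line)
        if m:
            # The toggle token flips the mode; the rest of its line is dropped
            # (assumed here for both entering and leaving line mode).
            line_mode = not line_mode
            yield ('PCT3$$', m.group())
        elif line_mode:
            # In line mode the whole line is returned as a single $LINE token.
            yield ('$LINE', line)
        else:
            # Normal mode: whitespace is skipped, everything else is a token.
            for t in token.finditer(line):
                yield ('ANYTHING_ELSE', t.group())
```

Feeding this the lines `a b`, `%%%`, `x  y`, `%%% trailing`, `c` would produce two ordinary tokens, a toggle, one `$LINE` token holding `x  y` with its spacing intact, a second toggle, and then one ordinary token.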

Making these changes to the PLCC tool set requires modifications to plcc.py, Std/Scan.java, and Std/Token.java. These changes do not alter the *behavior* of the Java implementation produced by the PLCC tool set using the language examples in the Code repository, but of course the resulting Java files Scan.java and Token.java will not look quite the same. All of the other generated Java files -- namely, those generated from the BNF grammar specification -- are unchanged.

Incidentally, the Scan, Parse, and Rep programs don't know anything about 'include' in the files they are reading. So if you wanted to run Scan on, say, the V6 language source files using the above language definition, you would need to do something like this:

(cd ~/PL/Code/V6 ; cat grammar code envVal prim val) |\
java -cp Java Scan

where the Java directory has the PLCC code generated by the above language. The 'cat ...' command will grab all of the code pieces (codpieces?) and present them to the scanner as a single file. Just running Scan on the 'grammar' file will not process the named include files, because the language described above doesn't know how.

Tim

`#include`

Bowing to unrelenting pressure, I have succeeded in implementing an 'include' feature for input files to plcc.py. First, so as not to break any existing code, the use of 'include ...' at the end (normally) of the semantics section stays exactly the same: file names are simply added to the argv array and processed as if they were parameters given on the command line.
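
As a rough illustration of the existing behavior, the legacy 'include ...' line can be pictured as appending file names to a work list that starts out as the command-line arguments. The sketch below is an assumption about the general shape only, not the code in plcc.py; `process` and `read_lines` are hypothetical names.

```python
def process(argv, read_lines):
    """Return all non-include lines, reading files in work-list order.

    Hypothetical sketch: `read_lines(name)` stands in for opening a file
    and returning its lines.
    """
    pending = list(argv)          # starts as the command-line file names
    result = []
    while pending:
        name = pending.pop(0)
        for line in read_lines(name):
            parts = line.split()
            if parts and parts[0] == 'include':
                # Named files are queued as if given on the command line,
                # so they are read after the current file, not inline.
                pending.extend(parts[1:])
            else:
                result.append(line)
    return result
```

The point of the sketch is the contrast with #include: queued files are processed after the current file finishes rather than being spliced in at the point of the directive.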

My proposed 'include' feature allows for lines of the form

#include filename

just like C/C++. When such a line appears anywhere in the input file (after any command-line switches), input switches to the file with the given filename and returns to the previous file once the new file's contents have been read. These #include directives can be nested -- that is, an #include file can itself have an #include line, and everything gets stacked up.

But BEWARE: if a file has an include like this:

#include fff

and if the file 'fff' itself has the same include line

#include fff

the include mechanism could blow up with a stack overflow. I have made it so that you can't have nested includes more than 4 levels deep, which avoids this problem. I can't imagine nesting even this much, but I'm open to suggestions.
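
A depth limit of this kind can be sketched in a few lines of Python. This is not the plcc.py implementation: `expand`, `read_lines`, and `MAX_DEPTH` are hypothetical names, and exactly how the four levels are counted is an assumption.

```python
MAX_DEPTH = 4  # nesting limit described above; the counting is assumed

def expand(name, read_lines, depth=0):
    """Yield the lines of `name`, splicing in '#include f' files inline.

    Hypothetical sketch: `read_lines(name)` stands in for opening a file
    and returning its lines.
    """
    if depth > MAX_DEPTH:
        raise RuntimeError(
            f"#include nested more than {MAX_DEPTH} levels deep: {name}")
    for line in read_lines(name):
        parts = line.split()
        if len(parts) == 2 and parts[0] == '#include':
            # Switch to the named file, then resume the current one.
            yield from expand(parts[1], read_lines, depth + 1)
        else:
            yield line
```

With this shape, a file that includes itself fails with a clear error after a handful of levels instead of recursing until the interpreter's stack is exhausted.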

The tricky part about this is that there might possibly be a situation where code in the semantics section between the %%% ... %%% markers has #include lines. This could happen, for example, if the target language were C/C++ -- an unlikely situation, but oddly possible given the insatiable desire of both of you to target any implementation language that is Turing complete. In order to side-step this possibility, I have TURNED OFF the processing of #include directives for code between %%% ... %%% markers. I think this makes sense, and basically treats this code as being entirely language independent (except for lines themselves starting with %%% -- ouch!).
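
The suppression rule can be pictured as a single flag that flips on every line starting with %%%. The Python sketch below is illustrative only; `expand_outside_code` and the `include` callback are hypothetical names, not functions from plcc.py.

```python
def expand_outside_code(lines, include):
    """Expand '#include f' lines except between %%% ... %%% markers.

    Hypothetical sketch: `include(name)` stands in for whatever returns
    the lines of the named file.
    """
    in_code = False
    for line in lines:
        parts = line.split()
        if line.startswith('%%%'):
            # A %%% line opens or closes a block of target-language code.
            in_code = not in_code
            yield line
        elif not in_code and len(parts) == 2 and parts[0] == '#include':
            yield from include(parts[1])
        else:
            # Inside a code block, '#include' lines pass through untouched.
            yield line
```

Under this scheme a C-style `#include <stdio.h>` inside a %%% block reaches the generated code unchanged, while the same directive outside the block is expanded.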

Tim

Remove PP option from plcc.py

This option allowed a specified preprocessor command to be run on the generated code. The implementation relied on Python's now-deprecated pipes library. Unaware of any uses of the PP option, we have chosen to simplify plcc.py by removing the PP option along with the deprecated dependency.

BREAKING CHANGE:

Removal of the PP option will break code that relies on this option. There are no alternatives to this option.

Co-authored-by: Timothy Fossum [email protected]

Closes #49
Closes #55
Closes #58

Thank you for your help! Please read the following before contributing to
this project.

Legal

This project and its contents are licensed under GPL-3.0 or greater.
See the LICENSE file at the root of this project. Your contributions must
therefore also be licensed under GPL-3.0 or greater.

Also by contributing to this project, you are signing off on the
Developer Certificate of Origin (DCO)
asserting that your contributions may legally be licensed under GPL-3.0
or greater. If you do not want to sign off on the DCO, close or delete
this PR.


github-actions · 2023-10-11T15:34:54Z

🎉 This PR is included in version 4.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

StoneyJackson merged commit 1c4b82f into main Oct 11, 2023
2 checks passed

StoneyJackson deleted the 4.0.0 branch October 11, 2023 13:02

github-actions bot added the released label Oct 11, 2023
