feat!: PP, modal scanner, #include #59
Merged
In "The PLCC Tool Set" section of the plcc.pithon.net repository, I have uploaded the latest versions of plcc.py (in the src directory) and Scan.java and Token.pattern (in the src/Std directory). These incorporate the changes we discussed:

1. Dropping the PP preprocessor option in plcc.py
2. Implementing scanner "line mode" toggling using '^^...' tokens
3. Implementing #include directives anywhere in the specification text

Tim

----------------------
"line mode" in scanner
----------------------

On a related issue, do you want me to implement the "lineMode" feature as I proposed earlier? That is, if a token definition looks like this:

    token PCT3$$ '^%%%'

then the scanner will enter line mode whenever it sees the PCT3$$ token on input, and will exit line mode when it sees another matching PCT3$$ token. The "$$" at the end of the token name is what I used to toggle line mode. There are other ways we could use token names to trigger line mode -- my choice of "$$" is only a suggestion.

In line mode, the appearance of <> on the RHS matches an entire LINE of input, and it returns a token with the special token name of $LINE (which cannot conflict with user-defined token names) whose lexeme contains the entire line of input. This only works when the scanner is in line mode.

Here's an example of a PLCC language specification file whose implementation can process a PLCC specification file (but with no semantics):

    skip WHITESPACE '\s*'
    token PCT3$$ '^%%%' # toggles line mode
    token ANYTHING_ELSE '\S*'
    %
    <start> ::= <stuff>
    <stuff>:NoLineMode ::= <ANYTHING_ELSE>
    <stuff>:LineMode ::= PCT3$$ <lines> PCT3$$
    <lines> **= <>
    %
    # no semantics

The RHS element <> behaves as if it were <$LINE>, where $LINE is the special reserved token name for a line (which cannot be a user-defined token name). The scanner cannot return a $LINE "token" unless it's in line mode, which is toggled as described above.
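
To make the toggling concrete, here is a minimal Python sketch of the line-mode idea. It is not the code in Std/Scan.java: the function name `scan`, its parameters, and the use of `\S+` (rather than `\S*`, to avoid empty matches) are assumptions made only for illustration. It also assumes the toggle token is recognized only at the start of a line, and that the rest of the toggle line is discarded.

```python
import re

LINE = "$LINE"  # reserved token name used for whole-line tokens


def scan(lines, toggle=("PCT3$$", r"%%%"), token=("ANYTHING_ELSE", r"\S+")):
    """Yield (token_name, lexeme) pairs, switching in and out of line mode.

    Hypothetical helper: `toggle` and `token` are (name, regex) pairs standing
    in for the token definitions of a specification file.
    """
    toggle_name, toggle_re = toggle[0], re.compile(toggle[1])
    token_name, token_re = token[0], re.compile(token[1])
    line_mode = False
    for raw in lines:
        line = raw.rstrip("\n")
        m = toggle_re.match(line)  # match() anchors at the start of the line
        if m:
            # The toggle token flips the mode; the rest of its line is consumed.
            line_mode = not line_mode
            yield (toggle_name, m.group())
            continue
        if line_mode:
            # In line mode the whole line becomes a single $LINE token.
            yield (LINE, line)
        else:
            # In normal mode, split the line into ordinary tokens.
            for t in token_re.finditer(line):
                yield (token_name, t.group())
```

Feeding this generator a few lines that contain a pair of `%%%` markers yields ordinary tokens outside the markers and one `$LINE` token per line between them, blank lines included.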

In the specification above, once the scanner encounters the line mode toggle token (PCT3$$ in that example), it consumes the rest of the line containing the token and starts line mode processing beginning with the next input line. It then continues reading the input, line-by-line, until it encounters another instance of the same line mode toggle token, whereupon it returns to normal token processing.

<IMPORTANT>
Making these changes to the PLCC tool set requires modifications to plcc.py, Std/Scan.java, and Std/Token.java. These changes do not alter the *behavior* of the Java implementation produced by the PLCC tool set using the language examples in the Code repository, but of course the resulting Java files Scan.java and Token.java will not look quite the same. All of the other generated Java files -- namely, those generated from the BNF grammar specification -- are unchanged.
</IMPORTANT>

Incidentally, the Scan, Parse, and Rep programs don't know anything about 'include' in the files they are reading. So if you wanted to run Scan on, say, the V6 language source files using the above language definition, you would need to do something like this:

    (cd ~/PL/Code/V6 ; cat grammar code envVal prim val) |\
      java -cp Java Scan

where the Java directory has the PLCC code generated by the above language. The 'cat ...' command will grab all of the code pieces (codpieces?) and present them to the scanner as a single file. Just running Scan on the 'grammar' file will not process the named include files, because the language described above doesn't know how.

Tim

----------
`#include`
----------

Bowing to unrelenting pressure, I have succeeded in implementing an 'include' feature for input files to plcc.py. First, so as not to break any existing code, the use of 'include ...' at the end (normally) of the semantics section stays exactly the same: file names are simply added to the argv array and processed as if they were parameters given on the command line.

My proposed 'include' feature allows for lines of the form

    #include filename

just like C/C++. When such a line appears anywhere in the input file (after any command-line switches), input switches to the file with the given filename and returns to the previous file once the new file's contents have been read. These #include directives can be nested -- that is, an #include file can itself have an #include part, and everything gets stacked up.

But BEWARE: if a file has an include like this:

    #include fff

and if the file 'fff' itself has the same include line

    #include fff

the include mechanism could blow up with a stack overflow. I have made it so that you can't have nested includes more than 4 levels deep, which avoids this problem. I can't imagine nesting even this much, but I'm open to suggestions.

The tricky part about this is that there might possibly be a situation where code in the semantics section between the %%% ... %%% markers has `#include` lines. This could happen, for example, if the target language were C/C++ -- an unlikely situation, but oddly possible given the insatiable desire of both of you to target any implementation language that is Turing complete. In order to side-step this possibility, I have TURNED OFF the processing of #include directives for code between %%% ... %%% markers. I think this makes sense, and basically treats this code as being entirely language independent (except for lines themselves starting with %%% -- ouch!).

Tim

-----------------------------
Remove PP option from plcc.py
-----------------------------

This option allowed a specified preprocessor command to be run on the generated code. The implementation relied on Python's now deprecated `pipes` library. Unaware of any uses of the PP option, we have chosen to simplify plcc.py by removing the PP option and also the deprecated dependency.

BREAKING CHANGE: Removal of the PP option will break code that relies on this option. There are no alternatives to this option.
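
Returning to the `#include` rules described earlier in this message (nesting, the 4-level limit, and no processing between %%% markers), the following Python sketch shows one way they could fit together. It is not the plcc.py implementation: `expand`, `IncludeError`, `MAX_DEPTH`, and the in-memory `files` mapping (file name to list of lines, standing in for files on disk) are all hypothetical, and the exact point at which the real tool counts nesting levels is an assumption.

```python
import re

MAX_DEPTH = 4  # assumed reading of "no more than 4 levels deep"
INCLUDE_RE = re.compile(r"^#include\s+(\S+)\s*$")


class IncludeError(Exception):
    """Raised when #include nesting goes deeper than MAX_DEPTH."""


def expand(name, files, depth=0, state=None):
    """Yield the lines of files[name] with #include directives expanded.

    `files` maps a file name to its list of lines (a stand-in for reading
    from disk). `state` tracks whether we are between %%% markers, where
    #include lines are passed through untouched.
    """
    if state is None:
        state = {"in_code": False}
    for line in files[name]:
        if line.startswith("%%%"):
            # A %%% line opens or closes a semantics code block.
            state["in_code"] = not state["in_code"]
            yield line
            continue
        m = None if state["in_code"] else INCLUDE_RE.match(line)
        if m is None:
            yield line
            continue
        if depth >= MAX_DEPTH:
            raise IncludeError(f"#include nested too deeply at {name!r}")
        # Switch to the included file, then resume the current one.
        yield from expand(m.group(1), files, depth + 1, state)
```

With this shape, a file that includes itself fails with `IncludeError` after a few levels instead of recursing until the stack overflows, and an `#include <stdio.h>` line inside a %%% ... %%% block is emitted unchanged.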

---

Co-authored-by: Timothy Fossum <[email protected]>

Closes #49
Closes #55
Closes #58
🎉 This PR is included in version 4.0.0 🎉

The release is available on GitHub release.

Your semantic-release bot 📦🚀