feat!: PP, modal scanner, #include #59

Merged Oct 11, 2023 (1 commit)
@StoneyJackson (Member) commented Oct 11, 2023


Thank you for your help! Please read the following before contributing to
this project.

Legal

This project and its contents are licensed under GPL-3.0 or greater.
See the LICENSE file at the root of this project. Your contributions must
therefore also be licensed under GPL-3.0 or greater.

Also by contributing to this project, you are signing off on the
Developer Certificate of Origin (DCO)
asserting that your contributions may legally be licensed under GPL-3.0
or greater. If you do not want to sign off on the DCO, close or delete
this PR.

In "The PLCC Tool Set" section of the plcc.pithon.net repository, I
have uploaded the latest versions of plcc.py (in the src directory) and
Scan.java and Token.pattern (in the src/Std directory). These
incorporate the changes we discussed:

1.  Dropping the PP preprocessor option in plcc.py
2.  Implementing scanner "line mode" toggling using '^^...' tokens
3.  Implementing #include directives anywhere in the specification text

Tim

----------------------
"line mode" in scanner
----------------------

On a related issue, do you want me to implement the "lineMode" feature
as I proposed earlier? That is, if a token definition looks like this:

    token PCT3$$ '^%%%'

then the scanner will enter line mode whenever it sees the PCT3$$ token
on input, and will exit line mode when it sees another matching PCT3$$
token. The "$$" at the end of the token name is what I used to toggle
line mode. There are other ways we could use token names to trigger
line mode -- my choice of "$$" is only a suggestion.
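Here is a minimal Python sketch of the naming convention: it parses a token definition line and flags it as a line-mode toggle when the name ends in `$$`. The `TokenDef` type, the `parse_token_def` function, and the assumed name syntax (uppercase letters, digits, and underscores, optionally followed by `$$`) are hypothetical illustrations, not plcc.py's actual code.

```python
import re
from typing import NamedTuple


class TokenDef(NamedTuple):
    name: str
    pattern: str
    toggles_line_mode: bool


# Hypothetical syntax: token NAME 'regex'  [# comment]
# NAME is assumed to be uppercase letters/digits/underscores, optionally ending in $$.
_TOKEN_LINE = re.compile(r"token\s+([A-Z][A-Z0-9_]*(?:\$\$)?)\s+'([^']*)'\s*(?:#.*)?$")


def parse_token_def(line: str) -> TokenDef:
    """Parse one token definition; a trailing $$ on the name marks a line-mode toggle."""
    m = _TOKEN_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a token definition: {line!r}")
    name, pattern = m.group(1), m.group(2)
    return TokenDef(name, pattern, name.endswith("$$"))


print(parse_token_def("token PCT3$$ '^%%%'         # toggles line mode"))
# TokenDef(name='PCT3$$', pattern='^%%%', toggles_line_mode=True)
```

Under this assumed syntax, a name such as `$LINE` is rejected, which is one way to keep a reserved name from colliding with user-defined ones.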

In line mode, the appearance of <> on the RHS matches an entire LINE of
input, and it returns a token with the special token name of $LINE
(which cannot conflict with user-defined token names) whose lexeme
contains the entire line of input. This only works when the scanner is
in line mode.

Here's an example of a PLCC language specification file whose
implementation can process a PLCC specification file (but with no
semantics):

    skip WHITESPACE '\s*'
    token PCT3$$ '^%%%'         # toggles line mode
    token ANYTHING_ELSE '\S*'
    %
    <start> ::= <stuff>
    <stuff>:NoLineMode  ::= <ANYTHING_ELSE>
    <stuff>:LineMode    ::= PCT3$$ <lines> PCT3$$
    <lines>             **= <>
    %
    # no semantics

The RHS element <> behaves as if it were <$LINE>, where $LINE is the
special reserved token name for a line (which cannot be a user-defined
token name). The scanner cannot return a $LINE "token" unless it's in
line mode, which is toggled as described above.

In the above, once the scanner encounters the line mode toggle token
(PCT3$$ in the above), it consumes the rest of the line containing the
token and starts line mode processing beginning with the next input
line. It then continues reading the input, line-by-line, until it
encounters another instance of the same line mode toggle token,
whereupon it returns to normal token processing.
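The mode switch described above can be sketched as a small Python generator. It assumes longest-match scanning, with the earliest-defined rule winning ties. It also assumes the closing toggle is only looked for at the start of a line, and that any text after the closing toggle is scanned normally. None of these details are stated above, and `scan`, `Token`, and the `(kind, name, regex)` rule format are hypothetical names, not the `Scan.java` API.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    name: str
    lexeme: str


LINE = "$LINE"  # reserved name, assumed never to collide with user token names


def scan(lines: Iterable[str], rules: list[tuple[str, str, str]]) -> Iterator[Token]:
    """Modal scanner sketch. rules are (kind, name, regex) in definition order,
    where kind is "skip" or "token"; a token name ending in $$ toggles line mode."""
    compiled = [(kind, name, re.compile(rx)) for kind, name, rx in rules]
    toggle = None  # (name, regex) of the toggle that started line mode, or None
    for line in lines:
        line = line.rstrip("\n")
        pos = 0
        if toggle is not None:
            name, rx = toggle
            m = rx.match(line)  # assumption: closing toggle only at the start of a line
            if m is None:
                yield Token(LINE, line)  # line mode: the whole line is one token
                continue
            yield Token(name, m.group())  # same toggle again: back to normal mode
            toggle = None
            pos = m.end()  # assumption: the rest of this line is scanned normally
        while pos < len(line):
            # Longest non-empty match wins; on a tie the earlier rule wins.
            # With pos > 0, '^' does not match, so '^%%%' only fires at column 0.
            best = None
            for kind, name, rx in compiled:
                m = rx.match(line, pos)
                if m and m.end() > pos and (best is None or m.end() > best[2].end()):
                    best = (kind, name, m, rx)
            if best is None:
                raise ValueError(f"no rule matches {line[pos:]!r}")
            kind, name, m, rx = best
            pos = m.end()
            if kind == "token":
                yield Token(name, m.group())
                if name.endswith("$$"):
                    toggle = (name, rx)
                    break  # rest of the line is consumed; line mode starts next line


rules = [
    ("skip", "WHITESPACE", r"\s*"),
    ("token", "PCT3$$", r"^%%%"),
    ("token", "ANYTHING_ELSE", r"\S*"),
]
for tok in scan(["foo bar", "%%%", "  class X {", "%%%", "baz"], rules):
    print(tok.name, repr(tok.lexeme))
```

With the three rules from the example specification, those five input lines come out as ANYTHING_ELSE `foo`, ANYTHING_ELSE `bar`, PCT3$$, a `$LINE` token holding `  class X {`, PCT3$$, and ANYTHING_ELSE `baz`.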

<IMPORTANT>
Making these changes to the PLCC tool set requires modifications to
plcc.py, Std/Scan.java, and Std/Token.java. These changes do not alter
the *behavior* of the Java implementation produced by the PLCC tool set
using the language examples in the Code repository, but of course the
resulting Java files Scan.java and Token.java will not look quite the
same. All of the other generated Java files -- namely, those generated
from the BNF grammar specification -- are unchanged.
</IMPORTANT>

Incidentally, the Scan, Parse, and Rep programs don't know anything
about 'include' in the files they are reading. So if you wanted to run
Scan on, say, the V6 language source files using the above language
definition, you would need to do something like this:

    (cd ~/PL/Code/V6 ; cat grammar code envVal prim val) |\
    java -cp Java Scan

where the Java directory has the PLCC code generated by the above
language. The 'cat ...' command will grab all of the code pieces
(codpieces?) and present them to the scanner as a single file. Just
running Scan on the 'grammar' file will not process the named include
files, because the language described above doesn't know how.

Tim

----------
`#include`
----------

Bowing to unrelenting pressure, I have succeeded in implementing an
'include' feature for input files to plcc.py. First, so as not to break
any existing code, the use of 'include ...' at the end (normally) of
the semantics section stays exactly the same: file names are simply
added to the argv array and processed as if they were parameters given
on the command line.
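The legacy mechanism just described behaves like a work queue. An `include` line adds its file names to the end of the list of files still to be read; it does not splice them in place. This is a rough Python sketch. The `read_all` function and the `read_lines` callback (used so the example runs without real files) are hypothetical, and the sketch assumes one `include` line may name several files.

```python
from collections import deque
from typing import Callable


def read_all(argv: list[str], read_lines: Callable[[str], list[str]]) -> list[str]:
    """Read files in command-line order; an 'include f1 f2 ...' line queues
    f1, f2, ... after everything already on the command line."""
    queue = deque(argv)
    out: list[str] = []
    while queue:
        name = queue.popleft()
        for line in read_lines(name):
            parts = line.split()
            if parts and parts[0] == "include":
                queue.extend(parts[1:])  # deferred, as if given on the command line
            else:
                out.append(line)
    return out


files = {"grammar": ["<start> ::= <stuff>", "include prim val"],
         "prim": ["prim code"], "val": ["val code"]}
print(read_all(["grammar"], files.__getitem__))
# ['<start> ::= <stuff>', 'prim code', 'val code']
```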

My proposed 'include' feature allows for lines of the form

    #include filename

just like C/C++. When such a line appears anywhere in the input file
(after any command-line switches), input switches to the file with
the given filename and returns to the previous file once the included
file's contents have been read. These #include directives can be nested --
that is, an included file can itself contain #include lines, and
everything gets stacked up.
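The "stacked up" behavior amounts to a stack of open files. This is a minimal Python sketch that assumes the directive is exactly `#include` followed by one file name. `expand_includes` and the `read_lines` callback are hypothetical names, and the sketch leaves out any nesting limit.

```python
from typing import Callable, Iterator


def expand_includes(name: str, read_lines: Callable[[str], list[str]]) -> Iterator[str]:
    """Yield the lines of `name`, splicing in each '#include file' in place."""
    stack = [iter(read_lines(name))]  # one iterator per open file
    while stack:
        line = next(stack[-1], None)
        if line is None:
            stack.pop()  # this file is done; resume the file that included it
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] == "#include":
            stack.append(iter(read_lines(parts[1])))  # switch to the included file
        else:
            yield line


files = {"main": ["a", "#include x", "d"], "x": ["b", "#include y"], "y": ["c"]}
print(list(expand_includes("main", files.__getitem__)))  # ['a', 'b', 'c', 'd']
```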

But BEWARE: if a file has an include like this:

    #include fff

and if the file 'fff' itself has the same include line

    #include fff

the include mechanism could blow up with a stack overflow. I have made
it so that you can't have nested includes more than 4 levels deep,
which avoids this problem. I can't imagine nesting even this much, but
I'm open to suggestions.
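A depth guard like this can be sketched as a counter passed down through recursive expansion. This Python sketch treats the top-level file as depth 0 and allows up to four nested levels. plcc.py may count differently, and `expand_includes`, `IncludeDepthError`, and `MAX_INCLUDE_DEPTH` are hypothetical names.

```python
from typing import Callable, Iterator

MAX_INCLUDE_DEPTH = 4  # nesting limit described above


class IncludeDepthError(Exception):
    """Raised when #include directives nest deeper than MAX_INCLUDE_DEPTH."""


def expand_includes(name: str, read_lines: Callable[[str], list[str]],
                    depth: int = 0) -> Iterator[str]:
    if depth > MAX_INCLUDE_DEPTH:
        raise IncludeDepthError(f"{name!r}: #include nested more than "
                                f"{MAX_INCLUDE_DEPTH} levels deep")
    for line in read_lines(name):
        parts = line.split()
        if len(parts) == 2 and parts[0] == "#include":
            yield from expand_includes(parts[1], read_lines, depth + 1)
        else:
            yield line


# A file that includes itself fails after a few levels instead of recursing forever.
files = {"fff": ["#include fff"]}
try:
    list(expand_includes("fff", files.__getitem__))
except IncludeDepthError as e:
    print("error:", e)
```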

The tricky part is that there might be a situation
where code in the semantics section between the %%% ... %%% markers has
`#include` lines. This could happen, for example, if the target language
were C/C++ -- an unlikely situation, but oddly possible given the
insatiable desire of both of you to target any implementation language
that is Turing complete. In order to side-step this possibility, I have
TURNED OFF the processing of #include directives for code between %%%
... %%% markers. I think this makes sense, and basically treats this
code as being entirely language independent (except for lines
themselves starting with %%% -- ouch!).
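Turning off directive processing between the markers only needs a flag that flips on each marker line. This is a Python sketch under two assumptions: a marker is any line starting with `%%%`, and each file tracks its own flag. `expand_includes` and `read_lines` are hypothetical names, and the nesting limit is left out for brevity.

```python
from typing import Callable, Iterator


def expand_includes(name: str, read_lines: Callable[[str], list[str]]) -> Iterator[str]:
    """Splice in '#include file' lines, except inside %%% ... %%% code blocks."""
    in_code = False  # True between a pair of %%% marker lines in this file
    for line in read_lines(name):
        if line.startswith("%%%"):
            in_code = not in_code  # marker lines toggle the flag and pass through
            yield line
            continue
        parts = line.split()
        if not in_code and len(parts) == 2 and parts[0] == "#include":
            yield from expand_includes(parts[1], read_lines)
        else:
            yield line  # ordinary line, or an #include left for the target language


files = {
    "spec": ["#include tokens", "%%%", "#include <stdio.h>", "%%%", "tail"],
    "tokens": ["token A 'a'"],
}
print(list(expand_includes("spec", files.__getitem__)))
```

Here the first `#include` is expanded, and the C-style `#include <stdio.h>` inside the `%%%` block is passed through untouched.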

Tim

-----------------------------
Remove PP option from plcc.py
-----------------------------

This option allowed a specified preprocessor command to be run on the
generated code. The implementation relied on Python's now deprecated
`pipes` library. Unaware of any uses of the PP option, we have chosen
to simplify plcc.py by removing the PP option and also the deprecated
dependency.

BREAKING CHANGE:

Removal of the PP option will break code that relies on this option.
There are no alternatives to this option.

---

Co-authored-by: Timothy Fossum <[email protected]>

Closes #49
Closes #55
Closes #58

@StoneyJackson merged commit 1c4b82f into main Oct 11, 2023 (2 checks passed)

@StoneyJackson deleted the 4.0.0 branch October 11, 2023 13:02

@github-actions

🎉 This PR is included in version 4.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Successfully merging this pull request may close these issues: "Parsing code blocks", "Add include to lex and bnf sections?", and "Python 3.11 deprecates pipes".