Correct 2 x grammar rules for compilation unit name in #line #1120

Open: wants to merge 1 commit into draft-v8
4 changes: 2 additions & 2 deletions standard/lexical-structure.md
@@ -1488,12 +1488,12 @@ fragment PP_Line_Indicator
;

fragment PP_Compilation_Unit_Name
-    : '"' PP_Compilation_Unit_Name_Character+ '"'
+    : '"' PP_Compilation_Unit_Name_Character* '"'
Nigel-Ecma (Contributor), May 25, 2024

We should check that this is by design before making this change. Regardless of the answer, there is probably some work to do:

  • §14.2 Compilation units defines a compilation unit, but the definition does not say that compilation units have names… or are files… However, the (non-normative) example uses two files, A.cs and B.cs, and refers to them as two compilation units.
  • §22.5.6.3 The CallerFilePath attribute, which provides the (implementation-dependent) file path, states “The file path may be affected by #line directives (§6.5.8).”
  • Here in §6.5.8, #line allows setting the “compilation unit name”.
  • So:
    • §6.5.8 states that compilation units have names, which §14.2 omits; and
    • §22.5.6.3 tells us that the name is the file path, but leaves what that path is as implementation-dependent.

What already exists isn’t overly clear, and this change seeks to allow the compilation unit name to be the empty string, which is probably not a valid implementation-dependent path on any implementation… So if this observed compiler behaviour is by design, then it surely needs to have a defined meaning in the Standard.

If this change is to be made, and even if not, this all needs to be tidied up, either in this PR or by spinning it all off into a new one.


I might ask what the intended use of an empty file path/compilation unit name is, but I might know: it was requested by the NSA so that the file names in NSA-distributed software are not leaked and so do not endanger National Security! I’m only partially joking here, but that’s a story for another time ;-)

Contributor Author

@BillWagner told me that allowing an empty string was indeed a conscious decision, so I propose keeping that edit.

Contributor Author

@Nigel-Ecma

§6.1 Programs states:

A C# program consists of one or more source files, known formally as compilation units (§14.2). Although a compilation unit might have a one-to-one correspondence with a file in a file system, such correspondence is not required.

I propose appending the following to this:

… As such, the accepted spelling of a compilation unit name, and its mapping, if any, to a filename is outside the scope of this specification.

I'm deliberately avoiding using any of the following terms:

  • behavior, implementation-defined – unspecified behavior where each implementation documents how the choice is made
  • behavior, undefined – behavior, upon use of a non-portable or erroneous construct or of erroneous data, for which this specification imposes no requirements
  • behavior, unspecified – behavior where this specification provides two or more possibilities and imposes no further requirements on which is chosen in any instance

Contributor

I'd be fine with that extra text. Rex, do you think it's worth adding that to this PR, so we can merge it all in one go?

;
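To make the effect of changing + to * concrete, here is a minimal sketch that models the old and new PP_Compilation_Unit_Name fragments as Python regular expressions. Python's re module is only a stand-in for the ANTLR lexer, the names NAME_CHAR, OLD_NAME, NEW_NAME and matches are hypothetical, and the sketch covers just the quoted name, not the rest of the #line directive.

```python
import re

# Hypothetical model (not ANTLR): the corrected PP_Compilation_Unit_Name_Character
# as a regex class, i.e. any character except CR, LF, NEL, LINE SEPARATOR,
# PARAGRAPH SEPARATOR and the double quote.
NAME_CHAR = r'[^\u000D\u000A\u0085\u2028\u2029"]'

# Before the PR: one or more name characters between the quotes ('+').
OLD_NAME = re.compile('"' + NAME_CHAR + '+"')
# After the PR: zero or more name characters ('*'), so "" is accepted.
NEW_NAME = re.compile('"' + NAME_CHAR + '*"')

def matches(pattern, text):
    """Return True if the whole of text matches the modelled fragment."""
    return pattern.fullmatch(text) is not None

for name in ['""', '"A.cs"']:
    print(name, matches(OLD_NAME, name), matches(NEW_NAME, name))
```

In this model the empty name "" is rejected by the old rule and accepted by the new one, while an ordinary name such as "A.cs" matches under both.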

fragment PP_Compilation_Unit_Name_Character
// Any Input_Character except "
-    : ~('\u000D' | '\u000A' | '\u0085' | '\u2028' | '\u2029' | '#')
+    : ~('\u000D' | '\u000A' | '\u0085' | '\u2028' | '\u2029' | '"')
Contributor

The # was a typo, so it should be fixed…

However, §22.5.6.3 defines the format of the compilation unit name/file path as “implementation-dependent”. So this section might need a semantic rule saying either that this arbitrary string must conform to the same implementation-dependent rules as in §22.5.6.3, or that it does not need to (i.e., it need not be valid as a file path).

Contributor

Hmmm... I'm not sure. It feels like it would be okay to allow values that aren't valid filenames when specifying the compilation unit name directly in code, even if the CallerFilePathAttribute could never automatically generate such a name. (Indeed, I can see some cases where that would even be useful!)

I think this speaks in favor of having a semantic rule saying "that it does not need to".

Anecdotally, I observe that Roslyn is okay with this:

#line 100 ":invalid:"

and even:

#line 100 ".."

(Interestingly, for the latter, it reports any subsequent error as belonging to the parent directory of the directory containing the file; it doesn't report it as ".." verbatim...)

jskeet (Contributor), Nov 20, 2024

I've only just tried the former (":invalid:") followed by code containing errors, and Roslyn crashes:

error MSB6006: "csc.dll" exited with code 1.

(But if there are no warnings/errors that need reporting, it's fine...)

;
```
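The character-class fix in PP_Compilation_Unit_Name_Character can be illustrated with a small sketch. It is a hypothetical Python model, with re standing in for the ANTLR lexer; OLD_CHAR, NEW_CHAR and name_pattern are illustrative names, and both variants use the * repetition so that only the excluded character differs. It shows that excluding '#' (the typo) rejects names containing '#' while letting a '"' be swallowed into the name, and that excluding '"' does the reverse.

```python
import re

# Hypothetical regex versions of PP_Compilation_Unit_Name_Character. Both
# exclude CR, LF, NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR; they differ
# only in the last excluded character.
OLD_CHAR = r'[^\u000D\u000A\u0085\u2028\u2029#]'  # typo: excludes '#'
NEW_CHAR = r'[^\u000D\u000A\u0085\u2028\u2029"]'  # fix: excludes '"'

def name_pattern(char_class):
    """Build a quoted-name regex using the given character class and '*'."""
    return re.compile('"' + char_class + '*"')

OLD = name_pattern(OLD_CHAR)
NEW = name_pattern(NEW_CHAR)

for s in ['"C#.cs"', '"a"b"']:
    print(s, OLD.fullmatch(s) is not None, NEW.fullmatch(s) is not None)
```

In this model, "C#.cs" matches only with the corrected class, whereas "a"b" matches only with the old one, because the old class treats the inner quote as an ordinary name character.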
