Defining things like what an identifier is #100

ethindp · 2024-02-01T19:59:38Z

I can't seem to find any documentation on this, so I thought I'd try here.

Many grammar specifications use rules like NAME or NUMBER. I can see these listed in the Tokens file, but how do I define them myself? Is it safe to write:

identifier: characters_for_an_identifier

Or is there a better way of doing this? I ask because different languages define an "identifier" differently, so I was curious how this is handled and where these rules/tokens are actually defined.

lysnikolaou · 2024-02-02T13:20:29Z

Tokens like NAME or NUMBER come from the tokenizer, and the parser has no control over them. To change what constitutes an identifier, the tokenizer would have to be changed to handle NAME tokens differently.
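To make that concrete, here is a minimal sketch using only the standard library's `tokenize` module. It shows that NAME and NUMBER are already decided by the time a parser sees the input. The helper name `token_kinds` is made up for this illustration and is not part of pegen.

```python
import io
import token
import tokenize
from typing import List, Tuple


def token_kinds(source: str) -> List[Tuple[str, str]]:
    """Return (token name, text) pairs as produced by Python's own tokenizer."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        # Keep only the token kinds relevant here; skip NEWLINE, ENDMARKER, etc.
        if tok.type in (token.NAME, token.NUMBER, token.OP)
    ]


# The tokenizer, not the grammar, decides that "price2" is a NAME and "42" a NUMBER.
print(token_kinds("total = price2 + 42\n"))
```

A grammar rule that mentions NAME only ever matches against tokens classified this way, which is why redefining an identifier means changing the tokenizer rather than the grammar.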

pegen uses the Python tokenizer by default, which has a strict definition of what an identifier is, but you could pass a different tokenizer when instantiating a parser object, if you really want to change that.
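As a rough sketch of what "a different tokenizer" could look like, the generator below yields standard `tokenize.TokenInfo` tuples but applies its own lexical rules. The rules are invented for illustration: identifiers are Unicode-aware and may end in `?` or `!`. The names `custom_tokens` and `TOKEN_RE` are hypothetical. The closing comment about wrapping it in `pegen.tokenizer.Tokenizer` assumes that class accepts any iterator of `TokenInfo`, as it does for `tokenize.generate_tokens`; check the pegen source before relying on it.

```python
import re
import token
import tokenize
from typing import Iterator

# Hypothetical lexical rules, for illustration only.
# NAME: a Unicode letter or underscore, then word characters, then optional ? or !
TOKEN_RE = re.compile(
    r"""
    (?P<NAME>[^\W\d]\w*[?!]?)
  | (?P<NUMBER>\d+)
  | (?P<OP>[-+*/=()])
  | (?P<WS>[ \t]+)
    """,
    re.VERBOSE,
)


def custom_tokens(source: str) -> Iterator[tokenize.TokenInfo]:
    """Yield TokenInfo tuples for `source` using the custom rules above."""
    lineno = 0
    for lineno, line in enumerate(source.splitlines(keepends=True), start=1):
        body = line.rstrip("\r\n")
        if not body.strip():
            continue  # ignore blank lines in this sketch
        pos = 0
        while pos < len(body):
            match = TOKEN_RE.match(body, pos)
            if match is None:
                raise SyntaxError(
                    f"unexpected character {body[pos]!r} at line {lineno}, column {pos}"
                )
            kind = match.lastgroup
            if kind != "WS":  # whitespace separates tokens but is not emitted
                yield tokenize.TokenInfo(
                    getattr(token, kind),
                    match.group(),
                    (lineno, pos),
                    (lineno, match.end()),
                    line,
                )
            pos = match.end()
        # Close every non-blank line with a NEWLINE token.
        yield tokenize.TokenInfo(
            token.NEWLINE, "\n", (lineno, len(body)), (lineno, len(body) + 1), line
        )
    yield tokenize.TokenInfo(token.ENDMARKER, "", (lineno + 1, 0), (lineno + 1, 0), "")


for tok in custom_tokens("größe? = 42\n"):
    print(token.tok_name[tok.type], repr(tok.string))

# Assumption, not executed here: with pegen installed, the generator could be
# wrapped as pegen.tokenizer.Tokenizer(custom_tokens(src)) and that object
# passed to the generated parser class in place of the default tokenizer.
```

Reusing the standard token types (NAME, NUMBER, OP, NEWLINE, ENDMARKER) keeps the grammar file unchanged; only the decision about which characters form each token moves into the custom generator.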

ethindp · 2024-02-02T15:17:43Z

@lysnikolaou I might need a different tokenizer, because the language I'm trying to parse has some unique lexical rules for strings and the like. The language is fully Unicode-aware, so I have to deal with that too. Are there any examples of overriding or replacing the tokenizer, or should I just look at the default implementation?
