Defining things like what an identifier is #100

Open
ethindp opened this issue Feb 1, 2024 · 2 comments

Comments

@ethindp

ethindp commented Feb 1, 2024

I can't seem to find any documentation on this, so I thought I'd try here.

In many grammar specifications, rules like NAME or NUMBER are used. I can see these defined in the file Tokens, but how do I define these? Is it safe to do:

identifier: characters_for_an_identifier

Or are there better ways of doing this? I ask because different languages define an "identifier" differently, so I'd like to know how this is handled and where these rules/tokens are actually defined.

@lysnikolaou
Member

Tokens like NAME or NUMBER come from the tokenizer and the parser has no control over them. In order to change what constitutes an identifier, the tokenizer would have to be changed to handle NAME tokens differently.
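To illustrate the point that NAME and NUMBER are produced before the parser ever sees the input, here is a minimal sketch using only Python's standard `tokenize` module. The helper name `token_kinds` is made up for this example and is not part of pegen.

```python
import io
import tokenize


def token_kinds(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# The tokenizer, not the grammar, decides that "total" is a NAME
# and "42" is a NUMBER.
pairs = token_kinds("total = 42\n")
for kind, text in pairs:
    print(f"{kind:10} {text!r}")
```

A grammar rule that mentions NAME only ever matches whatever the tokenizer has already labelled that way.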

pegen uses the Python tokenizer by default, which has a strict definition of what an identifier is, but you could pass a different tokenizer when instantiating a parser object if you really want to change that.
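As a rough sketch of what "a different tokenizer" could look like, the generator below produces standard `tokenize.TokenInfo` tuples using its own, deliberately different, identifier rule (hyphens are allowed inside names). The regular expression, the function name `custom_tokens`, and the lexical rules are all hypothetical and chosen only for illustration. How such a token stream is handed to pegen is an assumption here: the idea is that pegen's tokenizer wrapper consumes an iterator of `TokenInfo` objects, but the exact class and constructor signature should be checked against the pegen source.

```python
import re
import tokenize
from typing import Iterator

# Hypothetical lexical rules, for illustration only:
#   NUMBER: a run of digits
#   NAME:   a letter or underscore, then word characters or hyphens
#   OP:     any single character that is neither whitespace nor a word character
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)"
    r"|(?P<NAME>[^\W\d][\w\-]*)"
    r"|(?P<OP>[^\s\w])"
)


def custom_tokens(source: str) -> Iterator[tokenize.TokenInfo]:
    """Yield TokenInfo tuples for `source` using the rules above."""
    lines = source.splitlines(keepends=True)
    for row, line in enumerate(lines, start=1):
        for match in TOKEN_RE.finditer(line):
            # lastgroup is "NUMBER", "NAME" or "OP"; map it to the token constant.
            kind = getattr(tokenize, match.lastgroup)
            yield tokenize.TokenInfo(
                kind, match.group(), (row, match.start()), (row, match.end()), line
            )
        col = len(line.rstrip("\n"))
        yield tokenize.TokenInfo(tokenize.NEWLINE, "\n", (row, col), (row, col + 1), line)
    end_row = len(lines) + 1
    yield tokenize.TokenInfo(tokenize.ENDMARKER, "", (end_row, 0), (end_row, 0), "")


toks = list(custom_tokens("naïve-name = 42\n"))
for tok in toks:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Assumption, not verified here: a stream like custom_tokens(src) would be
# wrapped by pegen's tokenizer class and passed to the generated parser.
```

Because the stream reuses the standard token type constants, grammar rules that refer to NAME or NUMBER would not need to change; only the definition of what counts as a NAME does.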

@ethindp
Author

ethindp commented Feb 2, 2024

@lysnikolaou I might need a different tokenizer, since the language I'm trying to parse has some unique lexical rules regarding strings and such. The language is fully Unicode-aware, so I have that to deal with. Are there any examples of overriding/replacing the tokenizer, or should I just look at the default implementation?
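On the Unicode point, it may be worth checking how far the default tokenizer already goes before replacing it. The small sketch below, using only the standard library, suggests that Python's own tokenizer accepts non-ASCII identifiers as NAME tokens. Whether its rules match the target language's definition of an identifier or its string syntax is a separate question this does not answer. The helper `name_tokens` is a made-up name for this example.

```python
import io
import tokenize


def name_tokens(source: str) -> list[str]:
    """Collect the text of every NAME token in `source`."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.NAME
    ]


# Non-ASCII identifiers are tokenized as NAME by the default tokenizer.
names = name_tokens("größe = λ + 变量\n")
print(names)
```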
