multiply applied capture groups seems to ignore some captures #127

asottile · 2020-03-11T03:55:13Z

A bit of an edge case, and I'm not sure how this is supposed to be handled. I don't have a concrete use case; I'm just trying to implement my own parser in Python using this as a reference.

sample grammar

{
    "scopeName": "test",
    "patterns": [
        {
            "match": "((a)) ((b) c) (d (e)) ((f) )",
            "name": "matched",
            "captures": {
                "1": {"name": "g1"},
                "2": {"name": "g2"},
                "3": {"name": "g3"},
                "4": {"name": "g4"},
                "5": {"name": "g5"},
                "6": {"name": "g6"},
                "7": {
                    "patterns": [
                        {"match": "f", "name": "g7f"},
                        {"match": " ", "name": "g7space"}
                    ]
                },
                "8": {"name": "g8"}
            }
        }
    ]
}

sample file

a b c d e f z

tokenization using vs code

$ node vsc.js cap.json f

Tokenizing line: a b c d e f z
 - token from 0 to 1 (a) with scopes test, matched, g1, g2
 - token from 1 to 2 ( ) with scopes test, matched
 - token from 2 to 3 (b) with scopes test, matched, g3, g4
 - token from 3 to 5 ( c) with scopes test, matched, g3
 - token from 5 to 6 ( ) with scopes test, matched
 - token from 6 to 8 (d ) with scopes test, matched, g5
 - token from 8 to 9 (e) with scopes test, matched, g5, g6
 - token from 9 to 10 ( ) with scopes test, matched
 - token from 10 to 11 (f) with scopes test, matched, g7f
 - token from 11 to 12 ( ) with scopes test, matched, g7space
 - token from 12 to 14 (z) with scopes test

I expect the f to have the scope test, matched, g7f, g8:

>>> # ...
>>> state, regions = highlight_line(compiler, state, 'a b c d e f z', first_line=True)
>>> import pprint
>>> pprint.pprint(regions)
(Region(start=0, end=1, scope=('test', 'matched', 'g1', 'g2')),
 Region(start=1, end=2, scope=('test', 'matched')),
 Region(start=2, end=3, scope=('test', 'matched', 'g3', 'g4')),
 Region(start=3, end=5, scope=('test', 'matched', 'g3')),
 Region(start=5, end=6, scope=('test', 'matched')),
 Region(start=6, end=8, scope=('test', 'matched', 'g5')),
 Region(start=8, end=9, scope=('test', 'matched', 'g5', 'g6')),
 Region(start=9, end=10, scope=('test', 'matched')),
 Region(start=10, end=11, scope=('test', 'matched', 'g7f', 'g8')),
 Region(start=11, end=12, scope=('test', 'matched', 'g7space')),
 Region(start=12, end=13, scope=('test',)))
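For reference, the layering expected above can be reproduced with a short standalone sketch. This is not the `highlight_line` implementation used in the session above and not vscode-textmate's algorithm; `capture_regions`, `MATCH`, and `CAPTURES` are hypothetical names, and the handling of nested `patterns` is deliberately simplified (first rule that matches at the current position wins). It only assumes that every capture group, taken in numeric order, pushes its scope onto the characters it spans.

```python
import re
from itertools import groupby

# The "match" regex and capture table from the sample grammar.
# A str value is a capture "name"; a list stands in for nested "patterns".
MATCH = re.compile(r"((a)) ((b) c) (d (e)) ((f) )")
CAPTURES = {
    1: "g1", 2: "g2", 3: "g3", 4: "g4", 5: "g5", 6: "g6",
    7: [(re.compile("f"), "g7f"), (re.compile(" "), "g7space")],
    8: "g8",
}


def capture_regions(line):
    """Return (start, end, scopes) runs for one line (hypothetical helper)."""
    scopes = [["test"] for _ in line]  # root scope on every character
    m = MATCH.search(line)
    if m:
        for i in range(*m.span()):
            scopes[i].append("matched")
        # Apply captures in group order so outer groups are pushed first.
        for group in sorted(CAPTURES):
            start, end = m.span(group)
            rule = CAPTURES[group]
            if isinstance(rule, str):
                for i in range(start, end):
                    scopes[i].append(rule)
                continue
            # Simplified nested patterns: scan the captured span left to right.
            pos = start
            while pos < end:
                for regex, name in rule:
                    sub = regex.match(line, pos, end)
                    if sub and sub.end() > pos:
                        for i in range(pos, sub.end()):
                            scopes[i].append(name)
                        pos = sub.end()
                        break
                else:
                    pos += 1  # no nested rule matched here
    # Merge adjacent characters that share an identical scope stack.
    regions, pos = [], 0
    for scope, run in groupby(tuple(s) for s in scopes):
        length = len(list(run))
        regions.append((pos, pos + length, scope))
        pos += length
    return regions


for region in capture_regions("a b c d e f z"):
    print(region)
```

Under these assumptions the `f` at 10 to 11 ends up with `test, matched, g7f, g8`, because group 8 is applied after the nested patterns of group 7 rather than being dropped.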


alexdima · 2020-03-11T07:33:13Z

I also tried this in TextMate, and it appears to handle this the way you expect.

Here is the grammar converted to TextMate's format:

{	patterns = (
		{	
			match = "((a)) ((b) c) (d (e)) ((f) )";
			name = "matched";
			captures = {
				1 = { name = "g1"; };
				2 = { name = "g2"; };
				3 = { name = "g3"; };
				4 = { name = "g4"; };
				5 = { name = "g5"; };
				6 = { name = "g6"; };
				7 = {
					patterns = (
						{ match = "f"; name = "g7f"; },
						{ match = " "; name = "g7space"; },
					);
				};
				8 = { name = "g8"; };
			};
		},
	);
}

RedCMD · 2024-10-05T19:59:00Z

dup:
#164
#208

asottile · 2024-10-05T20:00:51Z

@RedCMD usually dupe goes the other way since this one is older and has more context

alexdima added the bug label (Issue identified by VS Code Team member as probable bug) on Mar 11, 2020

This was referenced Oct 5, 2024

Scopes on Recursive Regex Cause Problems #208

Open

Subroutines breaking capture tokenizing inside of referenced capture group #164

Open

asottile closed this as completed Oct 5, 2024

asottile reopened this Oct 5, 2024
