MM-55972/MM-59809 Simplify parsing of table delimiters and improve how whitespace in tables is handled #19

Merged: hmhealey merged 2 commits into master from MM-55972-59809_tables on Aug 26, 2024

@hmhealey hmhealey commented Aug 9, 2024

Summary

The issues in the linked tickets were caused by reTableDelimiter being complicated enough to break React Native's implementation of regular expressions, even though it worked fine on the implementations in browsers, which made the problem hard to track down. Instead of using one very complicated regex to match the delimiter row (the second row of the table, which is all dashes), we now parse it as if it were a regular table row and then go back and check that it's valid as a delimiter row (basically, that every cell is all dashes).

While doing this, I also found that we still don't handle whitespace around tables quite correctly. We parse the first two lines of the table at once, which bypasses the built-in whitespace handling for the delimiter row, so I ended up having to check the indentation level of that delimiter row manually.
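As a rough illustration of the parse-then-validate approach described in the summary, here is a minimal sketch. Only the regex pattern is taken from this PR; `splitTableRow` and `isDelimiterRow` are hypothetical names, and the row splitting is deliberately simplified (it ignores escaped pipes and pipes inside code spans), so this is not the code in lib/blocks.js.

```javascript
// Pattern copied from the PR; everything else here is a hypothetical sketch.
const reValidTableDelimiter = /^:?-+:?$/;

// Split a row into trimmed cells, dropping one optional leading and trailing pipe.
// Simplification: escaped pipes and pipes inside code spans are not handled.
function splitTableRow(line) {
    let row = line.trim();
    if (row.startsWith('|')) {
        row = row.slice(1);
    }
    if (row.endsWith('|')) {
        row = row.slice(0, -1);
    }
    return row.split('|').map((cell) => cell.trim());
}

// Parse the line like any other row, then check that every cell is a valid delimiter.
function isDelimiterRow(line) {
    return splitTableRow(line).every((cell) => reValidTableDelimiter.test(cell));
}

console.log(isDelimiterRow('| --- | :-: |')); // true
console.log(isDelimiterRow('| abc | --- |')); // false
```

Keeping the per-cell regex this small is the point of the design: the splitting is done by ordinary string handling, so no single regex has to describe the whole row.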

Ticket Link

https://mattermost.atlassian.net/browse/MM-59809
https://mattermost.atlassian.net/browse/MM-55972

@hmhealey hmhealey added the 2: Dev Review Requires review by a core committer label Aug 9, 2024
@hmhealey hmhealey requested review from larkox and rahimrahman August 9, 2024 18:00
@@ -787,20 +782,32 @@ var blockStarts = [
return 0;
}

const nextColumn = measureNonspaceColumn(parser.nextLine);
@hmhealey (Member, Author) commented:

Ensuring that the second line of the table is at nearly the same indentation level as the first line was previously part of reTableDelimiter, but it now needs to be done manually.

// check for a delimiter first since it's stricter than the header row
const nextLine = trimSpacesAfterPipe(parser.nextLine);
@hmhealey (Member, Author) commented:

trimSpacesAfterPipe was added as a fix for https://mattermost.atlassian.net/browse/MM-13516. Before the change I commented on above, we could only trim the trailing whitespace before passing the line into reTableDelimiter. Now that the whitespace is handled separately, we can just use a regular old .trim to get rid of that whitespace.

lib/blocks.js Outdated
@@ -860,6 +867,23 @@ var blockStarts = [
}
];

const reValidTableDelimiter = /^:?-+:?$/;
@hmhealey (Member, Author) commented:

For the delimiter row to be valid, every cell in it must be a string of dashes, optionally with a colon before or after it to change the alignment of that column. Cells can start and end with whitespace, but that's removed before checking against the regex.
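To make the cell rule concrete, the sketch below trims a cell, tests it against the pattern from the diff, and reads the alignment off the colons. The function name `delimiterCellAlignment` and the returned strings are assumptions chosen for illustration, not identifiers from this PR.

```javascript
// Pattern copied from the diff above.
const reValidTableDelimiter = /^:?-+:?$/;

// Hypothetical helper: returns 'left', 'right', 'center', or '' (no explicit
// alignment) for a valid delimiter cell, and null for an invalid one.
function delimiterCellAlignment(rawCell) {
    // Surrounding whitespace is removed before the regex check.
    const cell = rawCell.trim();
    if (!reValidTableDelimiter.test(cell)) {
        return null;
    }

    const startsWithColon = cell.startsWith(':');
    const endsWithColon = cell.endsWith(':');
    if (startsWithColon && endsWithColon) {
        return 'center';
    }
    if (startsWithColon) {
        return 'left';
    }
    if (endsWithColon) {
        return 'right';
    }
    return '';
}

console.log(delimiterCellAlignment(' :--- ')); // left
console.log(delimiterCellAlignment('- -')); // null
```

Note that the pattern requires at least one dash, so a cell containing only colons or only whitespace is rejected.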

@rahimrahman commented:

All the other regexes are between lines 14095 and 14137. Should we put this one there too?

@hmhealey (Member, Author) commented:

Good call. I started defining the regexes with their corresponding functions in lib/inlines.js, but I guess I never did that for lib/blocks.js.

@@ -949,6 +973,27 @@ var findNextNonspace = function() {
this.indented = this.indent >= CODE_INDENT;
};

function measureNonspaceColumn(line) {
@hmhealey (Member, Author) commented:

I could have extracted this from the method above, but I'm trying to avoid modifying the existing functions as much as possible to prevent future conflicts, so I just copied the logic out.
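The diff only shows the signature of measureNonspaceColumn, so the body below is a guess at what copying the column-counting logic out of findNextNonspace might look like: spaces advance one column and tabs advance to the next multiple of four. The comparison helper `isDelimiterIndentCompatible` and its threshold are purely hypothetical; the PR only says the delimiter row must be near the header's indentation level.

```javascript
const TAB_STOP = 4; // assumption: tabs advance to the next multiple of 4 columns
const CODE_INDENT = 4; // assumption: same indent threshold used for indented code

// Sketch of a standalone column measurement for a line's first non-space character.
function measureNonspaceColumn(line) {
    let cols = 0;
    for (let i = 0; i < line.length; i++) {
        const c = line.charAt(i);
        if (c === ' ') {
            cols += 1;
        } else if (c === '\t') {
            cols += TAB_STOP - (cols % TAB_STOP);
        } else {
            break;
        }
    }
    return cols;
}

// Hypothetical check: treat the delimiter row as part of the table only when its
// indentation is within CODE_INDENT columns of the header row's indentation.
function isDelimiterIndentCompatible(headerColumn, delimiterLine) {
    return Math.abs(measureNonspaceColumn(delimiterLine) - headerColumn) < CODE_INDENT;
}

console.log(measureNonspaceColumn('   | --- |')); // 3
console.log(measureNonspaceColumn(' \t| --- |')); // 4
```

Duplicating the loop rather than refactoring findNextNonspace keeps the existing function untouched, which matches the stated goal of avoiding future conflicts.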

@@ -488,20 +502,40 @@ aaa|bbb
<th>aaa</th>
</tr>
</thead></table>
````````````````````````````````
@hmhealey (Member, Author) commented:

A few of the previous tests were wrong when compared against GitHub's behavior, and they broke with the new whitespace handling, so I ended up replacing them.

@rahimrahman rahimrahman left a comment

I think I understand how the fix works.

I saw that the tables.txt file was updated with the input that breaks. Would the unit tests detect those cases as failures?

@hmhealey hmhealey added 4: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer labels Aug 26, 2024
@hmhealey hmhealey merged commit 238f58c into master Aug 26, 2024
12 checks passed
@hmhealey hmhealey deleted the MM-55972-59809_tables branch August 26, 2024 19:55