Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix content clickable when having HTML attributes with "<" in the value #39931

Merged
merged 3 commits into from
Nov 11, 2024

Conversation

renatho
Copy link
Contributor

@renatho renatho commented Oct 28, 2024

Related to https://github.com/Automattic/sensei-pro/issues/2574

Proposed changes:

  • It fixes the regex that splits the content to add the anchor tag in the URLs.

This regex was introduced years ago: https://github.com/Automattic/wpcomsh/pull/375/files#diff-617829ac2bd62b94e204d12538c81a709caa5671cc2471c1051b2b8e272e6b50R982. Notice that it tries to split the pieces starting with "<" and ending with ">" to handle them in chunks.

But with the previous regex, if we have "<" inside the tag (an attribute), it doesn't get the chunks properly. If I'm not missing anything, we don't need to skip the "<" from inside the tags because "<" wouldn't never be there. <this<isnotexpected>.

So removing that char from the excluding portion of the regex, fixes the issue.

To be more clear, see the following examples:

The current regex:

php > print_r( preg_split( '/(<[^<>]+>)/i', '<li data-test="<test&gt;"></li>', -1, PREG_SPLIT_DELIM_CAPTURE ) );
Array
(
    [0] => <li data-test="
    [1] => <test&gt;">
    [2] =>
    [3] => </li>
    [4] =>
)

The new regex:

php > print_r( preg_split( '/(<[^>]+>)/i', '<li data-test="<test&gt;"></li>', -1, PREG_SPLIT_DELIM_CAPTURE ) );
Array
(
    [0] =>
    [1] => <li data-test="<test&gt;">
    [2] =>
    [3] => </li>
    [4] =>
)

Notice that we could also have issues with the > inside the attributes, but this doesn't happen because of this: WordPress/gutenberg#9963.

Other information:

  • Have you written new tests for your changes, if applicable?
  • Have you checked the E2E test CI results, and verified that your changes do not break them?
  • Have you tested your changes on WordPress.com, if applicable (if so, you'll see a generated comment below with a script to run)?

Jetpack product discussion

Does this pull request change what data or activity we track or use?

Testing instructions:

Known issue: https://github.com/Automattic/sensei-pro/issues/2609

Copy link
Contributor

github-actions bot commented Oct 28, 2024

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Team Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖


The e2e test report can be found here. Please note that it can take a few minutes after the e2e tests checks are complete for the report to be available.


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Choose a review path based on your changes:
    • A. Team Review: add the "[Status] Needs Team Review" label
      • For most changes, including minor cross-team impacts.
      • Example: Updating a team-specific component or a small change to a shared library.
    • B. Crew Review: add the "[Status] Needs Review" label
      • For significant changes to core functionality.
      • Example: Major updates to a shared library or complex features.
    • C. Both: Start with Team, then request Crew
      • For complex changes or when you need extra confidence.
      • Example: Refactor affecting multiple systems.
  3. Get at least one approval before merging.

Still unsure? Reach out in #jetpack-developers for guidance!


Wpcomsh plugin:

  • Next scheduled release: Atomic deploys happen twice daily on weekdays (p9o2xV-2EN-p2).

If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack.

@renatho renatho requested a review from westi October 28, 2024 23:40
@renatho renatho marked this pull request as ready for review October 28, 2024 23:41
@renatho renatho changed the title Fix content clickable when having attriutes with "<" in the value Fix content clickable when having HTML attributes with "<" in the value Oct 29, 2024
m1r0
m1r0 previously approved these changes Oct 31, 2024
Copy link
Member

@m1r0 m1r0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense and it works as described. 👍

@renatho
Copy link
Contributor Author

renatho commented Oct 31, 2024

Thank you for the review, @m1r0!

Hi @westi! 👋 I know the original code was introduced a long long time ago, but could you just take a quick look to see if we're not missing anything critical that you could remember?

@azaozz
Copy link
Contributor

azaozz commented Nov 4, 2024

with the previous regex, if we have "<" inside the tag (an attribute), it doesn't get the chunks properly
...
Notice that we could also have issues with the > inside the attributes, but this doesn't happen because

Think this restriction was (mostly) a security hardening. In theory both < and > seem allowed in attribute values, but not sure there is a good use case for using literal characters instead of entities.

If I remember right this code is more restrictive to "err on the side of safety", and to keep that functionality simple. If the < and > characters must be supported perhaps the WP HTML API can be used to split the tags, or even to handle most of this. That may actually be faster/better than the old code.

@renatho
Copy link
Contributor Author

renatho commented Nov 4, 2024

@azaozz, thank you for checking!

I agree that a refactor to use the WP HTML API would probably be better here. But since it would impact all WPCOM sites, I think it would need to be a cautious work, checking possible issues, performance, and so on.

Since we don't have a critical issue related to the brackets, I would go ahead with this current solution, and I could also create an issue for a refactor to use the WP HTML API. Does it make sense to you?

@azaozz
Copy link
Contributor

azaozz commented Nov 4, 2024

@renatho Sure, sounds good. I cannot remember if there was a specific (security hardening) reason to exclude HTML tags that may have extra < and/or > in attribute values, perhaps not. Also seems this fixes the current issue but may leave another edge case, for example:
<li data-test="<test&gt;" data-test-2="some > something">[url to convert]</li>

Would also be pretty good if some unit tests are added for this, would definitely help in similar situations in the future.

@renatho renatho force-pushed the fix/content-clickable branch 2 times, most recently from 8f2d703 to a68558a Compare November 11, 2024 20:01
@renatho renatho force-pushed the fix/content-clickable branch 2 times, most recently from 31ccde5 to 2624695 Compare November 11, 2024 20:35
@renatho renatho force-pushed the fix/content-clickable branch from 2624695 to 753bbe0 Compare November 11, 2024 20:36
@renatho
Copy link
Contributor Author

renatho commented Nov 11, 2024

It seems the issue with the test coverage is something also happening in other PRs, so considering it was already reviewed and I just wrote a test, I'm going ahead and merging this.

The test was added here: 753bbe0

Thank you for your approval @m1r0! And thank you for checking and for your suggestions @azaozz!

perhaps the WP HTML API can be used to split the tags, or even to handle most of this

Issue to refactor the whole approach created here: #40134

@renatho renatho merged commit 569d37b into trunk Nov 11, 2024
54 of 55 checks passed
@renatho renatho deleted the fix/content-clickable branch November 11, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants