Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ImageProxy: avoid oversight of wrapping an entire HTML document around image #212

Merged
merged 1 commit into from
Mar 4, 2024

Conversation

Frenzie
Copy link
Member

@Frenzie Frenzie commented Mar 4, 2024

In reply to #206 (comment).

With special thanks to this comment: https://www.php.net/manual/en/domdocument.savehtml.php#119767

@Alkarex Alkarex merged commit 7bfab1c into FreshRSS:master Mar 4, 2024
1 check passed
@Frenzie Frenzie deleted the imageproxy branch March 4, 2024 21:11
@Frenzie
Copy link
Member Author

Frenzie commented Mar 4, 2024

Hm, I was just reading more and realized this may not be ideal, though at first glance the output looked okay:

For an example:

<h1>Foo</h1><p>bar</p>

will end up as:

<h1>Foo<p>bar</p></h1>

Easiest workaround is adding root tag yourself and stripping it later:

$html->loadHTML('<html>' . $content .'</html>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$content = str_replace(array('<html>','</html>') , '' , $html->saveHTML());

@Alkarex
Copy link
Member

Alkarex commented Mar 4, 2024

I have not looked precisely, but a typical way is to saveHTML($node). Try to search our codebase for ->saveHTML( for some examples

@Frenzie
Copy link
Member Author

Frenzie commented Mar 4, 2024

That's the first thing I tried, as $dom->getElementsByTagName('body')->item(0)->saveHTML().

But I see what you're referring to is actually a for loop appending to a string. I had considered that, but it felt hacky.

@Alkarex
Copy link
Member

Alkarex commented Mar 4, 2024

I think it makes a difference in behaviour to pass the node inside the saveHTML($node) function

@Alkarex
Copy link
Member

Alkarex commented Mar 4, 2024

@Frenzie
Copy link
Member Author

Frenzie commented Mar 4, 2024

Ah, in that case presumably it should simply be something like:

$output = $dom->getElementsByTagName('body')->item(0);
return $dom->saveHTML($output);

I'll test that tomorrow (if I find the time).

PS Here are the other results. I hadn't notice the third was slightly different, whoops.
https://github.com/FreshRSS/FreshRSS/blob/da43fff43713daadbf13d05557a2ba14d5166b11/app/Models/Feed.php#L751-L756
https://github.com/FreshRSS/FreshRSS/blob/da43fff43713daadbf13d05557a2ba14d5166b11/app/Models/Entry.php#L727-L743

@Frenzie
Copy link
Member Author

Frenzie commented Mar 5, 2024

For reference, this is the idea I played with first yesterday:

		$body = $doc->getElementsByTagName('body')->item(0);

		$output = '';
		foreach ($body->childNodes as $node) {
				$output .= $doc->saveHTML($node);
		}

		return $output;

The sample uses regex to effect the desired result. I will do so as well to limit new string creation.
https://github.com/FreshRSS/FreshRSS/blob/da43fff43713daadbf13d05557a2ba14d5166b11/lib/SimplePie/SimplePie/Sanitize.php#L470-L474

Frenzie added a commit to Frenzie/Extensions that referenced this pull request Mar 5, 2024
With the method in FreshRSS#212 undesired transformations could sometimes be effected.
Alkarex added a commit that referenced this pull request Mar 6, 2024
* ImageProxy: better DOMDocument HTML handling

With the method in #212 undesired transformations could sometimes be effected.

* Update xExtension-ImageProxy/extension.php

Co-authored-by: Alexandre Alapetite <[email protected]>

* Bump version to 0.7.1

---------

Co-authored-by: Alexandre Alapetite <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants