Add HTML parsing features #11

Open
wants to merge 35 commits into
base: main
Changes from 14 commits
67cd354
Apply 8.1 hotfixes from unmerged patch
mallardduck Oct 2, 2022
8c15b01
Initial HTML replacer code
mallardduck Oct 2, 2022
6a8cb4d
remove unused property
mallardduck Oct 2, 2022
b8af7e5
generate new emoji bytes
mallardduck Oct 2, 2022
b8bedbf
clean up code
mallardduck Oct 2, 2022
bb84284
Add test to cover image alt/title attributes
mallardduck Oct 2, 2022
eb121c0
refactor to use XPath to solve filtering text nodes problem
mallardduck Oct 2, 2022
92d9136
Remove try-guy now that it's unused
mallardduck Oct 2, 2022
4538750
refactor to ensure we allow HTML fragments too
mallardduck Oct 4, 2022
cd6e190
refactor tests to split up HTML pages and HTML fragments
mallardduck Oct 4, 2022
60c987f
Use internal tag as means of warning?
mallardduck Oct 4, 2022
f7616c0
Refactor method name to slightly better option
mallardduck Oct 4, 2022
e818b5c
fix code styles
mallardduck Oct 4, 2022
b1f83c7
make styleCI happy
mallardduck Oct 4, 2022
29f7d0a
Refactor to fix missed fragments and expand tests
mallardduck Oct 4, 2022
eeb5f0a
reorder code
mallardduck Oct 4, 2022
fcdd93d
Refactor new tests and add failing tests for current issues.
mallardduck Oct 17, 2022
2d77cdc
fix styles
mallardduck Oct 17, 2022
e0f2540
track the Pest helper file
mallardduck Oct 17, 2022
bc61a6a
fix pest file styles
mallardduck Oct 17, 2022
b3a57be
Add tests that cover the edge case I've been chasing
mallardduck Oct 17, 2022
62fdc48
Refactor how HTML fragments are handled
mallardduck Oct 17, 2022
3dd8fb5
Ensure extra spaces are not added
mallardduck Oct 17, 2022
b2bc8bb
Update tests with fixed results
mallardduck Oct 17, 2022
2e3278d
Manually correct snapshots to desired state
mallardduck Oct 17, 2022
55badec
Skip HTML fragment tests that cause errors
mallardduck Oct 17, 2022
c446d6f
Refactor exception
mallardduck Oct 17, 2022
1b5b3e0
remove dumper from composer file
mallardduck Oct 17, 2022
2ee59bf
Always use static builder method instead of new
mallardduck Oct 17, 2022
891c25f
Improve fragment parsing and enable more tests
mallardduck Oct 17, 2022
2b77947
Correct HTML pages without meta charset tag
mallardduck Oct 17, 2022
d5b6869
refactor UTF8 tag adding and enable test
mallardduck Oct 17, 2022
894b79f
Add test to cover when incorrect content type is corrected
mallardduck Oct 17, 2022
590ac97
Add ext-dom to suggested
mallardduck Oct 17, 2022
4468c8e
adjust styles
mallardduck Oct 17, 2022
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
 /composer.lock
 /vendor/
+.phpunit.result.cache
13 changes: 9 additions & 4 deletions composer.json
@@ -22,13 +22,15 @@
 "ext-mbstring": "*"
 },
 "require-dev": {
-"pestphp/pest": "^0.3.0",
+"pestphp/pest": "^1.21",
 "s9e/regexp-builder": "^1.4",
 "spatie/emoji": "^2.3.0",
-"spatie/pest-plugin-snapshots": "^1.0"
+"spatie/pest-plugin-snapshots": "^1.0",
+"wa72/htmlpagedom": "^2.0 || ^3.0"
 },
 "suggest": {
-"spatie/emoji": "*"
+"spatie/emoji": "*",
+"wa72/htmlpagedom": "*"
 },
 "minimum-stability": "dev",
 "prefer-stable": true,
@@ -38,7 +40,10 @@
 }
 },
 "config": {
-"sort-packages": true
+"sort-packages": true,
+"allow-plugins": {
+"pestphp/pest-plugin": true
+}
 },
 "scripts": {
 "generate": "php ./generate.php",
61 changes: 61 additions & 0 deletions src/HtmlReplacer.php
@@ -0,0 +1,61 @@
<?php

namespace Astrotomic\Twemoji;

use Astrotomic\Twemoji\Concerns\Configurable;
use RuntimeException;
use Wa72\HtmlPageDom\HtmlPageCrawler;

/**
* @internal This class is marked as Internal as it is considered Experimental. Code subject to change until warning removed.
*/
class HtmlReplacer
{
use Configurable;

public function __construct()
{
if (! class_exists(HtmlPageCrawler::class)) {
throw new RuntimeException(
sprintf('Cannot use %s method unless `wa72/htmlpagedom` is installed.', __METHOD__)
);
}
}

public function parse(string $html): string
Author commented:

If we need to support full HTML docs and HTML fragments, then this method should:

  1. Immediately determine if the input $html is a full DOM page, then
  2. either use HtmlPage (used here) and work based on the Body, or
  3. use the HtmlPageCrawler to parse the fragment.

Member commented:

I think that in PHP, partial HTML is more common than a full document, unless you are implementing it as some kind of middleware that parses the whole HTML response.
But in general it should support both if possible.

Author commented:

Addressed this by using the more general HTML parser, then adding a step that checks whether the input HTML is a page/document and, if so, selects the body from it. As I was already replacing based on an HTML fragment (the body), supporting fragments as input was rather simple.

{
// Parse the HTML page or fragment...
$parsedHtmlRoot = new HtmlPageCrawler($html);
// Filter parsed HTML "root" into the twemoji relevant parts...
$parsedHtml = $this->whenHtmlDocFilterBody($parsedHtmlRoot);

// If the filtered DOM fragment doesn't have any children, return the input HTML.
if ($parsedHtml->children()->count() === 0) {
return $html;
}

// Use xpath to filter only the "TextNodes" within every "Element"
$textNodes = $parsedHtml->filterXPath('.//*[normalize-space(text())]');

$textNodes->each(function (HtmlPageCrawler $node) {
$twemojiContent = (new EmojiText($node->innerText()))
->base($this->base)
->type($this->type)
->toHtml();
$node->makeEmpty()->setInnerHtml($twemojiContent);

return $node;
});

return $parsedHtmlRoot->saveHTML();
}

private function whenHtmlDocFilterBody(HtmlPageCrawler $htmlRoot): HtmlPageCrawler
{
if ($htmlRoot->isHtmlDocument()) {
return $htmlRoot->filter('body');
}

return $htmlRoot;
}
}
3 changes: 2 additions & 1 deletion src/Twemoji.php
@@ -25,7 +25,7 @@ public function __construct(array $codepoints)

 public static function emoji(string $emoji): self
 {
-$chars = preg_split('//u', $emoji, null, PREG_SPLIT_NO_EMPTY);
+$chars = preg_split('//u', $emoji, -1, PREG_SPLIT_NO_EMPTY);

 $codepoints = array_map(
 fn (string $code): string => dechex(mb_ord($code)),
@@ -58,6 +58,7 @@ public function url(): string
 );
 }

+#[\ReturnTypeWillChange]
 public function jsonSerialize()
 {
 return $this->url();
2 changes: 1 addition & 1 deletion src/emoji_bytes.regexp

Large diffs are not rendered by default.

150 changes: 150 additions & 0 deletions tests/Datasets/HtmlContent.php
@@ -0,0 +1,150 @@
<?php

dataset('html-pages', [
<<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head></head>
<body></body>
</html>
HTML,
<<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>HTML 5🚀 Boilerplate</title>
<link rel="stylesheet" href="style.css">
</head>
<body></body>
</html>
HTML,
<<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>HTML 5🚀 Boilerplate</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<h1>Do a quick kickflip! 🛹</h1>
<p>This is HTML text that should be replaced, but the emoji in the head should not.</p>
<h2>Time for a CRAB RAVE!</h2>
<p>🦀🦀🦀🦀🦀</p>
<p>🦀🦀🦀</p>
<p>🦀🦀🦀🦀🦀</p>
<h2>🙏🐘</h2>
</body>
</html>
HTML,
<<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Test with Emoji in ALT text</title>
</head>
<body>
<h1>Hello Friends 👋</h1>
<img src="http://fillmurray.lucidinternets.com/200/300" alt="A random image of Bill Murray 🍻" title="maybe an image of bill murry with a raised glass 🍺">
<h2>Time for a ElePHPant RAVE!</h2>
<p>🐘🐘🐘🐘</p>
<p>🐘🐘🐘</p>
<p>🐘🐘🐘🐘🐘</p>
<p>🐘🐘</p>
</body>
</html>
HTML,
<<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Test with Emoji in ALT text</title>
</head>
<body>
<header>
<h1>Hello Friends 👋</h1>
<img src="http://fillmurray.lucidinternets.com/200/300" alt="A random image of Bill Murray 🍻" title="maybe an image of bill murry with a raised glass 🍺">
</header>
<main>
<section>
<h2>Time for a ElePHPant RAVE!</h2>
<p>🐘🐘🐘🐘</p>
<p>🐘🐘🐘</p>
<p>🐘🐘🐘🐘🐘</p>
<p>🐘🐘</p>
</section>
</main>
</body>
</html>
HTML,
<<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>HTML 5🚀 Boilerplate</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<h1>Do a quick kickflip! 🛹</h1>
<p>This is HTML text that should be replaced, but the emoji in the head should not.</p>
<h2>Time for a CRAB RAVE!</h2>
<p>🦀🦀🦀🦀🦀</p>
<p>🦀🦀🦀</p>
<p>🦀🦀🦀🦀🦀</p>
<h2>🙏🐘</h2>
<script type="text/javascript">
/* Ensure it won't touch emoji in script sections */
var badTime = "💩";
</script>
</body>
</html>
HTML,
]);

dataset('html-fragments', [
Member commented:

Could we add the most simple fragments here first? And add the more complex/combined ones to the end?

<p>🚀</p>
<img src="" alt="🎉"/>
<a href="" title="🎈">Link ⛓️</a>

So, some single-element examples like these.

I can also imagine an example that shouldn't be replaced:

<script>document.innerHTML = '🤷‍♂️';</script>

Author commented:

Added those simpler fragments to the top and even moved the fragment dataset to be the "primary" source.

This feedback clued me in to a flaw in my "early return" logic. I now check for the existence of Text nodes rather than children in general: your first example has no element children but does have a Text node I can work on, and my original logic was skipping it.

The only issue this highlighted is that the code currently does affect the script tag contents. I'll circle back later today/tomorrow to look into that. It seems I'll need some sort of exclude list to skip Text nodes inside elements like script.
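
A minimal sketch of what such an exclude list could look like, using only ext-dom and ext-mbstring rather than the crawler library from this PR. The function name `replaceableTextNodes` and the default list are hypothetical, chosen for illustration; the idea is simply an XPath predicate that rejects Text nodes with an excluded ancestor.

```php
<?php

/**
 * Collect the non-empty Text nodes of an HTML fragment, skipping any that
 * sit inside an excluded element. Hypothetical helper for illustration.
 *
 * @param string[] $excluded element names whose text must stay untouched
 * @return string[]
 */
function replaceableTextNodes(string $html, array $excluded = ['script', 'style']): array
{
    // Encode non-ASCII characters as numeric entities so DOMDocument
    // does not have to guess the input encoding.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build a predicate such as:
    // normalize-space(.) != "" and not(ancestor::script) and not(ancestor::style)
    $conditions = ['normalize-space(.) != ""'];
    foreach ($excluded as $tag) {
        $conditions[] = "not(ancestor::{$tag})";
    }
    $query = '//text()[' . implode(' and ', $conditions) . ']';

    $texts = [];
    foreach ((new DOMXPath($dom))->query($query) as $node) {
        $texts[] = $node->nodeValue;
    }

    return $texts;
}

print_r(replaceableTextNodes('<p>Rocket 🚀</p><script>var badTime = "💩";</script>'));
```

Filtering by ancestor rather than by parent means text nested deeper inside an excluded element is skipped as well.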

Member commented:

Same for <style>, I think. But as I said, this will be experimental, so it's also fine to add a Pest test marked ->wip(), todo(), or whatever for such things, to highlight what has to be fixed/fulfilled to get it out of experimental mode.

How about passing text only, without any nodes?

This is some fancy-💃 Markdown/WYSIWYG text with surrounding <p> tags disabled. 🎉

Or without surrounding elements, but with one somewhere in the middle?

This is some fancy-💃 Markdown/WYSIWYG text with surrounding <code><p></code> tags disabled. 🎉

Author commented:

I haven't gotten back around to the exclude list concept yet, but I'm going to take that on next. I just wanted to report back on my findings from passing those "plain text" examples. Unfortunately, text that isn't wrapped in a node is interpreted in a variety of ambiguous ways.

For instance, the first example technically should have its angle brackets escaped as &lt; and &gt;, otherwise browsers (and PHP DOM) will automatically parse the P tag and add a closing tag. So I opted to test with the escaped entities, and that line is interpreted as XML. Internally it's wrapped with a _root element, and as a result the XPath that finds the inner text nodes fails. Ultimately the text is returned untouched and the emoji stay emoji. The second example behaves basically the same, just without needing to escape the HTML tag, since the code tag has a closing tag.

Long story short, it got complicated really fast lol. So I think I need to take a step back, focus on the HTML parsing situation, and get a handle on how various inputs are parsed. The DOM library I picked for us uses symfony/dom-crawler at its core, which in turn uses the Masterminds\HTML5 library to compensate for some failings in PHP's core DOM.

All that said, my next step will be to create my own test repo to assess the overall state of PHP's HTML parsing abilities, since I think we'll want a clearer picture of the behaviour each mechanism produces: for instance, how PHP's core ext-dom treats things versus the Masterminds\HTML5 parser, etc.

Member commented:

In general I can say that the Symfony DomCrawler is an outstanding package and one of the best solutions, if not the best, for HTML in PHP.

As I said before: don't escalate this too far. With tests that are meant to fail we at least have a good list of things needed for a final release. And even if you fix some of those, take the chance for more PRs. 😉🎃

Regarding the plain text thing: how about a detector with some cases/match arms?

  1. full HTML document
  2. partial HTML but starting/ending with a tag or even surrounded by one
  3. text with HTML sprinkles - solution could be to surround it with <div>, do the job and afterwards take $div->innerHTML
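
The three cases above could be sketched roughly like this. The names `detectHtmlKind` and `wrapLooseText` are hypothetical, and the regex heuristics are assumptions for illustration, not part of this PR or of any library API.

```php
<?php

/**
 * Classify input as 'document', 'fragment', or 'text'.
 * Hypothetical helper; the heuristics are deliberately simple.
 */
function detectHtmlKind(string $input): string
{
    $trimmed = trim($input);

    // 1. Full HTML document: starts with a doctype or an <html> tag.
    if (preg_match('/^(<!DOCTYPE\s+html\b|<html[\s>])/i', $trimmed) === 1) {
        return 'document';
    }

    // 2. Partial HTML that starts with a tag and ends with a tag
    //    (a single tag such as <img/> counts as both).
    if (preg_match('/^<[a-z][^>]*>(.*>)?$/is', $trimmed) === 1) {
        return 'fragment';
    }

    // 3. Text with HTML sprinkles, or no HTML at all.
    return 'text';
}

/** Wrap loose text in a <div> so a parser sees one root element. */
function wrapLooseText(string $input): string
{
    return detectHtmlKind($input) === 'text' ? "<div>{$input}</div>" : $input;
}

echo detectHtmlKind('<p>🚀</p>'), PHP_EOL;
echo wrapLooseText('Some fancy-💃 text with <code>inline</code> markup. 🎉'), PHP_EOL;
```

After processing, the caller would read the wrapper's inner HTML back out, so the extra <div> never reaches the output.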

Again, these are just ideas. I'm also personally happier to merge several smaller, focused PRs than one beast of a PR, as the smaller ones are also easier to review.

Author commented:

Thank you for that perspective! Always appreciate the sage advice and I think you nailed it. I'll start to refocus this PR to cover less of the edge cases and focus on the most common uses. Then I can adjust the tests a bit to give us a guiding star for future enhancements.

Member commented:

And one last thing: personally I would prefer to use a dataset only for variants of the same thing, meaning really the same test case.

it does not replace emojis in HTML attributes

<a title="🚀">hello</a>
<img alt="🖼️"/>

it replaces emojis in text nodes

<p>🎉</p>
<div>💃</div>

So I'm happier with a lot of copy'n'paste test cases with better/easier descriptions than one test case running through a massive dataset, where no one really knows what a given data point tests and what is still missing or not covered. Every expectation of the lib should be one test case, and we can bulletproof that expectation by adding multiple variants of it, like <a> and <img/> for the attributes. And later someone adds <button> to it.

Author commented:

I've refactored the tests now and followed your advice of individual tests. Great idea as this allows me to more obviously target edge cases. So I've taken advantage of that by adding some explicit expectations to tests.

Author commented:

Good news!

I've found the root of the issue I've been chasing, which I thought was something I did wrong: HTML attribute values were being converted to their HTML entity equivalents instead of staying Unicode.

It turns out that this is caused by DOMDocument itself. I thought it was caused by either a) the XPath I used, or b) the HTML5 transforming library I chose. In the end, reading the comments on PHP's docs for DOMDocument was my salvation! Specifically the 15-year-old comment from xuanbn that said:

I have discovered that, to help loadHTML() process utf file correctly, the meta tag should come first, before any utf string appear. For example, this HTML file. 

<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8">
        <title> Vietnamese - Tiếng Việt</title>
    </head>
<body></body>
</html>

With that new info I added a test that shows how the input HTML determines how DOMDocument treats things upon calling saveHTML(). So I think this gets us to a much better place: I've created a working solution, but we mark it as experimental and highlight the issue with a (skipped) failing test.
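
As a rough illustration of that finding, a page without a charset declaration could be given one before it reaches DOMDocument. This is a sketch under stated assumptions: the helper name `withUtf8Meta` and both regexes are hypothetical and may differ from what the commits in this PR actually do.

```php
<?php

/**
 * Ensure an HTML page declares UTF-8 before any non-ASCII text appears.
 * Hypothetical helper for illustration only.
 */
function withUtf8Meta(string $html): string
{
    // Leave pages that already declare a charset untouched.
    if (preg_match('/<meta[^>]+charset\s*=/i', $html) === 1) {
        return $html;
    }

    // Insert the meta tag directly after the first opening <head> tag.
    // Input without a <head> tag is returned unchanged.
    $result = preg_replace(
        '/<head(\s[^>]*)?>/i',
        '$0<meta http-equiv="content-type" content="text/html; charset=utf-8">',
        $html,
        1
    );

    return $result ?? $html;
}

echo withUtf8Meta('<html><head><title>Tiếng Việt</title></head><body></body></html>');
```

Placing the tag immediately after `<head>` follows the quoted advice that the declaration should come before any UTF-8 string, including the title.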

<<<'HTML'
<section class="comment-box">
<div class="comment-content">
<h2>Time for a ElePHPant RAVE!</h2>
<p>🐘🐘🐘🐘</p>
<p>🐘🐘🐘</p>
<p>🐘🐘🐘🐘🐘</p>
<p>🐘🐘</p>
</div>
<section class="sub-comments">
<section class="comment-box">
<div class="comment-content">
<h2>Time for a cRUSTation RAVE!</h2>
<p>🦀🦀🦀🦀</p>
<p>🦀🦀</p>
<p>🦀🦀🦀🦀</p>
<p>🦀</p>
</div>
</section>
<section class="comment-box">
<div class="comment-content">
<p>but what if the crabs and elephants rave together?!</p>
</div>
</section>
</section>
</section>
HTML,
<<<'HTML'
<article>
<p>Lorem 😂😂 ipsum 🕵️‍♂️dolor sit✍️ amet, consectetur adipiscing😇😇🤙 elit, sed do eiusmod🥰 tempor 😤😤🏳️‍🌈incididunt ut 👏labore 👏et👏 dolore 👏magna👏 aliqua.</p>
<p>Ut enim ad minim 🐵✊🏿veniam,❤️😤😫😩💦💦 quis nostrud 👿🤮exercitation ullamco 🧠👮🏿‍♀️🅱️laboris nisi ut aliquip❗️ ex ea commodo consequat.</p>
<p>💯Duis aute💦😂😂😂 irure dolor 👳🏻‍♂️🗿in reprehenderit 🤖👻👎in voluptate velit esse cillum dolore 🙏🙏eu fugiat🤔 nulla pariatur.</p>
<p>🙅‍♀️🙅‍♀️Excepteur sint occaecat🤷‍♀️🤦‍♀️ cupidatat💅 non💃 proident,👨‍👧 sunt🤗 in culpa😥😰😨 qui officia🤩🤩 deserunt mollit 🧐anim id est laborum.🤔🤔</p>
</article>
HTML,
]);
15 changes: 15 additions & 0 deletions tests/Unit/HtmlTest.php
@@ -0,0 +1,15 @@
<?php

use Astrotomic\Twemoji\HtmlReplacer;
use function Spatie\Snapshots\assertMatchesHtmlSnapshot;
use function Spatie\Snapshots\assertMatchesTextSnapshot;

it('can parse HTML Pages', function (string $html) {
$htmlReplacer = (new HtmlReplacer())->png();
assertMatchesHtmlSnapshot($htmlReplacer->parse($html));
})->with('html-pages');

it('can parse HTML fragments content', function (string $html) {
$htmlReplacer = (new HtmlReplacer())->png();
assertMatchesTextSnapshot($htmlReplacer->parse($html));
})->with('html-fragments');
@@ -0,0 +1,5 @@
<!DOCTYPE html>
<html lang="en">
<head></head>
<body></body>
</html>
@@ -0,0 +1,11 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>HTML 5&#128640; Boilerplate</title>
<link rel="stylesheet" href="style.css">
</head>
<body></body>
</html>
@@ -0,0 +1,22 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>HTML 5&#128640; Boilerplate</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<h1>Do a quick kickflip! <img src="https://twemoji.maxcdn.com/v/latest/72x72/1f6f9.png" alt="&#128761;" width="72" height="72" loading="lazy" class="twemoji">
</h1>
<p>This is HTML text that should be replaced, but the emoji in the head should not.</p>
<h2>Time for a CRAB RAVE!</h2>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"></p>
<h2>
<img src="https://twemoji.maxcdn.com/v/latest/72x72/1f64f.png" alt="&#128591;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji">
</h2>
</body>
</html>
@@ -0,0 +1,17 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Test with Emoji in ALT text</title>
</head>
<body>
<h1>Hello Friends <img src="https://twemoji.maxcdn.com/v/latest/72x72/1f44b.png" alt="&#128075;" width="72" height="72" loading="lazy" class="twemoji">
</h1>
<img src="http://fillmurray.lucidinternets.com/200/300" alt="A random image of Bill Murray &#127867;" title="maybe an image of bill murry with a raised glass &#127866;">
<h2>Time for a ElePHPant RAVE!</h2>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
</body>
</html>
@@ -0,0 +1,23 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Test with Emoji in ALT text</title>
</head>
<body>
<header>
<h1>Hello Friends <img src="https://twemoji.maxcdn.com/v/latest/72x72/1f44b.png" alt="&#128075;" width="72" height="72" loading="lazy" class="twemoji">
</h1>
<img src="http://fillmurray.lucidinternets.com/200/300" alt="A random image of Bill Murray &#127867;" title="maybe an image of bill murry with a raised glass &#127866;">
</header>
<main>
<section>
<h2>Time for a ElePHPant RAVE!</h2>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji"></p>
</section>
</main>
</body>
</html>
@@ -0,0 +1,23 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>HTML 5&#128640; Boilerplate</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<h1>Do a quick kickflip! <img src="https://twemoji.maxcdn.com/v/latest/72x72/1f6f9.png" alt="&#128761;" width="72" height="72" loading="lazy" class="twemoji">
</h1>
<p>This is HTML text that should be replaced, but the emoji in the head should not.</p>
<h2>Time for a CRAB RAVE!</h2>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"></p>
<p><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f980.png" alt="&#129408;" width="72" height="72" loading="lazy" class="twemoji"></p>
<h2>
<img src="https://twemoji.maxcdn.com/v/latest/72x72/1f64f.png" alt="&#128591;" width="72" height="72" loading="lazy" class="twemoji"><img src="https://twemoji.maxcdn.com/v/latest/72x72/1f418.png" alt="&#128024;" width="72" height="72" loading="lazy" class="twemoji">
</h2>
<script type="text/javascript">/* Ensure it won't touch emoji in script sections */ var badTime = "&#128169;";</script>
</body>
</html>