From 72cf1a9ee8bfa074cd61e78ebf9a76ce3afe8708 Mon Sep 17 00:00:00 2001 From: Barry Date: Wed, 30 Dec 2020 09:38:30 +0000 Subject: [PATCH 01/11] More edits --- src/content/en/2020/markup.md | 1151 ++++++++++++++---- src/static/images/2020/markup/script-use.png | Bin 17648 -> 18444 bytes 2 files changed, 887 insertions(+), 264 deletions(-) diff --git a/src/content/en/2020/markup.md b/src/content/en/2020/markup.md index 5dcd781008b..77ed0b7dbf0 100644 --- a/src/content/en/2020/markup.md +++ b/src/content/en/2020/markup.md @@ -29,7 +29,7 @@ The web is built on HTML. Without HTML there are no web pages, no web sites, no How do we use HTML, then, how great of a foundation do we have? In the introductory section of the [2019 Markup chapter](../2019/markup#introduction), author [Brian Kardell](../2019/contributors#bkardell) suggested that for a long time, we haven't really known. There were some smaller samples. For example, there was [Ian Hickson's research](https://web.archive.org/web/20060203035414/http://code.google.com/webstats/index.html) (one of modern HTML's parents) among a few others, but until last year we lacked major insight into how we as developers, as authors, make use of HTML. In 2019 we had both [Catalin Rosu's work](https://www.advancedwebranking.com/html/) (one of this chapter's co-authors) as well as the 2019 edition of the Web Almanac to give us a better view again of HTML in practice. -Last year's analysis was based on 5.8 million pages, of which 4.4 million were tested on desktop and 5.3 million on mobile. This year we analyzed 7.5 million pages, of which 5.6 million were tested on desktop and 6.3 million on mobile, using the [latest data](./methodology#websites) on the websites users are visiting in 2020. We do make some comparisons to last year, but just as we've tried to analyze additional metrics for new insights, we've also tried to impart our own personalities and perspectives throughout the chapter. 
+Last year's analysis was based on 5.8 million pages, of which 4.4 million were tested on desktop and 5.3 million on mobile. This year we analyzed 7.5 million pages, of which 5.6 million were tested on desktop and 6.3 million on mobile, using the [latest data](./methodology#websites) on the websites users are visiting in 2020. We do make some comparisons to last year but have also analyzed additional metrics for new insights. We've also tried to impart our own personalities and perspectives throughout the chapter.

In this Markup chapter, we're focusing almost exclusively on HTML, rather than SVG or MathML, which are also considered markup languages. Unless otherwise noted, stats presented in this chapter refer to the set of mobile pages. Additionally, the data for all Web Almanac chapters is open and available. Take a look at the results and share your observations with the community! @@ -51,16 +51,44 @@ In this section, we're covering the higher-level aspects of HTML like document t 96.82% of pages declare a [_doctype_](https://developer.mozilla.org/en-US/docs/Glossary/Doctype). HTML documents declaring a doctype is useful for historical reasons, "to avoid triggering quirks mode in browsers" as [Ian Hickson wrote in 2009](https://lists.w3.org/Archives/Public/public-html-comments/2009Jul/0020.html). What are the most popular values? -

-| Doctype | Pages | Percentage |
-|---|---|---|
-| HTML ("HTML5") | 5,441,815 | 85.73% |
-| XHTML 1.0 Transitional | 382,322 | 6.02% |
-| XHTML 1.0 Strict | 107,351 | 1.69% |
-| HTML 4.01 Transitional | 54,379 | 0.86% |
-| HTML 4.01 Transitional ([quirky](https://hsivonen.fi/doctype/#xml)) | 38,504 | 0.61% |
-
-
{{ figure_link(caption="The 5 most popular doctypes.", sheets_gid="1981441894", sql_file="summary_pages_by_device_and_doctype.sql") }}
+
+| Doctype | Pages | Pages (%) |
+|---|---|---|
+| HTML ("HTML5") | 5,441,815 | 85.73% |
+| XHTML 1.0 Transitional | 382,322 | 6.02% |
+| XHTML 1.0 Strict | 107,351 | 1.69% |
+| HTML 4.01 Transitional | 54,379 | 0.86% |
+| HTML 4.01 Transitional ([quirky](https://hsivonen.fi/doctype/#xml)) | 38,504 | 0.61% |
+
{{ figure_link(caption="The 5 most popular doctypes.", sheets_gid="1981441894", sql_file="summary_pages_by_device_and_doctype.sql") }}
You can already tell how the numbers decrease quite a bit after XHTML 1.0, before entering the long tail with a few standard, some esoteric, and also bogus doctypes. @@ -74,7 +102,6 @@ Two things stand out from these results: A page's document size refers to the amount of HTML bytes transferred over the network, including compression if enabled. At the extremes of the set of 6.3 million documents: -{# TODO(authors, analysts): Revisit the "largest document" stat and interpretation. #} * 1,110 documents are empty (0 bytes). * The average document size is 49.17 KB ([in most cases compressed](https://w3techs.com/technologies/details/ce-gzipcompression)). * The largest document by far weighs 61.19 _MB_, almost deserving its own analysis and chapter in the Web Almanac. @@ -87,20 +114,17 @@ How is this situation in general, then? The median document weighs 24.65 KB, whi description="Document size in bytes per percentile, with the median document weighing 25.99 KB on desktop.", sheets_gid="2066175354", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=386686971&format=interactive", - width=600, - height=371, sql_file="summary_pages_by_device_and_percentile.sql" ) }} ### Document language -{# TODO(editors): Link directly to the relevant Accessibility section. #} -We identified 2,863 different values for the `lang` attribute on the `html` start tag (compare that to the [7,117 spoken languages](https://www.ethnologue.com/guides/how-many-languages) as per Ethnologue). Almost all of them seem valid, according to the [Accessibility](./accessibility) chapter. +We identified 2,863 different values for the `lang` attribute on the `html` start tag (compare that to the [7,117 spoken languages](https://www.ethnologue.com/guides/how-many-languages) as per Ethnologue). Almost all of them seem valid, according to the [Accessibility](./accessibility#language-identification) chapter. 
-22.36% of all documents specify no `lang` attribute. The commonly accepted view is that [they should](https://www.w3.org/TR/i18n-html-tech-lang/#overall), but beside the idea that software could eventually [detect language automatically](https://meiert.com/en/blog/lang/), document language can also be specified [on the protocol level](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Language). This is something we didn't check.
+22.36% of all documents specify no `lang` attribute. The commonly accepted view is that [they should](https://www.w3.org/TR/i18n-html-tech-lang/#overall), but, quite apart from the idea that software could eventually [detect language automatically](https://meiert.com/en/blog/lang/), document language can also be specified [on the protocol level](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Language), which is something we didn't check.
-Here are the 10 most popular (normalized) languages in our sample. It's important to note that the HTTP Archive crawls from US data centers with English language settings, so looking at the language pages are written in will be skewed towards English. Nevertheless we present the `lang` attributes seen to give some context to the sites analyzed.
+Here are the 10 most popular (normalized) languages in our sample. It's important to note that the HTTP Archive crawls from US data centers with English language settings, so the languages that pages are written in will be skewed towards English. Nevertheless, we present the `lang` attributes seen to give some context to the sites analyzed.
 {{ figure_markup(
   image="document-language.png",
@@ -109,8 +133,6 @@ Here are the 10 most popular (normalized) languages in our sample. 
It's importan description="Bar chart showing the top 10 `lang` attributes used in our crawl with 22.82% of desktop and 22.36% of mobile pages not setting this, `en` being used on 20.09% and 18.08% respectively, `ja` on 15.17% and 13.27%, `es` on 4.86% and 4.09% , `pt-br` on 2.65% and 2.84%, `ru` on 2.21% 2.53%, `en-gb` on 2.35% and 2.19%, `de` on 1.50% and 1.92%, and finally `fr` being used on 1.55% and 1.43% respectively.", sheets_gid="2047285366", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=1873310240&format=interactive", - width=600, - height=371, sql_file="pages_almanac_by_device_and_html_lang.sql" ) }} @@ -123,7 +145,7 @@ Adding comments to code is generally a good practice and HTML comments are there ``` -Although many pages will have been stripped of comments for production, we found that index pages in the 90th percentile are using about 73 comments on mobile, respectively 79 comments on desktop, while in the 10th percentile the number of the comments is about 2. The median page uses 16 (mobile) or 17 comments (desktop). +Although many pages will have been stripped of comments for production, we found that home pages in the 90th percentile are using about 73 comments on mobile, respectively 79 comments on desktop, while in the 10th percentile the number of the comments is about 2. The median page uses 16 (mobile) or 17 comments (desktop). Around 89% of pages contain at least one HTML comment, while about 46% of them contain a conditional comment. @@ -135,9 +157,9 @@ Around 89% of pages contain at least one HTML comment, while about 46% of them c ``` -The above is a non-standard HTML conditional comment. 
While those have proven to be helpful in the past in order to tackle browser differences, they are history for some time as Microsoft [dropped conditional comments](https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/compatibility/hh801214(v=vs.85)) in Internet Explorer 10.
+The above is a non-standard HTML conditional comment. While those have proven to be helpful in the past in order to tackle browser differences, they have been history for some time, as Microsoft [dropped conditional comments](https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/compatibility/hh801214(v=vs.85)) in Internet Explorer 10.
-Still, on the above percentile extremes, we found that web pages are using about 6 conditional comments in the 90th percentile, and 1 comment while in the 10th percentile. Most of the pages include them for helpers such as html5shiv, selectivizr, and respond.js. While being decentish and still active pages, our conclusion is that many of them were using obsolete CMS themes.
+Still, at the above percentile extremes, we found that web pages use about 6 conditional comments in the 90th percentile and 1 conditional comment in the 10th percentile. Most of the pages include them for helpers such as [html5shiv](https://github.com/aFarkas/html5shiv), [selectivizr](http://selectivizr.com/), and [respond.js](https://github.com/scottjehl/Respond). While these are decent and still active pages, our conclusion is that many of them use obsolete CMS themes.
 For production, HTML comments are usually stripped by build tools. Considering all the above counts and percentages, and referring to the use of comments in general, we suppose that lots of pages are served without involving an HTML minifier. 
@@ -149,22 +171,20 @@ Overall, around 2% of pages contain no scripting at all, not even structured dat At the opposite end of the spectrum, the numbers show that about 97% of pages contain at least one script, either inline or external. -{# TODO(analysts): We still have a problem here with the x-axis label (“Containing”). Can someone help out and look at this? #} {{ figure_markup( image="script-use.png", + alt="Usage of the script element.", caption="Usage of the script element.", description="Percentages of pages (not) containing scripts, and scripts are present in almost every form on almost every page.", sheets_gid="150962402", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=1895084382&format=interactive", - width=600, - height=371, sql_file="pages_almanac_by_device.sql" ) }} When scripting is unsupported or turned off in the browser, the `noscript` element helps to add an HTML section within a page. Considering the above script numbers, we were curious about the `noscript` element as well. -Following the analysis, we found that about 49% of pages are using a `noscript` element. At the same time, about 16% of `noscript` elements were containing an `iframe` with a `src` value referring to "googletagmanager.com". +Following the analysis, we found that about 49% of pages are using a `noscript` element. At the same time, about 16% of `noscript` elements contain an `iframe` with a `src` value referring to "googletagmanager.com". This seems to confirm the theory that the total number of `noscript` elements in the wild may be affected by common scripts like Google Tag Manager which enforce users to add a `noscript` snippet after the `body` start tag on a page. 
@@ -198,22 +218,16 @@ The median web page, it turns out, uses 30 different elements, 587 times:
   description="Element types per percentile, with 90% of pages using at least 20 different elements.",
   sheets_gid="46490104",
   chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=924238918&format=interactive",
-  width=600,
-  height=371,
   sql_file="pages_element_count_by_device_and_percentile.sql"
   )
 }}
-{# Editors note: The caption for the two figures below is intentionally identical. #}
-
 {{ figure_markup(
   image="element-diversity.png",
-  caption="Distribution of the total number elements per page.",
+  caption="Distribution of the total number of elements per page by percentile.",
   description="Elements per percentile, showing how 10% of all pages employ more than 1,665 elements.",
   sheets_gid="46490104",
   chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=680594018&format=interactive",
-  width=600,
-  height=371,
   sql_file="pages_element_count_by_device_and_percentile.sql"
   )
 }}
@@ -228,8 +242,6 @@ How are these elements distributed?
   description="Element distribution in a scatter plot, and even for a trained observer it's hard to parse it; interesting is a large group of about 7,500 pages each using roughly 250 elements, after which fewer and fewer pages get back to more and more elements.",
   sheets_gid="1361520223",
   chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQPKzFb574UnGTcfw5mcD1qR7RYHyGjQTc2hiMuYix0QoTH1DPe54Q2JucXL8bfZ6kjRoAfhk3ckudc/pubchart?oid=1468756779&format=interactive",
-  width=600,
-  height=371,
   sql_file="pages_element_count_by_device_and_element_count.sql"
   )
 }}
@@ -240,28 +252,76 @@ Not that much changed [compared to 2019](../2019/markup#fig-3)! 
In 2019, the Markup chapter of the Web Almanac featured the most frequently used elements in reference to [Ian Hickson's work in 2005](https://web.archive.org/web/20060203031713/http://code.google.com/webstats/2005-12/elements.html). We found this useful and had a look at that data again: -
-| 2005 | 2019 | 2020 |
-|---|---|---|
-| `title` | `div` | `div` |
-| `a` | `a` | `a` |
-| `img` | `span` | `span` |
-| `meta` | `li` | `li` |
-| `br` | `img` | `img` |
-| `table` | `script` | `script` |
-| `td` | `p` | `p` |
-| `tr` | `option` | `link` |
-| | | `i` |
-| | | `option`|
-
-
{{ figure_link(caption="The most popular elements in 2005, 2019, and 2020.", sheets_gid="781932961", sql_file="pages_element_count_by_device_and_element_type_frequency.sql") }}
+
+| 2005 | 2019 | 2020 |
+|---|---|---|
+| `title` | `div` | `div` |
+| `a` | `a` | `a` |
+| `img` | `span` | `span` |
+| `meta` | `li` | `li` |
+| `br` | `img` | `img` |
+| `table` | `script` | `script` |
+| `td` | `p` | `p` |
+| `tr` | `option` | `link` |
+| | | `i` |
+| | | `option` |
+
{{ figure_link(caption="The most popular elements in 2005, 2019, and 2020.", sheets_gid="781932961", sql_file="pages_element_count_by_device_and_element_type_frequency.sql") }}
 Nothing changed in the Top 7, but the `option` element went a little out of favor and dropped from 8 to 10, letting both the `link` and the `i` element pass in popularity. These elements have risen in use, possibly due to an increase in use of [resource hints](./resource-hints) (as with prerendering and prefetching), as well icon solutions like [Font Awesome](https://fontawesome.com/), which _de facto_ misuses `i` elements for the purpose of displaying icons.
 #### `details` and `summary`
-Another thing we were curious about was the use of the [`details` and `summary` elements](https://html.spec.whatwg.org/multipage/rendering.html#the-details-and-summary-elements), especially since 2020 brought [broad support](https://caniuse.com/details). Are they being used? Are they attractive for—even popular—among authors? As it turns out, only 0.39% of all tested pages are using them, although it's hard to gauge whether they were all used the correct way in exactly the situations when you need them, "popular" is the wrong word.
+Another thing we were curious about was the use of the [`details` and `summary` elements](https://html.spec.whatwg.org/multipage/rendering.html#the-details-and-summary-elements), especially since 2020 brought [broad support](https://caniuse.com/details). Are they being used? Are they attractive to, or even popular among, authors? As it turns out, only 0.39% of all tested pages are using them, although it's hard to gauge whether they were all used the correct way in exactly the situations when you need them.
 Here's a simple example showing the use of a `summary` in a `details` element:
@@ -303,40 +363,93 @@ Accordingly, we looked at the number of `details` and `summary` elements and it
-
{{ figure_link(caption="Adoption of the details and summary elements.", sheets_gid="1406534257", sql_file="pages_element_count_by_device.sql") }}
+
{{ figure_link(caption="Adoption of the details and summary elements.", sheets_gid="1406534257", sql_file="pages_element_count_by_device.sql") }}
### Probability of element use Taking another look at element popularity, how likely is it to find a certain element in the DOM of a page? Surely, `html`, `head`, `body` are present on every page (even though [their tags are all optional](https://meiert.com/en/blog/optional-html/)), making them common elements, but what other elements are to be found? -
-| Element | Probability |
-|---|---|
-| `title` | 99.34% |
-| `meta` | 99.00% |
-| `div` | 98.42% |
-| `a` | 98.32% |
-| `link` | 97.79% |
-| `script` | 97.73% |
-| `img` | 95.83% |
-| `span` | 93.98% |
-| `p` | 88.71% |
-| `ul` | 87.68% |
-
-
{{ figure_link(caption="High probabilities of finding a given element in pages of the Web Almanac 2020 sample.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}
+
+| Element | Probability |
+|---|---|
+| `title` | 99.34% |
+| `meta` | 99.00% |
+| `div` | 98.42% |
+| `a` | 98.32% |
+| `link` | 97.79% |
+| `script` | 97.73% |
+| `img` | 95.83% |
+| `span` | 93.98% |
+| `p` | 88.71% |
+| `ul` | 87.68% |
+
{{ figure_link(caption="High probabilities of finding a given element in pages of the Web Almanac 2020 sample.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}
-Standard elements are those that are or were part of the HTML specification. Which ones are you really rarely to find? In our sample, that would bring up the following: - -
-| Element | Probability |
-|---|---|
-| `dir` | 0.0082% |
-| `rp` | 0.0087% |
-| `basefont` | 0.0092% |
+Standard elements are those that are or were part of the HTML specification. Which ones are rare to find? In our sample, that would bring up the following:
-
{{ figure_link(caption="Low probabilities of finding a given element in pages of the sample.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}
+
+| Element | Probability |
+|---|---|
+| `dir` | 0.0082% |
+| `rp` | 0.0087% |
+| `basefont` | 0.0092% |
+
{{ figure_link(caption="Low probabilities of finding a given element in pages of the sample.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}
We're including these elements to give an idea what elements may have gone out of favor. But while `dir` and `basefont` were last specified in XHTML 1.0 (2000), the rare use of `rp`, which has been mentioned [as early as 1998](https://www.w3.org/TR/1998/WD-ruby-19981221/#a2-4) but which is also [still part of HTML](https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-rp-element), may just suggest that Ruby markup is not very popular. @@ -345,34 +458,94 @@ We're including these elements to give an idea what elements may have gone out o The 2019 edition of the Web Almanac handled [custom elements](../2019/markup#custom-elements) by discussing several non-standard elements. This year, we found it valuable to have a closer look at custom elements. How did we determine these? Roughly by looking at [their definition](https://html.spec.whatwg.org/multipage/custom-elements.html#custom-elements-core-concepts), notably their use of a hyphen. Let's focus on the top elements, in this case elements used on ≥1% of all URLs in the sample: -{# TODO(authors, analysts): Clarify occurrences and percentages _of what_. Pages? Elements? #} - -
-| Element | Occurrences | Percentage |
-|---|---|---|
-| `ym-measure` | 141,156 | 2.22% |
-| `wix-image` | 76,969 | 1.21% |
-| `rs-module-wrap` | 71,272 | 1.12% |
-| `rs-module` | 71,271 | 1.12% |
-| `rs-slide` | 70,970 | 1.12% |
-| `rs-slides` | 70,993 | 1.12% |
-| `rs-sbg-px` | 70,414 | 1.11% |
-| `rs-sbg-wrap` | 70,414 | 1.11% |
-| `rs-sbg` | 70,413 | 1.11% |
-| `rs-progress` | 70,651 | 1.11% |
-| `rs-mask-wrap` | 63,871 | 1.01% |
-| `rs-loop-wrap` | 63,870 | 1.01% |
-| `rs-layer-wrap` | 63,849 | 1.01% |
-| `wix-iframe` | 63,590 | 1% |
-
-
{{ figure_link(caption="The 14 most popular custom elements.", sheets_gid="770933671", sql_file="pages_element_count_by_device_and_custom_dash_elements.sql") }}
+
+| Element | Pages | Percentage |
+|---|---|---|
+| `ym-measure` | 141,156 | 2.22% |
+| `wix-image` | 76,969 | 1.21% |
+| `rs-module-wrap` | 71,272 | 1.12% |
+| `rs-module` | 71,271 | 1.12% |
+| `rs-slide` | 70,970 | 1.12% |
+| `rs-slides` | 70,993 | 1.12% |
+| `rs-sbg-px` | 70,414 | 1.11% |
+| `rs-sbg-wrap` | 70,414 | 1.11% |
+| `rs-sbg` | 70,413 | 1.11% |
+| `rs-progress` | 70,651 | 1.11% |
+| `rs-mask-wrap` | 63,871 | 1.01% |
+| `rs-loop-wrap` | 63,870 | 1.01% |
+| `rs-layer-wrap` | 63,849 | 1.01% |
+| `wix-iframe` | 63,590 | 1% |
+
{{ figure_link(caption="The 14 most popular custom elements.", sheets_gid="770933671", sql_file="pages_element_count_by_device_and_custom_dash_elements.sql") }}
These elements come from three sources: [Yandex Metrica](https://metrica.yandex.com/about) (`ym-`), an analytics solution we've also seen last year; [Slider Revolution](https://www.sliderrevolution.com/) (`rs-`), a WordPress slider, for which there are more elements to be found near the top of the sample; and [Wix](https://www.wix.com/) (`wix-`), a website builder. -{# TODO(authors, analysts): What do "cases" mean here: pages/elements? And for desktop or mobile? #} - -Other groups that stand out include [AMP markup](https://amp.dev/) with `amp-` elements like `amp-img` (11,700 cases), `amp-analytics` (10,256) and `amp-auto-ads` (7,621), as well as [Angular](https://angular.io/) `app-` elements like `app-root` (16,314), `app-footer` (6,745), and `app-header` (5,274). +Other groups that stand out include [AMP markup](https://amp.dev/) with `amp-` elements like `amp-img` (11,700 pages), `amp-analytics` (10,256) and `amp-auto-ads` (7,621), as well as [Angular](https://angular.io/) `app-` elements like `app-root` (16,314), `app-footer` (6,745), and `app-header` (5,274). ### Obsolete elements @@ -380,20 +553,64 @@ There are more questions to ask about the use of HTML, and one may relate to obs In our mobile dataset of 6.3 million pages, around 0.9 million pages (14.01%) contain one or more of these elements. Here are the top 9, which are used more than 10,000 times: -
-| Element | Occurrences | Pages (%) |
-|---|---|---|
-| `center` | 458,402 | 7.22% |
-| `font` | 430,987 | 6.79% |
-| `marquee` | 67,781 | 1.07% |
-| `nobr` | 31,138 | 0.49% |
-| `big` | 27,578 | 0.43% |
-| `frame` | 19,363 | 0.31% |
-| `frameset` | 19,163 | 0.30% |
-| `strike` | 17,438 | 0.27% |
-| `noframes` | 15,016 | 0.24% |
-
-
{{ figure_link(caption="Obsolete elements with more than 10,000 uses.", sheets_gid="1972617631", sql_file="pages_element_count_by_device_and_obsolete_elements.sql") }}
+
+| Element | Pages | Percentage |
+|---|---|---|
+| `center` | 458,402 | 7.22% |
+| `font` | 430,987 | 6.79% |
+| `marquee` | 67,781 | 1.07% |
+| `nobr` | 31,138 | 0.49% |
+| `big` | 27,578 | 0.43% |
+| `frame` | 19,363 | 0.31% |
+| `frameset` | 19,163 | 0.30% |
+| `strike` | 17,438 | 0.27% |
+| `noframes` | 15,016 | 0.24% |
+
{{ figure_link(caption="Obsolete elements with more than 10,000 uses.", sheets_gid="1972617631", sql_file="pages_element_count_by_device_and_obsolete_elements.sql") }}
Even `spacer` is still being used 1,584 times, and present on every 5,000th page. We know that Google has been using a `center` element on [their homepage](https://www.google.com/) [for 22 years](https://web.archive.org/web/19981202230410/https://www.google.com/) now, but why are there so many imitators? @@ -406,21 +623,58 @@ If you were wondering: The total number of [`isindex`](https://www.w3.org/TR/htm In our set of elements we found some that were neither standard HTML (nor SVG nor MathML) elements, nor custom ones, nor obsolete ones, but somewhat proprietary ones. The top 10 that we identified are the following: -
-| Element | Pages (%) |
-|---|---|
-| `noindex` | 0.89% |
-| `jdiv` | 0.85% |
-| `mediaelementwrapper` | 0.49% |
-| `ymaps` | 0.26% |
-| `yatag` | 0.20% |
-| `ss` | 0.11% |
-| `include` | 0.08% |
-| `olark` | 0.07% |
-| `h7` | 0.06% |
-| `limespot` | 0.05% |
-
-
{{ figure_link(caption="Elements of questionable heritage.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}
+
+| Element | Pages (%) |
+|---|---|
+| `noindex` | 0.89% |
+| `jdiv` | 0.85% |
+| `mediaelementwrapper` | 0.49% |
+| `ymaps` | 0.26% |
+| `yatag` | 0.20% |
+| `ss` | 0.11% |
+| `include` | 0.08% |
+| `olark` | 0.07% |
+| `h7` | 0.06% |
+| `limespot` | 0.05% |
+
{{ figure_link(caption="Elements of questionable heritage.", sheets_gid="184700688", sql_file="pages_element_count_by_device_and_element_type_present.sql") }}
The source of these elements appears to be mixed, as in some are unknown while others can be traced. The most popular one, `noindex`, is probably due to [Yandex's recommendation](https://yandex.com/support/webmaster/adding-site/indexing-prohibition.html) of it to prohibit page indexing. `jdiv` was noted in [last year's Web Almanac](../2019/markup#products-and-libraries-and-their-custom-markup) and is from JivoChat. `mediaelementwrapper` comes from the MediaElement media player. Both `ymaps` and `yatag` are also from Yandex. The `ss` element could be from ProStores, a former ecommerce product from eBay, and `olark` may be from the Olark chat software. `h7` appears to be a mistake. `limespot` is probably related to the Limespot personalization program for ecommerce. None of these elements are part of a web standard. @@ -429,28 +683,76 @@ The source of these elements appears to be mixed, as in some are unknown while o [Headings](https://html.spec.whatwg.org/multipage/dom.html#heading-content) make for a special category of elements that play an important role in [sectioning](https://html.spec.whatwg.org/multipage/dom.html#sectioning-content-2) and for [accessibility](https://www.w3.org/WAI/tutorials/page-structure/headings/). -
-| Heading | Occurrences | Average per page |
-|---|---|---|
-| `h1` | 10,524,810 | 1.66 |
-| `h2` | 37,312,338 | 5.88 |
-| `h3` | 44,135,313 | 6.96 |
-| `h4` | 20,473,598 | 3.23 |
-| `h5` | 8,594,500 | 1.36 |
-| `h6` | 3,527,470 | 0.56 |
-
-
{{ figure_link(caption="Frequency and average use of standard heading elements.", sheets_gid="277662548", sql_file="pages_wpt_bodies_by_device_and_percentile_and_heading_level.sql") }}
+
+| Heading | Occurrences | Average per page |
+|---|---|---|
+| `h1` | 10,524,810 | 1.66 |
+| `h2` | 37,312,338 | 5.88 |
+| `h3` | 44,135,313 | 6.96 |
+| `h4` | 20,473,598 | 3.23 |
+| `h5` | 8,594,500 | 1.36 |
+| `h6` | 3,527,470 | 0.56 |
+
{{ figure_link(caption="Frequency and average use of standard heading elements.", sheets_gid="277662548", sql_file="pages_wpt_bodies_by_device_and_percentile_and_heading_level.sql") }}
You might have expected to only see the standard `<h1>` to `<h6>` elements, but some sites actually use more levels:
-
-| Heading | Occurrences | Average per page |
-|---|---|---|
-| `h7` | 30,073 | 0.005 |
-| `h8` | 9,266 | 0.0015 |
-
-
{{ figure_link(caption="Frequency and average use of non-standard heading elements.", sheets_gid="277662548", sql_file="pages_wpt_bodies_by_device_and_percentile_and_heading_level.sql") }}
+
+| Heading | Occurrences | Average per page |
+|---|---|---|
+| `h7` | 30,073 | 0.005 |
+| `h8` | 9,266 | 0.0015 |
+
{{ figure_link(caption="Frequency and average use of non-standard heading elements.", sheets_gid="277662548", sql_file="pages_wpt_bodies_by_device_and_percentile_and_heading_level.sql") }}
The last two have never been part of HTML, of course, and should not be used. @@ -463,21 +765,69 @@ This section focuses on how attributes are used in documents and explores patter Similar to the section on the most [popular elements](#top-elements), this section delves into the most popular attributes on the web. Given how important the `href` attribute is for the web itself, or the `alt` attribute in order to make information [accessible](./accessibility), would these be most popular attributes? -
-| Attribute | Occurrences | Percentage |
-|---|---|---|
-| `class` | 2,998,695,114 | 34.23% |
-| `href` | 928,704,735 | 10.60% |
-| `style` | 523,148,251 | 5.97% |
-| `id` | 452,110,137 | 5.16% |
-| `src` | 341,604,471 | 3.90% |
-| `type` | 282,298,754 | 3.22% |
-| `title` | 231,960,356 | 2.65% |
-| `alt` | 172,668,703 | 1.97% |
-| `rel` | 171,802,460 | 1.96% |
-| `value` | 140,666,779 | 1.61% |
-
-
{{ figure_link(caption="Top 10 attributes by frequency of use.", sheets_gid="1348855449", sql_file="pages_almanac_by_device_and_attribute_name_frequency.sql") }}
+
+| Attribute | Occurrences | Percentage |
+|---|---|---|
+| `class` | 2,998,695,114 | 34.23% |
+| `href` | 928,704,735 | 10.60% |
+| `style` | 523,148,251 | 5.97% |
+| `id` | 452,110,137 | 5.16% |
+| `src` | 341,604,471 | 3.90% |
+| `type` | 282,298,754 | 3.22% |
+| `title` | 231,960,356 | 2.65% |
+| `alt` | 172,668,703 | 1.97% |
+| `rel` | 171,802,460 | 1.96% |
+| `value` | 140,666,779 | 1.61% |
+
{{ figure_link(caption="Top 10 attributes by frequency of use.", sheets_gid="1348855449", sql_file="pages_almanac_by_device_and_attribute_name_frequency.sql") }}
The most popular attribute is `class`, with nearly 3 billion occurrences in our dataset and constituting 34% of all attributes in use. `class` is by far the most prevalent attribute. @@ -488,21 +838,58 @@ The `value` attribute, which specifies the value of an `input` element, surprisi Are there attributes that we find in every document? Not quite, but almost: -
-Element | Pages (%)
--- | --
-href | 99.21%
-src | 99.18%
-content | 98.88%
-name | 98.61%
-type | 98.55%
-class | 98.24%
-rel | 97.98%
-id | 97.46%
-style | 95.95%
-alt | 90.75%
-
-
{{ figure_link( +
+| Attribute | Pages (%) |
+|---|---|
+| `href` | 99.21% |
+| `src` | 99.18% |
+| `content` | 98.88% |
+| `name` | 98.61% |
+| `type` | 98.55% |
+| `class` | 98.24% |
+| `rel` | 97.98% |
+| `id` | 97.46% |
+| `style` | 95.95% |
+| `alt` | 90.75% |
+
{{ figure_link( caption="Top 10 attributes by page.", sheets_gid="1185369559", sql_file="pages_almanac_by_device_and_attribute_name_present.sql" @@ -517,25 +904,64 @@ Per the HTML spec, [`data-*` attributes](https://html.spec.whatwg.org/multipage/ The two most popular ones stand out because they are almost twice as popular than each of the attributes that followed (with >1% use): -
-| Attribute | Occurrences | Percentage | -|---|---|---| -| `data-src` | 26,734,560 | 3.30% | -| `data-id` | 26,596,769 | 3.28% | -| `data-toggle` | 12,198,883 | 1.50% | -| `data-slick-index` | 11,775,250 | 1.45% | -| `data-element_type` | 11,263,176 | 1.39% | -| `data-type` | 11,130,662 | 1.37% | -| `data-requiremodule` | 8,303,675 | 1.02% | -| `data-requirecontext` | 8,302,335 | 1.02% | - -
{{ figure_link(caption="The most popular data-* attributes.", sheets_gid="764700773", sql_file="pages_almanac_by_device_and_data_attribute_name_frequency.sql") }}
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
AttributeOccurrencesPercentage
data-src26,734,5603.30%
data-id26,596,7693.28%
data-toggle12,198,8831.50%
data-slick-index11,775,2501.45%
data-element_type11,263,1761.39%
data-type11,130,6621.37%
data-requiremodule8,303,6751.02%
data-requirecontext8,302,3351.02%
+
{{ figure_link(caption="The most popular data-* attributes.", sheets_gid="764700773", sql_file="pages_almanac_by_device_and_data_attribute_name_frequency.sql") }}
Attributes like `data-type`, `data-id`, and `data-src` can have multiple generic uses although `data-src` is used a lot with lazy image loading via JavaScript (e.g., Bootstrap 4). [Bootstrap](https://getbootstrap.com/) again explains the presence of `data-toggle`, where it's used as a state styling hook on toggle buttons. The [Slick carousel plugin](https://kenwheeler.github.io/slick/) is the source of `data-slick-index`, whereas `data-element_type` is part of [Elementor's WordPress website builder](https://elementor.com/). Both `data-requiremodule` and `data-requirecontext`, then, are part of [RequireJS](https://requirejs.org/). -{# TODO(authors): Update this interpretation given that the lazy loading stat is in terms of pages, not img elements. #} -Interestingly, the use of native lazy loading on images is similar to that of `data-src`. 3.86% of pages use the `` attribute. This appears to be growing very fast, as back in February, this number was about [0.8%](https://twitter.com/zcorpan/status/1237016679667970050). It's possible that these are being used together for a [cross-browser solution](https://addyosmani.com/blog/lazy-loading/). +Interestingly, the use of native lazy loading on images is similar to that of `data-src`. [3.86% of pages](https://docs.google.com/spreadsheets/d/1ram47FshAjzvbQVJbAQPgxZN7PPOPCKIK67VJZCo92c/edit#gid=2109061092) use the `` attribute. This appears to be growing very fast, as back in February, this number was about [0.8%](https://twitter.com/zcorpan/status/1237016679667970050). It's possible that these are being used together for a [cross-browser solution](https://addyosmani.com/blog/lazy-loading/). ## Miscellaneous @@ -549,17 +975,49 @@ Users should be able to zoom and scale the text [up to 500%](https://dequeuniver We had a look at the data and in order to better understand the results, we normalized it by removing spaces, converting everything to lowercase, and sorting by comma values of the `content` attribute. -
-| Content attribute value | Occurrences | Pages (%) | -|---|---|---| -| `initial-scale=1,width=device-width` | 2,728,491 | 42.98% | -| blank | 688,293 | 10,84% | -| `initial-scale=1,maximum-scale=1,width=device-width` | 373,136 | 5.88% | -| `initial-scale=1,maximum-scale=1,user-scalable=no,width=device-width` | 352,972 | 5.56% | -| `initial-scale=1,maximum-scale=1,user-scalable=0,width=device-width` | 249,662 | 3.93% | -| `width=device-width` | 231,668 | 3.65% | - -
{{ figure_link(caption="viewport specifications, and lack thereof.", sheets_gid="1414206386", sql_file="summary_pages_by_device_and_viewport.sql") }}
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Content attribute valuePagesPages (%)
initial-scale=1,width=device-width2,728,49142.98%
blank688,29310,84%
initial-scale=1,maximum-scale=1,width=device-width373,1365.88%
initial-scale=1,maximum-scale=1,user-scalable=no,width=device-width352,9725.56%
initial-scale=1,maximum-scale=1,user-scalable=0,width=device-width249,6623.93%
width=device-width231,6683.65%
+
{{ figure_link(caption="viewport specifications, and lack thereof.", sheets_gid="1414206386", sql_file="summary_pages_by_device_and_viewport.sql") }}
The results show that almost half of the pages we analyzed are using the typical viewport `content` value. Still, around 10% of mobile pages are entirely missing a proper `content` value for the viewport meta element, with the rest of them using an improper combination of `maximum-scale`, `minimum-scale`, `user-scalable=no`, or `user-scalable=0`. @@ -574,20 +1032,64 @@ The situation around favicons is fascinating. Favicons work with or without mark When we built our tests we didn't check for the presence of images, but only looked at the markup. That means, when you review the following, note that it's more about _how_ favicons are referenced rather than whether or how often they are used. -
-| Favicon format | Occurrences | Pages (%) | -|---|---|---| -| ICO | 2,245,646 | 35.38% | -| PNG | 1,966,530 | 30.98% | -| No favicon defined | 1,643,136 | 25.88% | -| JPG | 319,935 | 5.04% | -| No extension specified (no format identifiable) | 37,011 | 0.58% | -| GIF | 34,559 | 0.54% | -| WebP | 10,605 | 0.17% | -| … | | | -| SVG | 5,328 | 0.08% | - -
{{ figure_link(caption="Common favicon formats.", sheets_gid="1930085905", sql_file="pages_almanac_by_device_and_favicon_image_type.sql") }}
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Favicon formatPagesPages (%)
ICO2,245,64635.38%
PNG1,966,53030.98%
No favicon defined1,643,13625.88%
JPG319,9355.04%
No extension specified (no format identifiable)37,0110.58%
GIF34,5590.54%
WebP10,6050.17%
SVG5,3280.08%
+
{{ figure_link(caption="Common favicon formats.", sheets_gid="1930085905", sql_file="pages_almanac_by_device_and_favicon_image_type.sql") }}
There are a couple of surprises in here: @@ -608,33 +1110,90 @@ There has been a lot of [discussion](https://adrianroselli.com/2016/01/links-but sql_file="pages_markup_by_device.sql" ) }} -{# TODO(analysts): Where do these "occurrences" come from? Ideally we have a single sheet to link to with the results used by this table. #} -
-| Button types | Occurrences | Percentage | -|---|---|---| -| `