Extracts key document metadata from an HTML string.
const parser = require('magnet-html-parser');
parser.parse(html, url).then(result => ...);
The parser will attempt to return the following metadata:
title
- A title of the pagedescription
- A description of the pageimage
- An image that best represents this pageicon
- The site's iconsiteName
- The name of the site which this page belongsthemeColor
- A theme color of the site
What metadata is returned completely depends on how the page is marked-up. We cannot guarantee any results.
If the given URL has a hash fragment extension, magnet-html-parser
will
attempt to fetch an element with the given ID and return a title
, description
and image
from that scope.
title
- ThetextContent
of the first heading (<h1>
,<h2>
,<h3>
, ...) descendant of the fragment elementdescription
- ThetextContent
of the first<p>
descendant of the fragment elementimage
- Thesrc
of the first<img>
descendant of the fragment element.scoped
- Will betrue
when content was found within the given fragment scope.
An adaptor is a custom parser that performs some specific metadata extraction for a matched URL pattern. The library comes with some built in adaptors to cope with some common sites, but you are able to add more.
const parser = require('magnet-html-parser');
const adaptor = parser.adaptor;
parser.use(adaptor.extend({
pattern: /example.com/,
title() {
var mainHeading = $('h1').first();
return mainHeading.text();
},
description() {
var firstParagraph = $('p').first();
return firstParagraph.text();
},
image($, data) {
return $('.main-image').attr('src');
}
...
}));
As you can 'extending' the base adaptor, you only need to define the properties you wish to customise, all other properties will be parsed using the default logic.