Skip to content

asaladino/site-content

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Site Content

Extracts content from a bunch saved html documents. Should be used with site-index.

To start the content extraction, run:

node ./lib/site-index.js --domain "codingsimply.com" --verbose --html --type crawl --output "/some/reports"
node ./lib/site-content.js --domain "codingsimply.com" --verbose --output "/some/reports"

Domain and output folder are required parameters. To see a list of parameters, run

node ./lib/site-content.js --help

Site Content

  Extracts content from a bunch of urls. 

Options

  --domain www.domain.com   (Required) Domain to extract contents.       
  --output file             (Required) Folder to output the contents to. 
  --verbose                 Output information on the reporting.         
  --webSearch               Start a web server for search.               
  --help                    Print this usage guide. 

About

Extracts content from a saved html document.

Resources

License

Stars

Watchers

Forks

Packages

No packages published