GitHub - arezki1990/node-osmosis: Web scraper for NodeJS

#Osmosis

HTML/XML parser and web scraper for NodeJS.

##Features

- Fast: uses libxml C bindings
- Lightweight: no dependencies like jQuery, cheerio, or jsdom
- Clean: promise-based interface, no more nested callbacks
- Flexible: supports both CSS and XPath selectors
- Predictable: same input, same output, same order
- Detailed logging for every step
- Precise and natural IO flow: no setTimeout or process.nextTick
- Easy debugging with built-in stack size and memory usage reporting
- Memory leak free

##Example: scrape all craigslist listings

    var osmosis = require('osmosis');

    osmosis
    .get('www.craigslist.org/about/sites')
    .find('h1 + div a')
    .set('location')
    .follow('@href')
    .find('header + div + div li > a')
    .set('category')
    .follow('@href')
    .find('p > a', '.totallink + a.button.next:first')
    .follow('@href')
    .set({
        'title':        'section > h2',
        'description':  '#postingbody',
        'subcategory':  'div.breadbox > span[4]',
        'date':         'time@datetime',
        'latitude':     '#map@data-latitude',
        'longitude':    '#map@data-longitude',
        'images[]':     'img@src'
    })
    .data(function(listing) {
        // do something with listing data
    })
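
The `.data` callback above is left empty. As one possible way to fill it in, the sketch below cleans up a single listing before it is stored. `normalizeListing` is a hypothetical helper, not part of osmosis. It assumes the listing object carries the keys named in the `.set` calls, that every scraped value arrives as a string or is missing, and that the `'images[]'` selector produces an array under the key `images`.

```javascript
// Hypothetical helper (not part of osmosis): turns one scraped listing
// into a record with typed fields.
// Assumption: `listing` uses the keys from the .set() calls in the example
// and every scraped value is a string or undefined.
function normalizeListing(listing) {
    var lat = parseFloat(listing.latitude);
    var lon = parseFloat(listing.longitude);
    var date = new Date(listing.date);

    return {
        location: listing.location || null,
        category: listing.category || null,
        subcategory: listing.subcategory || null,
        title: (listing.title || '').trim(),
        description: (listing.description || '').trim(),
        // Missing or malformed values become null instead of NaN / Invalid Date
        date: isNaN(date.getTime()) ? null : date,
        latitude: isNaN(lat) ? null : lat,
        longitude: isNaN(lon) ? null : lon,
        // Assumption: 'images[]' yields an array stored under `images`
        images: Array.isArray(listing.images) ? listing.images : []
    };
}
```

Inside the callback this could be used as `results.push(normalizeListing(listing))`, where `results` is an array declared before the chain starts. Converting missing values to `null` keeps later code from having to check for `NaN` or invalid dates.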

##Install

    npm install osmosis

##Documentation

For documentation and examples, see https://github.com/rc0x03/node-osmosis/wiki

##Dependencies

- libxmljs: libxml C bindings
- needle: lightweight HTTP wrapper
