Merge pull request #323 from extractus/7.2.6

v7.2.6 - Migrate to extractus org
extractus · Nov 30, 2022 · f31c80f
2 parents 8acc3d3 + edfcc1d
commit f31c80f

Showing 21 changed files with 145 additions and 136 deletions.
diff --git a/.github/workflows/ci-test.yml b/.github/workflows/ci-test.yml
@@ -29,8 +29,8 @@ jobs:
         npm run build --if-present
         npm run test
 
-    - name: sync to coveralls
-      uses: coverallsapp/github-action@v1.1.2
+    - name: Coveralls GitHub Action
+      uses: coverallsapp/github-action@1.1.3
       with:
         github-token: ${{ secrets.GITHUB_TOKEN }}
 

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,20 +1,24 @@
-# Contributing to article-parser
+# Contributing to `@extractus/article-extractor`
 
-While `article-parser` is just a simple library with personal purpose, I'm happy if it can be useful for you too.
+Glad to see you here.
 
-Anyway, I hope it always gets better, so pull requests are welcome, though larger proposals should be discussed first.
+Collaborations and pull requests are always welcomed, though larger proposals should be discussed first.
 
-As an OSS, it should follow the Unix philosophy: "do one thing and do it well".
+As an OSS, it's better to follow the Unix philosophy: "do one thing and do it well".
 
-## Installation
+## What you can contribute?
 
-- Ensure you have `node` and `npm` installed.
-- After cloning the repository, run `npm install` in the root of the repository.
-- Run `npm test` to ensure that everything works correctly in your environment.
+We are planing to re-write this tool in TypeScript and make it Deno-first library.
+If you are interested, please join our team.
 
-If it works well, you can start modifying your fork.
+Besides that, you can also:
 
-In this process, you can use [`npm run eval` command](https://github.com/ndaidong/article-parser#quick-evaluation) to evaluate your changes.
+- Test and report bugs
+- Fix unresolved issues
+- Improve performance
+- Write better documentation
+- Create examples or build demos
+- Feedback on software design and APIs
 
 
 ## Third-party libraries
@@ -32,7 +36,7 @@ Make sure your code lints before opening a pull request.
 
 
 ```bash
-cd article-parser
+cd article-extractor
 
 # check coding convention issue
 npm run lint
@@ -49,18 +53,18 @@ npm run lint:fix
 Be sure to run the unit test suite before opening a pull request. An example test run is shown below.
 
 ```bash
-cd article-parser
+cd article-extractor
 npm test
 ```
 
-![feed-reader unit test](https://i.imgur.com/1ycj7Ks.png)
+![article-extractor unit test](https://i.imgur.com/1ycj7Ks.png)
 
 If test coverage decreased, please check test scripts and try to improve this number.
 
 
 ## Documentation
 
-If you've changed APIs, please update README and [the examples](https://github.com/ndaidong/article-parser/tree/main/examples).
+If you've changed APIs, please update README and [the examples](examples).
 
 
 ## Clean commit histories
@@ -79,6 +83,6 @@ For people new to git, please refer the following guides:
 
 ## License
 
-By contributing to `article-parser`, you agree that your contributions will be licensed under its [MIT license](https://github.com/ndaidong/article-parser/blob/main/LICENSE).
+By contributing to `@extractus/article-extractor`, you agree that your contributions will be licensed under its [MIT license](LICENSE).
 
 ---
diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (c) 2016 Dong Nguyen
+Copyright (c) 2016 Extractus
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/README.md b/README.md
@@ -1,35 +1,27 @@
-# article-parser
+# @extractus/article-extractor
 
 Extract main article, main image and meta data from URL.
 
-[![NPM](https://badge.fury.io/js/article-parser.svg)](https://badge.fury.io/js/article-parser)
-![CI test](https://github.com/ndaidong/article-parser/workflows/ci-test/badge.svg)
-[![Coverage Status](https://coveralls.io/repos/github/ndaidong/article-parser/badge.svg)](https://coveralls.io/github/ndaidong/article-parser)
-![CodeQL](https://github.com/ndaidong/article-parser/workflows/CodeQL/badge.svg)
+[![npm version](https://badge.fury.io/js/@extractus%2Farticle-extractor.svg)](https://badge.fury.io/js/@extractus%2Farticle-extractor)
+![CI test](https://github.com/extractus/article-extractor/workflows/ci-test/badge.svg)
+[![Coverage Status](https://img.shields.io/coveralls/github/extractus/article-extractor)](https://coveralls.io/github/extractus/article-extractor?branch=main)
+![CodeQL](https://github.com/extractus/article-extractor/workflows/CodeQL/badge.svg)
 [![JavaScript Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://standardjs.com)
 
 
 ## Intro
 
-*article-parser* is a part of tool sets for content builder:
+*article-extractor* is a part of tool sets for content builder:
 
-- [feed-reader](https://github.com/ndaidong/feed-reader): extract & normalize RSS/ATOM/JSON feed
-- [article-parser](https://github.com/ndaidong/article-parser): extract main article from given URL
-- [oembed-parser](https://github.com/ndaidong/oembed-parser): extract oEmbed data from supported providers
+- [feed-extractor](https://github.com/extractus/feed-extractor): extract & normalize RSS/ATOM/JSON feed
+- [article-extractor](https://github.com/extractus/article-extractor): extract main article from given URL
+- [oembed-extractor](https://github.com/extractus/oembed-extractor): extract oEmbed data from supported providers
 
 You can use one or combination of these tools to build news sites, create automated content systems for marketing campaign or gather dataset for NLP projects...
 
-```
-                                    ┌────────────────┐
-                            ┌───────► article-parser ├──────────┐
-                            │       └────────────────┘          │
-┌─────────────┐   ┌─────────┴────┐                     ┌────────▼─────────┐   ┌─────────────┐
-│ feed-reader ├───► feed entries │                     │ content database ├───► public APIs │
-└─────────────┘   └─────────┬────┘                     └────────▲─────────┘   └─────────────┘
-                            │       ┌────────────────┐          │
-                            └───────► oembed-parser  ├──────────┘
-                                    └────────────────┘
-```
+### Attention
+
+`article-parser` has been renamed to `@extractus/article-extractor` since v7.2.5
 
 ## Demo
 
@@ -42,39 +34,43 @@ You can use one or combination of these tools to build news sites, create automa
 ### Node.js
 
 ```bash
-npm i article-parser
+npm i @extractus/article-extractor
 
 # pnpm
-pnpm i article-parser
+pnpm i @extractus/article-extractor
 
 # yarn
-yarn add article-parser
+yarn add @extractus/article-extractor
 ```
 
 ```ts
 // es6 module
-import { extract } from 'article-parser'
+import { extract } from '@extractus/article-extractor'
 
 // CommonJS
-const { extract } = require('article-parser')
+const { extract } = require('@extractus/article-extractor')
 
 // or specify exactly path to CommonJS variant
-const { extract } = require('article-parser/dist/cjs/article-parser.js')
+const { extract } = require('@extractus/article-extractor/dist/cjs/article-extractor.js')
 ```
 
 ### Deno
 
 ```ts
-import { extract } from 'https://esm.sh/article-parser'
+// deno > 1.28
+import { extract } from 'npm:@extractus/article-extractor'
+
+// deno < 1.28
+// import { extract } from 'https://esm.sh/@extractus/article-extractor'
 ```
 
 ### Browser
 
 ```ts
-import { extract } from 'https://unpkg.com/article-parser@latest/dist/article-parser.esm.js'
+import { read } from 'https://unpkg.com/@extractus/article-extractor@latest/dist/article-extractor.esm.js'
 ```
 
-Please check [the examples](https://github.com/ndaidong/article-parser/tree/main/examples) for reference.
+Please check [the examples](examples) for reference.
 
 
 ### Deta cloud
@@ -117,7 +113,7 @@ URL string links to the article or HTML content of that web page.
 For example:
 
 ```js
-import { extract } from 'article-parser'
+import { extract } from '@extractus/article-extractor'
 
 const input = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'
 extract(input)
@@ -157,12 +153,14 @@ Object with all or several of the following properties:
 For example:
 
 ```js
-import { extract } from 'article-parser'
+import { extract } from '@extractus/article-extractor'
 
-extract('https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html', {
+const article = await extract('https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html', {
   descriptionLengthThreshold: 120,
   contentLengthThreshold: 500
 })
+
+console.log(article)
 ```
 
 ##### `fetchOptions` *optional*
@@ -172,26 +170,28 @@ You can use this param to set request headers to [fetch](https://developer.mozil
 For example:
 
 ```js
-import { extract } from 'article-parser'
+import { extract } from '@extractus/article-extractor'
 
 const url = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'
-extract(url, null, {
+const article = await extract(url, null, {
   headers: {
     'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
   }
 })
+
+console.log(article)
 ```
 
 You can also specify a proxy endpoint to load remote content, instead of fetching directly.
 
 For example:
 
 ```js
-import { extract } from 'article-parser'
+import { extract } from '@extractus/article-extractor'
 
 const url = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'
 
-extract(url, null, {
+await extract(url, null, {
   headers: {
     'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
   },
@@ -204,7 +204,7 @@ extract(url, null, {
 })
 ```
 
-Passing requests to proxy is useful while running `article-parser` on browser. View [examples/browser-article-parser](https://github.com/ndaidong/article-parser/tree/main/examples/browser-article-parser) as reference example.
+Passing requests to proxy is useful while running `@extractus/article-extractor` on browser. View [examples/browser-article-parser](examples/browser-article-parser) as reference example.
 
 For more info about proxy authentication, please refer [HTTP authentication](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication)
 
@@ -227,7 +227,7 @@ At first, let's talk about `transformation` object.
 
 #### `transformation` object
 
-In `article-parser`, `transformation` is an object with the following properties:
+In `@extractus/article-extractor`, `transformation` is an object with the following properties:
 
 - `patterns`: required, a list of regexps to match the URLs
 - `pre`: optional, a function to process raw HTML
@@ -240,11 +240,11 @@ Basically, the meaning of `transformation` can be interpreted like this:
 > then extract main article content with normalized HTML, and if success <br>
 > let's run `post` function to normalize extracted article content
 
-![article-parser extraction process](https://res.cloudinary.com/pwshub/image/upload/v1657336822/documentation/article-parser_extraction_process.png)
+![article-extractor extraction process](https://res.cloudinary.com/pwshub/image/upload/v1657336822/documentation/article-parser_extraction_process.png)
 
 Here is an example transformation:
 
-```js
+```ts
 {
   patterns: [
     /([\w]+.)?domain.tld\/*/,
@@ -288,8 +288,8 @@ Here is an example transformation:
 
 Add a single transformation or a list of transformations. For example:
 
-```js
-import { addTransformations } from 'article-parser'
+```ts
+import { addTransformations } from '@extractus/article-extractor'
 
 addTransformations({
   patterns: [
@@ -344,7 +344,7 @@ To remove transformations that match the specific patterns.
 For example, we can remove all added transformations above:
 
 ```js
-import { removeTransformations } from 'article-parser'
+import { removeTransformations } from '@extractus/article-extractor'
 
 removeTransformations([
   /([\w]+.)?abc.tld\/*/,
@@ -384,21 +384,21 @@ Suppose that we have the following transformations:
 
 As you can see, an article from `goo.gl` certainly matches both them.
 
-In this scenario, `article-parser` will execute both transformations, one by one:
+In this scenario, `@extractus/article-extractor` will execute both transformations, one by one:
 
 `function_one` -> `function_three` -> extraction -> `function_two` -> `function_four`
 
 ---
 
 ### `sanitize-html`'s options
 
-`article-parser` uses [sanitize-html](https://www.npmjs.com/package/sanitize-html) to make a clean sweep of HTML content.
+`@extractus/article-extractor` uses [sanitize-html](https://www.npmjs.com/package/sanitize-html) to make a clean sweep of HTML content.
 
-Here is the [default options](https://github.com/ndaidong/article-parser/blob/main/src/config.js#L5)
+Here is the [default options](src/config.js#L5)
 
 Depending on the needs of your content system, you might want to gather some HTML tags/attributes, while ignoring others.
 
-There are 2 methods to access and modify these options in `article-parser`.
+There are 2 methods to access and modify these options in `@extractus/article-extractor`.
 
 - `getSanitizeHtmlOptions()`
 - `setSanitizeHtmlOptions(Object sanitizeHtmlOptions)`
@@ -410,8 +410,8 @@ Read [sanitize-html](https://www.npmjs.com/package/sanitize-html#what-are-the-de
 ## Quick evaluation
 
 ```bash
-git clone https://github.com/ndaidong/article-parser.git
-cd article-parser
+git clone https://github.com/extractus/article-extractor.git
+cd article-extractor
 pnpm i
 
 npm run eval {URL_TO_PARSE_ARTICLE}

diff --git a/SECURITY.md b/SECURITY.md
@@ -12,6 +12,6 @@ Description above is a general rule and may be altered on case by case basis.
 
 You can report low severity vulnerabilities as GitHub issues.
 
-More severe vulnerabilities should be reported to my email [email protected] or Twitter [@ndaidong](https://twitter.com/ndaidong).
+More severe vulnerabilities should be reported to email extractus.security@skiff.com.
 
 ---
diff --git a/build.js b/build.js
@@ -38,7 +38,7 @@ const esmVersion = {
   ...baseOpt,
   platform: 'browser',
   format: 'esm',
-  outfile: `dist/${pkg.name}.esm.js`,
+  outfile: 'dist/article-extractor.esm.js',
   banner: {
     js: comment
   }
@@ -50,7 +50,7 @@ const cjsVersion = {
   platform: 'node',
   format: 'cjs',
   mainFields: ['main'],
-  outfile: `dist/cjs/${pkg.name}.js`,
+  outfile: 'dist/cjs/article-extractor.js',
   banner: {
     js: comment
   }
@@ -60,7 +60,7 @@ buildSync(cjsVersion)
 const cjspkg = {
   name: pkg.name,
   version: pkg.version,
-  main: `./${pkg.name}.js`
+  main: './article-extractor.js'
 }
 
 writeFileSync(

diff --git a/build.test.js b/build.test.js
@@ -9,8 +9,8 @@ import {
 
 const pkg = JSON.parse(readFileSync('./package.json'))
 
-const esmFile = `./dist/${pkg.name}.esm.js`
-const cjsFile = `./dist/cjs/${pkg.name}.js`
+const esmFile = './dist/article-extractor.esm.js'
+const cjsFile = './dist/cjs/article-extractor.js'
 const cjsPkg = JSON.parse(readFileSync('./dist/cjs/package.json'))
 const cjsType = './dist/cjs/index.d.ts'
 

diff --git a/dist/article-parser.esm.js b/dist/article-extractor.esm.js
diff --git a/dist/cjs/article-parser.js b/dist/cjs/article-extractor.js
diff --git a/dist/cjs/package.json b/dist/cjs/package.json
@@ -1,5 +1,5 @@
 {
-  "name": "article-parser",
-  "version": "7.2.5",
-  "main": "./article-parser.js"
+  "name": "@extractus/article-extractor",
+  "version": "7.2.6",
+  "main": "./article-extractor.js"
 }
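
For downstream code, the rename in this merge mostly comes down to swapping the package specifier and, for deep imports, the bundled file name. The sketch below shows one way to do that over a source string. `migrateSpecifiers` is a hypothetical helper written for illustration, not part of `@extractus/article-extractor`, and it assumes specifiers are plain quoted strings that start with the old package name.

```javascript
// Hypothetical helper (not shipped with the package): rewrites quoted
// 'article-parser' specifiers to the new scoped package name.
const NEW_NAME = '@extractus/article-extractor'

function migrateSpecifiers (source) {
  // Match the old name directly after an opening quote, followed either by
  // the same closing quote or by a subpath such as /dist/cjs/article-parser.js
  const pattern = /(['"])article-parser(\/[^'"]*)?\1/g
  return source.replace(pattern, (match, quote, subpath = '') => {
    // The bundled file names changed too (see build.js in this diff)
    const newSubpath = subpath.replace(/article-parser/g, 'article-extractor')
    return `${quote}${NEW_NAME}${newSubpath}${quote}`
  })
}

const before = "const { extract } = require('article-parser/dist/cjs/article-parser.js')"
console.log(migrateSpecifiers(before))
// prints: const { extract } = require('@extractus/article-extractor/dist/cjs/article-extractor.js')
```

CDN URLs such as `https://esm.sh/article-parser` are deliberately left untouched by this pattern, since the old name does not follow the opening quote there. Those would need a separate, manual update.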