This repository has been archived by the owner on Oct 30, 2018. It is now read-only.

Ability to determine String replacements for article titles #3

Open
jiminoc opened this issue Jan 29, 2011 · 2 comments

Comments

@jiminoc
Contributor

jiminoc commented Jan 29, 2011

Goose needs a way to filter out janky article titles that contain multiple delimiters, such as:
Breaking News: KCAL05: This just in - some guy won a million bucks

It is hard to tell where the site's prefix ends and the actual title begins. It would be nice to have a text file where you can add special cases, listing for each domain the text to replace with blanks. For example:

domain       replace
kcal9.com    Breaking News: KCAL05:
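
A minimal sketch of how such a file could be loaded and applied, assuming a tab-separated format (the replacement text itself contains spaces) and using the hypothetical helper names `load_title_replacements` and `clean_title`. None of this is part of Goose's API.

```python
"""Sketch of a per-domain title-replacement file (hypothetical format)."""
from urllib.parse import urlparse

# Hypothetical file contents: one rule per line, "domain<TAB>text to blank out".
# A tab separator is assumed because the replacement text itself contains spaces.
RULES_TEXT = "domain\treplace\nkcal9.com\tBreaking News: KCAL05:\n"


def load_title_replacements(text: str) -> dict:
    """Parse rule lines into {domain: [strings to remove]}, skipping the header row."""
    rules: dict = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # ignore blank or malformed lines
        domain, replace = line.split("\t", 1)
        if domain == "domain":
            continue  # header row
        rules.setdefault(domain.strip().lower(), []).append(replace.strip())
    return rules


def clean_title(title: str, url: str, rules: dict) -> str:
    """Blank out every configured string for the URL's domain, then tidy up."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for target in rules.get(host, []):
        title = title.replace(target, "")
    # Collapse leftover whitespace and trim separator characters at either end.
    return " ".join(title.split()).strip(" -:|")


rules = load_title_replacements(RULES_TEXT)
print(clean_title("Breaking News: KCAL05: This just in - some guy won a million bucks",
                  "http://www.kcal9.com/news/1", rules))
```

Titles from domains without a rule pass through unchanged apart from whitespace cleanup.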

@jiminoc
Contributor Author

jiminoc commented Feb 23, 2012

It should also be possible to add a custom delimiter per domain.
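
One way to sketch a per-domain delimiter, assuming a hypothetical `DOMAIN_DELIMITERS` map and the common heuristic of keeping the longest segment after splitting. That heuristic is an assumption here, not necessarily what Goose does.

```python
"""Sketch: per-domain title delimiters with a longest-segment heuristic."""
from urllib.parse import urlparse

# Hypothetical defaults and per-domain overrides; none of these come from Goose.
DEFAULT_DELIMITERS = [" | ", " - ", ": "]
DOMAIN_DELIMITERS = {"kcal9.com": ": "}


def split_title(title: str, url: str) -> str:
    """Split on the domain's delimiter (or the defaults) and keep the longest piece."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in DOMAIN_DELIMITERS:
        delimiters = [DOMAIN_DELIMITERS[host]]
    else:
        delimiters = DEFAULT_DELIMITERS
    for delim in delimiters:
        if delim in title:
            pieces = [p.strip() for p in title.split(delim)]
            # Assumed heuristic: the longest piece is the headline.
            return max(pieces, key=len)
    return title.strip()
```

With the default delimiters, the example title is split on " - " and the longest piece is "Breaking News: KCAL05: This just in". The kcal9.com override splits on ": " instead and keeps the full headline.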

@shakiba

shakiba commented Mar 23, 2012

I have just hit this problem. One solution could be to use the HTML title just as a hint to find the exact tag that contains the title, for example a tag whose content is included in the HTML title but is no longer than it.
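
A rough sketch of this idea using Python's standard `html.parser`. It assumes that only `h1` to `h3` tags are candidates and that the longest matching candidate wins; `headline_from_title_hint` is a hypothetical name.

```python
"""Sketch: use <title> only as a hint for locating the real headline tag."""
from html.parser import HTMLParser
from typing import List, Optional

# Assumption: only these tags are considered as headline candidates.
CANDIDATE_TAGS = {"h1", "h2", "h3"}


class _TitleHintParser(HTMLParser):
    """Collects the <title> text and the text of each candidate tag."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.candidates: List[str] = []
        self._current: Optional[str] = None  # tag whose text is being captured
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._current is None and (tag == "title" or tag in CANDIDATE_TAGS):
            self._current, self._parts = tag, []

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)  # includes text of nested tags like <a>

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._parts).split())
            if tag == "title":
                self.title = text
            else:
                self.candidates.append(text)
            self._current = None


def headline_from_title_hint(html: str) -> str:
    parser = _TitleHintParser()
    parser.feed(html)
    parser.close()
    # Containment in <title> also guarantees the candidate is no longer than it.
    matches = [c for c in parser.candidates if c and c in parser.title]
    # Assumed fallback: use the raw <title> when no tag matches.
    return max(matches, key=len) if matches else parser.title
```

Real pages would need more care with ties, repeated headings, and titles that differ from the on-page headline only in punctuation.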

raisercostin pushed a commit to raisercostin/goose that referenced this issue Jul 7, 2014