Custom stopword lists #338

Open
oxygala opened this issue Mar 2, 2023 · 4 comments
Labels
enhancement (New feature or request) · processors (Involves self-contained analytical processors.)

Comments

@oxygala

oxygala commented Mar 2, 2023

Would you consider adding a feature that lets users supply custom stopword lists (e.g. in different languages) to the tools that use them?

Thanks

@stijn-uva
Member

Hi @oxygala, yes, I can see how that would be a useful feature. There are a couple of ways to go about this. Which way of providing the stopword lists to the processor(s) would be most convenient from your perspective?

stijn-uva added the enhancement and processors labels Mar 2, 2023
@oxygala
Author

oxygala commented Mar 3, 2023

I think a simple upload box under every relevant processor that uses word lists would do.

@stijn-uva
Member

Both the 'Tokenise' and 'Filter by words or phrases' processors already allow providing custom word lists for filtering: the 'Always delete this words' option for Tokenise and the 'Custom word list' option for the Filter. Is there a specific processor you would like to be able to do this with that I'm not thinking of?

@oxygala
Author

oxygala commented Mar 28, 2023

Yes, and so does hatebase, though typing words separated by commas is not the most practical way to enter a long list. Being able to upload a .txt or .csv file would be nice.
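
For illustration, the upload idea could be sketched roughly as follows. This is a minimal sketch and not how any existing processor works: `parse_stopwords` and `remove_stopwords` are hypothetical helper names, and it assumes the uploaded file has already been read into a string. It accepts both one word per line (.txt) and comma-separated values (.csv).

```python
import csv
import io


def parse_stopwords(raw_text):
    """Return a de-duplicated, lowercased list of stopwords.

    Hypothetical helper: accepts the text of an uploaded .txt file
    (one word per line) or .csv file (comma-separated words).
    """
    words = []
    seen = set()
    # csv.reader handles both layouts: each line becomes a row of cells
    for row in csv.reader(io.StringIO(raw_text, newline="")):
        for cell in row:
            word = cell.strip().lower()
            if word and word not in seen:
                seen.add(word)
                words.append(word)
    return words


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword list (case-insensitive)."""
    blocked = set(stopwords)
    return [token for token in tokens if token.lower() not in blocked]
```

Usage would look something like `remove_stopwords(tokens, parse_stopwords(uploaded_text))`, where `uploaded_text` stands for the decoded contents of the uploaded file.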
