Clean up Content HTML #17

Closed
carletex opened this issue Jul 21, 2023 · 6 comments · Fixed by #23

Comments

@carletex
Collaborator

It's in seed.ts (in the future it will be a util that we'll use to clean data before updating the database with contributions).

Let's discuss what we should clean in the HTML that we are importing. We definitely want to keep all the HTML tags (p, ul, etc.), but maybe we want to remove some things: some attributes? Buttons? Should we keep all links?

We should also have basic styling for the HTML (tables, p, ul lists, etc.).

cc @amy-jung

@damianmarti self-assigned this on Jul 28, 2023
@damianmarti
Collaborator

I think we can start by implementing a default sanitizer using https://github.com/apostrophecms/sanitize-html. Later we can change the configuration to allow only the tags and attributes we want, but in the meantime we at least avoid things like XSS.

@carletex
Collaborator Author

Sounds great! @damianmarti

It's a good start.

@carletex
Collaborator Author

carletex commented Aug 4, 2023

@damianmarti Also, I think we can remove the "style" attribute (the most important one) and maybe "class" too.

What do you think?

@damianmarti
Collaborator

Let's index all the content from each HTML document, show a summary (or a short excerpt around the searched term) for each document in the search results, and then link to the original content. Something like what Google does.

Sounds good to you?
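
To make the "short excerpt around the searched term" idea concrete, here is a minimal sketch. The function name `snippetAround` and the `radius` parameter are hypothetical, and it assumes the document has already been reduced to plain text; it is an illustration, not how the project implements search.

```typescript
// Hypothetical helper: returns a short excerpt of `text` centred on the
// first case-insensitive match of `term`, or null when there is no match.
// Assumes `text` is plain text (HTML tags already stripped).
function snippetAround(text: string, term: string, radius = 40): string | null {
  if (term === "") return null;

  // Note: toLowerCase() can change string length for a few Unicode
  // characters, so offsets may drift slightly on such input.
  const index = text.toLowerCase().indexOf(term.toLowerCase());
  if (index === -1) return null;

  const start = Math.max(0, index - radius);
  const end = Math.min(text.length, index + term.length + radius);

  // Mark truncation on either side with an ellipsis.
  const prefix = start > 0 ? "…" : "";
  const suffix = end < text.length ? "…" : "";
  return prefix + text.slice(start, end) + suffix;
}
```

Each search result could then show this excerpt next to a link to the original document.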

@damianmarti
Collaborator

Sorry @carletex, the previous comment was more related to #22. I was replying here, but thinking about this...

@damianmarti
Collaborator

> @damianmarti Also, I think we can remove the "style" attribute (the most important one) and maybe "class" too.
>
> What do you think?

Yes, we can remove all the styling attributes.
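
As a rough sketch of where this discussion lands, an options object for sanitize-html could look like the following. The library drops any attribute not listed in `allowedAttributes`, so leaving out "style" and "class" is enough to strip them. The specific tag list and the decision to keep only `href` on links are assumptions for illustration, not something settled in this thread.

```typescript
// Illustrative options for sanitize-html (allowedTags and allowedAttributes
// are real option names; the values below are assumptions).
const sanitizeOptions: {
  allowedTags: string[];
  allowedAttributes: Record<string, string[]>;
} = {
  // Content tags mentioned in the thread: paragraphs, lists, tables, links.
  // <button> is left out, since buttons were one of the things to remove.
  allowedTags: ["p", "ul", "ol", "li", "a", "table", "thead", "tbody", "tr", "th", "td"],

  // Only attributes listed here survive. "style" and "class" are absent,
  // so all inline styling attributes are removed.
  allowedAttributes: {
    a: ["href"],
  },
};
```

The object would be passed as the second argument, as in `sanitizeHtml(html, sanitizeOptions)`, from seed.ts or the future cleaning util.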
