Data Cleaning with 'markitdown' #66

Open
1 task done
stefanfrench opened this issue Dec 16, 2024 · 0 comments
Labels
data cleaning (Related to the data cleaning module), help wanted (Extra attention is needed)

Comments

@stefanfrench
Contributor

Motivation

The data cleaning step of the document-to-podcast workflow could be made more robust: it currently does not handle all possible cases.

Alternatives

I have also considered https://github.com/VikParuchuri/marker, which could be an interesting alternative to 'markitdown'.

Contribution

Re-implementing the data cleaning component to use markitdown

This would make the data cleaning component more robust and potentially reusable across many Blueprints.

There is potential to submit a PR that updates the data cleaning component to leverage markitdown.
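
As a rough illustration of what the re-implementation might look like, here is a minimal sketch. It assumes the `markitdown` package's `MarkItDown().convert(path)` call and the `text_content` attribute on its result. The function name `clean_with_markitdown`, the `converter` parameter, and the blank-line normalization are hypothetical and not part of the existing Blueprint code.

```python
from pathlib import Path
from typing import Any, Optional


def clean_with_markitdown(path: str, converter: Optional[Any] = None) -> str:
    """Convert a document to Markdown text and tidy up whitespace.

    Hypothetical helper: the name and signature are assumptions,
    not the Blueprint's current data cleaning API.
    """
    if converter is None:
        # Imported lazily so this module loads even without markitdown installed.
        from markitdown import MarkItDown

        converter = MarkItDown()

    # markitdown picks a converter based on the file type and returns
    # a result object whose `text_content` holds the Markdown text.
    result = converter.convert(str(Path(path)))
    text = result.text_content

    # Strip trailing whitespace and collapse runs of blank lines into one.
    cleaned = []
    for line in (raw.rstrip() for raw in text.splitlines()):
        if line == "" and cleaned and cleaned[-1] == "":
            continue
        cleaned.append(line)
    return "\n".join(cleaned).strip()
```

Accepting an optional `converter` keeps the sketch easy to test and would, in principle, let another backend be swapped in behind the same interface.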

Have you searched for similar issues before submitting this one?

  • Yes, I have searched for similar issues
@stefanfrench added the "data cleaning" and "help wanted" labels on Dec 16, 2024
@daavoo changed the title from "[FEATURE]: Data Cleaning with 'markitdown'" to "Data Cleaning with 'markitdown'" on Dec 16, 2024