podcast script generation component #15
Conversation
- I do like the API docs, but I guess we can take that at a later stage. Maybe we can create an item for documentation specifically.
- I would be happy with simple smoke tests like:
    assert isinstance(load_model(...), Llama)
    assert isinstance(text_to_podcast(...), str)
  but again, we can take it later if we feel like we are in a rush for an MVP.
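For reference, a minimal sketch of what those smoke tests could look like; the import path, function signatures, and model reference below are illustrative assumptions, not necessarily the actual layout in this PR:

```python
# Hedged sketch of the suggested smoke tests. The import path, signatures,
# and model identifier are assumptions for illustration only.
from llama_cpp import Llama

from document_to_podcast.inference import load_model, text_to_podcast  # hypothetical path

MODEL_ID = "allenai/OLMoE-1B-7B-0924-Instruct-GGUF"  # placeholder model reference


def test_load_model_returns_llama_instance():
    assert isinstance(load_model(MODEL_ID), Llama)


def test_text_to_podcast_returns_str():
    model = load_model(MODEL_ID)
    assert isinstance(text_to_podcast("Some example input text.", model), str)
```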
Thanks @daavoo. This is looking good and worked smoothly in Codespaces! A couple notes from my side before approval:
demo/app.py (Outdated)

    st.write(clean_text[:200])
    st.text_area(f"Total Length: {len(clean_text)}", f"{clean_text[:500]} . . .")
    # I set this value as a quick safeguard but we should actually tokenize the text and count the number of real tokens.
I think we should try to set the text limit from the input doc in a more rigorous way, e.g. based on the actual number of real tokens, as you suggest in your comment. If you prefer, we can do this as a separate issue.

I am currently looking into this. I am trying to find a way to use the llama_cpp API so we don't waste a tokenization call just for the sake of filtering.
If I don't find an easy solution today, maybe we can consider it a follow-up so it doesn't block the full PR.
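For illustration, a sketch of the token-counting approach being discussed, assuming a loaded `llama_cpp.Llama` instance; the helper name is made up here:

```python
from llama_cpp import Llama


def count_tokens(model: Llama, text: str) -> int:
    # Tokenize only to measure length; the token ids are discarded afterwards,
    # which is the "wasted" tokenization call mentioned above.
    # Llama.tokenize expects UTF-8 bytes and returns a list of token ids.
    return len(model.tokenize(text.encode("utf-8")))
```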
@daavoo - sounds good!
I gave it a try (encoding first) but the code became way more complicated.
However, it seems that 1 token ~= 4 characters is considered a common default, and it is the value used when estimating token consumption without spending extra calls.
So I updated the code to use this 4-characters-per-token approximation and made a small improvement to use the context length associated with each model (before, I was too lazy and just hardcoded the number to 4096, as that was the value for OLMoE).
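A rough sketch of that cutoff, assuming a `llama_cpp.Llama` instance; the helper name is illustrative, and the real code may also reserve room for the prompt and the generated output:

```python
CHARACTERS_PER_TOKEN = 4  # common rule of thumb: 1 token ~= 4 characters


def truncate_to_context(model, text: str) -> str:
    # Derive the cutoff from the loaded model's context window instead of
    # hardcoding 4096 (the OLMoE value used previously).
    max_characters = model.n_ctx() * CHARACTERS_PER_TOKEN
    return text[:max_characters]
```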
Make text cutoff based on `model.n_ctx()`. Consider ~4 characters per token as a reasonable default.
Thank you for adding the tests, all LGTM!
* Add devcontainer and requirements
* Add pyproject.toml
* Add data_loaders and tests
* Add data_cleaners and tests
* Update demo
* Add `LOADERS` and `CLEANERS`
* Add markdown and docx
* Add API Reference
* Update tests
* Update install
* Add initial scripts
* More tests
* fix merge
* Add podcast writing to demo/app
* Add missing deps
* Add text_to_podcast module
* Expose model options and prompt tuning in the app
* pre-commit
* Strip system_prompt
* Rename to inference module. Add docstrings
* pre-commit
* Add CURATED_REPOS
* JSON prompt
* Update API docs
* Fix format
* Make text cutoff based on `model.n_ctx()`. Consider ~4 characters per token as a reasonable default.
* Add inference tests
* Drop __init__ imports
* Fix outdated arg
* Drop redundant JSON output in prompt
* Update default stop
* update read.me with guidance docs initial draft
* minor update to read.me
* podcast script generation component (#15)
* updates to read.me to simplify down and add diagram
* update read.me with guidance docs initial draft
* minor update to read.me
* updates to read.me to simplify down and add diagram
* updated docs and added new pages and assets
* Changes to the docs files
* deleting contributing.md from docs
* Add tests workflow (#18)
* Add new `tests` workflow
* Use pip cache
* Unify env setup. Drop UV in favor of setup-python
* Update tests
* podcast script generation component (#15)
* pre commit checks
* Apply suggestions from code review
* Update docs/step-by-step-guide.md
* lint
* changes based on peer reviews
* changes based pre commit checks

Co-authored-by: David de la Iglesia Castro <[email protected]>
What's changing
- Added new `text_to_podcast` module.
- Updated `demo/app.py` to be able to try different OLMoE quantized versions and system prompts.
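For context, a minimal sketch (not the actual app code) of how model and prompt options might be exposed in the Streamlit demo; the quantization names and default prompt below are placeholders:

```python
import streamlit as st

# Placeholder quantized variants; the demo lets the user pick one of the
# available OLMoE GGUF files.
model_option = st.selectbox("OLMoE quantization", ["Q4_K_M", "Q5_K_M", "Q8_0"])

# Editable system prompt so different prompt wordings can be tried in the app.
system_prompt = st.text_area("System prompt", "You are a podcast scriptwriter.")
```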
How to test it
1. Create a new Codespace using the `New with options` option.
2. Select this branch (`AH-104-Initial-Podcast-Script-Generation-Component`).
3. Inside the codespace, run: