
podcast script generation component #15

Merged
@daavoo merged 32 commits into main from AH-104-Initial-Podcast-Script-Generation-Component on Nov 25, 2024

Conversation

@daavoo (Contributor) commented Nov 21, 2024

What's changing

Added a new text_to_podcast module.
Updated demo/app.py to allow trying different quantized OLMoE versions and system prompts.

How to test it

Create a new Codespace using the "New with options..." button:

Select this branch (AH-104-Initial-Podcast-Script-Generation-Component).

Inside the codespace, run:

pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
python -m streamlit run demo/app.py

@daavoo linked an issue Nov 21, 2024 that may be closed by this pull request
@daavoo self-assigned this Nov 21, 2024
@daavoo marked this pull request as ready for review November 21, 2024 14:29
@Kostis-S-Z (Contributor) left a comment:

  • I do like the API docs, but I guess we can take that at a later stage. Maybe we can create an item specifically for documentation.
  • I would be happy with simple smoke tests like the following (expanded into a fuller sketch below):
assert isinstance(load_model(...), Llama)
assert isinstance(text_to_podcast(...), str)

but again, we can take it later if we feel we are in a rush for an MVP.
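
For illustration, such smoke tests might be sketched as follows (the import path, argument names, and exact signatures of load_model and text_to_podcast are assumptions based on this thread, not the final API):

from llama_cpp import Llama

from text_to_podcast import load_model, text_to_podcast  # hypothetical import path

def test_load_model_returns_llama():
    # Placeholder path for whichever quantized GGUF build the demo uses.
    model = load_model("path/to/model.gguf")
    assert isinstance(model, Llama)

def test_text_to_podcast_returns_str():
    model = load_model("path/to/model.gguf")
    script = text_to_podcast("Some input text.", model=model)
    assert isinstance(script, str)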

@Kostis-S-Z (Contributor) left a comment:

I'll approve it since there is nothing blocking here, so feel free to leave the docs & tests for a later PR or this one.


@stefanfrench (Contributor) commented Nov 21, 2024

Thanks @daavoo. This is looking good and worked smoothly in Codespaces!

A couple of notes from my side before approval:

  • It would be good if there were a clear/easy way for the developer to input an alternative HF model, both in the backend as a parameter and in the app as free text (see the sketch after this list). We should still keep the OLMoE q8 model as the default option. Do you see any issues with this?
  • I think we should set the text limit from the input doc in a more rigorous way, e.g. based on the actual number of real tokens, as you suggest in your comment. If you prefer, we can do this as a separate issue.
  • I think we should do some more experimentation to find a good default prompt so the generated script feels more natural. I think the current one will feel very robotic once the audio is generated. If you prefer, we can create a separate issue for this as well. (Also, happy to support on this prompt experimentation as it's quite time-consuming!)
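
For illustration, the free-text model input could be sketched as follows (the widget label and default value are assumptions, not the actual demo/app.py code):

import streamlit as st

# Hypothetical default; the app should keep the OLMoE q8 build as the default.
DEFAULT_MODEL = "OLMoE-q8_0.gguf"

model_name = st.text_input(
    "Model (free text, e.g. a Hugging Face repo id or GGUF file name)",
    value=DEFAULT_MODEL,
)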

demo/app.py (outdated):
st.write(clean_text[:200])
st.text_area(f"Total Length: {len(clean_text)}", f"{clean_text[:500]} . . .")

# I set this value as a quick safeguard but we should actually tokenize the text and count the number of real tokens.
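
For illustration, counting real tokens as the comment suggests could look like this with llama-cpp-python (a sketch, assuming model is a loaded llama_cpp.Llama; not the code that was merged):

# Tokenize with the loaded model instead of relying on a character count.
tokens = model.tokenize(clean_text.encode("utf-8"))
if len(tokens) > model.n_ctx():
    st.warning(f"Input is {len(tokens)} tokens but the model context is {model.n_ctx()}.")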
@daavoo (Contributor, Author) replied:

@stefanfrench:

"I think we should set the text limit from the input doc in a more rigorous way, e.g. based on the actual number of real tokens, as you suggest in your comment. If you prefer, we can do this as a separate issue."

I am currently looking into this. I am trying to find a way to use the llama_cpp API so we don't waste a tokenization call just for the sake of filtering.
If I don't find an easy solution today, maybe we can consider it a follow-up so it doesn't block the full PR.


@daavoo - sounds good!

@daavoo (Contributor, Author) replied:

I gave it a try (encoding first), but the code became way more complicated.
However, it seems that 1 token ≈ 4 characters is considered a common default, and it is the value used to estimate token consumption without spending extra calls.

So I updated the code to use this 4-characters-per-token approximation and made a small improvement to use the context length associated with each model (before, I was too lazy and just hardcoded the number to 4096, as that was the value for OLMoE).
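
For illustration, the described cutoff might look something like this (a sketch under the thread's assumptions, not necessarily the merged code):

CHARACTERS_PER_TOKEN = 4  # common rough estimate: 1 token ≈ 4 characters

# Derive the character budget from the loaded model's context window
# instead of hardcoding 4096.
max_characters = model.n_ctx() * CHARACTERS_PER_TOKEN
clean_text = clean_text[:max_characters]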

@daavoo requested a review from Kostis-S-Z November 25, 2024 10:03
@Kostis-S-Z (Contributor) left a comment:

Thank you for adding the tests, all LGTM!

@daavoo merged commit 3fc7ff1 into main Nov 25, 2024
1 check passed
@daavoo deleted the AH-104-Initial-Podcast-Script-Generation-Component branch November 25, 2024 12:54
stefanfrench pushed a commit that referenced this pull request Nov 27, 2024
* Add devcontainer and requirements

* Add pyproject.toml

* Add data_loaders and tests

* Add data_cleaners and tests

* Update demo

* Add `LOADERS` and `CLEANERS`

* Add markdown and docx

* Add API Reference

* Update tests

* Update install

* Add initial scripts

* More tests

* fix merge

* Add podcast writing to demo/app

* Add missing deps

* Add text_to_podcast module

* Expose model options and prompt tuning in the app

* pre-commit

* Strip system_prompt

* Rename to inference module. Add docstrings

* pre-commit

* Add CURATED_REPOS

* JSON prompt

* Update API docs

* Fix format

* Make text cutoff based on `model.n_ctx()`. Consider ~4 characters per token as a reasonable default.

* Add inference tests

* Drop __init__ imports

* Fix outdated arg

* Drop redundant JSON output in prompt

* Update default stop
stefanfrench pushed a commit that referenced this pull request Dec 2, 2024
daavoo added a commit that referenced this pull request Dec 3, 2024
* update read.me with guidance docs initial draft

* minor update to read.me

* podcast script generation component (#15)

* updates to read.me to simplify down and add diagram

* updated docs and added new pages and assets

* Changes to the docs files

* deleting contributing.md from docs

* Add tests workflow (#18)

* Add new `tests` workflow

* Use pip cache

* Unify env setup. Drop UV in favor of setup-python

* Update tests

* podcast script generation component (#15)

* pre commit checks

* Apply suggestions from code review

* Update docs/step-by-step-guide.md

* lint

* changes based on peer reviews

* changes based pre commit checks

---------

Co-authored-by: David de la Iglesia Castro <[email protected]>
Linked issue: Initial Podcast Script Generation Component