diff --git a/README.md b/README.md
index abc2d3a..0e27c83 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,100 @@
-Project Logo
+[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+[Project logo]
+
-# Blueprint Title
+# Document-to-podcast: a Blueprint by Mozilla.ai for generating podcasts from documents using local AI
-This blueprint guides you to ...
+This blueprint demonstrates how you can use open-source models & tools to convert input documents into a podcast featuring two speakers.
+It is designed to work on most local setups or with [GitHub Codespaces](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb), meaning no external API calls or GPU access is required. This makes it more accessible and privacy-friendly by keeping everything local.
-![Blueprint Diagram](./images/blueprint-diagram.png)
+### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/document-to-podcast/).
+
+## Quick-start
+
+Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.
+
+---
+
+### **Option 1: GitHub Codespaces**
+
+The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:
+
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
+
+Once the Codespaces environment launches, follow these steps:
+
+1. **Install Dependencies**
+   Inside the Codespaces terminal, run:
+   ```bash
+   pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+   ```
+2. **Run the Demo**
+   Inside the Codespaces terminal, start the Streamlit demo by running:
+   ```bash
+   python -m streamlit run demo/app.py
+   ```
+
+### **Option 2: Local Installation**
+
+1. **Clone the Repository**
+   In your terminal, run:
+   ```bash
+   git clone https://github.com/mozilla-ai/document-to-podcast.git
+   cd document-to-podcast
+   ```
+
+2. **Install Dependencies**
+   Inside the terminal, run:
+   ```bash
+   pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+   ```
+3. **Run the Demo**
+   Inside the terminal, start the Streamlit demo by running:
+   ```bash
+   python -m streamlit run demo/app.py
+   ```
+
+## How it Works
+
+[Document-to-podcast diagram]
+
+1. **Document Upload**
+   Start by uploading a document in a supported format (e.g., PDF, .txt, or .docx).
+
+2. **Document Pre-Processing**
+   The uploaded document is processed to extract and clean the text. This involves:
+   - Extracting readable text from the document.
+   - Removing noise such as URLs, email addresses, and special characters to ensure the text is clean and structured.
+
+3. **Script Generation**
+   The cleaned text is passed to a language model to generate a podcast transcript in the form of a conversation between two speakers.
+   - **Model Loading**: The system selects and loads a pre-trained LLM optimized for running locally, using the `llama_cpp` library. This enables the model to run efficiently on CPUs, making it more accessible and suitable for local setups.
+   - **Customizable Prompt**: A user-defined "system prompt" guides the LLM in shaping the conversation, specifying tone, content, speaker interaction, and format.
+   - **Output Transcript**: The model generates a podcast script in structured format, with each speaker's dialogue clearly labeled.
+     Example output:
+     ```json
+     {
+       "Speaker 1": "Welcome to the podcast on AI advancements.",
+       "Speaker 2": "Thank you! So what's new this week in AI?",
+       "Speaker 1": "Where should I start... Lots has been happening!",
+       ...
+     }
+     ```
+     This step ensures that the podcast script is engaging, relevant, and ready for audio conversion.
+
+4. **Audio Generation**
+   - The generated transcript is converted into audio using a Text-to-Speech (TTS) model.
+   - Each speaker is assigned a distinct voice.
+   - The final output is saved as an audio file in formats like MP3 or WAV.
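+
+   As a rough sketch of what this stage does (the names here are illustrative rather than the project's exact API):
+
+   ```python
+   # Illustrative sketch of the audio stage: one distinct voice per speaker.
+   voices = {
+       "Speaker 1": "A cheerful and animated voice with a fast-paced delivery.",
+       "Speaker 2": "A calm and deep voice, speaking with authority and warmth.",
+   }
+   segments = []
+   for speaker, line in script_lines:  # pairs parsed from the JSON transcript above
+       segments.append(text_to_speech(line, model, tokenizer, voices[speaker]))
+   # The segments are then stitched together and saved as MP3/WAV.
+   ```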

 ## Pre-requisites

 - **System requirements**:
   - OS: Windows, macOS, or Linux
   - Python 3.10 or higher
-  - Minimum RAM: 4 GB
-  - Disk space: 1 GB minimum
+  - Minimum RAM: 16 GB
+  - Disk space: 32 GB minimum
 - **Dependencies**:
-  - Dependencies listed in `requirements.txt`
-
-## Installation
-
----
-
-## Quick-start
-
----
+  - Dependencies listed in `pyproject.toml`

 ## License
diff --git a/demo/app.py b/demo/app.py
index cc4d875..95177fc 100644
--- a/demo/app.py
+++ b/demo/app.py
@@ -89,3 +89,40 @@
                 if text.endswith("\n"):
                     st.write(text)
                     text = ""
+st.text_area(f"Total Length: {len(clean_text)}", f"{clean_text[:500]} . . .")
+
+repo_name = st.selectbox("Select Repo", CURATED_REPOS)
+model_name = st.selectbox(
+    "Select Model",
+    [
+        x
+        for x in list_repo_files(repo_name)
+        if ".gguf" in x.lower() and ("q8" in x.lower() or "fp16" in x.lower())
+    ],
+    index=None,
+)
+if model_name:
+    with st.spinner("Downloading and Loading Model..."):
+        model = load_llama_cpp_model(model_id=f"{repo_name}/{model_name}")
+
+    # ~4 characters per token is considered a reasonable default.
+    max_characters = model.n_ctx() * 4
+    if len(clean_text) > max_characters:
+        st.warning(
+            f"Input text is too big ({len(clean_text)})."
+            f" Using only a subset of it ({max_characters})."
+        )
+        clean_text = clean_text[:max_characters]
+
+    system_prompt = st.text_area("Podcast generation prompt", value=PODCAST_PROMPT)
+
+    if st.button("Generate Podcast Script"):
+        with st.spinner("Generating Podcast Script..."):
+            text = ""
+            for chunk in text_to_text_stream(
+                clean_text, model, system_prompt=system_prompt.strip()
+            ):
+                text += chunk
+                if text.endswith("\n"):
+                    st.write(text)
+                    text = ""
diff --git a/docs/assets/custom.css b/docs/assets/custom.css
new file mode 100644
index 0000000..c3dc2ff
--- /dev/null
+++ b/docs/assets/custom.css
@@ -0,0 +1,7 @@
+@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');
+
+:root {
+  --md-default-font: "Inter", sans-serif;
+  --md-code-font: "Fira Code", monospace;
+  --md-primary-font: "Inter", sans-serif;
+}
diff --git a/docs/customization.md b/docs/customization.md
new file mode 100644
index 0000000..92b68b3
--- /dev/null
+++ b/docs/customization.md
@@ -0,0 +1,89 @@
+# 🎨 **Customization Guide**
+
+The Document-to-Podcast Blueprint is designed to be flexible and easily adaptable to your specific needs. This guide will walk you through some key areas you can customize to make the Blueprint your own.
+
+---
+
+## 🧠 **Changing the Text-to-Text Model**
+You can swap the language model used for generating podcast scripts to suit your needs, such as using a smaller model for faster processing or a larger one for higher-quality outputs.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `load_text_to_text_model` function.
+3. Replace the `model_id` with the ID of your desired model from a supported repository (e.g., Hugging Face). Note: the model repository must be in GGUF format, for example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
+
+Example:
+
+```python
+@st.cache_resource
+def load_text_to_text_model():
+    return load_llama_cpp_model(
+        model_id="Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
+    )
+```
+
+
+## 📝 **Modifying the Text Generation Prompt**
+The system prompt defines the structure and tone of the generated script. Customizing it allows you to generate conversations that align with your project’s needs.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `PODCAST_PROMPT` variable.
+3. Edit the instructions to suit your desired conversation style.
+
+Example:
+
+```python
+PODCAST_PROMPT = """
+You are a radio show scriptwriter generating lively and humorous dialogues.
+Speaker 1: A comedian who is interested in learning new things.
+Speaker 2: A scientist explaining concepts in a fun way.
+"""
+```
+
+
+## 🎙️ **Customizing Speaker Descriptions**
+Adjusting the speaker profiles allows you to create distinct and engaging voices for your podcast.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `SPEAKER_DESCRIPTIONS` dictionary.
+3. Update the descriptions to define new voice characteristics for each speaker.
+
+Example:
+
+```python
+SPEAKER_DESCRIPTIONS = {
+    "1": "A cheerful and animated voice with a fast-paced delivery.",
+    "2": "A calm and deep voice, speaking with authority and warmth."
+}
+```
+
+
+## 🧠 **Changing the Text-to-Speech Model**
+You can use a different TTS model to achieve specific voice styles or improve performance.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `load_text_to_speech_model_and_tokenizer` function.
+3. Replace the `model_id` with your preferred TTS model.
+
+Example:
+```python
+@st.cache_resource
+def load_text_to_speech_model_and_tokenizer():
+    return load_parler_tts_model_and_tokenizer(
+        "parler-tts/parler-tts-mini-expresso", "cpu")
+```
+
+## 💡 Other Customization Ideas
+
+- Add Multiple Speakers: Modify `script_to_audio.py` to include additional speakers in your podcast (see the sketch below).
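+
+For instance, here is a minimal sketch of the app-side setup for a third speaker. The prompt wording and voice descriptions are illustrative, and `script_to_audio.py` would still need to map the extra speaker to a voice:
+
+```python
+# Hypothetical three-speaker setup: extend the prompt and the speaker
+# profiles together, so every speaker the LLM writes has a matching voice.
+PODCAST_PROMPT = """
+You are a podcast scriptwriter generating a three-way conversation.
+Speaker 1: A curious host asking questions.
+Speaker 2: An expert explaining concepts clearly.
+Speaker 3: A skeptic who challenges claims with counterpoints.
+"""
+
+SPEAKER_DESCRIPTIONS = {
+    "1": "A cheerful and animated voice with a fast-paced delivery.",
+    "2": "A calm and deep voice, speaking with authority and warmth.",
+    "3": "A dry, measured voice with a hint of irony.",
+}
+```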
+
+## 🤝 **Contributing to the Blueprint**
+
+Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](../future-features-contributions)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
diff --git a/docs/future-features-contributions.md b/docs/future-features-contributions.md
new file mode 100644
index 0000000..9223ae7
--- /dev/null
+++ b/docs/future-features-contributions.md
@@ -0,0 +1,30 @@
+# 🚀 **Future Features & Contributions**
+
+The Document-to-Podcast Blueprint is an evolving project designed to grow with the help of the open-source community. Whether you’re an experienced developer or just starting, there are many ways you can contribute and help shape the future of this tool.
+
+---
+## 🛠️ **This Page is Evolving**
+As the community grows, we’ll use this space to highlight contributions, showcase new ideas, and share guidance on expanding the Blueprint ecosystem.
+
+We have some ideas for how this Blueprint can be extended and improved, and we will be sharing these ideas and requests for contributions shortly.
+
+---
+
+## 🌟 **How You Can Contribute**
+
+### 💡 **Share Your Ideas**
+Got a vision for how this Blueprint could be improved? Share your suggestions through [GitHub Discussions](https://github.com/mozilla-ai/document-to-podcast/discussions). Your insights can help inspire new directions for the project.
+
+### 🛠️ **Enhance the Code**
+Dive into the codebase and contribute enhancements, optimizations, or bug fixes. Whether it's a small tweak or a big feature, every contribution helps! Start by checking our Contribution Guide (coming soon).
+
+### 🌍 **Build New Blueprints**
+This project is part of a larger initiative to create a collection of reusable starter code solutions that use open-source AI tools. If you’re inspired to create your own Blueprint, we’d love to see it!
+
+---
+
+## 🤝 **Get Involved**
+- Visit our [GitHub Discussions](https://github.com/mozilla-ai/document-to-podcast/discussions) to explore ongoing conversations and share your thoughts.
+
+Your contributions help make this Blueprint better for everyone. Thank you for being part of the journey! 🎉
diff --git a/docs/getting-started.md b/docs/getting-started.md
new file mode 100644
index 0000000..08a89b9
--- /dev/null
+++ b/docs/getting-started.md
@@ -0,0 +1,49 @@
+Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.
+
+---
+
+### ☁️ **Option 1: GitHub Codespaces**
+
+The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:
+
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
+
+Once the Codespaces environment launches, follow these steps:
+
+1. **Install Dependencies**
+   Inside the Codespaces terminal, run:
+```bash
+pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+2. **Run the Demo**
+   Inside the Codespaces terminal, start the Streamlit demo by running:
+```bash
+python -m streamlit run demo/app.py
+```
+
+
+### 💻 **Option 2: Local Installation**
+
+1. **Clone the Repository**
+   In your terminal, run:
+
+```bash
+git clone https://github.com/mozilla-ai/document-to-podcast.git
+cd document-to-podcast
+```
+
+2. **Install Dependencies**
+   Inside the terminal, run:
+
+```bash
+pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+
+3. **Run the Demo**
+   Inside the terminal, start the Streamlit demo by running:
+
+```bash
+python -m streamlit run demo/app.py
+```
diff --git a/docs/images/Blueprints-logo.png b/docs/images/Blueprints-logo.png
new file mode 100644
index 0000000..b865f86
Binary files /dev/null and b/docs/images/Blueprints-logo.png differ
diff --git a/docs/images/document-to-podcast-diagram.png b/docs/images/document-to-podcast-diagram.png
new file mode 100644
index 0000000..5969382
Binary files /dev/null and b/docs/images/document-to-podcast-diagram.png differ
diff --git a/docs/index.md b/docs/index.md
index e3018ee..b5345e2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1 +1,37 @@
-# Wellcome to Blueprint docs
+# **Document-to-Podcast Blueprint**
+
+[Project logo]
+
+Blueprints empower developers to easily integrate AI capabilities into their projects using open-source models and tools.
+
+These docs are your companion to mastering the **Document-to-Podcast Blueprint**—a local-first approach for transforming documents into engaging podcasts.
+
+---
+
+### 🚀 **Get Started Quickly**
+#### _Start building your own Document-to-Podcast pipeline in minutes:_
+- **[Getting Started](getting-started.md):** Quick setup and installation instructions.
+
+### 🔍 **Understand the System**
+#### _Dive deeper into how the Blueprint works:_
+- **[Step-by-Step Guide](step-by-step-guide.md):** A detailed breakdown of the system’s design and workflow.
+- **[API Reference](api.md):** Explore the technical details of the core modules.
+
+### 🎨 **Make It Yours**
+#### _Customize the Blueprint to fit your needs:_
+- **[Customization Guide](customization.md):** Tailor prompts, voices, and settings to create unique podcasts.
+
+### 🌟 **Join the Community**
+#### _Help shape the future of Blueprints:_
+- **[Future Features & Contributions](future-features-contributions.md):** Learn about exciting upcoming features and how to contribute to the project.
+
+Have more questions? Reach out to us on Discord and we'll see how we can help.
+
+---
+
+## **Why Blueprints?**
+
+Blueprints are more than starter code—they’re your gateway to building AI-powered solutions with confidence. With step-by-step guidance, modular design, and open-source tools, we make AI accessible for developers of all skill levels.
diff --git a/docs/step-by-step-guide.md b/docs/step-by-step-guide.md
new file mode 100644
index 0000000..70d1553
--- /dev/null
+++ b/docs/step-by-step-guide.md
@@ -0,0 +1,152 @@
+# **Step-by-Step Guide: How the Document-to-Podcast Blueprint Works**
+
+Transforming static documents into engaging podcast episodes involves the integration of pre-processing, LLM-powered transcript generation, and text-to-speech generation. Here's how it all works under the hood:
+
+---
+
+## **Overview**
+This system has three core stages:
+
+📄 **1. Document Pre-Processing**
+   Prepare the input document by extracting and cleaning the text.
+
+📜 **2. Podcast Script Generation**
+   Use an LLM to transform the cleaned text into a conversational podcast script.
+
+🎙️ **3. Audio Podcast Generation**
+   Convert the script into an engaging audio podcast with distinct speaker voices.
+
+We'll also look at how `app.py` brings all these steps together to build an end-to-end demo application.
+
+First, let’s dive into each step to understand how this works in practice.
+
+---
+
+## **Step 1: Document Pre-Processing**
+
+The process begins with preparing the input document for AI processing. The system handles various document types while ensuring the extracted content is clean and structured.
+
+Cleaner input data ensures that the model works with reliable and consistent information, reducing the likelihood of it being confused by unexpected tokens and therefore helping it generate better outputs.
+
+### ⚙️ **Key Components in Document Pre-Processing**
+
+**1 - File Loading**
+
+  - Uses functions defined in `data_loaders.py`
+
+  - Supports `.html`, `.pdf`, `.txt`, and `.docx` formats.
+
+  - Extracts readable text from uploaded files using specialized loaders.
+
+**2 - Text Cleaning**
+
+  - Uses functions defined in [`data_cleaners.py`](../api/#opennotebookllm.inference.data_cleaners)
+
+  - Removes unwanted elements like URLs, email addresses, and special characters using Python's `re` library, which leverages **Regular Expressions** (regex) to identify and manipulate specific patterns in text.
+
+  - Ensures the document is clean and ready for the next step.
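+
+For illustration, here is a minimal sketch of the kind of regex-based cleaning this step performs. The patterns below are simplified stand-ins; the real ones live in `data_cleaners.py`:
+
+```python
+import re
+
+def clean_text(text: str) -> str:
+    # Simplified, illustrative patterns; the Blueprint's own cleaners are more thorough.
+    text = re.sub(r"https?://\S+", "", text)      # strip URLs
+    text = re.sub(r"\S+@\S+\.\S+", "", text)      # strip email addresses
+    text = re.sub(r"[^\w\s.,!?'\"-]", "", text)   # drop special characters
+    return re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace
+```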
+
+## **Step 2: Podcast Script Generation**
+
+In this step, the pre-processed text is transformed into a conversational podcast transcript. Using a Language Model, the system generates a dialogue that’s both informative and engaging.
+
+### ⚙️ **Key Components in Script Generation**
+
+**1 - Model Loading**
+
+  - The [`model_loaders.py`](../api/#opennotebookllm.inference.model_loaders) script is responsible for loading GGUF-type models using the `llama_cpp` library.
+
+  - The function `load_llama_cpp_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model.
+
+  - This approach of using the `llama_cpp` library supports efficient CPU-based inference, making language models accessible even on machines without GPUs.
+
+**2 - Text-to-Text Generation**
+
+  - The [`text_to_text.py`](../api/#opennotebookllm.inference.text_to_text) script manages the interaction with the language model, converting input text into a structured conversational podcast script.
+
+  - It uses the `chat_completion` function to process the input text and a customizable system prompt, guiding the language model to generate a text output (e.g. a coherent podcast script between speakers).
+
+  - The `return_json` parameter allows the output to be formatted as a JSON object, which can make it easier to parse and integrate structured responses into applications.
+
+  - Supports both single-pass outputs (`text_to_text`) and real-time streamed responses (`text_to_text_stream`), offering flexibility for different use cases.
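+
+Putting these two components together, a typical call might look like the following sketch. The import paths are assumed from the API reference anchors above, and the model ID and prompt are just examples:
+
+```python
+from opennotebookllm.inference.model_loaders import load_llama_cpp_model
+from opennotebookllm.inference.text_to_text import text_to_text_stream
+
+# Stand-ins for the cleaned text from Step 1 and your configured prompt.
+clean_text = "Text produced by the pre-processing step..."
+PODCAST_PROMPT = "You are a podcast scriptwriter. Return a two-speaker script as JSON."
+
+model = load_llama_cpp_model(
+    model_id="Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
+)
+# Stream the generated script chunk by chunk.
+for chunk in text_to_text_stream(clean_text, model, system_prompt=PODCAST_PROMPT):
+    print(chunk, end="")
+```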
+
+## **Step 3: Audio Podcast Generation**
+
+In this final step, the generated podcast transcript is brought to life as an audio file. Using a Text-to-Speech (TTS) model, each speaker in the script is assigned a unique voice, creating an engaging and professional-sounding podcast.
+
+### ⚙️ **Key Components in this Step**
+
+**1 - Text-to-Speech Audio Generation**
+
+  - The `text_to_speech.py` script converts text into audio using a specified TTS model and tokenizer.
+
+  - A **speaker profile** defines the voice characteristics (e.g., tone, speed, clarity) for each speaker.
+
+  - The function `text_to_speech` takes the input text (e.g., the podcast script) and a speaker profile, generating a waveform (audio data) that represents the spoken version of the text.
+
+**2 - Parsing and Combining Voices**
+
+- The `script_to_audio.py` script ensures each speaker’s dialogue is spoken in their unique voice.
+
+- The function `parse_script_to_waveform` splits the dialogue script by speakers and uses `text_to_speech` to generate audio for each speaker, stitching them together into a full podcast.
+
+- Once the podcast waveform is ready, the `save_waveform_as_file` function saves it as an audio file (e.g., MP3 or WAV), making it ready for distribution.
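+
+In outline, the hand-off between these functions might look like the sketch below. The import path and function signatures are assumptions based on the descriptions above, and the speaker profiles are illustrative:
+
+```python
+# Assumed wiring of the functions described above; not the verbatim API.
+from opennotebookllm.podcast_maker.script_to_audio import (
+    parse_script_to_waveform,
+    save_waveform_as_file,
+)
+
+podcast_script = '{"Speaker 1": "Welcome to the podcast.", "Speaker 2": "Thanks!"}'
+speaker_profiles = {
+    "Speaker 1": "A cheerful and animated voice with a fast-paced delivery.",
+    "Speaker 2": "A calm and deep voice, speaking with authority and warmth.",
+}
+
+waveform = parse_script_to_waveform(podcast_script, speaker_profiles)
+save_waveform_as_file(waveform, "podcast.wav")
+```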
+
+## **Bringing It All Together in `app.py`**
+
+The `app.py` demo app shows you how all the components of the Document-to-Podcast Blueprint come together. It demonstrates how you can take the individual steps—Document Pre-Processing, Podcast Script Generation, and Audio Podcast Generation—and integrate them into a functional application. This is the heart of the Blueprint in action, showing how you can build an app using the provided tools and components.
+
+This demo uses [Streamlit](https://streamlit.io/), an open-source Python framework for interactive apps.
+
+[Project logo]
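+
+In outline, the demo wires the pipeline together roughly like this. This is a compressed sketch, not the actual `app.py`; the names match the modules described below:
+
+```python
+import streamlit as st
+
+# Compressed sketch of the demo's flow; the real app.py adds caching,
+# model selection, streamed output, and audio playback.
+uploaded_file = st.file_uploader("Choose a file")
+if uploaded_file is not None:
+    extension = uploaded_file.name.split(".")[-1]
+    raw_text = DATA_LOADERS[extension](uploaded_file)
+    clean_text = DATA_CLEANERS[extension](raw_text)
+    if st.button("Generate Podcast Script"):
+        script = text_to_text(clean_text, model, system_prompt=PODCAST_PROMPT)
+        st.write(script)
+```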
+
+---
+
+### 🧠 **How `app.py` Applies Each Step**
+
+**📄 Document Upload & Pre-Processing**
+
+  - Users upload a file via the Streamlit interface (`st.file_uploader`), which supports `.pdf`, `.txt`, `.docx`, `.html`, and `.md` formats.
+
+  - The uploaded file is passed to the **File Loading** and **Text Cleaning** modules.
+
+  - Raw text is extracted using `DATA_LOADERS`, and the cleaned version, produced by `DATA_CLEANERS`, is displayed alongside it for the end user.
+
+**⚙️ Loading Models**
+
+- The script uses `load_llama_cpp_model` from `model_loaders.py` to load the LLM for generating the podcast script.
+
+- Similarly, `load_parler_tts_model_and_tokenizer` is used to prepare the TTS model and tokenizer for audio generation.
+
+- These models are cached using `@st.cache_resource` to ensure fast and efficient reuse during app interactions.
+
+**📝 Podcast Script Generation**
+
+  - The cleaned text and a system-defined podcast prompt are fed into the `text_to_text_stream` function.
+
+  - The `PODCAST_PROMPT` can be edited by the end user, enabling them to tailor the script results to their needs.
+
+  - The script is streamed back to the user in real time, allowing them to see the generated conversation between speakers.
+
+**🎙️ Podcast Generation**
+
+- For each speaker in the podcast script, audio is generated using the `text_to_speech` function with distinct speaker profiles.
+
+- The `SPEAKER_DESCRIPTIONS` dictionary enables the user to edit the podcast speakers' voices to fit their needs.
+
+- The generated audio is displayed with a player so users can listen directly in the app.
+
+## 🎨 **Customizing the Blueprint**
+
+To better understand how you can tailor this Blueprint to suit your specific needs, please visit the **[Customization Guide](../customization)**.
+
+## 🤝 **Contributing to the Blueprint**
+
+Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](../future-features-contributions)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
diff --git a/images/document-to-podcast-diagram.png b/images/document-to-podcast-diagram.png
new file mode 100644
index 0000000..5969382
Binary files /dev/null and b/images/document-to-podcast-diagram.png differ
diff --git a/mkdocs.yml b/mkdocs.yml
index cbf0a8d..3d3940c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,13 +1,37 @@
-site_name: Blueprint Docs
+site_name: Blueprints Docs
 nav:
   - Home: index.md
+  - Getting Started: getting-started.md
+  - Step-by-Step Guide: step-by-step-guide.md
+  - Customization Guide: customization.md
   - API Reference: api.md
+  - Future Features & Contributions: future-features-contributions.md

 theme:
   name: material
   palette:
-    primary: deep orange
+    - scheme: default
+      primary: "#005F6F"
+      toggle:
+        icon: material/lightbulb
+        name: Switch to dark mode
+    - scheme: slate
+      primary: "#005F6F"
+      toggle:
+        icon: material/lightbulb-outline
+        name: Switch to light mode
+
+extra_css:
+  - assets/custom.css
+
+markdown_extensions:
+  - pymdownx.highlight:
+      anchor_linenums: true
+      line_spans: __span
+      pygments_lang_class: true
+  - pymdownx.inlinehilite
+  - pymdownx.snippets
+  - pymdownx.superfences

 plugins:
   - mkdocstrings: