From b795918fb093ef533c2c32c738006504af827f67 Mon Sep 17 00:00:00 2001
From: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>
Date: Wed, 18 Dec 2024 13:17:44 +0100
Subject: [PATCH 1/2] API examples in step-by-step guide (#59)

* Add api examples

* updating step-by-step to include API examples

* adding contributions to the README.md to be consistent with blueprint template

* updates to customization and step-by-step guide

* small addition to customization examples

---------

Co-authored-by: stefanfrench <stefrfr@gmail.com>
---
 README.md                  |   4 ++
 docs/customization.md      | 105 ++++++++++++++++------------------
 docs/step-by-step-guide.md | 112 ++++++++++++++++++++++++++++++++++++-
 mkdocs.yml                 |   2 +
 4 files changed, 164 insertions(+), 59 deletions(-)

diff --git a/README.md b/README.md
index c627552..0ea7348 100644
--- a/README.md
+++ b/README.md
@@ -133,3 +133,7 @@ You are probably missing the `GNU Make` package. A quick way to solve it is run
 ## License
 
 This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
+
+## Contributing
+
+Contributions are welcome! To get started, you can check out the [CONTRIBUTING.md](CONTRIBUTING.md) file.
diff --git a/docs/customization.md b/docs/customization.md
index 7d49d62..0461977 100644
--- a/docs/customization.md
+++ b/docs/customization.md
@@ -1,88 +1,79 @@
 # 🎨 **Customization Guide**
 
-The Document-to-Podcast Blueprint is designed to be flexible and easily adaptable to your specific needs. This guide will walk you through some key areas you can customize to make the Blueprint your own.
+The Document-to-Podcast Blueprint is designed to be flexible and adaptable to your specific needs.
+This guide outlines the key parameters you can customize and explains how to make these changes depending on whether you’re running the application via `app.py` or the CLI pipeline.
 
----
+## 🖋️ **Key Parameters for Customization**
 
-## 🧠 **Changing the Text-to-Text Model**
-You can swap the language model used for generating podcast scripts to suit your needs, such as using a smaller model for faster processing or a larger one for higher quality outputs.
+- **`input_file`**: The document to be processed. Supported formats: `pdf`, `html`, `txt`, `docx`, `md`.
 
-Customizing the app:
+- **`text_to_text_model`**: The language model used to generate the podcast script. Note: The model must be in GGUF format, for example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf`.
 
-1. Open the `app.py` file.
-2. Locate the `load_text_to_text_model` function.
-3. Replace the `model_id` with the ID of your desired model from a supported repository (e.g., Hugging Face). Note: The model repository must be in GGFUF format, for example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
+- **`text_to_text_prompt`**: Defines the tone, structure, and instructions for generating the podcast script. This prompt is crucial for tailoring the conversation style to your project.
 
-Example:
-
-```python
-@st.cache_resource
-def load_text_to_text_model():
-    return load_llama_cpp_model(
-        model_id="Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
-```
+- **`text_to_speech_model`**: Specifies the model used for text-to-speech conversion. You can change this to achieve the desired voice style or improve performance. Check `config.py` to choose from supported models.
 
+- **`speakers`**: Defines the podcast participants, including their names, roles, descriptions, and voice profiles. Customize this to create engaging personas and voices for your podcast.
 
-## 📝 **Modifying the Text Generation Prompt**
-The system prompt defines the structure and tone of the generated script. Customizing this can allow you to generate conversations that align with your project’s needs.
+## 🖥️ **Customizing When Running via `app.py`**
 
-Customizing the app:
-
-1.	Open the `app.py` file.
-2.	Locate the PODCAST_PROMPT variable.
-3.	Edit the instructions to suit your desired conversation style.
-
-Example:
+If you’re running the application using `app.py`, you can customize these parameters in the **`src/config.py`** file. This centralized configuration file simplifies the customization process.
 
+Running app.py:
 ```python
-PODCAST_PROMPT = """
-You are a radio show scriptwriter generating lively and humorous dialogues.
-Speaker 1: A comedian who is interested in learning new things.
-Speaker 2: A scientist explaining concepts in a fun way.
-"""
+python -m streamlit run demo/app.py
 ```
 
+### Steps to Customize
+1. Open the `config.py` file.
+2. Locate the relevant parameter you want to change (e.g., `text_to_text_model`, `speakers`).
+3. Update the value according to your needs.
 
-## 🎙️ **Customizing Speaker Descriptions**
-Adjusting the speaker profiles allows you to create distinct and engaging voices for your podcast.
-
-Customizing the app:
-
-1. Open the `app.py` file.
-2.	Locate the SPEAKER_DESCRIPTIONS dictionary.
-3.	Update the descriptions to define new voice characteristics for each speaker
-Example:
+#### Example: Updating the Prompt
+In `config.py`, modify the `text_to_text_prompt` parameter:
 
 ```python
-SPEAKER_DESCRIPTIONS_OUTE = {
-    "1": "A cheerful and animated voice with a fast-paced delivery.",
-    "2": "A calm and deep voice, speaking with authority and warmth."
-}
+DEFAULT_PROMPT = """
+You are a podcast scriptwriter generating engaging and humorous conversations in JSON format.
+The script features the following speakers:
+{SPEAKERS}
+Instructions:
+- Use a casual and fun tone.
+- Include jokes and lighthearted banter.
+- Format output as a JSON conversation.
+  {
+    "Speaker 1": "Well, we have a hilarious podcast in store for you today...",
+    "Speaker 2": "I can't wait, I had the weirdest week - let me tell you all about it...",
 """
 ```
 
+## ⌨️ **Customizing When Running via the CLI**
 
-## 🧠 **Changing the Text-to-Speech Model**
-You can use a different TTS model to achieve specific voice styles or improve performance.
+If you’re running the pipeline from the command line, you can customize the parameters by modifying the **`example_data/config.yaml`** file.
 
-Customizing the app:
+Running in the CLI:
+```bash
+document-to-podcast --from_config example_data/config.yaml
+```
 
-1. Open the `app.py` file.
-2. Locate the `load_text_to_speech_model_and_tokenizer` function.
-3.	Replace the model_id with your preferred TTS model.
+### Steps to Customize
+1. Open the `config.yaml` file.
+2. Locate the parameter you want to adjust.
+3. Update the value and save the file.
 
-Example:
-```python
-@st.cache_resource
-def load_text_to_speech_model_and_tokenizer():
-    return load_parler_tts_model_and_tokenizer(
-        "parler-tts/parler-tts-mini-expresso", "cpu")
+#### Example: Changing the Text-to-Text Model
+In `config.yaml`, modify the `text_to_text_model` entry:
+
+```yaml
+text_to_text_model: "Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
 ```
 
-## 💡 Other Customization Ideas
+## ✏️ **Customization Examples**
 
-- Add Multiple Speakers: Modify `script_to_audio.py` to include additional speakers in your podcast.
+Looking for inspiration? Check out these examples of how others have customized the Document-to-Podcast Blueprint for their unique needs:
 
+- **[Radio Drama Generator](https://github.com/stefanfrench/radio-drama-generator)**: A creative adaptation that generates radio dramas by customizing the Blueprint parameters.
+- **[Readme-to-Podcast](https://github.com/alexmeckes/readme-to-podcast)**: This project transforms GitHub README files into podcast-style audio, showcasing the Blueprint’s ability to handle diverse text inputs.
 
 ## 🤝 **Contributing to the Blueprint**
 
diff --git a/docs/step-by-step-guide.md b/docs/step-by-step-guide.md
index b3329ec..6237edd 100644
--- a/docs/step-by-step-guide.md
+++ b/docs/step-by-step-guide.md
@@ -48,6 +48,33 @@ Cleaner input data ensures that the model works with reliable and consistent inf
 
    - Ensures the document is clean and ready for the next step.
 
+### 🔍 **API Example**
+
+```py
+from document_to_podcast.preprocessing import DATA_CLEANERS, DATA_LOADERS
+
+input_file = "example_data/introducing-mozilla-ai-investing-in-trustworthy-ai.html"
+data_loader = DATA_LOADERS[".html"]
+data_cleaner = DATA_CLEANERS[".html"]
+
+raw_data = data_loader(input_file)
+print(raw_data[:200])
+"""
+<!doctype html>
+<html class="no-js" lang="en-US">
+
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <link rel="profile" href="https://gmpg.org/x
+"""
+clean_data = data_cleaner(raw_data)
+print(clean_data[:200])
+"""
+Skip to content Mozilla Internet Culture Deep Dives Mozilla Explains Interviews Videos Privacy Security Products Firefox Pocket Mozilla VPN Mozilla News Internet Policy Leadership Mitchell Baker, CEO
+"""
+```
+
 ## **Step 2: Podcast Script Generation**
 
 In this step, the pre-processed text is transformed into a conversational podcast transcript. Using a Language Model, the system generates a dialogue that’s both informative and engaging.
@@ -71,6 +98,60 @@ In this step, the pre-processed text is transformed into a conversational podcas
    - Supports both single-pass outputs (`text_to_text`) and real-time streamed responses (`text_to_text_stream`), offering flexibility for different use cases.
 
 
+### 🔍 **API Example**
+
+```py
+from document_to_podcast.inference.model_loaders import load_llama_cpp_model
+from document_to_podcast.inference.text_to_text import text_to_text, text_to_text_stream
+
+# Load the model
+model = load_llama_cpp_model(
+    "allenai/OLMoE-1B-7B-0924-Instruct-GGUF/olmoe-1b-7b-0924-instruct-q8_0.gguf"
+)
+
+# Define your input and system prompt
+input_text = (
+    "Electric vehicles (EVs) have seen a significant rise in adoption over the past "
+    "decade, driven by advancements in battery technology, government incentives, "
+    "and growing consumer awareness of environmental issues."
+)
+
+system_prompt = (
+    """
+    You are a podcast scriptwriter generating engaging and natural-sounding conversations in JSON format.
+    - Write dynamic, easy-to-follow dialogue.
+    - Include natural interruptions and interjections.
+    - Avoid repetitive phrasing between speakers.
+    - Format output as a JSON conversation.
+    Example:
+    {
+      "Speaker 1": "Welcome to our podcast! Today, we're exploring...",
+      "Speaker 2": "Hi! I'm excited to hear about this. Can you explain...",
+    }
+    """
+)
+
+# Generate a podcast script from the input text
+podcast_script = text_to_text(input_text, model, system_prompt)
+print(podcast_script)
+
+"""
+{
+  "Speaker 1": "Welcome to our podcast! Today, we're exploring the rise of electric vehicles (EVs) and what's driving this significant increase in adoption over the past decade.",
+  "Speaker 2": "Absolutely, it's fascinating to see how the market has evolved and how consumers are becoming more environmentally conscious.",
+  "Speaker 1": "Absolutely! Let's dive into the key factors driving this growth.",
+  "Speaker 2": "Sure, here are a few key drivers: advancements in battery technology, government incentives, and growing consumer awareness of environmental issues.",
+  ...
+}
+"""
+
+# Example of real-time script generation with streaming
+for chunk in text_to_text_stream(input_text, model, system_prompt):
+    print(chunk, end="")
+
+```
+
+
 ## **Step 3: Audio Podcast Generation**
 
 In this final step, the generated podcast transcript is brought to life as an audio file. Using a Text-to-Speech (TTS) model, each speaker in the script is assigned a unique voice, creating an engaging and professional-sounding podcast.
@@ -93,6 +174,33 @@ In this final step, the generated podcast transcript is brought to life as an au
 
    - The function `text_to_speech` takes the input text (e.g. podcast script) and speaker profile, generating a waveform (audio data in a numpy array) that represents the spoken version of the text.
 
+### 🔍 **API Example**
+
+```py
+import soundfile as sf
+from document_to_podcast.inference.model_loaders import load_outetts_model
+from document_to_podcast.inference.text_to_speech import text_to_speech
+
+# Load the TTS model
+model = load_outetts_model(
+    "OuteAI/OuteTTS-0.1-350M-GGUF/OuteTTS-0.1-350M-FP16.gguf"
+)
+
+# Generate the waveform
+waveform = text_to_speech(
+    input_text="Welcome to our amazing podcast",
+    model=model,
+    voice_profile="male_1"
+)
+
+# Save the audio file
+sf.write(
+    "podcast.wav",
+    waveform,
+    samplerate=model.audio_codec.sr
+)
+```
+
 ## **Bringing It All Together in `app.py`**
 
 The `app.py` demo app is shows you how all the components of the Document-to-Podcast Blueprint can come together. It demonstrates how you can take the individual steps—Document Pre-Processing, Podcast Script Generation, and Audio Podcast Generation—and integrate them into a functional application. This is the heart of the Blueprint in action, showing how you can build an app using the provided tools and components.
@@ -128,7 +236,7 @@ This demo uses [Streamlit](https://streamlit.io/), an open-source Python framewo
 
  - The cleaned text and a system-defined podcast prompt are fed into the text_to_text_stream function.
 
- - The `PODCAST_PROMPT` can be edited by the end-user to enable them to tailor their script results for their needs.
+ - The `DEFAULT_PROMPT` is loaded from `config.py`, where it can be edited to tailor the generated script.
 
  - The script is streamed back to the user in real-time, allowing them to see the generated conversation between speakers
 
@@ -136,7 +244,7 @@ This demo uses [Streamlit](https://streamlit.io/), an open-source Python framewo
 
 - For each speaker in the podcast script, audio is generated using the `text_to_speech` function with distinct speaker profiles
 
-- The `SPEAKER_DESCRIPTION` enables the user to edit the podcast speakers voices to fit their needs.
+- The `DEFAULT_SPEAKERS` list is loaded from `config.py`, where the speakers' voices can be edited.
 
 - The generated audio is displayed with a player so users can listen directly in the app.
 
diff --git a/mkdocs.yml b/mkdocs.yml
index f1d5068..e4e48bf 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -28,6 +28,8 @@ theme:
         name: Switch to light mode
   extra_css:
     - assets/custom.css
+  features:
+    - content.code.copy
 
 markdown_extensions:
   - pymdownx.highlight:

From 2ec7ef903cc3eebd4b472eb670f709fb588a08b9 Mon Sep 17 00:00:00 2001
From: Kostis <Kostis-S-Z@users.noreply.github.com>
Date: Wed, 18 Dec 2024 13:54:00 +0000
Subject: [PATCH 2/2] Add fire to core dependencies (#73)

---
 pyproject.toml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pyproject.toml b/pyproject.toml
index 4cc6fc8..a6b8d92 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,6 +10,7 @@ requires-python = ">=3.10,<3.13"
 dynamic = ["version"]
 dependencies = [
   "beautifulsoup4",
+  "fire",
   "huggingface-hub",
   "llama-cpp-python",
   "loguru",