diff --git a/docs/docs/quick-start/getting-started-01.md b/docs/docs/quick-start/getting-started-01.md new file mode 100644 index 000000000..4e6cf5eaf --- /dev/null +++ b/docs/docs/quick-start/getting-started-01.md @@ -0,0 +1,340 @@ +--- +sidebar_position: 2 +--- + +# Getting Started I: Basic Question Answering + +Let's walk through a quick example of **basic question answering** in DSPy. Specifically, let's build **a system for answering Tech questions**, e.g. about Linux or iPhone apps. + +Install the latest DSPy via `pip install -U dspy` and follow along. If you're looking instead for a conceptual overview of DSPy, this [recent lecture](https://www.youtube.com/live/JEMYuzrKLUw) is a good place to start. + +## Configuring the DSPy environment + +Let's tell DSPy that we will use OpenAI's `gpt-4o-mini` in our modules. To authenticate, DSPy will look into your `OPENAI_API_KEY`. You can easily swap this out for [other providers or local models](https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb). + +```python +import dspy + +lm = dspy.LM('openai/gpt-4o-mini') +dspy.configure(lm=lm) +``` + +## Exploring some basic DSPy `Module`s. + +You can always prompt the LM directly via `lm(prompt="prompt")` or `lm(messages=[...])`. However, DSPy gives you `Modules` as a better way to define your LM functions. + +The simplest module is `dspy.Predict`. It takes a [DSPy Signature](/docs/building-blocks/signatures), i.e. a structured input/output schema, and gives you back a callable function for the behavior you specified. Let's use the "in-line" notation for signatures to declare a module that takes a `question` (of type `str`) as input and produces a `response` as an output. + +```python +qa = dspy.Predict('question: str -> response: str') +qa(question="what are high memory and low memory on linux?").response +``` + +**Output:** +``` +'In Linux, "high memory" and "low memory" refer to different regions of the system\'s memory address space, particularly in the context of 32-bit architectures.\n\n- **Low Memory**: This typically refers to the first 896 MB of memory in a 32-bit system. It is directly accessible by the kernel and is used for kernel data structures and user processes. The low memory region is where most of the system\'s memory management occurs, and it is where the kernel can allocate memory for processes without needing special handling.\n\n- **High Memory**: This refers to memory above the 896 MB threshold in a 32-bit system. The kernel cannot directly access this memory without special mechanisms because of the limitations of the 32-bit address space. High memory is used for user processes that require more memory than what is available in the low memory region. The kernel can manage high memory through techniques like "highmem" support, which allows it to map high memory pages into the kernel\'s address space when needed.\n\nIn summary, low memory is directly accessible by the kernel, while high memory requires additional handling for the kernel to access it, especially in 32-bit systems. In 64-bit systems, this distinction is less relevant as the addressable memory space is significantly larger.' +``` + +Notice how the variable names we specified in the signature defined our input and output argument names and their role. + +Now, what did DSPy do to build this `qa` module? Nothing fancy in this example, yet. 
The module passed your signature, LM, and inputs to an [Adapter](/docs/building-blocks/language_models#structured-lm-output-with-adapters), which is a layer that handles structuring the inputs and parsing structured outputs to fit your signature.
+
+Let's see it directly. You can easily inspect the last `n` prompts sent by DSPy.
+
+```python
+dspy.inspect_history(n=1)
+```
+
+**Output:**
+```
+System message:
+
+Your input fields are:
+1. `question` (str)
+
+Your output fields are:
+1. `response` (str)
+
+All interactions will be structured in the following way, with the appropriate values filled in.
+
+[[ ## question ## ]]
+{question}
+
+[[ ## response ## ]]
+{response}
+
+[[ ## completed ## ]]
+
+In adhering to this structure, your objective is: 
+        Given the fields `question`, produce the fields `response`.
+
+
+User message:
+
+[[ ## question ## ]]
+what are high memory and low memory on linux?
+
+Respond with the corresponding output fields, starting with the field `response`, and then ending with the marker for `completed`.
+
+
+Response:
+
+[[ ## response ## ]]
+In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.
+
+- **Low Memory**: This typically refers to the first 896 MB of memory in a 32-bit system. It is directly accessible by the kernel and is used for kernel data structures and user processes. The low memory region is where most of the system's memory management occurs, and it is where the kernel can allocate memory for processes without needing special handling.
+
+- **High Memory**: This refers to memory above the 896 MB threshold in a 32-bit system. The kernel cannot directly access this memory without special mechanisms because of the limitations of the 32-bit address space. High memory is used for user processes that require more memory than what is available in the low memory region. The kernel can manage high memory through techniques like "highmem" support, which allows it to map high memory pages into the kernel's address space when needed.
+
+In summary, low memory is directly accessible by the kernel, while high memory requires additional handling for the kernel to access it, especially in 32-bit systems. In 64-bit systems, this distinction is less relevant as the addressable memory space is significantly larger.
+
+[[ ## completed ## ]]
+```
+
+DSPy has various built-in modules, e.g. `dspy.ChainOfThought`, `dspy.ProgramOfThought`, and `dspy.ReAct`. These are interchangeable with basic `dspy.Predict`: they take your signature, which is specific to your task, and they apply general-purpose prompting techniques and inference-time strategies to it.
+
+For example, `dspy.ChainOfThought` is an easy way to elicit `reasoning` out of your LM before it commits to the outputs requested in your signature.
+
+In the example below, we'll omit `str` types (as the default type is string). You should feel free to experiment with other fields and types, e.g. try `topics: list[str]` or `is_realistic: bool`.
+
+```python
+cot = dspy.ChainOfThought('question -> response')
+cot(question="should curly braces appear on their own line?")
+```
+
+**Output:**
+```
+Prediction(
+    reasoning="The placement of curly braces on their own line is largely a matter of coding style and conventions. In some programming languages and style guides, such as those used in C, C++, and Java, it is common to place opening curly braces on the same line as the control statement (like `if`, `for`, etc.) and closing braces on a new line. However, other styles, such as the Allman style, advocate for placing both opening and closing braces on their own lines. Ultimately, the decision should be based on the team's coding standards or personal preference, as long as it maintains readability and consistency throughout the code.",
+    response="Curly braces can either appear on their own line or not, depending on the coding style you choose to follow. It's important to be consistent with whichever style you adopt."
+)
+```
+
+Interestingly, asking for reasoning made the output `response` shorter in this case. Is this a good thing or a bad thing? It depends on what you need: there's no free lunch, but DSPy gives you the tools to experiment with different strategies extremely quickly.
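+
+Before moving on, here's a quick sketch of the typed fields suggested above. The signature below is hypothetical (we made up the field names for illustration), but the pattern is general: typed outputs are parsed into real Python values rather than plain strings.
+
+```python
+# A minimal sketch of typed output fields, reusing the LM configured above.
+# The signature and its field names here are illustrative, not from the guide.
+classify = dspy.ChainOfThought('question -> topics: list[str], is_realistic: bool')
+pred = classify(question="should curly braces appear on their own line?")
+
+print(pred.topics)        # parsed as a Python list of strings
+print(pred.is_realistic)  # parsed as a Python bool
+```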
+
+By the way, `dspy.ChainOfThought` is implemented in DSPy using `dspy.Predict`. This is a good place to `dspy.inspect_history` if you're curious.
+
+## Using DSPy well involves evaluation and iterative development.
+
+You already know a lot about DSPy at this point. If all you want is quick scripting, this much of DSPy already enables a lot. Sprinkling DSPy signatures and modules into your Python control flow is a pretty ergonomic way to just get stuff done with LMs.
+
+That said, you're likely here because you want to build a high-quality system and improve it over time. The way to do that in DSPy is to iterate fast by evaluating the quality of your system and using DSPy's powerful tools, e.g. [Optimizers](/docs/building-blocks/optimizers). You can learn about the [appropriate development cycle in DSPy here](/docs/building-blocks/solving_your_task).
+
+## Manipulating `Example`s in DSPy.
+
+To measure the quality of your DSPy system, you need (1) a bunch of input values, like `question`s for example, and (2) a `metric` that can score the quality of an output from your system. Metrics vary widely. Some metrics need ground-truth labels of ideal outputs, e.g. for classification or question answering. Other metrics are self-supervised, e.g. checking faithfulness or lack of hallucination, perhaps using a DSPy program as a judge of these qualities.
+
+Let's load a dataset of questions and their (pretty long) gold answers. Since we started this notebook with the goal of building **a system for answering Tech questions**, we obtained a bunch of StackExchange-based questions and their correct answers from the RAG-QA Arena dataset. (Learn more about the [development cycle](/docs/building-blocks/solving_your_task) if you don't have data for your task.)
+
+
+```python
+import ujson
+
+# Download 500 question--answer pairs from the RAG-QA Arena "Tech" dataset.
+# !wget https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json
+
+with open('ragqa_arena_tech_500.json') as f:
+    data = ujson.load(f)
+
+# Inspect one datapoint.
+data[0]
+```
+
+**Output:**
+```
+{'question': 'how to transfer whatsapp voice message to computer?',
+ 'response': 'To transfer voice notes from WhatsApp on your device to your computer, you have the option to select the "Share" feature within the app and send the files via Email, Gmail, Bluetooth, or other available services. \nYou can also move the files onto your phone\'s SD card, connect your phone to your computer via a USB cable, then find and transfer the files via File Explorer on your PC. \nAlternatively, you can choose to attach all the desired voice notes to an email and, from your phone, send them to your own email address. 
\nUpon receiving the email on your computer, you can then download the voice note attachments.'}
+```
+
+Given a simple dict like this, let's create a list of `dspy.Example`s, which is the datatype that carries training (or test) datapoints in DSPy.
+
+When you build a `dspy.Example`, you should generally specify `.with_inputs("field1", "field2", ...)` to indicate which fields are inputs. The other fields are treated as labels or metadata.
+
+```python
+data = [dspy.Example(**d).with_inputs('question') for d in data]
+
+# Let's pick an `example` here from the data.
+example = data[2]
+example
+```
+
+**Output:**
+```
+Example({'question': 'what are high memory and low memory on linux?', 'response': '"High Memory" refers to the application or user space, the memory that user programs can use and which isn\'t permanently mapped in the kernel\'s space, while "Low Memory" is the kernel\'s space, which the kernel can address directly and is permanently mapped. \nThe user cannot access the Low Memory as it is set aside for the required kernel programs.'}) (input_keys={'question'})
+```
+
+Now, let's divide the data into:
+
+- Training and Validation sets:
+    - These are the splits you typically give to DSPy optimizers.
+    - Optimizers typically learn directly from the training examples and check their progress using the validation examples.
+    - It's good to have 30--300 examples for training and validation each.
+    - For prompt optimizers in particular, it's often better to pass _more_ validation examples than training examples.
+
+- Development and Test sets: The rest, typically on the order of 30--1000, can be used for:
+    - development (i.e., you can inspect them as you iterate on your system) and
+    - testing (final held-out evaluation).
+
+
+```python
+trainset, valset, devset, testset = data[:50], data[50:150], data[150:300], data[300:500]
+
+len(trainset), len(valset), len(devset), len(testset)
+```
+
+**Output:**
+```
+(50, 100, 150, 200)
+```
+
+
+## Evaluation in DSPy.
+
+What kind of metric can suit our question-answering task? There are many choices, but since the answers are long, we may ask: How well does the system response _cover_ all key facts in the gold response? And conversely, how well does the system response avoid saying things that aren't in the gold response?
+
+That metric is essentially a **semantic F1**, so let's load a `SemanticF1` metric from DSPy. This metric is actually implemented as a [very simple DSPy module](/docs/building-blocks/modules) using whatever LM we're working with.
+
+
+```python
+from dspy.evaluate import SemanticF1
+
+# Instantiate the metric.
+metric = SemanticF1()
+
+# Produce a prediction from our `cot` module, using the `example` above as input.
+pred = cot(**example.inputs())
+
+# Compute the metric score for the prediction.
+score = metric(example, pred)
+
+print(f"Question: \t {example.question}\n")
+print(f"Gold Response: \t {example.response}\n")
+print(f"Predicted Response: \t {pred.response}\n")
+print(f"Semantic F1 Score: {score:.2f}")
+```
+
+**Output:**
+```
+Question: 	 what are high memory and low memory on linux?
+
+Gold Response: 	 "High Memory" refers to the application or user space, the memory that user programs can use and which isn't permanently mapped in the kernel's space, while "Low Memory" is the kernel's space, which the kernel can address directly and is permanently mapped. 
+The user cannot access the Low Memory as it is set aside for the required kernel programs.
+
+Predicted Response: 	 In Linux, "low memory" refers to the memory that is directly accessible by the kernel and user processes, typically the first 4GB on a 32-bit system. "High memory" refers to memory above this limit, which is not directly accessible by the kernel in a 32-bit environment. This distinction is crucial for memory management, particularly in systems with large amounts of RAM, as it influences how memory is allocated and accessed.
+
+Semantic F1 Score: 0.80
+```
+
+The final DSPy module call above actually happens inside `metric`. You might be curious how it measured the semantic F1 for this example.
+
+
+```python
+dspy.inspect_history(n=1)
+```
+
+**Output:**
+```
+System message:
+
+Your input fields are:
+1. `question` (str)
+2. `ground_truth` (str)
+3. `system_response` (str)
+
+Your output fields are:
+1. `reasoning` (str)
+2. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response
+3. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth
+
+All interactions will be structured in the following way, with the appropriate values filled in.
+
+[[ ## question ## ]]
+{question}
+
+[[ ## ground_truth ## ]]
+{ground_truth}
+
+[[ ## system_response ## ]]
+{system_response}
+
+[[ ## reasoning ## ]]
+{reasoning}
+
+[[ ## recall ## ]]
+{recall}
+
+[[ ## precision ## ]]
+{precision}
+
+[[ ## completed ## ]]
+
+In adhering to this structure, your objective is: 
+        Compare a system's response to the ground truth to compute its recall and precision.
+        If asked to reason, enumerate key ideas in each response, and whether they are present in the other response.
+
+
+User message:
+
+[[ ## question ## ]]
+what are high memory and low memory on linux?
+
+[[ ## ground_truth ## ]]
+"High Memory" refers to the application or user space, the memory that user programs can use and which isn't permanently mapped in the kernel's space, while "Low Memory" is the kernel's space, which the kernel can address directly and is permanently mapped. 
+The user cannot access the Low Memory as it is set aside for the required kernel programs.
+
+[[ ## system_response ## ]]
+In Linux, "low memory" refers to the memory that is directly accessible by the kernel and user processes, typically the first 4GB on a 32-bit system. "High memory" refers to memory above this limit, which is not directly accessible by the kernel in a 32-bit environment. This distinction is crucial for memory management, particularly in systems with large amounts of RAM, as it influences how memory is allocated and accessed.
+
+Respond with the corresponding output fields, starting with the field `reasoning`, then `recall`, then `precision`, and then ending with the marker for `completed`.
+
+
+Response:
+
+[[ ## reasoning ## ]]
+The system response correctly identifies "low memory" as the memory directly accessible by the kernel and user processes, and it mentions the typical limit of 4GB on a 32-bit system. It also accurately describes "high memory" as the memory above this limit that is not directly accessible by the kernel. However, it does not explicitly mention that "low memory" is set aside for kernel programs, which is a key point in the ground truth. The system response provides a good overview of the concepts but lacks the detail about user access to low memory and the permanent mapping aspect mentioned in the ground truth.
+
+[[ ## recall ## ]]
+0.75
+
+[[ ## precision ## ]]
+0.85
+
+[[ ## completed ## ]]
+```
+
+For evaluation, you could use the metric above in a simple loop and just average the score.
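+
+For instance, a minimal sketch of that loop (reusing the `cot` module, `devset`, and `metric` defined above) might look like this:
+
+```python
+# A simple, sequential evaluation loop: run the module on every dev example
+# and average the metric's scores. Assumes `cot`, `devset`, and `metric` from above.
+scores = []
+for ex in devset:
+    pred = cot(**ex.inputs())
+    scores.append(metric(ex, pred))
+
+print(f"Average SemanticF1: {sum(scores) / len(scores):.3f}")
+```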
+But for nice parallelism and utilities, we can rely on `dspy.Evaluate`.
+
+```python
+# Define an evaluator that we can re-use.
+evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=24,
+                         display_progress=True, display_table=3)
+
+# Evaluate the Chain-of-Thought program.
+evaluate(cot)
+```
+
+**Output:**
+```
+Average Metric: 59.565342393613165 / 150 (39.7): 100%|██████████| 150/150 [00:00<00:00, 432.92it/s]
+[TABLE HERE]
+39.71
+```
+
+The table in the output will look like this:
+
+| | question | example_response | reasoning | pred_response | SemanticF1 |
+|---|---|---|---|---|---|
+| 0 | why is mercurial considered to be easier than git? | Mercurial's syntax is considered more familiar, especially for those accustomed to SVN, and is well documented. It focuses on interface aspects, which initially makes learning... | Mercurial is often considered easier than Git for several reasons. Firstly, Mercurial has a simpler command structure and a more consistent user interface, which can... | Mercurial is considered easier than Git primarily due to its simpler command structure and more consistent user interface, making it more approachable for beginners. Its... | ✔️ [0.545] |
+| 1 | open finder window from current terminal location? | If you type 'open .' in Terminal, it will open the current directory in a Finder window. Alternatively, you can execute the command open `pwd`... | To open a Finder window from the current terminal location on a Mac, you can use the `open` command followed by a dot (`.`) which... | You can open a Finder window from your current terminal location by using the following command:\n```\nopen .\n``` | ✔️ [0.667] |
+| 2 | how to import secret gpg key (copied from one machine to another)? | It is advised that it is necessary to add `--import` to the command line to import the private key and that according to the man... | To import a secret GPG key that has been copied from one machine to another, you need to ensure that the key is in the... | To import a secret GPG key that you have copied from one machine to another, follow these steps: 1. **Transfer the Key**: Ensure that the... | ✔️ [0.708] |
+
+## What's next?
+
+In this guide, we built a very simple chain-of-thought module for question answering and evaluated it on a small dataset.
+
+Can we do better? In the next guide, we will build a retrieval-augmented generation (RAG) program in DSPy for the same task.
+
+We'll see how this can boost the score substantially, then we'll use one of the DSPy Optimizers to _compile_ our RAG program to higher-quality prompts, raising our scores even more.
+
+Continue here: [Getting Started II: Basic RAG](/docs/quick-start/getting-started-02)
diff --git a/docs/docs/quick-start/getting-started-02.md b/docs/docs/quick-start/getting-started-02.md
new file mode 100644
index 000000000..62160649a
--- /dev/null
+++ b/docs/docs/quick-start/getting-started-02.md
@@ -0,0 +1,326 @@
+---
+sidebar_position: 3
+---
+
+# Getting Started II: Basic RAG
+
+Let's walk through a quick example of **basic retrieval-augmented generation (RAG)** in DSPy. Specifically, let's build **a system for answering Tech questions**, e.g. about Linux or iPhone apps.
+
+Install the latest DSPy via `pip install -U dspy` and follow along. You may also need to install PyTorch via `pip install torch`.
+
+## Continue from Getting Started I.
+
+In [Getting Started I: Basic Question Answering](/docs/quick-start/getting-started-01), we set up the DSPy LM, loaded some data, and defined a metric for evaluation.
+
+Let's repeat those steps and also download the corpus data that we will use for RAG search. The next cell will download about 4 GB of data, so it may take a few minutes. A future version of this notebook will come with a cache that allows you to skip downloads and the PyTorch installation.
+
+
+```python
+import os
+import requests
+
+urls = [
+    'https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json',
+    'https://huggingface.co/datasets/colbertv2/lotte_passages/resolve/main/technology/test_collection.jsonl',
+    'https://huggingface.co/dspy/cache/resolve/main/index.pt'
+]
+
+# Download each file, skipping any that already exist locally with the right size.
+for url in urls:
+    filename = os.path.basename(url)
+    remote_size = int(requests.head(url, allow_redirects=True).headers.get('Content-Length', 0))
+    local_size = os.path.getsize(filename) if os.path.exists(filename) else 0
+
+    if local_size != remote_size:
+        print(f"Downloading '{filename}'...")
+        with requests.get(url, stream=True) as r, open(filename, 'wb') as f:
+            for chunk in r.iter_content(chunk_size=8192):
+                f.write(chunk)
+
+import ujson
+import dspy
+from dspy.evaluate import SemanticF1
+
+lm = dspy.LM('openai/gpt-4o-mini')
+dspy.configure(lm=lm)
+
+with open('ragqa_arena_tech_500.json') as f:
+    data = [dspy.Example(**d).with_inputs('question') for d in ujson.load(f)]
+    trainset, valset, devset, testset = data[:50], data[50:150], data[150:300], data[300:500]
+
+metric = SemanticF1()
+evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=24, display_progress=True, display_table=3)
+```
+
+## Set up your system's retriever.
+
+As far as DSPy is concerned, you can plug in any Python code for calling tools or retrievers. Hence, for our RAG system, we can plug in any tool for the search step. Here, we'll just use OpenAI Embeddings and PyTorch for top-K search, but this is not a special choice, just a convenient one.
+
+```python
+import torch
+import functools
+from litellm import embedding as Embed
+
+with open("test_collection.jsonl") as f:
+    corpus = [ujson.loads(line) for line in f]
+
+index = torch.load('index.pt', weights_only=True)
+max_characters = 4000  # >98th percentile of document lengths
+
+@functools.lru_cache(maxsize=None)
+def search(query, k=5):
+    # Embed the query, then take the top-k passages by dot-product similarity.
+    query_embedding = torch.tensor(Embed(input=query, model="text-embedding-3-small").data[0]['embedding'])
+    topk_scores, topk_indices = torch.matmul(index, query_embedding).topk(k)
+    topK = [dict(score=score.item(), **corpus[idx]) for idx, score in zip(topk_indices, topk_scores)]
+    return [doc['text'][:max_characters] for doc in topK]
+```
+
+## Build your first RAG `Module`.
+
+In the previous guide, we looked at individual DSPy modules in isolation, e.g. `dspy.Predict("question -> answer")`.
+
+What if we want to build a DSPy _program_ that has multiple steps? The syntax below with `dspy.Module` allows you to connect a few pieces together (in this case, our retriever and a generation module) so the whole system can be optimized.
+
+Concretely, in the `__init__` method, you declare any sub-module you'll need, which in this case is just a `dspy.ChainOfThought('context, question -> response')` module that takes retrieved context, a question, and produces a response. In the `forward` method, you simply express any Python control flow you like, possibly using your modules. 
In this case, we first invoke the `search` function defined earlier and then invoke the `self.respond` ChainOfThought module. + +```python +class RAG(dspy.Module): + def __init__(self, num_docs=5): + self.num_docs = num_docs + self.respond = dspy.ChainOfThought('context, question -> response') + + def forward(self, question): + context = search(question, k=self.num_docs) + return self.respond(context=context, question=question) + +rag = RAG() +rag(question="what are high memory and low memory on linux?") +``` + +**Output:** +``` +Prediction( + reasoning="High memory and low memory in Linux refer to the organization of memory in the system, particularly in the context of the Linux kernel's virtual memory management. High memory is the portion of physical memory that is not directly mapped by the kernel's page tables, meaning that user-space applications cannot access it directly. Low memory, on the other hand, is the part of memory that the kernel can access directly. In a typical 32-bit architecture, the virtual memory is split into 3 GB for user space (low memory) and 1 GB for kernel space (high memory). The distinction is important for memory management, especially when dealing with physical memory that cannot be mapped contiguously. Understanding this split is crucial for developers working with the Linux kernel, as it affects how memory is allocated and accessed.", + response="In Linux, high memory refers to the portion of physical memory that is not directly mapped by the kernel's page tables, making it inaccessible to user-space applications. Low memory is the segment that the kernel can access directly. In a typical 32-bit architecture, the memory is divided into 3 GB for user space (low memory) and 1 GB for kernel space (high memory). This organization is essential for efficient memory management and affects how the kernel interacts with physical memory, especially in scenarios where contiguous memory is required." +) +``` + +```python +dspy.inspect_history() +``` + +**Output:** +``` +System message: + +Your input fields are: +1. `context` (str) +2. `question` (str) + +Your output fields are: +1. `reasoning` (str) +2. `response` (str) + +All interactions will be structured in the following way, with the appropriate values filled in. + +[[ ## context ## ]] +{context} + +[[ ## question ## ]] +{question} + +[[ ## reasoning ## ]] +{reasoning} + +[[ ## response ## ]] +{response} + +[[ ## completed ## ]] + +In adhering to this structure, your objective is: + Given the fields `context`, `question`, produce the fields `response`. + + +User message: + +[[ ## context ## ]] +[1] «As far as I remember, High Memory is used for application space and Low Memory for the kernel. Advantage is that (user-space) applications cant access kernel-space memory.» +[2] «For the people looking for an explanation in the context of Linux kernel memory space, beware that there are two conflicting definitions of the high/low memory split (unfortunately there is no standard, one has to interpret that in context): High memory defined as the totality of kernel space in VIRTUAL memory. This is a region that only the kernel can access and comprises all virtual addresses greater or equal than PAGE_OFFSET. Low memory refers therefore to the region of the remaining addresses, which correspond to the user-space memory accessible from each user process. For example: on 32-bit x86 with a default PAGE_OFFSET, this means that high memory is any address ADDR with ADDR ≥ 0xC0000000 = PAGE_OFFSET (i.e. higher 1 GB). 
This is the reason why in Linux 32-bit processes are typically limited to 3 GB. Note that PAGE_OFFSET cannot be configured directly, it depends on the configurable VMSPLIT_x options (source). To summarize: in 32-bit archs, virtual memory is by default split into lower 3 GB (user space) and higher 1 GB (kernel space). For 64 bit, PAGE_OFFSET is not configurable and depends on architectural details that are sometimes detected at runtime during kernel load. On x86_64, PAGE_OFFSET is 0xffff888000000000 for 4-level paging (typical) and 0xff11000000000000 for 5-level paging (source). For ARM64 this is usually 0x8000000000000000. Note though, if KASLR is enabled, this value is intentionally unpredictable. High memory defined as the portion of PHYSICAL memory that cannot be mapped contiguously with the rest of the kernel virtual memory. A portion of the kernel virtual address space can be mapped as a single contiguous chunk into the so-called physical low memory. To fully understand what this means, a deeper knowledge of the Linux virtual memory space is required. I would recommend going through these slides. From the slides: This kind of high/low memory split is only applicable to 32-bit architectures where the installed physical RAM size is relatively high (more than ~1 GB). Otherwise, i.e. when the physical address space is small (<1 GB) or when the virtual memory space is large (64 bits), the whole physical space can be accessed from the kernel virtual memory space. In that case, all physical memory is considered low memory. It is preferable that high memory does not exist at all because the whole physical space can be accessed directly from the kernel, which makes memory management a lot simpler and efficient. This is especially important when dealing with DMAs (which typically require physically contiguous memory). See also the answer by @gilles» +[3] «Low and High do not refer to whether there is a lot of usage or not. They represent the way it is organized by the system. According to Wikipedia: High Memory is the part of physical memory in a computer which is not directly mapped by the page tables of its operating system kernel. There is no duration for the free command which simply computes a snapshot of the information available. Most people, including programmers, do not need to understand it more clearly as it is managed in a much simpler form through system calls and compiler/interpreter operations.» +[4] «This is relevant to the Linux kernel; Im not sure how any Unix kernel handles this. The High Memory is the segment of memory that user-space programs can address. It cannot touch Low Memory. Low Memory is the segment of memory that the Linux kernel can address directly. If the kernel must access High Memory, it has to map it into its own address space first. There was a patch introduced recently that lets you control where the segment is. The tradeoff is that you can take addressable memory away from user space so that the kernel can have more memory that it does not have to map before using. Additional resources: http://tldp.org/HOWTO/KernelAnalysis-HOWTO-7.html http://linux-mm.org/HighMemory» +[5] «HIGHMEM is a range of kernels memory space, but it is NOT memory you access but its a place where you put what you want to access. A typical 32bit Linux virtual memory map is like: 0x00000000-0xbfffffff: user process (3GB) 0xc0000000-0xffffffff: kernel space (1GB) (CPU-specific vector and whatsoever are ignored here). Linux splits the 1GB kernel space into 2 pieces, LOWMEM and HIGHMEM. 
The split varies from installation to installation. If an installation chooses, say, 512MB-512MB for LOW and HIGH mems, the 512MB LOWMEM (0xc0000000-0xdfffffff) is statically mapped at the kernel boot time; usually the first so many bytes of the physical memory is used for this so that virtual and physical addresses in this range have a constant offset of, say, 0xc0000000. On the other hand, the latter 512MB (HIGHMEM) has no static mapping (although you could leave pages semi-permanently mapped there, but you must do so explicitly in your driver code). Instead, pages are temporarily mapped and unmapped here so that virtual and physical addresses in this range have no consistent mapping. Typical uses of HIGHMEM include single-time data buffers.» + +[[ ## question ## ]] +what are high memory and low memory on linux? + +Respond with the corresponding output fields, starting with the field `reasoning`, then `response`, and then ending with the marker for `completed`. + + +Response: + +[[ ## reasoning ## ]] +High memory and low memory in Linux refer to the organization of memory in the system, particularly in the context of the Linux kernel's virtual memory management. High memory is the portion of physical memory that is not directly mapped by the kernel's page tables, meaning that user-space applications cannot access it directly. Low memory, on the other hand, is the part of memory that the kernel can access directly. In a typical 32-bit architecture, the virtual memory is split into 3 GB for user space (low memory) and 1 GB for kernel space (high memory). The distinction is important for memory management, especially when dealing with physical memory that cannot be mapped contiguously. Understanding this split is crucial for developers working with the Linux kernel, as it affects how memory is allocated and accessed. + +[[ ## response ## ]] +In Linux, high memory refers to the portion of physical memory that is not directly mapped by the kernel's page tables, making it inaccessible to user-space applications. Low memory is the segment that the kernel can access directly. In a typical 32-bit architecture, the memory is divided into 3 GB for user space (low memory) and 1 GB for kernel space (high memory). This organization is essential for efficient memory management and affects how the kernel interacts with physical memory, especially in scenarios where contiguous memory is required. + +[[ ## completed ## ]] +``` + +In the previous guide with a CoT module, we got nearly 40% in terms of semantic F1 on our `devset`. Would this `RAG` module score better? + + +```python +evaluate(RAG()) +``` + +**Output:** +``` +Average Metric: 79.44815807647615 / 150 (53.0): 100%|██████████| 150/150 [00:05<00:00, 29.11it/s] +[TABLE GOES HERE] +52.97 +``` + +| | question | example_response | reasoning | pred_response | SemanticF1 | +|---|---|---|---|---|---| +| 0 | why is mercurial considered to be easier than git? | Mercurial's syntax is considered more familiar, especially for those accustomed to SVN, and is well documented. It focuses on interface aspects, which initially makes learning... | Mercurial is considered easier than Git for several reasons. Firstly, its syntax is more familiar to users coming from Subversion (SVN), making the transition smoother.... | Mercurial is considered easier than Git primarily because of its more familiar syntax for users transitioning from systems like Subversion (SVN), its straightforward workflow without... 
| ✔️ [0.797] |
+| 1 | open finder window from current terminal location? | If you type 'open .' in Terminal, it will open the current directory in a Finder window. Alternatively, you can execute the command open `pwd`... | To open a Finder window from the current terminal location, you can use the command `open .` in the terminal. This command will open a... | You can open a Finder window from your current terminal location by typing the command `open .` in the terminal. This will launch a Finder... | ✔️ [0.857] |
+| 2 | how to import secret gpg key (copied from one machine to another)? | It is advised that it is necessary to add `--import` to the command line to import the private key and that according to the man... | To import a secret GPG key that has been copied from one machine to another, you typically need to use the `gpg --import` command. The... | To import a secret GPG key that you have copied from one machine to another, follow these steps: 1. On the original machine, export your... | |
+
+## Using a DSPy `Optimizer` to improve your RAG prompt.
+
+Off the shelf, our `RAG` module scores 53%. What are our options to make it stronger? One of the various choices DSPy offers is optimizing the prompts in our pipeline.
+
+If there are many sub-modules in your program, all of them will be optimized together. In this case, there's only one: `self.respond = dspy.ChainOfThought('context, question -> response')`
+
+Let's set up and use DSPy's [MIPRO (v2) optimizer](/docs/deep-dive/optimizers/miprov2). The run below costs around $1.50 (for the `medium` auto setting) and may take some 20-30 minutes, depending on your number of threads.
+
+
+```python
+tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)  # use fewer threads if your rate limit is small
+
+optimized_rag = tp.compile(RAG(), trainset=trainset, valset=valset,
+                           max_bootstrapped_demos=2, max_labeled_demos=2,
+                           requires_permission_to_run=False)
+```
+
+**Output:**
+```
+RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS:
+num_trials: 25
+minibatch: True
+num_candidates: 19
+valset size: 100
+
+...
+
+Returning best identified program with score 62.57!
+```
+
+The prompt optimization process here is pretty systematic; you can learn about it, for example, in this paper. Importantly, it's not a magic button: it can, for instance, overfit your training set and fail to generalize well to a held-out set, which makes it essential that we iteratively validate our programs.
+
+Let's check an example here, asking the same question to the baseline `rag = RAG()` program, which was not optimized, and to the `optimized_rag = MIPROv2(..)(..)` program, after prompt optimization.
+
+
+```python
+baseline = rag(question="cmd+tab does not work on hidden or minimized windows")
+print(baseline.response)
+```
+
+**Output:**
+```
+You are correct; cmd+Tab does not activate hidden or minimized windows in macOS. It functions as an application switcher, allowing you to switch between open applications, but it does not bring up minimized windows. To access minimized windows, you would need to click on them directly or use other shortcuts.
+```
+
+
+```python
+pred = optimized_rag(question="cmd+tab does not work on hidden or minimized windows")
+print(pred.response)
+```
+
+**Output:**
+```
+In macOS, the Command+Tab shortcut is specifically designed to switch between applications rather than individual windows. This means that if an application is minimized or hidden, it will not appear in the Command+Tab application switcher. 
Therefore, you cannot use Command+Tab to access minimized or hidden windows directly.
+
+If you want to bring a minimized window back into view, you can click on the application's icon in the Dock, or you can use the Command+M shortcut to minimize the current window. For switching between windows of the same application, you can use Command+` (the backtick key) to cycle through open windows of the active application.
+
+For users who prefer a behavior similar to Windows, where minimized windows can be accessed through a single shortcut, third-party applications like HyperSwitch or Witch can provide additional functionality to manage window switching more effectively.
+```
+
+You can use `dspy.inspect_history(n=2)` to view the RAG prompt before optimization(link), after optimization(link), or their diff(link).
+
+Concretely, the optimized prompt:
+
+1. Constructs the following instruction:
+```
+Using the provided `context` and `question`, analyze the information step by step to generate a comprehensive and informative `response`. Ensure that the response clearly explains the concepts involved, highlights key distinctions, and addresses any complexities noted in the context.
+```
+
+2. And includes two fully worked-out RAG examples with synthetic reasoning and answers, e.g. `how to transfer whatsapp voice message to computer?`.
+
+Let's now evaluate on the overall devset.
+
+
+```python
+evaluate(optimized_rag)
+```
+
+**Output:**
+```
+Average Metric: 92.16999654981839 / 150 (61.4): 100%|██████████| 150/150 [00:00<00:00, 399.21it/s]
+[TABLE HERE]
+61.45
+```
+| | question | example_response | reasoning | pred_response | SemanticF1 |
+|---|---|---|---|---|---|
+| 0 | why is mercurial considered to be easier than git? | Mercurial's syntax is considered more familiar, especially for those accustomed to SVN, and is well documented. It focuses on interface aspects, which initially makes learning... | Mercurial is often considered easier than Git due to its user-friendly design and interface, which is particularly appealing to those new to version control systems... | Mercurial is considered easier than Git for several reasons: 1. **Familiar Syntax**: Mercurial's command syntax is often seen as more intuitive, especially for users coming... | ✔️ [0.874] |
+| 1 | open finder window from current terminal location? | If you type 'open .' in Terminal, it will open the current directory in a Finder window. Alternatively, you can execute the command open `pwd`... | To open a Finder window from the current terminal location on a Mac, there are several methods available. The simplest way is to use the... | To open a Finder window from your current terminal location on a Mac, you can use the following methods: 1. **Using Terminal Command**: - Simply... | ✔️ [0.333] |
+| 2 | how to import secret gpg key (copied from one machine to another)? | It is advised that it is necessary to add `--import` to the command line to import the private key and that according to the man... | To import a secret GPG key that has been copied from one machine to another, it is essential to follow a series of steps that... | To import a secret GPG key that you have copied from one machine to another, follow these steps: 1. **Export the Secret Key from the... | |
+
+## Keeping an eye on cost.
+
+DSPy tracks the cost of your LM calls in the LM's history, which you can use to monitor your spend. Below, we sum the cost (in USD) of all the calls made so far.
+
+```python
+sum([x['cost'] for x in lm.history if x['cost'] is not None])  # in USD, as calculated by LiteLLM for certain providers
+```
+
+## Saving and loading.
+
+The optimized program has a pretty simple structure on the inside. Feel free to explore it.
+
+Here, we'll save `optimized_rag` so we can load it again later without having to optimize from scratch.
+
+```python
+optimized_rag.save("optimized_rag.json")
+
+loaded_rag = RAG()
+loaded_rag.load("optimized_rag.json")
+
+loaded_rag(question="cmd+tab does not work on hidden or minimized windows")
+```
+
+**Output:**
+```
+Prediction(
+    reasoning='The behavior of the Command+Tab shortcut in macOS is designed to switch between applications rather than individual windows. When an application is minimized or hidden, it does not appear in the application switcher, which is why Command+Tab does not work for those windows. Understanding this limitation is important for users who expect similar functionality to that found in other operating systems, such as Windows, where Alt+Tab can switch between all open windows, including minimized ones.',
+    response="In macOS, the Command+Tab shortcut is specifically designed to switch between applications rather than individual windows. This means that if an application is minimized or hidden, it will not appear in the Command+Tab application switcher. Therefore, you cannot use Command+Tab to access minimized or hidden windows directly.\n\nIf you want to bring a minimized window back into view, you can click on the application's icon in the Dock, or you can use the Command+M shortcut to minimize the current window. For switching between windows of the same application, you can use Command+` (the backtick key) to cycle through open windows of the active application.\n\nFor users who prefer a behavior similar to Windows, where minimized windows can be accessed through a single shortcut, third-party applications like HyperSwitch or Witch can provide additional functionality to manage window switching more effectively."
+)
+```
+
+## What's next?
+
+Improving from just below 40% to above 60% on this task, in terms of `SemanticF1`, was pretty easy.
+
+But DSPy gives you paths to continue iterating on the quality of your system, and we have barely scratched the surface.
+
+In general, you have the following tools:
+
+1. Explore better system architectures for your program, e.g. what if we ask the LM to generate search queries for the retriever? See this notebook(link) or the STORM pipeline(link). A small sketch of this idea appears at the end of this guide.
+2. Explore different prompt optimizers or weight optimizers. See the **[Optimizers Docs](/docs/building-blocks/optimizers)**.
+3. Scale inference-time compute using DSPy Optimizers, e.g. this notebook(link).
+4. Cut cost by distilling to a smaller LM, via prompt or weight optimization, e.g. [this notebook](/docs/deep-dive/optimizers/bootstrap-fewshot) or [this notebook](/docs/deep-dive/optimizers/copro).
+
+How do you decide which of these to pursue first?
+
+The first step is to look at your system outputs, which will allow you to identify the sources of lower performance, if any. While doing all of this, make sure you continue to refine your metric, e.g. by optimizing against your judgments, and to collect more (or more realistic) data, e.g. from related domains or from putting a demo of your system in front of users.
+
+Learn more about the [development cycle](/docs/building-blocks/solving_your_task) in DSPy.
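+
+To make the first tool above concrete, here is a minimal, untested sketch of that idea: asking the LM to write a search query before retrieving, instead of searching with the raw question. The `QueryThenRAG` name and its `'question -> search_query'` signature are our own illustration, not a built-in DSPy recipe; the sketch reuses the `search` function defined earlier in this guide.
+
+```python
+# A hypothetical sketch: generate a search query with the LM, retrieve with
+# that query, then respond. Assumes the `search` function and DSPy setup above.
+class QueryThenRAG(dspy.Module):
+    def __init__(self, num_docs=5):
+        self.num_docs = num_docs
+        self.generate_query = dspy.ChainOfThought('question -> search_query')
+        self.respond = dspy.ChainOfThought('context, question -> response')
+
+    def forward(self, question):
+        # Ask the LM for a focused search query, then retrieve with it.
+        query = self.generate_query(question=question).search_query
+        context = search(query, k=self.num_docs)
+        return self.respond(context=context, question=question)
+```
+
+You could evaluate this variant with `evaluate(QueryThenRAG())`, exactly as we did for `RAG()` above, and optimize it the same way.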
diff --git a/docs/docs/quick-start/minimal-example.mdx b/docs/docs/quick-start/minimal-example.mdx deleted file mode 100644 index 50c73e118..000000000 --- a/docs/docs/quick-start/minimal-example.mdx +++ /dev/null @@ -1,106 +0,0 @@ ---- -sidebar_position: 2 ---- - -import AuthorDetails from '@site/src/components/AuthorDetails'; - -# Minimal Working Example - -In this post, we walk you through a minimal working example using the DSPy library. - -We make use of the [GSM8K dataset](https://huggingface.co/datasets/gsm8k) and the OpenAI GPT-3.5-turbo model to simulate prompting tasks within DSPy. - -## Setup - -Before we jump into the example, let's ensure our environment is properly configured. We'll start by importing the necessary modules and configuring our language model: - -```python -import dspy -from dspy.datasets.gsm8k import GSM8K, gsm8k_metric - -# Set up the LM. -turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=250) -dspy.settings.configure(lm=turbo) - -# Load math questions from the GSM8K dataset. -gsm8k = GSM8K() -gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10] -``` - -Let's take a look at what `gsm8k_trainset` and `gsm8k_devset` are: - -```python -print(gsm8k_trainset) -``` - -The `gsm8k_trainset` and `gsm8k_devset` datasets contain lists of `dspy.Examples`, with each example having `question` and `answer` fields. - -## Define the Module - -With our environment set up, let's define a custom program that utilizes the [`ChainOfThought`](/docs/deep-dive/modules/ChainOfThought) module to perform step-by-step reasoning to generate answers: - -```python -class CoT(dspy.Module): - def __init__(self): - super().__init__() - self.prog = dspy.ChainOfThought("question -> answer") - - def forward(self, question): - return self.prog(question=question) -``` - -## Compile and Evaluate the Model - -With our simple program in place, let's move on to compiling it with the [`BootstrapFewShot`](/docs/deep-dive/optimizers/bootstrap-fewshot) teleprompter: - -```python -from dspy.teleprompt import BootstrapFewShot - -# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 4-shot examples of our CoT program. -config = dict(max_bootstrapped_demos=4, max_labeled_demos=4) - -# Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing. -teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config) -optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset) -``` - -Note that `BootstrapFewShot` is not an optimizing teleprompter, i.e. it simply creates and validates examples for steps of the pipeline (in this case, chain-of-thought reasoning) but does not optimize the metric. Other teleprompters like `BootstrapFewShotWithRandomSearch` and `MIPRO` will apply direct optimization. - -## Evaluate - -Now that we have a compiled (optimized) DSPy program, let's move to evaluating its performance on the dev dataset. - - -```python -from dspy.evaluate import Evaluate - -# Set up the evaluator, which can be used multiple times. -evaluate = Evaluate(devset=gsm8k_devset, metric=gsm8k_metric, num_threads=4, display_progress=True, display_table=0) - -# Evaluate our `optimized_cot` program. -evaluate(optimized_cot) -``` - -## Inspect the Model's History - -For a deeper understanding of the model's interactions, we can review the most recent generations through inspecting the model's history: - -```python -turbo.inspect_history(n=1) -``` - -And there you have it! 
You've successfully created a working example using the DSPy library. - -This example showcases how to set up your environment, define a custom module, compile a model, and rigorously evaluate its performance using the provided dataset and teleprompter configurations. - -Feel free to adapt and expand upon this example to suit your specific use case while exploring the extensive capabilities of DSPy. - -If you want to try what you just built, run `optimized_cot(question='Your Question Here')`. - -:::note -For a more comprehensive walkthrough with detailed examples, please refer to the introduction colab: [](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/intro.ipynb). -::: - -*** - -