From 251ca886bed61c9a72a45fbbf82c74ca29615c12 Mon Sep 17 00:00:00 2001 From: Omar Khattab Date: Wed, 16 Oct 2024 09:36:04 -0700 Subject: [PATCH] Docs --- docs/docs/quick-start/getting-started-01.md | 123 +------------------- docs/docs/quick-start/getting-started-02.md | 96 +++------------ 2 files changed, 24 insertions(+), 195 deletions(-) diff --git a/docs/docs/quick-start/getting-started-01.md b/docs/docs/quick-start/getting-started-01.md index 4e6cf5eaf..66eaedb2f 100644 --- a/docs/docs/quick-start/getting-started-01.md +++ b/docs/docs/quick-start/getting-started-01.md @@ -45,52 +45,10 @@ Let's see it directly. You can inspect the `n` last prompts sent by DSPy easily. dspy.inspect_history(n=1) ``` -**Output:** -``` -System message: - -Your input fields are: -1. `question` (str) - -Your output fields are: -1. `response` (str) - -All interactions will be structured in the following way, with the appropriate values filled in. - -[[ ## question ## ]] -{question} - -[[ ## response ## ]] -{response} - -[[ ## completed ## ]] - -In adhering to this structure, your objective is: - Given the fields `question`, produce the fields `response`. - - -User message: - -[[ ## question ## ]] -what are high memory and low memory on linux? - -Respond with the corresponding output fields, starting with the field `response`, and then ending with the marker for `completed`. +**Output:** +See this [gist](https://gist.github.com/okhat/aff3c9788ccddf726fdfeb78e40e5d22) -Response: - -[[ ## response ## ]] -In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures. - -- **Low Memory**: This typically refers to the first 896 MB of memory in a 32-bit system. It is directly accessible by the kernel and is used for kernel data structures and user processes. The low memory region is where most of the system's memory management occurs, and it is where the kernel can allocate memory for processes without needing special handling. - -- **High Memory**: This refers to memory above the 896 MB threshold in a 32-bit system. The kernel cannot directly access this memory without special mechanisms because of the limitations of the 32-bit address space. High memory is used for user processes that require more memory than what is available in the low memory region. The kernel can manage high memory through techniques like "highmem" support, which allows it to map high memory pages into the kernel's address space when needed. - -In summary, low memory is directly accessible by the kernel, while high memory requires additional handling for the kernel to access it, especially in 32-bit systems. In 64-bit systems, this distinction is less relevant as the addressable memory space is significantly larger. - -[[ ## completed ## ]] -``` - DSPy has various built-in modules, e.g. `dspy.ChainOfThought`, `dspy.ProgramOfThought`, and `dspy.ReAct`. These are interchangeable with basic `dspy.Predict`: they take your signature, which is specific to your task, and they apply general-purpose prompting techniques and inference-time strategies to it. For example, `dspy.ChainOfThought` is an easy way to elicit `reasoning` out of your LM before it commits to the outputs requested in your signature. 
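To make this concrete, here is a minimal sketch of swapping `dspy.Predict` for `dspy.ChainOfThought` with the same signature. It assumes the LM configured earlier in this guide and the `question -> response` signature from above; the question string is just illustrative:

```python
import dspy

# Assumes an LM was configured earlier, e.g.:
# dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# The same task-specific signature works with either module;
# only the general-purpose strategy wrapped around it changes.
predict = dspy.Predict('question -> response')
cot = dspy.ChainOfThought('question -> response')

question = "what are high memory and low memory on linux?"

# dspy.Predict returns only the output fields declared in the signature.
print(predict(question=question).response)

# dspy.ChainOfThought adds a `reasoning` field that the LM fills in
# before committing to `response`.
pred = cot(question=question)
print(pred.reasoning)
print(pred.response)
```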
@@ -125,14 +83,14 @@ That said, you're likely here because you want to build a high-quality system an To measure the quality of your DSPy system, you need (1) a bunch of input values, like `question`s for example, and (2) a `metric` that can score the quality of an output from your system. Metrics vary widely. Some metrics need ground-truth labels of ideal outputs, e.g. for classification or question answering. Other metrics are self-supervised, e.g. checking faithfulness or lack of hallucination, perhaps using a DSPy program as a judge of these qualities. -Let's load a dataset of questions and their (pretty long) gold answers. Since we started this notebook with the goal of building **a system for answering Tech questions**, we obtained a bunch of StackExchange-based questions and their correct answers from the RAG-QA Arena dataset. (Learn more about the [development cycle](/docs/building-blocks/solving_your_task) if you don't have data for your task.) +Let's load a dataset of questions and their (pretty long) gold answers. Since we started this notebook with the goal of building **a system for answering Tech questions**, we obtained a bunch of StackExchange-based questions and their correct answers from the [RAG-QA Arena](https://arxiv.org/abs/2407.13998) dataset. (Learn more about the [development cycle](/docs/building-blocks/solving_your_task) if you don't have data for your task.) ```python import ujson # Download 500 question--answer pairs from the RAG-QA Arena "Tech" dataset. -# !wget https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json +!wget https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json with open('ragqa_arena_tech_500.json') as f: data = ujson.load(f) @@ -233,75 +191,8 @@ The final DSPy module call above actually happens inside `metric`. You might be dspy.inspect_history(n=1) ``` -**Output:** -``` -System message: - -Your input fields are: -1. `question` (str) -2. `ground_truth` (str) -3. `system_response` (str) - -Your output fields are: -1. `reasoning` (str) -2. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response -3. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth - -All interactions will be structured in the following way, with the appropriate values filled in. - -[[ ## question ## ]] -{question} - -[[ ## ground_truth ## ]] -{ground_truth} - -[[ ## system_response ## ]] -{system_response} - -[[ ## reasoning ## ]] -{reasoning} - -[[ ## recall ## ]] -{recall} - -[[ ## precision ## ]] -{precision} - -[[ ## completed ## ]] - -In adhering to this structure, your objective is: - Compare a system's response to the ground truth to compute its recall and precision. - If asked to reason, enumerate key ideas in each response, and whether they are present in the other response. - - -User message: - -[[ ## question ## ]] -what are high memory and low memory on linux? - -[[ ## ground_truth ## ]] -"High Memory" refers to the application or user space, the memory that user programs can use and which isn't permanently mapped in the kernel's space, while "Low Memory" is the kernel's space, which the kernel can address directly and is permanently mapped. -The user cannot access the Low Memory as it is set aside for the required kernel programs. - -[[ ## system_response ## ]] -In Linux, "low memory" refers to the memory that is directly accessible by the kernel and user processes, typically the first 4GB on a 32-bit system. 
"High memory" refers to memory above this limit, which is not directly accessible by the kernel in a 32-bit environment. This distinction is crucial for memory management, particularly in systems with large amounts of RAM, as it influences how memory is allocated and accessed. - -Respond with the corresponding output fields, starting with the field `reasoning`, then `recall`, then `precision`, and then ending with the marker for `completed`. - - -Response: - -[[ ## reasoning ## ]] -The system response correctly identifies "low memory" as the memory directly accessible by the kernel and user processes, and it mentions the typical limit of 4GB on a 32-bit system. It also accurately describes "high memory" as the memory above this limit that is not directly accessible by the kernel. However, it does not explicitly mention that "low memory" is set aside for kernel programs, which is a key point in the ground truth. The system response provides a good overview of the concepts but lacks the detail about user access to low memory and the permanent mapping aspect mentioned in the ground truth. - -[[ ## recall ## ]] -0.75 - -[[ ## precision ## ]] -0.85 - -[[ ## completed ## ]] -``` +**Output:** +See this [gist](https://gist.github.com/okhat/57bf86472d1e14812c0ae33fba5353f8) For evaluation, you could use the metric above in a simple loop and just average the score. But for nice parallelism and utilities, we can rely on `dspy.Evaluate`. @@ -317,8 +208,6 @@ evaluate(cot) **Output:** ``` Average Metric: 59.565342393613165 / 150 (39.7): 100%|██████████| 150/150 [00:00<00:00, 432.92it/s] -[TABLE HERE] -39.71 ``` The table you'll get in the output would look like: diff --git a/docs/docs/quick-start/getting-started-02.md b/docs/docs/quick-start/getting-started-02.md index 62160649a..87da009f7 100644 --- a/docs/docs/quick-start/getting-started-02.md +++ b/docs/docs/quick-start/getting-started-02.md @@ -12,8 +12,7 @@ Install the latest DSPy via `pip install -U dspy` and follow along. You may also In [Getting Started I: Basic Question Answering](/docs/quick-start/getting-started-01), we've set up the DSPy LM, loaded some data, and loaded a metric for evaluation. -Let's do these again and also download the corpus data that we will use for RAG search. The next cell will seek to download 4 GBs, so it may take a few minutes. A future version of this notebook will come with a cache that allows you to skip downloads and the pytorch installation. - +First, let's download the corpus data that we will use for RAG search. The next cell will seek to download 4 GBs, so it may take a few minutes. A future version of this notebook will come with a cache that allows you to skip downloads and the pytorch installation. ```python import os @@ -34,7 +33,11 @@ for url in urls: print(f"Downloading '{filename}'...") with requests.get(url, stream=True) as r, open(filename, 'wb') as f: for chunk in r.iter_content(chunk_size=8192): f.write(chunk) +``` +Having downloaded these items, let's set up the data and other objects from the previous guide. + +```python import ujson import dspy from dspy.evaluate import SemanticF1 @@ -107,63 +110,9 @@ Prediction( dspy.inspect_history() ``` -**Output:** -``` -System message: - -Your input fields are: -1. `context` (str) -2. `question` (str) - -Your output fields are: -1. `reasoning` (str) -2. `response` (str) - -All interactions will be structured in the following way, with the appropriate values filled in. 
- -[[ ## context ## ]] -{context} - -[[ ## question ## ]] -{question} - -[[ ## reasoning ## ]] -{reasoning} - -[[ ## response ## ]] -{response} - -[[ ## completed ## ]] - -In adhering to this structure, your objective is: - Given the fields `context`, `question`, produce the fields `response`. - - -User message: - -[[ ## context ## ]] -[1] «As far as I remember, High Memory is used for application space and Low Memory for the kernel. Advantage is that (user-space) applications cant access kernel-space memory.» -[2] «For the people looking for an explanation in the context of Linux kernel memory space, beware that there are two conflicting definitions of the high/low memory split (unfortunately there is no standard, one has to interpret that in context): High memory defined as the totality of kernel space in VIRTUAL memory. This is a region that only the kernel can access and comprises all virtual addresses greater or equal than PAGE_OFFSET. Low memory refers therefore to the region of the remaining addresses, which correspond to the user-space memory accessible from each user process. For example: on 32-bit x86 with a default PAGE_OFFSET, this means that high memory is any address ADDR with ADDR ≥ 0xC0000000 = PAGE_OFFSET (i.e. higher 1 GB). This is the reason why in Linux 32-bit processes are typically limited to 3 GB. Note that PAGE_OFFSET cannot be configured directly, it depends on the configurable VMSPLIT_x options (source). To summarize: in 32-bit archs, virtual memory is by default split into lower 3 GB (user space) and higher 1 GB (kernel space). For 64 bit, PAGE_OFFSET is not configurable and depends on architectural details that are sometimes detected at runtime during kernel load. On x86_64, PAGE_OFFSET is 0xffff888000000000 for 4-level paging (typical) and 0xff11000000000000 for 5-level paging (source). For ARM64 this is usually 0x8000000000000000. Note though, if KASLR is enabled, this value is intentionally unpredictable. High memory defined as the portion of PHYSICAL memory that cannot be mapped contiguously with the rest of the kernel virtual memory. A portion of the kernel virtual address space can be mapped as a single contiguous chunk into the so-called physical low memory. To fully understand what this means, a deeper knowledge of the Linux virtual memory space is required. I would recommend going through these slides. From the slides: This kind of high/low memory split is only applicable to 32-bit architectures where the installed physical RAM size is relatively high (more than ~1 GB). Otherwise, i.e. when the physical address space is small (<1 GB) or when the virtual memory space is large (64 bits), the whole physical space can be accessed from the kernel virtual memory space. In that case, all physical memory is considered low memory. It is preferable that high memory does not exist at all because the whole physical space can be accessed directly from the kernel, which makes memory management a lot simpler and efficient. This is especially important when dealing with DMAs (which typically require physically contiguous memory). See also the answer by @gilles» -[3] «Low and High do not refer to whether there is a lot of usage or not. They represent the way it is organized by the system. According to Wikipedia: High Memory is the part of physical memory in a computer which is not directly mapped by the page tables of its operating system kernel. There is no duration for the free command which simply computes a snapshot of the information available. 
Most people, including programmers, do not need to understand it more clearly as it is managed in a much simpler form through system calls and compiler/interpreter operations.» -[4] «This is relevant to the Linux kernel; Im not sure how any Unix kernel handles this. The High Memory is the segment of memory that user-space programs can address. It cannot touch Low Memory. Low Memory is the segment of memory that the Linux kernel can address directly. If the kernel must access High Memory, it has to map it into its own address space first. There was a patch introduced recently that lets you control where the segment is. The tradeoff is that you can take addressable memory away from user space so that the kernel can have more memory that it does not have to map before using. Additional resources: http://tldp.org/HOWTO/KernelAnalysis-HOWTO-7.html http://linux-mm.org/HighMemory» -[5] «HIGHMEM is a range of kernels memory space, but it is NOT memory you access but its a place where you put what you want to access. A typical 32bit Linux virtual memory map is like: 0x00000000-0xbfffffff: user process (3GB) 0xc0000000-0xffffffff: kernel space (1GB) (CPU-specific vector and whatsoever are ignored here). Linux splits the 1GB kernel space into 2 pieces, LOWMEM and HIGHMEM. The split varies from installation to installation. If an installation chooses, say, 512MB-512MB for LOW and HIGH mems, the 512MB LOWMEM (0xc0000000-0xdfffffff) is statically mapped at the kernel boot time; usually the first so many bytes of the physical memory is used for this so that virtual and physical addresses in this range have a constant offset of, say, 0xc0000000. On the other hand, the latter 512MB (HIGHMEM) has no static mapping (although you could leave pages semi-permanently mapped there, but you must do so explicitly in your driver code). Instead, pages are temporarily mapped and unmapped here so that virtual and physical addresses in this range have no consistent mapping. Typical uses of HIGHMEM include single-time data buffers.» +**Output:** +See this [gist](https://gist.github.com/okhat/d807032e138862bb54616dcd2f4d481c) -[[ ## question ## ]] -what are high memory and low memory on linux? - -Respond with the corresponding output fields, starting with the field `reasoning`, then `response`, and then ending with the marker for `completed`. - - -Response: - -[[ ## reasoning ## ]] -High memory and low memory in Linux refer to the organization of memory in the system, particularly in the context of the Linux kernel's virtual memory management. High memory is the portion of physical memory that is not directly mapped by the kernel's page tables, meaning that user-space applications cannot access it directly. Low memory, on the other hand, is the part of memory that the kernel can access directly. In a typical 32-bit architecture, the virtual memory is split into 3 GB for user space (low memory) and 1 GB for kernel space (high memory). The distinction is important for memory management, especially when dealing with physical memory that cannot be mapped contiguously. Understanding this split is crucial for developers working with the Linux kernel, as it affects how memory is allocated and accessed. - -[[ ## response ## ]] -In Linux, high memory refers to the portion of physical memory that is not directly mapped by the kernel's page tables, making it inaccessible to user-space applications. Low memory is the segment that the kernel can access directly. 
In a typical 32-bit architecture, the memory is divided into 3 GB for user space (low memory) and 1 GB for kernel space (high memory). This organization is essential for efficient memory management and affects how the kernel interacts with physical memory, especially in scenarios where contiguous memory is required. - -[[ ## completed ## ]] -``` In the previous guide with a CoT module, we got nearly 40% in terms of semantic F1 on our `devset`. Would this `RAG` module score better? @@ -175,8 +124,6 @@ evaluate(RAG()) **Output:** ``` Average Metric: 79.44815807647615 / 150 (53.0): 100%|██████████| 150/150 [00:05<00:00, 29.11it/s] -[TABLE GOES HERE] -52.97 ``` | | question | example_response | reasoning | pred_response | SemanticF1 | @@ -185,6 +132,7 @@ Average Metric: 79.44815807647615 / 150 (53.0): 100%|████████ | 1 | open finder window from current terminal location? | If you type 'open .' in Terminal, it will open the current directory in a Finder window. Alternatively, you can execute the command open `pwd`... | To open a Finder window from the current terminal location, you can use the command `open .` in the terminal. This command will open a... | You can open a Finder window from your current terminal location by typing the command `open .` in the terminal. This will launch a Finder... | ✔️ [0.857] | | 2 | how to import secret gpg key (copied from one machine to another)? | It is advised that it is necessary to add `--import` to the command line to import the private key and that according to the man... | To import a secret GPG key that has been copied from one machine to another, you typically need to use the `gpg --import` command. The... | To import a secret GPG key that you have copied from one machine to another, follow these steps: 1. On the original machine, export your... | | + ## Using a DSPy `Optimizer` to improve your RAG prompt. Off the shelf, our `RAG` module scores 53%. What are our options to make it stronger? One of the various choices DSPy offers is optimizing the prompts in our pipeline. @@ -202,18 +150,9 @@ optimized_rag = tp.compile(RAG(), trainset=trainset, valset=valset, requires_permission_to_run=False) ``` -**Output:** -``` -RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS: -num_trials: 25 -minibatch: True -num_candidates: 19 -valset size: 100 +**Output:** +See this [gist](https://gist.github.com/okhat/d6606e480a94c88180441617342699eb) -... - -Returning best identified program with score 62.57! -``` The prompt optimization process here is pretty systematic, you can learn about it for example in this paper. Importantly, it's not a magic button. It's very possible that it can overfit your training set for instance and not generalize well to a held-out set, making it essential that we iteratively validate our programs. @@ -245,7 +184,7 @@ If you want to bring a minimized window back into view, you can click on the app For users who prefer a behavior similar to Windows, where minimized windows can be accessed through a single shortcut, third-party applications like HyperSwitch or Witch can provide additional functionality to manage window switching more effectively. ``` -You can use `dspy.inspect_history(n=2)` to view the RAG prompt before optimization(link), after optimization(link), or their diff(link). +You can use `dspy.inspect_history(n=2)` to view the RAG prompt [before optimization](https://gist.github.com/okhat/5d04648f2226e72e66e26a8cb1456ee4) and [after optimization](https://gist.github.com/okhat/79405b8889b4b07da577ee19f1a3479a). 
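For instance, here is a minimal sketch of comparing the two prompts side by side, assuming the `RAG` class and `optimized_rag` program defined in the cells above; the question string is only an example:

```python
# Run the unoptimized baseline and the optimized program on the same input,
# then print the last two prompts that DSPy sent to the LM.
question = "cmd+tab does not work on hidden or minimized windows"

RAG()(question=question)           # baseline, unoptimized prompt
optimized_rag(question=question)   # prompt produced by the optimizer

dspy.inspect_history(n=2)
```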
Concretely, the optimized prompt: @@ -266,15 +205,16 @@ evaluate(optimized_rag) **Output:** ``` Average Metric: 92.16999654981839 / 150 (61.4): 100%|██████████| 150/150 [00:00<00:00, 399.21it/s] -[TABLE HERE] -61.45 ``` + + | | question | example_response | reasoning | pred_response | SemanticF1 | |---|---|---|---|---|---| | 0 | why is mercurial considered to be easier than git? | Mercurial's syntax is considered more familiar, especially for those accustomed to SVN, and is well documented. It focuses on interface aspects, which initially makes learning... | Mercurial is often considered easier than Git due to its user-friendly design and interface, which is particularly appealing to those new to version control systems... | Mercurial is considered easier than Git for several reasons: 1. **Familiar Syntax**: Mercurial's command syntax is often seen as more intuitive, especially for users coming... | ✔️ [0.874] | | 1 | open finder window from current terminal location? | If you type 'open .' in Terminal, it will open the current directory in a Finder window. Alternatively, you can execute the command open `pwd`... | To open a Finder window from the current terminal location on a Mac, there are several methods available. The simplest way is to use the... | To open a Finder window from your current terminal location on a Mac, you can use the following methods: 1. **Using Terminal Command**: - Simply... | ✔️ [0.333] | | 2 | how to import secret gpg key (copied from one machine to another)? | It is advised that it is necessary to add `--import` to the command line to import the private key and that according to the man... | To import a secret GPG key that has been copied from one machine to another, it is essential to follow a series of steps that... | To import a secret GPG key that you have copied from one machine to another, follow these steps: 1. **Export the Secret Key from the... | | + ## Keeping an eye on cost. DSPy allows you to track the cost of your programs, which can be used to monitor the cost of your calls. Here, we'll show you how to track the cost of your programs with DSPy. @@ -314,10 +254,10 @@ But DSPy gives you paths to continue iterating on the quality of your system and In general, you have the following tools: -1. Explore better system architectures for your program, e.g. what if we ask the LM to generate search queries for the retriever? See this notebook(link) or the STORM pipeline(link). -2. Explore different prompt optimizers or weight optimizers. See the **[Optimizers Docs](/docs/building-blocks/optimizers)**. -3. Scale inference time compute using DSPy Optimizers, e.g. this notebook(link). -4. Cut cost by distilling to a smaller LM, via prompt or weight optimization, e.g. [this notebook](/docs/deep-dive/optimizers/bootstrap-fewshot) or [this notebook](/docs/deep-dive/optimizers/copro). +1. Explore better system architectures for your program, e.g. what if we ask the LM to generate search queries for the retriever? See this [notebook](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/intro.ipynb) or the [STORM pipeline](https://arxiv.org/abs/2402.14207) built in DSPy. +2. Explore different [prompt optimizers](https://arxiv.org/abs/2406.11695) or [weight optimizers](https://arxiv.org/abs/2407.10930). See the **[Optimizers Docs](/docs/building-blocks/optimizers)**. +3. Scale inference time compute using DSPy Optimizers, e.g. this [notebook](https://github.com/stanfordnlp/dspy/blob/main/examples/agents/multi_agent.ipynb). +4. 
Cut cost by distilling to a smaller LM, via prompt or weight optimization, e.g. [this notebook](https://github.com/stanfordnlp/dspy/blob/main/examples/nli/scone/scone.ipynb) or [this notebook](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/examples/qa/hotpot/multihop_finetune.ipynb).

How do you decide which ones to proceed with first?