-
Notifications
You must be signed in to change notification settings - Fork 866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modular RAG: Orchestration and Post-Retrieval #1767
Modular RAG: Orchestration and Post-Retrieval #1767
Conversation
Pre-Retrieval: * Consolidated naming and documentation Retrieval: * Consolidated naming and documentation * Introduced DocumentJoiner sub-module and CompositionDocumentJoiner operator Post-Retrieval: * Introduced main interfaces for sub-modules. Implementation waiting for missing features in Document APIs Orchestration: * Introduced QueryRouter sub-module and AllDocumentRetrieversQueryRouter operator Generation: * Consolidated naming and documentation Advisor: * Introduced BaseAdvisor to reduce boilerplate when implementing Advisors * Extended RetrievalAugmentationAdvisor to include the new sub-modules Relates to #spring-projectsgh-1603
@@ -96,6 +96,11 @@ | |||
<artifactId>micrometer-core</artifactId> | |||
</dependency> | |||
|
|||
<dependency> | |||
<groupId>io.micrometer</groupId> | |||
<artifactId>context-propagation</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using the TaskExecutor to run things asynchronously, no observation data is propagated unless we add this dependency (https://docs.micrometer.io/context-propagation/reference/index.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
private final boolean protectFromBlocking; | ||
private final Scheduler scheduler; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing a protectFromBlocking
parameter, users can now pass a Scheduler
implementation if they want to customise the streaming behavior. This has two main benefits: more clarity about what is happening under the hood and also the possibility of using different schedulers than the elastic one.
The case where we don't need the protection from blocking is supported when passing Schedulers.immediate()
. The default, as we discussed last time, is running with Schedulers.boundedElastic()
.
|
||
// 3. Get similar documents for each query. | ||
Map<Query, List<List<Document>>> documentsForQuery = expandedQueries.stream() | ||
.map(query -> CompletableFuture.supplyAsync(() -> getDocumentsForQuery(query), taskExecutor)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query expansion and/or routing can result in parallel executions that is supported by a TaskExecutor
, following the common practice in Spring. By default, a ThreadPoolTaskExecutor
is used, but it's possible to customise it. For example, you can pass the auto-configured TaskExecutor
from Spring Boot or another implementation, such as VirtualThreadTaskExecutor
. I'll describe this in detail in the separate PR where I'm writing the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is a good starting point that is generally useful. Users can implement their own RAG Advisor to customize how they do scatter gather based on other frameworks and patterns.
* @author Thomas Vitale | ||
* @since 1.0.0 | ||
*/ | ||
public interface BaseAdvisor extends CallAroundAdvisor, StreamAroundAdvisor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talked last time, it's useful to remove some of the boilerplate when creating a new Advisor. I have created this interface which hides most of the boilerplate. And it's not RAG specific, so it can be used for any Advisor.
@@ -70,7 +69,7 @@ public final class MultiQueryExpander implements QueryExpander { | |||
Query variants: | |||
"""); | |||
|
|||
private static final Boolean DEFAULT_INCLUDE_ORIGINAL = false; | |||
private static final Boolean DEFAULT_INCLUDE_ORIGINAL = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, the original query should be included since that's the most common case. My previous decision here for the default was not great. I fixed that now.
@@ -74,15 +74,15 @@ public final class ContextualQueryAugmentor implements QueryAugmentor { | |||
Politely inform the user that you can't answer it. | |||
"""); | |||
|
|||
private static final boolean DEFAULT_ALLOW_EMPTY_CONTEXT = true; | |||
private static final boolean DEFAULT_ALLOW_EMPTY_CONTEXT = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is very debatable so I'm eager the get some feedback.
When using explicit RAG, I probably want to reduce hallucinations as much as possible. If no similar documents are retrieved from a vector store, I probably want to short-circuit the generation and have the model answering that it doesn't know rather than making stuff up.
I would therefore recommend disallowing having empty context by default. Then the user can decide if they want the model to make stuff up. This should help a lot with troubleshooting and in spotting retrieval errors earlier.
The case where I want to fallback to "standard" generation without context is something that should be handled at the beginning of the RAG flow. There's typically an agentic component that starts a flow by determining whether to send a request directly to the model or go through the retrieval process. It's something I have in my backlog and that will be introduced as a "Scheduling" sub-module.
What do you think about this choice of the default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the main point of advanced rag techniques is better accuracy, i think explicitly dealing with the empty context makes sense, as otherwise it will make up stuff, going against the 'pedal tone' of intent for these classes.
I had to add
to the |
This test didn't pass in the maven build in the terminal, but does in the IDE.
seems like the model sometimes doesn't follow instructions? |
merged in d759fb2 Will see how well the CI handles the test. |
Pre-Retrieval:
Retrieval:
Post-Retrieval:
Orchestration:
Generation:
Advisor:
Relates to #gh-1603