Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modular RAG: Orchestration and Post-Retrieval #1767

Conversation

ThomasVitale
Copy link
Contributor

Pre-Retrieval:

  • Consolidated naming and documentation

Retrieval:

  • Consolidated naming and documentation
  • Introduced DocumentJoiner sub-module and CompositionDocumentJoiner operator

Post-Retrieval:

  • Introduced main interfaces for sub-modules. Implementation waiting for missing features in Document APIs

Orchestration:

  • Introduced QueryRouter sub-module and AllDocumentRetrieversQueryRouter operator

Generation:

  • Consolidated naming and documentation

Advisor:

  • Introduced BaseAdvisor to reduce boilerplate when implementing Advisors
  • Extended RetrievalAugmentationAdvisor to include the new sub-modules

Relates to #gh-1603

Pre-Retrieval:
* Consolidated naming and documentation

Retrieval:
* Consolidated naming and documentation
* Introduced DocumentJoiner sub-module and CompositionDocumentJoiner operator

Post-Retrieval:
* Introduced main interfaces for sub-modules. Implementation waiting for missing features in Document APIs

Orchestration:
* Introduced QueryRouter sub-module and AllDocumentRetrieversQueryRouter operator

Generation:
* Consolidated naming and documentation

Advisor:
* Introduced BaseAdvisor to reduce boilerplate when implementing Advisors
* Extended RetrievalAugmentationAdvisor to include the new sub-modules

Relates to #spring-projectsgh-1603
@@ -96,6 +96,11 @@
<artifactId>micrometer-core</artifactId>
</dependency>

<dependency>
<groupId>io.micrometer</groupId>
<artifactId>context-propagation</artifactId>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using the TaskExecutor to run things asynchronously, no observation data is propagated unless we add this dependency (https://docs.micrometer.io/context-propagation/reference/index.html)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


private final boolean protectFromBlocking;
private final Scheduler scheduler;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing a protectFromBlocking parameter, users can now pass a Scheduler implementation if they want to customise the streaming behavior. This has two main benefits: more clarity about what is happening under the hood and also the possibility of using different schedulers than the elastic one.

The case where we don't need the protection from blocking is supported when passing Schedulers.immediate(). The default, as we discussed last time, is running with Schedulers.boundedElastic().


// 3. Get similar documents for each query.
Map<Query, List<List<Document>>> documentsForQuery = expandedQueries.stream()
.map(query -> CompletableFuture.supplyAsync(() -> getDocumentsForQuery(query), taskExecutor))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query expansion and/or routing can result in parallel executions that is supported by a TaskExecutor, following the common practice in Spring. By default, a ThreadPoolTaskExecutor is used, but it's possible to customise it. For example, you can pass the auto-configured TaskExecutor from Spring Boot or another implementation, such as VirtualThreadTaskExecutor. I'll describe this in detail in the separate PR where I'm writing the documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is a good starting point that is generally useful. Users can implement their own RAG Advisor to customize how they do scatter gather based on other frameworks and patterns.

* @author Thomas Vitale
* @since 1.0.0
*/
public interface BaseAdvisor extends CallAroundAdvisor, StreamAroundAdvisor {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talked last time, it's useful to remove some of the boilerplate when creating a new Advisor. I have created this interface which hides most of the boilerplate. And it's not RAG specific, so it can be used for any Advisor.

@@ -70,7 +69,7 @@ public final class MultiQueryExpander implements QueryExpander {
Query variants:
""");

private static final Boolean DEFAULT_INCLUDE_ORIGINAL = false;
private static final Boolean DEFAULT_INCLUDE_ORIGINAL = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the original query should be included since that's the most common case. My previous decision here for the default was not great. I fixed that now.

@@ -74,15 +74,15 @@ public final class ContextualQueryAugmentor implements QueryAugmentor {
Politely inform the user that you can't answer it.
""");

private static final boolean DEFAULT_ALLOW_EMPTY_CONTEXT = true;
private static final boolean DEFAULT_ALLOW_EMPTY_CONTEXT = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is very debatable so I'm eager the get some feedback.

When using explicit RAG, I probably want to reduce hallucinations as much as possible. If no similar documents are retrieved from a vector store, I probably want to short-circuit the generation and have the model answering that it doesn't know rather than making stuff up.

I would therefore recommend disallowing having empty context by default. Then the user can decide if they want the model to make stuff up. This should help a lot with troubleshooting and in spotting retrieval errors earlier.

The case where I want to fallback to "standard" generation without context is something that should be handled at the beginning of the RAG flow. There's typically an agentic component that starts a flow by determining whether to send a request directly to the model or go through the retrieval process. It's something I have in my backlog and that will be introduced as a "Scheduling" sub-module.

What do you think about this choice of the default value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the main point of advanced rag techniques is better accuracy, i think explicitly dealing with the empty context makes sense, as otherwise it will make up stuff, going against the 'pedal tone' of intent for these classes.

@markpollack markpollack self-assigned this Nov 20, 2024
@markpollack markpollack added this to the 1.0.0-M4 milestone Nov 20, 2024
@markpollack
Copy link
Member

I had to add

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>context-propagation</artifactId>
    <scope>test</scope>
</dependency>

to the spring-ai-integration-test to avoid classdef not found . I would have thought it would come transitively.

@markpollack
Copy link
Member

This test didn't pass in the maven build in the terminal, but does in the IDE.

[ERROR] org.springframework.ai.integration.tests.client.advisor.RetrievalAugmentationAdvisorIT.ragWithTranslation -- Time elapsed: 2.402 s <<< FAILURE!
java.lang.AssertionError: 

Expecting actual:
  "Anacletus og Birbas eventyr finder sted i de skotske højland."
to contain:
  "Highlands"
 (ignoring case)
    at org.springframework.ai.integration.tests.client.advisor.RetrievalAugmentationAdvisorIT.ragWithTranslation(RetrievalAugmentationAdvisorIT.java:131)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

seems like the model sometimes doesn't follow instructions?

@markpollack
Copy link
Member

merged in d759fb2 Will see how well the CI handles the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants