Modular RAG: Orchestration and Post-Retrieval #1767

ThomasVitale · 2024-11-19T19:31:58Z

Pre-Retrieval:

Consolidated naming and documentation

Retrieval:

Consolidated naming and documentation
Introduced DocumentJoiner sub-module and CompositionDocumentJoiner operator

Post-Retrieval:

Introduced main interfaces for sub-modules. Implementation waiting for missing features in Document APIs

Orchestration:

Introduced QueryRouter sub-module and AllDocumentRetrieversQueryRouter operator

Generation:

Consolidated naming and documentation

Advisor:

Introduced BaseAdvisor to reduce boilerplate when implementing Advisors
Extended RetrievalAugmentationAdvisor to include the new sub-modules

Relates to #gh-1603

Pre-Retrieval: * Consolidated naming and documentation Retrieval: * Consolidated naming and documentation * Introduced DocumentJoiner sub-module and CompositionDocumentJoiner operator Post-Retrieval: * Introduced main interfaces for sub-modules. Implementation waiting for missing features in Document APIs Orchestration: * Introduced QueryRouter sub-module and AllDocumentRetrieversQueryRouter operator Generation: * Consolidated naming and documentation Advisor: * Introduced BaseAdvisor to reduce boilerplate when implementing Advisors * Extended RetrievalAugmentationAdvisor to include the new sub-modules Relates to #spring-projectsgh-1603

ThomasVitale · 2024-11-19T19:34:08Z

spring-ai-core/pom.xml

@@ -96,6 +96,11 @@
 			<artifactId>micrometer-core</artifactId>
 		</dependency>

+		<dependency>
+			<groupId>io.micrometer</groupId>
+			<artifactId>context-propagation</artifactId>


When using the TaskExecutor to run things asynchronously, no observation data is propagated unless we add this dependency (https://docs.micrometer.io/context-propagation/reference/index.html)

ThomasVitale · 2024-11-19T19:36:26Z

...e/src/main/java/org/springframework/ai/chat/client/advisor/RetrievalAugmentationAdvisor.java


-	private final boolean protectFromBlocking;
+	private final Scheduler scheduler;


Instead of passing a protectFromBlocking parameter, users can now pass a Scheduler implementation if they want to customise the streaming behavior. This has two main benefits: more clarity about what is happening under the hood and also the possibility of using different schedulers than the elastic one.

The case where we don't need the protection from blocking is supported when passing Schedulers.immediate(). The default, as we discussed last time, is running with Schedulers.boundedElastic().

ThomasVitale · 2024-11-19T19:38:16Z

...e/src/main/java/org/springframework/ai/chat/client/advisor/RetrievalAugmentationAdvisor.java

+
+		// 3. Get similar documents for each query.
+		Map<Query, List<List<Document>>> documentsForQuery = expandedQueries.stream()
+			.map(query -> CompletableFuture.supplyAsync(() -> getDocumentsForQuery(query), taskExecutor))


Query expansion and/or routing can result in parallel executions that is supported by a TaskExecutor, following the common practice in Spring. By default, a ThreadPoolTaskExecutor is used, but it's possible to customise it. For example, you can pass the auto-configured TaskExecutor from Spring Boot or another implementation, such as VirtualThreadTaskExecutor. I'll describe this in detail in the separate PR where I'm writing the documentation.

I think is a good starting point that is generally useful. Users can implement their own RAG Advisor to customize how they do scatter gather based on other frameworks and patterns.

ThomasVitale · 2024-11-19T19:39:22Z

spring-ai-core/src/main/java/org/springframework/ai/chat/client/advisor/api/BaseAdvisor.java

+ * @author Thomas Vitale
+ * @since 1.0.0
+ */
+public interface BaseAdvisor extends CallAroundAdvisor, StreamAroundAdvisor {


As we talked last time, it's useful to remove some of the boilerplate when creating a new Advisor. I have created this interface which hides most of the boilerplate. And it's not RAG specific, so it can be used for any Advisor.

ThomasVitale · 2024-11-19T19:40:32Z

...rc/main/java/org/springframework/ai/rag/preretrieval/query/expansion/MultiQueryExpander.java

@@ -70,7 +69,7 @@ public final class MultiQueryExpander implements QueryExpander {
 			Query variants:
 			""");

-	private static final Boolean DEFAULT_INCLUDE_ORIGINAL = false;
+	private static final Boolean DEFAULT_INCLUDE_ORIGINAL = true;


By default, the original query should be included since that's the most common case. My previous decision here for the default was not great. I fixed that now.

ThomasVitale · 2024-11-19T19:46:44Z

...c/main/java/org/springframework/ai/rag/generation/augmentation/ContextualQueryAugmenter.java

@@ -74,15 +74,15 @@ public final class ContextualQueryAugmentor implements QueryAugmentor {
 			Politely inform the user that you can't answer it.
 			""");

-	private static final boolean DEFAULT_ALLOW_EMPTY_CONTEXT = true;
+	private static final boolean DEFAULT_ALLOW_EMPTY_CONTEXT = false;


This one is very debatable so I'm eager the get some feedback.

When using explicit RAG, I probably want to reduce hallucinations as much as possible. If no similar documents are retrieved from a vector store, I probably want to short-circuit the generation and have the model answering that it doesn't know rather than making stuff up.

I would therefore recommend disallowing having empty context by default. Then the user can decide if they want the model to make stuff up. This should help a lot with troubleshooting and in spotting retrieval errors earlier.

The case where I want to fallback to "standard" generation without context is something that should be handled at the beginning of the RAG flow. There's typically an agentic component that starts a flow by determining whether to send a request directly to the model or go through the retrieval process. It's something I have in my backlog and that will be introduced as a "Scheduling" sub-module.

What do you think about this choice of the default value?

Since the main point of advanced rag techniques is better accuracy, i think explicitly dealing with the empty context makes sense, as otherwise it will make up stuff, going against the 'pedal tone' of intent for these classes.

markpollack · 2024-11-20T16:46:06Z

I had to add

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>context-propagation</artifactId>
    <scope>test</scope>
</dependency>

to the spring-ai-integration-test to avoid classdef not found . I would have thought it would come transitively.

markpollack · 2024-11-20T16:46:51Z

This test didn't pass in the maven build in the terminal, but does in the IDE.

[ERROR] org.springframework.ai.integration.tests.client.advisor.RetrievalAugmentationAdvisorIT.ragWithTranslation -- Time elapsed: 2.402 s <<< FAILURE!
java.lang.AssertionError: 

Expecting actual:
  "Anacletus og Birbas eventyr finder sted i de skotske højland."
to contain:
  "Highlands"
 (ignoring case)
    at org.springframework.ai.integration.tests.client.advisor.RetrievalAugmentationAdvisorIT.ragWithTranslation(RetrievalAugmentationAdvisorIT.java:131)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

seems like the model sometimes doesn't follow instructions?

markpollack · 2024-11-20T16:47:09Z

merged in d759fb2 Will see how well the CI handles the test.

ThomasVitale commented Nov 19, 2024

View reviewed changes

ThomasVitale mentioned this pull request Nov 19, 2024

Modular RAG - Part 1 #1603

Closed

markpollack self-assigned this Nov 20, 2024

markpollack added this to the 1.0.0-M4 milestone Nov 20, 2024

markpollack closed this Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modular RAG: Orchestration and Post-Retrieval #1767

Modular RAG: Orchestration and Post-Retrieval #1767

ThomasVitale commented Nov 19, 2024

ThomasVitale Nov 19, 2024

markpollack Nov 20, 2024

ThomasVitale Nov 19, 2024

ThomasVitale Nov 19, 2024

markpollack Nov 20, 2024

ThomasVitale Nov 19, 2024

ThomasVitale Nov 19, 2024

ThomasVitale Nov 19, 2024

markpollack Nov 20, 2024

markpollack commented Nov 20, 2024

markpollack commented Nov 20, 2024

markpollack commented Nov 20, 2024


		private final boolean protectFromBlocking;
		private final Scheduler scheduler;

Modular RAG: Orchestration and Post-Retrieval #1767

Modular RAG: Orchestration and Post-Retrieval #1767

Conversation

ThomasVitale commented Nov 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

markpollack commented Nov 20, 2024

markpollack commented Nov 20, 2024

markpollack commented Nov 20, 2024