
Pipeline: Add lecture chat pipeline connection #173

Open · wants to merge 10 commits into base: main

Conversation

sebastianloose

@sebastianloose sebastianloose commented Nov 11, 2024

Add a POST route to connect the lecture chat pipeline, enabling Artemis to send messages directly into the lecture chat system.
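The connection point described above can be sketched as follows. This is a minimal, self-contained approximation of the PR's thread-dispatch pattern using the function names from the diff; the FastAPI route decorator, the real DTO types, and the callback wiring are omitted, and the worker body is a stand-in.

```python
from threading import Thread

results = []

def run_lecture_chat_pipeline_worker(dto, variant):
    # The real worker builds a LectureChatCallback and runs LectureChatPipeline;
    # here it only records its arguments so the sketch is self-contained.
    results.append((variant, dto))

def run_lecture_chat_pipeline(variant, dto):
    # Mirrors the PR's pattern of handing work off to a background thread
    thread = Thread(target=run_lecture_chat_pipeline_worker, args=(dto, variant))
    thread.start()
    thread.join()  # joined only to make this sketch deterministic

run_lecture_chat_pipeline("default", {"message": "Explain slide 12"})
assert results == [("default", {"message": "Explain slide 12"})]
```

In the actual endpoint the thread is not joined, so the POST returns immediately while the pipeline reports progress through the callback.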

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new LectureChatStatusUpdateDTO class for improved status updates in lecture chat scenarios.
    • Added a LectureChatCallback class to enhance handling of status updates during lecture chat processing.
    • Implemented a new endpoint /lecture-chat/{variant}/run for executing the lecture chat pipeline.
  • Enhancements

    • Updated LectureChatPipeline to integrate a callback mechanism for better response management.
    • Added COURSE_LANGUAGE property to the lecture schema for enhanced data organization.
    • Expanded API capabilities to include lecture chat processing and information about available variants.
  • Bug Fixes

    • Streamlined the property handling logic for collection creation in the lecture schema.

Contributor

coderabbitai bot commented Nov 11, 2024

Walkthrough

This pull request introduces several enhancements to the lecture chat functionality within the application. It includes the addition of a new data transfer object (LectureChatStatusUpdateDTO) and a callback class (LectureChatCallback) for managing status updates specific to lecture chats. The LectureChatPipeline class is modified to incorporate a callback mechanism for better response handling, and a new endpoint is added to the FastAPI router for executing the lecture chat pipeline. Additionally, the schema for lecture collections in the vector database is updated to include a COURSE_LANGUAGE property.

Changes

File Path: Change Summary

  • app/domain/status/lecture_chat_status_update_dto.py: Class added: LectureChatStatusUpdateDTO with attribute result: str.
  • app/pipeline/chat/lecture_chat_pipeline.py: Variable added: callback: LectureChatCallback in LectureChatPipeline; constructor updated to accept callback; __call__ method modified to use the callback for success and error handling; gpt_version_equivalent updated from 3.5 to 4.5.
  • app/vector_database/lecture_schema.py: Property added: COURSE_LANGUAGE in the init_lecture_schema function for collection creation in Weaviate.
  • app/web/routers/pipelines.py: Methods added: run_lecture_chat_pipeline_worker(dto, variant) and run_lecture_chat_pipeline(variant: str, dto: LectureChatPipelineExecutionDTO); case added: "LECTURE_CHAT" in get_pipeline(feature: str).
  • app/web/status/status_update.py: Class added: LectureChatCallback extending StatusCallback, with constructor __init__(self, run_id: str, base_url: str, initial_stages: List[StageDTO]).

Possibly related PRs

  • Tutor Chat Pipeline with Lecture content.  #104: The LectureChatPipeline is directly related to the LectureChatStatusUpdateDTO class introduced in the main PR, as both are part of the lecture chat functionality enhancements. The new pipeline execution DTO and the callback mechanism in this PR utilize the LectureChatStatusUpdateDTO for managing status updates.

Suggested labels

component:LLM



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (8)
app/domain/status/lecture_chat_status_update_dto.py (2)

4-5: Consider adding field validation for the result field.

If the result field has specific format requirements or constraints, consider adding Pydantic field validators.

+from pydantic import validator
+
 class LectureChatStatusUpdateDTO(StatusUpdateDTO):
     result: str
+
+    @validator('result')
+    def validate_result(cls, v: str) -> str:
+        if not v.strip():
+            raise ValueError("Result cannot be empty or whitespace")
+        return v

2-3: Remove extra blank line.

There are two consecutive blank lines between the import statement and class definition. One blank line is sufficient according to PEP 8.

 from app.domain.status.status_update_dto import StatusUpdateDTO
 
-
 class LectureChatStatusUpdateDTO(StatusUpdateDTO):
app/vector_database/lecture_schema.py (1)

Line range hint 1-116: Consider documenting language handling strategy.

The addition of COURSE_LANGUAGE suggests language-specific handling in the lecture chat system. Consider documenting:

  1. How language preferences affect the lecture chat pipeline
  2. Whether any language-specific processing or validation is needed
  3. Default language handling when the property is not set
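A minimal sketch of point 3, falling back to a default for legacy documents that predate the new property. The property key and the "en" fallback are assumptions for illustration, not actual Pyris behavior.

```python
# Hypothetical default-language handling; the key name mirrors the schema's
# COURSE_LANGUAGE property and the fallback value is an assumption.
DEFAULT_COURSE_LANGUAGE = "en"

def get_course_language(properties: dict) -> str:
    """Return the stored course language, falling back for legacy documents."""
    value = properties.get("course_language")
    return value if value else DEFAULT_COURSE_LANGUAGE

assert get_course_language({"course_language": "de"}) == "de"
assert get_course_language({}) == DEFAULT_COURSE_LANGUAGE
```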
app/web/routers/pipelines.py (3)

136-140: Consider more specific variant matching

The current variant matching could accidentally match unintended variants. Consider using an explicit enum or constant for variant names.

+from enum import Enum
+
+class LectureChatVariant(str, Enum):
+    DEFAULT = "default"
+    REFERENCE = "lecture_chat_pipeline_reference_impl"
+
 match variant:
-    case "default" | "lecture_chat_pipeline_reference_impl":
+    case LectureChatVariant.DEFAULT | LectureChatVariant.REFERENCE:
         pipeline = LectureChatPipeline(callback=callback)
     case _:
         raise ValueError(f"Unknown variant: {variant}")

129-129: Add type hints to function parameters

Consider adding type hints to improve code maintainability and IDE support.

-def run_lecture_chat_pipeline_worker(dto, variant):
+def run_lecture_chat_pipeline_worker(dto: LectureChatPipelineExecutionDTO, variant: str):

269-276: Enhance variant description

Consider providing a more detailed description that explains the purpose and capabilities of the lecture chat variant.

         case "LECTURE_CHAT":
             return [
                 FeatureDTO(
                     id="default",
                     name="Default Variant",
-                    description="Default lecture chat variant.",
+                    description="Default lecture chat variant for processing and responding to lecture-related queries and discussions.",
                 )
             ]
app/web/status/status_update.py (2)

295-301: Consider adding more granular stages for better progress tracking.

The current implementation only has a single "Thinking" stage with 30% weight. Other chat callbacks in the codebase have multiple stages for better progress tracking. Consider adding more stages to match the granularity of similar callbacks, such as:

  • Initial processing/context loading
  • Response generation
  • Response refinement

This would provide better visibility into the pipeline's progress and align with the patterns seen in TextExerciseChatCallback and ExerciseChatCallback.
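One way the suggested split could look, sketched with a stand-in dataclass; the real StageDTO fields, stage names, and weight distribution here are assumptions, chosen so the total matches the original single stage's 30% weight.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    """Stand-in for the codebase's StageDTO; field names are assumed."""
    name: str
    weight: int
    state: str = "NOT_STARTED"

def build_lecture_chat_stages() -> list:
    # Three stages replacing the single 30%-weight "Thinking" stage
    return [
        Stage(name="Loading lecture context", weight=10),
        Stage(name="Generating response", weight=15),
        Stage(name="Refining response", weight=5),
    ]

stages = build_lecture_chat_stages()
assert sum(s.weight for s in stages) == 30  # same total as before
```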


305-305: Consider removing explicit empty result initialization.

The explicit initialization of result="" might be unnecessary as the DTO should handle default values. Other callback implementations don't set an initial result value.

-            LectureChatStatusUpdateDTO(stages=stages, result=""),
+            LectureChatStatusUpdateDTO(stages=stages),
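The point can be illustrated with a dataclass stand-in for the Pydantic DTO (field names mirror the diff; the default-value mechanics are analogous in Pydantic when the field declares a default):

```python
from dataclasses import dataclass, field

@dataclass
class StatusUpdateDTO:
    stages: list = field(default_factory=list)

@dataclass
class LectureChatStatusUpdateDTO(StatusUpdateDTO):
    result: str = ""  # default lives on the DTO, so callers can omit it

# Caller no longer needs to pass result="" explicitly
dto = LectureChatStatusUpdateDTO(stages=["Thinking"])
assert dto.result == ""
```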
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 326de7e and fbf2179.

📒 Files selected for processing (5)
  • app/domain/status/lecture_chat_status_update_dto.py (1 hunks)
  • app/pipeline/chat/lecture_chat_pipeline.py (3 hunks)
  • app/vector_database/lecture_schema.py (2 hunks)
  • app/web/routers/pipelines.py (4 hunks)
  • app/web/status/status_update.py (2 hunks)
🔇 Additional comments (9)
app/vector_database/lecture_schema.py (1)

71-76: Verify migration strategy for existing collections.

The COURSE_LANGUAGE property is correctly defined, but please ensure:

  1. Existing collections are migrated to include this new property
  2. The application code handles cases where this property might be missing in older documents

Let's verify the usage and migration needs:

app/pipeline/chat/lecture_chat_pipeline.py (3)

28-28: LGTM: Clean callback integration

The import statement and class attribute for LectureChatCallback are well-structured with proper type annotations.

Also applies to: 59-59


61-73: LGTM: Clean constructor implementation

The constructor properly initializes the callback and maintains the parent class initialization.


66-66: Verify GPT model version upgrade implications

The GPT version has been upgraded from 3.5 to 4.5, which could impact:

  • Response quality
  • API costs
  • Performance characteristics
✅ Verification successful

Let's examine the actual usage and configurations to better understand the implications.


GPT version upgrade is consistent with system requirements

The change to GPT version 4.5 in lecture_chat_pipeline.py is appropriate and verified because:

  • The README explicitly states that "most existing pipelines in Pyris require a model with a gpt_version_equivalent of 4.5 or higher"
  • Other chat-related pipelines (course_chat, code_feedback, exercise_chat) also require version 4.5
  • The configuration file shows proper cost structures and capabilities for 4.5 models (GPT 4 Omni)
  • The upgrade aligns with the system's documented model requirements and capabilities
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other GPT version configurations to ensure consistency
rg -l "gpt_version_equivalent.*3\.5" 

# Look for any cost-related configurations
rg -l "cost|pricing|budget" 

# Check for performance-related configurations
rg -l "timeout|performance|latency"

Length of output: 908


Script:

#!/bin/bash
# Check the actual GPT version configurations and their context
rg "gpt_version_equivalent" -B 2 -A 2

# Check cost-related configurations specifically in config files
cat llm_config.example.yml

# Look for any performance/timeout settings that might be model-specific
rg "timeout.*gpt|performance.*gpt" -B 2 -A 2

Length of output: 12513

app/web/routers/pipelines.py (4)

14-22: LGTM: Import statements are well-organized

The new imports for lecture chat functionality follow the established pattern and maintain consistency with the existing codebase structure.


167-174: LGTM: Endpoint implementation follows established patterns

The lecture chat endpoint is well-implemented, following the same patterns as other pipeline endpoints:

  • Proper route definition with variant parameter
  • Token validation dependency
  • Asynchronous execution using threading

142-143: ⚠️ Potential issue

Fix incorrect error message in logs

The error message refers to "exercise chat pipeline" instead of "lecture chat pipeline".

-        logger.error(f"Error preparing exercise chat pipeline: {e}")
+        logger.error(f"Error preparing lecture chat pipeline: {e}")

Likely invalid or redundant comment.


131-135: Verify status update integration

Let's verify that the LectureChatCallback is properly integrated with the status update system.

✅ Verification successful

Status update integration is properly implemented

The LectureChatCallback is correctly integrated with the status update system:

  • It inherits from the abstract StatusCallback base class
  • Properly initializes the status update URL, stages, and callback mechanisms
  • Inherits all necessary status update methods (in_progress, done, error, skip)
  • Uses the correct LectureChatStatusUpdateDTO for status updates
  • Follows the same pattern as other chat callbacks in the system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify LectureChatCallback implementation and integration

# Check if LectureChatCallback properly extends StatusCallback
ast-grep --pattern 'class LectureChatCallback extends StatusCallback {
  $$$
}'

# Check for consistent status update method implementations
rg -A 2 'def (on_start|on_complete|on_error)' app/web/status/status_update.py

Length of output: 161


Script:

#!/bin/bash
# Let's try different approaches to verify the callback implementation

# Search for LectureChatCallback class definition
rg -A 10 "class LectureChatCallback" 

# Look for status update related methods in the callback
rg -A 3 "def (on_start|on_complete|on_error)" --type py

# Check imports to verify inheritance
rg "from.*StatusCallback.*import|import.*StatusCallback" --type py

# Find any usage of status update methods
rg "self\.(on_start|on_complete|on_error)" --type py

Length of output: 1396


Script:

#!/bin/bash
# Let's check the StatusCallback base class and its implementation

# Find the StatusCallback base class definition and methods
rg -A 15 "class StatusCallback" --type py

# Check the complete LectureChatCallback implementation
cat app/web/status/status_update.py

Length of output: 11528

app/web/status/status_update.py (1)

15-17: LGTM: Import statement is properly structured.

The import for LectureChatStatusUpdateDTO follows the established pattern and is correctly placed with other similar DTO imports.

Comment on lines 4 to 5
class LectureChatStatusUpdateDTO(StatusUpdateDTO):
result: str

🛠️ Refactor suggestion

Add documentation to explain the DTO's purpose and field usage.

The class and its result field lack documentation. Add docstrings to improve code maintainability and clarity.

 class LectureChatStatusUpdateDTO(StatusUpdateDTO):
+    """Data Transfer Object for lecture chat status updates.
+
+    This DTO extends the base StatusUpdateDTO to include the result of lecture chat
+    pipeline operations, facilitating communication between Artemis and the lecture
+    chat system.
+    """
+
     result: str
+    """The result message or status of the lecture chat pipeline operation."""

app/vector_database/lecture_schema.py (outdated, resolved)
app/pipeline/chat/lecture_chat_pipeline.py (resolved)
Comment on lines 35 to 42
# collection.config.add_property(
# Property(
# name=LectureSchema.COURSE_LANGUAGE.value,
# description="The language of the COURSE",
# data_type=DataType.TEXT,
# index_searchable=False,
# )
# )


Just as CodeRabbit pointed out, do we still need this? Since this branch is only triggered when LectureSchema.COLLECTION_NAME already exists in the collection, you can be sure that the returned collection is not null. Do you want to overwrite it or not?


Yes, we changed the code and it should now be better integrated.

def run_lecture_chat_pipeline_worker(dto, variant):
try:
callback = LectureChatCallback(
run_id=dto.settings.authentication_token,


Might not be your code, but it seems a bit sketchy that something as confidential as an authentication token is used in a less-confidential context such as a random callback ID 😅 Is it supposed to be like this?

Comment on lines 172 to 173
def run_lecture_chat_pipeline(variant: str, dto: LectureChatPipelineExecutionDTO):
thread = Thread(target=run_lecture_chat_pipeline_worker, args=(dto, variant))


The order of arguments is flipped in this POST endpoint: first variant, then dto. The internal methods use the opposite order. I'd be happy if you could pick one order and align to it ☺️


Thank you, we implemented it now :)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
app/web/routers/pipelines.py (2)

142-143: Consider security implications of error logging.

The current implementation logs full stack traces which might expose sensitive information in production. Consider:

  1. Limiting stack trace logging to development/staging environments
  2. Sanitizing sensitive information from error messages

Also applies to: 150-151
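A hedged sketch combining points 1 and 2: gate the traceback on an environment flag and reduce production logs to the exception type. The helper name, environment values, and message format are illustrative, not existing Pyris code; the returned string is what would be handed to logger.error.

```python
import traceback

def format_pipeline_error(context: str, exc: Exception, env: str) -> str:
    """Build the message passed to logger.error, hiding details in production."""
    if env == "development":
        # A full traceback is acceptable outside production
        return f"Error preparing {context}: {exc}\n{traceback.format_exc()}"
    # Production: keep only the exception type; no message, no stack trace
    return f"Error preparing {context}: {type(exc).__name__}"

try:
    raise ValueError("token=abc123")  # a message that could contain secrets
except ValueError as e:
    message = format_pipeline_error("lecture chat pipeline", e, env="production")

assert "abc123" not in message
assert message == "Error preparing lecture chat pipeline: ValueError"
```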


173-174: Consider thread lifecycle management.

The current implementation starts threads but doesn't track their lifecycle. Consider:

  1. Using a thread pool to manage concurrent executions
  2. Implementing a cleanup mechanism for completed threads
  3. Adding timeout handling for long-running operations
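The three suggestions above could be combined along these lines. Names are illustrative and the worker is a stand-in; note that ThreadPoolExecutor cannot interrupt an already-running task, so the timeout only bounds how long the caller waits.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError

# Shared pool: caps concurrency and cleans up finished workers automatically
pipeline_executor = ThreadPoolExecutor(max_workers=4)

def run_lecture_chat_pipeline_worker(dto, variant):
    return f"ran {variant}"  # stand-in for the real pipeline call

def submit_pipeline(dto, variant, timeout_seconds=None):
    future = pipeline_executor.submit(run_lecture_chat_pipeline_worker, dto, variant)
    if timeout_seconds is None:
        return future  # fire-and-forget, but the future remains trackable
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        future.cancel()  # best effort; a running task cannot be interrupted
        raise

result = submit_pipeline({}, "default", timeout_seconds=5)
assert result == "ran default"
pipeline_executor.shutdown(wait=True)
```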
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5e0b102 and cbf9d0f.

📒 Files selected for processing (2)
  • app/vector_database/lecture_schema.py (2 hunks)
  • app/web/routers/pipelines.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/vector_database/lecture_schema.py
🔇 Additional comments (3)
app/web/routers/pipelines.py (3)

14-22: LGTM!

The imports are well-organized and follow the existing pattern in the codebase.


132-132: Consider using a dedicated run ID instead of authentication token.

As noted in a previous review, using an authentication token as a run ID might not be the best practice from a security perspective.


269-276: LGTM!

The feature variant implementation follows the established pattern and is consistent with other features.

@@ -121,6 +126,32 @@ def run_text_exercise_chat_pipeline_worker(dto, variant):
callback.error("Fatal error.", exception=e)


def run_lecture_chat_pipeline_worker(variant, dto):

🛠️ Refactor suggestion

Align argument order with other worker functions.

For consistency with other worker functions (e.g., run_course_chat_pipeline_worker, run_text_exercise_chat_pipeline_worker), consider changing the argument order to (dto, variant).

-def run_lecture_chat_pipeline_worker(variant, dto):
+def run_lecture_chat_pipeline_worker(dto, variant):

dependencies=[Depends(TokenValidator())],
)
def run_lecture_chat_pipeline(variant: str, dto: LectureChatPipelineExecutionDTO):
thread = Thread(target=run_lecture_chat_pipeline_worker, args=(variant, dto))

⚠️ Potential issue

Fix argument order in Thread creation.

The arguments passed to the worker function are in reverse order compared to the worker's parameter order. This will cause the wrong values to be passed to the function.

-    thread = Thread(target=run_lecture_chat_pipeline_worker, args=(variant, dto))
+    thread = Thread(target=run_lecture_chat_pipeline_worker, args=(dto, variant))
