feat: openai vision toolkit #269

AaronGoldsmith · 2024-11-18T15:43:38Z

This pull request introduces a new ImageUrl content type and integrates it across various modules in the exchange package. The changes include updates to the content handling logic, message validation, and new tests to ensure the functionality of the ImageUrl type. Additionally, a new VisionToolkit has been added to the goose toolkit for image analysis using AI capabilities.

The toolkit uses the OpenAI Vision API to analyze image URLs and base64-encoded images.
Future enhancements: Support for additional providers (ex: Anthropic)

New Content Type and Integration:

Added ImageUrl class to handle image URLs as content and implemented to_dict and summary methods in packages/exchange/src/exchange/content.py.
Updated validate_role_and_content and content_converter functions in packages/exchange/src/exchange/message.py to support ImageUrl. [1] [2]
Added image_content property to the Message class to retrieve all ImageUrl instances in packages/exchange/src/exchange/message.py.

Providers and Utilities:

Updated messages_to_openai_spec function in packages/exchange/src/exchange/providers/utils.py to handle ImageUrl content.

Testing Enhancements:

Added tests for ImageUrl integration in packages/exchange/tests/test_image_tool_integration.py and packages/exchange/tests/test_message.py. [1] [2]

New Toolkit:

Introduced VisionToolkit for image analysis in src/goose/toolkit/vision.py and updated pyproject.toml to include the new toolkit entry point. [1] [2]

If the team prefers having the toolkit and the new content type as separate PRs, I’d be happy to split them up.

use 4o-mini bump version

AaronGoldsmith · 2024-11-18T17:27:13Z

packages/exchange/src/exchange/message.py

+def content_converter(contents: list) -> list[Content]:                                                                                                                      
+     result = []                                                                                                                                                              
+     for c in contents:                                                                                                                                                       
+         if isinstance(c, dict) and 'type' in c:  # Structured content logic                                                                                                  
+             content_type = c.pop('type')                                                                                                                                     
+             if content_type in CONTENT_TYPES:                                                                                                                                
+                 result.append(CONTENT_TYPES[content_type](**c))                                                                                                              
+         elif isinstance(c, Content):  # Already a Content instance                                                                                                           
+             result.append(c)                                                                                                                                                 
+         elif isinstance(c, str):  # Plain text handling                                                                                                                      
+             result.append(Text(text=c))                                                                                                                                      
+         else:                                                                                                                                                                
+             # Handle unexpected content formats if necessary                                                                                                                 
+             raise ValueError(f"Unsupported content type: {type(c)}")                                                                                                         
+     return result       


Before updating content_converter I would get the following:

"Traceback (most recent call last): File \"/Users/aarong/Development/goose/packages/exchange/src/exchange/exchange.py\", line 149, in call_function output = json.dumps(tool.function(**tool_use.parameters)) File \"/Users/aarong/Development/goose/src/goose/toolkit/vision.py\", line 22, in describe_image user_message = Message(role=\"user\", content=[f\"{instructions}: \", image]) File \"<attrs generated init exchange.message.Message>\", line 13, in __init__ _setattr('content', __attr_converter_content(content)) File \"/Users/aarong/Development/goose/packages/exchange/src/exchange/message.py\", line 29, in content_converter\n return [(CONTENT_TYPES[c.pop(\"type\")](**c) if c.__class__ not in CONTENT_TYPES.values() else c) for c in contents] File \"/Users/aarong/Development/goose/packages/exchange/src/exchange/message.py\", line 29, in <listcomp>\n return [(CONTENT_TYPES[c.pop(\"type\")](**c) if c.__class__ not in CONTENT_TYPES.values() else c) for c in contents]\nAttributeError: 'str' object has no attribute 'pop' 'str' object has no attribute 'pop'", "is_error": true, "type": "ToolResult"}]

add support for local image files

AaronGoldsmith added 4 commits November 15, 2024 16:35

initial commit with vision

62fbd06

use 4o-mini bump version

update integration tests

fc9fdde

branch clean up

b1c0690

fix pathing

07f74de

AaronGoldsmith commented Nov 18, 2024

View reviewed changes

feat: enhance image handling in message processing and tests

3966947

add support for local image files

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: openai vision toolkit #269

feat: openai vision toolkit #269

AaronGoldsmith commented Nov 18, 2024 •

edited

Loading

AaronGoldsmith Nov 18, 2024

feat: openai vision toolkit #269

Are you sure you want to change the base?

feat: openai vision toolkit #269

Conversation

AaronGoldsmith commented Nov 18, 2024 • edited Loading

New Content Type and Integration:

Providers and Utilities:

Testing Enhancements:

New Toolkit:

AaronGoldsmith Nov 18, 2024

Choose a reason for hiding this comment

AaronGoldsmith commented Nov 18, 2024 •

edited

Loading