RecursionError when using qdrant-haystack 7.0.0 with Hayhooks #43

Open
kkmarv opened this issue Nov 22, 2024 · 0 comments
kkmarv commented Nov 22, 2024

Problem Description

Our pipeline definition (see below) fails with RecursionError: maximum recursion depth exceeded. We're using QdrantDocumentStore and QdrantEmbeddingRetriever from qdrant-haystack, which seem to be causing the error: the same pipeline runs successfully when we swap in their in-memory counterparts.

Expected Behavior

The pipeline should run without throwing an exception, correctly handling the Qdrant integration types.

Observed Behavior

$ hayhooks run --pipelines-dir ./pipelines
Stacktrace
INFO:     Pipelines dir set to: ./pipelines/retrieval/
  File ".venv/bin/hayhooks", line 8, in <module>
    sys.exit(hayhooks())
             ^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/cli/run/__init__.py", line 20, in run
    uvicorn.run("hayhooks.server:app", host=host, port=port)
  File ".venv/lib/python3.12/site-packages/uvicorn/main.py", line 579, in run
    server.run()
  File ".venv/lib/python3.12/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/uvicorn/server.py", line 69, in serve
    await self._serve(sockets)
  File ".venv/lib/python3.12/site-packages/uvicorn/server.py", line 76, in _serve
    config.load()
  File ".venv/lib/python3.12/site-packages/uvicorn/config.py", line 434, in load
    self.loaded_app = import_from_string(self.app)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File ".venv/lib/python3.12/site-packages/hayhooks/server/__init__.py", line 1, in <module>
    from hayhooks.server.app import app
  File ".venv/lib/python3.12/site-packages/hayhooks/server/app.py", line 32, in <module>
    app = create_app()
          ^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/app.py", line 27, in create_app
    deployed_pipeline = deploy_pipeline_def(app, pipeline_defintion)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/deploy_utils.py", line 20, in deploy_pipeline_def
    PipelineRunRequest = get_request_model(pipeline_def.name, pipe.inputs())
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/pipelines/models.py", line 29, in get_request_model
    input_type = handle_unsupported_types(typedef["type"], {DataFrame: dict})
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 65, in handle_unsupported_types
    return handle_generics(type_)
           ^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 44, in handle_generics
    result = handle_unsupported_types(t, types_mapping)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 61, in handle_unsupported_types
    new_type[arg_name] = handle_generics(arg_type)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 44, in handle_generics
    result = handle_unsupported_types(t, types_mapping)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 61, in handle_unsupported_types
    new_type[arg_name] = handle_generics(arg_type)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^

 #  ... repeated frames truncated ...
                        
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 44, in handle_generics
    result = handle_unsupported_types(t, types_mapping)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 59, in handle_unsupported_types
    for arg_name, arg_type in get_type_hints(type_).items():
                              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 2244, in get_type_hints
    value = _eval_type(value, base_globals, base_locals)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 414, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 929, in _evaluate
    self.__forward_value__ = _eval_type(
                             ^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded
pipeline.yml
components:
  embedder:
    init_parameters:
      model: null
      prefix: ''
      suffix: ''
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
      url: http://localhost:8080
    type: agrichat.ingestion.components.embedders.HuggingFaceTEITextEmbedder
  list_to_str_adapter:
    init_parameters:
      custom_filters: {}
      output_type: str
      template: '{{ replies[0] }}'
      unsafe: false
    type: haystack.components.converters.output_adapter.OutputAdapter
  llm:
    init_parameters:
      api_base_url: http://localhost:8000/v1
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      model: mistralai/Mistral-Nemo-Instruct-2407
      organization: null
      streaming_callback: null
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
  memory_joiner:
    init_parameters:
      type_: list[haystack.dataclasses.chat_message.ChatMessage]
    type: haystack.components.joiners.branch.BranchJoiner
  memory_retriever:
    init_parameters:
      last_k: 10
      message_store:
        init_parameters: {}
        type: haystack_experimental.chat_message_stores.in_memory.InMemoryChatMessageStore
    type: haystack_experimental.components.retrievers.chat_message_retriever.ChatMessageRetriever
  memory_writer:
    init_parameters:
      message_store:
        init_parameters: {}
        type: haystack_experimental.chat_message_stores.in_memory.InMemoryChatMessageStore
    type: haystack_experimental.components.writers.chat_message_writer.ChatMessageWriter
  prompt_builder:
    init_parameters:
      required_variables: &id001 !!python/tuple
      - query
      - documents
      - memories
      template: null
      variables: *id001
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
  query_rephrase_llm:
    init_parameters:
      api_base_url: http://localhost:8000/v1
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      model: mistralai/Mistral-Nemo-Instruct-2407
      organization: null
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.OpenAIGenerator
  query_rephrase_prompt_builder:
    init_parameters:
      required_variables: null
      template: "\nRewrite the question for semantic search while keeping its meaning\
        \ and key terms intact.\nIf the conversation history is empty, DO NOT change\
        \ the query.\nDo not translate the question.\nUse conversation history only\
        \ if necessary, and avoid extending the query with your own knowledge.\nIf\
        \ no changes are needed, output the current question as is.\n\nConversation\
        \ history:\n{% for memory in memories %}\n    {{ memory.content }}\n{% endfor\
        \ %}\n\nUser Query: {{query}}\nRewritten Query:\n"
      variables: null
    type: haystack.components.builders.prompt_builder.PromptBuilder
  retriever:
    init_parameters:
      document_store:
        init_parameters:
          api_key: null
          embedding_dim: 768
          force_disable_check_same_thread: false
          grpc_port: 6334
          hnsw_config: null
          host: null
          https: null
          index: Document
          init_from: null
          location: null
          metadata: {}
          on_disk: false
          on_disk_payload: null
          optimizers_config: null
          path: null
          payload_fields_to_index: null
          port: 6333
          prefer_grpc: false
          prefix: null
          progress_bar: false
          quantization_config: null
          recreate_index: false
          replication_factor: null
          return_embedding: false
          scroll_size: 10000
          shard_number: null
          similarity: cosine
          sparse_idf: false
          timeout: null
          url: http://localhost:6333
          use_sparse_embeddings: false
          wait_result_from_api: true
          wal_config: null
          write_batch_size: 100
          write_consistency_factor: null
        type: haystack_integrations.document_stores.qdrant.document_store.QdrantDocumentStore
      filter_policy: replace
      filters: null
      group_by: null
      group_size: null
      return_embedding: false
      scale_score: false
      score_threshold: null
      top_k: 3
    type: haystack_integrations.components.retrievers.qdrant.retriever.QdrantEmbeddingRetriever
connections:
- receiver: query_rephrase_llm.prompt
  sender: query_rephrase_prompt_builder.prompt
- receiver: list_to_str_adapter.replies
  sender: query_rephrase_llm.replies
- receiver: embedder.text
  sender: list_to_str_adapter.output
- receiver: retriever.query_embedding
  sender: embedder.embedding
- receiver: prompt_builder.documents
  sender: retriever.documents
- receiver: llm.messages
  sender: prompt_builder.prompt
- receiver: memory_joiner.value
  sender: llm.replies
- receiver: query_rephrase_prompt_builder.memories
  sender: memory_retriever.messages
- receiver: prompt_builder.memories
  sender: memory_retriever.messages
- receiver: memory_writer.messages
  sender: memory_joiner.value
max_runs_per_component: 100
metadata: {}

Hypothesis

The recursion happens when handle_unsupported_types processes nested or generic types like qdrant_client.http.models.models.Filter.

I've monkey-patched a print statement in handle_unsupported_types to have a look at its parameters causing the recursion:

def handle_unsupported_types(
    type_: type, types_mapping: Dict[type, type], skip_callables: bool = True
) -> Union[GenericAlias, type, None]:
    """
    Recursively handle types that are not supported by Pydantic by replacing them with the given types mapping.
    """

    # Debug print added to inspect the arguments of each recursive call
    print(type_, types_mapping)
    ...

which repeatedly prints the following before also throwing the exception:

Console Prints
<class 'qdrant_client.http.models.models.Filter'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.FieldCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchValue'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'bool'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'str'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchText'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchAny'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchExcept'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.Range'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.DatetimeRange'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.GeoBoundingBox'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.GeoRadius'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.GeoPolygon'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.ValuesCount'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.IsEmptyCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.IsNullCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.HasIdCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.NestedCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
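
The output stops at NestedCondition, which would fit a type that eventually refers back to Filter. A rough way to check for such a loop is to walk the type hints and report the first chain that returns to a class already on the path. The sketch below uses hypothetical stand-in classes that only borrow names from the output above (`Nested` and all field layouts are assumptions, and the real qdrant_client models may be shaped differently); `referenced_classes`, `find_cycle` and `NAMESPACE` are illustrative names, not part of hayhooks.

```python
from typing import List, Optional, Union, get_args, get_type_hints


# Hypothetical stand-ins; the real qdrant_client models may be shaped differently.
class MatchValue:
    value: Union[bool, int, str]


class FieldCondition:
    key: str
    match: Optional[MatchValue]


class Nested:
    key: str
    filter: "Filter"  # forward reference that closes the loop


class NestedCondition:
    nested: Nested


class Filter:
    must: Optional[List[Union[FieldCondition, NestedCondition]]]


# Explicit namespace used to resolve the "Filter" forward reference.
NAMESPACE = {"Filter": Filter}


def referenced_classes(annotation):
    """Yield every plain class mentioned inside a (possibly generic) annotation."""
    args = get_args(annotation)
    if args:
        for arg in args:
            yield from referenced_classes(arg)
    elif isinstance(annotation, type):
        yield annotation


def find_cycle(cls, path=()):
    """Return the first chain of classes that leads back to itself, else None."""
    if cls in path:
        return path[path.index(cls):] + (cls,)
    for hint in get_type_hints(cls, localns=NAMESPACE).values():
        for ref in referenced_classes(hint):
            cycle = find_cycle(ref, path + (cls,))
            if cycle:
                return cycle
    return None


cycle = find_cycle(Filter)
print(" -> ".join(c.__name__ for c in cycle))
# prints: Filter -> NestedCondition -> Nested -> Filter
```

Running the same `find_cycle` against the real `qdrant_client.http.models.models.Filter` (with a suitable namespace for its forward references) should show whether the actual models contain such a loop.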

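It seems like handle_unsupported_types doesn't terminate for certain nested generic types. Manually increasing the recursion depth might work around this, but only if the nesting is merely deep; it would not help if the types reference each other in a cycle.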
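
If the types do turn out to be self-referential, one common way to make such a walk terminate is to remember which classes have already been expanded. The following is a minimal sketch of that idea, not the hayhooks implementation: `walk_naive` and `walk_guarded` are hypothetical helpers, and the classes are simplified stand-ins whose fields are assumptions.

```python
from typing import List, Optional, Union, get_args, get_type_hints


# Hypothetical stand-ins with one self-referential path: Filter -> ... -> Filter.
class FieldCondition:
    key: str


class Nested:
    key: str
    filter: "Filter"


class NestedCondition:
    nested: Nested


class Filter:
    must: Optional[List[Union[FieldCondition, NestedCondition]]]


# Explicit namespace used to resolve the "Filter" forward reference.
NAMESPACE = {"Filter": Filter}


def walk_naive(tp):
    """Unguarded walk: expands a class again every time it is referenced."""
    args = get_args(tp)
    if args:
        for arg in args:
            walk_naive(arg)
    elif isinstance(tp, type):
        for hint in get_type_hints(tp, localns=NAMESPACE).values():
            walk_naive(hint)


def walk_guarded(tp, seen=None):
    """Same walk, but each class is expanded at most once; returns the classes seen."""
    seen = set() if seen is None else seen
    args = get_args(tp)
    if args:
        for arg in args:
            walk_guarded(arg, seen)
    elif isinstance(tp, type) and tp not in seen:
        seen.add(tp)
        for hint in get_type_hints(tp, localns=NAMESPACE).values():
            walk_guarded(hint, seen)
    return seen


try:
    walk_naive(Filter)
except RecursionError:
    print("naive walk hit the recursion limit")

print(sorted(cls.__name__ for cls in walk_guarded(Filter)))
```

In a real fix, the guarded branch would also need to decide what to substitute for a class that is met a second time (for example a plain `dict`), which is a design choice this sketch leaves open.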

requirements.txt

"colorama~=0.4.6",
"hayhooks~=0.0.18",
"haystack-ai~=2.7.0",
"haystack-experimental~=0.3.0",
"huggingface_hub~=0.24.2",
"minio~=7.2.9",
"pymupdf~=1.24.7",
"pymupdf4llm~=0.0.8",
"qdrant-haystack~=7.0.0",