RecursionError when using qdrant-haystack 7.0.0 with Hayhooks #43

kkmarv · 2024-11-22T15:57:02Z

Problem Description

Our pipeline definition (see below) fails with RecursionError: maximum recursion depth exceeded when Hayhooks loads it. We're using QdrantDocumentStore and QdrantEmbeddingRetriever from qdrant-haystack, which seem to be the cause: the same pipeline runs successfully when we swap in their in-memory counterparts.

Expected Behavior

The pipeline should run without throwing an exception, correctly handling the Qdrant integration types.

Observed Behavior

$ hayhooks run --pipelines-dir ./pipelines

Stacktrace

INFO:     Pipelines dir set to: ./pipelines/retrieval/
  File ".venv/bin/hayhooks", line 8, in <module>
    sys.exit(hayhooks())
             ^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/cli/run/__init__.py", line 20, in run
    uvicorn.run("hayhooks.server:app", host=host, port=port)
  File ".venv/lib/python3.12/site-packages/uvicorn/main.py", line 579, in run
    server.run()
  File ".venv/lib/python3.12/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/uvicorn/server.py", line 69, in serve
    await self._serve(sockets)
  File ".venv/lib/python3.12/site-packages/uvicorn/server.py", line 76, in _serve
    config.load()
  File ".venv/lib/python3.12/site-packages/uvicorn/config.py", line 434, in load
    self.loaded_app = import_from_string(self.app)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File ".venv/lib/python3.12/site-packages/hayhooks/server/__init__.py", line 1, in <module>
    from hayhooks.server.app import app
  File ".venv/lib/python3.12/site-packages/hayhooks/server/app.py", line 32, in <module>
    app = create_app()
          ^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/app.py", line 27, in create_app
    deployed_pipeline = deploy_pipeline_def(app, pipeline_defintion)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/deploy_utils.py", line 20, in deploy_pipeline_def
    PipelineRunRequest = get_request_model(pipeline_def.name, pipe.inputs())
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/pipelines/models.py", line 29, in get_request_model
    input_type = handle_unsupported_types(typedef["type"], {DataFrame: dict})
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 65, in handle_unsupported_types
    return handle_generics(type_)
           ^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 44, in handle_generics
    result = handle_unsupported_types(t, types_mapping)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 61, in handle_unsupported_types
    new_type[arg_name] = handle_generics(arg_type)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 44, in handle_generics
    result = handle_unsupported_types(t, types_mapping)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 61, in handle_unsupported_types
    new_type[arg_name] = handle_generics(arg_type)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^

 #  ... repeated frames truncated ...
                        
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 44, in handle_generics
    result = handle_unsupported_types(t, types_mapping)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/hayhooks/server/utils/create_valid_type.py", line 59, in handle_unsupported_types
    for arg_name, arg_type in get_type_hints(type_).items():
                              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 2244, in get_type_hints
    value = _eval_type(value, base_globals, base_locals)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 414, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 929, in _evaluate
    self.__forward_value__ = _eval_type(
                             ^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/typing.py", line 428, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded

pipeline.yml

components:
  embedder:
    init_parameters:
      model: null
      prefix: ''
      suffix: ''
      token:
        env_vars:
        - HF_API_TOKEN
        strict: false
        type: env_var
      url: http://localhost:8080
    type: agrichat.ingestion.components.embedders.HuggingFaceTEITextEmbedder
  list_to_str_adapter:
    init_parameters:
      custom_filters: {}
      output_type: str
      template: '{{ replies[0] }}'
      unsafe: false
    type: haystack.components.converters.output_adapter.OutputAdapter
  llm:
    init_parameters:
      api_base_url: http://localhost:8000/v1
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      model: mistralai/Mistral-Nemo-Instruct-2407
      organization: null
      streaming_callback: null
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
  memory_joiner:
    init_parameters:
      type_: list[haystack.dataclasses.chat_message.ChatMessage]
    type: haystack.components.joiners.branch.BranchJoiner
  memory_retriever:
    init_parameters:
      last_k: 10
      message_store:
        init_parameters: {}
        type: haystack_experimental.chat_message_stores.in_memory.InMemoryChatMessageStore
    type: haystack_experimental.components.retrievers.chat_message_retriever.ChatMessageRetriever
  memory_writer:
    init_parameters:
      message_store:
        init_parameters: {}
        type: haystack_experimental.chat_message_stores.in_memory.InMemoryChatMessageStore
    type: haystack_experimental.components.writers.chat_message_writer.ChatMessageWriter
  prompt_builder:
    init_parameters:
      required_variables: &id001 !!python/tuple
      - query
      - documents
      - memories
      template: null
      variables: *id001
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
  query_rephrase_llm:
    init_parameters:
      api_base_url: http://localhost:8000/v1
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      model: mistralai/Mistral-Nemo-Instruct-2407
      organization: null
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.OpenAIGenerator
  query_rephrase_prompt_builder:
    init_parameters:
      required_variables: null
      template: "\nRewrite the question for semantic search while keeping its meaning\
        \ and key terms intact.\nIf the conversation history is empty, DO NOT change\
        \ the query.\nDo not translate the question.\nUse conversation history only\
        \ if necessary, and avoid extending the query with your own knowledge.\nIf\
        \ no changes are needed, output the current question as is.\n\nConversation\
        \ history:\n{% for memory in memories %}\n    {{ memory.content }}\n{% endfor\
        \ %}\n\nUser Query: {{query}}\nRewritten Query:\n"
      variables: null
    type: haystack.components.builders.prompt_builder.PromptBuilder
  retriever:
    init_parameters:
      document_store:
        init_parameters:
          api_key: null
          embedding_dim: 768
          force_disable_check_same_thread: false
          grpc_port: 6334
          hnsw_config: null
          host: null
          https: null
          index: Document
          init_from: null
          location: null
          metadata: {}
          on_disk: false
          on_disk_payload: null
          optimizers_config: null
          path: null
          payload_fields_to_index: null
          port: 6333
          prefer_grpc: false
          prefix: null
          progress_bar: false
          quantization_config: null
          recreate_index: false
          replication_factor: null
          return_embedding: false
          scroll_size: 10000
          shard_number: null
          similarity: cosine
          sparse_idf: false
          timeout: null
          url: http://localhost:6333
          use_sparse_embeddings: false
          wait_result_from_api: true
          wal_config: null
          write_batch_size: 100
          write_consistency_factor: null
        type: haystack_integrations.document_stores.qdrant.document_store.QdrantDocumentStore
      filter_policy: replace
      filters: null
      group_by: null
      group_size: null
      return_embedding: false
      scale_score: false
      score_threshold: null
      top_k: 3
    type: haystack_integrations.components.retrievers.qdrant.retriever.QdrantEmbeddingRetriever
connections:
- receiver: query_rephrase_llm.prompt
  sender: query_rephrase_prompt_builder.prompt
- receiver: list_to_str_adapter.replies
  sender: query_rephrase_llm.replies
- receiver: embedder.text
  sender: list_to_str_adapter.output
- receiver: retriever.query_embedding
  sender: embedder.embedding
- receiver: prompt_builder.documents
  sender: retriever.documents
- receiver: llm.messages
  sender: prompt_builder.prompt
- receiver: memory_joiner.value
  sender: llm.replies
- receiver: query_rephrase_prompt_builder.memories
  sender: memory_retriever.messages
- receiver: prompt_builder.memories
  sender: memory_retriever.messages
- receiver: memory_writer.messages
  sender: memory_joiner.value
max_runs_per_component: 100
metadata: {}

Hypothesis

The recursion seems to happen while handle_unsupported_types walks nested or generic types such as qdrant_client.http.models.models.Filter.
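To illustrate the suspected failure mode without Qdrant installed, here is a minimal sketch. Filter and Nested below are hypothetical stand-ins that merely reference each other, not the real qdrant_client models, and naive_walk is an assumed simplification of a recursive type traversal, not the Hayhooks code.

```python
from typing import List, Optional, Union, get_args, get_type_hints


class Nested:
    """Stand-in for a model that points back at Filter (hypothetical)."""


class Filter:
    """Stand-in for a model whose fields can contain Filter again (hypothetical)."""


class Leaf:
    """Stand-in for a model with no cycle in its fields (hypothetical)."""


# Annotations are attached after the classes exist so that no string
# forward references are needed in this small example.
Nested.__annotations__ = {"filter": Optional[Filter]}
Filter.__annotations__ = {"must": Optional[List[Union[Nested, Filter]]]}
Leaf.__annotations__ = {"value": Optional[int]}


def naive_walk(type_):
    """Visit type_ and everything reachable from it, with no cycle guard."""
    # Unwrap Optional[...], List[...], Union[...] and visit each argument.
    for arg in get_args(type_):
        naive_walk(arg)
    # For classes, descend into the annotated fields.
    if isinstance(type_, type):
        for field_type in get_type_hints(type_).values():
            naive_walk(field_type)


def hits_recursion_limit(root):
    """Return True if walking root exhausts the interpreter's recursion limit."""
    try:
        naive_walk(root)
    except RecursionError:
        return True
    return False


cyclic = hits_recursion_limit(Filter)   # Filter -> Nested -> Filter -> ...
acyclic = hits_recursion_limit(Leaf)    # Leaf -> int, None and then stops
```

With these stand-ins, the walk over Leaf finishes while the walk over Filter never bottoms out, which would match the behaviour reported here if the Qdrant models are cyclic in the same way.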

I monkey-patched a print statement into handle_unsupported_types to inspect the arguments it receives during the recursion:

def handle_unsupported_types(
    type_: type, types_mapping: Dict[type, type], skip_callables: bool = True
) -> Union[GenericAlias, type, None]:
    """
    Recursively handle types that are not supported by Pydantic by replacing them with the given types mapping.
    """
    # Debug output added for this report: show each type as it is visited.
    print(type_, types_mapping)
    ...

This prints the following sequence over and over before the exception is raised:

Console Prints

<class 'qdrant_client.http.models.models.Filter'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.FieldCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchValue'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'bool'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'str'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchText'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchAny'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.MatchExcept'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.Range'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'float'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.DatetimeRange'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.datetime'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'datetime.date'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.GeoBoundingBox'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.GeoRadius'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.GeoPolygon'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.ValuesCount'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'int'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'NoneType'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.IsEmptyCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.IsNullCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.HasIdCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}
<class 'qdrant_client.http.models.models.NestedCondition'> {<class 'pandas.core.frame.DataFrame'>: <class 'dict'>}

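It seems that handle_unsupported_types does not terminate for these types: the same sequence of types is printed repeatedly, which suggests that the Qdrant filter models reference each other in a cycle. If that is the case, raising the recursion limit would only postpone the error, and the function would need to keep track of the types it has already visited.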
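As a rough illustration of how such a traversal can be made to terminate on self-referential types, the sketch below remembers which classes it has already expanded. It is not a patch for Hayhooks: Filter and Nested are hypothetical stand-ins for mutually referencing models, and walk_types is an assumed helper name.

```python
from typing import List, Optional, Union, get_args, get_type_hints


class Nested:
    """Stand-in for a model that points back at Filter (hypothetical)."""


class Filter:
    """Stand-in for a model whose fields can contain Filter again (hypothetical)."""


# Annotations are attached after the classes exist so that no string
# forward references are needed in this small example.
Nested.__annotations__ = {"filter": Optional[Filter]}
Filter.__annotations__ = {"must": Optional[List[Union[Nested, Filter]]]}


def walk_types(type_, seen=None):
    """Return every class reachable from type_, expanding each class only once."""
    if seen is None:
        seen = []
    if isinstance(type_, type):
        if type_ in seen:
            # Already expanded: returning here is what breaks the cycle.
            return seen
        seen.append(type_)
        for field_type in get_type_hints(type_).values():
            walk_types(field_type, seen)
    # Unwrap Optional[...], List[...], Union[...] and visit each argument.
    for arg in get_args(type_):
        walk_types(arg, seen)
    return seen


visited = [t.__name__ for t in walk_types(Filter)]
print(visited)
```

The same idea could in principle be applied to a function that rebuilds types rather than just listing them, for example by returning a placeholder when a class is encountered a second time. Whether that fits Hayhooks' request-model generation is an open question.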

requirements.txt

"colorama~=0.4.6",
"hayhooks~=0.0.18",
"haystack-ai~=2.7.0",
"haystack-experimental~=0.3.0",
"huggingface_hub~=0.24.2",
"minio~=7.2.9",
"pymupdf~=1.24.7",
"pymupdf4llm~=0.0.8",
"qdrant-haystack~=7.0.0",
