Couldn't use Azure OpenAI deployed model for Llama Parse #491

Open
tituslhy opened this issue Nov 18, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@tituslhy

Describe the bug
I was following this example notebook on Llama Parse but was unable to replicate the results using my gpt4o model on Azure OpenAI.

Files
The file I used is available in the Llama Parse example notebook.

Job IDs
286f9c91-2244-4bce-bf40-624246be8f75
a77c610c-e2be-4d47-8849-09295b26677e
1fa80bba-a23f-4818-b2d2-6e2ace6c1d05

The job IDs above correspond to three different values I tried for azure_endpoint. The Llama Parse docs appear to require the azure_endpoint to extend beyond "https://.openai.azure.com/", unlike the AzureOpenAI LLM object, so I tested the following:

  1. Just "https://.openai.azure.com/"
  2. Up to f"https://{org}.openai.azure.com/deployments/chat/completions?api-version={version}"
  3. "https://{org}.openai.azure.com/deployments/chat/completions?api-version=<{version}>", with a literal "<>" around {version}
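
For reproducibility, the three endpoint variants above can be written out as a small helper. This is only a sketch: the function name `candidate_endpoints` is hypothetical, the `org` and `version` values are placeholders, and it assumes the blank in item 1 stands for the same `{org}` placeholder used in items 2 and 3. It does not indicate which form Llama Parse actually expects.

```python
def candidate_endpoints(org: str, version: str) -> list[str]:
    """Return the three azure_endpoint strings tried in this report, in order."""
    # Assumption: item 1's host uses the same {org} placeholder as items 2 and 3.
    base = f"https://{org}.openai.azure.com/"
    full = f"{base}deployments/chat/completions?api-version="
    return [
        base,                  # 1. resource URL only
        f"{full}{version}",    # 2. full path with the api-version query
        f"{full}<{version}>",  # 3. same, with a literal "<>" around the version
    ]


if __name__ == "__main__":
    # Placeholder values, not real settings.
    for url in candidate_endpoints("my-org", "my-api-version"):
        print(url)
```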

Code

from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    split_by_page=True,
    verbose=True,
    azure_openai_deployment_name=deployment_name,
    azure_openai_endpoint=azure_endpoint,
    azure_openai_api_version=api_version,
    azure_openai_key=api_key,
)
documents = parser.load_data("./2019-tesla-impact-report-15.pdf")
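
The snippet above leaves `deployment_name`, `azure_endpoint`, `api_version`, and `api_key` undefined. Below is a minimal sketch of one way to supply them, assuming they are read from environment variables. The environment variable names and the helper `load_azure_settings` are hypothetical and are not defined by Llama Parse. Failing fast on a blank value helps rule out an empty setting as the cause of empty output.

```python
import os

# Hypothetical environment variable names; adjust to your own setup.
REQUIRED = {
    "deployment_name": "AZURE_OPENAI_DEPLOYMENT",
    "azure_endpoint": "AZURE_OPENAI_ENDPOINT",
    "api_version": "AZURE_OPENAI_API_VERSION",
    "api_key": "AZURE_OPENAI_API_KEY",
}


def load_azure_settings(env=None) -> dict:
    """Read the four Azure settings, raising if any is missing or blank."""
    env = os.environ if env is None else env
    settings = {name: env.get(var, "").strip() for name, var in REQUIRED.items()}
    missing = [REQUIRED[name] for name, value in settings.items() if not value]
    if missing:
        raise ValueError(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    return settings
```

The returned dictionary uses the same names as the variables passed to `LlamaParse(...)` above, so each value can be unpacked into the matching variable before constructing the parser.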

Results

[Document(id_='2bcbdf4a-bd05-47ad-b43c-3a6f66380008', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='c717118b-fad2-4dd3-b4fb-bc23838ca91d', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='a44fce52-c9af-4cb9-8bec-e156467b684a', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='d3296420-f2af-4b88-a2e3-51fbb3447c5d', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='4cb174c0-7d9a-4452-91c0-685b6fec3eac', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='6d58c8df-af55-4a97-ab20-ee36bfc8ba61', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='34e55283-4049-469c-8801-c3345a40c76b', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='49a0f8f9-f76e-4016-b574-cbedd4abbd95', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='f021e308-c701-415c-a734-0ac5db293181', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='bcf52338-3461-444c-86d8-9dc55ebe0900', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='f62ec862-190a-417e-ad6c-db8ca3109093', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='99d11a72-1e42-4148-89d8-d453c0d334df', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='4064790f-91d7-487c-8025-11c53a830308', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='ef3c3ba1-1556-46a9-bbd9-810906472f91', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='eda1dc8d-70e9-457d-836f-c528ea962576', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='NO_CONTENT_HERE', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]

All 15 returned documents contain only the text "NO_CONTENT_HERE".
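
A quick way to confirm this programmatically is to count the documents whose text is the placeholder. The sketch below uses a stand-in `Doc` class with a `text` attribute so that it runs on its own; with the real output, the `documents` list would be passed in directly. The helper name `count_empty` is hypothetical.

```python
from dataclasses import dataclass

PLACEHOLDER = "NO_CONTENT_HERE"


@dataclass
class Doc:
    """Stand-in for a parsed document; only the `text` attribute is used."""
    text: str


def count_empty(docs) -> int:
    """Count documents whose text is the placeholder, ignoring surrounding whitespace."""
    return sum(1 for d in docs if d.text.strip() == PLACEHOLDER)


if __name__ == "__main__":
    # Sample data mirroring the 15 placeholder documents in this report.
    sample = [Doc(PLACEHOLDER) for _ in range(15)]
    print(f"{count_empty(sample)} of {len(sample)} documents are empty")
```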

Client:
  • Notebook
@tituslhy added the bug label on Nov 18, 2024