feat: load SharePoint Pages content, feat: load docs from root folder in drive, feat: optionally only load specific file types. #930
base: main
…er file_types when loading documents.
The tests fail in _extract_page. Its return type hint is ‘None | Dict[str…’ because a SharePoint page sometimes has no ‘TextWebParts’, meaning there is no valid text to extract from the HTML, and in that case None is returned. Instead, this situation should raise a ValueError that _download_pages_and_extract_metadata handles by skipping that page.
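A minimal sketch of the raise-and-skip pattern described above, not the PR's actual code. The functions `extract_page` and `extract_pages` are hypothetical stand-ins for _extract_page and _download_pages_and_extract_metadata, and the page shape (a "webParts" list whose items carry "innerHtml") is an assumption made for illustration.

```python
from typing import Any, Dict, List


def extract_page(page: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for _extract_page: raise instead of returning None."""
    # Assumed shape: text web parts sit under page["webParts"], each with "innerHtml".
    texts = [
        part["innerHtml"]
        for part in page.get("webParts", [])
        if part.get("innerHtml")
    ]
    if not texts:
        # No text web parts means there is nothing valid to extract.
        raise ValueError(f"Page {page.get('id')!r} has no text web parts")
    return {"id": page.get("id"), "text": "\n".join(texts)}


def extract_pages(pages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the caller: skip pages that raise ValueError."""
    extracted = []
    for page in pages:
        try:
            extracted.append(extract_page(page))
        except ValueError:
            # Pass on pages without extractable text and keep going.
            continue
    return extracted
```

With this shape the return type hint can drop the None branch, since the function either returns a dict or raises.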
Tested locally after installing test_requirements.txt and got a missing dependency error (llmsherpa), which is likely an issue with test_requirements.txt. After installing llmsherpa, the tests run with 91 passed, 13 skipped, 1 warning in 17.43s.
I found out there is a way to do batch requests. I have an implementation running now that, for a SharePoint with 200+ sites, cuts the download step from 1 minute to 7 seconds. Please hold off on running the tests until this is committed.
Successfully implemented batch requests to retrieve page content, which significantly speeds up retrieval. After format and lint, ran all tests locally: 91 passed, 13 skipped, 2 warnings in 8.73s.
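For readers unfamiliar with batching, here is a rough sketch of how request payloads could be grouped, assuming the batch requests in question are Microsoft Graph JSON batch requests (the comments above do not say which mechanism the PR uses). The helper name `build_batch_payloads` and the example URLs are hypothetical; the limit of 20 requests per batch is the documented Graph limit.

```python
from typing import Any, Dict, Iterator, List

# Microsoft Graph JSON batching accepts at most 20 requests per batch.
GRAPH_BATCH_LIMIT = 20


def build_batch_payloads(
    relative_urls: List[str], batch_size: int = GRAPH_BATCH_LIMIT
) -> Iterator[Dict[str, Any]]:
    """Group GET requests into batch payloads (hypothetical helper)."""
    for start in range(0, len(relative_urls), batch_size):
        chunk = relative_urls[start:start + batch_size]
        yield {
            "requests": [
                # Each request needs an id so its response can be matched later.
                {"id": str(start + offset), "method": "GET", "url": url}
                for offset, url in enumerate(chunk)
            ]
        }


# Placeholder URLs for illustration only.
urls = [f"/sites/example-site/pages/{n}" for n in range(45)]
payloads = list(build_batch_payloads(urls))
```

Each payload would then be sent as one POST to the Graph `$batch` endpoint. Responses inside a batch may come back in a different order than the requests, so they should be matched by `id` rather than by position.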
This looks great @CraftingLevi! I have added minor comments that need to be addressed. Then we can move ahead and merge.
…rs, and what the default parameters are.
@anoopshrma I've made all the requested changes and synced the fork. In summary: I've changed the default value of 'sharepoint_folder_path' from 'root' to "", so that a folder named 'root' inside the root folder can still be targeted. It also works nicely with the if/else you suggested in your comment. Ran make format, make lint and make test; all seems good.
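To illustrate why an empty-string default pairs well with an if/else, here is a small sketch, not the code from this PR. The function `drive_children_url` is hypothetical, and it assumes the items are listed through the Microsoft Graph drive endpoints, where the root folder and a path below the root use different URL forms.

```python
def drive_children_url(drive_id: str, sharepoint_folder_path: str = "") -> str:
    """Build a URL listing a folder's children (hypothetical helper)."""
    base = f"https://graph.microsoft.com/v1.0/drives/{drive_id}"
    folder = sharepoint_folder_path.strip("/")
    if folder:
        # Path-based addressing for any folder below the root,
        # including a folder that happens to be named "root".
        return f"{base}/root:/{folder}:/children"
    # Empty path: list the root folder of the drive itself.
    return f"{base}/root/children"
```

Because "" is falsy, the empty default selects the root branch without a sentinel value, which leaves the name 'root' free for a real folder.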
Hey @CraftingLevi, could you push your changes there directly?
Description
Added functionality to load more data from a SharePoint site.
Fixes #936, #937, #938
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
make format; make lint
to appease the lint gods