Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Outlook] Feature: Office365 multi-user support #2844

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

ilyasabdellaoui
Copy link

Introduced BaseOffice365User abstract base class to standardize Office 365 user handling. Added MultiOffice365Users to manage multiple emails from config. Added client_emails (comma-separated) in OutlookDataSource config. Resolved issue with fetching too many users causing SMTP server not found error.

Closes #2843

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

@ilyasabdellaoui ilyasabdellaoui requested a review from a team September 23, 2024 02:07
Copy link

cla-checker-service bot commented Sep 23, 2024

💚 CLA has been signed

@ilyasabdellaoui ilyasabdellaoui marked this pull request as draft September 23, 2024 02:14
@ilyasabdellaoui ilyasabdellaoui marked this pull request as ready for review September 23, 2024 02:15
Introduced BaseOffice365User abstract base class to standardize Office 365 user handling.
Added MultiOffice365Users to manage multiple emails from config.
Added client_emails (comma-separated) in OutlookDataSource config.
Resolved issue with fetching too many users causing SMTP server not found error.
@ilyasabdellaoui ilyasabdellaoui force-pushed the feature/office365-multi-user-support branch from 37f7483 to 897fed8 Compare September 23, 2024 02:32
@seanstory
Copy link
Member

buildkite test this

Copy link
Member

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for submitting a patch! I've left a few comments, but overall I think this is a pretty clean approach and solves a clear ask.

Outside of the below comments, I'd also request that you change the

Closes #2843

in your PR description to use "Relates to" or "Part of", since another part of the issue you filed is that the current solution is limited to 999 users. I expect that's something we can better control with the graph API and another separate configuration, for folks who want "all users", and have more than 999.

connectors/sources/outlook.py Show resolved Hide resolved
connectors/sources/outlook.py Show resolved Hide resolved
tests/sources/test_outlook.py Outdated Show resolved Hide resolved
connectors/sources/outlook.py Outdated Show resolved Hide resolved
connectors/sources/outlook.py Outdated Show resolved Hide resolved
connectors/sources/outlook.py Outdated Show resolved Hide resolved
@seanstory
Copy link
Member

Screenshot 2024-09-24 at 3 45 50 PM

The tests pass, but the linter is currently failing. You can fix this by running make autoformat when you're done making changes.

@ilyasabdellaoui ilyasabdellaoui requested a review from a team as a code owner September 29, 2024 12:24
@ilyasabdellaoui
Copy link
Author

ilyasabdellaoui commented Sep 29, 2024

in your PR description to use "Relates to" or "Part of", since another part of the issue you filed is that the current solution is limited to 999 users. I expect that's something we can better control with the graph API and another separate configuration, for folks who want "all users", and have more than 999.

@seanstory After rechecking the code, I realized I was mistaken in my initial description. The code isn't limited to fetching the top 999 users—it handles pagination using the @odata.nextLink property from the Microsoft Graph API, which retrieves all users. I missed that detail. I'll update my issue description to reflect the correct behavior.

async def get_users(self):
access_token = await self._fetch_token()
url = f"https://graph.microsoft.com/v1.0/users?$top={TOP}"
while True:
try:
async with self._get_session.get(
url=url,
headers={
"Authorization": f"Bearer {access_token}",
"Content-Type": "application/json",
},
) as response:
json_response = await response.json()
yield json_response
url = json_response.get("@odata.nextLink")
if url is None:
break
except Exception:
raise

@ilyasabdellaoui ilyasabdellaoui marked this pull request as draft September 29, 2024 13:25
…Cloud connector

- Refactored the  method to utilize Microsoft Graph's JSON batching feature.
- The method now processes client emails in batches of up to 20 to optimize API requests.
- Added error handling for individual user fetch errors within the batch response.
- If any batch request fails, the method collects the errors and raises an exception with details.
@seanstory
Copy link
Member

buildkite test this

@ilyasabdellaoui
Copy link
Author

ilyasabdellaoui commented Sep 30, 2024

buildkite test this

Lint still fails due to async-related issues. Autoformat doesn't solve the lint errors. I'll investigate further after completing test functions. If unable to resolve, I'll seek assistance

@artem-shelkovnikov
Copy link
Member

@ilyasabdellaoui do you want to make this PR ready for review, or is there anything we can help with?

@ilyasabdellaoui ilyasabdellaoui marked this pull request as ready for review November 11, 2024 03:03
@ilyasabdellaoui
Copy link
Author

@artem-shelkovnikov Thanks for checking in! I encountered some linting issues, which I've now resolved in my latest commit, particularly related to async iteration. Everything is fixed and ready for review.

@artem-shelkovnikov
Copy link
Member

buildkite test this

Copy link
Member

@artem-shelkovnikov artem-shelkovnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left a comment about configurable field to make it consistent with other connectors, but otherwise great change.

Additional question - have you done any benchmarks on how faster the connector gets?

I know that you're saving a lot of time on network, but throttling-wise I believe this change makes not impact, is it true?

Comment on lines 865 to 867
"required": False,
"type": "list",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally handle lists like this with * being default value and * meaning that everything should be exported.

See this as an example: https://github.com/elastic/connectors/blob/main/connectors/sources/sharepoint_online.py#L1192-L1199

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied the suggested changes and updated the logic to the WILDCARD * when client_emails is not specified
920697a

async def get_users(self):
access_token = await self._fetch_token()
errors = []
for i in range(0, len(self.client_emails), 20):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to make it a constant at least (20). Also why was 20 chosen as batch size?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to use the GRAPH_API_BATCH_SIZE constant instead of hardcoding the value 20, see commit
e4923b5.

The batch size of 20 is due to the max limit set by the Graph API. For ref you can check the Graph API documentation on JSON batching

@ilyasabdellaoui
Copy link
Author

Looks good! Left a comment about configurable field to make it consistent with other connectors, but otherwise great change.

Additional question - have you done any benchmarks on how faster the connector gets?

I know that you're saving a lot of time on network, but throttling-wise I believe this change makes not impact, is it true?

I haven't run any benchmarks yet, but I expect this change will improve performance by reducing unnecessary user fetches.

As for throttling, the client_emails configuration is most effective at avoiding it when the list of emails is small, by limiting the API calls to only the relevant users, it reduces the load on the API. For larger email lists, though, the impact on throttling might not be as important!

@artem-shelkovnikov
Copy link
Member

buildkite test this

@artem-shelkovnikov
Copy link
Member

Change looks good, but I'll not merge immediately, will address a bit later - we need to do same change in Kibana to make it properly work for native connectors. I'll follow-up and merge both at the same time

@artem-shelkovnikov
Copy link
Member

Hi @ilyasabdellaoui, it's on my radar - we currently don't have a running subscription with. Outlook to be able to manually test it, it's ongoing work and we'll get back to this PR once it's been properly tested.

@ilyasabdellaoui
Copy link
Author

Thanks for the update @artem-shelkovnikov! Let me know if I can assist with anything to facilitate testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement: Allow Outlook Cloud connector to specify client emails instead of defaulting to all users
3 participants