Reduce memory usage #1560
Comments
@taers232c fyi
I'm not totally convinced this is the way to go for low-memory handling. It works fine if the returned data is iterated over, but as soon as you try to do something like data[0] it fails, which will lead to unusual bugs. An alternative approach would be to write a function that replaces get_all_pages and takes a "callback" function that processes each page, saving the need to store 100k, or sometimes millions, of user, device or other list objects. The challenge here is that things like CSV output need to know the columns before you start adding rows, and we may be dynamically adding columns during later page callbacks.
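A rough sketch of the callback idea, including one possible way around the CSV column problem: spool rows to a temporary file while collecting column names, then write the CSV in a second pass once all columns are known. This is not GAM code. `process_all_pages`, `CsvSpooler`, `fetch_page` and `FAKE_PAGES` are hypothetical names, and `fetch_page` stands in for the real API call and its error handling.

```python
import csv
import io
import json
import tempfile


def process_all_pages(fetch_page, items_key, callback):
    """Fetch pages one at a time and hand each page's items to callback.

    fetch_page(page_token) is a hypothetical stand-in for the API call; it
    must return a dict holding the items and an optional 'nextPageToken'.
    """
    page_token = None
    total = 0
    while True:
        page = fetch_page(page_token)
        items = page.get(items_key, [])
        callback(items)
        total += len(items)
        page_token = page.get('nextPageToken')
        del page, items  # drop the references before fetching the next page
        if not page_token:
            return total


class CsvSpooler:
    """Callback that spools rows to disk and tracks columns as they appear."""

    def __init__(self):
        self.columns = []  # ordered; grows when a later page adds a key
        self._spool = tempfile.TemporaryFile(mode='w+', encoding='utf-8')

    def __call__(self, items):
        for item in items:
            for key in item:
                if key not in self.columns:
                    self.columns.append(key)
            self._spool.write(json.dumps(item) + '\n')

    def write_csv(self, out):
        """Second pass: every column is known now, so the CSV can be written."""
        writer = csv.DictWriter(out, fieldnames=self.columns, restval='',
                                lineterminator='\n')
        writer.writeheader()
        self._spool.seek(0)
        for line in self._spool:
            writer.writerow(json.loads(line))
        self._spool.close()


# Fake two-page API response, used only for this demonstration.
FAKE_PAGES = {
    None: {'users': [{'email': 'a@example.com'}], 'nextPageToken': 'p2'},
    'p2': {'users': [{'email': 'b@example.com', 'suspended': True}]},
}

spooler = CsvSpooler()
count = process_all_pages(lambda token: FAKE_PAGES[token], 'users', spooler)
buffer = io.StringIO()
spooler.write_csv(buffer)
csv_text = buffer.getvalue()
```

Only one page of objects is held in memory at a time. The cost is a second pass over the spooled rows, which trades disk I/O for memory.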
I already do something like this, although not with callbacks. I get a page, process it, delete it, get the next page.
Right, that's easy enough, but getting it into a generic function would make
it easier to use elsewhere without repeating all the API error handling.
On Thu, Sep 15, 2022, 6:31 PM Ross Scroggs ***@***.***> wrote:
I already do something like this, although not with callbacks. I get a
page, process it, delete it, get the next page.
printGettingAllEntityItemsForWhom(Ent.DRIVE_FILE_OR_FOLDER, user, i, count, query=DLP.fileIdEntity['query'])
pageMessage = getPageMessageForWhom()
pageToken = None
totalItems = 0
userError = False
while True:
  try:
    feed = callGAPI(drive.files(), 'list',
                    throwReasons=GAPI.DRIVE_USER_THROW_REASONS+[GAPI.NOT_FOUND, GAPI.TEAMDRIVE_MEMBERSHIP_REQUIRED],
                    retryReasons=[GAPI.UNKNOWN_ERROR],
                    pageToken=pageToken,
                    orderBy=OBY.orderBy,
                    fields=pagesFields, pageSize=GC.Values[GC.DRIVE_MAX_RESULTS], **btkwargs)
  except (GAPI.notFound, GAPI.teamDriveMembershipRequired) as e:
    entityActionFailedWarning([Ent.USER, user, Ent.SHAREDDRIVE_ID, fileIdEntity['shareddrive']['driveId']], str(e), i, count)
    userError = True
    break
  except (GAPI.serviceNotAvailable, GAPI.authError, GAPI.domainPolicy) as e:
    userSvcNotApplicableOrDriveDisabled(user, str(e), i, count)
    userError = True
    break
  pageToken, totalItems = _processGAPIpagesResult(feed, 'files', None, totalItems, pageMessage, None, Ent.DRIVE_FILE_OR_FOLDER)
  if feed:
    extendFileTree(fileTree, feed.get('files', []), DLP, stripCRsFromName)
    del feed
  if not pageToken:
    _finalizeGAPIpagesResult(pageMessage)
    break
Agreed. Something to do in my spare time.
Rather than having to provide a callback function, you may want to consider creating something like I would also be careful with using something like |
yield is a good idea. It does mean we end up waiting for a page to be processed locally before we get the next page, so it's not necessarily faster. I really like the idea of a callback that gets pages as fast as possible while also parsing them locally in parallel, but maybe that additional complexity isn't worth the performance gain. What makes sense to me would be to add a function like
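For illustration, here is a minimal sketch of both variants discussed above: a plain generator that yields one page at a time, and a version that requests the next page in a worker thread while the caller is still processing the current one. None of these names come from GAM. `iter_pages`, `iter_pages_prefetch`, `fetch_page` and `FAKE_PAGES` are hypothetical, and `fetch_page` is assumed to be safe to call from a worker thread, which may not hold for a real API client.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_pages(fetch_page, items_key):
    """Yield one page of items at a time; nothing is accumulated."""
    page_token = None
    while True:
        page = fetch_page(page_token)
        yield page.get(items_key, [])
        page_token = page.get('nextPageToken')
        if not page_token:
            return


def iter_pages_prefetch(fetch_page, items_key):
    """Like iter_pages, but the next page is requested in a worker thread
    before the current page is handed to the caller."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, None)
        while future is not None:
            page = future.result()
            page_token = page.get('nextPageToken')
            # Start the next request before yielding the current page.
            future = pool.submit(fetch_page, page_token) if page_token else None
            yield page.get(items_key, [])


# Fake three-page API response, used only for this demonstration.
FAKE_PAGES = {
    None: {'users': ['a', 'b'], 'nextPageToken': 'p2'},
    'p2': {'users': ['c'], 'nextPageToken': 'p3'},
    'p3': {'users': ['d']},
}


def fetch(token):
    return FAKE_PAGES[token]


plain = [u for users in iter_pages(fetch, 'users') for u in users]
prefetched = [u for users in iter_pages_prefetch(fetch, 'users') for u in users]
```

The prefetching version holds at most two pages in memory at once, so it keeps most of the memory benefit while overlapping download and local processing.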
Today GAM can use a lot of memory when running a command like:
gam report user
in a large domain. Any command that uses
gapi.get_all_pages()
may use a LOT of memory while downloading many pages from Google. Rewriting every usage of get_all_pages to do all parsing after each page would require a LOT of work, but we can at least save some memory by changing the way we generate all_pages.
Rather than all_pages being a normal list object, we can use the built-in Python shelve library to write the list to disk and save memory.
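As a minimal sketch of the idea (not the actual change proposed here), items can be written to a shelf under string keys as each page arrives. The file name, the sample pages and the counter-as-key scheme are assumptions made for this example. Note that a shelf is dict-like and only accepts string keys, so integer indexing as on a list does not work.

```python
import os
import shelve
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'all_pages')

# Two fake pages standing in for API responses.
pages = ([{'id': 1}, {'id': 2}], [{'id': 3}])

with shelve.open(path) as all_pages:
    n = 0
    for page in pages:
        for item in page:
            all_pages[str(n)] = item  # shelf keys must be strings
            n += 1

    # Items are read back from disk one at a time.
    ids = [all_pages[str(i)]['id'] for i in range(n)]

    # List-style integer indexing is not supported by a shelf.
    try:
        all_pages[0]
        int_index_ok = True
    except (TypeError, AttributeError, KeyError):
        int_index_ok = False

shutil.rmtree(tmpdir)  # remove the on-disk shelf files
```

Each item is pickled to disk on assignment, so memory use stays close to one page at a time, at the cost of slower access.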