
Update custom function for downloading files from s3 #458

Merged · 7 commits · Aug 16, 2022

Conversation

DyfanJones (Member)

This method bypasses R when collecting data from S3. Instead, it calls httr::write_disk from paws.common:::issue and relies on libcurl to do the heavy lifting through curl::curl_fetch_disk.

Addresses #401
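For illustration, here is a minimal sketch of the general technique (not this PR's actual code path): streaming a response body straight to disk via libcurl so it is never buffered in R. The URL and file name below are hypothetical.

```r
library(curl)
library(httr)

# Hypothetical pre-signed (or public) S3 object URL and local destination.
url  <- "https://my-bucket.s3.amazonaws.com/path/to/large-object.csv"
dest <- "large-object.csv"

# curl_fetch_disk() has libcurl write the body directly to `dest`,
# so the object is never held in an R memory buffer.
res <- curl::curl_fetch_disk(url, path = dest)
stopifnot(res$status_code == 200)

# The same idea via httr: write_disk() is a config that routes the
# response body to a file instead of into memory.
res2 <- httr::GET(url, httr::write_disk(dest, overwrite = TRUE))
httr::stop_for_status(res2)
```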

codecov bot commented Sep 10, 2021

Codecov Report

Merging #458 (761529a) into main (c0b0866) will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #458      +/-   ##
==========================================
+ Coverage   82.68%   82.71%   +0.03%     
==========================================
  Files         112      112              
  Lines        6400     6412      +12     
==========================================
+ Hits         5292     5304      +12     
  Misses       1108     1108              
Impacted Files                   Coverage Δ
paws/paws.common/R/request.R     79.16% <0.00%> (+0.29%) ⬆️
paws/paws.common/R/net.R         86.66% <0.00%> (+0.95%) ⬆️



philiporlando commented Oct 14, 2024

Hi, thank you for this great update! I’m trying to confirm my understanding of how the s3$download_file() function behaves in terms of memory management in the newer version.

I’m using s3$download_file() within a {targets} pipeline with large-scale branching targets. This is essentially making thousands of individual calls to s3$download_file() across multiple CPU threads. I’ve noticed that my system's memory consumption rises significantly during the download process, eventually leading to memory limits being hit and the system crashing. My understanding is that, with this merged PR, s3$download_file() should not accumulate memory since files are now written directly to disk instead of being held in memory first. I'm trying to pinpoint where the memory accumulation is happening.

Can you confirm whether there are any known memory issues in the current implementation, or whether there might still be room for memory to accumulate under certain circumstances? I am troubleshooting my pipeline and would appreciate any insight into whether explicit gc() calls might help, or whether there's something I'm missing in how the function behaves.

Thanks in advance!
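For reference, a minimal sketch of the usage pattern described above, with an explicit gc() after each download as a troubleshooting step. The bucket, keys, and paths are hypothetical, it assumes the boto3-style Bucket/Key/Filename arguments, and whether gc() actually helps here is exactly the open question.

```r
library(paws)

s3 <- paws::s3()

# Hypothetical bucket and object keys; in the real pipeline these come
# from {targets} branching, one key per branch.
bucket <- "my-data-bucket"
keys   <- sprintf("raw/part-%04d.parquet", 1:5000)
dir.create("downloads", showWarnings = FALSE)

for (key in keys) {
  dest <- file.path("downloads", basename(key))

  # With the change in this PR, the object body should be streamed to
  # `dest` by libcurl rather than accumulated in R's heap.
  s3$download_file(Bucket = bucket, Key = key, Filename = dest)

  # Troubleshooting step only: force a garbage collection after each
  # download to see whether growth is due to uncollected R objects.
  invisible(gc())
}
```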
