
Added --recursive support to upload_object for s3 file storage #277

Open · wants to merge 1 commit into main

Conversation

@mechpaul commented Dec 4, 2021

This commit adds --recursive and folder support to upload_object for s3 file storage.

  • Updated the argparse help documentation to show that both files and folders are supported
  • Added --recursive support.
  • Added support for parsing folders

If a user supplies a folder but does not supply --recursive, only the files directly inside that folder are uploaded.

If a user supplies a folder and also supplies --recursive, all files under that folder are uploaded recursively.

I did not test my code against symbolic/hard links.
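
Roughly, the collection logic works like this (a simplified sketch of the idea, not the exact code in the diff; the function name is illustrative):

```python
import os

def collect_files(path, recursive=False):
    """Return the list of local files to upload for a file or folder argument."""
    if os.path.isfile(path):
        return [path]

    if recursive:
        # walk the whole tree under the folder
        collected = []
        for root, _dirs, files in os.walk(path):
            collected.extend(os.path.join(root, name) for name in files)
        return collected

    # no --recursive: only files directly inside the folder
    return [
        os.path.join(path, name)
        for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    ]
```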

@Dorthu self-requested a review December 6, 2021 12:31

@Dorthu (Collaborator) left a comment

Thanks for the PR!

Testing this locally, I see that folder structure is not preserved when using --recursive as it is implemented here.

Using a contrived example:

wsmith@linode::linode-cli[mechpaul/master]$ tree tmp/
tmp/
├── a
│   └── file2
├── b
│   └── c
│       ├── file3
│       └── file4
└── file1

3 directories, 4 files

I would expect to see a bucket like this:

wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj ls 277-test-s3cmd
                  DIR  a/
                  DIR  b/
2021-12-07 18:19  0    file1

What I end up with is this:

wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj ls 277-test
2021-12-07 18:24  0  file1
2021-12-07 18:24  0  file2
2021-12-07 18:24  0  file3
2021-12-07 18:24  0  file4

(the reference bucket above was created using the equivalent s3cmd upload action)

It looks like all that needs to be done to achieve the desired behavior is to keep the paths in addition to the filenames when collecting files, which shouldn't be terribly difficult. If you don't mind, I might tinker with this a little bit and push up a commit that does this.
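
Roughly something like this (just a sketch of the idea, not necessarily what I'd actually push):

```python
import os

def collect_uploads(base_dir):
    """Yield (local_path, object_key) pairs, preserving structure under base_dir."""
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            local_path = os.path.join(root, name)
            # the key is the path relative to the uploaded folder, using forward
            # slashes so it shows up as a prefix ("folder") in the bucket
            key = os.path.relpath(local_path, base_dir).replace(os.sep, "/")
            yield local_path, key
```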

@mechpaul (Author) commented Dec 7, 2021

Working as intended.

It was coded this way for the following reasons:

  • The Linode web GUI does not allow for the creation of folders
  • The original implementation of this function did not preserve folder structure. For example, if you passed foo/bar/baz.txt, it did not upload foo/bar/baz.txt but only baz.txt. This was done to maintain parity with the older implementation.
  • There is no documentation regarding the preservation of folders in object storage.

To continue the critique: now that I'm running this against larger file sets, only the first 1400 files are uploaded and the rest are skipped. I'm wondering if there is some Linode limitation on the number of files uploaded in one login session.

@Dorthu (Collaborator) commented Dec 7, 2021

There's no login session at work here - the CLI uses Object Storage keys to perform operations through this plugin, and these are effectively stateless. There is probably an upper limit on the number of objects in a bucket, but it would be much higher than 1400.

I don't think that preserving folder structure is a break from old behavior; in the example above, I asked s3cmd to recursively upload tmp/ and did not get all keys prefixed with tmp/; it merely preserved the structure beneath that directory. I think the CLI should do the same thing: tmp/file1 should still end up with the key file1, but tmp/a/file2 should keep its prefix beneath tmp and become a/file2.

Supposing all structure was flattened, what would happen if I had two files with the same name in different directories during a recursive upload? Take for example:

# set up a new folder, tmp2, with one subdirectory and a file named "pet" in each
wsmith@linode::linode-cli[mechpaul/master]$ mkdir -p tmp2/a
wsmith@linode::linode-cli[mechpaul/master]$ echo "cat" > tmp2/a/pet
wsmith@linode::linode-cli[mechpaul/master]$ echo "dog" > tmp2/pet
# recursively upload tmp2
wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj put --recursive tmp2/  277-test
pet
 |####################################################################################################| 100.0%
pet
 |####################################################################################################| 100.0%
Done.
# but only one file exists in the bucket - the second one overrode the first
wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj ls 277-test
2021-12-07 19:57  4  pet
# let's see which one it is
wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj get 277-test pet
 |####################################################################################################| 100.0%
Done.
wsmith@linode::linode-cli[mechpaul/master]$ cat pet
dog

In the above, I lost my cat. Maybe this explains where some of the files you were attempting to upload went too?

In object storage, there aren't really "folders" - these are emulated client-side using prefixes (delimited on a /; for example, a/ is a prefix in a/pet). Most s3 clients (e.g. s3cmd, Cyberduck, even the Cloud Manager) display forward-slash-delimited prefixes as folders, but the keys actually live in a flat namespace in object storage. Incidentally, these "folders" only exist if there are keys under them; in my first example, removing file2 would also remove the a/ folder entirely. Maybe that's why the Cloud Manager won't let you create them yourself?
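
To make the prefix idea concrete, here's a rough illustration using boto3 (the endpoint and bucket name are placeholders, and this isn't necessarily the client the plugin uses internally):

```python
import boto3

# placeholder endpoint/bucket; credentials come from the usual boto configuration
client = boto3.client("s3", endpoint_url="https://us-east-1.linodeobjects.com")

# with a delimiter, keys that share a prefix are grouped under CommonPrefixes,
# which is what clients render as "folders"
resp = client.list_objects_v2(Bucket="277-test-s3cmd", Delimiter="/")
for prefix in resp.get("CommonPrefixes", []):
    print("DIR", prefix["Prefix"])   # e.g. a/, b/
for obj in resp.get("Contents", []):
    print(obj["Key"])                # e.g. file1
```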

@mechpaul (Author) commented Dec 7, 2021

Ahhhh, I didn't know those folders were just virtual folders in a flat space!

The files I was trying to upload were about 1.2M PNG files - about 14 GB worth. There is no folder structure (all flat), and each file is named {sha256}.png. So no, there's no name clobbering here.

You can reject the PR. I'll work on preserving folder structure and submit a new PR with the changes requested. Thanks for sharing your knowledge with me. Linode is my first cloud platform.

@Dorthu (Collaborator) commented Dec 7, 2021

Thanks for contributing! I'm fine to leave this open if you want to push changes here. I also don't mind doing it myself if you don't feel like doing it.

@mechpaul (Author) commented Dec 8, 2021

Go ahead and leave it open. I'll resubmit by EOWeekend.

@mechpaul (Author) commented Dec 10, 2021

Hmmm... I'm dealing with a lot of corner cases to support folders on the source computer and virtual folders in S3.

  • Suppose the user passes in ../../myfolder/. In this case, the user would not want ../ in the virtual folder name.
  • On either Windows or Linux, what if an absolute path from the filesystem root (Linux) or a drive letter (D:\whatever) is provided?

Pretty sure I have these figured out now.
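
The approach I've landed on is roughly this (a sketch only; the helper name is made up):

```python
import os

def object_key_for(local_path, upload_root):
    """Build an object key from a local path, relative to the folder passed on
    the command line, so ../ segments and drive letters never leak into the key."""
    abs_root = os.path.abspath(upload_root)
    abs_path = os.path.abspath(local_path)
    key = os.path.relpath(abs_path, abs_root)
    # object keys always use forward slashes, even when uploading from Windows
    return key.replace(os.sep, "/")
```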

After working out these issues, I'm now trying to figure out another issue.

subfolder\00000e5e66a365da9134ab537cf8125f.png
|####################################################################################################| 100.0%
Error: SignatureDoesNotMatch

Going to dig into this error to figure out what's going on. Looks like it's an error from Boto.

I passed in the following:

linode obj put d:\maplestory\imageslinode\ --recursive happychinchilla

I threw about 70 images in here, half of which are in the folder /subfolder/, which should be preserved in the S3 object key. It's throwing this error on the first image contained within a subfolder.

@Dorthu (Collaborator) commented Dec 14, 2021

I took a stab at this in mechpaul#1 (which is a PR against this branch) - let me know if it works for you.

@Dorthu (Collaborator) commented Dec 14, 2021

I commented in the above-linked PR, but I can reproduce that SignatureDoesNotMatch error on the master branch; it seems to be a bug with how some characters in file names are treated (in your case \, in mine $).

Dorthu added a commit that referenced this pull request Dec 14, 2021
…ters

Related to #277

@mechpaul discovered that putting files whose names contain certain
special characters, such as `\` and `$`, using the `obj` plugin returns
a SignatureDoesNotMatch error.

To reproduce:

```bash
$ echo "test" > 'test$file\name'
$ python3 -m linodecli obj put 'test$file\name' some-bucket
```

This change properly quotes the URLs such that these characters are
accepted in keys.
@Dorthu (Collaborator) commented Dec 14, 2021

I put up a fix for the bug in #278; let me know if it works for you.
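
The gist of the fix is to percent-encode the object key before it goes into the request URL; very roughly (a sketch, not the literal diff in #278):

```python
from urllib.parse import quote

key = r"test$file\name"
# encode everything except the path separator so characters like \ and $
# survive request signing and match what the server computes
encoded_key = quote(key, safe="/")
print(encoded_key)  # test%24file%5Cname
```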
