Uploading more than once to a path adds it to subfolders #138

Open
redspart opened this issue Aug 6, 2019 · 1 comment


redspart commented Aug 6, 2019

I am not sure if this is intended; please let me know if it is.

When uploading a directory for the first time, the data lands in the correct spot, i.e. /hdfs-path/sub-folder. However, when adding more data to the same place, it ends up in /hdfs-path/sub-folder/<local_name>/ instead.

If this is not the intended behavior, I believe the culprit is line 553, where hdfs_path and local_name are joined. When I removed local_name from the join, all data was uploaded into hdfs_path and no subfolders were created.

hdfs_path = psp.join(hdfs_path, local_name)

EDIT

Code used:

for p in files:
    file_path = "sub_folder"
    upload_path = "%s/%s" % ("/hdfs-path", "sub_folder")
    client.upload(upload_path, file_path, overwrite=True, n_threads=0)

After a bit more debugging, I found that if the path already exists in HDFS, the name of the local folder the files come from is appended to it. I need the files to be added to the specified directory itself, not to the directory plus a subfolder. To remedy this, I created a new variable called use_existing. When it is True, the upload uses the HDFS path as given instead of hdfs_path + local_name.
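The path resolution described above can be sketched as a small standalone function. This is only an illustration, not the library's actual code: the function name `resolve_upload_target` and the `target_is_dir` argument are hypothetical, and `use_existing` is the name proposed in this issue, not an existing option.

```python
import os.path as osp
import posixpath as psp


def resolve_upload_target(hdfs_path, local_path, target_is_dir, use_existing=False):
    """Return the remote path an upload would write to (illustrative only).

    target_is_dir: whether hdfs_path already exists as a directory.
    use_existing: hypothetical flag; when True, reuse hdfs_path as given.
    """
    if target_is_dir and not use_existing:
        # Behavior reported in this issue: an existing remote directory
        # gets the local folder's name appended, like `cp -r src dst`.
        local_name = osp.basename(osp.normpath(local_path))
        return psp.join(hdfs_path, local_name)
    # Either the target does not exist yet, or the caller asked to reuse it.
    return hdfs_path
```

Under these assumptions, the first upload (target missing) and any later upload with `use_existing=True` resolve to the same path, while a later upload without the flag resolves to the nested one.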

Again, let me know if my understanding is off, or if you would like a PR with the added variable.

mtth (Owner) commented Aug 8, 2019

Thanks for the detailed report. Your understanding is correct. It is implemented this way to be consistent with local commands:

# In an empty directory
$ mkdir src1 src2
$ cp -r src1 dst # Copies src1 as dst
$ cp -r src2 dst # Copies src2 as dst/src2

As you point out, there is a usability gap though. You can achieve what you are trying to do locally by globbing (cp -r src2/* dst) but there is no equivalent here, at least until #105. I think this justifies adding an option; if you send a PR I would be happy to review it.
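Until such an option exists, one possible workaround is to emulate the glob by uploading each entry of the local folder separately. The sketch below is hedged: the helper name `upload_contents` is hypothetical, it assumes `client` exposes `makedirs` and `upload(hdfs_path, local_path, ...)` as the hdfs client does, and it relies on the behavior described above (an upload into an existing remote directory lands at directory + local name).

```python
import os


def upload_contents(client, hdfs_dir, local_dir, **upload_kwargs):
    """Upload every entry of local_dir directly into hdfs_dir.

    Roughly the equivalent of `cp -r local_dir/* hdfs_dir`.
    """
    # Make sure the target exists as a directory first; otherwise the
    # first entry would be uploaded *as* hdfs_dir rather than into it.
    client.makedirs(hdfs_dir)
    uploaded = []
    for name in sorted(os.listdir(local_dir)):
        local_path = os.path.join(local_dir, name)
        # One call per entry, so each lands at hdfs_dir/<name>.
        uploaded.append(client.upload(hdfs_dir, local_path, **upload_kwargs))
    return uploaded
```

For example, `upload_contents(client, "/hdfs-path/sub_folder", "sub_folder", overwrite=True)` would issue one upload per file or subfolder. The trade-off is one round of requests per entry instead of a single threaded upload of the whole folder.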
