Hi Folks,
This is not a bug as such - I am just not sure of the capabilities of the --regex and --replacement features.
Ideally, what I want is to convert directories from "one directory per hour", e.g.
...somedirectory/2015/05/10/21/...lots of files...
...somedirectory/2015/05/10/22/...lots of files...
...somedirectory/2015/05/10/23/...lots of files...
...somedirectory/2015/05/11/00/...lots of files...
...somedirectory/2015/05/11/01/...lots of files...
...somedirectory/2015/05/11/02/...lots of files...
into "one directory per day"
...somedirectory/2015/05/10/oneBigFile
...somedirectory/2015/05/11/oneBigFile
or, if necessary
...somedirectory/2015/05/10/00/oneBigFile
...somedirectory/2015/05/11/00/oneBigFile
(And ideally I'd love it to tell Hive HCatalog at the same time, but that might be asking too much)
I am trying to use the --regex and --replacement features to do this. Should it work?
This just adds in a new directory:
--regex=".*/\d\d/(.+)"
--replacement=00/$1-${crush.timestamp}-${crush.task.num}-${crush.file.num} \
Should I be trying something like this instead?
--regex=".*/(\d\d)/(.+)"
--replacement=00/$2-${crush.timestamp}-${crush.task.num}-${crush.file.num} \
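Just to show what I expect the two capture groups to pick up - this is a quick java.util.regex check outside of Crush, with a made-up file name ("part-00000"), and it assumes Crush matches the regex against the full per-hour file path, which may not be how it actually works:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexCheck {
        public static void main(String[] args) {
            // Hypothetical input path; the real prefix is elided above.
            String path = "somedirectory/2015/05/10/21/part-00000";

            Pattern p = Pattern.compile(".*/(\\d\\d)/(.+)");
            Matcher m = p.matcher(path);
            if (m.matches()) {
                System.out.println("group 1 (hour dir): " + m.group(1)); // "21"
                System.out.println("group 2 (file):     " + m.group(2)); // "part-00000"
            }
        }
    }

If that is how the matching works, then $2 in the replacement should be the file name and $1 the hour directory.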
I suppose my fallback solution would be to move everything from the low-level directories up one directory before running the file crush. That would be a bit of a pain - I suppose I could write a Perl or shell script to do that, which ran "hadoop fs -mv " commands.
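Something along these lines is what I had in mind for the fallback - a rough, untested sketch using the Hadoop FileSystem API instead of shelling out to "hadoop fs -mv" (the day directory path is made up, and I prefix the hour onto each file name to avoid collisions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlattenHourDirs {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical day directory; the real prefix is elided above.
            Path day = new Path("/somedirectory/2015/05/10");

            // For each hour subdirectory (00..23), move its files up into the
            // day directory, prefixing the hour so names do not collide.
            for (FileStatus hour : fs.listStatus(day)) {
                if (!hour.isDirectory()) {
                    continue;
                }
                String hourName = hour.getPath().getName();
                for (FileStatus file : fs.listStatus(hour.getPath())) {
                    Path dest = new Path(day, hourName + "-" + file.getPath().getName());
                    if (!fs.rename(file.getPath(), dest)) {
                        System.err.println("Failed to move " + file.getPath());
                    }
                }
            }
        }
    }

I would rather avoid that extra step if the --regex/--replacement route can do it directly.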
Alex