preserve decimals/periods? #258

jzadra · 2018-12-18T23:20:16Z

Is there any way to preserve decimals/periods? On occasion I have column names from ACS data that have decimal numbers in them, and I would like to keep the periods as changing both the periods and the spaces to underscores produces some rather confusing names.

Tazinho · 2018-12-29T01:37:08Z

I imagine this issue relates to clean_names()!? There is currently no way to avoid this behaviour. However, you could switch to the snakecase pkg for more low level support.

Other possibility: @sfirke what do u think? Should we add the sep_in argument to the clean_names() family of functions? We would need to leave one gsub() in the preprocessing pipeline for the "`" removement and we should be careful when removing make.names() from the pipeline, but this would allow to let special characters through and handle them via sep_in from snakecase). More general: Should we inherit more arguments from snakecase? From my perspective we could try to give make_clean_names() as many of the same parameters as to_any_case() has as possible.

jzadra · 2018-12-29T23:00:01Z

Yes, sorry this is for clean_names()

sfirke · 2020-03-07T05:20:35Z

@Tazinho sorry to have let this sit, I just commented on #283 that it would be nice to support abbrevations and we could do that one, and this one, by taking your suggestion to pass more arguments through to to_any_case. I'm leaning toward just adding a ... to the end of the clean_names arguments that can be for any other arguments the user wants to supply to to_any_case. Easily enough and then we don't have to duplicate your documentation and the function stays lean.

billdenney · 2020-03-08T03:51:15Z

I think that it makes sense to pass arguments on. I'll insert that in my next revisions to the clean_names rewrite.

sfirke · 2020-04-05T01:52:22Z

@Tazinho now that all of the snakecase::to_any_case options are exposed, do you see a way to use them from clean_names to preserve the . character? I tried a few things and couldn't keep them from being turned into underscores.

E.g., a column named "HasDecimal2.3".

I don't see anything else to be done on the janitor side but it would be great to resolve this issue by showing how if at all it can be done via the snakecase options, or noting that it can't.

Tazinho · 2020-04-05T12:10:25Z

Hi @sfirke this can be done via sep_in.

> make_clean_names(string, sep_in = "[^[:alnum:]|(?<!\\d)\\.]")
[1] "has_decimal2.3"

I'd like to explain 1) what it does and 2) why it is a bit more complicated then one might initially assume...

sep_in replaces all its matches with "_". The regular expression "[^[:alnum:]|(?<!\\d)\\.]" can be read as ("[") "everything that is not an alpha-numeric" ("^[:alnum:]") "except" ("|") "dots which are behind a numeric" ("(?<!\\d)\\.") ("]"). Therefore, snakecase will work exactly as it's default (sep_in = "[^[:alnum:]]") just letting decimals after periods stay within the string.
snakecase internally (i.e. when set sep_in = NULL) only uses "_" and " " as separators. So in exactly the use case from this issue "(?<!\\d)\\." would already work. However, then we would loose the default "[^[:alnum:]]" which is what is typically wanted.

billdenney · 2020-04-05T12:20:59Z

I'd hesitate to use this solution because I think (without testing) that it will break some more common cases. My guess is that a better solution would be to work on this through the replacement path and have a default of periods replaced with underscores. The issue with that is that punctuation is replaced by regexp, and periods are conceptually different than other punctuation, but it would be hard to separate them out.

Tazinho · 2020-04-05T14:40:21Z

The replacement path would also involve regular expressions. So I don't see a big difference.

As the "replacement path" and the sep_in argument overlap a bit in the current implementation one would need an additional use of sep_in, e.g. sep_in = NULL to suppress the default in make_clean_names (sep_in = "\\.") but it's also possible:

> make_clean_names(string, replace = c("[^[:alnum:]|(?<!\\d)\\.]" = "."), sep_in = NULL)
[1] "has_decimal2.3abc"

It's certainly an option to move the "\." replacement as a default into the replace argument, but I don't think that this is an optimal path as it is in general a bit unfortunate to have this overlaping arguments. (I am not sure if I should integrate at some point a replacement mechanism into snakecase, just to make all these options more convenient working together and allow a more compact inferface for janitor).

billdenney · 2020-04-05T14:51:10Z

Maybe I misunderstood. I would not want to add a new user-facing argument that is a regular expressions as that tends to add complexity for many users, and not everyone understands regular expressions.

I'd say that given tight timelines for CRAN and a better solution may be available by integrating something into snakecase, let's punt this a bit longer.

sfirke · 2020-04-05T17:47:44Z

I should have been clearer, I think we can leave janitor::clean_names as is -- I just wanted to show if/how this can be done by the determined user facing this situation. For @jzadra but also anyone who ends up here via Google. I think @Tazinho's examples do that perfectly, thank you! So I will close this now as solved.

sfirke added this to the v2.0 milestone Mar 7, 2020

sfirke closed this as completed Apr 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

preserve decimals/periods? #258

preserve decimals/periods? #258

jzadra commented Dec 18, 2018

Tazinho commented Dec 29, 2018 •

edited

Loading

jzadra commented Dec 29, 2018

sfirke commented Mar 7, 2020

billdenney commented Mar 8, 2020

sfirke commented Apr 5, 2020

Tazinho commented Apr 5, 2020 •

edited

Loading

billdenney commented Apr 5, 2020 via email

Tazinho commented Apr 5, 2020

billdenney commented Apr 5, 2020

sfirke commented Apr 5, 2020

preserve decimals/periods? #258

preserve decimals/periods? #258

Comments

jzadra commented Dec 18, 2018

Tazinho commented Dec 29, 2018 • edited Loading

jzadra commented Dec 29, 2018

sfirke commented Mar 7, 2020

billdenney commented Mar 8, 2020

sfirke commented Apr 5, 2020

Tazinho commented Apr 5, 2020 • edited Loading

billdenney commented Apr 5, 2020 via email

Tazinho commented Apr 5, 2020

billdenney commented Apr 5, 2020

sfirke commented Apr 5, 2020

Tazinho commented Dec 29, 2018 •

edited

Loading

Tazinho commented Apr 5, 2020 •

edited

Loading