Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preserve decimals/periods? #258

Closed
jzadra opened this issue Dec 18, 2018 · 10 comments
Closed

preserve decimals/periods? #258

jzadra opened this issue Dec 18, 2018 · 10 comments
Milestone

Comments

@jzadra
Copy link
Contributor

jzadra commented Dec 18, 2018

Is there any way to preserve decimals/periods? On occasion I have column names from ACS data that have decimal numbers in them, and I would like to keep the periods as changing both the periods and the spaces to underscores produces some rather confusing names.

@Tazinho
Copy link
Contributor

Tazinho commented Dec 29, 2018

I imagine this issue relates to clean_names()!? There is currently no way to avoid this behaviour. However, you could switch to the snakecase pkg for more low level support.

Other possibility: @sfirke what do u think? Should we add the sep_in argument to the clean_names() family of functions? We would need to leave one gsub() in the preprocessing pipeline for the "`" removement and we should be careful when removing make.names() from the pipeline, but this would allow to let special characters through and handle them via sep_in from snakecase). More general: Should we inherit more arguments from snakecase? From my perspective we could try to give make_clean_names() as many of the same parameters as to_any_case() has as possible.

@jzadra
Copy link
Contributor Author

jzadra commented Dec 29, 2018

Yes, sorry this is for clean_names()

@sfirke
Copy link
Owner

sfirke commented Mar 7, 2020

@Tazinho sorry to have let this sit, I just commented on #283 that it would be nice to support abbrevations and we could do that one, and this one, by taking your suggestion to pass more arguments through to to_any_case. I'm leaning toward just adding a ... to the end of the clean_names arguments that can be for any other arguments the user wants to supply to to_any_case. Easily enough and then we don't have to duplicate your documentation and the function stays lean.

@sfirke sfirke added this to the v2.0 milestone Mar 7, 2020
@billdenney
Copy link
Collaborator

I think that it makes sense to pass arguments on. I'll insert that in my next revisions to the clean_names rewrite.

@sfirke
Copy link
Owner

sfirke commented Apr 5, 2020

@Tazinho now that all of the snakecase::to_any_case options are exposed, do you see a way to use them from clean_names to preserve the . character? I tried a few things and couldn't keep them from being turned into underscores.

E.g., a column named "HasDecimal2.3".

I don't see anything else to be done on the janitor side but it would be great to resolve this issue by showing how if at all it can be done via the snakecase options, or noting that it can't.

@Tazinho
Copy link
Contributor

Tazinho commented Apr 5, 2020

Hi @sfirke this can be done via sep_in.

> make_clean_names(string, sep_in = "[^[:alnum:]|(?<!\\d)\\.]")
[1] "has_decimal2.3"

I'd like to explain 1) what it does and 2) why it is a bit more complicated then one might initially assume...

  1. sep_in replaces all its matches with "_". The regular expression "[^[:alnum:]|(?<!\\d)\\.]" can be read as ("[") "everything that is not an alpha-numeric" ("^[:alnum:]") "except" ("|") "dots which are behind a numeric" ("(?<!\\d)\\.") ("]"). Therefore, snakecase will work exactly as it's default (sep_in = "[^[:alnum:]]") just letting decimals after periods stay within the string.
  2. snakecase internally (i.e. when set sep_in = NULL) only uses "_" and " " as separators. So in exactly the use case from this issue "(?<!\\d)\\." would already work. However, then we would loose the default "[^[:alnum:]]" which is what is typically wanted.

@billdenney
Copy link
Collaborator

billdenney commented Apr 5, 2020 via email

@Tazinho
Copy link
Contributor

Tazinho commented Apr 5, 2020

The replacement path would also involve regular expressions. So I don't see a big difference.

As the "replacement path" and the sep_in argument overlap a bit in the current implementation one would need an additional use of sep_in, e.g. sep_in = NULL to suppress the default in make_clean_names (sep_in = "\\.") but it's also possible:

> make_clean_names(string, replace = c("[^[:alnum:]|(?<!\\d)\\.]" = "."), sep_in = NULL)
[1] "has_decimal2.3abc"

It's certainly an option to move the "\." replacement as a default into the replace argument, but I don't think that this is an optimal path as it is in general a bit unfortunate to have this overlaping arguments. (I am not sure if I should integrate at some point a replacement mechanism into snakecase, just to make all these options more convenient working together and allow a more compact inferface for janitor).

@billdenney
Copy link
Collaborator

Maybe I misunderstood. I would not want to add a new user-facing argument that is a regular expressions as that tends to add complexity for many users, and not everyone understands regular expressions.

I'd say that given tight timelines for CRAN and a better solution may be available by integrating something into snakecase, let's punt this a bit longer.

@sfirke
Copy link
Owner

sfirke commented Apr 5, 2020

I should have been clearer, I think we can leave janitor::clean_names as is -- I just wanted to show if/how this can be done by the determined user facing this situation. For @jzadra but also anyone who ends up here via Google. I think @Tazinho's examples do that perfectly, thank you! So I will close this now as solved.

@sfirke sfirke closed this as completed Apr 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants