clean_names inserts unwanted underscore "_" in between NON-ASCII characters #268
Comments
I looked into https://github.com/sfirke/janitor/blob/master/R/make_clean_names.R and it seems the transliterations argument is hardcoded as:
new_names <- old_names %>%
  gsub("'", "", .) %>% # remove single quotation marks
  gsub("\"", "", .) %>% # remove double quotation marks
  gsub("%", ".percent_", .) %>% # starting with "." as a workaround, to make
  # ".percent" a valid name. The "." will be replaced in the call to to_any_case
  # via the preprocess argument anyway.
  gsub("#", ".number_", .) %>%
  gsub("^[[:space:][:punct:]]+", "", .) %>% # remove leading spaces & punctuation
  make.names(.) %>%
  # Handle dots, multiple underscores, case conversion, string transliteration
  # Parsing option 4 removes underscores around numbers, #153
  snakecase::to_any_case(.,
    case = case, sep_in = "\\.",
    transliterations = c("Latin-ASCII"), parsing_option = 1,
    numerals = "asis"
  )
# Handle duplicated names - they mess up dplyr pipelines
# This appends the column number to repeated instances of duplicate variable names
dupe_count <- vapply(seq_along(new_names), function(i) {
  sum(new_names[i] == new_names[1:i])
}, integer(1))
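For anyone following along, the duplicate-counting step at the end of that snippet can be tried in isolation with base R. This is only a sketch: the example names are made up, and the final suffixing step is an assumption for illustration, since the quoted code stops right after dupe_count is computed.

```r
# Made-up names containing repeats (not taken from the issue).
new_names <- c("id", "value", "id", "id", "value")

# Same counting logic as the quoted snippet: for each position i, count how
# many times new_names[i] has occurred in positions 1..i.
dupe_count <- vapply(seq_along(new_names), function(i) {
  sum(new_names[i] == new_names[seq_len(i)])
}, integer(1))

# Assumed follow-up step (not shown in the quoted code): append the
# occurrence count to every repeat after the first.
deduped <- new_names
is_repeat <- dupe_count > 1
deduped[is_repeat] <- paste(new_names[is_repeat], dupe_count[is_repeat], sep = "_")

print(dupe_count)
print(deduped)
```

Note that the number appended here is the occurrence count of the name, not its column position.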
Thanks for reporting! I agree that underscores should not be inserted as in your example. I'm not sure the best way to fix this, but welcome ideas. I think we put |
It looks like setting snakecase::to_any_case("介護_看護_女",
case = "snake", sep_in = "\\.",
transliterations = "Latin-ASCII", parsing_option = 0,
numerals = "asis"
) |
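To compare the two parsing_option values that appear in this thread (1 in make_clean_names, 0 in the call above), a small wrapper like the one below may help. It is a rough sketch: compare_parsing_options is a hypothetical helper, not part of janitor or snakecase, and no output is shown because the result may depend on the installed snakecase version.

```r
# Hypothetical helper (not part of janitor or snakecase): run the same
# to_any_case() call once per parsing_option value and collect the results.
compare_parsing_options <- function(x, options = c(0, 1)) {
  if (!requireNamespace("snakecase", quietly = TRUE)) {
    stop("The snakecase package is required for this sketch.")
  }
  results <- lapply(options, function(opt) {
    snakecase::to_any_case(x,
      case = "snake", sep_in = "\\.",
      transliterations = "Latin-ASCII", parsing_option = opt,
      numerals = "asis"
    )
  })
  names(results) <- paste0("parsing_option_", options)
  results
}

# Only run the comparison when snakecase is installed.
if (requireNamespace("snakecase", quietly = TRUE)) {
  print(compare_parsing_options("介護_看護_女"))
}
```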
What would a fix for this look like? Bill, I agree that I thought about creating some interface to the parsing_option argument like
@Tazinho, do you have advice or preference, as to how we help users avoid this behavior? |
@sfirke I think the |
Hrm does #340 close this too? Or if it doesn't or can't, and we determine we can't address this issue, we could close this. Of course if this one could be addressed and we just are unable to do it right now, it can stay open. @billdenney I'll stop tagging you in stuff now 😆 |
In #340, I'm working to expose all of the options from It's difficult to test this in detail because |
That sounds great. |
Bug reports
Version
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
janitor 1.1.1
Description
When you have non-ASCII column names, clean_names inserts unwanted underscores between the non-ASCII characters. The example below shows how the unwanted underscores are inserted.
Created on 2019-02-08 by the reprex package (v0.2.1)
For this example, clean_names should not put "_" between characters, as in "介_護_看_護_女", but should return "介護_看護_女".