getDrugIngredientCodes and non UTF-8 characters #233

tleht · 2024-12-12T12:12:58Z

Describe the bug
Calling getDrugIngredientCodes with the "name" argument specified fails with the following error:

Error in `dplyr::filter()`:
ℹ In argument: `tidyWords(.data$concept_name) %in% tidyWords(.env$name)`.
Caused by error in `sub()`:
! input string 32262 is invalid UTF-8

The error comes from these lines:
    ingredientConcepts <- cdm$concept %>%
        dplyr::filter(.data$standard_concept == "S") %>%
        dplyr::filter(.data$concept_class_id == "Ingredient") %>%
        dplyr::select("concept_id", "concept_name", "concept_code") %>%
        dplyr::collect()
    if (!is.null(name)) {
        ingredientConcepts <- ingredientConcepts %>%
            dplyr::filter(tidyWords(.data$concept_name) %in% tidyWords(.env$name))
    }

The error is caused by the standard RxNorm Extension drug ingredient concept 1253507, "[ ¹⁸ F]AlF-NOTA-FAPI-04", which is present in our concept table.
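For anyone who wants to check whether their own vocabulary contains similar names, here is a minimal base R sketch that flags concept names containing non-ASCII bytes. The helper `hasNonAscii` and the toy data frame are hypothetical and not part of the package. Only concept 1253507 comes from this report; the second row uses a placeholder id.

```r
# Hypothetical helper: TRUE for every string that contains at least one
# byte outside the 7-bit ASCII range.
hasNonAscii <- function(x) {
  vapply(
    x,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Toy stand-in for the collected ingredient concepts. The superscripts are
# written as \u escapes so the example does not depend on file encoding.
ingredientConcepts <- data.frame(
  concept_id = c(1253507L, 1L),  # 1L is a placeholder, not a real concept id
  concept_name = c("[ \u00b9\u2078 F]AlF-NOTA-FAPI-04", "Adalimumab"),
  stringsAsFactors = FALSE
)

# Rows whose name contains non-ASCII characters.
flagged <- ingredientConcepts[hasNonAscii(ingredientConcepts$concept_name), ]
flagged$concept_id
```

On a real CDM the same helper could be applied to the `concept_name` column after `dplyr::collect()`. Not every flagged name necessarily triggers the error, so this only narrows down the candidates.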

To Reproduce
getDrugIngredientCodes(cdm = cdm, name = "Adalimumab")


tleht · 2024-12-12T12:49:47Z

To be more exact, this seems to be an issue with the helper function tidyWords:

> tidyWords("[ ¹⁸ F]AlF-NOTA-FAPI-04")
Error in sub(re, "", x, perl = TRUE) : input string 1 is invalid UTF-8
In addition: Warning message:
In sub(re, "", x, perl = TRUE) :
  unable to translate '[ Â¹â<81>¸ F]AlF-NOTA-FAPI-04' to UTF-8

More specifically, the problem seems to come from the following lines:

    Encoding(words) <- "latin1"
    
    # some generic formatting
    workingWords <- trimws(words)
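A minimal base R sketch of why these lines may be problematic for names that are already UTF-8: assigning to `Encoding()` does not convert any bytes, it only changes how R labels them, so the multi-byte superscripts end up being read as several latin1 characters. The variable names below are illustrative, and the `enc2utf8()` alternative is an assumption, not a confirmed fix for tidyWords.

```r
# Ingredient name from the report, written with \u escapes so the example
# does not depend on the encoding of the source file.
x <- "[ \u00b9\u2078 F]AlF-NOTA-FAPI-04"

Encoding(x)   # declared encoding of the literal
validUTF8(x)  # whether the bytes form valid UTF-8

# Re-declaring the encoding changes only the label, not the bytes.
y <- x
Encoding(y) <- "latin1"
sameBytes <- identical(charToRaw(x), charToRaw(y))

# Assumption: one possible alternative is to convert to UTF-8 instead of
# relabelling. enc2utf8() leaves strings already marked as UTF-8 unchanged.
z <- enc2utf8(x)
unchanged <- identical(charToRaw(z), charToRaw(x))
trimws(z)
```

Whether converting instead of relabelling is appropriate depends on why latin1 was declared in the first place, which is not stated in this thread.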
