From 5d88bf2d77d330c5fce505241f04c637d11d18b0 Mon Sep 17 00:00:00 2001 From: "Win Cowger, PhD" Date: Sat, 25 Nov 2023 05:06:24 -0800 Subject: [PATCH] Streamlining identification (#157) * make it easier to do matches, this way many will be able to bypass the preprocessing. * updates to sig_noise make it more flexible * add attributes * new CRAN submission --------- Co-authored-by: Zacharias Steinmetz --- CRAN-SUBMISSION | 6 +- DESCRIPTION | 4 +- NEWS.md | 9 ++ R/as_OpenSpecy.R | 144 +++++++++++++++++------------ R/def_features.R | 2 +- R/match_spec.R | 39 ++++++-- R/sig_noise.R | 63 +++++++++---- README.md | 2 +- cran-comments.md | 2 +- man/as_OpenSpecy.Rd | 127 +++++++++++++------------ man/match_spec.Rd | 8 +- man/sig_noise.Rd | 36 +++++++- tests/testthat/test-def_features.R | 12 +-- tests/testthat/test-match_spec.R | 98 ++++++++++++++++++-- tests/testthat/test-sig_noise.R | 13 +++ tests/testthat/test-workflows.R | 23 ++++- vignettes/app.Rmd | 37 ++++---- vignettes/sop.Rmd | 81 ++++++++-------- 18 files changed, 480 insertions(+), 226 deletions(-) diff --git a/CRAN-SUBMISSION b/CRAN-SUBMISSION index b0067385..645d2890 100644 --- a/CRAN-SUBMISSION +++ b/CRAN-SUBMISSION @@ -1,3 +1,3 @@ -Version: 1.0.5 -Date: 2023-10-31 10:27:37 UTC -SHA: 419b04607656039958a19393f6218f3ca61b817d +Version: 1.0.6 +Date: 2023-11-25 12:56:02 UTC +SHA: 11f89935f939a7a7430eceaba1fda06445134587 diff --git a/DESCRIPTION b/DESCRIPTION index e06d43cc..1a890c9b 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: OpenSpecy Type: Package Title: Analyze, Process, Identify, and Share Raman and (FT)IR Spectra -Version: 1.0.5 -Date: 2023-10-31 +Version: 1.0.6 +Date: 2023-11-25 Authors@R: c(person("Win", "Cowger", role = c("cre", "aut", "dtc"), email = "wincowger@gmail.com", comment = c(ORCID = "0000-0001-9226-3104")), diff --git a/NEWS.md b/NEWS.md index 4e82221f..790bb24c 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,12 @@ +# OpenSpecy 1.0.6 + +## Minor Improvements + +- Add 
attributes to `OpenSpecy` objects +- More flexible `sig_noise()` +- Simpler matching + + # OpenSpecy 1.0.5 ## Minor Improvements diff --git a/R/as_OpenSpecy.R b/R/as_OpenSpecy.R index d4b3557a..8b0b51e0 100644 --- a/R/as_OpenSpecy.R +++ b/R/as_OpenSpecy.R @@ -13,6 +13,8 @@ #' per spectrum. #' @param metadata metadata for each spectrum with one row per spectrum, #' see details. +#' @param attributes a list of attributes describing critical aspects for interpreting the spectra. +#' see details. #' @param coords spatial coordinates for the spectra. #' @param session_id logical. Whether to add a session ID to the metadata. #' The session ID is based on current session info so metadata of the same @@ -32,67 +34,75 @@ #' provides or is harvested from the files themselves. #' #' The \code{metadata} argument may contain a named list with the following -#' details (\code{*} = minimum recommended): +#' details (\code{*} = minimum recommended). +#' +#' \describe{ +#' \item{`file_name*`}{The file name, defaults to +#' \code{\link[base]{basename}()} if not specified} +#' \item{`user_name*`}{User name, e.g. "Win Cowger"} +#' \item{`contact_info`}{Contact information, e.g. "1-513-673-8956, +#' wincowger@@gmail.com"} +#' \item{`organization`}{Affiliation, e.g. "University of California, +#' Riverside"} +#' \item{`citation`}{Data citation, e.g. "Primpke, S., Wirth, M., Lorenz, C., +#' & Gerdts, G. (2018). Reference database design for the automated analysis +#' of microplastic samples based on Fourier transform infrared (FTIR) +#' spectroscopy. \emph{Analytical and Bioanalytical Chemistry}. +#' \doi{10.1007/s00216-018-1156-x}"} +#' \item{`spectrum_type*`}{Raman or FTIR} +#' \item{`spectrum_identity*`}{Material/polymer analyzed, e.g. +#' "Polystyrene"} +#' \item{`material_form`}{Form of the material analyzed, e.g. 
textile fiber, +#' rubber band, sphere, granule } +#' \item{`material_phase`}{Phase of the material analyzed (liquid, gas, solid) } +#' \item{`material_producer`}{Producer of the material analyzed, e.g. Dow } +#' \item{`material_purity`}{Purity of the material analyzed, e.g. 99.98%} +#' \item{`material_quality`}{Quality of the material analyzed, e.g. +#' consumer product, manufacturer material, analytical standard, +#' environmental sample } +#' \item{`material_color`}{Color of the material analyzed, +#' e.g. blue, #0000ff, (0, 0, 255) } +#' \item{material_other}{Other material description, e.g. 5 µm diameter +#' fibers, 1 mm spherical particles } +#' \item{`cas_number`}{CAS number, e.g. 9003-53-6 } +#' \item{`instrument_used`}{Instrument used, e.g. Horiba LabRam } +#' \item{instrument_accessories}{Instrument accessories, e.g. +#' Focal Plane Array, CCD} +#' \item{`instrument_mode`}{Instrument modes/settings, e.g. +#' transmission, reflectance } +#' \item{`intensity_units*`}{Units of the intensity values for the spectrum, +#' options transmittance, reflectance, absorbance } +#' \item{`spectral_resolution`}{Spectral resolution, e.g. 4/cm } +#' \item{`laser_light_used`}{Wavelength of the laser/light used, e.g. +#' 785 nm } +#' \item{`number_of_accumulations`}{Number of accumulations, e.g 5 } +#' \item{`total_acquisition_time_s`}{Total acquisition time (s), e.g. 10 s} +#' \item{`data_processing_procedure`}{Data processing procedure, +#' e.g. spikefilter, baseline correction, none } +#' \item{`level_of_confidence_in_identification`}{Level of confidence in +#' identification, e.g. 99% } +#' \item{`other_info`}{Other information } +#' \item{`license`}{The license of the shared spectrum; defaults to +#' \code{"CC BY-NC"} (see \url{https://creativecommons.org/licenses/by-nc/4.0/} +#' for details). 
Any other creative commons license is allowed, for example, +#' CC0 or CC BY} +#' \item{`session_id`}{A unique user and session identifier; populated +#' automatically with \code{paste(digest(Sys.info()), digest(sessionInfo()), +#' sep = "/")}} +#' \item{`file_id`}{A unique file identifier; populated automatically +#' with \code{digest(object[c("wavenumber", "spectra")])}} +#' } +#' +#' The \code{attributes} argument may contain a named list with the following +#' details, when set, they will be used to automate transformations and warning messages: #' -#' \tabular{ll}{ -#' \code{file_name*}: \tab The file name, defaults to -#' \code{\link[base]{basename}()} if not specified\cr -#' \code{user_name*}: \tab User name, e.g. "Win Cowger"\cr -#' \code{contact_info}: \tab Contact information, e.g. "1-513-673-8956, -#' wincowger@@gmail.com"\cr -#' \code{organization}: \tab Affiliation, e.g. "University of California, -#' Riverside"\cr -#' \code{citation}: \tab Data citation, e.g. "Primpke, S., Wirth, M., Lorenz, -#' C., & Gerdts, G. (2018). Reference database design for the automated analysis -#' of microplastic samples based on Fourier transform infrared (FTIR) -#' spectroscopy. \emph{Analytical and Bioanalytical Chemistry}. -#' \doi{10.1007/s00216-018-1156-x}"\cr -#' \code{spectrum_type*}: \tab Raman or FTIR\cr -#' \code{spectrum_identity*}: \tab Material/polymer analyzed, e.g. -#' "Polystyrene"\cr -#' \code{material_form}: \tab Form of the material analyzed, e.g. textile fiber, -#' rubber band, sphere, granule \cr -#' \code{material_phase}: \tab Phase of the material analyzed (liquid, gas, -#' solid) \cr -#' \code{material_producer}: \tab Producer of the material analyzed, -#' e.g. Dow \cr -#' \code{material_purity}: \tab Purity of the material analyzed, e.g. 99.98% -#' \cr -#' \code{material_quality}: \tab Quality of the material analyzed, e.g. 
-#' consumer product, manufacturer material, analytical standard, -#' environmental sample \cr -#' \code{material_color}: \tab Color of the material analyzed, -#' e.g. blue, #0000ff, (0, 0, 255) \cr -#' \code{material_other}: \tab Other material description, e.g. 5 µm diameter -#' fibers, 1 mm spherical particles \cr -#' \code{cas_number}: \tab CAS number, e.g. 9003-53-6 \cr -#' \code{instrument_used}: \tab Instrument used, e.g. Horiba LabRam \cr -#' \code{instrument_accessories}: \tab Instrument accessories, e.g. -#' Focal Plane Array, CCD\cr -#' \code{instrument_mode}: \tab Instrument modes/settings, e.g. -#' transmission, reflectance \cr -#' \code{intensity_units*}: \tab Units of the intensity values for the spectrum, -#' options transmittance, reflectance, absorbance \cr -#' \code{spectral_resolution}: \tab Spectral resolution, e.g. 4/cm \cr -#' \code{laser_light_used}: \tab Wavelength of the laser/light used, e.g. -#' 785 nm \cr -#' \code{number_of_accumulations}: \tab Number of accumulations, e.g 5 \cr -#' \code{total_acquisition_time_s}: \tab Total acquisition time (s), e.g. 10 s -#' \cr -#' \code{data_processing_procedure}: \tab Data processing procedure, -#' e.g. spikefilter, baseline correction, none \cr -#' \code{level_of_confidence_in_identification}: \tab Level of confidence in -#' identification, e.g. 99% \cr -#' \code{other_info}: \tab Other information \cr -#' \code{license}: \tab The license of the shared spectrum; defaults to -#' \code{"CC BY-NC"} (see -#' \url{https://creativecommons.org/licenses/by-nc/4.0/} for details). 
Any other -#' creative commons license is allowed, for example, CC0 or CC BY \cr -#' \code{session_id}: \tab A unique user and session identifier; populated -#' automatically with \code{paste(digest(Sys.info()), digest(sessionInfo()), -#' sep = "/")}\cr -#' \code{file_id}: \tab A unique file identifier; populated automatically -#' with \code{digest(object[c("wavenumber", "spectra")])}\cr +#' \describe{ +#' \item{`intensity_unit`}{supported options include `"absorbance"`, +#' `"transmittance"`, or `"reflectance"`} +#' \item{`derivative_order`}{supported options include `"0"`, `"1"`, or +#' `"2"`} +#' \item{`baseline`}{supported options include `"raw"` or `"nobaseline"`} +#' \item{`spectra_type`}{supported options include `"ftir"` or `"raman"`} #' } #' #' @return @@ -250,6 +260,12 @@ as_OpenSpecy.default <- function(x, spectra, level_of_confidence_in_identification = NULL, other_info = NULL, license = "CC BY-NC"), + attributes = list( + intensity_unit = NULL, + derivative_order = NULL, + baseline = NULL, + spectra_type = NULL + ), coords = "gen_grid", session_id = FALSE, ...) { @@ -266,7 +282,13 @@ as_OpenSpecy.default <- function(x, spectra, if (length(x) != nrow(spectra)) stop("'x' and 'spectra' must be of equal length", call. = F) - obj <- structure(list(), class = c("OpenSpecy", "list")) + obj <- structure(list(), + class = c("OpenSpecy", "list"), + intensity_unit = attributes$intensity_unit, + derivative_order = attributes$derivative_order, + baseline = attributes$baseline, + spectra_type = attributes$spectra_type + ) obj$wavenumber <- x[order(x)] diff --git a/R/def_features.R b/R/def_features.R index 0b167299..52c9a499 100644 --- a/R/def_features.R +++ b/R/def_features.R @@ -119,7 +119,7 @@ def_features.OpenSpecy <- function(x, features, ...) 
{ #' @importFrom stats dist .def_features <- function(x, binary, name = NULL) { # Label connected components in the binary image - binary_matrix <- matrix(binary, ncol = max(x$metadata$y) + 1, byrow = T) + binary_matrix <- matrix(binary, ncol = max(x$metadata$x) + 1, byrow = T) labeled_image <- imager::label(imager::as.cimg(binary_matrix), high_connectivity = T) diff --git a/R/match_spec.R b/R/match_spec.R index 1a3467c5..0d473786 100644 --- a/R/match_spec.R +++ b/R/match_spec.R @@ -14,6 +14,8 @@ #' \code{filter_spec()} filters an Open Specy object. #' #' @param x an \code{OpenSpecy} object, typically with unknowns. +#' @param conform Whether to conform the spectra to the library wavenumbers or not. +#' @param type the type of conformation to make returned by \code{conform_spec()} #' @param library an \code{OpenSpecy} or \code{glmnet} object representing the #' reference library of spectra or model to use in identification. #' @param na.rm logical; indicating whether missing values should be removed @@ -93,11 +95,30 @@ cor_spec.default <- function(x, ...) { #' @rdname match_spec #' #' @export -cor_spec.OpenSpecy <- function(x, library, na.rm = T, ...) { +cor_spec.OpenSpecy <- function(x, library, na.rm = T, conform = F, + type = "roll", ...) 
{ + if(conform) x <- conform_spec(x, library$wavenumber, res = NULL, type) + + if(!is.null(attr(x, "intensity_unit")) && !is.null(attr(library, "intensity_unit")) && + attr(x, "intensity_unit") != attr(library, "intensity_unit")) + warning("Intensity units between the library and unknown are not the same") + + if(!is.null(attr(x, "derivative_order")) && !is.null(attr(library, "derivative_order")) && + attr(x, "derivative_order") != attr(library, "derivative_order")) + warning("Derivative orders between the library and unknown are not the same") + + if(!is.null(attr(x, "baseline")) && !is.null(attr(library, "baseline")) && + attr(x, "baseline") != attr(library, "baseline")) + warning("Baselines between the library and unknown are not the same") + + if(!is.null(attr(x, "spectra_type")) && !is.null(attr(library, "spectra_type")) && + attr(x, "spectra_type") != attr(library, "spectra_type")) + warning("Spectra types between the library and unknown are not the same") + if(sum(x$wavenumber %in% library$wavenumber) < 3) stop("there are less than 3 matching wavenumbers in the objects you are ", - "trying to correlate; this won't work for correlation analysis. ", - "Consider first conforming the spectra to the same wavenumbers.", + "trying to correlate; this won't work for correlation analysis; ", + "consider first conforming the spectra to the same wavenumbers", call. = F) if(!all(x$wavenumber %in% library$wavenumber)) @@ -134,11 +155,12 @@ match_spec.default <- function(x, ...) { #' @rdname match_spec #' #' @export -match_spec.OpenSpecy <- function(x, library, na.rm = T, top_n = NULL, - order = NULL, add_library_metadata = NULL, +match_spec.OpenSpecy <- function(x, library, na.rm = T, conform = F, + type = "roll", top_n = NULL, order = NULL, + add_library_metadata = NULL, add_object_metadata = NULL, fill = NULL, ...) 
{ if(is_OpenSpecy(library)) { - res <- cor_spec(x, library = library) |> + res <- cor_spec(x, library = library, conform = conform, type = type) |> ident_spec(x, library = library, top_n = top_n, add_library_metadata = add_library_metadata, add_object_metadata = add_object_metadata) @@ -259,6 +281,11 @@ filter_spec.OpenSpecy <- function(x, logic, ...) { x$spectra <- x$spectra[, logic, with = F] x$metadata <- x$metadata[logic,] + if(ncol(x$spectra) == 0 | ncol(x$metadata) == 0) + stop("the OpenSpecy object created contains zero spectra, this is not well ", + "supported, if you have specific scenarios where this is required ", + "please share it with the developers and we can make a workaround") + return(x) } diff --git a/R/sig_noise.R b/R/sig_noise.R index ecef02c8..ec36f961 100644 --- a/R/sig_noise.R +++ b/R/sig_noise.R @@ -7,13 +7,19 @@ #' #' @param x an \code{OpenSpecy} object. #' @param metric character; specifying the desired metric to calculate. +#' @param step numeric; the step size of the region to look for the run_sig_over_noise option. +#' @param sig_min numeric; the minimum wavenumber value for the signal region. +#' @param sig_max numeric; the maximum wavenumber value for the signal region. +#' @param noise_min numeric; the minimum wavenumber value for the noise region. +#' @param noise_max numeric; the maximum wavenumber value for the noise region. 
+#' @param abs logical; whether to return the absolute value of the result #' Options include \code{"sig"} (mean intensity), \code{"noise"} (standard #' deviation of intensity), \code{"sig_times_noise"} (absolute value of #' signal times noise), \code{"sig_over_noise"} (absolute value of signal / #' noise), \code{"run_sig_over_noise"} (absolute value of signal / -#' noise where signal is estimated as the max intensity and noise is -#' estimated as the height of a low intensity region.), -#' \code{"log_tot_sig"} (sum of the inverse log intensities, useful for spectra in log units), +#' noise where signal is estimated as the max intensity and noise is +#' estimated as the height of a low intensity region.), +#' \code{"log_tot_sig"} (sum of the inverse log intensities, useful for spectra in log units), #' or \code{"tot_sig"} (sum of intensities). #' @param na.rm logical; indicating whether missing values should be removed #' when calculating signal and noise. Default is \code{TRUE}. @@ -23,6 +29,7 @@ #' A numeric vector containing the calculated metric for each spectrum in the #' \code{OpenSpecy} object. #' +#' @seealso [restrict_range()] #' @examples #' data("raman_hdpe") #' @@ -49,32 +56,50 @@ sig_noise.default <- function(x, ...) { #' #' @export sig_noise.OpenSpecy <- function(x, metric = "run_sig_over_noise", - na.rm = TRUE, ...) { - vapply(x$spectra, function(y) { - if(length(y[!is.na(y)]) < 20) { - warning("Need at least 20 intensity values to calculate the signal or ", - "noise values accurately; returning NA", call. = F) - return(NA) - } + na.rm = TRUE, step = 20, + sig_min = NULL, sig_max = NULL, + noise_min = NULL, noise_max = NULL, abs = T, ...) 
{ + values <- vapply(x$spectra, function(y) { if(metric == "run_sig_over_noise") { - max <- frollapply(y[!is.na(y)], 20, max) - max[(length(max) - 19):length(max)] <- NA - signal <- max(max, na.rm = T)#/mean(x, na.rm = T) + if(length(y[!is.na(y)]) < step) { + warning(paste0("Need at least ", step, " intensity values to calculate ", + "the signal or noise values accurately with ", + "run_sig_over_noise; returning NA"), call. = F) + return(NA) + } + max <- frollapply(y[!is.na(y)], step, max) + max[(length(max) - (step-1)):length(max)] <- NA + signal <- max(max, na.rm = T) noise <- median(max[max != 0], na.rm = T) - } - else { - signal = mean(y, na.rm = na.rm) - noise = sd(y, na.rm = na.rm) + } else { + if(!is.null(sig_min) & !is.null(sig_max)){ + sig_intens <- y[x$wavenumber >= sig_min & x$wavenumber <= sig_max] + } else { + sig_intens <- y + } + if(!is.null(noise_min) & !is.null(noise_max)){ + noise_intens <- y[x$wavenumber >= noise_min & x$wavenumber <= noise_max] + } else { + noise_intens <- y + } + signal <- mean(sig_intens, na.rm = na.rm) + noise <- sd(noise_intens, na.rm = na.rm) } if(metric == "sig") return(signal) if(metric == "noise") return(noise) - if(metric == "sig_times_noise") return(abs(signal * noise)) + if(metric == "sig_times_noise") return(signal * noise) if(metric %in% c("sig_over_noise", "run_sig_over_noise")) - return(abs(signal/noise)) + return(signal/noise) if(metric == "tot_sig") return(sum(y)) if(metric == "log_tot_sig") return(sum(exp(y))) }, FUN.VALUE = numeric(1)) + + if(abs) { + return(abs(values)) + } else { + return(values) + } } diff --git a/README.md b/README.md index 097f5e68..118a4bfd 100644 --- a/README.md +++ b/README.md @@ -103,5 +103,5 @@ Needs an Open Source Community: Open Specy to the Rescue!” [10.1021/acs.analchem.1c00123](https://doi.org/10.1021/acs.analchem.1c00123). Cowger W, Steinmetz Z, Leong N, Faltynkova A (2023). “OpenSpecy: Analyze, -Process, Identify, and Share Raman and (FT)IR Spectra.” *R package*, **1.0.5**. 
+Process, Identify, and Share Raman and (FT)IR Spectra.” *R package*, **1.0.6**. [https://github.com/wincowgerDEV/OpenSpecy-package](https://github.com/wincowgerDEV/OpenSpecy-package). diff --git a/cran-comments.md b/cran-comments.md index 80f5fa4a..43052686 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,6 +1,6 @@ ## Test environments -* manjaro linux 6.5.5-1 (local), R-4.3.1 +* manjaro linux 6.6.1-1 (local), R-4.3.2 * macOS latest (via GitHub Actions), R-release * ubuntu latest (via GitHub Actions), R-devel * ubuntu latest (via GitHub Actions), R-release diff --git a/man/as_OpenSpecy.Rd b/man/as_OpenSpecy.Rd index c5df4dc1..693bd25e 100644 --- a/man/as_OpenSpecy.Rd +++ b/man/as_OpenSpecy.Rd @@ -36,6 +36,8 @@ as_OpenSpecy(x, ...) total_acquisition_time_s = NULL, data_processing_procedure = NULL, level_of_confidence_in_identification = NULL, other_info = NULL, license = "CC BY-NC"), + attributes = list(intensity_unit = NULL, derivative_order = NULL, baseline = NULL, + spectra_type = NULL), coords = "gen_grid", session_id = FALSE, ... @@ -68,6 +70,9 @@ per spectrum.} \item{metadata}{metadata for each spectrum with one row per spectrum, see details.} +\item{attributes}{a list of attributes describing critical aspects for interpreting the spectra. +see details.} + \item{coords}{spatial coordinates for the spectra.} \item{n}{number of spectra to generate the spatial coordinate grid with.} @@ -96,67 +101,75 @@ the third item is another \code{data.table} with any metadata the user provides or is harvested from the files themselves. The \code{metadata} argument may contain a named list with the following -details (\code{*} = minimum recommended): - -\tabular{ll}{ -\code{file_name*}: \tab The file name, defaults to -\code{\link[base]{basename}()} if not specified\cr -\code{user_name*}: \tab User name, e.g. "Win Cowger"\cr -\code{contact_info}: \tab Contact information, e.g. "1-513-673-8956, -wincowger@gmail.com"\cr -\code{organization}: \tab Affiliation, e.g. 
"University of California, -Riverside"\cr -\code{citation}: \tab Data citation, e.g. "Primpke, S., Wirth, M., Lorenz, -C., & Gerdts, G. (2018). Reference database design for the automated analysis +details (\code{*} = minimum recommended). + +\describe{ +\item{\verb{file_name*}}{The file name, defaults to +\code{\link[base]{basename}()} if not specified} +\item{\verb{user_name*}}{User name, e.g. "Win Cowger"} +\item{\code{contact_info}}{Contact information, e.g. "1-513-673-8956, +wincowger@gmail.com"} +\item{\code{organization}}{Affiliation, e.g. "University of California, +Riverside"} +\item{\code{citation}}{Data citation, e.g. "Primpke, S., Wirth, M., Lorenz, C., +& Gerdts, G. (2018). Reference database design for the automated analysis of microplastic samples based on Fourier transform infrared (FTIR) spectroscopy. \emph{Analytical and Bioanalytical Chemistry}. -\doi{10.1007/s00216-018-1156-x}"\cr -\code{spectrum_type*}: \tab Raman or FTIR\cr -\code{spectrum_identity*}: \tab Material/polymer analyzed, e.g. -"Polystyrene"\cr -\code{material_form}: \tab Form of the material analyzed, e.g. textile fiber, -rubber band, sphere, granule \cr -\code{material_phase}: \tab Phase of the material analyzed (liquid, gas, -solid) \cr -\code{material_producer}: \tab Producer of the material analyzed, -e.g. Dow \cr -\code{material_purity}: \tab Purity of the material analyzed, e.g. 99.98\% -\cr -\code{material_quality}: \tab Quality of the material analyzed, e.g. +\doi{10.1007/s00216-018-1156-x}"} +\item{\verb{spectrum_type*}}{Raman or FTIR} +\item{\verb{spectrum_identity*}}{Material/polymer analyzed, e.g. +"Polystyrene"} +\item{\code{material_form}}{Form of the material analyzed, e.g. textile fiber, +rubber band, sphere, granule } +\item{\code{material_phase}}{Phase of the material analyzed (liquid, gas, solid) } +\item{\code{material_producer}}{Producer of the material analyzed, e.g. Dow } +\item{\code{material_purity}}{Purity of the material analyzed, e.g. 
99.98\%} +\item{\code{material_quality}}{Quality of the material analyzed, e.g. consumer product, manufacturer material, analytical standard, -environmental sample \cr -\code{material_color}: \tab Color of the material analyzed, -e.g. blue, #0000ff, (0, 0, 255) \cr -\code{material_other}: \tab Other material description, e.g. 5 µm diameter -fibers, 1 mm spherical particles \cr -\code{cas_number}: \tab CAS number, e.g. 9003-53-6 \cr -\code{instrument_used}: \tab Instrument used, e.g. Horiba LabRam \cr -\code{instrument_accessories}: \tab Instrument accessories, e.g. -Focal Plane Array, CCD\cr -\code{instrument_mode}: \tab Instrument modes/settings, e.g. -transmission, reflectance \cr -\code{intensity_units*}: \tab Units of the intensity values for the spectrum, -options transmittance, reflectance, absorbance \cr -\code{spectral_resolution}: \tab Spectral resolution, e.g. 4/cm \cr -\code{laser_light_used}: \tab Wavelength of the laser/light used, e.g. -785 nm \cr -\code{number_of_accumulations}: \tab Number of accumulations, e.g 5 \cr -\code{total_acquisition_time_s}: \tab Total acquisition time (s), e.g. 10 s -\cr -\code{data_processing_procedure}: \tab Data processing procedure, -e.g. spikefilter, baseline correction, none \cr -\code{level_of_confidence_in_identification}: \tab Level of confidence in -identification, e.g. 99\% \cr -\code{other_info}: \tab Other information \cr -\code{license}: \tab The license of the shared spectrum; defaults to -\code{"CC BY-NC"} (see -\url{https://creativecommons.org/licenses/by-nc/4.0/} for details). Any other -creative commons license is allowed, for example, CC0 or CC BY \cr -\code{session_id}: \tab A unique user and session identifier; populated +environmental sample } +\item{\code{material_color}}{Color of the material analyzed, +e.g. blue, #0000ff, (0, 0, 255) } +\item{material_other}{Other material description, e.g. 5 µm diameter +fibers, 1 mm spherical particles } +\item{\code{cas_number}}{CAS number, e.g. 
9003-53-6 } +\item{\code{instrument_used}}{Instrument used, e.g. Horiba LabRam } +\item{instrument_accessories}{Instrument accessories, e.g. +Focal Plane Array, CCD} +\item{\code{instrument_mode}}{Instrument modes/settings, e.g. +transmission, reflectance } +\item{\verb{intensity_units*}}{Units of the intensity values for the spectrum, +options transmittance, reflectance, absorbance } +\item{\code{spectral_resolution}}{Spectral resolution, e.g. 4/cm } +\item{\code{laser_light_used}}{Wavelength of the laser/light used, e.g. +785 nm } +\item{\code{number_of_accumulations}}{Number of accumulations, e.g 5 } +\item{\code{total_acquisition_time_s}}{Total acquisition time (s), e.g. 10 s} +\item{\code{data_processing_procedure}}{Data processing procedure, +e.g. spikefilter, baseline correction, none } +\item{\code{level_of_confidence_in_identification}}{Level of confidence in +identification, e.g. 99\% } +\item{\code{other_info}}{Other information } +\item{\code{license}}{The license of the shared spectrum; defaults to +\code{"CC BY-NC"} (see \url{https://creativecommons.org/licenses/by-nc/4.0/} +for details). 
Any other creative commons license is allowed, for example, +CC0 or CC BY} +\item{\code{session_id}}{A unique user and session identifier; populated automatically with \code{paste(digest(Sys.info()), digest(sessionInfo()), -sep = "/")}\cr -\code{file_id}: \tab A unique file identifier; populated automatically -with \code{digest(object[c("wavenumber", "spectra")])}\cr + sep = "/")}} +\item{\code{file_id}}{A unique file identifier; populated automatically +with \code{digest(object[c("wavenumber", "spectra")])}} +} + +The \code{attributes} argument may contain a named list with the following +details, when set, they will be used to automate transformations and warning messages: + +\describe{ +\item{\code{intensity_unit}}{supported options include \code{"absorbance"}, +\code{"transmittance"}, or \code{"reflectance"}} +\item{\code{derivative_order}}{supported options include \code{"0"}, \code{"1"}, or +\code{"2"}} +\item{\code{baseline}}{supported options include \code{"raw"} or \code{"nobaseline"}} +\item{\code{spectra_type}}{supported options include \code{"ftir"} or \code{"raman"}} } } \examples{ diff --git a/man/match_spec.Rd b/man/match_spec.Rd index 8cf79842..46c43442 100644 --- a/man/match_spec.Rd +++ b/man/match_spec.Rd @@ -24,7 +24,7 @@ cor_spec(x, ...) \method{cor_spec}{default}(x, ...) -\method{cor_spec}{OpenSpecy}(x, library, na.rm = T, ...) +\method{cor_spec}{OpenSpecy}(x, library, na.rm = T, conform = F, type = "roll", ...) match_spec(x, ...) @@ -34,6 +34,8 @@ match_spec(x, ...) x, library, na.rm = T, + conform = F, + type = "roll", top_n = NULL, order = NULL, add_library_metadata = NULL, @@ -81,6 +83,10 @@ reference library of spectra or model to use in identification.} \item{na.rm}{logical; indicating whether missing values should be removed when calculating correlations. 
Default is \code{TRUE}.} +\item{conform}{Whether to conform the spectra to the library wavenumbers or not.} + +\item{type}{the type of conformation to make returned by \code{conform_spec()}} + \item{top_n}{integer; specifying the number of top matches to return. If \code{NULL} (default), all matches will be returned.} diff --git a/man/sig_noise.Rd b/man/sig_noise.Rd index 99d0e588..306d34e5 100644 --- a/man/sig_noise.Rd +++ b/man/sig_noise.Rd @@ -10,12 +10,38 @@ sig_noise(x, ...) \method{sig_noise}{default}(x, ...) -\method{sig_noise}{OpenSpecy}(x, metric = "run_sig_over_noise", na.rm = TRUE, ...) +\method{sig_noise}{OpenSpecy}( + x, + metric = "run_sig_over_noise", + na.rm = TRUE, + step = 20, + sig_min = NULL, + sig_max = NULL, + noise_min = NULL, + noise_max = NULL, + abs = T, + ... +) } \arguments{ \item{x}{an \code{OpenSpecy} object.} -\item{metric}{character; specifying the desired metric to calculate. +\item{metric}{character; specifying the desired metric to calculate.} + +\item{na.rm}{logical; indicating whether missing values should be removed +when calculating signal and noise. 
Default is \code{TRUE}.} + +\item{step}{numeric; the step size of the region to look for the run_sig_over_noise option.} + +\item{sig_min}{numeric; the minimum wavenumber value for the signal region.} + +\item{sig_max}{numeric; the maximum wavenumber value for the signal region.} + +\item{noise_min}{numeric; the minimum wavenumber value for the noise region.} + +\item{noise_max}{numeric; the maximum wavenumber value for the noise region.} + +\item{abs}{logical; whether to return the absolute value of the result Options include \code{"sig"} (mean intensity), \code{"noise"} (standard deviation of intensity), \code{"sig_times_noise"} (absolute value of signal times noise), \code{"sig_over_noise"} (absolute value of signal / @@ -25,9 +51,6 @@ estimated as the height of a low intensity region.), \code{"log_tot_sig"} (sum of the inverse log intensities, useful for spectra in log units), or \code{"tot_sig"} (sum of intensities).} -\item{na.rm}{logical; indicating whether missing values should be removed -when calculating signal and noise. 
Default is \code{TRUE}.} - \item{\ldots}{further arguments passed to subfunctions; currently not used.} } \value{ @@ -46,3 +69,6 @@ sig_noise(raman_hdpe, metric = "noise") sig_noise(raman_hdpe, metric = "sig_times_noise") } +\seealso{ +\code{\link[=restrict_range]{restrict_range()}} +} diff --git a/tests/testthat/test-def_features.R b/tests/testthat/test-def_features.R index d1420350..50bb1126 100644 --- a/tests/testthat/test-def_features.R +++ b/tests/testthat/test-def_features.R @@ -15,11 +15,11 @@ test_that("features are identified when given logical", { unique(id_map$metadata$feature_id) |> expect_length(2) max(id_map$metadata$area, na.rm = T) |> expect_equal(16) max(id_map$metadata$feret_max, na.rm = T) |> round(2) |> - expect_equal(13.04) + expect_equal(16) max(id_map$metadata$feret_min, na.rm = T) |> round(2) |> - expect_equal(1.23) + expect_equal(1) max(id_map$metadata$perimeter, na.rm = T) |> round(2) |> - expect_equal(25.05) + expect_equal(30) }) test_that("particles are identified when given character", { @@ -31,7 +31,7 @@ test_that("particles are identified when given character", { max(id_map$metadata$area, na.rm = T) |> expect_equal(176) max(id_map$metadata$feret_max, na.rm = T) |> round(2) |> - expect_equal(18.69) + expect_equal(19.03) }) test_that("an error is thrown for invalid feature input", { @@ -81,7 +81,7 @@ test_that("collapse particles returns expected values", { test_collapsed$metadata |> nrow() |> expect_equal(3) test_collapsed$metadata$feret_max |> round(2) |> - expect_equal(c(13.04, 13.04, 18.69)) + expect_equal(c(16.0, 16.0, 19.03)) test_collapsed$metadata$centroid_x |> unique() |> expect_equal(7.5) @@ -95,7 +95,7 @@ test_that("collapse particles returns expected values", { test_collapsed$metadata |> nrow() |> expect_equal(2) test_collapsed$metadata$feret_max |> round(2) |> - expect_equal(c(NA, 13.04)) + expect_equal(c(NA, 16)) test_collapsed$metadata$centroid_x |> unique() |> expect_equal(7.5) diff --git 
a/tests/testthat/test-match_spec.R b/tests/testthat/test-match_spec.R index c77b118c..943ba564 100644 --- a/tests/testthat/test-match_spec.R +++ b/tests/testthat/test-match_spec.R @@ -46,6 +46,81 @@ test_that("match_spec() handles input errors correctly", { match_spec(1:1000) |> expect_error() }) +test_that("match_spec() handles attribute issues correctly", { + preproc_wa <- as_OpenSpecy(preproc$wavenumber, + preproc$spectra, + preproc$metadata[,-c("x", "y")], + attributes = list(intensity_unit = "absorbance", + derivative_order = 1, + baseline = "nobaseline", + spectra_type = "ftir")) + + test_lib_wa <- as_OpenSpecy(test_lib$wavenumber, + test_lib$spectra, + test_lib$metadata[,-c("x", "y")], + attributes = list(intensity_unit = "absorbance", + derivative_order = 1, + baseline = "nobaseline", + spectra_type = "ftir")) + + match_spec(x = preproc_wa, library = test_lib_wa, na.rm = T, top_n = 5, + add_library_metadata = "sample_name", + add_object_metadata = "col_id") |> + expect_silent() + + test_lib_wa2 <- as_OpenSpecy(test_lib$wavenumber, + test_lib$spectra, + test_lib$metadata[,-c("x", "y")], + attributes = list(intensity_unit = "transmittance", + derivative_order = 1, + baseline = "nobaseline", + spectra_type = "ftir")) + + match_spec(x = preproc_wa, library = test_lib_wa2, na.rm = T, top_n = 5, + add_library_metadata = "sample_name", + add_object_metadata = "col_id") |> + expect_warning() + + test_lib_wa3 <- as_OpenSpecy(test_lib$wavenumber, + test_lib$spectra, + test_lib$metadata[,-c("x", "y")], + attributes = list(intensity_unit = "absorbance", + derivative_order = 2, + baseline = "nobaseline", + spectra_type = "ftir")) + + match_spec(x = preproc_wa, library = test_lib_wa3, na.rm = T, top_n = 5, + add_library_metadata = "sample_name", + add_object_metadata = "col_id") |> + expect_warning() + + test_lib_wa4 <- as_OpenSpecy(test_lib$wavenumber, + test_lib$spectra, + test_lib$metadata[,-c("x", "y")], + attributes = list(intensity_unit = "absorbance", + 
derivative_order = 1, + baseline = "raw", + spectra_type = "ftir")) + + match_spec(x = preproc_wa, library = test_lib_wa4, na.rm = T, top_n = 5, + add_library_metadata = "sample_name", + add_object_metadata = "col_id") |> + expect_warning() + + test_lib_wa5 <- as_OpenSpecy(test_lib$wavenumber, + test_lib$spectra, + test_lib$metadata[,-c("x", "y")], + attributes = list(intensity_unit = "absorbance", + derivative_order = 1, + baseline = "nobaseline", + spectra_type = "raman")) + + match_spec(x = preproc_wa, library = test_lib_wa5, na.rm = T, top_n = 5, + add_library_metadata = "sample_name", + add_object_metadata = "col_id") |> + expect_warning() +}) + test_that("match_spec() returns correct structure", { matches <- match_spec(x = preproc, library = test_lib, na.rm = T, top_n = 5, add_library_metadata = "sample_name", @@ -137,12 +212,9 @@ test_that("Test that raman hdpe accurately identified", { }) # Write the tests for filter_spec function -test_that("filter_spec() returns erroneous OpenSpecy object when removing all spectra", { +test_that("filter_spec() does not allow for OpenSpecy object without spectra", { os_filtered <- filter_spec(test_lib, logic = rep(F, ncol(test_lib$spectra))) |> - expect_silent() - check_OpenSpecy(os_filtered) |> expect_warning() |> expect_warning() - expect_equal(ncol(os_filtered$spectra), 0) - expect_equal(nrow(os_filtered$metadata), 0) + expect_error() }) # Write the tests for filter_spec function @@ -153,7 +225,6 @@ test_that("filter_spec() returns OpenSpecy object with filtered spectra", { os_filtered <- filter_spec(test_lib, logic = logic) |> expect_silent() expect_true(check_OpenSpecy(os_filtered)) - expect_equal(ncol(os_filtered$spectra), 1) expect_equal(nrow(os_filtered$metadata), 1) }) @@ -168,6 +239,21 @@ test_that("cor_spec() routine and match_spec() return same values", { names <- max_correlations |> sort(decreasing = T) |> names() top_matches <- match_spec(x = tiny_map, library = test_lib, top_n = 1) expect_identical(names, 
top_matches$library_id) + top_matches_2 <- match_spec(x = tiny_map, library = test_lib, top_n = 2)[, head(.SD, 1), by = "object_id"] + expect_identical(names, top_matches$library_id) +}) + +test_that("cor_spec() routine with preprocessing returns same values as setting conform = T", { + tiny_map2 <- read_extdata("CA_tiny_map.zip") |> + read_any() |> + process_spec(smooth_intens = T, conform_spec = F, make_rel = T) + + tiny_map3 <- tiny_map2 |> + conform_spec(range = test_lib$wavenumber, res = NULL, type = "roll") + + cors <- cor_spec(tiny_map3, test_lib) + cors2 <- cor_spec(tiny_map2, test_lib, conform = T, type = "roll") + expect_identical(cors, cors2) }) # Tidy up diff --git a/tests/testthat/test-sig_noise.R b/tests/testthat/test-sig_noise.R index c4cfc0c0..ed536021 100644 --- a/tests/testthat/test-sig_noise.R +++ b/tests/testthat/test-sig_noise.R @@ -6,6 +6,18 @@ test_that("sig_noise() handles input errors corretly", { expect_warning() }) +test_that("sig_noise() can restrict the range correctly", { + big_peak <- sig_noise(raman_hdpe, metric = "sig", sig_max = 3000, + sig_min = 2500) + low_region <- sig_noise(raman_hdpe, metric = "sig", sig_max = 1000, + sig_min = 500) + expect_true(big_peak > low_region) + restric_peak <- restrict_range(raman_hdpe, min = 2500, max = 3000, + make_rel = F) |> + sig_noise(metric = "sig") + expect_identical(big_peak, restric_peak) +}) + test_that("sig_noise() returns correct values", { sig_noise(raman_hdpe, metric = "sig") |> round(2) |> unname() |> expect_equal(101.17) @@ -22,3 +34,4 @@ test_that("sig_noise() returns correct values", { sig_noise(raman_hdpe, metric = "tot_sig") |> round(2) |> unname() |> expect_equal(97527) }) + diff --git a/tests/testthat/test-workflows.R b/tests/testthat/test-workflows.R index 59d8e8e2..468f51d1 100644 --- a/tests/testthat/test-workflows.R +++ b/tests/testthat/test-workflows.R @@ -20,7 +20,7 @@ test_that("Raman batch analysis with test library", { filter_spec(lib, lib$metadata$SpectrumType == 
"Raman") |> expect_silent() batch2 <- conform_spec(batch, lib$wavenumber, res = spec_res(lib$wavenumber)) |> expect_silent() - + expect_true(check_OpenSpecy(batch2)) plotly_spec(batch2) |> expect_silent() @@ -31,7 +31,7 @@ test_that("Raman batch analysis with test library", { plotly_spec(x = batch3, x2 = batch) |> expect_silent() expect_true(check_OpenSpecy(batch3)) - + matches <- cor_spec(batch3, library = lib) |> expect_silent() test_max_cor <- max_cor_named(matches) |> expect_silent() sig_noise(batch3, metric = "run_sig_over_noise") |> @@ -78,5 +78,24 @@ test_that("Raman batch analysis with complete library", { expect_silent() }) +test_that("One particle is identified with standard workflow in map", { + skip_on_cran() + + map <- read_extdata("CA_tiny_map.zip") |> read_any() + signal_noise <- sig_noise(map, metric = "sig_times_noise", abs = F) + + id_map <- def_features(map,signal_noise > 0.01) + unique(id_map$metadata$feature_id) |> length() |> expect_equal(4) + + test_collapsed <- collapse_spec(id_map) + + test_collapsed$metadata |> nrow() |> + expect_equal(4) + test_collapsed$metadata$feret_max |> round(2) |> + expect_equal(c(NA, 8, 12.31, 4.00)) + test_collapsed$metadata$centroid_x |> round(2) |> + expect_equal(c(7.87, 2.00, 7.9, 0.00)) +}) + # Tidy up unlink(tmp, recursive = T) diff --git a/vignettes/app.Rmd b/vignettes/app.Rmd index 0d0eba2c..f27b1663 100644 --- a/vignettes/app.Rmd +++ b/vignettes/app.Rmd @@ -20,21 +20,23 @@ knitr::opts_chunk$set( # Document Overview -This document outlines a common workflow for using the Open Specy -app and highlights some topics that users are often requesting a tutorial on. If -the document is followed sequentially from beginning to end, the user will have -a better understanding of every procedure involved in using the Open Specy R - app as a tool for interpreting spectra. It takes approximately 45 -minutes to read through and follow along with this standard operating procedure -the first time. 
Afterward, knowledgeable users should be able to thoroughly -analyze spectra at an average speed of 1 min^-1^ or faster with the new batch -and automated procedures. If you are looking for documentation about the R package please see the [package vignette](http://wincowger.com/OpenSpecy-package/articles/sop.html) +This document outlines a common workflow for using the Open Specy app and +highlights some topics that users are often requesting a tutorial on. If the +document is followed sequentially from beginning to end, the user will have a +better understanding of every procedure involved in using the Open Specy R app +as a tool for interpreting spectra. It takes approximately 45 minutes to read +through and follow along with this standard operating procedure the first time. +Afterward, knowledgeable users should be able to thoroughly analyze spectra at +an average speed of 1 min^-1^ or faster with the new batch and automated +procedures. If you are looking for documentation about the R package please see +the [package vignette](http://wincowger.com/OpenSpecy-package/articles/sop.html) # Running the App + To get started with the Open Specy user interface, access -[https://openanalysis.org/openspecy/](https://openanalysis.org/openspecy/) -or start the Shiny GUI directly from your own computer in R. -If using the package, you just need to read in the library and run the command `run_app()`. +[https://openanalysis.org/openspecy/](https://openanalysis.org/openspecy/) or +start the Shiny GUI directly from your own computer in R. If using the package, +you just need to read in the library and run the command `run_app()`. ```{r setup} library(OpenSpecy) @@ -46,8 +48,8 @@ run_app() # Reading Data -Once the app is open, if you have your own data, click **Browse** at the top left hand corner of the -Analyze Spectra tab. +Once the app is open, if you have your own data, click **Browse** at the top +left hand corner of the Analyze Spectra tab. 
```{r, fig.align="center", out.width="98%", echo=FALSE} knitr::include_graphics("app/mainpage.jpg") @@ -91,7 +93,8 @@ so that we can work on integrating it. The specific steps to converting your instrument's native files to .csv can be found in its software manual or you can check out [Spectragryph](https://www.effemm2.de/spectragryph/), which supports many -spectral file conversions see [Spectragryph Tutorial](http://wincowger.com/OpenSpecy-package/articles/spectragryph.html). +spectral file conversions see [Spectragryph +Tutorial](http://wincowger.com/OpenSpecy-package/articles/spectragryph.html). If you don't have your own data, you can use a test dataset. @@ -366,7 +369,9 @@ website administrator to discuss options for collaborating. ## Download Top Matches -To download the top matches and associated metadata, use the top matches download option. +To download the top matches and associated metadata, use the top matches +download option. + ```{r, fig.align="center", echo=FALSE} knitr::include_graphics("app/download.jpg") ``` diff --git a/vignettes/sop.Rmd b/vignettes/sop.Rmd index f11b8126..1d045aab 100644 --- a/vignettes/sop.Rmd +++ b/vignettes/sop.Rmd @@ -361,16 +361,19 @@ explanations of each processing sub-function below. Considering whether you have enough signal to analyze spectra is important. Classical spectroscopy would recommend your highest peak to be at least 10 times the baseline of your processed spectra before you begin analysis. If your -spectra is below that threshold, you may want to consider recollecting it. In -practice, we are rarely able to collect spectra of that good quality and more -often use 5. The Run Signal Over Noise technique searches your spectra for high -and low regions and conducts division on them to derive the signal to noise -ratio. This is a good way to automatically calculate the signal to noise ratio. 
-In the example below you can see that our signal to noise ratio is increased by -the processing, the goal of processing is generally to maximize this. Signal -Times Noise multiplies the mean signal by the standard deviation of the signal -and Total Signal sums the intensities. The latter can be really useful for -thresholding spectral maps to identify particles which we will discuss later. +spectra is below that threshold even after processing, you may want to consider +recollecting it. In practice, we are rarely able to collect spectra of that good +quality and more often use 5. The "run_sig_over_noise" `metric` searches your +spectra for high and low regions and conducts division on them to derive the +signal to noise ratio and `step` specifies the number of intensities to search. +This is a good way to automatically calculate the signal to noise ratio. In the +example below you can see that our signal to noise ratio is increased by the +processing, the goal of processing is generally to maximize this. +"sig_times_noise" `metric` multiplies the mean signal by the standard deviation +of the signal and "tot_sig" `metric` sums the intensities. The latter can be +really useful for thresholding spectral maps to identify particles which we will +discuss later. If you know where your signal region and noise regions are you +can specify them with `sig_min`, `sig_max`, `noise_min`, and `noise_max`. ```{r, eval=F} sig_noise(processed, metric = "run_sig_over_noise") > @@ -599,10 +602,7 @@ advanced uses). The previously mentioned libraries all have Raman and FTIR spectra in them. `mediod` is the mediod compressed library version of the absolute first derivative for FTIR, `model` is an exception because it is a multinomial regression approach for FTIR to identification of absolute -derivative spectra. All of the libraries have wavenumbers at 5 cm^-1 resolution. 
-Whichever library you choose, you first need to get your spectra into a similar -enough format to use for comparison. That means conforming the wavenumbers to -the same range and processing the spectra using the same processing procedures. +derivative spectra. In this example we use the `data("test_lib")` which is a subsampled version of the absolute derivative library and `data("raman_hdpe")` which is an unprocessed Raman spectrum in absorbance units of HDPE plastic. With single spectra it is @@ -615,10 +615,9 @@ data("test_lib") data("raman_hdpe") processed <- process_spec(x = raman_hdpe, - conform_spec_args = list(range = test_lib$wavenumber, - res = NULL, - type = "interp" - )) + conform_spec = F, #We will conform during matching. + smooth_intens = T #Conducts the default derivative transformation. + ) # Check to make sure that the signal to noise ratio of the processed spectra is # greater than 10. @@ -627,29 +626,33 @@ plotly_spec(raman_hdpe, processed) ``` After your spectra is processed similarly to the library specifications, you can -identify the spectra using `match_spec()`. The `add_library_metadata` and -`add_object_metadata` options specify the column name in the metadata that you -want to add metadata from and `top_n` specifies how many matches you want. In -this example we just identified a single spectrum with the library but you can -also send an OpenSpecy object with multiple spectra. The output `matches` is a -data.table with at least 3 columns, `object_id` tells you the column names of -the spectra in `x`, `library_id` tells you the column names from the library -that it matched to. `match_val` is the value of the Pearson correlation -coefficient (default) or other correlation if specified in `...` or if using the -model identification option `match_val` will be the model confidence. The output -in this example returned the correct material type, HDPE, as the top match. 
If -using Pearson correlation, 0.7 is a good threshold to use for a positive ID. In -this example, only our top match is greater than the threshold so we would -disregard the other matches. If no matches were above our threshold, we would -proclaim that the spectrum is of an unknown identity. You'll also notice in this -example that we matched to a library with both Raman and FTIR spectra but the -Raman spectra had the highest hits, this is the rationale for lazily matching to -a library with both. If you want to just match to a library with FTIR or Raman -spectra, you can first filter the library using `filter_spec()` using -`SpectrumType`. +identify the spectra using `match_spec()`. All of the libraries have wavenumbers +at 5 cm^-1 resolution. Whichever library you choose, you need to get your +spectra into a similar enough format to use for comparison. That means +conforming the wavenumbers to the same values using `conform = T`. +Alternatively, you could have done the conformation during the processing. The +`add_library_metadata` and `add_object_metadata` options specify the column name +in the metadata that you want to add metadata from and `top_n` specifies how +many matches you want. In this example we just identified a single spectrum with +the library but you can also send an OpenSpecy object with multiple spectra. The +output `matches` is a data.table with at least 3 columns, `object_id` tells you +the column names of the spectra in `x`, `library_id` tells you the column names +from the library that it matched to. `match_val` is the value of the Pearson +correlation coefficient (default) or other correlation if specified in `...` or +if using the model identification option `match_val` will be the model +confidence. The output in this example returned the correct material type, HDPE, +as the top match. If using Pearson correlation, 0.7 is a good threshold to use +for a positive ID. 
In this example, only our top match is greater than the +threshold so we would disregard the other matches. If no matches were above our +threshold, we would proclaim that the spectrum is of an unknown identity. You'll +also notice in this example that we matched to a library with both Raman and +FTIR spectra but the Raman spectra had the highest hits, this is the rationale +for lazily matching to a library with both. If you want to just match to a +library with FTIR or Raman spectra, you can first filter the library using +`filter_spec()` using `SpectrumType`. ```{r eval=FALSE} -matches <- match_spec(x = processed, library = test_lib, +matches <- match_spec(x = processed, library = test_lib, conform = T, add_library_metadata = "sample_name", top_n = 5) print(matches[,c("object_id", "library_id", "match_val", "SpectrumType", "SpectrumIdentity")])