A couple of thoughts, having tried with a few pre-existing datasets:
Raw lab data are often stored in wide format (for easy viewing of values in Excel etc.). This typically involves either an ID and time value, then columns for biomarkers (like this flu dataset from Fonville et al, Science, 2014), or an ID then columns that concatenate year and biomarker. I wonder if there's scope to allow either wide or long data to be input? The latter is harder to automate, of course. Below is an example for wrangling DENV/ZIKV data from Henderson et al, eLife, 2020:
library(dplyr)
library(tidyr)
# Load the data from CSV
data_in <- read.csv("https://raw.githubusercontent.com/hendersonad/zika-sero-pacific/refs/heads/master/data/dset3-fiji-neutralizationassay.csv")
# Reshape the data into long format
long_data <- data_in |>
  select(id, starts_with("D")) |>
  pivot_longer(
    cols = starts_with("D"),
    names_to = "column",
    values_to = "value"
  ) |>
  mutate(
    biomarker = sub("s\\d+", "", column), # Extract the biomarker (e.g., D1, D2)
    year = paste0("20", sub("D\\ds", "", column)), # Extract and format the year
    value = ifelse(value == -Inf, 0, value) # Replace -Inf with 0
  ) |>
  select(id, year, biomarker, value)
# View the transformed data
print(long_data)
# Write new CSV
write.csv(long_data,"data_seroviz.csv")
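For the simpler of the two wide layouts mentioned above (an ID and a time column, then one column per biomarker), the reshape could probably be automated. A minimal sketch, assuming a toy data frame whose column names (`id`, `time`, `H3N2_A`, `H3N2_B`) are made up for illustration and are not taken from the Fonville et al dataset:

```r
library(tidyr)

# Toy wide-format data: one row per individual and time point,
# one column per biomarker (column names are hypothetical)
wide_data <- data.frame(
  id = c(1, 1, 2),
  time = c(2013, 2015, 2013),
  H3N2_A = c(40, 80, 10),
  H3N2_B = c(20, 160, NA)
)

# Treat every column other than id and time as a biomarker
long_data_simple <- pivot_longer(
  wide_data,
  cols = -c(id, time),
  names_to = "biomarker",
  values_to = "value"
)

print(long_data_simple)
```

This only works if the user can say which columns are the ID and time; everything else is assumed to be a biomarker.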
I notice that the current app doesn't handle NA or -Inf values (which were in the above raw datasets; the Ha Nam one also includes * for missing entries). Maybe it would be useful to allow the user to define a value to represent 'missing'? Or, easier, tell them it has to be given as NA? I notice issue #26 (Allow omission of values outside detection limits) is already looking at undetectable titres.
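One way a user-defined 'missing' marker could work on import is sketched below. It assumes the data are read with `read.csv()` and that the marker list (`missing_markers`, a hypothetical name) comes from the user. Converting -Inf to NA is also just an assumption here; undetectable titres may deserve different handling, as discussed in #26.

```r
# Toy long-format CSV containing the kinds of entries mentioned above
raw_csv <- "id,year,biomarker,value
1,2013,D1,3.2
1,2015,D1,*
2,2013,D1,-Inf
2,2015,D1,NA"

# Strings the user wants treated as missing (hypothetical user input)
missing_markers <- c("NA", "*")

# read.csv() converts any field matching na.strings to NA
titres <- read.csv(text = raw_csv, na.strings = missing_markers)

# Assumption: non-finite values (e.g. -Inf) are also treated as missing
titres$value[!is.finite(titres$value)] <- NA

print(titres)
```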
For the above DENV/ZIKV data, I also got this warning in the app:
Some traces generated warnings
all:
pseudoinverse used at 2013
neighborhood radius 4.02
reciprocal condition number 1.0336e-16
There are other near singularities as well. 4.0804
I'm guessing this is instability in the smoothing spline? This may be a common issue for sparse data, so it could be worth making the warning more informative for less technical users?
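The wording of the warning above looks like what R's `loess()` emits when the fit is numerically near-singular, though I haven't checked how the app does its smoothing. A rough sketch of how such warnings could be caught and paired with a plainer note; the helper name `fit_with_friendly_warnings`, the pattern list, and the note wording are all hypothetical:

```r
# Hypothetical helper: run a fitting function, collect its warnings,
# and attach a plain-language note when they look like numerical issues
fit_with_friendly_warnings <- function(fit_fun) {
  raw_warnings <- character(0)
  result <- withCallingHandlers(
    fit_fun(),
    warning = function(w) {
      # Store the original message, then suppress the default printing
      raw_warnings <<- c(raw_warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  numerical <- grepl(
    "pseudoinverse|singularit|condition number|neighborhood radius",
    raw_warnings
  )
  note <- if (any(numerical)) {
    paste(
      "The smoothed trend may be unreliable, possibly because there are",
      "too few distinct time points for the amount of smoothing requested."
    )
  } else {
    NULL
  }
  list(result = result, note = note, raw_warnings = raw_warnings)
}

# Stand-in for a real smoothing call that raises a similar warning
demo <- fit_with_friendly_warnings(function() {
  warning("pseudoinverse used at 2013")
  "fitted-object"
})
print(demo$note)
```

The raw messages are kept in `raw_warnings` so more technical users could still see them.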
Thanks for putting this useful tool together!