From 7e99356f88ab5147d28bfe338e01bda701a18016 Mon Sep 17 00:00:00 2001
From: "Dr. Hannah De los Santos" That data.table reports “using 2 threads” indicates that installation
has succeeded. If the message reports using only one thread, see the
advice under the “OpenMP
@@ -208,7 +211,7 @@ To install and run The image tag Whichever package you use, the first time this command is run, it
might take a few minutes to download and extract several necessary
components, but this should be fully automated. If successful, you
@@ -228,8 +231,8 @@ Note that the slashes in file paths reverse direction from the
reference to the folder location on your Windows machine (before the
colon) to the folder location on the Docker container (after the colon);
@@ -242,8 +245,8 @@ If you are on macOS, and your data is in
If you mapped a folder, then inside the Docker environment’s R
prompt, when you then issue a command like If you are developing the Once you have the To invoke If you have many small files saved, for example, as
This lists your input files, and passes the filename list to parallel
to use in invoking the driver script, one file at a time, and to process
as parallel jobs as resources are available. Each run of the job then
@@ -275,7 +278,7 @@ All available options for For To use In a 9-step sweep with a In a 3-step sweep with a For the script, the output in the working directory will contain the
sweep parameters, like the above, in a file called
For example, a 5-step sweep with the or with this function execution: Note that an odd-numbered length will include the default values in
the middle run of the sweep (hence the examples with 5 and 9 step
sweeps). The fifth row in the example above demonstrates the results of the
experimental script; for runs 1 and 2, the result is not changed, but
for runs 3-5, the measurement is adjusted for reinclusion. To
demonstrate the range, the following is an extract of measurements only
marked as carried forward exclusions by Some of these values are not adjusted at all; one is from run 1 on, a
few are from run 2 on, and one is from run 3 on. Now available is the in-development release of the infants algorithm,
+which expands the pediatric algorithm to clean ages 0 - 2 and revises its steps.
+It is strongly discouraged for general use and still contains
+bugs. Possible bugs are enumerated below, and provide a starting point
+for potential developers: If you would still like to use this in-development feature, you can
+do this in the cleangrowth algorithm by turning on the option
+ For information on the steps in the infants algorithm, please see this
+document. This algorithm expansion also contains information on smoothing
+z-scores between observations of 2 and 4 years old. If you would like to
+use this in your work, code can be found in growth.R lines 525-555. If you have any feedback on this in-development feature, would like
+to comment on the algorithm logic or code, or would like to discuss other
+collaborations, please contact the maintainer of this package, Carrie
+Daymont. It can be used as in the following example. Note that processing the
example data with If you are able to run these steps and see a similar result, you have
the This set of warnings is coming from one of
Note that this assumes that The wide dataset In this example, the subject identifiers previously marked as
Adult algorithm
- 2023-03-13
+ 2023-09-13
Source: vignettes/adult-algorithm.Rmd
adult-algorithm.Rmd
Configuration
- 2023-03-13
+ 2023-09-13
Source: vignettes/configuration.Rmd
configuration.Rmd
Developer guidelines
- 2023-03-13
+ 2023-09-13
Source: vignettes/developer-guidelines.Rmd
developer-guidelines.Rmd
Advanced
Installation
- 2023-03-13
+ 2023-09-13
Source: vignettes/installation.Rmd
installation.Rmd
macOS
threads. You will know it worked successfully if, when you load the
data.table
library in R, you see something like the
following:
-library(data.table)
-data.table 1.12.2 using 2 threads (see ?getDTthreads). Latest news: r-datatable.com
Docker
growthcleanr
using Docker, open the
PowerShell on Windows, or open the Terminal on macOS, and enter this
docker
command:
docker run -it ghcr.io/mitre/growthcleanr/gcr-image:latest R
latest
in the example above will refer to
the latest version of the package available on the main branch of the mitre/growthcleanr
repository, which is typically in close sync with the upstream carriedaymont/growthcleanr
@@ -216,7 +219,7 @@ Docker
growthcleanr
. To explicitly choose a release by name,
replace latest
with the release tag, e.g. for the released
package v2.1.0
:
docker run -it ghcr.io/mitre/growthcleanr/gcr-image:v2.1.0 R
Docker
your own data. For example, if you are on Windows, and your data is in
C:\Users\exampleuser\analysis
, specify a mapping using the
added -v
step below:
-docker run -it -v C:\Users\exampleuser\analysis:/usr/src/app \
- ghcr.io/mitre/growthcleanr/gcr-image:latest R
Docker
/Users/exampleuser/analysis
, specify a folder mapping like
this:
docker run -it -v /Users/exampleuser/analysis:/usr/src/app \
- ghcr.io/mitre/growthcleanr/gcr-image:latest R
docker run -it -v /Users/exampleuser/analysis:/usr/src/app \
+ ghcr.io/mitre/growthcleanr/gcr-image:latest R
list.files()
,
you should see a list of the same files in the R session that you see in
@@ -283,7 +286,7 @@
growthcleanr
code itself, you
can download or clone the growthcleanr
source code and then
install it from source. To clone the source using git
:
git clone https://github.com/carriedaymont/growthcleanr.git
growthcleanr
package source, open an R
session from the growthcleanr
base directory. Then install
growthcleanr using the R devtools
package:
Working with large data sets
- 2023-03-13
+ 2023-09-13
Source: vignettes/large-data-sets.Rmd
large-data-sets.Rmd
Splitting input datasets
exec/gcdriver.R
on a single input file using
Rscript
:
Rscript exec/gcdriver.R --quietly --sdrecenter nhanes mydata.csv mydata-cleaned.csv
mydata.00001.csv
, mydata.00002.csv
, etc., this
driver can be invoked using parallel
:
ls mydata.?????.csv | parallel -j2 --eta \
-"Rscript exec/gcdriver.R --quietly --sdrecenter nhanes {} {}-clean.csv"
ls mydata.?????.csv | parallel -j2 --eta \
+ "Rscript exec/gcdriver.R --quietly --sdrecenter nhanes {} {}-clean.csv"
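After a parallel run like the one above, each input chunk has its own cleaned file (the `{}-clean.csv` pattern yields names such as `mydata.00001.csv-clean.csv`). A minimal sketch of one way to merge them back into a single CSV follows. The helper name `combine_cleaned` and the output file name are hypothetical and not part of `growthcleanr`; the sketch assumes every chunk shares the same header row and ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper (not part of growthcleanr): merge per-chunk cleaned
# CSV files into one file, keeping a single header row.
# Usage: combine_cleaned OUTFILE CHUNK [CHUNK ...]
combine_cleaned() {
  combined_out=$1
  shift
  # Copy the header row from the first chunk only
  head -n 1 "$1" > "$combined_out"
  # Append the data rows (everything after line 1) from every chunk
  for chunk in "$@"; do
    tail -n +2 "$chunk" >> "$combined_out"
  done
}

# Example invocation (illustrative file names; the output name must not
# match the input glob):
# combine_cleaned mydata-all-clean.csv mydata.?????.csv-clean.csv
```

Choosing an output name that does not match the input pattern keeps the merged file from being picked up as a chunk if the command is run again.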
Batch splitting for adult data
Rscript exec/gcdriver.R --numbatches 4 --adult_split 50 my-large-input.csv my-large-input-cleaned.csv
Rscript exec/gcdriver.R --numbatches 4 --adult_split 50 my-large-input.csv my-large-input-cleaned.csv
--numbatches 4
may be a good starting point for testing
@@ -293,29 +296,29 @@ Reference for
gcdriver.R
gcdriver.R
are described
below.
Rscript exec/gcdriver.R --help
-usage: gcdriver.R [--] [--help] [--quietly] [--opts OPTS] [--sdrecenter
-SDRECENTER] [--adult_cutpoint ADULT_CUTPOINT] [--weightcap
- WEIGHTCAP] [--numbatches NUMBATCHES] [--adult_split ADULT_SPLIT]
- infile outfile
-
-CLI driver for growthcleanr
-
-positional arguments:
-infile input file
- outfile output file
-
-flags:
--h, --help show this help message and exit
- -q, --quietly Disable verbose output
-
-optional arguments:
--x, --opts RDS file containing argument values
- -s, --sdrecenter sd.recenter data file [default: ]
- -a, --adult_cutpoint adult cutpoint [default: 20]
- -w, --weightcap weight cap [default: Inf]
- -n, --numbatches Number of batches [default: 1]
- --adult_split Number of splits to run data on [default: Inf]
Rscript exec/gcdriver.R --help
+usage: gcdriver.R [--] [--help] [--quietly] [--opts OPTS] [--sdrecenter
+ SDRECENTER] [--adult_cutpoint ADULT_CUTPOINT] [--weightcap
+ WEIGHTCAP] [--numbatches NUMBATCHES] [--adult_split ADULT_SPLIT]
+ infile outfile
+
+CLI driver for growthcleanr
+
+positional arguments:
+ infile input file
+ outfile output file
+
+flags:
+ -h, --help show this help message and exit
+ -q, --quietly Disable verbose output
+
+optional arguments:
+ -x, --opts RDS file containing argument values
+ -s, --sdrecenter sd.recenter data file [default: ]
+ -a, --adult_cutpoint adult cutpoint [default: 20]
+ -w, --weightcap weight cap [default: Inf]
+ -n, --numbatches Number of batches [default: 1]
+ --adult_split Number of splits to run data on [default: Inf]
Next steps
- 2023-03-13
+ 2023-09-13
Source: vignettes/next-steps.Rmd
next-steps.Rmd
Running the experiment
growthcleanr as described in the main
README.md
file, save it to a CSV file for
exec/testadjustcf.R
:
-> fwrite(cleaned_data, "cleaned.csv", row.names = F)
testacf()
, keep cleaned_data in your environment.
Note that the column names should be as described for
cleaned_data
the Example under Quickstart.
Running the experiment
-
Rscript exec/testadjustcf.R cleaned.csv
testacf()
, run the following in the console (with
cleaned_data
in the environment):
@@ -351,42 +354,42 @@
Running the experiment
For example, for a 9-step sweep with the default search type,
random
, the parameters passed to the function in each pass
will be:
-
- run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
-1 0.494454649 0.331710969 1.681997601 5.438065292 0.371428523 -0.200524185 0.296497153 0.244186167
-2 0.198872727 0.918207332 0.0261138 0.361051567 0.370286443 -0.618056939 0.318943811 0.280842425
-3 0.057848889 0.343496154 2.957211272 3.448713172 0.758593493 -0.240298769 0.189112731 0.586874211
-4 0.034874339 0.462954204 0.949754412 2.697612725 1.694048784 -0.563224398 0.237626234 0.410851815
-5 0.5 2 3 5.5 2 0 0.33 1.5
-6 0.621874695 3.545623892 4.918346826 10.84063427 2.996152267 0.904217721 0.585439346 1.787876622
-7 0.896005213 2.192603083 3.885669706 7.492214661 3.581171147 0.319534914 0.537161064 2.25658771
-8 0.670031176 2.90689554 5.990111082 9.239964036 3.676927744 0.082569093 0.568586483 2.645760535
-9 0.98603125 2.169401426 5.718063961 6.950459617 2.91380773 0.816289079 0.457654322 2.540503306
run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
+1 0.494454649 0.331710969 1.681997601 5.438065292 0.371428523 -0.200524185 0.296497153 0.244186167
+2 0.198872727 0.918207332 0.0261138 0.361051567 0.370286443 -0.618056939 0.318943811 0.280842425
+3 0.057848889 0.343496154 2.957211272 3.448713172 0.758593493 -0.240298769 0.189112731 0.586874211
+4 0.034874339 0.462954204 0.949754412 2.697612725 1.694048784 -0.563224398 0.237626234 0.410851815
+5 0.5 2 3 5.5 2 0 0.33 1.5
+6 0.621874695 3.545623892 4.918346826 10.84063427 2.996152267 0.904217721 0.585439346 1.787876622
+7 0.896005213 2.192603083 3.885669706 7.492214661 3.581171147 0.319534914 0.537161064 2.25658771
+8 0.670031176 2.90689554 5.990111082 9.239964036 3.676927744 0.082569093 0.568586483 2.645760535
+9 0.98603125 2.169401426 5.718063961 6.950459617 2.91380773 0.816289079 0.457654322 2.540503306
line-grid
search type, the
parameters passed to the function in each pass will be:
- run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
-1 0 0 0 0 0 -1 0 0
-2 0.125 0.5 0.75 1.375 0.5 -0.75 0.0825 0.375
-3 0.25 1 1.5 2.75 1 -0.5 0.165 0.75
-4 0.375 1.5 2.25 4.125 1.5 -0.25 0.2475 1.125
-5 0.5 2 3 5.5 2 0 0.33 1.5
-6 0.625 2.5 3.75 6.875 2.5 0.25 0.4125 1.875
-7 0.75 3 4.5 8.25 3 0.5 0.495 2.25
-8 0.875 3.5 5.25 9.625 3.5 0.75 0.5775 2.625
-9 1 4 6 11 4 1 0.66 3
run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
+1 0 0 0 0 0 -1 0 0
+2 0.125 0.5 0.75 1.375 0.5 -0.75 0.0825 0.375
+3 0.25 1 1.5 2.75 1 -0.5 0.165 0.75
+4 0.375 1.5 2.25 4.125 1.5 -0.25 0.2475 1.125
+5 0.5 2 3 5.5 2 0 0.33 1.5
+6 0.625 2.5 3.75 6.875 2.5 0.25 0.4125 1.875
+7 0.75 3 4.5 8.25 3 0.5 0.495 2.25
+8 0.875 3.5 5.25 9.625 3.5 0.75 0.5775 2.625
+9 1 4 6 11 4 1 0.66 3
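Each column in the line-grid table above looks evenly spaced between a low and a high value, with the default in the middle row. This is an inference from the printed values, not a description of how `testacf()` computes them. Under that assumption, the column values can be reproduced with a short sketch; the function name `line_grid` is hypothetical and not part of `growthcleanr`.

```shell
#!/bin/sh
# Hypothetical helper (not part of growthcleanr): print N evenly spaced
# values from LO to HI inclusive, one per line.
# Usage: line_grid LO HI N
line_grid() {
  awk -v lo="$1" -v hi="$2" -v n="$3" 'BEGIN {
    # A single-step grid has no spacing to compute; print the low value
    if (n < 2) { printf "%g\n", lo; exit }
    for (i = 0; i < n; i++)
      printf "%g\n", lo + i * (hi - lo) / (n - 1)
  }'
}

# minfactor column of a 5-step sweep, assuming a range of 0 to 1
line_grid 0 1 5
# prints 0, 0.25, 0.5, 0.75, 1 (one value per line)
```

With an odd number of steps the midpoint of the range is always included, which is consistent with the default values appearing in the middle run.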
full-grid
search type, with the
--param
CSV/param
data frame specified as in
the above example, the parameters passed to the function in each pass
will be:
- run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
-1 0.0 3 3 5.5 0 0 0.5 1.5
-2 0.5 3 3 5.5 0 0 0.5 1.5
-3 1.0 3 3 5.5 0 0 0.5 1.5
-4 0.0 3 3 5.5 2 0 0.5 1.5
-5 0.5 3 3 5.5 2 0 0.5 1.5
-6 1.0 3 3 5.5 2 0 0.5 1.5
-7 0.0 3 3 5.5 4 0 0.5 1.5
-8 0.5 3 3 5.5 4 0 0.5 1.5
-9 1.0 3 3 5.5 4 0 0.5 1.5
run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
+1 0.0 3 3 5.5 0 0 0.5 1.5
+2 0.5 3 3 5.5 0 0 0.5 1.5
+3 1.0 3 3 5.5 0 0 0.5 1.5
+4 0.0 3 3 5.5 2 0 0.5 1.5
+5 0.5 3 3 5.5 2 0 0.5 1.5
+6 1.0 3 3 5.5 2 0 0.5 1.5
+7 0.0 3 3 5.5 4 0 0.5 1.5
+8 0.5 3 3 5.5 4 0 0.5 1.5
+9 1.0 3 3 5.5 4 0 0.5 1.5
test_adjustcarrforward_DATE_TIME_parameters.csv
, and the
@@ -398,7 +401,7 @@ Running the experiment
testacf_res entry.
line-grid
search
would be run with this command:
Rscript exec/testadjustcf.R --gridlength 5 --searchtype line-grid cleaned.csv
result_list <- testacf(
@@ -410,12 +413,12 @@
Running the experiment
test_adjustcarrforward_DATE_TIME_parameters.csv
(script)/
params
data frame of result_list
(function) would be:
-
- run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
-1 0 0 0 0 0 -1 0 0
-2 0.25 1 1.5 2.75 1 -0.5 0.165 0.75
-3 0.5 2 3 5.5 2 0 0.33 1.5
-4 0.75 3 4.5 8.25 3 0.5 0.495 2.25
-5 1 4 6 11 4 1 0.66 3
run minfactor maxfactor banddiff banddiff_plus min_ht.exp_under min_ht.exp_over max_ht.exp_under max_ht.exp_over
+1 0 0 0 0 0 -1 0 0
+2 0.25 1 1.5 2.75 1 -0.5 0.165 0.75
+3 0.5 2 3 5.5 2 0 0.33 1.5
+4 0.75 3 4.5 8.25 3 0.5 0.495 2.25
+5 1 4 6 11 4 1 0.66 3
Running the experiment
test_adjustcarrforward_DATE_TIME.csv
(script)/
testacf_res
data frame of result_list
(function) would be:
- id subjid sex agedays param measurement gcr_result run-1 run-2 run-3 run-4 run-5
-1510 775155 0 889 HEIGHTCM 84.9 Exclude-Extraneous-Same-Day Missing Missing Missing Missing Missing
-1511 775155 0 889 HEIGHTCM 89.06 Include No Change No Change No Change No Change No Change
-1512 775155 0 1071 HEIGHTCM 92.5 Include No Change No Change No Change No Change No Change
-1513 775155 0 1253 HEIGHTCM 96.2 Include No Change No Change No Change No Change No Change
-1514 775155 0 1435 HEIGHTCM 96.2 Exclude-Carried-Forward No Change No Change Include Include Include
-1515 775155 0 1435 HEIGHTCM 99.692 Include No Change No Change No Change No Change No Change
-1516 775155 0 1806 HEIGHTCM 106.1 Include No Change No Change No Change No Change No Change
-1517 775155 0 2177 HEIGHTCM 112.3 Include No Change No Change No Change No Change No Change
-1518 775155 0 889 WEIGHTKG 13.1 Include No Change No Change No Change No Change No Change
id subjid sex agedays param measurement gcr_result run-1 run-2 run-3 run-4 run-5
+1510 775155 0 889 HEIGHTCM 84.9 Exclude-Extraneous-Same-Day Missing Missing Missing Missing Missing
+1511 775155 0 889 HEIGHTCM 89.06 Include No Change No Change No Change No Change No Change
+1512 775155 0 1071 HEIGHTCM 92.5 Include No Change No Change No Change No Change No Change
+1513 775155 0 1253 HEIGHTCM 96.2 Include No Change No Change No Change No Change No Change
+1514 775155 0 1435 HEIGHTCM 96.2 Exclude-Carried-Forward No Change No Change Include Include Include
+1515 775155 0 1435 HEIGHTCM 99.692 Include No Change No Change No Change No Change No Change
+1516 775155 0 1806 HEIGHTCM 106.1 Include No Change No Change No Change No Change No Change
+1517 775155 0 2177 HEIGHTCM 112.3 Include No Change No Change No Change No Change No Change
+1518 775155 0 889 WEIGHTKG 13.1 Include No Change No Change No Change No Change No Change
cleangrowth()
:
- id subjid sex agedays param measurement gcr_result run-1 run-2 run-3 run-4 run-5
-1514 775155 0 1435 HEIGHTCM 96.2 Exclude-Carried-Forward No Change No Change Include Include Include
-1521 775155 0 1435 WEIGHTKG 15.3 Exclude-Carried-Forward No Change No Change No Change No Change No Change
-7952 1340377 1 1806 HEIGHTCM 107.1 Exclude-Carried-Forward No Change Include Include Include Include
-7967 1340377 1 1806 WEIGHTKG 18.4 Exclude-Carried-Forward No Change No Change No Change No Change No Change
-41775 3643526 1 1253 HEIGHTCM 87.808 Exclude-Carried-Forward Include Include Include Include Include
-44901 3706097 0 4032 HEIGHTCM 138.8 Exclude-Carried-Forward No Change Include Include Include Include
-30011 5792371 1 3661 HEIGHTCM 145.4 Exclude-Carried-Forward No Change Include Include Include Include
-30013 5792371 1 4032 HEIGHTCM 145.4 Exclude-Carried-Forward No Change No Change No Change No Change No Change
-30016 5792371 1 1071 WEIGHTKG 15.9 Exclude-Carried-Forward No Change No Change No Change No Change No Change
id subjid sex agedays param measurement gcr_result run-1 run-2 run-3 run-4 run-5
+1514 775155 0 1435 HEIGHTCM 96.2 Exclude-Carried-Forward No Change No Change Include Include Include
+1521 775155 0 1435 WEIGHTKG 15.3 Exclude-Carried-Forward No Change No Change No Change No Change No Change
+7952 1340377 1 1806 HEIGHTCM 107.1 Exclude-Carried-Forward No Change Include Include Include Include
+7967 1340377 1 1806 WEIGHTKG 18.4 Exclude-Carried-Forward No Change No Change No Change No Change No Change
+41775 3643526 1 1253 HEIGHTCM 87.808 Exclude-Carried-Forward Include Include Include Include Include
+44901 3706097 0 4032 HEIGHTCM 138.8 Exclude-Carried-Forward No Change Include Include Include Include
+30011 5792371 1 3661 HEIGHTCM 145.4 Exclude-Carried-Forward No Change Include Include Include Include
+30013 5792371 1 4032 HEIGHTCM 145.4 Exclude-Carried-Forward No Change No Change No Change No Change No Change
+30016 5792371 1 1071 WEIGHTKG 15.9 Exclude-Carried-Forward No Change No Change No Change No Change No Change
Understanding growthcleanr output
- 2023-03-13
+ 2023-09-13
Source: vignettes/output.Rmd
output.Rmd
Preliminary infants algorithm
+
+ 2023-09-13
+
+ Source: vignettes/prelim-infants-algorithm.Rmd
+ prelim-infants-algorithm.Rmd
+
+prelim_infants
:
+
# prepare data as a data.table
+data <- as.data.table(source_data)
+
+# set the data.table key for better indexing
+setkey(data, subjid, param, agedays)
+
+# generate new exclusion flag field using function
+cleaned_data <-
+ data[, gcr_result := cleangrowth(subjid, param, agedays, sex, measurement,
+ prelim_infants = TRUE)]
Quickstart
- 2023-03-13
+ 2023-09-13
Source: vignettes/quickstart.Rmd
quickstart.Rmd
Usage
- 2023-03-13
+ 2023-09-13
Source: vignettes/usage.Rmd
usage.Rmd
Basic operations using ex
cleangrowth()
will likely take a few
minutes to complete.
library(data.table)
-library(dplyr)
-
-# Convert the `syngrowth` data frame to a `data.table`
-data <- as.data.table(syngrowth)
-
-# `setkey()` creates an efficient sorting key on the `data.table`; this is required
-# for `cleangrowth()`
-setkey(data, subjid, param, agedays)
-
-# Add a column `gcr_result` using `cleangrowth`
-cleaned_data <- data[, gcr_result := cleangrowth(subjid, param, agedays, sex, measurement)]
-
-# View a sample of the results
-head(cleaned_data)
-
- id subjid sex agedays param measurement gcr_result
-1: 83330 002986c5-354d-bb9d-c180-4ce26813ca28 1 20489.22 HEIGHTCM 151.1 Include
-2: 83332 002986c5-354d-bb9d-c180-4ce26813ca28 1 20860.22 HEIGHTCM 151.1 Include
-3: 83334 002986c5-354d-bb9d-c180-4ce26813ca28 1 20860.22 HEIGHTCM 150.6 Exclude-Same-Day-Extraneous
-4: 83335 002986c5-354d-bb9d-c180-4ce26813ca28 1 21231.22 HEIGHTCM 151.1 Include
-5: 83337 002986c5-354d-bb9d-c180-4ce26813ca28 1 21602.22 HEIGHTCM 151.1 Include
-6: 83339 002986c5-354d-bb9d-c180-4ce26813ca28 1 21623.22 HEIGHTCM 151.1 Include
-
-# Summarize results by result type
-cleaned_data %>% group_by(gcr_result) %>% tally(sort=TRUE)
-# A tibble: 26 x 2
- gcr_result n
- <fct> <int>
- 1 Include 61652
- 2 Exclude-Extraneous-Same-Day 11263
- 3 Exclude-Carried-Forward 7093
- 4 Exclude-Same-Day-Extraneous 4010
- 5 Exclude-Same-Day-Identical 623
- 6 Exclude-SD-Cutoff 175
- 7 Exclude-EWMA-8 139
- 8 Exclude-Distinct-3-Or-More 125
- 9 Exclude-BIV 108
- 10 Exclude-EWMA-Extreme 99
-# … with 16 more rows
library(data.table)
+library(dplyr)
+
+# Convert the `syngrowth` data frame to a `data.table`
+data <- as.data.table(syngrowth)
+
+# `setkey()` creates an efficient sorting key on the `data.table`; this is required
+# for `cleangrowth()`
+setkey(data, subjid, param, agedays)
+
+# Add a column `gcr_result` using `cleangrowth`
+cleaned_data <- data[, gcr_result := cleangrowth(subjid, param, agedays, sex, measurement)]
+
+# View a sample of the results
+head(cleaned_data)
+ id subjid sex agedays param measurement gcr_result
+1: 83330 002986c5-354d-bb9d-c180-4ce26813ca28 1 20489.22 HEIGHTCM 151.1 Include
+2: 83332 002986c5-354d-bb9d-c180-4ce26813ca28 1 20860.22 HEIGHTCM 151.1 Include
+3: 83334 002986c5-354d-bb9d-c180-4ce26813ca28 1 20860.22 HEIGHTCM 150.6 Exclude-Same-Day-Extraneous
+4: 83335 002986c5-354d-bb9d-c180-4ce26813ca28 1 21231.22 HEIGHTCM 151.1 Include
+5: 83337 002986c5-354d-bb9d-c180-4ce26813ca28 1 21602.22 HEIGHTCM 151.1 Include
+6: 83339 002986c5-354d-bb9d-c180-4ce26813ca28 1 21623.22 HEIGHTCM 151.1 Include
+
+# Summarize results by result type
+cleaned_data %>% group_by(gcr_result) %>% tally(sort=TRUE)
+# A tibble: 26 x 2
+ gcr_result n
+ <fct> <int>
+ 1 Include 61652
+ 2 Exclude-Extraneous-Same-Day 11263
+ 3 Exclude-Carried-Forward 7093
+ 4 Exclude-Same-Day-Extraneous 4010
+ 5 Exclude-Same-Day-Identical 623
+ 6 Exclude-SD-Cutoff 175
+ 7 Exclude-EWMA-8 139
+ 8 Exclude-Distinct-3-Or-More 125
+ 9 Exclude-BIV 108
+10 Exclude-EWMA-Extreme 99
+# … with 16 more rows
growthcleanr
package installed correctly. The resulting
cleaned_data
can be reviewed, subsetted, and compared in
@@ -257,9 +260,9 @@ Basic configuration options
:
-Warning messages:
-1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
-2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
Warning messages:
+1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
+2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
growthcleanr
’s dependencies, and does not indicate either
failure or improper execution.
Utilities for computing pediatric BMI
percentiles, Z-scores, and related tools
- 2023-03-13
+ 2023-09-13
Source: vignettes/utilities.Rmd
utilities.Rmd
Converting long g
cleaned_data
has the same
structure as described in Quickstart - Data
preparation:
names(cleaned_data)
-[1] "id" "subjid" "sex" "agedays" "param" "measurement" "gcr_result"
cleaned_data_wide
will include rows
with aligned height and weight measurements drawn from the observations
in cleaned_data
marked by cleangrowth()
for
inclusion. As such, it will be a shorter dataset (fewer rows) based on
fewer observations.
dim(cleaned_data)
-[1] 85728 7
-
-dim(cleaned_data_wide)
-[1] 26701 9
-
-head(cleaned_data_wide)
- subjid agey agem sex wt wt_id ht ht_id agedays
-1 002986c5-354d-bb9d-c180-4ce26813ca28 56.0964 673.1568 2 71.7 83331 151.1 83330 20489.22
-2 002986c5-354d-bb9d-c180-4ce26813ca28 57.1122 685.3464 2 73.2 83333 151.1 83332 20860.22
-3 002986c5-354d-bb9d-c180-4ce26813ca28 58.1279 697.5348 2 74.6 83336 151.1 83335 21231.22
-4 002986c5-354d-bb9d-c180-4ce26813ca28 59.1437 709.7244 2 72.8 83338 151.1 83337 21602.22
-5 002986c5-354d-bb9d-c180-4ce26813ca28 59.2012 710.4144 2 72.4 83340 151.1 83339 21623.22
-6 002986c5-354d-bb9d-c180-4ce26813ca28 60.1594 721.9128 2 69.4 83343 151.1 83342 21973.22
dim(cleaned_data)
+[1] 85728 7
+
+dim(cleaned_data_wide)
+[1] 26701 9
+
+head(cleaned_data_wide)
+ subjid agey agem sex wt wt_id ht ht_id agedays
+1 002986c5-354d-bb9d-c180-4ce26813ca28 56.0964 673.1568 2 71.7 83331 151.1 83330 20489.22
+2 002986c5-354d-bb9d-c180-4ce26813ca28 57.1122 685.3464 2 73.2 83333 151.1 83332 20860.22
+3 002986c5-354d-bb9d-c180-4ce26813ca28 58.1279 697.5348 2 74.6 83336 151.1 83335 21231.22
+4 002986c5-354d-bb9d-c180-4ce26813ca28 59.1437 709.7244 2 72.8 83338 151.1 83337 21602.22
+5 002986c5-354d-bb9d-c180-4ce26813ca28 59.2012 710.4144 2 72.4 83340 151.1 83339 21623.22
+6 002986c5-354d-bb9d-c180-4ce26813ca28 60.1594 721.9128 2 69.4 83343 151.1 83342 21973.22
subjid
are now in the id
column; individual
identifiers for observations of a single parameter are not present.
Converting long g
your input set uses different column names. For example, if
my_cleaned_data
specifies age in days as aged
and parameter type as type
, specify each, with quotes:
head(my_cleaned_data)
-
- id subjid sex aged type measurement gcr_result
-1: 1510 775155 0 889 HEIGHTCM 84.90 Exclude-Extraneous-Same-Day
-2: 1511 775155 0 889 HEIGHTCM 89.06 Include
-3: 1518 775155 0 889 WEIGHTKG 13.10 Include
-4: 1512 775155 0 1071 HEIGHTCM 92.50 Include
-5: 1519 775155 0 1071 WEIGHTKG 14.70 Include
-6: 1513 775155 0 1253 HEIGHTCM 96.20 Include
-longwide(my_cleaned_data, agedays = "aged", param = "type")
head(my_cleaned_data)
+ id subjid sex aged type measurement gcr_result
+1: 1510 775155 0 889 HEIGHTCM 84.90 Exclude-Extraneous-Same-Day
+2: 1511 775155 0 889 HEIGHTCM 89.06 Include
+3: 1518 775155 0 889 WEIGHTKG 13.10 Include
+4: 1512 775155 0 1071 HEIGHTCM 92.50 Include
+5: 1519 775155 0 1071 WEIGHTKG 14.70 Include
+6: 1513 775155 0 1253 HEIGHTCM 96.20 Include
+longwide(my_cleaned_data, agedays = "aged", param = "type")
By default, longwide()
will only transform records
flagged by cleangrowth()
for inclusion. To include more
categories assigned by cleangrowth()
, use the
@@ -287,32 +290,32 @@
ext_bmiz()
can
be called:
-cleaned_data_bmiz <- ext_bmiz(cleaned_data_bmi)
-head(cleaned_data_bmiz)
- subjid agey age sex wt wt_id ht ht_id agedays bmi bmiz
- <char> <num> <num> <int> <num> <int> <num> <int> <int> <num> <num>
- 1: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 10.0233 120.2796 2 35.4 17 141.6 15 3661 17.65537 0.3236612
-2: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 11.0390 132.4680 2 39.2 19 147.9 18 4032 17.92048 0.1734315
-3: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 12.0548 144.6576 2 44.8 21 155.1 20 4403 18.62320 0.1832443
-4: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 12.5914 151.0968 2 47.8 23 158.7 22 4599 18.97903 0.1829183
-5: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 13.0705 156.8460 2 50.5 26 160.8 24 4774 19.53077 0.2586449
-6: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 3.9288 47.1456 2 16.6 2 102.6 1 1435 15.76933 0.3453978
-
- bmip waz wp haz hp p95 p97 bmip95 mod_bmiz mod_waz mod_haz
- <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
- 1: 62.69027 0.3498817 63.67862 0.5140553 69.63933 22.96109 24.57902 76.89254 0.18485501 0.2399201 0.5008944
-2: 56.88438 0.2311645 59.14065 0.5002225 69.15408 24.13836 25.90525 74.24067 0.09543668 0.1556259 0.4955022
-3: 57.26968 0.3237374 62.69316 0.4803298 68.45036 25.26981 27.17179 73.69745 0.10065274 0.2201655 0.4855100
-4: 57.25689 0.3812700 64.84986 0.5153244 69.68368 25.83904 27.80781 73.45100 0.10011666 0.2596479 0.5212281
-5: 60.20454 0.4440465 67.14955 0.4849566 68.61464 26.32757 28.35405 74.18371 0.14353771 0.3024342 0.4885546
-6: 63.51023 0.4369963 66.89430 0.5348018 70.36065 18.02950 18.60078 87.46409 0.24462581 0.3332011 0.5224816
-
- sigma original_bmip original_bmiz sev_obese obese
- <num> <num> <num> <int> <int>
- 1: 4.443536 62.69027 0.3236612 0 0
-2: 4.797031 56.88438 0.1734315 0 0
-3: 5.148292 57.26968 0.1832443 0 0
-4: 5.332930 57.25689 0.1829183 0 0
-5: 5.497248 60.20454 0.2586449 0 0
-6: 2.274792 63.51023 0.3453978 0 0
cleaned_data_bmiz <- ext_bmiz(cleaned_data_bmi)
+head(cleaned_data_bmiz)
+ subjid agey age sex wt wt_id ht ht_id agedays bmi bmiz
+ <char> <num> <num> <int> <num> <int> <num> <int> <int> <num> <num>
+1: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 10.0233 120.2796 2 35.4 17 141.6 15 3661 17.65537 0.3236612
+2: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 11.0390 132.4680 2 39.2 19 147.9 18 4032 17.92048 0.1734315
+3: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 12.0548 144.6576 2 44.8 21 155.1 20 4403 18.62320 0.1832443
+4: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 12.5914 151.0968 2 47.8 23 158.7 22 4599 18.97903 0.1829183
+5: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 13.0705 156.8460 2 50.5 26 160.8 24 4774 19.53077 0.2586449
+6: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 3.9288 47.1456 2 16.6 2 102.6 1 1435 15.76933 0.3453978
+ bmip waz wp haz hp p95 p97 bmip95 mod_bmiz mod_waz mod_haz
+ <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
+1: 62.69027 0.3498817 63.67862 0.5140553 69.63933 22.96109 24.57902 76.89254 0.18485501 0.2399201 0.5008944
+2: 56.88438 0.2311645 59.14065 0.5002225 69.15408 24.13836 25.90525 74.24067 0.09543668 0.1556259 0.4955022
+3: 57.26968 0.3237374 62.69316 0.4803298 68.45036 25.26981 27.17179 73.69745 0.10065274 0.2201655 0.4855100
+4: 57.25689 0.3812700 64.84986 0.5153244 69.68368 25.83904 27.80781 73.45100 0.10011666 0.2596479 0.5212281
+5: 60.20454 0.4440465 67.14955 0.4849566 68.61464 26.32757 28.35405 74.18371 0.14353771 0.3024342 0.4885546
+6: 63.51023 0.4369963 66.89430 0.5348018 70.36065 18.02950 18.60078 87.46409 0.24462581 0.3332011 0.5224816
+ sigma original_bmip original_bmiz sev_obese obese
+ <num> <num> <num> <int> <int>
+1: 4.443536 62.69027 0.3236612 0 0
+2: 4.797031 56.88438 0.1734315 0 0
+3: 5.148292 57.26968 0.1832443 0 0
+4: 5.332930 57.25689 0.1829183 0 0
+5: 5.497248 60.20454 0.2586449 0 0
+6: 2.274792 63.51023 0.3453978 0 0
The output columns include: