Releases: amices/mice
mice 3.17.0
Major changes
- Imputing categorical data by predictive mean matching. Predictive mean matching (PMM) is the default method of `mice()` for imputing numerical variables, but it has long been possible to impute factors with it as well. This enhancement introduces better support for categorical variables in PMM. The former system translated factors into integers by `ynum <- as.integer(f)`. However, the order of the integers in `ynum` may have no sensible interpretation for an unordered factor. The new system quantifies `ynum` and could yield better results because of a higher $R^2$. The method calculates the canonical correlation between `y` (as a dummy matrix) and a linear combination of the imputation model predictors `x`. The algorithm then replaces each category of `y` by a single number taken from the first canonical variate. After this step, the imputation model is fitted, and the predicted values from that model are extracted to function as the similarity measure for the matching step.

  The method works for both ordered and unordered factors. No special precautions are taken to ensure monotonicity between the category numbers and the quantifications, so the method should be able to preserve quadratic and other non-monotone relations of the predicted metric. It may be beneficial to remove very sparsely filled categories, for which there is a new `trim` argument. All you have to do to use the new technique is specify `mice(..., method = "pmm", ...)`. Both numerical and categorical variables will then be imputed by PMM.

  Potential advantages are:
  - Simpler and faster than fitting a generalised linear model, e.g., logistic regression or the proportional odds model;
  - Should be insensitive to the order of categories;
  - No need to solve problems with perfect prediction;
  - Should inherit the good statistical properties of predictive mean matching.

  Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
- New system-independent method for pooling: This version introduces a new function `pool.table()` that takes a tidy table of parameter estimates stemming from `m` repeated analyses. The input data must consist of three columns (parameter name, estimate, standard error) and a specification of the degrees of freedom of the model fitted to the complete data. The `pool.table()` function outputs 14 pooled statistics in a tidy form. The primary use of `pool.table()` is to support parameter pooling for techniques that have no `tidy()` or `glance()` methods, either within `R` or outside `R`. The `pool.table()` function also allows for novel workflows that 1) break apart the traditional `pool()` function into a data-wrangling part and a parameter-reducing part, and 2) do not necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
- literanger: Adds support for the `literanger` package for `rf` imputation that is about twice as fast as `ranger` (#648). Thanks @stephematician for the contribution.
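
The quantification step described for categorical PMM in the first item above can be illustrated outside `mice`. The following is a minimal base R sketch, not the package's internal code: it assumes the built-in `iris` data, treats `Species` as the factor `y`, uses the four numeric columns as the predictors `x`, and relies on `stats::cancor()`. The names `yd`, `cca`, `scores` and `quant` are made up for this illustration.

```r
# Illustrative sketch only; the implementation inside mice may differ.
f <- iris$Species                  # factor playing the role of y
x <- as.matrix(iris[, 1:4])        # numeric predictors playing the role of x

# Dummy matrix for y; the intercept column is dropped to keep full column rank
yd <- model.matrix(~ f)[, -1, drop = FALSE]

# Canonical correlation between the predictors and the dummy-coded factor
cca <- cancor(x = x, y = yd)

# Score of every row on the first canonical variate of y
scores <- as.vector(
  scale(yd, center = cca$ycenter, scale = FALSE) %*% cca$ycoef[, 1]
)

# Rows in the same category share one score, so each category maps to one number
quant <- tapply(scores, f, function(s) s[1])
print(quant)
```

The values in `quant` could then stand in for `as.integer(f)` as the numeric outcome of the imputation model; their order need not follow the order of the factor levels.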
Breaking changes
- The `complete(..., action = "long", ...)` command puts the columns named `".imp"` and `".id"` in the last two positions of the long data (instead of the first two positions). In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with. Note that any existing code that assumes that variables `".imp"` and `".id"` are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names `".imp"` and `".id"`. If you want the old behaviour, specify the argument `order = "first"`. (#569). Contributed @stefvanbuuren
- Drops support for S4. Converts S4-related code to S3. The syntax `as(df, "mids")` is deprecated. Use `as.mids(df)` instead.
- Adopts the `broom` convention for naming the lower and upper bounds of the confidence interval as `"conf.low"` and `"conf.high"`. Do not use non-syntactic names anymore, like `"2.5 %"`.
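
To make the advice about `".imp"` and `".id"` concrete, here is a small sketch that selects the bookkeeping columns by name rather than by position. It assumes the `nhanes` data shipped with `mice`; the object names `imp`, `long` and `data_cols` are illustrative.

```r
library(mice)

# Small run on the nhanes example data that ships with mice
imp <- mice(nhanes, m = 2, maxit = 1, printFlag = FALSE, seed = 123)
long <- complete(imp, action = "long")

# Robust: look the bookkeeping columns up by name
imp_number <- long[[".imp"]]
row_id <- long[[".id"]]

# Fragile: long[, 1:2] returns these two columns only under the old ordering

# The remaining columns are the variables of the original data
data_cols <- setdiff(names(long), c(".imp", ".id"))
head(long[, data_cols])
```

Code written this way does not depend on where the two columns are placed.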
Minor changes
- Adds support for the `dots` argument to `ranger::ranger(...)` in `mice.impute.rf()` (#563). Contributed @edbonneville
- Prepares for the deprecation of the `blocks` argument at various places
- Removes the need for `blocks` in `initialize_chain()`
- In `rbind()`, when formulas are concatenated and duplicate names are found, also rename the duplicated variables in formulas by their new name
- Solves problem with the package documentation link
- Simplifies `NEWS.md` formatting to get correct version sequence on CRAN and in-package NEWS
- Initialize single-variable blocks in `make.method()` in a more efficient way (resolves #672)
- Prevent `as.mids()` from filling the `imp` object for complete variables
- Defines S3 class constructors for `mids`, `mads`, `mira` and `mipo` objects
Bug fixes
- Fixes the "large logo" problem. (#574). Contributed @hanneoberman
- Patches a bug in `complete()` that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of `rbind()`, where the first set of rows was imputed and the second was not).
- Replaces the internal variable `type` by the more informative `pred` (currently active row of `predictorMatrix`)
- Fixes a bug in `filter.mids()` that incorrectly removed empty components in the `imp` object
- Fixes a bug in `ibind()` that incorrectly used `length(blocks)` as the first dimension of the `chainMean` and `chainVar` objects
- Corrects the description of the `visitSequence`, `chainMean` and `chainVar` components of the `mids` object
- Fixes problems with zero predictors (#588)
- Fixes a problem with the `minpuc` argument in `quickpred()` (#634)
- Fixes `coef() not available on S4 object` when using with `lavaan` (#615, #616)
- Adds `.github/dependabot.yml` configuration to automate daily check (#598)
- Update documentation tags to `roxygen2 7.3.1` requirements
- Repairs lost braces in the documentation
- Fixes an installation problem when `Rprofile` prints to `stdout` on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
- Fixes a bug during initialization of factor values
- Removes `methods` and `rlang` from `Depends`
- Removes export of non-user facing `ampute()` helpers
- Clears `\link` statements that do not pass CRAN checks
mice 3.16.0
Major changes
- Expands `futuremice()` functionality by allowing for external packages and user-written functions (#550). Contributed @thomvolker
- Adds GH issue templates `bug_report`, `feature_request` and `help_wanted` (#560). Contributed @hanneoberman
Minor changes
- Removes documentation files for `rbind.mids()` and `cbind.mids()` to conform to CRAN policy
- Adds `mitml` and `glmnet` to imports so that test code conforms to the `_R_CHECK_DEPENDS_ONLY=true` flag in `R CMD check`
- Initializes random number generator in `futuremice()` if there is no `.Random.seed` yet.
- Updates GitHub actions for package checking and site building
- Preserves user settings in `predictorMatrix` for case F by adding a `predictorMatrix` argument to `make.predictorMatrix()`
- Polishes `mice.impute.mpmm()` example code
Bug fixes
- Adds proper support for factors to `mice.impute.2lonly.pmm()` (#555)
- Solves function naming problems for S3 generic functions `tidy()`, `update()`, `format()` and `sum()`
- Comments out and weeds example and test code to silence `R CMD check` with `_R_CHECK_DEPENDS_ONLY=true`
- Fixes small bug in `futuremice()` that throws an error when the number of cores is not specified, but the number of available cores is greater than the number of imputations.
- Solves a bug in `mice.impute.mpmm()` that changed the column order of the data
mice 3.15.0
Major changes
- Adds a function `futuremice()` with support for parallel imputation using the `future` package (#504). Contributed @thomvolker, @gerkovink
- Adds multivariate predictive mean matching `mice.impute.mpmm()`. (#460). Contributed @Mingyang-Cai
- Adds `convergence()` for convergence evaluation (#484). Contributed @hanneoberman
- Reverts the internal seed behaviour back to `mice 3.13.10` (#515). #432 introduced a new local seed in response to #426. However, various issues arose with this facility (#459, #492, #502, #505). This version restores the old behaviour using the global `.Random.seed`. Contributed @gerkovink
- Adds a `custom.t` argument to `pool()` that allows the advanced user to specify a custom rule for calculating the total variance $T$. Contributed @gerkovink
- Adds new argument `exclude` to `mice.impute.pmm()` that excludes a user-specified vector of values from matching. Excluded values will not appear in the imputations. Since the observed values are not imputed, the user-specified values are still being used to fit the imputation model (#392, #519). Contributed @gerkovink
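
For context on the total variance $T$ that `custom.t` customises, the sketch below computes the standard Rubin's rules quantities by hand in base R. It does not call `pool()`. The estimates and standard errors are hypothetical numbers invented for the illustration, and the variable names are assumptions as well.

```r
# Hypothetical estimates and standard errors from m = 5 completed-data analyses
est <- c(1.20, 1.35, 1.10, 1.28, 1.22)
se <- c(0.30, 0.32, 0.29, 0.31, 0.30)
m <- length(est)

qbar <- mean(est)   # pooled point estimate
ubar <- mean(se^2)  # within-imputation variance
b <- var(est)       # between-imputation variance

# Standard Rubin's rule for the total variance
t_total <- ubar + (1 + 1 / m) * b

c(estimate = qbar, ubar = ubar, b = b, t = t_total, se = sqrt(t_total))
```

A custom rule would replace only the `t_total` line, for example to weigh the between-imputation variance differently.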
Minor changes
- Styles all `.R` and `.Rmd` files
- Makes post-processing assignment consistent with lines 85/86 in `sampler.R` (#511)
- Edit test broken on R<4 (#501). Contributed @MichaelChirico
- Adds support for models reporting contrasts rather than terms (#498). Contributed @LukasWallrich
- Applies edits to autocorrelation function (#491). Contributed @hanneoberman
- Changes p-value calculation to more robust alternative (#494). Contributed @AndrewLawrence
- Uses `inherits()` to check on class membership
- Adds deprecation notices to `parlmice()`
- Adapt `prop`, `patterns` and `weights` matrices for pattern with only 1's
- Adds warning when patterns cannot be generated (#449, #317, #451)
- Adds warning on the order of model terms in `D1()` and `D2()` (#420)
- Adds example code to fit model on train data and apply to test data to `mice()`
- Adds example code on synthetic data generation and analysis in `make.where()`
- Adds testfile `test-mice.impute.rf.R` (#448)
Bug fixes
- Replaces `.Random.seed` reads from the `.GlobalEnv` by `get(".Random.seed", envir = globalenv(), mode = "integer", inherits = FALSE)`
- Repairs capitalisation problems with `lastSeedValue` variable name
- Solves `x$lastSeedValue` problem in `cbind.mids()` (#502)
- Fixes problems with `ampute()`
- Preserves stochastic nature of `mice()` by smarter random seed initialisation (#459)
- Repairs a `drop = FALSE` buglet in `mice.impute.rf()` (#447, #448)
- @str-amg reported that the new dependency on `withr` package should have version 2.4.0 (published in January 2021) or higher. Versions `withr 2.3.0` and before may give `Error: object 'local_seed' is not exported by 'namespace:withr'`. Either update manually, or install the patched version `mice 3.14.1` from GitHub. (#445). NOTE: `withr` is no longer needed in `mice 3.15.0`
mice 3.14.0
Major changes
- Adds four new univariate functions using the lasso for automatic variable selection:

Function | Description |
---|---|
`mice.impute.lasso.norm()` | Lasso linear regression |
`mice.impute.lasso.logreg()` | Lasso logistic regression |
`mice.impute.lasso.select.norm()` | Lasso selector + linear regression |
`mice.impute.lasso.select.logreg()` | Lasso selector + logistic regression |

Contributed by @EdoardoCostantini (#438).

- Adds Jamshidian & Jalal's non-parametric MCAR test, `mice::MCAR()` and associated plot method. Contributed by @cjvanlissa (#423).
- Adds two new functions `pool.syn()` and `pool.scalar.syn()` that specialise pooling estimates from synthetic data. The `"reiter2003"` pooling rule assumes that synthetic data were created from complete data. Thanks Thom Volker (#436).
- Avoids changing the global `.Random.seed` (#426, #432) by implementing `withr::local_preserve_seed()` and `withr::local_seed()`. This change provides stabler behavior in complex scripts. The change does not appear to break reproducibility when `mice()` was run with a seed. Nevertheless, if you run into a reproducibility problem, install `mice 3.13.12` or before.
- Improves the imputation of parabolic data in `mice.impute.quadratic()`, adds a parameter `quad.outcome` containing the name of the outcome variable in the complete-data model. Contributed @Mingyang-Cai, @gerkovink (#408)
- By default, `mice.impute.rf()` now uses the faster `ranger` package as back-end instead of the `randomForest` package. If you want the old behaviour, specify the `rfPackage = "randomForest"` argument to the `mice(...)` call. Contributed @prockenschaub (#431).
- Generalises `pool()` so that it processes the parameters from all `gamlss` sub-models. Thanks Marcio Augusto Diniz (#406, #405)
- Uses the robust standard error estimate for pooling when `pool()` can extract `robust.se` from the object returned by `broom::tidy()` (#310)
Bug fixes
- Contains an emergency solution as `install.on.demand()` broke the standard CRAN workflow. mice 3.14.0 does not call `install.on.demand()` anymore for recommended packages. Also, `install.on.demand()` will not run anymore in non-interactive mode.
- Repairs an error in the `mice:::barnard.rubin()` function for infinite `dfcom`. Thanks @huftis (#441).
- Solves problem with `Xi <- as.matrix(...)` in `mice.impute.2l.lmer()` that occurred when a cluster contains only one observation (#384)
- Edits the `predictorMatrix` to a monotone pattern if `visitSequence = "monotone"` and `maxit = 1` (#316)
- Solves a problem with the plot produced by `md.pattern()` (#318, #323)
- Fixes the intercept in `make.formulas()` (#305, #324)
- Fixes seed when using `newdata` in `mice.mids()` (#313, #325)
- Solves a problem with row names of the `where` element created in `rbind()` (#319)
- Solves a bug in mnar imputation routine. Contributed by Margarita Moreno Betancur.
Minor changes
- Replaces URL to jstatsoft with DOI
- Update reference to literature (#442)
- Informs the user that `pool()` cannot take a `mids` object (#433)
- Updates documentation for post-processing functionality (#387)
- Adds Rcpp necessities
- Solves a problem with "last resort" initialisation of factors (#410)
- Documents the "flat-line behaviour" of `mice.impute.2l.lmer()` to indicate a problem in fitting the imputation model (#385)
- Add reprex to test (#326)
- Documents that multivariate imputation methods do not support the `post` parameter (#326)
mice 3.13.0
Major changes
- The updated `mids2spss()` replaces the `foreign` package by the `haven` package. Contributed Gerko Vink (#291)
Minor changes
mice 3.12.0
Much faster predictive mean matching
- The new `matchindex` C function makes predictive mean matching 50 to 600 times faster. The speed of `pmm` is now on par with normal imputation (`mice.impute.norm()`) and with the `miceFast` package, without compromising on the statistical quality of the imputations. Thanks to Polkas (Polkas/miceFast#10) and suggestions by Alexander Robitzsch. See #236 for more details.
New `ignore` argument to `mice`
- New `ignore` argument to `mice()`. This argument is a logical vector of `nrow(data)` elements indicating which rows are ignored when creating the imputation model. We may use the `ignore` argument to split the data into a training set (on which the imputation model is built) and a test set (that does not influence the imputation model estimates). The argument is based on the suggestion in #32 (comment). See #32 for more background and techniques. Crafted by Patrick Rockenschaub
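
A minimal sketch of the train/test use of `ignore`, assuming the 25-row `nhanes` data that ships with `mice`. The split (first 20 rows for training, last 5 ignored) is arbitrary and chosen only for illustration, as are the object names.

```r
library(mice)

# Hypothetical split: rows flagged TRUE do not contribute to the imputation model
ignore <- c(rep(FALSE, 20), rep(TRUE, 5))
stopifnot(length(ignore) == nrow(nhanes))

imp <- mice(nhanes, m = 2, maxit = 2, ignore = ignore,
            printFlag = FALSE, seed = 1)

# Inspect the rows that were kept out of model fitting
test_rows <- complete(imp, 1)[ignore, ]
test_rows
```

In practice, the logical vector would come from an existing train/test indicator rather than from row positions.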
New `filter()` function for `mids` objects
- New `filter()` method that subsets a `mids` object (multiply-imputed data set). The method accepts a logical vector of length `nrow(data)`, or an expression to construct such a vector from the incomplete data. (#269). Crafted by Patrick Rockenschaub.
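
The following sketch shows one way to subset a `mids` object with an expression. It assumes that the `dplyr` package is installed and that the method is reached through the `dplyr::filter()` generic; the `nhanes` data and the condition `age >= 2` are arbitrary choices for illustration.

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 1, printFlag = FALSE, seed = 1)

# age is fully observed in nhanes, so the condition is defined for every row
imp_f <- dplyr::filter(imp, age >= 2)

# The filtered object keeps only the selected rows of the incomplete data
nrow(imp_f$data)
```

Because the expression is evaluated on the incomplete data, a condition on a variable with missing values can evaluate to `NA` for some rows, so a fully observed variable is the safer choice here.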
Changes affecting reproducibility
- Breaking change: The `matcher` algorithm in `pmm` has changed to `matchindex` for speed improvements. If you want the old behavior, specify `mice(..., use.matcher = TRUE)`.
Minor changes
- Corrected installation problem related to `cpp11` package (#286)
- Simplifies `with.mids()` by calling `eval_tidy()` on a quosure. Does not yet solve #265.
- Improve documentation for `pool()` and `pool.scalar()` (#142, #106, #190 and others)
- Makes `tidy.mipo` more flexible (#276)
- Solves a problem if `nelsonaalen()` gets a `tibble` (#272)
- Add explanation to how `NA`s can appear in the imputed data (#267)
- Add warning to `quickpred()` documentation (#268)
- Styles all source files with styler
- Improves consistency in code and documentation
- Moves internally defined functions to global namespace
- Solves bug in internal `sum.scores()`
- Adds deprecated messages to `lm.mids()`, `glm.mids()`, `pool.compare()`
- Removes `expandcov()`
- Strips out all `return()` calls placed just before end-of-function
- Remove all trailing spaces
- Repairs a bug in the routine for finding the `printFlag` value (#258)
- Update URLs after transfer to organisation `amices`
mice 3.11.0
Major changes
- The Cox model does not return `df.residual`, which caused problematic behavior in `D1()`, `D2()`, `D3()`, `anova()` and `pool()`. `mice` now extracts the relevant information from other parts of the objects returned by `survival::coxph()`, which solves long-standing issues with the integration of the Cox model (#246).
- Adds missing `Rcpp` dependency to work with `tidyr 1.1.1` (#248).
Minor changes
- Addresses warnings: `Non-file package-anchored link(s) in documentation object`.
- Updates on `ampute` documentation (#251).
- Ask user permission before installing a package from `suggests`.
mice 3.10.0
mice 3.9.0
Major changes
- The `D3()` function in `mice` gave incorrect results. This version solves a problem in the calculation of the `D3`-statistic. See #226 and #228 for more details. The documentation explains why results from `mice::D3()` and `mitml::testModels()` may differ.
- The `pool()` function is now more forgiving when there is no `glance()` function (#233)
- It is possible to bypass `remove.lindep()` by setting `eps = 0` (#225)
Minor changes
- Adds reference to Leacy's thesis
- Adds an example to the `plot.mids()` documentation
mice 3.8.0
Major changes
- This version adds two new NARFCS methods for imputing data under the Missing Not at Random (MNAR) assumption. NARFCS is a generalised version of the so-called delta-adjustment method. Margarita Moreno-Betancur and Ian White kindly contributed the functions `mice.impute.mnar.norm()` and `mice.impute.mnar.logreg()`. These functions aid in performing sensitivity analysis to investigate the impact of different MNAR assumptions on the conclusion of the study. An alternative for MNAR is the older `mice.impute.ri()` function.
- Installation of `mice` is faster. External packages needed for imputation and analyses are now installed on demand. The number of dependencies as estimated by `rsconnect::appDependencies()` decreased from 132 to 83.
- The name clash with the `complete()` function of `tidyr` should no longer be a problem.
- There is now a more flexible `pool()` function that integrates better with the `broom` and `broom.mixed` packages.
Bug fixes
- Deprecates `pool.compare()`. Use `D1()` instead (#220)
- Removes everything in `utils::globalVariables()`
- Prevents name clashes with `tidyr` by defining `complete.mids()` as an S3 method for the `tidyr::complete()` generic (#212)
- Extends the `pool()` function to deal with multiple sets of parameters. Currently supported keywords are: `term` (all `broom` functions), `component` (some `broom.mixed` functions) and `y.values` (for `multinom()` model) (#219)
- Adds a new `install.on.demand()` function for lighter installation
- Adds `toenail2` and removes dependency on `HSAUR3`
- Solves problem with `ampute` in extreme cases (#216)
- Solves problem with `pool` with `mgcv::gam` (#218)
- Adds `.gitattributes` for consistent line endings