Major changes
- Imputing categorical data by predictive mean matching. Predictive mean matching (PMM) is the default method of `mice()` for imputing numerical variables, but it has long been possible to impute factors. This enhancement introduces better support for categorical variables in PMM. The former system translated factors into integers by `ynum <- as.integer(f)`. However, the order of integers in `ynum` may have no sensible interpretation for an unordered factor. The new system quantifies `ynum` and could yield better results because of a higher $R^2$. The method calculates the canonical correlation between `y` (as dummy matrix) and a linear combination of imputation model predictors `x`. The algorithm then replaces each category of `y` by a single number taken from the first canonical variate. After this step, the imputation model is fitted, and the predicted values from that model are extracted to function as the similarity measure for the matching step.
- The method works for both ordered and unordered factors. No special precautions are taken to ensure monotonicity between the category numbers and the quantifications, so the method should be able to preserve quadratic and other non-monotone relations of the predicted metric. It may be beneficial to remove very sparsely filled categories, for which there is a new `trim` argument. All you have to do to use the new technique is specify `mice(..., method = "pmm", ...)`. Both numerical and categorical variables will then be imputed by PMM.
- Potential advantages are:
  - Simpler and faster than fitting a generalised linear model, e.g., logistic regression or the proportional odds model;
  - Should be insensitive to the order of categories;
  - No need to solve problems with perfect prediction;
  - Should inherit the good statistical properties of predictive mean matching.
- Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
- New system-independent method for pooling: This version introduces a new function `pool.table()` that takes a tidy table of parameter estimates stemming from `m` repeated analyses. The input data must consist of three columns (parameter name, estimate, standard error) and a specification of the degrees of freedom of the model fitted to the complete data. The `pool.table()` function outputs 14 pooled statistics in a tidy form. The primary use of `pool.table()` is to support parameter pooling for techniques that have no `tidy()` or `glance()` methods, either within `R` or outside `R`. The `pool.table()` function also allows for novel workflows that 1) break apart the traditional `pool()` function into a data-wrangling part and a parameter-reducing part, and 2) do not necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
- literanger: Adds support for the `literanger` package for `rf` imputation that is about twice as fast as `ranger` (#648). Thanks @stephematician for the contribution.
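The quantification step described in the first item of this list (each category of `y` replaced by a score from the first canonical variate) can be illustrated with base R's `cancor()`. This is a minimal sketch on simulated data, not the code used inside `mice`; the variable names, the simulated factor, and the choice to drop one dummy column are assumptions made for the example.

```r
# Sketch: quantify an unordered factor by the first canonical variate.
# Base R only, simulated data; NOT the implementation used inside mice.
set.seed(123)
n <- 300
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# A three-level factor related to x1, with levels stored in an arbitrary
# order so that as.integer(f) carries no sensible ordering.
f <- cut(x[, "x1"] + rnorm(n, sd = 0.5),
         breaks = c(-Inf, -0.5, 0.5, Inf),
         labels = c("low", "mid", "high"))
f <- factor(as.character(f), levels = c("mid", "high", "low"))

# Dummy matrix for y; one column is dropped to avoid collinearity.
ydummy <- model.matrix(~ f)[, -1, drop = FALSE]

# Canonical correlation between predictors and dummies
# (cancor() centers both sets of columns by default).
cc <- cancor(x, ydummy)

# One score per category: pass each level through the same dummy coding,
# centering, and first column of y-coefficients.
lev <- factor(levels(f), levels = levels(f))
levdummy <- model.matrix(~ lev)[, -1, drop = FALSE]
quant <- drop(sweep(levdummy, 2, cc$ycenter) %*% cc$ycoef[, 1])
names(quant) <- levels(f)

ynum_old <- as.integer(f)                  # former system: integer codes
ynum_new <- unname(quant[as.integer(f)])   # quantified categories

# Compare how well a linear model in x predicts each numeric version.
r2_old <- summary(lm(ynum_old ~ x))$r.squared
r2_new <- summary(lm(ynum_new ~ x))$r.squared

print(round(quant, 3))
cat("R^2 with integer codes:   ", round(r2_old, 3), "\n")
cat("R^2 with quantified codes:", round(r2_new, 3), "\n")
```

Because the first canonical variate is the linear combination of the dummies that is best predicted by `x`, the $R^2$ of the quantified version cannot be lower than that of any fixed integer coding on the same data. In this sketch the scores are estimated on complete simulated data; how `mice` handles the observed/missing split and the `trim` argument is not shown here.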
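The pooling workflow described for `pool.table()` starts from a plain table of estimates and standard errors rather than from classed model objects. The sketch below shows the core reduction step (Rubin's rules) by hand in base R, so it runs without `mice`. The helper `pool_one()` is hypothetical and not part of `mice`, the numbers are made up, and the column names `term`, `estimate` and `std.error` are an assumption following `broom` conventions.

```r
# Tidy table: one row per parameter per repeated analysis (m = 3 here).
# The values are invented for illustration only.
tab <- data.frame(
  term      = rep(c("(Intercept)", "age"), times = 3),
  estimate  = c(10.2, 0.51, 9.8, 0.47, 10.5, 0.55),
  std.error = c(1.10, 0.08, 1.05, 0.09, 1.20, 0.07)
)

# Hypothetical helper (not a mice function): Rubin's rules for one parameter.
pool_one <- function(estimate, std.error) {
  m    <- length(estimate)
  qbar <- mean(estimate)           # pooled estimate
  ubar <- mean(std.error^2)        # within-imputation variance
  b    <- var(estimate)            # between-imputation variance
  t    <- ubar + (1 + 1 / m) * b   # total variance
  data.frame(m = m, estimate = qbar, ubar = ubar, b = b, t = t,
             std.error = sqrt(t))
}

# Data-wrangling part: split by parameter. Reducing part: pool each piece.
pooled <- do.call(rbind, lapply(split(tab, tab$term), function(d) {
  cbind(term = d$term[1], pool_one(d$estimate, d$std.error))
}))
rownames(pooled) <- NULL
print(pooled)

# With a recent mice installed, a call along the lines of
# mice::pool.table(tab) would presumably return the fuller set of pooled
# statistics; the exact column requirements are an assumption here.
```

The table `tab` could just as well be read from a CSV file produced by software outside R, which is the system-independent use case the item describes. Degrees of freedom and the remaining pooled statistics are left out of this sketch.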
Breaking changes
- The `complete(..., action = "long", ...)` command puts the columns named `".imp"` and `".id"` in the last two positions of the long data (instead of the first two positions). In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with. Note that any existing code that assumes that variables `".imp"` and `".id"` are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names `".imp"` and `".id"`. If you want the old behaviour, specify the argument `order = "first"`. (#569). Contributed @stefvanbuuren
- Drops support for S4. Converts S4-related code to S3. The syntax `as(df, "mids")` is deprecated. Use `as.mids(df)` instead.
- Adopts the `broom` convention for naming lower and upper bounds of the confidence interval as `"conf.low"` and `"conf.high"`. Do not use non-syntactic names anymore, like `"2.5 %"`.
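The advice in the first item above, to address `".imp"` and `".id"` by name rather than by position, can be illustrated with a small stand-in data frame. This is a sketch in base R: `long_new` and `long_old` are hand-made imitations of the two column layouts, not actual `complete()` output, and `get_index()` is a hypothetical helper.

```r
# Stand-in for long-format data under the new layout (.imp and .id last).
long_new <- data.frame(
  age  = c(1, 2, 1, 2),
  bmi  = c(22.1, 30.0, 24.3, 30.0),
  .imp = c(1, 1, 2, 2),
  .id  = c(1, 2, 1, 2)
)

# The same data under the old layout (.imp and .id first).
long_old <- long_new[, c(".imp", ".id", "age", "bmi")]

# Fragile: long[, 1:2] returns different columns in the two layouts.
# Robust: select the bookkeeping columns by name.
get_index <- function(long) long[, c(".imp", ".id")]
stopifnot(isTRUE(all.equal(get_index(long_new), get_index(long_old))))

# Name-based code gives the same result for either layout,
# e.g. the mean of bmi within each imputation.
means_new <- tapply(long_new$bmi, long_new$.imp, mean)
means_old <- tapply(long_old$bmi, long_old$.imp, mean)
print(means_new)

# To keep the old column order in real code, the item above suggests
# complete(imp, action = "long", order = "first"), where imp is a mids object.
```

Code written this way should not need further changes if the column order is switched back with `order = "first"`.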
Minor changes
- Adds support for the `dots` argument to `ranger::ranger(...)` in `mice.impute.rf()` (#563). Contributed @edbonneville
- Prepares for the deprecation of the `blocks` argument at various places
- Removes the need for `blocks` in `initialize_chain()`
- In `rbind()`, when formulas are concatenated and duplicate names are found, also renames the duplicated variables in formulas by their new name
- Solves a problem with the package documentation link
- Simplifies `NEWS.md` formatting to get the correct version sequence on CRAN and in the in-package NEWS
- Initializes single-variable blocks in `make.method()` in a more efficient way (resolves #672)
- Prevents `as.mids()` from filling the `imp` object for complete variables
- Defines S3 class constructors for `mids`, `mads`, `mira` and `mipo` objects
Bug fixes
- Fixes the "large logo" problem. (#574). Contributed @hanneoberman
- Patches a bug in `complete()` that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of `rbind()`, where the first set of rows was imputed and the second was not).
- Replaces the internal variable `type` by the more informative `pred` (currently active row of `predictorMatrix`)
- Fixes a bug in `filter.mids()` that incorrectly removed empty components in the `imp` object
- Fixes a bug in `ibind()` that incorrectly used `length(blocks)` as the first dimension of the `chainMean` and `chainVar` objects
- Corrects the description of the `visitSequence`, `chainMean` and `chainVar` components of the `mids` object
- Fixes problems with zero predictors (#588)
- Fixes a problem with the `minpuc` argument in `quickpred()` (#634)
- Fixes `coef() not available on S4 object` when used with `lavaan` (#615, #616)
- Adds `.github/dependabot.yml` configuration to automate daily check (#598)
- Updates documentation tags to `roxygen2 7.3.1` requirements
- Repairs lost braces in the documentation
- Fixes an installation problem when `Rprofile` prints to `stdout` on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
- Fixes a bug during initialization of factor values
- Removes `methods` and `rlang` from `Depends`
- Removes export of non-user facing `ampute()` helpers
- Clears `\link` statements that do not pass CRAN checks