mice 3.17.0

@stefvanbuuren released this 27 Nov 19:24

Major changes

  • Imputing categorical data by predictive mean matching. Predictive mean matching (PMM) is the default method of mice() for imputing numerical variables, and it has long been possible to use it for factors as well. This enhancement improves the support for categorical variables in PMM. The former system translated factors into integers by ynum <- as.integer(f). However, the order of the integers in ynum may have no sensible interpretation for an unordered factor. The new system quantifies ynum and could yield better results because of a higher $R^2$. The method calculates the canonical correlation between y (as a dummy matrix) and a linear combination of the imputation model predictors x. The algorithm then replaces each category of y by a single number taken from the first canonical variate. After this step, the imputation model is fitted, and the predicted values from that model serve as the similarity measure for the matching step.

  • The method works for both ordered and unordered factors. No special precautions are taken to ensure monotonicity between the category numbers and the quantifications, so the method should be able to preserve quadratic and other non-monotone relations of the predicted metric. It may be beneficial to remove very sparsely filled categories, for which there is a new trim argument. To use the new technique, all you have to do is specify mice(..., method = "pmm", ...). Both numerical and categorical variables will then be imputed by PMM.

  • Potential advantages are:

    • Simpler and faster than fitting a generalised linear model, e.g., logistic regression or the proportional odds model;
    • Should be insensitive to the order of categories;
    • No need to solve problems with perfect prediction;
    • Should inherit the good statistical properties of predictive mean matching.
  • Note that we still lack solid evidence for these claims. (#576). Contributed @stefvanbuuren
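The quantification step described above can be illustrated with a minimal base-R sketch. It is not the internal mice code: the simulated data and the variable names (latent, ydummy, yscore, quant) are assumptions made for illustration, and stats::cancor() stands in for whatever routine mice uses internally.

```r
# Sketch: replace each category of a factor by one number taken from the
# first canonical variate between its dummy matrix and the predictors.
set.seed(42)
n <- 300
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# Simulated three-category outcome that depends on the predictors
latent <- x[, "x1"] - 0.5 * x[, "x2"] + rnorm(n)
f <- cut(latent,
         breaks = quantile(latent, c(0, 1 / 3, 2 / 3, 1)),
         labels = c("low", "mid", "high"),
         include.lowest = TRUE)

# Shuffle the level order so that as.integer(f) has no sensible meaning
f <- factor(f, levels = c("mid", "high", "low"))

# Dummy matrix of y (first level acts as reference to keep full rank)
ydummy <- model.matrix(~ f)[, -1, drop = FALSE]

# Canonical correlation between the predictors and the dummy matrix
cc <- cancor(x, ydummy)

# Scores on the first canonical variate of y; constant within a category
yscore <- drop(sweep(ydummy, 2, cc$ycenter) %*% cc$ycoef[, 1])

# One number per category, then the quantified version of the factor
quant <- tapply(yscore, f, mean)
ynum <- unname(quant[as.character(f)])

quant                     # the quantification of each category
cor(as.integer(f), latent) # integer coding, depends on level order
cor(ynum, latent)          # quantified coding (sign is arbitrary)
```

In this sketch the quantified values no longer depend on the order in which the levels happen to be stored, which is the property the release note refers to. The matching step of PMM itself is not shown.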

  • New system-independent method for pooling: This version introduces a new function pool.table() that takes a tidy table of parameter estimates stemming from m repeated analyses. The input data must consist of three columns (parameter name, estimate, standard error) and a specification of the degrees of freedom of the model fitted to the complete data. The pool.table() function outputs 14 pooled statistics in a tidy form. The primary use of pool.table() is to support parameter pooling for techniques that have no tidy() or glance() methods, either within R or outside R. The pool.table() function also allows for novel workflows that 1) break apart the traditional pool() function into a data-wrangling part and a parameter-reducing part, and 2) do not necessarily depend on classed R objects. (#574). Contributed @stefvanbuuren
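A possible pool.table() workflow is sketched below. The column names term, estimate and std.error and the dfcom argument are assumptions based on the broom naming convention and the package documentation; check ?pool.table for the exact requirements. The table is assembled by hand here to mimic estimates that could just as well come from software outside R.

```r
library(mice)

# Impute and analyse each completed data set separately
imp <- mice(nhanes, m = 3, maxit = 2, seed = 1, printFlag = FALSE)
fits <- lapply(seq_len(imp$m), function(i) {
  lm(chl ~ bmi, data = complete(imp, i))
})

# Data-wrangling part: stack the estimates into one tidy table
# (column names are an assumption, see the text above)
tbl <- do.call(rbind, lapply(fits, function(fit) {
  cf <- summary(fit)$coefficients
  data.frame(term = rownames(cf),
             estimate = cf[, "Estimate"],
             std.error = cf[, "Std. Error"],
             row.names = NULL)
}))

# Parameter-reducing part: pool the stacked estimates, supplying the
# residual degrees of freedom of the complete-data model
pooled <- pool.table(tbl, dfcom = df.residual(fits[[1]]))
pooled
```

Because the input is a plain data frame, the same call should work for a table read from a CSV file produced by another system.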

  • literanger: Adds support for the literanger package for rf imputation that is about twice as fast as ranger (#648). Thanks @stephematician for the contribution.

Breaking changes

  • The complete(..., action = "long", ...) command puts the columns named ".imp" and ".id" in the last two positions of the long data (instead of first two positions). In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with. Note that any existing code that assumes that variables ".imp" and ".id" are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names ".imp" and ".id". If you want the old behaviour, specify the argument order = "first". (#569). Contributed @stefvanbuuren
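As a hedged illustration of the migration advice, the sketch below selects columns by name rather than by position and shows the order = "first" escape hatch mentioned above. The object names (long, long_old, first_imp) are made up for the example.

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 1, seed = 1, printFlag = FALSE)

# New default layout: original columns first, ".imp" and ".id" last
long <- complete(imp, action = "long")
names(long)

# Robust code refers to ".imp" and ".id" by name, not by position
first_imp <- long[long$.imp == 1, names(nhanes)]

# Old layout, with ".imp" and ".id" in the first two columns
long_old <- complete(imp, action = "long", order = "first")
names(long_old)
```

Code written this way should give the same result under both the old and the new column order.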

  • Drops support for S4 and converts S4-related code to S3. The syntax as(df, "mids") is deprecated. Use as.mids(df) instead.

  • Adopts the broom-convention for naming lower and upper bounds of the confidence interval as "conf.low" and "conf.high". Do not use non-syntactic names anymore, like "2.5 %".
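A small sketch of what the renaming means for downstream code, assuming that summary() of a pooled object with conf.int = TRUE is one of the affected outputs:

```r
library(mice)

imp <- mice(nhanes, m = 3, maxit = 2, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ bmi))

# Pooled estimates with confidence intervals
est <- summary(pool(fit), conf.int = TRUE)

# Syntactic names can be used directly, without backticks
est[, c("term", "estimate", "conf.low", "conf.high")]
width <- est$conf.high - est$conf.low
```

Code that previously accessed the bounds as est$`2.5 %` and est$`97.5 %` would need to switch to the new names.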

Minor changes

  • Adds support for the dots argument to ranger::ranger(...) in mice.impute.rf() (#563). Contributed @edbonneville
  • Prepares for the deprecation of the blocks argument at various places
  • Removes the need for blocks in initialize_chain()
  • In rbind(), when formulas are concatenated and duplicate names are found, also rename the duplicated variables in formulas by their new name
  • Solves problem with the package documentation link
  • Simplifies NEWS.md formatting to get correct version sequence on CRAN and in-package NEWS
  • Initializes single-variable blocks in make.method() in a more efficient way (resolves #672)
  • Prevents as.mids() from filling the imp object for complete variables
  • Defines S3 class constructors for mids, mads, mira and mipo objects

Bug fixes

  • Fixes the "large logo" problem. (#574). Contributed @hanneoberman
  • Patches a bug in complete() that auto-repeated imputed values into cells that should NOT be imputed (occurred as a special case of rbind(), where the first set of rows was imputed and the second was not).
  • Replaces the internal variable type by the more informative pred (currently active row of predictorMatrix)
  • Fixes a bug in filter.mids() that incorrectly removed empty components in the imp object
  • Fixes a bug in ibind() that incorrectly used length(blocks) as the first dimension of the chainMean and chainVar objects
  • Corrects the description of the visitSequence, chainMean and chainVar components of the mids object
  • Fixes problems with zero predictors (#588)
  • Fixes a problem with the minpuc argument in quickpred() (#634)
  • Fixes coef() not available on S4 object when using with lavaan (#615, #616)
  • Adds .github/dependabot.yml configuration to automate daily check (#598)
  • Updates documentation tags to roxygen2 7.3.1 requirements
  • Repairs lost braces in the documentation
  • Fixes an installation problem when Rprofile prints to stdout on Fedora, R version 4.1.3 (#646, #647). Thanks @brookslogan for the fix.
  • Fixes a bug during initialization of factor values
  • Removes methods and rlang from Depends
  • Removes export of non-user facing ampute() helpers
  • Clears \link statements that do not pass CRAN checks