If something is missing or not clear - please chat with us on our slack?
Fix CRAN warnings. No new functionality.
Fix CRAN tests. No new functionality.
- fixed conditional/joint distribution confusion in the documentation of information_gain #85
- turned off the OpenMP parallel backend in information_gain function - I could cause segfault sometimes. Better safe than sorry.
- fix parameters mismatch in init.c
- remove strip debug info.
The purpose of this release is to make more compatible with FSelector
package. We changed some of the default behaviors, so the results might be different between version 2.*
and 3.0
.
information_gain
anddiscretize
get new parameterdiscIntegers
to control if integer columns should be discretized. Default value isTRUE
, so it means that they're treated like numerics. For more information please refer tovignette("integer-variables", package = "FSelectorRcpp")
.- FSelectorRcpp tries to mimic the behavior of the FSelector when there are NAs in the dependent variable passed to
information_gain
(remove only those rows which contain NAs dependent variable, NAs in independents variables are removed column-wise).
- Changed
discretize
argumentall
toTRUE
. - Added
customBreaksControl
for creating custom breaks indiscretize
function. discretize
can be now evaluated with data as a first argument in the formula interface- You can now use
discretize(iris, Species ~ .)
ordiscretize(Species ~ ., iris)
. discretize(iris, Species ~ .)
seems to be more pipe friendly.
- You can now use
discretize_transform
allows applying the discretization cut points to the new data set.extract_discretize_transformer
produces small object containing all cutpoints. It can be also used to transform the new data set.extract_discretize_transformer
can be useful in ML pipelines where the training data needs to be discarded to save memory.
- Removed support for the doSnow progress bar. We now use doParallel in our examples, which is more recommended than doSnow.
Bug fixes:
- Fixed build using Rcpp 0.12.12
- feature_search now returns proper structure.
Bug fixes:
- Export C++ routine for tests.
- Use dontrun to hide problematic example (for CRAN).
Bug fixes:
- Removed problematic links in vignettes.
Bug fixes:
- Skip benchmark if the
RTCGA.rnaseq
package is not available. - Minor fixes in vignettes.
Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with sparse matrix support.
Provided functions
discretize()
with additionalequalsizeControl()
andmdlControl
- discretize a range of numeric attributes in the dataset into nominal attributes. Minimum Description Length (MDL) method is set as the default control. There is also availableequalsizeControl()
method.information_gain()
- algorithms that find ranks of importance of discrete attributes, basing on their entropy with a continous class attribute,feature_search()
- a convenience wrapper for \code{greedy} and \code{exhaustive} feature selection algorithms that extract valuable attributes depending on the evaluation method (called evaluator),cut_attrs()
- select attributes by their score/rank/weights, depending on the cutoff that may be specified by the percentage of the highest ranked attributes or by the number of the highest ranked attributes,to_formula()
(misc) - create aformula
object from a vector.