Glossary of terms introduced throughout the course

(constantly updated)

  • features = observables = 'branches' (in ROOT terminology) = 'columns' of a table
  • in supervised learning x_i is a vector of features and y_i is the target (the quantity we want to reconstruct); see the sketch after this list
    • optionally we may have sample weights w_i = the 'cost of a mistake' when predicting this sample
  • the most frequent problems in supervised learning are classification and regression
  • in unsupervised learning we are left with only the vectors x_i
  • kNN = k nearest neighbours
  • ρ (rho) denotes distance in the space of features
  • p denotes probability, p_{+1}(x) = p(y = +1 | x) is the probability of x belonging to class +1
  • < a, b > is the dot product
  • < f(z) >_{over some set of z} is the average of f(z) over the specified set of z
  • η (eta) is the learning rate, a.k.a. shrinkage (not to be confused with pseudorapidity!)
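
To make the notation above concrete, here is a minimal sketch; the toy data, numbers, and variable names are invented for illustration and are not part of the course materials:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# feature matrix: each row is a feature vector x_i, each column a feature ('branch')
X = np.array([[0.1, 1.0], [0.2, 0.8], [0.3, 1.1],
              [2.0, 0.1], [2.1, 0.3], [1.9, 0.2]])
y = np.array([+1, +1, +1, -1, -1, -1])          # targets y_i
w = np.array([1.0, 1.0, 2.0, 1.0, 1.0, 1.0])    # sample weights w_i (not every model accepts them)

rho = np.linalg.norm(X[0] - X[3])               # rho: Euclidean distance in the space of features

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # kNN = k nearest neighbours
proba = knn.predict_proba([[0.15, 0.9]])              # columns ordered as knn.classes_ = [-1, +1]
p_plus = proba[0, 1]                                  # p_{+1}(x) = p(y = +1 | x)
```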

Things specific to this course

  • for binary classification the target is taken to be y_i = +1 (signal) or y_i = -1 (background)
  • classifiers / regressors have an intermediate step - a decision function, which is denoted as d(x) for simple models.
    • Example: d(x) = < w, x >
  • for ensembles D(x) is used as the notation.
    • Example: D(x) = \sum_j d_j(x) - simply summing the decisions of the weak learners
  • ℒ ('beautiful' calligraphic L) is the loss, the quantity we minimize in the algorithm. Typically this is an (upper) estimate of our risk, chosen to have nice optimization properties (see the sketch after this list).
  • Regularizations are a way to effectively restrict the parameter combinations considered during optimization
    • typically we add L1, L2 or mixed regularization.
    • (those are very nice)
    • used to prevent overfitting (see below)
  • w (if vector), W (if matrix) are parameters of ML models to be optimized (see previous point).
    • called parameters of a model or weights of a model.
    • conflict of notations: w_i are sample weights, because i indexes samples in the data
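
A minimal sketch of the notation in this section (toy numbers and helper names are invented for illustration, not taken from the course code):

```python
import numpy as np

X = np.array([[0.5, 1.2], [1.5, -0.3], [-0.7, 0.8]])   # samples x_i
y = np.array([+1, -1, +1])                              # targets y_i in {+1, -1}
w = np.array([0.4, -0.2])                               # model parameters ('weights' of the model)

def d(x, w):
    """Decision function of a simple linear model: d(x) = <w, x>."""
    return np.dot(w, x)

def D(x, weak_learners):
    """Ensemble decision: D(x) = sum_j d_j(x), summing the decisions of the weak learners."""
    return sum(d_j(x) for d_j in weak_learners)

def loss(w, X, y, reg=0.1):
    """Loss: here a logistic (upper) estimate of the risk plus an L2 regularization term on w."""
    margins = y * X.dot(w)                              # y_i * d(x_i) for every sample
    return np.mean(np.log1p(np.exp(-margins))) + reg * np.dot(w, w)

current_loss = loss(w, X, y)                            # the quantity an optimizer would minimize
```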

Some general things

  • The process of using ML is split into
    • training = fitting = learning
    • and predicting or transforming
  • cross-validation is a process for getting a reliable estimate of quality
    • in the simplest case, we estimate the quality on a separate holdout - a part of the data not used in training.
  • linear models use a linear decision function d(x) = < w, x >
  • generalized linear models, e.g. SVM, use kernel functions (which are dot products in some transformed space)
  • a decision tree operates by checking splits, e.g. mass > 3.7
    • a split is described by the feature used (mass) and the threshold (3.7)
  • a decision tree has pre-stopping conditions and can be post-pruned (simplified) after it has been trained
  • bagging and subsampling are sampling of the training data with / without replacement, respectively
  • RSM (random subspace method) subsamples features
  • bagging is used to take subsets of samples in RandomForest
  • overfitting - a rather vague term in ML denoting problems with the trained model.
    • see lectures for more details
    • if you use the term overfitting, you'd better explain directly what you mean by it
  • Gradient Boosting is a general technique, but typically we run it over decision trees (a short code sketch covering these general terms follows this list)
    • GB = GBDT = GBRT = GBM ~= MART are all typically used as names for gradient boosting over decision trees
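
As a wrap-up, a short scikit-learn sketch covering several of the terms above (the dataset is synthetic and the parameter values are arbitrary illustrations, not recommendations from the course):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 3))                   # feature matrix
y = (X[:, 0] > 0.5).astype(int)                  # toy target

# holdout: a part of the data not used in training, kept for quality estimation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# decision tree: checks splits like 'mass > 3.7'; max_depth acts as a pre-stopping condition
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)   # training = fitting = learning
print('tree holdout accuracy:', tree.score(X_test, y_test))        # predicting on the holdout

# gradient boosting over decision trees (GB = GBDT = GBRT = GBM);
# learning_rate is the shrinkage eta, subsample < 1 draws samples without replacement
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, subsample=0.8)
gb.fit(X_train, y_train)
print('GBDT holdout accuracy:', gb.score(X_test, y_test))
```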