Machine learning terms; cheat sheet
jzkyu committed Apr 9, 2024
1 parent 9019a9f commit 87a0ee6
Showing 16 changed files with 89 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/Public/Math/LaTeX cheat sheet.md
Original file line number Diff line number Diff line change
@@ -49,9 +49,10 @@
`\begin{figure}[h]`: Starts a figure

`\circ`: $\circ$

`\Sigma`: $\Sigma$

`\overline{ab}`: $\overline{ab}$

`\infty`: $\infty$

`\sqrt{}`: $\sqrt{}$
15 changes: 15 additions & 0 deletions docs/Public/Math/Statistics/correlation.md
@@ -0,0 +1,15 @@
*See: [[covariance]]*

*Correlation* measures the linear relationship between two variables ([[variable]]) $X$ and $Y$ using the Pearson correlation coefficient, the most commonly used measure of correlation for [[continuous]] variables.

$$
\rho_{X,Y} = \frac{COV[X, Y]}{\sigma_{X}\sigma_{Y}}
$$
$$
\sigma_X=\sqrt{\frac{\sum_{i=1}^n\left(X_i-\bar{X}\right)^2}{n}}
$$
$$
\sigma_Y=\sqrt{\frac{\sum_{i=1}^n\left(Y_i-\bar{Y}\right)^2}{n}}
$$

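*Correlation* measures both the direction and the magnitude of the linear relationship ([[linear regression]]): $\rho_{X,Y}$ ranges from $-1$ (perfect decreasing linear relationship) to $1$ (perfect increasing linear relationship), with $0$ indicating no linear relationship.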
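
The formulas above can be sketched in plain Python. This is a minimal illustration using the population form (dividing by $n$) from this note; the function name `pearson` and the sample data are assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Population Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average product of the deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations of X and Y.
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)

xs = [1.0, 2.0, 3.0, 4.0]
rho_up = pearson(xs, [2.0, 4.0, 6.0, 8.0])    # points on an increasing line
rho_down = pearson(xs, [8.0, 6.0, 4.0, 2.0])  # points on a decreasing line
```

Here `rho_up` comes out at (approximately) $1$ and `rho_down` at $-1$, up to floating-point rounding. The function divides by zero if either variable is constant, which this sketch does not guard against.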
15 changes: 15 additions & 0 deletions docs/Public/Math/Statistics/covariance.md
@@ -0,0 +1,15 @@
*See: [[correlation]]*

*Covariance* is a measure of how two variables ([[variable]]) $X$ and $Y$ change together linearly. Unlike [[correlation]], it is not normalized, so its value depends on the units of $X$ and $Y$.

Given a population of size $n$, it is calculated as follows:

$$
COV[X, Y] = E[(X - E[X])(Y - E[Y])] = \frac{\sum_{i=1}^n (X_i-\overline{X})(Y_i-\overline{Y})}{n}
$$
The sign of the *covariance* indicates the direction of the relationship between variables:
- when $COV[X, Y] > 0$, $X$ and $Y$ tend to increase and decrease together.
- when $COV[X, Y] < 0$, $X$ tends to decrease while $Y$ tends to increase and vice versa.
- when $COV[X, Y] = 0$, $X$ and $Y$ show neither of the above tendencies: there is no linear relationship between them.

Note: *Covariance* indicates only the direction of the relationship, not its strength, because its magnitude depends on the scale of the variables.
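
A small sketch of why the magnitude is hard to interpret: rescaling one variable (for example, changing its units) rescales the covariance, while the sign stays the same. The function name `covariance` and the data are assumptions made for illustration.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

cov_original = covariance(xs, ys)
# Same data with X expressed in units 100 times smaller.
cov_rescaled = covariance([100 * x for x in xs], ys)
# Reversing Y turns the increasing relationship into a decreasing one.
cov_reversed = covariance(xs, ys[::-1])
```

With this data `cov_original` is 2.5 and `cov_rescaled` is 250, even though the relationship is equally strong in both cases; `cov_reversed` is negative.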
7 changes: 7 additions & 0 deletions docs/Public/Software/AI/bias.md
@@ -0,0 +1,7 @@
*Bias* is the error between average model prediction (i.e. $E[g(x)]$) and the ground truth $f(x)$.

*Bias* indicates the training error.

$$
Bias^2 = E[(E[g(x)] - f(x))^2]
$$
7 changes: 7 additions & 0 deletions docs/Public/Software/AI/coefficient of determination.md
@@ -0,0 +1,7 @@
The *coefficient of determination*, or $R^2$, measures the strength of the relationship between the [[independent variable]] (inputs) and the [[dependent variable]] (outputs).

It indicates the goodness of the fit of a [[linear regression]].
$$
0.0 \le R^2 \le 1.0
$$
Note: $R^2$ can increase as more predictors are added to a model, which can give an illusion of improvement.
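
As a rough sketch, $R^2$ is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The note does not give this formula, so the definition used here, along with the names `r_squared`, `ss_res`, and `ss_tot`, is an assumption made for the example.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for observed values and model predictions."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)        # predictions match exactly
baseline = r_squared(y_true, [2.5] * 4)    # always predicting the mean
```

A perfect fit gives `perfect == 1.0`, and a model that only ever predicts the mean gives `baseline == 0.0`, which matches the two ends of the range above.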
3 changes: 3 additions & 0 deletions docs/Public/Software/AI/cross validation.md
@@ -0,0 +1,3 @@
*Cross validation* is used to compare models and prevent overfitting.

It is a resampling procedure that helps the model generalize well.
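
One common form of this resampling is k-fold cross validation, where the data is split into $k$ parts and each part takes a turn as the validation set. The note does not name a specific scheme, so k-fold and the helper `k_fold_indices` below are assumptions chosen for illustration; the sketch also skips shuffling, which is usually done first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(6, 3))
```

Each sample appears in exactly one validation set, so every data point is used both for training and for evaluation. A model would be fitted on each `train` split and scored on the matching `validation` split, and the scores averaged.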
6 changes: 6 additions & 0 deletions docs/Public/Software/AI/curve fitting.md
@@ -0,0 +1,6 @@
*Curve fitting* refers to the process of determining a [[function]] that best approximates the relationship between some [[independent variable]] and some [[dependent variable]].

We want to minimize the overall error.

*Under-fitting* is when a model fails to capture the complex underlying patterns in the data.
- Bias-variance trade-off: if a model has high bias and low variance, the model under-fits the data.
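
A minimal sketch of curve fitting for the simplest case, a straight line fitted by ordinary least squares (minimizing the sum of squared errors). The choice of a line, the error measure, and the name `fit_line` are assumptions made for the example, not something the note prescribes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = sum of co-deviations / sum of squared x deviations.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

For these points the fit recovers a slope of 2 and an intercept of 1. A straight line fitted to strongly curved data would be an example of under-fitting.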
3 changes: 3 additions & 0 deletions docs/Public/Software/AI/dependent variable.md
@@ -0,0 +1,3 @@
A *dependent variable* is a [[variable]] which depends on changes in the [[independent variable]]. [^1] In other words, it is the effect of the change.

[^1]: https://www.scribbr.com/methodology/independent-and-dependent-variables/
12 changes: 12 additions & 0 deletions docs/Public/Software/AI/error.md
@@ -0,0 +1,12 @@
In [[machine learning]], the *error* of a model is:

$$
Bias^2 + Variance + \text{Irreducible Error}
$$
where $Bias^2$ is

![bias](bias)

and $Variance$ is

![variance](variance)
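
A sketch of where the three terms come from, assuming squared-error loss and noisy observations $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has zero mean and variance $\sigma^2$ and is independent of the fitted model $g$ (the symbols $y$, $\varepsilon$, and $\sigma^2$ are assumptions, not taken from the notes). At a fixed input $x$, with the expectation taken over training sets and noise:

$$
E[(y - g(x))^2] = (E[g(x)] - f(x))^2 + E[(g(x) - E[g(x)])^2] + \sigma^2
$$

The first term is $Bias^2$, the second is $Variance$, and $\sigma^2$ is the irreducible error, which no choice of model can remove. Averaging over $x$ gives the overall error.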
3 changes: 3 additions & 0 deletions docs/Public/Software/AI/independent variable.md
@@ -0,0 +1,3 @@
An *independent variable* is a [[variable]] that stands alone and isn't changed by the other variables you're trying to measure. [^1] In other words, it causes change.

[^1]: https://nces.ed.gov/nceskids/help/user_guide/graph/variables.asp
5 changes: 5 additions & 0 deletions docs/Public/Software/AI/mean squared error.md
@@ -0,0 +1,5 @@
The *mean squared error*, or $MSE$, is a way to measure the fit of the model on training data.

It can also be used to measure the fit of the model on the test data.

Note: unlike $R^2$ on the training data, $MSE$ measured on held-out test data doesn't automatically improve with more predictors, making it a more "honest" metric when tweaking model complexity.
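
The note does not spell out the formula; the usual definition is the average of the squared differences between observed and predicted values, $MSE = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$. The symbols $y_i$ and $\hat{y}_i$ and the function name below are assumptions made for this sketch.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# Two predictions are exact, one is off by 2, so MSE = (0 + 0 + 4) / 3.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The same function applies to training and test data alike; only the set of samples passed in changes.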
3 changes: 3 additions & 0 deletions docs/Public/Software/AI/variable.md
@@ -0,0 +1,3 @@
A *variable* is any characteristic that can take on multiple values, such as height, age, temperature, and score. [^1]

[^1]: https://www.scribbr.com/methodology/independent-and-dependent-variables/
1 change: 1 addition & 0 deletions docs/Public/Software/Networks/IP address.md
@@ -0,0 +1 @@
An *IP address* (Internet Protocol address) is an [[address]] primarily used to identify a device connected to a [[network]] using the Internet Protocol.
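
For a concrete feel, Python's standard-library `ipaddress` module can parse and inspect addresses. The address and network below are arbitrary example values from a private range, chosen only for illustration.

```python
import ipaddress

addr = ipaddress.ip_address("192.168.1.10")
net = ipaddress.ip_network("192.168.1.0/24")

version = addr.version         # 4 for IPv4, 6 for IPv6
in_network = addr in net       # True if the address belongs to the network
is_private = addr.is_private   # True for addresses reserved for private use
```

Passing a malformed string such as `"999.1.1.1"` to `ipaddress.ip_address` raises a `ValueError`, which makes the module handy for validation.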
3 changes: 3 additions & 0 deletions docs/Public/Software/Networks/MAC address.md
@@ -0,0 +1,3 @@
A *MAC address* (Media Access Control address) is an [[address]] primarily used as a unique identifier assigned to a piece of [[hardware]].

It is a 12-digit hexadecimal number assigned to each device connected to the [[network]].
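
Twelve hexadecimal digits are usually written as six pairs separated by colons or hyphens. The sketch below normalizes such strings to one form; the helper name `normalize_mac`, the accepted separators, and the example address are assumptions made for illustration.

```python
import re

def normalize_mac(mac):
    """Return a MAC address as lowercase colon-separated pairs."""
    # Drop the common separators, then require exactly 12 hex digits.
    digits = re.sub(r"[-:.]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 12-digit hexadecimal MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

mac = normalize_mac("00-1A-2B-3C-4D-5E")
```

Here `mac` becomes `"00:1a:2b:3c:4d:5e"`. Inputs with the wrong length or non-hexadecimal characters raise a `ValueError`.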
1 change: 1 addition & 0 deletions docs/Public/Software/Networks/address.md
@@ -0,0 +1 @@
An *address* is a way to identify where some [[hardware]] or [[software]] is located.
3 changes: 3 additions & 0 deletions docs/Public/Software/Networks/network.md
@@ -0,0 +1,3 @@
A *network* is a [[set]] of computers ([[hardware]]) sharing resources (like printers) and exchanging [[data]] with each other. [^1]

[^1]: https://fcit.usf.edu/network/chap1/chap1.htm
