afternoon 0x2165
rajp152k committed Sep 28, 2023
1 parent 143b487 commit 19c6de8
Showing 10 changed files with 164 additions and 9 deletions.
2 changes: 0 additions & 2 deletions Content/20230712131112-blogging.org
@@ -126,7 +126,6 @@ An index of all that I write about, published, work in progress and prospective.
- there's the objective past (questionable what's objective) (an event that's one and done) and then there's your perspective about it (that is a stream that you carry on for life (I'm not accounting for forgetting right now: if you journal, blog, or create any sort of sensible content, it is easily accessible (example: I could remember what stage of life I was in based on the book I was reading then; once I saw the book, all memories fell into context like dominoes))).
- you can only alter your perspective about the past in the present moment of the stream but never change your previous thoughts in the stream.
- There is no mental time machine allowing you to manipulate your past memories: you simply partially overwrite stuff but never alter its state in the past...

- thinking of how I could structure these posts : the future also plays a part in making decisions - there is some level of certainty associated with the future if you chalk out your actions and have realistic expectations of them.
- being in a definite caloric deficit while following a recordable resistance training protocol will yield results within a certain definite range around your expectations when you set out with the goal.
** Prospective
@@ -137,5 +136,4 @@ An index of all that I write about, published, work in progress and prospective.
- meditative walks
- mental games -> structuring an article mentally given a writing prompt is a pretty complex and satisfying mental game
- if you're a physics aficionado like I am, consider observing your surroundings and coming up with mental mathematical models to represent reality.

*** Why text is awesome (for logicians) and semantically discrete images are what you should limit yourself to
15 changes: 15 additions & 0 deletions Content/20230721111610-cosine_similarity.org
@@ -7,3 +7,18 @@
- is a measure of closeness of two vectors.
- a common use case is lifting a collection of real life [[id:b8178e96-18bd-43da-915b-11909971a316][datum]] objects into a dense vector space and being able to comment on their semantic closeness/farness using the notions of vector similarity.

#+begin_src lisp
(defun dot-product (vec-a vec-b)
  "Sum of element-wise products of two equal-length lists of numbers."
  (assert (= (length vec-a) (length vec-b)))
  (reduce #'+ (mapcar #'* vec-a vec-b) :initial-value 0))

(defun l2-norm (vec)
  "Euclidean (L2) norm of a list of numbers."
  (sqrt (reduce #'+ (mapcar #'(lambda (x) (* x x)) vec) :initial-value 0)))

(defun cosine-similarity (vec-a vec-b)
  "Cosine of the angle between VEC-A and VEC-B."
  (/ (dot-product vec-a vec-b)
     (* (l2-norm vec-a)
        (l2-norm vec-b))))
#+end_src
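
A quick sanity check at the REPL, using the list-based representation above (illustrative values):

#+begin_src lisp
(cosine-similarity '(1 0) '(0 1))     ;; => 0.0 for orthogonal vectors
(cosine-similarity '(1 2 3) '(2 4 6)) ;; => ~1.0 for parallel vectors
#+end_src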
11 changes: 10 additions & 1 deletion Content/20230911114632-the100pagemlbook.org
@@ -5,7 +5,7 @@
#+filetags: :book:ml:ai:

The parent reference sentinel for this book is rooted as [[id:523db378-6e64-41a3-8890-ad782c67b5e9][The Hundred Page Machine Learning Book]] under the major machine learning node.
I populate this node with the intention to index into other major nodes of the field and fill in some holes that are generic and require a book for end-to-end coverage, because tending to them in a non-project-oriented scenario isn't worth the time.

* Introduction
** What is ML?
@@ -117,3 +117,12 @@ I populate this node with the intention to index into other major nodes of the field
** [[id:91729987-32db-482a-bc1b-91469579413b][Logistic Regression]]
** [[id:a2c424a5-d412-496c-abcb-1fd216548a02][Decision Trees]]
** [[id:b8194cd8-57bc-4f4a-9862-baa8d5599033][k-Nearest Neighbors]]
* Anatomy of a Learning Algorithm
Any learning algorithm is centered around certain basics:
- A [[id:d99d5a5f-93fc-4f3b-b72e-ea59037956f9][loss function]]
- an [[id:7b9be887-8c39-4a37-8217-f0e21a6cb64e][optimization]] procedure, split into a:
  - criterion, inspired by the loss function
  - routine, that finds a solution to the optimization criterion
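
As an illustrative sketch (not from the book), here's a mean-squared-error criterion for a hypothetical one-parameter model y = w * x; an optimization routine (gradient descent, for instance) would then search for the w minimizing it:

#+begin_src lisp
;; mean squared error of the model y = w * x over the paired lists XS and YS
(defun mse-loss (w xs ys)
  (/ (reduce #'+ (mapcar #'(lambda (x y) (expt (- y (* w x)) 2)) xs ys))
     (length xs)))

;; (mse-loss 2 '(1 2 3) '(2 4 6)) ;; => 0, a perfect fit at w = 2
#+end_src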
* Basic Practice
** [[id:5ca10a46-d9b8-4a6b-8aab-34ec17d55049][Feature Engineering]]
** [[id:c3e62ed9-31d6-4ceb-ad82-c4d0e9b48c77][Algorithm Selection]]
13 changes: 13 additions & 0 deletions Content/20230911123345-clustering.org
@@ -4,3 +4,16 @@
#+title: Clustering
#+filetags: :tbp:ml:ai:


To be populated ...

* Loss

The nature of a clustering can be evaluated via multiple perspectives and their combinations. This leads to several loss functions that are loosely based on two major criteria:
- larger inter-cluster distance is better
- smaller intra-cluster diameter is better

Some advanced metrics may even consider the shape of the clusters, but I won't be exploring that here.

check out the [[https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation][scikit-learn clustering docs]] to learn more about evaluating clustering performance.
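
A rough sketch of the two criteria above, assuming clusters are non-empty lists of points and points are lists of numbers (these helper names are mine, not scikit-learn's):

#+begin_src lisp
(defun euclidean-distance (a b)
  (sqrt (reduce #'+ (mapcar #'(lambda (x y) (expt (- x y) 2)) a b))))

;; diameter: the largest pairwise distance within a cluster
(defun intra-cluster-diameter (cluster)
  (let ((diameter 0))
    (loop for (p . rest) on cluster
          do (loop for q in rest
                   do (setf diameter (max diameter (euclidean-distance p q)))))
    diameter))

;; separation: the smallest pairwise distance across two clusters
(defun inter-cluster-distance (cluster-a cluster-b)
  (loop for p in cluster-a
        minimize (loop for q in cluster-b
                       minimize (euclidean-distance p q))))
#+end_src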

3 changes: 3 additions & 0 deletions Content/20230911161621-optimization.org
@@ -4,3 +4,6 @@
#+title: Optimization
#+filetags: :tbp:math:

A branch of math dealing with optimizing a certain criterion (see [[id:d99d5a5f-93fc-4f3b-b72e-ea59037956f9][loss]]) by choosing corresponding optimal parameters.

The most generic and popular optimization algorithm might be (Stochastic) [[id:a4761c32-806d-4a7f-ba18-27136a3de1fc][Gradient Descent]].
21 changes: 16 additions & 5 deletions Content/20230912162836-k_nearest_neighbors.org
@@ -1,6 +1,17 @@
:PROPERTIES:
:ID: b8194cd8-57bc-4f4a-9862-baa8d5599033
:END:
#+title: k-Nearest Neighbors
#+filetags: :ml:ai:

An instance of a [[id:f8ed9d28-324b-4657-84e4-29cf735a782f][non-parametric learning algorithm]]. It doesn't distill the training data into parameters; instead, the data is retained as part of the algorithm.

* Basics

When fed a new feature vector, it's assigned to the nearest existing group subject to a closeness criterion, summarized as:
- fetch the k nearest existing feature vectors to the new vector
- for classification, assign the new vector to the class holding the majority among the k fetches
- for regression, average the numerical labels of the k fetches to tag the new vector

The closeness criterion most commonly used is the L2-norm (euclidean distance).
A popular alternative is [[id:2ec4a33e-479d-466b-b2b1-0a5925c0222c][cosine similarity]] when you'd like to capture the notion of an angle between two vectors.
Some other criteria that can be considered: Chebyshev distance, Mahalanobis distance and Hamming distance.

The hyperparameters of the algorithm can then be defined as the choice of the nearness criterion and the value of k.
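
A rough sketch of the classification case, assuming list-based feature vectors, the L2 norm as the closeness criterion, and training examples stored as (vector . label) pairs:

#+begin_src lisp
(defun euclidean-distance (a b)
  (sqrt (reduce #'+ (mapcar #'(lambda (x y) (expt (- x y) 2)) a b))))

;; the k training examples closest to QUERY
(defun k-nearest (k examples query)
  (subseq (sort (copy-list examples) #'<
                :key #'(lambda (example) (euclidean-distance (car example) query)))
          0 k))

;; majority label among the k nearest examples
(defun knn-classify (k examples query)
  (let ((votes (mapcar #'cdr (k-nearest k examples query))))
    (car (first (sort (mapcar #'(lambda (label) (cons label (count label votes :test #'equal)))
                              (remove-duplicates votes :test #'equal))
                      #'> :key #'cdr)))))

;; (knn-classify 3 '(((0 0) . :a) ((1 0) . :a) ((5 5) . :b) ((6 5) . :b)) '(0 1)) ;; => :A
#+end_src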

34 changes: 33 additions & 1 deletion Content/20230914153411-gradient_descent.org
@@ -2,5 +2,37 @@
:ID: a4761c32-806d-4a7f-ba18-27136a3de1fc
:END:
#+title: Gradient Descent
#+filetags: :ml:ai:

- an iterative [[id:7b9be887-8c39-4a37-8217-f0e21a6cb64e][optimization]] algorithm used to minimize a function (see [[id:d99d5a5f-93fc-4f3b-b72e-ea59037956f9][loss]]).
- speaking briefly:
1. we start at a random point on the parameter-space vs loss contour
2. then we step down the hyper-hill, trying to avoid getting stuck in local hyper-troughs, repeating until we reach a satisfactory hyper-valley and can report convergence
   - note that we actually can't see the hyper-hill and need to calculate the loss every time we step somewhere, akin to hiking in the dark.
3. the parameter-space step size is controllable via a hyper-parameter -> the learning rate

- when working with a convex optimization criterion, we're sure to find the global minimum, whereas with more complex contours we may have to settle for a good-enough local one.
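
A minimal sketch of the idea in one dimension, using a numerical gradient so no calculus machinery is needed (the function, starting point and hyper-parameter values are illustrative):

#+begin_src lisp
;; central-difference estimate of f'(x)
(defun numerical-gradient (f x &optional (eps 1e-4))
  (/ (- (funcall f (+ x eps)) (funcall f (- x eps)))
     (* 2 eps)))

;; repeatedly step against the gradient, scaled by the learning rate
(defun gradient-descent (f x0 &key (learning-rate 0.1) (steps 100))
  (let ((x x0))
    (dotimes (i steps x)
      (decf x (* learning-rate (numerical-gradient f x))))))

;; (gradient-descent #'(lambda (x) (expt (- x 3) 2)) 0.0) ;; => approaches 3.0
#+end_src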


** Improvements

*** Stochastic Gradient Descent (SGD)
:PROPERTIES:
:ID: e419c0a9-9753-48f1-82c4-f2004cc2e29c
:END:
Computing the actual loss (and its gradient) over all of the training data can be very slow; doing so over stochastically selected smaller batches instead leads to the idea of stochastic gradient descent.
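
A sketch of the batching idea, with a hypothetical GRADIENT-FN that maps a single training example to a gradient for the current parameter (both kept as plain numbers here for simplicity):

#+begin_src lisp
;; sample BATCH-SIZE examples uniformly at random (with replacement)
(defun random-batch (data batch-size)
  (loop repeat batch-size
        collect (nth (random (length data)) data)))

;; one SGD step: average the per-example gradients over a mini-batch
(defun sgd-step (parameter gradient-fn data &key (batch-size 32) (learning-rate 0.01))
  (let* ((batch (random-batch data batch-size))
         (avg-gradient (/ (reduce #'+ (mapcar gradient-fn batch))
                          (length batch))))
    (- parameter (* learning-rate avg-gradient))))
#+end_src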
*** Adagrad
This scales the learning rate individually for each parameter (ADAptive GRADient descent) according to the history of gradients.
- as a consequence, the effective learning rate ends up smaller for parameters with a history of large gradients and larger for those with a history of small ones.
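
A sketch of a single per-parameter Adagrad update (names and defaults are illustrative); the accumulated squared gradient must be carried across steps, hence the second return value:

#+begin_src lisp
(defun adagrad-step (parameter gradient accumulated-square
                     &key (learning-rate 0.01) (epsilon 1e-8))
  ;; the running sum of squared gradients shrinks the effective learning rate
  (let ((new-accumulated (+ accumulated-square (* gradient gradient))))
    (values (- parameter (* (/ learning-rate (+ (sqrt new-accumulated) epsilon))
                            gradient))
            new-accumulated)))
#+end_src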
*** Momentum
Accelerates SGD by retaining a sense of past gradients to impart some inertia to the optimization process
- helps deal with oscillations and move more meaningfully
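
A sketch of a single momentum update; the velocity is the decaying memory of past gradients and is carried across steps:

#+begin_src lisp
(defun momentum-step (parameter gradient velocity
                      &key (learning-rate 0.01) (beta 0.9))
  ;; the velocity accumulates past gradients, damping step-to-step oscillations
  (let ((new-velocity (- (* beta velocity) (* learning-rate gradient))))
    (values (+ parameter new-velocity) new-velocity)))
#+end_src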

*** RMSProp (Root Mean Square Propagation):
RMSProp (like Adagrad) adapts the learning rate for each parameter based on the past gradients, helping to stabilize and speed up the training process.
- note that RMSProp is an improvement over Adagrad and deals with its diminishing learning rate issue.
- read more at [[https://en.wikipedia.org/wiki/Stochastic_gradient_descent][this Wikipedia page]]
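
A sketch of a single RMSProp update; an exponential moving average of squared gradients replaces Adagrad's ever-growing sum, so the effective learning rate no longer decays monotonically:

#+begin_src lisp
(defun rmsprop-step (parameter gradient mean-square
                     &key (learning-rate 0.001) (decay 0.9) (epsilon 1e-8))
  ;; exponential moving average of squared gradients
  (let ((new-mean-square (+ (* decay mean-square)
                            (* (- 1 decay) gradient gradient))))
    (values (- parameter (* (/ learning-rate (+ (sqrt new-mean-square) epsilon))
                            gradient))
            new-mean-square)))
#+end_src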

*** Adam (Adaptive Moment Estimation):
Adam combines the benefits of momentum and RMSProp, using both past gradients and their magnitudes to adjust learning rates, making it a versatile and efficient optimization algorithm.
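
A sketch of a single Adam update, combining the first moment (momentum) with a second moment (RMSProp-style scaling) and bias-correcting both; STEP is assumed to start at 1:

#+begin_src lisp
(defun adam-step (parameter gradient m v step
                  &key (learning-rate 0.001) (beta1 0.9) (beta2 0.999) (epsilon 1e-8))
  (let* ((new-m (+ (* beta1 m) (* (- 1 beta1) gradient)))          ;; first moment
         (new-v (+ (* beta2 v) (* (- 1 beta2) gradient gradient))) ;; second moment
         (m-hat (/ new-m (- 1 (expt beta1 step))))                 ;; bias correction
         (v-hat (/ new-v (- 1 (expt beta2 step)))))
    (values (- parameter (* learning-rate (/ m-hat (+ (sqrt v-hat) epsilon))))
            new-m new-v)))
#+end_src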

60 changes: 60 additions & 0 deletions Content/20230928154934-feature_engineering.org
@@ -0,0 +1,60 @@
:PROPERTIES:
:ID: 5ca10a46-d9b8-4a6b-8aab-34ec17d55049
:END:
#+title: Feature Engineering
#+filetags: :ml:ai:

- preparing the dataset to be used by the learning algorithm
- the goal is to convert the data into features with high predictive power and make them usable in the first place

Some common feature engineering processes are:
** One Hot Encoding
- converting categoricals into separate booleans
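
A hypothetical sketch, assuming the full set of categories is known up front:

#+begin_src lisp
;; expand a categorical value into one boolean-ish flag per known category
(defun one-hot (value categories)
  (mapcar #'(lambda (category) (if (equal value category) 1 0)) categories))

;; (one-hot "green" '("red" "green" "blue")) ;; => (0 1 0)
#+end_src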
** Binning (Bucketing)
- converting a continuous feature into multiple exclusive boolean buckets (based on value ranges)
- 0 to 10, 10 to 20, and so on... , for instance.
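
A minimal sketch with fixed-width buckets (the width is an illustrative choice):

#+begin_src lisp
;; map a continuous value to the index of its bucket
(defun bin-index (value &key (bucket-width 10))
  (floor value bucket-width))

;; (bin-index 27) ;; => 2, i.e. the 20-to-30 bucket
#+end_src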
** Normalization
- converting varying numerical ranges into a standard range (-1 to 1 or 0 to 1).
- aids learning algorithms computationally (avoids precision and overflow discrepancies)

#+begin_src lisp
(defun normalize (numerical-data-vector)
  "Rescale a list of numbers into the 0 to 1 range (min-max normalization)."
  (let* ((min (reduce #'min numerical-data-vector))
         (max (reduce #'max numerical-data-vector))
         (span (- max min)))
    (mapcar #'(lambda (feature)
                (/ (- feature min)
                   span))
            numerical-data-vector)))
#+end_src

** Standardization
- aka z-score normalization
- rescaling features so that they have the properties of a standard [[id:2f44701c-e3e4-4b02-a899-e91e747db41a][normal distribution]] (zero mean, unit variance)

#+begin_src lisp
(defun mean (vec) (/ (reduce #'+ vec) (length vec)))

(defun variance (vec)
  (let ((mu (mean vec)))
    (/ (reduce #'+ (mapcar #'(lambda (x) (expt (- x mu) 2)) vec))
       (length vec))))

(defun standardize (numerical-data-vector)
  "Rescale a list of numbers to zero mean and unit variance (z-scores)."
  (let* ((mu (mean numerical-data-vector))
         (sigma (sqrt (variance numerical-data-vector))))
    (mapcar #'(lambda (feature)
                (/ (- feature mu)
                   sigma))
            numerical-data-vector)))
#+end_src

** Dealing with Missing Features
Possible approaches:
- removing examples with missing features
- using a learning algorithm that can deal with missing data
- data imputation techniques
** Data Imputation Techniques
- replace by mean, median or other similar statistic
- something outside the normal range to indicate imputation (-1 in a normal 2-5 range for instance)
- a value chosen according to the range rather than a statistic (0 for a -1 to 1 range, for instance)

A more advanced approach is modelling the imputation as a regression problem before proceeding with the actual task. In this case all the other features are used to predict the missing feature.

In cases of a large dataset, one can introduce an extra indicator feature to signify missing data and then place a value of choice.

- test more than one technique and proceed with what suits best
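
A sketch of the mean-replacement approach, assuming missing entries are marked with nil:

#+begin_src lisp
(defun impute-with-mean (feature-column)
  (let* ((present (remove nil feature-column))
         (mu (/ (reduce #'+ present) (length present))))
    (mapcar #'(lambda (x) (or x mu)) feature-column)))

;; (impute-with-mean '(2 nil 4)) ;; => (2 3 4)
#+end_src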

6 changes: 6 additions & 0 deletions Content/20230928155802-normal_distribution.org
@@ -0,0 +1,6 @@
:PROPERTIES:
:ID: 2f44701c-e3e4-4b02-a899-e91e747db41a
:END:
#+title: Normal Distribution
#+filetags: :tbp:math:

8 changes: 8 additions & 0 deletions Content/20230928161331-algorithm_selection.org
@@ -0,0 +1,8 @@
:PROPERTIES:
:ID: c3e62ed9-31d6-4ceb-ad82-c4d0e9b48c77
:END:
#+title: Algorithm Selection
#+filetags: :ml:ai:

* Factors to consider
