
Commit

nw
antoniofrancaib committed Nov 22, 2024
1 parent 36693e7 commit fc2e11b
Showing 12 changed files with 534 additions and 292 deletions.
290 changes: 0 additions & 290 deletions 4F13/3-ranking.md → 4F13/3-probabilistic-ranking.md
@@ -2,8 +2,6 @@
- [15-Introduction-to-Probabilistic-Ranking](#15-Introduction-to-Probabilistic-Ranking)
- [16-Gibbs-Sampling-for-Inference](#16-Gibbs-Sampling-for-Inference)
- [17-Gibbs-Sampling-in-TrueSkill](#17-Gibbs-Sampling-in-TrueSkill)
- [18-Factor-Graphs](#18-Factor-Graphs)
- [19-Applying-Message-Passing-to-TrueSkill™](#19-Applying-Message-Passing-to-TrueSkill™)

---

@@ -373,291 +371,3 @@

$$p(y_{ij} = +1) \approx \frac{1}{N} \sum_{s=1}^N \Phi\left(\frac{w_i^{(s)} - w_j^{(s)}}{\sigma_n}\right)$$
where $w_i^{(s)}$ and $w_j^{(s)}$ are samples from the posterior.
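A minimal sketch of this Monte Carlo estimate, assuming the posterior samples are held in NumPy arrays and that the performance noise enters through the scale $\sigma_n$ used elsewhere in these notes (the function name and the sample values are purely illustrative):

```python
import numpy as np
from scipy.stats import norm

def predict_win_prob(w_i_samples, w_j_samples, sigma_n):
    """Monte Carlo estimate of p(y_ij = +1) from posterior skill samples."""
    # Average the Gaussian CDF of the scaled skill difference over the samples.
    return np.mean(norm.cdf((w_i_samples - w_j_samples) / sigma_n))

# Made-up posterior samples for two players, just to show the call.
rng = np.random.default_rng(0)
w_i = rng.normal(26.0, 2.0, size=1000)   # posterior samples for player i
w_j = rng.normal(24.0, 2.5, size=1000)   # posterior samples for player j
print(predict_win_prob(w_i, w_j, sigma_n=5.0))
```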


---

# 18-Factor-Graphs

Probabilistic graphical models provide a powerful framework for representing complex distributions and performing efficient inference. Among these models, factor graphs are particularly useful for representing the factorization of probability distributions and facilitating efficient computation of marginal and conditional probabilities through message passing algorithms.

### What are Factor Graphs?
A factor graph is a bipartite graphical model that represents the factorization of a function, typically a joint probability distribution. It consists of two types of nodes:
1. **Variable Nodes**: Represent the variables in the function.
2. **Factor Nodes**: Represent local functions (factors) that depend on a subset of variables.

An edge connects a factor node to a variable node whenever that factor depends on that variable.
#### Purpose of Factor Graphs
1. **Visualization**: Provides a clear graphical representation of the dependencies between variables and factors.
2. **Computation**: Facilitates efficient computation of marginal and conditional distributions through message passing algorithms.
3. **Generalization**: Factor graphs generalize other graphical models like Bayesian networks (directed graphs) and Markov networks (undirected graphs).
#### Example of a Factor Graph
Consider a joint probability distribution that factors as:

$$p(v,w,x,y,z)=f_1(v,w)\cdot f_2(w,x)\cdot f_3(x,y)\cdot f_4(x,z)$$

The corresponding factor graph has:
- **Variable Nodes**: $v, w, x, y, z$
- **Factor Nodes**: $f_1, f_2, f_3, f_4$
- **Edges**: Connect factors to the variables they depend on.

![[Pasted image 20241121214716.png]]
#### Questions We Can Answer Using Factor Graphs
1. **Marginal Distributions**: What is $p(w)$?
2. **Conditional Distributions**: What is $p(w\mid y)$?
3. **Efficient Computation**: How can we compute these distributions efficiently using the structure of the factor graph?

### Efficient Computation with Factor Graphs

#### Challenges with Naïve Computation
Computing marginals directly can be computationally expensive due to the high dimensionality and combinatorial explosion of possible variable configurations.

For example, computing $p(w)$ naively involves:

$$p(w)=\sum_v \sum_x \sum_y \sum_z f_1(v,w)f_2(w,x)f_3(x,y)f_4(x,z)$$

If each variable can take $K$ values, the computational complexity is $O(K^5)$, which becomes infeasible as $K$ and the number of variables grow.

#### Exploiting the Factor Graph Structure
The key to efficient computation lies in exploiting the distributive property of multiplication over addition and the separability of the factor graph:
1. **Distributive Property**: Allows us to rearrange sums and products to reduce computations.
2. **Tree Structure**: In tree-structured graphs (graphs without loops), each node separates the graph into disjoint subgraphs, enabling recursive computations.

### Step-by-Step Computation
1. **Group Terms**: Start by grouping terms associated with each variable or factor.
2. **Local Computations**: Compute messages locally at each node, passing summarized information to neighboring nodes.
3. **Recursive Summations**: Use the distributive property to perform sums over variables in a recursive manner, reducing the overall computational complexity.

#### Example
Compute $p(w)$ by rearranging terms:

$$p(w)=\left(\sum_v f_1(v,w)\right)\cdot\left(\sum_x f_2(w,x)\cdot\left(\sum_y f_3(x,y)\right)\cdot\left(\sum_z f_4(x,z)\right)\right)$$

Each bracketed sum now involves at most two variables, so the cost drops from $O(K^5)$ to a few $O(K^2)$ operations, one per factor: linear in the number of factors rather than exponential in the number of variables. A numerical sketch comparing the two orderings follows the figure below.

![[Pasted image 20241121215227.png]]
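The saving is easy to check numerically. A minimal sketch, assuming discrete variables with $K$ states and random non-negative factor tables (the array names are illustrative); it verifies that the brute-force sum and the rearranged sums give the same marginal:

```python
import numpy as np

K = 10
rng = np.random.default_rng(1)
# Random non-negative factor tables for f1(v,w), f2(w,x), f3(x,y), f4(x,z).
f1, f2, f3, f4 = (rng.random((K, K)) for _ in range(4))

# Naive marginal: build the full joint and sum over v, x, y, z -- O(K^5) terms.
joint = np.einsum('vw,wx,xy,xz->vwxyz', f1, f2, f3, f4)
p_w_naive = joint.sum(axis=(0, 2, 3, 4))

# Rearranged sums exploiting the factorization -- a handful of O(K^2) operations.
msg_v = f1.sum(axis=0)                       # sum_v f1(v,w), indexed by w
msg_y = f3.sum(axis=1)                       # sum_y f3(x,y), indexed by x
msg_z = f4.sum(axis=1)                       # sum_z f4(x,z), indexed by x
p_w_fast = msg_v * (f2 @ (msg_y * msg_z))    # sum_x f2(w,x) * msg_y(x) * msg_z(x)

assert np.allclose(p_w_naive, p_w_fast)
```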

### The Sum-Product Algorithm

The sum-product algorithm is a message passing algorithm used to compute marginal distributions in factor graphs efficiently. It is also known as belief propagation or factor-graph propagation.

#### Key Steps in the Sum-Product Algorithm
1. **Initialization**: Set initial messages, typically starting with uniform distributions or prior information.
2. **Message Passing**: Iteratively compute messages from factors to variables and from variables to factors until convergence.
3. **Marginal Computation**: Once messages have stabilized, compute the marginal distributions by combining incoming messages at each variable node.

$$p(t) = \prod_{f \in F_t} m_{f \to t}(t)$$

where $F_t$ denotes the set of factors neighbouring variable $t$.

#### Message Computation Rules
##### Messages from Factors to Variables
For a factor $f$ connected to variables $t_1, t_2, \ldots, t_n$, the message from factor $f$ to variable $t_1$ is:

$$m_{f \to t_1}(t_1) = \sum_{t_2} \sum_{t_3} \cdots \sum_{t_n} f(t_1, t_2, \ldots, t_n) \prod_{i \neq 1} m_{t_i \to f}(t_i)$$

**Interpretation**: Sum over all variables except the recipient $t_1$, multiplying the factor $f$ by the incoming messages from the other neighbouring variables.

##### Messages from Variables to Factors
For a variable $t$ connected to factors $f_1, f_2, \ldots, f_k$, the message from variable $t$ to factor $f$ is:

$$m_{t \to f}(t) = \prod_{f_j \in F_t \setminus \{f\}} m_{f_j \to t}(t) = \frac{p(t)}{m_{f \to t}(t)}$$

**Interpretation**: Multiply all incoming messages from neighbouring factors except the recipient factor $f$; equivalently, divide the current marginal by the message coming from $f$.

### Computing Marginals
The marginal distribution for a variable $t$ is:

$$p(t)=\prod_{f\in F_t} m_{f\to t}(t)$$

**Interpretation**: Multiply all incoming messages from the factors connected to $t$.
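The two message rules and the marginal formula can be implemented almost verbatim for the example graph above. A minimal sketch, assuming discrete variables with $K$ states, pairwise factors only, and random factor tables (all identifiers are illustrative); it checks the sum-product marginal $p(w)$ against brute-force enumeration:

```python
import numpy as np
from itertools import product

K = 4
rng = np.random.default_rng(2)

# Factor graph from the example: variables v, w, x, y, z and pairwise factor tables.
factors = {
    'f1': (('v', 'w'), rng.random((K, K))),
    'f2': (('w', 'x'), rng.random((K, K))),
    'f3': (('x', 'y'), rng.random((K, K))),
    'f4': (('x', 'z'), rng.random((K, K))),
}
variables = ['v', 'w', 'x', 'y', 'z']
neighbours = {v: [f for f, (vs, _) in factors.items() if v in vs] for v in variables}

def msg_var_to_fac(v, f, memo):
    """Product of messages into variable v from all neighbouring factors except f."""
    out = np.ones(K)
    for g in neighbours[v]:
        if g != f:
            out *= msg_fac_to_var(g, v, memo)
    return out

def msg_fac_to_var(f, v, memo):
    """Sum the factor times its incoming variable messages over all variables but v."""
    if (f, v) in memo:
        return memo[(f, v)]
    vs, table = factors[f]
    (u,) = [name for name in vs if name != v]      # pairwise factors: one other variable
    moved = np.moveaxis(table, vs.index(v), 0)     # axis 0 = v, axis 1 = u
    result = np.tensordot(moved, msg_var_to_fac(u, f, memo), axes=([1], [0]))
    memo[(f, v)] = result
    return result

def marginal(v):
    """p(v) as the product of all incoming factor-to-variable messages."""
    memo = {}
    p = np.ones(K)
    for f in neighbours[v]:
        p *= msg_fac_to_var(f, v, memo)
    return p / p.sum()

# Check p(w) against brute-force enumeration of the joint.
brute = np.zeros(K)
for vals in product(range(K), repeat=5):
    assign = dict(zip(variables, vals))
    term = 1.0
    for vs, table in factors.values():
        term *= table[tuple(assign[name] for name in vs)]
    brute[assign['w']] += term
assert np.allclose(marginal('w'), brute / brute.sum())
```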

---

# 19-Applying-Message-Passing-to-TrueSkill™
[-](#index)
### The TrueSkill™ Model
TrueSkill™ is a Bayesian rating system that models player skills and predicts match outcomes. It consists of:
1. **Player Skills ($w_i$)**: Random variables representing the skill levels of players.
2. **Performance Differences ($t_g$)**: Observed performance differences in games.
3. **Game Outcomes ($y_g$)**: Observed outcomes of games, where $y_g=+1$ if Player $I_g$ wins and $y_g=-1$ if Player $J_g$ wins.

### TrueSkill™ Factor Graph
The factor graph for TrueSkill™ includes:
1. **Prior Factors**: Representing prior beliefs about player skills.

$$f_i(w_i)=N(w_i;\mu_0,\sigma_0^2)$$

2. **Game Factors**: Modeling the relationship between skills and performance differences.

$$h_g(w_{I_g},w_{J_g},t_g)=N(t_g;w_{I_g}-w_{J_g},\sigma_n^2)$$

3. **Outcome Factors**: Incorporating observed game outcomes.

$$k_g(t_g,y_g)=\delta(y_g-\text{sign}(t_g))$$
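Read generatively, these three factor types say: draw skills from the prior, draw a noisy performance difference for each game, and threshold it to get the outcome. A toy sampling sketch under that reading (the parameter values, player count, and variable names are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
mu0, sigma0, sigma_n = 25.0, 8.0, 4.0     # illustrative prior and noise parameters
n_players, n_games = 4, 10

# Prior factors f_i: draw each player's skill w_i.
w = rng.normal(mu0, sigma0, size=n_players)

# Game factors h_g: performance difference t_g for a random pairing (I_g, J_g).
I = rng.integers(0, n_players, size=n_games)
J = (I + rng.integers(1, n_players, size=n_games)) % n_players   # ensures J_g != I_g
t = rng.normal(w[I] - w[J], sigma_n)

# Outcome factors k_g: y_g = +1 if player I_g wins, -1 otherwise.
y = np.sign(t).astype(int)
```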

### Goals
1. **Compute Marginals**: Determine the marginal distributions of player skills $p(w_i)$.
2. **Update Beliefs**: Incorporate observed game outcomes to update beliefs about player skills.

---

# Part 5: Addressing Challenges in TrueSkill™

### Handling Loops in the Factor Graph
1. **Approximate Inference**: When the factor graph is not a tree, we can still apply message passing algorithms approximately.
2. **Iterative Message Passing**: Messages are passed iteratively until convergence, similar to belief propagation in loopy graphs.

### Approximation of Non-Standard Messages
1. **Expectation Propagation (EP)**: An approximation method that replaces complex messages with approximations (e.g., Gaussian distributions) by matching moments.
2. **Moment Matching**: Adjusting the parameters of the approximating distribution so that its first and second moments match those of the true distribution.

# Part 6: Expectation Propagation (EP) in TrueSkill™

## Overview of Expectation Propagation
EP is an iterative algorithm used to approximate complex probability distributions by simpler ones (e.g., Gaussians). It involves:

- **Approximate Factors**: Replace intractable factors with approximate ones that are tractable (e.g., Gaussian approximations).
- **Moment Matching**: Ensure that the approximate distribution matches certain moments (mean and variance) of the true distribution.
- **Iterative Updates**: Repeat the process until convergence.

## Steps in EP for TrueSkill™
1. **Initialize Messages**: Start with initial messages, typically set to uniform or prior distributions.
2. **Update Skill Marginals**: Compute the marginal distributions for skills using incoming messages.
3. **Compute Messages from Skills to Games**: Use the current skill marginals to send messages to game factors.
4. **Compute Messages from Games to Performances**: Combine messages to compute the distribution of performance differences.
5. **Approximate Performance Marginals**: Use moment matching to approximate the true distribution of performance differences with a Gaussian.
6. **Compute Messages from Performances to Games**: Update messages based on the approximate performance marginals.
7. **Compute Messages from Games to Skills**: Update skill messages based on incoming messages from performances.
8. **Iterate**: Repeat the process until messages and marginals converge.

## Detailed Equations

### Step 1: Initialize Messages
Set initial messages from game factors to skills to be uniform or prior distributions.

### Step 2: Update Skill Marginals
Compute the marginal for each skill $w_i$:

$$q(w_i) = f_i(w_i) \prod_{g \in \text{games involving } w_i} m_{h_g \to w_i}(w_i)$$

### Step 3: Compute Messages from Skills to Games
For each game $g$:

$$m_{w_{I_g} \to h_g}(w_{I_g}) = \frac{q(w_{I_g})}{m_{h_g \to w_{I_g}}(w_{I_g})}$$

$$m_{w_{J_g} \to h_g}(w_{J_g}) = \frac{q(w_{J_g})}{m_{h_g \to w_{J_g}}(w_{J_g})}$$

### Step 4: Compute Messages from Games to Performances
Compute the message from game factor $h_g$ to performance difference $t_g$:

$$m_{h_g \to t_g}(t_g) = \int \int h_g(w_{I_g}, w_{J_g}, t_g) m_{w_{I_g} \to h_g}(w_{I_g}) m_{w_{J_g} \to h_g}(w_{J_g}) \, dw_{I_g} \, dw_{J_g}$$

This results in a Gaussian distribution because $h_g$ and the messages are Gaussian.

### Step 5: Approximate Performance Marginals
Compute the marginal for $t_g$:

$$q(t_g) = m_{h_g \to t_g}(t_g) \cdot k_g(t_g, y_g)$$

Since $k_g(t_g, y_g)$ involves a step function, $q(t_g)$ is not Gaussian.

**Moment Matching**: Approximate $q(t_g)$ with a Gaussian by matching the first and second moments.

### Step 6: Compute Messages from Performances to Games
Update the message from performance difference $t_g$ back to the game factor $h_g$:

$$m_{t_g \to h_g}(t_g) = \frac{q(t_g)}{m_{h_g \to t_g}(t_g)}$$

### Step 7: Compute Messages from Games to Skills
Update messages from game factors back to skills:

$$m_{h_g \to w_{I_g}}(w_{I_g}) = \int h_g(w_{I_g}, w_{J_g}, t_g) m_{t_g \to h_g}(t_g) m_{w_{J_g} \to h_g}(w_{J_g}) \, dt_g \, dw_{J_g}$$

This results in a Gaussian message after integration.

### Step 8: Iterate Until Convergence
Repeat steps 2-7 until the messages and marginals stabilize.
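Putting Steps 1-8 together for a single game gives a compact numerical sketch. This is an illustrative implementation under stated assumptions, not the reference TrueSkill™ code: Gaussian messages are kept in natural parameters (precision and precision-times-mean), the truncated-Gaussian moments come from the formulas derived in Part 7 below, and all function and variable names are made up.

```python
import numpy as np
from scipy.stats import norm

def ep_single_game(mu0, sigma0_sq, sigma_n_sq, y, n_iter=20):
    """EP skill posteriors for one game between two players (toy sketch).

    Natural parameters: precision tau and precision-mean rho, so multiplying
    or dividing Gaussian messages is just addition or subtraction.
    """
    tau0, rho0 = 1.0 / sigma0_sq, np.array(mu0) / sigma0_sq   # prior naturals
    # Step 1: messages from the game factor h_g to each skill start out uniform.
    tau_msg, rho_msg = np.zeros(2), np.zeros(2)

    for _ in range(n_iter):
        # Step 2: skill marginals q(w_i) = prior * incoming game message.
        tau_q, rho_q = tau0 + tau_msg, rho0 + rho_msg
        # Step 3: messages from skills to the game (divide marginal by h_g's message).
        tau_s, rho_s = tau_q - tau_msg, rho_q - rho_msg
        mu_s, var_s = rho_s / tau_s, 1.0 / tau_s
        # Step 4: message from the game factor to the performance difference t_g.
        mu_t = mu_s[0] - mu_s[1]
        var_t = var_s[0] + var_s[1] + sigma_n_sq
        # Step 5: moment-match the truncated marginal q(t_g) with a Gaussian.
        z = y * mu_t / np.sqrt(var_t)
        lam = norm.pdf(z) / norm.cdf(z)
        mean_t = mu_t + y * np.sqrt(var_t) * lam
        var_post = var_t * (1.0 - lam * (lam + z))
        # Step 6: message from t_g back to h_g (matched marginal / Step-4 message).
        tau_t = 1.0 / var_post - 1.0 / var_t
        rho_t = mean_t / var_post - mu_t / var_t
        mu_back, var_back = rho_t / tau_t, 1.0 / tau_t
        # Step 7: messages from h_g back to each skill.
        var_msg0 = var_back + var_s[1] + sigma_n_sq
        var_msg1 = var_back + var_s[0] + sigma_n_sq
        tau_msg = np.array([1.0 / var_msg0, 1.0 / var_msg1])
        rho_msg = np.array([(mu_s[1] + mu_back) / var_msg0,
                            (mu_s[0] - mu_back) / var_msg1])

    tau_q, rho_q = tau0 + tau_msg, rho0 + rho_msg
    return rho_q / tau_q, 1.0 / tau_q   # posterior means and variances

# Player 0 beats player 1 once: the winner's mean goes up, the loser's down.
means, variances = ep_single_game(mu0=[25.0, 25.0], sigma0_sq=8.0**2,
                                  sigma_n_sq=4.0**2, y=+1)
print(means, variances)
```

With a single game the messages settle after the first sweep; with many games the same updates are iterated over all game factors (Step 8) until the skill marginals stabilise.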

---

# Part 7: Moment Matching Approximation

## Why Moment Matching?
1. **Intractable Distributions**: The true distributions may involve step functions or other non-Gaussian components that are difficult to handle analytically.
2. **Gaussian Approximation**: By approximating these distributions with Gaussians, we can leverage the tractability and closed-form solutions available for Gaussian distributions.

## Calculating Moments for Truncated Gaussians

### Truncated Gaussian Distribution
Consider a truncated Gaussian distribution:

$$p(t) = \frac{1}{Z} N(t; \mu, \sigma^2) \cdot I(y \cdot t > 0)$$

where $I(\cdot)$ is the indicator function, and $y \in \{-1, +1\}$.

### Normalization Constant
The normalization constant $Z$ is:

$$Z = \Phi\left(\frac{y\mu}{\sigma}\right)$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF).

### First Moment (Mean)
The mean $E[t]$ of the truncated Gaussian is:

$$E[t] = \mu + y\,\sigma \frac{N\left(\frac{y\mu}{\sigma}\right)}{\Phi\left(\frac{y\mu}{\sigma}\right)} = \mu + y\,\sigma\,\lambda\left(\frac{y\mu}{\sigma}\right)$$

where:

$$\lambda(z) = \frac{N(z)}{\Phi(z)}$$

and $N(z)$ is the standard normal PDF.

### Second Moment (Variance)
The variance $V[t]$ is:

$$V[t] = \sigma^2 \left(1 - \lambda\left(\frac{y\mu}{\sigma}\right) \left(\lambda\left(\frac{y\mu}{\sigma}\right) + \frac{y\mu}{\sigma}\right)\right)$$
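A quick numerical check of these moment formulas (a sketch only; it compares against `scipy.stats.truncnorm`, and the test values are arbitrary):

```python
import numpy as np
from scipy.stats import norm, truncnorm

def truncated_moments(mu, sigma, y):
    """Z, mean and variance of N(t; mu, sigma^2) restricted to y*t > 0."""
    z = y * mu / sigma
    lam = norm.pdf(z) / norm.cdf(z)          # lambda(z) = N(z) / Phi(z)
    Z = norm.cdf(z)
    mean = mu + y * sigma * lam
    var = sigma**2 * (1.0 - lam * (lam + z))
    return Z, mean, var

# Compare against scipy's truncated normal for y = -1 (i.e. t < 0).
mu, sigma, y = 1.2, 2.0, -1
a, b = -np.inf, 0.0                          # truncation interval in t for y = -1
ref = truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
Z, mean, var = truncated_moments(mu, sigma, y)
print(np.isclose(mean, ref.mean()), np.isclose(var, ref.var()))
```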

---

# Part 8: Detailed Example of Message Passing in TrueSkill™

## Notations
- **Means and Variances**:
- $\mu_i, \sigma_i^2$: Parameters of skill marginals.
- $\mu_{t_g}, \sigma_{t_g}^2$: Parameters of performance difference marginals.
- **Precisions**:
- $\tau_i = \sigma_i^{-2}$
- $\tau_{t_g} = \sigma_{t_g}^{-2}$

## Steps

### Step 1: Initialize Messages
Messages from game factors to skills are initialized as:

$$m_{h_g \to w_{I_g}}(w_{I_g}) = 1, \, m_{h_g \to w_{J_g}}(w_{J_g}) = 1$$

### Step 2: Update Skill Marginals
For each player $i$:

$$q(w_i) = f_i(w_i) \prod_{g \in \text{games involving } i} m_{h_g \to w_i}(w_i)$$

Since all factors are Gaussian, $q(w_i)$ remains Gaussian with updated mean and variance.
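Because precisions and precision-weighted means of Gaussian factors simply add, Step 2 reduces to a one-line product of Gaussians. A minimal sketch (the numeric message values are invented):

```python
import numpy as np

def gaussian_product(means, variances):
    """Multiply Gaussian densities: precisions add, precision-weighted means add."""
    taus = 1.0 / np.asarray(variances)
    rhos = np.asarray(means) * taus
    tau, rho = taus.sum(), rhos.sum()
    return rho / tau, 1.0 / tau        # mean and variance of the (unnormalised) product

# Skill marginal q(w_i): prior N(25, 8^2) times two incoming game messages.
mu_i, var_i = gaussian_product([25.0, 28.0, 23.0], [64.0, 30.0, 40.0])
print(mu_i, var_i)
```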

---

# Part 9: Summary and Key Takeaways
1. **Factor Graphs**: Provide a framework for representing factorized functions and enable efficient computation through message passing.
2. **Sum-Product Algorithm**: An algorithm for exact inference in tree-structured factor graphs using message passing.
3. **TrueSkill™ Application**: Message passing can be applied to the TrueSkill™ model to update player skill estimates based on game outcomes.

**Challenges in TrueSkill™**:
- Non-Tree Structure: Requires approximate inference methods due to loops in the graph.
- Non-Gaussian Messages: Handled by approximation methods like Expectation Propagation.

**Expectation Propagation**:
- An iterative method that approximates complex distributions with tractable ones by matching moments.

**Moment Matching**:
- A key technique in EP for approximating intractable distributions (e.g., truncated Gaussians) with Gaussians.

## Conclusion
Understanding factor graphs and message passing algorithms is crucial for performing efficient inference in complex probabilistic models. By delving into the details of the sum-product algorithm and approximation methods like Expectation Propagation, we gain the tools to tackle challenges in real-world applications like TrueSkill™.

