The Expectation-Maximization (EM) algorithm is employed to estimate the parameters of Gaussian Mixture Models (GMMs) from observed data. A GMM assumes that the data is generated from a mixture of several Gaussian distributions, each characterized by its mean, covariance matrix, and mixture weight. The EM algorithm iteratively refines these parameters to maximize the likelihood of the observed data.
EM-GMM-Unmix is a MATLAB project that uses the EM algorithm to unmix Gaussian mixtures, i.e., to estimate the parameters of the underlying Gaussian components from mixed data. It is aimed at data scientists, statisticians, and machine learning practitioners.
- Accurate Parameter Estimation: Uses the EM algorithm to estimate means, covariances, and weights of Gaussian components.
- Visualization Tools: Plots confidence ellipses to help visualize and interpret results.
- Synthetic Data Generation: Example scripts demonstrate the application of the EM algorithm.
- `ellipseCalculator`: Function to plot confidence ellipses.
- `EMAlgorithm_GaussianUnmix`: Function that implements the EM algorithm for GMMs.
- `EMAlgorithm_GaussianUnmix_Example`: Script that generates synthetic data, applies the EM algorithm, and visualizes the results.
- Clone the repository:

```bash
git clone https://github.com/czichiy/EM-GMM-Unmix.git
cd EM-GMM-Unmix
```
- Run the Example Script:

Open MATLAB, navigate to the project directory, and run:

```matlab
run('EMAlgorithm_GaussianUnmix_Example.m')
```
1. Initialization: Randomly initialize the parameters of the Gaussian components, including means, covariance matrices, and mixture weights.
2. E-Step: Compute the posterior probabilities (responsibilities) that each data point belongs to each Gaussian component.
3. M-Step: Update the parameters to maximize the expected log-likelihood of the data given the responsibilities.
4. Convergence Check: Repeat the E-Step and M-Step until the parameters stabilize and the increase in log-likelihood falls below a chosen threshold.
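These steps correspond to the standard EM updates for a K-component GMM, where $\gamma_{ik}$ denotes the responsibility of component $k$ for sample $x_i$, $w_k$ the mixture weights, and $N$ the number of samples:

E-step:

```math
\gamma_{ik} = \frac{w_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
```

M-step (with $N_k = \sum_i \gamma_{ik}$):

```math
w_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^\top
```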
Functionality:
This file (`ellipseCalculator`, per the file list above) contains a function that calculates the points needed to draw an ellipse.
Technical Explanation:
• Inputs:
• x, y: Coordinates of the center of the ellipse.
• a: Semi-major axis.
• b: Semi-minor axis.
• angle: Rotation angle of the ellipse (in degrees).
• steps: Number of points to calculate (default is 36).
• Outputs:
• X, Y: Coordinates of the points on the ellipse.
• Procedure:
• The function converts the angle from degrees to radians.
• It calculates the sine and cosine of the angle.
• It creates an array of angles (alpha) from 0 to 360 degrees.
• It calculates the sine and cosine of these angles.
• It computes the X and Y coordinates of the points on the ellipse using the parametric equation of an ellipse.
• It adjusts these points based on the input center coordinates and rotation.
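The procedure above is the standard parametric form of a rotated ellipse: X = x + a·cos(α)·cos(θ) − b·sin(α)·sin(θ), Y = y + a·cos(α)·sin(θ) + b·sin(α)·cos(θ). A minimal usage sketch, assuming the argument order matches the input list above:

```matlab
% Sketch of typical usage of ellipseCalculator; the argument order
% (x, y, a, b, angle, steps) is assumed from the input list above.
% Ellipse centered at (2, 1), semi-axes 3 and 1, rotated 30 degrees, 72 points.
[X, Y] = ellipseCalculator(2, 1, 3, 1, 30, 72);
plot(X, Y, 'b-');
axis equal;   % equal axis scaling so the ellipse is not visually distorted
```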
Functionality:
This file contains a function, `UnmixGaussEM`, that implements the Expectation-Maximization (EM) algorithm for unmixing Gaussian mixtures. The function estimates the parameters (means, covariances, and weights) of the individual Gaussian components from a dataset generated by a mixture of Gaussian distributions.
Technical Explanation:
• Inputs:
• x: (N x D) matrix of observed data drawn from the mixed Gaussian distributions (N samples, D dimensions).
• a0: (L x 1) vector of initial guesses for mixture proportions.
• MuE0: (L x D) matrix of initial guesses for the means of the Gaussian components.
• SigE0: (D x D x L) tensor of initial guesses for the covariance matrices of the Gaussian components.
• Iter_stop: Stopping criterion for the iterations, which can be a maximum number of iterations or a convergence threshold for the log-likelihood.
• Outputs:
• a: (L x Iter_number) matrix tracking the evolution of mixture proportion estimates over iterations.
• MuE: (L x D x Iter_number) tensor tracking the evolution of mean estimates over iterations.
• SigE: (D x D x L x Iter_number) tensor tracking the evolution of covariance estimates over iterations (one D x D matrix per component per iteration).
• Lh: Vector tracking the log-likelihood values over iterations.
• Procedure:
• Initialization:
• Parameters (means, covariances, and weights) are initialized with the provided initial guesses.
• E-Step:
• The responsibilities are computed for each data point and each Gaussian component using the current parameter estimates.
• M-Step:
• Parameters (means, covariances, and weights) are updated based on the computed responsibilities.
• Iteration and Convergence Check:
• The process iterates until convergence is achieved based on the log-likelihood. If the change in log-likelihood is below a specified threshold or the maximum number of iterations is reached, the algorithm stops.
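A minimal calling sketch for `UnmixGaussEM`, assuming the argument and output order match the lists above; the synthetic-data setup here is illustrative, not taken from the repo, and `mvnrnd` requires the Statistics and Machine Learning Toolbox:

```matlab
% Sketch: unmix N = 500 two-dimensional samples into L = 2 components.
% Argument/output order is assumed from the documentation above.
x     = [mvnrnd([0 0], eye(2), 250); mvnrnd([4 4], eye(2), 250)];  % (N x D) mixed data
a0    = [0.5; 0.5];                  % initial mixture proportions (L x 1)
MuE0  = [-1 -1; 5 5];                % initial means (L x D)
SigE0 = cat(3, eye(2), eye(2));      % initial covariances (D x D x L)
Iter_stop = 100;                     % stopping criterion (here: max iterations)

[a, MuE, SigE, Lh] = UnmixGaussEM(x, a0, MuE0, SigE0, Iter_stop);
plot(Lh);   % the log-likelihood should increase monotonically across iterations
```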
Functionality:
This file contains a function that performs the Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMMs).
Technical Explanation:
• Inputs:
• data: A matrix of size (n_samples, n_features) containing the data points.
• num_clusters: The number of clusters (Gaussian components).
• max_iter: The maximum number of iterations for the EM algorithm.
• tol: The tolerance for convergence.
• Outputs:
• mu: A matrix of size (num_clusters, n_features) containing the means of the Gaussian components.
• Sigma: A 3D matrix of size (num_clusters, n_features, n_features) containing the covariances of the Gaussian components.
• weights: A vector of size (num_clusters, 1) containing the weights of the Gaussian components.
• Procedure:
• The function initializes the parameters (means, covariances, and weights) randomly.
• E-step: It computes the responsibilities (the probability that each data point belongs to each cluster).
• M-step: It updates the parameters (means, covariances, and weights) based on the responsibilities.
• The function iterates between the E-step and M-step until convergence is achieved or the maximum number of iterations is reached.
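Under the interface described above, the procedure can be sketched as follows. The function name `emGmmSketch` is hypothetical (this is not the repo's code); covariances are stored here as (n_features x n_features x num_clusters) for convenience rather than the documented (num_clusters, n_features, n_features) layout, and `mvnpdf` requires the Statistics and Machine Learning Toolbox:

```matlab
function [mu, Sigma, weights] = emGmmSketch(data, num_clusters, max_iter, tol)
% Sketch of the EM procedure described above (hypothetical name, not the
% repo's function). data: (n_samples x n_features).
[n, d] = size(data);
K = num_clusters;

% Random initialization: pick K data points as means, identity covariances.
mu      = data(randperm(n, K), :);
Sigma   = repmat(eye(d), 1, 1, K);
weights = ones(K, 1) / K;

prevLL = -inf;
for iter = 1:max_iter
    % E-step: unnormalized responsibilities r(i,k) = w_k * N(x_i | mu_k, Sigma_k).
    r = zeros(n, K);
    for k = 1:K
        r(:, k) = weights(k) * mvnpdf(data, mu(k, :), Sigma(:, :, k));
    end
    ll = sum(log(sum(r, 2)));        % log-likelihood under current parameters
    r  = r ./ sum(r, 2);             % normalize rows to get responsibilities

    % M-step: update weights, means, covariances from the responsibilities.
    Nk      = sum(r, 1)';            % effective sample count per component (K x 1)
    weights = Nk / n;
    for k = 1:K
        mu(k, :) = (r(:, k)' * data) / Nk(k);
        xc = data - mu(k, :);        % centered data
        % Weighted covariance; the small jitter term is a common stability
        % trick (an assumption here, not taken from the repo).
        Sigma(:, :, k) = (xc' * (xc .* r(:, k))) / Nk(k) + 1e-6 * eye(d);
    end

    % Convergence check on the change in log-likelihood.
    if abs(ll - prevLL) < tol
        break;
    end
    prevLL = ll;
end
end
```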