# Gaussian Mixture Models

## Motivation

To train my low-level model to actuate given a vector of cluster information, I need a way of quantifying clusters that factors in the distance from each sample to every cluster. The obvious solution is a Gaussian Mixture Model, which can be thought of as an extension of k-means clustering that models the covariance of each cluster and gives every point a probability of belonging to each one.
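For reference, the standard textbook form of a Gaussian mixture with $K$ components is sketched below. The symbols $\pi_k$, $\mu_k$, $\Sigma_k$ and $\gamma_k$ are conventional notation, not names used elsewhere in this post.

$$
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1
$$

$$
\gamma_k(x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
$$

Here $\gamma_k(x)$ is the probability that sample $x$ belongs to component $k$, so the vector $(\gamma_1(x), \dots, \gamma_K(x))$ is one candidate for the "vector of cluster information". K-means corresponds to the limiting case where every component shares the same spherical covariance and that covariance shrinks toward zero, which turns the soft assignments into hard ones.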

## Implementation

### Load Dataset

Our dataset consists of a large CSV of (steer, motor) values. Let’s load it and take care of the important imports.

```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import progressbar
import seaborn
from IPython.core.pylabtools import figsize
figsize(15, 8) # set default figure size
a = pd.read_csv('out.csv')
```

### Construction of Gaussian Mixture Model

In the last post we discovered the optimum number of clusters for our data is 10 using the elbow method. Let’s construct a Gaussian Mixture Model according to these specifications and fit it to our dataset.

```
from sklearn import mixture
k = 10
gmm = mixture.GaussianMixture(n_components=k, covariance_type='full').fit(a)
```
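As a minimal sketch of what the fitted model offers beyond hard labels, the example below uses `predict_proba` to get a per-sample membership vector. It runs on synthetic three-cluster data rather than `out.csv`, and the names `X`, `gmm_demo`, `responsibilities` and `hard_labels` are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the (steer, motor) data; NOT the real out.csv.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(-1.0, 0.5), scale=0.1, size=(200, 2)),
    rng.normal(loc=(0.0, 0.8), scale=0.1, size=(200, 2)),
    rng.normal(loc=(1.0, 0.5), scale=0.1, size=(200, 2)),
])

gmm_demo = GaussianMixture(n_components=3, covariance_type='full',
                           random_state=0).fit(X)

# One row per sample, one column per component; each row sums to 1.
responsibilities = gmm_demo.predict_proba(X)

# Hard labels pick the most probable component for each sample.
hard_labels = gmm_demo.predict(X)

print(responsibilities.shape)
print(responsibilities[0].round(3))
```

With the real data, the same two calls on `gmm` would give a 10-element vector per sample, which is one possible form for the cluster input described in the motivation.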

### Plotting Results

We need some way to plot the Gaussian mixture model we have generated. Thankfully, scikit-learn has some example code for this that we can adapt to our purposes:

```
import itertools

import matplotlib as mpl
from scipy import linalg

# Ten colors, one per component, cycled if more are needed.
color_iter = itertools.cycle(plt.cm.rainbow(np.linspace(0, 1, 10)))


def plot_results(X, Y_, means, covariances, index, title):
    splot = plt.subplot()
    for i, (mean, covar, color) in enumerate(zip(
            means, covariances, color_iter)):
        v, w = linalg.eigh(covar)
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        u = w[0] / linalg.norm(w[0])
        # as the DP will not use every component it has access to
        # unless it needs it, we shouldn't plot the redundant
        # components.
        if not np.any(Y_ == i):
            continue
        plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], .8, color=color)

        # Plot an ellipse to show the Gaussian component
        angle = np.arctan(u[1] / u[0])
        angle = 180. * angle / np.pi  # convert to degrees
        # angle is passed by keyword, which newer matplotlib releases expect
        ell = mpl.patches.Ellipse(mean, v[0], v[1], angle=180. + angle,
                                  color=color)
        ell.set_clip_box(splot.bbox)
        ell.set_alpha(0.5)
        splot.add_artist(ell)

    plt.xticks(())
    plt.yticks(())
    plt.title(title)


plot_results(a.values, gmm.predict(a), gmm.means_, gmm.covariances_, 0,
             'Gaussian Mixture')
plt.show()
```
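As an aside on how an ellipse relates to a fitted covariance matrix, here is a small self-contained sketch. The helper `covariance_to_ellipse` and its `n_std` parameter are hypothetical names introduced for illustration. It reads eigenvectors from the columns returned by `eigh` and uses `arctan2`, so it is a different formulation from the adapted plotting code rather than a drop-in replacement.

```python
import numpy as np


def covariance_to_ellipse(covar, n_std=2.0):
    """Return (width, height, angle_in_degrees) for a 2x2 covariance matrix.

    Hypothetical helper: width and height are full axis lengths covering
    n_std standard deviations on each side of the mean.
    """
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(covar)

    # Reorder so the largest-variance direction comes first.
    order = eigvals.argsort()[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Orientation of the major axis, measured from the x-axis.
    major = eigvecs[:, 0]
    angle = np.degrees(np.arctan2(major[1], major[0]))

    width, height = 2.0 * n_std * np.sqrt(eigvals)
    return width, height, angle


# Variance 4 along x and 1 along y: the major axis lies on the x-axis.
covar = np.array([[4.0, 0.0], [0.0, 1.0]])
width, height, angle = covariance_to_ellipse(covar)
print(width, height, angle)
```

The eigenvalues are variances along the principal directions, so their square roots are standard deviations; that is why each ellipse axis scales with `np.sqrt` of an eigenvalue.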

Wow! This is fairly different from the model we saw in my previous k-means clustering post. I had expected some difference between the k-means model and the Gaussian mixture model, but the difference is clearly more significant once variance is taken into account.

Interestingly, the outliers seem to be given their own cluster in this model, visible as the large green circle in the top-middle section. This may prove useful, as it will identify a “tricky situation with high-speed maneuvers necessary” for my model.

Also, I’m surprised by how much these Gaussian components overlap; it makes me want to try a second way of finding the “natural” number of modes in the dataset. One method other than the “elbow method” that I can use for this is a Bayesian Gaussian Mixture with a Dirichlet process prior. The benefit of this method is that it doesn’t require any human interpretation per se, so it can be considered more objective in its measurement. There are two problems with this method, however:

- The extra parameterization necessary for variational inference makes it slower.
- There can be biases in the inference algorithm as well as in the Dirichlet process used.

Given the dramatic differences between the k-means model and this one, I decided it would be a good idea to try the Bayesian model.
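To make the idea concrete, here is a minimal sketch on synthetic data of how a Dirichlet process prior can hint at the number of modes: the model is deliberately given more components than the data has clusters, and the fitted mixing weights show how many it actually relies on. The data, the `threshold` cutoff and the variable names are assumptions for illustration, not values from this project.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in with three clear modes; NOT the real out.csv.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(-1.0, 0.5), scale=0.1, size=(200, 2)),
    rng.normal(loc=(0.0, 0.8), scale=0.1, size=(200, 2)),
    rng.normal(loc=(1.0, 0.5), scale=0.1, size=(200, 2)),
])

# Offer ten components and let the prior decide how many carry weight.
bgm = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    weight_concentration_prior_type='dirichlet_process',
    max_iter=500,
    random_state=0,
).fit(X)

# converged_ reports whether inference stopped before hitting max_iter.
print('converged:', bgm.converged_, 'after', bgm.n_iter_, 'iterations')

# Components with negligible mixing weight are effectively unused.
threshold = 0.01  # arbitrary cutoff chosen for this sketch
active = np.flatnonzero(bgm.weights_ > threshold)
print('mixing weights:', bgm.weights_.round(3))
print('components above threshold:', len(active))
```

Inspecting `weights_` this way is more informative than counting distinct labels in a plot, since a component can keep a small nonzero weight while owning almost no points.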

### Bayesian Gaussian Mixture

```
dpgmm = mixture.BayesianGaussianMixture(n_components=10,
                                        covariance_type='full').fit(a)
plot_results(a.values, dpgmm.predict(a), dpgmm.means_, dpgmm.covariances_, 1,
             'Bayesian Gaussian Mixture with a Dirichlet process prior')
plt.show()
```

Alright, so the Bayesian Gaussian Mixture kept all ten components we initially assigned, leading me to believe that ten is indeed the ideal number of clusters. We also see the warning “Initialization 1 did not converge.” from scikit-learn, which means the variational inference reached its iteration limit before meeting the convergence tolerance. This is one of the weaknesses of the Bayesian Dirichlet process model: it does not always converge. It also took quite a long time for the entire training and inference process: about 45 minutes.

However, it did serve its purpose in verifying the 10 clusters we use. Before finishing up and settling on the 10-component model, I wanted to compare the cluster diagram from the k-means clustering to the diagram generated for the Gaussian mixture model. Let’s do this now.

```
def plot_results(X, Y_, means, covariances, index, title):
    splot = plt.subplot()
    for i, (mean, covar, color) in enumerate(zip(
            means, covariances, color_iter)):
        v, w = linalg.eigh(covar)
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        u = w[0] / linalg.norm(w[0])
        # as the DP will not use every component it has access to
        # unless it needs it, we shouldn't plot the redundant
        # components.
        if not np.any(Y_ == i):
            continue
        plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], .8, color=color)

        # # Plot an ellipse to show the Gaussian component
        # angle = np.arctan(u[1] / u[0])
        # angle = 180. * angle / np.pi  # convert to degrees
        # ell = mpl.patches.Ellipse(mean, v[0], v[1], 180. + angle, color=color)
        # ell.set_clip_box(splot.bbox)
        # ell.set_alpha(0.5)
        # splot.add_artist(ell)

    plt.xticks(())
    plt.yticks(())
    plt.title(title)


plot_results(a.values, gmm.predict(a), gmm.means_, gmm.covariances_, 0,
             'Gaussian Mixture')
plt.show()
```

Looking at these two models side by side without the ellipses drawn, it becomes much simpler to identify the similar characteristics.

- Both models seem to divide the data primarily via steering values.
- The Gaussian model has a high-motor-speed cluster, which makes sense as this would limit the individual variance in each cluster, something the Gaussian model takes into account.
- We somewhat lose the symmetry of the clusters that we had in the k-means diagram as there is an odd number of clusters in the Gaussian Mixture Model.
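A visual comparison like this can be complemented with a number. The sketch below, on synthetic data, scores how closely k-means and Gaussian mixture labelings agree using the adjusted Rand index, which ignores how the clusters happen to be numbered. This is a swapped-in technique for illustration, not something done in the original analysis, and the data and variable names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in with three clear modes; NOT the real out.csv.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(-1.0, 0.5), scale=0.1, size=(200, 2)),
    rng.normal(loc=(0.0, 0.8), scale=0.1, size=(200, 2)),
    rng.normal(loc=(1.0, 0.5), scale=0.1, size=(200, 2)),
])

k = 3  # number of clusters for this toy example only
kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=k, covariance_type='full',
                             random_state=0).fit_predict(X)

# 1.0 means identical partitions; values near 0 mean chance-level agreement.
ari = adjusted_rand_score(kmeans_labels, gmm_labels)
print(f'adjusted Rand index: {ari:.3f}')
```

On the real dataset a low score would back up the impression that the two models carve up the (steer, motor) space quite differently.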

## Analysis/Interesting Future Work

The Gaussian mixture creates a more interesting distribution than the k-means clustering approach achieved, because it factors in variance and weighs each point against every cluster rather than assigning it to a single nearest one. These differences may manifest themselves in more interesting behaviors to monitor.

Future work could include analyzing the relationship between clusters in the dataset and high loss values. As I said in the last post, it would be interesting to overlay these clusters on top of the video of the car driving; it may provide us with some interesting context that we were missing before.

## Next Step

For my current paper, the next step will be to design a network that takes the distance to each cluster from the Gaussian mixture model as input and decides how to act on it. To prevent the network from simply memorizing the locations of the clusters and using this information to act directly on the data, I will introduce a great deal of noise into the cluster data during training. This network will also receive camera input data, but its exact format and structure are yet to be determined.
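One way the noise injection might look is sketched below. It assumes the cluster input is a vector of per-component membership probabilities (for example from `predict_proba`), which is only one possible reading of "distance to each cluster". The helper `add_cluster_noise` and the value of `sigma` are hypothetical.

```python
import numpy as np


def add_cluster_noise(cluster_vec, sigma=0.1, rng=None):
    """Perturb membership vectors with Gaussian noise, then renormalize.

    Hypothetical helper: cluster_vec has shape (n_samples, n_components)
    and each row is assumed to sum to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = cluster_vec + rng.normal(0.0, sigma, size=cluster_vec.shape)

    # Keep entries strictly positive so each row is still a valid distribution.
    noisy = np.clip(noisy, 1e-8, None)
    return noisy / noisy.sum(axis=-1, keepdims=True)


# Two example membership vectors over three components.
clean = np.array([[0.70, 0.20, 0.10],
                  [0.05, 0.90, 0.05]])
noisy = add_cluster_noise(clean, sigma=0.1, rng=np.random.default_rng(0))
print(noisy)
```

Drawing fresh noise for every training batch, rather than perturbing the dataset once, would keep the network from memorizing any single noisy version of the cluster vector.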