# Linear scaling of covariances

In our software PopPUNK we draw a plot of a Gaussian mixture model that uses both the implementation and the excellent example in the scikit-learn documentation:

My input is 2D distance, which I first use StandardScaler to normalise each axes between 0 and 1, which helps standardise methods across various parts of the code. This is fine if you then create these plots in the scaled space, and as it is a simple linear scaling it is generally trivial to convert back into the original co-ordinates:

# scale X scale = np.amax(X, axis = 0) scaled_X = X / scale # plot scaled plt.scatter([(scaled_X)[Y == i, 0]], [(scaled_X)[Y == i, 1]], .4, color=color) # plot original plt.scatter([(scaled_X*scale)[Y == i, 0]], [(scaled_X*scale)[Y == i, 1]], .4, color=color)

The only thing that wasn’t obvious was how to scale the covariances, which are used to draw the ellipses. I knew they needed to be multiplied by the scale twice as they are variances (squared), and had a vague memory of something like xAx^T from transforming ellipses/conic sections in one of my first year maths courses. Happily that was enough to do a search, turning up an explanation on stackoverflow which confirmed this vague memory: https://math.stackexchange.com/questions/1184523/scaling-a-covariance-matrix

Here is the code for making the plot and incorporating a linear scaling

color_iter = itertools.cycle(['navy', 'c', 'cornflowerblue', 'gold','darkorange']) fig=plt.figure(figsize=(11, 8), dpi= 160, facecolor='w', edgecolor='k') splot = plt.subplot(1, 1, 1) for i, (mean, covar, color) in enumerate(zip(means, covariances, color_iter)): scaled_covar = np.matmul(np.matmul(np.diag(scale), covar), np.diag(scale).T) v, w = np.linalg.eigh(scaled_covar) v = 2. * np.sqrt(2.) * np.sqrt(v) u = w[0] / np.linalg.norm(w[0]) # as the DP will not use every component it has access to # unless it needs it, we shouldn't plot the redundant # components. if not np.any(Y == i): continue plt.scatter([(X)[Y == i, 0]], [(X)[Y == i, 1]], .4, color=color) # Plot an ellipse to show the Gaussian component angle = np.arctan(u[1] / u[0]) angle = 180. * angle / np.pi # convert to degrees ell = mpl.patches.Ellipse(mean*scale, v[0], v[1], 180. + angle, color=color) ell.set_clip_box(splot.bbox) ell.set_alpha(0.5) splot.add_artist(ell) plt.title(title) plt.savefig("ellipses.png") plt.close()