If we have a complete matrix \(A\), we can factorise the matrix using SVD.
import numpy as np
np.set_printoptions(precision=1)
np.set_printoptions(suppress=True)
A=np.array([[4,2,6],[5,1,7],[2,4,4],[3,4,2],[5,0,1],[3,1,2]])
print(A)
The output of SVD is three matrices whose product equals the original matrix (some constraints exist regarding acceptable forms of A).
U,s,V = np.linalg.svd(A)
S = np.zeros([U.shape[0],V.shape[1]])
diags = np.diag(s)
S[:diags.shape[0],:diags.shape[1]] = diags
Think of it as a rotation, a scale and another rotation...
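A quick check of that geometric picture: the U and V returned by np.linalg.svd are orthogonal (their transposes are their inverses), so they can only rotate or reflect; all of the scaling lives in the diagonal S.

```python
import numpy as np

A = np.array([[4, 2, 6], [5, 1, 7], [2, 4, 4],
              [3, 4, 2], [5, 0, 1], [3, 1, 2]])
U, s, V = np.linalg.svd(A)

# Orthogonal matrices preserve lengths and angles, so they act as
# rotations (possibly combined with a reflection).
print(np.allclose(U.T @ U, np.eye(U.shape[0])))  # True
print(np.allclose(V @ V.T, np.eye(V.shape[0])))  # True
```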
The trick is that we can ignore the dimensions that have little influence on the final answer (i.e. those for which \(s_i\) is small). Conveniently, the svd function returns \(s\) sorted in descending order.
The shape of the matrices originally:
print(U.shape)
print(S.shape)
print(V.shape)
To illustrate the importance of the three factors, compare the values in \(s\):
s
So this works, but as the last value on the diagonal is smaller than the others, we can set it to zero. Think about which rows and columns in U and V then become irrelevant.
Summary: We can get rid of the last column of U and the last row of V.
r = 2
U = U[:,0:r]
S = S[0:r,0:r]
V = V[0:r,:]
print("U:")
print(U)
print("S:")
print(S)
print("V:")
print(V)
Comparing the output from this reduced matrix factorisation and the original matrix:
print("Approximation:")
print(np.dot(U,np.dot(S,V)))
print("Original:")
print(A)
Clearly it is quite similar. The new matrix has rank 2 rather than rank 3, but only the least important dimension has been removed.
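"Quite similar" can be quantified: by the Eckart-Young theorem, the truncated SVD is the best rank-2 approximation, and its Frobenius-norm error equals the discarded singular value. A small check (rebuilding the factorisation with full_matrices=False so it stands alone):

```python
import numpy as np

A = np.array([[4, 2, 6], [5, 1, 7], [2, 4, 4],
              [3, 4, 2], [5, 0, 1], [3, 1, 2]])
U, s, V = np.linalg.svd(A, full_matrices=False)

r = 2
A2 = U[:, :r] @ np.diag(s[:r]) @ V[:r, :]

# The Frobenius-norm error of the rank-2 truncation is exactly the
# singular value we threw away (Eckart-Young).
print(np.linalg.norm(A - A2), s[2])
```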
What do we do if some of the elements in \(A\) are absent?
To illustrate, I'll use a bigger matrix for A. Note that there is some order, or dependence, between columns: The first and third column are more similar to each other, as are the second and fourth columns.
A=np.array([[5,0,5,0],[4,1,3,0],[0,4,1,5],[5,1,3,1],[4,0,4,1],[1,3,0,4],[1,3,0,3],[3,2,4,1],[0,5,0,5],[0,4,1,4]])
print(A)
We need to know which of these elements are available as training data, so a new 'weight matrix' is created, with:
1 = known
0 = unknown
W=np.array([[1,1,0,1],[0,1,1,0],[1,1,0,1],[1,0,1,1],[1,0,1,1],[1,0,0,1],[0,1,1,1],[0,1,1,0],[1,1,1,0],[1,1,1,1]])
print(W)
This algorithm, which I can't claim is a particularly principled approach, operates as follows:
1. Generate a new guess from the old guess
1.1. Find the SVD of the current guess
1.2. Reduce the rank of the matrices to an arbitrary rank, r
1.3. Generate a new matrix, of this reduced rank
2. Overwrite the elements of this new matrix whose true values we know
3. Repeat 1. and 2.
Function to estimate new matrix...
def est(X,r):
    U,s,V = np.linalg.svd(X)
    S = np.zeros([U.shape[0],V.shape[1]])
    diags = np.diag(s)
    S[:diags.shape[0],:diags.shape[1]] = diags
    U = U[:,0:r]
    S = S[0:r,0:r]
    V = V[0:r,:]
    return np.dot(U,np.dot(S,V))
Code to iterate this algorithm:
known = A*W
guess = known
for i in range(1000):
    fullguess = est(guess,2)
    guess = (fullguess * (np.ones(W.shape)-W)) + known
print("Guesses")
print(guess)
print("Unknown elements")
print(A*(np.ones(W.shape)-W))
This seems to work quite well. In the "Unknown elements" matrix above, we reveal those elements that I left out of the training data, to test whether the algorithm can generalise. Pleasingly, the guesses seem to be roughly in the right direction.
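"Roughly in the right direction" can be made concrete with a root-mean-square error over the held-out entries only. A self-contained sketch, rerunning the same iteration from scratch (est is the compact-SVD version of the helper above):

```python
import numpy as np

def est(X, r):
    # Best rank-r approximation of X via the compact SVD.
    U, s, V = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ V[:r, :]

A = np.array([[5, 0, 5, 0], [4, 1, 3, 0], [0, 4, 1, 5], [5, 1, 3, 1],
              [4, 0, 4, 1], [1, 3, 0, 4], [1, 3, 0, 3], [3, 2, 4, 1],
              [0, 5, 0, 5], [0, 4, 1, 4]])
W = np.array([[1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1],
              [1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 1], [0, 1, 1, 0],
              [1, 1, 1, 0], [1, 1, 1, 1]])

known = A * W
guess = known
for i in range(1000):
    guess = est(guess, 2) * (1 - W) + known

# RMSE measured only on the entries hidden from the algorithm.
mask = 1 - W
rmse = np.sqrt(np.sum(((A - guess) * mask) ** 2) / np.sum(mask))
print(rmse)
```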
I was expecting that 'demeaning' (subtracting the mean before factorising) would improve the algorithm, but it seems to become unstable, or to suffer from some other problem.
I've not investigated why, mainly because I'm not sure that repeatedly applying the SVD is a very clever way of solving this problem.
Aprime = A - np.mean(A)
known = Aprime*W
guess = known
for i in range(10000):
    fullguess = est(guess,2)
    guess = (fullguess * (np.ones(W.shape)-W)) + known
print("Guesses")
print(guess)
print("Unknown elements")
print(Aprime*(np.ones(W.shape)-W))
Next, minimising the sum-square error...
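As a preview, one standard way to minimise the sum-squared error over the known entries directly is gradient descent on a rank-r factorisation \(A \approx PQ\). This is only a hedged sketch of one possible approach (the learning rate and iteration count are arbitrary choices of mine), not necessarily what follows:

```python
import numpy as np

A = np.array([[5, 0, 5, 0], [4, 1, 3, 0], [0, 4, 1, 5], [5, 1, 3, 1],
              [4, 0, 4, 1], [1, 3, 0, 4], [1, 3, 0, 3], [3, 2, 4, 1],
              [0, 5, 0, 5], [0, 4, 1, 4]])
W = np.array([[1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1],
              [1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 1], [0, 1, 1, 0],
              [1, 1, 1, 0], [1, 1, 1, 1]])

rng = np.random.default_rng(0)
r, lr = 2, 0.01
P = 0.1 * rng.standard_normal((A.shape[0], r))
Q = 0.1 * rng.standard_normal((r, A.shape[1]))

for step in range(5000):
    E = W * (P @ Q - A)  # error on the known entries only
    # Gradients of sum(E**2) with respect to P and Q (the factor of 2
    # is folded into the learning rate).
    P, Q = P - lr * E @ Q.T, Q - lr * P.T @ E

print(np.round(P @ Q, 1))  # reconstruction, including guessed entries
```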