Lesson 2: (Detour) Learning SVD and its use as a form of Collaborative Filtering

Singular Value Decomposition

If we have a complete matrix \(A\), we can factorise the matrix using SVD.

In [2]:
import numpy as np
np.set_printoptions(precision=1)
np.set_printoptions(suppress=True)

A=np.array([[4,2,6],[5,1,7],[2,4,4],[3,4,2],[5,0,1],[3,1,2]])
print A
[[4 2 6]
 [5 1 7]
 [2 4 4]
 [3 4 2]
 [5 0 1]
 [3 1 2]]

The output of SVD is three matrices whose product equals the original matrix (some constraints exist regarding the acceptable forms of \(A\)).

In [3]:
U,s,V = np.linalg.svd(A);

S = np.zeros([U.shape[0],V.shape[1]])
diags = np.diag(s)
S[:diags.shape[0],:diags.shape[1]] = diags   # embed the singular values in a 6x3 diagonal matrix

Think of it as a rotation, a scale and another rotation. (Note that NumPy's `svd` returns the right-hand factor already transposed, so the product of `U`, `S` and `V` as returned reconstructs \(A\) directly.)
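
As a quick sanity check, the product of the three factors should give back \(A\); a minimal verification sketch:

# The product of U, S and V should reconstruct A (up to floating-point error).
print np.allclose(A, np.dot(U, np.dot(S, V)))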

The trick is that we can ignore the dimensions that have little influence on the final answer (i.e. those for which \(s_i\) is small). The values in \(s\) are neatly sorted in descending order for us by the `svd` function.

The shapes of the matrices, before any reduction:

In [4]:
print U.shape
print S.shape
print V.shape
(6, 6)
(6, 3)
(3, 3)

To illustrate the relative importance of the three dimensions, compare the values in \(s\):

In [5]:
s
Out[5]:
array([ 14.3,   4.4,   3.6])

So this works, but since the last value on the diagonal is smaller than the others, we can set it to zero; think about which rows and columns of \(U\) and \(V\) then become irrelevant.
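
One way to quantify how little we lose is the fraction of the squared Frobenius norm (which equals the sum of the squared singular values) retained by the first two values; a quick sketch:

# Fraction of the total squared singular values kept by the first two.
print np.sum(s[:2]**2) / np.sum(s**2)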

Summary: We can get rid of the last column of U and the last row of V.

In [6]:
r = 2
U = U[:,0:r]
S = S[0:r,0:r]
V = V[0:r,:]
print "U:"
print U
print "S:"
print S
print "V:"
print V
U:
[[-0.5 -0. ]
 [-0.6  0.3]
 [-0.4 -0.6]
 [-0.3 -0.4]
 [-0.3  0.6]
 [-0.3  0.1]]
S:
[[ 14.3   0. ]
 [  0.    4.4]]
V:
[[-0.6 -0.3 -0.7]
 [ 0.5 -0.8 -0.1]]

Comparing the reconstruction from this reduced factorisation with the original matrix:

In [7]:
print "Approximation:"
print np.dot(U,np.dot(S,V))
print "Original:"
print A
Approximation:
[[ 4.6  2.4  5.3]
 [ 5.9  1.7  5.9]
 [ 2.   4.   4. ]
 [ 1.8  3.1  3.4]
 [ 3.7 -0.9  2.5]
 [ 2.6  0.7  2.5]]
Original:
[[4 2 6]
 [5 1 7]
 [2 4 4]
 [3 4 2]
 [5 0 1]
 [3 1 2]]

Clearly it is quite similar. The new matrix is of rank 2 rather than rank 3, but only the least important dimension has been removed.
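
We can confirm this directly with `np.linalg.matrix_rank` (a quick check rather than part of the method):

# The truncated reconstruction has rank 2, while the original A has rank 3.
print np.linalg.matrix_rank(np.dot(U, np.dot(S, V)))
print np.linalg.matrix_rank(A)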

Missing data?

What do we do if some of the elements in \(A\) are absent?

To illustrate, I'll use a bigger matrix for \(A\). Note that there is some structure, or dependence, between the columns: the first and third columns are more similar to each other, as are the second and fourth (see the quick correlation check below the matrix).

In [8]:
A=np.array([[5,0,5,0],[4,1,3,0],[0,4,1,5],[5,1,3,1],[4,0,4,1],[1,3,0,4],[1,3,0,3],[3,2,4,1],[0,5,0,5],[0,4,1,4]])
print A
[[5 0 5 0]
 [4 1 3 0]
 [0 4 1 5]
 [5 1 3 1]
 [4 0 4 1]
 [1 3 0 4]
 [1 3 0 3]
 [3 2 4 1]
 [0 5 0 5]
 [0 4 1 4]]
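
A quick way to see this dependence is to look at the correlations between the columns (a small sketch; the first/third and second/fourth pairs should come out positively correlated):

# Correlation matrix of the four columns (rowvar=0 treats each column as a variable).
print np.corrcoef(A, rowvar=0)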

We need to know which of these elements are available as training data, so a new 'weight matrix' is created, with

1=known
0=unknown
In [9]:
W=np.array([[1,1,0,1],[0,1,1,0],[1,1,0,1],[1,0,1,1],[1,0,1,1],[1,0,0,1],[0,1,1,1],[0,1,1,0],[1,1,1,0],[1,1,1,1]])
print W
[[1 1 0 1]
 [0 1 1 0]
 [1 1 0 1]
 [1 0 1 1]
 [1 0 1 1]
 [1 0 0 1]
 [0 1 1 1]
 [0 1 1 0]
 [1 1 1 0]
 [1 1 1 1]]

Repeatedly apply SVD?

This algorithm, which doesn't really seem to work very well, operates as follows:

1. Generate a new guess from the old guess:
   1.1. Find the SVD of the current guess
   1.2. Reduce the rank of the factors to an arbitrary rank, r
   1.3. Multiply the reduced factors to produce a new matrix of this reduced rank
2. In this new matrix, overwrite the elements whose true values we know with those known values
3. Repeat steps 1 and 2

A function to estimate a new matrix of rank \(r\)...

In [12]:
def est(X,r):
    # Best rank-r approximation of X via a truncated SVD.
    U,s,V = np.linalg.svd(X)
    S = np.zeros([U.shape[0],V.shape[1]])
    diags = np.diag(s)
    S[:diags.shape[0],:diags.shape[1]] = diags
    # Keep only the first r columns of U, r singular values and r rows of V.
    U = U[:,0:r]
    S = S[0:r,0:r]
    V = V[0:r,:]
    return np.dot(U,np.dot(S,V))
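
For example (a quick usage sketch), applying it once to the masked matrix gives the first rank-2 guess used inside the loop below:

# One step of the iteration: a rank-2 estimate built from the known entries only.
print est(A*W, 2)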

Code to iterate this algorithm:

In [13]:
known = A*W     # observed entries (zeros where the value is unknown)
guess = known
for i in range(1000):
    fullguess = est(guess,2)
    # keep the rank-2 estimates for the unknown entries, restore the known values
    guess = (fullguess * (np.ones(W.shape)-W)) + known

print "Guesses"
print guess
print "Unknown elements"
print A*(np.ones(W.shape)-W)
Guesses
[[ 5.   0.   3.8  0. ]
 [ 3.8  1.   3.   1.1]
 [ 0.   4.   0.4  5. ]
 [ 5.   0.9  3.   1. ]
 [ 4.   1.   4.   1. ]
 [ 1.   3.6  1.1  4. ]
 [-0.4  3.   0.   3. ]
 [ 5.   2.   4.   2.2]
 [ 0.   5.   0.   5.5]
 [ 0.   4.   1.   4. ]]
Unknown elements
[[ 0.  0.  5.  0.]
 [ 4.  0.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  3.  0.  0.]
 [ 1.  0.  0.  0.]
 [ 3.  0.  0.  1.]
 [ 0.  0.  0.  5.]
 [ 0.  0.  0.  0.]]

This seems to work quite well. The "Unknown elements" matrix above reveals the elements that I left out of the training data, to test whether the algorithm can generalise. Pleasingly, the guesses are roughly in the right direction.
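
To make "roughly in the right direction" a little more concrete, we can compute the root-mean-square error over just the held-out entries (a small sketch; the names `mask` and `rmse` are just ad hoc):

# RMSE over the entries that were hidden from the algorithm.
mask = np.ones(W.shape) - W
rmse = np.sqrt(np.sum(((guess - A)*mask)**2) / np.sum(mask))
print "Held-out RMSE:", rmse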

Subtracting the mean

I was expecting that 'demeaning' (subtracting the overall mean of \(A\) before factorising) would improve the algorithm, but it seems to become unstable, or to have some other problem.

I've not investigated why, mainly because I'm not sure that repeatedly applying the SVD is a very clever way of solving this problem.

In [14]:
Aprime = A - np.mean(A)
known = Aprime*W
guess = known
for i in range(10000):
    fullguess = est(guess,2)
    guess = (fullguess * (np.ones(W.shape)-W)) + known

print "Guesses"
print guess
print "Unknown elements"
print Aprime*(np.ones(W.shape)-W)
Guesses
[[  2.7  -2.3  -8.4  -2.3]
 [  1.4  -1.3   0.7  -0.9]
 [ -2.3   1.7  22.8   2.7]
 [  2.7  -2.3   0.7  -1.3]
 [  1.7  -1.7   1.7  -1.3]
 [ -1.3   0.9  17.    1.7]
 [ -0.9   0.7  -2.3   0.7]
 [  0.3  -0.3   1.7  -0.1]
 [ -2.3   2.7  -2.3   1.7]
 [ -2.3   1.7  -1.3   1.7]]
Unknown elements
[[ 0.  -0.   2.7 -0. ]
 [ 1.7 -0.   0.  -2.3]
 [-0.   0.  -1.3  0. ]
 [ 0.  -1.3  0.  -0. ]
 [ 0.  -2.3  0.  -0. ]
 [-0.   0.7 -2.3  0. ]
 [-1.3  0.  -0.   0. ]
 [ 0.7 -0.   0.  -1.3]
 [-0.   0.  -0.   2.7]
 [-0.   0.  -0.   0. ]]

Next, minimising the sum-of-squares error...