SVM from Scratch? Anyone?

We have already looked at SVM here, but this post was born out of trying to understand the inner workings of the kernel and the predict function. While working through a custom kernel example in scikit-learn, I found that support_vectors_ was empty, which led to the question: how is a custom kernel used to transform the test points? Here is the answer!

For example, given input data X of shape \(i \times j\), a Gaussian kernel transforms it into an \(i \times i\) matrix, and that is what we use to train the model. Now, when we want to predict some data points, say X1 of shape \(m \times j\), we need to apply the kernel and get an \(m \times i\) matrix. How is that done?
We transform against the original training set: in the decision function below, \(K(x_i, x)\) is evaluated between each training point \(x_i\) and the test point \(x\).

\[ g(x) = \text{sgn}\left(b + \sum_{i=1}^n \alpha_i y_i K(x_i, x)\right)\]
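
To make the shapes concrete, here is a minimal sketch (made-up sizes, plain NumPy, an RBF kernel written inline) that checks the \(i \times i\) and \(m \times i\) claim:

import numpy as np

# made-up sizes: i training points, m test points, j features
i, m, j, sigma = 5, 3, 2, 0.5
X  = np.random.rand(i, j)    # training data, shape (i, j)
X1 = np.random.rand(m, j)    # test data, shape (m, j)

def rbf(A, B, sigma):
    # pairwise squared distances via broadcasting, then the Gaussian
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

print(rbf(X, X, sigma).shape)    # (5, 5)  -> the i x i matrix we train on
print(rbf(X1, X, sigma).shape)   # (3, 5)  -> the m x i matrix we predict on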

Courtesy of Andrew Ng's ML course Octave code, the implementation goes like this, where Xi is the test data and X is the original training data:

import numpy as np

def gaussianKernel(Xi, X, sigma):
    """
    Returns the Gaussian (RBF) kernel matrix between Xi and X.
    Returns:
        K: array of shape (Xi.shape[0], X.shape[0])
    """
    K = np.zeros((Xi.shape[0], X.shape[0]))
    for i, x1 in enumerate(Xi):
        for j, x2 in enumerate(X):
            x1 = x1.ravel()
            x2 = x2.ravel()
            K[i, j] = np.exp(-np.sum(np.square(x1 - x2)) / (2 * (sigma**2)))
    return K
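
The double loop gets slow on bigger data sets; a vectorized equivalent (just a sketch, assuming SciPy is available) would be:

from scipy.spatial.distance import cdist
import numpy as np

def gaussianKernel_vec(Xi, X, sigma):
    # cdist computes all pairwise squared Euclidean distances in one call
    return np.exp(-cdist(Xi, X, 'sqeuclidean') / (2 * sigma**2))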

Here is the predict function:

# takes test data Xi and original training data X, weights w and intercept b
def predict_sgd(Xi, X, kernelFunction, w, b, sigma):
    # transform the test points against the training set
    K = kernelFunction(Xi, X, sigma)
    # label a point 1 only if its score clears the margin (w . K[i] + b > 1), else 0
    return [1 if np.dot(w, K[i]) + b > 1 else 0
            for i in range(len(K))]
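
If you want the raw score \(b + \sum_i w_i K(x_i, x)\) instead of the hard labels (useful for inspecting margins), a small variant of the same idea is sketched below (decision_sgd is my name, not part of the original code):

def decision_sgd(Xi, X, kernelFunction, w, b, sigma):
    # raw decision values for each test point, before thresholding into labels
    K = kernelFunction(Xi, X, sigma)
    return K.dot(w) + b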

Let's put it all together.

First, the training algorithm (the hinge-loss update it implements is written out right after the code):

import sys

# Gradient descent on the hinge loss
def svm_sgd(X, y, C=1, epochs=10000):
    # initialize one weight per feature (here, per training point, since X is the kernel matrix)
    w = np.zeros(X.shape[1])
    # loop over epochs
    for e in range(1, epochs):
        for i, x in enumerate(X):
            if (y[i] * np.dot(X[i], w)) < 1:
                # inside the margin or misclassified: move toward y[i]*X[i], with decaying regularization
                w = w + C * ((y[i] * X[i]) + (-2 * (1 / e) * w))
            else:
                # correctly classified with margin: regularization step only
                w = w + C * (-2 * (1 / e) * w)
        # simple progress indicator for the epochs
        print('{} / {} complete.'.format(e, epochs), end='\r')
        sys.stdout.flush()
    return w

# Takes X and labels y as well as a kernel function;
# returns weights w and intercept b
def svm_sgd_train(X, y, kernelFunction, C, epochs, sigma=2):
    K = kernelFunction(X, X, sigma)
    w = svm_sgd(K, y, C=C, epochs=epochs)
    b = sum(y - K.dot(w)) / len(K)
    return w, b
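
For reference, here is the objective the update above is chasing, as I read the code: a regularized hinge loss, with the learning rate folded into C and a regularization strength \(\lambda\) that decays as \(1/e\) over the epochs.

\[ J(w) = \lambda \lVert w \rVert^2 + \sum_{i=1}^{n} \max\left(0,\ 1 - y_i\, w \cdot x_i\right) \]

\[ w \leftarrow \begin{cases} w + C\left(y_i x_i - 2\lambda w\right) & \text{if } y_i\, w \cdot x_i < 1 \\ w - 2 C \lambda w & \text{otherwise} \end{cases}, \qquad \lambda = \frac{1}{e} \]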

Running the algorithm on a data set from Andrew Ng's ML course (the last one), sigma=0.1 and C=0.8 seem to fit nicely!


Lastly, the training run and the visualization, which uses the predict function to draw the decision boundary:

import matplotlib.pyplot as plt

# Takes X, y labels, a kernel function, weights and intercept.
def visualize_svm_sgd(X, y, kernelFunction, w, b, sigma, title):
    fig, ax = plt.subplots()
    # build a grid over the range of the two features
    x1plot = np.linspace(X[:, 0].min(), X[:, 0].max(), 100).T
    x2plot = np.linspace(X[:, 1].min(), X[:, 1].max(), 100).T
    X1, X2 = np.meshgrid(x1plot, x2plot)
    Z = np.zeros(X1.shape)
    # predict column by column over the grid
    for i in range(X1.shape[1]):
        this_X = np.column_stack((X1[:, i], X2[:, i]))
        Z[:, i] = predict_sgd(this_X, X, kernelFunction, w, b, sigma)
    # Plot the SVM boundary; Z holds 0/1 labels, so the boundary sits at 0.5
    plt.contour(X1, X2, Z, colors="b", levels=[0.5])
    # Plot the training data.
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors='k')
    # title for the plot
    plt.title(title)

#####
sigma=0.1
C=0.8
w,b=svm_sgd_train(Xval,yval,gaussianKernel,C, 10000, sigma)

####
visualize_svm_sgd(Xval, yval, gaussianKernel, w, b, sigma,
 "Sigma=0.1 C=0.8: SVM (Gaussian Kernel) Decision Boundary")

The performance was not that bad! We are very close to Andrew Ng's reference implementation, eh!
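
To put a rough number behind "not that bad", a quick self-accuracy check on the same data is sketched below (Xval and yval are the course data loaded elsewhere; this only measures fit on the training set, not generalization):

# predict the training points against themselves and compare with the labels
preds = predict_sgd(Xval, Xval, gaussianKernel, w, b, sigma)
print('accuracy on the training set: {:.2%}'.format(np.mean(np.array(preds) == yval)))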