In this blog, you will learn the mathematical intuition behind Support Vector Machines (SVMs) and their kernels.

# Support Vector Machine

A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for classification or regression tasks. The goal of the SVM algorithm is to find the hyperplane in an N-dimensional space that maximally separates the two classes.

## Mathematical Intuition


Imagine we have two classes of data points, represented by circles and rectangles.

The SVM algorithm will try to find the hyperplane that best separates these two classes. The hyperplane is defined as the set of all points that satisfy the equation:

w^T x + b = 0

where w is a vector perpendicular to the hyperplane and x is a data point. The bias term b determines the position of the hyperplane.
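To make this concrete, here is a minimal sketch of the decision rule with hand-picked weights (the values of w and b below are purely illustrative, not a trained model): the sign of w^T x + b tells us which side of the hyperplane a point falls on.

```python
def decision(w, x, b):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w = [1.0, -1.0]   # normal vector of the hyperplane (assumed for illustration)
b = 0.0           # bias term

print(decision(w, [3.0, 1.0], b))   # point on the positive side -> 1
print(decision(w, [1.0, 3.0], b))   # point on the negative side -> -1
```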

In the case of two classes, the SVM will try to find the hyperplane that maximally separates the two classes and is also as far as possible from the closest data points of both classes. These points are called support vectors, as they support the position of the hyperplane.

The distance between the hyperplane and the closest data points of both classes is called the margin. The SVM algorithm tries to maximize the margin to increase the robustness and generalization of the model.
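The distance from a point x to the hyperplane w^T x + b = 0 is |w^T x + b| / ||w||, and the margin is twice the distance of the closest point (a support vector). A small sketch, with an assumed hyperplane chosen only for illustration:

```python
import math

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w^T x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm

w, b = [3.0, 4.0], -5.0                           # hyperplane 3*x1 + 4*x2 - 5 = 0 (assumed)
print(distance_to_hyperplane(w, b, [3.0, 4.0]))   # |9 + 16 - 5| / 5 = 4.0
```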

In the case of multiple classes, the SVM algorithm uses a technique called one-versus-all (also known as one-versus-rest) classification, where a separate binary SVM classifier is trained for each class. The class whose classifier assigns the highest score is the predicted class for a given data point.
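The one-versus-all prediction step can be sketched as follows; the per-class weight vectors here are made up for illustration, not trained:

```python
def ovr_predict(classifiers, x):
    """classifiers: dict mapping class label -> (w, b). Predict the label
    whose binary classifier assigns the highest score w^T x + b."""
    def score(w, b):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(classifiers, key=lambda c: score(*classifiers[c]))

classifiers = {
    "A": ([1.0, 0.0], 0.0),     # hypothetical per-class parameters
    "B": ([0.0, 1.0], 0.0),
    "C": ([-1.0, -1.0], 0.0),
}
print(ovr_predict(classifiers, [2.0, 1.0]))   # class "A" scores highest (2.0)
```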

Overall, the mathematical intuition behind the SVM algorithm is the optimization of the hyperplane that maximally separates the different classes, while maximizing the margin, i.e., the distance to the support vectors.

## SVM Kernels

A kernel is a function that takes in two data points and returns the inner product between them in the transformed space. Different kernels can be used to transform the data in different ways, and the choice of the kernel can have a significant impact on the resulting classifier. Some common kernels include:

Linear kernel: It is used when the data is linearly separable, that is, when it can be separated by a single line (or hyperplane). It is one of the most common kernels and is often preferred when the dataset has a large number of features. If x and y are column vectors, their linear kernel is:

K(x, y) = x^T y

Polynomial kernel: It represents the similarity of vectors in the training set in a feature space over polynomials of the original variables used in the kernel:

K(x, y) = (x^T y + c_0)^d

where x and y are vectors, and d is the kernel degree. If c_0 = 0, the kernel is said to be homogeneous.

Radial basis function (RBF) kernel: It is a general-purpose kernel, often the default choice when there is no prior knowledge about the data:

K(x, y) = \exp(-\gamma ||x - y||^2)

where x and y are the input vectors. If \gamma = \sigma^{-2}, the kernel is known as the Gaussian kernel of variance \sigma^{2}.

Sigmoid kernel: This function computes the sigmoid kernel between two vectors. It is also known as the hyperbolic tangent or Multilayer Perceptron kernel (because, in the neural network field, the same function is often used as a neuron activation function):

K(x, y) = \tanh(\gamma x^T y + c_0)

where x and y are the input vectors, \gamma is the slope, and c_0 is the intercept.
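The four kernels above can be sketched in a few lines of plain Python; the parameter names gamma, c0, and d match the symbols in the formulas, and the default values are arbitrary choices for illustration:

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, d=2, c0=1.0):
    return (dot(x, y) + c0) ** d

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=0.5, c0=0.0):
    return math.tanh(gamma * dot(x, y) + c0)

x, y = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, y))       # x^T y = 2.0
print(polynomial_kernel(x, y))   # (2 + 1)^2 = 9.0
print(rbf_kernel(x, x))          # identical points -> exp(0) = 1.0
```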

The SVM algorithm uses the kernel function to transform the data points into a higher-dimensional space, where it is easier to find a hyperplane that separates the classes. The parameters of the kernel function (such as gamma and coef0 for the RBF and polynomial kernels, respectively) can be tuned to improve the performance of the classifier.

## Summary

In summary, the SVM algorithm is a powerful tool for classification and regression tasks: it finds the hyperplane that maximally separates the classes in the feature space, and it can handle high-dimensional and non-linearly separable data by using the kernel trick.

## Closing Remarks

If you have any queries, please let me know in the comment section.

For the practical implementation of SVM for both classification and regression problems, please visit my GitHub.