**Support Vector Machines**

- In machine learning, **support-vector machines** (**SVMs**, also **support-vector networks**) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.
- SVM + kernel SVM

https://colab.research.google.com/drive/1bdT5U1vRbWnu1FxsOqu8TnUZFFNGyaQO?usp=sharing

https://colab.research.google.com/drive/1a3BxoT_EX3zu4-s5VMRw0YGIXw6QCMFH?usp=sharin

Given an arbitrary dataset, you typically don’t know which kernel will work best. I recommend starting with the simplest hypothesis space first (given that you don’t know much about your data) and working your way up towards more complex hypothesis spaces. So, the linear kernel works fine if your dataset is linearly separable; however, if your dataset isn’t linearly separable, a linear kernel isn’t going to cut it (almost in a literal sense ;)).
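As a minimal sketch of this advice (assuming scikit-learn; `make_blobs` just stands in for a linearly separable dataset), you might fit the linear kernel first and check how far it gets:

```python
# Start with the simplest hypothesis space: a linear-kernel SVM.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs: a stand-in for a linearly separable dataset.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.8, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.score(X, y))  # near-perfect training accuracy on separable data
```

If the linear kernel already scores well (on held-out data, in practice), there is little reason to move to a more complex kernel.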

This works perfectly fine. And here comes the RBF kernel SVM:

**WHEN NOT TO USE RBF KERNEL:**

Linear SVM is a parametric model; an RBF kernel SVM isn’t, and the complexity of the latter grows with the size of the training set. Not only is it more expensive to train an RBF kernel SVM, but you also have to keep the kernel matrix around, and the projection into this “infinite” higher-dimensional space where the data becomes linearly separable is more expensive during prediction as well. Furthermore, you have more hyperparameters to tune, so model selection is more expensive too. And finally, it’s much easier to overfit a complex model!
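A small sketch of the model-selection point (the grid values below are illustrative, not recommendations): the RBF kernel adds `gamma` on top of `C`, so the hyperparameter grid grows multiplicatively:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

linear_grid = {"C": [0.1, 1, 10]}                        # 3 candidates
rbf_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}  # 3 x 3 = 9 candidates

linear_search = GridSearchCV(SVC(kernel="linear"), linear_grid, cv=3).fit(X, y)
rbf_search = GridSearchCV(SVC(kernel="rbf"), rbf_grid, cv=3).fit(X, y)

# Every extra hyperparameter multiplies the number of models to train.
print(len(linear_search.cv_results_["params"]))  # 3
print(len(rbf_search.cv_results_["params"]))     # 9
```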

**BUT THE LINEAR KERNEL WORKS ONLY ON LINEARLY SEPARABLE DATA:**

In a case like this one, where the data is not linearly separable, using a linear kernel is a blunder.

In this case, an RBF kernel would make much more sense:

In practice, though, a nonlinear kernel is less attractive for efficiency (computational as well as predictive performance) reasons when the problem is linear. So, the rule of thumb is: use linear SVMs (or logistic regression) for linear problems, and nonlinear kernels such as the Radial Basis Function (RBF) kernel for non-linear problems.

The RBF kernel SVM decision boundary is actually also a linear one, just in a different space. What the RBF kernel SVM actually does is create non-linear combinations of your features to lift your samples into a higher-dimensional feature space, where a linear decision boundary can separate your classes:
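The lifting idea can be sketched by hand (this is an explicit, hand-crafted lift for illustration, not the actual implicit RBF feature map): concentric circles are not linearly separable in 2-D, but adding the squared radius as a third feature makes a plain linear boundary work:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
score_2d = SVC(kernel="linear").fit(X, y).score(X, y)

# Lift to 3-D with a non-linear combination of the features: r^2 = x1^2 + x2^2.
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
score_3d = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print(score_2d)  # roughly chance level
print(score_3d)  # near-perfect: a flat plane separates the lifted classes
```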

*Polynomial Kernel*

*Think of the polynomial kernel as a transformer/processor to generate new features by applying the polynomial combination of all the existing features.*

*Existing Feature: X = np.array([-2, -1, 0, 1, 2])*

*Label: Y = np.array([1, 1, 0, 1, 1])*

*It’s impossible for us to find a line to separate the yellow (1) and purple (0) dots (shown on the left).*

*But, if we apply transformation X² to get:*

*New Feature: X = np.array([4, 1, 0, 1, 4])*

*By combining the existing and new features, we can certainly draw a line to separate the yellow and purple dots (shown on the right).*

*Support vector machine with a polynomial kernel can generate a non-linear decision boundary using those polynomial features.*
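A runnable version of this 1-D example (a sketch; a large `C` is used so the separable points are fit exactly):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([-2., -1., 0., 1., 2.]).reshape(-1, 1)
y = np.array([1, 1, 0, 1, 1])

# Explicit transform: add x^2 as a second feature, then separate linearly.
X_combined = np.hstack([X, X ** 2])  # columns: x, x^2
linear = SVC(kernel="linear", C=100).fit(X_combined, y)

# Same idea via the polynomial kernel, without building the features by hand.
poly = SVC(kernel="poly", degree=2, coef0=1, C=100).fit(X, y)

print(linear.score(X_combined, y))  # 1.0
print(poly.score(X, y))             # 1.0
```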

*Radial Basis Function (RBF) kernel*

*Think of the Radial Basis Function kernel as a transformer/processor to generate new features by measuring the distance between all other dots to a specific dot/dots — centers. The most popular/basic RBF kernel is the Gaussian Radial Basis Function:*
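In its standard form, the Gaussian RBF kernel measures the similarity between two points as:

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right),
\qquad \gamma = \frac{1}{2\sigma^{2}} > 0
```

A small gamma (large sigma) makes the kernel vary smoothly across the input space; a large gamma makes each training point influence only its immediate neighborhood.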

Existing Feature: X = np.array([-2, -1, 0, 1, 2])

Label: Y = np.array([1, 1, 0, 1, 1])

Again, it’s impossible for us to find a line to separate the dots (shown on the left).

But, if we apply a Gaussian RBF transformation using two centers, (-1, 0) and (2, 0), to get new features, we will then be able to draw a line to separate the yellow and purple dots (shown on the right):

New Feature 1: X_new1 = np.array([1.01, 1.00, 1.01, 1.04, 1.09])

New Feature 2: X_new2 = np.array([1.09, 1.04, 1.01, 1.00, 1.01])
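Here is a runnable sketch of the same construction, using the standard Gaussian RBF `exp(-gamma * (x - c)^2)` with centers at x = -1 and x = 2 and an illustrative `gamma = 0.1`. (The exact feature values depend on the kernel's scaling, so they won't match the numbers printed above, but the conclusion is the same: a single straight line in the new feature space separates the classes.) The weights and threshold below are hand-picked for illustration:

```python
import numpy as np

X = np.array([-2., -1., 0., 1., 2.])
y = np.array([1, 1, 0, 1, 1])
gamma = 0.1

# Gaussian RBF features: similarity of each point to the two centers.
phi1 = np.exp(-gamma * (X - (-1.0)) ** 2)  # center at x = -1
phi2 = np.exp(-gamma * (X - 2.0) ** 2)     # center at x = 2

# One linear rule in the lifted (phi1, phi2) space classifies every point
# correctly: the class-0 point scores lowest.
score = -2.0 * phi1 - phi2
pred = (score > -2.44).astype(int)
print(pred)  # [1 1 0 1 1] -- matches y
```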