Which algorithm to use depends on a number of factors. A few factors to look at are listed below:
- The number of examples in the training set.
- The dimensionality of the feature space.
- Do we have correlated features?
- Is overfitting a problem?
These are just a few of the factors on which the choice of algorithm may depend. Once you have answers to these questions, you can move on to choosing the algorithm.
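The first three questions can be answered directly from the data. Below is a minimal sketch in plain Python, assuming the dataset is a list of numeric rows; the helper names `pearson` and `summarize`, the 0.9 correlation threshold, and the toy data are illustrative assumptions, not part of any library. Whether overfitting is a problem cannot be read off the data this way; it needs a comparison of training and held-out performance.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def summarize(rows, corr_threshold=0.9):
    """Report example count, feature count, and strongly correlated feature pairs."""
    n_features = len(rows[0])
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    correlated = [
        (i, j)
        for i in range(n_features)
        for j in range(i + 1, n_features)
        if abs(pearson(columns[i], columns[j])) >= corr_threshold
    ]
    return {
        "n_examples": len(rows),
        "n_features": n_features,
        "correlated_pairs": correlated,
    }

# Hypothetical toy data: feature 1 is exactly twice feature 0.
rows = [(1.0, 2.0, 5.0), (2.0, 4.0, 1.0), (3.0, 6.0, 4.0), (4.0, 8.0, 2.0)]
summary = summarize(rows)
print(summary)
```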
- The main reason to use an SVM is that the problem might not be linearly separable. In that case, we need an SVM with a non-linear kernel (e.g. RBF).
- A related reason to use SVMs is that you are working in a high-dimensional space. For example, SVMs have been reported to work better for text classification.
However, an SVM takes a long time to train, so it is not recommended when we have a large number of training examples.
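To make the point about linear separability concrete, here is a small sketch that assumes scikit-learn is installed. It compares a linear kernel with an RBF kernel on a synthetic two-ring dataset; the dataset, split sizes, and `random_state` values are arbitrary choices for illustration only.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the two classes.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same classifier, two kernels: linear vs. non-linear (RBF).
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On data like this the RBF kernel should score clearly higher than the linear one; on data that is already linearly separable the gap would be expected to shrink.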
- KNN (k-nearest neighbors) is robust to noisy training data and is effective when there are a large number of training examples.
But for this algorithm, we have to choose the value of the parameter K (the number of nearest neighbors) and the type of distance to use. The computation cost is also high, because we need to compute the distance from each query instance to every training sample.
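The cost described above is easy to see in a brute-force version. The following is a minimal sketch in plain Python, assuming Euclidean distance and majority voting; the function name `knn_predict` and the toy points are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # One distance computation per training sample: this is the costly part.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_low = knn_predict(train_X, train_y, (1.2, 1.4), k=3)
pred_high = knn_predict(train_X, train_y, (6.8, 7.0), k=3)
print(pred_low, pred_high)
```

Both choices mentioned in the text appear here as explicit knobs: `k` is a parameter, and swapping `math.dist` for another function changes the distance type.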
- A Random Forest is nothing more than an ensemble of Decision Trees. Random Forests can handle categorical features very well.
- This algorithm can handle high-dimensional spaces as well as a large number of training examples.
Random Forests work almost out of the box, which is one reason why they are so popular.
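As a rough illustration of the "out of the box" claim, the sketch below assumes scikit-learn is installed and fits a `RandomForestClassifier` with default settings (only `random_state` is fixed, for reproducibility) on a synthetic 50-feature dataset. The dataset parameters are arbitrary assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: 1000 examples, 50 features, only 10 of them informative.
X, y = make_classification(
    n_samples=1000, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No hyperparameter tuning: defaults apart from the fixed seed.
forest = RandomForestClassifier(random_state=0)
forest.fit(X_train, y_train)

n_trees = len(forest.estimators_)  # the individual Decision Trees in the ensemble
accuracy = forest.score(X_test, y_test)
print(f"{n_trees} trees, test accuracy: {accuracy:.2f}")
```

The `estimators_` attribute holds the fitted Decision Trees, which makes the "bunch of trees combined" description visible in code.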