In this post we will discuss the fundamentals of building machine learning models to solve problems.
In machine learning terminology, a model is the structure that maps an input to an output.
So how is a model defined?
- What needs to be predicted?
We need to determine whether the task is classification, regression, or finding patterns and groupings.
- What attributes must be defined?
This is important because often we don’t need all attributes of a thing in order to make a determination for a specific task. For example, if we are deciding whether or not to turn on the television, the color of the television or how far away it is from us is irrelevant.
- What dataset is to be used?
We must decide whether to use labeled or unlabeled data; in other words, whether the model will be supervised or unsupervised.
- How big or small should our dataset be?
Models typically require many examples for each element they must predict, and models need to be tested.
A common method for testing is to split the example dataset into a training set and a testing set.
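A minimal sketch of such a split in plain Python (the 80/20 ratio, the fixed seed, and the toy dataset are assumptions for illustration):

```python
import random

def train_test_split(examples, test_ratio=0.2, seed=42):
    """Shuffle the examples and split them into a training set and a testing set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of (features, label) pairs.
dataset = [((i, i * 2), i % 2) for i in range(10)]
train, test = train_test_split(dataset)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters: if the examples are ordered (say, by label), an unshuffled split would give the model an unrepresentative testing set.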
You test the percentage of accurate outputs with the testing set and repeat the test with different splits of the dataset, a process known as cross-validation.

There are also important considerations for a model’s quality. Is it consistent? How many examples can it correctly predict over and over again? What is the size of the dataset? Small datasets can cause overfitting, a statistical phenomenon that occurs when a model captures noise in the training data and loses predictive power on new data. Precision and recall are important quality considerations, and statistical measures like the F1 score can be used to determine a model’s quality.
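To make precision, recall, and the F1 score concrete, here is a sketch that computes all three for a binary prediction (the example labels are made up; real use would compare a model's predictions against held-out true labels):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and the F1 score for a binary prediction."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of everything flagged positive, how much was right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# tp=2, fp=1, fn=1 for these toy labels.
p, r, f1 = precision_recall_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.67 0.67 0.67
```

The F1 score is the harmonic mean of precision and recall, so it punishes a model that is strong on one but weak on the other.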
Feature engineering is the process of creating features for datasets. It is used to optimize results, that is, to provide a better environment for machine learning algorithms to learn and make decisions.
Examples include combining attributes, e.g. determining area by multiplying height and width, and decomposing attributes, e.g. taking a system’s datetime and breaking it out into year, month, day, and time of day.
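Both examples can be sketched in a few lines (the attribute names and sample values are assumptions):

```python
from datetime import datetime

def engineer_features(height, width, timestamp):
    """Combine attributes (area) and decompose a datetime into coarser parts."""
    return {
        "area": height * width,   # combined attribute
        "year": timestamp.year,   # decomposed attributes follow
        "month": timestamp.month,
        "day": timestamp.day,
        "hour": timestamp.hour,
    }

features = engineer_features(3, 4, datetime(2024, 6, 15, 9, 30))
print(features["area"], features["year"], features["hour"])  # 12 2024 9
```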
Transforming data is another method of feature engineering, for example changing units or normalizing data.
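One common transformation is min-max normalization, which rescales a feature to the [0, 1] range; a minimal sketch (the sample values are arbitrary):

```python
def min_max_normalize(values):
    """Rescale values to the [0, 1] range so features share a common scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no signal
    return [(v - lo) / (hi - lo) for v in values]

print([round(v, 2) for v in min_max_normalize([10, 20, 30, 40])])
# [0.0, 0.33, 0.67, 1.0]
```

Putting features on a common scale keeps an attribute with large raw values from dominating one with small raw values.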
Feature engineering has real benefits: it can make models yield better results, and it can make algorithms perform better.
Finally, when selecting features, candidate features can be scored by running different combinations of them through the training algorithm.
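That search over combinations can be sketched as an exhaustive loop. The scoring function here is a hypothetical stand-in for "train a model on this subset and measure accuracy on a held-out set", and the toy feature scores are assumptions:

```python
from itertools import combinations

def score_feature_subsets(features, score_fn):
    """Score every non-empty combination of features and return them best-first."""
    scored = []
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            scored.append((score_fn(subset), subset))
    scored.sort(reverse=True)  # highest score first
    return scored

# Toy scorer: "area" and "hour" carry signal, "color" only adds noise.
signal = {"area": 5, "hour": 3, "color": -1}
best = score_feature_subsets(list(signal), lambda s: sum(signal[f] for f in s))
print(best[0])  # (8, ('area', 'hour'))
```

Note that exhaustive search grows exponentially with the number of features; in practice, greedy forward or backward selection is often used instead.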