Decision Tree

Random Forest Algorithm

Random Forest is a robust machine-learning algorithm that is used for both classification and regression tasks. It is a type of ensemble learning method, which means that it combines multiple decision trees to create a more accurate and stable model. The mathematical intuition behind Random Forest is rooted in the concept of decision trees and bagging. A decision tree is a tree-like structure in which the internal nodes represent the feature(s) of the data, the branches represent the decision based on those features, and the leaves represent the output or class label. Each internal node in a decision tree represents…
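To make the combination of decision trees and bagging concrete, here is a minimal sketch using only the Python standard library. It is an illustration under simplifying assumptions, not a full Random Forest: each "tree" is a one-level decision stump, each stump sees a bootstrap sample and a random subset of the features, and the forest predicts by majority vote. The function names (`fit_stump`, `fit_forest`, `predict_forest`) and the parameter defaults are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest misclassifications."""
    fallback = Counter(labels).most_common(1)[0][0]
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            # Each side predicts its majority label; an empty side uses the overall majority.
            left_label = Counter(left).most_common(1)[0][0] if left else fallback
            right_label = Counter(right).most_common(1)[0][0] if right else fallback
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) of the training rows.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Each stump only considers a random subset of the features.
        feature_ids = rng.sample(range(n_features), k=max(1, n_features // 2))
        forest.append(fit_stump(sample_rows, sample_labels, feature_ids))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

Calling `fit_forest(rows, labels)` on a list of numeric feature tuples and a parallel list of class labels returns the list of stumps, and `predict_forest(forest, row)` classifies a new row. Averaging the votes of many trees trained on different samples is what makes the ensemble more stable than any single tree.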
Decision Tree

Decision tree algorithms are a type of supervised learning algorithm used to solve both regression and classification problems. The goal is to create a model that predicts the value of a target variable based on several input variables. Decision trees use a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. The resulting tree can be used to map out all possible outcomes of a decision. A decision tree algorithm works by breaking down a dataset into smaller and smaller subsets while, at the same time, an…
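The idea of breaking a dataset into smaller subsets can be sketched as a search for the best threshold on a single feature. The excerpt does not say which split criterion is used; the sketch below assumes Gini impurity, one common choice, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best_threshold, best_score = None, float("inf")
    n = len(labels)
    # Splitting at the largest value would leave the right subset empty, so skip it.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

A tree-building algorithm would apply `best_split` to every feature, keep the best one, and then repeat the same search on each of the two resulting subsets until the subsets are pure or too small to split.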
What is the VGG 19 neural network?

VGG 19 is a convolutional neural network architecture that is 19 layers deep. The VGG net was designed primarily to win the ILSVRC ImageNet competition. Let’s take a brief look at the architecture of VGG19. Input: VGG-19 takes an input image of size 224×224. Convolutional Layers: VGG’s convolutional layers use a minimal receptive field of 3×3, the smallest size that still captures up/down and left/right. Each convolution is followed by a ReLU activation function. ReLU stands for rectified linear unit; it is a piecewise linear function that outputs its input if the input is positive and zero otherwise. Stride is…
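The ReLU definition above fits in a few lines. This is a plain-Python sketch for illustration only; real VGG implementations apply the same function element-wise to tensors inside a deep learning framework, and the helper name `relu_map` is hypothetical.

```python
def relu(x):
    """Return x if it is positive, otherwise 0."""
    return x if x > 0 else 0.0


def relu_map(feature_map):
    """Apply ReLU element-wise to a 2D feature map given as a list of rows."""
    return [[relu(value) for value in row] for row in feature_map]
```

For example, `relu_map([[-1.0, 2.0], [0.0, -0.5]])` keeps the positive entry and sets every other entry to zero.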
How to create Movie Recommendation System

In this notebook, I will try out a few recommendation algorithms (content-based, popularity-based and collaborative filtering) and try to build an ensemble of these models to come up with our final movie recommendation system. We have two MovieLens data sets. Full Data Set: Contains 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags. Small Data Set: Includes 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users. I will create a Simple Recommendation using movies from the Full Database while…
Predictive Analysis with different approaches

The goal of this notebook is not to build the best model for each time series. It is just a comparison of a few models when you have one time series. The notebook presents different approaches to forecasting a time series. In this notebook we will be using web traffic data from Kaggle. The plan of the notebook is:
I. Importation & Data Cleaning
II. Aggregation & Visualization
III. Machine Learning Approach
IV. Basic Model Approach
V. ARIMA Approach (Autoregressive Integrated Moving Average)
VI. (FB) Prophet Approach
VII. Keras Starter
VIII. Comparison & Conclusion

I. Importation & Data Cleaning
In this first part we will choose the Time…
Feature engineering and SGDReg with Regularization With Students Performance Data

All needed imports for the data:

    import pandas as pd
    pd.options.display.max_colwidth = 80
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import SGDRegressor
    from sklearn.svm import SVC  # SVM model with kernels
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import mean_squared_error
    import warnings
    warnings.filterwarnings('ignore')

Loading and Exploring Data

There are two files of student performance in two subjects: math and Portuguese (Portugal is the country the dataset is from). Important notice: the description (later on, DESCR) says that "there are…
Analysis on campus recruitment data

Campus recruitment is a strategy for sourcing, engaging and hiring young talent for internship and entry-level positions. College recruiting is typically a tactic for medium- to large-sized companies with high-volume recruiting needs, but can range from small efforts (like working with university career centers to source potential candidates) to large-scale operations (like visiting a wide array of colleges and attending recruiting events throughout the spring and fall semesters). Campus recruitment often involves working with university career services centers and attending career fairs to meet in person with college students and recent graduates. Context of our Dataset: Our dataset revolves around the placement season…
Outliers and Various methods of Detection

WHAT IS AN OUTLIER?

An outlier is an observation that is numerically distant from the rest of the data; put simply, it is a value that lies outside the expected range. Let’s take an example to see what happens to a data set with and without an outlier.

                      Data without outlier    Data with outlier
Data                  1,2,3,3,4,5,4           1,2,3,3,4,5,400
Mean                  3.142                   59.714
Median                3                       3
Standard Deviation    1.345185                150.057

As you can see, the data set with the outlier has a significantly different mean and standard deviation. In the first scenario, we would say that the average is 3.14. But with the outlier, the average soars to 59.71. This would change the estimate completely. Let’s take a real…
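The figures in the comparison above can be reproduced with Python's standard `statistics` module. This is a small sketch; the helper name `summarize` is hypothetical, and it assumes the standard deviation is the sample standard deviation (dividing by n - 1), which is the variant that matches the numbers quoted.

```python
import statistics

without_outlier = [1, 2, 3, 3, 4, 5, 4]
with_outlier = [1, 2, 3, 3, 4, 5, 400]


def summarize(data):
    """Return the mean, median and sample standard deviation of a data set."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
    }


for name, data in [("without outlier", without_outlier), ("with outlier", with_outlier)]:
    stats = summarize(data)
    print(f"{name}: mean={stats['mean']:.3f} median={stats['median']} stdev={stats['stdev']:.3f}")
```

Note that the median is the same for both data sets, while the mean and standard deviation change sharply. That is why the median is often described as robust to outliers.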
What is Supervised Learning

In machine learning we generally talk about three types of learning: supervised learning, unsupervised learning, and reinforcement learning. In this post we'll be discussing supervised learning.

Supervised Learning

Supervised learning is learning a model from a set of labeled examples. Technically, a labeled example may be expressed as a vector of parameters paired with a desired output value. A friendlier way of saying that is that supervised learning models are given examples of what worked and what did not, so they have some experience and use that experience to make current and future decisions. There are plenty of problems machine…
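As a rough illustration of a labeled example, the sketch below pairs each feature vector with its desired output and uses the stored examples to label a new input. The features (hours studied, hours slept), the labels, and the choice of a 1-nearest-neighbour rule are all assumptions made for this example; the post does not prescribe a particular model.

```python
import math

# Each labeled example is (feature vector, desired output).
# The features and labels here are hypothetical.
training_examples = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def predict(features, examples):
    """1-nearest-neighbour: return the label of the closest labeled example."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(features, example[0])
    )
    return nearest_label
```

Here the "experience" is simply the stored examples: `predict((7.5, 8.0), training_examples)` looks up the most similar past case and reuses its label.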
A Data Science Framework: To Achieve 99% Accuracy :Part 3

Part 3: Model Implementation

This post is a continuation of the previous post. If you have not read it yet, I recommend that you visit here.

Step 5: Model Data

Data science is a multi-disciplinary field spanning mathematics (e.g., statistics, linear algebra), computer science (e.g., programming languages, computer systems) and business management (e.g., communication, subject-matter knowledge). Most data scientists come from one of the three fields, so they tend to lean towards that discipline. However, data science is like a three-legged stool, with no one leg being more important than the others. So, this step will require advanced knowledge in…