Random Forest

Which one to use – RandomForest vs SVM vs KNN?

Deciding which algorithm to use depends on a number of factors. A few to consider are listed below: the number of examples in the training set; the dimensionality of the feature space; whether the features are correlated; and whether overfitting is a problem. These are just a few of the factors on which the selection of the algorithm may depend. Once you have the answers to all these questions, you can move ahead and decide on the algorithm. SVM: the main reason to use an SVM is that the problem might not be linearly separable. In that case, we will…
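To make the comparison concrete, here is a minimal sketch (not from the article) that cross-validates all three classifiers on a synthetic scikit-learn dataset; the dataset shape and the hyperparameters are illustrative assumptions, not recommendations.

```python
# Illustrative comparison of RandomForest, SVM and KNN on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: 1,000 examples, 20 features, some of them redundant/correlated.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM (RBF kernel)": SVC(kernel="rbf"),   # can handle non-linearly separable data
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

In practice the winner depends on the factors above: KNN degrades in high dimensions, SVMs scale poorly to very large training sets, and random forests tolerate correlated features and resist overfitting via averaging.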
Read More
How to Create a Movie Recommendation System

In this notebook, I will use a few recommendation algorithms (content-based, popularity-based and collaborative filtering) and try to build an ensemble of these models to come up with our final movie recommendation system. We have two MovieLens datasets. Full Dataset: contains 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users; includes tag genome data with 12 million relevance scores across 1,100 tags. Small Dataset: contains 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users. I will create a Simple Recommender using movies from the Full Dataset while…
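As a taste of the popularity-based approach, below is a sketch of a simple recommender built on IMDB's weighted-rating formula, WR = (v/(v+m))·R + (m/(v+m))·C; the column names (vote_count, vote_average) and the file name are assumptions matching the TMDB-style metadata commonly paired with MovieLens, not necessarily the notebook's exact code.

```python
# Popularity-based "simple recommender" using IMDB's weighted rating.
import pandas as pd

def weighted_rating(df, quantile=0.90):
    """Score each movie as (v/(v+m))*R + (m/(v+m))*C, where
    v = votes for the movie, R = its average rating,
    m = minimum votes required (chosen quantile of vote counts),
    C = mean rating across all movies."""
    C = df["vote_average"].mean()
    m = df["vote_count"].quantile(quantile)
    qualified = df[df["vote_count"] >= m].copy()   # keep well-voted movies only
    v, R = qualified["vote_count"], qualified["vote_average"]
    qualified["score"] = (v / (v + m)) * R + (m / (v + m)) * C
    return qualified.sort_values("score", ascending=False)

# movies = pd.read_csv("movies_metadata.csv")   # hypothetical file name
# print(weighted_rating(movies)[["title", "score"]].head(10))
```

The quantile cutoff m prevents a movie with five perfect ratings from outranking one with thousands of good ratings.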
Read More
How to Predict Whether a Movie Will Be a Flop or a Hit, and Its Revenue?

The birth of the motion picture camera in the late 19th century gave rise to the most powerful form of entertainment available: cinema. Movies have entertained audiences from the emergence of a single second of horse racing in the 1890s, to the introduction of sound in the 1920s, to the birth of color in the 1930s, to the creation of 3D movies in the early 2010s. Cinema had humble beginnings in terms of design, direction and acting (especially given the very short runtimes of its early days), but since then, the film industry around the world has been…
Read More
What is Dimensionality Reduction? Overview, Objectives, and Popular Techniques

Table of Contents: What is Dimensionality Reduction; Why Dimensionality Reduction is Important; Dimensionality Reduction Methods and Approaches; Dimensionality Reduction Techniques; Dimensionality Reduction Example. Machine learning is not an easy task. Okay, that's an understatement. Artificial intelligence and machine learning represent a major step in making computers think like humans, but both concepts are challenging to understand. Fortunately, the payoff is worth the effort. Today we are dealing with dimensionality reduction, a key concept in machine learning. We will cover what it means, why it is important, how to do it, and give you a related example to illustrate…
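As a preview of the techniques covered, here is a minimal sketch (an illustrative example, not the article's code) of principal component analysis with scikit-learn, reducing the 64-dimensional digits data to two components.

```python
# Minimal PCA example: project 64-dimensional data down to 2 components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)           # 1797 samples, 64 features
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                              # (1797, 2)
print(pca.explained_variance_ratio_.sum())     # fraction of variance retained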
Read More
Interpreting ACF and PACF | Time Series

Introduction: Autocorrelation analysis is an important step in the exploratory data analysis (EDA) of time series. It helps in detecting hidden patterns and seasonality and in checking for randomness. It is especially important when you intend to use an ARIMA model for forecasting, because autocorrelation analysis helps to identify the AR and MA parameters of the ARIMA model. Overview: Fundamentals; Auto-Regressive and Moving Average Models; Stationarity; Autocorrelation Function and Partial Autocorrelation Function; Order of AR, MA, and ARMA Models; Examples (AR(1) Process, AR(2) Process, MA(1) Process, MA(2) Process, Periodical, Trend, White Noise, Random-Walk, Constant); 🚀 Cheat Sheet; Case Study (Bitcoin, Ethereum); Discussion on Random-Walk. import numpy as np # linear algebra from numpy.random import seed import math import…
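For a flavor of what the examples cover, below is a short sketch (assumed, not the notebook's code) that simulates an AR(1) process and plots its ACF and PACF with statsmodels; the PACF cutting off after lag 1 is what identifies the AR order.

```python
# Simulate an AR(1) process and plot its ACF and PACF.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(42)

# AR(1): x_t = 0.8 * x_{t-1} + noise
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(x, lags=30, ax=axes[0])    # ACF decays gradually for an AR process
plot_pacf(x, lags=30, ax=axes[1])   # PACF cuts off after lag 1 => AR(1)
plt.tight_layout()
plt.show()
```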
Read More
Predictive Analysis with different approaches

The goal of this notebook is not to build the best model for each time series; it is a comparison of a few models when you have one time series, presenting different approaches to forecasting it. In this notebook we will be using web traffic data from Kaggle. The plan of the notebook is: I. Importation & Data Cleaning; II. Aggregation & Visualization; III. Machine Learning Approach; IV. Basic Model Approach; V. ARIMA Approach (Autoregressive Integrated Moving Average); VI. (FB) Prophet Approach; VII. Keras Starter; VIII. Comparison & Conclusion. I. Importation & Data Cleaning: In this first part we will choose the Time…
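As a preview of the ARIMA approach, here is a minimal sketch using statsmodels on synthetic data; the (1, 1, 1) order and the stand-in series are illustrative assumptions, not the notebook's choices.

```python
# Illustrative ARIMA fit and 30-day forecast on a synthetic daily series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# A trending noisy series as a stand-in for daily web-traffic counts.
traffic = pd.Series(np.cumsum(rng.normal(5, 2, 200)),
                    index=pd.date_range("2023-01-01", periods=200, freq="D"))

model = ARIMA(traffic, order=(1, 1, 1))  # (p, d, q): AR order, differencing, MA order
fit = model.fit()
forecast = fit.forecast(steps=30)        # 30-day-ahead forecast
print(forecast.head())
```

In practice the (p, d, q) order would be chosen from the ACF/PACF plots or by a criterion such as AIC, as the notebook's comparison framework implies.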
Read More
Analysis on campus recruitment data

Campus recruitment is a strategy for sourcing, engaging and hiring young talent for internships and entry-level positions. College recruiting is typically a tactic for medium- to large-sized companies with high-volume recruiting needs, but it can range from small efforts (like working with university career centers to source potential candidates) to large-scale operations (like visiting a wide array of colleges and attending recruiting events throughout the spring and fall semesters). Campus recruitment often involves working with university career services centers and attending career fairs to meet in person with college students and recent graduates. Context of our dataset: our dataset revolves around the placement season…
Read More
Outliers and Various methods of Detection

WHAT IS AN OUTLIER? An outlier is an observation that is numerically distant from the rest of the data; in simple words, it is a value that is out of range. Let's take an example to see what happens to a data set with and without an outlier.

                      Without outlier     With outlier
  Data                1,2,3,3,4,5,4       1,2,3,3,4,5,400
  Mean                3.142               59.714
  Median              3                   3
  Standard Deviation  1.345               150.057

As you can see, the data set with the outlier has a significantly different mean and standard deviation. In the first scenario, we would say the average is 3.14, but with the outlier the average soars to 59.71. This would change the estimate completely. Let's take a real…
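One common detection method is the interquartile-range (IQR) rule; below is a minimal sketch applying it to the example data above, with the conventional 1.5 × IQR threshold.

```python
# Flag outliers with the IQR rule: anything beyond 1.5 * IQR from the quartiles.
import numpy as np

data = np.array([1, 2, 3, 3, 4, 5, 400])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print("bounds:", lower, upper)   # values outside this range are flagged
print("outliers:", outliers)     # -> [400]
```

Unlike a z-score test, the IQR rule uses the median-based quartiles, so the 400 itself cannot inflate the threshold used to detect it.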
Read More
A Data Science Framework: To Achieve 99% Accuracy: Part 3

Part 3 - Model Implementation. This post is a continuation of the previous post. If you have not read it yet, I recommend you visit it here. Step 5: Model Data. Data science is a multi-disciplinary field between mathematics (i.e. statistics, linear algebra, etc.), computer science (i.e. programming languages, computer systems, etc.) and business management (i.e. communication, subject-matter knowledge, etc.). Most data scientists come from one of the three fields, so they tend to lean towards that discipline. However, data science is like a three-legged stool, with no one leg being more important than the others. So, this step will require advanced knowledge in…
Read More
Introduction to Ensembling/Stacking in Python | Part 2

This post is a continuation of the previous post. If you have not read it yet, I recommend you visit it here. Feature importance generated from the different classifiers: having trained our first-level classifiers, we can utilize a very nifty feature of the Sklearn models, which is to output the importance of the various features in the training and test sets with one very simple line of code. As per the Sklearn documentation, most of these classifiers come built in with an attribute that returns feature importances simply by typing .feature_importances_. Therefore we will invoke this very useful…
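For illustration, here is a sketch of reading that attribute off a fitted tree-based model; the iris data and the RandomForestClassifier are stand-in assumptions, not the article's ensemble.

```python
# Read feature importances off a fitted tree-based classifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# feature_importances_ is exposed by tree-based ensembles after fitting.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```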
Read More