Data Science

8 Essential Machine Learning Terms You Must Know

Data Wrangling
Data wrangling is the process of gathering, selecting, cleaning, structuring, and enriching raw data into the desired format so that better decisions can be made in less time. If you want to create an efficient ETL (extract, transform, load) pipeline or create beautiful data visualizations, you should be prepared to do a lot of data wrangling (Springboard).
Data Imputation
Data imputation is the substitution of estimated values for missing or inconsistent data items (fields). The substituted values are intended to create a data record that does not fail edits. The most common technique is mean imputation, where you take the mean of the existing…
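As a rough illustration of the mean imputation mentioned above, here is a minimal sketch in plain Python. The function name `impute_mean` and the sample `ages` list are made up for this example, and missing values are assumed to be represented as `None`.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample data: two ages are missing.
ages = [20, None, 30, 40, None]
imputed_ages = impute_mean(ages)
print(imputed_ages)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is usually a baseline rather than a final choice.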
What are Bias and Variance in Machine Learning

As machine learning is increasingly used in applications, machine learning algorithms have come under greater scrutiny. With larger data sets, various implementations, algorithms, and learning requirements, it has become even more complex to create and evaluate ML models, since all of those factors directly impact the overall accuracy and learning outcome of the model. This is further skewed by false assumptions, noise, and outliers. Machine learning models cannot be a black box. Users need to be fully aware of their data and algorithms to trust the outputs and outcomes. Any issue in the algorithm or a polluted data set can negatively impact the ML model. The main…
Which one to use – RandomForest vs SVM vs KNN?

The choice of algorithm depends on a number of factors. A few to consider are listed below:
- The number of examples in the training set.
- The dimensionality of the feature space.
- Do we have correlated features?
- Is overfitting a problem?
These are just a few of the factors on which the selection of the algorithm may depend. Once you have the answers to all these questions, you can move ahead and decide on the algorithm.
SVM
The main reason to use an SVM instead is that the problem might not be linearly separable. In that case, we will…
Clustering & Visualization of Clusters using PCA

Customer Segmentation Based on Credit Card Usage Behavior
The dataset for this notebook consists of the credit card usage behavior of customers, described by 18 behavioral features. Segmentation of customers can be used to define marketing strategies.
Content of this Kernel:
- Data Preprocessing
- Clustering using KMeans
- Interpretation of Clusters
- Visualization of Clusters using PCA
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g.…
What is clustering in machine learning?

Clustering is one of the most popular techniques in unsupervised learning, in which data is grouped based on the similarity of the data points. The basic principle behind clustering is the assignment of a given set of observations to subgroups, or clusters, such that observations in the same cluster have a degree of similarity. It is a method of unsupervised learning because there are no labels attached to the data points. The machine has to learn the features and patterns all by itself, without any given input-output mapping. There are several types of clustering in machine learning. Some common clustering algorithms are: Centroid-based clustering: The first and foremost…
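To make the idea of centroid-based clustering concrete, here is a small sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the method of the article: the points, the two starting centroids, and the function name `kmeans` are all invented for the example, and the starting centroids are fixed by hand so the run is deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids.

    Assumes iterations >= 1. Returns the final centroids and the clusters
    from the last assignment step.
    """
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, cluster in zip(centroids, clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place

        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visually obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

No labels are supplied anywhere: the grouping comes only from the distances between points, which is what makes this unsupervised.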
How to Create a Movie Recommendation System

In this notebook, I will try out a few recommendation algorithms (content-based, popularity-based, and collaborative filtering) and try to build an ensemble of these models to come up with our final movie recommendation system. We have two MovieLens datasets.
- Full Dataset: Contains 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags.
- Small Dataset: Includes 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.
I will create a Simple Recommender using movies from the Full Dataset while…
How to Predict Whether a Movie Will Be a Flop or a Hit, and Its Revenue

The invention of the motion picture camera in the late 19th century gave birth to the most powerful form of entertainment available: cinema. Movies have entertained audiences from a one-second clip of a racing horse in the 1890s, through the introduction of sound in the 1920s and the birth of color in the 1930s, to 3D movies in the early 2010s. Cinema had humble beginnings in terms of design, direction, and acting (especially because films were very short in its early days), but since then, the film industry around the world has been…
How to Create a Simple Movie Recommendation System

Introduction
This is part of the Content Editor Internship.
"Every time I go to a movie, it's magical, no matter what the movie is about." - Steven Spielberg
Everyone loves movies regardless of age, gender, race, color, or location. In this way, we are all connected to each other. But most interesting is the fact that our tastes and preferences differ. Some people like movies of a specific genre, be it entertaining, romantic, or sci-fi, while others focus on the main characters and directors. Considering all of that,…
Starting Data Pipelines | Fundamentals of Data Engineering

This article gives a comprehensive introduction to data pipelines, with step-by-step definitions and code, to introduce the basics of data engineering. Data pipelines are widely used in data science and machine learning, where they are essential for integrating data from multiple streams to gain business intelligence for competitive and profitable analysis.
What is a Data Pipeline?
A data pipeline is a set of steps that moves and transforms data from multiple sources to a destination where new value can be derived from it. In its simplest form, a pipeline may only extract data from various sources such as…
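The definition above can be sketched as a tiny extract-transform-load pipeline using only the Python standard library. Everything here is an assumption made for illustration: the two in-memory sources (`CSV_SOURCE`, `API_SOURCE`), the `payments` table, and the function names are hypothetical and do not come from the article.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different shapes and units.
CSV_SOURCE = "user_id,amount\n1,10.50\n2,4.25\n"
API_SOURCE = [{"user": 1, "amount_cents": 300}]


def extract():
    """Read raw records from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, API_SOURCE


def transform(csv_rows, api_rows):
    """Convert both sources to one schema: (user_id, amount in currency units)."""
    unified = [(int(r["user_id"]), float(r["amount"])) for r in csv_rows]
    unified += [(r["user"], r["amount_cents"] / 100) for r in api_rows]
    return unified


def load(rows, conn):
    """Write the unified rows to the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database as a stand-in destination
load(transform(*extract()), conn)

# New value derived at the destination: total spend per user across both sources.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions means a source or destination can be swapped without touching the other stages.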
What is Dimensionality Reduction? Overview, Objectives, and Popular Techniques

Table of Contents
- What is Dimensionality Reduction
- Why Dimensionality Reduction is Important
- Dimensionality Reduction Methods and Approaches
- Dimensionality Reduction Techniques
- Dimensionality Reduction Example
Machine learning is not an easy task. Okay, so that's an understatement. Artificial intelligence and machine learning represent a major step in making computers think like humans, but both concepts are challenging to understand. Fortunately, the payoff is worth the effort. Today we are tackling dimensionality reduction, a key component of machine learning. We will cover its meaning, why it is important, how to do it, and give you a related example to illustrate…