Pandas

How to deal with outliers

In this notebook, we will describe how to deal with outliers.

#Importing the dataset
import pandas as p
import numpy as n
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston
import warnings
warnings.filterwarnings('ignore')
boston=load_boston() #it is stored as a dictionary
df=p.DataFrame(boston['data'],columns=boston['feature_names'])
df.head()
sns.histplot(df['RM'],kde=True)
#As we can see, there are outliers
sns.boxplot(df['RM'])

Trimming outliers from the dataset

def outliers(data):
    IQR=data.quantile(0.75)-data.quantile(0.25)
    lr=data.quantile(0.25)-(1.5*IQR) #lower range
    hr=data.quantile(0.75)+(1.5*IQR) #higher range
    return data.loc[~(n.where(data<lr,True,n.where(data>hr,True,False)))]
outliers(df['RM'])
#As we can see, there are no outliers
sns.boxplot(outliers(df['RM']))

#Instead of the IQR, we can also find outliers using the mean and standard deviation
def outliers(data,k):
    lr=data.mean()-(data.std()*k) #where k is the number of standard deviations
    hr=data.mean()+(data.std()*k)…
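`load_boston` was removed from scikit-learn in version 1.2, so the snippet above will not run on current releases. Here is a minimal, self-contained sketch of the same two trimming rules (the 1.5 × IQR fence and the mean ± k standard deviations band) applied to a small made-up Series that stands in for `df['RM']`. The function names `trim_iqr` and `trim_std` are hypothetical, not taken from the original notebook.

```python
import pandas as pd


def trim_iqr(data: pd.Series, factor: float = 1.5) -> pd.Series:
    """Keep values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return data[(data >= lower) & (data <= upper)]


def trim_std(data: pd.Series, k: float = 3.0) -> pd.Series:
    """Keep values within k sample standard deviations of the mean."""
    mean, std = data.mean(), data.std()
    return data[(data >= mean - k * std) & (data <= mean + k * std)]


# Made-up sample standing in for df['RM']
rm = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

print(trim_iqr(rm).tolist())       # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(trim_std(rm, k=2).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

With `k=3`, the value 100 survives the standard-deviation rule: the outlier inflates the very standard deviation it is measured against, which is one reason the IQR fence is often preferred for small or skewed samples.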
How to do Feature Encoding and Exploratory Data Analysis

Categorical variables are variables whose values are selected from a group of categories or labels. For example, the variable Gender with the values male or female is categorical, and so is the variable marital status with the values never married, married, divorced, or widowed. In some categorical variables, the labels have an intrinsic order; for example, in the variable Student's grade, the values A, B, C, and Fail are ordered, with A being the highest grade and Fail the lowest. These are called ordinal categorical variables. Variables in which the categories do not have an intrinsic order are…
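The ordinal/nominal distinction above usually decides how a variable is encoded. Here is a minimal pandas sketch using made-up rows built from the two example variables in the text: the ordinal grade is mapped to integers that keep its order, while the nominal marital status is one-hot encoded with `pd.get_dummies`. The specific integer mapping is an assumption for illustration; any order-preserving mapping would work.

```python
import pandas as pd

# Made-up rows using the example variables from the text
df = pd.DataFrame({
    "grade": ["B", "A", "Fail", "C", "A"],
    "marital_status": ["married", "never married", "divorced", "married", "widowed"],
})

# Ordinal variable: map labels to integers that preserve the order Fail < C < B < A
grade_order = {"Fail": 0, "C": 1, "B": 2, "A": 3}
df["grade_encoded"] = df["grade"].map(grade_order)

# Nominal variable: one-hot encode, because the labels have no intrinsic order
encoded = pd.get_dummies(df, columns=["marital_status"], prefix="marital")
print(encoded)
```

One-hot encoding avoids implying that, say, "widowed" is greater than "married", which an arbitrary integer code would suggest to many models.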
Clustering & Visualization of Clusters using PCA

Customer Segmentation based on Credit Card usage behavior

The dataset for this notebook consists of the credit card usage behavior of customers, with 18 behavioral features. Customer segmentation can be used to define marketing strategies.

Content of this Kernel:
- Data Preprocessing
- Clustering using KMeans
- Interpretation of Clusters
- Visualization of Clusters using PCA

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load in
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g.…
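The pipeline outlined in the contents (preprocess, cluster with KMeans, visualize with PCA) can be sketched with scikit-learn as follows. This is not the notebook's code: the credit card data is replaced by synthetic blobs from `make_blobs` with 18 features, and the choice of 4 clusters is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card data: 500 customers x 18 behavioral features
X, _ = make_blobs(n_samples=500, n_features=18, centers=4, random_state=42)

# Scale features so no single feature dominates the Euclidean distances KMeans uses
X_scaled = StandardScaler().fit_transform(X)

# Cluster; k=4 is assumed here (on real data it might be chosen with the elbow method)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Project onto 2 principal components only to be able to plot the clusters
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape, np.bincount(labels))
print(pca.explained_variance_ratio_)
```

To draw the plot, pass `X_2d[:, 0]`, `X_2d[:, 1]` and `c=labels` to `matplotlib.pyplot.scatter`. PCA is used only for visualization here; the clustering itself runs on all 18 scaled features.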
How are axes indexed in a NumPy array?

By definition, the axis number of a dimension is the index of that dimension within the array's shape. It is also the position used to access that dimension during indexing. In this post, we'll learn how axes are indexed. For example, if a 2D array a has shape (5,6), then you can access a[0,0] up to a[4,5]. Axis 0 is thus the first dimension (the "rows"), and axis 1 is the second dimension (the "columns"). In higher dimensions, where "row" and "column" stop really making sense, try to think of the axes in terms of the shapes and indices involved. If you do .sum(axis=n), for example,…
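A short NumPy check of the rule above, using the (5, 6) shape from the text: the axis number is a position in `shape`, and reducing over an axis removes that position from the result's shape. The 3D array is an extra illustrative case, not from the original post.

```python
import numpy as np

a = np.arange(30).reshape(5, 6)  # shape (5, 6): axis 0 has length 5, axis 1 has length 6

print(a[0, 0], a[4, 5])          # first and last elements: 0 29
print(a.sum(axis=0).shape)       # sums down the rows, one value per column -> (6,)
print(a.sum(axis=1).shape)       # sums across the columns, one value per row -> (5,)

# In higher dimensions, think in terms of shape positions instead of rows/columns
b = np.zeros((2, 3, 4))
print(b.sum(axis=1).shape)       # removes the middle dimension -> (2, 4)
```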
Stationarity Analysis in Time Series Data

Hey Geeks!!! In this blog, we'll dive into the concept of stationarity using time series data. We'll first understand what time-series data is, what stationarity is, and why and when data should be stationary. We'll use a dataset I created specifically for this blog to analyze whether the data is stationary or not. We'll also see how to convert non-stationary data to stationary data.

Index
- Introduction
- Import Libraries and Dependencies
- Define TimeSeriesData Class
- Import Dataset
- Accumulating Number of Sales by month
- Create object
- Stationarity Tests
  - Graphical Test
  - Rolling-Statistics Test
  - Augmented Dickey-Fuller Test (ADF)
  - Kwiatkowski-Phillips-Schmidt-Shin Test (KPSS)
  - Zivot-Andrews Test
  - Conclusion
- Convert data to Stationary
  - Derivatives
  - Transformation using Logarithmic Function
  - ADF Test
  - KPSS Test
  - Zivot-Andrews Test
  - Rolling-Statistics Test
  - Conclusion

1. Introduction
1.1…
How to create a Movie Recommendation System

In this notebook, I will try a few recommendation algorithms (content-based, popularity-based, and collaborative filtering) and combine these models into an ensemble to build our final movie recommendation system. We have two MovieLens datasets:
- Full Dataset: contains 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users. It includes tag genome data with 12 million relevance scores across 1,100 tags.
- Small Dataset: includes 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.

I will create a Simple Recommendation using movies from the Full Dataset while…
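For the popularity-based Simple Recommendation, one common choice (assumed here; the excerpt does not say which formula the notebook uses) is the IMDB-style weighted rating WR = v/(v+m)·R + m/(v+m)·C, where v is a movie's vote count, R its average rating, C the mean rating over all movies, and m a minimum-votes threshold. The toy data and `m=100` below are made up for illustration.

```python
import pandas as pd


def weighted_rating(movies: pd.DataFrame, m: float) -> pd.Series:
    """Weighted rating WR = v/(v+m)*R + m/(v+m)*C for every movie."""
    v = movies["vote_count"]
    R = movies["vote_average"]
    C = R.mean()  # mean rating across all movies
    return v / (v + m) * R + m / (v + m) * C


# Hypothetical toy data; with MovieLens you would first aggregate ratings per movie
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "vote_count": [10, 500, 2000, 50],
    "vote_average": [9.5, 7.0, 8.0, 6.0],
})

movies["score"] = weighted_rating(movies, m=100)
top = movies.sort_values("score", ascending=False)
print(top[["title", "vote_count", "vote_average", "score"]])
```

Movie A has the highest raw average but only 10 votes, so its score is pulled toward C and it ranks below Movie C. In practice m is often set to a high percentile of the vote counts, and movies below that threshold are filtered out.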
How to Predict Whether a Movie Will Be a Flop or a Hit, and Its Revenue

The birth of the motion picture camera in the late 19th century gave rise to the most powerful form of entertainment available: cinema. Movies have entertained audiences from the emergence of a single second of horse racing in the 1890s, to the introduction of sound in the 1920s, to the birth of color in the 1930s, to the rise of 3D movies in the early 2010s. Cinema had humble beginnings in terms of design, direction, and acting (especially since films were very short in the early days), but since then, the film industry around the world has been…
How to create a simple movie recommendation system

Introduction

This is part of the Content Editor Internship.

“Every time I go to a movie, it's magical, no matter what the movie is about.” - Steven Spielberg

Everyone loves movies regardless of age, gender, race, color, or location. In this amazing way, we are all connected to each other. But what is most interesting is how different our individual choices and preferences are. Some people like movies of a specific genre, be it entertaining, romantic, or sci-fi, while others focus on the main characters and directors. Considering all of that,…
How to collect COVID-19 Data using an API in Python

In this tutorial, we will collect COVID-19 data using an API in Python.

What is an API?
An API (Application Programming Interface) is a computing interface that defines the interactions between multiple pieces of software.

What is JSON?
JSON (JavaScript Object Notation) is a lightweight format for storing and transporting data. It is commonly used to send data from a server to a web page.

Required modules:
- matplotlib
- requests
- pandas
- json

Commands to install modules (json is part of Python's standard library, so it needs no installation):
pip install matplotlib
pip install requests
pip install pandas

Ignore this section if you've already installed these modules.

Steps:
- Importing all required modules.
- Calling the API and getting the JSON data.
- Getting the data for a particular state.
- Visualizing the data.

The below URL redirects…
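Here is a minimal sketch of these steps, assuming a hypothetical JSON payload with a `records` list of `{state, date, confirmed}` objects. The real API's URL and response format are not shown in this excerpt, so both the payload shape and the made-up numbers below are illustrative only. `requests` is imported inside the hypothetical `fetch_json` helper so that the parsing step can be tried offline.

```python
import json

import pandas as pd


def fetch_json(url: str) -> dict:
    """Call the API and decode its JSON response (needs network access)."""
    import requests  # imported lazily so the offline example below runs without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def get_state_data(payload: dict, state: str) -> pd.DataFrame:
    """Return one state's records sorted by date.

    Assumes a hypothetical payload shape:
    {"records": [{"state": ..., "date": "YYYY-MM-DD", "confirmed": ...}, ...]}
    """
    df = pd.DataFrame(payload["records"])
    df = df[df["state"] == state].copy()
    df["date"] = pd.to_datetime(df["date"])
    return df.sort_values("date").reset_index(drop=True)


# Offline stand-in for fetch_json(...): made-up numbers in the assumed shape
sample = json.loads("""
{"records": [
    {"state": "State A", "date": "2020-04-02", "confirmed": 15},
    {"state": "State B", "date": "2020-04-01", "confirmed": 7},
    {"state": "State A", "date": "2020-04-01", "confirmed": 12}
]}
""")

state_df = get_state_data(sample, "State A")
print(state_df)
```

For the visualization step, `state_df.plot(x="date", y="confirmed")` draws the curve with matplotlib.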
What is Dimensionality Reduction? Overview, Objectives, and Popular Techniques

Table of Contents
- What is Dimensionality Reduction
- Why Dimensionality Reduction is Important
- Dimensionality Reduction Methods and Approaches
- Dimensionality Reduction Techniques
- Dimensionality Reduction Example

Machine learning is not an easy task. Okay, that's an understatement. Artificial intelligence and machine learning represent a major step in making computers think like humans, but both concepts are challenging to understand. Fortunately, the payoff is worth the effort. Today we are dealing with dimensionality reduction, a key process in machine learning. We will cover what it means, why it is important, and how to do it, and give you a related example to illustrate…
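To make the idea concrete, here is a sketch of principal component analysis (PCA), one popular dimensionality reduction technique, written with NumPy's SVD. The synthetic data (10 features generated from 2 hidden factors plus small noise) and the helper name `pca` are assumptions, chosen so that two components should capture almost all of the variance.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project X onto its top principal components via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction is proportional to the squared singular value
    explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]
    return X_centered @ components.T, explained_variance_ratio


rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 10 features that lie close to a 2-D subspace
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

X_reduced, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
print(ratio.round(3), ratio.sum().round(3))
```

`ratio` is the fraction of the total variance explained by each kept component; scikit-learn's `PCA` exposes the same quantity as `explained_variance_ratio_`.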