Machine learning

Hierarchical Clustering Algorithm

Hierarchical clustering is an unsupervised machine learning algorithm used to group similar data points together. It builds a hierarchy of nested clusters, in which each cluster is contained in a larger cluster at the level above it. In the agglomerative (bottom-up) form, the algorithm starts by treating each data point as its own cluster. It then repeatedly merges the two closest clusters until all points are in the same cluster or a stopping criterion is met. The result is a tree-like structure called a dendrogram, which shows the hierarchy of the clusters. There are two main types of hierarchical clustering: agglomerative and divisive.…
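The merge loop described in this excerpt can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the points are one-dimensional, the distance between clusters is single linkage (the smallest gap between any two members), and the function names and the `n_clusters` stopping criterion are made up for this example rather than taken from the post.

```python
from itertools import combinations


def single_linkage(a, b):
    """Distance between two clusters: the smallest gap between any pair of members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until only n_clusters remain.

    Returns the final clusters and the merge history, which is the
    information a dendrogram would draw.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    history = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        dist = single_linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], dist))
        merged = sorted(clusters[i] + clusters[j])
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


clusters, history = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], n_clusters=3)
```

Each entry in `history` records which two clusters were merged and at what distance, so reading it from first to last walks up the dendrogram from the leaves.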
K-Means Clustering Algorithm

K-Means is a widely used clustering algorithm that partitions a set of data points into K clusters, where each cluster is defined by its centroid. The goal of the algorithm is to minimize the sum of squared distances between each data point and its closest centroid. The algorithm starts by randomly selecting K initial centroids and assigning each data point to the closest centroid. It then alternates between moving each centroid to the mean of its assigned points and reassigning each data point to the closest centroid. The algorithm terminates when the assignments no longer change, at which point the centroids have reached a stable position.
Mathematical Intuition…
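As a rough sketch of the assign-and-update loop described in this excerpt, the following plain-Python version may help. It makes one deliberate change: the initial centroids are passed in explicitly instead of being chosen at random, so the example is repeatable. The function names are illustrative, not from the post.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


def inertia(points, centroids, assignments):
    """The quantity K-Means tries to minimize: the sum of squared distances."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centroids, assignments = k_means(points, [points[0], points[3]])
total = inertia(points, centroids, assignments)
```

With random initialization, different runs can settle on different clusterings, which is why practical implementations usually restart several times and keep the run with the lowest inertia.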
Random Forest Algorithm

Random Forest is a robust machine learning algorithm used for both classification and regression tasks. It is an ensemble learning method, which means that it combines multiple decision trees to create a more accurate and stable model. The mathematical intuition behind Random Forest is rooted in decision trees and bagging. A decision tree is a tree-like structure in which the internal nodes represent tests on the features of the data, the branches represent the outcomes of those tests, and the leaves represent the output or class label. Each internal node in a decision tree represents…
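To make the bagging idea concrete, here is a heavily simplified sketch. It is not a full Random Forest: each "tree" is a one-split stump on a single numeric feature, and there is no random feature selection because there is only one feature. What it does show is the ensemble part: every stump is trained on a bootstrap sample, and the forest predicts by majority vote. All names and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-split tree on (x, label) rows by minimizing misclassifications."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right of the threshold, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    # Bagging: each stump sees a bootstrap sample drawn with replacement.
    return [fit_stump(rng.choices(rows, k=len(rows))) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


rows = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (7.0, "b"), (8.0, "b"), (9.0, "b")]
forest = fit_forest(rows)
low = predict_forest(forest, 0.5)
high = predict_forest(forest, 9.5)
```

Because each stump sees a slightly different sample, individual thresholds vary, and the vote smooths out that variation. That is the sense in which the ensemble is more stable than any single tree.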
Decision Tree

Decision tree algorithms are supervised learning algorithms used to solve both regression and classification problems. The goal is to create a model that predicts the value of a target variable from several input variables. Decision trees use a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility, so the tree can be used to map out all possible outcomes of a decision. A decision tree algorithm works by breaking down a dataset into smaller and smaller subsets while, at the same time, an…
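One way to picture "breaking down a dataset into smaller and smaller subsets" is to look at how a single split might be chosen. The sketch below assumes Gini impurity as the splitting criterion, which is a common choice but is not named in this excerpt, and it handles only one numeric feature. The function names and data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure subset, larger for mixed subsets."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows):
    """Return (threshold, weighted_gini) for the best `x <= threshold` split."""
    best = None
    values = sorted({x for x, _ in rows})
    for threshold in values[:-1]:  # splitting at the maximum would leave one side empty
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        # Weight each side's impurity by the share of rows it receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


rows = [(2.0, "no"), (3.0, "no"), (4.0, "no"), (6.0, "yes"), (7.0, "yes")]
threshold, score = best_split(rows)
```

A full tree would apply `best_split` recursively to the left and right subsets until each subset is pure or some depth limit is reached.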
Support Vector Machine

Support Vector Machines (SVMs) are supervised machine learning algorithms that can be used for classification or regression tasks. For classification, the goal of an SVM is to find the hyperplane in an N-dimensional space that separates the classes with the largest margin.
Mathematical Intuition
Imagine we have two classes of data points, represented by circles and rectangles. The SVM algorithm will…
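The phrase "maximally separates" refers to the margin, that is, the distance from the hyperplane to the nearest data point. The sketch below does not train an SVM. It only computes the margin for two hand-picked candidate hyperplanes and keeps the wider one, which is the quantity an SVM optimizes over all possible hyperplanes. The data, the candidate lines, and the function names are assumptions for illustration.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest distance from any point to the hyperplane.

    Labels are +1 or -1. A negative result means at least one point lies
    on the wrong side, so the hyperplane does not separate the classes.
    """
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating hyperplanes (lines, because the data is 2-D).
candidates = {
    "x + y = 5.5": ((1.0, 1.0), -5.5),
    "x = 3": ((1.0, 0.0), -3.0),
}
margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
```

Both lines separate the two classes, but the diagonal one leaves more room on either side, so it is the better candidate by the SVM criterion.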
Five Courses that can be finished in one week to advance Pandas skills

1. Writing Efficient Code with pandas: This course builds on your knowledge of Python and the pandas library and introduces you to efficient built-in pandas functions that perform tasks faster.
2. Joining Data with pandas: In this course, you will learn to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas.
3. Streamlined Data Ingestion with pandas: This course teaches you how to build pipelines to import data kept in common storage formats. You'll use pandas to get data from a variety of sources, from spreadsheets of…
How to do Feature Encoding and Exploratory Data Analysis

Categorical variables take values selected from a group of categories or labels. For example, the variable gender with the values male or female is categorical, and so is the variable marital status with the values never married, married, divorced, or widowed. In some categorical variables, the labels have an intrinsic order. For example, in the variable student's grade, the values A, B, C, or Fail are ordered, with A the highest grade and Fail the lowest. These are called ordinal categorical variables. Variables in which the categories do not have an intrinsic order are…
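The difference between ordered and unordered categories matters when encoding them as numbers. A minimal sketch, using the grade and marital status examples from the text: ordered labels can be mapped to their rank, while unordered labels are usually expanded into one 0/1 column per category (one-hot encoding) so that no artificial order is implied. The function names are illustrative.

```python
GRADE_ORDER = ["Fail", "C", "B", "A"]  # lowest to highest


def ordinal_encode(values, order):
    """Map each label to its rank in `order` (0 = lowest)."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


grades = ordinal_encode(["A", "Fail", "B"], GRADE_ORDER)
# grades == [3, 0, 2]

categories, marital = one_hot_encode(["married", "divorced", "married", "widowed"])
# categories == ["divorced", "married", "widowed"]
# marital == [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```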
8 Essential Machine Learning Terms You must Know

Data Wrangling
Data wrangling is the process of gathering, selecting, cleaning, structuring, and enriching raw data into the desired format for better decision-making in less time. If you want to create an efficient ETL (extract, transform, and load) pipeline or create beautiful data visualizations, you should be prepared to do a lot of data wrangling (Springboard).
Data Imputation
Data imputation is the substitution of estimated values for missing or inconsistent data items (fields). The substituted values are intended to create a data record that does not fail edit checks. The most common technique is mean imputation, where you take the mean of the existing…
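Mean imputation, as mentioned above, can be sketched with the standard library alone. The example assumes missing entries are represented as `None`; the function name and the sample values are made up for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = impute_mean([20, None, 30, 40, None])
# ages == [20, 30, 30, 40, 30]
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is best treated as a simple baseline.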
Which one to use – RandomForest vs SVM vs KNN?

Which algorithm to use depends on a number of factors. A few to consider are listed below:
The number of examples in the training set.
The dimensionality of the feature space.
Do we have correlated features?
Is overfitting a problem?
These are just a few of the factors on which the choice of algorithm may depend. Once you have answers to these questions, you can move ahead and decide on the algorithm.
SVM
The main reason to use an SVM instead is that the problem might not be linearly separable. In that case, we will…
Clustering & Visualization of Clusters using PCA

Customer Segmentation Based on Credit Card Usage Behavior
The dataset for this notebook consists of the credit card usage behavior of customers, described by 18 behavioral features. Customer segmentation can be used to define marketing strategies.
Content of this kernel:
Data Preprocessing
Clustering using KMeans
Interpretation of Clusters
Visualization of Clusters using PCA
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g.…