Hierarchical Clustering Algorithm

Hierarchical Clustering is a type of unsupervised machine learning algorithm used to group similar data points together. The goal of this algorithm is to create a hierarchy of clusters, where smaller clusters are nested inside larger ones. The algorithm starts by treating each data point as its own cluster. It then repeatedly merges the two closest clusters until all points belong to a single cluster or a stopping criterion is met. The result is a tree-like structure called a dendrogram, which shows the hierarchy of the clusters. There are two main types of Hierarchical Clustering: Agglomerative and Divisive.…
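As a quick illustration of the agglomerative (bottom-up) variant, here is a minimal sketch using scikit-learn and SciPy; the synthetic two-blob data and the Ward linkage are illustrative choices, not taken from the article.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.cluster import AgglomerativeClustering

# Two loose blobs of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# Agglomerative clustering: start with every point as its own cluster,
# then repeatedly merge the two closest clusters (Ward linkage here)
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

# The linkage matrix records the full merge history; a dendrogram of it
# shows the hierarchy of clusters
Z = linkage(X, method="ward")
dendrogram(Z, no_plot=True)  # set no_plot=False inside a plotting session

print(labels)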
How to handle categorical data in machine learning

Understanding Categorical Data and its Importance in Machine Learning

Categorical data is a type of data that can be divided into distinct groups or categories. In machine learning, it is common to encounter categorical data in the form of labels, such as in a classification problem where the output is one of a set of predefined categories. Handling categorical data is an important step in preprocessing your data for machine learning, as machine learning algorithms often require numerical input. One of the most common ways to handle categorical data is through encoding. Encoding involves converting categorical data into a numerical…
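For a concrete picture of what encoding looks like in practice, here is a small sketch with pandas and scikit-learn; the 'color' and 'size' columns are made up for the example.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                   "size":  ["S", "M", "L", "M"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df, columns=["color", "size"])

# Label encoding: map each category to an integer
# (fine for tree-based models or genuinely ordinal data)
df["size_encoded"] = LabelEncoder().fit_transform(df["size"])

print(one_hot.head())
print(df.head())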
K-Means Clustering Algorithm

K-Means is a widely used clustering algorithm that partitions a set of data points into K clusters, where each cluster is defined by its centroid. The goal of the algorithm is to minimize the sum of squared distances between each data point and its closest centroid. The algorithm starts by randomly selecting K initial centroids and assigning each data point to the closest centroid. Then, it iteratively updates the position of the centroids and reassigns each data point to the closest centroid until the assignments no longer change. The algorithm terminates when the centroids reach a stable position. Mathematical Intuition…
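To make the loop concrete, here is a minimal K-Means sketch with scikit-learn; K=3 and the synthetic blobs are illustrative choices, not from the article.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_init controls how many random centroid initialisations are tried;
# the run with the lowest sum of squared distances (inertia) is kept
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(kmeans.cluster_centers_)  # final centroid positions
print(kmeans.inertia_)          # sum of squared distances to the closest centroids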
How to connect OpenAI api with python code

To connect the OpenAI API with Python code, you will need the OpenAI Python library, which can be installed using pip:

pip install openai

You will also need an API key for the OpenAI service you want to use; you can get one by creating an account on the OpenAI website. Once you have the OpenAI library and an API key, you can use the following code as an example of how to connect to the OpenAI API from Python:

import openai

# Set the API key
openai.api_key = "YOUR_API_KEY"

# Define the prompt
prompt =…
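The excerpt cuts off at the prompt definition; a hedged way to complete it, assuming the older (pre-1.0) openai library interface the snippet is written against, could look like the following (the model name and parameters are illustrative assumptions).

import openai

# Set the API key
openai.api_key = "YOUR_API_KEY"

# Define the prompt
prompt = "Write a one-sentence summary of what an API is."

# Send the prompt to a completion endpoint and print the generated text
response = openai.Completion.create(
    engine="text-davinci-003",  # assumed model; use any model available to you
    prompt=prompt,
    max_tokens=50,
)
print(response.choices[0].text.strip())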
Random Forest Algorithm

Random Forest is a robust machine-learning algorithm that is used for both classification and regression tasks. It is a type of ensemble learning method, which means that it combines multiple decision trees to create a more accurate and stable model. The mathematical intuition behind Random Forest is rooted in the concept of decision trees and bagging. A decision tree is a tree-like structure in which the internal nodes represent the feature(s) of the data, the branches represent the decision based on those features, and the leaves represent the output or class label. Each internal node in a decision tree represents…
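As a short illustration, here is a Random Forest classifier on the Iris dataset with scikit-learn; the dataset and hyperparameters are arbitrary choices for demonstration.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each trained on a bootstrap sample of the data (bagging);
# the forest predicts by majority vote across the trees
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(clf.score(X_test, y_test))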
Decision Tree

Decision tree algorithms are a type of supervised learning algorithm used to solve both regression and classification problems. The goal is to create a model that predicts the value of a target variable based on several input variables. Decision trees use a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. The model is based on a decision tree that can be used to map out all possible outcomes of a decision. A decision tree algorithm works by breaking down a dataset into smaller and smaller subsets while at the same time, an…
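A minimal sketch of fitting and inspecting a decision tree with scikit-learn follows; max_depth=3 is just an illustrative choice to keep the tree small.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned splits: each internal node tests one feature,
# each leaf predicts a class
print(export_text(tree, feature_names=iris.feature_names))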
Support Vector Machine

A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for classification or regression tasks. The goal of the SVM algorithm is to find the hyperplane in an N-dimensional space that maximally separates the two classes. Mathematical Intuition: Imagine we have two classes of data points, represented by circles and rectangles. The SVM algorithm will…
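For a concrete example of fitting such a maximum-margin hyperplane, here is a small scikit-learn sketch; the linear kernel, C value, and synthetic data are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

# Fit a maximum-margin separating hyperplane between the two classes
svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(svm.support_vectors_.shape)  # the support vectors define the margin
print(svm.score(X, y))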
Steps to Create a Tensorflow Model

There are 3 fundamental steps to creating a model (a short code sketch follows below):

1. Create a Model -> connect the layers of the neural network yourself using the Sequential or Functional API, or import a previously built model (Transfer Learning).
2. Compile a Model -> define how the model's performance should be measured (metrics) and how to improve it by choosing an optimizer (Adam, SGD, etc.).
3. Fit a Model -> the model tries to find patterns in the data.

Sequential and Functional API

Sequential Model: A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. A Sequential model is not…
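Here is a short sketch of the three steps using the Keras Sequential API; the layer sizes and the synthetic data are illustrative, not from the article.

import numpy as np
import tensorflow as tf

# 1. Create a model: a plain stack of layers via the Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 2. Compile: choose the loss, the optimizer, and the metrics to track
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 3. Fit: let the model look for patterns in the data
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=(100,))
model.fit(X, y, epochs=5, verbose=0)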
How to deal with outliers

In this Notebook, we will describe how to deal with outliers.

# Importing the dataset
import pandas as p
import numpy as n
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston
import warnings
warnings.filterwarnings('ignore')

boston = load_boston()  # it is stored as a dictionary
df = p.DataFrame(boston['data'], columns=boston['feature_names'])
df.head()

sns.distplot(df['RM'])
# As we can see, there are outliers
sns.boxplot(df['RM'])

Trimming outliers from the dataset

def outliers(data):
    IQR = data.quantile(0.75) - data.quantile(0.25)
    lr = data.quantile(0.25) - (1.5 * IQR)  # lower range
    hr = data.quantile(0.75) + (1.5 * IQR)  # higher range
    return data.loc[(data >= lr) & (data <= hr)]

outliers(df['RM'])
# As we can see, there are no outliers left
sns.boxplot(outliers(df['RM']))

# We can also find outliers using the mean and standard deviation instead of the IQR
def outliers(data, k):  # where k is the number of standard deviations
    lr = data.mean() - (data.std() * k)
    hr = data.mean() + (data.std() * k)…
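Since the Boston housing dataset used above has been removed from recent scikit-learn releases, here is a self-contained sketch of the same IQR-based trimming idea on a synthetic series standing in for the 'RM' column.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(np.concatenate([rng.normal(6, 0.5, 500), [2.0, 9.5, 10.0]]))

def trim_outliers_iqr(data):
    iqr = data.quantile(0.75) - data.quantile(0.25)
    lower = data.quantile(0.25) - 1.5 * iqr
    upper = data.quantile(0.75) + 1.5 * iqr
    return data[(data >= lower) & (data <= upper)]

print(len(s), len(trim_outliers_iqr(s)))  # the extreme values are dropped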
What is data leakage in Machine Learning

When training a machine learning model, we normally prefer a generalized model that performs well on both training and validation/test data. However, there can be a situation where the model performs well during testing but fails to reach the same level of performance on real-world (production) data. For example, your model gives 95% accuracy on test data, but as soon as it is productized and acts on real data, it fails to achieve the same or comparable performance. A common cause of such a discrepancy between test performance and real-world performance is data leakage: information from outside the training data (often from the test set itself) sneaks into the training process. What is Train/Test bleed?…
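One common source of leakage is fitting a preprocessing step on the full dataset before the train/test split, so information from the test rows "bleeds" into training. The sketch below contrasts that with a pipeline that fits the scaler on the training data only; the dataset and model are arbitrary illustrations.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Leaky: the scaler sees the test rows before the split
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)
leaky_score = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).score(X_te, y_te)

# Safer: split first, then let a Pipeline fit the scaler on the training data only
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
safe_score = model.fit(X_tr, y_tr).score(X_te, y_te)

print(leaky_score, safe_score)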