Geeky Codes-Data Science

Data Science Deep Learning Machine Learning NLP

What is Stochastic Gradient Descent?

July 2, 2025 Geeky Codes

Stochastic Gradient Descent (SGD) is an optimization algorithm commonly used in machine learning for training models, particularly in large-scale and online learning settings. It is an iterative optimization algorithm that…
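As a rough illustration of the iterative idea in this excerpt, here is a minimal sketch of SGD fitting a one-variable linear model. The function name sgd_linear_fit, the toy data, the learning rate, and the epoch count are all assumptions chosen for the example; they do not come from the linked article.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w * x + b by updating on one sample at a time (hypothetical helper)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch.
        rng.shuffle(indices)
        for i in indices:
            # Gradient of 0.5 * (prediction - target)^2 for this single sample.
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


# Toy, noise-free data generated from y = 2x + 1 (assumed for illustration).
xs = [i / 49 for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update uses a single sample rather than the whole dataset, which is what makes the method attractive for large-scale and online settings.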

Data Science Machine Learning Pandas Python

Quintile Analysis: Bringing it all together and making decisions

July 2, 2025

Introduction: Quintile analysis is a statistical method used to divide a data set into five equal parts, each representing 20% of the total observations. This method is often used in…
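One possible way to split observations into five equal-sized groups is pandas' qcut. The sketch below is only an illustration: the Series named values and the labels 1 to 5 are assumptions for the example, not data or code from the linked article.

```python
import pandas as pd

# Ten made-up observations (assumed for illustration).
values = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# qcut bins by quantile, so each of the five bins holds about 20% of the rows.
quintile = pd.qcut(values, q=5, labels=[1, 2, 3, 4, 5])

print(pd.DataFrame({"value": values, "quintile": quintile}))
```

With ten distinct values, each quintile receives two observations; with heavily tied data qcut may need its duplicates argument to be set.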

Data Science Machine Learning Pandas Python

Getting Started With Pandas | Part 1

July 2, 2025

Installation or Setup: Detailed instructions on getting pandas set up or installed can be found here in the official documentation. Installing pandas with Anaconda: Installing pandas and the rest of…

Data Science Machine Learning

Selecting the Number of Clusters

July 2, 2025 Geeky Codes

With K-Means, you could use the inertia or the silhouette score to select the appropriate number of clusters, but with Gaussian mixtures, it is not possible to use these metrics…

Data Science Machine Learning NLP

Tokenization in NLP

July 2, 2025 Geeky Codes

Word Level Tokenization Splitting text into individual words “the quick brown fox” -> BUT Character Level Tokenization Splitting text into individual characters “the quick brown fox” -> But N-GRAM MODELS…
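The three ideas named in this excerpt can be sketched with the Python standard library alone. This is a minimal illustration that assumes simple whitespace splitting; real tokenizers also handle punctuation, casing, and subwords, and the variable names are chosen for the example.

```python
text = "the quick brown fox"

# Word-level tokenization: split on whitespace.
words = text.split()

# Character-level tokenization: every character, spaces included.
chars = list(text)

# Word bigrams (n-grams with n = 2): pair each word with its successor.
bigrams = list(zip(words, words[1:]))

print(words)
print(chars[:5])
print(bigrams)
```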

Data Science Machine Learning Pandas Python

Appending to DataFrame | Pandas

July 2, 2025

Appending a new row to DataFrame import pandas as pd df = pd.DataFrame(columns = ) Appending a row by a single column value: df.loc = 1 df Appending a row,…
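The code in this excerpt lost its brackets during extraction, so the sketch below does not try to reconstruct it. It shows two common ways to add rows, using hypothetical column names "a" and "b" that are assumptions for the example only.

```python
import pandas as pd

# Empty frame with two hypothetical columns.
df = pd.DataFrame(columns=["a", "b"])

# Append a single row by assigning to the next integer label.
df.loc[len(df)] = [1, 2]

# Append one or more rows at once by concatenating another frame.
new_rows = pd.DataFrame([{"a": 3, "b": 4}])
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

DataFrame.append was removed in pandas 2.0, so pd.concat is the usual choice for adding several rows at once.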

Data Science Machine Learning Python

What is Exploratory Data Analysis (EDA)?

July 2, 2025 Geeky Codes

Exploratory Data Analysis (EDA) is an essential step in any data science project. It involves investigating and analyzing datasets to understand their characteristics, identify patterns, detect outliers, and uncover relationships…

Data Science Machine Learning Unsupervised Learning

Anomaly Detection using Gaussian Mixtures

July 2, 2025

Introduction: Anomaly detection (also called outlier detection) is the task of detecting instances that deviate strongly from the norm. These instances are of course called anomalies or outliers, while the…

Data Science Machine Learning

Bayesian Gaussian Mixture Models

July 2, 2025

Rather than manually searching for the optimal number of clusters, it is possible to use the BayesianGaussianMixture class instead, which is capable of giving weights equal (or close) to zero…

Data Science Machine Learning Unsupervised Learning

Understanding DBSCAN Clustering Algorithm: Implementation in Python

July 2, 2025

Before we move on to Gaussian mixture models, let’s take a look at DBSCAN, another popular clustering algorithm that illustrates a very different approach based on local density estimation. This…

Data Science Machine Learning Unsupervised Learning

Using Clustering for Semi-Supervised Learning

July 2, 2025

Another use case for clustering is in semi-supervised learning, when we have plenty of unlabeled instances and very few labeled instances. Let’s train a logistic regression model on a sample…

Data Science Machine Learning Python Unsupervised Learning

Using clustering for image segmentation

July 2, 2025 Geeky Codes

Image segmentation is the task of partitioning an image into multiple segments. In semantic segmentation, all pixels that are part of the same object type get assigned to the same…

Data Science Machine Learning Support Vector Machine

Accelerated K-Means and Mini-batch K-Means

July 2, 2025 Geeky Codes

Introduction: Another important improvement to the K-Means algorithm was proposed in a 2003 paper by Charles Elkan. It considerably accelerates the algorithm by avoiding many unnecessary distance calculations: this is…

Data Science Machine Learning Unsupervised Learning

Implementation of K-Means Clustering in Machine Learning

July 2, 2025

Consider the unlabeled dataset represented in the figure below: you can clearly see 5 blobs of instances. The K-Means algorithm is a simple algorithm capable of clustering this kind of dataset…

Blog Data Science Machine Learning Unsupervised Learning

What is Clustering? A Simple Approach

July 2, 2025 Geeky Codes

As you enjoy a hike in the mountains, you stumble upon a plant you have never seen before. You look around and you notice a few more. They are not…

Data Science Decision Tree Machine Learning Random Forest

What is Stacking of Models in Machine Learning?

July 2, 2025

The last Ensemble method we will discuss in this series is called stacking (short for stacked generalization). It is based on a simple idea: instead of using trivial functions (such…

Data Science Machine Learning

Locally linear Embedding For Dimensionality Reduction in Machine Learning

July 2, 2025

Locally Linear Embedding (LLE) is another very powerful nonlinear dimensionality reduction (NLDR) technique. It is a Manifold Learning technique that does not rely on projections like the previous algorithms. In…

Data Science Machine Learning NLP

RAG using Llama 2, Langchain and ChromaDB

August 2, 2024 Geeky Codes

Introduction: Objective: Use Llama 2.0, Langchain and ChromaDB to create a Retrieval Augmented Generation (RAG) system. This will allow us to ask questions about our documents (that were not included…

Data Science Machine Learning

XGBoost: A Comprehensive Tutorial

April 18, 2024 Geeky Codes

Introduction: In the realm of machine learning algorithms, XGBoost stands tall as a powerhouse, renowned for its efficiency, effectiveness, and versatility. This tutorial aims to provide a thorough understanding of…

Data Science Interview Interview Questions Machine Learning

Ensemble Learning: A Comprehensive Guide to AdaBoost and Gradient Boosting

April 14, 2024

Introduction: In the realm of machine learning, ensemble learning techniques such as AdaBoost and Gradient Boosting have revolutionized the way we approach classification and regression tasks. These powerful algorithms harness…

Category: Data Science
