Locally Linear Embedding (LLE) is another very powerful nonlinear dimensionality reduction (NLDR) technique. It is a Manifold Learning technique that, unlike the previous algorithms, does not rely on projections. In a nutshell, LLE works by first measuring how each training instance linearly relates to its closest neighbors, and then looking for a low-dimensional representation of the training set where these local relationships are best preserved (more details shortly). This makes it particularly good at unrolling twisted manifolds, especially when there is not too much noise.

For example, the following code uses Scikit-Learn’s LocallyLinearEmbedding class to unroll the Swiss roll. The resulting 2D dataset is shown in the figure below. As you can see, the Swiss roll is completely unrolled and the distances between instances are locally well preserved. However, distances are not preserved on a larger scale: the left part of the unrolled Swiss roll is stretched, while the right part is squeezed. Nevertheless, LLE did a pretty good job at modeling the manifold.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# X is the Swiss roll dataset (t encodes the position along the roll)
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)
```
Figure: Unrolled Swiss roll using LLE — a 2D scatter plot (axes z1 and z2) with points colored along a gradient following the roll.
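One rough way to quantify the claim that local relationships are preserved is to compare each instance's nearest neighbors before and after the embedding. The snippet below is an illustrative check (the overlap metric and the `make_swiss_roll` parameters are my choices, not from the text):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import NearestNeighbors

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)
X_reduced = LocallyLinearEmbedding(
    n_components=2, n_neighbors=10, random_state=42).fit_transform(X)

def knn_indices(data, k=10):
    # Indices of each point's k nearest neighbors (excluding the point itself)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    return nn.kneighbors(data, return_distance=False)[:, 1:]

before, after = knn_indices(X), knn_indices(X_reduced)
# Fraction of original neighbors that are still neighbors after embedding
overlap = np.mean([len(set(b) & set(a)) / 10
                   for b, a in zip(before, after)])
print(f"average neighborhood overlap: {overlap:.0%}")
```

A high overlap means local neighborhoods survived the embedding, even though large-scale distances did not.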

Here’s how LLE works: first, for each training instance x^(i), the algorithm identifies its k closest neighbors (in the preceding code k = 10), then tries to reconstruct x^(i) as a linear function of these neighbors. More specifically, it finds the weights w_{i,j} such that the squared distance between x^(i) and Σ_{j=1}^{m} w_{i,j} x^(j) is as small as possible, assuming w_{i,j} = 0 if x^(j) is not one of the k closest neighbors of x^(i). Thus the first step of LLE is the constrained optimization problem described in Equation 8-4, where W is the weight matrix containing all the weights w_{i,j}. The second constraint simply normalizes the weights for each training instance x^(i).

LLE step 1: linearly modeling local relationships

Equation 8-4:

  Ŵ = argmin_W Σ_{i=1}^{m} ‖x^(i) − Σ_{j=1}^{m} w_{i,j} x^(j)‖²

  subject to:
    w_{i,j} = 0              if x^(j) is not one of the k closest neighbors of x^(i)
    Σ_{j=1}^{m} w_{i,j} = 1  for i = 1, 2, …, m
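For a single instance, this constrained least-squares problem has a well-known closed form: shift the neighbors so the instance sits at the origin, solve a linear system against the local Gram matrix, and normalize the weights to sum to 1. The sketch below illustrates that trick; it is not Scikit-Learn's exact implementation, and `reg` is an illustrative regularization constant added for numerical stability:

```python
import numpy as np

def local_weights(x, neighbors, reg=1e-3):
    # Find weights w that minimize ||x - sum_j w_j * neighbors[j]||^2
    # subject to sum(w) = 1 (Equation 8-4 for one instance).
    G = neighbors - x                        # shift so x is at the origin
    C = G @ G.T                              # local Gram matrix
    C += reg * np.trace(C) * np.eye(len(C))  # regularize (C is often singular)
    w = np.linalg.solve(C, np.ones(len(C)))  # solve C w = 1
    return w / w.sum()                       # enforce the sum-to-one constraint

rng = np.random.default_rng(42)
x = rng.normal(size=3)
neighbors = x + rng.normal(scale=0.1, size=(10, 3))  # 10 nearby points
w = local_weights(x, neighbors)
reconstruction_error = np.linalg.norm(x - w @ neighbors)
```

With 10 neighbors in 3D, the affine span of the neighbors generally contains x, so the reconstruction error is tiny.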

After this step, the weight matrix W (containing the weights w_{i,j}) encodes the local linear relationships between the training instances. Now the second step is to map the training instances into a d-dimensional space (where d < n) while preserving these local relationships as much as possible. If z^(i) is the image of x^(i) in this d-dimensional space, then we want the squared distance between z^(i) and Σ_{j=1}^{m} w_{i,j} z^(j) to be as small as possible. This idea leads to the unconstrained optimization problem described in Equation 8-5. It looks very similar to the first step, but instead of keeping the instances fixed and finding the optimal weights, we are doing the reverse: keeping the weights fixed and finding the optimal position of the instances’ images in the low-dimensional space. Note that Z is the matrix containing all z^(i).

LLE step 2: reducing dimensionality while preserving relationships

Equation 8-5:

  Ẑ = argmin_Z Σ_{i=1}^{m} ‖z^(i) − Σ_{j=1}^{m} ŵ_{i,j} z^(j)‖²
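In practice this minimization is solved as an eigenproblem: the objective equals trace(Zᵀ M Z) with M = (I − W)ᵀ(I − W), and under the usual normalization constraints on Z (without which Z = 0 would trivially minimize it) the solution is given by the eigenvectors of M with the smallest nonzero eigenvalues. A toy sketch, assuming a full weight matrix W is already in hand (this illustrates the math, not Scikit-Learn's solver):

```python
import numpy as np

def embed_from_weights(W, d=2):
    # Minimize sum_i ||z_i - sum_j w_ij z_j||^2 = trace(Z^T M Z),
    # M = (I - W)^T (I - W). The all-constant eigenvector (eigenvalue 0)
    # is discarded; the next d eigenvectors give the embedding.
    m = W.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]            # skip the constant eigenvector

# Toy weights: 5 points on a line, each reconstructed from its neighbors
# (interior points are the average of their left and right neighbors).
W = np.zeros((5, 5))
for i in range(5):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < 5]
    for j in nbrs:
        W[i, j] = 1.0 / len(nbrs)

Z = embed_from_weights(W, d=1)  # 1D embedding of the 5 points
```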

Scikit-Learn’s LLE implementation has the following computational complexity: O(m log(m) n log(k)) for finding the k nearest neighbors, O(m n k³) for optimizing the weights, and O(d m²) for constructing the low-dimensional representations. Unfortunately, the m² in the last term makes this algorithm scale poorly to very large datasets.

Other Dimensionality Reduction Techniques

There are many other dimensionality reduction techniques, several of which are available in Scikit-Learn. Here are some of the most popular:

  • Multidimensional Scaling (MDS) reduces dimensionality while trying to preserve the distances between the instances.
  • Isomap creates a graph by connecting each instance to its nearest neighbors, then reduces dimensionality while trying to preserve the geodesic distances between the instances.
  • t-Distributed Stochastic Neighbor Embedding (t-SNE) reduces dimensionality while trying to keep similar instances close and dissimilar instances apart. It is mostly used for visualization, in particular to visualize clusters of instances in high-dimensional space (e.g., to visualize the MNIST images in 2D).
  • Linear Discriminant Analysis (LDA) is actually a classification algorithm, but during training it learns the most discriminative axes between the classes, and these axes can then be used to define a hyperplane onto which to project the data. The benefit is that the projection will keep classes as far apart as possible, so LDA is a good technique to reduce dimensionality before running another classification algorithm such as an SVM classifier.
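All four techniques are available in Scikit-Learn. The sketch below runs them on the Swiss roll for comparison; the dataset size, hyperparameters, and the way the roll position t is binned into three classes for LDA are illustrative choices, not tuned values:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS, Isomap, TSNE
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, t = make_swiss_roll(n_samples=500, noise=0.2, random_state=42)

X_mds = MDS(n_components=2, random_state=42).fit_transform(X)
X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)

# LDA is supervised, so it needs class labels; here we bin the position
# along the roll (t) into 3 pseudo-classes just for illustration.
y = np.digitize(t, np.quantile(t, [1 / 3, 2 / 3]))
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```

Each call maps the 3D Swiss roll down to 2D; plotting the four results side by side reproduces the kind of comparison shown in the figure below.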
Figure: Reducing the Swiss roll to 2D with three techniques — MDS yields a roughly circular pattern, Isomap a grid-like structure, and t-SNE distinct clusters.
