Dimensionality Reduction

SanDeep DuBey
5 min read · May 12, 2022

A technique that lets you view high-dimensional data in a lower-dimensional space

When you start solving a problem with a dataset, some of the available features can be redundant and add nothing useful to the training data. These features need to be reduced by some method.

Dimensionality reduction is the technique of reducing the number of features in a dataset. The higher the number of features, the harder it gets to visualize the training set and work with it. Many of these features are correlated, and hence redundant.

Redundant features can also create problems in modelling, a set of issues known as the curse of dimensionality.

Curse of Dimensionality :

This is the idea that a number of strange things happen when the dimensionality of a dataset is high, many of which are not what common sense would suggest. Wikipedia lists a number of such problems; I will discuss some of them.

  1. Hughes phenomenon : when the dataset size is fixed, increasing the number of dimensions can drop the performance of the model
  2. Increasing space and time complexity
  3. Difficulty in clustering

Some Methods of Dimensionality Reduction

There are many methods that fall under dimensionality reduction; let's discuss some of them.

  1. Principal Component Analysis :

We humans cannot visualize n-dimensional data, and it is hard to draw higher-dimensional data on a graph or on paper. One thing we can do to address this is drop some of the features.

PCA is a method of representing the data along the directions of maximal variance/spread, because more spread carries more information. We map the data to a lower dimension while keeping the directions of highest variance.

The optimization function for PCA is :
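The formula itself appears as an image in the original post; for reference, the standard way to write the objective for the first principal component (with S the covariance matrix of the centred data) is:

```latex
\max_{\substack{u \in \mathbb{R}^d \\ \lVert u \rVert = 1}} \; u^{\top} S u ,
\qquad
S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}
```

The maximizing direction u is the eigenvector of S with the largest eigenvalue; subsequent components maximize the same objective subject to being orthogonal to the previous ones.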

Steps to follow for PCA :

  1. Standardization: The data has to be brought to a common scale by subtracting the mean of each feature from the original data. This makes the distribution 0-centred.
  2. Finding the covariance: The covariance matrix captures how pairs of features in the centred data vary together.
  3. Determining the principal components: Principal components are found by computing the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors are a special set of vectors that describe the directions of the data, i.e. the principal components, while the eigenvalues tell us how much variance each direction carries. The eigenvectors with the highest eigenvalues are the most important principal components.
  4. Final output: It is the dot product of the standardized matrix and the selected eigenvectors. Note that the number of columns, or features, is reduced. A NumPy sketch of these steps is shown after this list.
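Before moving to the library version, here is a minimal NumPy sketch of the four steps above (function and variable names are illustrative; `X` is assumed to have samples as rows and features as columns):

```python
import numpy as np

def pca_by_hand(X, n_components=2):
    # 1. Standardization: centre each feature so the distribution is 0-centred
    X_centered = X - X.mean(axis=0)

    # 2. Finding covariance: covariance matrix of the centred features
    cov = np.cov(X_centered, rowvar=False)

    # 3. Determining the principal components: eigenvectors/eigenvalues of the
    #    covariance matrix, sorted by decreasing eigenvalue
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]

    # 4. Final output: dot product of the centred data and the top eigenvectors
    return X_centered @ components
```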

You can follow the steps above one by one, but the scikit-learn library already implements PCA, so we can leverage that.

Let's load the MNIST dataset (a dataset of images of handwritten digits) — Link of dataset understanding
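The loading code is shown as an image in the original post; a minimal sketch assuming the scikit-learn digits dataset (8×8 images flattened to 64 features, which matches the 64 dimensions discussed below):

```python
from sklearn.datasets import load_digits

# Load the digits dataset: 1797 samples, each an 8x8 image flattened to 64 features
digits = load_digits()
X, y = digits.data, digits.target
print(X.shape)  # (1797, 64)
```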

Applying PCA
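Roughly, the projection down to 2 components could look like this (continuing from the snippet above):

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize, then project the 64-dimensional data onto the top 2 principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(X_pca.shape)  # (1797, 2)
```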

As you can see, we have reduced 64 dimensions to 2. Now let's visualize the 64-dimensional dataset in 2-dimensional space :
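The scatter plot itself is shown as an image in the post; a plotting sketch with matplotlib (continuing from the snippets above) might look like this:

```python
import matplotlib.pyplot as plt

# Scatter plot of the 2D PCA projection, coloured by digit label
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10', s=10)
plt.colorbar(scatter, label='digit')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.title('Digits projected to 2D with PCA')
plt.show()
```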

As you saw, PCA projected the data to a lower-dimensional space along the directions of higher variance.

2. t-distributed Stochastic Neighbour Embedding (t-SNE) :

As you can see, PCA fails to visualize the MNIST dataset well here because it is a linear algorithm and cannot capture complex, non-linear relationships.

Basic Idea : Stochastic Neighbour Embedding

t-SNE tries to retain the local shape of the data when going from a higher dimension to a lower one, whereas PCA tries to retain the global shape of the data.
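For reference (the original post does not spell this out), the standard t-SNE formulation models pairwise similarities with a Gaussian in the high-dimensional space and a heavier-tailed Student-t distribution in the low-dimensional map, then minimizes the KL divergence between the two:

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

```latex
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}
```

The per-point bandwidths σᵢ are chosen so that each point's neighbourhood distribution has a fixed perplexity, which is exactly the perplexity parameter discussed below.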

As you can see, t-SNE tries to retain the local neighbours of each point.

Crowding Problem in t-SNE :

It is a condition that arises when we model high-dimensional data in low dimensions (2D or 3D): it becomes difficult to separate nearby data points from moderately distant ones, and gaps cannot form between clusters.

How to apply t-SNE and interpret its output

  1. t-SNE has two main parameters (perplexity and the number of iterations). Try multiple values of perplexity; it should be less than the size of the dataset, otherwise the result will be messed up (see the sketch after this list)
  2. Run the iterations until the shape of the embedding stops changing much
  3. If you run t-SNE multiple times with the same perplexity and the same step size (number of iterations), you might get slightly different results, because t-SNE is not a deterministic algorithm; it is a probabilistic (stochastic) algorithm
  4. t-SNE doesn’t preserve distances between clusters
  5. t-SNE expands dense clusters and shrinks sparse clusters
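A minimal sketch of applying t-SNE with scikit-learn, continuing from the PCA snippets above (the perplexity and random seed are illustrative, not taken from the original post):

```python
from sklearn.manifold import TSNE

# t-SNE embedding of the 64-dimensional digits data into 2D.
# perplexity should stay well below the number of samples; results vary
# slightly from run to run unless random_state is fixed.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
print(X_tsne.shape)  # (1797, 2)
```

The number of optimization iterations can also be tuned (via `n_iter` in older scikit-learn releases, `max_iter` in newer ones); run it long enough that the shape of the embedding stops changing.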

t-SNE on MNIST dataset results:

Refer to this beautiful blog for t-SNE and play with multiple settings: https://distill.pub/2016/misread-tsne/
