Unsupervised machine learning

Unsupervised machine learning definition

Unsupervised machine learning is a type of machine learning in which algorithms find patterns in data on their own, without labeled examples to guide them. Unlike supervised learning, which learns from examples paired with known answers, unsupervised learning focuses on exploring the hidden structure of the data.

See also: artificial intelligence, machine learning, adversarial machine learning, generative AI, cluster analysis, anomaly-based detection

How unsupervised machine learning works

  1. Unsupervised learning starts with a dataset that has no labels or target values. Each data point is typically described by many features.
  2. Depending on the goal, a suitable unsupervised algorithm is selected. The two primary goals in unsupervised learning are clustering and dimensionality reduction.
  3. The chosen algorithm processes the data to find patterns or structures:
    • Clustering algorithms group similar data points together. Examples include K-means clustering, hierarchical clustering, and DBSCAN.
    • Dimensionality reduction algorithms reduce the number of features while keeping as much of the original information as possible. Principal component analysis (PCA) and t-SNE are examples of this approach.
  4. Once the model is trained, it reveals hidden structures or patterns in the data. Clustering identifies groups, which is useful for customer segmentation or spotting anomalies.
  5. Dimensionality reduction gives a simpler view of the data for easier visualization or processing.
  6. Evaluating these models is tricky because there's no “ground truth” to compare against. Still, internal metrics such as the silhouette score for clustering or the explained variance for PCA help check the quality of the model's output.
  7. The results can be used to gain insights, find anomalies, or prepare data for other tasks.
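
As a rough illustration of the steps above, the sketch below runs a small K-means clustering and a one-component PCA on unlabeled toy data in plain Python. The function names (`kmeans`, `inertia`, `first_principal_component`), the toy data, and the deterministic farthest-point initialization are assumptions made for this example, not part of any particular library. Real projects would normally use an established machine learning library instead.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Cluster a list of numeric tuples with Lloyd's K-means algorithm."""
    # Deterministic farthest-point initialization (an assumption made here for
    # reproducibility; random or k-means++ initialization is more common).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def first_principal_component(points, iterations=100):
    """Return the top PCA direction and each point's 1-D projection onto it."""
    n, d = len(points), len(points[0])
    means = [sum(col) / n for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, means)] for p in points]
    # Sample covariance matrix of the centered data (d x d).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiplying by the covariance matrix pulls the
    # vector toward the direction of largest variance. The all-ones start is a
    # simplification and fails if it is exactly orthogonal to that direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projections = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, projections


# Unlabeled toy data: two visually obvious groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(inertia(points, centroids, labels))

direction, projections = first_principal_component(points)
print(direction, projections)
```

Here the cluster labels are discovered from the data alone, the inertia value gives one simple internal quality check, and the projection reduces each two-feature point to a single number while still keeping the two groups apart.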

History of unsupervised machine learning

Early beginnings (before the 20th century): The foundations of statistics and probability, laid by Gauss and Laplace, set the groundwork for the algorithms we use today.

Mid-20th century: Scientists began developing clustering techniques. Hierarchical clustering, used to study taxonomies in biology, paved the way for many of today's methods. Techniques like principal component analysis (PCA), first formulated in the early 1900s, also became key tools by mid-century.

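1970s-1980s: Clustering algorithms such as K-means, whose origins date back to the 1950s and 1960s, came into wide use for efficiently grouping data points. Early neural network models, such as self-organizing maps, hinted at the potential of neural networks for unsupervised learning.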

1990s: Backpropagation, popularized in the late 1980s, and a renewed interest in neural networks spurred work on early autoencoders. Association rule learning algorithms like Apriori were developed to find relationships between variables in large databases. This method would become a cornerstone of market-basket analysis.

2000s: Deep autoencoders gained traction. Manifold learning tools such as t-SNE were developed to simplify and visualize complex data.

2010s and beyond: Generative adversarial networks (GANs) and variational autoencoders (VAEs) emerged as tools for generating data. Another approach, self-supervised learning, generates labels from the data itself. It blends supervised and unsupervised learning methods and has shown promise in areas like image analysis.