Unsupervised Learning – tell me more…
- September 11, 2023
- Posted by: William Dorrington
- Category: Beginner Level Machine Learning Unsupervised Learning
Let’s take a wander into the curious world of “Unsupervised Learning”, which falls under the category of ‘Machine Learning’. Machine Learning comprises hundreds of different algorithmic approaches; see it as the ability for a machine to learn from data without explicit instruction. However, all Machine Learning can be divided into a few overarching categories, one of which is “Unsupervised Learning” (another being Supervised Learning).
This article will take the reader through a basic introduction to Unsupervised Learning. It is best to read the ‘Supervised Learning’ article before reading this one.
So let’s make a start.
Unsupervised Learning
If you have read the Supervised Learning article, you’ll know that that type of learning works by labelling inputs and outputs so that the model learns the pattern between the two. Unsupervised Learning is a little more chaotic than this.
In Unsupervised Learning, datasets are not labelled. This approach to Machine Learning focuses on identifying relationships within the Features (denoted as x). We have the input data (Features, x), but we do not have the output data (Labels, y) that Supervised Learning relies on. The model detects these relationships independently, hence the name “Unsupervised”.
Unsupervised Learning, when utilised correctly, is a powerful tool in the area of Machine Learning. It can uncover patterns and groupings in data that may be outside the realm of consideration. This is where Unsupervised Learning comes into its own. It is particularly useful where outputs are not available, or change so often that it is difficult to keep up with them. A great example of this is fraud detection. Fraud has been around since the inception of trade, ever since we were able to trade chickens for pigs, shiny stones for a suit of armour, numbers on a credit card for a cinema ticket… you get the point. The nature of fraud is constantly evolving, changing daily, even hourly. We expect and demand that our chosen banking establishments keep up with this aggressively evolving menace!
A really basic example of how Unsupervised Learning can be leveraged for fraud detection is as follows. My name is William Dorrington, and I love nothing more than rum and the occasional indulgence in a generous chunk of chocolate too. Now, if the bank fed my transactions into a model as the input dataset (Features, x), the model would notice that I frequently purchase many different delicious rums and moreish chocolate. Consequently, it may start grouping these together under, let’s say for now, “normal activity”. The model would then be equipped to detect anomalies and outliers: those curious items that do not align with the pattern, a pattern that may have eluded the human eye or brain. In this case, the anomalies would be the seemingly out-of-character purchases of a bottle of water or a nice healthy red apple!
Important to note: although the Unsupervised model may flag the “Water” and “Apple” purchases as anomalies, this does not mean, nor does the model know, that they are fraudulent; they may be completely safe. Human intervention is often required to interpret the findings.
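To make the rum-and-chocolate example a little more tangible, here is a minimal sketch in Python. It is not how a bank would really do it: the purchase history, the `flag_anomalies` helper and the 5% threshold are all invented for illustration, and "rare within my own history" stands in for whatever a real anomaly-detection model would learn.

```python
from collections import Counter

# Hypothetical purchase history; the items and counts are invented for illustration.
transactions = ["rum"] * 12 + ["chocolate"] * 10 + ["water"] + ["apple"]

def flag_anomalies(items, min_share=0.05):
    """Return the items whose share of all purchases falls below min_share."""
    counts = Counter(items)
    total = len(items)
    return sorted(item for item, n in counts.items() if n / total < min_share)

anomalies = flag_anomalies(transactions)
print(anomalies)  # ['apple', 'water']
```

Notice that no labels were supplied anywhere. The code only says these items are unusual for this customer; deciding whether they are fraudulent is still left to a human.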
When developing Supervised models, humans primarily rely on their pre-existing knowledge. This approach has an inherent flaw – it is potentially biased and confined to the limitations of those training the models. These limitations relate to the trainers’ understanding of the significant inputs (Features, x) to focus on, and the anticipated outputs (Labels, y) to look out for.
Important to note: although we stated that Supervised models can be biased, we are not stating that Unsupervised models cannot. Unsupervised models can still show inherent bias due to the input data.
To further colour in our understanding, let us carry on with the example of fraud detection and my unhealthy bank transactions. In a scenario where a Supervised model had been trained on more conventional “normal purchases”, the seemingly innocent healthy purchases of an apple and a bottle of water may have slipped by without raising alarm. After all, these are normal purchases. However, the situation is different when looked at through the lens of my personal shopping habits, where these items are indeed abnormal. This is where the true power of Unsupervised Learning comes to bear. As an Unsupervised model is not constrained by predefined labels, it would analyse the broader pattern of data and note the frequent purchases of rum and chocolate, consequently flagging the apple and water as anomalies ready for further investigation.
In this way, Unsupervised Learning can identify nuanced patterns and anomalies that Supervised Learning models might ignore, because those models may be limited by the preconceived understanding and biases of their glorious human trainers.
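The contrast can be sketched in a few lines of Python. This is a big simplification: a fixed set of "normal" items stands in for a trained Supervised model, and a rarity check stands in for an Unsupervised one. The item names, both helper functions and the 5% threshold are hypothetical.

```python
from collections import Counter

# Hypothetical labels a Supervised model might have been trained on (invented).
labelled_normal = {"water", "apple", "bread", "milk", "rum", "chocolate"}

# Hypothetical personal purchase history (also invented).
my_history = ["rum"] * 12 + ["chocolate"] * 10 + ["water", "apple"]

def flagged_by_labels(items, normal_items):
    """Flag only the items that the predefined labels do not mark as normal."""
    return sorted(set(items) - normal_items)

def flagged_by_personal_pattern(items, min_share=0.05):
    """Flag the items that are rare within this one customer's own history."""
    counts = Counter(items)
    return sorted(i for i, n in counts.items() if n / len(items) < min_share)

label_flags = flagged_by_labels(my_history, labelled_normal)
pattern_flags = flagged_by_personal_pattern(my_history)
print(label_flags)    # []
print(pattern_flags)  # ['apple', 'water']
```

The label-based check raises nothing because water and apples are "normal" in general, while the pattern-based check picks them out because they are unusual for this particular shopper.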
Unsupervised Learning models do not label data with meaningful labels. Instead, they group the data under generic labels such as “Cluster 1”, “Cluster 2”, and so on. We can then assign meaningful names to these groups based on their characteristics and patterns.
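Here is a minimal sketch of what those generic labels look like in practice, using a tiny hand-rolled k-means loop on one-dimensional data. The spend amounts, the `k_means_1d` helper, its evenly spread starting centroids and the names chosen at the end are all assumptions made for illustration; a real project would normally reach for a library implementation.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    # Spread the starting centroids evenly between the smallest and largest value.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups

# Hypothetical transaction amounts, invented for illustration.
spend = [4, 5, 6, 5, 48, 52, 50, 55]

# The algorithm only ever produces generic labels.
clusters = {f"Cluster {i + 1}": g for i, g in enumerate(k_means_1d(spend))}
print(clusters)  # {'Cluster 1': [4, 5, 6, 5], 'Cluster 2': [48, 52, 50, 55]}

# A human then looks at each group and decides what to call it.
names = {"Cluster 1": "small purchases", "Cluster 2": "large purchases"}
```

The `names` dictionary is the human step: nothing in the algorithm knows that one group means "small" and the other "large".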
Categorisations of Unsupervised Learning
Unsupervised learning can be split into three types:
- Clustering algorithms
- Association Rules
- Dimensionality Reduction
These, in turn, will be covered later by the “Data Science Frontiers”.
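As a small taste of the second type, Association Rules, here is a hedged sketch of the two measures usually used to describe a rule such as "people who buy rum also buy chocolate": support (how often the items appear together) and confidence (how often the second item appears when the first does). The shopping baskets and helper names below are invented for illustration.

```python
# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"rum", "chocolate"},
    {"rum", "chocolate", "lime"},
    {"rum", "lime"},
    {"chocolate"},
    {"rum", "chocolate"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Share of all baskets that contain the whole itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

rule_support = support({"rum", "chocolate"}, baskets)
rule_confidence = confidence({"rum"}, {"chocolate"}, baskets)
print(rule_support)     # 0.6
print(rule_confidence)  # 0.75
```

Here rum and chocolate appear together in three of the five baskets, and three of the four baskets containing rum also contain chocolate.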