Machine learning is one of the quickest growing technological fields, but despite how often the words “machine learning” are tossed around, it can be difficult to understand what machine learning is, precisely.
Machine learning doesn’t refer to just one thing, it’s an umbrella term that can be applied to many different concepts and techniques. Understanding machine learning means being familiar with different forms of model analysis, variables, and algorithms. Let’s take a close look at machine learning to better understand what it encompasses.
What Is Machine Learning?
While the term machine learning can be applied to many different things, in general, the term refers to enabling a computer to carry out tasks without receiving explicit line-by-line instructions to do so. A machine learning specialist doesn’t have to write out all the steps necessary to solve the problem because the computer is capable of “learning” by analyzing patterns within the data and generalizing these patterns to new data.
Machine learning systems have three basic parts:
The inputs are the data that is fed into the machine learning system, and the input data can be divided into labels and features. Features are the relevant variables, the variables that will be analyzed to learn patterns and draw conclusions. Meanwhile, the labels are classes/descriptions given to the individual instances of the data.
Features and labels can be used in two different types of machine learning problems: supervised learning and unsupervised learning.
Unsupervised vs. Supervised Learning
In supervised learning, the input data is accompanied by a ground truth. Supervised learning problems have the correct output values as part of the dataset, so the expected classes are known in advance. This makes it possible for the data scientist to check the performance of the algorithm by testing the data on a test dataset and seeing what percentage of items were correctly classified.
In contrast, unsupervised learning problems do not have ground truth labels attached to them. A machine learning algorithm trained to carry out unsupervised learning tasks must be able to infer the relevant patterns in the data for itself.
Supervised learning algorithms are typically used for classification problems, where one has a large dataset filled with instances that must be sorted into one of many different classes. Another type of supervised learning is a regression task, where the value output by the algorithm is continuous in nature instead of categorical.
Meanwhile, unsupervised learning algorithms are used for tasks like density estimation, clustering, and representation learning. These three tasks need the machine learning model to infer the structure of the data, there are no predefined classes given to the model.
Let’s take a brief look at some of the most common algorithms used in both unsupervised learning and supervised learning.
Common supervised learning algorithms include:
- Naive Bayes
- Support Vector Machines
- Logistic Regression
- Random Forests
- Artificial Neural Networks
Support Vector Machines are algorithms that divide up a dataset into different classes. Data points are grouped into clusters by drawing lines that separate the classes from one another. Points found on one side of the line will belong to one class, while the points on the other side of the line are a different class. Support Vector Machines aim to maximize the distance between the line and the points found on either side of the line, and the greater the distance the more confident the classifier is that the point belongs to one class and not another class.
Logistic Regression is an algorithm used in binary classification tasks when data points need to be classified as belonging to one of two classes. Logistic Regression works by labeling the data point either a 1 or a 0. If the perceived value of the data point is 0.49 or below, it is classified as 0, while if it is 0.5 or above it is classified as 1.
Decision Tree algorithms operate by dividing datasets up into smaller and smaller fragments. The exact criteria used to divide the data is up to the machine learning engineer, but the goal is to ultimately divide the data up into single data points, which will then be classified using a key.
A Random Forest algorithm is essentially many single Decision Tree classifiers linked together into a more powerful classifier.
The Naive Bayes Classifier calculates the probability that a given data point has occurred based on the probability of a prior event occurring. It is based on Bayes Theorem and it places the data points into classes based on their calculated probability. When implementing a Naive Bayes classifier, it is assumed that all the predictors have the same influence on the class outcome.
An Artificial Neural Network, or multi-layer perceptron, are machine learning algorithms inspired by the structure and function of the human brain. Artificial neural networks get their name from the fact that they are made out of many nodes/neurons linked together. Every neuron manipulates the data with a mathematical function. In artificial neural networks, there are input layers, hidden layers, and output layers.
The hidden layer of the neural network is where the data is actually interpreted and analyzed for patterns. In other words, it is where the algorithm learns. More neurons joined together make more complex networks capable of learning more complex patterns.
Unsupervised Learning algorithms include:
- K-means clustering
- Principal Component Analysis
K-means clustering is an unsupervised classification technique, and it works by separating points of data into clusters or groups based on their features. K-means clustering analyzes the features found in the data points and distinguishes patterns in them that make the data points found in a given class cluster more similar to each other than they are are to clusters containing the other data points. This is accomplished by placing possible centers for the cluster, or centroids, in a graph of the data and reassigning the position of the centroid until a position is found that minimizes the distance between the centroid and the points that belong to that centroid’s class. The researcher can specify the desired number of clusters.
Principal Component Analysis is a technique that reduces large numbers of features/variables down into a smaller feature space/fewer features. The “principal components” of the data points are selected for preservation, while the other features are squeezed down into a smaller representation. The relationship between the original data potions is preserved, but since the complexity of the data points is simpler, the data is easier to quantify and describe.
Autoencoders are versions of neural networks that can be applied to unsupervised learning tasks. Autoencoders are capable of taking unlabeled, free-form data and transforming them into data that a neural network is capable of using, basically creating their own labeled training data. The goal of an autoencoder is to convert the input data and rebuild it as accurately as possible, so it’s in the incentive of the network to determine which features are the most important and extract them.
Study Shows That Workers Now Trust A Robot More Than Their Managers
Technology giant Oracle Corporation recently published a study that examined the relation of workers that indicates a distinct change in which artificial intelligence is changing the relationship between people and technology at work. According to this research that was done on a global scale, based on the analysis of the data, Oracle came to the conclusion that now 64% of people are inclined to trust a robot more than their manager.
The study was done involving 8,370 employees, managers and HR leaders across 10 countries. As is stated in the company’s press release, it found that AI has changed the relationship between people and technology at work and is reshaping the role HR teams and managers need to play in attracting, retaining and developing talent.
These results counter some common fears present that AI will have a negative impact on jobs, employees, managers and as the release states, “HR leaders across the globe are reporting increased adoption of AI at work and many are welcoming AI with love and optimism.”
In presenting the results of the study, ItProPortal notes that the report’s results show that “the majority of people would trust a robot more than their manager. They’d rather turn to a robot for advice, than their manager.”
Summarising the main points of the report, the press release point to the following:
[ ] AI is becoming more prominent with 50 percent of workers currently using some form of AI at work compared to only 32 percent last year. Workers in China (77 percent) and India (78 percent) have adopted AI over 2X more than those in France (32 percent) and Japan (29 percent).
[ ] The majority (65 percent) of workers are optimistic, excited and grateful about having robot co-workers and nearly a quarter report having a loving and gratifying relationship with AI at work.
[ ] Workers in India (60 percent) and China (56 percent) are the most excited about AI, followed by the UAE (44 percent), Singapore (41 percent), Brazil (32 percent), Australia/New Zealand (26 percent), Japan (25 percent), U.S. (22 percent), UK (20 percent) and France (8 percent).
[ ] Men have a more positive view of AI at work than women with 32 percent of men optimistic vs. 23 percent of women.
The results also indicate that most of the interviewed people believe that, as ItProPortal says, “robots would do a better job than their managers at providing unbiased information, maintaining work schedules, solving problems and maintaining a budget. At the same time, humans are considered better at understanding employee feelings, coaching and building a work culture.”
Also, it turns out that people weren’t afraid of losing their jobs to AI, with most of them being “optimistic, excited and grateful” to be able to work with the latest advancements in technology. The report quotes Jeanne Meister, Founding Partner of Future Workplace, which said that the company’s 2019 results “reveal that forward-looking companies are already capitalizing on the power of AI. As workers and managers leverage the power of artificial intelligence in the workplace, they are moving from fear to enthusiasm as they see the possibility of being free of many of their routine tasks and having more time to solve critical business problems for the enterprise.”
What Is Deep Learning?
Deep learning is one of the most influential and fastest growing fields in artificial intelligence. However, getting an intuitive understanding of deep learning can be difficult because the term deep learning covers a variety of different algorithms and techniques. Deep learning is also a subdiscipline of machine learning in general, so it’s important to understand what machine learning is in order to understand deep learning.
Put simply, machine learning is a method of enabling computers to carry out specific tasks without explicitly coding every line of the algorithms used to accomplish those tasks. There are many different machine learning algorithms, but one of the most commonly used algorithms is a multilayer perceptron. A multilayer perceptron is also referred to as a neural network, and it is comprised of a series of nodes/neurons linked together. There are three different layers in a multilayer perceptron: the input layer, the hidden layer, and the output layer.
The input layer takes the data into the network, where it is manipulated by the nodes in the middle/hidden layer. The nodes in the hidden layer are mathematical functions that can manipulate the data coming from the input layer, extracting relevant patterns from the input data. This is how the neural network “learns”. Neural networks get their name from the fact that they are inspired by the structure and function of the human brain.
The connections between nodes in the network have values called weights. These values are essentially assumptions about how the data in one layer is related to the data in the next layer. As the network trains the weights are adjusted, and the goal is that the weights/assumptions about the data will eventually converge on values that accurately represent the meaningful patterns within the data.
Activation functions are present in the nodes of the network, and these activation functions transform the data in a non-linear fashion, enabling the network to learn complex representations of the data. Activation functions multiply the input values by the weight values and add a bias term.
Defining Deep Learning
Deep learning is the term given to machine learning architectures that join many multilayer perceptrons together, so that there isn’t just one hidden layer but many hidden layers. The “deeper” that the deep neural network is, the more sophisticated patterns the network can learn.
The deep layer networks comprised of neurons are sometimes referred to as fully connected networks or fully connected layers, referencing the fact that a given neuron maintains a connection to all the neurons surrounding it. Fully connected networks can be combined with other machine learning functions to create different deep learning architectures.
Different Deep Learning Architectures
There are a variety of deep learning architectures used by researchers and engineers, and each of the different architectures has its own specialty use case.
Convolutional Neural Networks
Convolutional neural networks, or CNNs, are the neural network architecture commonly used in the creation of computer vision systems. The structure of convolutional neural networks enables them to interpret image data, converting them into numbers that a fully connected network can interpret. A CNN has four major components:
- Convolutional layers
- Subsampling/pooling layers
- Activation functions
- Fully connected layers
The convolutional layers are what takes in the images as inputs into the network, analyzing the images and getting the values of the pixels. Subsampling or pooling is where the image values are converted/reduced to simplify the representation of the images and reduce the sensitivity of the image filters to noise. The activation functions control how the data flows from one layer to the next layer, and the fully connected layers are what analyze the values that represent the image and learn the patterns held in those values.
Recurrent neural networks, or RNNs, are popular for tasks where the order of the data matters, where the network must learn about a sequence of data. RNNs are commonly applied to problems like natural language processing, as the order of words matters when decoding the meaning of a sentence. The “recurrent” part of the term Recurrent Neural Network comes from the fact that the output for a given element in a sequence in dependant on the previous computation as well as the current computation. Unlike other forms of deep neural networks, RNNs have “memories”, and the information calculated at the different time steps in the sequence is used to calculate the final values.
There are multiple types of RNNs, including bidirectional RNNs, which take future items in the sequence into account, in addition to the previous items, when calculating an item’s value. Another type of RNN is a Long Short-Term Memory, or LSTM, network. LSTMs are types of RNN that can handle long chains of data. Regular RNNs may fall victim to something called the “exploding gradient problem”. This issue occurs when the chain of input data becomes extremely long, but LSTMs have techniques to combat this problem.
Most of the deep learning architectures mentioned so far are applied to supervised learning problems, rather than unsupervised learning tasks. Autoencoders are able to transform unsupervised data into a supervised format, allowing neural networks to be used on the problem.
Autoencoders are frequently used to detect anomalies in datasets, an example of unsupervised learning as the nature of the anomaly isn’t known. Such examples of anomaly detection include fraud detection for financial institutions. In this context, the purpose of an autoencoder is to determine a baseline of regular patterns in the data and identify anomalies or outliers.
The structure of an autoencoder is often symmetrical, with hidden layers arrayed such that the output of the network resembles the input. The four types of autoencoders that see frequent use are:
- Regular/plain autoencoders
- Multilayer encoders
- Convolutional encoders
- Regularized encoders
Regular/plain autoencoders are just neural nets with a single hidden layer, while multilayer autoencoders are deep networks with more than one hidden layer. Convolutional autoencoders use convolutional layers instead of, or in addition to, fully-connected layers. Regularized autoencoders use a specific kind of loss function that lets the neural network carry out more complex functions, functions other than just copying inputs to outputs.
Generative Adversarial Networks
Generative Adversarial Networks (GANs) are actually multiple deep neural networks instead of just one network. Two deep learning models are trained at the same time, and their outputs are fed to the other network. The networks are in competition with each other, and since they get access to each other’s output data, they both learn from this data and improve. The two networks are essentially playing a game of counterfeit and detection, where the generative model tries to create new instances that will fool the detective model/the discriminator. GANs have become popular in the field of computer vision.
Deep learning extends the principles of neural networks to create sophisticated models that can learn complex patterns and generalize those patterns to future datasets. Convolutional neural networks are used to interpret images, while RNNs/LSTMs are used to interpret sequential data. Autoencoders can transform unsupervised learning tasks into supervised learning tasks. Finally, GANs are multiple networks pitted against each other that are especially useful for computer vision tasks.