AI 101

What is NLP (Natural Language Processing)?

What is Natural Language Processing (NLP)?

Natural Language Processing (NLP) is the study and application of techniques and tools that enable computers to process, analyze, interpret, and reason about human language. NLP is an interdisciplinary field and it combines techniques established in fields like linguistics and computer science. These techniques are used in concert with AI to create chatbots and digital assistants like Google Assistant and Amazon’s Alexa.

Let’s take some time to explore the rationale behind Natural Language Processing, some of the techniques used in NLP, and some common use cases for NLP.

Why Natural Language Processing (NLP) Matters

In order for computers to interpret human language, it must be converted into a form that a computer can manipulate. However, this isn’t as simple as converting text data into numbers. In order to derive meaning from human language, patterns have to be extracted from the hundreds or thousands of words that make up a text document. This is no easy task. There are few hard and fast rules that can be applied to the interpretation of human language. For instance, the exact same set of words can mean different things depending on the context. Human language is a complex and often ambiguous thing, and a statement can be uttered with sincerity or sarcasm.

Despite this, there are some general guidelines that can be used when interpreting words and characters, such as the character “s” being used to denote that an item is plural. These general guidelines have to be used in concert with each other to extract meaning from the text, to create features that a machine learning algorithm can interpret.

Natural Language Processing involves the application of various algorithms capable of taking unstructured data and converting it into structured data. If these algorithms are applied in the wrong manner, the computer will often fail to derive the correct meaning from the text. This can often be seen in the translation of text between languages, where the precise meaning of the sentence is often lost. While machine translation has improved substantially over the past few years, machine translation errors still occur frequently.

Natural Language Processing (NLP) Techniques

Photo: Tamur via WikiMedia Commons, Public Domain (https://commons.wikimedia.org/wiki/File:ParseTree.svg)

Many of the techniques that are used in natural language processing can be placed in one of two categories: syntax or semantics. Syntax techniques are those that deal with the ordering of words, while semantic techniques are the techniques that involve the meaning of words.

Syntax NLP Techniques

Examples of syntax techniques include:

  • Lemmatization
  • Morphological Segmentation
  • Part-of-Speech Tagging
  • Parsing
  • Sentence Breaking
  • Stemming
  • Word Segmentation

Lemmatization refers to distilling the different inflections of a word down to a single form. Lemmatization takes things like tenses and plurals and simplifies them. For example, “feet” might become “foot” and “stripes” may become “stripe”. This simplified word form makes it easier for an algorithm to interpret the words in a document.

Morphological segmentation is the process of dividing words into morphemes or the base units of a word. These units are things like free morphemes (which can stand alone as words) and prefixes or suffixes.

Part-of-speech tagging is simply the process of identifying which part of speech every word in an input document is.

Parsing refers to analyzing all the words in a sentence and correlating them with their formal grammar labels or doing grammatical analysis for all the words.

Sentence breaking, or sentence boundary segmentation, refers to deciding where a sentence begins and ends.
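As a rough illustration of sentence breaking, the sketch below splits text after a period, question mark, or exclamation point when the next character is an uppercase letter. The function name `split_sentences` and the splitting rule are assumptions made for this example; production sentence breakers use far more careful rules or trained models.

```python
import re

def split_sentences(text):
    """Naive sentence breaker (illustrative only).

    Splits on whitespace that follows '.', '!' or '?' and precedes an
    uppercase letter. Abbreviations such as "Dr. Smith" will be split
    incorrectly, which is one reason real systems are more elaborate.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]

sentences = split_sentences("NLP is hard. Is it useful? Yes! It powers chatbots.")
print(sentences)
# ['NLP is hard.', 'Is it useful?', 'Yes!', 'It powers chatbots.']
```

The abbreviation failure mentioned in the docstring shows why deciding where a sentence begins and ends is treated as its own task rather than a simple string split.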

Stemming is the process of reducing words down to the root form of the word. For instance, connected, connection, and connections would all be stemmed to “connect”.
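To make the stemming example concrete, here is a deliberately simple suffix-stripping sketch. The suffix list and the function name `toy_stem` are assumptions chosen for illustration; this is not the Porter stemmer or any other published algorithm, and it will mis-stem many English words.

```python
# Suffixes are checked in this order, so longer suffixes come first.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [toy_stem(w) for w in ["connected", "connection", "connections"]]
print(stems)
# ['connect', 'connect', 'connect']
```

Unlike lemmatization, this kind of rule has no dictionary behind it, so it cannot map an irregular form like “feet” to “foot”.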

Word Segmentation is the process of dividing large pieces of text down into small units, which can be words or stemmed/lemmatized units.

Semantic NLP Techniques

Semantic NLP techniques include techniques like:

  • Named Entity Recognition
  • Natural Language Generation
  • Word-Sense disambiguation

Named entity recognition involves tagging certain text portions that can be placed into one of a number of different preset groups. Pre-defined categories include things like dates, cities, places, companies, and individuals.

Natural language generation is the process of using databases to transform structured data into natural language. For instance, statistics about the weather, like temperature and wind speed could be summarized with natural language.

Word-sense disambiguation is the process of assigning meaning to words within a text based on the context the words appear in.

Deep Learning Models For NLP

Regular multilayer perceptrons are unable to handle the interpretation of sequential data, where the order of the information is important. In order to deal with the importance of order in sequential data, a type of neural network is used that preserves information from previous timesteps in the training.

Recurrent Neural Networks (RNNs) are neural networks that loop over sequential data, carrying a hidden state forward from previous timesteps and taking it into account when calculating the output for the current timestep. Essentially, a basic RNN has three weight matrices that are used during the forward pass: one applied to the previous hidden state, one applied to the current input, and one that maps the hidden state to the output. Because RNNs can take information from previous timesteps into account, they can extract relevant patterns from text data by taking earlier words in the sentence into account when interpreting the meaning of a word.
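The forward pass described above can be sketched in a few lines of plain Python. The names `W_xh`, `W_hh`, and `W_hy` (input-to-hidden, hidden-to-hidden, hidden-to-output), the tanh activation, and the omission of bias terms are assumptions made to keep the example short; the weight values are arbitrary.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_forward(inputs, W_xh, W_hh, W_hy):
    """Run a basic RNN over a sequence and return all outputs plus the final hidden state."""
    hidden = [0.0] * len(W_hh)  # initial hidden state
    outputs = []
    for x in inputs:
        # Combine the current input with the previous hidden state.
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, hidden))]
        hidden = [math.tanh(p) for p in pre]
        outputs.append(matvec(W_hy, hidden))
    return outputs, hidden

W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.4], [-0.6, 0.1]]
W_hy = [[1.0, -1.0]]

sequence = [[1.0, 0.0], [0.0, 1.0]]
_, h_forward = rnn_forward(sequence, W_xh, W_hh, W_hy)
_, h_reversed = rnn_forward(sequence[::-1], W_xh, W_hh, W_hy)
print(h_forward != h_reversed)  # the same inputs in a different order give a different state
```

Because the hidden state is fed back in at every step, reordering the inputs changes the result, which is the property that a regular multilayer perceptron lacks.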

Another type of deep learning architecture used to process text data is a Long Short-Term Memory (LSTM) network. LSTM networks are a kind of RNN, but owing to the gating mechanisms in their architecture they tend to perform better than basic RNNs on long sequences. They mitigate a specific problem that often occurs when training basic RNNs, called the vanishing gradient problem.

These deep neural networks can be either unidirectional or bi-directional. Bi-directional networks are capable of taking not just the words that come prior to the current word into account, but the words that come after it. While this leads to higher accuracy, it is more computationally expensive.

Use Cases For Natural Language Processing (NLP)

Photo: mohammed_hassan via Pixabay, Pixabay License (https://pixabay.com/illustrations/chatbot-chat-application-artificial-3589528/)

Because Natural Language Processing involves the analysis and manipulation of human languages, it has an incredibly wide range of applications. Possible applications for NLP include chatbots, digital assistants, sentiment analysis, document organization, talent recruitment, and healthcare.

Chatbots and digital assistants like Amazon’s Alexa and Google Assistant are examples of voice recognition and synthesis platforms that use NLP to interpret and respond to vocal commands. These digital assistants help people with a wide variety of tasks, letting them offload some of their cognitive tasks to another device and free up some of their brainpower for other, more important things. Instead of looking up the best route to the bank on a busy morning, we can just have our digital assistant do it.

Sentiment analysis is the use of NLP techniques to study people’s reactions and feelings to a phenomenon, as communicated by their use of language. Capturing the sentiment of a statement, like interpreting whether a review of a product is good or bad, can provide companies with substantial information regarding how their product is being received.

Automatically organizing text documents is another application of NLP. Companies like Google and Yahoo use NLP algorithms to classify email documents, putting them in the appropriate bins such as “social” or “promotions”. They also use these techniques to identify spam and prevent it from reaching your inbox.

NLP techniques are also being used to identify potential job hires, finding them based on relevant skills. Hiring managers are also using NLP techniques to help them sort through lists of applicants.

NLP techniques are also being used to enhance healthcare. NLP can be used to improve the detection of diseases. Health records can be analyzed and symptoms extracted by NLP algorithms, which can then be used to suggest possible diagnoses. One example of this is Amazon Comprehend Medical, which analyzes health records and extracts diseases and treatments. Healthcare applications of NLP also extend to mental health. There are apps such as Woebot, which talks users through a variety of anxiety management techniques based on Cognitive Behavioral Therapy.

Blogger and programmer with specialties in Machine Learning and Deep Learning topics. Daniel hopes to help others use the power of AI for social good.

What is an Autoencoder?

If you’ve read about unsupervised learning techniques before, you may have come across the term “autoencoder”. Autoencoders are one of the primary ways that unsupervised learning models are developed. Yet what is an autoencoder exactly?

Briefly, autoencoders operate by taking in data, compressing and encoding the data, and then reconstructing the data from the encoding representation. The model is trained until the loss is minimized and the data is reproduced as closely as possible. Through this process, an autoencoder can learn the important features of the data. While that’s a quick definition of an autoencoder, it would be beneficial to take a closer look at autoencoders and gain a better understanding of how they function. This article will endeavor to demystify autoencoders, explaining the architecture of autoencoders and their applications.

What is an Autoencoder?

Autoencoders are neural networks. Neural networks are composed of multiple layers, and the defining aspect of an autoencoder is that the input layer contains exactly as much information as the output layer. The reason that the input layer and output layer have the exact same number of units is that an autoencoder aims to replicate the input data. It outputs a copy of the data after analyzing it and reconstructing it in an unsupervised fashion.

The data that moves through an autoencoder isn’t just mapped straight from input to output, meaning that the network doesn’t just copy the input data. There are three components to an autoencoder: an encoding (input) portion that compresses the data, a component that handles the compressed data (or bottleneck), and a decoder (output) portion. When data is fed into an autoencoder, it is encoded and then compressed down to a smaller size. The network is then trained on the encoded/compressed data and it outputs a recreation of that data.

So why would you want to train a network to just reconstruct the data that is given to it? The reason is that the network learns the “essence”, or most important features of the input data. After you have trained the network, a model can be created that can synthesize similar data, with the addition or subtraction of certain target features. For instance, you could train an autoencoder on grainy images and then use the trained model to remove the grain/noise from the image.

Autoencoder Architecture

Let’s take a look at the architecture of an autoencoder. We’ll discuss the main architecture of an autoencoder here. There are variations on this general architecture that we’ll discuss in the section below.

Photo: Michela Massi via Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Autoencoder_schema.png)

As previously mentioned, an autoencoder can essentially be divided up into three different components: the encoder, the bottleneck, and the decoder.

The encoder portion of the autoencoder is typically a feedforward, densely connected network. The purpose of the encoding layers is to take the input data and compress it into a latent space representation, generating a new representation of the data that has reduced dimensionality.

The code layers, or the bottleneck, deal with the compressed representation of the data. The bottleneck code is carefully designed to determine the most relevant portions of the observed data, or to put that another way the features of the data that are most important for data reconstruction. The goal here is to determine which aspects of the data need to be preserved and which can be discarded. The bottleneck code needs to balance two different considerations: representation size (how compact the representation is) and variable/feature relevance. The bottleneck performs element-wise activation on the weights and biases of the network. The bottleneck layer is also sometimes called a latent representation or latent variables.

The decoder layer is what is responsible for taking the compressed data and converting it back into a representation with the same dimensions as the original, unaltered data. The conversion is done with the latent space representation that was created by the encoder.

The most basic architecture of an autoencoder is a feed-forward architecture, with a structure much like that of a multilayer perceptron. Much like regular feed-forward neural networks, the autoencoder is trained through the use of backpropagation.

Attributes of An Autoencoder

There are various types of autoencoders, but they all have certain properties that unite them.

Autoencoders learn automatically. They don’t require labels, and if given enough data it’s easy to get an autoencoder to reach high performance on a specific kind of input data.

Autoencoders are data-specific. This means that they can only compress data that is highly similar to data that the autoencoder has already been trained on. Autoencoders are also lossy, meaning that the outputs of the model will be degraded in comparison to the input data.

When designing an autoencoder, machine learning engineers need to pay attention to four different model hyperparameters: code size, layer number, nodes per layer, and loss function.

The code size decides how many nodes make up the middle portion (the bottleneck) of the network, and fewer nodes compress the data more. In a deep autoencoder, while the number of layers can be any number that the engineer deems appropriate, the number of nodes in a layer should decrease as the encoder goes on. Meanwhile, the opposite holds true in the decoder, meaning the number of nodes per layer should increase as the decoder layers approach the final layer. Finally, the loss function of an autoencoder is typically either binary cross-entropy or mean squared error. Binary cross-entropy is appropriate for instances where the input values of the data are in a 0 – 1 range.
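The two reconstruction losses named above can be written out directly. This is a minimal sketch that averages over the elements of a single input vector; the function names and the `eps` clamp (added to avoid taking the logarithm of zero) are assumptions for the example.

```python
import math

def mean_squared_error(original, reconstruction):
    """Average squared difference between input and reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)

def binary_cross_entropy(original, reconstruction, eps=1e-12):
    """Average binary cross-entropy; expects all values in the 0 to 1 range."""
    total = 0.0
    for o, r in zip(original, reconstruction):
        r = min(max(r, eps), 1.0 - eps)  # keep log() finite
        total += -(o * math.log(r) + (1.0 - o) * math.log(1.0 - r))
    return total / len(original)

original = [1.0, 0.0]
print(mean_squared_error(original, [0.5, 0.5]))    # 0.25
print(binary_cross_entropy(original, [0.9, 0.1]))  # lower loss for a closer reconstruction
print(binary_cross_entropy(original, [0.5, 0.5]))  # higher loss for an uninformative one
```

Both losses shrink as the reconstruction approaches the input, which is what training the autoencoder tries to achieve.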

Autoencoder Types

As mentioned above, variations on the classic autoencoder architecture exist. Let’s examine the different autoencoder architectures.

Sparse

Photo: Michela Massi via Wikimedia Commons, CC BY SA 4.0 (https://commons.wikimedia.org/wiki/File:Autoencoder_sparso.png)

While autoencoders typically have a bottleneck that compresses the data through a reduction of nodes, sparse autoencoders are an alternative to that typical operational format. In a sparse network, the hidden layers maintain the same size as the encoder and decoder layers. Instead, the activations within a given layer are penalized, setting it up so the loss function better captures the statistical features of input data. To put that another way, while the hidden layers of a sparse autoencoder have more units than a traditional autoencoder, only a certain percentage of them are active at any given time. The most impactful activations are preserved and others are ignored, and this constraint helps the network determine just the most salient features of the input data.

Contractive

Contractive autoencoders are designed to be resilient against small variations in the data, maintaining a consistent representation of the data. This is accomplished by adding a penalty term to the loss function. This regularization technique is based on the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. The effect of this regularization technique is that the model is forced to construct an encoding where similar inputs will have similar encodings.
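For a concrete sense of the penalty, consider a single-layer encoder with sigmoid activations, h = sigmoid(Wx + b). This choice of encoder is an assumption made for the example; for it, the Jacobian entry for hidden unit i and input j is h_i(1 - h_i)W_ij, so the squared Frobenius norm has a closed form. The names `encode` and `contractive_penalty` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    """Single-layer sigmoid encoder: h = sigmoid(W x + b)."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, x)) + bias)
        for row, bias in zip(W, b)
    ]

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dh/dx for the sigmoid encoder above."""
    h = encode(x, W, b)
    return sum(
        (h_i * (1.0 - h_i)) ** 2 * sum(w * w for w in row)
        for h_i, row in zip(h, W)
    )

# One hidden unit, two inputs: h = 0.5, so the penalty is 0.25**2 * (1 + 1).
print(contractive_penalty([0.0, 0.0], [[1.0, 1.0]], [0.0]))  # 0.125
```

In training, this term would be multiplied by a weighting factor and added to the reconstruction loss, so encodings that change sharply with small input changes are discouraged.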

Convolutional

Convolutional autoencoders encode input data by splitting the data up into subsections and then converting these subsections into simple signals that are summed together to create a new representation of the data. Similar to convolutional neural networks, a convolutional autoencoder specializes in the learning of image data, and it uses a filter that is moved across the entire image section by section. The encodings generated by the encoding layer can be used to reconstruct the image, reflect the image, or modify the image’s geometry. Once the filters have been learned by the network, they can be used on any sufficiently similar input to extract the features of the image.

Denoising

Photo: MAL via Wikimedia Commons, CC BY SA 3.0 (https://en.wikipedia.org/wiki/File:ROF_Denoising_Example.png)

Denoising autoencoders introduce noise into the input, so that the network receives a corrupted version of the original input data. This corrupted version of the data is used to train the model, but the loss function compares the output values with the original input and not the corrupted input. The goal is that the network will be able to reproduce the original, non-corrupted version of the image. By comparing its reconstruction of the corrupted data with the original data, the network learns which features of the data are most important and which features are unimportant/corruptions. In other words, in order for a model to denoise the corrupted images, it has to have extracted the important features of the image data.
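The key detail, that the model sees the corrupted input while the loss is measured against the clean original, can be sketched as follows. The Gaussian noise, the clipping to the 0 to 1 range, and the names `corrupt` and `denoising_loss` are assumptions for this example, and `model` stands in for any trained autoencoder (here an identity function is used purely as a placeholder).

```python
import random

def corrupt(values, noise_std, rng):
    """Add Gaussian noise to each value and clip the result to [0, 1]."""
    return [min(1.0, max(0.0, v + rng.gauss(0.0, noise_std))) for v in values]

def denoising_loss(model, clean, noise_std, rng):
    """Feed the corrupted input to the model, but score against the clean input."""
    noisy = corrupt(clean, noise_std, rng)
    reconstruction = model(noisy)
    return sum((c - r) ** 2 for c, r in zip(clean, reconstruction)) / len(clean)

identity_model = lambda values: values  # placeholder: passes the noise straight through
rng = random.Random(0)
clean_pixels = [0.2, 0.5, 0.8]
print(denoising_loss(identity_model, clean_pixels, noise_std=0.1, rng=rng))
```

A model that merely copies its input, like the placeholder, is penalized for every bit of noise it passes along, so lowering this loss requires learning what the uncorrupted data looks like.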

Variational

Variational autoencoders operate by making assumptions about how the latent variables of the data are distributed. Rather than a single value, a variational autoencoder produces a probability distribution for each latent attribute of the training images. When training, the encoder creates latent distributions for the different features of the input images.

Because the model learns the features of images as Gaussian distributions instead of discrete values, it is capable of being used to generate new images. The Gaussian distribution is sampled to create a vector, which is fed into the decoding network, which renders an image based on this vector of samples. Essentially, the model learns common features of the training images and assigns them some probability that they will occur. The probability distribution can then be used to reverse engineer an image, generating new images that resemble the original training images.

When training the network, the encoded data is analyzed and the recognition model outputs two vectors, drawing out the mean and standard deviation of the images. A distribution is created based on these values. This is done for the different latent states. The decoder then takes random samples from the corresponding distribution and uses them to reconstruct the initial inputs to the network.
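The sampling step can be illustrated on its own. In this sketch the mean and standard deviation vectors are made-up values standing in for what an encoder would output, and the function name `sample_latent` is an assumption. Each latent value is drawn as mean + standard deviation × standard normal noise, a formulation often called the reparameterization trick.

```python
import random

def sample_latent(means, std_devs, rng):
    """Draw one latent vector: z_i = mean_i + std_i * epsilon, epsilon ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, std_devs)]

rng = random.Random(42)
means = [2.0, -1.0]     # hypothetical encoder output: one mean per latent variable
std_devs = [0.5, 0.1]   # hypothetical encoder output: one standard deviation per latent variable

z = sample_latent(means, std_devs, rng)
print(z)  # a different vector on each call; this is what the decoder would receive
```

Sampling different vectors from the same distributions is what lets the decoder produce new outputs that resemble, but do not copy, the training data.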

Autoencoder Applications

Autoencoders can be used for a wide variety of applications, but they are typically used for tasks like dimensionality reduction, data denoising, feature extraction, image generation, sequence to sequence prediction, and recommendation systems.

Data denoising is the use of autoencoders to strip grain/noise from images. Similarly, autoencoders can be used to repair other types of image damage, like blurry images or images missing sections. Dimensionality reduction can help high capacity networks learn useful features of images, meaning the autoencoders can be used to augment the training of other types of neural networks. This is also true of using autoencoders for feature extraction, as autoencoders can be used to identify features of other training datasets to train other models.

In terms of image generation, autoencoders can be used to generate fake human images or animated characters, which has applications in designing face recognition systems or automating certain aspects of animation.

Sequence to sequence prediction models can be used to determine the temporal structure of data, meaning that an autoencoder can be used to generate the next event in a sequence. For this reason, an autoencoder could be used to generate videos. Finally, deep autoencoders can be used to create recommendation systems by picking up on patterns relating to user interest, with the encoder analyzing user engagement data and the decoder creating recommendations that fit the established patterns.

What Is Synthetic Data?

What is Synthetic Data?

Synthetic data is a quickly expanding trend and emerging tool in the field of data science. What is synthetic data exactly? The short answer is that synthetic data is data that isn’t collected from real-world phenomena or events; rather, it’s generated via a computer program. Yet why is synthetic data becoming so important for data science? How is synthetic data created? Let’s explore the answers to these questions.

What is a Synthetic Dataset?

As the term “synthetic” suggests, synthetic datasets are generated through computer programs, instead of being composed through the documentation of real-world events. The primary purpose of a synthetic dataset is to be versatile and robust enough to be useful for the training of machine learning models.

In order to be useful for a machine learning classifier, the synthetic data should have certain properties. While the data can be categorical, binary, or numerical, the length of the dataset should be arbitrary and the data should be randomly generated. The random processes used to generate the data should be controllable and based on various statistical distributions. Random noise may also be placed in the dataset.

If the synthetic data is being used for a classification algorithm, the amount of class separation should be customizable, in order that the classification problem can be made easier or harder according to the problem’s requirements. Meanwhile, for a regression task, non-linear generative processes can be employed to generate the data.
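A small generator shows how these properties fit together: the dataset length is arbitrary, the values are drawn from a controllable statistical distribution, noise is adjustable, and class separation is a parameter. The function name `make_two_class_data`, the use of two Gaussian clusters, and all parameter names are assumptions made for this sketch.

```python
import random

def make_two_class_data(n_per_class, separation, noise_std=1.0, seed=0):
    """Generate labeled 2-D points from two Gaussian clusters.

    separation: distance between the cluster centers along each axis;
                larger values make the classification problem easier.
    noise_std:  spread of each cluster around its center.
    """
    rng = random.Random(seed)  # seeding makes the random process repeatable
    rows = []
    for label, center in ((0, -separation / 2), (1, separation / 2)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, noise_std), rng.gauss(center, noise_std))
            rows.append((point, label))  # the label is known at generation time
    rng.shuffle(rows)
    return rows

easy = make_two_class_data(n_per_class=100, separation=20.0)
hard = make_two_class_data(n_per_class=100, separation=0.5)
print(len(easy), len(hard))  # 200 200
```

With a large separation the two classes barely overlap, while a small separation produces heavily overlapping clusters that are much harder to classify.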

Why Use Synthetic Data?

As machine learning frameworks like TensorFlow and PyTorch become easier to use and pre-designed models for computer vision and natural language processing become more ubiquitous and powerful, the primary problem that data scientists must face is the collection and handling of data. Companies often have difficulty acquiring large amounts of data to train an accurate model within a given time frame. Hand-labeling data is a costly, slow way to acquire data. However, generating and using synthetic data can help data scientists and companies overcome these hurdles and develop reliable machine learning models in a quicker fashion.

There are a number of advantages to using synthetic data. The most obvious way that the use of synthetic data benefits data science is that it reduces the need to capture data from real-world events, and for this reason it becomes possible to generate data and construct a dataset much more quickly than a dataset dependent on real-world events. This means that large volumes of data can be produced in a short timeframe. This is especially true for events that rarely occur, as if an event rarely happens in the wild, more data can be mocked up from some genuine data samples. Beyond that, the data can be automatically labeled as it is generated, drastically reducing the amount of time needed to label data.

Synthetic data can also be useful to gain training data for edge cases, which are instances that may occur infrequently but are critical for the success of your AI. Edge cases are events that are very similar to the primary target of an AI but differ in important ways. For instance, objects that are only partially in view could be considered edge cases when designing an image classifier.

Finally, synthetic datasets can minimize privacy concerns. Attempts to anonymize data can be ineffective, as even if sensitive/identifying variables are removed from the dataset, other variables can act as identifiers when they are combined. This isn’t an issue with synthetic data, as it was never based on a real person, or real event, in the first place.

Use Cases for Synthetic Data

Synthetic data has a wide variety of uses, as it can be applied to just about any machine learning task. Common use cases for synthetic data include self-driving vehicles, security, robotics, fraud protection, and healthcare.

One of the initial use cases for synthetic data was self-driving cars, as synthetic data is used to create training data for cars in conditions where getting real, on-the-road training data is difficult or dangerous. Synthetic data is also useful for the creation of data used to train image recognition systems, like surveillance systems, much more efficiently than manually collecting and labeling a bunch of training data. Robotics systems can be slow to train and develop with traditional data collection and training methods. Synthetic data allows robotics companies to test and engineer robotics systems through simulations. Fraud protection systems can benefit from synthetic data, and new fraud detection methods can be trained and tested with data that is constantly new when synthetic data is used. In the healthcare field, synthetic data can be used to design health classifiers that are accurate, yet preserve people’s privacy, as the data won’t be based on real people.

Synthetic Data Challenges

While the use of synthetic data brings many advantages with it, it also brings many challenges.

When synthetic data is created, it often lacks outliers. Outliers occur in data naturally, and while often dropped from training datasets, their existence may be necessary to train truly reliable machine learning models. Beyond this, the quality of synthetic data can be highly variable. Synthetic data is often generated from input, or seed, data, and therefore the quality of the synthetic data can be dependent on the quality of the input data. If the data used to generate the synthetic data is biased, the generated data can perpetuate that bias. Synthetic data also requires some form of output/quality control. It needs to be checked against human-annotated data, or otherwise authentic data in some form.

How Is Synthetic Data Created?

Synthetic data is created programmatically with machine learning techniques. Classical machine learning techniques like decision trees can be used, as can deep learning techniques. The requirements for the synthetic data will influence what type of algorithm is used to generate the data. Decision trees and similar machine learning models let companies create non-classical, multi-modal data distributions, trained on examples of real-world data. Generating data with these algorithms will provide data that is highly correlated with the original training data. For instances where the typical distribution of data is known, a company can generate synthetic data through use of a Monte Carlo method.

Deep learning-based methods of generating synthetic data typically make use of either a variational autoencoder (VAE) or a generative adversarial network (GAN). VAEs are unsupervised machine learning models that make use of encoders and decoders. The encoder portion of a VAE is responsible for compressing the data down into a simpler, compact version of the original dataset, which the decoder then analyzes and uses to generate a representation of the base data. A VAE is trained with the goal of having an optimal relationship between the input data and output, one where both input data and output data are extremely similar.

When it comes to GAN models, they are called “adversarial” networks due to the fact that GANs are actually two networks that compete with each other. The generator is responsible for generating synthetic data, while the second network (the discriminator) operates by comparing the generated data with a real dataset and tries to determine which data is fake. When the discriminator catches fake data, the generator is notified of this and it makes changes to try and get a new batch of data past the discriminator. In turn, the discriminator becomes better and better at detecting fakes. The two networks are trained against each other, with fakes becoming more lifelike all the time.

How Does Image Classification Work?

How can your phone determine what an object is just by taking a photo of it? How do social media websites automatically tag people in photos? This is accomplished through AI-powered image recognition and classification.

The recognition and classification of images is what enables many of the most impressive accomplishments of artificial intelligence. Yet how do computers learn to detect and classify images? In this article, we’ll cover the general methods that computers use to interpret and detect images and then take a look at some of the most popular methods of classifying those images.

Pixel-Level vs. Object-Based Classification

Image classification techniques can mainly be divided into two different categories: pixel-based classification and object-based classification.

Pixels are the base units of an image, and the analysis of pixels is the primary way that image classification is done. However, classification algorithms can either use just the spectral information within individual pixels to classify an image or examine spatial information (nearby pixels) along with the spectral information. Pixel-based classification methods utilize only spectral information (the intensity of a pixel), while object-based classification methods take into account both pixel spectral information and spatial information.

There are different classification techniques used for pixel-based classification. These include minimum-distance-to-mean, maximum-likelihood, and minimum-Mahalanobis-distance. These methods require that the means and variances of the classes are known, and they all operate by examining the “distance” between class means and the target pixels.
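The simplest of these, minimum-distance-to-mean, can be sketched briefly. The class names, the three-band pixel values, and the function names below are made up for illustration, and Euclidean distance is assumed as the distance measure; the sketch uses only class means, not variances.

```python
import math

def class_means(samples):
    """samples maps a class label to a list of training pixels (tuples of band values)."""
    return {
        label: [sum(band) / len(pixels) for band in zip(*pixels)]
        for label, pixels in samples.items()
    }

def classify_pixel(pixel, means):
    """Assign the pixel to the class whose mean is closest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

# Hypothetical training pixels with three spectral bands each.
training = {
    "water": [(10, 20, 80), (12, 22, 90)],
    "vegetation": [(30, 120, 40), (34, 110, 36)],
}
means = class_means(training)

print(classify_pixel((15, 30, 70), means))   # water
print(classify_pixel((28, 100, 45), means))  # vegetation
```

Each pixel is classified from its own values alone, which is exactly the limitation of pixel-based methods: no information from neighboring pixels enters the decision.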

Pixel-based classification methods are limited by the fact that they can’t use information from other nearby pixels. In contrast, object-based classification methods can include other pixels and therefore they also use spatial information to classify items. Note that “object” just refers to contiguous regions of pixels and not whether or not there is a target object within that region of pixels.

Preprocessing Image Data For Object Detection

The most recent and reliable image classification systems primarily use object-level classification schemes, and for these approaches image data must be prepared in specific ways. The objects/regions need to be selected and preprocessed.

Before an image, and the objects/regions within that image, can be classified, the data that comprises that image has to be interpreted by the computer. Images need to be preprocessed and readied for input into the classification algorithm, and this is done through object detection. This is a critical part of readying the data and preparing the images to train the machine learning classifier.

Object detection is done with a variety of methods and techniques. To begin with, whether or not there are multiple objects of interest or a single object of interest impacts how the image preprocessing is handled. If there is just one object of interest, the image undergoes image localization. The pixels that comprise the image have numerical values that are interpreted by the computer and used to display the proper colors and hues. An object known as a bounding box is drawn around the object of interest, which helps the computer know what part of the image is important and what pixel values define the object. If there are multiple objects of interest in the image, a technique called object detection is used to apply these bounding boxes to all the objects within the image.

Photo: Adrian Rosebrock via Wikimedia Commons, CC BY SA 4.0 (https://commons.wikimedia.org/wiki/File:Intersection_over_Union_-_object_detection_bounding_boxes.jpg)
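To make the bounding box idea concrete, here is a minimal sketch that treats an image as a grid of pixel values and uses a box to pull out just the pixels that define the object. The tiny grayscale image, the `(top, left, bottom, right)` box format, and the function name `crop_to_box` are assumptions chosen for this example.

```python
def crop_to_box(image, box):
    """Return the pixels inside a bounding box.

    image is a list of rows of pixel values; box is (top, left, bottom, right)
    with bottom and right exclusive. Both conventions are assumptions here.
    """
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

# A made-up 4x5 grayscale image with a bright object near the top left.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]

box = (1, 1, 3, 3)
patch = crop_to_box(image, box)
print(patch)  # [[9, 8], [7, 9]]
```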

Another method of preprocessing is image segmentation. Image segmentation functions by dividing the whole image into segments based on similar features. Pixels within one region of the image tend to have similar values that differ from those in other regions, so these pixels are grouped together into image masks that correspond to the shape and boundaries of the relevant objects within the image. Image segmentation helps the computer isolate the features of the image that will help it classify an object, much like bounding boxes do, but it provides much more accurate, pixel-level labels.
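One of the simplest ways to group pixels by similar values is thresholding, sketched below. This is only an illustration of what a pixel-level mask looks like; real segmentation systems use far more sophisticated methods, and the image, the threshold of 128, and the function name `threshold_mask` are assumptions for the example.

```python
def threshold_mask(image, threshold):
    """Mark each pixel 1 if it is at least as bright as the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]

# A made-up grayscale image containing a small L-shaped bright object.
image = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 11, 10],
    [12, 10, 11, 10],
]

mask = threshold_mask(image, 128)
object_pixels = sum(sum(row) for row in mask)

for row in mask:
    print(row)
print(object_pixels)  # 3
```

The mask marks exactly three object pixels, whereas the tightest bounding box around the same object would cover a 2x2 area of four pixels, one of which is background.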

After the object detection or image segmentation has been completed, labels are applied to the regions in question. These labels are fed, along with the values of the pixels comprising the object, into the machine learning algorithms that will learn patterns associated with the different labels.

Machine Learning Algorithms

Once the data has been prepared and labeled, the data is fed into a machine learning algorithm, which trains on the data. We’ll cover some of the most common kinds of machine learning image classification algorithms below.

K-Nearest Neighbors

K-Nearest Neighbors is a classification algorithm that examines the closest training examples and looks at their labels to ascertain the most probable label for a given test example. When it comes to image classification using KNN, the feature vectors and labels of the training images are stored and just the feature vector is passed into the algorithm during testing. The training and testing feature vectors are then compared against each other for similarity.

KNN-based classification algorithms are extremely simple and they deal with multiple classes quite easily. However, KNN calculates similarity based on all features equally. This means that it can be prone to misclassification when provided with images where only a subset of the features is important for the classification of the image.
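The sketch below shows the basic KNN procedure on a handful of made-up two-dimensional feature vectors: store the training vectors and labels, measure the distance from a test vector to each of them, and take a majority vote among the `k` closest. The data, the labels, the function name `knn_predict`, and the choice of Euclidean distance are all assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training vectors."""
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up feature vectors; a real image would have many more features.
train_vectors = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

prediction = knn_predict(train_vectors, train_labels, (1.1, 0.9), k=3)
print(prediction)  # cat
```

Because `math.dist` gives every feature the same weight, an irrelevant feature with a large range could dominate the distance, which is the weakness described above.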

Support Vector Machines

Support Vector Machines are a classification method that places points in space and then draws a dividing boundary (a line in two dimensions, a hyperplane in higher dimensions) between the points, placing objects in different classes depending on which side of the boundary the points fall on. Support Vector Machines are capable of doing nonlinear classification through the use of a technique known as the kernel trick. While SVM classifiers are often very accurate, a substantial drawback is that they scale poorly: training becomes slower as the size of the training set increases.
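The kernel trick can be illustrated without training a full SVM. In the sketch below, a degree-2 polynomial kernel computed directly on two-dimensional points gives the same number as a dot product taken after explicitly mapping the points into a three-dimensional feature space. The specific kernel, the sample points, and the function names are choices made for this example rather than anything specific to a particular SVM library.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: (x . y)^2, computed in the original 2D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit 3D feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, 0.5)
implicit = poly_kernel(x, y)                     # never leaves 2D
explicit = dot(feature_map(x), feature_map(y))   # works in the mapped 3D space

print(implicit, explicit)
```

The two values agree up to floating-point rounding. An SVM only ever needs these pairwise similarity values, so it can find a linear boundary in the higher-dimensional space, which is a nonlinear boundary in the original space, without computing the mapped coordinates.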

Multi-Layer Perceptrons (Neural Nets)

Multi-layer perceptrons, also called neural network models, are machine learning algorithms inspired by the human brain. Multi-layer perceptrons are composed of various layers that are joined together with each other, much like neurons in the human brain are linked together. Neural networks make assumptions about how the input features are related to the data’s classes and these assumptions are adjusted over the course of training. Simple neural network models like the multi-layer perceptron are capable of learning non-linear relationships, and as a result, they can be much more accurate than other models. However, MLP models suffer from some notable issues like the presence of non-convex loss functions, which means training can settle on a local minimum rather than the best possible solution.
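To show how layers let a network represent a non-linear relationship, the sketch below runs the forward pass of a tiny multi-layer perceptron that computes XOR, a function that no single straight-line boundary can separate. The weights are handpicked for illustration rather than learned, and the ReLU activation and the function name `mlp_xor` are assumptions for this example; in practice the weights would be adjusted during training.

```python
def relu(value):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, value)

def mlp_xor(x1, x2):
    """Forward pass of a 2-input, 2-hidden-unit, 1-output network.

    The weights and biases below are handpicked (not trained) so that the
    network reproduces XOR on 0/1 inputs.
    """
    # Hidden layer: two units, each a weighted sum followed by ReLU.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a weighted sum of the hidden activations.
    return 1.0 * h1 - 2.0 * h2

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [mlp_xor(a, b) for a, b in inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```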

Deep Learning Algorithms (CNNs)

Photo: APhex34 via Wikimedia Commons, CC BY SA 4.0 (https://commons.wikimedia.org/wiki/File:Typical_cnn.png)

The most commonly used image classification algorithm in recent times is the Convolutional Neural Network (CNN). CNNs are customized versions of neural networks that combine the multilayer neural networks with specialized layers that are capable of extracting the features most important and relevant to the classification of an object. CNNs can automatically discover, generate, and learn features of images. This greatly reduces the need to manually engineer features and segment images to prepare them for machine learning algorithms. They also have an advantage over MLP networks because their convolutional filters are reused across the whole image, so they need far fewer weights and can take advantage of the spatial layout of the pixels.

Convolutional Neural Networks get their name from the mathematical operation they apply, the convolution. CNNs operate by taking a filter and sliding it over an image. You can think of this as viewing sections of a landscape through a moveable window, concentrating on just the features that are viewable through the window at any one time. The filter contains numerical values which are multiplied with the values of the pixels under it, and the products are summed into a single number for each position of the filter. The result is a new frame, or matrix, full of numbers that represent the original image. This process is repeated for a chosen number of filters, and then the frames are stacked together into a new representation that is slightly smaller and less complex than the original image. A technique called pooling is used to keep just the most important values within each small region of the frame, and the goal is for the convolutional layers to eventually extract just the most salient parts of the image that will help the neural network recognize the objects in the image.
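The sliding-filter and pooling steps described above can be sketched in a few lines of plain Python. The 5x5 image, the 2x2 vertical-edge filter, the stride of 1 with no padding, and the function names `convolve2d` and `max_pool` are assumptions for this example. Like most deep learning libraries, the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    At each position, multiply overlapping values and sum the products.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Handpicked filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 frame
pooled = max_pool(feature_map)           # 2x2 frame

print(feature_map)
print(pooled)
```

The 4x4 frame is nonzero only in the column where the filter straddles the edge, and pooling shrinks it to a 2x2 frame that still records where the edge was found. In a real CNN the filter values are learned during training rather than chosen by hand.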

Convolutional Neural Networks are composed of two different parts: the convolutional layers and the neural network layers that follow them. The convolutional layers are what extract the features of the image and convert them into a format that the neural network layers can interpret and learn from. The early convolutional layers are responsible for extracting the most basic elements of the image, like simple lines and boundaries. The middle convolutional layers begin to capture more complex shapes, like simple curves and corners. The later, deeper convolutional layers extract the high-level features of the image, which are what is passed into the neural network portion of the CNN, and are what the classifier learns.
