Natural Language Processing (NLP) is the study and application of techniques and tools that enable computers to process, analyze, interpret, and reason about human language. NLP is an interdisciplinary field and it combines techniques established in fields like linguistics and computer science. These techniques are used in concert with AI to create chatbots and digital assistants like Google Assistant and Amazon’s Alexa.
Let’s take some time to explore the rationale behind Natural Language Processing, some of the techniques used in NLP, and some common uses cases for NLP.
Why Is Natural Language Processing Important?
In order for computers to interpret human language, they must be converted into a form that a computer can manipulate. However, this isn’t as simple as converting text data into numbers. In order to derive meaning from human language, patterns have to be extracted from the hundreds or thousands of words that make up a text document. This is no easy task. There are few hard and fast rules that can be applied to the interpretation of human language. For instance, the exact same set of words can mean different things depending on the context. Human language is a complex and often ambiguous thing, and a statement can be uttered with sincerity or sarcasm.
Despite this, there are some general guidelines that can be used when interpreting words and characters, such as the character “s” being used to denote that an item is plural. These general guidelines have to be used in concert with each other to extract meaning from the text, to create features that a machine learning algorithm can interpret.
Natural Language Processing involves the application of various algorithms capable of taking unstructured data and converting it into structured data. If these algorithms are applied in the wrong manner, the computer will often fail to derive the correct meaning from the text. This can often be seen in the translation of text between languages, where the precise meaning of the sentence is often lost. While machine translation has improved substantially over the past few years, machine translation errors still occur frequently.
Natural Language Processing Techniques
Many of the techniques that are used in natural language processing can be placed in one of two categories: syntax or semantics. Syntax techniques are those that deal with the ordering of words, while semantic techniques are the techniques that involve the meaning of words.
Syntax NLP Techniques
Examples of syntax include:
- Morphological Segmentation
- Part-of-Speech Tagging
- Sentence Breaking
- Word Segmentation
Lemmatization refers to distilling the different inflections of a word down to a single form. Lemmatization takes things like tenses and plurals and simplifies them, for example, “feet” might become “foot” and “stripes” may become “stripe”. This simplified word form makes it easier for an algorithm to interpret the words in a document.
Morphological segmentation is the process of dividing words into morphemes or the base units of a word. These units are things like free morphemes (which can stand alone as words) and prefixes or suffixes.
Part-of-speech tagging is simply the process of identifying which part of speech every word in an input document is.
Parsing refers to analyzing all the words in a sentence and correlating them with their formal grammar labels or doing grammatical analysis for all the words.
Sentence breaking, or sentence boundary segmentation, refers to deciding where a sentence begins and ends.
Stemming is the process of reducing words down to the root form of the word. For instance, connected, connection, and connections would all be stemmed to “connect”.
Word Segmentation is the process of dividing large pieces of text down into small units, which can be words or stemmed/lemmatized units.
Semantic NLP Techniques
Semantic NLP techniques include techniques like:
- Named Entity Recognition
- Natural Language Generation
- Word-Sense disambiguation
Named entity recognition involves tagging certain text portions that can be placed into one of a number of different preset groups. Pre-defined categories include things like dates, cities, places, companies, and individuals.
Natural language generation is the process of using databases to transform structured data into natural language. For instance, statistics about the weather, like temperature and wind speed could be summarized with natural language.
Word-sense disambiguation is the process of assigning meaning to words within a text based on the context the words appear in.
Deep Learning Models For Natural Language Processing
Regular multilayer perceptrons are unable to handle the interpretation of sequential data, where the order of the information is important. In order to deal with the importance of order in sequential data, a type of neural network is used that preserves information from previous timesteps in the training.
Recurrent Neural Networks are types of neural networks that loop over data from previous timesteps, taking them into account when calculating the weights of the current timestep. Essentially, RNN’s have three parameters that are used during the forward training pass: a matrix based on the Previous Hidden State, a matrix based on the Current Input, and a matrix that is between the hidden state and the output. Because RNNs can take information from previous timesteps into account, they can extract relevant patterns from text data by taking earlier words in the sentence into account when interpreting the meaning of a word.
Another type of deep learning architecture used to process text data is a Long Short-Term Memory (LSTM) network. LSTM networks are similar to RNNs in structure, but owing to some differences in their architecture they tend to perform better than RNNs. They avoid a specific problem that often occurs when using RNNs called the exploding gradient problem.
These deep neural networks can be either unidirectional or bi-directional. Bi-directional networks are capable of taking not just the words that come prior to the current word into account, but the words that come after it. While this leads to higher accuracy, it is more computationally expensive.
Use Cases For Natural Language Processing
Because Natural Language Processing involves the analysis and manipulation of human languages, it has an incredibly wide range of applications. Possible applications for NLP include chatbots, digital assistants, sentiment analysis, document organization, talent recruitment, and healthcare.
Chatbots and digital assistants like Amazon’s Alexa and Google Assistant are examples of voice recognition and synthesis platforms that use NLP to interpret and respond to vocal commands. These digital assistants help people with a wide variety of tasks, letting them offload some of their cognitive tasks to another device and free up some of their brainpower for other, more important things. Instead of looking up the best route to the bank on a busy morning, we can just have our digital assistant do it.
Sentiment analysis is the use of NLP techniques to study people’s reactions and feelings to a phenomenon, as communicated by their use of language. Capturing the sentiment of a statement, like interpreting whether a review of a product is good or bad, can provide companies with substantial information regarding how their product is being received.
Automatically organizing text documents is another application of NLP. Companies like Google and Yahoo use NLP algorithms to classify email documents, putting them in the appropriate bins such as “social” or “promotions”. They also use these techniques to identify spam and prevent it from reaching your inbox.
Groups have also developed NLP techniques are being used to identify potential job hires, finding them based on relevant skills. Hiring managers are also using NLP techniques to help them sort through lists of applicants.
NLP techniques are also being used to enhance healthcare. NLP can be used to improve the detection of diseases. Health records can be analyzed and symptoms extracted by NLP algorithms, which can then be used to suggest possible diagnoses. One example of this is Amazon’s Comprehend Medical platform, which analyzes health records and extracts diseases and treatments. Healthcare applications of NLP also extend to mental health. There are apps such as WoeBot, which talks users through a variety of anxiety management techniques based in Cognitive Behavioral Therapy.
To Learn More
|Recommended Natural Language Processing Courses||Offered By||Duration||Difficulty|
Deep Learning AI
Higher School of Economics
What is Computer Vision?
Computer vision algorithms are one of the most transformative and powerful AI systems in the world, at the moment. Computer vision systems see use in autonomous vehicles, robot navigation, facial recognition systems, and more. However, what are computer vision algorithms exactly? How do they work? In order to answer these questions, we’ll dive deep into the theory behind computer vision, computer vision algorithms, and applications for computer vision systems.
How Do Computer Vision Sytems Work?
In order to fully appreciate how computer vision systems work, let’s first take a moment to discuss how humans recognize objects. The best explanation neuropsychology has for how we recognize objects is a model that describes the initial phase of object recognition as one where the basic components of objects, such as form, color, and depth are interpreted by the brain first. The signals from the eye that enter the brain are analyzed to pull out the edges of an object first, and these edges are joined together into a more complex representation that complete’s the object’s form.
Computer vision systems operate very similarly to the human visual system, by first discerning the edges of an object and then joining these edges together into the object’s form. The big difference is that because computers interpret images as numbers, a computer vision system needs some way to interpret the individual pixels that comprise the image. The computer vision system will assign values to the pixels in the image and by examining the difference in values between one region of pixels and another region of pixels, the computer can discern edges. For instance, if the image in question is greyscale, then the values will range from black (represented by 0) to white (represented by 255). A sudden change in the range of values of pixels near each other will indicate an edge.
This basic principle of comparing pixel values can also be done with colored images, with the computer comparing differences between the different RGB color channels. So know that we know how a computer vision system examines pixel values to interpret an image, let’s take a look at the architecture of a computer vision system.
Convolutional Neural Networks
The primary type of AI used in computer vision tasks is one based on convolutional neural networks. What’s a convolution exactly?
Convolutions are mathematical processes the network uses to determine the difference in values between pixels. If you envision a grid of pixel values, picture a smaller grid being moved over this main grid. The values underneath the second grid are being analyzed by the network, so the network is only examining a handful of pixels at a time. This is often called the “sliding windows” technique. The values being analyzed by the sliding window are summarized by the network, which helps reduce the complexity of the image and make it easier for the network to extract patterns.
Convolutional neural networks are divided into two different sections, the convolutional section and the fully connected section. The convolutional layers of the network are the feature extractors, whose job is to analyze the pixels within the image and form representations of them that the densely connected layers of the neural network can learn patterns from. The convolutional layers start by just examining the pixels and extracting the low-level features of the image like edges. Later convolutional layers join the edges together into more complex shapes. By the end, the network will hopefully have a representation of the edges and details of the image that it can pass to the fully connected layers.
While a convolutional neural network can extract patterns from images by itself, the accuracy of the computer vision system can be greatly improved by annotating the images. Image annotation is the process of adding metadata to the image that assists the classifier in detecting important objects in the image. The use of image annotation is important whenever computer vision systems need to be highly accurate, such as when controlling an autonomous vehicle or robot.
There are various ways that images can be annotated to improve the performance of a computer vision classifier. Image annotation is often done with bounding boxes, a box that surrounds the edges of the target object and tells the computer to focus its attention within the box. Semantic segmentation is another type of image annotation, which operates by assigning an image class to every pixel in an image. In other words, every pixel that could be considered “grass” or “trees” will be labeled as belonging to those classes. The technique provides pixel-level precision, but creating semantic segmentation annotations is more complex and time-consuming than creating simple bounding boxes. Other annotation methods, like lines and points, also exist.
What Are Neural Networks?
Many of the biggest advances in AI are driven by artificial neural networks. Artificial Neural Networks (ANNs) are the connection of mathematical functions joined together in a format inspired by the neural networks found in the human brain. These ANNs are capable of extracting complex patterns from data, applying these patterns to unseen data to classify/recognize the data. In this way, the machine “learns”. That’s a quick rundown on neural networks, but let’s take a closer look at neural networks to better understand what they are and how they operate.
Understanding The Multi-layer Perceptron
Before we look at more complex neural networks, we’re going to take a moment to look at a simple version of an ANN, a Multi-Layer Perceptron (MLP).
Imagine an assembly line at a factory. On this assembly line, one worker receives an item, makes some adjustments to it, and then passes it on to the next worker in the line who does the same. This process continues until the last worker in the line puts the finishing touches on the item and puts it on a belt that will take it out of the factory. In this analogy, there are multiple “layers” to the assembly line, and products move between layers as they move from worker to worker. The assembly line also has an entry point and an exit point.
A Multi-Layer Perceptron can be thought of as a very simple production line, made out of three layers total: an input layer, a hidden layer, and an output layer. The input layer is where the data is fed into the MLP, and in the hidden layer some number of “workers” handle the data before passing it onto the output layer which gives the product to the outside world. In the instance of an MLP, these workers are called “neurons” (or sometimes nodes) and when they handle the data they manipulate it through a series of mathematical functions.
Within the network, there are structures connecting node to node called “weights”. Weights are an assumption about how data points are related as they move through the network. To put that another way, weights reflect the level of influence that one neuron has over another neuron. The weights pass through an “activation function” as they leave the current node, which is a type of mathematical function that transforms the data. They transform linear data into non-linear representations, which enables the network to analyze complex patterns.
The analogy to the human brain implied by “artificial neural network” comes from the fact that the neurons which make up the human brain are joined together in a similar fashion to how nodes in an ANN are linked.
While multi-layer perceptrons have existed since the 1940s, there were a number of limitations that prevented them from being especially useful. However, over the course of the past couple of decades, a technique called “backpropagation” was created that allowed networks to adjust the weights of the neurons and thereby learn much more effectively. Backpropagation changes the weights in the neural network, allowing the network to better capture the actual patterns within the data.
Deep Neural Networks
Deep neural networks take the basic form of the MLP and make it larger by adding more hidden layers in the middle of the model. So instead of there being an input layer, a hidden layer, and an output layer, there are many hidden layers in the middle and the outputs of one hidden layer become the inputs for the next hidden layer until the data has made it all the way through the network and been returned.
The multiple hidden layers of a deep neural network are able to interpret more complex patterns than the traditional multilayer perceptron. Different layers of the deep neural network learn the patterns of different parts of the data. For instance, if the input data consists of images, the first portion of the network might interpret the brightness or darkness of pixels while the later layers will pick out shapes and edges that can be used to recognize objects in the image.
Different Types Of Neural Networks
There are various types of neural networks, and each of the various neural network types has its own advantages and disadvantages (and therefore their own use cases). The type of deep neural network described above is the most common type of neural network, and it is often referred to as a feedforward neural network.
One variation on neural networks is the Recurrent Neural Network (RNN). In the case of Recurrent Neural Networks, looping mechanisms are used to hold information from previous states of analysis, meaning that they can interpret data where the order matters. RNNs are useful in deriving patterns from sequential/chronological data. Recurrent Neural Networks can be either unidirectional or bidirectional. In the case of a bi-directional neural network, the network can take information from later in the sequence as well as earlier portions of the sequence. Since the bi-directional RNN takes more information into account, it’s better able to draw the right patterns from the data.
A Convolutional Neural Network is a special type of neural network that is adept at interpreting the patterns found within images. A CNN operates by passing a filter over the pixels of the image and achieving a numerical representation of the pixels within the image, which it can then analyze for patterns. A CNN is structured so that the convolutional layers which pull the pixels out of the image come first, and then the densely connected feed-forward layers come, those that will actually learn to recognize objects, come after this.
Supervised vs Unsupervised Learning
In machine learning, most tasks can be easily categorized into one of two different classes: supervised learning problems or unsupervised learning problems. In supervised learning, data has labels or classes appended to it, while in the case of unsupervised learning the data is unlabeled. Let’s take a close look at why this distinction is important and look at some of the algorithms associated with each type of learning.
Supervised Vs. Unsupervised Learning
Most machine learning tasks are in the domain of supervised learning. In supervised learning algorithms, the individual instances/data points in the dataset have a class or label assigned to them. This means that the machine learning model can learn to distinguish which features are correlated with a given class and that the machine learning engineer can check the model’s performance by seeing how many instances were properly classified. Classification algorithms can be used to discern many complex patterns, as long as the data is labeled with the proper classes. For instance, a machine-learning algorithm can learn to distinguish different animals from each other based off of characteristics like “whiskers”, “tail”, “claws”, etc.
In contrast to supervised learning, unsupervised learning involves creating a model that is able to extract patterns from unlabeled data. In other words, the computer analyzes the input features and determines for itself what the most important features and patterns are. Unsupervised learning tries to find the inherent similarities between different instances. If a supervised learning algorithm aims to place data points into known classes, unsupervised learning algorithms will examine the features common to the object instances and place them into groups based on these features, essentially creating its own classes.
Examples of supervised learning algorithms are Linear Regression, Logistic Regression, K-nearest Neighbors, Decision Trees, and Support Vector Machines.
Meanwhile, some examples of unsupervised learning algorithms are Principal Component Analysis and K-Means Clustering.
Supervised Learning Algorithm Examples
Linear Regression is an algorithm that takes two features and plots out the relationship between them. Linear Regression is used to predict numerical values in relation to other numerical variables. Linear Regression has the equation of Y = a +bX, where b is the line’s slope and a is where y crosses the X-axis.
Logistic Regression is a binary classification algorithm. The algorithm examines the relationship between numerical features and finds the probability that the instance can be classified into one of two different classes. The probability values are “squeezed” towards either 0 or 1. In other words, strong probabilities will approach 0.99 while weak probabilities will approach 0.
K-Nearest Neighbors assigns a class to new data points based on the assigned classes of some chosen amount of neighbors in the training set. The number of neighbors considered by the algorithm is important, and too few or too many neighbors can misclassify points.
Decision Trees are a type of classification and regression algorithm. A decision tree operates by splitting up a dataset down into smaller and smaller portions until the subsets can’t be split any further and what results is a tree with nodes and leaves. The nodes are where decisions about data points are made using different filtering criteria, while the leaves are the instances that have been assigned some label (a data point that has been classified). Decision tree algorithms are capable of handling both numerical and categorical data. Splits are made in the tree on specific variables/features.
Support Vector Machines are a classification algorithm that operates by drawing hyperplanes, or lines of separation, between data points. Data points are separated into classes based upon which side of the hyperplane they are on. Multiple hyperplanes can be drawn across a plane, diving a dataset into multiple classes. The classifier will try to maximize the distance between the diving hyperplane and the points on either side of the plane, and the greater the distance between the line and the points, the more confident the classifier is.
Unsupervised Learning Algorithms
Principal Component Analysis is a technique used for dimensionality reduction, meaning that the dimensionality or complexity of the data is represented in a simpler fashion. The Principal Component Analysis algorithm finds new dimensions for the data that are orthogonal. While the dimensionality of the data is reduced, the variance between the data should be preserved as much as possible. What this means in practical terms is that it takes the features in the dataset and distills them down into fewer features that represent most of the data.
K-Means Clustering is an algorithm that automatically groups data points into clusters based on similar features. The patterns within the dataset are analyzed and the datapoints split into groups based on these patterns. Essentially, K-means creates its own classes out of unlabeled data. The K-Means algorithm operates by assigning centers to the clusters, or centroids, and moving the centroids until the optimal position for the centroids is found. The optimal position will be one where the distance between the centroids to the surrounding data points within the class is minimized. The “K” in K-means clustering refers to how many centroids have been chosen.
To close, let’s quickly go over the key differences between supervised and unsupervised learning.
As we previously discussed, in supervised learning tasks the input data is labeled and the number of classes are known. Meanwhile, input data is unlabeled and the number of classes not known in unsupervised learning cases. Unsupervised learning tends to be less computationally complex, whereas supervised learning tends to be more computationally complex. While supervised learning results tend to be highly accurate, unsupervised learning results tend to be less accurate/moderately accurate.
To Learn More
|Recommended Machine Learning Courses||Offered By||Duration||Difficulty|
University of Washingotn