Deep learning is one of the most influential and fastest growing fields in artificial intelligence. However, getting an intuitive understanding of deep learning can be difficult because the term deep learning covers a variety of different algorithms and techniques. Deep learning is also a subdiscipline of machine learning in general, so it’s important to understand what machine learning is in order to understand deep learning.
Deep learning is an extension of some of the concepts originating from machine learning, so for that reason, let’s take a minute to explain what machine learning is.
Put simply, machine learning is a method of enabling computers to carry out specific tasks without explicitly coding every line of the algorithms used to accomplish those tasks. There are many different machine learning algorithms, but one of the most commonly used algorithms is a multilayer perceptron. A multilayer perceptron is also referred to as a neural network, and it is comprised of a series of nodes/neurons linked together. There are three different layers in a multilayer perceptron: the input layer, the hidden layer, and the output layer.
The input layer takes the data into the network, where it is manipulated by the nodes in the middle/hidden layer. The nodes in the hidden layer are mathematical functions that can manipulate the data coming from the input layer, extracting relevant patterns from the input data. This is how the neural network “learns”. Neural networks get their name from the fact that they are inspired by the structure and function of the human brain.
The connections between nodes in the network have values called weights. These values are essentially assumptions about how the data in one layer is related to the data in the next layer. As the network trains the weights are adjusted, and the goal is that the weights/assumptions about the data will eventually converge on values that accurately represent the meaningful patterns within the data.
Activation functions are present in the nodes of the network, and these activation functions transform the data in a non-linear fashion, enabling the network to learn complex representations of the data. Activation functions multiply the input values by the weight values and add a bias term.
Defining Deep Learning
Deep learning is the term given to machine learning architectures that join many multilayer perceptrons together, so that there isn’t just one hidden layer but many hidden layers. The “deeper” that the deep neural network is, the more sophisticated patterns the network can learn.
The deep layer networks comprised of neurons are sometimes referred to as fully connected networks or fully connected layers, referencing the fact that a given neuron maintains a connection to all the neurons surrounding it. Fully connected networks can be combined with other machine learning functions to create different deep learning architectures.
Different Deep Learning Architectures
There are a variety of deep learning architectures used by researchers and engineers, and each of the different architectures has its own specialty use case.
Convolutional Neural Networks
Convolutional neural networks, or CNNs, are the neural network architecture commonly used in the creation of computer vision systems. The structure of convolutional neural networks enables them to interpret image data, converting them into numbers that a fully connected network can interpret. A CNN has four major components:
- Convolutional layers
- Subsampling/pooling layers
- Activation functions
- Fully connected layers
The convolutional layers are what takes in the images as inputs into the network, analyzing the images and getting the values of the pixels. Subsampling or pooling is where the image values are converted/reduced to simplify the representation of the images and reduce the sensitivity of the image filters to noise. The activation functions control how the data flows from one layer to the next layer, and the fully connected layers are what analyze the values that represent the image and learn the patterns held in those values.
Recurrent neural networks, or RNNs, are popular for tasks where the order of the data matters, where the network must learn about a sequence of data. RNNs are commonly applied to problems like natural language processing, as the order of words matters when decoding the meaning of a sentence. The “recurrent” part of the term Recurrent Neural Network comes from the fact that the output for a given element in a sequence in dependant on the previous computation as well as the current computation. Unlike other forms of deep neural networks, RNNs have “memories”, and the information calculated at the different time steps in the sequence is used to calculate the final values.
There are multiple types of RNNs, including bidirectional RNNs, which take future items in the sequence into account, in addition to the previous items, when calculating an item’s value. Another type of RNN is a Long Short-Term Memory, or LSTM, network. LSTMs are types of RNN that can handle long chains of data. Regular RNNs may fall victim to something called the “exploding gradient problem”. This issue occurs when the chain of input data becomes extremely long, but LSTMs have techniques to combat this problem.
Most of the deep learning architectures mentioned so far are applied to supervised learning problems, rather than unsupervised learning tasks. Autoencoders are able to transform unsupervised data into a supervised format, allowing neural networks to be used on the problem.
Autoencoders are frequently used to detect anomalies in datasets, an example of unsupervised learning as the nature of the anomaly isn’t known. Such examples of anomaly detection include fraud detection for financial institutions. In this context, the purpose of an autoencoder is to determine a baseline of regular patterns in the data and identify anomalies or outliers.
The structure of an autoencoder is often symmetrical, with hidden layers arrayed such that the output of the network resembles the input. The four types of autoencoders that see frequent use are:
- Regular/plain autoencoders
- Multilayer encoders
- Convolutional encoders
- Regularized encoders
Regular/plain autoencoders are just neural nets with a single hidden layer, while multilayer autoencoders are deep networks with more than one hidden layer. Convolutional autoencoders use convolutional layers instead of, or in addition to, fully-connected layers. Regularized autoencoders use a specific kind of loss function that lets the neural network carry out more complex functions, functions other than just copying inputs to outputs.
Generative Adversarial Networks
Generative Adversarial Networks (GANs) are actually multiple deep neural networks instead of just one network. Two deep learning models are trained at the same time, and their outputs are fed to the other network. The networks are in competition with each other, and since they get access to each other’s output data, they both learn from this data and improve. The two networks are essentially playing a game of counterfeit and detection, where the generative model tries to create new instances that will fool the detective model/the discriminator. GANs have become popular in the field of computer vision.
Deep learning extends the principles of neural networks to create sophisticated models that can learn complex patterns and generalize those patterns to future datasets. Convolutional neural networks are used to interpret images, while RNNs/LSTMs are used to interpret sequential data. Autoencoders can transform unsupervised learning tasks into supervised learning tasks. Finally, GANs are multiple networks pitted against each other that are especially useful for computer vision tasks.
To Learn More
|Recommended Deep Learning Courses||Offered By||Duration||Difficulty|
Deep Learning AI
Deep Learning AI
New AI Powered Tool Enables Video Editing From Themed Text Documents
A team of computer science researchers from Tsinghua and Beihand University in China, IDC Herzilya in Israel, and Harvard University have recently created a tool that generates edited videos based on a text description and a repository of video clips.
Massive amounts of video footage are recorded every day by professional videographers, hobbyists, and regular people. Yet editing this video down into a presentation that makes sense is still a costly time investment, often requiring the use of complex editing tools that can manipulate raw footage. The international team of researchers recently developed a tool that takes themed text descriptions and generates videos based on them. The tool is capable of examining video clips in a repository and selecting the clips that correspond with the input text describing the storyline. The goal is that the tool is user-friendly and powerful enough to produce quality videos without the need for extensive video editing skills or expensive video editing software.
While current video editing platforms require knowledge of video editing techniques, the tool created by the researchers lets novice video creates create compositions that tells stories in a more natural, intuitive fashion. “Write-A-Video”, as it is dubbed by its creators, lets users edit videos by just editing the text that accompanies the video. If a user deletes text, adds text, or moves sentences around, these changes will be reflected in the video. Corresponding shots will be cut or added as the user manipulates the text and the final resulting video tailored to the user’s description.
Ariel Shamir, the Dean of the Efi Arazi School of Computer Science at IDC Herzliya explained that the Write-A-Video tool lets the user interact with the video mainly through text, using natural language processing techniques to match video shots based on the provided semantic meaning. An optimization algorithm is then used to assemble the video by cutting and swapping shots. The tool allows users to experiment with different visual styles as well, tweaking how scenes are presented by using specific film idioms that will speed up or slow down the action, or make more/fewer cuts.
The program selects possible shots based on their aesthetic appeal. The program considers how shots are framed, focused, and light in order to determine the aesthetic appeal. The tool will select shots that are better focused, instead of blurry or unstable, and it will also prioritize shots that are well lit. According to the creators of Write-A-Video, the user can render the generated video at any point and preview it with a voice-over narration that describes the text used to select the clips.
According to the research team, their experiment demonstrated that digital techniques that combine aspects of computer vision and natural language processing can assist users in creative processes like the editing of videos.
“Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” explained Shamir to TechXplore.
The researchers tested their tool out on different video repositories combined with themed text documents. User studies and quantitative evaluation was performed to interpret the results of the experiment. The results of the user studies found that non-professionals could sometimes produce high quality edited videos using the tool faster than professionals using frame-based editing software could. As reported by TechXplore, the team will be presenting their work in a few days at the ACM SIGGRAPH Asia conference held in Australia. Other entities are also using AI to augment video editing. Adobe has also been working on its own AI-powered extensions for Premiere Pro, its editing platform. The tool helps people ensure that changes in aspect ratio don’t cut out important pieces of video.
Structured vs Unstructured Data
Unstructured data is data that isn’t organized in a pre-defined fashion or lacks a specific data model. Meanwhile, structured data is data that has clear, definable relationships between the data points, with a pre-defined model containing it. That’s the short answer on the difference between structured and unstructured data, but let’s take a closer look at the differences between the two types of data.
When it comes to computer science, data structures refer to specific ways of storing and organizing data. Different data structures possess different relationships between data points, but data can also be unstructured. What does it mean to say that data is structured? To make this definition clearer, let’s take a look at some of the various ways of structuring data.
Structured data is often held in tables such as Excel files or SQL databases. In these cases, the rows and columns of the data hold different variables or features, and it is often possible to discern the relationship between data points by checking to see where data rows and columns intersect. Structured data can easily be fit into a relational database, and examples of different features in a structured dataset can include items like names, addresses, dates, weather statistics, credit card numbers, etc. While structured data is most often text data, it is possible to store things like images and audio as structured data as well.
Common sources of structured data include things like data collected from sensors, weblogs, network data, and retail or e-commerce data. Structured data can also be generated by people filling in spreadsheets or databases with data collected from computers and other devices. For instance, data collected through online forms is often immediately fed into a data structure.
Structured data has a long history of being stored in relational databases and SQL. These storage methods are popular because of the ease of reading and writing in these formats, with most platforms and languages being able to interpret these data formats.
In a machine learning context, structured data is easier to train a machine learning system on, because the patterns within the data are more explicit. Certain features can be fed into a machine learning classifier and used to label other data instances based on those selected features. In contrast, training a machine learning system on unstructured data tends to be more difficult, for reasons that will become clear.
Unstructured data is data that isn’t organized according to a pre-defined data model or structure. Unstructured data is often called qualitative data because it can’t be analyzed or processed in traditional ways using the regular methods used for structured data.
Because unstructured data doesn’t have any defined relationships between data points, it can’t be organized in relational databases. In contrast, the way unstructured data is stored is typically with a NoSQL database, or a non-relational database. If the structure of the database is of little concern, a data lake, or a large pool of unstructured data, can be used to store the data instead of a NoSQL database.
Unstructured data is difficult to analyze, and making sense of unstructured data often involves examining individual pieces of data to discern potential features and then looking to see if those features occur in other pieces of data within the pool.
The vast majority of data is in unstructured formats, with estimates that unstructured data comprises around 80% of all data. Data mining techniques can be used to help structure data.
In terms of machine learning, certain techniques can help order unstructured data and turn it into structured data. A popular tool for turning unstructured data into structured data is a system called an autoencoder.
What is Natural Language Processing?
Natural Language Processing (NLP) is the study and application of techniques and tools that enable computers to process, analyze, interpret, and reason about human language. NLP is an interdisciplinary field and it combines techniques established in fields like linguistics and computer science. These techniques are used in concert with AI to create chatbots and digital assistants like Google Assistant and Amazon’s Alexa.
Let’s take some time to explore the rationale behind Natural Language Processing, some of the techniques used in NLP, and some common uses cases for NLP.
Why Is Natural Language Processing Important?
In order for computers to interpret human language, they must be converted into a form that a computer can manipulate. However, this isn’t as simple as converting text data into numbers. In order to derive meaning from human language, patterns have to be extracted from the hundreds or thousands of words that make up a text document. This is no easy task. There are few hard and fast rules that can be applied to the interpretation of human language. For instance, the exact same set of words can mean different things depending on the context. Human language is a complex and often ambiguous thing, and a statement can be uttered with sincerity or sarcasm.
Despite this, there are some general guidelines that can be used when interpreting words and characters, such as the character “s” being used to denote that an item is plural. These general guidelines have to be used in concert with each other to extract meaning from the text, to create features that a machine learning algorithm can interpret.
Natural Language Processing involves the application of various algorithms capable of taking unstructured data and converting it into structured data. If these algorithms are applied in the wrong manner, the computer will often fail to derive the correct meaning from the text. This can often be seen in the translation of text between languages, where the precise meaning of the sentence is often lost. While machine translation has improved substantially over the past few years, machine translation errors still occur frequently.
Natural Language Processing Techniques
Many of the techniques that are used in natural language processing can be placed in one of two categories: syntax or semantics. Syntax techniques are those that deal with the ordering of words, while semantic techniques are the techniques that involve the meaning of words.
Syntax NLP Techniques
Examples of syntax include:
- Morphological Segmentation
- Part-of-Speech Tagging
- Sentence Breaking
- Word Segmentation
Lemmatization refers to distilling the different inflections of a word down to a single form. Lemmatization takes things like tenses and plurals and simplifies them, for example, “feet” might become “foot” and “stripes” may become “stripe”. This simplified word form makes it easier for an algorithm to interpret the words in a document.
Morphological segmentation is the process of dividing words into morphemes or the base units of a word. These units are things like free morphemes (which can stand alone as words) and prefixes or suffixes.
Part-of-speech tagging is simply the process of identifying which part of speech every word in an input document is.
Parsing refers to analyzing all the words in a sentence and correlating them with their formal grammar labels or doing grammatical analysis for all the words.
Sentence breaking, or sentence boundary segmentation, refers to deciding where a sentence begins and ends.
Stemming is the process of reducing words down to the root form of the word. For instance, connected, connection, and connections would all be stemmed to “connect”.
Word Segmentation is the process of dividing large pieces of text down into small units, which can be words or stemmed/lemmatized units.
Semantic NLP Techniques
Semantic NLP techniques include techniques like:
- Named Entity Recognition
- Natural Language Generation
- Word-Sense disambiguation
Named entity recognition involves tagging certain text portions that can be placed into one of a number of different preset groups. Pre-defined categories include things like dates, cities, places, companies, and individuals.
Natural language generation is the process of using databases to transform structured data into natural language. For instance, statistics about the weather, like temperature and wind speed could be summarized with natural language.
Word-sense disambiguation is the process of assigning meaning to words within a text based on the context the words appear in.
Deep Learning Models For Natural Language Processing
Regular multilayer perceptrons are unable to handle the interpretation of sequential data, where the order of the information is important. In order to deal with the importance of order in sequential data, a type of neural network is used that preserves information from previous timesteps in the training.
Recurrent Neural Networks are types of neural networks that loop over data from previous timesteps, taking them into account when calculating the weights of the current timestep. Essentially, RNN’s have three parameters that are used during the forward training pass: a matrix based on the Previous Hidden State, a matrix based on the Current Input, and a matrix that is between the hidden state and the output. Because RNNs can take information from previous timesteps into account, they can extract relevant patterns from text data by taking earlier words in the sentence into account when interpreting the meaning of a word.
Another type of deep learning architecture used to process text data is a Long Short-Term Memory (LSTM) network. LSTM networks are similar to RNNs in structure, but owing to some differences in their architecture they tend to perform better than RNNs. They avoid a specific problem that often occurs when using RNNs called the exploding gradient problem.
These deep neural networks can be either unidirectional or bi-directional. Bi-directional networks are capable of taking not just the words that come prior to the current word into account, but the words that come after it. While this leads to higher accuracy, it is more computationally expensive.
Use Cases For Natural Language Processing
Because Natural Language Processing involves the analysis and manipulation of human languages, it has an incredibly wide range of applications. Possible applications for NLP include chatbots, digital assistants, sentiment analysis, document organization, talent recruitment, and healthcare.
Chatbots and digital assistants like Amazon’s Alexa and Google Assistant are examples of voice recognition and synthesis platforms that use NLP to interpret and respond to vocal commands. These digital assistants help people with a wide variety of tasks, letting them offload some of their cognitive tasks to another device and free up some of their brainpower for other, more important things. Instead of looking up the best route to the bank on a busy morning, we can just have our digital assistant do it.
Sentiment analysis is the use of NLP techniques to study people’s reactions and feelings to a phenomenon, as communicated by their use of language. Capturing the sentiment of a statement, like interpreting whether a review of a product is good or bad, can provide companies with substantial information regarding how their product is being received.
Automatically organizing text documents is another application of NLP. Companies like Google and Yahoo use NLP algorithms to classify email documents, putting them in the appropriate bins such as “social” or “promotions”. They also use these techniques to identify spam and prevent it from reaching your inbox.
Groups have also developed NLP techniques are being used to identify potential job hires, finding them based on relevant skills. Hiring managers are also using NLP techniques to help them sort through lists of applicants.
NLP techniques are also being used to enhance healthcare. NLP can be used to improve the detection of diseases. Health records can be analyzed and symptoms extracted by NLP algorithms, which can then be used to suggest possible diagnoses. One example of this is Amazon’s Comprehend Medical platform, which analyzes health records and extracts diseases and treatments. Healthcare applications of NLP also extend to mental health. There are apps such as WoeBot, which talks users through a variety of anxiety management techniques based in Cognitive Behavioral Therapy.
To Learn More
|Recommended Natural Language Processing Courses||Offered By||Duration||Difficulty|
Deep Learning AI
Higher School of Economics
- AI System Automatically Transforms To Evade Censorship Attempts
- Optical Switch Can Reroute Light Between Chips Extremely Fast
- New AI Powered Tool Enables Video Editing From Themed Text Documents
- How we can use Deep Learning with Small Data? – Thought Leaders
- A New AI System Could Create More Hope For People With Epilepsy