Put simply, reinforcement learning is a machine learning technique that involves training an artificial intelligence agent through the repetition of actions and associated rewards. A reinforcement learning agent experiments in an environment, taking actions and being rewarded when the correct actions are taken. Over time, the agent learns to take the actions that will maximize its reward. That’s a quick definition of reinforcement learning, but taking a closer look at the concepts behind reinforcement learning will help you gain a better, more intuitive understanding of it.
Reinforcement In Psychology
The term “reinforcement learning” is adapted from the concept of reinforcement in psychology. For that reason, let’s take a moment to understand the psychological concept of reinforcement. In the psychological sense, the term reinforcement refers to something that increases the likelihood that a particular response/action will occur. This concept of reinforcement is a central idea of the theory of operant conditioning, initially proposed by the psychologist B.F. Skinner. In this context, reinforcement is anything that causes the frequency of a given behavior to increase. If we think about possible reinforcement for humans, these can be things like praise, a raise at work, candy, and fun activities.
In the traditional, psychological sense, there are two types of reinforcement. There’s positive reinforcement and negative reinforcement. Positive reinforcement is the addition of something to increase a behavior, like giving your dog a treat when it is well behaved. Negative reinforcement involves removing a stimulus to elicit a behavior, like shutting off loud noises to coax out a skittish cat.
Positive and Negative Reinforcement In Machine Learning
In the psychological sense, both kinds of reinforcement increase the frequency of a behavior: positive reinforcement does so by adding something desirable, while negative reinforcement does so by removing something unpleasant. In general, positive reinforcement is the most common type of reinforcement used in reinforcement learning, as it helps models maximize their performance on a given task. Not only that, but positive reinforcement leads the model to make more sustainable changes, which can become consistent patterns and persist for long periods of time.
In contrast, while negative reinforcement also makes a behavior more likely to occur, it is used for maintaining a minimum performance standard rather than reaching a model’s maximum performance. Negative reinforcement in reinforcement learning can help ensure that a model is kept away from undesirable actions, but it can’t really make a model explore desired actions.
Training A Reinforcement Agent
Imagine that we are training a reinforcement agent to play a platforming video game where the AI’s goal is to make it to the end of the level by moving right across the screen. The initial state of the game is drawn from the environment, meaning the first frame of the game is analyzed and given to the model. Based on this information, the model must decide on an action.
During the initial phases of training, these actions are random but as the model is reinforced, certain actions will become more common. After the action is taken the environment of the game is updated and a new state or frame is created. If the action taken by the agent produced a desirable result, let’s say in this case that the agent is still alive and hasn’t been hit by an enemy, some reward is given to the agent and it becomes more likely to do the same in the future.
This basic system is constantly looped, happening again and again, and each time the agent tries to learn a little more and maximize its reward.
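The loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real game: the `LineWorld` environment, the `run_episode` function, and the reward rule (1 point for each step to the right) are all hypothetical stand-ins for the platformer, and the agent ignores the state for brevity, learning only which action tends to be rewarded.

```python
import random


class LineWorld:
    """Hypothetical toy environment: the agent starts at position 0 and
    the episode ends when it reaches `goal` (the end of the level)."""

    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the initial state

    def step(self, action):
        # action is +1 (move right) or -1 (move left); position never drops below 0
        self.position = max(0, self.position + action)
        done = self.position >= self.goal
        reward = 1.0 if action == 1 else 0.0  # assumed rule: reward moving right
        return self.position, reward, done


def run_episode(env, preferences, rng, learning_rate=0.1, max_steps=200):
    """Run one episode, nudging action preferences toward rewarded actions."""
    state = env.reset()  # state is tracked but not used by this simple agent
    total_reward = 0.0
    for _ in range(max_steps):
        actions = list(preferences)
        weights = [preferences[a] for a in actions]
        action = rng.choices(actions, weights=weights)[0]  # sample an action
        state, reward, done = env.step(action)  # the environment updates
        preferences[action] += learning_rate * reward  # reinforce rewarded actions
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
preferences = {-1: 1.0, 1: 1.0}  # both actions are equally likely at first
env = LineWorld()
for _ in range(50):
    run_episode(env, preferences, rng)
print(preferences)  # the preference for moving right (+1) has grown
```

Early on the two actions are chosen at random, but because only moving right is rewarded, that action becomes steadily more likely over the 50 episodes.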
Episodic vs Continuous Tasks
Reinforcement learning tasks can typically be placed in one of two different categories: episodic tasks and continuous tasks.
In episodic tasks, the learning/training loop is carried out and the agent improves its performance until some end criteria are met and the episode is terminated. In a game, this might be reaching the end of the level or falling into a hazard like spikes. In contrast, continuous tasks have no termination criteria, essentially continuing to train forever until the engineer chooses to end the training.
Monte Carlo vs Temporal Difference
There are two primary ways of learning, or training, a reinforcement learning agent. In the Monte Carlo approach, the agent’s estimate of its score is updated only at the end of the training episode, once all of the rewards for that episode are known. To put that another way, only when the termination condition is hit does the model learn how well it performed. It can then use this information to update, and when the next training round is started it will respond in accordance with the new information.
The temporal-difference method differs from the Monte Carlo method in that the value estimation, or the score estimation, is updated during the course of the training episode. Once the model advances to the next time step the values are updated.
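The difference between the two approaches can be made concrete with a small sketch. The functions below are an assumed, simplified form of each method (an every-visit Monte Carlo update and a one-step temporal-difference update, often written TD(0)); the state names, the step size `alpha`, and the discount `gamma` are illustrative choices, not values from any particular system.

```python
def monte_carlo_update(values, episode, alpha=0.5, gamma=1.0):
    """Update state values only after the episode has ended.

    `episode` is a list of (state, reward) pairs, where `reward` is what
    the agent received after leaving that state.
    """
    ret = 0.0
    # Walk backwards so `ret` holds the full return that followed each state.
    for state, reward in reversed(episode):
        ret = reward + gamma * ret
        values[state] += alpha * (ret - values[state])
    return values


def td0_update(values, state, reward, next_state, alpha=0.5, gamma=1.0):
    """Update one state's value right after a single time step."""
    target = reward + gamma * values[next_state]  # bootstrap from the next state's estimate
    values[state] += alpha * (target - values[state])
    return values


# One short episode: A -> B -> end, with a reward of 1 on the final step.
values_mc = {"A": 0.0, "B": 0.0, "end": 0.0}
monte_carlo_update(values_mc, [("A", 0.0), ("B", 1.0)])

values_td = {"A": 0.0, "B": 0.0, "end": 0.0}
td0_update(values_td, "A", 0.0, "B")    # applied as soon as the agent reaches B
td0_update(values_td, "B", 1.0, "end")  # applied as soon as the episode ends

print(values_mc)  # {'A': 0.5, 'B': 0.5, 'end': 0.0}
print(values_td)  # {'A': 0.0, 'B': 0.5, 'end': 0.0}
```

After this single episode, the Monte Carlo update has already credited state A with the final reward, while the temporal-difference update has only credited B; A would catch up on later episodes as B’s estimate propagates backwards.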
Explore vs Exploit
Training a reinforcement learning agent is a balancing act between two competing goals: exploration and exploitation.
Exploration is the act of collecting more information about the surrounding environment, while exploitation is using the information already known about the environment to earn reward points. If an agent only explores and never exploits the environment, the desired actions will never be carried out. On the other hand, if the agent only exploits and never explores, the agent will only learn to carry out one action and won’t discover other possible strategies of earning rewards. Therefore, balancing exploration and exploitation is critical when creating a reinforcement learning agent.
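One common way to strike this balance is an epsilon-greedy strategy: with a small probability the agent tries a random action (exploration), and otherwise it takes the action it currently believes is best (exploitation). The sketch below applies this to a hypothetical "multi-armed bandit" with three actions; the function name, the reward distributions, and the value of `epsilon` are assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Estimate the average reward of each action with epsilon-greedy selection."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # current reward estimate per action
    counts = [0] * len(true_means)       # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any action at random.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action with the highest current estimate.
            arm = max(range(len(true_means)), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward from the environment
        counts[arm] += 1
        # Incremental average of the rewards seen for this action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.5])
print(estimates)
print(counts)
```

Setting `epsilon` to 0 gives an agent that only exploits, and setting it to 1 gives one that only explores; values in between trade one off against the other.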
Uses For Reinforcement Learning
Reinforcement learning can be used in a wide variety of roles, and it is best suited for applications where tasks require automation.
Automation of tasks to be carried out by industrial robots is one area where reinforcement learning proves useful. Reinforcement learning can also be used for problems like text mining, creating models that are able to summarize long bodies of text. Researchers are also experimenting with using reinforcement learning in the healthcare field, with reinforcement agents handling jobs like the optimization of treatment policies. Reinforcement learning could also be used to customize educational material for students.
Reinforcement learning is a powerful method of constructing AI agents that can lead to impressive and sometimes surprising results. Training an agent through reinforcement learning can be complex and difficult, as it takes many training iterations and a delicate balance of the explore/exploit dichotomy. However, if successful, an agent created with reinforcement learning can carry out complex tasks under a wide variety of different environments.
New AI Powered Tool Enables Video Editing From Themed Text Documents
A team of computer science researchers from Tsinghua University and Beihang University in China, IDC Herzliya in Israel, and Harvard University have recently created a tool that generates edited videos based on a text description and a repository of video clips.
Massive amounts of video footage are recorded every day by professional videographers, hobbyists, and regular people. Yet editing this video down into a presentation that makes sense is still a costly time investment, often requiring the use of complex editing tools that can manipulate raw footage. The international team of researchers recently developed a tool that takes themed text descriptions and generates videos based on them. The tool is capable of examining video clips in a repository and selecting the clips that correspond with the input text describing the storyline. The goal is that the tool is user-friendly and powerful enough to produce quality videos without the need for extensive video editing skills or expensive video editing software.
While current video editing platforms require knowledge of video editing techniques, the tool created by the researchers lets novice video creators create compositions that tell stories in a more natural, intuitive fashion. “Write-A-Video”, as it is dubbed by its creators, lets users edit videos by just editing the text that accompanies the video. If a user deletes text, adds text, or moves sentences around, these changes will be reflected in the video. Corresponding shots will be cut or added as the user manipulates the text, and the final resulting video is tailored to the user’s description.
Ariel Shamir, the Dean of the Efi Arazi School of Computer Science at IDC Herzliya, explained that the Write-A-Video tool lets the user interact with the video mainly through text, using natural language processing techniques to match video shots based on the provided semantic meaning. An optimization algorithm is then used to assemble the video by cutting and swapping shots. The tool allows users to experiment with different visual styles as well, tweaking how scenes are presented by using specific film idioms that will speed up or slow down the action, or make more/fewer cuts.
The program selects possible shots based on their aesthetic appeal. The program considers how shots are framed, focused, and lit in order to determine the aesthetic appeal. The tool will select shots that are better focused, instead of blurry or unstable, and it will also prioritize shots that are well lit. According to the creators of Write-A-Video, the user can render the generated video at any point and preview it with a voice-over narration that describes the text used to select the clips.
According to the research team, their experiment demonstrated that digital techniques that combine aspects of computer vision and natural language processing can assist users in creative processes like the editing of videos.
“Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” explained Shamir to TechXplore.
The researchers tested their tool on different video repositories combined with themed text documents. User studies and quantitative evaluations were performed to interpret the results of the experiment. The user studies found that non-professionals could sometimes produce high-quality edited videos using the tool faster than professionals using frame-based editing software could. As reported by TechXplore, the team will be presenting their work in a few days at the ACM SIGGRAPH Asia conference held in Australia. Other entities are also using AI to augment video editing. Adobe has been working on its own AI-powered extensions for Premiere Pro, its editing platform. The tool helps people ensure that changes in aspect ratio don’t cut out important pieces of video.
Structured vs Unstructured Data
Unstructured data is data that isn’t organized in a pre-defined fashion or lacks a specific data model. Meanwhile, structured data is data that has clear, definable relationships between the data points, with a pre-defined model containing it. That’s the short answer on the difference between structured and unstructured data, but let’s take a closer look at the differences between the two types of data.
When it comes to computer science, data structures refer to specific ways of storing and organizing data. Different data structures possess different relationships between data points, but data can also be unstructured. What does it mean to say that data is structured? To make this definition clearer, let’s take a look at some of the various ways of structuring data.
Structured data is often held in tables such as Excel files or SQL databases. In these cases, the rows and columns of the data hold different variables or features, and it is often possible to discern the relationship between data points by checking to see where data rows and columns intersect. Structured data can easily be fit into a relational database, and examples of different features in a structured dataset can include items like names, addresses, dates, weather statistics, credit card numbers, etc. While structured data is most often text data, it is possible to store things like images and audio as structured data as well.
Common sources of structured data include things like data collected from sensors, weblogs, network data, and retail or e-commerce data. Structured data can also be generated by people filling in spreadsheets or databases with data collected from computers and other devices. For instance, data collected through online forms is often immediately fed into a data structure.
Structured data has a long history of being stored in relational databases and queried with SQL. These storage methods are popular because data in these formats is easy to read and write, and most platforms and languages are able to interpret them.
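As a small illustration of why structured data is easy to work with, the sketch below stores a few weather readings in an in-memory SQLite database using Python’s built-in `sqlite3` module and then asks a question of the data with a single SQL query. The table name, column names, and values are made up for the example.

```python
import sqlite3

# An in-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is the pre-defined model: every row has the same three columns.
conn.execute("CREATE TABLE readings (city TEXT, day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("Oslo", "2020-01-01", -3.0),
        ("Oslo", "2020-01-02", -1.0),
        ("Cairo", "2020-01-01", 19.0),
    ],
)

# Because the structure is known in advance, a query can aggregate directly.
rows = conn.execute(
    "SELECT city, AVG(temp_c) FROM readings GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Cairo', 19.0), ('Oslo', -2.0)]

conn.close()
```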
In a machine learning context, structured data is easier to train a machine learning system on, because the patterns within the data are more explicit. Certain features can be fed into a machine learning classifier and used to label other data instances based on those selected features. In contrast, training a machine learning system on unstructured data tends to be more difficult, for reasons that will become clear.
Unstructured data is data that isn’t organized according to a pre-defined data model or structure. Unstructured data is often called qualitative data because it can’t be analyzed or processed in traditional ways using the regular methods used for structured data.
Because unstructured data doesn’t have any defined relationships between data points, it can’t easily be organized in relational databases. Instead, unstructured data is typically stored in a NoSQL, or non-relational, database. If the structure of the database is of little concern, a data lake, or a large pool of raw data, can be used to store the data instead of a NoSQL database.
Unstructured data is difficult to analyze, and making sense of unstructured data often involves examining individual pieces of data to discern potential features and then looking to see if those features occur in other pieces of data within the pool.
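A toy version of this process is sketched below: a handful of free-text notes are scanned for two candidate features (a date and a mention of a refund), and each note is turned into a record with the same fields. The notes, the field names, and the choice of features are all invented for illustration; real unstructured data usually needs far more sophisticated techniques.

```python
import re

# Free-text notes with no pre-defined structure (made-up examples).
notes = [
    "Customer called on 2020-03-14 about a late delivery.",
    "No issues reported.",
    "Refund issued 2020-04-02; customer satisfied.",
]

# A candidate feature: dates written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

records = []
for note_id, text in enumerate(notes):
    match = DATE_PATTERN.search(text)
    records.append(
        {
            "note_id": note_id,
            "date": match.group(0) if match else None,  # None when no date appears
            "mentions_refund": "refund" in text.lower(),
        }
    )

for record in records:
    print(record)
```

Every record now has the same three fields, so the result could be loaded into a table even though the original notes had no shared layout.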
The vast majority of data is in unstructured formats, with estimates that unstructured data comprises around 80% of all data. Data mining techniques can be used to help structure data.
In terms of machine learning, certain techniques can help order unstructured data and turn it into structured data. A popular tool for turning unstructured data into structured data is a system called an autoencoder.
What is Natural Language Processing?
Natural Language Processing (NLP) is the study and application of techniques and tools that enable computers to process, analyze, interpret, and reason about human language. NLP is an interdisciplinary field and it combines techniques established in fields like linguistics and computer science. These techniques are used in concert with AI to create chatbots and digital assistants like Google Assistant and Amazon’s Alexa.
Let’s take some time to explore the rationale behind Natural Language Processing, some of the techniques used in NLP, and some common use cases for NLP.
Why Is Natural Language Processing Important?
In order for computers to interpret human language, it must be converted into a form that a computer can manipulate. However, this isn’t as simple as converting text data into numbers. In order to derive meaning from human language, patterns have to be extracted from the hundreds or thousands of words that make up a text document. This is no easy task. There are few hard and fast rules that can be applied to the interpretation of human language. For instance, the exact same set of words can mean different things depending on the context. Human language is a complex and often ambiguous thing, and a statement can be uttered with sincerity or sarcasm.
Despite this, there are some general guidelines that can be used when interpreting words and characters, such as the character “s” being used to denote that an item is plural. These general guidelines have to be used in concert with each other to extract meaning from the text, to create features that a machine learning algorithm can interpret.
Natural Language Processing involves the application of various algorithms capable of taking unstructured data and converting it into structured data. If these algorithms are applied in the wrong manner, the computer will often fail to derive the correct meaning from the text. This can often be seen in the translation of text between languages, where the precise meaning of the sentence is often lost. While machine translation has improved substantially over the past few years, machine translation errors still occur frequently.
Natural Language Processing Techniques
Many of the techniques that are used in natural language processing can be placed in one of two categories: syntax or semantics. Syntax techniques are those that deal with the ordering of words, while semantic techniques are the techniques that involve the meaning of words.
Syntax NLP Techniques
Examples of syntax techniques include:
- Lemmatization
- Morphological Segmentation
- Part-of-Speech Tagging
- Parsing
- Sentence Breaking
- Stemming
- Word Segmentation
Lemmatization refers to distilling the different inflections of a word down to a single form. Lemmatization takes things like tenses and plurals and simplifies them, for example, “feet” might become “foot” and “stripes” may become “stripe”. This simplified word form makes it easier for an algorithm to interpret the words in a document.
Morphological segmentation is the process of dividing words into morphemes or the base units of a word. These units are things like free morphemes (which can stand alone as words) and prefixes or suffixes.
Part-of-speech tagging is simply the process of identifying which part of speech every word in an input document is.
Parsing refers to analyzing all the words in a sentence and correlating them with their formal grammar labels or doing grammatical analysis for all the words.
Sentence breaking, or sentence boundary segmentation, refers to deciding where a sentence begins and ends.
Stemming is the process of reducing words down to the root form of the word. For instance, connected, connection, and connections would all be stemmed to “connect”.
Word segmentation is the process of dividing large pieces of text down into small units, which can be words or stemmed/lemmatized units.
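Three of the techniques above (sentence breaking, word segmentation, and stemming) can be sketched with nothing more than regular expressions and a list of suffixes. The functions below are deliberately naive and hypothetical: the suffix list is an assumption chosen to handle the “connect” example, and production systems rely on much more careful rules or trained models.

```python
import re


def split_sentences(text):
    """Naive sentence breaking: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive word segmentation: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


# Assumed suffix list, longest first so that "ions" is tried before "s".
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def stem(word):
    """Naive stemming: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "The cables were connected. Is the connection stable? Check all connections!"
for sentence in split_sentences(text):
    print([stem(word) for word in tokenize(sentence)])
```

With this suffix list, “connected”, “connection”, and “connections” are all reduced to “connect”. The same crude rule would mangle many other words, which is why lemmatization with a dictionary is often preferred.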
Semantic NLP Techniques
Semantic NLP techniques include techniques like:
- Named Entity Recognition
- Natural Language Generation
- Word-Sense disambiguation
Named entity recognition involves tagging certain text portions that can be placed into one of a number of different preset groups. Pre-defined categories include things like dates, cities, places, companies, and individuals.
Natural language generation is the process of using databases to transform structured data into natural language. For instance, statistics about the weather, like temperature and wind speed could be summarized with natural language.
Word-sense disambiguation is the process of assigning meaning to words within a text based on the context the words appear in.
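The weather example of natural language generation can be sketched with a simple template. The function name, the thresholds for “cold” or “windy”, and the sentence wording below are all assumptions made for illustration; modern systems typically generate text with trained language models rather than fixed templates.

```python
def describe_weather(city, temp_c, wind_kph):
    """Turn structured weather statistics into a short natural-language summary."""
    # Assumed thresholds for describing the temperature.
    if temp_c < 5:
        feel = "cold"
    elif temp_c < 20:
        feel = "mild"
    else:
        feel = "warm"

    # Assumed thresholds for describing the wind.
    if wind_kph < 10:
        wind = "calm"
    elif wind_kph < 30:
        wind = "breezy"
    else:
        wind = "windy"

    return (
        f"It is {feel} and {wind} in {city}, "
        f"at {temp_c:.0f} degrees Celsius with wind at {wind_kph:.0f} km/h."
    )


print(describe_weather("Oslo", -3, 12))
# It is cold and breezy in Oslo, at -3 degrees Celsius with wind at 12 km/h.
```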
Deep Learning Models For Natural Language Processing
Regular multilayer perceptrons are unable to handle the interpretation of sequential data, where the order of the information is important. In order to deal with the importance of order in sequential data, a type of neural network is used that preserves information from previous timesteps in the training.
Recurrent Neural Networks (RNNs) are neural networks that carry information forward from previous timesteps, taking it into account when computing the output for the current timestep. Essentially, a simple RNN has three weight matrices that are used during the forward pass: one applied to the previous hidden state, one applied to the current input, and one that maps the hidden state to the output. Because RNNs can take information from previous timesteps into account, they can extract relevant patterns from text data by taking earlier words in the sentence into account when interpreting the meaning of a word.
Another type of deep learning architecture used to process text data is the Long Short-Term Memory (LSTM) network. LSTM networks are similar to simple RNNs in structure, but owing to some differences in their architecture they tend to perform better on long sequences. In particular, they mitigate a problem that often occurs when training simple RNNs, called the vanishing gradient problem.
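The forward pass of a simple RNN can be sketched in plain Python. The matrix names (`W_xh` for input-to-hidden, `W_hh` for hidden-to-hidden, `W_hy` for hidden-to-output), the tiny sizes, and the weight values are assumptions chosen for illustration, and bias terms are omitted for brevity.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_xh, W_hh, W_hy):
    """Run a simple RNN over a sequence and return all outputs and the final hidden state."""
    hidden = [0.0] * len(W_hh)  # the hidden state starts at zero
    outputs = []
    for x in inputs:
        # Combine the current input with the previous hidden state.
        pre_activation = [
            a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, hidden))
        ]
        hidden = [math.tanh(p) for p in pre_activation]
        # Map the hidden state to an output for this timestep.
        outputs.append(matvec(W_hy, hidden))
    return outputs, hidden


# Illustrative weights: 2 input features, 2 hidden units, 1 output.
W_xh = [[0.5, -0.5], [0.3, 0.8]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
W_hy = [[1.0, -1.0]]

sequence = [[1.0, 0.0], [0.0, 1.0]]
outputs, final_hidden = rnn_forward(sequence, W_xh, W_hh, W_hy)
reversed_outputs, _ = rnn_forward(sequence[::-1], W_xh, W_hh, W_hy)

print(outputs[-1])           # output after seeing the sequence in order
print(reversed_outputs[-1])  # a different output for the reversed sequence
```

Feeding the same two inputs in the opposite order produces a different final output, which is exactly the sensitivity to ordering that a plain multilayer perceptron lacks.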
These deep neural networks can be either unidirectional or bi-directional. Bi-directional networks are capable of taking into account not just the words that come prior to the current word, but also the words that come after it. While this often leads to higher accuracy, it is more computationally expensive.
Use Cases For Natural Language Processing
Because Natural Language Processing involves the analysis and manipulation of human languages, it has an incredibly wide range of applications. Possible applications for NLP include chatbots, digital assistants, sentiment analysis, document organization, talent recruitment, and healthcare.
Chatbots and digital assistants like Amazon’s Alexa and Google Assistant are examples of voice recognition and synthesis platforms that use NLP to interpret and respond to vocal commands. These digital assistants help people with a wide variety of tasks, letting them offload some of their cognitive tasks to another device and free up some of their brainpower for other, more important things. Instead of looking up the best route to the bank on a busy morning, we can just have our digital assistant do it.
Sentiment analysis is the use of NLP techniques to study people’s reactions and feelings to a phenomenon, as communicated by their use of language. Capturing the sentiment of a statement, like interpreting whether a review of a product is good or bad, can provide companies with substantial information regarding how their product is being received.
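A very rough form of sentiment analysis can be built from two word lists. The sketch below counts positive and negative words in a review and labels it accordingly; the word lists and the `sentiment` function are invented for illustration, and a lexicon approach like this cannot handle negation or sarcasm the way trained models can.

```python
import re

# Tiny hand-made lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "broke"}


def sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great!"))  # positive
print(sentiment("Terrible battery, it broke in a week."))    # negative
print(sentiment("It arrived on Tuesday."))                   # neutral
```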
Automatically organizing text documents is another application of NLP. Companies like Google and Yahoo use NLP algorithms to classify email documents, putting them in the appropriate bins such as “social” or “promotions”. They also use these techniques to identify spam and prevent it from reaching your inbox.
NLP techniques are also being used to identify potential job hires, finding them based on relevant skills. Hiring managers are also using NLP techniques to help them sort through lists of applicants.
NLP techniques are also being used to enhance healthcare. NLP can be used to improve the detection of diseases. Health records can be analyzed and symptoms extracted by NLP algorithms, which can then be used to suggest possible diagnoses. One example of this is Amazon’s Comprehend Medical platform, which analyzes health records and extracts diseases and treatments. Healthcare applications of NLP also extend to mental health. There are apps such as Woebot, which talks users through a variety of anxiety management techniques based on Cognitive Behavioral Therapy.