### AI 101

# What are Convolutional Neural Networks?

Perhaps you’ve wondered how Facebook or Instagram is able to automatically recognize faces in an image, or how Google lets you search the web for similar photos just by uploading a photo of your own. These features are examples of computer vision, and they are powered by convolutional neural networks (CNNs). Yet what exactly are convolutional neural networks? Let’s take a deep dive into the architecture of a CNN and understand how they operate.

**Defining Neural Networks**

Before we begin talking about convolutional neural networks, let’s take a moment to define regular neural networks. There’s another article on the topic of neural networks available, so we won’t go too deep into them here. However, to briefly define them they are computational models inspired by the human brain. A neural network operates by taking in data and manipulating the data by adjusting “weights”, which are assumptions about how the input features are related to each other and the object’s class. As the network is trained the values of the weights are adjusted and they will hopefully converge on weights that accurately capture the relationships between features.

This is how a feed-forward neural network operates, and CNNs are comprised of two halves: a feed-forward neural network and a group of convolutional layers.

**What’s A Convolution?**

What are the “convolutions” that happen in a convolutional neural network? A convolution is a mathematical operation that creates a set of weights, essentially creating a representation of parts of the image. This set of weights is referred to as a kernel or filter. The filter that is created is smaller than the entire input image, covering just a subsection of the image. The values in the filter are multiplied with the values in the image. The filter is then moved over to form a representation of a new part of the image, and the process is repeated until the entire image has been covered.

Another way to think about this is to imagine a brick wall, with the bricks representing the pixels in the input image. A “window” is being slid back and forth along the wall, which is the filter. The bricks that are viewable through the window are the pixels having their value multiplied by the values within the filter. For this reason, this method of creating weights with a filter is often referred to as the “sliding windows” technique.

The output from the filters being moved around the entire input image is a two-dimensional array representing the whole image. This array is called a “feature map”.

**Why Convolutions?**

What is the purpose of creating convolutions anyway? Convolutions are necessary because a neural network has to be able to interpret the pixels in an image as numerical values. The function of the convolutional layers is to convert the image into numerical values that the neural network can interpret and then extract relevant patterns from. The job of the filters in the convolutional network is to create a two-dimensional array of values that can be passed into the later layers of a neural network, those that will learn the patterns in the image.

**Filters And Channels**

CNNs don’t use just one filter to learn patterns from the input images. Multiple filters are used, as the different arrays created by the different filters leads to a more complex, rich representation of the input image. Common numbers of filters for CNNs are 32, 64, 128, and 512. The more filters there are, the more opportunities the CNN has to examine the input data and learn from it.

A CNN analyzes the differences in pixel values in order to determine the borders of objects. In a grayscale image, the CNN would only look at the differences in black and white, light-to-dark terms. When the images are color images, not only does the CNN take dark and light into account, but it has to take the three different color channels – red, green, and blue – into account as well. In this case, the filters possess 3 channels, just like the image itself does. The number of channels that a filter has is referred to as its depth, and the number of channels in the filter must match the number of channels in the image.

**The Architecture Of A Convolutional Neural Network**

Let’s take a look at the complete architecture of a convolutional neural network. A convolutional layer is found at the beginning of every convolutional network, as it’s necessary to transform the image data into numerical arrays. However, convolutional layers can also come after other convolutional layers, meaning that these layers can be stacked on top of one another. Having multiple convolutional layers means that the outputs from one layer can undergo further convolutions and be grouped together in relevant patterns. Practically, this means that as the image data proceeds through the convolutional layers, the network begins to “recognize” more complex features of the image.

The early layers of a ConvNet are responsible for extracting the low-level features, such as the pixels that make up simple lines. Later layers of the ConvNet will join these lines together into shapes. This process of moving from surface-level analysis to deep-level analysis continues until the ConvNet is recognizing complex shapes like animals, human faces, and cars.

After the data has passed through all of the convolutional layers, it proceeds into the densely connected part of the CNN. The densely-connected layers are what a traditional feed-forward neural network looks like, a series of nodes arrayed into layers that are connected to one another. The data proceeds through these densely connected layers, which learns the patterns that were extracted by the convolutional layers, and in doing so the network becomes capable of recognizing objects.

### AI 101

# AI Algorithms Used To Develop Drugs That Fight Drug-Resistant Bacteria

One of the biggest challenges facing the medical industry is drug-resistant bacteria. Currently, there are some estimated 700,000 deaths due to drug-resistant bacteria, and more strains of drug-resistant bacteria are developing. Scientists and engineers are attempting to develop new methods of combatting drug-resistant bacteria. One method of developing new antibiotics is employing artificial intelligence and machine learning to isolate new compounds that could deal with new strains of super-bacteria.

As SingularityHub reported, a new antibiotic was designed with the assistance of AI. The antibiotic has been named halicin, after the AI HAL from 2001: A Space Odyssey. The newly developed antibiotic proved successful at eliminating some of the virile super-bacteria strains. The new antibiotic was discovered through the use of machine learning algorithms. Specifically, the machine learning model was trained using a large dataset comprised of approximately 2,500 compounds. Nearly half of the drugs used to train the model were drugs already approved by the FDA, while the other half of the training set was comprised of naturally occurring compounds. The team of researchers tweaked the algorithms to prioritize molecules that simultaneously possessed antibiotic properties but different from existing antibiotic structures. They then examined the results to determine which compounds would be safe for human consumption.

According to The Guardian, the drug proved extremely effective at fighting drug-resistant bacteria in a recent study. It is so effective because it degrades the membrane of the bacteria, which disables the ability of the bacteria to produce energy. For bacteria to develop defenses against the effects of halicin it could take more than a few genetic mutations, which gives halicin staying power. The research team also tested how the compound performed in mice, where it was able to successfully clear mice infected with a strain of bacteria resistant to all current antibiotics. With the results of the studies so promising, the research team is hoping to move into a partnership with a pharmaceutical entity and prove the drug safe for use by people.

James Collins, professor of bioengineering and senior author at MIT, and Regina Barzilay, computer science professor at MIT were both senior authors on the paper. Collins, Barzilay, and other researchers hope that algorithms like the type they used to design halicin could help fast-track the discovery of new antibiotics to deal with the proliferation of drug-resistant strains of the disease.

Halicin is far from the only drug compound discovered with the use of AI. The research team lead by Collin and Barzilay want to go farther and create new compounds training more models using around 100 million molecules pulled from the ZINC 15 database, an online library of over 1.5 billion drug compounds. Reportedly the team has already managed to find at least 23 different candidates that satisfy the criteria of being possibly safe for human use and structurally different from current antibiotics.

An unfortunate side effect of antibiotics is that, while they kill harmful bacteria, they also kill off the necessary gut bacteria that the human body needs. The research hopes that they could use techniques similar to the those used to create halicin to create antibiotics with fewer side effects, drugs less likely to harm the human gut microbiome.

Many other companies are also attempting to use machine learning to simplify the complex, long, and often expensive drug creation process. Other companies have also been training AI algorithms to synthesize new drug compounds. Just recently one company was able to develop a proof-of-concept drug in only a month and a half, a much shorter amount of time than the months or even years it can take to create a drug the traditional way.

Barzilay is optimistic that AI-driven drug discovery methods can transform the landscape of drug discovery in meaningful ways. Barzilay explained that the work on halicin is a practical example of how effective machine learning techniques can be:

“There is still a question of whether machine-learning tools are really doing something intelligent in healthcare, and how we can develop them to be workhorses in the pharmaceuticals industry. This shows how far you can adapt this tool.”

### AI 101

# What is K-Nearest Neighbors?

K-Nearest Neighbors is a machine learning technique and algorithm that can be used for both regression and classification tasks. K-Nearest Neighbors examines the labels of a chosen number of data points surrounding a target data point, in order to make a prediction about the class that the data point falls into. K-Nearest Neighbors (KNN) is a conceptually simple yet very powerful algorithm, and for those reasons, it’s one of the most popular machine learning algorithms. Let’s take a deep dive into the KNN algorithm and see exactly how it works. Having a good understanding of how KNN operates will let you appreciated the best and worst use cases for KNN.

## An Overview Of KNN

Let’s visualize a dataset on a 2D plane. Picture a bunch of data points on a graph, spread out along the graph in small clusters. KNN examines the distribution of the data points and, depending on the arguments given to the model, it separates the data points into groups. These groups are then assigned a label. The primary assumption that a KNN model makes is that data points/instances which exist in close proximity to each other are highly similar, while if a data point is far away from another group it’s dissimilar to those data points.

A KNN model calculates similarity using the distance between two points on a graph. The greater the distance between the points, the less similar they are. There are multiple ways of calculating the distance between points, but the most common distance metric is just Euclidean distance (the distance between two points in a straight line).

KNN is a supervised learning algorithm, meaning that the examples in the dataset must have labels assigned to them/their classes must be known. There are two other important things to know about KNN. First, KNN is a non-parametric algorithm. This means that no assumptions about the dataset are made when the model is used. Rather, the model is constructed entirely from the provided data. Second, there is no splitting of the dataset into training and test sets when using KNN. KNN makes no generalizations between a training and testing set, so all the training data is also used when the model is asked to make predictions.

## How The KNN Algorithm Operates

A KNN algorithm goes through three main phases as it is carried out:

- Setting K to the chosen number of neighbors.
- Calculating the distance between a provided/test example and the dataset examples.
- Sorting the calculated distances.
- Getting the labels of the top K entries.
- Returning a prediction about the test example.

In the first step, K is chosen by the user and it tells the algorithm how many neighbors (how many surrounding data points) should be considered when rendering a judgment about the group the target example belongs to. In the second step, note that the model checks the distance between the target example and every example in the dataset. The distances are then added into a list and sorted. Afterward, the sorted list is checked and the labels for the top K elements are returned. In other words, if K is set to 5, the model checks the labels of the top 5 closest data points to the target data point. When rendering a prediction about the target data point, it matters if the task is a regression or classification task. For a regression task, the mean of the top K labels is used, while the mode of the top K labels is used in the case of classification.

The exact mathematical operations used to carry out KNN differ depending on the chosen distance metric. If you would like to learn more about how the metrics are calculated, you can read about some of the most common distance metrics, such as Euclidean, Manhattan, and Minkowski.

## Why The Value Of K Matters

The main limitation when using KNN is that in an improper value of K (the wrong number of neighbors to be considered) might be chosen. If this happen, the predictions that are returned can be off substantially. It’s very important that, when using a KNN algorithm, the proper value for K is chosen. You want to choose a value for K that maximizes the model’s ability to make predictions on unseen data while reducing the number of errors it makes.

Lower values of K mean that the predictions rendered by the KNN are less stable and reliable. To get an intuition of why this is so, consider a case where we have 7 neighbors around a target data point. Let’s assume that the KNN model is working with a K value of 2 (we’re asking it to look at the two closest neighbors to make a prediction). If the vast majority of the neighbors (five out of seven) belong to the Blue class, but the two closest neighbors just happen to be Red, the model will predict that the query example is Red. Despite the model’s guess, in such a scenario Blue would be a better guess.

If this is the case, why not just choose the highest K value we can? This is because telling the model to consider too many neighbors will also reduce accuracy. As the radius that the KNN model considers increases, it will eventually start considering data points that are closer to other groups than they are the target data point and misclassification will start occurring. For example, even if the point that was initially chosen was in one of the red regions above, if K was set too high, the model would reach into the other regions to consider points. When using a KNN model, different values of K are tried to see which value gives the model the best performance.

## KNN Pros And Cons

Let’s examine some of the pros and cons of the KNN model.

**Pros:**

KNN can be used for both regression and classification tasks, unlike some other supervised learning algorithms.

KNN is highly accurate and simple to use. It’s easy to interpret, understand, and implement.

KNN doesn’t make any assumptions about the data, meaning it can be used for a wide variety of problems.

**Cons:**

KNN stores most or all of the data, which means that the model requires a lot of memory and its computationally expensive. Large datasets can also cause predictions to be take a long time.

KNN proves to be very sensitive to the scale of the dataset and it can be thrown off by irrelevant features fairly easily in comparison to other models.

## Summing Up

K-Nearest Neighbors is one of the simplest machine learning algorithms. Despite how simple KNN is, in concept, it’s also a powerful algorithm that gives fairly high accuracy on most problems. When you use KNN, be sure to experiment with various values of K in order to find the number that provides the highest accuracy.

### AI 101

# What is Linear Regression?

Linear regression is an algorithm used to predict, or visualize, a relationship between two different features/variables. In linear regression tasks, there are two kinds of variables being examined: the dependent variable and the independent variable. The independent variable is the variable that stands by itself, not impacted by the other variable. As the independent variable is adjusted, the levels of the dependent variable will fluctuate. The dependent variable is the variable that is being studied, and it is what the regression model solves for/attempts to predict. In linear regression tasks, every observation/instance is comprised of both the dependent variable value and the independent variable value.

That was a quick explanation of linear regression, but let’s make sure we come to a better understanding of linear regression by looking at an example of it and examining the formula that it uses.

## Understanding Linear Regression

Assume that we have a dataset covering hard-drive sizes and the cost of those hard drives.

Let’s suppose that the dataset we have is comprised of two different features: the amount of memory and cost. The more memory we purchase for a computer, the more the cost of the purchase goes up. If we plotted out the individual data points on a scatter plot, we might get a graph that looks something like this:

The exact memory-to-cost ratio might vary between manufacturers and models of hard drive, but in general, the trend of the data is one that starts in the bottom left (where hard drives are both cheaper and have smaller capacity) and moves to the upper right (where the drives are more expensive and have higher capacity).

If we had the amount of memory on the X-axis and the cost on the Y-axis, a line capturing the relationship between the X and Y variables would start in the lower-left corner and run to the upper right.

The function of a regression model is to determine a linear function between the X and Y variables that best describes the relationship between the two variables. In linear regression, it’s assumed that Y can be calculated from some combination of the input variables. The relationship between the input variables (X) and the target variables (Y) can be portrayed by drawing a line through the points in the graph. The line represents the function that best describes the relationship between X and Y (for example, for every time X increases by 3, Y increases by 2). The goal is to find an optimal “regression line”, or the line/function that best fits the data.

Lines are typically represented by the equation: Y = m*X + b. X refers to the dependent variable while Y is the independent variable. Meanwhile, m is the slope of the line, as defined by the “rise” over the “run”. Machine learning practitioners represent the famous slope-line equation a little differently, using this equation instead:

y(x) = w0 + w1 * x

In the above equation, y is the target variable while “w” is the model’s parameters and the input is “x”. So the equation is read as: “The function that gives Y, depending on X, is equal to the parameters of the model multiplied by the features”. The parameters of the model are adjusted during training to get the best-fit regression line.

## Multiple Regression

The process described above applies to simple linear regression, or regression on datasets where there is only a single feature/independent variable. However, a regression can also be done with multiple features. In the case of “multiple linear regression”, the equation is extended by the number of variables found within the dataset. In other words, while the equation for regular linear regression is y(x) = w0 + w1 * x, the equation for multiple linear regression would be y(x) = w0 + w1x1 plus the weights and inputs for the various features. If we represent the total number of weights and features as w(n)x(n), then we could represent the formula like this:

y(x) = w0 + w1x1 + w2x2 + … + w(n)x(n)

After establishing the formula for linear regression, the machine learning model will use different values for the weights, drawing different lines of fit. Remember that the goal is to find the line that best fits the data in order to determine which of the possible weight combinations (and therefore which possible line) best fits the data and explains the relationship between the variables.

A cost function is used to measure how close the assumed Y values are to the actual Y values when given a particular weight value. The cost function for linear regression is mean squared error, which just takes the average (squared) error between the predicted value and the true value for all of the various data points in the dataset. The cost function is used to calculate a cost, which captures the difference between the predicted target value and the true target value. If the fit line is far from the data points, the cost will be higher, while the cost will become smaller the closer the line gets to capturing the true relationships between variables. The weights of the model are then adjusted until the weight configuration that produces the smallest amount of error is found.