

What is Gradient Descent?

If you’ve read about how neural networks are trained, you’ve almost certainly come across the term “gradient descent” before. Gradient descent is the primary method of optimizing a neural network’s performance, reducing the network’s loss/error rate. However, gradient descent can be a little hard to understand for those new to machine learning, and this article will endeavor to give you a decent intuition for how gradient descent operates.

Gradient descent is an optimization algorithm. It’s used to improve the performance of a neural network by making tweaks to the network’s parameters so that the difference between the network’s predictions and the actual/expected values (referred to as the loss) is as small as possible. Gradient descent takes the initial values of the parameters and uses operations based on calculus to adjust them towards the values that will make the network as accurate as it can be. You don’t need to know a lot of calculus to understand how gradient descent works, but you do need to have an understanding of gradients.

What Are Gradients?

Assume that there is a graph that represents the amount of error a neural network makes. The bottom of the graph represents the points of lowest error while the top of the graph is where the error is the highest. We want to move from the top of the graph down to the bottom. A gradient is just a way of quantifying the relationship between error and the weights of the neural network. The relationship between these two things can be graphed as a slope, with incorrect weights producing more error. The steepness of the slope/gradient represents how fast the model is learning.

A steeper slope means large reductions in error are being made and the model is learning fast, whereas if the slope is zero the model is on a plateau and isn’t learning. We can move down the slope towards less error by calculating a gradient, a direction of movement (change in the parameters of the network) for our model.

Let’s shift the metaphor just slightly and imagine a series of hills and valleys. We want to get to the bottom of the hill and find the part of the valley that represents the lowest loss. When we start at the top of the hill we can take large steps down the hill and be confident that we are heading towards the lowest point in the valley.

However, as we get closer to the lowest point in the valley, our steps will need to become smaller, or else we could overshoot the true lowest point. Similarly, it’s possible that when adjusting the weights of the network, the adjustments can actually take it further away from the point of lowest loss, and therefore the adjustments must get smaller over time. In the context of descending a hill towards a point of lowest loss, the gradient is a vector/instructions detailing the path we should take and how large our steps should be.

Now that we know gradients are instructions that tell us which direction to move in (which coefficients should be updated) and how large the steps we should take are (how much the coefficients should be updated), we can explore how the gradient is calculated.

Calculating Gradients and Gradient Descent Procedure

Gradient descent starts at a place of high loss and, through multiple iterations, takes steps in the direction of lowest loss, aiming to find the optimal weight configuration. Photo: Роман Сузи via Wikimedia Commons, CC BY-SA 3.0 (https://commons.wikimedia.org/wiki/File:Gradient_descent_method.png)

In order to carry out gradient descent, the gradients must first be calculated. In order to calculate the gradient, we need to know the loss/cost function. We’ll use the cost function to determine the derivative. In calculus, the derivative just refers to the slope of a function at a given point, so we’re basically just calculating the slope of the hill based on the loss function. We determine the loss by running the coefficients through the loss function. If we represent the loss function as “f”, then we can state that the equation for calculating the loss is as follows (we’re just running the coefficients through our chosen cost function):

Loss = f(coefficient)

We then calculate the derivative, or determine the slope. Getting the derivative of the loss will tell us which direction is up or down the slope, by giving us the appropriate sign to adjust our coefficients by. We’ll represent the appropriate direction as “delta”.

delta = derivative_function(loss)

We’ve now determined which direction is downhill towards the point of lowest loss. This means we can update the coefficients of the neural network and hopefully reduce the loss. We’ll update the coefficients based on the previous coefficients minus the appropriate change in value as determined by the direction (delta) and an argument that controls the magnitude of change (the size of our step). The argument that controls the size of the update is called the “learning rate” and we’ll represent it as “alpha”.

coefficient = coefficient - (alpha * delta)

We then just repeat this process until the network has converged around the point of lowest loss, which should be near zero.
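To make the procedure concrete, here is a minimal sketch in Python of the loop described above, applied to a single coefficient. The quadratic loss function, starting value, and learning rate are made up purely for illustration.

```python
# Minimal gradient descent sketch for a single coefficient.
# The loss function and starting values below are hypothetical,
# chosen only to illustrate the update rule described above.

def loss(coefficient):
    # A simple quadratic loss with its minimum at coefficient = 3.0
    return (coefficient - 3.0) ** 2

def derivative(coefficient):
    # Slope of the loss function at the current coefficient
    return 2.0 * (coefficient - 3.0)

coefficient = 10.0   # arbitrary starting point (high loss)
alpha = 0.1          # learning rate: controls the size of each step

for step in range(50):
    delta = derivative(coefficient)            # which way is downhill, and how steep
    coefficient = coefficient - alpha * delta  # take a step toward lower loss

print(coefficient)        # converges toward 3.0, the point of lowest loss
print(loss(coefficient))  # loss close to zero
```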

It’s very important to choose the right value for the learning rate (alpha). The chosen learning rate must be neither too small nor too large. Remember that as we approach the point of lowest loss our steps must become smaller, or else we will overshoot the true point of lowest loss and end up on the other side. The region around the point of lowest loss is narrow, and if our step size is too large the error can end up increasing again. If the step sizes are too large the network’s performance will continue to bounce around the point of lowest loss, overshooting it on one side and then the other. If this happens the network will never converge on the true optimal weight configuration.

In contrast, if the learning rate is too small the network can potentially take an extraordinarily long time to converge on the optimal weights.

Types Of Gradient Descent

Now that we understand how gradient descent works in general, let’s take a look at some of the different types of gradient descent.

Batch Gradient Descent: This form of gradient descent runs through all the training samples before updating the coefficients. This type of gradient descent is likely to be the most computationally efficient form of gradient descent, as the weights are only updated once the entire batch has been processed, meaning there are fewer updates total. However, if the dataset contains a large number of training examples, then batch gradient descent can make training take a long time.

Stochastic Gradient Descent: In Stochastic Gradient Descent only a single training example is processed for every iteration of gradient descent and parameter updating. This occurs for every training example. Because only one training example is processed before the parameters are updated, it tends to converge faster than Batch Gradient Descent, as updates are made sooner. However, because the process must be carried out on every item in the training set, it can take quite a long time to complete if the dataset is large, and so one of the other gradient descent types is often preferred.

Mini-Batch Gradient Descent: Mini-Batch Gradient Descent operates by splitting the entire training dataset up into subsections. It creates smaller mini-batches that are run through the network, and when the mini-batch has been used to calculate the error the coefficients are updated. Mini-batch Gradient Descent strikes a middle ground between Stochastic Gradient Descent and Batch Gradient Descent. The model is updated more frequently than in the case of Batch Gradient Descent, which means a slightly faster and more robust convergence on the model’s optimal parameters. It’s also more computationally efficient than Stochastic Gradient Descent.
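The practical difference between these variants comes down to how many training examples feed into each gradient calculation. Here is a minimal sketch, using synthetic data and a single weight, of mini-batch gradient descent; setting the batch size to the full dataset turns it into batch gradient descent, and setting it to 1 turns it into stochastic gradient descent.

```python
import numpy as np

# Minimal sketch: fitting a single weight by mini-batch gradient descent on
# synthetic data. Setting batch_size = len(X) gives batch gradient descent;
# batch_size = 1 gives stochastic gradient descent.

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)             # hypothetical inputs
y = 4.0 * X + rng.normal(0, 0.1, size=200)   # targets: the "true" weight is 4.0

weight, alpha, batch_size = 0.0, 0.1, 16

for epoch in range(100):
    order = rng.permutation(len(X))          # shuffle the examples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = weight * X[idx] - y[idx]
        gradient = 2 * np.mean(error * X[idx])  # derivative of mean squared error
        weight -= alpha * gradient               # one update per mini-batch

print(weight)  # converges toward 4.0
```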



What is Robotic Process Automation (RPA)?

A great deal of the work that people do every day doesn’t involve any of their creativity or unique skills, consisting of highly tedious, simple tasks like categorizing emails and messages, updating spreadsheets, processing transactions, and more. Robotic Process Automation (RPA) is an emerging technology that often leverages aspects of artificial intelligence to automate these tasks, with the goal of enabling workers to devote their attention to more important work. RPA can be accomplished with a variety of different techniques, tools, and algorithms, and the correct application of RPA can bring organizations many benefits.

Defining Robotic Process Automation

Despite having the name “robot” in it, Robotic Process Automation has nothing to do with physical robots. Rather, the robots referred to in RPA are software bots, and RPA systems are essentially just a collection of bots that carry out specific, often tedious tasks. RPA bots can run on either physical or virtual machines, and they can be directed to carry out tasks by the software’s user. RPA interfaces are intended to allow even people unfamiliar with the construction of the bots to define a set of tasks for the bot to perform.

As previously mentioned, the main purpose of RPA is to automate the many repetitive, mundane tasks that people often have to do in a workplace. Saving time and resources is the goal of RPA. The tasks that RPA is used to carry out need to be fairly simple, with a concrete series of steps to follow to accomplish the task.

Benefits of RPA

When properly utilized, RPA technology can free up time, personnel, and resources, letting them be applied to more important tasks and challenges. RPA can be used to enable better customer service by handling the first interactions with customers and directing them to the right customer service agent. RPA systems can also be used to improve how data is collected and handled. For instance, when transactions occur they can be digitized and automatically entered into a database.

RPA systems can also be used to ensure that the operations of a business comply with established standards and regulations. RPA can also meaningfully reduce human error rates and log actions taken, so that if the system does produce an error, the events that led to the error can easily be identified. Ultimately, the benefits of RPA apply to any situation where a process can be made more efficient by automating many of the steps needed to complete that process.

How Does RPA Work?

The exact methods RPA platforms and bots use to carry out their tasks vary, but they often employ machine learning and AI algorithms, as well as computer vision algorithms.

Machine learning and AI techniques may be employed to let the bots learn which actions are correlated with the goals the operator has defined. However, RPA platforms often carry out most of their actions according to rules, therefore acting more like traditional programs than AI. As a result, there is some debate regarding whether or not RPA systems should be classified as AI systems.

Even so, RPA often works in concert with AI technologies and algorithms. Deep neural networks can be used to interpret complex image and text data, enabling the bots to determine what actions need to be carried out to handle this data in the manner the user has specified, even if the actions the bot takes are strictly rules-based. For instance, convolutional neural networks can be used to allow a network to interpret images on a screen and react based upon how those images are classified.

What Processes Can Be Handled By RPA?

Examples of tasks that can be handled by RPA systems include basic data manipulation, transaction processing, and communicating with other digital systems. An RPA system could be set up to collect data from specific sources or clean data that has been received. In general, there are four criteria that a task must fulfill to be a good candidate for automation with RPA.

First, the process must be rule-based, with very specific instructions and ground facts that can be used to determine what to do with the information the system encounters. Secondly, the process should occur at specific times or have a definable start condition. Thirdly, the process should have clear inputs and outputs. Finally, the task should have volume: it should deal with a sizable amount of information and require a fair amount of time to complete, so that it makes sense to automate the process.

Based on these principles, let’s examine some potential use cases for RPA.

One way that RPA could be used is to expedite the process of handling customer returns. Returns are typically a costly, time-intensive endeavor. When a return is requested, the customer service agent has to send a number of messages that confirm the return and how the customer would like their money refunded, update current inventory in the system, and then, after making the payment to the customer, update the sales figures. Much of this could be handled by an RPA system that ascertains which items are being returned and how the customer wants their refund disbursed. The RPA would just use rules that take as input the product being returned and the customer’s information and output a complete refund document that the agent would just have to glance at and approve.
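To give a rough sense of how such a bot can be purely rules-based rather than intelligent, here is a hypothetical sketch; the product catalogue, refund policies, and field names are all invented for illustration.

```python
# Hypothetical sketch of a rule-based refund bot. The catalogue, refund
# policies, and field names are invented purely for illustration.

REFUND_POLICIES = {
    "unopened": 1.00,   # full refund
    "opened": 0.80,     # partial refund
    "damaged": 0.50,    # half refund, pending agent review
}

PRODUCT_PRICES = {"SKU-1001": 59.99, "SKU-2002": 24.50}

def build_refund_document(product_id, condition, customer, refund_method):
    """Apply fixed rules to produce a refund document for an agent to approve."""
    price = PRODUCT_PRICES[product_id]
    rate = REFUND_POLICIES[condition]
    return {
        "customer": customer,
        "product": product_id,
        "refund_amount": round(price * rate, 2),
        "refund_method": refund_method,
        "inventory_update": {"product": product_id, "quantity": +1},
        "needs_agent_approval": True,
    }

document = build_refund_document("SKU-1001", "opened", "Jane Doe", "store credit")
print(document)
```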

Another potential use case for RPA is for retailers who would like to automate aspects of their supply chain management. RPA could be used to keep items in stock by checking inventory levels whenever an item is sold; when the stock falls below a certain threshold, replacement orders can be placed automatically.

Drawbacks To Using RPA

While RPA systems have the potential to save companies that use them time, money, and effort, they are not suited to every task. RPA implementations may often fail due to the constraints of the system they operate in. If not properly designed and implemented, RPA systems can also exacerbate existing problems, as they operate on rules that may cease to be applicable as situations evolve. For example, if an RPA system is instructed to order replacements of items whenever stock falls too low, it may not be able to adjust to fluctuations in demand and continue ordering large batches of products even as the overall demand for those products declines. Scaling RPA platforms up across a company also proves to be difficult, as the more rules-based a system becomes, the more inflexible it becomes.

Additionally, the act of installing thousands of bots across a system might be much more time-intensive and costly than expected, potentially costly enough that the savings the RPA system brings don’t offset the costs of installation. The economic impacts of RPA systems can be difficult to predict and the relationship between automation and cost reduction is not a linear one. Automating 30% of a task will not necessarily reduce a company’s costs by 30%.



What is the Turing Test and Why Does it Matter?

If you’ve been around Artificial Intelligence (AI) you have undoubtedly heard of ‘The Turing Test’. First proposed by Alan Turing in 1950, the test was designed to be the ultimate experiment on whether or not an AI has achieved human-level intelligence. Conceptually, if the AI is able to pass the test, it has achieved intelligence that is equivalent to, or indistinguishable from, that of a human.

We will explore who Alan Turing is, what the test is, why it matters, and why the definition of the test may need to evolve.

Who is Alan Turing?

Turing was an eccentric British mathematician recognized for his groundbreaking, futurist ideas.

In 1935, at the age of 22, his work on probability theory won him a Fellowship of King’s College, University of Cambridge. His abstract mathematical ideas pushed him in a completely different direction, into a field that was yet to be invented.

In 1936, Turing published a paper that is now recognized as the foundation of computer science. This is where he invented the concept of a ‘Universal Machine’ that could decode and perform any set of instructions.

In 1939, Turing was recruited by the British government’s code-breaking department. At the time Germany was using what was called an ‘Enigma machine’ to encipher all its military and naval signals. Turing rapidly developed a new machine (the ‘Bombe’) which was capable of breaking Enigma messages on an industrial scale. This development has been deemed instrumental in pushing back the aggression of Nazi Germany.

In 1946, Turing returned to working on his revolutionary idea published in 1936 to develop an electronic computer, capable of running various types of computations. He produced a detailed design for what was called the Automatic Computing Engine (ACE).

In 1950, Turing published his seminal work asking “Can Machines Think?”. This paper completely transformed both computer science and AI.

In 1952, after being reported to the police by a young man, Turing was convicted of gross indecency due to his homosexual activities. Because of this, his government security clearance was revoked and his career was destroyed. As punishment, he was chemically castrated.

With his life shattered, he was discovered in his home by his cleaner on 8 June 1954. He had died from cyanide poisoning the day before. A partly eaten apple lay next to his body. The coroner’s verdict was suicide.

Fortunately, his legacy continues to live on.

What is the Turing Test?

In 1950, Alan Turing published a seminal paper titled “Computing Machinery and Intelligence” in the journal Mind. In this detailed paper the question “Can Machines Think?” was proposed. The paper suggested abandoning the quest to define whether a machine can think and instead testing the machine with the ‘imitation game’. This simple game is played with three people:

  • a man (A)
  • a woman (B),
  • and an interrogator (C) who may be of either sex.

The concept of the game is that the interrogator stays in a room that is separate from both the man (A) and the woman (B), the goal is for the interrogator to identify who the man is, and who the woman is. In this instance the goal of the man (A) is to deceive the interrogator, meanwhile the woman (B) can attempt to help the interrogator (C). To make this fair, no verbal cues can be used, instead only typewritten questions and answers are sent back and forth. The question then becomes: How does the interrogator know who to trust?

The interrogator only knows them by the labels X and Y, and at the end of the game he simply states either ‘X is A and Y is B’ or ‘X is B and Y is A’.

The question then becomes, if we remove the man (A) or the woman (B), and replace that person with an intelligent machine, can the machine use its AI system to trick the interrogator (C) into believing that it’s a man or a woman? This is in essence the nature of the Turing Test.

In other words if you were to communicate with an AI system unknowingly, and you assumed that the ‘entity’ on the other end was a human, could the AI deceive you indefinitely?

Why the Turing Test Matters

In his paper, Alan Turing alluded to his belief that the Turing Test could eventually be beaten, predicting that it would happen by around the year 2000. He states: “I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.”

When looking at the Turing Test through a modern lens it seems very possible that an AI system could trick a human for five minutes. How often have humans interacted with support chatbots not knowing if the chatbot is a human or a bot?

There have been many reports of the Turing Test being passed. In 2014, a chatbot program named Eugene Goostman, which simulates a 13-year-old Ukrainian boy, is said to have passed the Turing Test at an event organised by the University of Reading. The chatbot apparently convinced 33% of the judges at the Royal Society in London that it was human. Nonetheless, critics were quick to point out the inadequacies of the test: the fact that so many judges were not convinced, the duration of the test (only five minutes), and the lack of forthcoming evidence for the achievement.

Nonetheless, in an age of Natural Language Processing (NLP), with its subfields of natural-language understanding (NLU) and natural-language interpretation (NLI), the question needs to be asked: if a machine is asking and answering questions without fully understanding the context behind what it says, is the machine truly intelligent?

After all, if you review the technology behind Watson, a computer system capable of answering questions posed in natural language, developed by IBM to defeat Jeopardy champions, it becomes apparent that Watson was able to beat the world champions by accessing all of the world’s knowledge via the internet, without actually understanding the context behind this language. Similar to a search engine, it matched keywords and reference points. If an AI can achieve this level of performance without genuine comprehension, then we should consider that, based on today’s advancing technology, deceiving a human for 5 or 10 minutes is simply not setting the bar high enough.

Should the Turing Test Evolve?

The Turing Test has done a remarkable job of standing the test of time. Nonetheless, AI has evolved dramatically since 1950. Every time AI achieves a feat we claimed only humans were capable of, we set the bar higher. It will only be a matter of time until AI is able to consistently pass the Turing Test as we understand it.

When reviewing the history of AI, the ultimate barometer of whether or not AI can achieve human-level intelligence has almost always been whether it can defeat humans at various games. In 1949, Claude Shannon published his thoughts on how a computer might be made to play chess, as chess was then considered the ultimate summit of human intelligence.

It wasn’t until February 10, 1996, after a grueling three-hour match, that world chess champion Garry Kasparov lost the first game of a six-game match against Deep Blue, an IBM computer capable of evaluating 200 million moves per second. It wasn’t long until chess was no longer considered the pinnacle of human intelligence. Chess was then replaced with the game of Go, a game which originated in China over 3,000 years ago. The bar for AI achieving human-level intelligence was moved up.

Fast forward to October 2015: AlphaGo played its first match against the reigning three-time European Champion, Mr Fan Hui, and became the first program ever to defeat a Go professional, winning the match 5-0. Go is considered to be the most sophisticated game in the world, with its 10³⁶⁰ possible moves. All of a sudden the bar was moved up again.

Eventually the argument was that an AI had to be able to defeat teams of human players at complex multiplayer video games such as Dota 2. OpenAI quickly rose to the challenge by using deep reinforcement learning.

It is due to this consistent moving of the proverbial bar that we should reconsider a new, modern definition of the Turing Test. The current test may rely too much on deception and on the technology inside a chatbot. Potentially, with the evolution of robotics, we may require that to truly achieve human-level intelligence an AI will need to interact with and “live” in our actual world, rather than in a game environment or a simulated environment with its defined rules.

If, instead of deceiving us, a robot can interact with us like any other human, by having conversations and proposing ideas and solutions, maybe only then will the Turing Test be passed. The ultimate version of the Turing Test may be when an AI approaches a human and attempts to convince us that it is self-aware.

At this point, we will also have achieved Artificial General Intelligence (AGI). It would then be inevitable that the AI/robot would rapidly surpass us in intelligence.



What is Data Science?

The field of data science seems to just get bigger and more popular every day. According to LinkedIn, data science was one of the fastest-growing job fields in 2017, and in 2020 Glassdoor ranked data scientist as one of the three best jobs within the United States. Given the growing popularity of data science, it’s no surprise that more people are getting interested in the field. Yet what is data science exactly?

Let’s get acquainted with data science, taking some time to define data science, explore how big data and artificial intelligence are changing the field, learn about some common data science tools, and examine some examples of data science.

Defining Data Science

Before we can explore any data science tools or examples, we’ll want to get a concise definition of data science.

Defining “data science” is actually a little tricky, because the term is applied to many different tasks and methods of inquiry and analysis. We can begin by reminding ourselves of what the term “science” means. Science is the systematic study of the physical and natural world through observation and experimentation, aiming to advance human understanding of natural processes. The important words in that definition are “observation” and “understanding”.

If data science is the process of understanding the world from patterns in data, then the responsibility of a data scientist is to transform data, analyze data, and extract patterns from data. In other words, a data scientist is provided with data and they use a number of different tools and techniques to preprocess the data (get it ready for analysis) and then analyze the data for meaningful patterns.

The role of a data scientist is similar to the role of a traditional scientist. Both are concerned with the analysis of data to support or reject hypotheses about how the world operates, trying to make sense of patterns in the data to improve our understanding of the world. Data scientists make use of the same scientific methods that a traditional scientist does. A data scientist starts by gathering observations about some phenomena they would like to study. They then formulate a hypothesis about the phenomenon in question and try to find data that nullifies their hypothesis in some way.

If the hypothesis isn’t contradicted by the data, they might be able to construct a theory, or model, about how the phenomenon works, which they can go on to test again and again by seeing if it holds true for other similar datasets. If a model is sufficiently robust, if it explains patterns well and isn’t nullified during other tests, it can even be used to predict future occurrences of that phenomenon.

A data scientist typically won’t gather their own data through an experiment. They usually won’t design experiments with controls and double-blind trials to discover confounding variables that might interfere with a hypothesis. Most data analyzed by a data scientist will be data gained through observational studies and systems, which is a way in which the job of a data scientist might differ from the job of a traditional scientist, who tends to perform more experiments.

That said, a data scientist might be called on to do a form of experimentation called A/B testing where tweaks are made to a system that gathers data to see how the data patterns change.
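As a small illustration of what the analysis side of an A/B test might look like, here is a sketch comparing conversion rates between two variants with a two-proportion z-test; the visit and conversion counts are made up.

```python
import math

# Hypothetical A/B test sketch: the visit and conversion counts are invented.
# Variant A is the current system; variant B contains the tweak under test.
visits_a, conversions_a = 5000, 400   # 8.0% conversion
visits_b, conversions_b = 5000, 460   # 9.2% conversion

p_a = conversions_a / visits_a
p_b = conversions_b / visits_b

# Pooled proportion and standard error for a two-proportion z-test
p_pool = (conversions_a + conversions_b) / (visits_a + visits_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visits_a + 1 / visits_b))

z = (p_b - p_a) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")
# A small p-value suggests the change in conversion rate is unlikely to be noise.
```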

Regardless of the techniques and tools used, data science ultimately aims to improve our understanding of the world by making sense out of data, and data is gained through observation and experimentation.  Data science is the process of using algorithms, statistical principles, and various tools and machines to draw insights out of data, insights that help us understand patterns in the world around us.

What Do Data Scientists Do?

You might be noticing that any activity that involves the analysis of data in a scientific manner can be called data science, which is part of what makes defining data science so hard. To make it clearer, let’s explore some of the activities that a data scientist might do on a daily basis.

Data science brings many different disciplines and specialties together. Photo: Calvin Andrus via Wikimedia Commons, CC BY-SA 3.0 (https://commons.wikimedia.org/wiki/File:DataScienceDisciplines.png)

On any given day, a data scientist might be asked to:

  • create data storage and retrieval schemas
  • create data ETL (extract, transform, load) pipelines and clean up data
  • employ statistical methods
  • craft data visualizations and dashboards
  • implement artificial intelligence and machine learning algorithms
  • make recommendations for actions based on the data

Let’s break the tasks listed above down a little.

Data Storage, Retrieval, ETL, and Cleanup

A data scientist may be required to handle the installation of technologies needed to store and retrieve data, paying attention to both hardware and software. The person responsible for this work may also be referred to as a “Data Engineer”, though some companies include these responsibilities under the role of the data scientist. A data scientist may also need to create, or assist in the creation of, ETL pipelines. Data very rarely comes formatted just as a data scientist needs it. Instead, the data will need to be received in a raw form from the data source, transformed into a usable format, and preprocessed (things like standardizing the data, dropping redundancies, and removing corrupted data).
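As a minimal sketch of that transform-and-preprocess step, assuming a hypothetical raw CSV of transactions with the column names shown:

```python
import pandas as pd

# Minimal ETL/cleanup sketch. The file name and column names are hypothetical,
# used only to illustrate typical preprocessing steps.

raw = pd.read_csv("raw_transactions.csv")           # extract: load the raw data

clean = (
    raw
    .drop_duplicates()                               # drop redundant rows
    .dropna(subset=["amount", "customer_id"])        # remove incomplete/corrupted rows
)

# Standardize a numeric column to zero mean and unit variance
clean["amount_std"] = (clean["amount"] - clean["amount"].mean()) / clean["amount"].std()

clean.to_csv("clean_transactions.csv", index=False)  # load: write the usable data
```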

Statistical Methods

The application of statistics is necessary to turn simply looking at data and interpreting it into an actual science. Statistical methods are used to extract relevant patterns from datasets, and a data scientist needs to be well versed in statistical concepts. They need to be able to discern meaningful correlations from spurious correlations by controlling for confounding variables. They also need to know the right tools to use to determine which features in the dataset are important to their model/have predictive power. A data scientist needs to know when to use a regression approach vs. a classification approach, and when to care about the mean of a sample vs. the median of a sample. A data scientist just wouldn’t be a scientist without these crucial skills.
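As a tiny worked example of one of those choices, the mean and the median can tell very different stories when a sample contains an outlier (the numbers below are made up):

```python
import statistics

# Toy example of why the choice between mean and median matters:
# one extreme outlier pulls the mean far more than the median.
incomes = [32_000, 35_000, 38_000, 41_000, 44_000, 1_000_000]

print(statistics.mean(incomes))    # ~198,333: dominated by the outlier
print(statistics.median(incomes))  # 39,500: closer to a "typical" value
```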

Data Visualization

A crucial part of a data scientist’s job is communicating their findings to others. If a data scientist can’t effectively communicate their findings, then the implications of those findings don’t matter. A data scientist should be an effective storyteller as well. This means producing visualizations that communicate relevant points about the dataset and the patterns discovered within it. There are a large number of different data visualization tools that a data scientist might use, and they may visualize data for the purposes of initial, basic exploration (exploratory data analysis) or visualize the results that a model produces.
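A minimal sketch of that kind of exploratory chart, using matplotlib and made-up monthly figures:

```python
import matplotlib.pyplot as plt

# Minimal exploratory-visualization sketch; the monthly figures are made up.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 160, 158, 190, 240]

fig, ax = plt.subplots()
ax.plot(months, signups, marker="o")
ax.set_title("Monthly signups")   # communicate one relevant point per chart
ax.set_xlabel("Month")
ax.set_ylabel("New signups")
plt.show()
```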

Recommendations and Business Applications

A data scientist needs to have some intuition for the requirements and goals of their organization or business. They need to understand these things because they determine which types of variables and features should be analyzed, and which patterns will help the organization achieve its goals. Data scientists also need to be aware of the constraints they are operating under and the assumptions that the organization’s leadership is making.

Machine Learning and AI

Machine learning and other artificial intelligence algorithms and models are tools used by data scientists to analyze data, identify patterns within data, discern relationships between variables, and make predictions about future events.
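As a hedged sketch of what that looks like in practice, here is a tiny scikit-learn example that fits a classifier to synthetic data and then uses it to make predictions about unseen examples:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Tiny sketch: fit a model to synthetic data and use it to predict new cases.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # learn patterns from the data
print(model.score(X_test, y_test))                  # accuracy on unseen examples
print(model.predict(X_test[:3]))                    # predictions about new events
```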

Traditional Data Science vs. Big Data Science

As data collection methods have gotten more sophisticated and databases larger, a difference has arisen between traditional data science and “big data” science.

Traditional data analytics and data science are done with descriptive and exploratory analytics, aiming to find patterns and analyze the performance results of projects. Traditional data analytics methods often focus only on past and current data. Data analysts often deal with data that has already been cleaned and standardized, while data scientists often deal with complex and dirty data. More advanced data analytics and data science techniques might be used to predict future behavior, although this is more often done with big data, as predictive models often need large amounts of data to be reliably constructed.

“Big data” refers to data that is too large and complex to be handled with traditional data analytics techniques and tools. Big data is often collected through online platforms, and advanced data transformation tools are used to make the large volumes of data ready for inspection by data scientists. As more data is collected all the time, more of a data scientist’s job involves the analysis of big data.

Data Science Tools

Common data science tools include tools to store data, carry out exploratory data analysis, model data, carry out ETL, and visualize data. Platforms like Amazon Web Services, Microsoft Azure, and Google Cloud all offer tools to help data scientists store, transform, analyze, and model data. There are also standalone data science tools like Airflow (data infrastructure) and Tableau (data visualization and analytics).

In terms of the machine learning and artificial intelligence algorithms used to model data, they are often provided through data science modules and platforms like TensorFlow, PyTorch, and Azure Machine Learning Studio. These platforms let data scientists make edits to their datasets, compose machine learning architectures, and train machine learning models.

Other common data science tools and libraries include SAS (for statistical modeling), Apache Spark (for the analysis of streaming data), D3.js (for interactive visualizations in the browser), and Jupyter (for interactive, sharable code blocks and visualizations).

Photo: Seonjae Jo via Flickr, CC BY-SA 2.0 (https://www.flickr.com/photos/130860834@N02/19786840570)

Examples of Data Science

Examples of data science and its applications are everywhere. Data science has applications in everything from food delivery and sports to traffic and health. Data is everywhere, and so data science can be applied to almost anything.

In terms of food, Uber is investing in an expansion to its ride-sharing system focused on the delivery of food, Uber Eats. Uber Eats needs to get people their food in a timely fashion, while it is still hot and fresh. In order for this to occur, data scientists for the company need to use statistical modeling that takes into account aspects like distance from restaurants to delivery points, holiday rushes, cooking time, and even weather conditions, all considered with the goal of optimizing delivery times.
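A very rough sketch of that kind of model, with invented features and synthetic data standing in for real delivery records, might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Very rough sketch of a delivery-time model. The features and data are
# invented; a real model would use far more data and richer features.
rng = np.random.default_rng(0)
n = 1000
distance_km = rng.uniform(0.5, 10, n)
cook_minutes = rng.uniform(5, 40, n)
is_raining = rng.integers(0, 2, n)

# Synthetic "actual" delivery times, used only to make the sketch runnable
delivery_minutes = 5 + 3 * distance_km + cook_minutes + 7 * is_raining + rng.normal(0, 3, n)

X = np.column_stack([distance_km, cook_minutes, is_raining])
model = LinearRegression().fit(X, delivery_minutes)

# Estimate delivery time for a hypothetical order: 4 km away, 20 min cook time, raining
print(model.predict([[4.0, 20.0, 1]]))
```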

Sports statistics are used by team managers to determine who the best players are and form strong, reliable teams that will win games. One notable example is the data science documented by Michael Lewis in the book Moneyball, where the general manager of the Oakland Athletics team analyzed a variety of statistics to identify quality players that could be signed to the team at relatively low cost.

The analysis of traffic patterns is critical for the creation of self-driving vehicles. Self-driving vehicles must be able to predict the activity around them and respond to changes in road conditions, like the increased stopping distance required when it is raining, as well as the presence of more cars on the road during rush hour. Beyond self-driving vehicles, apps like Google Maps analyze traffic patterns to tell commuters how long it will take them to get to their destination using various routes and forms of transportation.

In terms of health data science, computer vision is often combined with machine learning and other AI techniques to create image classifiers capable of examining things like X-rays, fMRIs, and ultrasounds to see if there are any potential medical issues that might show up in the scan. These algorithms can be used to help clinicians diagnose disease.

Ultimately, data science covers numerous activities and brings together aspects of different disciplines. However, data science is always concerned with telling compelling, interesting stories from data, and with using data to better understand the world.
