
Big Data

Researchers Are Starting To Train Artificial Intelligence To Combat Hate Speech Online





Fake news and hate speech are becoming not a daily but a minute-by-minute problem online. IkigaiLab reports that Facebook and Twitter recently had to close more than 1.5 billion and 70 million accounts, respectively, just to try to curb the spread of fake news and hate speech around the world.

Still, at the moment, such a task requires enormous human effort and almost constant working hours just to chip away at the tip of the hate speech iceberg. To address the problem, researchers in numerous labs are starting to train artificial intelligence (AI) to help with this enormous task.

Ikigai cites the Rosetta system that Facebook uses to assess the authenticity of the news, images or other content uploaded to the platform. As explained, Rosetta scans “the word, picture, language, font, date of the post amongst other variables and tries to see if the information being presented is genuine or not.” Because AI is still not fully “adept at understanding innuendoes, references, slights and the contexts in which the content was posted,” human moderators take over once the system has gathered this information and guide it in identifying hate speech and fake news.

To further develop the ability of AI systems to cover all the nuances that characterise hate speech, a team of researchers at UC Santa Barbara and Intel, as TheNextWeb (TNW) reports, “took thousands of conversations from the scummiest communities on Reddit and Gab and used them to develop and train AI to combat hate speech.”

According to the report, the joint group of researchers created a dedicated dataset featuring “thousands of conversations specially curated to ensure they’d be chock full of hate speech.” They also used a list of Reddit groups known for hate speech, compiled by Justin Caffier of Vox.

The researchers ended up collecting “more than 22,000 comments from Reddit and over 33,000 from Gab.” They discovered that the two sites show similar popular hate keywords, but the distributions are very different.

They noted that, given these differences and the sheer volume of hate speech, it is very hard for social media platforms in general to intervene in real time, since doing so would require countless human moderators.

To tackle the problem, the research team started to train AI to intervene. Their initial dataset was sent to Amazon Mechanical Turk workers to be labeled. After identifying the individual instances of hate speech, the workers came up with phrases that the AI would use “to deter users from posting similar hate speech in the future.”

Based on that, the team “ran this dataset and its database of interventions through various machine learning and natural language processing systems and created a sort of prototype for an online hate speech intervention AI.”

The results were excellent, but since development is still at an early stage, the system is not yet ready to be actively used. As explained, “the system, in theory, should detect hate speech and immediately send a message to the poster letting them know why they shouldn’t post things that obviously represent hate speech. This relies on more than just keyword detection – in order for the AI to work it has to get the context right.”
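To make the detect-then-intervene loop concrete, here is a minimal sketch. It is not the researchers' system: the placeholder word list, the `Post` type, the `keyword_score` baseline and the intervention message are all hypothetical, chosen only to show the shape of the pipeline and why keyword matching alone falls short.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical placeholder terms; a real system would not rely on a fixed word list.
FLAGGED_TERMS = {"slur_a", "slur_b"}

# Hypothetical wording; in the study, intervention phrases were written by human workers.
INTERVENTION_MESSAGE = "Please reconsider this post: it appears to contain hate speech."


@dataclass
class Post:
    text: str
    context: List[str]  # earlier messages in the same conversation


def keyword_score(post: Post) -> float:
    """Naive baseline: 1.0 if any flagged term appears in the post, else 0.0.

    It ignores post.context entirely, which is exactly its weakness.
    """
    words = {w.strip(".,!?").lower() for w in post.text.split()}
    return 1.0 if words & FLAGGED_TERMS else 0.0


def intervene(
    post: Post,
    score_fn: Callable[[Post], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return an intervention message when the score crosses the threshold."""
    if score_fn(post) >= threshold:
        return INTERVENTION_MESSAGE
    return None


if __name__ == "__main__":
    print(intervene(Post("You are a slur_a!", []), keyword_score))
    print(intervene(Post("Have a nice day", []), keyword_score))
```

Because `intervene` accepts any scoring function, the keyword baseline could be swapped for a trained model that also reads `post.context`. The baseline would flag a post that quotes a slur in order to condemn it and would miss a hateful post that avoids the listed words, which is why context matters.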



Former diplomat and translator for the UN, currently freelance journalist/writer/researcher, focusing on modern technology, artificial intelligence, and modern culture.


Anthony Macciola, Chief Innovation Officer at ABBYY – Interview Series





Anthony is recognized as a thought leader and primary innovator of products, solutions, and technologies for the intelligent capture, RPA, BPM, BI and mobile markets.

ABBYY is an innovator and leader in artificial intelligence (AI) technology including machine learning and natural language processing that helps organizations better understand and drive context and outcomes from their data. The company sets a goal to grow and strengthen its leadership positions by satisfying the ever-increasing demand for AI-enabled products and solutions.

ABBYY has been developing semantic and AI technologies for many years. Thousands of organizations from over 200 countries and regions have chosen ABBYY solutions that transform documents into business value by capturing information in any format. These solutions help organizations of diverse industries boost revenue, improve processes, mitigate risk, and drive competitive advantage.

What got you initially interested in AI?

I first became interested in AI in the 90s. In my role, we were utilizing support vector machines, neural networks, and machine learning engines to create extraction and classification models. At the time, it wasn’t called AI. However, we were leveraging AI to address problems surrounding data and document-driven processes, problems like effectively and accurately extracting, classifying and digitizing data from documents. From very early on in my career, I’ve known that AI can play a key role in transforming unstructured content into actionable information. Now, AI is no longer seen as a futuristic technology but an essential part of our daily lives – both within the enterprise and as consumers. It has become prolific. At ABBYY, we are leveraging AI to help solve some of today’s most pressing challenges. AI and related technologies, including machine learning, natural language processing, neural networks and OCR, help power our solutions that enable businesses to obtain a better understanding of their processes and the content that fuels them.


You’re currently the Chief Innovation Officer at ABBYY. What are some of the responsibilities of this position? 

In my role as Chief Innovation Officer for ABBYY, I’m responsible for our overall vision, strategy, and direction relative to various AI initiatives that leverage machine learning, robotic process automation (RPA), natural language processing and text analytics to identify process and data insights that improve business outcomes.

As CIO, I’m responsible for overseeing the direction of our product innovations as well as identifying outside technologies that are a fit to integrate into our portfolio. I initiated the discussions that led to the acquisition of TimelinePI, now ABBYY Timeline, the only end-to-end Process Intelligence platform in the market. Our new offering enables ABBYY to provide an even more robust and dynamic solution for optimizing the processes a business runs on and the data within those processes. We provide enterprises across diverse industries with solutions to accelerate digital transformation initiatives and unlock new opportunities for providing value to their customers.

I also guide the strategic priorities for the Research & Development and Product Innovation teams. My vision for success with regard to our innovations is guided by the following tenets:

  • Simplification: make everything we do as easy as possible to deploy, consume and maintain.
  • Cloud: leverage the growing demand for our capabilities within a cloud-based SaaS model.
  • Artificial Intelligence: build on our legacy expertise in linguistics and machine learning to ensure we take a leadership role as it relates to content analytics, automation and the application of machine learning within the process automation market.
  • Mobility: ensure we have best-of-breed on-device and zero-footprint mobile capture capabilities.


ABBYY uses AI technologies to solve document-related problems for enterprises using intelligent capture. Could you walk us through the different machine learning technologies that are used for these applications?

ABBYY leverages several AI enabling technologies to solve document-related and process-related challenges for businesses. More specifically, we work with computer vision, neural networks, machine learning, natural language processing and cognitive skills. We utilize these technologies in the following ways:

Computer Vision: utilized to extract, analyze, and understand information from images, including scanned documents.

Neural Networks: leveraged within our capture solutions to strengthen the accuracy of our classification and extraction technology. We also utilize advanced neural network techniques within our OCR offerings to enhance the accuracy and tolerance of our OCR technology.

Machine Learning: enables software to “learn” and improve, which increases accuracy and performance. In a workflow involving capturing documents and then processing with RPA, machine learning can learn from several variations of documents.

Natural Language Processing: enables software to read, interpret, and create actionable and structured data around unstructured content, such as contracts, emails and other free-form communications.

Cognitive Skill: the ability to carry out a given task with determined results within a specific amount of time and cost. Examples within our products include extracting data and classifying a document.


ABBYY Digital Intelligence solutions help organizations accelerate their digital transformation. How do you define Digital Intelligence, how does it leverage RPA, and how do you go about introducing this to clients?

Digital Intelligence means gaining the valuable, yet often hard to attain, insight into an organization’s operation that enables true business transformation. With access to real-time data about exactly how their processes are currently working and the content that fuels them, Digital Intelligence empowers businesses to make tremendous impact where it matters most: customer experience, competitive advantage, visibility, and compliance.

We are educating our clients as to how Digital Intelligence can accelerate their digital transformation projects by addressing the challenges they have with unstructured and semi-structured data that is locked in documents such as invoices, claims, bills of lading, medical forms, etc. Customers focused on implementing automation projects can leverage Content Intelligence solutions to extract, classify, and validate documents to generate valuable and actionable business insights from their data.

Another component of Digital Intelligence is helping customers solve their process-related challenges. Specifically in relation to using RPA, there is often a lack of visibility of the full end-to-end process and consequently there is a failure to consider the human workflow steps in the process and the documents on which they work. By understanding the full process with Process Intelligence, they can make better decisions on what to automate, how to measure it and how to monitor the entire process in production.

We introduce this concept to clients via the specific solutions that make up our Digital Intelligence platform. Content Intelligence enables RPA digital workers to turn unstructured content into meaningful information. Process Intelligence provides complete visibility into processes and how they are performing in real time.


What are the different types of unstructured data that you can currently work with?

We transform virtually any type of unstructured content, from simple forms to complex and free-form documents. Invoices, mortgage applications, onboarding documents, claim forms, receipts, and waybills are common use cases among our customers. Many organizations utilize our Content Intelligence solutions, such as FlexiCapture, to transform their accounts payable operations, enabling companies to reduce the amount of time and costs associated with tedious and repetitive administrative tasks while also freeing up valuable personnel resources to focus on high-value, mission critical responsibilities.


Which type of enterprises best benefit from the solutions offered by ABBYY?

Enterprises of all sizes, industries, and geographic markets can benefit from ABBYY’s Digital Intelligence solutions. In particular, organizations that are very process-oriented and document driven see substantial benefits from our platform. Businesses within the insurance, banking and financial services, logistics, and healthcare sectors experience notable transformation from our solutions.

For financial service institutions, extracting and processing content effectively can enhance application and onboarding operations, and also enable mobile capabilities, which is becoming increasingly important to remain competitive. With Content Intelligence, banks are able to easily capture documents submitted by the customer – including utility bills, pay stubs, W-2 forms – on virtually any device.

In the insurance industry, Digital Intelligence can significantly improve claims processes by identifying, extracting, and classifying data from claim documents then turning this data into information that feeds into other systems, such as RPA.

Digital Intelligence is a cross-industry solution. It enables enterprises of all compositions to improve their processes and generate value from their data, helping businesses increase operational efficiencies and enhance overall profit margins.


Can you give some examples of how clients would benefit from the Digital Intelligence solutions that are offered by ABBYY?

Several recent examples come to mind relating to transforming accounts payable and claims. A billion-dollar manufacturer and distributor of medical supplies was experiencing double-digit sales growth year-over-year. It used ABBYY solutions with RPA to automate its 2,000/day invoices and achieved significant results in productivity and cost efficiencies. Likewise, an insurance company digitized its 150,000+ annual claims processing. From claim setup to invoice clarity it achieved more than 5,000 hours of productivity benefits.

Another example is a multi-billion global logistics company that had a highly manual invoice processing challenge. It had dozens of people processing hundreds of thousands of invoices from 124 different vendors annually. When it first considered RPA for its numerous finance activities, it shied away from invoice processing because of the complexity of semi-structured documents. It used our solutions to extract, classify and validate invoice data, which included machine learning for ongoing training of invoices. If there was data that could not be matched, invoices went to a staff member for verification, but the points that needed to be checked were clearly highlighted to minimize effort. The invoices were then processed in the ERP system using RPA software bots. As a result, its accounts payable operations are now completely automated, and it is able to process thousands of invoices in a fraction of the time with significantly fewer errors.


What are some of the other interesting machine learning powered applications that are offered by ABBYY?

Machine learning is at the heart of our Content Intelligence solutions. ML fuels how we train our classification and extraction technology. We utilize this technology in our FlexiCapture solution to acquire, process, and validate data from documents – even complex or free form ones – and then feed this data into business applications including BPM and RPA. Leveraging machine learning, we are able to transform content-centric processes in a truly advanced way.


Is there anything else that you would like to share about ABBYY?

It goes without saying that these are uncertain and unprecedented times. ABBYY is fully committed to helping businesses navigate these challenging circumstances. It is more important than ever that businesses have what it takes to make timely, intelligent decisions. There is a lot of data coming in and it can be overwhelming. We are committed to making sure organizations are equipped with the technologies they need to deliver outcomes and help customers.

I really enjoyed learning about your work. Anyone who wishes to learn more should visit ABBYY.



Human Genome Sequencing and Deep Learning Could Lead to a Coronavirus Vaccine – Opinion





The AI community must collaborate with geneticists in finding a treatment for those deemed most at risk of coronavirus. A potential treatment could involve removing a person’s cells, editing the DNA and then injecting the cells back in, now hopefully armed with a successful immune response. This is currently being worked on for some other vaccines.

The first step would be sequencing the entire human genome from a sizeable segment of the human population.

Sequencing Human Genomes

Sequencing the first human genome cost $2.7 billion and took nearly 15 years to complete. The cost of sequencing an entire human genome has since dropped dramatically. As recently as 2015 the cost was $4000; now it is less than $1000 per person. This cost could drop a few percentage points more when economies of scale are taken into consideration.

We need to sequence the genome of two different types of patients:

  1. Infected with Coronavirus; but healthy
  2. Infected with Coronavirus; but poor immune response

It is impossible to predict which data point will be most valuable, but each sequenced genome would provide a dataset. The more data, the more options there are to locate DNA variations that increase a body’s resistance to the disease vector.

Nations are currently losing trillions of dollars to this outbreak; the cost of $1000 per human genome is minor in comparison. A minimum of 1,000 volunteers for both segments of the population would arm researchers with significant volumes of big data. Should the trial increase in size by one order of magnitude, the AI would have even more training data which would increase the odds of success by several orders of magnitude. The more data the better, which is why a target of 10,000 volunteers should be aimed for.
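The arithmetic behind these trial sizes is simple enough to sketch. The snippet assumes the article's upper bound of $1000 per genome, reads "both segments" as two equally sized groups, and counts sequencing only; the function name and those assumptions are illustrative.

```python
# Upper bound quoted in the article ("less than $1000 per person").
COST_PER_GENOME_USD = 1_000
# Assumed groups: infected but healthy, and infected with a poor immune response.
GROUPS = 2


def sequencing_cost(
    volunteers_per_group: int,
    cost_per_genome: int = COST_PER_GENOME_USD,
    groups: int = GROUPS,
) -> int:
    """Total sequencing cost in USD, excluding recruitment, storage and compute."""
    return volunteers_per_group * groups * cost_per_genome


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(f"{n} volunteers per group: ${sequencing_cost(n):,}")
```

Under these assumptions, 1,000 volunteers per group comes to $2 million and 10,000 per group to $20 million, both small next to losses measured in trillions.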

Machine Learning

While multiple functionalities of machine learning would be present, deep learning would be used to find patterns in the data. For instance, there might be an observation that certain DNA variables correspond to a high immunity, while others correspond to a high mortality. At a minimum we would learn which segments of the human population are more susceptible and should be quarantined.
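As a much simpler stand-in for the deep learning step, the sketch below compares how often each variant appears in the two patient groups. This is plain frequency counting, not a neural network, and the variant labels and function name are hypothetical; it only illustrates the kind of pattern ("this variant is more common among resilient patients") that a trained model would be asked to find at far larger scale.

```python
from collections import Counter
from typing import Dict, List, Set


def variant_frequency_gap(
    resilient: List[Set[str]],
    vulnerable: List[Set[str]],
) -> Dict[str, float]:
    """For each variant, return the share of resilient carriers minus the
    share of vulnerable carriers. Positive values lean toward resilience."""

    def shares(group: List[Set[str]]) -> Dict[str, float]:
        counts = Counter(v for genome in group for v in genome)
        return {v: c / len(group) for v, c in counts.items()}

    r, v = shares(resilient), shares(vulnerable)
    return {name: r.get(name, 0.0) - v.get(name, 0.0) for name in set(r) | set(v)}


if __name__ == "__main__":
    # Made-up toy genomes, each reduced to a set of hypothetical variant labels.
    resilient_group = [{"v1", "v2"}, {"v1"}]
    vulnerable_group = [{"v2"}, {"v3"}]
    gaps = variant_frequency_gap(resilient_group, vulnerable_group)
    for name in sorted(gaps, key=gaps.get, reverse=True):
        print(name, gaps[name])
```

A difference in frequency is only a starting point: with real data, any candidate variant would need statistical testing and biological validation before it could inform a treatment.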

To decipher this data an Artificial Neural Network (ANN) would be located on the cloud, and sequenced human genomes from around the world would be uploaded. With time being of the essence, parallel computing will reduce the time required for the ANN to work its magic.

We could even take it one step further and feed the output data sorted by the ANN into a separate system called a Recurrent Neural Network (RNN). The RNN uses reinforcement learning to identify which gene selected by the initial ANN is most successful in a simulated environment. The reinforcement learning agent would gamify the entire process of creating a simulated setting, to test which DNA changes are more effective.

A simulated environment is like a virtual game environment, something many AI companies are well positioned to take advantage of based on their previous success in designing AI algorithms to win at esports. This includes companies such as DeepMind and OpenAI.

These companies can use their underlying architecture, optimized for mastering video games, to create a simulated environment, test gene edits, and learn which edits lead to specific desired changes.

Once a gene is identified, another technology is used to make the edits.


Recently, the first ever study using CRISPR to edit DNA inside the human body was approved. This was to treat a rare type of genetic disorder that affects one of every 100,000 newborns. The condition can be caused by mutations in as many as 14 genes that play a role in the growth and operation of the retina. In this case, CRISPR sets out to carefully target DNA and to cause slight temporary damage to the DNA strand, causing the cell to repair itself. It is this restorative healing process which has the potential to restore eyesight.

While we are still waiting for results on whether this treatment will work, the precedent of having CRISPR approved for trials in the human body is transformational. Potential applications include improving a body’s immune response to specific disease vectors.

Potentially, we can manipulate the body’s natural genetic resistance to a specific disease. The diseases that could potentially be targeted are diverse, but the community should be focusing on the treatment of the new global epidemic, coronavirus, a threat that if left unchecked could be a death sentence for a large percentage of our population.


While there are many potential paths to success, it will require that geneticists, epidemiologists, and machine learning specialists unify. A potential treatment option may be as described above, or may turn out to be unimaginably different; either way, the opportunity lies in the genome sequencing of a large segment of the population.

Deep learning is the best analysis tool that humans have ever created; we need to at a minimum attempt to use it to create a vaccine.

When we take into consideration what is currently at risk with this current epidemic, these three scientific communities need to come together to work on a cure.



How AI Predicted Coronavirus and Can Prevent Future Pandemics – Opinion





BlueDot AI Prediction

On January 6th, the US Centers for Disease Control and Prevention (CDC) notified the public that a flu-like outbreak was propagating in Wuhan City, in the Hubei Province of China. Subsequently, the World Health Organization (WHO) released a similar report on January 9th.

While these responses may seem timely, they were slow when compared to an AI company called BlueDot. BlueDot released a report on December 31st, a full week before the CDC released similar information.

Even more impressive, BlueDot predicted the Zika outbreak in Florida six months before the first case in 2016.

What are some of the datasets that BlueDot analyzes?

  • Disease surveillance, which includes scanning 10,000+ media and public sources in over 60 languages.
  • Demographic data from national censuses and national statistical reports. (Population density is a factor behind virus propagation.)
  • Real-time climate data from NASA, NOAA, etc. (Viruses spread faster in certain environmental conditions.)
  • Insect vectors and animal reservoirs. (Important when a virus can spread from species to species.)

BlueDot currently works with various government agencies and health organizations including Global Affairs Canada, the Public Health Agency of Canada, the Canadian Medical Association, and the Singapore Ministry of Health. The BlueDot Insights product sends near real-time infectious disease alerts. Some advantages behind this product include:

  • Reducing the risk of exposure to frontline healthcare workers
  • Global visibility enables time saving on infectious disease surveillance
  • Opportunity to communicate crucial information clearly before it’s too late.
  • Ability to protect populations from infections

How AI Predictability Could Be Improved

What’s preventing the BlueDot AI and similar AIs from improving? The number one limiting factor is the inability to access the necessary big data in real time.

These types of predictive systems rely on big data feeding into an artificial neural network (ANN), which uses deep learning to search for patterns. The more data that is fed into this ANN, the more accurate the machine learning algorithm becomes.

This essentially means that what prevents the AI from flagging a potential outbreak sooner rather than later is simply a lack of access to the necessary data. In countries like China, which regularly monitor and filter news, these delays in accessing the necessary data are even more pronounced. Censoring can significantly reduce the amount of available data and, worse, can destroy its accuracy, which removes its potential usefulness. Faulty data was even why previous efforts such as Google Flu Trends failed.

In other words, the major problem preventing AI systems from predicting an outbreak as early as possible is government interference. Governments like China’s, and the current Trump administration, need to remove themselves from any type of data filtering and give the press full access to report on global health issues.

That being stated, reporters can only work with the information that is available to them. Bypassing news reports and accessing sources directly would enable machine learning systems to access data in a timelier and more efficient fashion.

What Needs to be Done

Starting immediately, governments that are truly interested in reducing the cost of healthcare and preventing an outbreak should begin a mandatory review of how their health clinics and hospitals can distribute certain datapoints in real time to officials, reporters and AI systems.

Individual private information can be completely stripped from each patient, enabling the patient to remain anonymous while the important data is shared.
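One simple way to picture this step is an allow-list: only fields explicitly marked as shareable leave the hospital. The field names below are hypothetical, and dropping direct identifiers does not by itself guarantee anonymity, so treat this as a sketch rather than a compliance recipe.

```python
from typing import Any, Dict

# Hypothetical field names; a real schema and legal definition of identifying data will differ.
SHAREABLE_FIELDS = ("hospital_id", "arrival_hour", "symptoms", "age_band")


def strip_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allow-listed fields, dropping names, addresses, ID numbers and anything unlisted."""
    return {key: record[key] for key in SHAREABLE_FIELDS if key in record}


if __name__ == "__main__":
    raw = {
        "name": "Jane Doe",           # made-up example values
        "health_card": "0000-000-000",
        "hospital_id": "H1",
        "arrival_hour": 13,
        "symptoms": ["fever", "cough"],
        "age_band": "60-69",
    }
    print(strip_identifiers(raw))
```

An allow-list fails safe: a new field added to the record later is withheld until someone decides it can be shared, whereas a block-list would leak it by default.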

A network of hospitals in any city that collects data in real time and shares it would be able to offer superior healthcare. For example, it could be tracked that a specific hospital has shown an increase in patients with flu-like symptoms, from 3 patients at 10:00 AM, to 7 patients at 1:00 PM, to 49 patients by 5:00 PM. This data could be compared with that of hospitals in the same region, for immediate alerts that a certain region is a potential hot zone.
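The hospital example above can be sketched as a simple surge check. The thresholds, hospital labels and function names are hypothetical, and a production system would use proper statistical baselines; the point is only to show how shared counts could turn into an alert.

```python
from typing import Dict, List, Tuple

# (time label, number of patients with flu-like symptoms seen so far that day)
Reading = Tuple[str, int]


def surge_ratio(readings: List[Reading]) -> float:
    """Ratio of the latest count to the earliest count in the window."""
    first, last = readings[0][1], readings[-1][1]
    return last / max(first, 1)  # avoid dividing by zero on a quiet morning


def hot_zones(
    by_hospital: Dict[str, List[Reading]],
    min_ratio: float = 5.0,   # hypothetical threshold
    min_count: int = 20,      # hypothetical threshold
) -> List[str]:
    """Return hospitals whose latest count is both large and rising sharply."""
    return sorted(
        hospital
        for hospital, readings in by_hospital.items()
        if readings
        and readings[-1][1] >= min_count
        and surge_ratio(readings) >= min_ratio
    )


if __name__ == "__main__":
    data = {
        # Counts from the article's example.
        "Hospital A": [("10:00", 3), ("13:00", 7), ("17:00", 49)],
        # Made-up comparison hospital with a flat trend.
        "Hospital B": [("10:00", 4), ("13:00", 5), ("17:00", 6)],
    }
    print(hot_zones(data))
```

With the article's numbers, Hospital A rises from 3 to 49 patients and is flagged, while the flat comparison hospital is not. Requiring both a minimum count and a minimum ratio keeps a jump from 1 patient to 5 from triggering a regional alert.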

Once this information is collected and assembled, the AI system could trigger alerts to all neighboring regions so that necessary precautions can be made.

While this would be difficult in certain regions of the world, countries with large AI hubs and smaller population densities such as Canada could institute such an advanced system. Canada has AI hubs in the most populated provinces (Waterloo and Toronto, Ontario, and Montreal, Quebec). The advantages of this inter-hospital and inter-provincial cooperation could be extended to offer Canadians other benefits such as accelerated access to emergency medical care, and reduced healthcare spending. Canada could become a leader in both AI and healthcare, licensing this technology to other jurisdictions.

Most importantly, once a country such as Canada has a system in place, the technology/methodologies can then be cloned and exported to other regions. Eventually, the goal would be to blanket the entire world, to ensure outbreaks are a relic of the past.

This type of data collection by healthcare workers has benefits for multiple applications. There is no reason why, in 2020, a patient should have to register with each hospital individually, or why those same hospitals are not communicating with one another in real time. This lack of communication can result in the loss of data for patients who suffer from dementia, or other symptoms which may prevent them from fully communicating the severity of their condition, or even where else they have been treated.

Lessons Learned

We can only hope that governments around the world take advantage of the important lessons that coronavirus is teaching us. Humanity should consider itself lucky that coronavirus has a relatively mild fatality rate compared to some infectious agents of the past, such as the Black Plague, which is estimated to have killed 30% to 60% of Europe’s population.

Next time we might not be so lucky. What we do know so far is that governments are currently ill-equipped to deal with the severity of an outbreak.

BlueDot was conceived in the wake of Toronto’s 2003 SARS outbreak and launched in 2013. The goal was to protect people around the world from infectious diseases with human and artificial intelligence. The AI component has demonstrated remarkable ability to predict the path of infectious diseases; what remains is the human component. We need new policies in place in order to enable companies such as BlueDot to excel at what they do best. As people, we need to demand more from our politicians and healthcare providers.
