Anthony is recognized as a thought leader and primary innovator of products, solutions, and technologies for the intelligent capture, RPA, BPM, BI and mobile markets.
ABBYY is an innovator and leader in artificial intelligence (AI) technology including machine learning and natural language processing that helps organizations better understand and drive context and outcomes from their data. The company sets a goal to grow and strengthen its leadership positions by satisfying the ever-increasing demand for AI-enabled products and solutions.
ABBYY has been developing semantic and AI technologies for many years. Thousands of organizations from over 200 countries and regions have chosen ABBYY solutions that transform documents into business value by capturing information in any format. These solutions help organizations of diverse industries boost revenue, improve processes, mitigate risk, and drive competitive advantage.
What got you initially interested in AI?
I first became interested in AI in the 90s. In my role, we were utilizing support vector machines, neural networks, and machine learning engines to create extraction and classification models. At the time, it wasn’t called AI. However, we were leveraging AI to address problems surrounding data and document-driven processes, problems like effectively and accurately extracting, classifying and digitizing data from documents. From very early on in my career, I’ve known that AI can play a key role in transforming unstructured content into actionable information. Now, AI is no longer seen as a futuristic technology but an essential part of our daily lives – both within the enterprise and as consumers. It has become prolific. At ABBYY, we are leveraging AI to help solve some of today’s most pressing challenges. AI and related technologies, including machine learning, natural language processing, neural networks and OCR, help power our solutions that enable businesses to obtain a better understanding of their processes and the content that fuels them.
You’re currently the Chief Innovation Officer at ABBYY. What are some of the responsibilities of this position?
In my role as Chief Innovation Officer for ABBYY, I’m responsible for our overall vision, strategy, and direction relative to various AI initiatives that leverage machine learning, robotic process automation (RPA), natural language processing and text analytics to identify process and data insights that improve business outcomes.
As CIO, I’m responsible for overseeing the direction of our product innovations as well as identifying outside technologies that are a fit to integrate into our portfolio. I initiated the discussions that led to the acquisition of TimelinePI, now ABBYY Timeline, the only end-to-end Process Intelligence platform in the market. Our new offering enables ABBYY to provide an even more robust and dynamic solution for optimizing the processes a business runs on and the data within those processes. We provide enterprises across diverse industries with solutions to accelerate digital transformation initiatives and unlock new opportunities for providing value to their customers.
I also guide the strategic priorities for the Research & Development and Product Innovation teams. My vision for success with regard to our innovations is guided by the following tenets:
- Simplification: make everything we do as easy as possible to deploy, consume and maintain.
- Cloud: leverage the growing demand for our capabilities within a cloud-based SaaS model.
- Artificial Intelligence: build on our legacy expertise in linguistics and machine learning to ensure we take a leadership role as it relates to content analytics, automation and the application of machine learning within the process automation market.
- Mobility: ensure we have best-of-breed on-device and zero-footprint mobile capture capabilities.
ABBYY uses AI technologies to solve document-related problems for enterprises using intelligent capture. Could you walk us through the different machine learning technologies that are used for these applications?
ABBYY leverages several AI enabling technologies to solve document-related and process-related challenges for businesses. More specifically, we work with computer vision, neural networks, machine learning, natural language processing and cognitive skills. We utilize these technologies in the following ways:
Computer Vision: utilized to extract, analyze, and understand information from images, including scanned documents.
Neural Networks: leveraged within our capture solutions to strengthen the accuracy of our classification and extraction technology. We also utilize advanced neural network techniques within our OCR offerings to enhance the accuracy and tolerance of our OCR technology.
Machine Learning: enables software to “learn” and improve, which increases accuracy and performance. In a workflow involving capturing documents and then processing with RPA, machine learning can learn from several variations of documents.
Natural Language Processing: enables software to read, interpret, and create actionable and structured data around unstructured content, such as contracts, emails and other free-form communications.
Cognitive Skill: the ability to carry out a given task with determined results within a specific amount of time and cost. Examples within our products include extracting data and classifying a document.
ABBYY Digital Intelligence solutions help organizations accelerate their digital transformation. How do you define Digital Intelligence, how does it leverage RPA, and how do you go about introducing this to clients?
Digital Intelligence means gaining the valuable, yet often hard to attain, insight into an organization’s operation that enables true business transformation. With access to real-time data about exactly how their processes are currently working and the content that fuels them, Digital Intelligence empowers businesses to make tremendous impact where it matters most: customer experience, competitive advantage, visibility, and compliance.
We are educating our clients as to how Digital Intelligence can accelerate their digital transformation projects by addressing the challenges they have with unstructured and semi-structured data that is locked in documents such as invoices, claims, bills of lading, medical forms, etc. Customers focused on implementing automation projects can leverage Content Intelligence solutions to extract, classify, and validate documents to generate valuable and actionable business insights from their data.
Another component of Digital Intelligence is helping customers solve their process-related challenges. Specifically in relation to using RPA, there is often a lack of visibility of the full end-to-end process and consequently there is a failure to consider the human workflow steps in the process and the documents on which they work. By understanding the full process with Process Intelligence, they can make better decisions on what to automate, how to measure it and how to monitor the entire process in production.
We introduce this concept to clients via the specific solutions that make up our Digital Intelligence platform. Content Intelligence enables RPA digital workers to turn unstructured content into meaningful information. Process Intelligence provides complete visibility into processes and how they are performing in real time.
What are the different types of unstructured data that you can currently work with?
We transform virtually any type of unstructured content, from simple forms to complex and free-form documents. Invoices, mortgage applications, onboarding documents, claim forms, receipts, and waybills are common use cases among our customers. Many organizations utilize our Content Intelligence solutions, such as FlexiCapture, to transform their accounts payable operations, enabling companies to reduce the amount of time and costs associated with tedious and repetitive administrative tasks while also freeing up valuable personnel resources to focus on high-value, mission critical responsibilities.
Which type of enterprises best benefit from the solutions offered by ABBYY?
Enterprises of all sizes, industries, and geographic markets can benefit from ABBYY’s Digital Intelligence solutions. In particular, organizations that are very process-oriented and document driven see substantial benefits from our platform. Businesses within the insurance, banking and financial services, logistics, and healthcare sectors experience notable transformation from our solutions.
For financial service institutions, extracting and processing content effectively can enhance application and onboarding operations, and also enable mobile capabilities, which is becoming increasingly important to remain competitive. With Content Intelligence, banks are able to easily capture documents submitted by the customer – including utility bills, pay stubs, W-2 forms – on virtually any device.
In the insurance industry, Digital Intelligence can significantly improve claims processes by identifying, extracting, and classifying data from claim documents then turning this data into information that feeds into other systems, such as RPA.
Digital Intelligence is a cross-industry solution. It enables enterprises of all compositions to improve their processes and generate value from their data, helping businesses increase operational efficiencies and enhance overall profit margins.
Can you give some examples of how clients would benefit from the Digital Intelligence solutions that are offered by ABBYY?
Several recent examples come to mind relating to transforming accounts payable and claims. A billion-dollar manufacturer and distributor of medical supplies was experiencing double-digit sales growth year-over-year. It used ABBYY solutions with RPA to automate its 2,000/day invoices and achieved significant results in productivity and cost efficiencies. Likewise, an insurance company digitized its 150,000+ annual claims processing. From claim setup to invoice clarity it achieved more than 5,000 hours of productivity benefits.
Another example is with a multi-billion global logistics company that had a highly manual invoice processing challenge. It had dozens of people processing hundreds of thousands of invoices from 124 different vendors annually. When it first considered RPA for its numerous finance activities, it shied away from invoice processing because of the complexity of semi-structured documents. It used our solutions to extract, classify and validate invoice data, which included machine learning for ongoing training of invoices. If there was data that could not be matched, invoices went to a staff member for verification, but the points that needed to be checked were clearly highlighted to minimize effort. The invoices were then processed in the ERP system using RPA software bots. As a result, its accounts payable operation is now completely automated and is able to process thousands of invoices in a fraction of the time with significantly fewer errors.
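The routing step described above (unmatched data goes to a person, with the points to check highlighted) can be sketched roughly as follows. This is only an illustration: the function name, the field names, and the idea of comparing against a reference record such as a purchase order are assumptions, not a description of ABBYY's products.

```python
def route_invoice(extracted, reference):
    """Compare extracted invoice fields with a reference record (hypothetical,
    e.g. a purchase order). Return ("auto", []) when everything matches,
    otherwise ("review", fields_to_check) so a person only checks those fields."""
    mismatched = [name for name, value in extracted.items()
                  if reference.get(name) != value]
    return ("review", mismatched) if mismatched else ("auto", [])


# Hypothetical data for illustration only.
purchase_order = {"vendor": "Acme Freight", "total": "1204.50", "currency": "USD"}
invoice = {"vendor": "Acme Freight", "total": "1240.50", "currency": "USD"}

decision, flagged = route_invoice(invoice, purchase_order)
print(decision, flagged)  # review ['total']
```

Invoices that come back as "auto" would continue to the downstream bot, while "review" items carry the list of fields a staff member needs to look at.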
What are some of the other interesting machine learning powered applications that are offered by ABBYY?
Machine learning is at the heart of our Content Intelligence solutions. ML fuels how we train our classification and extraction technology. We utilize this technology in our FlexiCapture solution to acquire, process, and validate data from documents – even complex or free form ones – and then feed this data into business applications including BPM and RPA. Leveraging machine learning, we are able to transform content-centric processes in a truly advanced way.
Is there anything else that you would like to share about ABBYY?
It goes without saying that these are uncertain and unprecedented times. ABBYY is fully committed to helping businesses navigate these challenging circumstances. It is more important than ever that businesses have what it takes to make timely, intelligent decisions. There is a lot of data coming in and it can be overwhelming. We are committed to making sure organizations are equipped with the technologies they need to deliver outcomes and help customers.
I really enjoyed learning about your work. Anyone who wishes to learn more can visit ABBYY.
Artificial Intelligence Enhances Speed of Discoveries For Particle Physics
Researchers at MIT have recently demonstrated that utilizing artificial intelligence to simulate aspects of particle and nuclear physics theories can lead to faster algorithms, and therefore faster discoveries when it comes to theoretical physics. The MIT research team combined theoretical physics with AI models to accelerate the creation of samples that simulate interactions between neutrons, protons, and nuclei.
There are four fundamental forces that govern the universe: gravity, electromagnetism, the weak force, and the strong force. The strong, weak, and electromagnetic forces are studied through particle physics. The traditional method of studying particle interactions requires running numerical simulations of these interactions between particles, typically taking place at 1/10th or 1/100th the size of a proton. These studies can take a long time to complete due to limited computing power, and there are many problems that physicists know how to tackle in theory yet cannot address due to said computational limitations.
MIT Physics professor Phiala Shanahan is the head of a research group that uses machine learning models to create new algorithms that can speed up particle physics studies. The symmetries found within physics theories (features of the physical system that stay constant even as conditions change) can be incorporated into machine learning algorithms to produce algorithms more suited to particle physics studies. Shanahan explained that the machine learning models aren’t being used to process large amounts of data, rather they are being used to integrate particle symmetries, and the inclusion of these attributes within a model means that computations can be done more quickly.
The research project was led by Shanahan and it includes several members of the Theoretical Physics team at NYU, as well as machine-learning researchers from Google DeepMind. The recent study is just one of a series of ongoing and recently completed studies aimed at leveraging the power of machine learning to solve theoretical physics problems that are currently impossible with modern computation schemas. According to MIT graduate student Gurtej Kanwar, the problems that the machine-learning boosted algorithms are trying to solve will help scientists understand more about particle physics, and they are useful in making comparisons against results derived by large-scale particle physics experiments (like those conducted at CERN’s Large Hadron Collider). By comparing the results of the large-scale experiments with the AI algorithms, scientists can get a better idea of how their physics models should be constrained, and when those models break down.
Currently, the only method that scientists can reliably use to investigate the Standard Model of particle physics is one where samples/snapshots are taken of fluctuations occurring in a vacuum. Researchers can gain insight into the properties of the particles and what happens when those particles collide. However, taking samples like this is expensive and it is hoped that AI techniques can make taking samples a cheaper, more efficient process. The snapshots taken of the vacuum can be used much like image training data in a computer vision AI model. The quantum snapshots are used to train a model that can create samples in a much more efficient manner, accomplished by taking samples in an easy-to-sample space and running the samples through the trained model.
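The idea of drawing from an easy-to-sample space and pushing the draws through a trained model can be illustrated with a deliberately tiny toy. This is not the researchers' model: the "snapshots" here are made-up numbers, and the "trained model" is just an affine map with two fitted parameters, chosen only to show the sample-then-transform pattern.

```python
import random
import statistics

random.seed(0)

# Stand-in for expensive "snapshots" (hypothetical toy data, not physics).
snapshots = [random.gauss(3.0, 0.5) for _ in range(2000)]

# "Training": fit the two parameters of the map x = shift + scale * z.
shift = statistics.mean(snapshots)
scale = statistics.stdev(snapshots)


def sample(n):
    """Draw z from a standard normal (easy to sample) and map it through the fitted model."""
    return [shift + scale * random.gauss(0.0, 1.0) for _ in range(n)]


new_samples = sample(5000)
print(statistics.mean(new_samples), statistics.stdev(new_samples))
```

Once the map is fitted, generating further samples costs only a cheap random draw and one arithmetic step each, which is the efficiency argument made in the paragraph above.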
The research has created a framework intended to streamline the process of creating machine-learning models based on physics symmetries. The framework has already been applied to simpler physics problems and the research team is now attempting to scale up their approach to work with cutting edge calculations. As Kanwar explained via Phys.org:
“I think we have shown over the past year that there is a lot of promise in combining physics knowledge with machine learning techniques. We are actively thinking about how to tackle the remaining barriers in the way of performing full-scale simulations using our approach. I hope to see the first application of these methods to calculations at scale in the next couple of years.”
What Is Synthetic Data?
Synthetic data is a quickly expanding trend and emerging tool in the field of data science. What is synthetic data exactly? The short answer is that synthetic data is comprised of data that isn’t based on any real-world phenomena or events, rather it’s generated via a computer program. Yet why is synthetic data becoming so important for data science? How is synthetic data created? Let’s explore the answers to these questions.
What is a Synthetic Dataset?
As the term “synthetic” suggests, synthetic datasets are generated through computer programs, instead of being composed through the documentation of real-world events. The primary purpose of a synthetic dataset is to be versatile and robust enough to be useful for the training of machine learning models.
In order to be useful for a machine learning classifier, the synthetic data should have certain properties. While the data can be categorical, binary, or numerical, the length of the dataset should be arbitrary and the data should be randomly generated. The random processes used to generate the data should be controllable and based on various statistical distributions. Random noise may also be placed in the dataset.
If the synthetic data is being used for a classification algorithm, the amount of class separation should be customizable, in order that the classification problem can be made easier or harder according to the problem’s requirements. Meanwhile, for a regression task, non-linear generative processes can be employed to generate the data.
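A minimal sketch of these properties, using only Python's standard library, might look like the following. The function names and the single-feature, two-class setup are assumptions made for brevity; libraries such as scikit-learn offer a fuller `make_classification` helper with a similar `class_sep` knob.

```python
import random


def make_classification(n_samples, class_sep=1.0, noise=1.0, seed=None):
    """Generate labeled (feature, label) pairs for two classes.
    Class centers sit at -class_sep and +class_sep; `noise` is the Gaussian spread."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        center = class_sep if label == 1 else -class_sep
        data.append((rng.gauss(center, noise), label))
    return data


def threshold_accuracy(data):
    """Accuracy of the trivial rule 'predict class 1 when the feature is positive'."""
    correct = sum((x > 0) == (y == 1) for x, y in data)
    return correct / len(data)


easy = make_classification(5000, class_sep=3.0, seed=1)
hard = make_classification(5000, class_sep=0.3, seed=1)
print(threshold_accuracy(easy), threshold_accuracy(hard))
```

The dataset length is arbitrary, the random process is controlled by a seed and a named distribution, and shrinking `class_sep` makes the same trivial classifier perform noticeably worse, which is the "easier or harder" dial described above.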
Why Use Synthetic Data?
As machine learning frameworks like TensorFlow and PyTorch become easier to use and pre-designed models for computer vision and natural language processing become more ubiquitous and powerful, the primary problem that data scientists must face is the collection and handling of data. Companies often have difficulty acquiring large amounts of data to train an accurate model within a given time frame. Hand-labeling data is a costly, slow way to acquire data. However, generating and using synthetic data can help data scientists and companies overcome these hurdles and develop reliable machine learning models in a quicker fashion.
There are a number of advantages to using synthetic data. The most obvious way that the use of synthetic data benefits data science is that it reduces the need to capture data from real-world events, and for this reason it becomes possible to generate data and construct a dataset much more quickly than a dataset dependent on real-world events. This means that large volumes of data can be produced in a short timeframe. This is especially true for events that rarely occur, as if an event rarely happens in the wild, more data can be mocked up from some genuine data samples. Beyond that, the data can be automatically labeled as it is generated, drastically reducing the amount of time needed to label data.
Synthetic data can also be useful to gain training data for edge cases, which are instances that may occur infrequently but are critical for the success of your AI. Edge cases are events that are very similar to the primary target of an AI but differ in important ways. For instance, objects that are only partially in view could be considered edge cases when designing an image classifier.
Finally, synthetic datasets can minimize privacy concerns. Attempts to anonymize data can be ineffective, as even if sensitive/identifying variables are removed from the dataset, other variables can act as identifiers when they are combined. This isn’t an issue with synthetic data, as it was never based on a real person, or real event, in the first place.
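To see why removing obvious identifiers is not enough, consider a small check of how many records become unique once a few ordinary columns are combined. The records and column names below are invented for illustration.

```python
from collections import Counter

# Hypothetical records with names already removed.
records = [
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "M"},
    {"zip": "02139", "birth_year": 1990, "sex": "F"},
    {"zip": "02140", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
]


def unique_fraction(rows, columns):
    """Fraction of rows whose combination of values in `columns` appears exactly once."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(counts[k] == 1 for k in keys) / len(rows)


print(unique_fraction(records, ["zip"]))                       # 0.2
print(unique_fraction(records, ["zip", "birth_year", "sex"]))  # 0.6
```

In this toy table, one column alone singles out a fifth of the rows, while three columns together single out most of them, even though no column is sensitive on its own.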
Use Cases for Synthetic Data
Synthetic data has a wide variety of uses, as it can be applied to just about any machine learning task. Common use cases for synthetic data include self-driving vehicles, security, robotics, fraud protection, and healthcare.
One of the initial use cases for synthetic data was self-driving cars, as synthetic data is used to create training data for cars in conditions where getting real, on-the-road training data is difficult or dangerous. Synthetic data is also useful for the creation of data used to train image recognition systems, like surveillance systems, much more efficiently than manually collecting and labeling a bunch of training data. Robotics systems can be slow to train and develop with traditional data collection and training methods. Synthetic data allows robotics companies to test and engineer robotics systems through simulations. Fraud protection systems can benefit from synthetic data, and new fraud detection methods can be trained and tested with data that is constantly new when synthetic data is used. In the healthcare field, synthetic data can be used to design health classifiers that are accurate, yet preserve people’s privacy, as the data won’t be based on real people.
Synthetic Data Challenges
While the use of synthetic data brings many advantages with it, it also brings many challenges.
When synthetic data is created, it often lacks outliers. Outliers occur in data naturally, and while often dropped from training datasets, their existence may be necessary to train truly reliable machine learning models. Beyond this, the quality of synthetic data can be highly variable. Synthetic data is often generated from input, or seed, data, and therefore the quality of the data can be dependent on the quality of the input data. If the data used to generate the synthetic data is biased, the generated data can perpetuate that bias. Synthetic data also requires some form of output/quality control. It needs to be checked against human-annotated data, or otherwise authentic data in some form.
How Is Synthetic Data Created?
Synthetic data is created programmatically with machine learning techniques. Classical machine learning techniques like decision trees can be used, as can deep learning techniques. The requirements for the synthetic data will influence what type of algorithm is used to generate the data. Decision trees and similar machine learning models let companies create non-classical, multi-modal data distributions, trained on examples of real-world data. Generating data with these algorithms will provide data that is highly correlated with the original training data. For instances where the typical distribution of data is known, a company can generate synthetic data through use of a Monte Carlo method.
Deep learning-based methods of generating synthetic data typically make use of either a variational autoencoder (VAE) or a generative adversarial network (GAN). VAEs are unsupervised machine learning models that make use of encoders and decoders. The encoder portion of a VAE is responsible for compressing the data down into a simpler, compact version of the original dataset, which the decoder then analyzes and uses to generate a representation of the base data. A VAE is trained with the goal of having an optimal relationship between the input data and output, one where both input data and output data are extremely similar.
When it comes to GAN models, they are called “adversarial” networks due to the fact that GANs are actually two networks that compete with each other. The generator is responsible for generating synthetic data, while the second network (the discriminator) operates by comparing the generated data with a real dataset and tries to determine which data is fake. When the discriminator catches fake data, the generator is notified of this and it makes changes to try to get a new batch of data past the discriminator. In turn, the discriminator becomes better and better at detecting fakes. The two networks are trained against each other, with fakes becoming more lifelike all the time.
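When the distributions are already known, the Monte Carlo approach amounts to repeated random draws from them. The sketch below assumes, purely for illustration, a 2% fraud rate and log-normally distributed transaction amounts; none of these numbers come from a real dataset.

```python
import random

rng = random.Random(42)


def synth_transaction():
    """Draw one synthetic record from assumed, already-known distributions."""
    is_fraud = rng.random() < 0.02  # assumed 2% base rate
    # Assumed log-normal amounts, with a larger typical amount for fraud.
    amount = rng.lognormvariate(5.0 if is_fraud else 3.5, 0.8)
    return {"amount": round(amount, 2), "is_fraud": is_fraud}


dataset = [synth_transaction() for _ in range(10_000)]
fraud_rate = sum(r["is_fraud"] for r in dataset) / len(dataset)
print(len(dataset), fraud_rate)
```

Each record arrives with its label attached, and the dataset can be made as large as needed by changing a single number.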
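The competition between the two networks is commonly written as a minimax objective, as in the original GAN formulation. The symbols are not from this article and are introduced here only for illustration: G is the generator, D is the discriminator's estimated probability that its input is real, p_data is the distribution of real data, and p_z is the noise distribution the generator draws from.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}}\left[ \log\left( 1 - D(G(z)) \right) \right]
```

The discriminator tries to make this value large by scoring real samples near 1 and generated samples near 0, while the generator tries to make it small by producing samples the discriminator scores as real.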
The Science of Real-Estate: Matching and Buying
Your data knows you best, let it find your dream home. The real-estate industry sits on tons of data that goes unused every year. In this article, we discuss how advanced technologies are helping real-estate investors, brokers, and companies utilize the mass amount of information within the industry to help people find their dream homes.
“The practice of AI-powered Urban Analytics is taking off within the real-estate industry. Data science and algorithmic logic are close to the forefront of new urban development practices. How close? is the question — experts predict that digitization will go far beyond intelligent building management systems. New analytical tools with predictive capabilities will dramatically affect the future of urban development, reshaping the real-estate industry in the process.”
Fast forward to 2020: leaving hype traps behind, we acknowledge the transformative effects of data literacy, digitalization strategies, and technology advancements. Predictive analytics, machine learning, and AI-powered applications are still leading innovation in a variety of industries, well beyond the real-estate sector. From the most boring ML applications to the most interesting NLP & OCR automation efforts, industry leaders have learned to leverage these powerful tools to their advantage.
Today we catch up with 3 real-estate use cases. They are meant to illustrate how modern software stacks and intuitive interfaces interplay with Machine Learning and data engineering to create unique products and services.
Home buying processes
Today’s real-estate market poses an interesting machine learning challenge: is there a formula for matching the right home-buyers with the right properties at the right prices? Seeking to build accurate home matching and discovery services is what keeps researchers and industry professionals on their toes. With huge data volumes available to them, and inspired by high accuracy of online recommender systems (Netflix, anyone?), home matching engines are seeing constant development, even in the not-so-technically-inclined real-estate sector.
Orchard is a broker that leverages modern tech tools to improve home discovery services. By using machine learning algorithms, they come up with an answer to the most pressing question that home buyers ask: “What does my dream house look like?”. Additionally, algorithms may help them answer a follow-up question: “Which compromises am I (not) willing to make?”.
Co-Founder and Chief Product & Marketing Officer, Phil DeGisi clarifies:
“Home Match is the first-ever home search algorithm that lets people choose the features that matter most to them. We ask buyers a series of questions about what they value and consider “must-haves” and “nice to haves” in a home – such as a kitchen island, pool in the backyard, and commute time within seconds. Orchard assigns a personal match score to every home in the search area.”
Like this, the buyers are matched to legitimate house buying opportunities and the entire process becomes easier for all parties involved.
Users of house matching systems get to enjoy an experience characterized by increased personalization and usability. Search results are ranked according to their profiles and easy-to-use, interactive interfaces replace plain old real-estate catalogs.
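As a rough illustration of how a personal match score could rank listings, the sketch below weighs "must-haves" three times as heavily as "nice to haves". The weighting, the feature names, and the addresses are all invented; this is not Orchard's algorithm.

```python
def match_score(home_features, must_haves, nice_to_haves):
    """Hypothetical 0-100 score: must-haves weigh 3x as much as nice-to-haves."""
    weights = {feature: 3 for feature in must_haves}
    weights.update({f: 1 for f in nice_to_haves if f not in weights})
    total = sum(weights.values())
    if total == 0:
        return 0
    earned = sum(w for f, w in weights.items() if f in home_features)
    return round(100 * earned / total)


# Invented listings and buyer preferences.
homes = {
    "12 Oak St": {"kitchen island", "pool"},
    "7 Elm Ave": {"kitchen island", "short commute", "fenced yard"},
}
must = {"kitchen island", "short commute"}
nice = {"pool", "fenced yard"}

ranked = sorted(homes, key=lambda h: match_score(homes[h], must, nice), reverse=True)
for address in ranked:
    print(address, match_score(homes[address], must, nice))
```

Sorting by the score gives the profile-based ranking of search results described above, with the home that satisfies both must-haves listed first.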
“Orchard has also developed another industry-first, Photo Switch, which takes these personalized search results and displays them in a more visually useful and personalized way. To do this, Orchard created a machine-learning model to scan photos of every home on the market and determine which rooms are in each photo. This feature is the first of its kind and lets users easily compare their “must-haves” all at once. Whether it’s a chef’s kitchen, a fenced-in backyard, or a cozy living room, home-buyers can now view each room side-by-side in one browser, with the click of a single button.”
Such functionality is only possible due to the seamless interplay of modern tech tools. Web platforms, virtual reality SDKs, image processing algorithms as well as machine learning frameworks all contribute to create a unique real-estate experience.
Commercial real-estate valuations
Another crucial step in commercial real-estate is property valuation. Automated Valuation Models are as old as the industry itself, given the task of evaluating properties and establishing pricing schemes. Traditionally, these models were mostly based on historical sales data. However, models relying on past behavior only are missing out on a lot of other data sources.
Predictive analytics and modern data collection infrastructures are built to integrate external data sources and train algorithms based on heterogeneous data types. Instead of using a single data type that offers a limited perspective on a property, unified data architectures offer a 360-degree view and integrate external data sources: market demand, macroeconomic data, rental values, capital markets, jobs, traffic, etc. Since there are no hard limits to the data that can be used by a property valuation model, predictive analytics is a powerful tool available to real-estate agencies.
Smart Capital offers such a modern solution to property valuation. They use predictive analytics for the valuation of real-estate properties and promise to deliver a full report within one business day. Their CEO, Laura Krashakova, offers some insights into how they achieve this.
“The technology enables data processing and property valuation in real-time and gives individuals access to data previously available only to local brokers. Local insights such as the popularity of the location, amenities in the area, quality of public transport, proximity to major highways, and foot traffic are now readily available and are scored for ease of comparison.”
There are two aspects that make such a service possible in the first place: the ease of access and the possibility to deliver real-time insights. Mobile & web platforms make it easy for customers to access, upload, and visualize their data, regardless of their location. All that is needed is an internet connection. At the same time, predictive analytics frameworks are crunching data in real-time, at the speed of milliseconds. Once new data events occur, they are collected and included in the latest analysis report. No need to wait for time-consuming, intensive computations, since all of that computation can now happen almost instantly, in the cloud.
Once again, the interplay of modern technologies makes it possible to offer a seamless experience based on real-time insights. At the same time, the variety of external data sources becomes a guarantee for increased valuation accuracy. This saves time, money, and headaches for all parties involved.
Streamlined loan application processes
Another commercial real-estate process that poses an interesting challenge is the loan application. A challenge not only for the confused homebuyers but for machine learning models as well. Credit approval models need access to all kinds of data, from personal information, to credit history, historical transactions, and employment history. Manually identifying and integrating all these data sources can quickly turn into a tedious, time-consuming, and annoying task. Moreover, manual processing comes with a high risk of erroneous entries throughout the application. These aspects have turned the manual loan application process into a bottleneck for real-estate transactions.
If only some automated solution existed to take some of the pain away…
Beeline is a company focused on streamlining the loan application process. Their intuitive mobile interface guides buyers through loan applications in minutes. The entire process takes only 15 minutes and claims to save home buyers a lot of headaches. The way they do this is incredibly simple: their service connects to a variety of personal data sources (such as the bank, pay and tax info), uses natural language processing (NLP) to read and collect info, and integrates and analyzes all the data in real-time. Like this, tedious and time-consuming processes are bypassed and home-buyers can enjoy streamlined loan application processes.
How is that possible, you’re wondering?
Their service is only possible by integrating a mobile-first experience, intelligent processing capabilities, as well as state-of-the-art user design. Their loan guide is delivered via a chat interface, which gives the users an easy way to find answers to their questions. NLP algorithms are backing these interactions and help create a personalized experience.
At the same time, automated evaluation algorithms run in the background, just as the buyer is filling in forms. This shows how automation is key to the success of their service. And the seamless interplay of tech tools is what makes this automation possible in the first place.
A powerful mix of tech trends is at the forefront of real-estate innovation: increased data availability, advancements in data processing capabilities, and the ubiquity of machine learning algorithms. They all make it possible to tackle the most challenging applications, in an intelligent, automated, and error-free manner.
On top of that, cloud computing capabilities and modern storage architectures make it possible to extract insights from data in real-time, build complex predictive models, and integrate a variety of data sources. All this makes it possible to foresee the future, innovate, and keep a competitive advantage.
image sources: Canva