Omer Har is a data science and software engineering veteran with nearly a decade of experience building AI models that drive big businesses forward.
Omer Har is the Co-Founder and CTO of Explorium, a company that offers a first-of-its-kind data science platform powered by augmented data discovery and feature engineering. By automatically connecting to thousands of external data sources and leveraging machine learning to distill the most impactful signals, the Explorium platform empowers data scientists and business leaders to drive decision-making by removing the barriers to acquiring the right data and enabling superior predictive power.
When did you first discover that you wanted to be involved in data science?
My interest in data science goes back over a decade, which is about how long I’ve been practicing and leading data science teams. I started out as a software engineer but was drawn early on to complex data and algorithmic challenges. I was lucky to learn the craft at Microsoft Research, which was one of the few places at the time where you could really work on complex applied machine learning challenges at scale.
You co-founded Explorium in 2017. Could you discuss the inspiration behind launching this start-up?
Explorium is based on a simple and very powerful need: there is so much data around us that could potentially help build better models, but there is no way to know in advance which data sources are going to be impactful, and how. The original idea came from Maor Shlomo, Explorium Co-founder and CEO, who was dealing with unprecedented data variety in his military service and looking for ways to leverage it in decision making and modeling. When the three of us first came together, it was immediately clear that this experience echoed the needs we were dealing with in the business world, particularly in fast-growing, data science-driven fields like advertising and marketing technology unicorns, where both I and Or Tamir (Explorium Co-founder and COO) were leading growth through data.
Before Explorium, finding relevant data sources that really improved your machine learning model’s accuracy was a labor-intensive, time-consuming, and expensive process with low chances of success. The reason is that you are basically guessing, and using your most expensive people, data scientists, to experiment. Moreover, data acquisition itself is a complex business process, and data science teams usually do not have the ability to commercially engage with multiple data providers.
As a data science leader who was measured by the business impact generated by models, I didn’t have the luxury of sending my team on a wild goose chase. As a result, you often end up directing your efforts toward things that have a much lower impact than a relevant new data source, simply because they are much more within your realm of control.
Explorium recently raised an additional $31M in a Series B funding round. Have you been surprised at how fast your company has grown?
It has definitely been a rocket ship ride so far, and you can never take that for granted. I can’t say I was surprised by how widespread the need for better data is, but it’s always an incredible experience to see the impact you generate for customers and their business. The greatest analytical challenge organizations will face over the next decade is finding the right data to feed their models and automated processes. The right data assets can crown new market leaders, so our growth really reflects the rapidly growing number of customers that realize this and are making data a priority. In fact, in our experience, the number of “Data Hunters” (people looking for data as part of their day-to-day job) is growing exponentially.
Could you explain what Explorium’s data platform is and what the automated data discovery process is?
Explorium offers an end-to-end data science platform powered by augmented data discovery and feature engineering. We are focused on the “data” part of data science — which means automatically connecting to thousands of external data sources and leveraging machine learning processes to distill the most impactful signals and features. This is a complex and multi-stage process, which starts by connecting to a myriad of contextually relevant sources in what we call the Explorium data catalog. Then we automate the process that explores this interconnected data variety, by testing hundreds of thousands of ideas for meaningful features and signals to create the optimal feature set, build models on top of it, and serve them to production in flexible ways.
By automating the search for the data you need, not just the data you have internally, the Explorium platform is doing for data science what search engines did for the web: we are scouring, ranking, and bringing you the most relevant data for the predictive question at hand.
This empowers data scientists and business leaders to drive decision-making by eliminating the barrier to acquire the right data and enabling superior predictive power.
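As a rough illustration of the idea described above (enrich internal data with an external source, then rank the resulting candidate features by how strongly they relate to the prediction target), here is a minimal Python sketch. It is not Explorium's implementation: the data, the column names (`company_id`, `years_in_business`, `web_traffic_rank`, `defaulted`), and the use of absolute Pearson correlation as the ranking score are all assumptions made for the example. A real system would more likely score features by their measured effect on a model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def enrich(internal_rows, external_source, key):
    """Left-join external columns onto internal rows via a shared key."""
    return [{**row, **external_source.get(row[key], {})} for row in internal_rows]


def rank_features(rows, candidate_features, target):
    """Score each candidate feature by |correlation| with the target."""
    scores = []
    for feature in candidate_features:
        pairs = [(r[feature], r[target]) for r in rows if r.get(feature) is not None]
        if len(pairs) < 2:
            continue  # too few matched rows to score this feature
        xs, ys = zip(*pairs)
        scores.append((feature, abs(pearson(xs, ys))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical internal data: loan outcomes keyed by company.
internal_rows = [
    {"company_id": "a", "defaulted": 1},
    {"company_id": "b", "defaulted": 0},
    {"company_id": "c", "defaulted": 1},
    {"company_id": "d", "defaulted": 0},
]

# Hypothetical external source keyed by the same identifier.
external_source = {
    "a": {"years_in_business": 1, "web_traffic_rank": 50},
    "b": {"years_in_business": 9, "web_traffic_rank": 40},
    "c": {"years_in_business": 2, "web_traffic_rank": 45},
    "d": {"years_in_business": 8, "web_traffic_rank": 55},
}

rows = enrich(internal_rows, external_source, key="company_id")
ranked = rank_features(rows, ["years_in_business", "web_traffic_rank"], "defaulted")
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy data, `years_in_business` ranks first because it separates defaulted from non-defaulted companies, while `web_traffic_rank` has no linear relationship with the target.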
What types of external data sources does Explorium tap into?
We hold access to thousands of sources across pretty much any data category you can think of including company, geospatial, behavioral, time-based, website data, and more. We have multiple expert teams that specialize in data acquisition through open, public, and premium sources, as well as partnerships. Our access to unique talent out of Israel’s top intelligence and technology units brings substantial know-how and experience in leveraging data variety for decision making.
How does Explorium use machine learning to understand which types of data are relevant to clients?
This is part of our “secret sauce” so I can’t dive in, but on a high level, we use machine learning to understand the meaning behind the different parts of your datasets and employ constantly improving algorithms to identify which sources in our evolving catalog are potentially relevant. By actually connecting these sources to your data, we are able to perform complex data discovery and feature engineering processes, specifically designed to be effective for external and high-dimensional data, to identify the most impactful features from the most relevant sources. Doing it all in the context of machine learning models makes the impact statistically measurable and allows us to constantly learn and improve our matching, generation, and discovery capabilities.
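The interview does not disclose how this matching works, so the following is only a toy Python sketch of the general idea: guess what kind of entity each column in a dataset holds, then look up which catalog sources can be joined on that kind of key. The regular-expression heuristics stand in for the machine learning described above, and the catalog entries (`firmographics`, `email_reputation`, `census_by_zip`) and the 80% threshold are hypothetical.

```python
import re
from collections import Counter

# Simple patterns standing in for a learned semantic-type classifier.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.I),
    "domain": re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.I),
    "us_zip": re.compile(r"^\d{5}$"),
}

# Hypothetical catalog: source name -> the key type it can be joined on.
CATALOG = {
    "firmographics": "domain",
    "email_reputation": "email",
    "census_by_zip": "us_zip",
}


def infer_type(values, min_share=0.8):
    """Return the dominant semantic type of a column, or None if unclear."""
    if not values:
        return None
    votes = Counter()
    for value in values:
        text = str(value).strip()
        for name, pattern in PATTERNS.items():
            if pattern.match(text):
                votes[name] += 1
                break
    if not votes:
        return None
    name, count = votes.most_common(1)[0]
    return name if count / len(values) >= min_share else None


def candidate_sources(table, catalog):
    """Map each recognizable column to the catalog sources joinable on it."""
    matches = {}
    for column, values in table.items():
        column_type = infer_type(values)
        sources = sorted(s for s, key_type in catalog.items() if key_type == column_type)
        if sources:
            matches[column] = sources
    return matches


# Hypothetical customer dataset, stored column-wise.
table = {
    "contact": ["ana@acme.com", "bo@initech.io", "cy@globex.org"],
    "website": ["acme.com", "initech.io", "globex.org"],
    "notes": ["called twice", "n/a", ""],
}

matches = candidate_sources(table, CATALOG)
print(matches)
```

Once candidate sources are identified this way, they could be joined to the dataset and their features evaluated in the context of a model, which is the step the answer above describes as making the impact statistically measurable.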
One of the solutions that is offered is mitigating application fraud risk for online lenders by using augmented data discovery. Could you go into details on how this solution works?
Lending is all about predicting and mitigating risk — whether it comes from the borrower’s ability to repay the loan (e.g. financial performance) or their intention to do so (e.g. fraud). Loan applications are inherently a tradeoff between the lender’s desire to collect more information and their ability to compete with other providers, as longer and more cumbersome questionnaires have lower completion rates, are biased by definition, and so on.
With Explorium, both incumbent banks and online challengers are able to automatically augment the application process with external and objective sources that add immediate context and uncover meaningful relationships. Without giving away too much to help fraudsters, you can imagine that in the context of fraud this could mean different behaviors and properties that stand out versus real applicants if you are able to gather a 360-degree view of them. Everything from online presence and official records to behavioral patterns on social media and physical footprints leaves breadcrumbs that can be hypothesized and tested as potential features and indicators, if you can access the relevant data and vertical know-how. Simply put, better data makes for better predictive models, which translates into reduced risk and higher revenue on lenders’ bottom line.
In a wider view, since COVID-19 hit on a global scale, we’ve been seeing an increase in new fraud patterns as well as lenders’ need to go back to basics, as the pandemic broke all the models. No one really took this sort of a “Black Swan” event into account, and part of our initial response to help these companies has been generating custom signals that help assess business risk in these uncertain and dynamic times.
You can read more about this in an excellent post written by Maor Shlomo, Explorium Co-Founder and CEO.
Thank you for the great interview. Readers who wish to learn more should visit Explorium.