

Hyun Kim, CEO and Co-Founder, Superb AI – Interview Series




Hyun Kim is the CEO and Co-Founder of Superb AI, a company that provides a new-generation machine learning data platform to AI teams so that they can build better AI in less time. The Superb AI Suite is an enterprise SaaS platform built to help ML engineers, product teams, researchers and data annotators create efficient training data workflows.

What initially attracted you to the field of AI, Data Science and Robotics?

As an undergraduate majoring in Biomedical Engineering at Duke, I was passionate about genetics and how we can engineer our DNA to cure diseases or create genetically engineered organisms. I remember one wet-lab experiment distinctly that kept failing for like 6 months straight. The most frustrating part of it was that there was a lot of repetitive manual work, and in hindsight that was probably the root of so many potential errors.

That frustration led me to become interested in anything that has to do with automation.  I basically floated around several labs at Duke until I joined one lab that was researching how machine learning can help diagnose Parkinson’s disease using brain MRI scans. Here I got a real taste for the game-changing potential of Deep Learning networks. I ended up pursuing a PhD program in computer science at Duke, and worked intensively at the Intelligent Robot Lab teaching robots how to learn things.

In 2016, you were involved in the Amazon Robotics Challenge. What were you working on and how did you enjoy this experience?

In Amazon Robotics Challenges, teams earn points by making robots autonomously pick and stow items in a given time. Robots in factories and assembly lines can be engineered to the specific object the robot is picking-and-placing, but the ARC challenges our robots to operate in very dynamic situations. I was the leader of “Team Duke” and its Motion Planning function. I designed and implemented robot motion planning methods for pick-and-place robot manipulation tasks in a realistic warehouse environment. It was an exciting learning experience as we had to put together a multitude of complex systems, from computer-vision-based robot perception systems to motion planning algorithms and custom-designed mechanical end-effector hardware.

You then worked for nearly 2 years as a machine learning research engineer at SK T-Brain, what was this project?

About a year into my PhD, in March of 2016, Google’s AlphaGo defeated the human champion, Lee Sedol, at Go. It was such groundbreaking news, especially in Korea, where Go is much more popular than elsewhere.

After that event, the government and all major companies immediately started investing a ton in AI research. One of the companies was called SKT, or SK Telecom, a major Korean conglomerate. I was offered a Machine Learning Research Engineer position at their brand new AI research lab, called SKT Brain. I took a leave from my PhD, and went back to Korea to work for about 2 years.

The purpose of my team was to do research on various AI topics that could potentially spin off as a product, or create some business opportunity for the company. In those two years, I took some stabs at topics like self-driving cars, game AI (specifically StarCraft AI) and Generative Adversarial Networks or GANs.

After two years, instead of returning to school to finish my PhD program, I left to start my company, Superb AI.

What was the inspiration behind launching Superb AI?

While researching robot learning at school, and also while working at a corporate research lab, it was very apparent to me that most of my time was spent in data curation.

At school, I was spending most of my time creating simulated environments for robotics simulation data. At my previous company, I was spending time collecting and labeling datasets for self-driving and game AI.

And, sadly, that wasn’t just for me. It was the same for my colleagues, and a very common pain point for every researcher and engineer in the machine learning industry.

I wanted to fix that. As you can tell, I’m a big fan of machine learning and AI, and I think it can truly revolutionize our lives. I want tech breakthroughs to happen more quickly, and I want to see them applied to our everyday lives.

In order to make that happen, I initially researched making machine learning algorithms learn with less human input. That got me interested in things like unsupervised learning and meta-learning. After publishing a paper on GANs, I realized what I wanted to do. Instead of publishing research works, I wanted to make an actual product or service that can impact the industry and start solving the data problem right now.

How would you best describe the services that are offered by Superb AI?

Superb AI provides a machine learning data platform called the Superb AI Suite. Suite helps companies create, label and manage training data efficiently, and accelerate their ML Ops cycle.

It’s well known that the majority of machine learning teams spend over 50% of their time managing training datasets. We help engineers easily filter, search and manipulate training datasets, and integrate with their ML Ops stack, such as data storage or deep learning frameworks, through a powerful SDK and APIs.

Product leaders also spend a lot of time with training data. We help make their lives easier through seamless issue tracking, data analytics and many collaboration and productivity related features.

Additionally, our auto-labeling feature, which utilizes many advanced ML techniques such as transfer learning and active learning, can assist the automatic labeling and quality control process.

What has been the most difficult aspect of building a machine learning data platform?

Building a machine learning data platform poses an interesting engineering challenge, not only because machine learning requires a tremendous amount of unstructured data, such as images and videos, but even more because the data must be constantly read and updated by numerous users around the world.

What are some companies that are using the Superb AI platform?

We have clients of varying sizes across many verticals. Large consumer electronics firms including Samsung and LG use our platform to manage data and accelerate their ML development process. Many enterprises and start-ups in the autonomous vehicles industry, as well as companies that utilize unmanned systems in applications from physical security to construction use our platform.

In addition, AR/VR and gaming companies use our training data platform to create and manage datasets that can teach ML models.

In the medical sector, research labs at renowned international universities use our platform to manage training data and more efficiently train Computer Vision models to recognize tumors within MRIs and CT scans.

Superb AI was a member of the Winter 2019 class of Y Combinator, could you describe this experience and what are some of the key takeaways that you learned?

Our two main takeaways were learning to focus on the users, and iterating quickly. YC’s motto is “make something people want”. Oftentimes startups, and especially tech startups, tend to focus on the technical innovations and neglect what the users actually need. Throughout the three month process, we learned to be extremely user-driven — talking to as many users as possible, and really trying to understand what they really need — while also iterating on product updates and messaging to fit the user needs. Since it’s impossible to nail a product or reach product-market-fit on the first try, it’s paramount to constantly talk to users and deliver what they want. It’s very obvious when you think about it, but very hard to focus on in practice.

Is there anything else that you would like to share about Superb AI?

We just beefed up our free product offering that gives users more raw asset storage and more labeling/data management/developer features and toolkits. We are on a real mission to democratize AI and want people to know that there are enterprise companies such as ourselves that are vested in providing as much as possible to further the adoption and integration of AI into our everyday lives.

Thank you for the interview. Readers who wish to learn more should visit Superb AI.


Antoine Tardif is a Futurist who is passionate about the future of AI and robotics. He is the CEO of, and has invested in over 50 AI & blockchain projects. He is the Co-Founder of a news website focusing on digital securities, and is a founding partner of unite.AI. He is also a member of the Forbes Technology Council.

AI 101

What Is Synthetic Data?





Synthetic data is a quickly expanding trend and emerging tool in the field of data science. What is synthetic data exactly? The short answer is that synthetic data is data that isn’t based on any real-world phenomena or events; rather, it’s generated via a computer program. Yet why is synthetic data becoming so important for data science? How is synthetic data created? Let’s explore the answers to these questions.

What is a Synthetic Dataset?

As the term “synthetic” suggests, synthetic datasets are generated through computer programs, instead of being composed through the documentation of real-world events. The primary purpose of a synthetic dataset is to be versatile and robust enough to be useful for the training of machine learning models.

In order to be useful for a machine learning classifier, the synthetic data should have certain properties. While the data can be categorical, binary, or numerical, the length of the dataset should be arbitrary and the data should be randomly generated. The random processes used to generate the data should be controllable and based on various statistical distributions. Random noise may also be placed in the dataset.

If the synthetic data is being used for a classification algorithm, the amount of class separation should be customizable, in order that the classification problem can be made easier or harder according to the problem’s requirements. Meanwhile, for a regression task, non-linear generative processes can be employed to generate the data.
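The properties described above (arbitrary length, controllable randomness, adjustable class separation, optional noise) can be illustrated with a minimal sketch. The function name, its parameters, and the choice of two Gaussian clusters are assumptions made for illustration, not a standard generator.

```python
import random


def make_classification(n_samples, class_sep=1.0, noise_std=1.0,
                        flip_prob=0.0, seed=0):
    """Generate (x, y, label) rows for two classes.

    Class 1 is centred at (+class_sep, +class_sep) and class 0 at
    (-class_sep, -class_sep). A larger class_sep makes the problem easier.
    """
    rng = random.Random(seed)  # seeded so the random process is controllable
    rows = []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        centre = class_sep if label == 1 else -class_sep
        x = rng.gauss(centre, noise_std)
        y = rng.gauss(centre, noise_std)
        # Optional label noise: flip the label with probability flip_prob.
        if rng.random() < flip_prob:
            label = 1 - label
        rows.append((x, y, label))
    return rows


def threshold_accuracy(rows):
    """Accuracy of the naive rule 'predict class 1 when x + y > 0'."""
    correct = sum(1 for x, y, label in rows if (x + y > 0) == (label == 1))
    return correct / len(rows)


easy = make_classification(1000, class_sep=3.0)  # well-separated classes
hard = make_classification(1000, class_sep=0.3)  # heavily overlapping classes
print(threshold_accuracy(easy), threshold_accuracy(hard))
```

Changing only `class_sep` turns the same generator into an easier or harder classification problem, which is the kind of control the paragraph above describes.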

Why Use Synthetic Data?

As machine learning frameworks like TensorFlow and PyTorch become easier to use and pre-designed models for computer vision and natural language processing become more ubiquitous and powerful, the primary problem that data scientists must face is the collection and handling of data. Companies often have difficulty acquiring large amounts of data to train an accurate model within a given time frame. Hand-labeling data is a costly, slow way to acquire data. However, generating and using synthetic data can help data scientists and companies overcome these hurdles and develop reliable machine learning models more quickly.

There are a number of advantages to using synthetic data. The most obvious way that the use of synthetic data benefits data science is that it reduces the need to capture data from real-world events, and for this reason it becomes possible to generate data and construct a dataset much more quickly than a dataset dependent on real-world events. This means that large volumes of data can be produced in a short timeframe. This is especially useful for events that rarely occur: if an event rarely happens in the wild, more data can be mocked up from a few genuine data samples. Beyond that, the data can be automatically labeled as it is generated, drastically reducing the amount of time needed to label data.

Synthetic data can also be useful to gain training data for edge cases, which are instances that may occur infrequently but are critical for the success of your AI. Edge cases are events that are very similar to the primary target of an AI but differ in important ways. For instance, objects that are only partially in view could be considered edge cases when designing an image classifier.

Finally, synthetic datasets can minimize privacy concerns. Attempts to anonymize data can be ineffective, as even if sensitive/identifying variables are removed from the dataset, other variables can act as identifiers when they are combined. This isn’t an issue with synthetic data, as it was never based on a real person, or real event, in the first place.
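To see how remaining variables can act as identifiers once they are combined, consider the following small sketch. The records are invented for illustration, and the helper name `smallest_group` is hypothetical. It counts how many records share each combination of values; a smallest group of size 1 means at least one person is unique on those columns (this count is the "k" in what is commonly called k-anonymity).

```python
from collections import Counter

# Invented records with names already removed.
records = [
    {"zip": "27701", "birth_year": 1985, "sex": "F"},
    {"zip": "27701", "birth_year": 1985, "sex": "M"},
    {"zip": "27701", "birth_year": 1990, "sex": "F"},
    {"zip": "27705", "birth_year": 1985, "sex": "F"},
    {"zip": "27705", "birth_year": 1990, "sex": "M"},
    {"zip": "27705", "birth_year": 1990, "sex": "M"},
]


def smallest_group(rows, columns):
    """Size of the smallest group of rows sharing the same values in columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


print(smallest_group(records, ["zip"]))                       # 3
print(smallest_group(records, ["sex"]))                       # 3
print(smallest_group(records, ["zip", "birth_year", "sex"]))  # 1
```

Each column on its own hides every person in a group of three, but the three columns together single out individual records.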

Use Cases for Synthetic Data

Synthetic data has a wide variety of uses, as it can be applied to just about any machine learning task. Common use cases for synthetic data include self-driving vehicles, security, robotics, fraud protection, and healthcare.

One of the initial use cases for synthetic data was self-driving cars, as synthetic data is used to create training data for cars in conditions where getting real, on-the-road training data is difficult or dangerous. Synthetic data is also useful for the creation of data used to train image recognition systems, like surveillance systems, much more efficiently than manually collecting and labeling a bunch of training data. Robotics systems can be slow to train and develop with traditional data collection and training methods. Synthetic data allows robotics companies to test and engineer robotics systems through simulations. Fraud protection systems can benefit from synthetic data, and new fraud detection methods can be trained and tested with data that is constantly new when synthetic data is used. In the healthcare field, synthetic data can be used to design health classifiers that are accurate, yet preserve people’s privacy, as the data won’t be based on real people.

Synthetic Data Challenges

While the use of synthetic data brings many advantages with it, it also brings many challenges.

When synthetic data is created, it often lacks outliers. Outliers occur in data naturally, and while often dropped from training datasets, their existence may be necessary to train truly reliable machine learning models. Beyond this, the quality of synthetic data can be highly variable. Synthetic data is often generated from input, or seed, data, and therefore the quality of the generated data can depend on the quality of the input data. If the data used to generate the synthetic data is biased, the generated data can perpetuate that bias. Synthetic data also requires some form of output/quality control. It needs to be checked against human-annotated data, or against otherwise authentic data in some form.

How Is Synthetic Data Created?

Synthetic data is created programmatically with machine learning techniques. Classical machine learning techniques like decision trees can be used, as can deep learning techniques. The requirements for the synthetic data will influence what type of algorithm is used to generate the data. Decision trees and similar machine learning models let companies create non-classical, multi-modal data distributions, trained on examples of real-world data. Generating data with these algorithms will provide data that is highly correlated with the original training data. For instances where the typical distribution of data is known, a company can generate synthetic data through use of a Monte Carlo method.
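As a minimal sketch of the Monte Carlo idea, assume a measurement is known to follow a normal distribution with a given mean and standard deviation. Both the numbers and the function name below are assumptions chosen for illustration. Synthetic values are then simply drawn at random from that distribution.

```python
import random
import statistics


def monte_carlo_sample(n, mean=170.0, std=8.0, seed=0):
    """Draw n synthetic values from an assumed normal distribution."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [rng.gauss(mean, std) for _ in range(n)]


synthetic = monte_carlo_sample(10_000)

# With enough samples, the summary statistics approach the assumed parameters.
print(round(statistics.mean(synthetic), 1), round(statistics.stdev(synthetic), 1))
```

The same pattern extends to other known distributions by swapping the sampling call, for example `rng.expovariate` or `rng.lognormvariate` from the standard library.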

Deep learning-based methods of generating synthetic data typically make use of either a variational autoencoder (VAE) or a generative adversarial network (GAN). VAEs are unsupervised machine learning models that make use of encoders and decoders. The encoder portion of a VAE is responsible for compressing the data down into a simpler, compact version of the original dataset, which the decoder then analyzes and uses to generate a representation of the base data. A VAE is trained with the goal of having an optimal relationship between the input data and output, one where both input data and output data are extremely similar.

When it comes to GAN models, they are called “adversarial” networks due to the fact that GANs are actually two networks that compete with each other. The generator is responsible for generating synthetic data, while the second network (the discriminator) operates by comparing the generated data with a real dataset and tries to determine which data is fake. When the discriminator catches fake data, the generator is notified of this and it makes changes to try to get a new batch of data past the discriminator. In turn, the discriminator becomes better and better at detecting fakes. The two networks are trained against each other, with fakes becoming more lifelike all the time.
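The back-and-forth between the two networks can be sketched with a deliberately tiny, one-dimensional toy. Everything here is an assumption made for illustration: the "generator" is just G(z) = a*z + b, the "discriminator" is a logistic function D(x) = sigmoid(w*x + c), the gradients are written out by hand, and the class name `ToyGAN` is hypothetical. Real GANs use deep networks and a framework such as TensorFlow or PyTorch, and even a toy like this may oscillate rather than settle.

```python
import math
import random


def sigmoid(u):
    u = max(-60.0, min(60.0, u))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-u))


class ToyGAN:
    def __init__(self, lr=0.05, seed=0):
        self.rng = random.Random(seed)
        self.a, self.b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
        self.w, self.c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w*x + c)
        self.lr = lr

    def generate(self, n):
        zs = [self.rng.gauss(0.0, 1.0) for _ in range(n)]
        return zs, [self.a * z + self.b for z in zs]

    def discriminate(self, x):
        """Estimated probability that x is real."""
        return sigmoid(self.w * x + self.c)

    def discriminator_step(self, real_batch):
        """One gradient step on -log D(real) - log(1 - D(fake))."""
        _, fake_batch = self.generate(len(real_batch))
        gw = gc = 0.0
        for x in real_batch:
            s = self.discriminate(x)
            gw += (s - 1.0) * x
            gc += s - 1.0
        for x in fake_batch:
            s = self.discriminate(x)
            gw += s * x
            gc += s
        n = len(real_batch)
        self.w -= self.lr * gw / n
        self.c -= self.lr * gc / n

    def generator_step(self, n):
        """One gradient step on -log D(G(z)): try to fool the discriminator."""
        zs, fakes = self.generate(n)
        ga = gb = 0.0
        for z, x in zip(zs, fakes):
            grad_x = (self.discriminate(x) - 1.0) * self.w
            ga += grad_x * z
            gb += grad_x
        self.a -= self.lr * ga / n
        self.b -= self.lr * gb / n


gan = ToyGAN()
data_rng = random.Random(1)
for _ in range(200):
    real = [data_rng.gauss(4.0, 1.0) for _ in range(64)]  # "real" data
    gan.discriminator_step(real)  # discriminator learns to spot fakes
    gan.generator_step(64)        # generator adjusts to get past it
```

Each loop iteration is one round of the contest described above: the discriminator is updated on a mix of real and generated samples, then the generator is nudged in whatever direction makes its samples look more real to the current discriminator.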



The Science of Real-Estate: Matching and Buying




Your data knows you best, let it find your dream home. The real-estate industry sits on tons of data that goes unused every year. In this article, we discuss how advanced technologies are helping real-estate investors, brokers, and companies utilize the mass amount of information within the industry to help people find their dream homes.

In 2017, a Field Actions Science Reports article addressed the impact of AI, machine learning, and predictive analytics on the real-estate sector:

“The practice of AI-powered Urban Analytics is taking off within the real-estate industry. Data science and algorithmic logic are close to the forefront of new urban development practices. How close? is the question — experts predict that digitization will go far beyond intelligent building management systems. New analytical tools with predictive capabilities will dramatically affect the future of urban development, reshaping the real-estate industry in the process.”

Fast forward to 2020: leaving hype traps behind, we acknowledge the transformative effects of data literacy, digitalization strategies, and technology advancements. Predictive analytics, machine learning, and AI-powered applications are still leading innovation in a variety of industries, well beyond the real-estate sector. From the most boring ML applications to the most interesting NLP & OCR automation efforts, industry leaders have learned to leverage these powerful tools to their advantage.

Today we catch up with 3 real-estate use cases. They are meant to illustrate how modern software stacks and intuitive interfaces interplay with Machine Learning and data engineering to create unique products and services.


Home buying processes

Today’s real-estate market poses an interesting machine learning challenge: is there a formula for matching the right home-buyers with the right properties at the right prices? Seeking to build accurate home matching and discovery services is what keeps researchers and industry professionals on their toes. With huge data volumes available to them, and inspired by high accuracy of online recommender systems (Netflix, anyone?), home matching engines are seeing constant development, even in the not-so-technically-inclined real-estate sector. 

Orchard is a broker that leverages modern tech tools to improve home discovery services. By using machine learning algorithms, they come up with an answer to the most pressing question that home buyers ask: “What does my dream house look like?”. Additionally, algorithms may help them answer a follow-up question: “Which compromises am I (not) willing to make?”.

Co-Founder and Chief Product & Marketing Officer, Phil DeGisi clarifies:

Home Match is the first-ever home search algorithm that lets people choose the features that matter most to them. We ask buyers a series of questions about what they value and consider “must-haves” and “nice to haves” in a home – such as a kitchen island, pool in the backyard, and commute time within seconds. Orchard assigns a personal match score to every home in the search area.

Like this, the buyers are matched to legitimate house buying opportunities and the entire process becomes easier for all parties involved. 

Users of house matching systems get to enjoy an experience characterized by increased personalization and usability. Search results are ranked according to their profiles and easy-to-use, interactive interfaces replace plain old real-estate catalogs.
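The idea of ranking homes by "must-haves" and "nice to haves" can be sketched as follows. This is not Orchard's Home Match algorithm, whose details are not public in this article; the scoring rule, the function name `match_score`, and the listings are all hypothetical.

```python
def match_score(home_features, must_haves, nice_to_haves):
    """Score a home from 0 to 100 against a buyer's preferences.

    Assumed rule: a home missing any must-have scores 0; otherwise it
    starts at 50 and earns the rest in proportion to nice-to-haves matched.
    """
    if not must_haves <= home_features:  # some must-have is missing
        return 0.0
    if not nice_to_haves:
        return 100.0
    matched = len(nice_to_haves & home_features)
    return 50.0 + 50.0 * matched / len(nice_to_haves)


# Hypothetical listings and buyer profile.
homes = {
    "Maple St": {"kitchen island", "pool", "garage"},
    "Oak Ave": {"kitchen island", "fenced yard"},
    "Pine Rd": {"pool", "garage"},
}
must_haves = {"kitchen island"}
nice_to_haves = {"pool", "garage"}

scores = {name: match_score(f, must_haves, nice_to_haves)
          for name, f in homes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['Maple St', 'Oak Ave', 'Pine Rd']
```

A production system would presumably learn feature weights from user behavior rather than use a fixed rule, but the ranking step would look similar.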

“Orchard has also developed another industry-first, Photo Switch, which takes these personalized search results and displays them in a more visually useful and personalized way. To do this, Orchard created a machine-learning model to scan photos of every home on the market and determine which rooms are in each photo. This feature is the first of its kind and lets users easily compare their “must-haves” all at once. Whether it’s a chef’s kitchen, a fenced-in backyard, or a cozy living room, home-buyers can now view each room side-by-side in one browser, with the click of a single button.”

Such functionality is only possible due to the seamless interplay of modern tech tools. Web platforms, virtual reality SDKs, image processing algorithms as well as machine learning frameworks all contribute to create a unique real-estate experience.

Commercial real-estate valuations

Another crucial step in commercial real-estate is property valuation. Automated Valuation Models are as old as the industry itself, given the task of evaluating properties and establishing pricing schemes. Traditionally, these models were mostly based on historical sales data. However, models relying on past behavior only are missing out on a lot of other data sources.

Predictive analytics and modern data collection infrastructures are built to integrate external data sources and train algorithms based on heterogeneous data types. Instead of using a single data type that offers a limited perspective on a property, unified data architectures offer a 360-degree view and integrate external data sources: market demand, macroeconomic data, rental values, capital markets, jobs, traffic, etc. Since there are no hard limits to the data that can be used by a property valuation model, predictive analytics is a powerful tool available to real-estate agencies. 

Smart Capital offers such a modern solution to property valuation. They use predictive analytics for the valuation of real-estate properties and promise to deliver a full report within one business day. Their CEO, Laura Krashakova, offers some insights into how they achieve this.

The technology enables data processing and property valuation in real-time and gives individuals access to data previously available only to local brokers. Local insights such as the popularity of the location, amenities in the area, quality of public transport, proximity to major highways, and foot traffic are now readily available and are scored for ease of comparison.

There are two aspects that make such a service possible in the first place: the ease of access and the possibility to deliver real-time insights. Mobile & web platforms make it easy for customers to access, upload, and visualize their data, regardless of their location. All that is needed is an internet connection. At the same time, predictive analytics frameworks are crunching data in real-time, at the speed of milliseconds. Once new data events occur, they are collected and included in the latest analysis report. No need to wait for time-consuming, intensive computations, since all of that computation can now happen almost instantly, in the cloud.

Once again, the interplay of modern technologies makes it possible to offer a seamless experience based on real-time insights. At the same time, the variety of external data sources becomes a guarantee for increased valuation accuracy. This saves time, money, and headaches for all parties involved.

Streamlined loan application processes

Another commercial real-estate process that poses an interesting challenge is the loan application. A challenge not only for the confused homebuyers but for machine learning models as well. Credit approval models need access to all kinds of data, from personal information, to credit history, historical transactions, and employment history. Manually identifying and integrating all these data sources can quickly turn into a tedious, time-consuming, and annoying task. Moreover, manual processing comes with a high risk of erroneous entries throughout the application. These aspects have turned the manual loan application process into a bottleneck for real-estate transactions.

If only some automated solution existed to take some of the pain away…

Beeline is a company focused on streamlining the loan application process. Their intuitive mobile interface guides buyers through loan applications in minutes. The entire process takes only 15 minutes and claims to save home buyers a lot of headaches. The way they do this is incredibly simple: their service connects to a variety of personal data sources (such as the bank, pay and tax info), uses natural language processing (NLP) to read and collect info, and integrates and analyzes all the data in real time. Like this, tedious and time-consuming processes are bypassed and home-buyers can enjoy streamlined loan application processes.

How is that possible, you’re wondering? 

Their service is only possible by integrating a mobile-first experience, intelligent processing capabilities, as well as state of the art user design. Their loan guide is delivered via a chat interface, which gives the users an easy way to find answers to their questions. NLP algorithms are backing these interactions and help create a personalized experience.

At the same time, automated evaluation algorithms run in the background, just as the buyer is filling in forms. This shows how automation is key to the success of their service. And the seamless interplay of tech tools is what makes this automation possible in the first place.

What’s next?

A powerful mix of tech trends is at the forefront of real-estate innovation: increased data availability, advancements in data processing capabilities, and the ubiquity of machine learning algorithms. They all make it possible to tackle the most challenging applications, in an intelligent, automated, and error-free manner. 

On top of that, cloud computing capabilities and modern storage architectures make it possible to extract insights from data in real-time, build complex predictive models, and integrate a variety of data sources. All this makes it possible to foresee the future, innovate, and keep a competitive advantage.

image sources: Canva


Data Science

AI and IoT: Transportation Management in Smart Cities




The Smart Cities of today are powered by advanced technologies that are constantly reshaping urban areas. AI and IoT are becoming increasingly integral to how the world operates. Cloud-based services, the Internet of Things, analytics platforms, and many AI tools are changing the way citizens interact with and move within their environment.

These modern technologies, as outlined by Blue Orange Digital, a top-ranked AI consulting and development agency in NYC, enable applications ranging from waste management to food supply optimization and healthcare digitization. In the process, they are disrupting entire industries and creating new business opportunities and applications. 

Among all urban responsibilities, transportation management poses an interesting problem, even for the most advanced AI tools and technologies. City traffic is a highly dynamic environment, where thousands of participants using different transportation modalities interact in complex manners. On top of that, decisions need to be taken in real-time, in order to ensure the safety and well being of all traffic participants. Activity planning in such an environment is an extremely challenging task. Luckily, AI-powered Smart City technologies are already making great progress in tackling some of the most pressing transportation management issues. 

Below is a list of the most common traffic management solutions that IoT and AI technologies are powering.

Crowdsourced data enables optimized routes for all vehicle types

Data is power, and this holds true especially for city planners: it has become mandatory that their decisions are backed by data. Information about how different city areas are used by the citizens (mobility data) can provide crucial insights into transportation needs. It offers them an accurate overview of how different city pathways are being used and thus increases the chances for more accurate, citizen-friendly planning.

Crowdsourced data is nowadays ubiquitous and originates in a variety of devices. Our smartphones, tablets, laptops and even cars are all constantly emitting geolocation data. A variety of applications are capturing this data and using it to power consumer-facing services. At the same time, analytics frameworks make it straightforward to extract insights from such heterogeneous data sources. By sharing this data with city administration and city planners, it is possible to capitalize on this rich mobility data in order to improve the planning process. 

Think about the most popular bike pathways in your city or the most populated pedestrian areas. Planning without knowledge of how these areas are used would be equivalent to climbing Mount Everest blindfolded, in the dark. Visualization and analytics are definitely needed to bring light to the process and to make sure that all planning decisions are powered by citizen-generated data.
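One simple way to turn raw geolocation pings into a picture of the most used areas is to bin them into grid cells and count. The sketch below is an assumption about how such an analysis might start, not a description of any particular city's pipeline; the coordinates, the cell size, and the function name `busiest_cells` are invented for illustration.

```python
import math
from collections import Counter


def busiest_cells(pings, cell_size=0.01, top=2):
    """Bin (lat, lon) pings into square grid cells; return the most visited.

    Each cell is identified by integer (row, column) indices obtained by
    dividing the coordinates by cell_size (in degrees) and rounding down.
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in pings
    )
    return counts.most_common(top)


# Invented pings: three near one spot, two near another, one elsewhere.
pings = [
    (40.7128, -74.0060), (40.7131, -74.0052), (40.7125, -74.0068),
    (40.7306, -73.9866), (40.7312, -73.9871),
    (40.7580, -73.9855),
]
print(busiest_cells(pings))
# [((4071, -7401), 3), ((4073, -7399), 2)]
```

The resulting counts can feed a heat map, which is the kind of visualization that shows planners where bike paths and pedestrian areas are actually in use.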

The benefits of crowdsourced mobility data can translate into improved walkability and reduced commute times. For bike riders, this translates to optimized routes and greener pathways, while for the car drivers it means less time spent in city centers, waiting for traffic lights and pedestrians. Mobility data makes it a win-win-win for all traffic participants.

Computer vision & AI enable pedestrian and vehicle safety

Ensuring public road safety is a crucial responsibility of transportation management systems. The complex environment created by vehicles and pedestrians needs to be kept under close surveillance, in order to ensure the safety of all traffic participants.

Luckily, technology is available that makes it possible to automate such surveillance tasks and delegate them to software and algorithms. Computer vision and video analytics can be implemented both on roadside cameras and on cars. Algorithms can perform computation on the edge and can detect situational and behavioral abnormalities at the moment when they happen. From the automated reading of license plates to detecting walking patterns, a variety of applications become possible thanks to computer vision. When implemented as part of traffic management systems, they can minimize the high risks associated with careless driving and ensure the safety of public pedestrian areas.

Delegating and automating tasks through software has the potential to create a much safer environment for all traffic participants. Computer vision and video analytics are the leading technologies for efforts in this direction.

IoT Sensors enable accurate traffic monitoring in smart cities

Understanding traffic is a task that needs to be done in real-time, in order to be able to optimize the traffic flow, both within and outside of urban areas. This involves the identification and communication of accidents, congestion, and temporary roadside obstacles, among other traffic events.

Sensor technologies and advanced wireless communication protocols make it possible for all kinds of vehicles to communicate direction, speed, and travel times. There is no limit to the amount of information that they can communicate, given the increased customizability of IoT devices. Not only can they be attached to any moving object, but they also make it possible to collect and communicate contextual information from the environment. 

Sensor-collected data makes it possible to run real-time analytics that power immediate traffic management decisions. One example application is adaptive traffic signals, which do not simply follow a fixed program but take live traffic information into account.
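A minimal sketch of an adaptive signal might split each cycle's green time in proportion to the queues that sensors report on each approach. The proportional rule, the default timings, and the function name `allocate_green_time` are assumptions for illustration; deployed systems use far more elaborate control logic.

```python
def allocate_green_time(queue_lengths, cycle_seconds=90, min_green=10):
    """Split one signal cycle between approaches based on live queue lengths.

    queue_lengths maps an approach name to the number of waiting vehicles
    reported by sensors. Every approach gets at least min_green seconds;
    the remaining time is shared in proportion to queue length.
    """
    n = len(queue_lengths)
    spare = cycle_seconds - min_green * n
    if spare < 0:
        raise ValueError("cycle too short for the minimum green times")
    total = sum(queue_lengths.values())
    if total == 0:  # no traffic detected: share the cycle equally
        return {name: cycle_seconds / n for name in queue_lengths}
    return {name: min_green + spare * q / total
            for name, q in queue_lengths.items()}


# Hypothetical sensor reading: 30 vehicles queued north-south, 10 east-west.
plan = allocate_green_time({"north_south": 30, "east_west": 10})
print(plan)  # {'north_south': 62.5, 'east_west': 27.5}
```

Re-running the allocation every cycle with fresh sensor counts is what makes the signal adaptive rather than fixed.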

The benefits of sensor-based solutions can be translated into active traffic management measures. They enable short-term prediction and control and can lead to reduced congestion and increased traffic fluidity. By helping traffic management institutions cut down on emissions, noise, and travel times, IoT-based sensor technologies play a crucial role in any modern transportation management system.

What’s next for AI and IoT in smart cities?

City planners and engineers are now working in increasingly complex environments and need to solve increasingly complex problems. AI and IoT are helping them tackle these problems. Traffic and transportation management poses a modern challenge that would be tricky to tackle without the help of software and algorithms. Additionally, traffic management plays a crucial role in any Smart City since it can easily affect the proper functioning of all other city functions.

Luckily, modern technologies make it possible to leverage citizen-generated mobility data in order to tackle such complex tasks. With the increased availability of analytics frameworks, cloud services, and data collection devices, it becomes possible to find modern solutions and integrate real-time data as part of traffic management decisions. 

When data is used for decision making and for gaining a better understanding of city travel dynamics, the quality of the management applications also increases. This ensures that traffic control strategies and future infrastructure development projects will accurately match the citizens’ needs. AI and IoT are becoming the new technological norm and that’s a future we are eagerly looking forward to.
