Xavier Conort is a visionary data scientist with more than 25 years of data experience. He began his career as an actuary in the insurance industry before transitioning to data science. He’s a top-ranked Kaggle competitor and was the Chief Data Scientist at DataRobot before co-founding FeatureByte.
FeatureByte is on a mission to scale enterprise AI by radically simplifying and industrializing AI data. The feature engineering and management platform empowers data scientists to create and share state-of-the-art features and production-ready data pipelines in minutes – instead of weeks or months.
You began your career as an actuary in the insurance industry before transitioning to data science. What caused this shift?
A defining moment was winning the GE Flight Quest, a competition organized by GE with a $250K prize pool, where participants had to predict delays of US domestic flights. I owe part of that success to a valuable insurance practice: two-stage modeling. This approach helps control bias in features that lack sufficient representation in the available training data. Along with other wins on Kaggle, this achievement convinced me that my actuarial background afforded me a competitive advantage in the field of data science.
During my Kaggle journey, I also had the privilege of connecting with other enthusiastic data scientists, including Jeremy Achin and Tom De Godoy, who would later become the founders of DataRobot. We shared a common background in insurance and had achieved notable successes on Kaggle. When they eventually launched DataRobot, a company specializing in AutoML, they invited me to join them as the Chief Data Scientist. Their vision of combining the best practices from the insurance industry with the power of machine learning excited me, presenting an opportunity to create something innovative and impactful.
At DataRobot, you were instrumental in building their data science roadmap. What type of data challenges did you face?
The most significant challenge we faced was the varying quality of data provided as input to our AutoML solution. This issue often resulted in either time-consuming collaboration between our team and clients or disappointing results in production if not addressed appropriately. The quality issues stemmed from multiple sources that required our attention.
One of the primary challenges arose from the general use of business intelligence tools for data prep and management. While these tools are valuable for generating insights, they lack the capabilities required to ensure point-in-time correctness for machine learning data preparation. As a result, leaks in training data could occur, leading to overfitting and inaccurate model performance.
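To make the point-in-time issue concrete, here is a minimal sketch in plain Python. The balance history, the `value_as_of` and `latest_value` helpers, and the dates are all hypothetical and chosen only for illustration; they are not taken from DataRobot or any BI tool. The sketch contrasts a lookup that respects the observation time with one that simply takes the latest known value, which is how a leak can slip into training data.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one customer's account balance,
# stored as (timestamp, value) pairs sorted by time.
balance_history = [
    (datetime(2023, 1, 1), 100.0),
    (datetime(2023, 2, 1), 250.0),
    (datetime(2023, 3, 1), 80.0),
]

def value_as_of(history, point_in_time):
    """Return the latest value recorded at or before point_in_time, or None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, point_in_time)
    return history[idx - 1][1] if idx > 0 else None

def latest_value(history):
    """Leaky alternative: ignores the observation time entirely."""
    return history[-1][1]

# The training row describes the customer as of mid-February.
observation_time = datetime(2023, 2, 15)
correct = value_as_of(balance_history, observation_time)  # uses the February record
leaky = latest_value(balance_history)                     # uses the March record (future data)
```

Whether a record stamped exactly at the observation time should count as known is a design choice; this sketch includes it, and a stricter pipeline might exclude it.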
Miscommunication between data scientists and data engineers was another challenge that affected the accuracy of models during production. Inconsistencies between the training and production phases, arising from misalignment between these two teams, could impact model performance in a real-world environment.
What were some of the key takeaways from this experience?
My experience at DataRobot highlighted the significance of data preparation in machine learning. By addressing the challenges of generating model training data, such as point-in-time correctness, expertise gaps, domain knowledge, tool limitations, and scalability, we can enhance the accuracy and reliability of machine learning models. I came to the conclusion that streamlining the data preparation process and incorporating innovative technologies will be instrumental in unlocking the full potential of AI and delivering on its promises.
We also heard from your Co-Founder Razi Raziuddin about the genesis story behind FeatureByte. Could we get your version of the events?
When I discussed my observations and insights with my Co-Founder Razi Raziuddin, we realized that we shared a common understanding of the challenges in data preparation for machine learning. During our discussions, I shared with Razi my insights into the recent advancements in the MLOps community. I could observe the emergence of feature stores and feature platforms that AI-first tech companies put in place to reduce the latency of feature serving, encourage feature reuse or simplify feature materialization into training data while ensuring training-serving consistency. However, it was evident to us that there was still a gap in meeting the needs of data scientists. Razi shared with me his insights into how the modern data stack has revolutionized BI and analytics, but is not being fully leveraged for AI.
It became apparent to both Razi and me that we had the opportunity to make a significant impact by radically simplifying the feature engineering process and providing data scientists and ML engineers with the right tools and user experience for seamless feature experimentation and feature serving.
What were some of your biggest challenges in making the transition from data scientist to entrepreneur?
Transitioning from a data scientist to an entrepreneur required me to change from a technical perspective to a broader business-oriented mindset. While I had a strong foundation in understanding pain points, creating a roadmap, executing plans, building a team, and managing budgets, I found that crafting the right messaging that truly resonated with our target audience was one of my biggest obstacles.
As a data scientist, my primary focus had always been on analyzing and interpreting data to derive valuable insights. However, as an entrepreneur, I needed to redirect my thinking towards the market, customers, and the overall business.
Fortunately, I was able to overcome this challenge by drawing on the experience of my Co-Founder Razi.
We heard from Razi about why feature engineering is so difficult. In your view, what makes it so challenging?
Feature engineering has two main challenges:
- Transforming existing columns: This involves converting data into a suitable format for machine learning algorithms. Techniques like one-hot encoding, feature scaling, and advanced methods such as text and image transformations are used. Creating new features from existing ones, like interaction features, can greatly enhance model performance. Popular libraries like scikit-learn and Hugging Face provide extensive support for this type of feature engineering. AutoML solutions aim to simplify the process too.
- Extracting new columns from historical data: Historical data is crucial in problem domains such as recommendation systems, marketing, fraud detection, insurance pricing, credit scoring, demand forecasting, and sensor data processing. Extracting informative columns from this data is challenging. Examples include time since the last event, aggregations over recent events, and embeddings from sequences of events. This type of feature engineering requires domain expertise, experimentation, strong coding and data engineering skills, and deep data science knowledge. Factors like time leakage, handling large datasets, and efficient code execution also need consideration.
Overall, feature engineering requires expertise, experimentation and construction of complex ad-hoc data pipelines in the absence of tools specifically designed for it.
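As a rough illustration of the second kind of feature engineering, the sketch below derives "time since the last event" and simple aggregations over a recent window from a list of events, using only information available before a given observation time. The transaction data, the `history_features` function, and the 7-day window are hypothetical assumptions made for this example; this is plain Python, not the FeatureByte SDK.

```python
from datetime import datetime, timedelta

# Hypothetical transactions for one customer: (timestamp, amount).
events = [
    (datetime(2023, 5, 1, 9, 0), 20.0),
    (datetime(2023, 5, 20, 14, 0), 35.0),
    (datetime(2023, 5, 27, 8, 0), 15.0),
    (datetime(2023, 6, 2, 10, 0), 50.0),  # happens after the observation time below
]

def history_features(events, point_in_time, window=timedelta(days=7)):
    """Build features from events strictly before point_in_time (avoids time leakage)."""
    past = [(ts, amount) for ts, amount in events if ts < point_in_time]
    window_start = point_in_time - window
    recent = [amount for ts, amount in past if ts >= window_start]
    last_ts = max((ts for ts, _ in past), default=None)
    hours_since_last = (
        None if last_ts is None
        else (point_in_time - last_ts).total_seconds() / 3600
    )
    return {
        "hours_since_last_event": hours_since_last,
        "count_in_window": len(recent),
        "sum_in_window": sum(recent),
    }

features = history_features(events, datetime(2023, 6, 1, 8, 0))
```

Even this toy version has to filter by time twice, once to exclude future events and once to bound the window. At scale, the same logic must run per entity and per observation time over very large tables, which is where the data engineering effort comes in.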
Could you share how FeatureByte empowers data science professionals while simplifying feature pipelines?
FeatureByte empowers data science professionals by simplifying the entire feature engineering process. With an intuitive Python SDK, it enables quick feature creation and extraction from XLarge Event and Item Tables. Computation is handled efficiently by leveraging the scalability of data platforms such as Snowflake, Databricks and Spark. Notebooks facilitate experimentation, while feature sharing and reuse save time. Auditing ensures feature accuracy, while immediate deployment eliminates pipeline management headaches.
In addition to these capabilities offered by our open-source library, our enterprise solution provides a comprehensive framework for managing and organizing AI operations at scale, including governance workflows and a user interface for the feature catalog.
What is your vision for the future of FeatureByte?
Our ultimate vision for FeatureByte is to revolutionize the field of data science and machine learning by empowering users to unleash their full creative potential and extract unprecedented value from their data assets.
We are particularly excited about the rapid progress in Generative AI and transformers, which opens up a world of possibilities for our users. Furthermore, we are dedicated to democratizing feature engineering. Generative AI has the potential to lower the barrier of entry for creative feature engineering, making it more accessible to a wider audience.
In summary, our vision for the future of FeatureByte revolves around continuous innovation, harnessing the power of Generative AI, and democratizing feature engineering. We aim to be the go-to platform that enables data professionals to transform raw data into actionable input for machine learning, driving breakthroughs and advancements across industries.
Do you have any advice for aspiring AI entrepreneurs?
Define your space, stay focused and welcome novelty.
By defining the space that you want to own, you can differentiate yourself and establish a strong presence in that area. Research the market, understand the needs and pain points of potential customers, and strive to provide a unique solution that addresses those challenges effectively.
Define your long-term vision and set clear short-term goals that align with that vision. Concentrate on building a strong foundation and delivering value in your chosen space.
Finally, while it's important to stay focused, don't shy away from embracing novelty and exploring new ideas within your defined space. The AI field is constantly evolving, and innovative approaches can open up new opportunities.
Thank you for the great interview. Readers who wish to learn more should visit FeatureByte.