Omri Geller is the CEO and Co-Founder at Run:AI
Run:AI virtualizes and accelerates AI by pooling GPU compute resources to ensure visibility and, ultimately, control over resource prioritization and allocation. This ensures that AI projects are mapped to business goals and yields significant improvement in the productivity of data science teams, allowing them to build and train concurrent models without resource limitations.
What was it that initially attracted you to Artificial Intelligence?
When I began my Bachelor’s degree in Electrical and Electronics Engineering at Tel Aviv University, I discovered fascinating things about AI that I knew would help take us to the next step in computing possibilities. From there, I knew I wanted to invest myself in the AI space, whether in AI research or in founding a company that would introduce new ways to apply AI to the world.
Have you always had an interest in computer hardware?
When I received my first computer with an Intel 486 processor at six or seven years old, I was immediately eager to figure out how everything worked, even though I was probably too young to really understand it. Aside from sports, computers became one of my biggest hobbies growing up. Since then, I have built computers, worked with them, and gone on to study in the field because of the passion I had as a kid.
What was your inspiration behind launching Run:AI?
I knew from very early on that I wanted to invest myself in the AI space. In the last couple of years, the industry has seen tremendous growth in AI, and a lot of that growth came from both computer scientists, like myself, and hardware that could support more applications. It became clear to me that I would inevitably start a company, together with my co-founder Ronen Dar, to continue to innovate and help bring AI to more enterprise companies.
Run:AI enables machine learning specialists to gain a new type of control over the allocation of expensive GPU resources. Can you explain how this works?
What we need to understand is that machine learning engineers, like researchers and data scientists, need to consume computing power in a flexible way. Not only are today’s newest models very compute-intensive, but there are also new workflows being used in data science. These workflows reflect the fact that data science is built on experimentation: running many experiments, observing the results, and iterating.
In order to develop solutions that run experiments more efficiently, we need to study these workflow patterns over time. For example, a data scientist might use eight GPUs one day and zero the next, or use one GPU for a long period and then need 100 GPUs to run 100 experiments in parallel. Once we understand this workflow for optimizing the processing power of one user, we can begin to scale it to several users.
With traditional computing, a specific number of GPUs is allocated to every user, regardless of whether they are in use or not. With this method, expensive GPUs often sit idle without anybody else being able to access them, resulting in low ROI for the GPU. We understand a company’s financial priorities, and offer solutions that allow for dynamic allocation of those resources according to the needs of the users. By offering a flexible system, we can allocate extra power to a specific user when required, by utilizing GPUs not in use by other users, creating maximum ROI for a company’s computing resources and accelerating innovation and time to market of AI solutions.
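The difference between static quotas and dynamic allocation can be sketched in a few lines. This is an illustrative toy model, not Run:AI's actual implementation; the `GpuPool` class, `request`, and `release` names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class GpuPool:
    """Toy model of a shared GPU pool with dynamic allocation."""
    total: int                                  # GPUs in the cluster
    used: dict = field(default_factory=dict)    # user -> GPUs currently held

    def idle(self) -> int:
        return self.total - sum(self.used.values())

    def request(self, user: str, n: int) -> int:
        # Grant up to n GPUs from whatever is currently idle,
        # rather than capping the user at a fixed static quota.
        granted = min(n, self.idle())
        self.used[user] = self.used.get(user, 0) + granted
        return granted

    def release(self, user: str, n: int) -> None:
        self.used[user] = max(0, self.used.get(user, 0) - n)

pool = GpuPool(total=8)
print(pool.request("alice", 8))  # alice bursts to all 8 idle GPUs
pool.release("alice", 8)         # the next day she needs none
print(pool.request("bob", 3))    # the freed GPUs serve another user
```

Under a static scheme, alice's burst would be capped at her fixed quota and bob's request would wait on GPUs sitting idle; pooling lets demand, not ownership, drive allocation.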
One of the Run:AI functionalities is that it enables the reduction of blind spots created by static allocation of GPU. How is this achieved?
We have a tool that gives us full visibility into the cluster of resources. By using this tool, we can observe and understand if there are blind spots, and then utilize those idle GPUs for users that need the allocation. The same tool that provides visibility into and control over the cluster also ensures those blind spots are mitigated.
In a recent speech, you highlighted some distinctions between build and training workflows. Can you explain how Run:AI uses a GPU queueing management mechanism to manage resource allocation for both?
An AI model is built in two stages. First, there is the building stage, where a data scientist is writing the code to build the actual model, the same way that an engineer would build a car. The second is the training stage, where the completed model begins to learn and be ‘trained’ on how to optimize a specific task. Similar to someone learning to drive the car after it has been assembled.
To build the model itself, not much computing power is needed. However, eventually, it could need stronger processing power to begin smaller, internal tests, much as an engineer would want to test the engine before installing it. Because of these distinct needs during each stage, Run:AI allows for GPU allocation regardless of whether a user is building or training the model. As mentioned earlier, though, training a model generally requires more GPU power, while building it requires less.
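A queueing mechanism that distinguishes the two stages can be sketched as follows. This is a simplified illustration under assumed semantics (interactive build jobs dequeued before batch training jobs), not Run:AI's actual scheduler; all names here are hypothetical:

```python
import heapq

BUILD, TRAIN = 0, 1  # build jobs get higher priority (lower value)

class GpuQueue:
    """Toy priority queue: small interactive build jobs start before
    large batch training jobs, within the GPUs currently free."""

    def __init__(self, total_gpus: int):
        self.free = total_gpus
        self.heap = []   # (kind, submission order, name, gpus requested)
        self.seq = 0

    def submit(self, name: str, kind: int, gpus: int) -> None:
        heapq.heappush(self.heap, (kind, self.seq, name, gpus))
        self.seq += 1

    def schedule(self) -> list:
        started = []
        # Launch queued jobs in priority order while enough GPUs remain.
        while self.heap and self.heap[0][3] <= self.free:
            kind, _, name, gpus = heapq.heappop(self.heap)
            self.free -= gpus
            started.append(name)
        return started

q = GpuQueue(total_gpus=4)
q.submit("train-model", TRAIN, gpus=4)   # big batch training job
q.submit("notebook-dev", BUILD, gpus=1)  # small interactive build job
print(q.schedule())  # the build job starts first despite arriving later
```

The build job jumps ahead because interactive work is latency-sensitive, while the training job waits until enough GPUs free up; a real scheduler would add fairness, preemption, and backfill on top of this basic ordering.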
How much raw computing time and resources can be saved by AI developers who wish to integrate Run:AI into their systems?
Our solutions at Run:AI can improve the utilization of resources by about two to three times, meaning 2-3 times better overall productivity.
Thank you for the interview, readers who wish to learn more may visit Run:AI.