Vijay Kurkal serves as the CEO for Resolve where he oversees the strategic growth of the company as it helps maximize the potential of AIOps and IT automation in enterprises around the world. Vijay has a long history in the tech industry, having spent the last twenty years working with numerous software and hardware companies that have run the gamut from mainframe to bleeding-edge, emerging tech. Before joining Resolve, he held leadership positions at IBM, VMware, Bain & Company, and Insight Partners, playing a critical role in accelerating the growth of a wide array of technology companies and introducing state-of-the-art product lines.
You’ve been leading Resolve since 2018, first as COO, and now as CEO. What initially drew you to this company?
There’s a huge need for automation and AIOps today given the challenges that enterprise IT organizations face. These teams are managing increasingly complex, highly virtualized, hybrid environments and are tasked with rapidly implementing new technologies to stay competitive. Without the aid of tools like automation and AIOps, it’s impossible to effectively manage these environments, and the complexity is only going to grow.
Given the tremendous market opportunity, I was immediately drawn to Resolve’s deep roots in automation. Drawing on my 20 years of experience with a wide range of tech companies, I am incredibly excited about the possibility for automation and AIOps to truly transform IT operations. These technologies are game changers for companies — not just to survive, but to thrive in the current environment. As we’ve seen over the last few months as digital transformation has rapidly accelerated, automation is absolutely necessary to succeed. Resolve is uniquely positioned to meet these needs and usher in the next generation of IT operations.
How would you best describe what Resolve offers IT companies?
By combining cutting-edge AIOps capabilities with our industry-leading automation platform, Resolve helps IT teams achieve more agile, autonomous IT operations even as infrastructure continues to expand in scope and complexity. Our unified product offers a closed loop of discovery, analysis, detection, prediction, and automation, including prebuilt automations that can be autonomously triggered by AIOps insights to stay ahead of problems and lighten the load on IT organizations.
Our goal is to help ITOps, NetOps, and Service Desk teams meet the growing demands on IT, streamline operations, reduce costs, improve MTTR and performance, and accelerate service delivery through the power of automation and AIOps.
For readers who are not familiar with the term AIOps, can you explain what this term describes and what makes it so important?
AIOps – or AI for IT Operations – helps streamline the management of complex, hybrid IT environments by deploying AI, machine learning and advanced analytics to aggregate, analyze, and contextualize tremendous amounts of data amassed from various sources across the IT ecosystem. These insights facilitate the identification of existing or potential performance issues while spotting anomalies and pinpointing the probable root cause of incidents. Over time, machine learning can predict future issues and proactively automate fixes before they affect the business.
Additionally, most AIOps tools offer advanced correlation capabilities that help IT pros determine how alarms are related, reducing noise by grouping similar events and bringing the true issues to light, so people can focus on what matters most. Some AIOps solutions also perform auto-discovery and dependency mapping to provide deep visibility into how entities are connected to one another, and how outages might impact critical business services. This offers a wide range of benefits, from keeping your CMDB up-to-date and accurate to accelerating incident response and simplifying troubleshooting, change management, and compliance.
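One AIOps building block described above, spotting performance anomalies in a flood of telemetry, can be sketched in a few lines. The snippet below is a minimal, illustrative example (a rolling z-score over metric samples), not Resolve's actual algorithm; the data and names are hypothetical.

```python
# Minimal sketch of statistical anomaly detection on a metric stream.
# Illustrative only; not Resolve's actual implementation.
from statistics import mean, stdev

def detect_anomalies(samples, window=20, threshold=3.0):
    """Flag points that deviate more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Steady CPU utilisation around 40-42%, then a sudden spike.
cpu = [40.0 + (i % 5) * 0.5 for i in range(30)] + [95.0]
print(detect_anomalies(cpu))  # → [30], the index of the spike
```

A real AIOps platform layers machine-learned baselines and dynamic thresholds on top of this basic idea, but the core question is the same: does this sample deviate meaningfully from recent history?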
What are some of the data challenges faced by IT companies?
By far the biggest data challenge IT organizations face is managing increased complexity caused by exponential infrastructure growth and the daily onslaught of new technologies. Data volumes and alarm noise created by infrastructure growth have far exceeded human capacity to find the needle in the proverbial IT haystack. Gartner estimates a two- to three-fold growth in data volumes per year. To survive in this dynamic environment, it’s critical for IT organizations to embrace AIOps and automation to help them cope with massive amounts of data and to streamline management of new technologies.
How can businesses overcome these challenges using Resolve?
Resolve enables businesses to manage increasing IT complexity with fewer resources through the powerful combination of AIOps and automation. The platform is designed to provide immediate relief, as well as long-term value.
Unlike many other AIOps solutions on the market, customers don’t have to wait months to start seeing value with Resolve. In fact, customers see value within minutes thanks to Resolve’s automated discovery and dependency mapping. These capabilities enable us to generate complete infrastructure visualizations, detailed cross-domain topology maps, application dependency maps, and comprehensive views of inventory. Additionally, Resolve ingests data from many other tools (such as monitoring, event management, ITSM, and logging solutions) and aggregates it with telemetry data collected natively by our own platform. This allows customers to achieve the much-sought-after ‘single pane of glass’ that they need to effectively manage complex, hybrid infrastructure, and it provides significantly richer (and more complete) visibility across domains.
Over the course of several weeks, these insights are enhanced and enriched as Resolve “learns” the environment and leverages machine learning to perform activities like event correlation and clustering, predictive analytics, multivariate anomaly detection, dynamic thresholding, and autonomous remediation – making the product exponentially more intelligent (and valuable) over time.
Our enterprise-class automation capabilities can take action on insights from the AIOps components or can be used independently. Built for the scale and complexity of modern, hybrid environments, the platform can handle everything from simple tasks to very complex processes that go well beyond the capacity of other tools. Combining AIOps with this level of automation offers an unparalleled ability to autonomously predict, prevent, and fix issues before they impact the business, and to radically improve overall operational efficiency.
Can you describe how Resolve makes it easier to investigate security incidents?
Resolve’s automated incident validation quickly determines which alarms are actual threats versus those that are simply false positives. Hours of manual effort are eliminated by automatically collecting data across the IT environment and security tools, including SIEMs, threat feeds, antivirus systems, and logs. All of that data gets unified into a customizable dashboard, so it’s easy to see the problem and determine how to fix it. Resolve centralizes orchestration of the end-to-end triage and investigation workflows to ensure that issues can be addressed quickly. We also capture a full audit trail of incident investigation steps and results to support compliance and governance.
One of Resolve’s features is that it enables IT professionals to filter out ‘noise’ and focus on the real problems. Can you discuss this?
IT pros are bombarded with alarm noise coming in from multiple systems. It’s hard to know where to focus since many of these alarms are false positives, and many others ultimately derive from the same underlying problem.
Take for example the case of an e-commerce system failing. Alarm bells will start ringing everywhere as IT pros frantically sort through multiple data sources to determine whether it’s the network, application, or one of many underlying pieces of infrastructure or services causing the problem. It could take hours to determine that the culprit was high CPU utilization that led to a slowing database and ultimately the failure of the e-commerce system. Even worse, amid all of the alarm noise, the IT teams might altogether miss the events related to the e-commerce system and instead focus on a much lower-priority issue that isn’t revenue related.
Resolve eliminates alarm noise by performing event correlation and clustering. Clustering machine learning algorithms are used to identify and group events (across systems and domains) that usually occur together, which dramatically compresses event volumes. Our platform also leverages sequential pattern analysis and time-series event correlation. Millions of events across applications and infrastructure are normalized and sequenced in a time series and then analyzed by machine learning to identify patterns. These patterns enable Resolve to reduce alarm noise and help pinpoint root cause – as well as proactively detect problems before they happen. Additionally, the time-series correlations can be leveraged to playback all of the events that occurred in a time period leading up to an outage.
In the case of the e-commerce example above, Resolve would be able to cluster all of the alarms related to the application failure, compressing those into a single event. The system could also track the root cause back to a spike in CPU utilization, making it fast and easy for the IT team to fix the issue rather than triaging hundreds of alarms independently as they look under every rock to get to the root of the matter. If desired, Resolve can even trigger an automated response to take care of the problem autonomously without human intervention.
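The compression step can be illustrated with a toy time-window grouper: alarms from different sources that fire close together are folded into a single incident. This is a deliberate simplification, not Resolve's machine-learning pipeline, and the alarm data is hypothetical.

```python
# Illustrative time-window event clustering: nearby alarms across
# sources collapse into one incident, compressing event volume.
def cluster_events(events, window_seconds=60):
    """events: list of (timestamp, source, message) tuples.
    Groups events whose timestamps fall within `window_seconds`
    of the first event in the current cluster."""
    clusters = []
    for ts, source, msg in sorted(events):
        if clusters and ts - clusters[-1][0][0] <= window_seconds:
            clusters[-1].append((ts, source, msg))
        else:
            clusters.append([(ts, source, msg)])
    return clusters

alarms = [
    (100, "db",      "high CPU"),
    (110, "db",      "slow queries"),
    (130, "app",     "request timeouts"),
    (145, "ecom",    "checkout failing"),
    (900, "network", "unrelated link flap"),
]
clusters = cluster_events(alarms)
print(len(alarms), "alarms compressed into", len(clusters), "incidents")
```

Production systems replace the fixed window with learned co-occurrence patterns and cross-domain topology, but the payoff is the same: one incident to triage instead of a wall of alarms.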
Can you give us a case study of how an enterprise client used Resolve?
Fujitsu had a range of drivers for adopting automation to better deliver its suite of IT managed services to a wide range of global enterprises. Chiefly, Fujitsu needed to bring down operational costs while continuing to grow their infrastructure, improve organizational efficiency and standardize processes. We helped them achieve all of those goals by automating key processes, and we helped them improve MTTA and MTTR to ensure they were quickly addressing issues impacting their customers to meet their SLAs.
Is there anything else that you would like to share about Resolve?
Digital transformation has gained momentum in the wake of the global pandemic. We see an incredible need to alleviate the mounting strain on IT systems and staff that the crisis has created. Meanwhile, it’s also apparent that businesses need to be planning ahead for the next unexpected event. Automation and AIOps are both fundamental to achieving those ends as they can help safeguard business continuity and improve agility and resilience while reducing security risks and cost. Our mission is to help our customers excel even during challenging times by strategically leveraging these technologies.
Thank you for your wonderful answers. Anyone who wishes to learn more should visit Resolve.
How Quantum Mechanics will Change the Tech Industry
Richard Feynman once said, “If you think you understand quantum mechanics, then you don’t understand quantum mechanics.” While that may be true, it certainly doesn’t mean we can’t try. After all, where would we be without our innate curiosity?
To understand the power of the unknown, we’re going to untangle the key concepts behind quantum physics — two of them, to be exact (phew!). It’s all rather abstract, really, but that’s good news for us, because you don’t need to be a Nobel-winning theoretical physicist to understand what’s going on. And what’s going on? Well, let’s find out.
Laying the groundwork
We’ll start with a brief thought experiment. Austrian physicist Erwin Schrödinger wants you to imagine a cat in a sealed box. So far, so good. Now imagine a vial containing a deadly substance is placed inside the box. What happens to the cat? We cannot know with certainty. Thus, until the situation is observed, i.e. we open the box, the cat is both dead and alive, or in more scientific terms, it is in a superposition of states. This famous thought experiment is known as the Schrödinger’s cat paradox, and it perfectly illustrates one of the two main phenomena of quantum mechanics.
Superposition dictates that, much like our beloved cat, a particle exists in all possible states up until the moment it is measured. “Observing” the particle immediately destroys its quantum properties, and voilà, it is once again governed by the rules of classical mechanics.
Now, things are about to get trickier, but don’t be deterred — even Einstein was taken aback by the idea. Described by the man himself as “spooky action at a distance”, entanglement is a connection between a pair of particles — a physical interaction that results in a shared state (or lack thereof, if we go by superposition).
Entanglement dictates that a change in the state of one entangled particle triggers an immediate, predictable response from the remaining particle. To put things into perspective, let’s throw two entangled coins into the air and observe the result. Did the first coin land on heads? Then the measurement of the remaining coin must be tails. In other words, when observed, entangled particles counter each other’s states. No need to be afraid, though — entanglement is not that common. Not yet, that is.
The likely hero
“What’s the point of all this knowledge if I can’t use it?”, you may be asking. Whatever your question, chances are a quantum computer has the answer. In a digital computer, processing power scales with the number of bits: to double the processing power, you simply double the number of bits. Quantum computers scale in an entirely different way.
A quantum computer uses qubits, the basic unit of quantum information, to provide processing capabilities unmatched even by the world’s most powerful supercomputers. How? Superposed qubits can simultaneously tackle a number of potential outcomes (or states, to be more consistent with our previous segments). In comparison, a digital computer can only crunch through one calculation at a time. Furthermore, through entanglement, we are able to exponentially amplify the power of a quantum computer, far beyond what traditional bits in a digital machine can deliver. To visualise the scale, consider that every qubit added to a quantum machine doubles the size of the state space it can work over.
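A quick way to see this doubling is to build the state vector directly. The sketch below uses plain NumPy (no quantum hardware required) to show that an n-qubit register in uniform superposition carries 2^n amplitudes, one per possible outcome.

```python
# Sketch of qubit state-space scaling: an n-qubit register is
# described by 2**n complex amplitudes, so each added qubit
# doubles the state space.
import numpy as np

def uniform_superposition(n_qubits):
    """Statevector with equal amplitude on all 2**n basis states
    (what applying a Hadamard gate to every qubit of |0...0> gives)."""
    dim = 2 ** n_qubits
    return np.full(dim, 1 / np.sqrt(dim))

for n in (1, 2, 10):
    state = uniform_superposition(n)
    print(f"{n} qubit(s): {len(state)} amplitudes, "
          f"total probability {np.sum(state ** 2):.1f}")
```

Fifty qubits already imply 2^50 (about a quadrillion) amplitudes, which is why classical machines struggle to even simulate modest quantum registers.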
But there’s a catch — even the slightest vibrations and temperature changes, referred to by scientists as “noise”, can cause quantum properties to decay and eventually, disappear altogether. While you can’t observe this in real time, what you will experience is a computational error. The decay of quantum properties is known as decoherence, and it is one of the biggest setbacks when it comes to technology relying on quantum mechanics.
In an ideal scenario, a quantum processor is completely isolated from its surroundings. To achieve this, scientists use specialised fridges known as cryogenic refrigerators. These refrigerators are colder than interstellar space, and they enable the quantum processor to conduct electricity with virtually no resistance. This is known as a superconducting state, and it makes quantum computers extremely efficient. As a result, our quantum processor requires a fraction of the energy a digital processor would use, delivering far more computing power and generating substantially less heat in the process. In an ideal scenario, that is.
A (new) world of possibilities
Weather forecasting, financial and molecular modelling, particle physics… the application possibilities for quantum computation are both enormous and promising.
Still, one of the most tantalising prospects is perhaps that of quantum artificial intelligence. This is because quantum systems excel at calculating probabilities for many possible choices — their ability to provide continuous feedback to intelligent software is unparalleled in today’s market. The estimated impact is immeasurable, spanning fields and industries — from AI in the automotive industry all the way to medical research. Lockheed Martin, the American aerospace giant, was quick to realise the benefits and is already leading by example, using its quantum computer for autopilot software testing. Take notes.
The principles of quantum mechanics are also used to address issues in cybersecurity. RSA (Rivest-Shamir-Adleman) cryptography, one of the world’s go-to methods of data encryption, relies on the difficulty of factoring the product of two (very) large prime numbers. While this holds up against traditional computers, which aren’t particularly effective at factoring such products, quantum computers could easily crack these encryptions thanks to their unique ability to evaluate numerous outcomes simultaneously.
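To see why factoring is the linchpin, here is a toy RSA round-trip with deliberately tiny primes. Real keys use primes hundreds of digits long; an attacker who could factor the public modulus n back into p and q would recover the private key, which is exactly what Shor's algorithm on a quantum computer threatens to do.

```python
# Toy RSA sketch (tiny textbook primes, for illustration only).
# Security rests entirely on the difficulty of factoring n = p * q.
p, q = 61, 53              # in practice, hundreds of digits each
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # easy to compute only if you know p and q
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

message = 42
ciphertext = pow(message, e, n)     # encrypt with the public key
plaintext = pow(ciphertext, d, n)   # decrypt with the private key
print(plaintext)  # → 42
```

Note that computing d requires phi, and phi requires the factors p and q; anyone who can factor n breaks the scheme, which is why quantum factoring is such a serious threat to RSA.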
Theoretically, quantum key distribution (QKD) takes care of this with a superposition-based encryption system. Imagine you’re trying to relay sensitive information to a friend. To do so, you create an encryption key using qubits, which are then sent to the recipient over an optical cable. Had the encoded qubits been observed by a third party, both you and your friend would be alerted by an unexpected error in the operation. However, to maximise the benefits of QKD, the encryption keys would have to maintain their quantum properties at all times. Easier said than done.
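The eavesdropper-detection idea can be simulated classically. The sketch below mimics a BB84-style exchange: an eavesdropper who measures qubits in a randomly chosen basis disturbs them, so roughly a quarter of the sifted key bits come out wrong and reveal the intrusion. It is a statistical toy, not a real QKD implementation.

```python
# BB84-style simulation: mismatched measurement bases randomise the
# result, so an eavesdropper introduces detectable errors (~25%).
import random

random.seed(1)

def measure(bit, prep_basis, meas_basis):
    """Matching bases read the bit faithfully; mismatched bases
    collapse the superposition to a random result."""
    return bit if prep_basis == meas_basis else random.randint(0, 1)

def error_rate(eavesdrop, n=4000):
    errors = kept = 0
    for _ in range(n):
        a_bit, a_basis = random.randint(0, 1), random.choice("XZ")
        bit, basis = a_bit, a_basis
        if eavesdrop:  # Eve measures, then re-sends in her own basis
            eve_basis = random.choice("XZ")
            bit = measure(bit, basis, eve_basis)
            basis = eve_basis
        bob_basis = random.choice("XZ")
        bob_bit = measure(bit, basis, bob_basis)
        if a_basis == bob_basis:  # sifting: keep matching-basis rounds
            kept += 1
            errors += bob_bit != a_bit
    return errors / kept

print(f"error rate without eavesdropper: {error_rate(False):.2f}")
print(f"error rate with eavesdropper:    {error_rate(True):.2f}")  # ~0.25
```

Sender and receiver compare a random sample of their sifted key; an error rate near zero means the channel was clean, while one near 25% means someone was listening.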
Food for thought
It doesn’t stop there. The brightest minds around the globe are constantly trying to utilise entanglement as a mode of quantum communication. So far, Chinese researchers have successfully beamed entangled pairs of photons through their Micius satellite over a record-holding 745 miles. That’s the good news. The bad news is that, of the 6 million entangled photon pairs beamed each second, only one pair survived the journey (thanks, decoherence). An incredible feat nonetheless, this experiment outlines the kind of infrastructure we may use in the future to secure quantum networks.
The quantum race also saw a recent breakthrough from QuTech, a research centre at TU Delft in the Netherlands — their quantum system operates at more than one degree above absolute zero (-273 degrees Celsius), far warmer than the near-absolute-zero temperatures most quantum processors require.
While these achievements may seem insignificant to you and me, the truth is that, try after try, such groundbreaking research is bringing us a step closer to the tech of tomorrow. One thing remains unchanged, however, and that is the glaring reality that those who manage to successfully harness the power of quantum mechanics will have supremacy over the rest of the world. How do you think they will use it?
Appen’s State of AI Annual Report Reveals Significant Industry Growth
Appen Limited (ASX: APX), the leading provider of high-quality training data for organizations that build effective AI systems at scale, today announced its annual State of AI Report for 2020.
The State of AI 2020 report is the output of a cross-industry study of senior business leaders and technologists at large organizations. The survey set out to examine and identify the main characteristics of the expanding AI and machine learning landscape by gathering responses from AI decision-makers.
There were multiple key takeaways:
- While nearly 3 out of 4 organizations said AI is critical to their business, nearly half feel their organization is behind in their AI journey.
- AI budgets greater than $5M doubled year over year.
- An increasing number of enterprises are getting behind responsible AI as a component to business success, but only 25% of companies said unbiased AI is mission-critical.
- 3 out of 4 organizations report updating their AI models at least quarterly, signifying a focus on the model’s life after deployment.
- The gap between business leaders and technologists continues, despite their alignment being instrumental in building a strong AI infrastructure.
- Despite turbulent times, more than two-thirds of respondents do not expect any negative impact from COVID-19 on their AI strategies.
One of the key findings is that nearly half of respondents feel their company is behind in its AI journey, suggesting a critical gap between the strategic need for AI and the ability to execute.
Lack of data and data management was reported as a main challenge. This includes training data, which is foundational to AI and ML model deployments; unsurprisingly, then, 93% of companies report that high-quality training data is important to successful AI.
Organizations also reported using 25% more data types (text, image, video, audio, etc.) in 2020, compared to 2019. Not only are models getting more frequent updates, but teams are using increasingly more data types, and that will translate into an increasing need for investment in reliable training data.
One key indicator of the exponential growth of AI was the rapid YoY growth in AI initiatives. In 2019, only 39% of executives owned AI initiatives; in 2020, executive ownership of AI skyrocketed to 71%. With this increase in executive ownership, the number of organizations reporting budgets greater than $5M also doubled.
Global cloud providers gained significant traction as data science and ML tools compared to 2019, possibly due to increased budgets and executive oversight. Even more impressive is the increase in respondents who report using global cloud machine learning providers: Microsoft Azure (49%), Google Cloud (36%), IBM Watson (31%), AWS (25%), and Salesforce Einstein (17%). Each of these front runners saw double-digit adoption increases versus 2019, suggesting that as more companies move to scale, they’re looking for solutions that can scale with them.
To learn more, we recommend downloading the entire State of AI and Machine Learning Report.
Omri Geller, CEO & Co-Founder of Run:AI – Interview Series
Omri Geller is the CEO and Co-Founder of Run:AI.
Run:AI virtualizes and accelerates AI by pooling GPU compute resources to ensure visibility and, ultimately, control over resource prioritization and allocation. This ensures that AI projects are mapped to business goals and yields significant improvement in the productivity of data science teams, allowing them to build and train concurrent models without resource limitations.
What was it that initially attracted you to Artificial Intelligence?
When I began my Bachelor’s degree in Electrical and Electronics Engineering at Tel Aviv University, I discovered fascinating things about AI that I knew would help take us to the next step in computing possibilities. From there, I knew I wanted to invest myself into the AI space. Whether it was in AI research, or opening a company that would help introduce new ways to apply AI to the world.
Have you always had an interest in computer hardware?
When I received my first computer with an Intel 486 processor at six or seven years old, I was immediately interested to figure out how everything worked, even though I was probably too young to really understand it. Aside from sports, computers became one of my biggest hobbies growing up. Since then, I have built computers, worked with them, and went on to study in the field because of the passion I had as a kid.
What was your inspiration behind launching Run:AI?
I knew from very early on that I wanted to invest myself into the AI space. In the last couple of years, the industry has seen tremendous growth in AI, and a lot of that growth came from both computer scientists, like myself, and hardware that could support more applications. It became clearer to me that I would inevitably start a company – and together with my co-founder Ronen Dar – to continue to innovate and help bring AI to more enterprise companies.
Run:AI enables machine learning specialists to gain a new type of control over the allocation of expensive GPU resources. Can you explain how this works?
What we need to understand is that machine learning engineers, like researchers and data scientists, need to consume computing power in a flexible way. Not only are today’s newest computations very compute-intensive, but there are also new workflows being used in data science. These workflows reflect the fact that data science is rooted in experimentation: running many experiments, often in parallel.
In order to develop new solutions to run more efficient experiments, we need to study these workflow tendencies across time. For example, a data scientist might use eight GPUs one day and zero the next, or use a single GPU for a long stretch and then suddenly need 100 GPUs to run 100 experiments in parallel. Once we understand this workflow for optimizing the processing power of one user, we can begin to scale it to several users.
With traditional computing, a specific number of GPUs is allocated to each user, regardless of whether they are in use or not. With this method, expensive GPUs often sit idle without anybody else being able to access them, resulting in low ROI for the GPU. We understand a company’s financial priorities and offer solutions that allow for dynamic allocation of those resources according to the needs of the users. By offering a flexible system, we can allocate extra power to a specific user when required by utilizing GPUs not in use by other users, creating maximum ROI for a company’s computing resources and accelerating innovation and time to market for AI solutions.
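The contrast between static quotas and dynamic pooling can be sketched with a toy scheduler. This is purely illustrative and not Run:AI's actual product; it just shows how lending idle GPUs from a shared pool lets one user burst while others are idle.

```python
# Toy GPU pool: users draw from shared capacity instead of fixed
# quotas, so idle GPUs can be lent to whoever has queued work.
# Illustrative sketch only, not Run:AI's scheduler.
class GpuPool:
    def __init__(self, total_gpus):
        self.free = total_gpus
        self.allocated = {}          # user -> GPUs currently held

    def request(self, user, gpus):
        """Grant as many of the requested GPUs as the pool can spare."""
        granted = min(gpus, self.free)
        self.free -= granted
        self.allocated[user] = self.allocated.get(user, 0) + granted
        return granted

    def release(self, user):
        """Return a user's GPUs to the pool for others to use."""
        self.free += self.allocated.pop(user, 0)

pool = GpuPool(total_gpus=8)
print(pool.request("alice", 8))   # Alice bursts to the whole cluster
print(pool.request("bob", 2))     # nothing left for Bob yet...
pool.release("alice")
print(pool.request("bob", 2))     # ...until Alice's GPUs are freed
```

A production scheduler adds priorities, preemption, and fairness guarantees on top, but the economic argument is visible even here: under fixed quotas of four GPUs each, Alice's burst to eight would simply have been impossible.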
One of the Run:AI functionalities is that it enables the reduction of blind spots created by static allocation of GPU. How is this achieved?
We have a tool that gives us full visibility into the cluster of resources. By using this tool, we can observe and understand if there are blind spots, and then utilize those idle GPUs for users that need the allocation. The same tool that provides visibility into the cluster and control over the cluster also makes sure those blind spots are mitigated.
In a recent speech, you highlighted some distinctions between build and training workflows, can you explain how Run:AI uses a GPU queueing management mechanism to allocate resource management for both?
An AI model is built in two stages. First, there is the building stage, where a data scientist is writing the code to build the actual model, the same way that an engineer would build a car. The second is the training stage, where the completed model begins to learn and be ‘trained’ on how to optimize a specific task. Similar to someone learning to drive the car after it has been assembled.
To build the model itself, not much computing power is needed. However, eventually, it could need stronger processing power to begin smaller, internal tests, much as an engineer would want to test the engine before installing it. Because of these distinct needs during each stage, Run:AI allows for GPU allocation regardless of whether teams are building or training the model; as mentioned earlier, training a model generally requires far more GPU power than building one.
How much raw computing time/resources can be saved by AI developers who wish to integrate Run:AI into their systems?
Our solutions at Run:AI can improve the utilization of resources by about two to three times, meaning 2-3 times better overall productivity.
Thank you for the interview, readers who wish to learn more may visit Run:AI.