
AIOps: Complete Guide to AI for IT Operations
According to Gartner, IT Operations personnel (IT Ops) are in
the midst of a revolution. The forces of digital business
transformation are necessitating a change to traditional IT
management techniques. Consequently, we are seeing a
significant change in current IT Ops procedures and a
restructuring of how we manage our IT ecosystems. Gartner’s
term for these changes is Artificial Intelligence for IT
Operations, or AIOps.

AIOps as a market category has exploded over the last couple
of years. The number of inquiries Gartner fields has increased
exponentially, as has the number of Google searches on the
topic. This post explains the technology and market dynamics
driving the emergence of AIOps and how AIOps responds to the
challenges they create.

Digital Transformation and the Road to AIOps
It’s important to understand how digital transformation gives
rise to AIOps. Digital transformation encompasses cloud
adoption, rapid change, and the implementation of new
technologies. It also requires a shift in focus to
applications and developers, an increased pace of innovation
and deployment, and the acquisition of new digital users,
such as machine agents, Internet of Things (IoT) devices, and
Application Programming Interfaces (APIs), that organizations
didn’t need to service in the past. All these new technologies
and users are straining traditional performance and service
management strategies and tools to the breaking point.

Artificial Intelligence for IT Operations describes the
paradigm shift required to handle digital transformation in IT.

Defining AIOps
AIOps refers to multi-layered technology platforms that
automate and enhance IT operations by 1) using analytics and
machine learning to analyze big data collected from various IT
operations tools and devices, in order to 2) automatically
spot and react to issues in real time.

Gartner explains how an AIOps platform works by using the
diagram in figure 1. AIOps has two main components: Big Data
and Machine Learning. It requires a move away from siloed IT
data in order to aggregate observational data (such as that
found in monitoring systems and job logs) alongside engagement
data (usually found in ticket, incident, and event recording)
inside a Big Data platform. AIOps then implements Analytics
and Machine Learning (ML) against the combined IT data. The
desired outcome is continuous insights that can yield
continuous improvements with the implementation of automation.
AIOps can be thought of as Continuous Integration and
Deployment (CI/CD) for core IT functions.
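
As a rough illustration of the aggregation step described above, the
sketch below merges observational and engagement records into one
time-ordered view. It is not taken from Gartner or any product: the
Record schema, the aggregate and timeline_for helpers, and the sample
data are all hypothetical, and a real platform would stream this data
into a big data store rather than hold it in memory.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One normalized IT data point (hypothetical schema)."""
    timestamp: datetime
    kind: str      # "observational" or "engagement"
    source: str    # originating tool, e.g. "monitoring" or "ticketing"
    service: str   # affected service
    message: str


def aggregate(*feeds):
    """Merge any number of siloed feeds into one time-ordered list."""
    merged = [record for feed in feeds for record in feed]
    return sorted(merged, key=lambda record: record.timestamp)


def timeline_for(records, service):
    """Return the combined history for a single service."""
    return [record for record in records if record.service == service]


# Made-up sample data standing in for two separate tools.
monitoring = [
    Record(datetime(2024, 1, 1, 9, 0), "observational", "monitoring",
           "checkout", "latency above 2s"),
    Record(datetime(2024, 1, 1, 9, 5), "observational", "job-logs",
           "billing", "nightly job failed"),
]
ticketing = [
    Record(datetime(2024, 1, 1, 9, 3), "engagement", "ticketing",
           "checkout", "user reports slow checkout"),
]

combined = aggregate(monitoring, ticketing)
for record in timeline_for(combined, "checkout"):
    print(record.timestamp.isoformat(), record.kind, record.message)
```

Once the two kinds of data sit on one timeline, a ticket about a slow
checkout appears right next to the latency event that probably caused
it, which is the kind of context the analytics layer depends on.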

Figure 1: Gartner’s visualization of the AIOps platform

AIOps bridges three different IT disciplines (service
management, performance management, and automation) to
accomplish its goals of continuous insights and improvements.
AIOps is the recognition that in our new accelerated, hyper-
scaled IT environments, there must be a new approach that
leverages advances in big data and machine learning to
overcome legacy tool and human limitations.

What’s Driving AIOps?

The promise of artificial intelligence has been to do what
humans do but do it better, faster, and at scale. AIOps will
do this for IT Operations by addressing the speed, scale, and
complexity challenges of digital transformation, including:

The difficulty IT Operations has in manually managing
its infrastructure. It’s becoming a misnomer to use the
term “infrastructure” here, as modern IT environments
include managed cloud, unmanaged cloud, third-party
services, SaaS integrations, mobile, and more.
Traditional approaches to managing complexity don’t work
in dynamic, elastic environments. Tracking and managing
this complexity through manual, human oversight is no
longer possible. Current IT Ops technology is already
beyond the scope of manual management and it will only
get worse in the coming years.

The amount of data that IT Ops needs to retain is
increasing exponentially. Performance monitoring is
generating ever larger numbers of events and alerts.
Service ticket volumes experience step-function
increases with the introduction of IoT devices, APIs,
mobile applications, and digital or machine users.
Again, it is simply becoming too complex for manual
reporting and analysis.

Infrastructure problems must be responded to at
ever-increasing speeds. As organizations digitize their
business, IT becomes the business. The ‘consumerization’
of technology has changed user expectations for all
industries. Reactions to IT events, whether real or
perceived, need to occur immediately, particularly when
an issue impacts user experience.

More computing power is moving to the edges of the
network. The ease with which cloud infrastructure and
third-party services can be adopted has empowered line
of business (LOB) functions to build their own IT
solutions and applications. Control and budget have
shifted from the core of IT to the edge. More computing
power (that can be taken advantage of) is being added
from outside core IT.

Developers have more power and influence but
accountability still sits with core IT. In DevOps
organizations, programmers take more monitoring
responsibility at the application level, but
accountability for the overall health of the IT
ecosystem and the interaction between applications,
services, and infrastructure remains the province of
core IT. IT Ops is taking on more responsibility just
as digital businesses are getting more complex.

The Elements of AIOps
AIOps consists of the following elements, shown in figure 2:

Figure 2: The technologies that make up an AIOps platform

Extensive and diverse IT data sources, from currently
siloed tools and IT disciplines such as events, metrics,
logs, job data, tickets, monitoring, etc.

A modern big data platform that permits real-time
processing of streaming IT data. Examples include Hadoop
2.0, Elastic Stack, and some Apache technologies.

Rule application and pattern recognition that leverage
and/or discover context while uncovering regularities
and normalities in the data. These can be, but don’t
have to be, specific to the domain.

Domain algorithms that leverage IT domain expertise
(specific to one environment or at the industry level)
to intelligently interpret and apply the rules and
patterns, as dictated by an organization’s data and its
desired outcomes. These algorithms make it possible to
achieve IT-specific goals like eliminating noise,
correlating unstructured data, establishing baselines,
alerting on abnormalities, and identifying probable
causes.

Machine learning that can automatically alter or create
new algorithms based on the output of algorithmic
analysis and new data introduced into the system.

Artificial intelligence that can adapt to the new and
unknown in an environment.

Automation, which uses the outcomes generated by the
machine learning and/or AI to automatically create and
apply a response or improvement for identified issues
and situations.
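
To make two of these elements more concrete, the sketch below
establishes a baseline for a metric, alerts on abnormalities, and
then picks an automated response. It is a minimal illustration under
stated assumptions, not how any particular AIOps product works: the
rolling mean and standard deviation rule stands in for a domain
algorithm, and find_anomalies, plan_responses, the playbook, the
thresholds, and the sample latencies are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0, min_spread=1.0):
    """Return indices of values that deviate from a rolling baseline.

    The baseline for each point is the mean of the preceding `window`
    values. A point is flagged when it lies more than `threshold`
    standard deviations from that mean. `min_spread` (an assumed
    floor) keeps a perfectly flat history from flagging tiny changes.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        spread = max(stdev(history), min_spread)
        if abs(values[i] - baseline) > threshold * spread:
            anomalies.append(i)
    return anomalies


def plan_responses(metric, anomalies, playbook, default="open_ticket"):
    """Pair each anomaly with the playbook action for its metric."""
    action = playbook.get(metric, default)
    return [(index, action) for index in anomalies]


# Made-up latency samples with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
playbook = {"latency_ms": "restart_service"}

flagged = find_anomalies(latency_ms)
print(plan_responses("latency_ms", flagged, playbook))
```

A fixed statistical rule like this is only the starting point. In
the terms used above, machine learning would be what adjusts the
window and threshold, or replaces the rule entirely, as new data
arrives.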

It needs to be said that although AIOps represents a radical
departure for IT Ops, it’s not a radical application of
machine learning and big data. A similar ML approach was
implemented when stockbrokers moved from manual trading to
machine trading. Analytics and ML are used in social media, in
applications like Google Maps, Waze, and Yelp, as well as in
online marketplaces like Amazon and eBay. These techniques are
used reliably and extensively in environments where real-time
responses to dynamically changing conditions and user
customization are required.

Adoption of artificial intelligence in AIOps is nascent
compared to machine learning. Right now, the pressing use
cases are best addressed with simple automation or a
combination of ML and automation. It remains to be seen how AI
will evolve and what new use cases it will enable. In any
case, a strong AIOps foundation needs to be laid on IT
Operations as it exists now before we can begin modeling human
behavior for use on it.

IT Ops personnel have been slow to adapt to AIOps-like
environments because, out of necessity, our jobs have always
been more conservative. It’s IT Ops’ job to make sure the
lights stay on and to provide stability for the infrastructure
that organizational applications ride on. However, due to the
trends listed above, more IT Ops shops (especially those in
the enterprise) will need to implement AIOps strategies and
technologies in the near future.

Additional Resources

AIOps: Steps Towards Autonomous Operations (DEV301-R1) – AWS
re:Invent 2018 from Amazon Web Services
