APM provides users with dashboards and alerts to troubleshoot an application’s performance in production. These insights are based on known, or expected, system failures—typically related to SRE golden signals—and provide engineers with alerts when pre-defined issues arise, with breadcrumbs and recommendations on how to troubleshoot.
But what about issues that weren’t predefined or expected? Today’s software environments are increasingly distributed, with software that is built, deployed, and maintained by distributed teams. This software also runs on a wide array of hosts, whether on premises or in the cloud. It’s critical for teams to conceptualize, troubleshoot, and improve these distributed systems. An observability practice gives teams the flexibility to query the “unknown unknowns” in their dynamic systems and investigate and troubleshoot anomalies as they arise. Observability platforms give teams a connected, real-time view of their operational data in one place, so they can better understand system behavior and make iterative improvements to the entire stack.
You need application performance monitoring.
IT teams typically adopt APM as a best practice to understand and improve system performance. It helps them identify when an application is slow or broken and then fix issues before they affect users. Through pre-configured alerts and visualizations, APM helps teams understand metrics like response time, throughput, and errors.
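To make those three metrics concrete, here is a minimal sketch of how they could be derived from raw request records. The `Request` type, the `summarize` function, and the sample data are all hypothetical names invented for illustration, and counting only 5xx status codes as errors is an assumption; an APM agent would collect and aggregate this for you.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    timestamp: float    # seconds since the start of the window
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code


def summarize(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Aggregate raw requests into response time, throughput, and error rate."""
    if not requests:
        return {"avg_response_ms": 0.0, "throughput_rps": 0.0, "error_rate": 0.0}
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "avg_response_ms": mean(r.duration_ms for r in requests),
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
    }


sample = [
    Request(0.5, 120.0, 200),
    Request(1.2, 80.0, 200),
    Request(2.9, 400.0, 500),
    Request(3.4, 200.0, 200),
]
summary = summarize(sample, window_seconds=4.0)
```

For the four sample requests over a four-second window, this yields an average response time of 200 ms, a throughput of one request per second, and an error rate of 25 percent.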
You can monitor the performance of everything from websites to mobile apps, servers, networks, APIs, cloud-based services, and other technologies with tools, products, and solutions including:
- operational dashboards
- real user monitoring
- mobile monitoring
- synthetic monitoring
- serverless monitoring
- database monitoring
- infrastructure monitoring
- service maps
APM provides a high-level view of how an application is performing and works well for the questions or conditions you know to ask in advance, such as:
- “What’s my application’s throughput?”
- “Alert me when I exceed a certain error budget.”
- “What does compute capacity look like?”
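The error budget question is a good example of a condition you can define in advance. The sketch below shows one simple way to express it, assuming a request-based service level objective (SLO); the function names and the numbers are hypothetical, and real alerting policies usually also consider how quickly the budget is being spent.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the fraction of requests that must succeed, e.g. 0.99.
    """
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        # No failures are allowed at all, so any failure overspends the budget.
        return 0.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def should_alert(slo_target: float, total: int, failed: int) -> bool:
    """Alert once more failures have occurred than the budget allows."""
    return error_budget_remaining(slo_target, total, failed) < 0


# Assumed example: a 99% SLO over 10,000 requests allows about 100 failures.
within_budget = should_alert(0.99, total=10_000, failed=40)   # budget not yet spent
over_budget = should_alert(0.99, total=10_000, failed=150)    # budget exceeded
```

Because the threshold is known ahead of time, this kind of check fits naturally into a pre-configured APM alert.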
But many modern application architectures are too complex to monitor and manage with just APM. You need to consider multiple data sources and various telemetry data types (not just metrics). You also need to think about logging. Each runtime is likely emitting logs in different places, and you need a way to consolidate that data and evaluate it in the context of your application. And as you add more services and microservice components to your architecture, you need to be able to trace a request across multiple services when a user accesses one of them and gets an error. You also need to be able to investigate all of this data in one place, so you don’t lose context and can improve KPIs like mean time to recovery (MTTR).
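As a rough illustration of consolidating logs and tracing a request across services, the sketch below groups log records from several services by a shared trace ID and finds the first service that reported an error. The record fields (`service`, `trace_id`, `ts`, `level`, `msg`), the service names, and the helper functions are all hypothetical; it assumes every service attaches the same trace ID to the logs it writes for a given request.

```python
from collections import defaultdict

# Hypothetical log records from three services; in practice each service
# would write these to its own log stream.
LOGS = [
    {"service": "payments", "trace_id": "t-1", "ts": 3, "level": "ERROR", "msg": "card declined upstream"},
    {"service": "gateway", "trace_id": "t-1", "ts": 1, "level": "INFO", "msg": "POST /checkout"},
    {"service": "gateway", "trace_id": "t-2", "ts": 2, "level": "INFO", "msg": "GET /cart"},
    {"service": "checkout", "trace_id": "t-1", "ts": 2, "level": "INFO", "msg": "order created"},
]


def group_by_trace(records):
    """Consolidate records from all services, keyed by trace ID and ordered by time."""
    traces = defaultdict(list)
    for record in records:
        traces[record["trace_id"]].append(record)
    for trace_records in traces.values():
        trace_records.sort(key=lambda r: r["ts"])
    return dict(traces)


def first_error_service(trace_records):
    """Return the name of the first service that logged an error, or None."""
    for record in trace_records:
        if record["level"] == "ERROR":
            return record["service"]
    return None


traces = group_by_trace(LOGS)
path = [record["service"] for record in traces["t-1"]]
culprit = first_error_service(traces["t-1"])
```

Here the failing request `t-1` passes through `gateway`, `checkout`, and `payments` in that order, and `payments` is the first service to log an error. Seeing the whole path in one place is what keeps you from losing context while you investigate.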
To get to the root cause of an issue when you have multiple runtimes and many architecture layers, it’s necessary to take a more holistic, proactive approach. While APM provides aggregated metrics, you also need other insights to understand your dynamic stack.
You really need observability.
Observability is about getting deep, technical insights into the state of your entire system, no matter how large or complex it is. It helps DevOps teams navigate the challenges of increased fragmentation in today’s distributed systems. Observability also gives you the power to understand patterns and connections in your data that you hadn’t previously considered.
Observability platforms automatically collect data from an array of sources and services in one place, help you monitor the health of your application by visualizing its performance in the context of the entire stack, and then give you the insights to take action. These insights help you understand not just that something happened, but why, and put the tools to resolve it at your fingertips.
When evaluating observability platforms, look for ones that allow you to:
- Use open instrumentation agents to gather telemetry data from open source or vendor-specific entities that produce that data. Examples of telemetry data include metrics, events, logs, and traces (often referred to as MELT). Examples of entities include services, hosts, applications, and containers.
- Visualize, navigate, debug, and improve your entire stack to optimize your end users’ experience.
- Analyze the enormous amounts of raw, high-cardinality telemetry data you collect for correlations and context, so humans can make sense of any patterns and anomalies that arise.
- Take advantage of advances in artificial intelligence and machine learning, as part of an AIOps practice, so you can reduce alert noise and eliminate false alarms, correlate incidents, and automatically detect anomalies. Together, these help you find, diagnose, and resolve incidents faster. Insights into which areas need the most improvement help your teams determine where to focus developers’ efforts moving forward, to have the greatest effect on the end users of your application.
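To give a feel for what automatic anomaly detection means at its simplest, the sketch below flags data points that sit far from the mean of a series, using a z-score. This is a deliberately basic statistical stand-in, not how any particular AIOps product works; the function name, the threshold, and the latency values are assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.5):
    """Return the indexes of values more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []  # not enough data to estimate spread
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101]
anomalous_indexes = find_anomalies(latencies_ms, threshold=2.5)
```

With this sample, only the 500 ms spike at index 8 is flagged. Production systems typically use more robust techniques, since a single large outlier inflates the standard deviation and seasonal patterns can look like anomalies to a simple check like this one.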
APM is part of your observability practice.
Given the application-centricity of today’s software stacks, you can’t have observability without a strong APM discipline.
You can think of it this way: Observability (a noun) is the approach to how well you can understand your complex system. Application performance monitoring (a verb) is an action you take to help in that approach. Observability doesn’t eliminate the need for APM. APM just becomes one of the techniques used to achieve observability.
If you want to evaluate New Relic’s state-of-the-art observability platform, sign up for a New Relic account today.