
It’s time to monitor your AI applications with New Relic


NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of cloud-native microservices that offer models as optimized containers. These containers can be deployed on clouds, data centers, or workstations, enabling the easy creation of generative AI applications such as copilots and chatbots.

New Relic AI Monitoring seamlessly integrates with NVIDIA NIM, providing full-stack observability for applications built on a wide range of AI models supported by NIM, including Meta’s Llama 3, Mistral Large, and Mixtral 8x22B, among others. This integration helps organizations confidently deploy and monitor AI applications built with NVIDIA NIM, accelerate time-to-market, and improve ROI.

What is NVIDIA NIM?

NVIDIA NIM is a set of inference microservices that provides pre-built, optimized LLMs, simplifying deployment across NVIDIA accelerated infrastructure in the data center and cloud. This eliminates the need for companies to spend valuable time and resources optimizing models to run on different infrastructure, creating APIs for developers to build applications, and maintaining security and support for these models in production.


Getting started with New Relic AI Monitoring for NVIDIA NIM

New Relic AI Monitoring delivers the power of observability to the entire AI stack for applications built with NVIDIA NIM. This enables you to effortlessly monitor, debug, and optimize your AI applications for performance, quality, and cost, while ensuring data privacy and security. Here are step-by-step instructions to get you started with monitoring AI applications built with NVIDIA NIM.

Step 1: Instrument your AI application built with NVIDIA NIM

First, you’ll need to set up instrumentation for your application. Here’s how:

Python agent instrumentation snapshot for AI Monitoring
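If you're using the Python agent, the setup boils down to installing the agent, enabling AI monitoring in your newrelic.ini, and letting the agent's LLM instrumentation pick up your calls to the NIM endpoint. The sketch below is a minimal, illustrative example rather than the exact snippet shown above; the endpoint URL, model name, and task name are placeholder values you'd replace with your own.

```python
# Minimal sketch (not the exact snippet from the screenshot above): a Python app
# calling a locally hosted NIM model, instrumented with the New Relic agent.
# Assumes: pip install newrelic openai, a newrelic.ini generated with your
# license key and ai_monitoring.enabled = true, and a NIM container exposing an
# OpenAI-compatible API at http://localhost:8000/v1 (placeholder URL).
import newrelic.agent

newrelic.agent.initialize("newrelic.ini")  # load agent config before importing openai

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local NIM endpoint
    api_key="not-needed-for-local-nim",   # local NIM deployments typically ignore the key
)

@newrelic.agent.background_task(name="local-nim-demo")
def ask(question: str) -> str:
    # The agent's OpenAI instrumentation records the LLM request/response
    # metadata (model, tokens, latency) that surfaces in AI Monitoring.
    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",  # example NIM model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("What is NVIDIA NIM?"))
    newrelic.agent.shutdown_agent(timeout=10)  # flush data for short-lived scripts
```

Once this runs and traffic flows through the instrumented app, it appears in AI Monitoring as described in the next step.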

Step 2: Access AI Monitoring

Once your application is instrumented, you can start using AI Monitoring:

New Relic UI showing the AI Monitoring capability

Screenshot showing the local-nim sample app in the AI entities view

  • Click on the local-nim app to access the APM 360 summary with the integrated AI monitoring view. This unified view gives you instant insights into the AI layer’s key metrics, such as the total number of requests, average response time, token usage, and response error rates. These results appear in context, alongside your APM golden signals, infrastructure insights, and logs. By correlating all this information in one place, you can quickly identify the source of issues and drill down deeper for further analysis.
APM 360 summary view showing the integrated AI response metrics view

Step 3: Deep dive with AI response tracing 

For a more detailed analysis, the deep tracing view is incredibly useful:

  • In the APM 360 summary, click AI responses in the left navigation and select the response you want to drill into.

AI response view

  • Here, you can see the entire path from the initial user input to the final response, including metadata like token count, model information, and actual user interactions. This allows you to quickly identify the root cause of the issue.

Deep response tracing view

Step 4: Explore model inventory

Model Inventory provides a comprehensive view of model usage across all your services and accounts. This helps you isolate any model-related performance, errors, and cost issues.

  • Go back to the AI Monitoring section.
  • Click on Model Inventory to view performance, error, and cost metrics by model.

Model inventory

Step 5: Compare models for optimal choice

To choose the model that best fits your cost and performance needs:

  • Click on Compare Models.
  • Select the models, service, and time ranges you want to compare from the drop-down lists.

Model comparison view in AI Monitoring

Step 6: Enhance privacy and security

Complementing the robust security advantage of NVIDIA NIM self-hosted models, New Relic allows you to exclude monitoring of sensitive data (PII) in your AI requests and responses:

  • Click on Drop Filters and create filters to target specific data types within the six events offered.

Drop filters

To learn how you can monitor your AI applications, click here to schedule a call with our observability experts.

 

Is GPT-4o really better for enterprise AI solutions?

We at New Relic are currently evaluating the potential switch from GPT-4 Turbo to GPT-4o for our GenAI assistant, New Relic AI. Having previously navigated the transition from GPT-4 to GPT-4 Turbo, we understand that the reality of adopting new technology often differs from initial expectations. In this blog, we aim to provide a practitioner’s perspective on transitioning to GPT-4o. We’ll explore the practical implications of such a move, including performance metrics, integration challenges, capacity considerations, and cost efficiency, based on our preliminary tests and experiences. Our goal is to offer a balanced view that addresses both the potential benefits and the challenges, helping other businesses make informed decisions about whether GPT-4o is the right choice for their AI systems.

 

Understanding the GPT-4o model

To evaluate GPT-4o effectively, it’s important to understand its key features and expected benefits. This section provides a brief overview of what GPT-4o offers.

Analysis capabilities

GPT-4o is designed to enhance natural language processing tasks, offering improved accuracy and understanding in complex queries. It’s expected to perform better in multilingual environments and provide more relevant responses.

Resource efficiency

One of the main selling points of GPT-4o is its resource efficiency. It promises to deliver high performance while using fewer computational resources, potentially leading to lower operational costs. This aspect is particularly important for businesses looking to scale their AI operations without significantly increasing their infrastructure costs.

Usability and integration

GPT-4o claims to integrate seamlessly with existing workflows and tools. Its improvements are intended to make it easier for developers to incorporate the model into their applications, reducing the time and effort required for integration. However, as we’ll see in the performance evaluation, the practical experience may vary.

Accessibility and pricing

A significant advantage of GPT-4o is its cost efficiency. It’s approximately half as expensive as its predecessors, making it an attractive option for companies aiming to reduce operational costs. This pricing can significantly lower the barrier to entry for advanced AI capabilities, enabling more businesses to leverage the power of GPT-4o in their operations.

 

Evaluating the hype

While the marketed benefits of GPT-4o are appealing, it’s important to critically assess these through practical testing. Our initial experiences with GPT-4o have provided mixed results, warranting a deeper dive into specific aspects.

To evaluate GPT-4o comprehensively, we conducted a series of experiments focusing on various aspects of performance and integration:

  • Latency and throughput tests: We measured the speed of response and processing capabilities during peak and off-peak hours.
  • Quality of outputs: We evaluated the accuracy and relevance of responses across different tasks.
  • Workflow integration: We assessed how well GPT-4o integrates with our existing tools and workflows.
  • Token efficiency: We compared the token usage per prompt between GPT-4 Turbo and GPT-4o to understand cost implications.
  • Scalability tests: We monitored performance under increasing loads to assess scalability.
  • Cost analysis: We analyzed the cost implications based on token usage and operational efficiency.

Performance evaluation

In our evaluation of GPT-4o, we compared its performance against two other models: GPT-4 Turbo and GPT-4 Turbo PTU, accessed through Azure OpenAI. At New Relic, we primarily use GPT-4 Turbo via the provisioned throughput unit (PTU) option, which offers dedicated resources and lower latency compared to the pay-as-you-go model. This comparison aims to provide a clear picture of how GPT-4o stacks up in terms of throughput and output quality under different conditions.

Throughput analysis

Throughput, measured in tokens per second, reflects a model’s ability to handle large volumes of data efficiently. It also gives an indication of latency, as higher throughput generally correlates with lower latency.
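For context, here's a rough sketch of how a tokens-per-second number like the ones below can be measured against an OpenAI-compatible endpoint; the model names, prompt, and number of runs are placeholders, and this is not the exact harness we used for these tests.

```python
# Illustrative throughput (tokens/second) measurement against an OpenAI-compatible API.
# Requires: pip install openai, and OPENAI_API_KEY set in the environment.
import time
from openai import OpenAI

client = OpenAI()

def tokens_per_second(model: str, prompt: str, n_runs: int = 5) -> float:
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        elapsed = time.perf_counter() - start
        # completion tokens generated divided by wall-clock time for this request
        rates.append(response.usage.completion_tokens / elapsed)
    return sum(rates) / len(rates)

for model in ("gpt-4-turbo", "gpt-4o"):  # placeholder model identifiers
    rate = tokens_per_second(model, "Summarize what observability means in 200 words.")
    print(f"{model}: {rate:.1f} tokens/sec")
```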

Our initial tests on May 24 revealed distinct performance characteristics among the three models:

Generation speed (tokens/second) per model and endpoint on May 24

  • GPT-4 Turbo PTU: Showed a throughput around 35 tokens/sec. The high throughput is indicative of the benefits of dedicated resources provided by the PTU, making it suitable for high-volume data processing tasks where consistent performance is critical.
  • GPT-4 Turbo: Operating under the pay-as-you-go model, GPT-4 Turbo showed a peak throughput around 15-20 tokens/sec. While efficient, it did exhibit some limitations compared to the PTU model, likely due to the shared resource model that introduces variability.
  • GPT-4o: Demonstrated a throughput around 50 tokens/sec, in line with OpenAI’s claim that GPT-4o is 2x faster than GPT-4 Turbo. While GPT-4o showed potential in handling large-scale data processing, it also exhibited more variability, suggesting performance could fluctuate based on load conditions.

We continued to monitor the performance of these models over time to understand how they cope with varying loads. Further tests conducted on June 11 provided additional insights into the evolving performance of these models:

Generation speed (tokens/second) per model and endpoint on June 11

  • GPT-4 Turbo PTU: Showed a rather consistent performance with peak around 35 tokens/sec. This consistency reaffirms the benefits of PTU for applications requiring reliable and high-speed processing.
  • GPT-4 Turbo: Maintained a peak throughput around 15–20 tokens/sec, but showed a slight decrease in peak density, indicating less predictable latency due to increased variability.
  • GPT-4o: Showed a significant decrease in throughput, with the peak now around 20 tokens/sec. Based on our experience with the transition from GPT-4 to GPT-4 Turbo, it’s reasonable to assume that higher demand on GPT-4o endpoints (accessed via the pay-as-you-go model) could further degrade its efficiency, potentially impacting its suitability for applications that require consistent high throughput and low latency.

Quality of outputs

In addition to throughput, we evaluated the quality of outputs from each model, focusing on their ability to generate accurate and relevant responses across various tasks. This evaluation includes natural language processing, multilingual support, overall consistency, and integration with existing workflows.

Natural language processing and multilingual support

  • GPT-4 Turbo: Both provisioning options of GPT-4 Turbo consistently generate quality responses across various tasks and perform generally well in natural language processing applications.
  • GPT-4o: Excels in understanding and generating natural language, making it highly effective for conversational AI tasks. The o200k_base tokenizer is optimized for various languages, enhancing performance in multilingual contexts and reducing token usage. However, GPT-4o tends to provide longer responses and may hallucinate more frequently, which can be a limitation in applications requiring concise answers.

Accuracy and consistency

  • GPT-4 Turbo: In tasks requiring high precision, GPT-4 Turbo performs reasonably well, correctly identifying 60–80% of the data in complex data extraction tasks. However, consistency in response accuracy and behavior when the same task is repeated can vary depending on the specific use case and setup.
  • GPT-4o: Shows comparable performance to GPT-4 Turbo. In some fields, GPT-4o slightly outperforms GPT-4 Turbo in accuracy, though this varies depending on the specific task. For instance, in complex data extraction tasks, GPT-4o also correctly identifies only 60–80% of the data, demonstrating comparable performance. However, GPT-4o shows significant variability in response consistency, particularly when asked to repeat the same task multiple times.

Integrations and workflow efficiency

  • GPT-4 Turbo: Generally integrates well with existing tools and workflows, ensuring smooth operations and consistent performance. It leverages integrations to provide comprehensive and contextually relevant answers, making it a reliable option for various applications. However, the model still lacks precision.
  • GPT-4o: Though both models, GPT-4 Turbo and GPT-4o, show comparable performance, GPT-4o sometimes attempts to answer questions directly rather than leveraging integrated tools. This can disrupt workflow efficiency in systems that rely on tool integrations or functions for context-relevant responses.

The confusion matrix below provides further insights into how GPT-4 Turbo and GPT-4o handle function calls differently, reflecting their interpretations of function descriptions and user queries in case of New Relic AI.

Confusion matrix of GPT-4 Turbo vs GPT-4o

The models show high agreement on certain tools, such as tool_4 with 87% agreement, indicating consistent interpretation of this function. However, the agreement drops to 57% for tool_3, showing variability in processing this particular tool. Both models exhibit similar patterns of misclassification, highlighting nuanced differences in their function-calling behavior. This suggests that even though GPT-4o is faster, better, and less expensive on paper, replacing GPT-4 Turbo with GPT-4o will not necessarily yield identical behavior. There are always nuances, and models can return unexpected results.
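For readers who want to run a similar comparison on their own traffic, here's a minimal sketch of how per-tool agreement can be computed from paired function-call choices; the tool names and sample data are illustrative placeholders, not New Relic AI's actual routing data.

```python
# Sketch: per-tool agreement between two models' function-call choices.
from collections import defaultdict

# Each entry: (tool chosen by GPT-4 Turbo, tool chosen by GPT-4o) for one query.
paired_choices = [
    ("tool_4", "tool_4"),
    ("tool_3", "tool_1"),
    ("tool_4", "tool_4"),
    ("tool_3", "tool_3"),
]

totals, matches = defaultdict(int), defaultdict(int)
for turbo_tool, gpt4o_tool in paired_choices:
    totals[turbo_tool] += 1
    matches[turbo_tool] += turbo_tool == gpt4o_tool  # True counts as 1

for tool in sorted(totals):
    print(f"{tool}: {matches[tool] / totals[tool]:.0%} agreement")
```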

Tokenizer efficiency

Tokenizer efficiency plays a critical role in the overall performance and cost-effectiveness of the models, especially in multilingual contexts.

GPT-4 Turbo (tokenizer: cl100k_base)
  • Optimized for general use across multiple languages and tasks. It uses a base vocabulary of around 100,000 tokens.
  • Provides efficient tokenization for a wide range of applications, ensuring good performance and reasonable token usage across different languages and scripts.
  • While not specialized for any particular language, it handles English and several other languages effectively, making it versatile for diverse tasks.
  • Suitable for tasks where standard token efficiency is adequate, but might require more tokens for complex scripts compared to specialized tokenizers.

GPT-4o (tokenizer: o200k_base)
  • Specifically designed to handle multiple languages more efficiently. It uses a base vocabulary of around 200,000 tokens.
  • Superior performance in multilingual contexts due to its larger vocabulary, which allows for more precise and compact tokenization.
  • Particularly efficient in tokenizing complex scripts such as Japanese, Chinese, and other non-Latin alphabets.
  • Requires up to 4.4x fewer tokens for complex scripts, resulting in faster processing times and lower costs in multilingual applications.
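You can verify these tokenizer differences yourself with the open-source tiktoken library, as in the sketch below; the sample sentences are arbitrary, and the exact ratios will vary with your text.

```python
# Sketch: comparing token counts for the same text under the cl100k_base
# (GPT-4 Turbo) and o200k_base (GPT-4o) encodings.
# Requires: pip install tiktoken (a recent release that includes o200k_base).
import tiktoken

samples = {
    "English": "Observability helps teams detect and resolve issues faster.",
    "Japanese": "オブザーバビリティはチームが問題を迅速に検出して解決するのに役立ちます。",
}

for encoding_name in ("cl100k_base", "o200k_base"):
    encoding = tiktoken.get_encoding(encoding_name)
    counts = {lang: len(encoding.encode(text)) for lang, text in samples.items()}
    print(encoding_name, counts)
```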

Cost efficiency

OpenAI’s claim of reduced costs with GPT-4o is a significant selling point. GPT-4o is approximately half as expensive as its predecessors, with input tokens priced at $5 per million and output tokens at $15 per million. Additionally, GPT-4o allows for five times more frequent access compared to GPT-4 Turbo, which can be highly beneficial for applications requiring continuous data processing or real-time analytics.

Feature          GPT-4 Turbo                      GPT-4o
Input tokens     $10 per million tokens           $5 per million tokens
Output tokens    $30 per million tokens           $15 per million tokens
Rate limits      Standard OpenAI API policies     Five times more frequent access

However, our previous experience with transitioning from GPT-4 to GPT-4 Turbo highlighted some important lessons. Despite the advertised cost reduction per 1,000 tokens, we didn’t see the expected savings in certain use cases. This discrepancy was primarily observed in use cases without a specified output format, where the model has the freedom to generate as much text as it sees fit. For example, in retrieval-augmented generation (RAG) tasks where the model tends to be more “chatty”, the new version of the model can generate more tokens per response. For instance, for the same question to New Relic AI (“How do I instrument my Python application?”), the new version generated a 360-token answer, while the old one generated a 300-token answer. In such cases, you won’t get the advertised 50% reduction in cost.

Similarly, while GPT-4o offers lower costs per token, the total token usage per prompt might increase because GPT-4o tends to be chattier than GPT-4 Turbo, potentially offsetting some of the cost savings if prompts are not optimized effectively. Moreover, integration and workflow adjustments required for GPT-4o may add development and operational costs. Given the integration challenges discussed earlier, businesses may need to invest in optimizing their workflows and ensuring seamless integration with existing tools. These adjustments could incur additional costs, which should be factored into the overall cost efficiency analysis.
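To make that caveat concrete, here's a back-of-the-envelope calculation using the output prices from the table above and a hypothetical 20% longer GPT-4o answer (360 versus 300 tokens, mirroring the earlier example); the figures are illustrative, not measured results.

```python
# Illustrative cost comparison: output-token prices from the table above,
# with a hypothetical 20% longer GPT-4o response (360 vs. 300 tokens).
OUTPUT_PRICE_PER_TOKEN = {
    "gpt-4-turbo": 30 / 1_000_000,  # $30 per million output tokens
    "gpt-4o": 15 / 1_000_000,       # $15 per million output tokens
}

turbo_cost = 300 * OUTPUT_PRICE_PER_TOKEN["gpt-4-turbo"]  # $0.0090 per answer
gpt4o_cost = 360 * OUTPUT_PRICE_PER_TOKEN["gpt-4o"]       # $0.0054 per answer

savings = 1 - gpt4o_cost / turbo_cost
print(f"Effective savings per answer: {savings:.0%}")  # ~40%, not the headline 50%
```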

 

Making an informed decision

After understanding the detailed performance and cost metrics of both GPT-4 Turbo and GPT-4o, it’s crucial to make an informed decision that aligns with your specific needs and objectives. This involves a holistic assessment of your business requirements, cost implications, and performance requirements. Here are key factors to consider:

  • Throughput and latency needs: If your application requires high throughput and low latency, GPT-4o’s faster response times and higher rate limits may be beneficial. This is particularly important for real-time applications like chatbots and virtual assistants. However, it’s important to monitor the performance of GPT-4o over time. Our tests indicate that GPT-4o’s performance may degrade with increased usage, which could impact its reliability for long-term projects. For consistent performance, companies might eventually need to consider moving to GPT-4o PTU, which may offer more stable and reliable performance.
  • Quality of outputs: While both models provide high-quality outputs, consider the consistency of responses. GPT-4 Turbo may offer more predictable performance, which is critical for applications where uniform quality is essential.
  • Integration with tools: If your workflow relies heavily on integrated tools and context-rich responses, evaluate how each model handles these integrations. GPT-4 Turbo’s ability to leverage existing tools might offer smoother workflow efficiency compared to GPT-4o.
  • Cost: Compare the cost per million tokens for both input and output. GPT-4o is less expensive, but ensure that any potential increase in token usage per prompt does not offset these savings. Moreover, be mindful of additional costs associated with integration and workflow adjustments. Transitioning to GPT-4o may require changes to existing processes, which could incur development and operational expenses.
  • Token efficiency: Consider the complexity and language diversity of your content. GPT-4o’s o200k_base tokenizer is more efficient for multilingual tasks, potentially reducing overall token usage and cost for non-English content.
  • Rate limits and usage: GPT-4o’s higher rate limits can accommodate more frequent interactions, making it suitable for applications with high interaction volumes. This can ensure smoother performance under high demand.
  • Scalability: Consider the long-term scalability of your application. GPT-4o’s cost efficiency and performance improvements might offer better scalability, but assess how these benefits align with your growth projections and resource availability.

 

Conclusion

Deciding between GPT-4 Turbo and GPT-4o requires a careful evaluation of your specific needs and goals. GPT-4o offers cost benefits and superior efficiency in multilingual contexts but may involve higher token usage and potential integration challenges. Monitoring its performance over time is crucial due to possible degradation. For stable, consistent performance, GPT-4 Turbo remains a reliable option. By assessing these factors, you can select the model that best aligns with your operational needs and ensures both optimal performance and cost efficiency.

To learn how you can monitor your GPT performance, click here to schedule a call with our observability experts.

Mobile APM: Android and iOS monitoring

Overview of the solution

Mobile app monitoring tracks performance metrics and identifies shortcomings caused by servers, networks, devices, code, and other factors. This analysis helps prevent and resolve issues that are vital to a seamless user experience. Time-series measures of crashes provide insights, and end-user reports enhance the analysis, although they can be challenging to connect to specific issues.

Essential metrics to gauge your app’s performance and health

Though web users have adapted to minimal delays, mobile users want quick response times from their apps. User experience can be adversely affected by performance problems. The broad range of devices that have various specifications and the fluctuation of mobile networks increase these challenges.

To quickly identify and resolve problems, it’s essential to understand exactly when and where consumers experience issues like crashes, delayed UI loading, and Application Not Responding (ANR) errors. Tracking these metrics over time gives you insights into how well your app is performing and ensures that your team is informed as soon as service level goals are not met.

Below are essential metrics for evaluating your app’s performance and health.

Application start time/app launches

Your application must start quickly; slow launches won’t go unnoticed by users. Launch speed is an excellent indicator of the quality of your software, and tracking it helps you determine how responsive your app is. Use New Relic to track important data, such as cold launch time, hot launch time, and more, to improve the performance of your app.

Android vitals considers the following app startup times excessive:

  • Cold startup takes 5 seconds or longer.
  • Hot startup takes 1.5 seconds or longer.

Cold start: A cold start refers to an app’s start from scratch.

app-cold-launch-metrics

Hot start: A hot start refers to when your app’s process is already running in the background.

app-hot-launch-metrics

Service map

The service map breaks down your application into its component services and depicts the observable dependencies between these services in real time, allowing you to discover bottlenecks and understand how data flows across your architecture from frontend to backend. The map lists your user experience, services, infrastructure, and network entities, including engineering operations.

service-map

Geographic distribution

By looking at the geographic distribution report, you can identify the countries, business regions, or geographical regions where an application gets most of its visitors or unique visitors. Geo distribution covers network requests, data transfer size, failure rates, and more.

geographical-location

Distributed transactions

It can be difficult to diagnose performance problems, particularly when they occur intermittently. Looking at the application logs, we can see that the app takes more than 1 second to fetch data from the database or from third-party APIs, but beyond this observation, the logs provide no immediate insights.

Distributed tracing can greatly enhance monitoring across complex application landscapes, encompassing multiple services or applications. This isn’t just for web apps. We extended it to iOS and Android mobile applications, revealing new performance insights.

distributed-tracing

Crashes

Large-scale mobile apps are bound to crash. New Relic instrumentation helps identify high-impact crashes. To find the functions or methods causing problems, view crash data specific to each session and user journey.

crash-summary-event-trail

Change tracking

Hotfixes and new, significant code changes are captured using change tracking. Using an automated deployment pipeline integration or an API, you can record changes and see them as markers on the mobile summary page.

change-tracking

Errors inbox

The errors inbox provides a centralized way to recognize and prioritize problems. Similar instances of errors or events are grouped together: when two errors have the same fingerprint, they’re combined into one error group. Rich error information is provided, enabling you to rectify errors more quickly and in the context of the entire stack.

error-inbox

Comparison of different app versions

For insight into the success of your release, leverage our release versions page to compare crash rates, user engagement, and performance indicators across different releases.

release-version-summary

User journeys

You can now quickly access an extensive overview of every user interaction preceding a crash with New Relic user journeys. This enables you to keep track of every stage of the mobile user journey and identify and address issues more quickly, preventing any interruptions before they negatively impact the user experience.

user-journey

Offline telemetry data

If a data payload cannot be sent because the device is offline, it’s retained locally. The data is transferred to New Relic and removed from storage as soon as a connection is re-established.

offline-telemetry-data

Conclusion

Strong observability for mobile applications helps ensure an enjoyable experience for users. You can improve application durability and customer satisfaction by tracking crashes, monitoring performance, and quickly solving issues using tools like New Relic mobile monitoring.

To learn how you can gain complete visibility into the performance and troubleshooting of your mobile apps, click here to schedule a call with our observability experts.

Connecting SLOs to business metrics

1. Consolidate monitoring tools and strategy

Our first task was tooling. Reducing the number of monitoring tools used can have multiple benefits for any organization. Multiple tools create challenges in painting a global picture of your infrastructure, services, and client experiences. As our development teams grew rapidly to meet our surge in customers, our teams were reviewing logs in one tool, and analyzing service performance in another. Each service had its own approach to observability, looking at different metrics to decide if there were issues to address as a priority. There was no single approach to collating MELT (metrics, events, logs and traces) data for a comprehensive overview of our systems.

Costs were also increasing. Multiple tools in use by each team were adding to our FinOps costs, and because of the fragmentation, we weren’t getting the full value from what we were spending on observability.

Our first goal was to roll out observability across multiple services comprehensively. With New Relic, we could establish a consistent level of quality, optimize customer-facing software, and introduce common instrumentation. We wanted to instill observability as a mindset rather than ad hoc logging and monitoring, and having one tool in use across all functionalities helped us build that culture.

2. Implement KPIs for all services

With the foundations laid, our focus turned to the heart of our product ecosystem—our services. Our product is made up of multiple services with different functionalities, such as student dashboards, learning communities, branded mobile apps, and a suite of e-commerce tools—including a proprietary payments platform. Reliability, performance, uptime, and latency are critical to us.

Recognizing this, we embarked on defining and implementing key performance indicators (KPIs) that mattered most. Through New Relic’s service level objectives (SLO) capabilities, we began crafting a performance blueprint that all services could aspire to, grounding these objectives in measurable, impactful metrics like performance, uptime, and latency.

From here, we enhanced our service level agreements (SLAs) with clients, which we expose publicly through our website. For our own internal use of SLOs, we wanted to set stricter levels where we can be much more aggressive about the level of quality and reliability we want to achieve. Some SLOs have clear revenue impacts, such as Thinkific Payments. We watch the SLOs for payments very carefully: latency and uptime have a direct impact on revenue. Our customers can’t make money if they can’t sell. If Thinkific Payments is down, it impacts the flow of business opportunities for our customers.

Each product development team monitors its SLOs, but we also want to extrapolate those SLOs across all our services. This lets us report to decision-makers in a way that doesn’t require them to look at SLOs from many sources. Having these clear SLOs, all measured consistently and able to be aggregated, allows us to report on them and encourage decisions that weigh how much to invest in improving reliability versus adding new features.
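To illustrate the arithmetic behind these consistently measured SLOs, here's a minimal sketch that computes an availability SLI and error-budget consumption; the 99.9% target and request counts are hypothetical, not our actual figures.

```python
# Sketch: availability SLI and error-budget consumption for one month.
SLO_TARGET = 0.999           # hypothetical internal target, stricter than the public SLA
total_requests = 12_500_000  # requests served this month (hypothetical)
failed_requests = 9_100      # requests that breached the SLI (hypothetical)

sli = 1 - failed_requests / total_requests
error_budget = (1 - SLO_TARGET) * total_requests      # failures the SLO allows
budget_consumed = failed_requests / error_budget

print(f"SLI: {sli:.4%}")                              # e.g. 99.9272%
print(f"Error budget consumed: {budget_consumed:.0%}")  # e.g. 73%
```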

3. Report baselines and KPIs: Support decision-making through observability insights

At Thinkific, we consider precision in communication and decision-making critical. Observability isn’t just about collecting data; it’s about translating this wealth of information into actionable insights, particularly for those steering the ship: our senior engineering and product leaders. With common observability tooling and standardized KPIs, we now pull together monthly reports.

Thinkific dashboard SLO business metrics

Our reports provide a snapshot of our reliability overall and highlight any critical problems. When reporting metrics, we adopt a traffic light system—green, yellow, and red—to draw immediate attention to areas of concern and equip even our non-technical leaders with the understanding to make informed decisions. This also helps us make recommendations and share our rationale behind them in our report.

SLO report observability metrics Thinkific

Our reports don’t just explain the metric: ‘This is below the threshold, therefore you should work on it’. They weave in the ‘why’ behind each figure, connecting the dots between technical performance and its impact on our users and business goals. This approach empowers our leadership to prioritize effectively, balancing the scales between innovation and optimization with a keen eye on customer value. They can better answer whether we should allocate resources in upcoming sprints to resolve scalability, performance, or latency issues, or whether we should continue building new features for our customers.

New Relic helps us collect this data straight from our systems, so we don’t have to collect and collate it ourselves. When sharing SLO metrics in monthly reports, we can make the link between observability metrics and their business impact. We are then able to have conversations about prioritizing technical work according to the data. Our leadership team then makes decisions around these recommendations. At quarterly planning meetings, we can show what we did, how the system has improved, and how those improvements have enhanced the customer experience and business metrics overall.

 

4. Map SLO user journeys

Using our quarterly planning sessions to identify our priority target segments for the quarter ahead, we can now examine the flow of services our customers access and use when they are in our product and how our systems behave. From there, we can start mapping their user journeys through our product and tracking the SLO metrics for each service in their critical path. By doing that, we can focus our engineering time on supporting our priority target groups through their use of our product.

We want all teams, including product managers, to be able to dig into our metrics and understand how things are going from a performance perspective. We see reliability as a key feature in our product, and New Relic helps us connect product managers and technical teams to work on issues together. Our observability culture encourages everyone to jump in and dig into issues and think about how to support the customer user journey the best way we all can.

 

5. A culture of collective ownership

At the heart of Thinkific’s product development organization lies a fundamental shift toward a culture of shared responsibility. It’s a culture where every team member, from designers to product managers to engineers, is empowered to contribute to our collective success. Through this shared commitment, we’re building a platform and crafting experiences that educate, inspire, and transform lives.

This journey of integrating SLOs into our strategic fabric showcases the transformative power of observability. As we move forward, we remain committed to this path of continuous improvement, driven by data, and united in our mission to empower creator educators and their audiences around the globe.

To learn how you can align your SLOs with business metrics, click here to schedule a call with our observability experts.

Why observability is becoming a key part of C-suite conversations

 

 

1. Quickly driving productivity and proving value

The New Relic free tier—which includes 100 GB of data ingest—has proven particularly popular among executives: it shows that New Relic delivers value quickly without having to allocate any budget at all.

“My goal was to use the free tier to demonstrate value to our company’s leadership and make the case for an expanded observability deployment,” said Hendrik Duerkop, director of infrastructure and technology at Statista. “The justification was easy to understand: our monthly bill with New Relic is equivalent to four developer hours while delivering massive savings in money and resources … The introduction of the New Relic free tier has been a pivotal moment for Statista, enabling us to identify and address some of our most persistent challenges.”

C-suite leaders themselves appreciate the immediate value delivered by New Relic. “In New Relic, everything is tracked automatically,” said Casey Li, CEO of customer software agency BiteSite. “Unlike other platforms, we don’t have to pick and choose what’s important from the outset. So when a customer has an issue, we know we’ll have the data we need. For example, when a customer says something is loading slowly, New Relic might show that the fetch to the database is taking up 20% of the response time, while the HTML render is only taking up 5%. This information is immediately actionable—we understand where the performance problem is and can work on reducing it. It’s not just that the information from New Relic is actionable, but the solution is usually very easy to implement.”

 

2. Breaking down silos to improve system resilience

Tech leaders use New Relic to achieve goals set by their executives; universally, this means fewer silos and more reliable performance in order to improve the customer experience.

“M&S dramatically shifted our traditional retail approach for the omnichannel consumer in the drive to be digital-first in 2018,” said Steven Gonsalvez, principal engineer at M&S. “To keep up with this digital transformation, my engineering team had to introduce new tech. New Relic has helped M&S reduce MTTR by one-third, a massive win. This improvement demonstrates the real value of observability—minutes of downtime and poor customer experience all relate to revenue, but they also impact long-term customer retention and lifetime customer value.”

“The ability that New Relic provides to allow you to correlate specific requests or between services all the way from infrastructure through to the front end is really powerful and makes it a lot easier to diagnose issues … before they arise,” said Graham Little, director of engineering at 10x Banking. “That helps us build our culture as well. We’ve got engineers talking to each other across different teams, getting to know each other and breaking down some of those silos.”

“Having all our data available in one place allows me to make informed decisions,” said Joseph Wogan, principal platform engineer at 10x Banking. “It makes me feel empowered that I have the ability to deliver a great customer experience to our clients.”

 

3. Communicating what matters to the business

New Relic gives developers and engineering leaders the data to communicate more effectively with executives and business leaders.

“The technical metrics are not necessarily the ones that you care about,” said JD Weiner, director of DevOps at Forbes. “You don’t necessarily care about memory usage or five-minute load on the system. What you care about are the metrics that matter to the business. Those are things like customer satisfaction metrics, and that’s the direction we’re moving in at Forbes.”

“What I like about New Relic is having everything available to me in a single place,” said Patrick Hyland, senior engineering manager at Domino’s Pizza UK and Ireland. “I can have very predictable conversations with my SRE engineers. It results in a common language and the ability to talk in a clear and effective way about site reliability and engineering situations.”

 

4. Delivering proactive support for peak demand performance

For retailers and other customer-facing organizations, reliability is most important during moments of peak demand. New Relic ensures that a company will be able to keep pace during its busiest seasons—something executives keep a close eye on.

“To prepare our site for triple the traffic during the holiday season, we need team collaboration, and most importantly, the insight to adapt our infrastructure to add new services and capabilities,” said Manuel Garcia, senior principal engineer at farfetch.com. “We need to deliver business growth without compromising on user experience … New Relic is crucial to achieve this and ensure that we have a successful sales season.”

The pressure to perform is even stronger for the online gambling company William Hill, which sees regular peaks during major sports seasons.

“On a typical Saturday, over five million messages get processed through our system,” said Rashid Mattin, software engineering manager at William Hill. “We have a complex platform whereby we have over 400 microservices running, and trying to find where the error is within 400 apps can be quite challenging. So when we’ve got the right observability, and when we’ve got things like OpenTelemetry and we can trace, it helps reduce our MTTR.”

 

5. Simplifying security to reduce application risks

As companies face more stringent privacy regulations—and as executives steer their organizations away from security risks—New Relic simplifies the process of identifying and resolving vulnerabilities. These regulations are particularly important for companies dealing with health and financial data.

“When it came to security within William Hill and within trading, we used a number of tools such as container security scanning and container runtime scanning,” said Mattin. “We also had security scanning integrated into our pipelines. However, a lot of this information would be surfaced to the SRE team or to the InfoSec team. What New Relic and the Vulnerability Management tool enabled us to do was present and surface that data to the developers.”

“In anticipation of growth and the need to meet the rigors of healthcare security standards, we needed a tool to help diagnose and predict the performance and robustness of our technologies,” said Aretha Delight Davis, founder and CEO of ACP Decisions, a non-profit foundation helping people make more informed medical decisions. “New Relic is not only a telemetry platform but also services the triad of understanding the state of a system through its observability, monitoring and telemetry capabilities.”

To learn how you can enjoy these benefits, click here to schedule a call with our observability experts.

The business value of observability: Insights from the 2023 Observability Forecast

I’ll unpack four key findings from the report that show that the return on investment (ROI) in observability is not just beneficial; it’s essential.

1. Business value and ROI

The standout theme from this year’s report is the tangible business value of observability. Organizations are not just adopting observability for the sake of technology—they’re seeing it as a strategic move to achieve core business objectives. The results? Fewer outages, improved service-level metrics, operational cost savings, and increased revenue.

The numbers speak for themselves. For example, survey respondents indicated a 2x median annual ROI. That means for every dollar invested in observability, organizations are seeing a return of two dollars. An impressive 86% of respondents affirmed the value they receive from their observability investments, with 41% reporting over $1 million in total annual value. This ROI isn’t just a number; it’s a testament to the transformative power of observability on business, technology, and revenue streams.

Without observability, organizations risk higher operational costs and significant revenue loss from downtime. In fact, respondents cited improved system uptime and reliability (40%), increased operational efficiency (38%), and enhanced user experience (27%) as primary benefits.

2. The power of full-stack observability

To accelerate digital transformation initiatives, organizations are increasingly monitoring their tech stack end to end.

While most organizations still don’t monitor their full tech stack, this is changing. Full-stack observability increased 58% year over year (YoY). By mid-2026, at least 82% of respondents expected to deploy each of the 17 different observability capabilities.

The fast adoption of full-stack observability is likely tied to the value it unlocks for organizations. The more capabilities an organization deploys, the greater the value derived from observability. Those with five or more capabilities deployed were 82% more likely to report over $1 million in annual value from their observability investments.

Organizations that achieve full-stack observability improve service-level metrics as well—particularly mean time to resolution (MTTR) and mean time to detection (MTTD). Respondents who said their organization has more than five capabilities currently deployed were 40% more likely to detect high-business-impact outages in 30 minutes or less, compared to those with one to four capabilities currently deployed. Organizations with full-stack observability had median outage costs of $6.17 million per year compared to $9.83 million per year for those without full-stack observability—a cost savings of $3.66 million per year.

 

3. Boosting performance and productivity

Increasingly, businesses rely on observability to drive workplace efficiencies, innovation, and agility, and meet customer demands with exceptional digital experiences.

For practitioners, observability is a tool that boosts productivity, enabling faster issue detection and resolution. For IT decision makers (ITDMs), it’s a strategic asset, helping achieve both technical and business key performance indicators (KPIs). About a third (35%) of ITDMs said it helps them achieve technical KPIs and/or business KPIs (31%). Almost half (46%) of practitioners said it increases their productivity so they can find and resolve issues faster.

 

4. The high cost of ignoring observability

The benefits of implementing observability are clear. What happens when organizations forgo this crucial practice? The 2023 Observability Forecast provides some sobering insights into the business outcomes of not having an observability solution.

A staggering 96% of respondents indicated that the absence of an observability solution would have a significant financial impact on their business outcomes. About three in ten (29%) of respondents cited higher operational costs due to increased operational efforts as the most severe consequence. This was closely followed by 23% who pointed to revenue loss from increased downtime.

Only 3% of respondents felt that the absence of an observability solution would have no impact on their business outcomes. The overwhelming majority of technology professionals recognize the critical role that observability plays in modern business operations.

 

Conclusion

The data is unequivocal: the absence of an observability solution carries hard financial stakes and can have a ripple effect on other aspects of business, from reputation to competitive positioning. For decision makers, the message is even clearer. Observability is not a luxury or an optional add-on; it’s a necessity. Businesses must empower every engineer to do better work with data at every stage of the software development lifecycle (SDLC) to improve business outcomes and compete in an increasingly complex digital landscape.
By investing in observability, you’re not just avoiding potential pitfalls; you’re actively driving your business towards greater efficiency, security, and profitability. As the 2023 Observability Forecast  shows, the return on this investment is not just beneficial; it’s essential.

To learn more about security observability, click here to schedule a call with our experts.

Webiscope is now part of Aman Group

We are happy to announce that Webiscope is now part of Aman Group. We look forward to giving our customers and partners greater value with more complete solutions and outstanding service.