
Monitor OpenAI GPT application usage in New Relic

Monitor OpenAI with our integration

New Relic is focused on delivering valuable AI and ML tools that provide in-depth monitoring insights and integrate with your current technology stack. Our industry-first MLOps integration with OpenAI’s GPT-3, GPT-3.5, and beyond provides a seamless path for monitoring this service. Our lightweight library helps you monitor OpenAI completion queries and simultaneously records useful statistics about your ChatGPT requests in a New Relic dashboard.

With just two lines of code, simply import the monitor module from the nr_openai_monitor library and automatically generate a dashboard that displays a variety of key GPT-3 and GPT-3.5 performance metrics such as cost, requests, average response time, and average tokens per request.
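
As a rough sketch, those two lines look something like this in a Python app (exact module and function names vary by library version, so treat the import path, initialization call, and parameter below as illustrative):

import os
import openai
from nr_openai_monitor import monitor  # module name as referenced above; verify against your installed version

# Initialize monitoring once, before issuing completion requests.
# The parameter name is illustrative; the library reads your New Relic
# credentials from its own configuration or environment variables.
monitor.initialization(application_name="my-gpt-app")

openai.api_key = os.environ["OPENAI_API_KEY"]
response = openai.Completion.create(model="text-davinci-003", prompt="Hello!")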

To get started, install the OpenAI Observability quickstart from New Relic Instant Observability (I/O). Watch the Data Bytes video or visit our library repo for further instructions on how to integrate New Relic with your GPT apps and deploy the custom dashboard.

Key observability metrics for GPT-3 and GPT-3.5

Using OpenAI’s most powerful Davinci model costs $0.12 per 1,000 tokens, which can add up quickly and make it difficult to operate at scale. So one of the most valuable metrics you’ll want to monitor is the cost of operating ChatGPT. With GPT-3 and GPT-3.5 integrated with New Relic, our dashboard provides real-time cost tracking, surfacing the financial implications of your OpenAI usage and helping you identify more efficient use cases.
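
As a back-of-the-envelope illustration of how those per-token costs scale (using the $0.12 per 1,000 tokens figure quoted above; the traffic numbers are assumptions made up for the example):

# Rough cost estimate for completion usage.
rate_per_1k_tokens = 0.12        # USD, the Davinci rate quoted above
avg_tokens_per_request = 500     # assumed prompt + completion tokens
requests_per_day = 10_000        # assumed traffic

daily_cost = requests_per_day * avg_tokens_per_request / 1000 * rate_per_1k_tokens
print(f"~${daily_cost:,.2f}/day, ~${daily_cost * 30:,.2f}/month")
# At these assumptions: ~$600.00/day, ~$18,000.00/month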

Another important metric is average response time. Tracking the speed of your ChatGPT, Whisper API, and other GPT requests helps you improve your models and deliver the value of your OpenAI applications to customers more quickly. Monitoring GPT-3 and GPT-3.5 with New Relic gives you insight into the performance of your OpenAI requests, so you can understand your usage, improve the efficiency of your ML models, and ensure that you’re getting the best possible response times.

Other metrics included on the New Relic dashboard are total requests, average tokens per request, model names, and samples. These metrics provide valuable information about the usage and effectiveness of ChatGPT and OpenAI, and can help you enhance performance around your GPT use cases.

Overall, our OpenAI integration is fast, easy to use, and will get you access to real-time metrics that can help you optimize your usage, enhance ML models, reduce costs, and achieve better performance with your GPT-3 and GPT-3.5 models.

For more information on how to set up New Relic MLOps or integrate OpenAI’s GPT-3 and GPT-3.5 applications in your observability infrastructure, visit our MLOps documentation or our Instant Observability quickstart for OpenAI.

To learn more about how you can better observe your OpenAI usage, schedule a call with us today.

eBPF + OpenTelemetry = The Perfect Match for Observability

In the world of modern software development, observability has become a critical aspect of ensuring reliable and performant applications. The combination of eBPF and OpenTelemetry provides a powerful set of tools for developers and DevOps teams to achieve this goal. In this article, we will explore the technical and commercial advantages of using these technologies together.

eBPF is a technology that lets developers trace and monitor various aspects of a system, including network traffic and system calls, in real time, by writing small programs that run in the Linux kernel. According to a recent article on The New Stack, “eBPF programs can be used to trace everything that happens within the kernel and on the user side, allowing for a comprehensive view of the system.” This allows developers to quickly identify issues and troubleshoot them more effectively.
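
To make that concrete, here is a minimal sketch of such a program using the BCC Python toolkit, one common way to write and load eBPF programs (a generic tracing example, not tied to any specific observability product; it requires bcc and root privileges):

from bcc import BPF

# A tiny eBPF program, compiled and loaded into the kernel at runtime,
# that logs a line every time a process calls execve().
program = r"""
int trace_exec(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_exec")
print("Tracing execve calls... Ctrl-C to exit")
b.trace_print()  # stream the kernel trace pipe to stdout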

OpenTelemetry is an open-source set of libraries and tools that allow developers to collect telemetry data from various sources. This data can be used to gain insights into the system and identify potential issues. According to a recent article on TechTarget, “OpenTelemetry allows developers to instrument code to generate telemetry data that can be collected and analyzed, providing a more comprehensive view of the system.” This allows developers to quickly identify and address issues, improving the overall reliability and performance of their applications.
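
As a minimal illustration, instrumenting a code path with the OpenTelemetry Python API looks something like this (the tracer, span, and attribute names are placeholders; where the telemetry goes is decided separately by the SDK and exporter configuration):

from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")  # placeholder instrumentation name

def process_order(order_id: str) -> None:
    # Each call produces a span carrying timing and custom attributes.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)  # placeholder attribute
        # ... application logic being observed ...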

One of the primary technical benefits of using eBPF and OpenTelemetry together is that they provide a comprehensive view of the system. According to a recent article on The New Stack, “eBPF and OpenTelemetry can work together to provide a more comprehensive view of the system, from the kernel to the application layer.” This end-to-end visibility helps developers pinpoint where a problem originates, whether in the kernel or in the application, and troubleshoot it more effectively.

Another technical benefit of using eBPF and OpenTelemetry is that they are highly scalable. According to a recent article on Cloudflare’s blog, “eBPF and OpenTelemetry are highly scalable, which makes them ideal for use in modern, complex software systems.” This scalability allows developers to monitor their systems effectively, even as they grow in complexity.

From a commercial perspective, the benefits of using eBPF and OpenTelemetry are significant. By quickly identifying and addressing issues, developers can improve the overall reliability and performance of their applications, reducing downtime and improving the customer experience. According to a recent article on Forbes, “observability is critical to ensuring the success of modern software applications, and eBPF and OpenTelemetry provide powerful tools for achieving this goal.” This, in turn, can lead to increased revenue and customer satisfaction.

Another commercial benefit of using eBPF and OpenTelemetry is that they can help reduce costs. By identifying and addressing issues quickly, developers can reduce the need for costly downtime and emergency fixes. According to a recent article on TechTarget, “observability can help reduce the overall cost of software development by identifying and addressing issues early in the development cycle.” This can lead to faster time-to-market and reduced development costs.

In conclusion, eBPF and OpenTelemetry provide powerful tools for achieving observability in modern software systems. These technologies provide a comprehensive view of the system, are highly scalable, and can help reduce costs and improve the customer experience. By using eBPF and OpenTelemetry together, developers and DevOps teams can quickly identify and address issues, improving the overall reliability and performance of their applications.

References:

  1. “Observability with eBPF and OpenTelemetry” on The New Stack: https://thenewstack.io/observability-with-ebpf-and-opentelemetry/
  2. “What is OpenTelemetry?” on OpenTelemetry.io: https://opentelemetry.io/docs/concepts/what-is-opentelemetry/
  3. “Why observability is critical to modern software applications” on Forbes: https://www.forbes.com/sites/forbestechcouncil/2022/05/24/full-stack-observability-optimizing-the-applications-experience-in-the-modern-world/?sh=1103a9a738e4
  4. “Using eBPF and OpenTelemetry for Observability at Cloudflare” on Cloudflare’s blog: https://blog.cloudflare.com/introducing-ebpf_exporter/
  5. “How observability can reduce software development costs” on TechTarget: https://www.techtarget.com/searchapparchitecture/tip/5-basic-strategies-for-distributed-systems-observability

To learn more, or if you wish to implement eBPF and OpenTelemetry in your organization, SCHEDULE A CALL with us today.

The business impact of telemetry data

In today’s digital age, data is everything. It is the backbone of organizations and a driving force behind decision-making. One of the most important types of data that companies collect is telemetry data: data collected from remote systems and sensors and sent to a central location for analysis. That data is then used to monitor and optimize a wide range of systems, from industrial machinery to website performance. In this article, we will explore the business impact of telemetry data on the organizations that collect it.

Improved Operational Efficiency

Telemetry data is a valuable tool that can help organizations optimize their operations. By collecting data on everything from machine performance to supply chain logistics, organizations can identify areas for improvement and make data-driven decisions that lead to greater efficiency. According to a study by McKinsey, “Companies that leverage advanced analytics to improve their operational efficiency can reduce costs by up to 15%.” This is a significant improvement in profitability and can help organizations remain competitive in their respective markets.

Enhanced Product Development

Telemetry data can also be used to improve product development. By collecting data on how customers interact with products, organizations can identify areas for improvement and develop products that better meet the needs of their customers. This can lead to increased customer satisfaction, higher sales, and a competitive advantage in the marketplace. As Gartner notes, “Companies that use telemetry data to inform product development can reduce time-to-market by up to 50%.”

Predictive Maintenance

Telemetry data can be used to predict when maintenance is needed on equipment. This can help organizations avoid costly downtime and repairs, as well as extend the life of their equipment. According to Forbes, “Predictive maintenance can reduce maintenance costs by up to 30%, reduce downtime by up to 45%, and increase equipment uptime by up to 10%.” This can lead to significant improvements in operational efficiency and profitability.

Improved Customer Experience

Telemetry data can be used to improve the customer experience. By collecting data on customer behavior, preferences, and interactions with products and services, organizations can develop a better understanding of their customers’ needs and preferences. This can lead to more personalized customer experiences, increased customer loyalty, and higher sales. As Deloitte notes, “Telemetry data can help organizations provide a more personalized experience for customers, which can lead to increased customer loyalty and higher sales.”

Better Risk Management

Telemetry data can also be used to manage risk. By collecting data on everything from environmental conditions to equipment performance, organizations can identify potential risks and take proactive measures to mitigate them. This can help organizations avoid costly incidents and ensure regulatory compliance. As a report from Accenture notes, “Telemetry data can help organizations identify potential risks and take proactive measures to mitigate them, leading to better risk management and compliance.”

In conclusion, telemetry data collection has a significant impact on organizations that collect such data. It can improve operational efficiency, enhance product development, enable predictive maintenance, improve the customer experience, and support better risk management. By leveraging telemetry data, organizations can make data-driven decisions that lead to greater efficiency, profitability, and success in the marketplace.

References:

  1. McKinsey & Company, “Advanced analytics in operations: A practical guide for achieving business impact,” 2019.
  2. Gartner, “IoT analytics: Opportunities and challenges for marketers,” 2017.
  3. Forbes, “Why Predictive Maintenance Is The Future Of Industrial IoT,” 2021.
  4. Deloitte, “Telemetry in the automotive industry: The benefits and challenges,” 2021.
  5. Accenture, “The role of telemetry in business operations,” 2019.

Schedule a meeting with us to discuss ways telemetry data can impact your success.

Introducing the OpenTelemetry Collector

 

The OpenTelemetry Collector is a vendor-agnostic service for receiving, processing, and exporting telemetry data. It can receive telemetry from a variety of sources, including OpenTelemetry SDKs, agents, and other collectors, and can export that data to a variety of destinations, including backend systems like Prometheus, Zipkin, and Jaeger, as well as log management systems and alerting tools. It can also be configured to perform a number of data processing tasks, including filtering, aggregating, and transforming data, as well as applying rules and policies to telemetry data.

The OpenTelemetry Collector is highly configurable and can be customized to meet the needs of different environments and use cases. It is a key component of the OpenTelemetry project, which aims to provide a consistent, standard way of instrumenting, collecting, and processing telemetry data across different languages and platforms.
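
For example, a service instrumented with the OpenTelemetry Python SDK can ship its spans to a locally running Collector over OTLP, and the Collector then processes and forwards them to whichever backends it is configured to export to (the endpoint below is the Collector's default OTLP/gRPC port; adjust it for your deployment):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans to a Collector listening on its default OTLP/gRPC port.
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example-service")  # placeholder service name
with tracer.start_as_current_span("handle_request"):
    pass  # the Collector batches, processes, and forwards this span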

Here are the three most popular Collector architectures and the use cases they serve.

The Direct Exporter architecture

A straightforward and efficient approach that uses the Collector to export data directly to your preferred monitoring platform. This architecture is ideal for users who need a simple way to gather telemetry data and quickly export it to a monitoring system without much processing.

The Fan-Out architecture

A flexible and powerful approach that uses the Collector to split incoming data into different pipelines based on its source and destination. This architecture is ideal for users who need to process, transform, or enrich telemetry data before exporting it to a monitoring system.

The Sidecar architecture

A container-based approach that runs the Collector as a sidecar process to collect and export telemetry data from a containerized application. This architecture is ideal for users who need to collect telemetry data from multiple containers running on a single host.

In summary, the OpenTelemetry Collector is a versatile tool that provides different architectures to collect, process, and export telemetry data to different monitoring platforms. By selecting the appropriate architecture for your use case, you can customize your observability pipeline to meet your specific needs.

For more information and a free consultation meeting – Sign Up Here.

 

Top five observability pricing traps

Unbudgeted monthly overage fees and penalties

Subscription-based billing for enterprise software is designed to maximize committed shelfware (software that was paid for but not used). While using too little of your commit results in shelfware, using too much usually results in overage penalties. This is beneficial for vendors but not for you.

Many observability vendors include limited amounts of data ingestion, data retention, containers, custom metrics, synthetic checks, and so on as part of their bundled per-agent or lower-priced edition limits. To avoid surprise costs, it’s important to consider these limits and the costs of exceeding them when forecasting your budget.

For example, charging per container imposes a configuration burden on your engineering teams. Some observability vendors (like Datadog) charge a premium for monitoring more than five containers running continuously on an infrastructure host, which is extremely common (most customers run 20 or more). And Splunk charges a 150% overage fee if you exceed your monthly subscription usage level for synthetic checks.

Cheap introductory pricing with unpredictable costs as you scale is frustrating. Beware of “starting at” pricing. For example, Datadog offers a lower price if you have an annual contract, including a monthly spending commitment, and a 17–50% higher on-demand price if you don’t have an annual contract or exceed your monthly commitment. Its on-demand price for hosts is 17–20% higher and logs are up to 50% higher!

In addition, for any pricing unit, you should be able to burst (autoscale without penalty) during seasonal spikes or unpredictable workload increases.

Tools to query, track, and alert on billing-related usage are a best practice because they make accurate sizing and pricing easier. For example, you should be able to create an alert when data usage exceeds a fixed monthly threshold for gigabytes. Unfortunately, not all observability vendors provide these tools, so ask whether they do and, if so, how. However, you shouldn’t have to establish daily quotas that you anticipate you’ll run over during peak and seasonal periods, which would result in constantly adjusting quotas.

With usage-based pricing and billing, you don’t have to predict upfront how much data you’ll use over the next year(s) or deal with shelfware if you use too little or penalties if you use too much. Choosing a vendor with usage-based pricing and billing helps you avoid unbudgeted overages and spares you from spending tons of time attempting to forecast the annual spend for each SKU.

Paying for the whole month or year at your peak usage level

Another variable to consider is how the vendor handles seasonality, blue/green and canary deployments, and traffic spikes. Some observability vendors (like Datadog, Elastic, and Splunk) price by high watermark (peak) usage instead of your actual usage. In a world where infrastructure scales up and down with customer demand, charging at a peak rate is predatory, as spikes can double your bill. For example, during a winter holiday season, you may have higher usage due to greater user traffic on your frontend applications, which essentially penalizes you for success. Ideally, you should only pay for what you use instead of paying for peak usage all month.

Paying for unwanted bundles to get the capabilities you need

For observability vendors that use a bundle-of-SKUs approach, consider whether that vendor forces you to bundle adjacent use cases. This could mean that if you want just application performance monitoring (APM), you must also sign up for infrastructure monitoring. For example, Datadog requires you to have infrastructure monitoring for every billable APM host, which increases your APM costs by the cost of infrastructure monitoring as well.

Constant re-forecasting and re-contracting for 16+ different SKUs

All major observability vendors (except New Relic) use a bundle-of-SKUs approach with up to 20 different infrastructure- and service-based pricing units (hosts, agents, nodes, CPU cores, and so on), which are not stable and are often sold as committed use-it-or-lose-it monthly amounts. This complex bundle-of-SKUs approach requires you to forecast your usage based on your historical usage, which can be challenging, especially if you’re experiencing rapid growth.

This complicated forecasting process can be further frustrating when hit by surprise overages. When your monthly usage exceeds your commitment, you’ll receive an unbudgeted bill for overage fees and on-demand penalties. Just a few hours of higher traffic could double your monthly costs!

Developers constantly evolve their applications to take advantage of new technologies, shifting from on-prem to cloud, large to small virtual machines (VMs), VMs to Kubernetes (K8s), K8s to serverless containers, containers to serverless functions, and so on. As your applications and components change with each sprint, you must re-analyze, re-forecast, and re-contract each month for each SKU. This is a difficult task and inefficient use of time for most teams, so many customers are repeatedly hit with unbudgeted overage bills and constantly have to re-negotiate ever-larger contracts with these vendors. Instead, look for a pricing model that’s weighted based on stable pricing units like users.

New Relic’s all-in-one pricing, by contrast, gives you the flexibility to use any of our 30+ capabilities as your needs change, enabling full-stack observability with no cost penalties or complicated forecasting.

Data explosion doubling your bill

Data is the biggest variable cost for observability. As you shift from on-prem to cloud and microservices, there can be hundreds of small programs instead of a few large programs. Customers generally report 2–10x telemetry data increases or more. And data can double every two to three years—a data explosion. The associated network, storage, and compute costs can add up quickly.

A common challenge is that log volumes are unpredictable when it comes to forecasting costs. For example, system and user load along with unexpected code changes can cause Datadog log management costs to explode. Datadog has a complicated formula for calculating log usage that can add more than US$2.50–$3.75 per one million logging events for 30 days of retention. With an average of 1.5–2 GB per million events, that works out to roughly US$1.00–$2.50 per GB! That’s a lot more than the advertised data ingest rate of US$0.10/GB. Splunk charges approximately US$4.00 per GB for logs. Elastic charges per server in your Elasticsearch cluster, and an increase in log data requires an increase in Elasticsearch servers. So, doubling your data ingestion can double your cluster size and costs.

Therefore, it’s important to future-proof cloud adoption by looking for an observability vendor that offers a low data cost per GB.

And, to reduce the amount of data ingested (and data ingest bills), you should be able to manage your data ingest by configuring data-dropping rules that filter out unimportant, low-value data and potentially sensitive data.

If you want to evaluate New Relic’s state-of-the-art observability platform, sign up for a New Relic account today.

 

The How and Why of Using Terraform With PagerDuty

Why define your infrastructure as code? For starters, it’s fast. When your infrastructure is all defined as code, you can run a script to deploy a series of virtual servers; launch things like containers, databases, and load balancers; and configure any cloud service you might need, such as PagerDuty. Writing the configuration in code also helps keep the settings consistent, reduce the chance of introducing errors, and mitigate deviations between deployments.

Think of the last time a single engineer in your organization was the largest source of knowledge about a certain part of your deployment process. And now, think of how frustrating it was when that engineer left with that knowledge and the rest of the team had to scramble to figure out the missing pieces. When your infrastructure is defined as code, it is already documented for the whole team to see. Engineers can look at the code in a single place and read how the services and processes are configured. This minimizes the risk of losing valuable system knowledge. Of course, the configurations could be documented in a wiki. But we all know that trying to find the right information in a wiki can be a lot of extra work.

All of these benefits of configuring infrastructure as code point to the main reason for this strategy: increased efficiency. Knowing that the infrastructure is configured as expected gives engineers the confidence that it can be deployed automatically without any trouble. Then, the engineers can focus on building rather than configuring.

Terraform from HashiCorp has emerged as one of the leading ways to declaratively configure technology stacks, and PagerDuty is among the most popular services being configured by Terraform users. PagerDuty is one of the top 20 Terraform providers, and in this post, I’ll describe how you can use Terraform to configure your PagerDuty account.

Terraform

Terraform is an open-source tool from HashiCorp that allows you to build, change, and version your infrastructure through code. You configure your infrastructure and services using a declarative configuration language called HashiCorp Configuration Language (HCL).

As organizations have adopted more external services, Terraform has grown to support configuring a variety of applications as well. Today, Terraform is a configuration engine that interacts with over 200 providers to manage most of the pieces of your infrastructure. There are providers for everything from AWS to MySQL to Datadog.

The general idea of using Terraform with PagerDuty is to use it as the one source for creating, editing, and deleting resources in your PagerDuty account. With teams using Terraform as their tool of choice for defining other pieces of their infrastructure, it’s only natural that they would also want to configure their PagerDuty accounts in order to keep all of their infrastructure defined in a single location.

Terraform allows teams to manage their infrastructure in a safe, readable, and easily repeatable way. While the HCL code used in Terraform was developed by HashiCorp, engineers will recognize that it looks a lot like JSON. For example, to create a team in PagerDuty, the Terraform configuration would look something like this:

resource "pagerduty_team" "eng_seattle" {
    name = "Engineering (Seattle)"
    description = "All engineers in Seattle office"
}

Then, to add team members to that team, user and team membership resources would be created:

resource "pagerduty_user" "susan_developer" {
    name = "Susan Developer"
    email = "susan@email.com"
}

resource "pagerduty_team_membership" "susan_team_membership" {
    user_id = pagerduty_user.susan_developer.id
    team_id = pagerduty_team.eng_seattle.id
}

Once all the configuration is set, the Terraform code can be checked into a version control system so that a history of changes is recorded in case the deployed definitions need to be rolled back.

Each time that Terraform runs it defines the state of your service settings. In the case of PagerDuty, if resource definitions for things such as users or teams are removed from the files, Terraform will think that you want to delete them in PagerDuty. So, be sure to have a clear understanding across the organization that Terraform is the source of truth for PagerDuty settings, and not the PagerDuty web interface.

Before we get too much further into showing code, it’s important to point out that there are three Terraform products from HashiCorp: Open Source, Cloud, and Enterprise. To get started and experiment with Terraform, this article uses the open-source product as the example. Check out the Installing Terraform article from HashiCorp if you’re brand new to the product.

PagerDuty Terraform Provider

PagerDuty’s Terraform provider began as a community success story. The project was started by Alexander Hellbom, a DevOps Engineer in Sweden. Alexander’s company defined nearly all of its infrastructure configurations in Terraform. When they adopted PagerDuty, Alexander discovered that there wasn’t a PagerDuty provider and set out to build one on his own. The reception he received from the Terraform community was so positive and supportive that he has continued to maintain the project to this day. Alexander continues to be involved as a maintainer, but PagerDuty has begun to take a more active role.

The PagerDuty provider supports a wide array of PagerDuty resources, from Add-ons and Escalation Policies to Maintenance Windows and Schedules. Head over to the PagerDuty Provider documentation to see a full list of resources that are supported.

Example

Before creating any Terraform files you’ll first need to generate a REST API Key in PagerDuty. The key that you generate will be used as the value for token below.

To get started defining PagerDuty settings with Terraform, create a directory where you want to store your Terraform (.tf) files. In that directory create a new file with the .tf extension. For learning purposes, you can create a provider block for ‘pagerduty’. That block initializes the PagerDuty provider, setting the value of token using the API key generated above, and will look something like this:

provider "pagerduty" {
    token = "your_pagerduty_api_key"
}

Now, in production, you may be a little squeamish about checking API keys into code repositories, and rightfully so. The provider also supports reading the access token from your environment variables. To do this, set an environment variable named PAGERDUTY_TOKEN with the value of your API key. When the PagerDuty Terraform Provider finds this environment variable it will initialize the PagerDuty provider using that variable, and you no longer need to use a provider block in your code.

With the provider set, you’ll now use resource blocks to define the specific PagerDuty objects you want to create and manage. Building off of the example above, where we defined Susan Developer as a user and added her to the “Engineering (Seattle)” team, we could now create an Escalation Policy that included her team with the code below.

To reference the values defined for Susan’s user and team you provide an expression that takes the type of the resource, the name you gave it, and then the field you want to reference. For example, to get the team ID for the eng_seattle team you would write pagerduty_team.eng_seattle.id. Using that syntax, we can now define the escalate_seattle_eng Escalation Policy for Susan’s team (referenced as pagerduty_team.eng_seattle) and using Susan as a target (referenced as pagerduty_user.susan_developer).

resource "pagerduty_escalation_policy" "escalate_seattle_eng" {
    name = "Seattle Engineering Escalation Policy"
    num_loops = 2

    teams = [pagerduty_team.eng_seattle.id]

    rule {
        escalation_delay_in_minutes = 10
        target {
            type = "user"
            id = pagerduty_user.susan_developer.id
        }
    }
}

Terraform Plan and Apply

Before running any other Terraform commands, you’ll need to initialize Terraform by running terraform init. This will check the Terraform files in your current directory and install any providers that are referenced. In this case, it will install the latest version of provider.pagerduty.

With Terraform initialized, it’s time to verify the definitions that you made. To do this, run terraform plan. This creates an execution plan that determines which actions need to be taken to get to the state defined in your configuration files. This is a really nice way to make sure that the changes you defined actually create the desired outcome, and prevents you from making any unwanted changes.

When you are satisfied with the state described in the execution plan, it’s time to run terraform apply to actually execute those changes. Terraform will show you the output of the plan again, and then ask for you to type the word ‘yes’ to confirm that you want the planned changes to occur. After you confirm, and the plan is executed, Terraform will provide a message communicating the result of each change that was carried out.

In addition to tackling the technical complexity of provisioning systems, HashiCorp also recommends some practices for addressing the organizational complexity of maintaining infrastructure as your operation scales. When an application grows, it is broken up into microservices that are each focused on a specific thing, built and managed by a single team, and usually developed in parallel with other services as long as their APIs don’t change. In a similar way, Terraform configurations can be broken up into smaller configurations and delegated out to different teams. Each configuration uses something called output variables to publish information and remote state resources to access output from other configurations.

Closing

For more information on what the PagerDuty Terraform Provider has to offer, check out the PagerDuty Provider documentation on HashiCorp’s Terraform site. To ask questions, post issues, or submit contributions to the PagerDuty Provider project, head over to the Terraform Provider PagerDuty repository on GitHub. Also, to learn more about Terraform, have a listen to the “Talking Terraform with HashiCorp” episode of the Page It to the Limit podcast, where we sat down with Paul Hinze, Senior Director of Engineering for Terraform, and Robbie Th’ng, Director of Product for Terraform.

If you don’t already have a PagerDuty account, sign up for a free Guided Trial today.