
Introduction to Helicone AI Gateway & LLM Observability
If you are building applications powered by large language models (LLMs), you have likely encountered a common set of challenges. How do you track every API call to OpenAI or Anthropic? How do you debug a strange response from your chatbot? How do you manage hundreds of different prompts across your team? Helicone is designed to solve these exact problems.
Helicone is an open-source AI gateway and observability platform. It acts as a middle layer between your application and the LLM providers you use. Every request your app sends to an AI model passes through Helicone. This gives you a central place to monitor, debug, analyze, and control your AI traffic. Think of it as a combination of a reverse proxy, a logging system, and a testing environment, all built specifically for LLM applications.
This tool is used by fast-growing AI companies to reduce costs, improve response quality, and catch errors before they affect users. It integrates seamlessly with major providers including OpenAI, Anthropic, and Azure OpenAI. Whether you are a solo developer or part of a large engineering team, Helicone provides the visibility you need to build reliable AI products.
This tutorial will guide you through everything you need to know to start using Helicone effectively. We will cover setup, key features, practical workflows, and expert tips.
Getting Started with Helicone
Creating Your Account and Project
Begin by visiting https://helicone.ai/ and clicking the “Sign Up” button. You can register using your Google account or GitHub account. After your first login, Helicone will prompt you to create a project. Give your project a descriptive name, such as “Production Chatbot” or “GPT-4 Testing”. Each project has its own separate data and settings.
Obtaining Your API Keys
Once your project is created, navigate to the “Settings” section. Here you will find your Helicone API key. This key is used to route your LLM requests through Helicone. Keep this key secure. Do not expose it in client-side code.
Helicone works by intercepting your existing API calls. You do not need to rewrite your entire application. Instead, you simply change the base URL of your LLM provider to point to Helicone’s endpoint. For example, if you are using OpenAI, you would change the base URL from https://api.openai.com/v1 to https://oai.helicone.ai/v1. Helicone then forwards your request to the real OpenAI API and captures all the data.
Setting Up Your First Integration
Let us walk through a practical example using Python with the OpenAI library.
Before Helicone:
- Your code calls the OpenAI chat completions API directly.
- You have no visibility into latency, cost, or errors.
After Helicone:
- You set an environment variable: OPENAI_BASE_URL="https://oai.helicone.ai/v1"
- You add a header: "Helicone-Auth": "Bearer YOUR_HELICONE_API_KEY"
- Your code remains almost identical, but now every request is logged.
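Here is a minimal example using the OpenAI Python SDK (version 1.0 or later), reading both keys from environment variables:
```python
import os

from openai import OpenAI  # openai-python >= 1.0

# Point the client at Helicone's gateway instead of api.openai.com.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",
    # Sent with every request so Helicone can attribute it to your account.
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
    },
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)
```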
After running this code, open your Helicone dashboard. Your first request should appear within moments, including the prompt, the response, the number of tokens used, the latency, and the cost.
Key Features of Helicone
Dashboard for Requests, Segments, Sessions, and Users
The main dashboard is your command center. It displays a real-time feed of all requests. Each row shows the model used, the latency, the token count, and the status code. You can filter by date, model, user, or error type.
Segments allow you to group requests by a custom label. For example, you can tag requests from your “iOS App” or “Web Chat” segment. This helps you compare performance across different parts of your application.
Sessions group multiple requests that belong to the same conversation or workflow. If a user asks three questions in a row, those three requests form a session. This is invaluable for debugging multi-turn conversations.
User tracking lets you associate requests with specific end-users. You pass a user ID in a request header. The dashboard then shows you which users are making the most requests, which users are experiencing errors, and how much each user is costing you.
HQL Query Language for Data Analysis
Helicone includes its own query language called HQL (Helicone Query Language). HQL is designed specifically for analyzing LLM request data. It is similar to SQL but tailored for this use case. You can write queries to answer questions like:
- What is the average latency for GPT-4 requests in the last 24 hours?
- Which user has the highest error rate?
- Show me all requests where the response contained the word “error”.
- What is the total cost of requests made yesterday?
To use HQL, go to the “Logs” section and click “Query”. You can write queries directly in the text box. For example:
```sql
SELECT model, AVG(latency), COUNT(*)
FROM request
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY model
ORDER BY COUNT(*) DESC
```
This query returns the average latency and the request count for each model used in the last week, ordered by volume. You can also export query results for further analysis in your own tools.
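To make the semantics of that query concrete, here is a minimal Python sketch that computes the same per-model average latency and count over request records held in memory, such as rows you have exported. The record keys (model, latency_ms, created_at) and the function name latency_by_model are hypothetical, chosen for illustration; they are not Helicone's export schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def latency_by_model(records, now, window=timedelta(days=7)):
    """Mirror of: SELECT model, AVG(latency), COUNT(*) ... GROUP BY model ORDER BY COUNT(*) DESC.

    Each record is a dict with hypothetical keys "model", "latency_ms", "created_at".
    """
    cutoff = now - window
    totals = defaultdict(float)  # model -> summed latency
    counts = defaultdict(int)    # model -> number of requests
    for r in records:
        if r["created_at"] > cutoff:  # WHERE created_at > NOW() - INTERVAL '7 days'
            totals[r["model"]] += r["latency_ms"]
            counts[r["model"]] += 1
    rows = [(m, totals[m] / counts[m], counts[m]) for m in counts]
    rows.sort(key=lambda row: row[2], reverse=True)  # ORDER BY COUNT(*) DESC
    return rows


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"model": "gpt-4", "latency_ms": 1200.0, "created_at": now - timedelta(days=1)},
    {"model": "gpt-4", "latency_ms": 800.0, "created_at": now - timedelta(days=2)},
    {"model": "claude-3", "latency_ms": 900.0, "created_at": now - timedelta(days=3)},
    {"model": "gpt-4", "latency_ms": 5000.0, "created_at": now - timedelta(days=30)},  # outside window
]
for model, avg_latency, count in latency_by_model(records, now):
    print(f"{model}: avg {avg_latency:.0f} ms over {count} requests")
```

The 30-day-old record is excluded by the time filter, just as the WHERE clause would exclude it.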
Prompts and Datasets Management
Managing prompts across a team is a common headache. Helicone provides a dedicated “Prompts” section where you can store, version, and organize your prompts. Each prompt can have multiple versions. You can tag prompts with metadata like “production” or “testing”.
Datasets are collections of example inputs and expected outputs. You can create datasets from historical requests or upload them manually. Datasets are used for testing and regression analysis. For example, you can create a dataset of 100 common user questions. Then, when you change a prompt, you can run the dataset through the new prompt and compare the results to the expected outputs.
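Helicone runs this comparison for you in its UI. As a sketch of the underlying idea only (not Helicone's API), the loop below runs each dataset example through a prompt and collects the cases whose output no longer matches. The names Example, run_regression, and new_prompt_version are hypothetical, and new_prompt_version stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    prompt: str
    expected: str


def run_regression(dataset, generate: Callable[[str], str]):
    """Run every example through `generate`; return (pass_rate, failures)."""
    failures = []
    for ex in dataset:
        actual = generate(ex.prompt)
        # Exact match after trimming and lowercasing. Real evaluations often
        # need fuzzier scoring (similarity, a rubric, or an LLM judge).
        if actual.strip().lower() != ex.expected.strip().lower():
            failures.append((ex.prompt, ex.expected, actual))
    pass_rate = 1 - len(failures) / len(dataset) if dataset else 1.0
    return pass_rate, failures


dataset = [
    Example("What is 2 + 2?", "4"),
    Example("Capital of France?", "Paris"),
]


# Hypothetical stand-in for "call the model with the new prompt version".
def new_prompt_version(question: str) -> str:
    return {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}[question]


rate, failures = run_regression(dataset, new_prompt_version)
print(f"pass rate: {rate:.0%}")
for question, expected, actual in failures:
    print(f"REGRESSION {question!r}: expected {expected!r}, got {actual!r}")
```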
Playground for Testing
The Playground is an interactive environment where you can test prompts without writing any code. It works similarly to OpenAI’s Playground, but with a key difference: every test you run in the Helicone Playground is automatically logged and tracked. You can switch between models (GPT-4, Claude, etc.), adjust parameters like temperature and max tokens, and see the results instantly.
This is particularly useful for rapid iteration. You can tweak a prompt, test it against a dataset, and immediately see the impact on cost and quality. The Playground also shows you the exact API call that would be made, which is helpful for debugging.
Rate Limits and Alerts Configuration
LLM APIs can be expensive if a bug causes runaway requests. Helicone allows you to set rate limits at the project level or per user. For example, you can set a limit of 100 requests per minute for your entire application. If the limit is exceeded, Helicone can either queue the requests or return an error.
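Helicone enforces these limits at the gateway. The sketch below only illustrates what a "100 requests per minute" policy means, using a sliding-window counter; it is not Helicone's implementation or configuration API, and the class name is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window_s`-second window."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.timestamps = deque()  # times of recently admitted requests

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # the caller may queue the request or return an error


# Simulated clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=100, window_s=60, clock=lambda: fake_now[0])
admitted = sum(limiter.allow() for _ in range(120))
print(admitted)  # 100: the other 20 requests in the first minute are rejected
fake_now[0] = 60.0
print(limiter.allow())  # True: the first minute's requests have expired
```

A sliding window avoids the burst that a fixed per-minute counter allows at the boundary between two minutes.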
Alerts are equally important. You can configure alerts to notify you via email, Slack, or webhook when certain conditions are met. Common alert conditions include:
- Error rate exceeds 5% in the last 5 minutes.
- Average latency exceeds 10 seconds.
- Daily cost exceeds a budget threshold.
- A specific user makes more than 50 requests in an hour.
Alerts help you respond to problems before they escalate.
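As a sketch of how the first two conditions above can be evaluated over a recent window of requests: the Request record and evaluate_alerts function are hypothetical illustrations, not Helicone's alert engine, and the thresholds default to the example values (5% errors, 10 seconds average latency, 5-minute window).

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since epoch
    status: int       # HTTP status code
    latency_s: float


def evaluate_alerts(requests, now, window_s=300.0,
                    max_error_rate=0.05, max_avg_latency_s=10.0):
    """Return the names of alert conditions that fire for the last `window_s` seconds."""
    recent = [r for r in requests if now - r.timestamp <= window_s]
    fired = []
    if not recent:
        return fired
    errors = sum(1 for r in recent if r.status >= 400)
    if errors / len(recent) > max_error_rate:
        fired.append("error_rate")
    if sum(r.latency_s for r in recent) / len(recent) > max_avg_latency_s:
        fired.append("avg_latency")
    return fired


now = 1_000.0
reqs = [Request(now - i, 200, 1.0) for i in range(18)]
reqs += [Request(now - 5, 500, 1.0), Request(now - 6, 429, 1.0)]
print(evaluate_alerts(reqs, now))  # 2 errors out of 20 requests is 10%, above 5%
```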
Integrations with Major LLM Providers
Helicone supports OpenAI, Anthropic, Azure OpenAI, and several other providers. The integration process is similar for each: you change the base URL and add your Helicone auth header. Helicone also supports custom models and self-hosted models through its flexible gateway architecture.
For Anthropic, you would use https://anthropic.helicone.ai as the base URL. For Azure OpenAI, you would use a specific endpoint that Helicone provides during setup. Each integration captures the same metrics: request, response, tokens, latency, and cost.
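Because every integration differs only in the base URL and the auth header, one small helper can produce the client settings for each provider. This is a minimal sketch: the helper name gateway_config is hypothetical, and Azure OpenAI is left out because its endpoint comes from your own Helicone setup. Both the OpenAI and Anthropic Python SDK clients accept base_url and default_headers as constructor arguments.

```python
# Gateway base URLs named in this tutorial.
HELICONE_BASE_URLS = {
    "openai": "https://oai.helicone.ai/v1",
    "anthropic": "https://anthropic.helicone.ai",
}


def gateway_config(provider: str, helicone_api_key: str) -> dict:
    """Return keyword arguments for a provider SDK client routed through Helicone."""
    try:
        base_url = HELICONE_BASE_URLS[provider]
    except KeyError:
        raise ValueError(f"no Helicone base URL configured for {provider!r}") from None
    return {
        "base_url": base_url,
        "default_headers": {"Helicone-Auth": f"Bearer {helicone_api_key}"},
    }


# e.g. anthropic.Anthropic(api_key=..., **gateway_config("anthropic", helicone_key))
cfg = gateway_config("anthropic", "sk-helicone-example")
print(cfg["base_url"])
```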
How to Use Helicone: A Step-by-Step Workflow
Step 1: Route Your Traffic
Start by routing a single endpoint through Helicone. Choose a non-critical feature, such as a summary generator. Update your code to point to the Helicone base URL. Verify that requests appear in the dashboard. This confirms that the integration is working correctly.
Step 2: Explore the Dashboard
Spend 15 minutes exploring the dashboard. Look at the request list. Click on a single request to see the full details: the exact prompt sent, the exact response received, the token count, and the cost in dollars. Per-request detail like this is generally not available from the LLM provider's own usage dashboard.
Step 3: Add User and Session Tracking
Add the Helicone-User-Id header to your requests. Use the user ID from your own authentication system. Similarly, add a Helicone-Session-Id header to group requests into conversations. After adding these headers, go to the “Users” and “Sessions” tabs in the dashboard. You will see your data organized in a much more useful way.
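A minimal sketch of this step, assuming the OpenAI v1 Python SDK, whose create() methods accept per-request headers through extra_headers. The helper name tracking_headers and the sample user ID are hypothetical; the two header names come from the step above.

```python
import uuid


def tracking_headers(user_id: str, session_id: str) -> dict:
    """Per-request Helicone headers that attribute a call to a user and a conversation."""
    return {
        "Helicone-User-Id": user_id,
        "Helicone-Session-Id": session_id,
    }


# Generate one session ID per conversation and reuse it for every turn.
session_id = str(uuid.uuid4())
headers = tracking_headers("user-42", session_id)

# With the OpenAI v1 SDK, per-request headers go in extra_headers, e.g.:
# client.chat.completions.create(model="gpt-4", messages=..., extra_headers=headers)
print(headers["Helicone-User-Id"])
```

Keeping the session ID stable across turns is what lets the "Sessions" tab group a multi-turn conversation together.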
Step 4: Create Your First Alert
Go to the “Alerts” section. Click “Create Alert”. Choose a condition, such as “Error Rate > 10% for 5 minutes”. Set the notification method to email or Slack. This alert will now run in the background. If your application starts failing, you will know immediately.
Step 5: Use HQL for a Deep Dive
Go to the “Logs” section and open the HQL query editor. Run the following query to find the ten users who account for the most spend over the last 30 days:
```sql
SELECT user_id, SUM(cost), COUNT(*)
FROM request
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id
ORDER BY SUM(cost) DESC
LIMIT 10
```
This query reveals which users are costing you the most money. You might discover that a single user is making thousands of requests per day. This insight can drive decisions about rate limiting or pricing.
Step 6: Manage Prompts and Datasets
Save your most important prompts in the “Prompts” section. Create a dataset from historical requests. Use the Playground to test a new version of your prompt against that dataset. Compare the outputs side by side. This workflow ensures that your prompt changes actually improve quality without introducing regressions.
Tips for Getting the Most Out of Helicone
Start Small and Scale Gradually
Do not route all your traffic through Helicone on day one. Start with a single endpoint or a single user group. Monitor the performance impact. Helicone adds minimal latency (usually under 10ms), but it is always wise to verify this in your own environment. Once you are comfortable, route more traffic.
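One way to start with a single user group is a deterministic percentage rollout: hash each user ID into one of 100 buckets and route only the low buckets through Helicone. This is a hypothetical helper, not a Helicone feature; hashing (rather than random choice) keeps each user on the same path across requests.

```python
import hashlib

OPENAI_DIRECT = "https://api.openai.com/v1"
OPENAI_VIA_HELICONE = "https://oai.helicone.ai/v1"


def base_url_for(user_id: str, rollout_percent: int) -> str:
    """Deterministically route roughly `rollout_percent`% of users through Helicone."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # bucket in 0..99
    return OPENAI_VIA_HELICONE if bucket < rollout_percent else OPENAI_DIRECT


users = [f"user-{i}" for i in range(1000)]
via_helicone = sum(base_url_for(u, 10) == OPENAI_VIA_HELICONE for u in users)
print(via_helicone)  # roughly 100 of 1000 users
```

Raising rollout_percent only adds buckets, so users already routed through Helicone stay there as you scale up.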
Use Custom Properties for Rich Filtering
Helicone allows you to add custom properties to your requests via headers. For example, you can add Helicone-Property-AppVersion: 2.1.0 or Helicone-Property-Feature: Chatbot. These properties become filterable fields in the dashboard and in HQL queries. This makes it trivial to analyze specific features or versions of your application.
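A small helper can turn a dictionary of properties into these headers. The helper name is hypothetical, and the name check (ASCII letters, digits, hyphens) is a conservative choice to keep the result a valid HTTP header name, not a documented Helicone rule.

```python
def property_headers(properties: dict) -> dict:
    """Turn {"AppVersion": "2.1.0"} into {"Helicone-Property-AppVersion": "2.1.0"}."""
    headers = {}
    for name, value in properties.items():
        # The name becomes part of an HTTP header name, so restrict it.
        if not (name.isascii() and name.replace("-", "").isalnum()):
            raise ValueError(f"invalid property name: {name!r}")
        headers[f"Helicone-Property-{name}"] = str(value)
    return headers


headers = property_headers({"AppVersion": "2.1.0", "Feature": "Chatbot"})
print(headers)
# {'Helicone-Property-AppVersion': '2.1.0', 'Helicone-Property-Feature': 'Chatbot'}
```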
Leverage the Cost Tracking Feature
One of Helicone’s most valuable features is accurate cost tracking. LLM providers often do not provide real-time cost data. Helicone calculates cost based on the actual tokens used and the model’s known pricing. Use this data to create budgets and cost reports. Share these reports with your team to foster cost awareness.
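The arithmetic behind per-request cost is simple: input tokens times the input price plus output tokens times the output price. The sketch below uses a hypothetical model name and hypothetical prices per million tokens; look up your model's current rates rather than relying on these numbers.

```python
# Hypothetical prices in USD per 1M tokens.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost = prompt tokens x input price + completion tokens x output price."""
    p = PRICES_PER_MILLION[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000


cost = request_cost("example-model", prompt_tokens=1_200, completion_tokens=300)
print(f"${cost:.4f}")  # $0.0060
```

Worked out by hand: 1,200 × $2.50 / 1M = $0.0030 for input, plus 300 × $10.00 / 1M = $0.0030 for output, for a total of $0.0060. Output tokens are often priced higher than input tokens, so long responses can dominate cost even when prompts are short.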
Use Datasets for Regression Testing
Whenever you update a prompt, run it against your dataset before deploying to production. Helicone will show you the new outputs alongside the old ones. Look for changes in tone, length, or accuracy. This simple step can prevent embarrassing regressions where a “minor” prompt change suddenly makes your chatbot rude or unhelpful.
Set Up Alerts for Anomalies
Do not rely on checking the dashboard manually. Set up alerts for the most common failure modes: high error rate, high latency, and high cost. Also consider setting an alert for zero requests. If your application suddenly stops making LLM calls, it might indicate a bug or an outage in your own system.
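The zero-request check inverts the usual alert: it fires on the absence of traffic. A minimal sketch, where the function name and the 15-minute default threshold are hypothetical and should be tuned to your normal traffic pattern:

```python
from typing import Optional


def is_silent(last_request_ts: Optional[float], now: float,
              max_silence_s: float = 900.0) -> bool:
    """True if no LLM request has been seen for longer than `max_silence_s` seconds."""
    if last_request_ts is None:  # no request seen at all
        return True
    return now - last_request_ts > max_silence_s


print(is_silent(last_request_ts=1_000.0, now=1_600.0))  # False: 10 minutes of quiet
print(is_silent(last_request_ts=1_000.0, now=2_000.0))  # True: about 17 minutes of quiet
```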
Export Data for Custom Analysis
Helicone allows you to export your request data as CSV or JSON. Use this feature to feed data into your own analytics tools, such as a data warehouse or a BI platform. You can also use the API to pull data programmatically. This is useful for building custom dashboards or performing advanced statistical analysis.
Collaborate with Your Team
Share the dashboard with your team members. Helicone supports team accounts where multiple users can view the same project. Developers can see the impact of their changes. Product managers can track usage patterns. Operations teams can monitor alerts. The shared visibility reduces finger-pointing and accelerates debugging.
Keep Your Helicone API Key Secure
Never expose your Helicone API key in client-side code, such as a mobile app or a browser. If you need to use Helicone from a client, set up a backend proxy that adds the key. Alternatively, use Helicone’s client-side SDK which handles this securely. A leaked key could allow someone to send requests through your account, incurring costs.
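The backend-proxy pattern boils down to this: your server receives the client's request, strips any credentials the client tried to set, and adds the real keys before forwarding to Helicone. The function below is a hypothetical, framework-agnostic sketch of that header handling; you would call it from your own web handler. It also sets Helicone-User-Id from your own authentication rather than trusting the client.

```python
# Headers the client must never be allowed to set itself.
_SERVER_ONLY = {"authorization", "helicone-auth", "helicone-user-id"}


def outbound_headers(client_headers: dict, user_id: str,
                     openai_key: str, helicone_key: str) -> dict:
    """Build headers for the upstream call from your backend to Helicone."""
    # Drop client-supplied credentials (header names are case-insensitive).
    headers = {k: v for k, v in client_headers.items() if k.lower() not in _SERVER_ONLY}
    # Secrets are added server-side, so they never ship in client code.
    headers["Authorization"] = f"Bearer {openai_key}"
    headers["Helicone-Auth"] = f"Bearer {helicone_key}"
    # user_id should come from your own session/auth system.
    headers["Helicone-User-Id"] = user_id
    return headers


incoming = {"Content-Type": "application/json", "helicone-auth": "Bearer stolen"}
out = outbound_headers(incoming, "user-42", "sk-openai-example", "sk-helicone-example")
print(out["Helicone-Auth"])  # Bearer sk-helicone-example
```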
Monitor the Helicone Status Page
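Helicone itself is a service that must be running for your requests to be logged. Because your requests are proxied through it, an outage on Helicone's side can affect the requests themselves, not just the logging. Bookmark the Helicone status page. If you ever notice that requests are not appearing in the dashboard, or that LLM calls are failing, check the status page first. Helicone has a strong uptime record, but no service is perfect.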
Conclusion
Helicone is more than just a logging tool. It is a comprehensive platform for building, monitoring, and optimizing LLM-powered applications. By routing your requests through Helicone, you gain visibility that would otherwise require building custom infrastructure. The dashboard, HQL queries, prompt management, datasets, and alerts work together to give you full control over your AI pipeline.
Start with a simple integration, explore the dashboard, and gradually adopt the more advanced features like HQL and datasets. Your future self will thank you when a bug appears, and you can instantly see exactly what happened. Helicone transforms LLM development from a black box into a transparent, manageable process.
Visit https://helicone.ai/ to sign up and begin your journey toward better AI observability today.