packages/openai: New OpenAI integration #12494
Conversation
Update (28th Jan, 2025): I am still working on the dashboard, but the rest seems to be more or less done. Except for the dashboards, everything else can be reviewed.
@@ -46,6 +46,10 @@ processors:
      field: openai.audio_speeches.num_model_requests
      target_field: openai.base.num_model_requests
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.object
      target_field: openai.base.object
I am not sure if the word `base` is a great choice here. Could `summary` be a better option? I assume `base` represents the overall summary of the metric within a timeslice.
So, `base` holds all the common fields, not a summary. For example, `project_id` is common across all data_streams.
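A rough Python sketch (not the actual ingest pipeline code) of what these rename processors accomplish: fields shared by every data stream get moved from the stream-specific namespace into `openai.base`. The set of common fields below is illustrative, based only on the fields visible in this diff and comment.

```python
# Illustrative sketch of the pipeline's rename-to-base behavior.
# COMMON_FIELDS is an assumption; only these three appear in the discussion.
COMMON_FIELDS = ["num_model_requests", "object", "project_id"]

def rename_to_base(doc: dict, stream: str) -> dict:
    """Move common fields from openai.<stream>.* to openai.base.*,
    skipping absent fields (analogous to ignore_missing: true)."""
    source = doc.get("openai", {}).get(stream, {})
    base = doc.setdefault("openai", {}).setdefault("base", {})
    for field in COMMON_FIELDS:
        if field in source:
            base[field] = source.pop(field)
    return doc

doc = {"openai": {"audio_speeches": {"object": "bucket", "num_model_requests": 3}}}
rename_to_base(doc, "audio_speeches")
# doc["openai"]["base"] == {"num_model_requests": 3, "object": "bucket"}
```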
@agithomas Yes. Still working on refinements to the dashboard, based on the issue that Daniela created and also incorporating the comments everyone gave on the AWS Bedrock PR. Will be done soon!
Done!
Took care of all. For point 3, those are actually clickable links, which is why the underline is there. I didn't add an underline; it is just a clickable link that takes users to the relevant docs.
Yes, makes sense. Thanks!
LGTM!
@shmsr : I'll be reviewing the code today.
To maintain consistency with other LLM integrations, the aggregated values can be shown. Do you think it's better to name the topmost generic panel "Overall Invocation Frequency"?
interval: {{interval}}
{{#if enable_request_tracer}}
resource.tracer.filename: "../../logs/cel/openai-audio-speeches-http-request-trace-*.ndjson"
resource.tracer.maxbackups: 5
For my knowledge, what is this used for?
For debugging. Suppose there's some issue; if you enable this feature, you can see the raw requests made by the CEL execution and also the responses:
< request 1 >
< response 1 >
< request 2 >
< response 2 >
...
Helps in debugging.
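Since the tracer writes NDJSON (one JSON object per line), a small script can walk a trace file to inspect the request/response pairs. This is a hedged sketch: the filename pattern comes from the config above, but the field names inside each entry are not specified here, so the reader function stays schema-agnostic.

```python
# Sketch: iterate over entries in a CEL HTTP request-trace NDJSON file.
# Each line is one JSON object (a traced request or response); we make no
# assumption about the object's internal schema.
import json

def read_trace(path: str):
    """Yield each JSON entry from an NDJSON trace file, skipping blank lines."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage (path is illustrative):
# for entry in read_trace("logs/cel/openai-audio-speeches-http-request-trace-0.ndjson"):
#     print(entry)
```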
"start_time": [string(int(state.initial_start_time))],
"page": state.page != null ? [state.page] : [],
"bucket_width": ["{{ bucket_width }}"],
"group_by": ["project_id,user_id,api_key_id,model"]
Is it like saying: if you had these fields in the document, they would have become your dimension fields to avoid duplication?
Let's consider this example: https://platform.openai.com/docs/api-reference/usage/completions_object
See the response? Unless you group by `project_id,user_id,api_key_id,model`, the values for those fields are going to be null, as they are not grouped. For example, this is a real response when you don't use group_by:
{
"object": "page",
"data": [
{
"object": "bucket",
"start_time": 1738195200,
"end_time": 1738281600,
"results": [
{
"object": "organization.usage.completions.result",
"input_tokens": 22,
"output_tokens": 124,
"num_model_requests": 1,
"project_id": null,
"user_id": null,
"api_key_id": null,
"model": null,
"batch": null,
"input_cached_tokens": 0,
"input_audio_tokens": 0,
"output_audio_tokens": 0
}
]
},
{
"object": "bucket",
"start_time": 1738281600,
"end_time": 1738314727,
"results": []
}
],
"has_more": false,
"next_page": null
}
See `results`? They are null.
Yes, they could be dimension fields. But here we did not enable TSDB, as discussed earlier; historical data support is there, so we are not enabling it. Also, we chose logs-* here, as discussed earlier.
state.url + "?" + {
  "start_time": [string(int(state.initial_start_time))],
  "page": state.page != null ? [state.page] : [],
  "bucket_width": ["{{ bucket_width }}"],
Say a customer changes this aggregation bucket width to, say, 1m, while our dashboards use a 1-day timestamp. Not sure, but will that hinder the experience for the user? Maybe not, but thoughts?

So yes, it will be fine. See this screenshot for when the bucket width is set to `1h`. As the x-axis timestamp is per day, the viz still aggregates properly, so it looks the same as the `1d` visualization.

But yes, if someone sets it to `1m` and is collecting, say, 6 months of data, imagine the number of requests. Calculating per data stream:

- 6 months in minutes = 262800, so that is how many buckets we have to process.
- When querying with a bucket_width of `1m`, OpenAI by default returns 60 buckets per request. So, 262800/60 = 4380 requests.

So for 8 data_streams, that is 35040 requests in a short time, which will definitely trigger some rate limit. In fact, I have tested this as well: it takes a lot of time, as OpenAI blocks that many requests (i.e., 35k) in such a short window.

I will take up rate limiting in a follow-up enhancement, as it is not an urgent or immediate requirement when going with the default or even `1h`. But with `1m`, there's a huge amount of data to process, so rate limiting would make sense there. Then again, we also expect users to understand that pulling so much historical data at such granularity is definitely a lot.
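The back-of-the-envelope arithmetic above, written out as code (262800 minutes is the figure used in the comment, i.e. half of 365 × 24 × 60; the 60-buckets-per-request page size is as stated in the comment):

```python
# Request-count estimate for 6 months of 1m-granularity usage data.
minutes = 262800             # buckets to fetch per data stream at 1m width
buckets_per_request = 60     # buckets OpenAI returns per request by default
data_streams = 8

requests_per_stream = minutes // buckets_per_request   # 4380
total_requests = requests_per_stream * data_streams    # 35040
```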
But yes, giving users the flexibility to pull data at `1d` / `1h` / `1m` granularity makes sense, and that's why I have made it configurable. There might be users who want to build visualizations at `1h` / `1m` granularity.
I left a few comments, feel free to address them where appropriate.
Yes, true. But I put them there so that I have space to fill, because just 2 boxes didn't feel nice. Let me see if I can do something.
Yeah, makes sense. More clarity.
@ishleenk17 Can you take another look at the screenshot? I've addressed the comments.
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
@shmsr are you sure this PR is correct? It's showing 5000+ files changed. Perhaps it's missing a rebase?
@AndersonQ, yes, there were some changes across the Integrations repo.
Force-pushed 4416de7 to 76b768c
LGTM
## Requirements
You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it. |
LGTM
You need an OpenAI account to access the [Usage API](https://platform.openai.com/docs/api-reference/usage) with a valid [Admin key](https://platform.openai.com/settings/organization/admin-keys) for programmatic access to the Usage APIs.
@shmsr these two sections -- Requirements and Setup -- should remain separate
Force-pushed 76b768c to ac36185
💚 Build Succeeded
cc @shmsr
Package openai - 0.1.0 containing this change is available at https://epr.elastic.co/package/openai/0.1.0/
Proposed commit message
This is a new integration for OpenAI to collect usage metrics using OpenAI's newly launched usage API. It uses the admin key for programmatic access to usage API.
Checklist
- Added an entry to the `changelog.yml` file.

How to test this PR locally
Run tests using elastic-package, and verify manually against the OpenAI usage dashboard (https://platform.openai.com/usage).
Related issues
Screenshots