
Conversation


@shmsr shmsr commented Jan 28, 2025

Proposed commit message

This is a new integration for OpenAI that collects usage metrics using OpenAI's newly launched Usage API. It uses an admin key for programmatic access to the Usage API.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices.

How to test this PR locally

Run tests using elastic-package, and verify manually against the OpenAI usage dashboard (https://platform.openai.com/usage).

Related issues

Screenshots

[screenshot: usage]

@shmsr shmsr self-assigned this Jan 28, 2025
@shmsr shmsr added the Integration:openai OpenAI label Jan 28, 2025
@andrewkroh andrewkroh added the New Integration Issue or pull request for creating a new integration package. label Jan 28, 2025

shmsr commented Jan 28, 2025

Update: 28 Jan 2025

I am still working on the dashboard, but the rest is more or less done. Everything except the dashboards can be reviewed.

@shmsr shmsr requested a review from a team January 28, 2025 12:04
@shmsr shmsr marked this pull request as ready for review January 28, 2025 22:33
@@ -46,6 +46,10 @@ processors:
field: openai.audio_speeches.num_model_requests
target_field: openai.base.num_model_requests
ignore_missing: true
- rename:
field: openai.audio_speeches.object
target_field: openai.base.object

@agithomas agithomas Jan 29, 2025


I am not sure the word base is a great choice here. Could summary be a better option? I assume base represents the overall summary of the metric within a timeslice.

shmsr (Member, Author):

So, base holds all the common fields, not a summary.

For example, project_id is common across all data_streams.
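To illustrate the idea with a hypothetical sketch (the actual integration uses Elasticsearch ingest pipeline rename processors, as in the diff above, not this helper): each data stream's fields are moved onto the shared openai.base.* namespace so that common fields like project_id line up across streams.

```python
def rename_to_base(doc: dict, stream: str, fields: list[str]) -> dict:
    """Emulate the pipeline's rename processors: move
    openai.<stream>.<field> to openai.base.<field>."""
    src = doc.get("openai", {}).get(stream, {})
    base = doc.setdefault("openai", {}).setdefault("base", {})
    for field in fields:
        if field in src:  # mirrors ignore_missing: true
            base[field] = src.pop(field)
    return doc

# A document from the audio_speeches data stream.
doc = {"openai": {"audio_speeches": {"num_model_requests": 3, "object": "bucket"}}}
rename_to_base(doc, "audio_speeches", ["num_model_requests", "object"])
```

After the call, the stream-specific fields live under openai.base, so the same dashboard panels can query one field path across all eight data streams.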


agithomas commented Jan 29, 2025

I am referring to the screenshot attached to the issue. Adding a few points for your consideration.

  1. Kindly align the axis label text with the other gen-AI integration dashboards. It would be best, I think, to align with the color palette used as well.

  2. When we have a time-series dashboard, please see whether "over time" is needed in the panel title.

  3. Is the underline needed on the dashboard area splitter (markdown) for texts such as Image Models and Text to Speech Models? It would be good to be consistent with the other gen-AI integration dashboards.

  4. For the panels under the Image Models and Speech Models sections, kindly correct the y-axis labels.

  5. In production environments, token values can become large. So for the "single metrics" used at the top of the dashboard, kindly select the "Compact values" option, if not selected already.



shmsr commented Jan 29, 2025

@agithomas Yes, still working on dashboard refinements, based on the issue that Daniela created and the comments everyone gave on the AWS Bedrock PR. Will be done soon!


shmsr commented Jan 29, 2025

Done!


shmsr commented Jan 29, 2025


Took care of all of them. For point 3, those are actually clickable links, which is why the underline appears; I didn't add the underline myself. They just link to the relevant docs.

@shmsr shmsr requested review from daniela-elastic and a team January 29, 2025 10:13
@muthu-mps

  • The panel name Model invocation frequency - by model can be changed to Invocation frequency - by model. WDYT?

@shmsr

shmsr commented Jan 30, 2025

  • The panel name Model invocation frequency - by model can be changed to Invocation frequency - by model. WDYT?

Yes, makes sense. Thanks!


@muthu-mps muthu-mps left a comment


LGTM!

@ishleenk17

@shmsr : I'll be reviewing the code today.
A couple of starter comments from looking at the dashboard:

  1. Do customers benefit from knowing about API keys, project IDs, and object types in the dashboard? Is it worth putting this info there? Shouldn't its availability in the documents suffice?
  2. Are the first 4 panels (invocation frequency, token usage, ...) generic across models?
  3. The timestamp is per day; I hope that's a discussed value.
  4. We can remove the word "Dashboard" from the title of the dashboard and just have "OpenAI".

@shmsr

shmsr commented Jan 31, 2025

@shmsr : I'll be reviewing the code today. A couple of starter comments from looking at the dashboard:

  1. Do customers benefit from knowing about API keys, project IDs, and object types in the dashboard? Is it worth putting this info there? Shouldn't its availability in the documents suffice?
  2. Are the first 4 panels (invocation frequency, token usage, ...) generic across models?
  3. The timestamp is per day; I hope that's a discussed value.
  4. We can remove the word "Dashboard" from the title of the dashboard and just have "OpenAI".
  1. See the screenshot. OpenAI's usage dashboard does show top users, projects, and API keys (they have an id->name mapping, which is why they can show names; we only have IDs, which users can map themselves, or they can add a processor that adds the id->name mapping to the document and build a visualization from it). It also seems a useful metric to know who the top n users are.
[screenshot: OpenAI usage dashboard, 2025-01-31]
  2. Tokens are not a common concept across all models; image models, for example, have no notion of tokens. The panel is placed there to maintain consistency across Elastic's LLM integrations. Further down you see the proper segregation: token usage, image models, audio speeches, etc.

  3. Yes, the timestamp is per day because OpenAI itself defaults bucket_width to daily, i.e., the data is aggregated over a day. It is configurable (minute-wise and hour-wise too), but the older API also returned daily data, and OpenAI's own dashboard shows daily data, so I kept it that way.

  4. Sure, on it!

@ishleenk17

To maintain consistency with the other LLM integrations, the aggregated values can be shown for
Total Tokens and Total Invocations. We need not have the input/output division here in the top panel?
Also, we already have them in Token usage, where we show input and output.

Do you think it's better to name the topmost generic panel "Overall Invocation Frequency",
since that's the cumulated one? And this is a stacked graph, right?

interval: {{interval}}
{{#if enable_request_tracer}}
resource.tracer.filename: "../../logs/cel/openai-audio-speeches-http-request-trace-*.ndjson"
resource.tracer.maxbackups: 5

For my knowledge, what is this used for?


@shmsr shmsr Jan 31, 2025


For debugging. Suppose there's some issue: if you enable this feature, you can see the raw requests made by the CEL execution, along with the responses:

< request 1 >
< response 1 >
< request 2 >
< response 2 >
...

Helps in debugging.

"start_time": [string(int(state.initial_start_time))],
"page": state.page != null ? [state.page] : [],
"bucket_width": ["{{ bucket_width }}"],
"group_by": ["project_id,user_id,api_key_id,model"]

Is it like saying that if you had these fields in the document, they would have become your dimension fields to avoid duplication?

shmsr (Member, Author):

Let's consider this example: https://platform.openai.com/docs/api-reference/usage/completions_object

See the response?

Unless you group by project_id,user_id,api_key_id,model, the values for those fields are going to be null, as they are not grouped.

For example, this is a real response when group_by is not used:

{
  "object": "page",
  "data": [
    {
      "object": "bucket",
      "start_time": 1738195200,
      "end_time": 1738281600,
      "results": [
        {
          "object": "organization.usage.completions.result",
          "input_tokens": 22,
          "output_tokens": 124,
          "num_model_requests": 1,
          "project_id": null,
          "user_id": null,
          "api_key_id": null,
          "model": null,
          "batch": null,
          "input_cached_tokens": 0,
          "input_audio_tokens": 0,
          "output_audio_tokens": 0
        }
      ]
    },
    {
      "object": "bucket",
      "start_time": 1738281600,
      "end_time": 1738314727,
      "results": []
    }
  ],
  "has_more": false,
  "next_page": null
}

See the results? They are null.

Yes, they could be dimension fields, but we did not enable TSDB here, as discussed earlier. Historical data support is there, so we are not enabling it. Also, we chose logs-* here, as discussed earlier.

state.url + "?" + {
"start_time": [string(int(state.initial_start_time))],
"page": state.page != null ? [state.page] : [],
"bucket_width": ["{{ bucket_width }}"],

Say a customer changes this aggregation bucket width to, say, 1m, while our dashboards use a 1-day timestamp.
Not sure, but will that hinder the experience for the user?
Maybe not, but thoughts?


@shmsr shmsr Jan 31, 2025


[screenshot: dashboard with bucket_width set to 1h, 2025-01-31]

So yes, it will be fine. See the screenshot for when the bucket width is set to 1h: since the x-axis timestamp is per day, the viz still aggregates properly, so it looks the same as the 1d visualization.

But yes, if someone sets it to 1m and collects, say, 6 months of data, imagine the number of requests.

Calculate per data stream:

  • 6 months in minutes = 262800, which is how many buckets we have to process.
  • When querying with a bucket_width of 1m, OpenAI returns 60 buckets per request by default, so 262800 / 60 = 4380 requests.

So for 8 data streams that is 35040 requests in a short time, which will definitely trigger rate limits. I have tested this: it takes a long time because OpenAI blocks that many requests (about 35k) made in such a short window.

I will address rate limits in a follow-up enhancement, as it is not an urgent requirement with the default (or even 1h). With 1m there is a huge amount of data to process, so rate limiting would make sense there; but we also expect users to understand that pulling that much historical data at such granularity is a lot.
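The back-of-the-envelope arithmetic above can be checked in a few lines (the 60-buckets-per-page figure is taken from the comment itself):

```python
# Sanity-check the request-count estimate for 1m buckets over ~6 months.
minutes_per_day = 24 * 60
buckets = (365 * minutes_per_day) // 2   # half a year of 1-minute buckets
BUCKETS_PER_PAGE = 60                    # Usage API default page size, per the comment above

requests_per_stream = buckets // BUCKETS_PER_PAGE
total_requests = 8 * requests_per_stream  # 8 data streams

print(buckets, requests_per_stream, total_requests)  # 262800 4380 35040
```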


@shmsr shmsr Jan 31, 2025


But yes, giving users the flexibility to pull data at 1d/1h/1m granularity makes sense, and that's why I made it configurable. Some users may want to build visualizations at 1h or 1m granularity.


@alaudazzi alaudazzi left a comment


I left a few comments; feel free to address them where appropriate.

@shmsr

shmsr commented Jan 31, 2025


@ishleenk17

To maintain consistency with the other LLM integrations, the aggregated values can be shown for Total Tokens and Total Invocations. We need not have the input/output division here in the top panel? Also, we already have them in Token usage, where we show input and output.

Yes, true. But I put them there to fill the space; just 2 boxes didn't look nice. Let me see if I can do something.

Do you think it's better to name the topmost generic panel "Overall Invocation Frequency", since that's the cumulated one? And this is a stacked graph, right?

Yeah, makes sense. More clarity.

@shmsr

shmsr commented Jan 31, 2025

@ishleenk17 Can you again take a look at the screenshot? I've addressed the comments.

@ishleenk17

  1. We can have token usage by model; that's what we have done for the other LLMs. It would help users see which model consumes the most tokens.
[screenshot: token usage by model, 2025-01-31]
  2. Also, add the "i" (info) icons for Total tokens and Total invocations.

@andrewkroh andrewkroh added Integration:1password 1Password (Partner supported) Integration:abnormal_security Abnormal AI Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Feb 4, 2025
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@AndersonQ

@shmsr are you sure this PR is correct? It's showing 5000+ files changed. Perhaps it's missing a rebase?

@ishleenk17

ishleenk17 commented Feb 4, 2025

@shmsr are you sure this PR is correct? It's showing 5000+ files changed. Perhaps it's missing a rebase?

@AndersonQ, yes, there were some changes across the Integrations repo.
I have shared the details with you on Slack.

@shmsr shmsr force-pushed the openai-official-metrics branch from 4416de7 to 76b768c Compare February 4, 2025 11:41
@shmsr shmsr removed request for a team, AndersonQ and rdner February 4, 2025 11:44

@alaudazzi alaudazzi left a comment


LGTM


## Requirements

You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.


LGTM

You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.

You need an OpenAI account with a valid [Admin key](https://platform.openai.com/settings/organization/admin-keys) for programmatic access to the [Usage API](https://platform.openai.com/docs/api-reference/usage).


@shmsr these two sections -- Requirements and Setup -- should remain separate

@shmsr shmsr added Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] and removed Integration:1password 1Password (Partner supported) Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] Integration:abnormal_security Abnormal AI labels Feb 4, 2025
@shmsr shmsr force-pushed the openai-official-metrics branch from 76b768c to ac36185 Compare February 5, 2025 08:00
@elasticmachine

💚 Build Succeeded

History

  • 💚 Build #21599 succeeded 76b768cefcb16bb857e9e7c948460d2378d2b016
  • 💚 Build #21467 succeeded 4416de75ee0e7affd055822b26d57fbf9b9bf6ee
  • 💚 Build #21456 succeeded 8db956170af7b2d47a6c2f2956db96318aec450b
  • 💚 Build #21452 succeeded 631821be6569b970b9c2eb24e44523723be525b0
  • 💚 Build #21443 succeeded 639c6587d4ebc8a7070eb7be345250a730f25218
  • 💚 Build #21432 succeeded 86f1ced2cf3520a189dc9c0f738bb037e60a5cab

cc @shmsr


Quality Gate failed

Failed conditions
71.8% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@shmsr shmsr merged commit fd8c7f8 into elastic:main Feb 5, 2025
4 of 5 checks passed
@elastic-vault-github-plugin-prod

Package openai - 0.1.0 containing this change is available at https://epr.elastic.co/package/openai/0.1.0/

Labels
Integration:openai OpenAI New Integration Issue or pull request for creating a new integration package. Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations]

9 participants