[Atlassian JIRA and Atlassian Confluence] Update Cursor Logic to Remove Duplicate Events #13665
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
@mohitjha-elastic, while adding a fingerprint also works for this issue, can you check whether it can be solved at the cursor level itself, without propagating the duplicate event into the ingest pipeline and then ignoring it?
Agree with @kcreddy here. The first option would be to try to optimise the cursor date handling. If that's not possible, we should do fingerprints.
@kcreddy @ShourieG Also, storing the current timestamp in the cursor does not seem feasible, as it might miss data in case of an API failure. Since the filter fetches data on or after the start time, the only potential duplicates after each interval would be the events (there should be only a few) whose timestamp exactly matches the cursor timestamp. In that case, this approach seems quite reasonable. Let me know what you think.
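To make the boundary case concrete, here is a minimal Python simulation of an inclusive "on or after" filter. It is only a sketch: `RECORDS`, `fetch`, and `poll_twice` are hypothetical names, the timestamps are invented, and nothing here is the integration's actual httpjson configuration.

```python
from datetime import datetime, timezone

# Hypothetical audit records as (id, created) pairs; the values are made up.
RECORDS = [
    ("a", datetime(2025, 1, 1, 10, 0, 0, 100_000, tzinfo=timezone.utc)),
    ("b", datetime(2025, 1, 1, 10, 0, 0, 200_000, tzinfo=timezone.utc)),
    ("c", datetime(2025, 1, 1, 10, 0, 1, 0, tzinfo=timezone.utc)),
]


def fetch(start):
    """Stand-in for the API: return records created on or after `start`."""
    return [r for r in RECORDS if r[1] >= start]


def poll_twice(first_start):
    """Poll once, store the newest timestamp as the cursor, then poll again."""
    first = fetch(first_start)
    cursor = max(created for _, created in first)
    # The filter is inclusive, so the record at the cursor comes back again.
    second = fetch(cursor)
    return [rid for rid, _ in first], [rid for rid, _ in second]
```

With these sample records, the second poll returns only the record whose timestamp equals the cursor, which is the small set of duplicates described in the comment above.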
🚀 Benchmarks report
Package
| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| audit | 2762.43 | 2159.83 | -602.6 (-21.81%) | 💔 |
Package atlassian_jira 👍(0) 💚(0) 💔(1)
| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| audit | 3194.89 | 2257.34 | -937.55 (-29.35%) | 💔 |
To see the full report comment with /test benchmark fullreport
@mohitjha-elastic The problem with fingerprint is that the fields we believe are unique may not always be so, and we often need to add more fields because users complain of missing data. It might not be bad to just fingerprint on
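The risk described here can be illustrated with a small Python sketch. It is not the Elasticsearch fingerprint processor's implementation; the `fingerprint` helper, the field names, and the two sample events are assumptions made up for illustration.

```python
import hashlib
import json


def fingerprint(event: dict, fields: list) -> str:
    """Hash the chosen fields of an event into a stable identifier."""
    subset = {f: event.get(f) for f in fields}
    payload = json.dumps(subset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical events: distinct, but identical in `created` and `summary`.
e1 = {"created": "2025-01-01T10:00:00.100Z", "summary": "User added", "object": "alice"}
e2 = {"created": "2025-01-01T10:00:00.100Z", "summary": "User added", "object": "bob"}

# Too few fields: both events hash to the same value, so one would be dropped.
narrow = ["created", "summary"]
# Adding a distinguishing field separates them again.
wide = ["created", "summary", "object"]
```

If the fingerprint is used as the document ID, `e1` and `e2` collide under `narrow` and one of them is silently lost, which is the "missing data" case; under `wide` they stay distinct.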
1. Remove fingerprint. 2. Update cursor logic to add 1ms to it to remove duplicate events.
d79b1a2 to eb4fc49
Thanks, @kcreddy, for clarifying the use case of adding fingerprints.
Suggest the following for the commit message; line breaks and no markdown in git commit messages, otherwise unaltered.
atlassian_jira and atlassian_confluence: update cursor logic to remove duplicate events.

After reviewing the Atlassian Jira[1] and Atlassian Confluence[2] API
documentation, it has been noticed that the APIs return data on or
after the specified start date. Currently, the date from the response
body is saved into the cursor. Because of this, the request made in
the next interval includes data we've already fetched, leading to
duplicate events being published to Elasticsearch. Hence, the cursor
logic has been updated to add 1ms to the saved date, so that the next
request fetches only the data after it.

This change has been tested on the data available in the test folder as
well as using the mock server.

[1] https://developer.atlassian.com/cloud/jira/platform/rest/v3/api-group-audit-records/#api-rest-api-3-auditing-record-get
[2] https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-audit/#api-wiki-rest-api-audit-get
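The cursor arithmetic in this commit message can be sketched in Python as follows. This is an illustration only: `next_cursor` is a hypothetical helper, the ISO 8601 timestamp format is an assumption rather than the APIs' documented format, and the real change lives in the httpjson templates of the two packages.

```python
from datetime import datetime, timedelta


def next_cursor(last_created: str) -> str:
    """Return the start time for the next request: last seen timestamp + 1 ms."""
    ts = datetime.fromisoformat(last_created)
    bumped = ts + timedelta(milliseconds=1)
    # Keep millisecond precision so the value round-trips cleanly.
    return bumped.isoformat(timespec="milliseconds")
```

For example, `next_cursor("2025-01-01T10:00:01.000+00:00")` yields `"2025-01-01T10:00:01.001+00:00"`, so an inclusive "on or after" filter no longer matches the last record already fetched. The sketch assumes event timestamps are no finer than one millisecond; with finer resolution, events inside the skipped millisecond could be missed.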
packages/atlassian_confluence/data_stream/audit/agent/stream/httpjson.yml.hbs
packages/atlassian_jira/data_stream/audit/agent/stream/httpjson.yml.hbs
LGTM. Please resolve @efd6's comments before merging.
Add final newline in httpjson files.
04c7cb6 to 6955823
💚 Build Succeeded
Thanks
Package atlassian_confluence - 1.29.1 containing this change is available at https://epr.elastic.co/package/atlassian_confluence/1.29.1/
Package atlassian_jira - 1.30.2 containing this change is available at https://epr.elastic.co/package/atlassian_jira/1.30.2/
Proposed Commit Message
Checklist
changelog.yml file.
How to test this PR locally
To test the atlassian_jira integration
Clone the integrations repo.
Install elastic-package locally.
Start the Elastic Stack using elastic-package.
Move to the integrations/packages/atlassian_jira directory.
Run the following command to run the tests.
elastic-package test -v
To test the atlassian_confluence integration
Clone the integrations repo.
Install elastic-package locally.
Start the Elastic Stack using elastic-package.
Move to the integrations/packages/atlassian_confluence directory.
Run the following command to run the tests.
elastic-package test -v
Related issues