Skip to content

Conversation

@jsoriano
Copy link
Member

@jsoriano jsoriano commented Jul 12, 2024

Enable failure store and check if it has any documents after executing system tests.

Failure store is used if the version of the stack is 8.15.0 or newer. In lower versions there are unexpected failures on tests.

Fixes #1832.

@jsoriano jsoriano self-assigned this Jul 12, 2024
@jsoriano jsoriano force-pushed the detect-errors-in-failure-store branch from 2b6982d to dfbb951 Compare July 12, 2024 18:36
@jsoriano
Copy link
Member Author

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

@jsoriano
Copy link
Member Author

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

@jsoriano
Copy link
Member Author

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

@jsoriano jsoriano force-pushed the detect-errors-in-failure-store branch from 1f55369 to 42f9b52 Compare July 18, 2024 11:09
@jsoriano
Copy link
Member Author

test integrations

@jsoriano jsoriano marked this pull request as ready for review July 18, 2024 12:16
@jsoriano jsoriano requested review from a team and flash1293 July 18, 2024 12:16
@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

@jsoriano
Copy link
Member Author

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

@jsoriano
Copy link
Member Author

Failures in CI seem legit.

@jsoriano jsoriano marked this pull request as draft July 18, 2024 14:43
@jsoriano
Copy link
Member Author

Ok, it looks like even when the failure store is available in older versions, it has some issues with time series before 8.15. Reopening for review after enabling failure store starting on 8.15, and pinning the stack version for the related test package.

@jsoriano jsoriano marked this pull request as ready for review July 18, 2024 15:30
@jsoriano
Copy link
Member Author

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

Copy link
Contributor

@jlind23 jlind23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but waiting for @flash1293 review and test anyway

@flash1293
Copy link
Contributor

The code looks good to me, however I'm not sure how to test locally anymore - I used to

elastic-package test system -v --setup --config-file ./my/config.yml, then elastic-package test system -v --no-provision.

The first command still works and finds the docs in the failure store, however I don't know how to see the test failing locally (as the setup routine is not running the validation bits). Seems like this code was recently restructured.

Not a blocker to merge though, I read through the code and it seems to work exactly like the ignored docs stuff.

@jsoriano
Copy link
Member Author

@flash1293 do you mean that it doesn't fail when running with --no-provision? It should fail the same as when running directly with elastic-package test system -v.

@flash1293
Copy link
Contributor

It never finds any hits when running with --no-provision and doesn't terminate

@jsoriano
Copy link
Member Author

It never finds any hits when running with --no-provision and doesn't terminate

Have you tried with elastic-package test system -v? Could you share the package you are testing with?

@flash1293
Copy link
Contributor

@jsoriano When I'm running just with test system it doesn't get to that point because it fails during teardown:

Error: failed to tear down system runner: there was an apply error: can't remove the package: could not remove package; API status code = 400; response body = {"statusCode":400,"error":"Bad Request","message":"Unable to remove package with existing package policy(s) in use by agent(s)"}

I'm testing with the test/packages/false_positives/failure_store package from this PR.

I don't think it's something wrong with this PR, seems like a general thing.

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I think it was just some stale state in my local stack. Tested again with a fresh stack and it works as expected 👍

One nit: Maybe we should show more details of the failure than just the error message (e.g. which processor was involved and so on) to make it easier to debug what's happening.

@jsoriano
Copy link
Member Author

Maybe we should show more details of the failure than just the error message (e.g. which processor was involved and so on) to make it easier to debug what's happening.

Thanks @flash1293 for the feedback. I have pushed another commit including more information in the error messages, they include the error type, the processor and the pipelines that caused the issue:

--- Test results for package: failure_store - START --- FAILURE DETAILS: failure_store/test fail: [0] illegal_argument_exception: unable to convert [eight] to integer (processor: convert, pipelines: logs-failure_store.test-0.0.1) [1] illegal_argument_exception: unable to convert [five] to integer (processor: convert, pipelines: logs-failure_store.test-0.0.1) [2] illegal_argument_exception: unable to convert [seven] to integer (processor: convert, pipelines: logs-failure_store.test-0.0.1) [3] illegal_argument_exception: unable to convert [six] to integer (processor: convert, pipelines: logs-failure_store.test-0.0.1) 

Additionally, the stack trace is logged at the debug level when collecting results.

Wdyt? Is it too much?

@jsoriano jsoriano requested a review from flash1293 July 23, 2024 15:59
@flash1293
Copy link
Contributor

Love it! Thanks for adding

@jsoriano
Copy link
Member Author

test integrations

@elasticmachine
Copy link
Collaborator

Created or updated PR in integrations repository to test this version. Check elastic/integrations#10492

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @jsoriano

@jsoriano jsoriano merged commit 1fbc6f7 into elastic:main Jul 24, 2024
@jsoriano jsoriano deleted the detect-errors-in-failure-store branch July 24, 2024 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants