Skip to content

Conversation

@SinghVikram97
Copy link
Contributor

@SinghVikram97 SinghVikram97 commented Sep 22, 2025

What type of PR is this?
Cleanup and Fix

Which issue does this PR fix:
Fix HandleReconcileError function to enable exponential backoff.
Incase of transient errors from lattice apis keep static 10 seconds delay.

Exponential backoff is problematic here because for operations like target group deletion where target draining takes time, this results in unnecessarily long delays in reconciliation.

What does this PR do / Why do we need it:
According to the docs

 If the returned error is non-nil: the Result is ignored and the request will requeued using exponential backoff unless error is a terminal error If the error is nil: the returned Result has a non-zero result RequeueAfter, the request will be requeued after the specified duration. 
  • Cleanup deprecated Requeue
  • Remove returning Result if the returned error is non-nil
  • In reconcile.go we check if err is of type RetryError but since RetryError is just an alias for err type, all error types satisfy that if condition which leads to 20 second requeue delay for all errors. Refactored NewRetryError function to return RequeueNeededAfter error type instead.
  • Remove RetryErr which created err with Lattice_Retry error message with NewRetryError function call which returns a custom error type to fix type comparisons.

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:
#462

Testing done on this change:

make presubmit and make e2e-test run successfully.

Automation added to e2e:

N/A

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No

Does this PR introduce any user-facing change?:

No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rlymbur
Copy link
Contributor

rlymbur commented Sep 23, 2025

Can you add unit tests and an integration test to validate that the backoff is working as expected?

reason: reason,
}
// Retry errors caused by Lattice APIs which need high requeueAfter seconds
func NewRetryError() *RequeueNeededAfter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the non-retriable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are retriable. For example when target group status is create in progress. But the default exponential backoff starts with 5ms which might be too aggressive for lattice apis. So that's why kept it as 10 seconds.

@rlymbur rlymbur enabled auto-merge September 24, 2025 23:42
@rlymbur rlymbur added this pull request to the merge queue Sep 24, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 24, 2025
@rlymbur rlymbur added this pull request to the merge queue Sep 25, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 25, 2025
* exponential backoff for reconciler requests * fix typo in comments --------- Co-authored-by: vbedi <vbedi@amazon.com>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 25, 2025
@rlymbur rlymbur added this pull request to the merge queue Sep 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 25, 2025
@rlymbur rlymbur added this pull request to the merge queue Sep 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 25, 2025
@SinghVikram97 SinghVikram97 added this pull request to the merge queue Sep 26, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Sep 26, 2025
@SinghVikram97 SinghVikram97 added this pull request to the merge queue Sep 26, 2025
@SinghVikram97 SinghVikram97 removed this pull request from the merge queue due to a manual request Sep 26, 2025
@SinghVikram97 SinghVikram97 added this pull request to the merge queue Sep 26, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 26, 2025
@SinghVikram97 SinghVikram97 added this pull request to the merge queue Sep 28, 2025
Merged via the queue into main with commit e5fb7ab Sep 28, 2025
5 checks passed
@rlymbur rlymbur deleted the vbedi_issue_462 branch October 16, 2025 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants