Recreate kind cluster on every evg host reboot #32

Open
wants to merge 3 commits into base: master

lucian-tosa
Contributor

Summary

Evergreen hosts reboot daily or every weekend, depending on your configuration. After a reboot, inter-cluster connectivity may be broken, and we have not yet identified the cause. Recreating the clusters is the only solution we have so far.
This PR adds a systemd service that runs on every boot and recreates all clusters (including the kind-kind cluster used for single-cluster tests).
Our tunnel command now also fetches the kubeconfig from the host; otherwise the tunnel would not open to the (new) ports of the recreated clusters.
The only action required from you is to run evg_host.sh configure, which updates the scripts and sets up the systemd service.
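
As a rough illustration of what "recreating all clusters" could look like, here is a minimal sketch. It assumes recreation is simply `kind delete cluster` followed by `kind create cluster` for each cluster name. The helper name `recreate_clusters` is hypothetical and is not the function added by this PR; the actual scripts likely pass additional configuration that is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: recreate each named kind cluster by deleting and creating it.
# recreate_clusters is a hypothetical helper, not the code from this PR.
recreate_clusters() {
  local name
  for name in "$@"; do
    # Remove the existing cluster with this name.
    kind delete cluster --name "${name}"
    # Create a fresh cluster under the same name; stop on the first failure.
    kind create cluster --name "${name}" || return 1
  done
}
```

For example, `recreate_clusters kind` would rebuild the default cluster, whose kubectl context is kind-kind. A boot-time systemd service could call such a function once at startup.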

Proof of Work

Tested locally, but it would be nice if someone could check out this branch and try it for themselves.

Checklist

  • Have you linked a Jira ticket and/or is the ticket in the title?
  • Have you checked whether your Jira ticket requires DOCSP changes?
  • Have you checked for release_note changes?

Reminder (Please remove this when merging)

  • Please try to Approve or Reject Changes on the PR promptly; keep PRs in review for as short a time as possible
  • Our Short Guide for PRs: Link
  • Remember the following Communication Standards - use comment prefixes for clarity:
    • blocking: Must be addressed before approval.
    • follow-up: Can be addressed in a later PR or ticket.
    • q: Clarifying question.
    • nit: Non-blocking suggestions.
    • note: Side-note, non-actionable. Example: Praise
    • --> no prefix is considered a question
lucian-tosa requested a review from a team as a code owner on April 23, 2025, 14:33
Member

@SimonBaeumer SimonBaeumer left a comment

I am split on this change: it risks losing data or setups. I understand the desire to fix this, but I don't like it when my environment gets cleaned up automatically while it is still in use. For example, if I had run patches against deployments or single-cluster deployments, my environment would be reset.

Can you add an env var to opt in to this re-creation? Then each engineer can decide whether they want re-creation or not.
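
A minimal sketch of what such an opt-in gate might look like, assuming the boot script is a shell script. The variable name `RECREATE_KIND_CLUSTERS_ON_BOOT` and the helper `recreation_enabled` are hypothetical, chosen only for illustration. Recreation is off unless the variable is explicitly set to `true`.

```shell
#!/usr/bin/env bash
# Sketch only: RECREATE_KIND_CLUSTERS_ON_BOOT is a hypothetical variable name.
# Returns success only when the engineer has explicitly opted in.
recreation_enabled() {
  [ "${RECREATE_KIND_CLUSTERS_ON_BOOT:-false}" = "true" ]
}

# Intended use inside the boot script:
#   if recreation_enabled; then
#     ...recreate the clusters...
#   else
#     echo "cluster recreation not enabled, skipping"
#   fi
```

Note that a systemd service does not inherit variables from a login shell, so the value would need to reach the unit some other way, for example through an `Environment=` or `EnvironmentFile=` setting.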

Labels
None yet
3 participants