Closed
Labels
bug: Something isn't working
Description
Search before asking
- I searched the issues and found no similar issues.
KubeRay Component
kubectl-plugin
What happened + What you expected to happen
- The kubectl-ray plugin currently sets spec.jobId with a separate GET -> UPDATE after creating the RayJob. This can race with the KubeRay operator's status updates and cause errors such as:
  Error: Error occurred when trying to add job ID to RayJob: Operation cannot be fulfilled on rayjobs.ray.io "27e93fb9": the object has been modified; please apply your changes to the latest version and try again
- The job ID can be generated and set before create / apply to avoid the conflict. Additionally, ray job submit is started asynchronously in a goroutine, which can lead to an "exec: not started" error; starting the process synchronously resolves this.
Reproduction script
Add a small sleep between GET and UPDATE:
    options.RayJob, err = k8sClients.RayClient().RayV1().RayJobs(options.namespace).Get(ctx, options.RayJob.GetName(), v1.GetOptions{})
    if err != nil {
        return fmt.Errorf("Failed to get latest version of Ray job: %w", err)
    }
    options.RayJob.Spec.JobId = rayJobID
    time.Sleep(30 * time.Second)
    _, err = k8sClients.RayClient().RayV1().RayJobs(options.namespace).Update(ctx, options.RayJob, v1.UpdateOptions{FieldManager: util.FieldManager})
    if err != nil {
        return fmt.Errorf("Error occurred when trying to add job ID to RayJob: %w", err)
    }

and then patch the RayJob in that sleep window:

    kubectl patch rayjob -n ray-job 6d3ef4a3 \
      --type='merge' \
      -p '{"metadata":{"annotations":{"repro/tick":"'$(date +%s%N)'"}}}' \
      >/dev/null

Anything else
Happens regularly.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!