CLOUDP-324440 Save rs member ids #206
Conversation
Pull Request Overview
This PR fixes an issue with migrating a MongoDBMultiCluster resource to a new project by saving and restoring replica set member IDs in an annotation. It also updates test cases and refactors various functions to use the new GetReplicaSets naming convention.
- Added a new constant for the replica set member IDs annotation.
- Updated reconciliation logic to save and use member IDs during project changes.
- Refactored tests and Om client methods for consistent API usage.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
pkg/util/constants.go | Introduces a new constant for saving RS member IDs. |
docker/mongodb-kubernetes-tests/tests/multicluster/multi_cluster_scale_up_cluster.py | Updates test scenarios to account for scaled member counts and project changes. |
controllers/operator/mongodbmultireplicaset_controller.go | Adds logic to retrieve and save RS member IDs from OM and updates the signature for saving the last achieved spec. |
controllers/operator/mongodbmultireplicaset_controller_test.go | Adds new test cases to verify annotation persistence and proper member ID usage. |
controllers/om/* | Adds new methods (e.g., MemberIds and GetReplicaSetMemberIds) and updates references to use the new GetReplicaSets API. |
RELEASE_NOTES.md | Documents the bug fix related to project migration for MongoDBMultiCluster resources. |
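To set the stage before the review threads: the fix persists the member ids achieved in OM as a JSON-encoded annotation on the resource, so they can be restored when the automation config is empty. Below is a minimal, self-contained sketch of that serialization; the annotation key used here is a placeholder (the real constant lives in pkg/util/constants.go and its exact value is not shown in this excerpt), and the member names are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Placeholder key; the real constant (util.LastAchievedRsMemberIds) is defined in pkg/util/constants.go.
const lastAchievedRsMemberIds = "example.annotation/lastAchievedRsMemberIds"

func main() {
	// Member ids as reported by OM: replica set name -> process name -> member id.
	rsMemberIds := map[string]map[string]int{
		"my-replica-set": {"my-replica-set-0-0": 0, "my-replica-set-0-1": 1, "my-replica-set-1-0": 2},
	}

	rsMemberIdsBytes, err := json.Marshal(rsMemberIds)
	if err != nil {
		panic(err)
	}

	// The serialized map is what ends up in the resource annotation and is read back
	// when the deployment is moved to a project with an empty automation config.
	annotations := map[string]string{lastAchievedRsMemberIds: string(rsMemberIdsBytes)}
	fmt.Println(annotations[lastAchievedRsMemberIds])
}
```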
Comments suppressed due to low confidence (1)
controllers/operator/mongodbmultireplicaset_controller.go:815
- [nitpick] The inner variable 'processIds' is a nested map; consider renaming it (for example to 'allMemberIds') to more clearly reflect its structure and improve code readability.
func getReplicaSetProcessIdsFromAnnotation(mrs mdbmultiv1.MongoDBMultiCluster) map[string]int {
if err != nil {
	return err
}
if string(rsMemberIdsBytes) != "null" {
[nitpick] Consider checking if rsMemberIds is nil or if the marshalled byte slice is empty instead of comparing to the string "null", as this would improve clarity and robustness if the serialization behavior changes.
- if string(rsMemberIdsBytes) != "null" {
+ if rsMemberIds != nil && len(rsMemberIdsBytes) > 0 {
Good suggestion.
blocking: oh - yeah definitely please don't check for a null string. I think in other code places we have a better way of doing that
done
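For context on the thread above: encoding/json serializes a nil map as the JSON literal null, which is what the original "null" string comparison was implicitly relying on. A minimal, self-contained sketch of that behaviour and of the suggested guard (variable names borrowed from the excerpt):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var rsMemberIds map[string]map[string]int // nil map

	rsMemberIdsBytes, err := json.Marshal(rsMemberIds)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(rsMemberIdsBytes)) // prints: null

	// The suggested guard avoids relying on that serialization detail.
	if rsMemberIds != nil && len(rsMemberIdsBytes) > 0 {
		fmt.Println("would save the annotation")
	} else {
		fmt.Println("nothing to save")
	}
}
```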
@@ -699,6 +713,10 @@ func (r *ReconcileMongoDbMultiReplicaSet) updateOmDeploymentRs(ctx context.Conte
	}

	processIds := getReplicaSetProcessIdsFromReplicaSets(mrs.Name, existingDeployment)
	// If there is no replicaset configuration saved in OM, it might be a new project, so we check the ids saved in annotation
	if len(processIds) == 0 {
		processIds = getReplicaSetProcessIdsFromAnnotation(mrs)
Why not use conn.GetReplicaSetMemberIds() here instead? Couldn't that save us the whole annotation work?
as discussed offline - it's the migration case
@lucian-tosa Can you please explain the "migration case" that requires use of annotation?
blocking: let's add a comment here on why this can happen (the migration case) for clarity
So the migration case is detailed in the PR description. The steps that lead to a broken deployment are:
- scale the deployment in a way that results in non-sequential ids
- move the deployment to a new project.
I also added some other considerations and edge cases to the PR description.
@anandsyncs let me know if I should explain something further.
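For reference, the conn.GetReplicaSetMemberIds() mentioned earlier in this thread can be sketched roughly as follows. The Deployment and ReplicaSet shapes below are simplified stand-ins for the operator's real types; only the names MemberIds and GetReplicaSetMemberIds come from the PR itself, so the real signatures in controllers/om may differ.

```go
package om

// Simplified stand-ins for the operator's deployment structures; only the parts
// needed for this sketch are declared here.
type ReplicaSet interface {
	Name() string
	MemberIds() map[string]int // process name -> member id
}

type Deployment interface {
	GetReplicaSets() []ReplicaSet
}

// GetReplicaSetMemberIds (sketch) returns replica set name -> process name -> member id,
// which is the nested map the controller saves into the annotation.
func GetReplicaSetMemberIds(d Deployment) map[string]map[string]int {
	finalProcessIds := make(map[string]map[string]int)
	for _, rs := range d.GetReplicaSets() {
		finalProcessIds[rs.Name()] = rs.MemberIds()
	}
	return finalProcessIds
}
```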
Generally - awesome fix! 👏
But I've left one blocking comment regarding not adding another method to the OM's connection interface. Consider the rest optional.
processIds := make(map[string]map[string]int)
if processIdsStr, ok := mrs.Annotations[util.LastAchievedRsMemberIds]; ok {
	if err := json.Unmarshal([]byte(processIdsStr), &processIds); err != nil {
		return map[string]int{}
Did you consider making this a fatal error? If there is some garbage in the annotation, isn't it too risky to ignore its content?
if not - we should at least log this very clearly
The only case where a broken annotation would be fatal is if the deployment is being migrated to a new OM project and the annotation was manually changed at the same time.
I can't think of any other reason why the annotation could contain garbage, so I wouldn't make this fatal. If there is garbage in the annotation, the operator will rewrite it anyway.
Wdyt?
If there is garbage in the annotation, that means someone or something changed it, and in that case we should probably error the reconciliation: the annotation is there, but we can't get correct information from it.
It's like checking spec.version and finding garbage in it - should we ignore it and continue reconciling?
makes sense, done
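A rough sketch of what the resolved helper could look like with the unmarshal error propagated instead of swallowed; the annotation key and the nested map shape come from the excerpts above, while the exact final signature is assumed here:

```go
// Sketch only: getReplicaSetProcessIdsFromAnnotation fails the reconciliation when the
// annotation contains unparsable data instead of silently returning an empty map.
func getReplicaSetProcessIdsFromAnnotation(mrs mdbmultiv1.MongoDBMultiCluster) (map[string]int, error) {
	processIdsStr, ok := mrs.Annotations[util.LastAchievedRsMemberIds]
	if !ok {
		return map[string]int{}, nil
	}

	processIds := make(map[string]map[string]int)
	if err := json.Unmarshal([]byte(processIdsStr), &processIds); err != nil {
		// Garbage here means someone changed the annotation and we no longer know the
		// achieved member ids, so surface the error rather than guessing.
		return nil, fmt.Errorf("failed to parse the %s annotation: %w", util.LastAchievedRsMemberIds, err)
	}
	return processIds[mrs.Name], nil
}
```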
@@ -97,7 +131,6 @@ def test_ops_manager_has_been_updated_correctly_before_scaling():

@pytest.mark.e2e_multi_cluster_scale_up_cluster
def test_scale_mongodb_multi(mongodb_multi: MongoDBMulti, member_cluster_clients: List[MultiClusterClient]):
    mongodb_multi.load()
To not need load() you must change the mongodb_multi fixture scope to "function", so it is loaded every time it's injected into a test function.
done
@@ -139,3 +159,59 @@ def test_ops_manager_has_been_updated_correctly_after_scaling():
def test_replica_set_is_reachable(mongodb_multi: MongoDBMulti, ca_path: str):
    tester = mongodb_multi.tester()
    tester.assert_connectivity(opts=[with_tls(use_tls=True, ca_path=ca_path)])


# From here on, the tests are for verifying that we can change the project of the MongoDBMulti resource even with
nit: I find this comment a bit unnecessary, but you don't need to remove it.
It would be better to wrap such a set of test functions into a class and name it properly, e.g. TestNonSequentialMemberIdsInReplicaSet or something like that.
done
assert oldRsMembers == newRsMembers


def sts_are_ready(mdb: MongoDBMulti, member_cluster_clients: List[MultiClusterClient], members: List[int]):
this looks to me like something we must already have somewhere in our tests. Or is it doing something different than just waiting for sts to be ready? If yes, then pls name it so we can see in the name why it's different.
We don't already have it. But I will add it as part of the MongoDBMulti class so we can reuse it at least for that resource.
done
controllers/om/mockedomclient.go (outdated)
@@ -141,6 +141,22 @@ func (oc *MockedOmConnection) ConfigureProject(project *Project) {
	oc.context.OrgID = project.OrgID
}

func (oc *MockedOmConnection) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
Unrelated to this PR: given that we are just copy-pasting the whole functionality, we can simply wrap the omclient with this mock client and only override what is necessary.
I will remove this method entirely following Lukasz's suggestion
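The wrapping idea from the thread above boils down to Go struct embedding: the mock embeds the real client so every method is inherited by default, and only the calls a test needs to control are overridden. A self-contained sketch of the pattern follows; the names GetReplicaSetMemberIds and ConfigureProject come from the PR, but their signatures here are simplified and everything else is made up for the illustration.

```go
package main

import "fmt"

// RealClient stands in for the concrete OM connection whose behaviour we mostly want to keep.
type RealClient struct{}

func (RealClient) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
	return map[string]map[string]int{"rs0": {"rs0-0": 0}}, nil
}

func (RealClient) ConfigureProject(name string) string { return "configured " + name }

// MockClient embeds the real client, so every method is inherited by default and only
// the behaviour under test is overridden.
type MockClient struct {
	RealClient
	memberIds map[string]map[string]int
}

// GetReplicaSetMemberIds is the single override; ConfigureProject still comes from RealClient.
func (m MockClient) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
	return m.memberIds, nil
}

func main() {
	mock := MockClient{memberIds: map[string]map[string]int{"rs0": {"rs0-0": 5}}}
	ids, _ := mock.GetReplicaSetMemberIds()
	fmt.Println(ids, mock.ConfigureProject("proj"))
}
```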
awesome and simple code! Great tests and straight-forward. 2 blocking things and we are good to go
LGTM!
Summary
This PR fixes the behaviour of changing the project of a MongoDBMultiCluster resource. Multi-cluster resources have a different method of computing the member ids of each process in a replica set (the replicasets field in the automation config). Because of that, it is possible to end up with non-sequential ids after scaling operations. For example, suppose we start with members 0-0, 0-1 and 1-0 holding ids 0, 1 and 2. If we now scale the first cluster up, the new member 0-2 gets the next free id, 3.

If we now change the project, the operator will find an empty automation config and will recalculate the ids of the replica set members. But now, member 0-2 gets id 2 instead of 3, and member 1-0 gets id 3 instead of 2. This will instruct the agents to change the members of the replica set, but the change will be rejected, leaving the deployment stuck; the operator can't retrieve the previous member ids.

In this PR we now save the achieved member ids in an annotation, and whenever the automation config in OM is empty, we try to read the annotation in case the deployment is being migrated to a new project.
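To make the mismatch concrete, here is a small self-contained sketch of the recalculation described above, using the same cluster-index member names:

```go
package main

import "fmt"

func main() {
	// Member ids achieved before the project change: the first cluster was scaled up last,
	// so its new member 0-2 received the next free id, 3.
	saved := map[string]int{"0-0": 0, "0-1": 1, "1-0": 2, "0-2": 3}

	// After the project change the automation config is empty, so ids are recalculated
	// sequentially in cluster order.
	recalculated := map[string]int{}
	for i, member := range []string{"0-0", "0-1", "0-2", "1-0"} {
		recalculated[member] = i
	}

	// 0-2 now gets 2 (was 3) and 1-0 gets 3 (was 2); the agents reject changing existing
	// member ids, which is what leaves the deployment stuck.
	for _, member := range []string{"0-2", "1-0"} {
		fmt.Printf("%s: saved=%d recalculated=%d\n", member, saved[member], recalculated[member])
	}
}
```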
Other considerations
If a MongoDBMultiCluster resource is backed up with its annotations and then re-applied to a new project, the annotation will be used to compute the member ids. This is not an issue; there are no restrictions on these ids (apart from being unique integers). After the initial deployment, any scaling operations will simply compute new sequential ids.

Proof of Work
Added unit tests to check that annotations are created and updated successfully, and that the annotation is used when building the replica set.
Also added an E2E test to ensure the resource will become ready.