Separate operator workload nodegroup #577
Merged
Commits: 29 (this view shows changes from 16 commits)
- c74baf2 NodeGroup spot instances (vishalbollu)
- f4fd69c Update cluster-autoscaler.yaml (deliahu)
- abfe18f Update autoscaler to version 1.16 (vishalbollu)
- a6a096b Merge branch 'spot-instances' into separate-operator-workload-nodegroup (vishalbollu)
- fdc8201 Calculate allocatable resources more accurately (vishalbollu)
- 6f12aa9 Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
- b0e0fa6 Separate nodegroups (vishalbollu)
- 12ee06f Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
- 3268382 Add desired instances (vishalbollu)
- 8d4ea32 Minor cleanup (vishalbollu)
- e952392 Remove debug statements (vishalbollu)
- 607c545 Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
- 351e68b Remove more debugging helpers (vishalbollu)
- 58e4933 Reset go.mod (vishalbollu)
- c56ca3e Remove more echo statements (vishalbollu)
- cdf862e Remove unnecessary boto3 dependency (vishalbollu)
- 1f18d52 Address some PR comments and fix linting (vishalbollu)
- f90f921 Remove InternalClusterConfig (deliahu)
- 2703944 Address more PR comments (vishalbollu)
- bd24c1c Separate internal cluster config (deliahu)
- a8c16f4 Change cortex internal cluster path for dev to be in the dev directory (vishalbollu)
- fad20f4 Update config.md docs (vishalbollu)
- 96005c1 Change config map key name (vishalbollu)
- 5848fbe Remove outdated comment and minor refactor (vishalbollu)
- d37914c Fix formatting (deliahu)
- acf0058 Update api_workload.go (deliahu)
- 19fe4ff Update memory_capacity.go (deliahu)
- 3f9a62f Update metrics-server.yaml (deliahu)
- 1c40e7e Merge branch 'master' into separate-operator-workload-nodegroup (vishalbollu)
@@ -0,0 +1,79 @@

    # Copyright 2019 Cortex Labs, Inc.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig

    metadata:
      name: $CORTEX_CLUSTER_NAME
      region: $CORTEX_REGION
      version: "1.14"

    nodeGroups:
      - name: ng-cortex-operator
        instanceType: t3.medium
        minSize: 2
        maxSize: 2
        desiredCapacity: 2
        ami: auto
        iam:
          withAddonPolicies:
            autoScaler: true
        kubeletExtraConfig:
          kubeReserved:
            cpu: 150m
            memory: 300Mi
            ephemeral-storage: 1Gi
          kubeReservedCgroup: /kube-reserved
          systemReserved:
            cpu: 150m
            memory: 300Mi
            ephemeral-storage: 1Gi
          evictionHard:
            memory.available: 200Mi
            nodefs.available: 5%

      - name: ng-cortex-worker
        instanceType: $CORTEX_INSTANCE_TYPE
        minSize: $CORTEX_MIN_INSTANCES
        maxSize: $CORTEX_MAX_INSTANCES
        desiredCapacity: $CORTEX_DESIRED_INSTANCES
        ami: auto
        iam:
          withAddonPolicies:
            autoScaler: true
        tags:
          k8s.io/cluster-autoscaler/enabled: 'true'
          k8s.io/cluster-autoscaler/node-template/label/nvidia.com/gpu: 'true'
          k8s.io/cluster-autoscaler/node-template/taint/dedicated: nvidia.com/gpu=true
        labels:
          lifecycle: Ec2Spot
          workload: "true"
          nvidia.com/gpu: 'true'
        taints:
          nvidia.com/gpu: "true:NoSchedule"
          workload: "true:NoSchedule"
        kubeletExtraConfig:
          kubeReserved:
            cpu: 150m
            memory: 300Mi
            ephemeral-storage: 1Gi
          kubeReservedCgroup: /kube-reserved
          systemReserved:
            cpu: 150m
            memory: 300Mi
            ephemeral-storage: 1Gi
          evictionHard:
            memory.available: 200Mi
            nodefs.available: 5%
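
The `kubeReserved`, `systemReserved`, and `evictionHard` values in this config determine how much of each node the scheduler can hand to pods. As a rough illustration (not code from this PR), the sketch below applies the usual Kubernetes rule that allocatable equals capacity minus kube-reserved, system-reserved, and the hard eviction threshold. The helper names are hypothetical, and the nominal t3.medium capacity (2 vCPU, 4 GiB) is an assumption; a real node typically reports slightly less memory than the nominal figure.

```python
import re

# MiB per unit for the binary memory suffixes that appear in the config.
_MEM_UNITS_MIB = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def cpu_to_millicores(value):
    """Convert a CPU quantity such as '150m' or '2' to millicores."""
    value = str(value)
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def mem_to_mib(value):
    """Convert a memory quantity such as '300Mi' or '4Gi' to MiB."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi)", value)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {value}")
    return int(float(match.group(1)) * _MEM_UNITS_MIB[match.group(2)])


def allocatable(capacity_cpu, capacity_mem, kube_reserved, system_reserved, eviction_hard):
    """Hypothetical helper: capacity minus reservations and eviction threshold."""
    cpu = (
        cpu_to_millicores(capacity_cpu)
        - cpu_to_millicores(kube_reserved["cpu"])
        - cpu_to_millicores(system_reserved["cpu"])
    )
    mem = (
        mem_to_mib(capacity_mem)
        - mem_to_mib(kube_reserved["memory"])
        - mem_to_mib(system_reserved["memory"])
        - mem_to_mib(eviction_hard["memory.available"])
    )
    return {"cpu_millicores": cpu, "mem_mib": mem}


# Reservation values copied from the nodegroup config; capacity is nominal.
reserved = {"cpu": "150m", "memory": "300Mi"}
result = allocatable("2", "4Gi", reserved, reserved, {"memory.available": "200Mi"})
print(result)
```

Under these assumptions an operator node would leave about 1700m of CPU and 3296Mi of memory for pods, before accounting for daemonsets and the gap between nominal and reported memory.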
@@ -0,0 +1,79 @@

    import requests
    import sys
    import re
    import os
    import pathlib
    import json
    import yaml

    PRICING_ENDPOINT_TEMPLATE = (
        "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/{}/index.json"
    )


    def download_metadata(cluster_config):
        response = requests.get(PRICING_ENDPOINT_TEMPLATE.format(cluster_config["region"]))
        offers = response.json()

        instance_mapping = {}

        for product_id, product in offers["products"].items():
            if product.get("attributes") is None:
                continue
            if product["attributes"].get("servicecode") != "AmazonEC2":
                continue
            if product["attributes"].get("tenancy") != "Shared":
                continue
            if product["attributes"].get("operatingSystem") != "Linux":
                continue
            if product["attributes"].get("capacitystatus") != "Used":
                continue
            if product["attributes"].get("operation") != "RunInstances":
                continue
            price_dimensions = list(offers["terms"]["OnDemand"][product["sku"]].values())[0][
                "priceDimensions"
            ]

            price = list(price_dimensions.values())[0]["pricePerUnit"]["USD"]

            instance_type = product["attributes"]["instanceType"]
            metadata = {
                "sku": product["sku"],
                "instance_type": instance_type,
                "cpu": int(product["attributes"]["vcpu"]),
                "mem": int(
                    float(re.sub("[^0-9\\.]", "", product["attributes"]["memory"].split(" ")[0])) * 1024
                ),
                "price": float(price),
            }
            if product["attributes"].get("gpu") is not None:
                metadata["gpu"] = product["attributes"]["gpu"]
            instance_mapping[instance_type] = metadata

        return instance_mapping


    def get_metadata(cluster_config):
        return download_metadata(cluster_config)


    def set_ec2_metadata(cluster_config_path):
        with open(cluster_config_path, "r") as cluster_config_file:
            cluster_config = yaml.safe_load(cluster_config_file)
        instance_mapping = get_metadata(cluster_config)
        instance_type = instance_mapping[cluster_config["instance_type"]]

        cluster_config["instance_mem"] = str(instance_type["mem"]) + "Mi"
        cluster_config["instance_cpu"] = str(instance_type["cpu"])
        cluster_config["instance_gpu"] = int(instance_type.get("gpu", 0))

        with open(cluster_config_path, "w") as cluster_config_file:
            yaml.dump(cluster_config, cluster_config_file, default_flow_style=False)


    def main():
        set_ec2_metadata(sys.argv[1])


    if __name__ == "__main__":
        main()
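
To make the filtering and memory parsing in `download_metadata` easier to follow without calling the pricing endpoint, here is a minimal offline sketch that walks a hand-made document of the same shape. The sample SKUs, instance type names, and price are invented for illustration and are not real AWS data; `REQUIRED_ATTRIBUTES`, `parse_memory_mib`, and `build_instance_mapping` are hypothetical names, not part of the PR.

```python
import re

# Attribute values a product must have to be kept (same checks as the script).
REQUIRED_ATTRIBUTES = {
    "servicecode": "AmazonEC2",
    "tenancy": "Shared",
    "operatingSystem": "Linux",
    "capacitystatus": "Used",
    "operation": "RunInstances",
}

# Invented sample in the shape the script reads; not real pricing data.
SAMPLE_OFFERS = {
    "products": {
        "SKU1": {
            "sku": "SKU1",
            "attributes": dict(
                REQUIRED_ATTRIBUTES, instanceType="example.large", vcpu="2", memory="8 GiB"
            ),
        },
        "SKU2": {  # skipped by the filter: operating system is not Linux
            "sku": "SKU2",
            "attributes": dict(
                REQUIRED_ATTRIBUTES,
                operatingSystem="Windows",
                instanceType="example.xlarge",
                vcpu="4",
                memory="16 GiB",
            ),
        },
    },
    "terms": {
        "OnDemand": {
            "SKU1": {
                "SKU1.TERM": {
                    "priceDimensions": {
                        "SKU1.TERM.DIM": {"pricePerUnit": {"USD": "0.1000000000"}}
                    }
                }
            }
        }
    },
}


def parse_memory_mib(memory):
    """Turn a memory string such as '3,904 GiB' into an integer number of MiB."""
    number = re.sub(r"[^0-9.]", "", memory.split(" ")[0])
    return int(float(number) * 1024)


def build_instance_mapping(offers):
    """Keep matching products and attach the first on-demand USD price."""
    mapping = {}
    for product in offers["products"].values():
        attrs = product.get("attributes") or {}
        if any(attrs.get(key) != want for key, want in REQUIRED_ATTRIBUTES.items()):
            continue
        term = next(iter(offers["terms"]["OnDemand"][product["sku"]].values()))
        dimension = next(iter(term["priceDimensions"].values()))
        mapping[attrs["instanceType"]] = {
            "cpu": int(attrs["vcpu"]),
            "mem": parse_memory_mib(attrs["memory"]),
            "price": float(dimension["pricePerUnit"]["USD"]),
        }
    return mapping


print(build_instance_mapping(SAMPLE_OFFERS))
```

The comma stripping in `parse_memory_mib` matters for large instances whose memory is reported with a thousands separator, which is why the original regex removes every character other than digits and the decimal point.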