
Commit

Merge branch 'master' into YUNIKORN-1351
chenyulin0719 committed Jan 26, 2024
2 parents f48e089 + 27ce47c commit 7beb9aa
Showing 127 changed files with 4,643 additions and 5,658 deletions.
17 changes: 16 additions & 1 deletion .github/workflows/pre-commit.yml
@@ -39,7 +39,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        k8s: [v1.28.0, v1.27.3, v1.26.6, v1.25.11, v1.24.15]
+        k8s: [v1.29.0, v1.28.0, v1.27.3, v1.26.6, v1.25.11, v1.24.15]
         plugin: ['', '--plugin']
     steps:
       - name: Checkout source code
@@ -55,7 +55,22 @@ jobs:
           echo "vm.nr_hugepages = 1024" | sudo tee -a /etc/sysctl.conf
           sudo sysctl -p
           sudo sysctl -a | grep vm.nr_hugepages
+      - name: Cache and Restore e2e required tools
+        id: cache
+        uses: actions/cache@v3
+        with:
+          path: |
+            tools
+          key: ${{ runner.os }}-e2e-${{ hashFiles('Makefile') }}
+          restore-keys: |
+            ${{ runner.os }}-e2e-
       - run: ./scripts/run-e2e-tests.sh -a "test" -n "yk8s" -v "kindest/node:${KIND_NODE_IMAGE}" ${KIND_EXTRA_ARGS}
         env:
           KIND_NODE_IMAGE: ${{ matrix.k8s }}
           KIND_EXTRA_ARGS: ${{ matrix.plugin }}
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        if: ${{ failure() }}
+        with:
+          name: ${{ github.job }} stdout (${{ matrix.k8s }}${{ matrix.plugin == '--plugin' && format(', {0}', matrix.plugin) || matrix.plugin }})
+          path: build/e2e
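The artifact name uses the `cond && a || b` idiom of GitHub Actions expressions so that only the `--plugin` matrix leg gets a `, --plugin` suffix. Names must differ per matrix cell because actions/upload-artifact@v4 does not allow uploading to the same artifact name more than once in a run. A minimal shell sketch of how the name resolves, assuming a hypothetical job id `e2e-tests` (the real `github.job` value is not visible in this diff):

```shell
#!/usr/bin/env bash
# Sketch of how the "Upload artifacts" name expression resolves:
#   ${{ github.job }} stdout (${{ matrix.k8s }}${{ matrix.plugin == '--plugin' && format(', {0}', matrix.plugin) || matrix.plugin }})
# "e2e-tests" is a hypothetical job id; the real github.job value is not shown here.
artifact_name() {
  local job="$1" k8s="$2" plugin="$3" suffix
  if [ "$plugin" = "--plugin" ]; then
    suffix=", $plugin"   # format(', {0}', matrix.plugin)
  else
    suffix="$plugin"     # fallback branch: matrix.plugin itself ('' in this matrix)
  fi
  printf '%s stdout (%s%s)\n' "$job" "$k8s" "$suffix"
}

# Print the names for the two cells of one Kubernetes version.
for plugin in '' '--plugin'; do
  artifact_name e2e-tests v1.29.0 "$plugin"
done
```

Both legs would already get distinct names from `matrix.plugin` alone; the conditional only makes the plugin leg read `(v1.29.0, --plugin)` instead of `(v1.29.0--plugin)`.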
6 changes: 4 additions & 2 deletions Makefile
@@ -130,10 +130,12 @@ HELM_ARCHIVE=helm-$(HELM_VERSION)-$(OS)-$(EXEC_ARCH).tar.gz
 HELM_ARCHIVE_BASE=$(OS)-$(EXEC_ARCH)
 
 # spark
-export SPARK_VERSION=3.3.1
+export SPARK_VERSION=3.3.3
+# sometimes the image is not available for $SPARK_VERSION; the minor version must match
+export SPARK_PYTHON_VERSION=3.3.1
 export SPARK_HOME=$(BASE_DIR)$(TOOLS_DIR)/spark
 export SPARK_SUBMIT_CMD=$(SPARK_HOME)/bin/spark-submit
-export SPARK_PYTHON_IMAGE=docker.io/apache/spark-py:v$(SPARK_VERSION)
+export SPARK_PYTHON_IMAGE=docker.io/apache/spark-py:v$(SPARK_PYTHON_VERSION)
 
 FLAG_PREFIX=github.com/apache/yunikorn-k8shim/pkg/conf
 
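The new Makefile comment says the `spark-py` image may not be published for every `SPARK_VERSION`, so the image is pinned separately and only the minor version has to match. A sketch of that constraint as a shell check, assuming "minor version must match" means equal major.minor components; `spark_minor_matches` is a hypothetical helper, not something the Makefile defines:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the Makefile): succeeds when two Spark
# versions share the same major.minor, e.g. 3.3.3 and 3.3.1.
spark_minor_matches() {
  # ${v%.*} removes the shortest ".<suffix>" from the end: 3.3.3 -> 3.3
  [ "${1%.*}" = "${2%.*}" ]
}

# Values from this commit's Makefile
SPARK_VERSION=3.3.3
SPARK_PYTHON_VERSION=3.3.1
SPARK_PYTHON_IMAGE="docker.io/apache/spark-py:v${SPARK_PYTHON_VERSION}"

if spark_minor_matches "$SPARK_VERSION" "$SPARK_PYTHON_VERSION"; then
  echo "ok: $SPARK_PYTHON_IMAGE shares major.minor with Spark $SPARK_VERSION"
else
  echo "mismatch: Spark $SPARK_VERSION vs image $SPARK_PYTHON_IMAGE" >&2
fi
```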
10 changes: 9 additions & 1 deletion deployments/examples/README.md
@@ -102,7 +102,7 @@ The namespace example uses a placement rule and special queue configuration. The
 * run the sleep pod in the production namespace which creates a new `production` queue using the local [sleeppod_prod.yaml](namespace/sleeppod_prod.yaml): `kubectl create -f namespaces.yaml`.
 The pod spec does not specify a queue, just a namespace, but the application will be run in the newly created `root.production` queue. This queue does not exist in the queue configuration.
 
-### placements
+## placements
 The placements' rules are described in [Yunikorn website](https://yunikorn.apache.org/docs/user_guide/placement_rules).
 App placement rules in Yunikorn contain `Provided Rule`, `User Name Rule`, `Fixed Rule`, `Tag Rule`.
 Every placement example includes an example yaml file and a config yaml file.
@@ -113,3 +113,11 @@ Before deploying the pods, the configuration field in yunikorn-release/helm/yuni
 * [User Name Rule](./placements/username)
 * [Fixed Rule](./placements/fixed)
 * [Tag Rule](./placements/tag)
+
+## preemption
+This example demonstrates how to set up priority queues and initiate jobs to trigger preemption. Follow the steps in [README.md](./preemption/README.md) to understand how preemption works between queues.
+
+More documents for preemption:
+* [App & Queue Priorities](https://yunikorn.apache.org/docs/user_guide/priorities)
+* [Preemption](https://yunikorn.apache.org/docs/next/design/preemption/)
+* [Use Case: Preemption & Priority scheduling with fencing](https://yunikorn.apache.org/docs/user_guide/use_cases#preemption--priority-scheduling-with-fencing)
79 changes: 79 additions & 0 deletions deployments/examples/preemption/README.md
@@ -0,0 +1,79 @@
<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
-->

# A simple example to demo preemption

## Description
This example includes YAML files to:
1. Create priority queues with a preemption fence. ([yunikorn-configs.yaml](./yunikorn-configs.yaml))
2. Create an application with a PriorityClass. ([normal-priority-job-with-9999-priority-class.yaml](./normal-priority-job-with-9999-priority-class.yaml))

By following the steps below, you can observe the following preemption behaviors:
1. A task cannot preempt tasks outside its preemption fence.
2. Preemption aims to ensure that a queue's resource usage reaches at least its guaranteed amount of resources.
3. Preemption can never leave a queue lower than its guaranteed capacity.
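Behaviors 2 and 3 can be read as two conditions that must both hold before a victim is preempted. The sketch below is a simplified, single-resource model (CPU in millicores) written for illustration only: it ignores the preemption fence and multi-resource accounting, `may_preempt` is a hypothetical function rather than YuniKorn code, and the guarantees are made-up numbers, not values from `yunikorn-configs.yaml`.

```shell
#!/usr/bin/env bash
# Simplified, single-resource model (CPU millicores) of the guarantee rules.
# An illustration only, not YuniKorn's implementation.
# may_preempt <ask_used> <ask_guaranteed> <victim_used> <victim_guaranteed> <victim_request>
may_preempt() {
  local ask_used=$1 ask_guar=$2 victim_used=$3 victim_guar=$4 req=$5
  # Behavior 2: only a queue still below its guarantee may trigger preemption.
  [ "$ask_used" -lt "$ask_guar" ] || return 1
  # Behavior 3: the victim's queue must not drop below its own guarantee.
  [ $((victim_used - req)) -ge "$victim_guar" ]
}

# Hypothetical numbers: 100m pods, both queues guaranteed 300m,
# and the victim queue starts with 5 running pods (500m).
ask_used=0 victim_used=500 preempted=0
while may_preempt "$ask_used" 300 "$victim_used" 300 100; do
  ask_used=$((ask_used + 100))
  victim_used=$((victim_used - 100))
  preempted=$((preempted + 1))
done
echo "preempted=$preempted ask_used=${ask_used}m victim_used=${victim_used}m"
```

With these numbers the loop stops after two victims: the asking queue is still below its guarantee, but a third victim would push the other queue under its own 300m guarantee.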

## Run the example workloads
```shell script
# Update YuniKorn config
kubectl apply -f yunikorn-configs.yaml

# Create low-priority jobs
kubectl apply -f low-priority-job-outside-fence.yaml
kubectl apply -f low-priority-job.yaml
kubectl get pods | grep Running
# There should be 10 pods in the Running state.
# Expected: 5 outside fence, 5 low-priority.

# Create normal-priority job
kubectl apply -f normal-priority-job.yaml
kubectl get pods | grep Running
# Preemption is triggered after a specified delay, with the aim of ensuring that each queue's resource usage reaches at least the guaranteed amount of resources. A task cannot preempt tasks outside its preemption fence.
# Wait 1~2 minutes for preemption to occur.
# Expected: 5 outside fence, 2 low-priority, 3 normal-priority.

# Remove low-priority job
kubectl delete -f low-priority-job.yaml
kubectl get pods | grep Running
# Expected: 5 outside fence, 5 normal-priority.

# Create high-priority job
kubectl apply -f high-priority-job.yaml
kubectl get pods | grep Running
# Preemption can never leave a queue lower than its guaranteed capacity.
# Wait 1~2 minutes for preemption to occur.
# Expected: 5 outside fence, 3 normal-priority, 2 high-priority.

# Remove normal-priority job
kubectl delete -f normal-priority-job.yaml
kubectl get pods | grep Running
# Expected: 5 outside fence, 5 high-priority.

# Create normal-priority job with PriorityClass
kubectl apply -f normal-priority-job-with-9999-priority-class.yaml
kubectl get pods | grep Running
# After applying priority.offset in the queue, the job has the highest priority.
# Wait 1~2 minutes for preemption to occur.
# Expected: 5 outside fence, 3 high-priority, 2 normal-priority-job-with-9999-priority-class.

# Cleanup
kubectl delete -f normal-priority-job-with-9999-priority-class.yaml
kubectl delete -f high-priority-job.yaml
kubectl delete -f low-priority-job-outside-fence.yaml
```
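The `kubectl get pods | grep Running` checks above still require counting the pods of each job by eye. A sketch that tallies Running pods by the `applicationId` label that every Job in this example sets, assuming input from `kubectl get pods -L applicationId --no-headers` (`-L` appends the label value as the last column); `count_running` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally Running pods per applicationId label.
# Expects `kubectl get pods -L applicationId --no-headers` output, where
# column 3 is STATUS and the last column is the applicationId value.
count_running() {
  awk '$3 == "Running" { n[$NF]++ } END { for (app in n) print app, n[app] }' |
    LC_ALL=C sort
}

# Offline demo with sample lines in the same column layout:
printf '%s\n' \
  'low-priority-job-outside-fence-abcde 1/1 Running 0 2m low-priority-job-outside-fence' \
  'low-priority-job-fghij 1/1 Running 0 2m low-priority-job' \
  'normal-priority-job-klmno 0/1 Pending 0 1m normal-priority-job' |
  count_running

# Against a live cluster:
#   kubectl get pods -L applicationId --no-headers | count_running
```

Pods without the label would be counted under their AGE column, so this only suits labeled pods such as the Jobs in this example.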

39 changes: 39 additions & 0 deletions deployments/examples/preemption/high-priority-job.yaml
@@ -0,0 +1,39 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: batch/v1
kind: Job
metadata:
  name: high-priority-job
spec:
  completions: 10
  parallelism: 10
  template:
    metadata:
      labels:
        applicationId: high-priority-job
        queue: root.sandbox.tenants.tenant-high
    spec:
      schedulerName: yunikorn
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.7
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
      restartPolicy: Never
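With all pods running at once, a Job's aggregate request is `parallelism` times the per-pod request. Every `pause` pod in this example requests 100m CPU and 100Mi memory, so the totals work out as below. `job_request` is a hypothetical helper for the arithmetic; whether these totals fit depends on the queue limits in `yunikorn-configs.yaml`, which this page does not show.

```shell
#!/usr/bin/env bash
# Hypothetical helper: job_request <parallelism> <cpu_millicores_per_pod> <memory_mi_per_pod>
# Prints the aggregate request when all pods of the Job run at the same time.
job_request() {
  echo "cpu=$(( $1 * $2 ))m memory=$(( $1 * $3 ))Mi"
}

job_request 10 100 100   # high-priority-job, normal-priority-job: cpu=1000m memory=1000Mi
job_request 5 100 100    # low-priority-job, low-priority-job-outside-fence: cpu=500m memory=500Mi
```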
@@ -0,0 +1,39 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: batch/v1
kind: Job
metadata:
  name: low-priority-job-outside-fence
spec:
  completions: 5
  parallelism: 5
  template:
    metadata:
      labels:
        applicationId: low-priority-job-outside-fence
        queue: root.sandbox.system.system-low
    spec:
      schedulerName: yunikorn
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.7
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
      restartPolicy: Never
39 changes: 39 additions & 0 deletions deployments/examples/preemption/low-priority-job.yaml
@@ -0,0 +1,39 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: batch/v1
kind: Job
metadata:
  name: low-priority-job
spec:
  completions: 5
  parallelism: 5
  template:
    metadata:
      labels:
        applicationId: low-priority-job
        queue: root.sandbox.tenants.tenant-low
    spec:
      schedulerName: yunikorn
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.7
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
      restartPolicy: Never
@@ -0,0 +1,50 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: add-9999-priority
  annotations:
    yunikorn.apache.org/allow-preemption: "true"
value: 9999
globalDefault: false

---
apiVersion: batch/v1
kind: Job
metadata:
  name: normal-priority-job-with-9999-priority-class
spec:
  completions: 10
  parallelism: 10
  template:
    metadata:
      labels:
        applicationId: normal-priority-job-with-9999-priority-class
        queue: root.sandbox.tenants.tenant-normal
    spec:
      schedulerName: yunikorn
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.7
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
      restartPolicy: Never
      priorityClassName: add-9999-priority
39 changes: 39 additions & 0 deletions deployments/examples/preemption/normal-priority-job.yaml
@@ -0,0 +1,39 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: batch/v1
kind: Job
metadata:
  name: normal-priority-job
spec:
  completions: 10
  parallelism: 10
  template:
    metadata:
      labels:
        applicationId: normal-priority-job
        queue: root.sandbox.tenants.tenant-normal
    spec:
      schedulerName: yunikorn
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.7
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
      restartPolicy: Never
