Pod Scheduling Based on Runtime Tiered Locality
In Pod Scheduling Optimization, we introduced how to schedule application Pods to nodes with cached data.
However, in some cases the nodes with cached data cannot accommodate the application Pod. In that case, scheduling the Pod to a node close to the cached data, for example a node in the same zone, yields better read and write performance than scheduling it to a different zone.
Fluid supports configuring tiered locality information for a K8s cluster. The configuration is stored in the ConfigMap named 'webhook-plugins' in the Fluid system namespace and can be defined in the values.yaml file of Fluid's Helm Chart.
The following is a specific example. It assumes that the K8s cluster has locality labels for zones and regions, and achieves the following goals:
- When the application Pod is not configured with required dataset scheduling, prefer to schedule the Pod to the nodes with cached data. If the Pod cannot be scheduled to those nodes, prefer nodes in the same zone; if it cannot be scheduled to same-zone nodes either, prefer nodes in the same region;
- When the application Pod is configured with required dataset scheduling, require the Pod to be scheduled in the same zone as the nodes with cached data, rather than on those nodes themselves.
0. Prerequisites
The version of Kubernetes you are using needs to support admissionregistration.k8s.io/v1 (Kubernetes version > 1.16). Enabling the required admission controllers is done by passing a flag to the Kubernetes API server. Make sure that your cluster is properly configured:
```
--enable-admission-plugins=MutatingAdmissionWebhook
```
Note that if your cluster has already enabled other admission controllers, you only need to append the MutatingAdmissionWebhook parameter to the existing list.
1. Configure Tiered Locality in Fluid
- Configure before installing Fluid
Define the tiered locality configuration in the Helm Chart's values.yaml as below.
```yaml
pluginConfig:
  - name: NodeAffinityWithCache
    args: |
      preferred:
        # fluid built-in name (disabled by default), used to schedule pods to nodes with an existing fuse pod
        # - name: fluid.io/fuse
        #   weight: 100
        # fluid built-in name (enabled by default), used to schedule pods to nodes with cached data
        - name: fluid.io/node
          weight: 100
        # the runtime worker's zone label name (enabled by default), can be changed according to the k8s environment
        - name: topology.kubernetes.io/zone
          weight: 50
        # the runtime worker's region label name (enabled by default), can be changed according to the k8s environment
        - name: topology.kubernetes.io/region
          weight: 10
      required:
        # If a Pod is configured with required affinity, schedule it only to nodes matching these labels.
        # The default value is 'fluid.io/node'. Multiple names are combined with an AND relation.
        - topology.kubernetes.io/zone
```
Install Fluid following the document Installation. After installation, a ConfigMap named webhook-plugins storing the above configuration will exist in the Fluid namespace (default fluid-system).
- Modify the tiered locality configuration in an existing Fluid cluster
Modify the tiered locality configuration (see the content in point 1) in the ConfigMap named 'webhook-plugins' in the Fluid namespace (default fluid-system). The new configuration only takes effect after the fluid-webhook pod restarts.
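As a sketch, the ConfigMap can be edited and the webhook restarted with kubectl. The Deployment name `fluid-webhook` below is an assumption; check the actual name in your installation:

```shell
# Edit the tiered locality configuration stored in the ConfigMap
kubectl edit configmap webhook-plugins -n fluid-system

# Restart the webhook so the new configuration takes effect
# (the Deployment name "fluid-webhook" is assumed here)
kubectl rollout restart deployment fluid-webhook -n fluid-system
```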
2. Configure the tiered locality information for the Runtime
Tiered locality information can be configured through the NodeAffinity field of the Dataset or the NodeSelector field of the Runtime.
The following shows tiered locality information defined in the YAML of a Dataset. The workers of the Runtime will be deployed on nodes matching these labels.
```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hbase
spec:
  mounts:
    - mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/stable/
      name: hbase
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - zone-a
            - key: topology.kubernetes.io/region
              operator: In
              values:
                - region-a
```
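To illustrate the required dataset scheduling mentioned above, here is a minimal sketch of an application Pod that opts in via the `fluid.io/dataset.<dataset name>.sched: required` label, assuming the Dataset above is named hbase and `topology.kubernetes.io/zone` is configured under `required` in the webhook-plugins ConfigMap. The webhook then injects a required node affinity for the zone of the cached data (zone-a) instead of the cached nodes themselves:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    # request *required* (instead of preferred) dataset scheduling for the hbase Dataset
    fluid.io/dataset.hbase.sched: required
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase
```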