Cache Co-locality
In Fluid, the remote files specified in a Dataset object are schedulable, which means you can control where your data is cached in a Kubernetes cluster, just as you control where Pods run. Fluid can also make cache co-locality scheduling decisions for workloads, placing them close to their cached data to minimize data-access overhead.
This tutorial gives an overview of both features.
Prerequisites
Before starting, follow the Installation Guide to install Fluid on your Kubernetes cluster, and make sure all of Fluid's components are ready, as shown below:
$ kubectl get pod -n fluid-system
NAME                                        READY   STATUS    RESTARTS   AGE
alluxioruntime-controller-5b64fdbbb-84pc6   1/1     Running   0          8h
csi-nodeplugin-fluid-fwgjh                  2/2     Running   0          8h
csi-nodeplugin-fluid-ll8bq                  2/2     Running   0          8h
dataset-controller-5b7848dbbb-n44dj         1/1     Running   0          8h
Normally, you should see one Pod whose name starts with "dataset-controller", one whose name starts with "alluxioruntime-controller", and several whose names start with "csi-nodeplugin". The number of "csi-nodeplugin" Pods depends on how many nodes your Kubernetes cluster has (2 in this tutorial), so make sure all of them are running properly.
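If you would rather script this readiness check than eyeball it, the conditions above can be expressed as a small parser over the kubectl output. This is a minimal sketch, assuming Python 3.9+, that kubectl is on PATH for the optional kubectl_get helper, and (as the tutorial's 2-node example suggests) one csi-nodeplugin Pod per node; kubectl_get, parse_table and fluid_components_ready are hypothetical helper names, not part of Fluid or kubectl.

```python
import subprocess


def kubectl_get(*args: str) -> str:
    """Run `kubectl get ...` and return its stdout (assumes kubectl is on PATH)."""
    result = subprocess.run(["kubectl", "get", *args], capture_output=True, text=True, check=True)
    return result.stdout


def parse_table(text: str) -> list[dict[str, str]]:
    """Split whitespace-aligned `kubectl get` output into one dict per row, keyed by header.

    Assumes no cell contains spaces; missing trailing cells are filled with "".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        cells += [""] * (len(header) - len(cells))  # pad empty trailing cells
        rows.append(dict(zip(header, cells)))
    return rows


def fluid_components_ready(pod_table: str, node_count: int) -> bool:
    """Check the fluid-system Pods against the conditions described in the tutorial."""
    pods = parse_table(pod_table)
    # Every Pod must be Running with all of its containers ready (e.g. "2/2").
    for pod in pods:
        ready, total = pod["READY"].split("/")
        if pod["STATUS"] != "Running" or ready != total:
            return False
    names = [pod["NAME"] for pod in pods]
    has_controllers = all(
        any(name.startswith(prefix) for name in names)
        for prefix in ("dataset-controller", "alluxioruntime-controller")
    )
    # Assumption: one csi-nodeplugin Pod is expected on each node.
    csi_pods = sum(name.startswith("csi-nodeplugin") for name in names)
    return has_controllers and csi_pods == node_count


# Usage on a live cluster:
#   pods = kubectl_get("pod", "-n", "fluid-system")
#   nodes = parse_table(kubectl_get("nodes"))
#   print(fluid_components_ready(pods, node_count=len(nodes)))
```

Parsing text keeps the sketch dependency-free; a sturdier variant would request structured output from kubectl instead of relying on column alignment.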
Set Up Workspace
$ mkdir <any-path>/co-locality
$ cd <any-path>/co-locality
Install Resources to Kubernetes
Check all nodes in your Kubernetes cluster
$ kubectl get nodes
NAME                       STATUS   ROLES    AGE     VERSION
cn-beijing.192.168.1.146   Ready    <none>   7d14h   v1.16.9-aliyun.1
cn-beijing.192.168.1.147   Ready    <none>   7d14h   v1.16.9-aliyun.1
Label one of the nodes
$ kubectl label nodes cn-beijing.192.168.1.146 hbase-cache=true
Since we'll use a NodeSelector to control where our data is placed, we mark the desired node by labeling it.
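A label only affects placement once something selects on it. As a rough model of Kubernetes nodeSelector matching, where a node qualifies only if it carries every key=value pair in the selector with exactly the same string value, here is a minimal Python sketch. The function names are hypothetical, and this models the general Kubernetes rule rather than reproducing Fluid's scheduler code.

```python
def matches_node_selector(node_labels: dict[str, str], node_selector: dict[str, str]) -> bool:
    """Return True if the node carries every key=value pair in the selector.

    An empty selector places no constraint, so it matches every node.
    """
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def select_nodes(nodes: dict[str, dict[str, str]], node_selector: dict[str, str]) -> list[str]:
    """Return the names of the nodes that satisfy the selector, sorted for stable output."""
    return sorted(name for name, labels in nodes.items() if matches_node_selector(labels, node_selector))


# Node labels after running the `kubectl label` command above (other labels omitted).
nodes = {
    "cn-beijing.192.168.1.146": {"hbase-cache": "true"},
    "cn-beijing.192.168.1.147": {},
}
print(select_nodes(nodes, {"hbase-cache": "true"}))  # ['cn-beijing.192.168.1.146']
```

Label values are strings, so the selector must use "true" rather than a boolean for the labeled node to match.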
Check all nodes again
$ kubectl get node -L hbase-cache
NAME                       STATUS   ROLES    AGE     VERSION            HBASE-CACHE
cn-beijing.192.168.1.146   Ready    <none>   7d14h   v1.16.9-aliyun.1   true
cn-beijing.192.168.1.147   Ready    <none>   7d14h   v1.16.9-aliyun.1
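To confirm the label in a script rather than by reading the table, one could parse the -L output shown above and keep the rows whose HBASE-CACHE column is "true". This is a minimal sketch, assuming Python 3.9+ and kubectl on PATH for the optional kubectl_get helper; kubectl_get, parse_table and labeled_nodes are hypothetical names. kubectl can also filter on the server with a label selector (kubectl get nodes -l hbase-cache=true), which avoids parsing altogether.

```python
import subprocess


def kubectl_get(*args: str) -> str:
    """Run `kubectl get ...` and return its stdout (assumes kubectl is on PATH)."""
    result = subprocess.run(["kubectl", "get", *args], capture_output=True, text=True, check=True)
    return result.stdout


def parse_table(text: str) -> list[dict[str, str]]:
    """Split whitespace-aligned `kubectl get` output into one dict per row, keyed by header.

    Only trailing empty cells are handled (an unset -L column is the last one here);
    an empty cell in the middle of a row would shift the remaining values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        cells += [""] * (len(header) - len(cells))  # unset label column -> ""
        rows.append(dict(zip(header, cells)))
    return rows


def labeled_nodes(node_table: str, label_column: str, value: str = "true") -> list[str]:
    """Return the names of nodes whose label column equals `value`."""
    return [row["NAME"] for row in parse_table(node_table) if row.get(label_column) == value]


# Sample copied from the tutorial; on a live cluster use kubectl_get("node", "-L", "hbase-cache").
output = """\
NAME                       STATUS   ROLES    AGE     VERSION            HBASE-CACHE
cn-beijing.192.168.1.146   Ready    <none>   7d14h   v1.16.9-aliyun.1   true
cn-beijing.192.168.1.147   Ready    <none>   7d14h   v1.16.9-aliyun.1
"""
print(labeled_nodes(output, "HBASE-CACHE"))  # ['cn-beijing.192.168.1.146']
```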