Amazon EKS: increase available IP addresses with network interface prefixes

shazi7804

5 years ago

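Anyone using Amazon EKS probably knows that each instance type can only run a limited maximum number of pods. If you run lots of very small workloads and want to squeeze the most out of your compute, the pod count becomes the bottleneck. Fortunately, Amazon EKS recently shipped a feature many people had been asking for, "Increase the amount of available IP addresses for your Amazon EC2 nodes", so I volunteered as the guinea pig and tested it. This post records the pitfalls I ran into and how I got it working.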

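Previously, the IPs available on Amazon EKS were determined by the number of ENIs (Elastic Network Interfaces), and after subtracting the worker node's own IP on each interface, there were very few left: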

(Number of network interfaces for the instance type × (the number of IP addresses per network interface - 1)) + 2
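Plugging numbers into this formula makes the limit concrete. A minimal sketch, assuming m5.large has 3 network interfaces with 10 IPv4 addresses each and t3.nano has 2 with 2 each (taken from the EC2 per-instance-type limits; double-check them against the current EC2 docs). The function name `max_pods_legacy` is hypothetical.

```shell
# Hypothetical helper: max pods WITHOUT prefix delegation, per the formula above.
# $1 = network interfaces (ENIs) for the instance type
# $2 = IPv4 addresses per ENI
max_pods_legacy() {
  # -1: each ENI's primary address is not handed out to pods.
  # +2: constant term of the formula (assumed to cover host-network pods such as aws-node and kube-proxy).
  echo $(( $1 * ($2 - 1) + 2 ))
}

max_pods_legacy 3 10   # m5.large (assumed 3 ENIs x 10 IPs) -> 29
max_pods_legacy 2 2    # t3.nano  (assumed 2 ENIs x 2 IPs)  -> 4
```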

This feature is built on Amazon EC2's "Assigning prefixes to Amazon EC2 network interfaces", which lets you assign IPv4 /28 prefixes or IPv6 /80 prefixes to a network interface, greatly increasing the number of IPs a single Amazon EC2 instance can use.
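Why a prefix helps so much: a /28 covers 2^(32-28) = 16 IPv4 addresses and a /80 covers 2^(128-80) = 2^48 IPv6 addresses, where a plain secondary IP covers exactly one. The sketch below only does that arithmetic; `prefix_size` is a hypothetical helper name.

```shell
# Hypothetical helper: number of addresses in a prefix = 2^(address bits - prefix length)
# $1 = address width in bits (32 for IPv4, 128 for IPv6), $2 = prefix length
prefix_size() {
  echo $(( 1 << ($1 - $2) ))
}

prefix_size 32 28    # IPv4 /28 -> 16
prefix_size 128 80   # IPv6 /80 -> 281474976710656 (2^48)
```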

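I still want to point out that even if worker nodes get a higher max-pods value, the IPs actually available in your VPC CIDR are still limited (each /28 prefix takes 16 addresses from the subnet at once), so size your subnets accordingly or make good use of a Secondary CIDR.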

Prerequisites

Nitro System-based instance types
Amazon VPC CNI 1.9.0 or later

How to

Starting with Amazon VPC CNI 1.9.0, the ENABLE_PREFIX_DELEGATION parameter turns on prefix assignment:

$ kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
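To confirm the variable actually landed on the DaemonSet, `kubectl set env daemonset aws-node -n kube-system --list` prints each container's environment as `NAME=value` lines. A minimal sketch of a check that reads that output on stdin; `prefix_delegation_enabled` is a hypothetical name.

```shell
# Hypothetical check: stdin is `kubectl set env ... --list` output (NAME=value lines).
# Succeeds only if some line is exactly ENABLE_PREFIX_DELEGATION=true.
prefix_delegation_enabled() {
  grep -qxF 'ENABLE_PREFIX_DELEGATION=true'
}

# Usage against a live cluster:
#   kubectl set env daemonset aws-node -n kube-system --list \
#     | prefix_delegation_enabled && echo "prefix delegation is on"
```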

AWS provides max-pods-calculator.sh, which calculates the maximum number of pods an instance type supports:

$ curl -o max-pods-calculator.sh https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
$ chmod +x max-pods-calculator.sh
$ ./max-pods-calculator.sh --instance-type m5.large --cni-version 1.9.0-eksbuild.1 --cni-prefix-delegation-enabled

110

The exact calculation is in max-pods-calculator.sh; it is still essentially based on the number of network interfaces minus the worker node's own IP addresses.

Max pods examples (with prefix delegation enabled)
t3.nano: 34
m5.large: 110
m5.8xlarge: 250
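These three numbers can be reproduced by hand. A simplified sketch of what I understand the prefix-delegation branch of max-pods-calculator.sh to do: each usable IP slot on an ENI holds one /28 prefix (16 addresses), and the result is capped at 110 for instance types with fewer than 30 vCPUs and at 250 otherwise. The cap thresholds and the ENI/IP/vCPU figures (t3.nano 2/2/2, m5.large 3/10/2, m5.8xlarge 8/30/32) are assumptions here; read the script itself before relying on them. `max_pods_prefix` is a hypothetical name.

```shell
# Hypothetical, simplified re-implementation of the prefix-delegation calculation.
# $1 = ENIs, $2 = IPv4 addresses per ENI, $3 = vCPUs
max_pods_prefix() {
  # Every slot except the ENI's primary address holds one /28 prefix = 16 addresses.
  local pods=$(( $1 * (($2 - 1) * 16) + 2 ))
  # Assumed caps: 110 below 30 vCPUs, 250 otherwise.
  local cap=110
  if [ "$3" -ge 30 ]; then cap=250; fi
  if [ "$pods" -gt "$cap" ]; then pods="$cap"; fi
  echo "$pods"
}

max_pods_prefix 2 2 2     # t3.nano    -> 34
max_pods_prefix 3 10 2    # m5.large   -> 110 (434 before the cap)
max_pods_prefix 8 30 32   # m5.8xlarge -> 250 (3714 before the cap)
```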

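Amazon VPC CNI offers two ways to control how many IPs/prefixes are kept pre-allocated on each ENI: WARM_PREFIX_TARGET, or WARM_IP_TARGET + MINIMUM_IP_TARGET (prefix-and-ip-target). Put simply, if you just want prefixes allocated automatically without thinking about it, choose WARM_PREFIX_TARGET; if you care a lot about IP utilization, use WARM_IP_TARGET + MINIMUM_IP_TARGET. I prefer the no-brainer option, so I went with WARM_PREFIX_TARGET: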

$ kubectl set env ds aws-node -n kube-system WARM_PREFIX_TARGET=1
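The difference between the two modes is easiest to see by counting prefixes. A rough model, not the CNI's actual code: with WARM_PREFIX_TARGET=N the node keeps N completely free /28 prefixes beyond the ones in use, while WARM_IP_TARGET + MINIMUM_IP_TARGET only rounds the required IP count up to whole prefixes. The function names are hypothetical and the model ignores per-ENI limits.

```shell
# Rough model of prefix allocation (an assumption, not the VPC CNI's real logic).
# Every prefix is a /28, i.e. 16 IPv4 addresses.

# Integer ceiling division: ceil($1 / $2)
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# WARM_PREFIX_TARGET mode: prefixes covering the pods in use, plus N fully free prefixes.
# $1 = pods needing an IP, $2 = WARM_PREFIX_TARGET
prefixes_warm_prefix() {
  echo $(( $(ceil_div "$1" 16) + $2 ))
}

# WARM_IP_TARGET + MINIMUM_IP_TARGET mode: at least WARM_IP_TARGET free IPs and
# at least MINIMUM_IP_TARGET IPs in total, rounded up to whole prefixes.
# $1 = pods needing an IP, $2 = WARM_IP_TARGET, $3 = MINIMUM_IP_TARGET
prefixes_warm_ip() {
  local need=$(( $1 + $2 ))
  if [ "$need" -lt "$3" ]; then need="$3"; fi
  ceil_div "$need" 16
}

prefixes_warm_prefix 20 1   # -> 3 prefixes (48 addresses held)
prefixes_warm_ip 20 5 2     # -> 2 prefixes (32 addresses held)
```

Under this model, 20 pods hold 48 addresses with WARM_PREFIX_TARGET=1 versus 32 with WARM_IP_TARGET=5 and MINIMUM_IP_TARGET=2, which is the convenience-versus-utilization trade-off described above.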

Besides the Amazon VPC CNI, self-managed nodes also need --kubelet-extra-args '--max-pods=110' added to bootstrap.sh so that Amazon EKS lifts the pod limit:

$ sudo /etc/eks/bootstrap.sh <cluster-name> \
  --kubelet-extra-args '--max-pods=110'

Use kubectl describe to check whether the setting has taken effect on the worker node. This also means any existing worker nodes have to be replaced with nodes that carry the max-pods setting.

$ kubectl describe nodes ip-10-0-61-30.ec2.internal | grep pods

pods: 110
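Since existing nodes have to be replaced, it helps to list the ones still running with the old limit. A minimal sketch that reads `kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods` output on stdin and prints nodes whose allocatable pods fall below an expected value; `nodes_below_max_pods` is a hypothetical name, and using a single expected value assumes all nodes share one instance type.

```shell
# Hypothetical helper: stdin is "NAME PODS" per line, as printed by
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods
# Prints the names of nodes whose allocatable pods are below the expected value ($1).
nodes_below_max_pods() {
  awk -v want="$1" '($2 + 0) < (want + 0) { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods \
#     | nodes_below_max_pods 110
```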

Testing

A quick test with a simple Nginx deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 200
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

Either edit replicas directly or use kubectl scale to bring replicas up to 200:

$ kubectl scale --replicas=200 deployment/nginx

Then I checked how the pods were distributed with kube-ops-view.

Because the worker nodes still had enough pod capacity, most of the pods were scheduled onto the worker nodes configured with max-pods.
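Without kube-ops-view, a text-only version of the same distribution check can be done by counting pods per node. A minimal sketch; `pods_per_node` is a hypothetical helper, and pods that are still Pending show up under `<none>`.

```shell
# Hypothetical helper: stdin is one node name per pod; prints "NODE COUNT", busiest node first.
pods_per_node() {
  sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Usage against a live cluster (Pending pods appear as <none>):
#   kubectl get pods -l app=nginx --no-headers \
#     -o custom-columns=NODE:.spec.nodeName | pods_per_node
```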
