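Today I was asked again about integrating Amazon EKS with the AWS Load Balancer Controller. The reporter said that every instance registered in the ALB Target Group showed "request timeout". A problem like this usually comes down to one of a few causes: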
- The Security Group is misconfigured
- The Pod/Ingress is misconfigured
- The Pod is busy or not running properly
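When this happens, I verify a few things first:
- Is the Pod running normally, with no abnormal logs/exceptions? => Nothing abnormal
- Do the Kubernetes events show any abnormal logs/exceptions? => Nothing abnormal
- Does cURL against the container/service port work from the Worker node? => cURL gets a normal response
- Does cURL work from another instance in the Amazon VPC that the Security Group rules allow? => cURL gets a normal response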
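These checks confirmed that the service itself was working. Re-checking the Security Groups, I found that the Worker node did not allow access from the ALB. Normally the AWS Load Balancer Controller adds that Security Group rule automatically.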
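The Security Group check above can also be done mechanically. Below is a minimal sketch, not part of the original troubleshooting: it assumes the Security Group is given as a dict shaped like one entry of `SecurityGroups` in the EC2 DescribeSecurityGroups response, and it only looks at rules that reference a source Security Group (CIDR-based rules are ignored). The IDs `sg-node-example` and `sg-alb-example` and the port `31080` are hypothetical. With `target-type: instance`, the ALB sends traffic to the Service's NodePort on the Worker node, so that is the port to test.

```python
from typing import Any


def allows_from_sg(node_sg: dict[str, Any], source_sg_id: str, port: int) -> bool:
    """Return True if node_sg has an ingress rule admitting TCP `port` from source_sg_id.

    Only rules that reference a source security group are considered;
    CIDR-based rules (IpRanges) are deliberately ignored in this sketch.
    """
    for rule in node_sg.get("IpPermissions", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        sources = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if port_ok and source_sg_id in sources:
            return True
    return False


# Hypothetical sample data: a node security group that only allows SSH from the VPC.
node_sg_before = {
    "GroupId": "sg-node-example",
    "IpPermissions": [
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
            "UserIdGroupPairs": [],
        }
    ],
}

# The same group after a rule for the default NodePort range (30000-32767) is added.
node_sg_after = {
    "GroupId": "sg-node-example",
    "IpPermissions": node_sg_before["IpPermissions"]
    + [
        {
            "IpProtocol": "tcp",
            "FromPort": 30000,
            "ToPort": 32767,
            "IpRanges": [],
            "UserIdGroupPairs": [{"GroupId": "sg-alb-example"}],
        }
    ],
}

print(allows_from_sg(node_sg_before, "sg-alb-example", 31080))
print(allows_from_sg(node_sg_after, "sg-alb-example", 31080))
```

The first call reports that the ALB cannot reach the NodePort, which matches the timeout symptom; the second shows the state once a suitable rule exists.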
So I went back to the YAML configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: instance
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/security-groups: sg-xxxxxxx
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 80
The YAML above, however, contains the line alb.ingress.kubernetes.io/security-groups.
The documentation has this note about it:
If you specify this annotation, you need to configure the security groups on your Node/Pod to allow inbound traffic from the load balancer. You could also set the manage-backend-security-group-rules if you want the controller to manage the access rules.
This means that when alb.ingress.kubernetes.io/security-groups is specified, the ALB Controller only manages the Worker node Security Group rules if alb.ingress.kubernetes.io/manage-backend-security-group-rules is also set to "true". Otherwise the rules have to be added by hand.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress
  annotations:
    alb.ingress.kubernetes.io/security-groups: sg-xxxxxxx
    # Annotation values must be strings, so the boolean is quoted.
    alb.ingress.kubernetes.io/manage-backend-security-group-rules: "true"
With that added, the ALB Controller automatically added the Security Group rule to the Worker node!
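To catch this earlier next time, the rule from the documentation can be written as a small check over an Ingress's annotations. This is a rough sketch of my reading of that note, not the controller's actual logic: the function name `backend_rules_managed` is hypothetical, and the exact-match comparison against "true" is an assumption about which values the controller accepts.

```python
PREFIX = "alb.ingress.kubernetes.io/"


def backend_rules_managed(annotations: dict[str, str]) -> bool:
    """Return True if the controller is expected to manage Node/Pod security group rules.

    Assumption: without the security-groups annotation the controller manages the
    rules itself; with it, manage-backend-security-group-rules must be "true".
    """
    if PREFIX + "security-groups" not in annotations:
        return True
    return annotations.get(PREFIX + "manage-backend-security-group-rules") == "true"


# The annotations from the original Ingress: custom security group, no opt-in.
broken = {
    PREFIX + "target-type": "instance",
    PREFIX + "scheme": "internet-facing",
    PREFIX + "security-groups": "sg-xxxxxxx",
}

# The same annotations with the opt-in added.
fixed = dict(broken)
fixed[PREFIX + "manage-backend-security-group-rules"] = "true"

print(backend_rules_managed(broken))
print(backend_rules_managed(fixed))
```

A check like this could run against the parsed manifest before it is applied, so a missing opt-in shows up as a warning instead of as target timeouts.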