Kubernetes in production: deploying a highly available app with RKE

As part of an infrastructure project, I deployed a Kubernetes cluster using RKE (Rancher Kubernetes Engine). This is a practical guide to the key concepts and architecture decisions.

Why RKE?

RKE is a lightweight Kubernetes installer that runs every cluster component in Docker containers. Compared with kubeadm, it offers:

Declarative configuration through a single YAML file (cluster.yml)
A simplified upgrade procedure (change kubernetes_version in cluster.yml, then re-run rke up)
Native compatibility with Rancher for multi-cluster management
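
To make the upgrade path concrete, here is a minimal sketch of the only change an upgrade needs in cluster.yml. The version string is a placeholder, not a real release; the supported values can be listed with rke config --list-version --all.

```yaml
# cluster.yml (excerpt)
# Placeholder value: replace it with a version printed by `rke config --list-version --all`.
kubernetes_version: "v1.28.x-rancher1-1"
```

Re-running rke up against the edited file then rolls the new version out across the nodes.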

Cluster architecture

┌─────────────────────────────────────────┐
│              Load Balancer              │
│           (HAProxy / Nginx)             │
└──────────────┬──────────────────────────┘
               │
    ┌──────────┴──────────┐
    │                     │
┌───▼───┐           ┌─────▼─────┐
│Master1│           │  Master2  │
│ etcd  │           │   etcd    │
│control│           │  control  │
└───────┘           └───────────┘
    │                     │
    └─────────┬───────────┘
              │
    ┌─────────┼─────────┐
    │         │         │
┌───▼──┐  ┌──▼───┐  ┌──▼───┐
│Worker│  │Worker│  │Worker│
│  01  │  │  02  │  │  03  │
└──────┘  └──────┘  └──────┘

RKE configuration

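# cluster.yml
# Note: etcd needs a majority of its members to stay writable. With two etcd
# nodes, losing either one breaks quorum, so a third controlplane/etcd node is
# needed before the control plane can survive a node failure.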
nodes:
  - address: 192.168.1.10
    user: ubuntu
    role: [controlplane, etcd]
  - address: 192.168.1.11
    user: ubuntu
    role: [controlplane, etcd]
  - address: 192.168.1.20
    user: ubuntu
    role: [worker]
  - address: 192.168.1.21
    user: ubuntu
    role: [worker]
  - address: 192.168.1.22
    user: ubuntu
    role: [worker]

network:
  plugin: canal

ingress:
  provider: nginx

services:
  kube-api:
    extra_args:
      # Hardening the API server
      anonymous-auth: "false"
      audit-log-path: /var/log/kube-audit.log

Deploying the application

Deployment with rolling update

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-ha
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0          # Never drop below the desired number of ready pods during a rollout
  selector:
    matchLabels:
      app: app-ha
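  template:
    metadata:
      labels:
        app: app-ha              # Must match spec.selector.matchLabels
    spec:
      # Anti-affinity: prefer one pod per node. A hard (required) rule would block
      # the surge pod during rollouts and cap the HPA at the number of workers.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: app-ha
                topologyKey: kubernetes.io/hostname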
      containers:
        - name: app
          image: app-ha:v1.2.0
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
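
Spreading replicas across nodes can also be expressed with a topology spread constraint, which keeps the pods balanced while still allowing more replicas than nodes (useful once the HPA scales past the three workers). A sketch of the pod-spec fragment, offered as an alternative rather than as the setup used in this project:

```yaml
# Fragment of spec.template.spec in the Deployment
topologySpreadConstraints:
  - maxSkew: 1                           # Per-node pod counts may differ by at most 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule     # Use ScheduleAnyway for a soft preference
    labelSelector:
      matchLabels:
        app: app-ha
```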

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-ha-hpa
  namespace: production        # Must be the namespace of the target Deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-ha
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
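
The HPA computes CPU utilization relative to the container's CPU request (100m here) and needs a metrics source such as metrics-server. How quickly it scales down can be tuned through the behavior field; a sketch with illustrative values, not ones taken from the project:

```yaml
# Fragment of the HorizontalPodAutoscaler spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait for 5 min of lower load before removing pods (the default)
      policies:
        - type: Pods
          value: 1                      # Remove at most 1 pod...
          periodSeconds: 60             # ...per 60-second period
```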

Self-healing concepts

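Kubernetes handles these scenarios as follows:

Scenario	Kubernetes behavior
Pod crashes	The kubelet restarts the container (restartPolicy: Always, with backoff)
Node goes down	Pods are rescheduled onto healthy nodes once the node is marked NotReady and the eviction timeout expires (about 5 minutes by default)
CPU load rises	The HPA scales up automatically
Load drops	The HPA scales down after the stabilization window (5 minutes by default)
Failed rollout	The rollout stalls and the old pods keep serving while the new pods fail their readiness checks; rolling back is a manual step (kubectl rollout undo)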
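
A failed rollout is detected rather than reverted: the Deployment reports a failed Progressing condition once progressDeadlineSeconds elapses, and the rollback itself is triggered by hand or by CI with kubectl rollout undo deployment/app-ha -n production. A sketch of the relevant Deployment fields, with illustrative values:

```yaml
# Fragment of the Deployment spec
spec:
  progressDeadlineSeconds: 120   # Report the rollout as failed after 2 min without progress (default: 600)
  minReadySeconds: 10            # A new pod must stay ready for 10 s before it counts as available
```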

Ingress with TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production        # Same namespace as the backend Service
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-ha-svc
                port:
                  number: 80
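
The Ingress routes to a Service named app-ha-svc, which is not shown in this post. A minimal sketch of what it could look like, assuming the pods carry the label app: app-ha and listen on port 3000 as in the probes:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-ha-svc
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: app-ha          # Assumed pod label, matching the Deployment selector
  ports:
    - name: http
      port: 80           # Port referenced by the Ingress backend
      targetPort: 3000   # Container port used by the health probes
```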

Lessons learned

Always use podAntiAffinity for critical apps: it keeps a single node from hosting all your pods
maxUnavailable: 0 is essential for zero-downtime rollouts, provided the readiness probe reflects real readiness
Health checks are non-negotiable: without them, Kubernetes cannot tell whether your app is actually ready
Set up monitoring with Prometheus + Grafana from day one, not after something breaks

Personal project built with RKE 2.x and Kubernetes 1.28.