
Deployment

Helm Chart Structure

All services are deployed from a single shared Helm chart at helm/appointment-service/. Values are layered at deploy time using Helm's last-value-wins precedence:

values.yaml                     # Base defaults for all services
  └── values-{service}.yaml     # Per-service overrides
        └── values-{env}.yaml   # Per-environment overrides (production, staging)
              └── --set flags   # CI-injected: image tag, digest, strategy
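In CI this layering might translate into an invocation like the following (a sketch: the release name and flag values are illustrative; later `-f` files and `--set` flags win over earlier ones):

```shell
helm upgrade --install booking-service helm/appointment-service \
  -f helm/appointment-service/values-booking.yaml \
  -f helm/appointment-service/values-production.yaml \
  --set image.tag="${SHA}" \
  --set image.digest="${DIGEST}"
```

The base values.yaml is loaded implicitly by Helm, so only the per-service and per-environment files need to be passed explicitly.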

Chart Templates

Template Purpose
deployment.yaml Deployment with rolling/canary strategy, probes, security context
hpa.yaml HorizontalPodAutoscaler — CPU 70% + memory 80% targets
service.yaml ClusterIP service
ingress.yaml Optional ingress (disabled by default; enabled per service)
pdb.yaml PodDisruptionBudget — minAvailable: 1 (overridden to 2 in production)
serviceaccount.yaml ServiceAccount with optional IRSA annotation

Per-Service Values Files

File Key Overrides
values-booking.yaml 3 replicas min, HPA 3–20, canary strategy, IRSA annotation for RDS + MSK
values-search.yaml nodeSelector: workload=search, tolerations for the search taint, HPA 3–30, IRSA for OpenSearch SigV4
values-availability.yaml Canary strategy
values-patient.yaml Canary strategy
values-notification.yaml Rolling strategy
values-practitioner.yaml Rolling strategy
values-production.yaml minReplicas: 3, minAvailable: 2 PDB, scale-down stabilisation 600s
values-staging.yaml Reduced replicas and resource limits for cost
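As a sketch, the node pinning described for values-search.yaml could look like this, assuming the chart maps these keys straight onto the pod spec (the toleration mirrors the workload=search:NoSchedule taint in the node group table later on this page):

```yaml
nodeSelector:
  workload: search

tolerations:
  - key: workload
    operator: Equal
    value: search
    effect: NoSchedule
```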

Deployment Strategies

Services are classified by criticality:

Service Strategy Rationale
Booking Service Canary (10% initial) Revenue-critical write path
Availability Service Canary (10% initial) Feeds the booking pre-check
Patient Service Canary (10% initial) User-facing profile data
Search Service Rolling Read-only; stateless; fast rollback acceptable
Notification Service Rolling Async consumer; no synchronous user impact
Practitioner Service Rolling Low-traffic CRUD

Rolling Update

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0

New pods are added one at a time before old pods are terminated. At no point is capacity reduced below the requested replica count.
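Rendered into the Deployment spec, this corresponds to the stanza:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod is created during the rollout
      maxUnavailable: 0  # never drop below the desired replica count
```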

Canary Deployment Flow

The canary flow runs entirely within the production GitHub Actions job:

  1. A separate Helm release ({service}-canary) is created with a reduced replica count and a canary.weight annotation routing N% of traffic to it via the ingress/service-mesh layer.
  2. A 5-minute observation window (sleep 300) allows error rate monitoring.
  3. If no manual rollback is triggered, the stable release is upgraded to the new image.
  4. The canary release is deleted with helm uninstall {service}-canary.

# Step 1 — deploy canary
helm upgrade booking-service-canary helm/appointment-service \
  --set canary.enabled=true \
  --set canary.weight=10 \
  --set image.tag="${SHA}" \
  --atomic --timeout 10m

# Step 2 — observation window for error-rate monitoring
sleep 300

# Step 3 — promote to stable
helm upgrade booking-service helm/appointment-service \
  --set canary.enabled=false \
  --set image.tag="${SHA}" \
  --atomic --timeout 15m

# Step 4 — clean up
helm uninstall booking-service-canary

Pod Security Configuration

Security defaults are set in values.yaml and apply to all services. Per-service files may only tighten, not loosen, these settings.

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000      # matches appuser UID in the Dockerfile
  runAsGroup: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

Two emptyDir volumes (/tmp and /app/.cache) are mounted to satisfy services that need writable scratch space while keeping the root filesystem read-only.
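A sketch of how those scratch volumes could be wired up (the volume names are illustrative, not taken from the chart):

```yaml
volumes:
  - name: tmp
    emptyDir: {}
  - name: app-cache
    emptyDir: {}

# In the container spec:
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: app-cache
    mountPath: /app/.cache
```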


Horizontal Pod Autoscaler

HPA is enabled for all services with dual-metric scaling:

Metric Target
CPU utilization 70%
Memory utilization 80%

Scale-up: 2 pods per 60-second window (fast response to traffic spikes). Scale-down: 1 pod per 60-second window, with a 300-second stabilisation window (600s in production) to prevent flapping.
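In autoscaling/v2 terms, the scaling policy above might be expressed as follows (values taken from the prose; production overrides the stabilisation window to 600):

```yaml
behavior:
  scaleUp:
    policies:
      - type: Pods
        value: 2           # add up to 2 pods…
        periodSeconds: 60  # …per 60-second window
  scaleDown:
    stabilizationWindowSeconds: 300  # 600 in production
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60
```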

No CPU limit is set — CPU throttling causes latency spikes. The HPA handles horizontal scaling before a node becomes CPU-saturated. Memory limits are enforced to prevent OOM cascades.
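A resources block following this policy might look like the sketch below (the request and limit numbers are illustrative, not taken from the chart):

```yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    memory: 512Mi   # memory limit only; no cpu limit, to avoid throttling
```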


Docker Build Strategy

All services share docker/Dockerfile. The build accepts a SERVICE_DIR argument so the same file serves the entire monorepo.

Stages

Stage Base Purpose
base node:22-bookworm-slim OS security patches, tini, appuser (UID 1000)
deps base npm ci --ignore-scripts — cached until package-lock.json changes
builder deps TypeScript compile (npm run build); prune devDependencies
release base Copy only dist/, production node_modules, package.json — ~50 MB final image
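The four stages could be sketched roughly as follows — an illustration of the table above, not the actual docker/Dockerfile; the entrypoint path, copy layout, and exact commands are assumptions:

```dockerfile
FROM node:22-bookworm-slim AS base
# OS security patches, tini as init, non-root user
RUN apt-get update && apt-get upgrade -y \
 && apt-get install -y --no-install-recommends tini \
 && rm -rf /var/lib/apt/lists/* \
 && useradd --create-home --uid 1000 appuser

FROM base AS deps
ARG SERVICE_DIR
WORKDIR /app
COPY ${SERVICE_DIR}/package.json ${SERVICE_DIR}/package-lock.json ./
# Layer is cached until package-lock.json changes
RUN npm ci --ignore-scripts

FROM deps AS builder
ARG SERVICE_DIR
COPY ${SERVICE_DIR}/ ./
RUN npm run build && npm prune --omit=dev

FROM base AS release
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
USER appuser
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["node", "dist/main.js"]
```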

Key Practices

  • tini as PID 1 — ensures correct SIGTERM forwarding and zombie process reaping
  • --ignore-scripts during npm ci — prevents malicious postinstall hooks in CI
  • No shell or package manager in the final image (a distroless alternative is documented in a comment in the Dockerfile)
  • OCI image labels — org.opencontainers.image.* with build date, git commit, branch
  • Digest pinning — CI passes --set image.digest so Kubernetes pulls by digest, not by mutable tag
  • SBOM + provenance — docker/build-push-action generates attestations on every push to ECR

EKS Node Group Layout

Node Group Instance Type Taint Workloads
general m6i.xlarge None Booking, Availability, Patient, Practitioner, Notification
search r6i.xlarge workload=search:NoSchedule Search Service only

Pod anti-affinity (preferred) spreads pods across AZs. A topology spread constraint (maxSkew: 1 on kubernetes.io/hostname) prevents all replicas from landing on the same node.
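The spread behaviour described here might map onto the pod spec like this (the label selector is illustrative):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: booking-service
```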