Zero-Downtime EKS Service Migration Using Weighted ALB Routing & Cross-Namespace Proxies
Introduction
When you need to migrate a critical service from one namespace to another in Kubernetes while maintaining zero downtime, traditional blue-green deployments fall short. The challenge becomes even more complex when your services live in different namespaces and you're using AWS Application Load Balancer (ALB) for ingress.
In this post, I'll walk you through a battle-tested approach we used to migrate our auth service from v1 to v2 across different namespaces with zero downtime and gradual traffic shifting.
The Challenge
Our scenario involved migrating one of our services, auth-service, from the auth-service-v1-dev namespace to the auth-service-v2-dev namespace. The constraints were:
Zero downtime during migration
Weighted traffic distribution (canary deployments)
Both services fronted by the same AWS ALB
ALB controller limitation: can only see services within the same namespace as the ingress
Considerations
We considered three options:
Path-based migration: route only certain paths to v2 and slowly migrate the rest of the paths
Weighted: route only a certain percentage of traffic to v2 (90/10, 80/20, 60/40, ...)
Hybrid of path & weighted: a combination of 1 & 2
After planning and communicating with engineering teams, we decided that weighted migration would be the best path forward for the team.
Assumptions
Before proceeding, we had a few things already set up:
ALB as the ingress
ExternalDNS handling DNS management / the DNS-01 challenge
cert-manager handling certificate management
For our setup, we had the following validated:
Both applications running on the same EKS cluster
Both applications fronted by ALB as the ingress
Both applications under the same ALB group name
The v1 application is in the auth-service-v1-dev namespace
The v2 application is in the auth-service-v2-dev namespace
Both applications expose the same functionality on different ports: v1 exposes the app on port 8002, v2 on port 8080
The "Teleporter Proxy" Pattern
We can't directly apply an ALB weighted ingress configuration for this migration, because the ALB controller can only see services in the same namespace as the ingress. It can't route directly to auth-service-v2-dev-auth-service-v2, since that service lives in a different namespace.
Instead, we create a "proxy" service in the v1 namespace that ALB can see, but that secretly forwards all traffic to the v2 namespace using the ClusterIP. We call this the "Teleporter Proxy" pattern. Let's see how it is implemented first; then we will go through configuring the whole migration.
Here's how the traffic flow works:
ALB → Proxy Service (v1 namespace) → Real Service (v2 namespace) → v2 Pods
Traffic Flow:
ALB sends 20% of traffic → auth-service-v2-teleporter.auth-service-v1-dev:8080 (the proxy)
The proxy forwards → 123.10.28.114:8080 (the real v2 service's ClusterIP)
The real v2 service routes to the v2 pods
The response flows back through the same path
It's like having a traffic forwarding service - ALB sends traffic to the local address, but it gets automatically forwarded to the real destination in another neighborhood (namespace). The proxy service acts as a bridge, allowing ALB to register targets while secretly forwarding traffic across namespace boundaries.
This gives us cross-namespace weighted routing while keeping the services properly isolated in their own namespaces.
┌─────────────────────────────────────────────────────────────────┐
│                        Source Namespace                         │
│  ┌─────────────────┐   ┌─────────────────────────────────┐      │
│  │  Real Service   │   │    Teleporter Proxy Service     │      │
│  │                 │   │ (No Selector/Manual Endpoints)  │      │
│  └─────────────────┘   └─────────────────┬───────────────┘      │
└──────────────────────────────────────────┼──────────────────────┘
                                           │
                                           │ Teleports traffic via ClusterIP
                                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Target Namespace                         │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                     Target Service                      │   │
│   │                  ClusterIP: 172.20.x.x                  │   │
│   │                                                         │   │
│   │   ┌─────────┐       ┌─────────┐       ┌─────────┐       │   │
│   │   │  Pod 1  │       │  Pod 2  │       │  Pod 3  │       │   │
│   │   └─────────┘       └─────────┘       └─────────┘       │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
Migration Summary
Just implementing the pattern was not enough; we also had to make the migration weighted and seamless. To do so, we leveraged the ALB target group configuration.
Since target groups are created and attached automatically based on the configuration and ingress annotations, we had to create a service that acts as a proxy to the v2 namespace. However, there is another problem here.
The ExternalName service type, Kubernetes' native way to reference a service in another namespace, is not supported by the ALB controller. Hence we manually create a ClusterIP service and an Endpoints object, attaching the ClusterIP address of the v2 namespace's service to the proxy in the v1 namespace (the pattern explained above). Once deployed in a weighted configuration that references the service by name, the controller creates a target group in the AWS console with proper targets that ALB can route to, enabling weighted traffic distribution between v1 and v2.
We then gradually shift the weights to offload traffic to v2, moving from (100/0) to (0/100). At that point, all traffic is going to the v2 namespace.
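Since each phase of the shift only changes the two weight numbers, it can help to generate the annotation value from a single variable instead of hand-editing JSON at every step. A minimal sketch (the helper name is ours; the service names and ports come from the setup above):

```shell
# Hypothetical helper: emit the weighted-routing annotation value for a given v2 weight.
# v1 gets the remainder, so the two weights always sum to 100.
weighted_routing_json() {
  v2_weight="$1"
  v1_weight=$((100 - v2_weight))
  cat <<EOF
{
  "type": "forward",
  "forwardConfig": {
    "targetGroups": [
      {"serviceName": "auth-service-v1-dev-auth-service", "servicePort": "8002", "weight": ${v1_weight}},
      {"serviceName": "auth-service-v2-teleporter", "servicePort": "8080", "weight": ${v2_weight}}
    ]
  }
}
EOF
}

weighted_routing_json 10   # prints the 90/10 canary configuration
```

Each phase below then becomes a single call (`weighted_routing_json 30`, `50`, `80`, `100`) whose output you paste into the actions.weighted-routing annotation.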
Pre-Migration
Step 1: Pre-Configure Service v2 Proxy and Endpoints
What this step does:
Getting the ClusterIP of the v2 service in the auth-service-v2-dev namespace; we need it to configure our teleporter proxy
Creating a "teleporter proxy" service in the v1 namespace (auth-service-v1-dev)
Manually pointing the proxy service at the v2 service's ClusterIP, which essentially punches a hole from v1 straight to v2
OK, let's implement the above.
Get the ClusterIP of the v2 service
kubectl get svc auth-service-v2-dev-auth-service-v2 -n auth-service-v2-dev -o jsonpath='{.spec.clusterIP}'
Create the service in the v1 namespace using the following commands:
# Step 1: Get the v2 service ClusterIP
kubectl get svc auth-service-v2-dev-auth-service-v2 -n auth-service-v2-dev
# Note: ClusterIP is 123.10.28.114 (from your output)
# Step 2: Create proxy service and endpoints in v1 namespace
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: auth-service-v2-teleporter
  namespace: auth-service-v1-dev
  labels:
    app: auth-service-v2
    version: v2
spec:
  type: ClusterIP
  ports:
    - name: jmx-metrics
      port: 9020
      protocol: TCP
      targetPort: 9020
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
    - name: metrics
      port: 9000
      protocol: TCP
      targetPort: 9000
---
apiVersion: v1
kind: Endpoints
metadata:
  name: auth-service-v2-teleporter
  namespace: auth-service-v1-dev
subsets:
  - addresses:
      - ip: 123.10.28.114 # ClusterIP from auth-service-v2-dev namespace
    ports:
      - name: jmx-metrics
        port: 9020
        protocol: TCP
      - name: http
        port: 8080
        protocol: TCP
      - name: metrics
        port: 9000
        protocol: TCP
EOF
The teleporter service in auth-service-v1-dev will now proxy traffic to the actual v2 service in auth-service-v2-dev, and ALB can register it as a target for weighted load balancing!
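Before wiring the ALB to it, it is worth confirming the teleporter actually forwards across namespaces. One quick way is a throwaway curl pod in the v1 namespace hitting the proxy's DNS name. This is a sketch: /ping is a hypothetical health endpoint (substitute your app's real path), and the command is built and printed first so it can be reviewed before running:

```shell
# Sanity check (sketch): curl the teleporter proxy from inside the v1 namespace.
# /ping is a hypothetical endpoint; use your application's real health path.
PROXY_URL="http://auth-service-v2-teleporter.auth-service-v1-dev.svc.cluster.local:8080/ping"
CHECK_CMD="kubectl run curl-check --rm -i --restart=Never -n auth-service-v1-dev \
  --image=curlimages/curl --command -- curl -s ${PROXY_URL}"

echo "${CHECK_CMD}"
# eval "${CHECK_CMD}"   # run against the cluster; a v2 response confirms the proxy works
```

If the response comes back from v2, the cross-namespace hole is open and ALB registration can proceed.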
Step 2: Configure v1 Helm for Weighted Routing
What this step does:
Creates a real ALB target group for target registration
Makes the application ready for weighted routing between v1 and v2
Update your v1 Helm configuration in auth-service-v1-dev namespace:
name: auth-service
# ... existing configuration ...
ingress:
  enabled: true
  annotations:
    # ... existing configuration ...
    alb.ingress.kubernetes.io/target-group-attributes: slow_start.duration_seconds=120,deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/target-type: ip
+   alb.ingress.kubernetes.io/group.order: "40" # Rule order within the shared ALB group
+   alb.ingress.kubernetes.io/actions.weighted-routing: |
+     {
+       "type": "forward",
+       "forwardConfig": {
+         "targetGroups": [
+           {
+             "serviceName": "auth-service-v1-dev-auth-service",
+             "servicePort": "8002",
+             "weight": 100
+           },
+           {
+             "serviceName": "auth-service-v2-teleporter",
+             "servicePort": "8080",
+             "weight": 0
+           }
+         ]
+       }
+     }
  hosts:
    - host: auth-service.dev.mycompany.com
      paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
-             name: auth-service-v1-dev-auth-service
+             name: weighted-routing # CHANGED: use the action name instead of the direct service
              port:
-               name: http
+               name: use-annotation # CHANGED: reference the annotation instead of the http port
Step 3: Register targets in the ALB-created target group from the console
This is a manual step: you need to take the endpoints for the proxy service and register them as targets in the target group. For some reason, the ALB controller doesn't populate them automatically; my guess is that it's because the targets reference a service across namespaces.
Go to Console > EC2 > Load Balancers > [YOUR ALB] > Listeners :443 > [Search for domain] > Click the new target group (the one with 0%) > Register Targets
You can get the target IP by looking at the targets in the v2 service's target group.
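If you prefer the AWS CLI over clicking through the console, the same registration can be scripted. This is a sketch: the target-group ARN below is a hypothetical placeholder (look yours up with `aws elbv2 describe-target-groups`), and the IP is the v2 ClusterIP from Step 1. The command is built and printed first so it can be reviewed before running:

```shell
# Sketch: register the v2 ClusterIP as a target in the new 0%-weight target group.
# TG_ARN is a hypothetical placeholder; find the real one via `aws elbv2 describe-target-groups`.
TG_ARN="arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/v2-teleporter/0123456789abcdef"
V2_CLUSTER_IP="123.10.28.114"   # ClusterIP of the v2 service (Step 1)

REGISTER_CMD="aws elbv2 register-targets --target-group-arn ${TG_ARN} --targets Id=${V2_CLUSTER_IP},Port=8080"
echo "${REGISTER_CMD}"
# eval "${REGISTER_CMD}"
# aws elbv2 describe-target-health --target-group-arn "${TG_ARN}"   # then verify the target turns healthy
```

After registering, give the health checks a minute and confirm the target is healthy before shifting any weight.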
Step 4: Configure v2 Helm for ALB target group ordering (Standby Mode)
Update your v2 Helm configuration in auth-service-v2-dev namespace:
ingress:
  enabled: true
  alb: true
  annotations:
    # ... existing configuration ...
    alb.ingress.kubernetes.io/group.name: dev-internal-alb
+   alb.ingress.kubernetes.io/group.order: "10" # Evaluated before v1's rules; different host, so no conflict
  ingressClassName: dev-alb-public
  hosts:
    - host: auth-service-v2.dev.mycompany.com
      paths:
        - path: /*
          # ... existing configuration ...
Step 5: Apply the Helm changes to both apps and monitor
Apply the Helm changes and monitor the applications. There should not be any impact to the applications or existing traffic.
Migration
Phase 1: Start Canary (10% traffic to v2)
Change the weight values and apply the Helm configuration to the v1 application:
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    "type": "forward",
    "forwardConfig": {
      "targetGroups": [
        {
          "serviceName": "auth-service-v1-dev-auth-service",
          "servicePort": "8002",
+         "weight": 90
        },
        {
          "serviceName": "auth-service-v2-teleporter",
          "servicePort": "8080",
+         "weight": 10
        }
      ]
    }
  }
Phase 2: Increase v2 Traffic (30% traffic to v2)
Same as above:
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    "type": "forward",
    "forwardConfig": {
      "targetGroups": [
        {
          "serviceName": "auth-service-v1-dev-auth-service",
          "servicePort": "8002",
+         "weight": 70
        },
        {
          "serviceName": "auth-service-v2-teleporter",
          "servicePort": "8080",
+         "weight": 30
        }
      ]
    }
  }
Phase 3: Balanced split (50% traffic to v2)
Same as above:
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    "type": "forward",
    "forwardConfig": {
      "targetGroups": [
        {
          "serviceName": "auth-service-v1-dev-auth-service",
          "servicePort": "8002",
+         "weight": 50
        },
        {
          "serviceName": "auth-service-v2-teleporter",
          "servicePort": "8080",
+         "weight": 50
        }
      ]
    }
  }
Phase 4: Majority traffic to v2 (80% traffic to v2)
Same as above:
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    "type": "forward",
    "forwardConfig": {
      "targetGroups": [
        {
          "serviceName": "auth-service-v1-dev-auth-service",
          "servicePort": "8002",
+         "weight": 20
        },
        {
          "serviceName": "auth-service-v2-teleporter",
          "servicePort": "8080",
+         "weight": 80
        }
      ]
    }
  }
Phase 5: Full traffic to v2 (100% traffic to v2)
Same as above:
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    "type": "forward",
    "forwardConfig": {
      "targetGroups": [
        {
          "serviceName": "auth-service-v1-dev-auth-service",
          "servicePort": "8002",
+         "weight": 0
        },
        {
          "serviceName": "auth-service-v2-teleporter",
          "servicePort": "8080",
+         "weight": 100
        }
      ]
    }
  }
Phase 6: Final DNS Switch
Since we still want to use the existing DNS for v2, we need to change the host value in the v2 namespace to the current production DNS and update the group order relative to v1.
Step 1: Update v2 to take over the production DNS
Change the v2 group order, set the host value to auth-service.dev.mycompany.com, and apply the change.
Update v2 helmfile.yaml
- name: auth-service-v2-production
  namespace: auth-service-v2-dev
  values:
    - applications:
        - name: auth-service-v2
          ingress:
            enabled: true # Now enable v2 ingress
            ingressClassName: dev-alb-public
            annotations:
              alb.ingress.kubernetes.io/group.name: dev-internal-alb
+             alb.ingress.kubernetes.io/group.order: '200' # HIGHER than v1
              # ... other annotations ...
            hosts:
              - host: auth-service.dev.mycompany.com # SAME as v1 production DNS
                paths:
                  - path: /*
                    pathType: ImplementationSpecific
                    backend:
                      service:
                        name: auth-service-v2-dev-auth-service-v2
                        port:
                          name: http
Step 2: Update v1 to Lower priority and Different DNS
Update v1 helmfile.yaml
- name: auth-service
  values:
    - applications:
        - name: auth-service
          ingress:
            annotations:
              alb.ingress.kubernetes.io/group.order: '50' # LOWER than v2
              # Remove the weighted-routing annotation
            hosts:
              - host: auth-service-v1.dev.mycompany.com # DIFFERENT DNS for rollback
Monitor and Observe
Monitor ALB target group health
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
Monitor application metrics
kubectl top pods -n auth-service-v1-dev
kubectl top pods -n auth-service-v2-dev
Check service endpoints
kubectl get endpoints auth-service-v2-teleporter -n auth-service-v1-dev
kubectl get endpoints -n auth-service-v2-dev
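One operational gotcha with hand-written Endpoints: they go stale if the v2 Service is ever recreated and assigned a new ClusterIP, and traffic through the teleporter silently breaks. A small drift check helps; this is a sketch, with the live kubectl lookups left commented out since they need cluster access:

```shell
# Drift check (sketch): the teleporter's manual Endpoints must keep matching the real
# v2 ClusterIP, which changes if the v2 Service is ever deleted and recreated.
check_drift() {
  # $1 = ClusterIP of the real v2 Service, $2 = IP in the teleporter Endpoints
  if [ "$1" = "$2" ]; then
    echo "endpoints in sync"
  else
    echo "DRIFT: proxy=$2 svc=$1 -- re-apply the teleporter Endpoints"
  fi
}

# Live lookups (require cluster access):
# V2_IP=$(kubectl get svc auth-service-v2-dev-auth-service-v2 -n auth-service-v2-dev -o jsonpath='{.spec.clusterIP}')
# PROXY_IP=$(kubectl get endpoints auth-service-v2-teleporter -n auth-service-v1-dev -o jsonpath='{.subsets[0].addresses[0].ip}')
# check_drift "$V2_IP" "$PROXY_IP"

check_drift 123.10.28.114 123.10.28.114   # prints: endpoints in sync
```

Running this periodically (or from CI) during the migration window catches the stale-Endpoints failure mode early.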
Monitor logs
# v1
kubectl logs -n auth-service-v1-dev -l app.kubernetes.io/name=auth-service-v1-dev --tail=100 -f
# v2
kubectl logs -n auth-service-v2-dev -l app.kubernetes.io/name=auth-service-v2-dev --tail=100 -f
Rollback
The beauty of this approach is the instant rollback capability. Simply flip the group order priorities:
1. Change the v1 Helmfile group order and weights back to v1
Here we move v1's group order ahead of v2's (a lower order number is evaluated first) and route traffic back to the service in the v1 namespace. Applying this immediately rolls the application back to the v1 namespace.
name: auth-service
# ... existing configuration ...
ingress:
  enabled: true
  annotations:
    # ... existing configuration ...
    alb.ingress.kubernetes.io/target-group-attributes: slow_start.duration_seconds=120,deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/target-type: ip
+   alb.ingress.kubernetes.io/group.order: "60" # Lower order than v2's, so v1's rule is evaluated first
    alb.ingress.kubernetes.io/actions.weighted-routing: |
      {
        "type": "forward",
        "forwardConfig": {
          "targetGroups": [
            {
              "serviceName": "auth-service-v1-dev-auth-service",
              "servicePort": "8002",
+             "weight": 100
            },
            {
              "serviceName": "auth-service-v2-teleporter",
              "servicePort": "8080",
+             "weight": 0
            }
          ]
        }
      }
  hosts:
    - host: auth-service.dev.mycompany.com
      paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
+             name: weighted-routing
              port:
+               name: use-annotation
2. Change the v2 Helmfile group order and host back to the v2 DNS
We also adjust v2's group order and change its host value (DNS) back, to make sure the two ingresses don't conflict.
ingress:
  enabled: true
  alb: true
  annotations:
    # All annotations below configure the target group, not the ALB itself
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS":443}]'
    # Protocol for target group health check and routing
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-path: "/actuator/health/readiness"
    alb.ingress.kubernetes.io/healthcheck-port: "9000"
+   alb.ingress.kubernetes.io/group.order: '40'
    # Health check success codes
    alb.ingress.kubernetes.io/success-codes: "200-301"
    alb.ingress.kubernetes.io/target-group-attributes: "slow_start.duration_seconds=120,deregistration_delay.timeout_seconds=30"
    alb.ingress.kubernetes.io/target-type: ip
    # Name of the shared ingress group
    alb.ingress.kubernetes.io/group.name: dev-internal-alb
  ingressClassName: dev-alb-public
  hosts:
+   - host: auth-service-v2.dev.mycompany.com
      paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
              name: auth-service-v2-dev-auth-service-v2
              port:
                name: http
Post-Migration Cleanup
After a successful migration and DNS switch, perform the following steps to clean up the existing v1:
Remove v1 resources from EKS (ingress & deployment)
Scale down v1 deployment
kubectl scale deployment auth-service-v1-dev-auth-service -n auth-service-v1-dev --replicas=0
Delete ingress
kubectl delete ingress auth-service-weighted-ingress -n auth-service-v1-dev
Remove the proxy service (no longer needed)
kubectl delete service auth-service-v2-teleporter -n auth-service-v1-dev
kubectl delete endpoints auth-service-v2-teleporter -n auth-service-v1-dev
Update the monitoring and alerting dashboards to point to the v2 namespace
Keep an eye on traffic and dashboards for any anomalies
Verification commands
Before migration
while true; do
echo -n "$(date '+%H:%M:%S') - "
curl -s -w " Status: %{http_code}\n" --location 'https://auth-service.dev.mycompany.com/ping'
sleep 1
done
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
During migration
while true; do
echo -n "$(date '+%H:%M:%S') - "
curl -s -w " Status: %{http_code}\n" --location 'https://auth-service.dev.mycompany.com/ping'
sleep 1
done
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V2":true} Status: 200 <== occasional response from v2: weighted traffic distribution
15:00:52 - {"V2":true} Status: 200 <== occasional response from v2: weighted traffic distribution
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
15:00:52 - {"V1":true} Status: 200
After Migration
while true; do
echo -n "$(date '+%H:%M:%S') - "
curl -s -w " Status: %{http_code}\n" --location 'https://auth-service.dev.mycompany.com/ping'
sleep 1
done
15:00:52 - {"V2":true} Status: 200
15:00:52 - {"V2":true} Status: 200
15:00:52 - {"V2":true} Status: 200
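To turn those sampled responses into an observed split you can compare against the configured weights, a small awk pass over the loop's output is enough. This is a sketch with sample data inlined; in practice, tee the curl loop's output to a file and point awk at that:

```shell
# Count V1 vs V2 responses and compute the observed v2 percentage.
# Sample data inlined for illustration; replace with your real captured output.
cat <<'EOF' > /tmp/ping-samples.txt
15:00:52 - {"V1":true} Status: 200
15:00:53 - {"V1":true} Status: 200
15:00:54 - {"V2":true} Status: 200
15:00:55 - {"V1":true} Status: 200
EOF

awk '/"V1"/ {v1++} /"V2"/ {v2++} END {printf "v1=%d v2=%d v2_pct=%.0f%%\n", v1, v2, 100*v2/(v1+v2)}' /tmp/ping-samples.txt
# -> v1=3 v2=1 v2_pct=25%
```

With enough samples, the observed v2 percentage should converge toward the weight configured in the current phase; a large, persistent gap is a signal to check the target group before shifting more traffic.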
Key Benefits
True Zero Downtime: Traffic never stops flowing during migration
Gradual Risk Mitigation: Start with 10% traffic and observe
Instant Rollback: One configuration change reverts everything
Cross-Namespace Support: Works despite ALB controller limitations
Production Battle-Tested: Handles real-world complexity
Lessons Learned
ExternalName services don't work with ALB - manual endpoints are required
Group order is crucial for ALB rule precedence
Monitor target group health throughout the process
Keep rollback DNS ready for emergency situations
Clean up proxy services after successful migration
Conclusion
This teleporter proxy pattern solves a real limitation in Kubernetes networking while providing the safety and control needed for production migrations. The approach scales to any cross-namespace migration scenario and provides the operational confidence teams need when moving critical services.
The combination of weighted routing, namespace bridging, and priority-based rollback creates a robust migration framework that minimizes risk while maximizing control. Whether you're migrating between versions, namespaces, or even clusters, these patterns provide a solid foundation for zero-downtime operations.
Note: This is not the only solution; there are many ways to achieve the same thing. As the saying goes, "there is more than one way to fry a fish." Given our circumstances and setup, however, this implementation delivered the desired result. For example, this is much easier if your cluster has the Gateway API or a service mesh like Istio deployed; you could also do the same with a separate ALB. All of these have their pros and cons. For our case, this approach was practical and safe for a one-time migration, and required no major infra changes.
Have you implemented similar cross-namespace migration patterns? Share your experiences and alternative approaches in the comments below.