
ArgoCD Part 5 — Sync Strategies: Auto Sync, Self Heal, and Order Control

· 6 min read
ArgoCD Series (5/8)
  1. ArgoCD Part 1 — What Is ArgoCD
  2. ArgoCD Part 2 — Installation and Initial Setup
  3. ArgoCD Part 3 — Registering an Application
  4. ArgoCD Part 4 — Kustomize and Helm
  5. ArgoCD Part 5 — Sync Strategies: Auto Sync, Self Heal, and Order Control
  6. ArgoCD Part 6 — Multi-Cluster and ApplicationSet
  7. ArgoCD Part 7 — RBAC and SSO: Team-Level Access Control
  8. ArgoCD Part 8 — Practical Patterns: App of Apps, CI Integration, Troubleshooting
Isn’t Sync Just Pressing a Button?

When you first use ArgoCD, it looks like the Sync button solves everything. Push manifests to Git, press Sync in the UI, and it’s reflected in the cluster. Clean and simple.

But once you enter a real production environment, questions start piling up. Do I have to sync manually every time? What happens if someone modifies things directly with kubectl? Are resources that are no longer needed automatically deleted? ConfigMap needs to be deployed before Deployment — how do I control the order?

The answers to these questions are ArgoCD’s sync strategies. Beyond a simple “apply” button, there’s a system for fine-tuning the level of automation and safety of deployments.

Auto Sync — Automatic Synchronization

By default, ArgoCD only detects Git changes but doesn’t automatically sync. Manual sync being the default is an intentional design choice — if automatic deployment is set up for production, a single mistake could lead to an incident.

But the story is different for development or staging environments. In most cases, you want changes reflected automatically with each commit, and that’s when you enable Auto Sync.

Add syncPolicy.automated to the Application manifest.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/my-app.git
    targetRevision: main
    path: k8s/overlays/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: false
      selfHeal: false

With just this setting, ArgoCD automatically initiates a sync whenever the Git and cluster states diverge. The default polling interval is 3 minutes, but connecting a Git webhook can make the response nearly real-time.

Auto Sync comes with two sub-options: prune and selfHeal. These are critical options to consider when enabling Auto Sync, so let’s look at them right away.

Prune — Should Deletion Be Automatic Too?

Suppose you delete a manifest file from Git. The corresponding resource is still alive in the cluster. ArgoCD marks it as OutOfSync but doesn’t delete it by default.

Setting prune: true automatically removes resources from the cluster that have been deleted from Git.

syncPolicy:
  automated:
    prune: true

Convenient but potentially dangerous. It also means that if someone accidentally deletes a manifest file, the production resource disappears. So typically, teams use prune: true for dev/staging and prune: false for production with manual cleanup.

If you want to exclude specific resources from pruning, you can use an annotation.

metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false

Resources with this annotation will remain in the cluster even if they’re removed from Git. Applying this to resources like PersistentVolumeClaims that shouldn’t be carelessly deleted adds an extra safety net.
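As a concrete sketch, a protected PersistentVolumeClaim might look like the following (the claim name, namespace, and size are placeholders, not values from this series):

```yaml
# Hypothetical PVC excluded from pruning; name and size are illustrative
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: my-app
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Even if this file is later removed from Git, ArgoCD will report the claim as orphaned rather than deleting it and its data.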

Self Heal — Automatic Drift Recovery

During cluster operations, it’s not uncommon for someone to modify resources directly with kubectl edit. Whether it’s an urgent incident response or a mistake, changes not reflected in Git create drift between Git and the cluster.

selfHeal: true detects this drift and automatically reverts to the Git state.

syncPolicy:
  automated:
    prune: true
    selfHeal: true

With Self Heal enabled, even if someone changes the replica count via kubectl, ArgoCD reverts it to the original value within seconds. Think of it as a feature that enforces the core GitOps principle that “Git is the only source of truth.”

There’s a caveat, though. It can conflict with HPA (Horizontal Pod Autoscaler). HPA might increase replicas, only for Self Heal to reduce them again — creating a loop. In this case, either remove the spec.replicas field from the Git manifests entirely, or configure ArgoCD to ignore that specific field.

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas

This tells ArgoCD to ignore differences in the replicas field, allowing it to peacefully coexist with HPA.
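For context, an HPA managing the same Deployment might be sketched as follows (the min/max replica counts and CPU threshold are illustrative assumptions):

```yaml
# Illustrative HPA that owns my-app's replica count; thresholds are placeholders
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With the ignoreDifferences setting in place, the HPA is free to scale this Deployment up and down without Self Heal fighting it.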

Sync Wave — Controlling Deployment Order

There are dependency relationships between Kubernetes resources. A Namespace must exist before you can create resources inside it, and ConfigMaps or Secrets should be ready before a Deployment comes up for it to function properly.

ArgoCD controls this order through a mechanism called Sync Wave. Assign a wave number to each resource, and they’re synced sequentially from the lowest number first.

Let’s look at the execution order per wave and which resources are deployed at each step.

flowchart LR
    Wm1["Wave -1\nCRD definitions"]
    W0["Wave 0\nNamespace"]
    W1["Wave 1\nConfigMap, Secret"]
    W2["Wave 2\nDeployment, StatefulSet"]
    W3["Wave 3\nService, Ingress"]

    Wm1 -->|"After Healthy check"| W0
    W0 -->|"After Healthy check"| W1
    W1 -->|"After Healthy check"| W2
    W2 -->|"After Healthy check"| W3

A wave is considered complete only when all of its resources reach Healthy status; only then does the next wave begin. The following example manifests show how this ordering is expressed through annotations.

# Wave 0: Namespace first
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "0"

---
# Wave 1: ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "1"
data:
  DATABASE_HOST: "db.example.com"
  LOG_LEVEL: "info"

---
# Wave 2: Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-org/my-app:latest
          envFrom:
            - configMapRef:
                name: app-config

---
# Wave 3: Service
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "3"
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080

Resources with the same wave number are synced simultaneously, and all resources in that wave must be healthy before the next wave begins. If no annotation is specified, the default value is 0.

You can also use negative wave numbers. Assigning -1 to resources like CRDs that need to be processed before everything else is a useful technique.
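A minimal sketch of that technique (the widgets.example.com CRD is invented purely for illustration):

```yaml
# Hypothetical CRD pinned to wave -1 so it is applied before all other resources
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
    singular: widget
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```

Custom resources that depend on this definition can then safely live in wave 0 or later.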

Sync Hook — Running Tasks Before and After Sync

Sometimes you need to run DB migrations before deployment or execute smoke tests after deployment. ArgoCD uses Sync Hooks to run Jobs or Pods at specific points in the sync lifecycle.

Let’s look at when hooks are executed and in what order across the entire lifecycle.

sequenceDiagram
    participant U as User/Auto Trigger
    participant A as ArgoCD
    participant K as Kubernetes

    U->>A: Sync start
    A->>K: Create PreSync Job (e.g., DB migration)
    K-->>A: PreSync success
    A->>K: Apply Sync resources (in wave order)
    Note over K: wave 0 → 1 → 2... sequential deployment
    K-->>A: All resources Healthy
    A->>K: Create PostSync Job (e.g., smoke test, notification)
    K-->>A: PostSync success
    A-->>U: Synced state complete

    Note over A: SyncFail Hook fires on failure

The supported hook types are:

| Hook | Execution Timing |
| --- | --- |
| PreSync | Before sync starts. Ideal for DB migrations and schema changes |
| Sync | Applied alongside main resources |
| PostSync | After all resources reach Healthy status. For smoke tests, notification delivery |
| SyncFail | When sync fails. For rollback notifications or cleanup tasks |

Here’s an example of a DB migration as a PreSync Hook.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  namespace: my-app
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: my-org/db-migrate:latest
          command: ["./migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
      restartPolicy: Never
  backoffLimit: 3

hook-delete-policy determines the cleanup policy for Hook resources. Setting HookSucceeded automatically deletes the Job after it succeeds, preventing completed Jobs from piling up. Using BeforeHookCreation deletes the previous Hook first and creates a new one on the next sync.
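Switching to that behavior only changes the annotation value, for example:

```yaml
# Sketch: recreate the hook Job fresh on every sync instead of
# deleting it only after success
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
```

BeforeHookCreation is handy when you want the last run's Job to stay around for log inspection until the next sync replaces it.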

Sending a notification after deployment via PostSync is also a common pattern.

apiVersion: batch/v1
kind: Job
metadata:
  name: notify-deploy
  namespace: my-app
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - -H
            - "Content-Type: application/json"
            - -d
            - '{"text":"my-app deployment complete."}'
            - https://hooks.slack.com/services/T00/B00/xxxx
      restartPolicy: Never
  backoffLimit: 1

You can also combine Hooks with Sync Waves. Adding wave numbers to PreSync Hooks lets you control the order even among multiple PreSync tasks.
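As a hedged sketch of that combination, the two hypothetical Jobs below both run in the PreSync phase, with the backup (wave 0) guaranteed to finish before the migration (wave 1) starts; the image names and commands are placeholders:

```yaml
# Hypothetical PreSync Jobs ordered by sync-wave: backup before migration
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
  namespace: my-app
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "0"
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: backup
          image: my-org/db-backup:latest
          command: ["./backup", "create"]
      restartPolicy: Never
---
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  namespace: my-app
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "1"
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: my-org/db-migrate:latest
          command: ["./migrate", "up"]
      restartPolicy: Never
```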

Retry Policy — Retrying on Failure

It's not uncommon for temporary issues like network blips or API server outages to cause sync failures. Since manually re-syncing every time is inefficient, setting up a Retry policy enables automatic retries.

syncPolicy:
  automated:
    prune: true
    selfHeal: true
  retry:
    limit: 5
    backoff:
      duration: 5s
      factor: 2
      maxDuration: 3m

This configuration retries up to 5 times on sync failure, with the first retry after 5 seconds, the second after 10 seconds, the third after 20 seconds, and so on — doubling the interval each time. The maximum wait time is capped at 3 minutes.

The reason for using an exponential backoff strategy is simple. If the API server is overloaded, retrying at short intervals only makes the situation worse. Gradually increasing the interval gives the server time to recover.

Sync Option Combinations — Per-Environment Strategies

Let’s organize how to combine the options we’ve covered for each environment.

For development environments, fast feedback is the priority. You want commits to be reflected immediately and unnecessary resources to be cleaned up automatically.

syncPolicy:
  automated:
    prune: true
    selfHeal: true
  retry:
    limit: 3
    backoff:
      duration: 5s
      factor: 2
      maxDuration: 1m

Staging typically maintains a flow similar to production while keeping auto sync enabled. Whether to use Prune depends on the situation.

syncPolicy:
  automated:
    prune: false
    selfHeal: true
  retry:
    limit: 5
    backoff:
      duration: 10s
      factor: 2
      maxDuration: 3m

Production requires caution. The most conservative option is to drop the automated block entirely and sync manually after review. If you want drift protection, keep in mind that selfHeal is a sub-option of automated sync and only works with it enabled — the configuration below keeps automated sync on with pruning disabled, so drift is reverted automatically while deletions still require a deliberate manual step.

syncPolicy:
  automated:
    prune: false
    selfHeal: true
  retry:
    limit: 5
    backoff:
      duration: 30s
      factor: 2
      maxDuration: 5m

Of course, this is just a guideline, not a definitive answer. Depending on the team’s maturity and CI/CD pipeline setup, some organizations use Auto Sync in production, while other teams prefer manual sync even in development environments.


Sync strategies are the backbone of ArgoCD operations. Which options you enable or disable determines the level of deployment automation and safety. In the next part, we’ll look at how to manage multiple clusters with a single ArgoCD instance and how to automate repetitive Application definitions with ApplicationSet.

Part 6: Multi-Cluster and ApplicationSet



