Deploy

Take OrthID to production on Kubernetes: the data-plane and control-plane split, a complete Helm values file, sizing guidance, TLS, and the checks that confirm a healthy rollout.

This page covers a single-region production deployment. If you have not run OrthID at all yet, start with Self-host for prerequisites and a local stack. To run more than one region, deploy this topology once per region and read Regions.

Topology: data plane and control plane

OrthID separates two responsibilities so they can scale and fail independently.

Data plane: the request-path services that verify sessions, issue tokens, mint agent credentials, and serve sign-in. This is the hot path. It is horizontally scaled and is the only tier that must be reachable from your apps. It holds open connections to Postgres and your key provider but keeps no local state.
Control plane: the management services behind the Operator and Tenant consoles, plus background workers (audit archiving, key rewrap, SCIM sync, scheduled exports). It tolerates brief downtime without affecting live sign-ins. Run fewer replicas; do not expose it to the public internet.

Both planes share one Postgres database and one key provider. Keeping them on separate deployments means a surge of console or batch activity cannot starve the sign-in path, and you can roll or scale each independently.

Note

Identity records and signing keys stay in this deployment’s home region. The data plane and control plane both run inside the region named by the region value; nothing replicates out of it. See Regions for multi-region patterns.

Helm values

The chart from Self-host exposes both planes. Below is a production-shaped values file. Sensitive connection strings come from an existing Kubernetes secret, not from this file.

values.yaml

region: au-syd-1
publicUrl: https://id.acme.health

# Connection strings and other secrets live in a Secret, not here.
existingSecret: orthid

secrets:
  provider: vault          # customer-managed keys; see the BYOK guide.

# Data plane: the hot sign-in / verify / token path.
dataPlane:
  replicaCount: 4
  autoscaling:
    enabled: true
    minReplicas: 4
    maxReplicas: 20
    targetCPUUtilizationPercentage: 65
  resources:
    requests: { cpu: "500m", memory: "512Mi" }
    limits:   { cpu: "2",    memory: "1Gi" }

# Control plane: consoles and background workers.
controlPlane:
  replicaCount: 2
  resources:
    requests: { cpu: "250m", memory: "512Mi" }
    limits:   { cpu: "1",    memory: "1Gi" }

storage:
  endpoint: https://s3.ap-southeast-2.amazonaws.com
  bucket: orthid-prod-au

ingress:
  enabled: true
  className: nginx
  host: id.acme.health
  tls:
    enabled: true
    secretName: orthid-tls   # cert-manager populates this

Apply it with a pinned chart version. Migrations run as a pre-upgrade hook:

Terminal

helm upgrade --install orthid orthid/orthid \
  --namespace orthid --create-namespace \
  --values values.yaml \
  --version 1.8.0

Sizing

OrthID is light on the request path because sessions.verify() validates JWTs locally and adds no network hop. The figures below are starting points for the data plane; measure and let autoscaling do the rest.

Profile	Scope	Pods	Sizing notes
Up to 100k MAU	data plane	4 pods	0.5 vCPU / 512Mi each. Postgres: 2 vCPU, 8Gi, 100Gi disk.
Up to 1M MAU	data plane	6 to 12 pods	Autoscale to 12. Postgres: 4 vCPU, 16Gi, with a read replica for the control plane.
Control plane	any size	2 pods	Consoles and workers. Scale by background job volume (SCIM, exports), not by sign-in traffic.
Postgres	shared	-	The capacity constraint at scale. Size connections and IOPS for peak sign-in bursts; enable PITR.
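
As a rough capacity check, replica counts can be multiplied by the per-pod requests from the values file to see what the node pool has to hold. The sketch below is plain POSIX shell; requests_total is a hypothetical helper for illustration, not part of OrthID or its chart.

```shell
# Total resource requests for one plane at a given replica count.
# Args: replicas, CPU request per pod (millicores), memory request per pod (Mi).
requests_total() {
  replicas=$1 cpu_m=$2 mem_mi=$3
  echo "$((replicas * cpu_m))m $((replicas * mem_mi))Mi"
}

# Data plane from the values file: 500m / 512Mi per pod.
requests_total 4 500 512    # autoscaling floor (minReplicas)
requests_total 20 500 512   # autoscaling ceiling (maxReplicas)

# Control plane from the values file: 2 pods at 250m / 512Mi.
requests_total 2 250 512
```

At the autoscaling ceiling the data plane alone requests 10 vCPU and 10Gi of memory, so the node pool needs at least that much schedulable headroom for the autoscaler to reach maxReplicas.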

Postgres is the limiter

The data plane is stateless and cheap to scale; the database is what you tune. Give it headroom on connections and IOPS, keep it in the same region and availability zone family as the pods, and turn on point-in-time recovery before go-live.
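
One way to reason about connection headroom is to work out the worst case: every pod on both planes filling its connection pool at once. The sketch below assumes a per-pod pool size of 10, which is a placeholder rather than a documented OrthID default; pg_connection_budget is a hypothetical helper.

```shell
# Peak Postgres connections if every pod fills its pool.
# Args: max data-plane pods, control-plane pods, pool size per pod (assumed value).
pg_connection_budget() {
  echo $(( ($1 + $2) * $3 ))
}

# 20 data-plane pods (maxReplicas) + 2 control-plane pods, 10 connections each.
pg_connection_budget 20 2 10
```

Compare the result with the Postgres max_connections setting and leave room for migrations, backups, and administrative sessions.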

TLS

All traffic to OrthID must be HTTPS: the API rejects plaintext, and session cookies carry the Secure attribute. Terminate TLS at your ingress. The values file above expects a certificate in the orthid-tls secret, which cert-manager can issue and renew automatically.

certificate.yaml

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: orthid-tls
  namespace: orthid
spec:
  secretName: orthid-tls
  dnsNames:
    - id.acme.health
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

Set publicUrl to the exact HTTPS origin OrthID is reached at. It is baked into token issuer and audience claims and into redirect URLs, so a mismatch breaks sign-in and SSO callbacks.
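
A pre-deploy check can catch the most common mismatches before they reach production. The sketch below assumes that "exact origin" means scheme plus host (and optional port) with no path or trailing slash, as in the values file; is_https_origin is a hypothetical helper, not an OrthID tool.

```shell
# Succeeds only for a bare HTTPS origin: https://host[:port],
# with no path, trailing slash, query string, or fragment.
is_https_origin() {
  case "$1" in
    https://*) ;;
    *) return 1 ;;
  esac
  rest=${1#https://}
  case "$rest" in
    ""|*/*|*"?"*|*"#"*) return 1 ;;
  esac
  return 0
}

# Values copied from values.yaml: publicUrl and ingress.host.
public_url="https://id.acme.health"
ingress_host="id.acme.health"

if is_https_origin "$public_url" && [ "${public_url#https://}" = "$ingress_host" ]; then
  echo "publicUrl matches ingress host"
else
  echo "publicUrl does not match ingress host" >&2
fi
```

The comparison against ingress.host only applies when the ingress serves OrthID on the default HTTPS port; adjust it if publicUrl carries an explicit port.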

Post-deploy verification

After the rollout, confirm both planes are healthy before sending real traffic. Check that pods are ready, that the data plane reports all dependencies connected, and that a TLS sign-in round trip works.

Terminal

# 1. Both planes rolled out and ready.
kubectl -n orthid get pods
kubectl -n orthid rollout status deploy/orthid-data-plane
kubectl -n orthid rollout status deploy/orthid-control-plane

# 2. Migrations applied at the expected schema version.
kubectl -n orthid logs job/orthid-migrate | tail -n 5

# 3. Readiness: database, storage, and secrets all connected.
curl -s https://id.acme.health/readyz | jq
# { "status": "ready", "region": "au-syd-1",
#   "checks": { "database": "ok", "storage": "ok", "secrets": "ok" } }

# 4. TLS terminates and the right origin is served.
curl -sI https://id.acme.health/healthz | head -n 1
# HTTP/2 200

Point probes at /readyz

Point your ingress health checks and Kubernetes readiness probes at /readyz, not /healthz. A pod can be alive but unable to reach Postgres or the key provider; routing to it would fail sign-ins. /readyz returns ready only when every dependency is connected.
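
The same endpoint can gate a deploy pipeline before traffic is shifted. The sketch below assumes the /readyz response shape shown in the verification output above and requires curl and jq; readyz_ok and wait_until_ready are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```shell
# Reads a /readyz response on stdin. Succeeds only if status is "ready"
# and every entry under "checks" is "ok". Requires jq.
readyz_ok() {
  jq -e '.status == "ready" and ([.checks[]?] | all(. == "ok"))' >/dev/null
}

# Polls <base URL>/readyz until readyz_ok passes or the attempts run out.
# Args: base URL, attempts (default 30), delay in seconds (default 5).
wait_until_ready() {
  url=$1 attempts=${2:-30} delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS --max-time 5 "$url/readyz" | readyz_ok; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

In a pipeline, a step such as wait_until_ready https://id.acme.health || exit 1 would block promotion until the data plane reports every dependency connected.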

Next steps

Regions to run OrthID in more than one region and pin tenants to a residency boundary.
Upgrade for safe version upgrades, migration hooks, and rollback.