Files
dokploy/docs/design/forward-auth-sso.md
T
Mauricio Siu 41c09cd86b feat: implement forward authentication settings and UI components
- Added a new `forward_auth_settings` table to manage authentication domains and their configurations.
- Introduced UI components for handling forward authentication, including enabling/disabling SSO for domains and selecting SSO providers.
- Updated existing tests to include validation for the new `forwardAuthProviderId` field in domain configurations.
- Enhanced the dashboard to integrate forward authentication management, allowing users to configure SSO settings directly from the application interface.

This update improves the flexibility and security of application authentication by allowing integration with various identity providers.
2026-06-02 01:47:50 -06:00

22 KiB
Raw Blame History

Design: SSO Forward Auth for Deployed Applications (Enterprise)

Status: Approved for implementation (v1) — branch feat/forward-auth-sso Author: Engineering Date: 2026-05 Audience: Internal + enterprise customer requesting the feature

Decisions locked for v1

  • Auth gate: Option A — integrate oauth2-proxy (do not build our own auth server).
  • OIDC source: reuse the existing sso_provider table (read-only from this feature).
  • Auth domain per server, modeled as a domains row: because each server is an isolated swarm with its own proxy (§6.1), each server has its own auth domain (e.g. auth-prod.acme.com) hosting that server's single oauth2 callback, registered once in the IdP per server. The auth domain is a row in the existing domains table with domainType: "forwardAuth" and a serverId (null = local host) — so it inherits certificates, TLS and the domain pipeline like any app domain, instead of a separate settings table. There is no forward_auth_settings table.
  • domains.forwardAuthProviderId (FK → sso_provider.providerId, ON DELETE set null) marks an app domain as protected by a provider; deleting the provider auto-unprotects the domain. This is distinct from the forwardAuth domain row, which is the gate itself.
  • Why per-server (not one global auth domain): a single auth.acme.com would resolve (DNS) to one server only, and the forwardAuth check runs over each server's internal network — a remote server can't reach another server's proxy without exposing it publicly. Per-server keeps every server autonomous (local forwardAuth, no cross-server traffic). The cost is one IdP callback per server, which is acceptable.
  • Shared base domain assumption: the auth domain and the protected apps on a server share a base domain, so the session cookie (scoped to baseDomain) works across that server's apps. Apps outside that base are out of scope for v1.
  • Client secret at rest: deferred — the clientSecret stays unencrypted in oidcConfig for v1 (same as today). Tracked as security debt in §10.
  • oauth2-proxy quirks handled: --insecure-oidc-allow-unverified-email (many IdPs send email_verified=false → otherwise a 500), and whitelist-domains = baseDomain (oauth2-proxy has no universal wildcard; the base domain covers every app under it).

1. Problem statement

An enterprise customer wants to place an SSO authentication gate in front of each deployed application (the apps/compose services that Dokploy publishes through Traefik), so that an unauthenticated visitor must log in against the company's IdP (OIDC) before reaching the app. This should be an enterprise-only feature, and ideally should reuse the OIDC information we already store.

In short: "Can we sit an SSO layer between Traefik and each application, reusing the OIDC tables?"

Answer: yes, it's feasible. Traefik supports this natively via the forwardAuth middleware. The hard part is not Traefik — it's the auth proxy service that performs the OIDC flow. This doc compares the two viable ways to build that service, and confirms what we can and cannot reuse from the existing SSO tables.


2. Critical clarification: what our OIDC data actually is

The existing sso_provider table (packages/server/src/db/schema/sso.ts:7) is owned by the better-auth SSO plugin. It exists so that users can log into the Dokploy dashboard against an external IdP (Dokploy acts as an OIDC/SAML client).

It stores, as JSON text columns:

  • oidcConfigclientId, clientSecret, authorizationEndpoint, tokenEndpoint, userInfoEndpoint, jwksEndpoint, discoveryEndpoint, scopes, pkce, and a mapping for user fields.
  • samlConfig — full SAML SP/IdP metadata.
  • issuer, providerId (unique), domain, organizationId, userId.

⚠️ Important security note: the clientSecret lives inside the oidcConfig text column as plain JSON and is not encrypted at rest in the schema. Reusing this data for a second purpose (see §4) means that secret gets read and re-injected into another service's config. That widens its blast radius and must be called out to the customer. If we reuse it we should seriously consider encrypting it at rest as part of this work.

Key point for the customer: this data describes Dokploy-as-an-OIDC-client. To protect their applications, we need a separate component (an auth proxy) that runs the OIDC authorization-code flow on behalf of the protected app — handle login redirect, callback, token validation, session cookie, and logout. better-auth's SSO plugin does not do this for third-party apps behind Traefik; it only logs users into Dokploy itself.

So "reuse the OIDC tables" is possible at the level of credentials/endpoints (clientId, secret, issuer, scopes), but the runtime behavior (the actual SSO gate) is net-new regardless of approach.


3. How the Traefik side works (the easy half)

Traefik's forwardAuth middleware delegates the auth decision to an external HTTP service. For every request Traefik calls address; a 2xx lets the request through (optionally copying authResponseHeaders back to the app), anything else (typically a 302 to the IdP) is returned to the browser.

Dokploy already has everything needed to wire this up:

Capability Where it lives today Reuse
ForwardAuthMiddleware Traefik type utils/traefik/file-types.ts:659 (address, tls, trustForwardHeader, authResponseHeaders, authRequestHeaders) as-is
Per-domain middleware chain domains.middlewares: text[] column (db/schema/domain.ts) — already exists and is applied as-is
Attaching middleware to a router createDomainLabels() joins domain.middlewares into traefik.http.routers.<name>.middlewares (utils/docker/domain.ts:255) as-is
Writing dynamic middleware YAML createSecurityMiddleware() / writeMiddleware() pattern, local + remote(SSH) (utils/traefik/security.ts, middleware.ts) as pattern
Deploying a helper container/service on the swarm dokploy-redis / dokploy-monitoring / dokploy-traefik setup (setup/redis-setup.ts, monitoring-setup.ts, traefik-setup.ts) on dokploy-network as pattern
Enterprise gating enterpriseProcedure + hasValidLicense() (server/api/trpc.ts:216, services/proprietary/license-key.ts) as-is

So the Dokploy-side glue (UI toggle on a domain → write a forwardAuth middleware → append its name to domains.middlewares → reload Traefik) is small and low-risk. The variable is the auth service that address points to.


4. The decision: where does the auth flow run?

This is the real fork. Both options use the same Traefik forwardAuth wiring from §3; they differ in what sits behind address.

Option A — Integrate an existing forward-auth proxy (oauth2-proxy)

Deploy a battle-tested proxy (e.g. oauth2-proxy or traefik-forward-auth) as a Dokploy-managed Docker service, configured from the OIDC credentials. Dokploy generates the proxy config + the Traefik middleware; the proxy owns the OIDC flow, sessions, and cookies.

Pros

  • The security-critical part (OIDC flow, session/cookie handling, token refresh, logout, CSRF/ state) is mature, audited, and maintained externally.
  • We write config + deployment glue, not an auth server. Far less code.
  • oauth2-proxy supports OIDC discovery, header injection, allowed-domains/groups, and Traefik forwardAuth mode out of the box.
  • Deployment follows an existing pattern (dokploy-monitoring/dokploy-redis style services on dokploy-network, local + remote via serverId).

Cons

  • A new bundled image to ship, version, and update across self-hosted + remote servers.
  • Per-org (or per-app) proxy instance + a session store (cookie-based or Redis) to manage.
  • Less branding control (login is the IdP's; the proxy is mostly invisible, which is usually fine).
  • Mapping our oidcConfig JSON → oauth2-proxy env/flags is a translation layer we must own and keep correct as configs vary (PKCE, custom endpoints, skipDiscovery, etc.).

Option B — Build our own forward-auth service

Write a small Dokploy auth service that implements the OIDC authorization-code flow itself and answers Traefik's forwardAuth calls.

Pros

  • Full control over UX/branding, session model, and how it integrates with Dokploy orgs/permissions.
  • One codebase we fully understand; no third-party image to track.
  • Could share types/utilities with the rest of packages/server.

Cons

  • We are now building and owning an authentication service — sessions, signed/encrypted cookies, CSRF/state/nonce, token validation against JWKS, refresh, logout, clock-skew, replay protection. This is a large, security-sensitive surface that is easy to get subtly wrong.
  • The earlier "~200 LOC service" estimate is unrealistic; a correct implementation is substantially more, plus ongoing security maintenance.
  • We carry the liability for any auth bug in front of customer apps.

Recommendation

Option A (integrate oauth2-proxy). The Traefik wiring is identical either way, so the only thing we're really choosing is whether to own an auth server. For a feature that gates access to customer production apps, delegating the auth flow to a mature project is the lower-risk, lower-cost, faster path. Build our own only if a hard requirement (deep branding, an unusual session model, air-gapped constraints) makes oauth2-proxy unworkable — none is evident yet.


5. Reusing sso_provider (per your decision)

You chose to reuse the existing sso_provider OIDC config rather than add an independent table. That's workable and minimizes setup for the customer, with these caveats to design around:

  1. Semantic coupling. sso_provider currently means "how Dokploy users log into the dashboard." Reusing it for "how app visitors authenticate" overloads it. The IdP/client may legitimately need to differ (different OIDC client, different allowed audience, different redirect URIs — the app's callback, not Dokploy's). Mitigation: treat sso_provider as the source of issuer + base credentials, and add a thin per-domain config (which provider, plus app-specific redirect/allowed-groups) rather than assuming a 1:1 reuse.
  2. Redirect URIs. Each protected app needs its callback registered at the IdP (e.g. https://app.customer.com/oauth2/callback). The dashboard login uses Dokploy's own callback. The customer must add the app callbacks to the same OIDC client, or use a dedicated client. Document this clearly.
  3. Secret handling. As noted in §2, reading clientSecret out of oidcConfig and injecting it into oauth2-proxy means that secret now lives in a second place (proxy config/env on the target server). Recommend encrypting oidcConfig at rest and passing the secret to the proxy via a Docker secret / file mount rather than a plain env var.
  4. better-auth ownership. register currently round-trips through auth.registerSSOProvider() (sso.ts:251); rows may be written by an external auth service. We should read from sso_provider for forward-auth, but avoid mutating it through the forward-auth feature to prevent fighting better-auth over the same rows.

6. Proposed architecture (Option A)

                                         ┌───────────────────────────┐
  Browser ──HTTPS──▶  Traefik  ──forwardAuth──▶  oauth2-proxy (dokploy-managed)
                       │  router for app.customer.com        │
                       │   middlewares=[sso-<provider>]       │  OIDC auth-code flow
                       │                                      ▼
                       │                              Customer IdP (OIDC)
                       │                                      │
                       ◀───── 2xx + X-Auth-* headers ─────────┘
                       │
                       ▼
                 Deployed application

New/changed pieces (all enterprise-gated):

  1. Helper service deployment — a dokploy-forward-auth (oauth2-proxy) Docker service per org (or per server), modeled on monitoring-setup.ts / redis-setup.ts, attached to dokploy-network, supporting local + remote (serverId). Config derived from the chosen sso_provider.oidcConfig.
  2. Traefik middleware generation — a createForwardAuthMiddleware() following the security.ts pattern: write a forwardAuth entry (using ForwardAuthMiddleware from file-types.ts) to the dynamic middlewares file, address pointing at the helper service, with authResponseHeaders for the user identity headers.
  3. Domain wiring — UI toggle "Protect with SSO" on a domain + a field to pick the provider; appends the middleware name to the existing domains.middlewares[] and reloads Traefik. No schema change strictly required for the chain itself; a small column or join is needed to record which provider protects a domain.
  4. tRPC routerforward-auth router under routers/proprietary/, all enterpriseProcedure, with enable/disable-on-domain mutations.

6.1. Remote servers: one proxy per server

This is forced by Dokploy's networking model, not a design preference:

  • Each remote server is its own isolated Docker Swarm (docker swarm init per server, server-setup.ts:381).
  • dokploy-network is an overlay local to each server's swarm (server-setup.ts:438) — it does not span servers. A container on the Dokploy host cannot reach a container on a remote server over dokploy-network.
  • Each server runs its own Traefik (traefik-setup.ts:120); it only routes to services on that same server.

Therefore Traefik on server A can only forwardAuth to a proxy that lives on server A. The deployment model is one dokploy-forward-auth instance per server (host + each remote), exactly mirroring how dokploy-monitoring is already deployed per server via getRemoteDocker(serverId) (monitoring-setup.ts:10). One instance per server still protects all apps on that server (multi-upstream), so it is not one-per-app.

Dokploy host:      dokploy-forward-auth   → protects local apps
Remote server A:   dokploy-forward-auth   → protects A's apps
Remote server B:   dokploy-forward-auth   → protects B's apps

Session scope (v1 = isolated per server): because oauth2-proxy sessions are cookie-based per instance, a user moving between an app on server A and an app on server B may re-authenticate. v1 accepts this. To enable shared SSO later, point all instances at a common cookie domain and the same cookie-secret; v1 stores these in a structured config so flipping to shared mode is config-only, not a refactor.

Lifecycle: deploy/update the proxy per server during the serverSetup flow (server-setup.ts:47) and/or lazily the first time a domain on that server is protected.

6.2. Auth domain per server (the low-friction model)

The first iteration used a per-app callback (https://app/oauth2/callback), which meant: register a callback in the IdP per app, and update the proxy whitelist (a service.update) on every new protected domain. Too manual.

v1 uses one auth domain per server (each server is autonomous — §6.1):

Per server (e.g. "Production"):
1. Admin sets "auth-prod.acme.com" for that server in SSO settings (once).
   → a Traefik router  auth-prod.acme.com/oauth2/*  → that server's oauth2-proxy
   → ONE callback to register in the IdP:  https://auth-prod.acme.com/oauth2/callback

2. app1.acme.com on Production (SSO enabled):
   - no session → forwardAuth 401 → errors middleware 302s the browser to
     https://auth-prod.acme.com/oauth2/sign_in?rd=<app1 url>
   - login at IdP → returns to auth-prod.acme.com/oauth2/callback (the one registered)
   - cookie scoped to .acme.com → redirect back to app1.acme.com ✅

3. app2.acme.com, app3.acme.com on the same server:
   - same flow, same callback, same cookie. ZERO new IdP config, ZERO proxy redeploy. ✅

Why it removes both pain points (within a server):

  • One IdP callback per server: the redirect_uri is always that server's auth-<server>.acme.com/oauth2/callback, configured once per server.
  • No per-app redeploy: cookie + whitelist are scoped to baseDomain, which already covers any new subdomain on that server.

Wiring summary:

  • forward_auth_settings, unique per (organizationId, serverId): authDomain, baseDomain (derived, e.g. .acme.com), https. serverId = null = local host.
  • Proxy env (per server): redirect-url = <scheme>://authDomain/oauth2/callback, cookie-domains = baseDomain, whitelist-domains = baseDomain, per-server cookie-secret.
  • Traefik: a dedicated forward-auth-domain.yml router for authDomain/oauth2/* → proxy on that server; each protected app gets a forwardAuth + an errors middleware that 302s to its server's auth domain login. The middleware resolves which auth domain to use from the app's serverId.

Limitations (out of scope for v1):

  • Apps not under their server's baseDomain won't get shared SSO (cross-domain cookies).
  • SSO is not shared across servers (a user moving between apps on different servers logs in again). True cross-server SSO would require exposing one proxy publicly for cross-server forwardAuth — deliberately avoided for autonomy/latency.

7. Open questions for the customer / product

  • Granularity: protect per domain, per application, or per project/environment?
  • Session scope: single sign-on shared across all protected apps on a base domain, or isolated per app? (Affects cookie domain + whether one proxy instance is shared.)
  • Authorization, not just authentication: do they need group/role-based allow rules (e.g. only group=engineering), or is "any authenticated user from the IdP" enough?
  • Remote servers: must this work on remote (SSH-managed) servers from day one, or local/Dokploy-host only for v1?
  • Logout / session lifetime expectations.
  • Dedicated OIDC client for app protection vs reusing the dashboard-login client.

8. Effort estimate (Option A, design-validated)

Assumes oauth2-proxy, reuse of sso_provider, local + remote support, one provider per domain.

Workstream Rough effort
Helper service deploy (image choice, setup module, local+remote, lifecycle) 35 d
OIDC config → proxy config translation layer (incl. secret handling) 23 d
createForwardAuthMiddleware() + dynamic file write/reload (local+remote) 23 d
Domain wiring + provider linkage (schema touch, labels, enable/disable) 23 d
tRPC router + UI (toggle, provider select, status) 23 d
Security review, encryption-at-rest for secret, testing 34 d
Total ~1421 d

Option B (own auth service) is meaningfully larger — add the full auth-server build plus ongoing security ownership; do not estimate it as a small delta over A.


9. Recommendation summary

  • Feasible: yes. Traefik forwardAuth + Dokploy's existing middleware/deploy patterns make the integration straightforward.
  • Build the gate with oauth2-proxy (Option A), not a hand-rolled auth server.
  • Reuse sso_provider for credentials/endpoints, but add a thin per-domain link and treat app callbacks/redirects as distinct from dashboard login. Client-secret encryption at rest is deferred (see §10).
  • Gate everything behind enterpriseProcedure + valid license, consistent with existing SSO.
  • Resolve the §7 product questions (granularity, authorization rules, remote-server scope) before committing to the estimate.

10. Security debt (deferred to a follow-up)

These are knowingly accepted for v1 and must be tracked, not forgotten:

  1. clientSecret unencrypted at rest. oidcConfig (incl. clientSecret) remains plain JSON in the DB, as it is today. Reusing it for forward-auth propagates the secret to each server's proxy config. Follow-up: add encrypt/decrypt for oidcConfig and rotate.
  2. Secret transport to proxy. Even in v1, pass clientSecret to oauth2-proxy via a Docker secret / mounted file, not a plain env var, to keep it out of docker inspect output.
  3. Trusted proxy. Configure oauth2-proxy --reverse-proxy=true and restrict --trusted-proxy-ip to the Traefik instance so forwarded identity headers can't be spoofed by the upstream app or other containers.
  4. Cross-server shared session (deferred). v1 is isolated per server (§6.1); shared SSO is a config flip later, not built now.