Pocket ID is my latest home lab project, as you’ll know if you have been following my other posts. Running Pocket ID at a single site is straightforward. Running it across two sites, in a way that actually survives one of them going offline without requiring you to re-enroll every passkey you own, is a different problem entirely. Passkeys are not passwords. You can’t just copy a username and a hash into a new database and call the user migrated. WebAuthn credentials are bound to the relying party (the origin they were registered for) and to the specific credential record that was created during registration. If that record doesn’t exist on the instance you’re authenticating against, the authentication will fail. Full stop.
So when I set out to add a second Pocket ID instance at a second site, the question was never “which database replication tool do I use.” It was: how do I keep exactly the right data in sync between two nodes, over a link I don’t fully control, without requiring me to re-enroll passkeys every time something changes? If all of my engineering training has taught me anything, it’s to think about the problem you are trying to solve, and THEN explore all the ways to solve that problem.
This post covers the sync architecture I designed to solve that problem, and specifically the reasoning behind the encryption scheme. If you’re running Pocket ID yourself and have thought about high availability (HA) at all (something SREs should always be thinking about), hopefully this is useful.
The Constraint That Shapes Everything
The two Pocket ID instances communicate over plain HTTP. Mainly because I am lazy and the interception of traffic on my network between two backend services on a separate VLAN didn’t score high on my threat model. However, while the sync path is internal to my network, “internal” is not the same as “safe to send secrets over in plaintext.” The sync payload contains WebAuthn credential records, OIDC client secrets, API keys, and custom claims. None of that should travel unencrypted, even on a network I nominally control.
This constraint ruled out the simplest possible design (i.e., plain HTTP authenticated with nothing more than a shared secret) and pointed directly toward payload-level encryption, because that’s what you reach for when you can’t do transport-level encryption.
The Architecture
The model is primary/secondary polling. The secondary polls the primary’s GET /sync/export endpoint at a configurable interval (five minutes by default). The primary builds a SyncPayload containing every piece of state that needs to be consistent across nodes: users with their WebAuthn credential records, custom claims, and group memberships; user groups; OIDC clients; API keys; and app config variables. That payload is encrypted with ECIES (Elliptic Curve Integrated Encryption Scheme) to the secondary’s public key before it leaves the primary. The secondary receives it, decrypts it, and reconciles its local database via a transactional upsert. More on why ECIES specifically in a moment, but the short version is that it handles key exchange and authenticated encryption without requiring a shared secret or a transport-layer PKI.
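To make the polling cycle concrete, here is a minimal Python sketch of one pass on the secondary. It is an illustration, not Pocket ID’s actual code: the function names, the `decrypt` and `reconcile` callables, and the injectable `fetch` hook are all assumptions made for the example. Only the GET /sync/export path and the bearer token come from the design described above.

```python
import urllib.request


def build_export_request(primary_url: str, token: str) -> urllib.request.Request:
    """Build the authenticated GET /sync/export request (hypothetical helper)."""
    url = primary_url.rstrip("/") + "/sync/export"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def poll_once(primary_url, token, decrypt, reconcile, fetch=None):
    """Run one sync pass: fetch the encrypted payload, decrypt it, reconcile.

    decrypt and reconcile are supplied by the caller; fetch can be swapped
    out so the cycle can be exercised without a network.
    """
    request = build_export_request(primary_url, token)
    if fetch is None:
        # Default transport: a plain HTTP GET, as in the post.
        def fetch(req):
            with urllib.request.urlopen(req, timeout=30) as response:
                return response.read()
    encrypted = fetch(request)
    payload = decrypt(encrypted)
    reconcile(payload)
    return payload
```

A scheduler would call `poll_once` every interval and log, rather than crash on, a failed pass, so that a temporarily unreachable primary simply means the secondary keeps serving its last good state.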
The reconciliation logic follows a simple rule: the primary is always the source of truth. If an entity is present in the payload, it is upserted on the secondary. If an entity exists on the secondary but is absent from the payload, it is deleted. There is no merging, no conflict resolution, no two-way replication. The secondary is a read replica of everything except local runtime state.
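The upsert-then-delete rule can be sketched in a few lines of Python against SQLite. The `users` table and its two columns are a simplified, hypothetical schema chosen for the example; the real reconciliation covers more entity types, but the shape of the logic is the same.

```python
import sqlite3


def reconcile_users(conn: sqlite3.Connection, payload_users: list) -> None:
    """Make the local users table match the primary's payload exactly.

    Each item in payload_users is a dict with "id" and "username" keys
    (a hypothetical, simplified schema).
    """
    # "with conn" wraps everything in one transaction: it commits if the
    # block succeeds and rolls back if any statement raises.
    with conn:
        for user in payload_users:
            # Present in the payload: insert it, or overwrite the local copy.
            conn.execute(
                "INSERT INTO users (id, username) VALUES (:id, :username) "
                "ON CONFLICT(id) DO UPDATE SET username = excluded.username",
                user,
            )
        ids = [user["id"] for user in payload_users]
        if ids:
            # Absent from the payload: delete it locally.
            placeholders = ",".join("?" * len(ids))
            conn.execute(
                f"DELETE FROM users WHERE id NOT IN ({placeholders})", ids
            )
        else:
            # An empty payload means the primary has no users at all.
            conn.execute("DELETE FROM users")
```

Because the whole pass runs in a single transaction, a failure halfway through leaves the secondary on its previous consistent state instead of a half-applied one.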
One important detail about what gets excluded: internal fields like instanceId and the sync keypairs themselves are never included in the payload. Each node needs to remain individually identifiable, and you definitely don’t want to overwrite the secondary’s own keypair with the primary’s during a sync.
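On the export side, that exclusion amounts to a filter applied before the payload is built. A small Python sketch, in which `instanceId` comes from the description above while the two sync key names are hypothetical placeholders:

```python
# Keys that must stay local to each node. "instanceId" is named in the post;
# the two sync key names are hypothetical stand-ins for the stored keypair.
EXCLUDED_CONFIG_KEYS = frozenset({"instanceId", "syncPrivateKey", "syncPublicKey"})


def exportable_config(config: dict) -> dict:
    """Return only the app config entries that are safe to replicate."""
    return {
        key: value
        for key, value in config.items()
        if key not in EXCLUDED_CONFIG_KEYS
    }
```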
After the database reconciliation, app config is reloaded in-memory. No restart required.
flowchart LR
subgraph primary [Primary Site]
P[Pocket ID\nPrimary]
end
subgraph secondary [Secondary Site]
S[Pocket ID\nSecondary]
end
S -->|"GET /sync/export\n(bearer token)"| P
P -->|"ECIES-encrypted SyncPayload"| S
S -->|"decrypt + transactional upsert"| S
Registration and Auth
Before the secondary can poll, it needs to register with the primary. On first boot, the secondary generates a keypair and exposes the public half at GET /sync/pubkey. That public key is the recipient key: the primary will use it to encrypt every sync payload destined for this secondary. An admin copies that public key into the primary’s UI along with a name, and receives a bearer token in return, shown exactly once. That token goes into SYNC_PRIMARY_TOKEN on the secondary, and polling begins. Every sync payload is encrypted to the secondary’s registered public key before it is dropped onto the wire, using the Elliptic Curve Integrated Encryption Scheme (ECIES). (Don’t worry, if you aren’t familiar with ECIES yet, I dive into this below!)
The primary doesn’t store the bearer token itself. It stores only the SHA-256 hash. This means a compromised primary database doesn’t hand an attacker the credentials needed to impersonate a secondary on the sync endpoint. The authentication model is narrow: a valid bearer token is proof of registration, nothing more.
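The store-only-the-hash pattern looks roughly like this in Python. The function names and the token length are assumptions for illustration; the post only specifies that the primary keeps the SHA-256 hash and shows the token once.

```python
import hashlib
import hmac
import secrets


def issue_token():
    """Create a bearer token and the hash the primary would store.

    Returns (token, stored_hash). The token is shown to the admin once;
    only stored_hash is persisted. The 32-byte length is an assumption.
    """
    token = secrets.token_urlsafe(32)
    stored_hash = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, stored_hash


def verify_token(presented: str, stored_hash: str) -> bool:
    """Check a presented bearer token against the stored SHA-256 hash."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented_hash, stored_hash)
```

A plain SHA-256 is reasonable here because the token is long and random; a slow password hash would only be needed for low-entropy, human-chosen secrets.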
Why are we generating keypairs, and what the heck is ECIES?
This is the part I want to spend some time on, because the choice of encryption scheme was deliberate, and it’s worth explaining the reasoning, not just the result. In its simplest form, this encryption scheme lets two parties each combine a private key that only they know with the other party’s public key and arrive at the same brand-new key, without that key ever crossing the wire. Here, a fresh key is derived this way for every payload.
Think of it kind of like a conference call bridge. There is a big meeting room where everyone is talking freely, but if 2 people want to have a side conversation, they can enter their own unique participant ID and wind up in a channel that is just for them! Neither party knows the other’s code, but magically the conference bridge knows to connect these two parties together!
What is ECIES?
Warning! This part will get a bit technical, and dive into some intermediate encryption techniques. If the above analogy worked enough for you, and you’re not interested in learning about encryption techniques, feel free to skip this section.
ECIES stands for Elliptic Curve Integrated Encryption Scheme. The name is a bit dense, but the idea is elegant: it’s a scheme that handles key agreement, key derivation, and authenticated encryption as one coherent package, using elliptic curve cryptography as the foundation.
In concrete terms, here’s what happens when the primary encrypts a sync payload for the secondary:
- The primary generates a fresh ephemeral keypair (X25519 curve). This keypair exists only for this one encryption operation.
- The primary performs an Elliptic Curve Diffie-Hellman (ECDH) key agreement between its ephemeral private key and the secondary’s long-term public key. The output is a shared secret that only these two keypairs can produce.
- That shared secret is fed into HKDF-SHA256 (a standard key derivation function) with the info string "pocket-id-sync-v1" to derive the actual encryption key. The info string is a domain separator: it ensures that a key derived for this specific purpose can’t be accidentally reused for something else.
- The derived key is used with AES-256-GCM to encrypt the payload. GCM is an authenticated mode, which means it doesn’t just encrypt the data, it also produces an authentication tag. If anyone tampers with the ciphertext in transit, decryption will fail with an authentication error rather than silently producing garbage output.
- The wire format that goes over HTTP is: the 32-byte ephemeral public key, followed by the AES-GCM nonce, followed by the ciphertext, followed by the GCM authentication tag.
When the secondary receives the payload, it reverses the process: it performs the same ECDH agreement using its long-term private key and the ephemeral public key from the wire format, derives the same key, and decrypts.
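Two of those steps, the key derivation and the wire-format split, can be sketched with nothing but Python’s standard library. The X25519 agreement and the AES-256-GCM call are left out because they need a third-party crypto library. Several details below are assumptions the post does not state: an empty HKDF salt, a 12-byte nonce, and a 16-byte tag. The function names are also made up for the example.

```python
import hashlib
import hmac
from typing import NamedTuple

INFO = b"pocket-id-sync-v1"  # domain separator named in the post
EPHEMERAL_KEY_LEN = 32       # X25519 public key size, per the post
NONCE_LEN = 12               # assumption: the common AES-GCM nonce size
TAG_LEN = 16                 # assumption: a full-length GCM tag


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    # Extract: an absent salt is treated as hash_len zero bytes.
    prk = hmac.new(salt or bytes(hash_len), ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until there is enough output.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_sync_key(shared_secret: bytes) -> bytes:
    """Derive the 32-byte AES-256 key from the ECDH shared secret."""
    return hkdf_sha256(shared_secret, INFO, length=32)


class WireParts(NamedTuple):
    ephemeral_public_key: bytes
    nonce: bytes
    ciphertext: bytes
    tag: bytes


def split_wire_format(blob: bytes) -> WireParts:
    """Split a payload into ephemeral key, nonce, ciphertext, and tag."""
    if len(blob) < EPHEMERAL_KEY_LEN + NONCE_LEN + TAG_LEN:
        raise ValueError("payload is too short to be a valid sync message")
    nonce_end = EPHEMERAL_KEY_LEN + NONCE_LEN
    return WireParts(
        ephemeral_public_key=blob[:EPHEMERAL_KEY_LEN],
        nonce=blob[EPHEMERAL_KEY_LEN:nonce_end],
        ciphertext=blob[nonce_end:-TAG_LEN],
        tag=blob[-TAG_LEN:],
    )
```

On the secondary, the pieces would be used in this order: split the blob, run ECDH between the long-term private key and `ephemeral_public_key`, pass the result to `derive_sync_key`, then hand the key, nonce, ciphertext, and tag to AES-256-GCM.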
Why Not Just Use HTTPS Between the Nodes?
Because that would require retrofitting certificates and a mutual TLS setup specifically for this one sync path between two backend services that communicate over plain HTTP everywhere else. The engineering cost of that is real, and it couples a relatively simple sync feature to a significantly more complex infrastructure dependency. Payload-level encryption solves the same problem without touching the transport layer.
Why Not a Shared Secret?
Hint: For a shared secret, think of your Wi-Fi network’s password.
You’d have to exchange that secret securely out-of-band, store it on both nodes, and rotate it if either node is compromised. With ECIES, the secondary just publishes its public key at /sync/pubkey during registration. The primary never learns the secondary’s private key. There is no secret to exchange, because there’s no shared secret at all. The security rests entirely on the secondary’s long-term private key staying private, which is a significantly more tractable operational constraint.
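The Forward Secrecy Property (and Its Limit)
Because each encryption uses a fresh ephemeral keypair (i.e., that key is only used once), every payload is encrypted under its own key, and the primary keeps nothing that could decrypt a payload after it has been sent: the ephemeral private key is discarded as soon as the encryption finishes. If an attacker captures a sync payload on the wire today and compromises the primary years from now, they still cannot decrypt that captured payload. This is only partial forward secrecy, though. The ephemeral public key travels in the wire format, so an attacker who captures a payload and later obtains the secondary’s long-term private key can repeat the ECDH agreement and decrypt it. That is inherent to any scheme with a static recipient key, and it is why that private key is the one thing that has to stay private. Forward secrecy isn’t a primary design goal for a homelab sync system, but the per-payload key costs nothing to have, and it’s a direct consequence of using ECIES correctly.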
Why ECIES Over Something Like Libsodium’s Sealed Box?
Libsodium’s crypto_box_seal is functionally similar and would have been a reasonable alternative. The choice of ECIES came down to the primitives being explicit and auditable: X25519, HKDF-SHA256, AES-256-GCM are all named, specified standards with broad implementation support. The scheme is legible. Someone reading the code doesn’t need to know Libsodium’s specific construction choices to understand what’s happening; they just need to know those three primitives.
Alternatives I Considered and Rejected
Database Replication
The obvious approach for keeping two instances in sync is to replicate the database. PostgreSQL streaming replication, Litestream for SQLite, that category of solution. I ruled this out for a few reasons.
Pocket ID uses SQLite. Moving to PostgreSQL to get streaming replication is real operational overhead: a new service to run, a new failure mode to manage, a significantly more complex deployment. For a homelab service where simplicity is a genuine value, that’s a hard sell. Pocket ID is also open source, so I have free rein to modify the code for this cool use case. (Plus, I wanted to see how far I could press Claude!)
Even if I made the migration, database replication is all-or-nothing. Audit logs, ephemeral session data, internal metadata, the sync keypairs themselves — all of it replicates. The data that actually needs to be consistent across nodes (passkeys, OIDC clients, users, app config) is a small subset of what’s in the database. Replicating the whole thing to keep a handful of tables in sync is overkill, and it brings the split-brain problem with it: if the secondary goes offline and both nodes accept writes, reconciling a full database replica on reconnect is a significantly harder problem than re-syncing from the primary’s authoritative payload.
Shared Database
Point both instances at one database. This eliminates the sync problem entirely but defeats the HA goal just as cleanly. A single shared database is a single point of failure, and if it goes offline, both instances go with it. It sidesteps replication and split brain only by giving up the availability I was after.
What the Custom Sync Endpoint Gets You
The custom sync endpoint lets both nodes keep their own SQLite databases, maintains a clear primary/secondary model where the primary is always authoritative, and replicates exactly the data that matters, nothing else. The data that needs to stay in sync is small and changes infrequently. There’s no reason to reach for database-level replication machinery for that.
Configuration
Wiring it up is straightforward. On the primary:

SYNC_ENABLED=true
SYNC_ROLE=primary

On the secondary:

SYNC_ENABLED=true
SYNC_ROLE=secondary
SYNC_PRIMARY_URL=http://primary-host:port   # URL of the primary
SYNC_PRIMARY_TOKEN=...                      # bearer token from registration
SYNC_INTERVAL=5m                            # how often to poll

The bootstrap sequence is: start the secondary (it exposes its public key at GET /sync/pubkey), copy that key into the primary’s admin UI along with a name, save, copy the bearer token you get back (it won’t be shown again), set it as SYNC_PRIMARY_TOKEN on the secondary, and restart the secondary. From there, it polls on the configured interval and reconciles automatically. You only have to do this bootstrap process a single time. Once the secondary’s public key is registered on the primary and the token is set on the secondary, every future payload is encrypted to that key.
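As a rough illustration of how those environment variables might be read and validated at startup, here is a Python sketch. It is not Pocket ID’s actual config loader: `SyncConfig`, `load_sync_config`, and `parse_interval` are hypothetical names, and the interval parser only understands a single number followed by s, m, or h.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Turn a value like "5m" into seconds (one number plus one unit only)."""
    text = text.strip()
    number, unit = text[:-1], text[-1:]
    if not number.isdigit() or unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported interval: {text!r}")
    seconds = int(number) * _UNIT_SECONDS[unit]
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return seconds


@dataclass(frozen=True)
class SyncConfig:
    enabled: bool
    role: str
    primary_url: Optional[str] = None
    primary_token: Optional[str] = None
    interval_seconds: int = 300  # five minutes, the default named in the post


def load_sync_config(env: Mapping[str, str]) -> SyncConfig:
    """Validate the SYNC_* settings and fail fast on an incomplete secondary."""
    enabled = env.get("SYNC_ENABLED", "false").strip().lower() == "true"
    role = env.get("SYNC_ROLE", "").strip()
    if not enabled:
        return SyncConfig(enabled=False, role=role)
    if role not in ("primary", "secondary"):
        raise ValueError("SYNC_ROLE must be 'primary' or 'secondary'")
    if role == "primary":
        return SyncConfig(enabled=True, role=role)
    url = env.get("SYNC_PRIMARY_URL")
    token = env.get("SYNC_PRIMARY_TOKEN")
    if not url or not token:
        raise ValueError("a secondary needs SYNC_PRIMARY_URL and SYNC_PRIMARY_TOKEN")
    interval = parse_interval(env.get("SYNC_INTERVAL", "5m"))
    return SyncConfig(True, role, url, token, interval)
```

Calling `load_sync_config(os.environ)` at boot would catch the most likely bootstrap mistake, a secondary started before its token has been set, before the first poll ever runs.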
sequenceDiagram
actor Admin
participant S as Secondary
participant P as Primary
S->>S: Generate X25519 keypair on first boot
Admin->>S: GET /sync/pubkey
S-->>Admin: public key (base64)
Admin->>P: Register peer (name + public key)
P->>P: Store public key + SHA-256(token)
P-->>Admin: Bearer token (shown once)
Admin->>S: Set SYNC_PRIMARY_TOKEN=TOKEN
note over Admin,S: Bootstrap complete, polling begins
S->>P: GET /sync/export (Bearer token)
P->>P: Encrypt SyncPayload with secondary's public key
P-->>S: ECIES-encrypted payload
S->>S: Decrypt + reconcile database
The Load Balancer Gotcha
One thing I ran into that’s worth calling out: Pocket ID stores session state in memory. That means if HAProxy load balances a request mid-authentication to the other instance, the session doesn’t exist there and the authentication attempt fails. You’ll see a confusing error and have no idea why.
The fix is straightforward — 5-minute sticky sessions on HAProxy. Once a client starts an authentication flow, all requests from that client go to the same instance until the session completes. Five minutes is plenty of headroom for any reasonable auth flow, and it doesn’t meaningfully undermine the HA benefit since failover still works cleanly for new sessions.

backend pocket_id
    balance roundrobin
    # Remember each client's source IP for 5 minutes
    stick-table type ip size 200k expire 5m
    # Send repeat requests from the same IP to the same server
    stick on src
    server primary primary-host:port check
    server secondary secondary-host:port check backup

Without this, the sync works perfectly but authentication fails randomly depending on which node HAProxy picks. It’s the kind of thing that’s obvious in hindsight and deeply annoying to debug the first time.
Why did we go through all of this?
Running Pocket ID across two sites with automatic sync means that a passkey I enrolled last month works regardless of which instance is handling my request today, and it works even if one site is completely unreachable. That’s the actual goal: not a technically interesting encryption scheme, not a clever polling architecture, but the ability to authenticate from anywhere and have it just work.
The encryption scheme exists because the data being synced is sensitive and the transport is plain HTTP. ECIES was the right tool for that: it handles key exchange without requiring a shared secret, it derives a fresh encryption key for each payload, and it authenticates the ciphertext so tampering is detectable. Claude helped with the implementation details, particularly around the wire format and the HKDF parameter choices, but the design decision to use ECIES, and the reasoning that led there, came from working through the problem myself.
For a homelab, this is probably more engineering than the problem strictly required. But passkeys are specifically the authentication mechanism I don’t want to re-enroll every time I change something, and that constraint made the sync problem real enough to be worth solving properly. Also, it was a fun use of some Claude tokens and a few hours of my Tuesday night.
Also, special shout out to DTK for all of his help and guidance over the years learning PKI and encryption. You rock!