Atomic Claim Pattern
The atomic claim pattern is Monque’s core mechanism for ensuring a pending job is claimed by only one scheduler instance at a time.
This prevents concurrent duplicates, but it does not guarantee that a job can never run more than once. Jobs may be processed again after a crash, a retry, or stale recovery, so workers should be idempotent.
The Problem
In distributed systems, multiple workers might try to pick up the same job simultaneously. Without proper coordination, this leads to:
- Duplicate processing: Same job runs multiple times
- Race conditions: Workers step on each other’s work
- Lost updates: Results get overwritten
- Wasted resources: Redundant computation
The Solution: Atomic Claims
Monque uses MongoDB’s findOneAndUpdate with atomic guarantees to ensure only one scheduler claims each pending job.
How It Works
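The claim step can be modeled without a database. The sketch below is an in-memory stand-in for findOneAndUpdate, not Monque's actual query: the `Job` shape and the `findOneAndClaim` helper are hypothetical, and only the `pending` and `processing` statuses and the `claimedBy` field come from this page. The match and the write happen in one step, which is the property the real operation relies on.

```typescript
interface Job {
  _id: number;
  status: "pending" | "processing" | "completed";
  claimedBy: string | null;
}

// Hypothetical stand-in for findOneAndUpdate: the match and the write run in
// one synchronous step, so no other caller can interleave between them.
function findOneAndClaim(jobs: Job[], instanceId: string): Job | null {
  // Filter: the first job that is still pending.
  const job = jobs.find((j) => j.status === "pending");
  if (!job) return null; // nothing matched, so nothing is updated

  // Update: mark the job as processing and record the owner.
  job.status = "processing";
  job.claimedBy = instanceId;
  return job;
}

const queue: Job[] = [{ _id: 1, status: "pending", claimedBy: null }];
const first = findOneAndClaim(queue, "scheduler-a");
const second = findOneAndClaim(queue, "scheduler-b");
console.log(first?.claimedBy, second); // prints: scheduler-a null
```

The second caller gets null because the filter no longer matches once the first caller has changed the status.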
Why This Works
- Atomic operation: MongoDB guarantees the query and update execute as one unit
- First-writer wins: Only one instance can match and update the same document
- Immediate visibility: Claim attempts are writes against the primary, so other instances see the updated document as soon as the write is applied
- No external locks: No need for Redis, ZooKeeper, or distributed lock managers
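The first-writer-wins property can be illustrated with a small race. This is a minimal simulation, assuming a hypothetical `tryClaim` helper that stands in for the conditional update; the random delay models network jitter between instances and the database. Whatever the arrival order, only one instance ends up owning the job.

```typescript
interface PendingJob {
  id: number;
  claimedBy: string | null;
}

// Conditional write: succeeds only if the job is still unclaimed when it runs.
function tryClaim(job: PendingJob, instanceId: string): boolean {
  if (job.claimedBy !== null) return false;
  job.claimedBy = instanceId;
  return true;
}

// Lets every instance attempt the same job and returns the IDs that succeeded.
async function race(instanceIds: string[]): Promise<string[]> {
  const job: PendingJob = { id: 1, claimedBy: null };
  const results = await Promise.all(
    instanceIds.map(async (id) => {
      // Random delay so the arrival order differs between runs.
      await new Promise((resolve) => setTimeout(resolve, Math.random() * 10));
      return tryClaim(job, id) ? id : null;
    }),
  );
  return results.filter((id): id is string => id !== null);
}

race(["a", "b", "c"]).then((winners) => console.log(winners.length)); // prints: 1
```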
Scheduler Instance ID
Each Monque instance has a unique identifier:
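How Monque derives its identifier is not shown on this page. As a minimal sketch of one way to obtain a per-process unique ID, the hypothetical `createInstanceId` helper below uses Node's built-in `randomUUID`:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: generates one identifier per process at startup.
function createInstanceId(): string {
  return randomUUID();
}

const instanceId = createInstanceId();
console.log(instanceId); // a random UUID, different on every run
```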
Viewing Instance ID
Multi-Instance Deployment
Scaling Horizontally
Run multiple scheduler instances for high availability:
Load Distribution
Jobs are distributed naturally based on claim timing:
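A small simulation can make this concrete. It is a sketch under stated assumptions, not Monque's scheduler loop: `claimNext`, `runInstance`, and `simulate` are hypothetical names, and each job takes a random amount of simulated work. Instances that finish sooner return to the queue sooner, so the split emerges from timing rather than from any assignment logic.

```typescript
interface QueueJob {
  id: number;
  claimedBy: string | null;
}

// Claims the first unclaimed job, or returns null when the queue is drained.
function claimNext(queue: QueueJob[], instanceId: string): QueueJob | null {
  const job = queue.find((j) => j.claimedBy === null);
  if (!job) return null;
  job.claimedBy = instanceId;
  return job;
}

// One instance: claim, do some simulated work, repeat until nothing is left.
async function runInstance(queue: QueueJob[], instanceId: string): Promise<number> {
  let processed = 0;
  while (claimNext(queue, instanceId) !== null) {
    await new Promise((resolve) => setTimeout(resolve, Math.random() * 5));
    processed += 1;
  }
  return processed;
}

// Runs all instances against one shared queue and reports jobs per instance.
async function simulate(jobCount: number, instanceIds: string[]): Promise<Map<string, number>> {
  const queue: QueueJob[] = Array.from({ length: jobCount }, (_, i) => ({ id: i, claimedBy: null }));
  const perInstance = new Map<string, number>();
  await Promise.all(
    instanceIds.map(async (id) => {
      perInstance.set(id, await runInstance(queue, id));
    }),
  );
  return perInstance;
}

simulate(30, ["a", "b", "c"]).then((perInstance) => console.log(perInstance));
```

The per-instance counts vary from run to run, but they always add up to the number of jobs.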
Indexes for Performance
Monque creates the required MongoDB indexes during initialize(), so claim and polling queries stay fast even with large queues.
For the full list of indexes, see Jobs.
Claim Lifecycle
sequenceDiagram
participant W1 as Worker 1
participant DB as MongoDB
participant W2 as Worker 2
Note over DB: Job in pending state
W1->>DB: findOneAndUpdate (claim attempt)
W2->>DB: findOneAndUpdate (claim attempt)
DB-->>W1: Job document (claimed!)
DB-->>W2: null (already claimed)
Note over W1: Processes job
W1->>DB: Update status to completed
Note over DB: Job complete, claimedBy cleared
Failure Handling
Worker Crashes
If a worker crashes while processing:
- Job remains in processing status with claimedBy set
- lastHeartbeat stops updating
- After lockTimeout, the job can be recovered on startup (see Heartbeat)
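The recovery condition above can be sketched as a staleness check. This is an illustration, not Monque's recovery code: the `HeartbeatJob` shape and both helpers are hypothetical, `lockTimeoutMs` is assumed to be in milliseconds, and resetting the job to pending is an assumption about what recovery does.

```typescript
interface HeartbeatJob {
  status: "pending" | "processing";
  claimedBy: string | null;
  lastHeartbeat: Date | null;
}

// A job is stale when it is still processing but its heartbeat is older
// than the lock timeout.
function isStale(job: HeartbeatJob, now: Date, lockTimeoutMs: number): boolean {
  return (
    job.status === "processing" &&
    job.lastHeartbeat !== null &&
    now.getTime() - job.lastHeartbeat.getTime() > lockTimeoutMs
  );
}

// Assumed recovery behavior: release stale jobs so they can be claimed again.
// Returns the number of jobs that were released.
function recoverStale(jobs: HeartbeatJob[], now: Date, lockTimeoutMs: number): number {
  let recovered = 0;
  for (const job of jobs) {
    if (isStale(job, now, lockTimeoutMs)) {
      job.status = "pending";
      job.claimedBy = null;
      job.lastHeartbeat = null;
      recovered += 1;
    }
  }
  return recovered;
}
```

Because a recovered job runs again from the start, this is one of the paths where idempotent workers matter.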
Graceful Shutdown
When stop() is called:
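The exact steps Monque's stop() performs are not listed on this page. The general shape of a graceful shutdown is sketched below with a hypothetical `DrainableWorker` class: stop accepting new work, then wait for in-flight jobs up to a timeout. Treat it as a generic pattern, not the library's implementation.

```typescript
class DrainableWorker {
  private accepting = true;
  private readonly inFlight = new Set<Promise<void>>();

  // Starts a task unless shutdown has begun; returns whether it was accepted.
  run(task: () => Promise<void>): boolean {
    if (!this.accepting) return false;
    // Errors are swallowed here for brevity; a real worker would record them.
    const tracked: Promise<void> = task()
      .catch(() => undefined)
      .then(() => {
        this.inFlight.delete(tracked);
      });
    this.inFlight.add(tracked);
    return true;
  }

  // Resolves true if all in-flight tasks finished before the timeout.
  async stop(timeoutMs: number): Promise<boolean> {
    this.accepting = false;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });
    const drained = Promise.all(Array.from(this.inFlight)).then(() => true);
    const result = await Promise.race([drained, timedOut]);
    clearTimeout(timer);
    return result;
  }
}
```

Jobs that do not finish within the timeout stay claimed, which is the situation stale recovery is meant to handle.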
Best Practices
1. Use Meaningful Instance IDs
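An ID that encodes where the process runs makes the claimedBy field easier to read when debugging. A minimal sketch, assuming a hypothetical `meaningfulInstanceId` helper; how the value is passed to Monque depends on its configuration options, which are not shown here.

```typescript
import { hostname } from "node:os";

// Hypothetical helper: combines a role label, the host name, and the process
// ID so that a claimedBy value points back to a specific process.
function meaningfulInstanceId(role: string): string {
  return `${role}-${hostname()}-${process.pid}`;
}

console.log(meaningfulInstanceId("worker"));
```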
2. Monitor Claim Metrics
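One simple metric is the share of claim attempts that return a job. The `ClaimMetrics` class below is a hypothetical counter, not part of Monque's API; it only shows what might be tracked around each claim attempt.

```typescript
// Hypothetical counter for claim attempts and successful claims.
class ClaimMetrics {
  attempts = 0;
  claimed = 0;

  // Call once per claim attempt with the result (a job, or null if none).
  record(result: object | null): void {
    this.attempts += 1;
    if (result !== null) this.claimed += 1;
  }

  // Fraction of attempts that returned a job; 0 when nothing was recorded.
  get hitRate(): number {
    return this.attempts === 0 ? 0 : this.claimed / this.attempts;
  }
}
```

A persistently low hit rate can mean that more instances are polling than the queue needs.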
3. Handle Claim Failures Gracefully
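A claim that returns nothing is a normal outcome (another instance got there first, or the queue is empty), while a thrown error points to a real problem such as a lost connection. The sketch below keeps the two apart; `attemptClaim` and `ClaimOutcome` are hypothetical names, and the `claim` callback stands in for whatever performs the actual claim.

```typescript
type ClaimOutcome<T> =
  | { kind: "claimed"; job: T }
  | { kind: "empty" }
  | { kind: "error"; error: unknown };

// Wraps a claim attempt so callers can tell "nothing to do" apart from
// "something went wrong" without a try/catch at every call site.
async function attemptClaim<T>(claim: () => Promise<T | null>): Promise<ClaimOutcome<T>> {
  try {
    const job = await claim();
    return job === null ? { kind: "empty" } : { kind: "claimed", job };
  } catch (error) {
    return { kind: "error", error };
  }
}
```

A caller would typically go back to polling on `empty` and log or back off on `error`.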
Comparison with Other Patterns
| Pattern | Pros | Cons |
|---|---|---|
| Atomic Claim (Monque) | No external dependencies, strong consistency | Requires MongoDB |
| Redis Locks | Fast, widely used | Additional infrastructure, lock expiry issues |
| Pessimistic Locking | Simple concept | Blocks other workers, deadlock risk |
| Optimistic Locking | No blocking | Retry storms under contention |
Next Steps
- Change Streams - Real-time job notifications
- Heartbeat Mechanism - Detect stale claims
- Workers - Configure worker concurrency