Integrating payments into a real product goes far beyond following the provider's documentation. When real money flows through the system, every silent failure is a problem — duplicate charges, inconsistent status, or sales that were never confirmed. In this article, I present the architecture I use to receive and process webhooks from providers like Stripe, Mercado Pago, and Asaas in Node.js applications.
The real problem
Webhooks are how payment providers notify your application about events — a PIX confirmed, a subscription cancelled, a dispute opened. The problem is that this mechanism is inherently unreliable: webhooks can arrive duplicated, out of order, delayed, or simply not arrive at all.
If your application isn't prepared to handle these scenarios, you'll discover the problem when a customer complains they paid but didn't get access — or worse, when finance notices the numbers don't add up.
The diagram below shows the complete flow we'll build in this article:
[Figure: payment webhook lifecycle diagram covering validation, idempotency, queue, worker, retry, and reconciliation]
Signature validation
The first security layer is validating that the webhook actually came from the provider. Each provider implements this differently:
- Stripe: sends a Stripe-Signature header with timestamp + HMAC-SHA256 of the body
- Mercado Pago: sends x-signature with hash and query params, requires API validation
- Asaas: sends a token in the header that must be compared with the one configured in the dashboard
The rule is simple: never process a webhook without validating the signature. Without this, anyone can send a POST to your endpoint and simulate a payment confirmation.
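For providers that authenticate with a shared token rather than an HMAC (the Asaas style described above), the comparison itself deserves some care. The sketch below is one way to do it, assuming the token has already been read from the request header. `tokensMatch` is a hypothetical helper name, and hashing both values first is a choice made here so that `timingSafeEqual` always receives buffers of equal length.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto'

// Hypothetical helper: compares a received webhook token with the configured
// one without revealing, through response timing, how much of it matched.
function tokensMatch(received: string | null | undefined, expected: string): boolean {
  if (!received || !expected) {
    return false // missing header or missing configuration: reject
  }
  // timingSafeEqual throws when buffer lengths differ, so compare fixed-size digests.
  const receivedDigest = createHash('sha256').update(received, 'utf8').digest()
  const expectedDigest = createHash('sha256').update(expected, 'utf8').digest()
  return timingSafeEqual(receivedDigest, expectedDigest)
}
```

A handler would call this with the header value and the configured secret, and return a 4xx response when it yields `false`.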
import Stripe from 'stripe'
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)
export async function POST(req: Request) {
  // Read the raw body: the signature is computed over the exact bytes sent
  const body = await req.text()
  const signature = req.headers.get('stripe-signature')
  if (!signature) {
    return new Response('Missing signature', { status: 400 })
  }
  let event: Stripe.Event
  try {
    event = stripe.webhooks.constructEvent(
      body,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!
    )
  } catch (err) {
    return new Response('Invalid signature', { status: 400 })
  }
  // Process validated event
  await processEvent(event)
  return new Response('OK', { status: 200 })
}
Idempotency: preventing duplicate processing
Payment providers resend webhooks when they don't receive a 2xx response. This means the same event can arrive multiple times. If your handler isn't idempotent, you might credit a customer's balance twice or send two confirmation emails.
The solution is to store the event ID and check before processing:
async function processEvent(event: PaymentEvent) {
  // Check if already processed
  const existing = await db.webhookEvents.findUnique({
    where: { eventId: event.id }
  })
  if (existing) {
    return // Already processed, skip
  }
  // Record event before processing. This assumes a unique constraint on
  // eventId, so a concurrent duplicate fails here instead of running twice.
  await db.webhookEvents.create({
    data: {
      eventId: event.id,
      provider: 'stripe',
      type: event.type,
      processedAt: new Date()
    }
  })
  // Process safely
  try {
    switch (event.type) {
      case 'payment_intent.succeeded':
        await handlePaymentSuccess(event.data)
        break
      case 'payment_intent.payment_failed':
        await handlePaymentFailure(event.data)
        break
    }
  } catch (err) {
    // Remove the record so a later retry is not skipped as a duplicate
    await db.webhookEvents.delete({ where: { eventId: event.id } })
    throw err
  }
}
Async processing: immediate ack
A critical rule: respond 200 as fast as possible. If your handler takes too long to respond (because it's updating the database, sending emails, calling another API), the provider will consider it failed and resend — causing more load and potential duplicates.
The correct pattern is immediate ack + background processing:
export async function POST(req: Request) {
  const event = await validateAndParse(req)
  if (!event) {
    return new Response('Invalid', { status: 400 })
  }
  // Save to queue for async processing
  await db.webhookQueue.create({
    data: {
      eventId: event.id,
      payload: JSON.stringify(event),
      status: 'pending'
    }
  })
  // Respond immediately
  return new Response('OK', { status: 200 })
}
// Separate worker processes the queue
async function processQueue() {
  const pending = await db.webhookQueue.findMany({
    where: { status: 'pending' },
    orderBy: { createdAt: 'asc' }
  })
  for (const item of pending) {
    try {
      await processEvent(JSON.parse(item.payload))
      await db.webhookQueue.update({
        where: { id: item.id },
        data: { status: 'processed' }
      })
    } catch (err) {
      await db.webhookQueue.update({
        where: { id: item.id },
        data: {
          status: 'failed',
          retryCount: { increment: 1 },
          // err is typed as unknown in strict mode, so narrow before reading .message
          lastError: err instanceof Error ? err.message : String(err)
        }
      })
    }
  }
}
Out-of-order webhooks
A real-world scenario that happens frequently: the payment.failed webhook arrives before payment.created. Or the provider sends refund.completed before payment.succeeded. If your system depends on a specific order, it will break.
Two approaches to handle this:
- State machine: define valid transitions for each payment status. If an event attempts an invalid transition (e.g., refund before success), enqueue for later reprocessing.
- Provider timestamp: use the event's timestamp (not the received timestamp) to determine which state is more recent. Ignore events older than the current state.
async function handlePaymentUpdate(event: PaymentEvent) {
  const payment = await db.payments.findUnique({
    where: { providerPaymentId: event.paymentId }
  })
  if (!payment) {
    // Payment doesn't exist yet, enqueue for retry
    await enqueueForRetry(event)
    return
  }
  // Ignore events older than current state
  if (event.timestamp <= payment.lastEventTimestamp) {
    return
  }
  // Validate state transition
  const validTransitions: Record<string, string[]> = {
    pending: ['confirmed', 'failed', 'cancelled'],
    confirmed: ['refunded', 'disputed'],
    failed: ['pending'] // provider retry
  }
  if (!validTransitions[payment.status]?.includes(event.newStatus)) {
    await enqueueForRetry(event)
    return
  }
  await db.payments.update({
    where: { id: payment.id },
    data: {
      status: event.newStatus,
      lastEventTimestamp: event.timestamp
    }
  })
}
PIX: differences between providers
PIX is the most widely used payment method in Brazil, but each provider implements confirmation differently:
- Stripe: doesn't natively offer PIX in Brazil (uses bank transfers as an alternative)
- Mercado Pago: PIX confirmation usually arrives within seconds via webhook, but can be delayed up to minutes during peak hours
- Asaas: confirmation can take from seconds to minutes, and the confirmation webhook sometimes arrives before the creation webhook
In practice, this means you cannot rely on a specific order of events for PIX. The system needs to be resilient to confirmations arriving before creation, long delays, and cases where the webhook simply doesn't arrive.
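One way to tolerate a confirmation that arrives before the creation event is to park it and replay it once the payment exists. The sketch below illustrates the idea with in-memory maps. `EarlyEventBuffer`, `PixEvent`, and the status names are hypothetical, and a production version would persist parked events (for example in the webhook queue table) rather than hold them in memory.

```typescript
type PixStatus = 'pending' | 'confirmed'

interface PixEvent {
  paymentId: string
  kind: 'created' | 'confirmed'
}

class EarlyEventBuffer {
  private payments = new Map<string, PixStatus>()
  private parked = new Map<string, PixEvent[]>()

  handle(event: PixEvent): void {
    if (event.kind === 'created') {
      // Do not reset a payment that is already known.
      if (!this.payments.has(event.paymentId)) {
        this.payments.set(event.paymentId, 'pending')
      }
      // Replay anything that arrived before the payment existed.
      const waiting = this.parked.get(event.paymentId) ?? []
      this.parked.delete(event.paymentId)
      for (const early of waiting) {
        this.handle(early)
      }
      return
    }
    if (!this.payments.has(event.paymentId)) {
      // Confirmation arrived first: park it instead of dropping it.
      const waiting = this.parked.get(event.paymentId) ?? []
      waiting.push(event)
      this.parked.set(event.paymentId, waiting)
      return
    }
    this.payments.set(event.paymentId, 'confirmed')
  }

  statusOf(paymentId: string): PixStatus | undefined {
    return this.payments.get(paymentId)
  }

  parkedCount(paymentId: string): number {
    return this.parked.get(paymentId)?.length ?? 0
  }
}
```

Parked events that never see their creation event still need an exit path, which is where the reconciliation job and dead letter queue described below come in.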
Reconciliation: when state diverges
Even with all the protections above, there will be times when your application's local state and the provider's state diverge. A webhook that never arrived, a bug in the handler, a deploy that took the worker down for a few minutes.
The solution is a reconciliation job that runs periodically:
async function reconcilePayments() {
  // Find payments pending for over 30 minutes
  const stalePayments = await db.payments.findMany({
    where: {
      status: 'pending',
      createdAt: {
        lt: new Date(Date.now() - 30 * 60 * 1000)
      }
    }
  })
  for (const payment of stalePayments) {
    // Query status directly from provider
    const providerStatus = await getProviderStatus(
      payment.provider,
      payment.providerPaymentId
    )
    if (providerStatus !== payment.status) {
      await db.payments.update({
        where: { id: payment.id },
        data: { status: providerStatus }
      })
      // Log for auditing
      await db.reconciliationLog.create({
        data: {
          paymentId: payment.id,
          previousStatus: payment.status,
          newStatus: providerStatus,
          reason: 'reconciliation_job'
        }
      })
    }
  }
}
This job is the ultimate safety net. It ensures that even when everything fails (lost webhooks, handler bugs, provider downtime), the system eventually converges to the correct state.
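How the job gets scheduled is left open here. If it runs on a timer inside the worker process, one detail worth guarding against is a slow run overlapping with the next tick. A minimal sketch, assuming a single process and a hypothetical `createNonOverlappingRunner` helper. With several workers, a database lock or an external scheduler would be needed instead.

```typescript
type RunResult = 'completed' | 'skipped' | 'failed'

// Wraps an async job so that a new run is skipped while one is still in progress.
function createNonOverlappingRunner(job: () => Promise<void>): () => Promise<RunResult> {
  let running = false
  return async function run(): Promise<RunResult> {
    if (running) {
      return 'skipped' // previous run has not finished yet
    }
    running = true
    try {
      await job()
      return 'completed'
    } catch (err) {
      // A failed run must not prevent future runs; log and move on.
      console.error('reconciliation run failed', err)
      return 'failed'
    } finally {
      running = false
    }
  }
}
```

Passing `reconcilePayments` to this helper and invoking the returned function from `setInterval` every few minutes would be one way to wire it up.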
Dead letter queue: what to do when it fails
When webhook processing fails repeatedly (3-5 attempts), it should go to a dead letter queue — a separate table for events that need manual intervention or investigation.
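The attempts are usually spaced out exponentially so a struggling dependency gets time to recover. As a worked example, doubling from a 1-second base gives waits of 1, 2, 4, 8, and 16 seconds for retry counts 0 through 4. The helpers below are a sketch of that arithmetic. The names, the 60-second cap, and the jitter step are assumptions added for illustration.

```typescript
// Hypothetical helper: delay before the next attempt, in milliseconds.
function backoffDelayMs(retryCount: number, baseMs = 1000, capMs = 60000): number {
  const exponent = Math.max(0, Math.floor(retryCount))
  return Math.min(capMs, baseMs * 2 ** exponent)
}

// Optional jitter spreads retries out so many failed events do not retry at the same instant.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  // Picks a value between delayMs / 2 and delayMs.
  return Math.round(delayMs / 2 + random() * (delayMs / 2))
}

// Delays for retry counts 0..4: [1000, 2000, 4000, 8000, 16000]
const schedule = [0, 1, 2, 3, 4].map((n) => backoffDelayMs(n))
```

The cap matters once the retry limit is raised: without it, the tenth retry alone would wait over 17 minutes.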
async function processWithRetry(item: WebhookQueueItem) {
  const MAX_RETRIES = 5
  if (item.retryCount >= MAX_RETRIES) {
    // Move to dead letter queue
    await db.deadLetterQueue.create({
      data: {
        eventId: item.eventId,
        payload: item.payload,
        lastError: item.lastError,
        failedAt: new Date()
      }
    })
    await db.webhookQueue.delete({
      where: { id: item.id }
    })
    // Alert the team
    await notify(`Webhook ${item.eventId} failed ${MAX_RETRIES}x and was moved to DLQ`)
    return
  }
  // Try to process; on failure, schedule the next attempt with exponential backoff
  try {
    await processEvent(JSON.parse(item.payload))
    // Mark as processed so the worker does not pick the item up again
    await db.webhookQueue.update({
      where: { id: item.id },
      data: { status: 'processed' }
    })
  } catch (err) {
    const nextRetry = new Date(
      Date.now() + Math.pow(2, item.retryCount) * 1000
    )
    await db.webhookQueue.update({
      where: { id: item.id },
      data: {
        retryCount: { increment: 1 },
        // err is typed as unknown in strict mode, so narrow before reading .message
        lastError: err instanceof Error ? err.message : String(err),
        nextRetryAt: nextRetry
      }
    })
  }
}
Graceful degradation
When a payment provider is down, your application can't simply break. The user needs to know that the payment is being processed, even if confirmation takes longer. Some practices:
- Intermediate status: use a state like awaiting_confirmation that the user sees while waiting for provider confirmation
- Configurable timeouts: if confirmation doesn't arrive in X minutes, mark as requires_review instead of failing silently
- Provider fallback: if one provider is down, offer another payment method as an alternative
- Clear communication: notify the user about the actual status instead of showing an endless loading spinner
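The first two practices can be combined into a small function that decides what the user sees. A minimal sketch, assuming hypothetical type names, a local status set of pending, confirmed, and failed, and a 15-minute default timeout chosen only for illustration:

```typescript
type LocalStatus = 'pending' | 'confirmed' | 'failed'
type UserFacingStatus = 'awaiting_confirmation' | 'requires_review' | 'confirmed' | 'failed'

interface PaymentView {
  status: LocalStatus
  createdAt: Date
}

// Maps the stored payment state to what the user should be shown right now.
function userFacingStatus(
  payment: PaymentView,
  now: Date = new Date(),
  timeoutMinutes = 15
): UserFacingStatus {
  if (payment.status !== 'pending') {
    return payment.status // final states are shown as-is
  }
  const waitedMs = now.getTime() - payment.createdAt.getTime()
  // Past the timeout, flag for review instead of leaving the user waiting forever.
  return waitedMs > timeoutMinutes * 60 * 1000 ? 'requires_review' : 'awaiting_confirmation'
}
```

Keeping this as a pure function of the stored state and the current time means the timeout can be changed without touching the payment records themselves.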
Architecture summary
The complete architecture for payment webhooks in production comes down to these layers:
- Signature validation — reject any unauthenticated webhook
- Idempotency — store event IDs and ignore duplicates
- Immediate ack — respond 200 and process in background
- State machine — validate state transitions and handle out-of-order events
- Reconciliation — periodic job that syncs with the provider
- Dead letter queue — repeatedly failed events go to investigation
- Graceful degradation — the system keeps working when the provider fails
Each layer is a safety net for the previous one. None of them alone solves the problem — it's the combination that makes the system reliable enough to handle real money in production.
