Vinicius Aguiar
Architecture

Webhook architecture for payment providers in Node.js

Apr 15, 2026 · 10 min read

Integrating payments into a real product goes far beyond following the provider's documentation. When real money flows through the system, every silent failure is a problem — duplicate charges, inconsistent status, or sales that were never confirmed. In this article, I present the architecture I use to receive and process webhooks from providers like Stripe, Mercado Pago, and Asaas in Node.js applications.

The real problem

Webhooks are how payment providers notify your application about events — a PIX confirmed, a subscription cancelled, a dispute opened. The problem is that this mechanism is inherently unreliable: webhooks can arrive duplicated, out of order, delayed, or simply not arrive at all.

If your application isn't prepared to handle these scenarios, you'll discover the problem when a customer complains they paid but didn't get access — or worse, when finance notices the numbers don't add up.

The diagram below shows the complete flow we'll build in this article:

Figure: payment webhook lifecycle (validation, idempotency, queue, worker, retry, and reconciliation)

Signature validation

The first security layer is validating that the webhook actually came from the provider. Each provider implements this differently:

  • Stripe: sends a Stripe-Signature header with timestamp + HMAC-SHA256 of the body
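  • Mercado Pago: sends an x-signature header carrying a timestamp (ts) and an HMAC-SHA256 hash (v1), which you recompute from the notification's data.id query param, the x-request-id header, and the timestamp; the payload carries only the resource ID, so you then fetch the payment from the API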
  • Asaas: sends a token in the header that must be compared with the one configured in the dashboard

The rule is simple: never process a webhook without validating the signature. Without this, anyone can send a POST to your endpoint and simulate a payment confirmation.

import Stripe from 'stripe'

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)

export async function POST(req: Request) {
  const body = await req.text()
  const signature = req.headers.get('stripe-signature')!

  let event: Stripe.Event
  try {
    event = stripe.webhooks.constructEvent(
      body,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!
    )
  } catch (err) {
    return new Response('Invalid signature', { status: 400 })
  }

  // Process validated event
  await processEvent(event)
  return new Response('OK', { status: 200 })
}
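
For providers that don't ship an SDK helper like constructEvent, the same check can be written by hand. The sketch below is a minimal illustration, not any provider's official method: verifyHmacSignature and verifyStaticToken are hypothetical helper names, and the exact string to sign and the header format vary by provider, so check each provider's documentation. Both comparisons run in constant time so that response timing doesn't leak how much of a signature or token matched.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical helper: verifies a hex-encoded HMAC-SHA256 signature
// computed over the raw request body.
function verifyHmacSignature(
  rawBody: string,
  signatureHex: string,
  secret: string
): boolean {
  const expected = createHmac('sha256', secret).update(rawBody, 'utf8').digest()
  const received = Buffer.from(signatureHex, 'hex')
  // timingSafeEqual throws when lengths differ, so check the length first
  if (received.length !== expected.length) return false
  return timingSafeEqual(received, expected)
}

// Hypothetical helper for static-token providers: both values are hashed
// to a fixed length before a constant-time comparison.
function verifyStaticToken(received: string | null, configured: string): boolean {
  if (!received) return false
  const a = createHmac('sha256', configured).update(received).digest()
  const b = createHmac('sha256', configured).update(configured).digest()
  return timingSafeEqual(a, b)
}
```

Note that the signature must be computed over the raw body exactly as received; parsing and re-serializing the JSON first usually changes the bytes and breaks verification.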

Idempotency: preventing duplicate processing

Payment providers resend webhooks when they don't receive a 2xx response. This means the same event can arrive multiple times. If your handler isn't idempotent, you might credit a customer's balance twice or send two confirmation emails.

The solution is to store the event ID and check before processing:

async function processEvent(event: PaymentEvent) {
  // Check if already processed
  const existing = await db.webhookEvents.findUnique({
    where: { eventId: event.id }
  })

  if (existing) {
    return // Already processed, skip
  }

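  // Record the event before processing. This assumes a unique constraint on
  // eventId, so a concurrent duplicate that slipped past the check above
  // fails here instead of being processed twice.
  await db.webhookEvents.create({
    data: {
      eventId: event.id,
      provider: 'stripe',
      type: event.type,
      processedAt: new Date()
    }
  })

  try {
    switch (event.type) {
      case 'payment_intent.succeeded':
        await handlePaymentSuccess(event.data)
        break
      case 'payment_intent.payment_failed':
        await handlePaymentFailure(event.data)
        break
    }
  } catch (err) {
    // Processing failed: remove the record so the provider's retry
    // is not skipped as "already processed"
    await db.webhookEvents.delete({ where: { eventId: event.id } })
    throw err
  }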
}
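
The two properties that matter here are that claiming an event is a single atomic step and that a failed run releases the claim. The sketch below illustrates both with an in-memory stand-in; InMemoryEventStore and processOnce are hypothetical names, and in a real system the claim would be an insert into a table with a unique constraint on the event ID rather than a Set.

```typescript
type ClaimResult = 'claimed' | 'duplicate'

// In-memory stand-in for a table with a unique constraint on eventId
class InMemoryEventStore {
  private seen = new Set<string>()

  // Check-and-insert in one step, like an INSERT that fails on a unique key
  claim(eventId: string): ClaimResult {
    if (this.seen.has(eventId)) return 'duplicate'
    this.seen.add(eventId)
    return 'claimed'
  }

  // Release the claim so a later redelivery can try again
  release(eventId: string): void {
    this.seen.delete(eventId)
  }
}

async function processOnce(
  store: InMemoryEventStore,
  eventId: string,
  handler: () => Promise<void>
): Promise<'processed' | 'skipped'> {
  if (store.claim(eventId) === 'duplicate') return 'skipped'
  try {
    await handler()
    return 'processed'
  } catch (err) {
    // Undo the claim, otherwise the retry would be skipped as a duplicate
    store.release(eventId)
    throw err
  }
}
```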

Async processing: immediate ack

A critical rule: respond 200 as fast as possible. If your handler takes too long to respond (because it's updating the database, sending emails, calling another API), the provider will consider it failed and resend — causing more load and potential duplicates.

The correct pattern is immediate ack + background processing:

export async function POST(req: Request) {
  const event = await validateAndParse(req)
  if (!event) {
    return new Response('Invalid', { status: 400 })
  }

  // Save to queue for async processing
  await db.webhookQueue.create({
    data: {
      eventId: event.id,
      payload: JSON.stringify(event),
      status: 'pending'
    }
  })

  // Respond immediately
  return new Response('OK', { status: 200 })
}

// Separate worker processes the queue
async function processQueue() {
  const pending = await db.webhookQueue.findMany({
    where: { status: 'pending' },
    orderBy: { createdAt: 'asc' }
  })

  for (const item of pending) {
    try {
      await processEvent(JSON.parse(item.payload))
      await db.webhookQueue.update({
        where: { id: item.id },
        data: { status: 'processed' }
      })
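    } catch (err) {
      // Note: 'failed' items are not picked up again by this loop, which
      // only reads 'pending' rows; they need a separate retry policy
      await db.webhookQueue.update({
        where: { id: item.id },
        data: {
          status: 'failed',
          retryCount: { increment: 1 },
          // err is `unknown` under strict TypeScript, so narrow it first
          lastError: err instanceof Error ? err.message : String(err)
        }
      })
    }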
  }
}
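
The worker above loads every pending row on each pass. As the queue grows, it can help to bound each pass and skip items that are not yet due. The sketch below expresses that selection rule as a pure function over an array, which makes it easy to test; in a real database it would be a where clause with a limit. The QueueItem shape, the selectDueItems name, and the default batch size of 50 are assumptions for illustration.

```typescript
interface QueueItem {
  id: number
  status: 'pending' | 'processed' | 'failed'
  createdAt: Date
  nextRetryAt: Date | null // null means "never retried, due immediately"
}

// Returns pending items whose retry time has passed, oldest first,
// capped at batchSize. The input array is not mutated.
function selectDueItems(
  items: QueueItem[],
  now: Date,
  batchSize = 50
): QueueItem[] {
  return items
    .filter((i) => i.status === 'pending')
    .filter((i) => i.nextRetryAt === null || i.nextRetryAt.getTime() <= now.getTime())
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())
    .slice(0, batchSize)
}
```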

Out-of-order webhooks

A real-world scenario that happens frequently: the payment.failed webhook arrives before payment.created. Or the provider sends refund.completed before payment.succeeded. If your system depends on a specific order, it will break.

Two approaches to handle this:

  • State machine: define valid transitions for each payment status. If an event attempts an invalid transition (e.g., refund before success), enqueue for later reprocessing.
  • Provider timestamp: use the event's timestamp (not the received timestamp) to determine which state is more recent. Ignore events older than the current state.

async function handlePaymentUpdate(event: PaymentEvent) {
  const payment = await db.payments.findUnique({
    where: { providerPaymentId: event.paymentId }
  })

  if (!payment) {
    // Payment doesn't exist yet, enqueue for retry
    await enqueueForRetry(event)
    return
  }

  // Ignore events older than current state
  if (event.timestamp <= payment.lastEventTimestamp) {
    return
  }

  // Validate state transition
  const validTransitions: Record<string, string[]> = {
    pending: ['confirmed', 'failed', 'cancelled'],
    confirmed: ['refunded', 'disputed'],
    failed: ['pending'] // provider retry
  }

  if (!validTransitions[payment.status]?.includes(event.newStatus)) {
    await enqueueForRetry(event)
    return
  }

  await db.payments.update({
    where: { id: payment.id },
    data: {
      status: event.newStatus,
      lastEventTimestamp: event.timestamp
    }
  })
}
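
The decision logic in this handler can also be pulled out into a pure function, which makes the three outcomes (apply, ignore, retry) explicit and testable without a database. This is a sketch under assumptions: decideTransition is a hypothetical name, timestamps are plain numbers, and states with no outgoing transitions are treated as terminal, so events targeting them are ignored rather than retried. If your provider can move a payment out of a state like disputed, add that transition to the table.

```typescript
type Status = 'pending' | 'confirmed' | 'failed' | 'cancelled' | 'refunded' | 'disputed'
type Decision = 'apply' | 'ignore' | 'retry'

const validTransitions: Record<Status, Status[]> = {
  pending: ['confirmed', 'failed', 'cancelled'],
  confirmed: ['refunded', 'disputed'],
  failed: ['pending'], // provider retry
  cancelled: [],
  refunded: [],
  disputed: []
}

function decideTransition(
  current: { status: Status; lastEventTimestamp: number },
  incoming: { newStatus: Status; timestamp: number }
): Decision {
  // Stale event: the stored state is already newer
  if (incoming.timestamp <= current.lastEventTimestamp) return 'ignore'
  if (validTransitions[current.status].includes(incoming.newStatus)) return 'apply'
  // Terminal state: no later event can make this transition valid
  if (validTransitions[current.status].length === 0) return 'ignore'
  // Probably out of order: an earlier event has not arrived yet
  return 'retry'
}
```

For example, a refund that arrives while the payment is still pending yields 'retry', and becomes 'apply' once the confirmation has been processed.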

PIX: differences between providers

PIX is the most widely used payment method in Brazil, but each provider implements confirmation differently:

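  • Stripe: PIX availability depends on the account's country and eligibility, so check Stripe's current documentation before relying on it (bank transfers are the usual alternative when it isn't available)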
  • Mercado Pago: PIX confirmation usually arrives within seconds via webhook, but can be delayed up to minutes during peak hours
  • Asaas: confirmation can take from seconds to minutes, and the confirmation webhook sometimes arrives before the creation webhook

In practice, this means you cannot rely on a specific order of events for PIX. The system needs to be resilient to confirmations arriving before creation, long delays, and cases where the webhook simply doesn't arrive.

Reconciliation: when state diverges

Even with all the protections above, there will be times when your application's local state and the provider's state diverge: a webhook that never arrived, a bug in the handler, or a deploy that took the worker down for a few minutes.

The solution is a reconciliation job that runs periodically:

async function reconcilePayments() {
  // Find payments pending for over 30 minutes
  const stalePayments = await db.payments.findMany({
    where: {
      status: 'pending',
      createdAt: {
        lt: new Date(Date.now() - 30 * 60 * 1000)
      }
    }
  })

  for (const payment of stalePayments) {
    try {
      // Query status directly from provider
      const providerStatus = await getProviderStatus(
        payment.provider,
        payment.providerPaymentId
      )

      if (providerStatus !== payment.status) {
        await db.payments.update({
          where: { id: payment.id },
          data: { status: providerStatus }
        })

        // Log for auditing
        await db.reconciliationLog.create({
          data: {
            paymentId: payment.id,
            previousStatus: payment.status,
            newStatus: providerStatus,
            reason: 'reconciliation_job'
          }
        })
      }
    } catch (err) {
      // One failed lookup (e.g. provider downtime) should not abort the
      // whole run; the payment stays pending and is retried on the next pass
      console.error(`Reconciliation failed for payment ${payment.id}`, err)
    }
  }
}

This job is the ultimate safety net. It ensures that even when everything fails — lost webhooks, handler bugs, provider downtime — the system eventually converges to the correct state.
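
Comparing the provider's status with the local one only works if both use the same vocabulary, so a helper like getProviderStatus has to translate each provider's raw status first. The sketch below shows one way to do that. The mapping covers only an illustrative subset of each provider's statuses and should be verified against their documentation; normalizeStatus and the internal status names are assumptions. Unknown values return null so the job can skip them instead of guessing.

```typescript
type InternalStatus = 'pending' | 'confirmed' | 'failed' | 'cancelled' | 'refunded'
type Provider = 'stripe' | 'mercadopago' | 'asaas'

// Illustrative subset only: each provider defines more statuses than these
const statusMap: Record<Provider, Record<string, InternalStatus>> = {
  stripe: {
    succeeded: 'confirmed',
    processing: 'pending',
    canceled: 'cancelled'
  },
  mercadopago: {
    approved: 'confirmed',
    pending: 'pending',
    rejected: 'failed',
    cancelled: 'cancelled',
    refunded: 'refunded'
  },
  asaas: {
    PENDING: 'pending',
    RECEIVED: 'confirmed',
    CONFIRMED: 'confirmed',
    REFUNDED: 'refunded'
  }
}

function normalizeStatus(provider: Provider, raw: string): InternalStatus | null {
  const table = statusMap[provider]
  // Own-property check so inherited keys like "toString" never match
  if (!Object.prototype.hasOwnProperty.call(table, raw)) return null
  return table[raw] ?? null
}
```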

Dead letter queue: what to do when it fails

When webhook processing fails repeatedly (3-5 attempts), it should go to a dead letter queue — a separate table for events that need manual intervention or investigation.

async function processWithRetry(item: WebhookQueueItem) {
  const MAX_RETRIES = 5

  if (item.retryCount >= MAX_RETRIES) {
    // Move to dead letter queue
    await db.deadLetterQueue.create({
      data: {
        eventId: item.eventId,
        payload: item.payload,
        lastError: item.lastError,
        failedAt: new Date()
      }
    })

    await db.webhookQueue.delete({
      where: { id: item.id }
    })

    // Alert the team
    await notify(`Webhook ${item.eventId} failed ${MAX_RETRIES}x and was moved to DLQ`)
    return
  }

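  // Try to process with exponential backoff
  try {
    await processEvent(JSON.parse(item.payload))
    // Mark as done so the item is not picked up again
    await db.webhookQueue.update({
      where: { id: item.id },
      data: { status: 'processed' }
    })
  } catch (err) {
    const nextRetry = new Date(
      Date.now() + Math.pow(2, item.retryCount) * 1000
    )
    await db.webhookQueue.update({
      where: { id: item.id },
      data: {
        retryCount: { increment: 1 },
        // err is `unknown` under strict TypeScript, so narrow it first
        lastError: err instanceof Error ? err.message : String(err),
        nextRetryAt: nextRetry
      }
    })
  }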
}
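
The backoff above doubles the delay on every attempt, with no upper bound and no randomness. With a small MAX_RETRIES that is fine, but if you raise the limit it is common to cap the delay and add jitter so that many failed items don't all retry at the same instant. A minimal sketch, assuming a 1 second base and a 5 minute cap (both arbitrary), with the random source injectable for testing; nextRetryDelayMs is a hypothetical name:

```typescript
// "Equal jitter": half of the capped exponential delay is fixed,
// the other half is randomized.
function nextRetryDelayMs(
  retryCount: number,
  baseMs = 1000,
  maxMs = 5 * 60 * 1000,
  random: () => number = Math.random
): number {
  const capped = Math.min(maxMs, baseMs * 2 ** retryCount)
  const half = capped / 2
  return Math.floor(half + random() * half)
}
```

The result would replace the Math.pow(2, item.retryCount) * 1000 term when computing nextRetryAt.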

Graceful degradation

When a payment provider is down, your application can't simply break. The user needs to know that the payment is being processed, even if confirmation takes longer. Some practices:

  • Intermediate status: use a state like awaiting_confirmation that the user sees while waiting for provider confirmation
  • Configurable timeouts: if confirmation doesn't arrive in X minutes, mark as requires_review instead of failing silently
  • Provider fallback: if one provider is down, offer another payment method as an alternative
  • Clear communication: notify the user about the actual status instead of showing an endless loading spinner
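
The first two practices can be combined into a small function that derives what the user sees from the stored payment. This is a sketch, not a prescribed design: displayStatus is a hypothetical name, the status names follow the ones used in this section, and the 30 minute default stands in for the configurable "X minutes".

```typescript
type PaymentStatus = 'pending' | 'confirmed' | 'failed'
type DisplayStatus =
  | 'awaiting_confirmation'
  | 'requires_review'
  | 'confirmed'
  | 'failed'

function displayStatus(
  payment: { status: PaymentStatus; createdAt: Date },
  now: Date,
  reviewAfterMs = 30 * 60 * 1000 // assumed default, make it configurable
): DisplayStatus {
  // Final states are shown as they are
  if (payment.status !== 'pending') return payment.status
  const ageMs = now.getTime() - payment.createdAt.getTime()
  // Pending for too long: surface it for review instead of waiting forever
  return ageMs > reviewAfterMs ? 'requires_review' : 'awaiting_confirmation'
}
```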

Architecture summary

The complete architecture for payment webhooks in production comes down to these layers:

  1. Signature validation — reject any unauthenticated webhook
  2. Idempotency — store event IDs and ignore duplicates
  3. Immediate ack — respond 200 and process in background
  4. State machine — validate state transitions and handle out-of-order events
  5. Reconciliation — periodic job that syncs with the provider
  6. Dead letter queue — repeatedly failed events go to investigation
  7. Graceful degradation — the system keeps working when the provider fails

Each layer is a safety net for the previous one. None of them alone solves the problem — it's the combination that makes the system reliable enough to handle real money in production.