Webhooks not running on Fargate - Umbraco 17

I have an Umbraco 17.3.2 instance running on Fargate where the webhooks suddenly stopped running on April 16th. There are no Umbraco logs about it, nothing appears in the Webhook logs table, and there are no errors in CloudWatch. There were no Fargate deployments that day and no network changes, although I would expect those to lead to errors rather than nothing. It all works when I run the published code on my Win 10 workstation against a straight copy of the database, so I am at a loss.
Nothing I have tried has made any difference.

The Docker image uses sdk:10.0 for the build and aspnet:10.0 to run.

Has anyone got an idea what might be afoot?

Hi @binraider

Are you able to confirm the server role in the database? Webhooks only run on the SchedulingPublisher role, not on Subscriber.

SELECT id, address, computerName, registeredDate, lastNotifiedDate, isActive, isSchedulingPublisher
FROM umbracoServer
ORDER BY registeredDate;

It could be that another instance has started and is running as Subscriber, so webhooks don’t fire.

Also, try changing your logging level to see if you can capture any debug logs.

"Logging": {
  "LogLevel": {
    "Umbraco.Cms.Infrastructure.HostedServices": "Debug",
    "Umbraco.Cms.Core.Webhooks": "Debug",
    "Umbraco.Cms.Core.Sync": "Debug"
  }
}

It's also worth checking that webhooks are actually enabled in appsettings.

Justin

Yes, I checked the umbracoServer table for that and it is/was set to SchedulingPublisher. I will add the extra logging. Also, the Task Definition is set to only run 1 instance; it's not meant to be scaled. Yes, the Webhooks section in appsettings is set. I have been trying things for hours.

Are you able to restart the site to see if that helps or have you already tried that?

Maybe try clearing the umbracoServer table and restarting just to rule that out?

Hi @binraider ,

@justin-nevitech has mostly covered everything. I just want to add a few bits that might help.

Since the setup on the Umbraco side looks good, I think these are the two most likely culprits:

1. Certificate Expiration: Your local Windows 10 machine automatically updates its Trusted Root Certificate Store in the background. Your Fargate task runs a Linux image (aspnet:10.0) which has a static list of trusted certs. If the destination webhook URL updated its SSL certificate on April 16th to an authority the Linux image doesn’t recognize, the background task will reject the connection and die silently. (Here is a massive GitHub thread of .NET developers hitting this exact Linux vs. Windows SSL issue: dotnet/runtime #27703)

2. The Network “Black Hole”: If an AWS Security Group or NAT Gateway was changed and is now dropping outbound packets, the .NET HttpClient doesn’t get an error back. It just hangs waiting for a response until its timeout elapses (100 seconds by default for HttpClient). This can block the entire webhook queue silently. (AWS has a good troubleshooting guide for this specific Fargate outbound block here: Troubleshoot Fargate outbound connections)

The Fastest Way to Test This: To stop guessing and prove if it’s the network or a certificate, you should bypass Umbraco entirely.

Use AWS ECS Exec to drop directly into your running Fargate container’s shell and run a raw curl command to the webhook destination:

curl -v -X POST https://your-webhook-endpoint.com

  • If it hangs indefinitely, it’s an AWS network/firewall block.

  • If it throws an SSL handshake error, it’s the certificate trust issue.

If that curl command fails, you have absolute proof that the issue is the AWS environment.
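As a rough sketch of the same test with hard time limits, so a dropped connection fails fast instead of hanging: the function names `classify_curl_exit` and `probe_webhook` are made up for illustration, the URL is the placeholder from above, and the mapping relies on curl's documented exit codes (6, 7, 28, 35, 60). This assumes curl is available in the container, which may not be the case for a slim runtime image.

```shell
#!/bin/sh
# Hypothetical helper: map a curl exit code to a likely cause.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok: the endpoint answered" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: failed to connect" ;;
    28) echo "network: timed out (possible security group or NAT block)" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: server certificate not trusted by this image" ;;
    *)  echo "other: curl exit code $1" ;;
  esac
}

# Hypothetical wrapper: POST to the endpoint with a 5 second connect limit
# and a 15 second overall limit, then report what the exit code suggests.
probe_webhook() {
  curl -sS -o /dev/null --connect-timeout 5 --max-time 15 -X POST "$1"
  classify_curl_exit "$?"
}

# Usage (placeholder URL):
# probe_webhook https://your-webhook-endpoint.com
```

A timeout (exit code 28) points towards the network side, while 35 or 60 point towards TLS and certificate trust. Any other code needs a look at the `-v` output.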

Hope it helps!

That's really helpful. I will see about doing that now.

@binraider If you’re still struggling, I asked AI to generate some debug logging helpers that you can put in your site to see if you can get some more information about webhooks. Feel free to use as you see fit:

// =============================================================================
// WebhookDiagnostics.cs
//
// Drop-in diagnostic for an Umbraco 17 site where webhooks have stopped firing
// silently. Add this file to any project in the solution that has a reference
// to Umbraco.Cms.Core and Umbraco.Cms.Infrastructure (typically the web app
// project itself). It will auto-compose at startup — no extra wiring needed.
//
// What it does, all logged at WARNING level so it shows up in CloudWatch:
//
//   1. Logs every ContentPublishedNotification with the runtime server role
//      and accessor type. (Tells you whether the role check in
//      WebhookEventBase.HandleAsync is the silent killer.)
//
//   2. Decorates IWebhookFiringService and logs every FireAsync call, plus
//      whether it completed or threw. (Tells you whether requests are making
//      it into umbracoWebhookRequest.)
//
//   3. On application start, dumps the list of registered IWebhookEvent
//      aliases. (Catches a stray composer that cleared the event collection.)
//
//   4. Logs every RecurringBackgroundJobIgnoredNotification with the reason
//      class. (Tells you if TouchServerJob — the thing that updates the
//      runtime server role — is being skipped, e.g. because of MainDom.)
//
//   5. Exposes GET /_diag/webhooks returning the runtime role and the
//      registered webhook event aliases, so you can curl it without redeploying.
//
// To remove: just delete this file. There is no other footprint.
// =============================================================================

using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Core.Events;
using Umbraco.Cms.Core.Models;
using Umbraco.Cms.Core.Notifications;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Core.Sync;
using Umbraco.Cms.Core.Webhooks;
using Umbraco.Cms.Infrastructure.Notifications;
using Umbraco.Cms.Infrastructure.Services.Implement;
using Umbraco.Extensions; // AddUnique extension methods

namespace WebhookDiagnostics;

// -----------------------------------------------------------------------------
// 1. Composer — wires everything up. Auto-discovered by Umbraco at startup.
// -----------------------------------------------------------------------------
public class WebhookDiagnosticComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        // Notification handlers
        builder.AddNotificationAsyncHandler<ContentPublishedNotification, ContentPublishedDiagnosticHandler>();
        builder.AddNotificationAsyncHandler<UmbracoApplicationStartedNotification, WebhookEventDumpHandler>();
        builder.AddNotificationAsyncHandler<RecurringBackgroundJobIgnoredNotification, JobIgnoredHandler>();

        // Decorate IWebhookFiringService.
        // We register the original concrete type so we can resolve it as the
        // inner; then we replace the IWebhookFiringService registration with
        // a factory that builds the decorator. AddUnique<TService, TImpl>()
        // would *remove* the original from DI before we got to wrap it, which
        // is why we go via the factory overload.
        builder.Services.AddTransient<WebhookFiringService>();
        builder.Services.AddUnique<IWebhookFiringService>(sp =>
            new LoggingWebhookFiringService(
                sp.GetRequiredService<WebhookFiringService>(),
                sp.GetRequiredService<ILogger<LoggingWebhookFiringService>>()));
    }
}

// -----------------------------------------------------------------------------
// 2. ContentPublishedNotification handler.
//    Tells you whether the notification is being raised at all, and what the
//    runtime server role looks like at the moment of dispatch. The role check
//    inside WebhookEventBase.HandleAsync that bails out silently uses exactly
//    this same IServerRoleAccessor, so this is the authoritative reading.
// -----------------------------------------------------------------------------
public class ContentPublishedDiagnosticHandler : INotificationAsyncHandler<ContentPublishedNotification>
{
    private readonly ILogger<ContentPublishedDiagnosticHandler> _logger;
    private readonly IServerRoleAccessor _roleAccessor;

    public ContentPublishedDiagnosticHandler(
        ILogger<ContentPublishedDiagnosticHandler> logger,
        IServerRoleAccessor roleAccessor)
    {
        _logger = logger;
        _roleAccessor = roleAccessor;
    }

    public Task HandleAsync(ContentPublishedNotification notification, CancellationToken cancellationToken)
    {
        var role = _roleAccessor.CurrentServerRole;
        var willFire = role is ServerRole.Single or ServerRole.SchedulingPublisher;

        _logger.LogWarning(
            "WEBHOOK_DIAG: ContentPublished received. Role={Role} (WillFireWebhooks={WillFire}), " +
            "AccessorType={AccessorType}, PublishedCount={Count}, Machine={Machine}",
            role,
            willFire,
            _roleAccessor.GetType().Name,
            notification.PublishedEntities.Count(),
            Environment.MachineName);

        return Task.CompletedTask;
    }
}

// -----------------------------------------------------------------------------
// 3. IWebhookFiringService decorator.
//    Logs every attempt to queue a webhook request. If you see "FireAsync
//    completed" but no row in umbracoWebhookRequest, something is eating the
//    insert. If you don't see FireAsync called at all when content publishes,
//    the role check or notification flow is the blocker — not the queueing.
// -----------------------------------------------------------------------------
public class LoggingWebhookFiringService : IWebhookFiringService
{
    private readonly IWebhookFiringService _inner;
    private readonly ILogger<LoggingWebhookFiringService> _logger;

    public LoggingWebhookFiringService(
        IWebhookFiringService inner,
        ILogger<LoggingWebhookFiringService> logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task FireAsync(IWebhook webhook, string eventAlias, object? payload, CancellationToken cancellationToken)
    {
        _logger.LogWarning(
            "WEBHOOK_DIAG: FireAsync called. Alias={Alias}, Url={Url}, WebhookKey={Key}, Enabled={Enabled}",
            eventAlias, webhook.Url, webhook.Key, webhook.Enabled);

        try
        {
            await _inner.FireAsync(webhook, eventAlias, payload, cancellationToken);
            _logger.LogWarning("WEBHOOK_DIAG: FireAsync completed (request should now be in umbracoWebhookRequest). Alias={Alias}", eventAlias);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "WEBHOOK_DIAG: FireAsync threw. Alias={Alias}", eventAlias);
            throw;
        }
    }
}

// -----------------------------------------------------------------------------
// 4. Startup dump of registered webhook events.
//    If your event alias (e.g. "Umbraco.ContentPublish") is not in this list,
//    something — usually a composer doing `.WebhookEvents().Clear()` without
//    re-adding — has nuked the registration.
// -----------------------------------------------------------------------------
public class WebhookEventDumpHandler : INotificationAsyncHandler<UmbracoApplicationStartedNotification>
{
    private readonly WebhookEventCollection _events;
    private readonly ILogger<WebhookEventDumpHandler> _logger;

    public WebhookEventDumpHandler(WebhookEventCollection events, ILogger<WebhookEventDumpHandler> logger)
    {
        _events = events;
        _logger = logger;
    }

    public Task HandleAsync(UmbracoApplicationStartedNotification notification, CancellationToken cancellationToken)
    {
        var aliases = _events.Select(e => e.Alias).OrderBy(a => a).ToArray();
        _logger.LogWarning(
            "WEBHOOK_DIAG: Application started. Registered webhook events ({Count}): {Aliases}",
            aliases.Length,
            string.Join(", ", aliases));
        return Task.CompletedTask;
    }
}

// -----------------------------------------------------------------------------
// 5. RecurringBackgroundJobIgnored handler.
//    The webhook *firing* job (WebhookFiring) is a distributed job in v17 and
//    won't show up here. But TouchServerJob *is* recurring, and if it's being
//    ignored — runtime not at Run, role mismatch, or MainDom — the in-memory
//    server role never updates and the queueing path silently dies. So if you
//    see TouchServerJob in this log, that's a strong lead.
// -----------------------------------------------------------------------------
public class JobIgnoredHandler : INotificationAsyncHandler<RecurringBackgroundJobIgnoredNotification>
{
    private readonly ILogger<JobIgnoredHandler> _logger;
    public JobIgnoredHandler(ILogger<JobIgnoredHandler> logger) => _logger = logger;

    public Task HandleAsync(RecurringBackgroundJobIgnoredNotification notification, CancellationToken cancellationToken)
    {
        _logger.LogWarning(
            "WEBHOOK_DIAG: Recurring job ignored: {JobType}",
            notification.Job.GetType().Name);
        return Task.CompletedTask;
    }
}

// -----------------------------------------------------------------------------
// 6. Diagnostic endpoint: GET /_diag/webhooks
//    Returns the runtime server role and a few useful counters. Hit it with
//    curl from inside or outside the container — no redeploy needed once this
//    file is shipped.
//
//    NOTE: this is unauthenticated by design (it's diagnostic, no secrets).
//    Remove the file or add [Authorize] before leaving it in prod.
// -----------------------------------------------------------------------------
[ApiController]
[Route("_diag/webhooks")]
public class WebhookDiagController : ControllerBase
{
    private readonly IServerRoleAccessor _roleAccessor;
    private readonly WebhookEventCollection _events;
    private readonly IWebhookFiringService _firingService;

    public WebhookDiagController(
        IServerRoleAccessor roleAccessor,
        WebhookEventCollection events,
        IWebhookFiringService firingService)
    {
        _roleAccessor = roleAccessor;
        _events = events;
        _firingService = firingService;
    }

    [HttpGet]
    public IActionResult Get() => Ok(new
    {
        machine = Environment.MachineName,
        utcNow = DateTime.UtcNow,
        runtimeRole = _roleAccessor.CurrentServerRole.ToString(),
        roleAccessorType = _roleAccessor.GetType().FullName,
        webhookFiringServiceType = _firingService.GetType().FullName, // should show LoggingWebhookFiringService
        registeredWebhookEventCount = _events.Count(),
        registeredWebhookEventAliases = _events.Select(e => e.Alias).OrderBy(a => a).ToArray(),
    });
}

Justin

So I found the problem: there was an extra task running that wasn’t showing on the Service page, only on the Cluster page. In normal operation I don’t bother looking at the Cluster page but go straight to the Service page, but as I was poking around I saw it: a loose cannon that had been running for the last 21 days. Once I stopped it, normal service returned.
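For anyone hitting the same thing, here is a minimal sketch of how the comparison could be done from the CLI rather than the console. The helper name `stray_tasks`, the file names, and the cluster and service names are placeholders. The idea is to list the running tasks for the whole cluster, list the ones that belong to the service, and print whatever is left over.

```shell
#!/bin/sh
# Hypothetical helper: print task ARNs that are in the cluster-wide list
# (file $1) but not in the service's list (file $2), one ARN per line.
stray_tasks() {
  # -v invert the match, -x whole lines only, -F fixed strings,
  # -f read the strings to exclude from the service list
  grep -v -x -F -f "$2" "$1"
}

# Usage sketch (my-cluster and my-service are placeholders):
# aws ecs list-tasks --cluster my-cluster --desired-status RUNNING \
#   --query 'taskArns[]' --output text | tr '\t' '\n' > cluster.txt
# aws ecs list-tasks --cluster my-cluster --service-name my-service \
#   --desired-status RUNNING --query 'taskArns[]' --output text | tr '\t' '\n' > service.txt
# stray_tasks cluster.txt service.txt
```

Anything it prints is running in the cluster without belonging to that service, and can be stopped with `aws ecs stop-task --cluster my-cluster --task <task-arn>` once you are sure it is not needed.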

Thanks to everyone for the help you have given me.