Anywhere long-running background tasks are happening, cancellation tokens should be used to cancel them when shutdown is called. Ideally with finer granularity than this, especially if the PDFs are chunky, but in principle this should happen:
    foreach (var pdf in pdfs)
    {
        // Stop before starting the next PDF if shutdown has been requested.
        if (cancellationToken.IsCancellationRequested)
        {
            return;
        }

        IndexPdf(pdf);
    }
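To illustrate the finer-grained version, here is a minimal sketch that checks the token per page rather than per PDF. Everything in it is hypothetical: the tuple standing in for a parsed PDF, the `IndexAll` helper and the `indexPage` callback are made up for the example, and in a real site the token would more likely come from something like `IHostApplicationLifetime.ApplicationStopping` than from a local `CancellationTokenSource`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for parsed PDFs: a name plus the text of each page.
// In a real indexer the pages would come from whatever PDF library is in use.
var pdfs = new List<(string Name, string[] Pages)>
{
    ("a.pdf", new[] { "page 1", "page 2" }),
    ("b.pdf", new[] { "page 1", "page 2", "page 3" }),
};

// Indexes page by page so a shutdown request is noticed mid-document.
// Returns false if it stopped early because cancellation was requested.
bool IndexAll(
    IEnumerable<(string Name, string[] Pages)> documents,
    Action<string, string> indexPage,
    CancellationToken token)
{
    foreach (var (name, pages) in documents)
    {
        foreach (var page in pages)
        {
            if (token.IsCancellationRequested)
            {
                return false;
            }

            indexPage(name, page);
        }
    }

    return true;
}

using var cts = new CancellationTokenSource();
var indexedPages = 0;

var completed = IndexAll(pdfs, (name, page) =>
{
    indexedPages++; // placeholder for the real per-page indexing work
    if (indexedPages == 3)
    {
        cts.Cancel(); // simulate shutdown arriving part-way through b.pdf
    }
}, cts.Token);

Console.WriteLine($"completed={completed}, pages indexed={indexedPages}");
```

Here the simulated shutdown arrives on the third page, so the loop stops part-way through the second PDF instead of finishing it. The trade-off is that a partially indexed document needs handling on the next start.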
Yes, but generally speaking, even with Azure’s shenanigans, the app should be cleanly shutting down. Examine has defensive coding around this either way.
It also has great logs, so it’s usually easy to pinpoint the real problem just by setting some namespaces to “Debug”.
I can guarantee that UglyToad/PDFPig has nothing using or checking cancellation tokens. Examine itself may well be handling this properly, but this third-party PDF indexing code does not shut down nicely and fails easily when indexing files. I don’t know if that might cause Examine to fail without a nice “shutdown”. But lived experience is that things lock, and we don’t know why. It’s all clutching at straws. A two-index-location setup, as described above, would solve all of these issues.
And yes, we generally just leave Debug mode on for our smaller sites, with a smallish file size limit and daily rollover on the logs, but then the log files are unreadable outside Umbraco, and the log search inside Umbraco isn’t the best, so it’s still difficult. Maybe we’ll do as you say and turn on Debug for only the Examine namespace for a while and see.
But also, why not log to an external store? Seq, Application Insights, or any other available Serilog sink that works for you?
I prefer Seq (centralized structured logs), and you can limit the Umbraco file-based logging if you aren’t using it, to save disk space :-). I believe <PackageReference Include="Serilog.Sinks.Seq" Version="7.0.1" /> fits with the Serilog version in Umbraco v13.
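As a rough sketch of how the two suggestions above might look together in appsettings.json: the fragment below raises only the Examine namespace to Debug and adds a Seq sink. It assumes the site reads its Serilog settings from configuration in the usual Serilog.Settings.Configuration shape; the `serverUrl` is a placeholder for wherever your Seq instance lives, so check it against your own setup before relying on it.

```json
{
  "Serilog": {
    "Using": [ "Serilog.Sinks.Seq" ],
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Examine": "Debug"
      }
    },
    "WriteTo": [
      {
        "Name": "Seq",
        "Args": { "serverUrl": "http://localhost:5341" }
      }
    ]
  }
}
```

Keeping the default level at Information means the rest of the site stays quiet while Examine logs in detail.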
@bythewiseman So your issue is the Azure locking whilst running. The trigger is easy to diagnose - Azure will automatically restart instances for a number of reasons including updates to software, after certain GC activity etc. You can avoid the infrastructure-based restarts by using the WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG config setting I posted earlier, as that is its purpose. I think this will largely solve your issue.
The actual cause, I don’t have a solution for but I don’t believe it is Examine itself.
On the flip side, I have a solution for my issue of indexes locking whilst deploying. Although I am yet to test it, I am confident it will work. This article here describes a little known feature (re)introduced in .NET called ShadowCopy. This is a fantastic writeup and I seriously suggest people read it.
The key bit for me is here:
… this is due to the application still running pending requests or running some background operations that have not completed and released their background threads. End result: In some cases the IIS application does not unload.
Sometimes you can wait a little bit and try again, but if the application is super busy or has long running requests or background services it might be increasingly difficult to update the application without explicitly shutting down the Web application on the server
This is nothing new - we all know this but I think that the fact that Umbraco runs in a nested app domain and is heavily reliant on background services means that this locking situation isn’t going to have a code-based fix. Maybe running on Linux will be better, without IIS getting in the way?
Personally, I think this is going to remain an unresolved issue with v13 and using the Search provider with v17 is going to be the way forward.
I’m not sure that Rick’s approach will work with Umbraco, as IIRC, Umbraco does make assumptions based on Assembly.GetExecutingAssembly().
it might be increasingly difficult to update the application without explicitly shutting down the Web application on the server
It’s important to make sure that Umbraco isn’t still running if you deploy over the top of it.
To that end, some of our CI pipelines have an explicit stop/start call to IIS when deploying to VPSs where app_offline.htm wasn’t effective enough. I know I sound like a broken record here, but if you can work out why a site isn’t cleanly shutting down and fix it, app_offline.htm should be enough on its own.
For Azure Web Apps, deployment slots make this a non-issue, as you can deploy to a stopped instance. A much friendlier and more robust experience all round. You can even have your pipeline spin up ephemeral slots - 0% chance of file-locking if you’re deploying to a brand new empty slot.
If you think this is an issue, you could try setting WEBSITE_DISABLE_OVERLAPPED_RECYCLING=1 on your web app. This will ensure that the Web App’s VM is shut down before a new instance is spun up.
Maybe, maybe not, we’ll see. I’ll report back when I’ve done some experiments.
Correct (we already use this setting) but in @bythewiseman’s case he’s trying to avoid the app shutting down in the first place which is the trigger for the issue. This setting will just cause extra avoidable downtime.
That level of downtime isn’t acceptable to me or my clients. We’re all on the same page though re finding a fix.
However, I don’t believe this is necessarily an Examine, Umbraco or code issue. I believe this might be tied to .NET 8 itself. It’s difficult to prove but my theory is that with the performance increases that came in .NET 8, the shutdown process of Umbraco is hampered by additional overlapping calls to Examine that aren’t prevented in time. I could be wrong and I likely am but as I said earlier I think this is going to remain an unresolved issue with v13.
Cool, that takes IIS out of the equation completely. In which case I would concentrate on what you can control via config. Have you implemented the WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG appsetting key? If not, that’s the first place to start. If you are slot swapping, I imagine you already have the WEBSITE_DISABLE_OVERLAPPED_RECYCLING setting set but if you don’t, this should be set. It’s frustrating because with it you no longer benefit from the seamless slot swap but until it is possible to move away from Examine, I have found this to be necessary too.
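For reference, a sketch of how those two settings might look in the App Service application settings “Advanced edit” view. The values shown are the ones usually used to enable each setting, and the `slotSetting` choice is an assumption, so adjust it to suit your own slot setup.

```json
[
  {
    "name": "WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG",
    "value": "1",
    "slotSetting": false
  },
  {
    "name": "WEBSITE_DISABLE_OVERLAPPED_RECYCLING",
    "value": "1",
    "slotSetting": false
  }
]
```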
Fair enough, but you can’t stop this from being true:
For zero-downtime deployments with Umbraco, the only viable option is Blue-Green deployments. Whether that’s slot swaps or containers or something more manual on a VPS.
Interestingly, Rick Strahl’s approach is basically Blue-Green, but at the filesystem level only
A lot of people are using v13 in anger across Azure Web Apps and VPSs without constantly locking up. Right now I don’t even think WEBSITE_DISABLE_OVERLAPPED_RECYCLING is enabled on any of our clients’ production sites…
I feel like there must be something more unusual going on here.