Misconfigured Examine settings lead to OutOfMemoryExceptions on Azure Web App

The Question
We have an Umbraco 13.5.2 site hosted in Azure Web Apps. The site was misconfigured and, roughly once a month, would crash with OutOfMemoryExceptions. We’ve fixed the configuration. What I’m hoping to get some insight into is how the misconfiguration could have resulted in OutOfMemoryExceptions, because we’re trying to figure out whether there is more to the problem than just that misconfiguration.

Does anyone have any ideas on why our misconfigured Examine/Lucene settings would result in OutOfMemoryExceptions? Until I’ve found an explanation of how the misconfiguration could cause this exception, I’m inclined to believe that a separate problem is at play.

Thanks everyone!

The Setup
We have a backend (CMS) instance and a public, scalable instance (not currently scaled out at all), both pointed at the same database. The site periodically starts crashing and won’t come back up unless we remove all traffic from the public site for a while; recycling the app pool is not sufficient.

We have identified part of the problem: our Examine configuration was not correct for load balancing in Azure. Both the backend (CMS) instance and the frontend, scalable instance had:

{
    "Umbraco": {
        "CMS": {
            "Examine": {
                "LuceneDirectoryFactory" : "SyncedTempFileSystemDirectoryFactory"
            }
        }
    }
}

The frontend, scalable instance should instead have had:

{
    "Umbraco": {
        "CMS": {
            "Examine": {
                "LuceneDirectoryFactory" : "TempFileSystemDirectoryFactory"
            }
        }
    }
}
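
For reference, one way to vary this value per instance without maintaining two different appsettings.json files (a sketch, assuming both apps are Azure App Services deployed from the same build) is to override it with an application setting on the frontend App Service, using ASP.NET Core’s double-underscore convention for nested configuration keys:

    Umbraco__CMS__Examine__LuceneDirectoryFactory = TempFileSystemDirectoryFactory

App Service application settings are surfaced to the app as environment variables, and the default configuration builder lets environment variables override appsettings.json, so the deployed file can keep SyncedTempFileSystemDirectoryFactory for the backoffice while the frontend instance picks up TempFileSystemDirectoryFactory.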

The Symptoms
The site would crash, and we couldn’t fix it by simply recycling the app pool. We actually had to route traffic away from the site for a time until it had a chance to stabilize.

When we reviewed the Umbraco logs, we would see the following:

  1. The site would restart mid-day. We aren’t sure yet what triggered this.
  2. The site would go through its regular boot process.
  3. When the site booted back up, it would replicate the indexes from the %HOME% directory back to its %TEMP% directory with no apparent errors. There are info-level logs indicating this (a conceptual sketch of this step follows the list).
  4. Around one minute later, the site would throw an OutOfMemoryException and restart.
  5. The cycle would repeat.
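
For anyone less familiar with SyncedTempFileSystemDirectoryFactory, step 3 above roughly amounts to the copy below on boot. This is a conceptual C# sketch only, not Examine’s actual implementation, and the folder layout is an assumption:

using System;
using System.IO;

// Conceptual: with SyncedTempFileSystemDirectoryFactory the durable copy of the
// index lives under %HOME% (the persistent Azure App Service content share) and
// a working copy is rebuilt in machine-local %TEMP% on startup.
string home = Environment.GetEnvironmentVariable("HOME") ?? @"D:\home";
string source = Path.Combine(home, @"site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\ExternalIndex"); // assumed layout
string target = Path.Combine(Path.GetTempPath(), @"ExamineIndexes\ExternalIndex");

Directory.CreateDirectory(target);
foreach (string file in Directory.EnumerateFiles(source))
{
    // Copy each index file down to local temp; this is the replication that the
    // info-level boot logs refer to.
    File.Copy(file, Path.Combine(target, Path.GetFileName(file)), overwrite: true);
}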

This issue might be related/relevant:

Azure Web Apps are essentially transient: they get restarted, moved between machines, and so on, regularly. One of the things that can happen is that two instances end up online at the same time. If this happens with SyncedTempFileSystemDirectoryFactory, you can end up in a scenario where both instances are trying to write to the same index at the same time.

I’m not massively familiar with Examine, but what I think is happening in your case is that instance 1 is writing to the synced index while instance 2’s memory fills up with everything that it is waiting to write but can’t.
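
At the Lucene level, the contention looks roughly like the sketch below (hedged: this targets Lucene.Net 4.8, which the Examine version shipped with Umbraco 13 builds on, and it is not a claim about Examine’s internals; the index path is made up):

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Both instances point at the same physical index directory.
using var dir = FSDirectory.Open(@"D:\home\ExamineIndexes\ExternalIndex"); // hypothetical shared path
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48,
    new StandardAnalyzer(LuceneVersion.LUCENE_48));

try
{
    // Only one IndexWriter can hold the directory's write.lock at a time.
    using var writer = new IndexWriter(dir, config);
    // ... add or replace documents here ...
    writer.Commit();
}
catch (LockObtainFailedException)
{
    // The second instance lands here: it cannot acquire write.lock while the
    // first instance holds it. If it keeps buffering the documents it intends
    // to index while it waits or retries, that buffer can grow without bound.
}

That unbounded buffering of pending index work is a plausible route from “two writers on one index” to the OutOfMemoryExceptions you are seeing.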
