Over the past 6 months we’ve observed a slow but steady increase in memory usage over a 24-hour period, which eventually results in an OOM kill and ECS task replacement.
We are running Umbraco 13 in a high-availability configuration (Delivery/Mgmt) on AWS ECS, with each task currently sized at 1 vCPU / 4 GB RAM.
From our investigation (including repeated memory dumps and runtime monitoring), it appears the majority of memory growth is associated with the published content cache. As pages are accessed and the site is used, memory consumption increases gradually and does not appear to stabilise before hitting the container memory limit.
Some relevant scale details:
Hosting 10 websites
Approximately 650+ published pages total
Multi-site setup with shared content database
50k–100k monthly page views
I understand that Umbraco infrastructure sizing can vary significantly by implementation, and I haven’t been able to find concrete sizing guidance in the documentation for AWS.
Given the above:
Is 4 GB RAM per node considered undersized for this scale in Umbraco 13?
If not, what would be the recommended infrastructure configuration?
Any guidance on expected memory behavior or recommended sizing for this kind of workload would be greatly appreciated.
Have you tried doing any load testing to see how much your instances can take before they crash?
Also, if possible, you could gain some performance by jumping to Umbraco 17 and taking advantage of the hybrid cache setup.
The stress test will probably give you the right answer, but based on our experience we generally see around 8 GB of usage when the backoffice is used by editors while also serving data through the Delivery API. And our solution serves closer to 2 million page views a month (though with Cloudflare caching in front as well).
Based on the details of your setup, I always use the rule of thumb of "better safe than sorry". We are also in the process of moving our setup from a Windows Server Docker setup to a Kubernetes cluster, and we would rather give it too much power in the beginning and then, after 2-3 weeks, scale down to what fits the project best. I would rather pay the extra to ensure good performance than give each node too little and have to scale up.
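For the stress test, a minimal sketch using the `hey` load generator (an assumption on my part — substitute whatever tool you prefer, and your own URL):

```shell
# Drive 50 concurrent connections for 10 minutes against one page
# while watching the task's memory in CloudWatch for a plateau (or lack of one).
hey -z 10m -c 50 https://your-site.example/some-page
```

If memory keeps climbing linearly under a fixed, repeating set of URLs, that points at a leak rather than undersizing.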
Just to be sure, you did configure the platform for Production Mode, right?
It has a huge impact on performance if it's not set up and configured to spec.
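For reference, runtime mode is set in `appsettings.json` via the standard Umbraco setting (adjust to however you layer your configuration; note that Production mode expects precompiled models, i.e. ModelsBuilder set to `Nothing`):

```json
{
  "Umbraco": {
    "CMS": {
      "Runtime": {
        "Mode": "Production"
      },
      "ModelsBuilder": {
        "ModelsMode": "Nothing"
      }
    }
  }
}
```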
We have a platform with ~23k content nodes that sits at around 4GB memory, reported from the IIS process.
The platform also has Engage and a pretty good amount of custom code and integrations.
So as I see it, 4 GB should be plenty for a solution of your size.
Unless your content is… weird, 4GB should be perfectly fine. CPU is usually the bottleneck that needs addressing before memory becomes a problem.
It sounds like there’s a memory leak. In Umbraco versions before v15 (including v13), content is loaded into the NuCache eagerly at boot, so on its own the cache shouldn’t “grow” with pageviews.
I have seen this behaviour before. In our case, a rogue UmbracoHelper was being injected into the DI container and never disposed of. This meant the associated IPublishedSnapshot was never disposed of either so we ended up with multiple copies of content from the cache piling up in memory.
Look for any code that’s holding onto references to instances of IPublishedContent or IPublishedSnapshot, either directly or indirectly. Also look out for anything else that’s designed to be request-scoped/short-lived, that you might be holding onto across requests.
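As a sketch of the pattern to look for (the service names here are hypothetical; the Umbraco types are real v13 APIs): instead of caching `IPublishedContent` in a long-lived service, store only the content IDs and re-resolve them through `IUmbracoContextFactory` when needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using Umbraco.Cms.Core.Models.PublishedContent;
using Umbraco.Cms.Core.Web;

// Leaky: a singleton holding IPublishedContent pins the snapshot those
// items came from, so superseded cache generations can never be collected.
public class LeakyNavigationService
{
    private readonly List<IPublishedContent> _items = new(); // don't do this
}

// Safer: keep only IDs and resolve content through a disposable context.
public class NavigationService
{
    private readonly IUmbracoContextFactory _contextFactory;
    private readonly List<int> _itemIds = new();

    public NavigationService(IUmbracoContextFactory contextFactory)
        => _contextFactory = contextFactory;

    public IEnumerable<string?> GetItemNames()
    {
        // EnsureUmbracoContext returns a disposable reference;
        // disposing it releases the underlying published snapshot.
        using var contextReference = _contextFactory.EnsureUmbracoContext();
        var contentCache = contextReference.UmbracoContext.Content;
        return _itemIds
            .Select(id => contentCache?.GetById(id)?.Name)
            .ToList(); // materialise before the context is disposed
    }
}
```

The same reasoning applies to `UmbracoHelper`: it is registered as request-scoped, so injecting it into a singleton (directly or transitively) keeps a snapshot alive for the lifetime of the app.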