Unable to rebuild database cache

So, after some very sweaty and high-stress hours our site is finally back up! I’ll attempt to summarize:

After many attempts to try and get Umbraco to rebuild it’s cmsContentNu cache, we cloned the source code from Github and checked out the version tag we’re running (v13.11) to attempt to figure out what was causing the issues.

We set the Serilog log-level to just show Errors and saw a lot of “No data for media” exceptions, which led us to this line:

Since our site was very large we were throwing thousands of exceptions which caused the entire site to crash. So we patched this dll (Umbraco.PublishedCache.NuCache.dll) to instead of throwing exceptions, to simply log and move on. This patch made it possible to boot the site, but the pages looked weird, stuff was missing and it just seemed way off. But hey, it was up, which was progress!

So deeper into the rabbit hole we went, and we ended up writing another patch to the same dll, because when we attempted to click the “Rebuild” button on the database cache it just ticked for maybe 2 seconds then stopped, no errors, no logs, no nothing. so we ended up looking at this file:

And if you follow that you will see that it just swallows any exceptions that might be thrown, so our next patch was to catch these exceptions and log them.

From that log message we could create an SQL query that compiled of list of “Corrupt/Broken” documents in the database:

IF OBJECT_ID('tempdb..#CorruptDocs') IS NOT NULL DROP TABLE #CorruptDocs;

SELECT d.nodeId, 'missing_umbracoContent' AS reason
INTO #CorruptDocs
FROM umbracoDocument d
LEFT JOIN umbracoContent c ON c.nodeId = d.nodeId
WHERE c.nodeId IS NULL

UNION ALL
SELECT d.nodeId, 'missing_current_content_version'
FROM umbracoDocument d
LEFT JOIN umbracoContentVersion cv ON cv.nodeId = d.nodeId AND cv.[current] = 1
WHERE cv.id IS NULL

UNION ALL
SELECT d.nodeId, 'missing_document_version_for_current'
FROM umbracoDocument d
JOIN umbracoContentVersion cv ON cv.nodeId = d.nodeId AND cv.[current] = 1
LEFT JOIN umbracoDocumentVersion dv ON dv.id = cv.id
WHERE dv.id IS NULL

UNION ALL
SELECT d.nodeId, 'published_without_published_version'
FROM umbracoDocument d
WHERE d.published = 1
  AND NOT EXISTS (
      SELECT 1
      FROM umbracoContentVersion cv
      JOIN umbracoDocumentVersion dv ON dv.id = cv.id AND dv.published = 1
      WHERE cv.nodeId = d.nodeId
  );

SELECT reason, COUNT(*) AS cnt
FROM #CorruptDocs
GROUP BY reason
ORDER BY reason;

SELECT *
FROM #CorruptDocs
ORDER BY nodeId, reason;

This yieled a database table with around 45 broken nodes. Where there’s been a miss-match of various published statuses.

The fix was to fix these affected nodes with the following SQL script:

BEGIN TRAN;

DECLARE @Broken TABLE (nodeId INT PRIMARY KEY);
INSERT INTO @Broken (nodeId) VALUES
(4186),(4350),(5068),(5229),(6600),(9358),(10247),(13662),(14249),(14437),
(14442),(14913),(15182),(15192),(15197),(15198),(15234),(15605),(15664),(15816),
(16694),(16715),(17673),(17944),(18001),(18077),(20501),(27739),(28130),(28245),
(29461),(29463),(29574),(29863),(29872),(30479),(30489),(30578),(31565),(31583),
(31978),(31984),(32367),(32404),(32485);

-- 1) Ensure current-version has a row in umbracoDocumentVersion
-- (if missing, create it with published=1)
INSERT INTO umbracoDocumentVersion (id, templateId, published)
SELECT cv.id, NULL, 1
FROM umbracoContentVersion cv
JOIN @Broken b ON b.nodeId = cv.nodeId
LEFT JOIN umbracoDocumentVersion dv ON dv.id = cv.id
WHERE cv.[current] = 1
  AND dv.id IS NULL;

-- 2) Clear published-flag on all variants of this node
UPDATE dv
SET dv.published = 0
FROM umbracoDocumentVersion dv
JOIN umbracoContentVersion cv ON cv.id = dv.id
JOIN @Broken b ON b.nodeId = cv.nodeId;

-- 3) Set published=1 on current-version
UPDATE dv
SET dv.published = 1
FROM umbracoDocumentVersion dv
JOIN umbracoContentVersion cv ON cv.id = dv.id
JOIN @Broken b ON b.nodeId = cv.nodeId
WHERE cv.[current] = 1;

-- 4) Ensure document is published
UPDATE d
SET d.published = 1
FROM umbracoDocument d
JOIN @Broken b ON b.nodeId = d.nodeId;

-- 5) Verify, should yield 0 rows
SELECT d.nodeId
FROM umbracoDocument d
JOIN @Broken b ON b.nodeId = d.nodeId
WHERE d.published = 1
  AND NOT EXISTS (
      SELECT 1
      FROM umbracoContentVersion cv
      JOIN umbracoDocumentVersion dv ON dv.id = cv.id AND dv.published = 1
      WHERE cv.nodeId = d.nodeId
  );
-- COMMIT;
-- ROLLBACK;

Our verification step yielded 0 rows, which was expected. We could then stop the app-pool, clear the NuCache folders and restart everything back up. Now the entire site booted correctly and we could also click the Rebuild button on the Published Status screen and we could successfully rebuild the cmsContentNu cache!

My theory is that this was a follow up error for this issue we had some weeks ago: Unable to import settings, multilingual site "DocumentCultureVariationDto was not found for node"

(I know the forum post is old, but this client hasn’t wanted us to do any work on this site up until now, so that’s why we hadn’t taken any action for so long)

Big thanks to everyone that helped me yesterday!

@justin-nevitech @ShekharTarare

3 Likes