Excited to announce that I have just published a new open-source Umbraco package to NuGet: Umbraco.Community.Examine.OpenXml.
If you have ever needed to make Word documents, Excel spreadsheets or PowerPoint presentations uploaded to the Umbraco media library searchable, you will know there has been no straightforward solution for modern Umbraco. The official UmbracoExamine.PDF package handles PDFs brilliantly, but nothing equivalent existed for Office documents in Lucene-based Examine.
This package fills that gap. It uses the DocumentFormat.OpenXml SDK to extract text from .docx, .xlsx and .pptx files and indexes that text into a dedicated Examine index. It supports Umbraco 13, 16 and 17, installs with a single dotnet add package command, and requires no additional configuration.
It follows the same patterns as UmbracoExamine.PDF, so if you have used that package it will feel immediately familiar.
I first shared this approach back in 2023 as a GitHub project that you had to integrate into your solution manually. It is great to finally have it available as a proper package that is easy to install and keep up to date.
Read the full write-up on the Nevitech blog and install it from NuGet today.