BlockGrid / RTE Examine indexing issues

Hi, I have an indexing issue with the BlockGrid. Markup is stripped automatically on indexing, which works well in most scenarios. In one case, though, I have a table inside a Block with RTE content that uses <br> to divide content within the cells. As an example, here is the value of a BlockGrid property with alias ‘content’:

<table border="1" style="border-collapse: collapse; width: 100%;">
<tbody>
<tr>
<td>
<h3><strong>Company Secretary</strong></h3>
<p>John Smith<br>Murphys LTD<br>London<br>N1 1AA</p>
</td>
</tr>
</tbody>
</table>
<h2>Relations</h2>
<p>For all relations, please contact us on <a href="mailto:[email protected]">[email protected]</a></p>

This indexes as:

Company Secretary John SmithMurphys LTDLondonN1 1AA Relations For relations please contact us on [email protected]

This causes issues when searching for Smith, because the indexed term is actually SmithMurphys, so the content isn’t found even in the backend search.

Has anyone had this issue, or does anyone know of an easy fix?

I could create a new field to index a tweaked version of the HTML, but that seems a bit over the top and requires a bit of setup. I guess fundamentally the indexing issue isn’t related to the BlockGrid as such; it’s the RTE inside the blocks that’s problematic.

@steve, I have not encountered this specific issue in practice, but I don’t think there’s an easy fix for this. Umbraco strips out all HTML tags when indexing RTE content and keeps only their inner content (a <br> has none, so nothing is left between the adjacent words). The source code is in Umbraco-CMS/src/Umbraco.Web.Common/Mvc/HtmlStringUtilities.cs on the main branch of umbraco/Umbraco-CMS on GitHub. So it might be easier for you to change the RTE content so that it includes spaces at the breaks, rather than trying to change the HTML stripping. Or convert the lines to a series of <p> elements instead.
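To illustrate the idea of adding blanks before the tags are stripped, here is a minimal, language-neutral sketch in Python. It is not Umbraco's code: the regex-based stripping is only a simplified stand-in for what the indexer does, and the function names `strip_tags_naive` and `strip_tags_preserving_breaks` are made up for this example.

```python
import re

# Assumption: a simplified stand-in for tag stripping, not Umbraco's implementation.
TAG_RE = re.compile(r"<[^>]+>")
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)


def strip_tags_naive(html: str) -> str:
    """Remove every tag and collapse whitespace; a <br> leaves nothing behind."""
    return " ".join(TAG_RE.sub("", html).split())


def strip_tags_preserving_breaks(html: str) -> str:
    """Turn each <br> into a space first, then strip the remaining tags."""
    return strip_tags_naive(BR_RE.sub(" ", html))


cell = "<p>John Smith<br>Murphys LTD<br>London<br>N1 1AA</p>"
print(strip_tags_naive(cell))              # John SmithMurphys LTDLondonN1 1AA
print(strip_tags_preserving_breaks(cell))  # John Smith Murphys LTD London N1 1AA
```

In the second output "Smith" is its own token again, so a search for it would have something to match. In an Umbraco site the same substitution would have to happen wherever the value is prepared for the index, which this sketch does not cover.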

Thanks for the reply and the details, @valeriy-byteant. There’s now a pull request that hopefully solves this: “Preserve word boundaries when indexing RTE content with <br> tags” by steveatkiss (Pull Request #19540, umbraco/Umbraco-CMS on GitHub).
