Searching a website

Is there any package that could help me index my website content for a search engine? I imagine we could index the real page content rather than the grid, which may not contain the rendered data for certain blocks. My idea would be, on publish, to render the grid, or even the whole page, and extract the structure of the page. I could group the indexed content by tag (H1, H2) and any other attribute, and in my search apply boosting according to the kind of field that was extracted. This could be more straightforward and customizable than applying logic for each block in the grid.

This should be a more robust basis for a generic text search engine. We could extend that basic search with more specialized fields for some types, using taxonomies (tags or any other categorization).
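
To make the idea concrete, here is a minimal sketch of the extraction step, assuming the rendered HTML of a published page is already available as a string. It is written in Python with only the standard library purely for illustration; the `FIELD_BY_TAG` mapping and the field names are hypothetical, and text outside the mapped tags is simply ignored.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to index field names.
FIELD_BY_TAG = {"h1": "h1", "h2": "h2", "p": "body", "li": "body"}


class FieldExtractor(HTMLParser):
    """Collects text into index fields keyed by the enclosing mapped tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # field name -> list of text fragments
        self._stack = []   # currently open tags that map to a field

    def handle_starttag(self, tag, attrs):
        if tag in FIELD_BY_TAG:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self._stack:
            field = FIELD_BY_TAG[self._stack[-1]]
            self.fields.setdefault(field, []).append(text)


def extract_fields(html):
    """Returns a dict of field name -> concatenated text for one page."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.fields.items()}


page = "<h1>Umbraco search</h1><h2>Indexing</h2><p>Render the <b>page</b> on publish.</p>"
fields = extract_fields(page)
```

Each resulting field could then be stored as a separate field in whatever index is used, so that boosting can be applied per field at query time.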

Hi Laurent. I wonder if you’re familiar with the built-in search functionality Umbraco already has? It’s pretty fantastic and functional.
We’ve always been able to build on and extend the built-in search to meet whatever requirements were defined!

Maybe Growcreate Schema Generator (Umbraco Marketplace)?

Umbraco already indexes content into a Lucene index, which you can query using Examine. You can run queries with wildcards and boosting, and even add or change field values when content gets indexed. It may be worth looking into that, using the link @pc-pdx posted above, to see if it fits your needs before trying to write something yourself.
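
As a rough illustration of what wildcards and boosting look like, the sketch below builds a query string in Lucene's classic query syntax (`field:term*` for a prefix wildcard, `^n` for a boost). It is written in Python only to show the string shape and is not Examine's API; the field names and boost values are assumptions, so check which fields your own index actually contains.

```python
# Hypothetical field names and boost values; real ones depend on your index.
FIELD_BOOSTS = {"nodeName": 3.0, "bodyText": 1.0}


def build_query(term, field_boosts=FIELD_BOOSTS):
    """Builds a Lucene-syntax query: prefix wildcard on each field, boosted."""
    if not term.isalnum():
        # Keep the sketch simple: no escaping of Lucene special characters.
        raise ValueError("expected a single alphanumeric term")
    term = term.lower()
    clauses = [f"{field}:{term}*^{boost}" for field, boost in field_boosts.items()]
    return " OR ".join(clauses)


query = build_query("Umbr")
```

With these assumed fields, a match in `nodeName` would count three times as much as one in `bodyText`.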

Alternatively, I know people have used external search providers with Umbraco, such as Algolia and Azure Search, if that’s what you need.

My question is more about indexing than searching. Searching is easy, but how to index efficiently is another story.
If I look now at my index that contains the Block Grid, there is nothing I can do with it. My Block Grid contains a lot of external content, which is what lets me make things reusable or even more dynamic. So out of the box there is nothing I can do with the current indexes :)
So my idea was to index the rendered output of any published content. That covers the text search functionality, and it could be extended with additional criteria that depend on the type of objects we are looking for.
Using the semantics of the page through its structure (H1, H2) and microdata could be a way to refine the search results. Terms in an H1 could be boosted, for instance, compared to plain-text terms. I could even add attributes to sections to boost their content manually. Basically, content that is directly linked to the page should rank higher than a list of related items.
Yeah, it sounds a bit overloaded. I'm just thinking about how to make it all efficient with minimal effort, which is already more than the effort Umbraco puts in.
I guess there are a lot of alternatives, yes, but if you want a CMS with a full set of features, Umbraco could end up being the lesser option for small and medium organizations compared to what other CMSs offer.
In the short term I would like to have my own Umbraco template and code base, ready for most modern functionality and implementable in days or at most weeks (leaving the design part aside). I already have a themed base setup, which works pretty well :-)
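
The boosting idea in this post (H1 terms counting more than plain text, related-item lists counting less) could be sketched roughly as below. This is a toy scorer in Python, not how Lucene ranks documents; the field names and weights in `FIELD_WEIGHTS` are made up for illustration.

```python
# Hypothetical per-field weights: headings count more, related items less.
FIELD_WEIGHTS = {"h1": 5.0, "h2": 3.0, "body": 1.0, "related": 0.3}


def score(document, query):
    """Sums weighted exact-word matches of the query terms across fields."""
    terms = query.lower().split()
    total = 0.0
    for field, text in document.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields get weight 1
        words = text.lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


# The same phrase appears in the H1 of one page and in the
# related-items list of another.
main_page = {"h1": "Block Grid indexing", "body": "How to index content"}
other_page = {"body": "Something else", "related": "Block Grid indexing"}

main_score = score(main_page, "indexing")
other_score = score(other_page, "indexing")
```

The page whose own heading contains the term ranks above the page that only lists it as a related item, which is the behaviour described above.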

You can create your own custom indexes and add or amend fields as the index is populated, in case you need to pull in data from other pages or from external sources. It may not be an elegant solution, as you would need to parse the Block Grid JSON, but you can create an index that suits your needs with a lot of customisation.
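
As a rough sketch of the parsing step, the walk below pulls text out of nested JSON without assuming any particular schema. It is Python for illustration only; the sample JSON and the property aliases in `TEXT_KEYS` are invented and are not the real Block Grid format, so you would need to inspect your own stored values first.

```python
import json

# Hypothetical property aliases whose string values should be indexed.
TEXT_KEYS = {"headline", "text"}


def collect_text(node, out=None):
    """Recursively collects string values stored under the allowed keys."""
    if out is None:
        out = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in TEXT_KEYS and isinstance(value, str):
                out.append(value)
            else:
                collect_text(value, out)
    elif isinstance(node, list):
        for item in node:
            collect_text(item, out)
    return out


# Invented sample data, only to exercise the walk.
raw = (
    '{"blocks": [{"headline": "Welcome", "settings": {"id": "abc"}},'
    ' {"text": "Hello world", "children": [{"text": "Nested"}]}]}'
)
texts = collect_text(json.loads(raw))
```

The collected strings could then be joined into one extra field when the index entry is populated.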

Including other pages and content in your index does raise an issue: when related pages change, those changes also need to be reflected in the index, otherwise it can go stale and no longer match your current content.
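
One way to deal with that is to keep track of which pages pulled in which content, so that a change to one item tells you what else to re-index. Below is a minimal in-memory sketch in Python, assuming you can record the ids of embedded content whenever a page is indexed; `DependencyTracker` and its methods are hypothetical names, not part of Umbraco or Examine.

```python
from collections import defaultdict


class DependencyTracker:
    """Remembers which pages embed which content items."""

    def __init__(self):
        self._dependents = defaultdict(set)  # source id -> pages embedding it
        self._sources = defaultdict(set)     # page id -> sources it embeds

    def record(self, page_id, source_ids):
        """Call when a page is indexed, with the ids of content it pulled in."""
        # Forget what this page depended on last time it was indexed.
        for old in self._sources.pop(page_id, set()):
            self._dependents[old].discard(page_id)
        for source in source_ids:
            self._dependents[source].add(page_id)
            self._sources[page_id].add(source)

    def pages_to_reindex(self, changed_id):
        """The changed item itself plus every page that embeds it."""
        return {changed_id} | self._dependents.get(changed_id, set())


tracker = DependencyTracker()
tracker.record("home", ["news-1", "news-2"])
tracker.record("about", ["news-1"])
stale = tracker.pages_to_reindex("news-1")
```

When "news-1" is published again, both "home" and "about" would be queued for re-indexing along with the item itself. A real setup would need to persist this map, since an in-memory copy is lost on restart.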

I hope that helps!