I want to extend my external index with some fields. For instance, for a field using the Tags data type, the tags are stored comma-separated… This does not work for faceting, so I want to add a field tag_face where I can index each value separately. The index would then contain each tag non-tokenized, as a tag can consist of multiple words.
But there is no FieldDefinitionTypes entry for an array, for instance. I also cannot map a field to a new name. For instance, I may still need the original “tags” definition, but I want to add “tags_facets” for the tags field of the content type.
You can then take the tag values and save them in any format you like. Here is a sample of how I’ve handled tags in the past:
    case FaqItem.ModelTypeAlias:
        // Set SearchableTags on FAQ items
        if (contentNode.HasValue(AliasConstants.FAQItem.Tags))
        {
            var hasTags = e.ValueSet.Values.TryGetValue(ExamineConstants.FieldNames.Tags, out var existingTags);
            if (!hasTags)
            {
                return;
            }

            var tagsLowercase = existingTags!.Select(t => t.ToString()?.ToLowerInvariant());
            var newValues = new List<object>();
            foreach (var tag in tagsLowercase)
            {
                if (!string.IsNullOrWhiteSpace(tag))
                {
                    newValues.Add(tag.AddLuceneDelimiterWord());
                }
            }

            // newValues holds one entry per tag
            updateValues.Add(ExamineConstants.FieldNames.SearchableTags, newValues);
        }
        break;
    }
This is in Umbraco 17, but previous versions should handle it in a similar way.
This saves the tags into a new field that I’ve named SearchableTags.
Yes, I can add FieldDefinitions, going by the documentation I read. But where do I plug in custom indexing? This is not in the documentation. And it’s a bit overkill to create a new index just to add one or two fields that can be used for efficient faceting with Lucene.
Yes, I read that one. But there are no FieldDefinitionTypes for multiple values. I do not want multiple values stored as CSV; I need to index each value of a field separately. This is how we can work efficiently with Lucene and faceting.
So how can I change the indexing by providing my own logic and my own fields, given that FieldDefinitionTypes lacks some features? I don’t even know why this has not been implemented: core fields like __Path should already come with an additional field where we can simply find all docs whose path contains a certain node ID, rather than searching all nodes with *,NodeId,* or ending with *,NodeId. The same goes for tags or any other multi-value field.
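As a rough illustration of where such logic could be plugged into an existing index, here is a minimal sketch that subscribes to Examine's `TransformingIndexValues` event on the external index and adds a second field holding one value per tag. It is a sketch under assumptions, not a confirmed solution for this thread: the class `TagFacetIndexing`, the method names, and the field names `tags` and `tags_facets` are hypothetical, and it needs the Examine package that Umbraco already references. Whether the new values end up tokenized depends on the field definition of the new field, which the sketch does not cover.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Examine; // NuGet package, referenced by Umbraco

public static class TagFacetIndexing
{
    // Hypothetical names: "tags" is assumed to be the property alias,
    // "tags_facets" the additional field to create.
    private const string SourceField = "tags";
    private const string FacetField = "tags_facets";

    // Assumption: called once at application startup, after the indexes are registered.
    public static void Attach(IExamineManager examineManager)
    {
        if (examineManager.TryGetIndex("ExternalIndex", out IIndex index))
        {
            index.TransformingIndexValues += OnTransformingIndexValues;
        }
    }

    private static void OnTransformingIndexValues(object? sender, IndexingItemEventArgs e)
    {
        if (!e.ValueSet.Values.TryGetValue(SourceField, out var existing))
        {
            return;
        }

        // The value set is read-only, so copy it before adding a field.
        var updated = e.ValueSet.Values.ToDictionary(
            x => x.Key,
            x => (IEnumerable<object>)x.Value);

        // Handles both a single comma-separated string and already separate values.
        updated[FacetField] = existing
            .SelectMany(v => SplitTags(v?.ToString()))
            .Cast<object>()
            .ToList();

        e.SetValues(updated);
    }

    // Splits "a, b c,d" into "a", "b c", "d"; multi-word tags stay intact.
    public static IEnumerable<string> SplitTags(string? csv) =>
        (csv ?? string.Empty).Split(
            ',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
}
```

The original `tags` field is left untouched, so anything that still relies on the comma-separated value keeps working.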
Hi @cmwalolo
Here is the code I’m using to search for tags on a current project.
Once you have the tags in their own field in Examine (External Index), you can call something like this when you search from the frontend:
    public IEnumerable<FaqItem> GetFAQsByTags(IList<string>? tags, IPublishedContent contextPage)
    {
        if (tags == null || tags.Count == 0)
        {
            return Enumerable.Empty<FaqItem>();
        }

        var searchPathIds = GetSearchPathIds(contextPage).ToList();
        var allFaqItems = new List<FaqItem>();
        foreach (var tag in tags)
        {
            var faqItems = SearchFAQsByTag(contextPage, tag, searchPathIds);
            allFaqItems.AddRange(faqItems);
        }

        // An item matching several tags is returned only once
        return allFaqItems.GroupBy(faq => faq.Id).Select(group => group.First());
    }

    private IEnumerable<FaqItem> SearchFAQsByTag(IPublishedContent contextPage, string tag, IEnumerable<string> searchPathIds)
    {
        if (!_examineManager.TryGetIndex(UmbracoIndexes.ExternalIndexName, out var index))
        {
            _logger.LogWarning("External index not found when searching for FAQ tag: {Tag}", tag);
            return Enumerable.Empty<FaqItem>();
        }

        try
        {
            var searcher = index.Searcher;
            var query = searcher.CreateQuery("content");
            var tagQuery = query.Field(ExamineConstants.FieldNames.SearchableTags, tag.AddLuceneDelimiterWord().ToLowerInvariant().Escape());
            if (searchPathIds.Any())
            {
                tagQuery = tagQuery.And().GroupedOr(["searchPath"], searchPathIds.ToArray());
            }

            var results = tagQuery.Execute(QueryOptions.SkipTake(0, 9999));
            return results.Select(x => _umbracoHelper.Content(x.Id)).OfType<FaqItem>();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error searching for FAQ items with tag: {Tag}", tag);
            return Enumerable.Empty<FaqItem>();
        }
    }
Also, if all you want is to search for IDs in the path: I think that, following the improvements to the content, published cache and media repositories, there have been published tests showing that Examine is actually slower than a LINQ query… though obviously your use case will dictate.
Thank you,
Yes, I’ll look into it and check the code to see if it’s the direction I want to take. I need to be able to query the Lucene indexes using TermQuery.
It’s not only about the path; it can be any multi-value field. If the only optimization is for the path query with LINQ, then it’s not enough.
My code is ready to be used; my only issue is indexing multi-value fields other than by storing a comma-separated value.
In the Umbraco docs I see there is a complete example for creating a new index, but I don’t see how I can extend an existing index.
And by the way, yes, LINQ can be faster, as for now querying the path with Lucene needs at least two wildcard queries combined in a boolean query, to handle ‘*,NodeId,*’ (middle) or ‘*,NodeId’ (end). Because of the leading wildcard, this means a traversal of all the terms in Lucene.
So the benchmark, even if it shows better performance for LINQ after optimization, doesn’t point out that the Lucene implementation is not optimal.
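To illustrate the point about path queries, here is a minimal sketch in plain C# (no Examine or Lucene calls) of splitting a path into one value per node ID. Once each ID is its own value, an ancestor check becomes an exact match instead of a wildcard scan. `PathTerms` and its methods are hypothetical names, and the path format `-1,1050,1100` is assumed from the wildcard patterns discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PathTerms
{
    // Splits a path such as "-1,1050,1100" into one value per node ID.
    public static IReadOnlyList<string> Split(string path) =>
        path.Split(
            ',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

    // Exact match on a single ID; this is the kind of lookup a term query performs.
    public static bool ContainsNode(IEnumerable<string> pathTerms, int nodeId) =>
        pathTerms.Contains(nodeId.ToString(CultureInfo.InvariantCulture));
}
```

With the values from `PathTerms.Split("-1,1050,1100")`, a check for node 1050 matches and a check for node 105 does not, with no need for the `*,NodeId,*` and `*,NodeId` patterns.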
Great, with the SearchExtensions package I now have the path indexed as a multi-value field.
    if (request.NodeId > 0)
    {
        var query = new TermQuery(new Term("path", request.NodeId.ToString()));
        //var pathQuery = new BooleanQuery
        //    {
        //        { new WildcardQuery(new Term("__Path", $"*,{request.NodeId},*")), Occur.SHOULD }, // middle
        //        { new WildcardQuery(new Term("__Path", $"*,{request.NodeId}")), Occur.SHOULD } // end
        //    };
    }
So my query is much simpler and optimal.
The weird thing is that the ConfigureIndexOptions of the package seems to be global to all indexes. Is there a way to configure those options per index?
I’ll find a way to apply this to other fields, my different tags for now.
The bad thing is that my Examine dashboard in the backoffice is unfortunately broken now: I can’t query an index anymore, as I get some errors. Annoying!