Anindita Bhattacharya - Sitecore Solutions Architect
5 May 2021
We recently came across a requirement on a project where the site node contained multiple microsites, but the client wanted the main site's search to return only pages from the main site and none from the microsites.
Sitecore's search index configuration makes it easy to include or exclude templates and to choose which fields are indexed, using the <documentOptions> section. However, there is currently no out-of-the-box way to exclude items from an index based on their path in the Sitecore content tree.
To support this, we wrote a custom crawler that decides whether each item is included or excluded, based on a configured list of paths to ignore.
Please note: with this solution every item under the crawler's root is still crawled. Items under the excluded paths are simply not added to the index.
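The include/exclude decision itself is a prefix test on the item's path. The standalone sketch below runs outside Sitecore and shows one way to write that test; the names PathExclusion and IsExcluded are hypothetical. It assumes the configured entries are full Sitecore paths (the form Item.Paths.FullPath returns), compares case-insensitively because Sitecore resolves item paths without regard to case, and requires a "/" segment boundary so that excluding /sitecore/content/home/chicago-metro does not also exclude a sibling such as /sitecore/content/home/chicago-metro-news.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating a path-based exclusion rule; not part of the Sitecore API.
public static class PathExclusion
{
    // Returns true when itemPath equals one of the excluded paths or lies underneath it.
    public static bool IsExcluded(string itemPath, IEnumerable<string> excludedPaths)
    {
        if (string.IsNullOrEmpty(itemPath))
        {
            return false;
        }

        return excludedPaths
            .Where(p => !string.IsNullOrWhiteSpace(p))
            .Select(p => p.Trim().TrimEnd('/'))
            .Any(p =>
                // Exact match: the excluded root item itself (e.g. the microsite node).
                itemPath.Equals(p, StringComparison.OrdinalIgnoreCase) ||
                // Descendant: require a '/' right after the prefix so that
                // ".../chicago-metro" does not match ".../chicago-metro-news".
                itemPath.StartsWith(p + "/", StringComparison.OrdinalIgnoreCase));
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed full-path entries, mirroring the two microsites in this post.
        var excluded = new List<string>
        {
            "/sitecore/content/home/chicago-metro",
            "/sitecore/content/home/memphis-east"
        };

        string[] samples =
        {
            "/sitecore/content/Home",
            "/sitecore/content/Home/about-us",
            "/sitecore/content/Home/chicago-metro",
            "/sitecore/content/Home/chicago-metro/events",
            "/sitecore/content/Home/chicago-metro-news"
        };

        foreach (var path in samples)
        {
            string verdict = PathExclusion.IsExcluded(path, excluded) ? "excluded" : "indexed";
            Console.WriteLine($"{path} -> {verdict}");
        }
    }
}
```

Running it marks the chicago-metro node and its events child as excluded, while the home page, about-us and chicago-metro-news stay indexed. A plain StartsWith without the trailing "/" would wrongly exclude chicago-metro-news as well.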
We made the excluded paths (the microsite nodes, in our case) configurable through the custom crawler's configuration:
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
  <sitecore role:require="Standalone or ContentManagement" search:require="solr">
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="site_master_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="name">$(id)</param>
            <param desc="core">site_master_index</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual" role:require="ContentManagement and !Indexing" />
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsyncMaster" role:require="Standalone or (ContentManagement and Indexing)" />
            </strategies>
            <locations hint="list:AddCrawler">
              <crawler type="Site.Website.Infrastructure.Search.Crawler.ExcludePathsItemCrawler, Site.Website.Infrastructure">
                <Database>master</Database>
                <Root>/sitecore/content/home</Root>
                <!-- Full Sitecore paths: the crawler compares these against each item's full path -->
                <ExcludeItemsList hint="list">
                  <ChicagoMetro>/sitecore/content/home/chicago-metro</ChicagoMetro>
                  <MemphisEast>/sitecore/content/home/memphis-east</MemphisEast>
                </ExcludeItemsList>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
The key part of this index configuration is the ExcludeItemsList section of the crawler, which holds one child element per path to exclude.
Here is the code that reads this section and uses the configured paths to decide whether each item is indexed. It overrides IsExcludedFromIndex, the method SitecoreItemCrawler uses to check whether an item should be excluded:
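using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.Diagnostics;

namespace Site.Website.Infrastructure.Search.Crawler
{
    // Crawler that keeps items under any configured path out of the index.
    public class ExcludePathsItemCrawler : SitecoreItemCrawler
    {
        // Filled from the <ExcludeItemsList hint="list"> element in the index configuration.
        public List<string> ExcludeItemsList { get; } = new List<string>();

        protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
        {
            Assert.ArgumentNotNull(indexable, nameof(indexable));

            // AbsolutePath is the item's full Sitecore path, e.g. /sitecore/content/Home/chicago-metro/events.
            // Match case-insensitively and on a path-segment boundary, so that
            // /chicago-metro does not also exclude a sibling such as /chicago-metro-news.
            string itemPath = indexable.AbsolutePath;
            bool underExcludedPath = ExcludeItemsList
                .Select(path => path.TrimEnd('/'))
                .Any(path =>
                    itemPath.Equals(path, StringComparison.OrdinalIgnoreCase) ||
                    itemPath.StartsWith(path + "/", StringComparison.OrdinalIgnoreCase));

            // Fall back to the standard checks (templates, location, etc.) for everything else.
            return underExcludedPath || base.IsExcludedFromIndex(indexable, checkLocation);
        }
    }
}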
That's it! Because the ExcludeItemsList element is marked hint="list", Sitecore's configuration factory populates the public ExcludeItemsList property automatically, so the code can use it as is.
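Why does a get-only property work here? As we understand the hint="list" convention, the configuration factory does not assign the property; it reads the existing list through the getter and adds the value of each child element to it, which is also why the child element names only serve as labels. The standalone sketch below imitates that idea with plain reflection so it can run outside Sitecore. DemoCrawler and BindListHint are hypothetical stand-ins, and the real factory does considerably more (variable expansion, ref resolution, other hint types).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical stand-in for ExcludePathsItemCrawler: same get-only, pre-initialized list.
public class DemoCrawler
{
    public List<string> ExcludeItemsList { get; } = new List<string>();
}

public static class Program
{
    // Rough imitation of hint="list": fetch the list via the property getter,
    // then Add each child's text. NOT Sitecore's implementation, only an illustration.
    public static void BindListHint(object target, XElement listElement)
    {
        string propertyName = listElement.Name.LocalName;
        PropertyInfo property = target.GetType().GetProperty(propertyName)
            ?? throw new InvalidOperationException($"No property named {propertyName}");

        var list = (IList)property.GetValue(target);
        foreach (XElement child in listElement.Elements())
        {
            // The child element name (ChicagoMetro, MemphisEast) is ignored; only its value is used.
            list.Add(child.Value.Trim());
        }
    }

    public static void Main()
    {
        var xml = XElement.Parse(
            "<ExcludeItemsList hint=\"list\">" +
            "<ChicagoMetro>/sitecore/content/home/chicago-metro</ChicagoMetro>" +
            "<MemphisEast>/sitecore/content/home/memphis-east</MemphisEast>" +
            "</ExcludeItemsList>");

        var crawler = new DemoCrawler();
        BindListHint(crawler, xml);

        // Prints the two configured paths in document order.
        foreach (string path in crawler.ExcludeItemsList)
        {
            Console.WriteLine(path);
        }
    }
}
```

The design point is that the list is created once by the property initializer and only ever mutated, so no setter is needed and the crawler cannot accidentally replace the list after configuration.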
You can use the same technique to customize your crawler in ways beyond path constraints, exposing as many settings through configuration as your requirements call for!
Anindita is a 7-time Sitecore MVP and has been working on Sitecore since 2013. She has worked in various roles on Sitecore projects, from being an individual contributor to a Solution architect, and enjoys being close to the code! She is the founder/co-organizer of Sitecore User Groups in Bangalore & Mumbai and is actively involved in the Sitecore community. She has over 14 years of experience in .NET technologies and is passionate about learning and keeping up with technology.