Unravel Sitecore!

Posted on April 4, 2023 by Deepthi K

Configure Sitecore Search Crawler

Now that you have access to the sandbox and have explored the search console, as my first blog covers, it is time to configure the web crawler to your needs. Every website you want to crawl has its own crawler requirements. For instance, the POC I was working on was behind a login, so the crawler needed a way through: either the site skips authentication for crawler requests, or the crawler sends authentication parameters in its request headers. Let’s see how that can be done and how we did it.

First things first: log in to the search console and go to the crawler settings. They are located in the Sources section under Administrative Tools. Open the source you want to configure. Note that at least one source should already have been set up by the Search team, based on the form details I talk about in my first blog.

If you scroll down, you should see Web Crawler Settings. Click the edit icon to open the settings for editing. In our case, we did two things here:

  • Add the ‘user-agent’ header with the value ‘sitecorebot’, which Sitecore recommends in its documentation as a security best practice.
  • Add another header to help us detect these requests on the application side and skip authentication. There are other ways to do this, such as using the dedicated Authentication settings on the Web Crawler, which are described in the Sitecore documentation.
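
To make the second header concrete, here is a minimal sketch of what the application-side check could look like. It is written in Python purely for illustration, and it is not the code we used on the POC. The header name `x-crawler-token` and the secret value are hypothetical; substitute whatever header and value you configured in the Web Crawler settings.

```python
import hmac

# Hypothetical names: the post does not say which extra header or value was used.
CRAWLER_USER_AGENT = "sitecorebot"
CRAWLER_HEADER = "x-crawler-token"
CRAWLER_SECRET = "replace-with-a-long-random-value"


def is_trusted_crawler(headers: dict[str, str]) -> bool:
    """Return True when a request carries both crawler headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_agent = normalised.get("user-agent", "")
    token = normalised.get(CRAWLER_HEADER, "")
    if CRAWLER_USER_AGENT not in user_agent.lower():
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(token.encode(), CRAWLER_SECRET.encode())
```

The authentication layer would call `is_trusted_crawler` before redirecting to the login page and let the request through only when it returns `True`. The user-agent alone is easy to spoof, which is why this sketch also requires the secret header.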

A few other important things to check while you are in Source Settings:

  1. Ensure the triggers are set up to match the content you need indexed. In my case, I made sure the sitemap trigger was present and had the right values.
  2. Check that the scan frequency matches your needs.
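
If you want to double-check what a sitemap trigger will hand to the crawler, one option is to list the URLs in the sitemap yourself. The sketch below assumes a standard sitemaps.org `urlset` file; the sample XML and the `example.com` URLs are made up for illustration, and this is a local sanity check, not part of Sitecore Search.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample; in practice, paste or download your own sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products</loc></url>
</urlset>"""


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    # Skip empty <loc> elements and trim stray whitespace.
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

Comparing this list with the pages you expect to be indexed is a quick way to spot a wrong sitemap URL or missing sections before the next scan runs.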

That is it! After these steps, the crawler was able to crawl our website, which is behind a login. A happy crawler means we can move on to the next important part: ensuring the index has all the right data for the API calls to function well. Let’s get into that in the next blog.

Categories: Search, sitecore
Tags: content behind login, indexing, Sitecore Search, web crawler settings

Post navigation

Previous Post: Getting Started with Sitecore Search
Next Post: Bump up Index game on Sitecore Search
