SharePoint 2013 - SPLessons

SharePoint Crawling and Scheduling



The SharePoint Search Service Application must crawl content before you can begin to see search results. The crawl component does the heavy lifting of crawling content, while the content processing component parses what the crawler finds and passes the raw content to the indexing component. The crawl component can crawl a number of content sources out of the box. The following is a list of the content sources that SharePoint provides:
  • SharePoint content
  • SharePoint user profiles
  • Web pages via HTTP
  • File shares
  • Exchange
  • Lotus Notes

Crawl Rules

Crawl rules let the crawl processor determine whether to process a piece of content for indexing. There is more to crawl rules than inclusion or exclusion of entities in the search index: crawl rules also allow users to instruct the crawler which authentication to apply to given content. By default, the crawler accesses content with the Search Service account (defined when you installed SharePoint 2013).
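To make the two roles of a crawl rule concrete, here is a minimal conceptual sketch in Python. The `CrawlRule` class, its fields, and the `evaluate` function are hypothetical illustrations of the matching logic, not the actual SharePoint object model; the account names are made up.

```python
import fnmatch

class CrawlRule:
    """Hypothetical crawl rule: a URL pattern plus include/exclude and auth hints."""
    def __init__(self, path_pattern, include=True, auth_account=None):
        self.path_pattern = path_pattern  # wildcard pattern, e.g. "http://intranet/*"
        self.include = include            # True = index matching content, False = exclude
        self.auth_account = auth_account  # account to crawl with, if overriding the default

def evaluate(rules, url, default_account="svc-search"):
    """Return (should_crawl, account) based on the first rule matching the URL.

    If no rule matches, the content is crawled with the default
    Search Service account, mirroring the default behavior described above.
    """
    for rule in rules:
        if fnmatch.fnmatch(url, rule.path_pattern):
            return rule.include, rule.auth_account or default_account
    return True, default_account

# Example rules: exclude an archive path; crawl a partner site with its own account.
rules = [
    CrawlRule("http://intranet/archive/*", include=False),
    CrawlRule("http://partner.example/*", include=True, auth_account="svc-partner"),
]
```

The first matching rule wins, which is why rule ordering matters when patterns overlap.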

Crawled Properties

Crawled properties are automatically extracted from crawled content and grouped into categories based on the protocol handler or IFilter used by the content processor. In layperson's terms, the SharePoint crawler crawls content from a content source and extracts metadata, such as metadata defined by content owners. When the crawler extracts the value of a piece of metadata, it creates a new crawled property and assigns it that value. For example, a document might contain title metadata, which defines the title of the document within a content type or document library. When the crawler encounters a document with title metadata, it creates a new crawled property for the title field.
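The mapping from extracted metadata to category-grouped crawled properties can be sketched as follows. This is a conceptual illustration only; the function name, the `category` parameter, and the `"Office"` category label are assumptions for the example, not the real SharePoint API.

```python
def extract_crawled_properties(doc_metadata, category):
    """Turn extracted document metadata into crawled properties.

    Each metadata field becomes a crawled property named by its
    category and field, e.g. "Office:Title", with the extracted
    content as its value.
    """
    props = {}
    for field, value in doc_metadata.items():
        props[f"{category}:{field}"] = value
    return props

# A document with title and author metadata yields two crawled properties.
props = extract_crawled_properties({"Title": "Q3 Report", "Author": "J. Doe"}, "Office")
```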

Scheduling

In the Search service application, you can schedule a full or incremental crawl of a content source. There are four types of schedules:
  • DailySchedule - Used to specify the number of days between crawls.
  • WeeklySchedule - Used to specify the number of weeks between crawls.
  • MonthlyDateSchedule - Used to specify the days of the month and months of the year when the crawl should occur.
  • MonthlyDayOfWeekSchedule - Used to specify the days of the month, the weeks of the month, and the months of the year when the crawl should occur.
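The interval arithmetic behind the two simplest schedule types can be sketched in Python. This is an illustrative model only: the function and its parameters are assumptions for the example, and the names `DailySchedule` and `WeeklySchedule` merely mirror the schedule types listed above, not the SharePoint server object model.

```python
from datetime import datetime, timedelta

def next_crawl(last_crawl, schedule_type, interval):
    """Compute the next crawl time from the last one.

    DailySchedule:  `interval` is the number of days between crawls.
    WeeklySchedule: `interval` is the number of weeks between crawls.
    The monthly schedule types need calendar-aware logic and are
    omitted from this sketch.
    """
    if schedule_type == "DailySchedule":
        return last_crawl + timedelta(days=interval)
    if schedule_type == "WeeklySchedule":
        return last_crawl + timedelta(weeks=interval)
    raise ValueError(f"unsupported schedule type: {schedule_type}")

# Example: last crawl ran at 2:00 AM on June 1, 2013.
last = datetime(2013, 6, 1, 2, 0)
```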