I’ve been challenged with reducing the time our full crawls take. We are currently crawling about 1.6 million documents, and our crawl rate is terrible: a full crawl runs anywhere from 18 to 21 hours. The incremental crawls, however, are MUCH faster.
I have a dedicated crawl server set up – 6 processors, 16 GB of RAM. Plus I am using the application server (same specs) as a secondary crawl server.
It’s my understanding we should be hitting a crawl rate of around 45 docs per second, but we are not even close to that.
What I’ve tried so far: creating a new content source, an index reset, and a full crawl.
Also turned off task offload and TCP Chimney on the SQL and crawl servers, and disabled antivirus checks during that window.
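For anyone following along, the offload/Chimney settings mentioned above can be checked and toggled from an elevated command prompt on the SQL and crawl servers. A sketch, assuming Windows Server 2008/2008 R2 (verify the option names on your build):

```powershell
# Show the current global TCP settings, including Chimney offload state
netsh int tcp show global

# Disable TCP Chimney offload (requires an elevated prompt; a reboot
# or NIC reset may be needed for it to fully take effect)
netsh int tcp set global chimney=disabled
```

Task offload itself is typically disabled per-NIC in the adapter’s advanced driver properties rather than through netsh.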
Looking for suggestions….
Thanks SP-C
Add more crawl components and CPU. If needed, move the index and crawl databases to a separate SQL Server.
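Adding a crawl component can be scripted. A rough sketch using the SharePoint 2010 cmdlets – the SSA name, server name, and database choice here are placeholders you’d adjust to your farm, and note that a new crawl topology has to be activated before it takes effect:

```powershell
# Get the Search Service Application (name is an example)
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# Clone the current crawl topology so we can modify it offline
$newTopology = $ssa | New-SPEnterpriseSearchCrawlTopology

# Pick an existing crawl database and the target server's search instance
$crawlDb   = (Get-SPEnterpriseSearchCrawlDatabase -SearchApplication $ssa)[0]
$instance  = Get-SPEnterpriseSearchServiceInstance -Identity "APPSERVER01"

# Add the new crawl component to the new topology
New-SPEnterpriseSearchCrawlComponent -SearchApplication $ssa `
    -CrawlTopology $newTopology -CrawlDatabase $crawlDb `
    -SearchServiceInstance $instance

# Switch the SSA over to the new topology
$newTopology.Activate()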
Two third-party PDF iFilters worth evaluating:
http://www.pdflib.com/products/tet-pdf-ifilter/
http://www.foxitsoftware.com/products/ifilter/performance.php
As of right now we don’t have many PDFs (mostly TIF files), but that is about to change VERY quickly. Do you have any third-party suggestions?
Thanks again
Modifying the impact rule so that it requests more items at a time would be better and give you a more meaningful result. If you are using the Adobe iFilter and have a good number of PDFs, that could have an impact as well: the Adobe iFilter can only process one item at a time because it is not multi-threaded. You may want to consider a third-party iFilter that can process multiple items concurrently.
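If you’re unsure which iFilter the crawler is actually using for PDFs, you can inspect the search registry hive on the crawl server. A sketch assuming SharePoint 2010 (the `14.0` version key – adjust for your version):

```powershell
# Show the IFilter registration the SharePoint crawler uses for .pdf;
# the Default value is the CLSID of the registered filter DLL
Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf'
```

You can then look up that CLSID under `HKLM:\SOFTWARE\Classes\CLSID` to see which vendor’s DLL it points at.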
Moved all the log files to a different disk and ran a full crawl over the holiday – no dice; it still took 21 hours. I did notice that the person who set this up had a crawler impact rule limiting the crawler to 10 documents at a time. Going to remove that rule and see what happens.