I’ve been tasked with reducing the time our full crawls take. We currently crawl about 1.6 million docs, and our crawl rate is terrible: a full crawl takes anywhere from 18 to 21 hours. The incremental crawls, however, are MUCH faster.
I have a dedicated crawl server set up with 6 processors and 16 GB of RAM. I am also using the application server (same specs) as a secondary crawl server.
It’s my understanding we should be hitting a crawl rate of around 45 docs per second. We are not even close to that.
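To put numbers on that gap, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (1.6 million docs, 18 to 21 hours, a 45 docs per second target). The function names are made up for illustration, and it assumes the crawl runs at a steady rate for the whole window, which a real crawl will not.

```python
# Back-of-the-envelope crawl rate math using the figures from this post.
# Function names are illustrative only; a steady crawl rate is assumed.

SECONDS_PER_HOUR = 3600


def docs_per_second(total_docs, hours):
    """Average crawl rate for a crawl that took `hours` to finish."""
    return total_docs / (hours * SECONDS_PER_HOUR)


def hours_at_rate(total_docs, rate):
    """How long a full crawl would take at `rate` docs per second."""
    return total_docs / rate / SECONDS_PER_HOUR


TOTAL_DOCS = 1_600_000

best_case = docs_per_second(TOTAL_DOCS, 18)   # fastest observed full crawl
worst_case = docs_per_second(TOTAL_DOCS, 21)  # slowest observed full crawl
target_hours = hours_at_rate(TOTAL_DOCS, 45)  # duration at the 45 docs/sec target

print(f"Current rate: {worst_case:.1f} to {best_case:.1f} docs/sec")
print(f"Full crawl at 45 docs/sec: {target_hours:.1f} hours")
```

So the current crawls are running at roughly half of the 45 docs per second figure, and hitting that figure would bring a full crawl down to around 10 hours.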
What I’ve tried so far: creating a new content source, then an index reset, then a full crawl.
I also turned off Task Offload and TCP Chimney on the SQL and crawl servers, and disabled A/V checks during that time.
Looking for suggestions….
Thanks SP-C
If you suspect SQL then you might want to check the indexes and fragmentation as they will impact the ability of the system to return documents quickly.
There is a good doc on maintaining SQL that you can get from Microsoft: Database Maintenance for Microsoft SharePoint 2010 Products
Moving the Data/Log files to appropriate disks alone could shave a ton of time off of your crawls. Also, check your log file growth settings to make sure they are 10% or so.
I would make those changes and then monitor before making more changes.
SQL could be the problem. *Note: I was given this baby.* The Data and Logs are on the same disk (I know, I know). I have moved all other applications to a different disk on our SQL cluster.
We currently have 48 GB of memory (soon to be more), and Max Memory is set to 38 GB.
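As a quick sanity check on those memory numbers, here is a small Python sketch of the headroom left outside SQL Server. The helper name is hypothetical, and it treats everything above Max Memory as available to the OS and other programs, which is a simplification.

```python
# Rough headroom check using the memory figures from this post.
# os_headroom_gb is a made-up helper, not part of any SQL Server tooling.

def os_headroom_gb(total_gb, sql_max_gb):
    """Memory left for the OS and other programs once SQL hits its cap."""
    if sql_max_gb > total_gb:
        raise ValueError("Max Memory is larger than physical RAM")
    return total_gb - sql_max_gb


TOTAL_GB = 48    # physical RAM on the SQL server
SQL_MAX_GB = 38  # Max Memory setting

headroom = os_headroom_gb(TOTAL_GB, SQL_MAX_GB)
share = headroom / TOTAL_GB

print(f"{headroom} GB left outside SQL Server ({share:.0%} of RAM)")
```

When the extra memory arrives, rerunning this with the new total shows how much Max Memory can be raised while keeping the same headroom.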
Have to admit I’ve never heard of Max Degree of Parallelism. Ours was set to 0 (the default). I did a little research and will be changing it to 1. I will also try to get the Logs moved to a different disk, and I will monitor for spikes when the Full crawl runs this weekend.
You can increase the number that you are pulling at a time to 72 I believe. I wouldn’t necessarily just crank it up that high, but you might increase it a little at a time to see if it makes a difference.
You will have to look at the SQL side too. Make sure that the configuration there is solid: Data and log files are on separate physical spindles, Max Degree of Parallelism is set to 1, and Max Memory is set so that the OS and other programs have room to breathe. You might also look at the IOPS on the disks to see if SQL is the bottleneck.
Also, check this out: http://www.networkworld.com/news/tech/2010/052410-tech-update.html