My company has a public-facing SharePoint 2010 publishing site that is configured for anonymous access. We also have an internal SharePoint 2010 portal. The external site lives in the DMZ and the internal site lives inside our network.
I need to be able to crawl the external site from the internal portal.
To this end, I created a content source in the internal portal’s search service application for the external website with the following settings:
- type of content to be crawled = SharePoint Sites
- start address = http://www.domain
- crawl everything under the hostname for each start address
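For anyone who prefers scripting this over clicking through Central Administration, the same content source can be created from the SharePoint 2010 Management Shell. This is only a sketch; the search service application name and content source name below are placeholders:

```powershell
# Load the SharePoint snap-in if not already present
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# "Search Service Application" is a placeholder - use your SSA's actual name
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"

# CrawlVirtualServers corresponds to "crawl everything under the
# hostname for each start address" in the UI
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name "External Publishing Site" `
    -Type SharePoint `
    -StartAddresses "http://www.domain" `
    -SharePointCrawlBehavior CrawlVirtualServers
```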
However, crawls return this warning:
http://www.domain
The URL was permanently moved. ( URL redirected to http://www.domain/pages/default.aspx )
This reminds me – I have a URL Rewrite mapping in IIS that points “/” to “/pages/default.aspx”. Unfortunately, when I remove that mapping, I still receive the same warning.
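For context, a root-to-welcome-page redirect in IIS URL Rewrite typically looks something like this in web.config (the rule name is arbitrary):

```xml
<system.webServer>
  <rewrite>
    <rules>
      <!-- Redirect requests for the site root to the welcome page -->
      <rule name="RootToDefault" stopProcessing="true">
        <match url="^$" />
        <action type="Redirect" url="/pages/default.aspx" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```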
If I edit the content source and replace the start address with “http://www.domain/pages/default.aspx”, I receive this error:
http://www.domain/pages
Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has “Full Read” permissions on the SharePoint Web Application being crawled.
Sure enough, browsing to “http://www.domain/pages” prompts me for a login and then gives me a 401 unauthorized.
But I don’t want the crawler to go to http://www.domain/pages. I want it to start at “http://www.domain/pages/default.aspx”.
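In theory, crawl rules can express exactly that: rules are matched in order, so an inclusion rule for the specific page can sit in front of an exclusion rule for the rest of the path. A sketch from the Management Shell, again with the SSA name as a placeholder:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"

# Rules apply in order: include the welcome page first,
# then exclude everything else under /pages
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://www.domain/pages/default.aspx" -Type InclusionRule
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://www.domain/pages/*" -Type ExclusionRule
```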
Other things I tried that didn’t help:
- monkeyed around a fair bit with robots.txt on the external website
- dabbled with configuring Crawl Rules
- tried setting the content source to “type of content to be crawled = Web Sites”
- checked the ULS logs, which aren’t helping me on this one
- searched the Google
Any clues? What am I missing?
I had some time to get back to this today and I’m happy to say that I resolved the issue. So, for anyone who experiences the same problem – I hope this helps!
Because I set up my content source as a “SharePoint Site”, the crawler used the default content access account. However, the site being crawled was in a different farm and on a different domain, so that account had no access there; this content source needed different credentials.
The first time I tried to enter credentials under “Specify a different content access account” in my crawl rule for this content source, I received the message “The username or password is not valid”. The culprit: I had left “Do not allow Basic Authentication” checked.
Disregarding the warning, I unchecked this box and was able to enter the target farm’s content access account credentials.
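The same fix can be scripted. New-SPEnterpriseSearchCrawlRule accepts Basic credentials directly; the account name and password below are placeholders for the target farm’s content access account:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"

# Placeholder credentials - substitute the target farm's
# content access account
$password = ConvertTo-SecureString "P@ssw0rd" -AsPlainText -Force

# BasicAccountRuleAccess is the scripted equivalent of unchecking
# "Do not allow Basic Authentication" and supplying an account
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://www.domain/*" `
    -Type InclusionRule `
    -AuthenticationType BasicAccountRuleAccess `
    -AccountName "EXTDOMAIN\crawlaccount" `
    -AccountPassword $password
```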
Crawled again and voilà ~ it worked!
Still not sure why I couldn’t crawl this internet-facing SharePoint 2010 site as a website, but oh well, I’m good now.