My company has a public-facing SharePoint 2010 publishing site that is configured for anonymous access. We also have an internal SharePoint 2010 portal. The external site lives in the DMZ and the internal portal lives inside our network.

I need to be able to crawl the external website from the internal portal.

To this end, I created a content source in the internal portal’s search service application for the external website with the following settings:

  • type of content to be crawled = SharePoint Sites
  • start address = http://www.domain
  • crawl everything under the hostname for each start address

However, crawls return this warning:
http://www.domain
The URL was permanently moved. ( URL redirected to http://www.domain/pages/default.aspx )

This reminds me – I have a URL Rewrite mapping in IIS that points “/” to “/pages/default.aspx”. Unfortunately, when I remove that mapping, I still receive the same warning.

If I edit the content source and replace the start address with “http://www.domain/pages/default.aspx”, I receive this error:
http://www.domain/pages
Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has “Full Read” permissions on the SharePoint Web Application being crawled.

Sure enough, browsing to “http://www.domain/pages” prompts me for a login and then returns a 401 Unauthorized.

But I don’t want the crawler to go to http://www.domain/pages. I want it to start at “http://www.domain/pages/default.aspx”.
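
To see exactly what the server returns at each hop, independent of the crawler, the redirect chain can be traced by hand. The following is only a minimal sketch, assuming Python 3 is available on the crawl server; the names `fetch_once` and `trace` are made up for illustration and have nothing to do with SharePoint itself. It sends anonymous GET requests without following redirects automatically, and records the status code of each URL it visits.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, timeout=10):
    """Send one anonymous GET and return (status, Location header or None).

    Redirects are NOT followed here; the caller decides what to do with them.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # The User-Agent value is an arbitrary placeholder, not the crawler's real one.
        conn.request("GET", path, headers={"User-Agent": "redirect-trace-sketch"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=fetch_once, max_hops=5):
    """Follow redirects one hop at a time and return a list of (url, status)."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Running `trace("http://www.domain")`, `trace("http://www.domain/pages")` and `trace("http://www.domain/pages/default.aspx")` from the crawl server should show which of the three URLs answers with a redirect and which answers with a 401 for an anonymous request. The crawler authenticates with its content access account, so its results may differ from this anonymous check.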

Other things I tried that didn’t help:

  • experimented a fair bit with robots.txt on the external website
  • tried configuring Crawl Rules
  • tried setting the content source to “type of content to be crawled = Web Sites”
  • checked the ULS logs, which aren’t helping me on this one
  • searched Google

Any clues? What am I missing?
