I have heard that SharePoint could be the answer to a loosely managed, fast-growing file share. I was asked to find a SharePoint-based solution for a file share that currently houses millions of files and terabytes of data. I see a number of huge challenges with this exercise if we use SharePoint, and it raises many questions.
Should we import the documents into the database, should we use RBS (Remote BLOB Storage), or should the files stay on the file share while we import only metadata or references? How would that happen? What type of storage would we need, and how would you set up the content databases? Would you use multiple drives? How will this affect backups? Also, what is the best way to migrate tens of millions of files?
We would probably want to eliminate duplicates, keep versions, and store metadata for easy searching.
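For the duplicates part, a rough sketch of how candidates could be found on the file share before anything is migrated: group files by size first, then hash only the files that share a size. This is plain Python with no SharePoint involvement; the function names `sha256_of` and `find_duplicates` are made up for illustration, and it assumes "duplicate" means byte-identical content.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Pass 1: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Pass 2: hash only files whose size is shared with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                continue
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Each returned group is a set of paths holding the same bytes, so one copy could be migrated and the rest reviewed. Files that differ only slightly (for example, older versions of the same document) will not be grouped by this approach.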
Does anyone have any experience, ideas, or suggestions regarding this type of process being managed in SharePoint? Are there any third-party applications that make sense? Has anyone actually done something like this before?
Thanks in advance for your help.
Vikki McCormick
Do the file share documents already have custom metadata (beyond the standard metadata: title, author, created, modified, etc.)? Probably not, and you likely don’t want to set that up for millions of documents.
Perhaps look at a tool like Concept Searching for auto-tagging documents. I haven’t used it, but did attend a few of their webinars and it looks like an interesting product.
As for testing on a smaller scale, yes, that is a good place to start. Choose a small subset of a couple thousand documents and create a content source just for that subset. Test and see what works and what doesn’t in your environment.
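If it helps, here is one possible way to pull that subset: a small Python sketch that picks a random sample from the share and copies it into a separate staging folder, which the test content source could then point at. The names `sample_files` and `copy_sample` and the staging-folder idea are my own assumptions, not anything SharePoint provides. It uses reservoir sampling so the full file list never has to be held in memory.

```python
import os
import random
import shutil


def sample_files(root, count=2000, seed=0):
    """Pick up to `count` file paths uniformly at random from under root."""
    rng = random.Random(seed)
    sample = []
    seen = 0
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            seen += 1
            path = os.path.join(folder, name)
            if len(sample) < count:
                sample.append(path)  # fill the reservoir first
            else:
                # Keep the new file with probability count / seen.
                slot = rng.randrange(seen)
                if slot < count:
                    sample[slot] = path
    return sample


def copy_sample(root, staging, paths):
    """Copy the sampled files into staging, keeping their relative folders."""
    for path in paths:
        target = os.path.join(staging, os.path.relpath(path, root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
```

The staging folder should sit outside the share being sampled. A random sample tends to include the odd file types and deep folder paths that a hand-picked folder might miss, which is usually what you want to catch in a test run.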
If you have any other specific questions along the way, don’t hesitate to ask.