Hi,
Today I was faced with an interesting question regarding SharePoint’s search capabilities. I need to customize the search results so that they display content from inside the returned PDF files in a specific format.
Suppose there is a document library of patient records in PDF format. Each PDF would contain (amongst other information) the patient ID, DOB, address, and hair colour. Now, if I searched for a complete or partial patient ID I would like the results to display as follows:
123431
DOB: 10/30/1973
Address: 123 Main Street, Mainsville, MA
Hair Color: Black
582654
DOB: 11/07/2001
Address: 234 Front Street, Toronto, ON
Hair Color: Blonde
The Patient ID would be found inside the PDF and would be a link to the document. The other fields would also be picked up from inside the PDF. I know that the search results can be formatted using XSLT in SharePoint 2010 and using the GUI in SharePoint 2013. However, can SharePoint be set up to read data from inside the result documents and display it in the results?
Thanks,
-Haniel
Hey Haniel,
SharePoint itself won’t pick up specific content from inside documents (other than as generic document body content), with the exception of some clever handling of Office documents (it can pick up titles etc.). I suspect you’ll need to write a pipeline enhancement (a Content Enrichment web service) that can read the PDFs and hunt out the specific content you want to extract. Hopefully it’s delimited somewhat consistently.
Val blogged about it a couple of months ago.
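To give a rough idea of the extraction step such a service would have to perform, here is a minimal sketch in Python. It assumes the PDF text has already been extracted to a plain string and that each field sits on its own line behind a label such as "DOB:" or "Address:". The labels, the field names, and the extract_fields helper are all hypothetical (the labels are borrowed from the example results above); a real Content Enrichment service for SP2013 would be a .NET web service, so treat this purely as an illustration of the logic.

```python
import re

# Hypothetical labels, borrowed from the example results in the question.
# Adjust the patterns to however the fields are really delimited in the PDFs.
FIELD_PATTERNS = {
    "PatientID": re.compile(r"Patient ID:\s*(\d+)"),
    "DOB": re.compile(r"DOB:\s*(\d{2}/\d{2}/\d{4})"),
    "Address": re.compile(r"Address:\s*(.+)"),
    "HairColor": re.compile(r"Hair Colou?r:\s*(\w+)"),
}


def extract_fields(text):
    """Return a dict of the labelled fields found in the extracted PDF text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Fields that are not found are simply left out of the result.
            fields[name] = match.group(1).strip()
    return fields


sample = """Patient ID: 123431
DOB: 10/30/1973
Address: 123 Main Street, Mainsville, MA
Hair Color: Black
"""

print(extract_fields(sample))
```

Each value found this way would then be handed back to search as its own property so the result template can display it. If the PDFs are not labelled this consistently, the patterns become much harder to get right.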
You may also be able to get some of the fields with a custom entity extractor, but your mileage may vary depending on your content.
Alternatively, you could inspect the content separately (either manually or using a tagging system like Pingar) and either inject metadata into the PDF (there are a number of fields available, though I don’t know how many SP Search picks up) or add it as metadata in SP. I guess it depends on your volumes…
Good luck, it’s not for the faint of heart 🙂
FYI, we’ve done similar things in SP2010 with FAST and will need to port them to SP2013 next year…
Regards
Craig