Searching PDF Files (Orbis+)

In general, Orbis+ can search PDF files. The program uses a wide variety of tools to extract text from PDF files so that the text can be searched. This works well for the vast majority of PDF files. However, text cannot be extracted from some files, and those files therefore cannot be searched. Fortunately, these difficult-to-convert files seem to be rare. In addition, for files that were created by scanning pages and applying optical character recognition (OCR), the text conversion may not be 100% accurate. For example, the word “modern” in a scanned image could come out as “modem.” In this case, the file could still be searched, but a search for “modern” would fail to find this instance of the word.
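The following minimal sketch illustrates why an OCR misreading hides a word from search. It does not show how Orbis+ searches. It assumes a simple whole-word, case-insensitive match over some hypothetical OCR output, and the function name `find_word` is made up for this example.

```python
import re

# Hypothetical OCR output for a scanned page: the word "modern" in the
# page image was misread as "modem" (the "rn" resembles an "m").
ocr_text = "The modem reader expects footnotes at the end of each chapter."

def find_word(text: str, word: str) -> list[int]:
    """Return the start offset of every whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]

print(find_word(ocr_text, "modern"))  # [] : the misread word is never found
print(find_word(ocr_text, "modem"))   # [4]
```

The file is still searchable, because other words on the page match normally. Only the misread word is lost.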
You can create a textbase that includes PDF files and use it for your searches. Most likely, all of your PDF files will be searchable. However, if your textbase includes files that cannot be converted to text, your searches will not retrieve text from those files. For example, a search for “medieval” might find text in several PDF documents but miss an instance of “medieval” in one document that could not be properly converted.
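The following toy model shows this behavior. It is a simplified sketch and not the way Orbis+ builds or searches a textbase. It assumes a hypothetical textbase in which each file maps to its extracted text, or to `None` when conversion failed. The file names and the `search` function are made up for illustration.

```python
from typing import Optional

# Hypothetical toy textbase: each PDF maps to the text extracted from it,
# or None when text extraction failed for that file.
textbase: dict[str, Optional[str]] = {
    "chronicle.pdf": "A medieval chronicle of the abbey.",
    "survey.pdf": "Medieval and early modern trade routes.",
    "charter.pdf": None,  # not converted, even though its pages mention "medieval"
}

def search(textbase: dict[str, Optional[str]], term: str) -> tuple[list[str], list[str]]:
    """Return (hits, skipped): files whose text contains term, and files with no text."""
    hits, skipped = [], []
    for name, text in textbase.items():
        if text is None:
            skipped.append(name)  # nothing to search, so a match here is silently missed
        elif term.lower() in text.lower():
            hits.append(name)
    return hits, skipped

hits, skipped = search(textbase, "medieval")
print(hits)     # ['chronicle.pdf', 'survey.pdf']
print(skipped)  # ['charter.pdf']
```

A plain list of hits would give no sign that `charter.pdf` was never examined. Reporting unconverted files separately, as this sketch does, is one way to make such misses visible.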
If you would like to assess the searchability of the documents in your textbase, Orbis offers an indexer log.
For detailed technical information, see PDF Files: Technical Notes.
See also: