You sort of need to think of PDFs as consisting of two layers. One layer is just a photo of the document it was created from; that is the bit you can see. The second layer is a plain text representation of that document. A PDF that has been "printed" from, for instance, MS Word or Excel would have both layers, while a scanned PDF would only have the "photo" layer.

The text layer is what allows the likes of Google to include PDFs in search results. With a scanned PDF you are into the area of OCR (optical character recognition), which is problematic. Any results PDFs are much more likely to have been "printed" from an application than scanned, so you have a much better chance of success there.
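
If you want to check which kind of PDF you are dealing with before deciding on an approach, something along these lines might help. It is only a rough sketch: it assumes the third-party pypdf library is installed, the function names `has_text_layer` and `extract_page_texts` are made up for illustration, and the `min_chars` threshold of 20 is an arbitrary guess rather than any kind of standard.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if any page yielded at least min_chars non-whitespace characters.

    min_chars is an arbitrary threshold (an assumption), meant to ignore
    pages where extraction only returns a stray character or two.
    """
    for text in page_texts:
        # Treat None as empty and drop all whitespace before counting.
        stripped = "".join((text or "").split())
        if len(stripped) >= min_chars:
            return True
    return False


def extract_page_texts(path):
    """Return the extracted text of each page, using pypdf (pip install pypdf)."""
    # Imported here so has_text_layer can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Example usage (the file name is hypothetical):
#
#     texts = extract_page_texts("results.pdf")
#     if has_text_layer(texts):
#         print("Text layer found, plain text extraction should work")
#     else:
#         print("No usable text found, probably a scan, so OCR is needed")
```

If this reports no text layer, the file is most likely a scan and you would have to go down the OCR route instead. Bear in mind that a scanned PDF which has already been run through OCR can also carry a text layer, so a positive result tells you text is there, not how accurate it is.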