How to find out why text is not searchable in a PDF (and make it searchable)
up vote
2
down vote
favorite
I have a PDF article (not created by me).
However, I cannot search for text in the PDF. All PDF viewers I've tried return zero results for words that are obviously in there. I've tried Adobe Acrobat Professional 8, SumatraPDF and Google Chrome.
How can I find out why the document is not searchable?
Things I've checked:
- The PDF producer is reported as 'pdftopdf' and the PDF version is reported as 1.3. However, it seems to have been created in something like MS Word or OpenOffice (but not *TeX).
- It is definitely not a scanned document, as the font is crisp and clear at all zoom levels, and the text is selectable.
- If I look at the security settings (Ctrl-D in Adobe Acrobat), everything is allowed (printing, copying, ...).
- My search options do not have 'match case' turned on.
- I cannot turn it into a searchable document using Acrobat's 'Recognize text using OCR', as it reports: 'This page contains renderable text'.
So, what else could be the reason for the PDF not being searchable?
And how can I make it text-searchable?
pdf search
edited Mar 6 '13 at 10:33
asked Mar 6 '13 at 9:45
Rabarberski
Interesting. Does the document contain any sensitive data? If not, can you share it?
– SparKot
Mar 6 '13 at 9:49
@SparKot: I am not sure if I can share the document, so I would rather not, although I understand that it would greatly aid troubleshooting.
– Rabarberski
Mar 6 '13 at 10:02
Have you tried to upload it to Evernote and check if they can make it searchable? AFAIK they have a good OCR engine for that task.
– ChaosCakeCoder
Mar 6 '13 at 10:17
5 Answers
up vote
6
down vote
accepted
- It may use a custom font encoding that assigns codes to characters in a way that is incompatible with established encodings such as ASCII or UTF-8/Unicode.
- It may render characters individually and out of sequence.
- It may have had its characters flattened to paths.
See https://stackoverflow.com/questions/12703387/pdf-font-encoding and https://stackoverflow.com/questions/4523283/how-do-you-debug-pdf-files
To make it text-searchable, the best way may be to go back to the original source (e.g. a Word document) and use a different process to produce the PDF. Alternatively, you could try rendering your current PDF as a bitmap and then running OCR on it, but this is likely to be tedious and to produce poorer results.
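As an illustration of the first cause, here is a toy model in Python. It is not taken from the answer and does not parse a real PDF; the function names and the sample string are made up for the example. It assumes a subset font that numbers its glyphs 1, 2, 3, ... in order of first use, which is one way a "custom" encoding can arise.

```python
def build_subset_encoding(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use."""
    code_for_char = {}
    for ch in text:
        if ch not in code_for_char:
            code_for_char[ch] = len(code_for_char) + 1
    return code_for_char


def encode(text, code_for_char):
    """Turn text into the byte codes a page's content stream would carry."""
    return bytes(code_for_char[ch] for ch in text)


def naive_extract(codes):
    """What a viewer sees if it treats the codes as a standard encoding."""
    return codes.decode("latin-1")


def extract_with_map(codes, code_for_char):
    """What a viewer sees if a code-to-character map is available."""
    char_for_code = {code: ch for ch, code in code_for_char.items()}
    return "".join(char_for_code[c] for c in codes)


page_text = "searchable text"
encoding = build_subset_encoding(page_text)
stored = encode(page_text, encoding)

print("text" in naive_extract(stored))               # False
print("text" in extract_with_map(stored, encoding))  # True
```

The glyphs still draw correctly because the font knows which shape belongs to each code, but search and copy see only the codes, so they fail unless the file also carries a mapping back to Unicode.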
edited May 23 '17 at 12:41
Community♦
answered Mar 6 '13 at 10:24
RedGrittyBrick
Ah, the encoding does indeed seem to be the issue. When I try to copy and paste text, I get garbage, and the Font tab in Acrobat says 'encoding: custom' for each listed font.
– Rabarberski
Mar 6 '13 at 10:30
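A rough way to check the same thing without Acrobat is to count a few font-related name tokens in the raw bytes of the file. The Python sketch below is a heuristic, not a PDF parser: `font_markers` and the sample bytes are invented for the example, and it assumes the font dictionaries are stored uncompressed. That should hold for files that do not use object streams (a feature introduced in PDF 1.5), which fits the version 1.3 reported in the question.

```python
import re


def font_markers(pdf_bytes):
    """Count a few font-related name tokens in the raw bytes of a PDF."""
    names = (b"/ToUnicode", b"/Encoding", b"/Differences")
    counts = {}
    for name in names:
        # The lookahead stops a longer name with the same prefix from matching.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


# Hypothetical font dictionary: it has an encoding but no /ToUnicode entry.
sample = b"<< /Type /Font /Subtype /Type1 /Encoding 7 0 R >>"
print(font_markers(sample))
# {'/ToUnicode': 0, '/Encoding': 1, '/Differences': 0}
```

For a real file, pass the result of `open(path, "rb").read()`. A count of zero for `/ToUnicode` alongside custom encodings would be consistent with the garbage seen when copying text, though fonts that use a standard encoding can be searchable without it.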
up vote
1
down vote
I found a way around this problem. I chose Tools -> Edit Document Text, then for each page I pressed Ctrl-A (select all), right-clicked, opened Properties, and changed the font to something else. After I did this, the text was searchable and I could copy it!
answered Apr 29 '16 at 7:27
Don
I think the edit document text option is only available in the paid version of Acrobat.
– Burgi
May 1 '16 at 18:57
Probably - the original poster has Acrobat Professional 8. That should have it. This approach (changing the font) may work with other tools.
– Don
May 4 '16 at 3:03
up vote
0
down vote
I was having the same problem, and in frustration, googled to find an answer. It turns out that for me, the problem was simply that I was using Preview on my iMac to view and search the PDF. In most cases, searching works in Preview. But for a large book downloaded from Google Books, it didn't.
What worked was simply opening the PDF in Adobe Reader. (Duh, what a concept, I know.) Now I can search. This probably won't work for everyone with a Mac, but it might help someone.
answered Jan 2 '17 at 19:18
Susan
"I've tried with Adobe Acrobat Professional 8" OP said. Please read the question carefully.
– NetwOrchestration
Jan 2 '17 at 19:43
Please read the question again carefully. Your answer does not answer the original question.
– DavidPostill♦
Jan 29 '17 at 15:47
up vote
0
down vote
Go to Edit > Preferences, select 'Search' on the left-hand side of the Preferences screen, then click 'Purge Cache Contents'. Select OK, then close and reopen the document.
answered Jun 1 '17 at 22:09
hope this helps
up vote
0
down vote
After trying a lot of things that didn't work, here's how I actually got this done:
Find a PDF-to-Word converter or something similar. (I recommend https://www.online-convert.com/ )
Follow all the necessary steps to convert, but before you start the conversion:
Find the option labelled something like 'optical character recognition' and enable it.
Convert your file and you should be golden.
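For a local alternative to an online converter, the same idea (OCR the pages even though they already contain text) can be sketched with the OCRmyPDF command-line tool. This is a different tool from the one used above, and the sketch assumes `ocrmypdf` is installed and on the PATH; the file names are placeholders. Its `--force-ocr` option rasterizes each page and runs OCR on it regardless of existing text, which is the case Acrobat refused with 'This page contains renderable text'.

```python
import shutil
import subprocess


def build_ocr_command(src, dst):
    """Build an OCRmyPDF command that OCRs pages even if they already hold text."""
    return ["ocrmypdf", "--force-ocr", src, dst]


def run_ocr(src, dst):
    """Run OCRmyPDF if it is on PATH; return False if the tool is unavailable."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocr_command(src, dst), check=True)
    return True


# Placeholder file names, shown only to illustrate the resulting command.
print(build_ocr_command("article.pdf", "article-ocr.pdf"))
```

Because the pages are rasterized first, the output loses the original vector text and is usually larger, so this is best kept as a fallback.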
answered Jun 1 at 20:39
Alex