Help: OCR (Optical Character Recognition)

OCR (Optical Character Recognition) is the process of converting a bitmap image of text (such as a scanned document) into text that PDFpen and other text editing software can select, copy and search. Once OCR has recognized the text, PDFpen places it on an invisible layer above the visible image of the text. When you copy text, it is copied from this invisible OCR layer. OCR will not produce a perfect rendering of the bitmapped text, so you will need to proofread and edit the results.

Using OCR in PDFpen
While PDFpen performs OCR, a progress bar appears. The operation can take a few seconds or much longer, depending on the size and contents of the scanned document. To perform OCR manually, choose Edit > OCR Page. PDFpen begins the OCR operation and the progress bar appears.

Batch OCR (PDFpenPro Only)

OCR multiple documents at a time.
Progress

As each file completes, its progress indicator turns green to indicate success. Yellow or red means OCR failed. If a yellow or red circle appears next to a file name, either try again or send the document to support for assistance. Documents are processed in the background while you continue working in PDFpen, and you can add more documents to the list at any time. Each document is saved back to its original file. OCR continues even if you close the window; reopen it from File > OCR Files. A chime sounds once the full list has completed. A list of completed files remains in the window; remove completed entries with Remove.

Selecting, copying and correcting OCR Text

Once OCR is finished, the document’s text can be selected, copied and edited like any other text. To make visible changes to the text, use Correct Text; see Working with Text for details.

Searching OCR Text

The text generated by OCR can be searched like any other text. See Searching Within A PDF.

Tips to Improve the OCR Results of Your Document:
Forcing OCR

PDFpen examines the document. If it finds a single image the size of a page, it assumes the document is a scan and automatically offers to perform OCR. In some cases, PDFpen may not recognize a scanned document. When this happens, OCR Page in the Edit menu is grayed out and unavailable.
Viewing the OCR Text Layer

Once OCR has recognized the text, PDFpen places it on an invisible layer above the visible image of the text. When you copy text, it is copied from this invisible OCR text layer. Text in the OCR text layer is a close, but not perfect, rendering of the bitmapped text, so you will need to proofread and edit it. When you copy and paste OCR text, you may notice inaccuracies, which you can correct at that time. To view the OCR text layer:
Remove the OCR Layer

To completely remove the OCR layer from a document:
At this point, you can redo OCR or use the document as is. If you want to remove the OCR from a document in order to redo it, you can use Force OCR.

Editing the OCR Text Layer (PDFpenPro Only)

Make corrections to the OCR text layer.
Changes to the OCR text layer are not the same as changes made with the Correct Text tool, because they do not alter the visible text of the document. As with the Correct Text tool, editing the OCR text layer is intended for fixing typos and small errors, not for reformatting an entire document. For layout changes and major edits, export the document to Word format and make the changes in a word processor.

Dictionaries and OCR

Medical and legal dictionaries are included in PDFpen’s OCR engine. They improve the quality of OCR output for scanned documents by recognizing words specific to the medical and legal professions. This feature is built in, so there is no need to turn on or adjust any setting. When you edit OCR text, misspelled words in the selected text may be displayed with a red squiggly underline.
© 2003-2020 SmileOnMyMac, LLC dba Smile. All rights reserved. PDFpen and PDFpenPro are registered trademarks of Smile. The Smile logo is a trademark of Smile.