Conversion to Searchable Text
Scanning is most powerful when you can combine it with (keyword) searchable text. Optical Character Recognition (OCR) software converts scanned images of printed or typewritten pages to searchable and editable text. We use a variety of software tools and have had great results with uncorrected OCR for clients and for the Stones Directories, which we produce for sale. We can also zone newspaper and journal text, where the articles and headlines are not uniformly placed on the page. For documents that are not suited to OCR we offer double entry transcription of the text.
OCR (Optical character recognition)
Computers can perform data entry by scanning images and converting the information into a searchable file, usually either plain text output or an Adobe PDF file with multiple layers: a top image layer with a hidden, text-searchable layer underneath. An OCR program opens the digitised images and attempts to divide the text on the page into recognisable areas (paragraphs and columns), known as "zones". Once the pages are zoned, these areas are processed: the program attempts to analyse the shapes inside the zones and convert them into a form that the computer can manipulate. An OCR system enables you to take a book, magazine article, newspaper or other printed material and convert it into an electronic file which can be edited using a word processor or searched using PDF viewing software. All OCR systems include optical scanners for reading text and sophisticated software for analysing images. Our OCR systems allow us to "Pattern Train" on a job-by-job basis, whereby we "teach" the system how to recognise text in varying fonts in order to improve the accuracy of our conversion. OCR systems still have difficulty with handwritten text, however, and we recommend transcription for this type of original.
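To illustrate the idea of zoning, here is a minimal sketch in Python. It is a toy example and not the method used by any particular OCR product: the page is assumed to be a list of strings in which '#' marks ink, and the hypothetical function find_zones only splits the page into horizontal bands separated by blank rows. Real zoning software also separates columns and copes with skew and noise.

```python
from typing import List, Tuple


def find_zones(page: List[str], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each horizontal band containing ink.

    `page` is a toy binary image: each string is one pixel row, '#' is ink
    and any other character is background. Bands separated by at least
    `min_gap` consecutive blank rows are reported as separate zones.
    """
    zones: List[Tuple[int, int]] = []
    start = None      # first row of the zone currently being built
    last_ink = None   # most recent row that contained ink
    blank_run = 0     # consecutive blank rows seen since the last ink row

    for y, row in enumerate(page):
        if "#" in row:
            if start is None:
                start = y
            last_ink = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The gap is wide enough: close the current zone.
                zones.append((start, last_ink))
                start = None

    if start is not None:
        # The page ended while a zone was still open.
        zones.append((start, last_ink))
    return zones


page = [
    "....",
    ".##.",
    ".#..",
    "....",
    "....",
    "###.",
    "....",
]
print(find_zones(page, min_gap=2))
```

Raising min_gap merges bands that are close together, which is roughly the difference between treating each line as its own zone and treating a whole paragraph as one.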
Through automated forms processing, OCR is also a powerful alternative to manual data entry. Any company interested in a cost-effective approach to capturing data from forms and/or retaining their images may want to consider the accuracy and readability of the OCR process.
Expected accuracy will vary depending on the OCR method chosen and the quality of the originals and digitised images.
Transcription
Transcription is the act of transcribing the textual information from a digitised image into another form, usually text, spreadsheet or database format. Types of digitised images that could be transcribed might include handwritten documents, index cards, lists, scripts, or simple data like names and addresses.
Double data transcription is a data entry quality control method. In the first pass through a set of records, a data entry operator keys the data for each record. On the second pass through the batch, an operator at a separate machine enters the same data again. The two blocks of data are then compared, either by a computer verification program or by a person. The verifier compares the second operator's keystrokes with the contents of the record. If there are no discrepancies, the verifier accepts the data. If there are discrepancies between the two blocks of data, a choice is made as to which version is correct. This can be handled by means of strict vocabulary dictionaries, or manually by a data operator. The accuracy for double data transcription exceeds 99.9%.
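The verification step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of our production software: the function name verify_double_entry, the field-by-field comparison and the simple rule for using the vocabulary dictionary are all hypothetical choices made for the example.

```python
from typing import List, Optional, Set


def verify_double_entry(
    first: List[str], second: List[str], vocabulary: Set[str]
) -> List[Optional[str]]:
    """Compare two operators' entries field by field.

    Matching fields are accepted. Where the entries differ, the one found
    in `vocabulary` wins if only one of them is a known word. Anything
    still unresolved is returned as None, meaning "send to a data operator
    for manual review".
    """
    if len(first) != len(second):
        raise ValueError("both passes must contain the same number of fields")

    verified: List[Optional[str]] = []
    for a, b in zip(first, second):
        if a == b:
            verified.append(a)          # no discrepancy
        elif a in vocabulary and b not in vocabulary:
            verified.append(a)          # dictionary favours the first pass
        elif b in vocabulary and a not in vocabulary:
            verified.append(b)          # dictionary favours the second pass
        else:
            verified.append(None)       # ambiguous: needs manual review
    return verified


first_pass = ["Smith", "Jonse", "Leeds", "Brwn"]
second_pass = ["Smith", "Jones", "Leads", "Brown"]
known_words = {"Smith", "Jones", "Leeds", "Leads", "Brown"}
print(verify_double_entry(first_pass, second_pass, known_words))
```

In this example the third field stays unresolved because both "Leeds" and "Leads" are valid dictionary words, which is exactly the kind of case that is passed to a person.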
Single-entry transcription is used in the interest of simplicity. It is usually less expensive than double-entry transcription because the data does not have to be entered twice and then compared. That comparison involves computer programming to create vocabulary dictionaries and verification rules, and requires at least two employees who are familiar with the data.
Expected accuracy will vary depending on the transcription method chosen and the quality of the originals and digitised images.
PROCESS                                                     EXPECTED ACCURACY
OCR (typed)                                                 70-90+%
Single Key Transcription from typed material                90-95+%
Single Key Transcription from handwritten material          60-95+%
  (the quality varies depending on the handwriting)
Double Key Transcription                                    95+ - 99.5+%
Triple Key Transcription                                    99.5+%
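The figures above do not state how accuracy is measured. One common convention, assumed here purely for illustration, is character-level accuracy: the number of single-character corrections (insertions, deletions, substitutions) needed to turn the output into the correct text, relative to the length of the correct text. The function names below are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, output: str) -> float:
    """Percentage of the reference text that was captured correctly."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, output)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


# One missing letter in a nine-letter word.
print(round(character_accuracy("recognise", "recgnise"), 1))
```

Accuracy can also be counted per word or per field, and the same text can score quite differently depending on which unit is used, so it is worth agreeing the measure before comparing methods.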
Please contact us if you'd like to know more about our text conversion services.