ocr

Purpose

Finds documents containing words whose recognition confidence score, as assigned by the PolyAnalyst OCR module, is at or above a threshold or falls within a specified range.

Syntax

ocr([0-100], argument)

Arguments

The function takes several arguments. The first argument is optional: an integer in the range [0-100] that sets the recognition confidence threshold. The function finds all words whose recognition confidence score is greater than or equal to this threshold.

The function also accepts any PDL query as an argument. The words returned by the query are interpreted as a list of arguments joined by the OR operator. The ocr() function checks whether the OCR confidence score of the found words falls within the specified range.

The optional named parameter confidence sets the confidence range.
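The threshold and range semantics described above can be modeled outside PolyAnalyst with a short Python sketch. It is an illustration only, not the PolyAnalyst implementation: the function name in_confidence_range and the (operator, value) representation of the confidence parameter are hypothetical, and the ">=" operator is assumed by symmetry with "<=".

```python
import operator

# Comparison operators used with the confidence parameter.
# ">", "<" and "<=" appear in the examples; ">=" is an assumption.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def in_confidence_range(score, threshold=None, constraints=()):
    """Return True if a word's OCR confidence score passes every condition.

    score       -- recognition confidence of the word, 0..100
    threshold   -- optional positional threshold, as in ocr(80)
    constraints -- (operator, value) pairs, as in confidence:>20
    """
    # The positional threshold is inclusive: score >= threshold.
    if threshold is not None and score < threshold:
        return False
    # Every confidence constraint must hold at the same time.
    return all(_OPS[op](score, value) for op, value in constraints)
```

In this model, ocr(80) corresponds to in_confidence_range(score, threshold=80), and ocr(confidence:>20, confidence:<80) corresponds to in_confidence_range(score, constraints=[(">", 20), ("<", 80)]).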

Returned value

Documents matching the query.

Examples

ocr() matches all the words whose confidence is greater than the threshold specified in the OCR module settings (by default it is set to 80).

For more information, see the settings of the "Files" or "Optical character recognition" nodes.

ocr(80) matches all the words with confidence in the range [80, 100].

ocr(confidence:>20) matches all the words with confidence greater than 20.

ocr(confidence:<90) matches all the words with confidence less than 90.

ocr(confidence:>20, confidence:<80) matches all the words with confidence greater than 20 and less than 80.

ocr(80, entity(People)) matches People occurrences consisting of words with confidence equal to or greater than 80.

ocr(a,b, confidence:<=30) matches the words "a" or "b" if their confidence is equal to or less than 30.

ocr(sentence()) matches all the sentences that contain words with confidence greater than the threshold specified in the OCR module settings.

Note

The OCR module records a recognition confidence score only for words whose score is below a given threshold.

By default, the confidence threshold is set to 80. The real recognition confidence of words with a score above 80 is unknown, but it is considered equal to 100.

Texts that were not processed by the OCR module are considered to have 100% confidence.

For example, the query ocr(entity(People)) matches all persons in regular texts, but if the texts were processed by the OCR module, the query filters out occurrences that contain words with a low confidence score.
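The rules in this note can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not PolyAnalyst code: the names effective_confidence and occurrence_passes are hypothetical, recorded scores are represented as a plain dictionary, and a multi-word occurrence is assumed to pass only when every one of its words passes.

```python
def effective_confidence(word, recorded_scores):
    """Return the confidence used for filtering.

    recorded_scores holds scores only for words the OCR module flagged
    as low confidence. Any other word, including every word of a text
    that was never processed by OCR, is treated as having confidence 100.
    """
    return recorded_scores.get(word, 100)


def occurrence_passes(words, recorded_scores, threshold=80):
    """Return True if every word of an occurrence meets the threshold.

    The default of 80 mirrors the default threshold named in this note.
    """
    return all(
        effective_confidence(word, recorded_scores) >= threshold
        for word in words
    )


# Hypothetical data: the OCR module flagged one misread word.
recorded = {"Jonh": 45}

print(occurrence_passes(["John", "Smith"], recorded))  # no flagged words
print(occurrence_passes(["Jonh", "Smith"], recorded))  # one flagged word
print(occurrence_passes(["John", "Smith"], {}))        # text without OCR
```

Under these assumptions, the first and third calls keep the occurrence, while the second drops it because one of its words has a recorded score below the threshold.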