ocr
Purpose
Finds documents containing words that were recognized by the PolyAnalyst OCR module with a high recognition confidence score.
Arguments
The function takes several arguments. The first argument is optional: an integer in the range [0, 100] that sets the recognition confidence threshold. The function finds all words whose recognition confidence score is greater than or equal to this threshold.
The function also accepts any PDL query as an argument. The words returned by the query are treated as a list of arguments joined by the OR operator, and ocr() checks whether the OCR confidence score of each found word falls within the specified range.
The optional named parameter confidence sets the confidence range.
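The argument handling described above can be pictured with a small Python sketch. It is not PolyAnalyst code and does not use PDL syntax: the function name ocr_match, the (word, score) pairs, and the inclusive upper bound of the range are assumptions made purely for illustration. The default lower bound of 80 follows the default threshold mentioned in the note below.

```python
from typing import Iterable, Tuple


def ocr_match(
    doc_words: Iterable[Tuple[str, int]],
    query_words: Iterable[str],
    low: int = 80,    # assumed default, taken from the documented threshold
    high: int = 100,  # assumed inclusive upper bound of the confidence range
) -> bool:
    """Return True if any query word occurs in the document
    with a confidence score inside [low, high]."""
    # OR semantics: a hit on any one of the query words is enough.
    wanted = {w.lower() for w in query_words}
    return any(
        word.lower() in wanted and low <= score <= high
        for word, score in doc_words
    )


# Hypothetical OCR output: each word paired with its confidence score.
doc = [("invoice", 95), ("tota1", 42), ("amount", 88)]

print(ocr_match(doc, ["invoice", "receipt"]))  # True: "invoice" scores 95
print(ocr_match(doc, ["tota1"]))               # False: 42 is below 80
print(ocr_match(doc, ["tota1"], low=0))        # True: range widened to [0, 100]
```

Widening the range, as in the last call, is the sketch's stand-in for what the confidence parameter is described as doing.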
Note
The OCR module flags only words whose recognition confidence score is below a given threshold.
By default, the confidence threshold is set to 80. The actual recognition confidence of words that score above 80 is unknown, so it is treated as equal to 100.
Texts that were not processed by the OCR module are considered to have 100% confidence.
For example, the query ocr(entity(People)) matches all persons in regular texts, but if the texts were processed by the OCR module, the query filters out occurrences that contain words with a low confidence score.
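The rules in this note can be summarized as a short Python sketch. This is an illustrative model only, not the PolyAnalyst implementation: the names effective_confidence and keep_occurrence are hypothetical, and a missing score is represented here as None.

```python
from typing import Iterable, Optional

DEFAULT_THRESHOLD = 80  # default threshold stated in the note


def effective_confidence(score: Optional[int], ocr_processed: bool = True) -> int:
    """Confidence value used for filtering, following the note's rules."""
    if not ocr_processed:
        return 100  # text never went through OCR: treated as 100% confidence
    if score is None:
        return 100  # no score flagged by OCR: treated as equal to 100
    return score


def keep_occurrence(
    scores: Iterable[Optional[int]],
    threshold: int = DEFAULT_THRESHOLD,
    ocr_processed: bool = True,
) -> bool:
    """Keep an occurrence only if every word in it meets the threshold."""
    return all(
        effective_confidence(s, ocr_processed) >= threshold for s in scores
    )


# A two-word person name in three hypothetical situations.
print(keep_occurrence([None, None]))                       # True: no low-confidence words
print(keep_occurrence([None, 55]))                         # False: one word scored 55
print(keep_occurrence([None, None], ocr_processed=False))  # True: regular text
```

Under these assumptions, an occurrence in regular text always passes, while an occurrence in OCR-processed text is dropped as soon as one of its words has a flagged score below the threshold.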