Search within Document Parts
The docpart() function restricts a search to specific document parts, such as headings, tables, lists, or the table of contents.
Syntax
The first parameter, section_type, specifies the target document part. The section types that docpart() can search within, and their supported attributes, are listed below:
| Section | Comments | Supported attribute | Attribute comments |
|---|---|---|---|
| table_of_contents/contents/toc | table of contents | | |
| heading | heading | level | Heading level from 1 to 6, where 1 is the most important heading and 6 the least important. |
| list | list | number | List number within the document. |
| | | type (ordered, unordered, bulleted) | List type: ordered, unordered, or bulleted. |
| | | item | Item number within the list. |
| | | level | List level from 1 to 6, where 1 is the most important level and 6 the least important. |
| list_item | list item | number | Item number within the list. |
| table | table | name | Table name. |
| | | number | Table number within the document. |
| | | column/col | Name or number of the table column. |
| | | col_number | Number of the table column. |
| | | row | Name or number of the table row. |
| | | row_number | Number of the table row. |
| row | table row | name | Row name. |
| | | number | Row number. |
| column/col | table column | name | Column name. |
| | | number | Column number. |
| cell | table cell | | |
| section | section | name | Section name; can be a PDL query. |
| | | whole (yes/no) | If set to "yes", name refers to the entire section name. Defaults to "no". |
| | | level | Section level, corresponding to the heading level. |
| | | field (body/heading/any) | Search within the section body, the section heading, or both. Defaults to "any". |
| | email section | sender | Email sender. |
| | | recipient | Email recipient. |
| | | copy | Recipient in copy. |
| | | subject | Email subject. |
| | | opening | Email opening. |
| | | closing | Email closing. |
| | | signature | Email signature. |
| | | body | Email body. |
| | | date_time | Email date and time. |
| | | forwarded (yes/no) | Whether the email is a forwarded message. |
| page | page/page range | number | Page number, or a page range if two values are specified. |
| hyperlink | internet hyperlink | | |
docpart() also accepts the optional parameter ocr, which finds documents containing words that the PolyAnalyst OCR module recognized with a high confidence score, and the named parameter confidence, which sets the accepted range of OCR recognition confidence.
Note
- To search within several section types at once, list them separated by the "|" symbol.
- If the attributes are omitted, the function matches all sections of the specified type.
- Numerical attributes accept the relational operators ">", "<", ">=", "<=", and "!=", e.g. docpart(table, col:>1, col:<3, row:>1).
- docpart() matches the intersection of the query with the table sections or pages set by the number attribute. A match may therefore lie only partially within the specified table sections or pages.
- The number attribute of page can take a negative value, which is counted from the last page of the document: number:"-1" limits the query to the last page, and number:>="-2" limits it to the last two pages.
- The hyperlink section type finds hyperlinks only in HTML pages. To use it, connect the node to a parent Internet source node that has already been executed.
- Supported file formats for each section are listed in the table below.
| Section | Supported file formats |
|---|---|
| contents | docx, odt |
| heading | docx, html, odt, pptx, ppt, rtf |
| list | docx, html, odt, pptx |
| table | docx, doc, html, odt, pptx, ppt, pdf, rtf |
| section | docx, html, odt, pptx, ppt, rtf |
| page | docx, pdf |
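The numeric rules in the notes above (relational operators and negative page numbers) can be modelled in a few lines of Python. This is a hypothetical sketch, not PolyAnalyst code: parse_condition, resolve_page and select_pages are illustrative names, conditions are passed without quotes (for example '>=-2' for number:>="-2"), pages are assumed to be numbered from 1, and several conditions on the same attribute are assumed to be combined with a logical AND, as the col:>1, col:<3 example suggests.

```python
import operator

# Hypothetical model of docpart() numeric conditions; not a PolyAnalyst API.
OPERATORS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_condition(condition: str) -> tuple[str, int]:
    """Split a condition such as '>=-2' into an operator and an integer.

    Two-character operators are tried first so that '>=' is not read as '>'.
    A bare number such as '-1' is treated as an equality test.
    """
    for symbol in (">=", "<=", "!=", ">", "<"):
        if condition.startswith(symbol):
            return symbol, int(condition[len(symbol):])
    return "==", int(condition)


def resolve_page(value: int, total_pages: int) -> int:
    """Map a negative page number to an absolute one: -1 is the last page."""
    return total_pages + 1 + value if value < 0 else value


def select_pages(conditions: list[str], total_pages: int) -> list[int]:
    """Return the 1-based pages that satisfy every condition (logical AND)."""
    bounds = []
    for condition in conditions:
        symbol, value = parse_condition(condition)
        bounds.append((OPERATORS[symbol], resolve_page(value, total_pages)))
    return [
        page
        for page in range(1, total_pages + 1)
        if all(compare(page, bound) for compare, bound in bounds)
    ]
```

Under these assumptions, select_pages([">=-2"], 10) returns [9, 10], the last two pages of a 10-page document, and select_pages(["-1"], 10) returns [10]. Resolving a negative value to an absolute page number before comparing is what lets ">=" express "the last two pages".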
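The rule that docpart() matches the intersection of the query with table sections or pages can be pictured with character offsets. In this hypothetical Python sketch, which is not PolyAnalyst's implementation, a query match and a page are modelled as half-open [start, end) ranges, and a match is assumed to be reported as its overlap with the page; intersect and matches_in_section are illustrative names.

```python
from typing import Optional

# Assumed model: half-open [start, end) character offsets in the document text.
Span = tuple[int, int]


def intersect(match: Span, section: Span) -> Optional[Span]:
    """Return the part of `match` that lies inside `section`, or None."""
    start = max(match[0], section[0])
    end = min(match[1], section[1])
    return (start, end) if start < end else None


def matches_in_section(matches: list[Span], section: Span) -> list[Span]:
    """Keep every match that overlaps the section, clipped to the overlap."""
    clipped = (intersect(m, section) for m in matches)
    return [span for span in clipped if span is not None]
```

With a page covering offsets 0 to 100, matches_in_section([(10, 20), (95, 105), (150, 160)], (0, 100)) returns [(10, 20), (95, 100)]: the second match is kept, clipped to the page, even though it lies only partially on it.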
Example
Task: find the phrase "table of contents" in the table of contents.
The query docpart(contents, "table of contents") matches all occurrences of the phrase "table of contents" within the table of contents.
