document

Purpose

This function searches a dataset for documents that match a query.

Syntax

document([min,][max,][term_1,][term_2,…])

Arguments

This function has no required arguments. When called without arguments, it matches all words in a document. The optional parameters min and max specify the lowest and highest document numbers to search within a dataset; when they are omitted, the function searches the whole dataset. All search terms passed as arguments must occur within the same document.

The function also supports the following optional named parameters:

Parameter

Explanation

match:=range

The whole fragment of text between the first and the last arguments is extracted.

match:=arguments

Only the arguments listed inside the function are extracted (default value).

match:=first/last/shortest/longest

Matches the first/last/shortest/longest document.

whole:=yes/no

Regulates whether only sentences made up entirely of the arguments listed in the query are extracted (set to no by default).

allow_punct:=yes/no

Regulates whether punctuation marks are allowed within the sequence (set to yes by default).

allow_space:=yes/no

Regulates whether spaces are allowed within the sequence (set to yes by default).

min_doc:=<numeral>

Specifies the minimal document number within a dataset.

max_doc:=<numeral>

Specifies the maximal document number within a dataset.

mode:=forward/backward

Specifies whether document numbers are counted from the beginning (forward) or from the end (backward) of the dataset.

Note
  • If the first and/or the second argument is a number, it is interpreted as the optional parameter min_doc or max_doc, respectively.

  • When both the leading numerical arguments min and max and the named parameters min_doc and max_doc are specified, the named parameters take priority.
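The precedence rule in this note can be modeled with a short Python sketch. It is an illustration only, not the engine's implementation: the function name resolve_doc_range is hypothetical, and the sketch assumes that only leading numbers are treated as bounds and that a single leading number is read as min.

```python
from typing import List, Optional, Sequence, Tuple, Union

Arg = Union[int, str]


def resolve_doc_range(
    args: Sequence[Arg],
    min_doc: Optional[int] = None,
    max_doc: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int], List[str]]:
    """Hypothetical model of how document() bounds could be resolved.

    Assumption: only leading integers count as positional min/max;
    everything after them is treated as a search term.
    """
    rest = list(args)
    positional_min: Optional[int] = None
    positional_max: Optional[int] = None

    # The first argument, if numeric, plays the role of min.
    if rest and isinstance(rest[0], int):
        positional_min = rest.pop(0)
        # The second argument, if numeric, plays the role of max.
        if rest and isinstance(rest[0], int):
            positional_max = rest.pop(0)

    # Named parameters take priority over positional numbers.
    low = min_doc if min_doc is not None else positional_min
    high = max_doc if max_doc is not None else positional_max
    terms = [str(term) for term in rest]
    return low, high, terms
```

Under these assumptions, resolve_doc_range([1, 2], min_doc=5, max_doc=9) yields the bounds 5 and 9, because the named parameters override the positional numbers.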

Returned Value

Documents matching the query.

Examples

document(1, 2) matches the first two documents of the dataset.

document(1, 2, mode:=backward) matches the last two documents of the dataset.
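The difference between the two calls above can be pictured with a small Python model. This is a sketch under stated assumptions, not the actual engine: the name select_documents is hypothetical, document numbers are taken to be 1-based and inclusive (as the first example suggests), and the result is assumed to stay in dataset order.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_documents(
    docs: Sequence[T],
    min_doc: Optional[int] = None,
    max_doc: Optional[int] = None,
    mode: str = "forward",
) -> List[T]:
    """Hypothetical model of min/max selection with mode:=forward/backward."""
    n = len(docs)
    # Missing bounds mean "the whole dataset"; clamp to the valid range.
    lo = 1 if min_doc is None else max(min_doc, 1)
    hi = n if max_doc is None else min(max_doc, n)
    if lo > hi:
        return []

    if mode == "forward":
        # Document number k (counted from the start) sits at index k - 1.
        return list(docs[lo - 1:hi])
    if mode == "backward":
        # Document number k (counted from the end) sits at index n - k.
        return list(docs[n - hi:n - lo + 1])
    raise ValueError(f"unknown mode: {mode!r}")
```

With a five-document dataset, bounds 1 and 2 pick the first two documents in forward mode and the last two in backward mode, mirroring the two examples.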

document(not cat) matches documents that do not contain the word "cat".

The function document() may be combined with functions like case(), length(), lemma(), etc.

case(upper, document()) matches all documents written in uppercase.

case(upper, document(abc)) matches documents containing "ABC" in uppercase.

length(2, document(), count:=word) matches documents containing two words or more.

lemma(noun, document()) matches documents containing only nouns.