constituent
Purpose
Finds documents in which the arguments occur within a single constituent (a word or group of words that acts as a single unit, such as a noun phrase or a verb phrase).
Arguments
The function accepts several arguments:
- The first parameter, constituent_type, is optional. It specifies the constituent’s label and accepts one of the values below:
np - noun phrase (a senior lawyer, the most often used accounts, the Harvard Graduate School…)
vp - verb phrase (contact your representative, will participate via conference call)
adjp - adjective phrase (clinically significant, most innovative and secure)
advp - adverb phrase (as easy as possible, even more)
pp - prepositional phrase (for your help, on Thursday, in the office)
qp - quantifier phrase (no less than six, over $1,000)
conj - conjunction phrase (and so, and then)
sbar - dependent clause (I’ll let you know [when we have some new information])
s - clause (I hope you are doing well, The meeting will probably take place on Thursday)
intj - interjection (yes, no, please)
prt - particle
lst - list marker (a., 1.)
prn - parenthetical (text within the parentheses)
frag - fragment (Step 1.)
nac - not a constituent
rrc - reduced relative clause
ucp - unlike coordinated phrase
whadj - wh-adjective phrase
whavp - wh-adverb phrase
whnp - wh-noun phrase
whpp - wh-prepositional phrase
x - unknown, uncertain, or unbracketable
If the constituent label is specified, the search terms can be omitted. In that case, the function matches every constituent tagged with the specified label.
- The remaining arguments define the terms to search for.
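The interplay between the optional label and the search terms can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the engine's implementation: the Constituent class, the match_constituents helper, and the sample parse are hypothetical, and the sketch assumes that every term must occur somewhere inside the constituent, ignoring word order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    label: str     # constituent label, e.g. "np", "vp", "pp"
    tokens: tuple  # the words the constituent spans


def match_constituents(constituents, label=None, terms=()):
    """Toy model: keep constituents with the given label that contain every term.

    Assumption: all terms must be present; order and adjacency are ignored.
    """
    wanted = [t.lower() for t in terms]
    hits = []
    for c in constituents:
        # Skip constituents whose label differs from the requested one.
        if label is not None and c.label != label:
            continue
        words = [w.lower() for w in c.tokens]
        # With no terms, all() over an empty list is True, so every
        # constituent carrying the label is returned.
        if all(t in words for t in wanted):
            hits.append(c)
    return hits


# Hypothetical parse of a short text, written out by hand.
parsed = [
    Constituent("np", ("a", "senior", "lawyer")),
    Constituent("vp", ("contact", "your", "representative")),
    Constituent("pp", ("on", "Thursday")),
]

only_label = match_constituents(parsed, label="np")
label_and_term = match_constituents(parsed, label="np", terms=("lawyer",))
term_only = match_constituents(parsed, terms=("thursday",))
```

Here only_label holds every noun phrase because no terms were given, label_and_term holds the noun phrases that contain "lawyer", and term_only searches constituents of any label.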
The function also supports optional named parameters:
- min_length, max_length, length: specify the minimum, maximum, or exact length of the constituent, in tokens;
- level:=min/max restricts matches to lower-level (min) or higher-level (max) constituents only;
- allow_punct:=yes/no allows or prohibits punctuation between arguments (default: yes);
- allow_space:=yes/no allows or prohibits spaces between arguments (default: no);
- match:=range matches the text range that starts at the first found term and ends at the last found term;
- whole:=yes matches only the constituents contained in the query.
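The length, level, and match:=range parameters can be pictured with a toy Python model. Everything in it is an assumption made for illustration: the Span class and the helper functions are hypothetical, constituents are represented as token ranges, level "min" is read as "contains no other matched constituent", level "max" is read as "is not contained in another matched constituent", and the length filters are applied before the level filter. The engine's actual behavior may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)

    @property
    def length(self):
        return self.end - self.start


def strictly_contains(outer, inner):
    """True when inner lies inside outer and the two token ranges differ."""
    same_range = (outer.start, outer.end) == (inner.start, inner.end)
    return not same_range and outer.start <= inner.start and inner.end <= outer.end


def filter_spans(spans, min_length=None, max_length=None, length=None, level=None):
    """Apply the length filters first, then the level filter (assumed order)."""
    kept = [
        s for s in spans
        if (min_length is None or s.length >= min_length)
        and (max_length is None or s.length <= max_length)
        and (length is None or s.length == length)
    ]
    if level == "min":
        # Keep spans that do not enclose any other kept span.
        kept = [s for s in kept if not any(strictly_contains(s, o) for o in kept)]
    elif level == "max":
        # Keep spans that are not enclosed by any other kept span.
        kept = [s for s in kept if not any(strictly_contains(o, s) for o in kept)]
    return kept


def match_range(tokens, terms):
    """Return the tokens from the first found term through the last found term."""
    wanted = {t.lower() for t in terms}
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in wanted]
    if not positions:
        return []
    return tokens[positions[0]:positions[-1] + 1]


# Hand-written noun phrases for the tokens: the(0) meeting(1) on(2) Thursday(3)
whole_np = Span("np", 0, 4)  # "the meeting on Thursday"
head_np = Span("np", 0, 2)   # "the meeting"
inner_np = Span("np", 3, 4)  # "Thursday"
nps = [whole_np, head_np, inner_np]

outermost = filter_spans(nps, level="max")
innermost = filter_spans(nps, level="min")
at_least_two = filter_spans(nps, min_length=2)
ranged = match_range(["a", "senior", "corporate", "lawyer"], ["senior", "lawyer"])
```

Under these assumptions, outermost keeps only the full phrase, innermost keeps the two phrases that enclose nothing else, at_least_two drops the one-token phrase, and ranged covers the text from "senior" through "lawyer", including the word between them.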