constituent

Purpose

Finds documents in which the specified terms occur within a single constituent (a word or group of words that acts as a single unit, such as a noun phrase or a verb phrase).

Syntax

constituent([constituent_type,] term1, term2… )

Arguments

The function accepts several arguments:

The first, optional parameter constituent_type specifies the constituent’s label and accepts one of the following values:

np - noun phrase (a senior lawyer, the most often used accounts, the Harvard Graduate School…)

vp - verb phrase (contact your representative, will participate via conference call)

adjp - adjective phrase (clinically significant, most innovative and secure)

advp - adverb phrase (as easy as possible, even more)

pp - prepositional phrase (for your help, on Thursday, in the office)

qp - quantifier phrase (no less than six, over $1,000)

conj - conjunction phrase (and so, and then)

sbar - dependent clause (I’ll let you know [when we have some new information])

s - clause (I hope you are doing well, The meeting will probably take place on Thursday)

intj - interjection (yes, no, please)

prt - particle

lst - list marker (a., 1.)

prn - parenthetical (text within the parentheses)

frag - fragment (Step 1.)

nac - not a constituent

rrc - reduced relative clause

ucp - unlike coordinated phrase

whadj - wh-adjective phrase

whavp - wh-adverb phrase

whnp - wh-noun phrase

whpp - wh-prepositional phrase

x - unknown, uncertain, or unbracketable

If a constituent label is specified, the term arguments can be omitted. In this case, the function matches all constituents tagged with that label.
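
As a minimal sketch of a label-only query (it follows the syntax above and is not one of the original examples):

```
constituent(np)
```

Per the rule above, this query would match every constituent tagged as a noun phrase.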

The other arguments define the terms to search for.

The function also supports optional named parameters:

min_length, max_length, length: specify the minimum, maximum, or exact length of the constituent, in tokens;
level:=min/max extracts only the lowest-level/highest-level constituents;
allow_punct:=yes/no allows or prohibits punctuation between arguments (set to "yes" by default);
allow_space:=yes/no allows or prohibits spaces between arguments (set to "no" by default);
match:=range matches the text range starting from the first found term and ending with the last found term;
whole:=yes matches only constituents that consist entirely of the query terms.
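
The named parameters can be written after the terms, as in the sketches below. These particular combinations are illustrative assumptions built from the syntax described above. They do not come from the original examples and have not been checked against a live index.

```
constituent(np, company, min_length:=2, max_length:=4)
constituent(np, length:=3)
constituent(vp, contact, representative, allow_punct:=no)
```

Read against the parameter descriptions, the first query would restrict matches to noun phrases containing "company" with a minimum length of 2 and a maximum length of 4 tokens. The second would match noun phrases that are exactly 3 tokens long, assuming length filters can be combined with a label-only query. The third would prohibit punctuation between the two terms inside a verb phrase.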

Returned Value

Documents matching the query.

Examples

constituent(expect, smartphone) returns "The expected smartphones are produced by the new Chinese company";

constituent(np, company) returns "company" in "The expected smartphones are produced by the new Chinese company" and "The company expects the new smartphones to be delivered by Sunday", because it is part of a noun phrase in both sentences;

constituent(pp, line, match:=range) is equivalent to constituent(pp, line, match:=range, level:=max) and returns the highest-level prepositional phrase containing "line" in "She is really looking forward to the red smartphone from the new Chinese line";

constituent(pp, line, match:=range, level:=min) returns the lowest-level prepositional phrase containing "line" in "She is really looking forward to the red smartphone from the new Chinese line";

constituent(np, company, match:=range) returns "The company expects …" and "… produced by the new Chinese company". With the named argument match:=range the whole constituent is returned;

constituent(np, company, min_length:=3, match:=range) returns only "… produced by the new Chinese company", because that constituent is 4 tokens long. It does not return "The company expects …", whose constituent is only 2 tokens long.

constituent(provide, value) returns "The report provides value" and "The report provides value in millions of US dollars".

constituent(provide, value, whole:=yes) returns "The report provides value".