constituent

Purpose

Finds documents in which the arguments occur within a single constituent (a word or group of words that acts as a single unit, such as a noun phrase or verb phrase).

Syntax

constituent([constituent_type,] term1, term2, …)

Arguments

The function accepts several arguments:

  • The optional first parameter, constituent_type, specifies the constituent’s label and accepts one of the following values:

np - noun phrase (a senior lawyer, the most often used accounts, the Harvard Graduate School…​)

vp - verb phrase (contact your representative, will participate via conference call)

adjp - adjective phrase (clinically significant, most innovative and secure)

advp - adverb phrase (as easy as possible, even more)

pp - prepositional phrase (for your help, on Thursday, in the office)

qp - quantifier phrase (no less than six, over $1,000)

conj - conjunction phrase (and so, and then)

sbar - dependent clause (I’ll let you know [when we have some new information])

s - clause (I hope you are doing well, The meeting will probably take place on Thursday)

intj - interjection (yes, no, please)

prt - particle

lst - list marker (a., 1.)

prn - parenthetical (text within the parentheses)

frag - fragment (Step 1.)

nac - not a constituent

rrc - reduced relative clause

ucp - unlike coordinated phrase

whadj - wh-adjective phrase

whavp - wh-adverb phrase

whnp - wh-noun phrase

whpp - wh-prepositional phrase

x - unknown, uncertain, or unbracketable

If the constituent label is specified, the term arguments can be omitted. In this case, the function matches all constituents tagged with the specified label.

  • The other arguments define the terms to search for.
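The positional syntax above can be sketched with a small Python helper that assembles a query string and checks the label against the values listed in this section. The helper name build_constituent_query is hypothetical and is not part of the query language; it only illustrates how the optional label and the terms combine.

```python
# Labels copied from the list of constituent_type values above.
CONSTITUENT_TYPES = {
    "np", "vp", "adjp", "advp", "pp", "qp", "conj", "sbar", "s", "intj",
    "prt", "lst", "prn", "frag", "nac", "rrc", "ucp",
    "whadj", "whavp", "whnp", "whpp", "x",
}


def build_constituent_query(*terms, constituent_type=None):
    """Hypothetical helper: assemble a constituent(...) query string."""
    # Terms may be omitted only when a label is given.
    if constituent_type is None and not terms:
        raise ValueError("give at least one term or a constituent label")
    parts = []
    if constituent_type is not None:
        if constituent_type not in CONSTITUENT_TYPES:
            raise ValueError(f"unknown constituent label: {constituent_type!r}")
        parts.append(constituent_type)  # the label always comes first
    parts.extend(terms)
    return "constituent(" + ", ".join(parts) + ")"


print(build_constituent_query("expect", "smartphone"))
# constituent(expect, smartphone)
print(build_constituent_query("company", constituent_type="np"))
# constituent(np, company)
print(build_constituent_query(constituent_type="vp"))
# constituent(vp)
```

The last call shows the label-only form, which matches every constituent carrying that label.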

The function also supports optional named parameters:

  • min_length, max_length, length: specify the minimum, maximum, or exact length of the constituent, in tokens;

  • level:=min/max extracts only the lowest-level (min) or highest-level (max) constituents;

  • allow_punct:=yes/no allows or prohibits punctuation between arguments (set to "yes" by default);

  • allow_space:=yes/no allows or prohibits spaces between arguments (set to "no" by default);

  • match:=range matches the text range starting from the first found term and ending with the last found term;

  • whole:=yes matches only the constituents contained in the query.
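As a rough illustration of the length and level parameters, the Python sketch below applies them to a toy constituent tree. This is not the engine's implementation: the Node class, the helper functions, and the bracketing of the sample phrase are all assumptions made for the example, and level is approximated by picking the shortest or longest of the nested matches.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                 # constituent label, e.g. "np" or "pp"
    tokens: list               # tokens covered by this constituent
    children: list = field(default_factory=list)


def find(node, label, term):
    """Collect constituents with the given label whose tokens include term."""
    hits = [node] if node.label == label and term in node.tokens else []
    for child in node.children:
        hits.extend(find(child, label, term))
    return hits


def filter_length(nodes, min_length=None, max_length=None, length=None):
    """Keep constituents whose token count satisfies every given bound."""
    kept = []
    for n in nodes:
        size = len(n.tokens)
        if length is not None and size != length:
            continue
        if min_length is not None and size < min_length:
            continue
        if max_length is not None and size > max_length:
            continue
        kept.append(n)
    return kept


def pick_level(nodes, level):
    """Assumed reading: 'min' is the innermost match, 'max' the outermost."""
    chooser = min if level == "min" else max
    return chooser(nodes, key=lambda n: len(n.tokens))


# Assumed bracketing of "to the red smartphone from the new Chinese line".
inner_np = Node("np", "the new Chinese line".split())
inner_pp = Node("pp", "from the new Chinese line".split(), [inner_np])
outer_np = Node("np", "the red smartphone from the new Chinese line".split(), [inner_pp])
outer_pp = Node("pp", "to the red smartphone from the new Chinese line".split(), [outer_np])

pp_hits = find(outer_pp, "pp", "line")
print(" ".join(pick_level(pp_hits, "min").tokens))  # from the new Chinese line
print(" ".join(pick_level(pp_hits, "max").tokens))  # to the red smartphone from the new Chinese line

# A 2-token and a 4-token noun phrase; min_length=3 keeps only the longer one.
short_np = Node("np", ["The", "company"])
long_np = Node("np", ["the", "new", "Chinese", "company"])
print([" ".join(n.tokens) for n in filter_length([short_np, long_np], min_length=3)])
# ['the new Chinese company']
```

Token counts here come from whitespace splitting; the real tokenizer may count punctuation and clitics differently.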

Returned Value

Documents matching the query.

Examples

constituent(expect, smartphone) returns "The expected smartphones are produced by the new Chinese company";

constituent(np, company) returns company in "The expected smartphones are produced by the new Chinese company" and "The company expects the new smartphones to be delivered by Sunday" as it is a part of a noun phrase there;

constituent(pp, line, match:=range), which is equivalent to constituent(pp, line, match:=range, level:=max), returns "She is really looking forward to the red smartphone from the new Chinese line";

constituent(pp, line, match:=range, level:=min) returns the lowest-level prepositional phrase containing "line": "She is really looking forward to the red smartphone from the new Chinese line";

constituent(np, company, match:=range) returns "The company expects …​" and "…​ produced by the new Chinese company". With the named argument match:=range the whole constituent is returned;

constituent(np, company, min_length:=3, match:=range) returns only "… produced by the new Chinese company", because that constituent is 4 tokens long; it does not return "The company expects …", because that constituent is only 2 tokens long.

constituent(provide, value) returns "The report provides value" and "The report provides value in millions of US dollars".

constituent(provide, value, whole:=yes) returns "The report provides value".