Search by Constituents

To find a lexeme that is part of a syntactic constituent, use the constituent() function.

Syntax

constituent([constituent_type,] term1, term2, …)

The optional first parameter, constituent_type, specifies the constituent's label and accepts any label used in the constituency tree. The most common labels are listed below.

constituent_type   Description            Example
np                 Noun Phrase            a senior lawyer; the most often used accounts; the Harvard Graduate School
vp                 Verb Phrase            contact your representative; will participate via conference call
adjp               Adjective Phrase       clinically significant; most innovative and secure
advp               Adverb Phrase          as easy as possible; even more
pp                 Prepositional Phrase   for your help
sbar               Dependent Clause       I'll let you know [when we have some new information]
s                  Clause                 I hope you are doing well; The meeting will probably take place on Thursday

For the full list of supported constituency labels, see Constituency Labels.

If constituent_type is omitted, the function returns the term as part of any constituent.

If a constituent label is specified but the term arguments are omitted, the function matches every constituent tagged with that label.

Example

"The newly released tablet was released on Monday"

constituent(np) returns all noun phrases: "The newly released tablet".

constituent(release) matches both occurrences of "released" in "The newly released tablet was released on Monday", because no constituent type is specified.

constituent(vp, release) matches only the second "released" in "The newly released tablet was released on Monday", because only that occurrence is part of a VP.
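The matching behaviour described above can be illustrated with a small Python sketch. This is a toy model, not the product's implementation: the parse tree is written by hand and simplified ("on Monday" is kept flat), the lemma table is a stand-in for real lemmatization, and the names TREE, LEMMAS, collect and toy_constituent are hypothetical. The sketch returns whole phrases when no term is given and token positions when a term is given.

```python
# Toy model only: a hand-written parse and a tiny lemma table, both assumed.
TREE = ["s",
        ["np", "The", ["adjp", "newly", "released"], "tablet"],
        ["vp", "was", ["vp", "released", ["pp", "on", "Monday"]]]]
LEMMAS = {"released": "release"}  # hypothetical lemmatizer output


def collect(node, words, spans):
    """Walk the tree, appending leaf words and (label, start, end) spans."""
    start = len(words)
    for child in node[1:]:
        if isinstance(child, list):
            collect(child, words, spans)
        else:
            words.append(child)
    spans.append((node[0], start, len(words)))


def toy_constituent(tree, ctype=None, term=None):
    """Return phrases (no term) or matching token positions (with a term)."""
    words, spans = [], []
    collect(tree, words, spans)
    # Keep spans with the requested label, or every span if no label is given.
    wanted = [(s, e) for label, s, e in spans if ctype in (None, label)]
    if term is None:
        return [" ".join(words[s:e]) for s, e in wanted]
    # Otherwise report positions of tokens whose lemma equals the term.
    return sorted({i for s, e in wanted for i in range(s, e)
                   if LEMMAS.get(words[i].lower(), words[i].lower()) == term})


print(toy_constituent(TREE, "np"))             # ['The newly released tablet']
print(toy_constituent(TREE, term="release"))   # [2, 5]  (both "released")
print(toy_constituent(TREE, "vp", "release"))  # [5]     (second "released" only)
```

Token positions are zero-based, so position 2 is the first "released" (inside the noun phrase) and position 5 is the second one (inside the verb phrase).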

The function also supports optional named arguments.

  • min_length:=N / max_length:=N / length:=N - sets the minimum, maximum, or exact length of the constituent, where N is the length in tokens.

  • level:=min/max - extracts only the lowest-level or only the highest-level constituents; the default is level:=max.

  • match:=range - extracts the whole constituent.

  • whole:=yes - extracts only the constituents contained in the query.

Example

"The company expects the new smartphones to be delivered by Sunday"

"The expected smartphones are produced by the new Chinese company"

"She is really looking forward to the red smartphone from the new Chinese line"

constituent(expect, smartphone) returns only "The expected smartphones" from the second sentence, because in the first sentence the arguments do not form a valid fragment.

constituent(np, company) returns "company" in the first and the second sentences, where it is part of a noun phrase.

constituent(pp, line, match:=range) is equivalent to constituent(pp, line, match:=range, level:=max) and returns the highest-level prepositional phrase containing "line" in the sentence "She is really looking forward to the red smartphone from the new Chinese line".

constituent(pp, line, match:=range, level:=min) returns the lowest-level prepositional phrase containing "line" in the same sentence.

constituent(np, company, match:=range) returns "The company" from the first sentence and "the new Chinese company" from the second. With the named argument match:=range, the whole constituent is returned.

constituent(np, company, min_length:=3, match:=range) returns only "the new Chinese company", since this constituent is 4 tokens long. It does not return "The company", which is only 2 tokens long.
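The effect of level and min_length together with match:=range can be sketched in Python as follows. This is a toy model under stated assumptions, not the product's implementation: the bracketed parses are written by hand, terms are compared as lowercased words without lemmatization, level:=max is modelled as "keep the outermost matching constituent" and level:=min as "keep the innermost one", and the names parse, collect and toy_range are hypothetical.

```python
def parse(text):
    """Parse a bracketed string '(label child ...)' into nested lists."""
    toks = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        label = toks[pos + 1]      # toks[pos] is "("
        pos += 2
        children = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                children.append(node())
            else:
                children.append(toks[pos])
                pos += 1
        pos += 1                   # consume ")"
        return [label] + children

    return node()


def collect(node, words, spans):
    """Append leaf words and (label, start, end) spans for every constituent."""
    start = len(words)
    for child in node[1:]:
        if isinstance(child, list):
            collect(child, words, spans)
        else:
            words.append(child)
    spans.append((node[0], start, len(words)))


def toy_range(tree, ctype, term, level="max", min_length=1):
    """Return whole constituents of type ctype that contain term."""
    words, spans = [], []
    collect(tree, words, spans)
    hits = [(s, e) for label, s, e in spans
            if label == ctype and e - s >= min_length
            and term in (w.lower() for w in words[s:e])]

    def inside(a, b):              # span a lies strictly inside span b
        return b[0] <= a[0] and a[1] <= b[1] and a != b

    if level == "max":             # assumed: keep outermost matches
        hits = [h for h in hits if not any(inside(h, o) for o in hits)]
    else:                          # assumed: keep innermost matches
        hits = [h for h in hits if not any(inside(o, h) for o in hits)]
    return [" ".join(words[s:e]) for s, e in hits]


# Hand-written parses (assumptions) of the first, second and third sentences.
S1 = parse("(s (np The company) (vp expects (s (np the new smartphones) "
           "(vp to (vp be (vp delivered (pp by (np Sunday))))))))")
S2 = parse("(s (np The expected smartphones) "
           "(vp are (vp produced (pp by (np the new Chinese company)))))")
S3 = parse("(s (np She) (vp is (advp really) (vp looking (advp forward) "
           "(pp to (np (np the red smartphone) "
           "(pp from (np the new Chinese line)))))))")

print(toy_range(S1, "np", "company"))                # ['The company']
print(toy_range(S1, "np", "company", min_length=3))  # []
print(toy_range(S2, "np", "company", min_length=3))  # ['the new Chinese company']
print(toy_range(S3, "pp", "line", level="max"))
print(toy_range(S3, "pp", "line", level="min"))      # ['from the new Chinese line']
```

Under this hand-written parse, the level="max" call prints the outer phrase starting at "to", while level="min" prints only the inner phrase starting at "from". A real parser may attach the prepositional phrases differently, so the spans shown here depend on the assumed tree.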