PDL language
The pattern definition language (PDL) is PolyAnalyst’s proprietary query language for retrieving information from unstructured text. PDL is designed for analyzing textual data that has no formal structure (such as news articles, blog posts, customer feedback, research papers, reports, social media, etc.). PDL queries can retrieve many kinds of information, such as names of companies, vehicle models, contact information, product defects, names of drugs and chemicals, research article topics, share rates, market dynamics information, customer problems, etc.
For instance, a query that searches for words in title case followed by the word "Ltd", "Co" or "Inc" retrieves company names ("Samsung Electronics Co.", "Apple Inc.", "Argus Solutions, Ltd", etc.). A query that matches mentions of ministries would search for the word "Ministry" in title case, followed by the preposition "of" and a sequence of words in title case ("Ministry of Trade", "Ministry of Foreign Economic Relations", "Ministry of Internal Affairs", etc.).
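The two patterns above can be approximated outside PDL to make the idea concrete. The following is a minimal sketch in Python using regular expressions, not PDL itself; the function names `find_companies` and `find_ministries` are hypothetical, and "title case" is simplified here to one uppercase ASCII letter followed by lowercase letters.

```python
import re

# Simplifying assumption: a title-case word is one uppercase ASCII letter
# followed by one or more lowercase letters.
TITLE_WORD = r"[A-Z][a-z]+"

# One or more title-case words, an optional comma, then "Ltd", "Co" or "Inc"
# with an optional trailing period.
COMPANY = re.compile(
    rf"{TITLE_WORD}(?:\s+{TITLE_WORD})*,?\s+(?:Ltd|Co|Inc)\b\.?"
)

# "Ministry", then "of", then one or more title-case words.
MINISTRY = re.compile(rf"Ministry\s+of(?:\s+{TITLE_WORD})+")


def find_companies(text):
    """Return all substrings that look like company names (hypothetical helper)."""
    return COMPANY.findall(text)


def find_ministries(text):
    """Return all substrings that look like ministry names (hypothetical helper)."""
    return MINISTRY.findall(text)
```

A real PDL query would rely on the engine's own notion of title case and word boundaries, so this sketch only mirrors the shape of the pattern.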
PDL includes a wide range of features, from simple word matching to ontology-based searching, and makes it possible to capture the different ways in which information of interest may be expressed.
For instance, PDL lets the user search for words from a particular dictionary, word synonyms (such as "sales grow" and "sales increase"), or parts of speech (such as nouns or verbs), and specify morphological features of words (such as the word "park" as a verb but not as a noun).
PDL supports advanced proximity search to match arguments in text within a specified distance. Proximity queries can specify the required distance, search within one or several sentences, set constraints on the word order and surrounding context, and indicate terms that must or must not occur.
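To illustrate what a distance and word-order constraint means in practice, here is a minimal Python sketch of a proximity check. It is not PDL and does not reproduce PDL's behavior; the function `within_distance`, its parameters, and the definition of distance as "number of tokens between the two terms" are all assumptions made for this example.

```python
import re


def within_distance(text, first, second, max_gap, ordered=False):
    """Return (i, j) token-index pairs where `first` (at i) and `second` (at j)
    are separated by at most `max_gap` intervening tokens.

    If `ordered` is True, `first` must occur before `second`.
    Hypothetical helper; matching is case-insensitive on whole tokens.
    """
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [j for j, t in enumerate(tokens) if t == second.lower()]

    hits = []
    for i in first_pos:
        for j in second_pos:
            if i == j:
                continue  # the same token cannot serve as both terms
            if ordered and j < i:
                continue  # word-order constraint violated
            if abs(j - i) - 1 <= max_gap:
                hits.append((i, j))
    return hits
```

For the text "Sales of the new model increase every quarter", the terms "sales" and "increase" have four tokens between them, so they match with `max_gap=4` but not with `max_gap=3`.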
PDL’s syntax-based features make it possible to query syntax trees and search for concepts connected by a syntactic relationship.
PDL also provides ontology-based functions to search for semantically similar or related terms from associated ontologies.
In order to retrieve complex information, search queries can be combined or nested inside each other.
PDL Syntax
A PDL query is a sequence of PDL functions, operators and strings.
Functions
A function name is followed by parentheses containing a comma-separated list of arguments. If a function takes no arguments, the parentheses are left empty. Function names are case-insensitive, but for readability it is recommended to use one style consistently, such as all lowercase or all uppercase.
Syntax
Example
Function parameters
Many PDL functions support optional parameters to change the function’s default behavior. Parameters are usually passed as the function’s first argument.
Example
Many PDL functions also support named parameters.
Syntax
Example
For the full list of named parameters supported by each function, please see the chapter "PDL functions reference".
The order of the named parameters is not significant, but for better readability, it is recommended to put them after function arguments.
Example
Operators
PDL operators are either unary (applied to a single expression) or binary (connecting two expressions). Like function names, operator names are case-insensitive.
Syntax
Example
The following table lists all types of operators in PDL.
Operator | Name | Type
not | not | unary
and | and | binary
or | or | binary
xor | xor | binary
& | set intersection | binary
/ | set difference | binary
For more information about operators, please see the chapter "Operators".
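As a reminder of what set intersection and set difference mean in general, the following Python sketch applies both to two small sets of matched strings. This is only an analogy: it assumes that each query yields a set of matches, which this section does not state, and it uses Python's own `&` and `-` operators rather than PDL syntax.

```python
# Hypothetical match sets, as if produced by two separate queries.
matches_a = {"Apple Inc.", "Samsung Electronics Co.", "Argus Solutions, Ltd"}
matches_b = {"Apple Inc.", "Ministry of Trade"}

# Set intersection: items present in both sets.
intersection = matches_a & matches_b

# Set difference: items in the first set that are absent from the second.
difference = matches_a - matches_b
```

Here `intersection` contains only "Apple Inc.", while `difference` contains the two company names that appear only in `matches_a`.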
Functions and operators can be nested, so that the result of one function (or operator) becomes an argument of the parent function (or operator). A PDL query is therefore usually a combination of functions, operators and strings.
Example
For more information about complex PDL queries, please see the chapter "Walkthrough Example".
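The nesting principle described above, where the result of an inner expression is passed to an outer one, can be sketched with small composable matchers in Python. None of the names below (`word`, `any_of`, `all_of`, `negate`, `run`) are PDL functions; they are hypothetical helpers that only mimic how "and", "or" and "not" style expressions might be combined.

```python
def word(w):
    """Matcher that succeeds when token `w` occurs in the token list."""
    return lambda tokens: w in tokens


def any_of(*matchers):
    """Succeeds when at least one nested matcher succeeds (an "or")."""
    return lambda tokens: any(m(tokens) for m in matchers)


def all_of(*matchers):
    """Succeeds when every nested matcher succeeds (an "and")."""
    return lambda tokens: all(m(tokens) for m in matchers)


def negate(matcher):
    """Succeeds when the nested matcher fails (a "not")."""
    return lambda tokens: not matcher(tokens)


# A nested query: "sales" together with "grow" or "increase", but not "forecast".
query = all_of(
    word("sales"),
    any_of(word("grow"), word("increase")),
    negate(word("forecast")),
)


def run(q, text):
    """Apply a matcher to lowercased, whitespace-separated tokens."""
    return q(text.lower().split())
```

With this sketch, `run(query, "Sales increase in Europe")` succeeds, while a text that also contains "forecast" does not.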