char

Purpose

Matches tokens that belong to the character category given by its first parameter, category.

Syntax

char(category, term1, term2, …)

Arguments

The list of values for the category parameter below is exhaustive.

Category     Synonym       Description
alpha        a             tokens that consist of alphabetic characters only
alnum        an, alphanum  tokens that consist of alphabetic and numeric characters
numeral      n, num        tokens that consist of digits and symbols that can be used within numbers (such as commas, dots, slashes, etc.)
digit        d             tokens that consist of digits only
special      sp            non-alphabetic and non-numeric symbols (#, @, &, %…)
word         w             alpha|alnum|numeral|special
punct        p             any punctuation sign
bracket      br            left or right parenthesis
colon        col, ":"      colon symbols
comma        ","           comma symbols
dot          "."           dot symbols
exclamation  excl, "!"     exclamation symbols
hyphen       hp, "-"       a dash/a hyphen
lbracket     lb, "("       left parenthesis
rbracket     rb, ")"       right parenthesis
question     qm, ?         question mark symbols
semicolon    sc, ";"       semicolon symbol
slash        sl, "/"       slash symbol
quote        qt            any quote symbol
lquote       lqt           any left quote symbol
rquote       rqt           any right quote symbol
squote       sq, "'"       single quote symbol
lsquote      lsq, ‘        left single quote symbol
rsquote      rsq, ’        right single quote symbol
dquote       dq, "\""      double quote symbol
ldquote      ldq, “        left double quote symbol
rdquote      rdq, ”        right double quote symbol
plus         pl, "+"       plus symbol
plusminus    pm, ±         plus-minus symbol
equal        eq, "="       equals symbol
less         ls, "<"       less-than symbol
greater      gr, ">"       greater-than symbol
tilde        td, ~         tilde symbol
vline        vl, "|"       vertical line symbol
arabic                     tokens that consist of Arabic alphabet symbols
chinese                    tokens that consist of Chinese alphabet symbols
cyrillic                   tokens that consist of Cyrillic alphabet symbols
greek                      tokens that consist of Greek alphabet symbols
hiragana                   tokens that consist of hiragana alphabet symbols
katakana                   tokens that consist of katakana alphabet symbols
korean                     tokens that consist of Korean alphabet symbols
latin                      tokens that consist of Latin alphabet symbols
mixed                      tokens that consist of mixed alphabet symbols
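To make the first few categories concrete, here is a minimal Python sketch of one simplified reading of the alpha, alnum, digit, numeral and special descriptions. It is not the engine's implementation: the function name basic_categories and the NUMBER_SYMBOLS set are hypothetical, and the real tokenizer may decide differently for hyphenated tokens such as "AH-26" or ordinals such as "8th", which this sketch does not try to reproduce.

```python
# Assumption: only the symbols named in the "numeral" description count here.
NUMBER_SYMBOLS = set(",./")


def basic_categories(token: str) -> set:
    """Return the basic categories a token falls into under a simplified reading."""
    cats = set()
    if not token:
        return cats
    has_alpha = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    # alpha: alphabetic characters only
    if all(ch.isalpha() for ch in token):
        cats.add("alpha")
    # alnum: a mix of alphabetic and numeric characters
    if has_alpha and has_digit and all(ch.isalnum() for ch in token):
        cats.add("alnum")
    # digit: digits only
    if all(ch.isdigit() for ch in token):
        cats.add("digit")
    # numeral: digits plus symbols that can appear inside numbers
    if has_digit and all(ch.isdigit() or ch in NUMBER_SYMBOLS for ch in token):
        cats.add("numeral")
    # special: no alphabetic or numeric characters at all
    if not any(ch.isalnum() for ch in token):
        cats.add("special")
    return cats
```

Under these assumptions a token can fall into more than one category: "100" is both digit and numeral, while "1,000" is numeral only.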

Note

  • Several punctuation marks inside one quoted string are treated as a sequence: char("?!") matches "?!".

  • A category and alphabet parameters (such as latin or cyrillic) can be combined in one search by joining them with an underscore ("_"). Only one category is allowed; the number of alphabet parameters is not limited.
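The underscore rule can be illustrated with a small Python sketch that splits a specifier into its category and alphabet parts. This is an illustration only, not the engine's parser: parse_specifier and the ALPHABETS set are hypothetical names, "|" is treated as separating alternatives (as in char(alpha_cyrillic|numeral)), and quoted synonyms such as "|" are not handled.

```python
# Assumption: these names from the category list are the alphabet parameters.
ALPHABETS = {
    "arabic", "chinese", "cyrillic", "greek",
    "hiragana", "katakana", "korean", "latin",
}


def parse_specifier(spec: str) -> list:
    """Split a specifier into (category or None, [alphabets]) per alternative."""
    alternatives = []
    for alt in spec.split("|"):
        parts = alt.split("_")
        alphabets = [p for p in parts if p in ALPHABETS]
        categories = [p for p in parts if p not in ALPHABETS]
        # Only one category may be combined with alphabet parameters.
        if len(categories) > 1:
            raise ValueError(f"more than one category in {alt!r}: {categories}")
        category = categories[0] if categories else None
        alternatives.append((category, alphabets))
    return alternatives
```

For example, parse_specifier("alnum_latin_cyrillic") yields one alternative with category "alnum" and alphabets latin and cyrillic, while "alpha_digit" is rejected because it names two categories.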

Returned Value

Documents matching the query.

Examples

char(alnum) = char(an) matches "A7", "LAF006C", "AH-26".

char(comma) = char(",") matches commas.

char(digit) matches "100" and "11".

char(num) = char(n) matches "713.446.9307"; "1,000"; "100"; "8th"; "11".

char(">=") matches ">="; ">≈".

phrase(0, char("\""), stem(noun), char("\"")) matches quoted nouns.

char(alpha, term(mylist)) matches words from wordclass "mylist" that contain alphabetic characters only.

char(mixed) matches «GSK-3β-dependent», «αB-crystallin», «радио-FM», etc.

char(latin_greek) matches «interferon-β», «GSK-3β-dependent», «αB-crystallin», etc.

char(alnum_greek) matches «Δ6», «β2», etc.

char(alpha_cyrillic) matches tokens that consist only of Cyrillic alphabetic characters.

char(alnum_latin_cyrillic) matches tokens that consist of numeric characters and Latin and Cyrillic alphabetic symbols, for example, "wp7-устройство", "к750i".

char(alpha_cyrillic|numeral) matches tokens consisting of Cyrillic symbols only, or numeral tokens.
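For readers who want to see why a token such as «αB-crystallin» counts as mixed, the following Python sketch detects which alphabets occur in a token using Unicode character names. It is an approximation under stated assumptions, not how the engine works: alphabets_in, is_mixed and the PREFIX_TO_ALPHABET mapping are hypothetical, and the first word of the Unicode character name is used as a stand-in for the script.

```python
import unicodedata

# Assumption: the first word of a character's Unicode name identifies its alphabet.
PREFIX_TO_ALPHABET = {
    "LATIN": "latin",
    "GREEK": "greek",
    "CYRILLIC": "cyrillic",
    "ARABIC": "arabic",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "korean",
    "CJK": "chinese",
}


def alphabets_in(token: str) -> set:
    """Return the set of alphabets used by the alphabetic characters of a token."""
    found = set()
    for ch in token:
        if not ch.isalpha():
            continue  # digits, hyphens and other symbols carry no alphabet
        prefix = unicodedata.name(ch, "").split(" ")[0]
        alphabet = PREFIX_TO_ALPHABET.get(prefix)
        if alphabet is not None:
            found.add(alphabet)
    return found


def is_mixed(token: str) -> bool:
    """A token is treated as mixed when it uses more than one alphabet."""
    return len(alphabets_in(token)) > 1
```

With this reading, "радио-FM" uses the Cyrillic and Latin alphabets and is therefore mixed, whereas "Δ6" uses only Greek.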