qparser module
Parser object
- class whoosh.qparser.QueryParser(fieldname, schema, plugins=None, termclass=<class 'whoosh.query.terms.Term'>, phraseclass=<class 'whoosh.query.positional.Phrase'>, group=<class 'whoosh.qparser.syntax.AndGroup'>)
  A hand-written query parser built on modular plug-ins. The default configuration implements a powerful fielded query language similar to Lucene’s.
  You can use the plugins argument when creating the object to override the default list of plug-ins, and/or use add_plugin() and/or remove_plugin_class() to change the plug-ins included in the parser.
  >>> from whoosh import qparser
  >>> parser = qparser.QueryParser("content", schema)
  >>> parser.remove_plugin_class(qparser.WildcardPlugin)
  >>> parser.add_plugin(qparser.PrefixPlugin())
  >>> parser.parse(u"hello there")
  And([Term("content", u"hello"), Term("content", u"there")])
  Parameters:
  - fieldname – the default field. The parser uses this as the field for any terms without an explicit field.
  - schema – a whoosh.fields.Schema object to use when parsing. The appropriate fields in the schema will be used to tokenize terms/phrases before they are turned into query objects. You can specify None for the schema to create a parser that does not analyze the text of the query, usually for testing purposes.
  - plugins – a list of plugins to use. This overrides the default list of plugins. WhitespacePlugin is automatically included; do not put it in this list. Classes in the list will be automatically instantiated.
  - termclass – the query class to use for individual search terms. The default is whoosh.query.Term.
  - phraseclass – the query class to use for phrases. The default is whoosh.query.Phrase.
  - group – the default grouping. AndGroup makes terms required by default. OrGroup makes terms optional by default.
- add_plugin(pin)
  Adds the given plugin to the list of plugins in this parser.
- add_plugins(pins)
  Adds the given list of plugins to the list of plugins in this parser.
- default_set()
  Returns the default list of plugins to use.
- filterize(nodes, debug=False)
  Takes a group of nodes and runs the filters provided by the parser’s plugins.
- filters()
  Returns a prioritized list of filter functions provided by the parser’s currently configured plugins.
- multitoken_query(spec, texts, fieldname, termclass, boost)
  Returns a query for multiple texts. This method implements the intention specified in the field’s multitoken_query attribute, which specifies what to do when strings that look like single terms to the parser turn out to yield multiple tokens when analyzed.
  Parameters:
  - spec – a string describing how to join the text strings into a query. This is usually the value of the field’s multitoken_query attribute.
  - texts – a list of token strings.
  - fieldname – the name of the field.
  - termclass – the query class to use for single terms.
  - boost – the original term’s boost in the query string, which should be applied to the returned query object.
- parse(text, normalize=True, debug=False)
  Parses the input string and returns a whoosh.query.Query object/tree.
  Parameters:
  - text – the unicode string to parse.
  - normalize – whether to call normalize() on the query object/tree before returning it. This should be left on unless you’re trying to debug the parser output.
  Return type: whoosh.query.Query
- process(text, pos=0, debug=False)
  Returns a group of syntax nodes corresponding to the given text, tagged by the plugin Taggers and filtered by the plugin filters.
  Parameters:
  - text – the text to tag.
  - pos – the position in the text to start tagging at.
- remove_plugin(pi)
  Removes the given plugin object from the list of plugins in this parser.
- remove_plugin_class(cls)
  Removes any plugins of the given class from this parser.
- replace_plugin(plugin)
  Removes any plugins of the class of the given plugin and then adds it. This is a convenience method to keep from having to call remove_plugin_class followed by add_plugin each time you want to reconfigure a default plugin.
  >>> qp = qparser.QueryParser("content", schema)
  >>> qp.replace_plugin(qparser.NotPlugin("(^| )-"))
- tag(text, pos=0, debug=False)
  Returns a group of syntax nodes corresponding to the given text, created by matching the Taggers provided by the parser’s plugins.
  Parameters:
  - text – the text to tag.
  - pos – the position in the text to start tagging at.
- taggers()
  Returns a prioritized list of tagger objects provided by the parser’s currently configured plugins.
- term_query(fieldname, text, termclass, boost=1.0, tokenize=True, removestops=True)
  Returns the appropriate query object for a single term in the query string.
Pre-made configurations
The following functions return pre-configured QueryParser objects.
- whoosh.qparser.MultifieldParser(fieldnames, schema, fieldboosts=None, **kwargs)
  Returns a QueryParser configured to search in multiple fields.
  Instead of assigning unfielded clauses to a default field, this parser transforms them into an OR clause that searches a list of fields. For example, if the list of multi-fields is “f1”, “f2” and the query string is “hello there”, the parser will read it as “(f1:hello OR f2:hello) (f1:there OR f2:there)”. This is very useful when you have two textual fields (e.g. “title” and “content”) you want to search by default.
  Parameters:
  - fieldnames – a list of field names to search.
  - fieldboosts – an optional dictionary mapping field names to boosts.
- whoosh.qparser.SimpleParser(fieldname, schema, **kwargs)
  Returns a QueryParser configured to support only +, -, and phrase syntax.
- whoosh.qparser.DisMaxParser(fieldboosts, schema, tiebreak=0.0, **kwargs)
  Returns a QueryParser configured to support only +, -, and phrase syntax, and which converts individual terms into DisjunctionMax queries across a set of fields.
  Parameters:
  - fieldboosts – a dictionary mapping field names to boosts.
Plug-ins
- class whoosh.qparser.Plugin
  Base class for parser plugins.
- filters(parser)
  Should return a list of (filter_function, priority) tuples to add to parser. Lower priority numbers run first.
  Filter functions will be called with (parser, groupnode) and should return a group node.
- taggers(parser)
  Should return a list of (Tagger, priority) tuples to add to the syntax the parser understands. Lower priorities run first.
- class whoosh.qparser.SingleQuotePlugin(expr=None)
  Adds the ability to specify single “terms” containing spaces by enclosing them in single quotes.
- class whoosh.qparser.PrefixPlugin(expr=None)
  Adds the ability to specify prefix queries by ending a term with an asterisk.
  This plugin is useful if you want the user to be able to create prefix but not wildcard queries (for performance reasons). If you are including the wildcard plugin, you should not include this plugin as well.
  >>> qp = qparser.QueryParser("content", myschema)
  >>> qp.remove_plugin_class(qparser.WildcardPlugin)
  >>> qp.add_plugin(qparser.PrefixPlugin())
  >>> q = qp.parse("pre*")
- class whoosh.qparser.WildcardPlugin(expr=None)
- class whoosh.qparser.RegexPlugin(expr=None)
  Adds the ability to specify regular expression term queries.
  The default syntax for a regular expression term is r"termexpr".
  >>> qp = qparser.QueryParser("content", myschema)
  >>> qp.add_plugin(qparser.RegexPlugin())
  >>> q = qp.parse('foo title:r"bar+"')
- class whoosh.qparser.BoostPlugin(expr=None)
  Adds the ability to boost clauses of the query using the circumflex.
  >>> qp = qparser.QueryParser("content", myschema)
  >>> q = qp.parse("hello there^2")
- class whoosh.qparser.GroupPlugin(openexpr='[(]', closeexpr='[)]')
  Adds the ability to group clauses using parentheses.
- class whoosh.qparser.EveryPlugin(expr=None)
- class whoosh.qparser.FieldsPlugin(expr='(?P<text>\\w+|[*]):', remove_unknown=True)
  Adds the ability to specify the field of a clause.
  Parameters:
  - expr – the regular expression to use for tagging fields.
  - remove_unknown – if True, converts field specifications for fields that aren’t in the schema into regular text.
- class whoosh.qparser.PhrasePlugin(expr='"(?P<text>.*?)"(~(?P<slop>[1-9][0-9]*))?')
  Adds the ability to specify phrase queries inside double quotes.
- class whoosh.qparser.RangePlugin(expr=None, excl_start='{', excl_end='}')
  Adds the ability to specify term ranges.
- class whoosh.qparser.OperatorsPlugin(ops=None, clean=False, And='(?<=\\s)AND(?=\\s)', Or='(?<=\\s)OR(?=\\s)', AndNot='(?<=\\s)ANDNOT(?=\\s)', AndMaybe='(?<=\\s)ANDMAYBE(?=\\s)', Not='(^|(?<=(\\s|[()])))NOT(?=\\s)', Require='(^|(?<=\\s))REQUIRE(?=\\s)')
  By default, adds the AND, OR, ANDNOT, ANDMAYBE, and NOT operators to the parser syntax. This plugin scans the token stream for subclasses of Operator and calls their Operator.make_group() methods to allow them to manipulate the stream.
  There are two levels of configuration available.
  The first level is to change the regular expressions of the default operators, using the And, Or, AndNot, AndMaybe, and/or Not keyword arguments. The keyword value can be a pattern string or a compiled expression, or None to remove the operator:
  qp = qparser.QueryParser("content", schema)
  cp = qparser.OperatorsPlugin(And="&", Or=r"\|", AndNot="&!", AndMaybe="&~", Not=None)
  qp.replace_plugin(cp)
  You can also specify a list of (OpTagger, priority) pairs as the first argument to the initializer to use custom operators. See Creating custom operators for more information on this.
- class whoosh.qparser.PlusMinusPlugin(plusexpr='\\+', minusexpr='-')
  Adds the ability to use + and - in a flat OR query to specify required and prohibited terms.
  This is the basis for the parser configuration returned by SimpleParser().
- class whoosh.qparser.GtLtPlugin(expr=None)
  Allows the user to use greater than/less than symbols to create range queries:
  a:>100 b:<=z c:>=-1.4 d:<mz
  This is the equivalent of:
  a:{100 to] b:[to z] c:[-1.4 to] d:[to mz}
  The plugin recognizes >, <, >=, <=, =>, and =< after a field specifier. The field specifier is required. You cannot do the following:
  >100
  This plugin requires the FieldsPlugin and RangePlugin to work.
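The correspondence between the comparison symbols and the bracket range syntax can be written out as a small lookup. The function gtlt_to_range below is a hypothetical helper written for this illustration; it is not part of Whoosh and is not how the plugin is implemented. It assumes that => and =< are alternate spellings of >= and <=.

```python
def gtlt_to_range(op, value):
    """Return the bracket range syntax equivalent to a comparison.

    Illustrative helper only; it is not part of Whoosh.
    """
    # Assumption: "=>" and "=<" mean the same as ">=" and "<=".
    op = {"=>": ">=", "=<": "<="}.get(op, op)
    if op == ">":
        return "{%s to]" % value  # exclusive start, open end
    if op == ">=":
        return "[%s to]" % value  # inclusive start, open end
    if op == "<":
        return "[to %s}" % value  # open start, exclusive end
    if op == "<=":
        return "[to %s]" % value  # open start, inclusive end
    raise ValueError("unsupported operator: %r" % op)


for op, value in [(">", "100"), ("<=", "z"), (">=", "-1.4"), ("<", "mz")]:
    print(op + value, "is equivalent to", gtlt_to_range(op, value))
```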
- class whoosh.qparser.MultifieldPlugin(fieldnames, fieldboosts=None, group=<class 'whoosh.qparser.syntax.OrGroup'>)
  Converts any unfielded terms into OR clauses that search for the term in a specified list of fields.
  >>> qp = qparser.QueryParser(None, myschema)
  >>> qp.add_plugin(qparser.MultifieldPlugin(["a", "b"]))
  >>> qp.parse("alfa c:bravo")
  And([Or([Term("a", "alfa"), Term("b", "alfa")]), Term("c", "bravo")])
  This plugin is the basis for the MultifieldParser.
  Parameters:
  - fieldnames – a list of fields to search.
  - fieldboosts – an optional dictionary mapping field names to a boost to use for that field.
  - group – the group to use to relate the fielded terms to each other.
- class whoosh.qparser.FieldAliasPlugin(fieldmap)
  Adds the ability to use “aliases” of fields in the query string.
  This plugin is useful for allowing users of languages that can’t be represented in ASCII to use field names in their own language, and translate them into the “real” field names, which must be valid Python identifiers.
  >>> # Allow users to use 'body' or 'text' to refer to the 'content' field
  >>> parser.add_plugin(FieldAliasPlugin({"content": ["body", "text"]}))
  >>> parser.parse("text:hello")
  Term("content", "hello")
- class whoosh.qparser.CopyFieldPlugin(map, group=<class 'whoosh.qparser.syntax.OrGroup'>, mirror=False)
  Looks for basic syntax nodes (terms, prefixes, wildcards, phrases, etc.) occurring in a certain field and replaces each one with a group (by default OR) containing the original token and the token copied to a new field.
  For example, the query:
  hello name:matt
  could be automatically converted by CopyFieldPlugin({"name": "author"}) to:
  hello (name:matt OR author:matt)
  This is useful where one field was indexed with a differently-analyzed copy of another, and you want the query to search both fields.
  You can specify a different group type with the group keyword. You can also specify group=None, in which case the copied node is inserted “inline” next to the original, instead of in a new group:
  hello name:matt author:matt
  Parameters:
  - map – a dictionary mapping names of fields to copy to the names of the destination fields.
  - group – the type of group to create in place of the original token. You can specify group=None to put the copied node “inline” next to the original node instead of in a new group.
  - mirror – if True, the plugin copies both ways, so if the user specifies a query in the ‘toname’ field, it will be copied to the ‘fromname’ field.
Syntax node objects
Base nodes
- class whoosh.qparser.SyntaxNode
  Base class for nodes that make up the abstract syntax tree (AST) of a parsed user query string. The AST is an intermediate step, generated from the query string, then converted into a whoosh.query.Query tree by calling the query() method on the nodes.
  Instances have the following required attributes:
  - has_fieldname – True if this node has a fieldname attribute.
  - has_text – True if this node has a text attribute.
  - has_boost – True if this node has a boost attribute.
  - startchar – The character position in the original text at which this node started.
  - endchar – The character position in the original text at which this node ended.
- is_ws()
  Returns True if this node is ignorable whitespace.
- query(parser)
  Returns a whoosh.query.Query instance corresponding to this syntax tree node.
- r()
  Returns a basic representation of this node. The base class’s __repr__ method calls this, then does the extra busy work of adding fieldname and boost where appropriate.
- set_boost(boost)
  Sets the boost associated with this node.
  For nodes that don’t have a boost, this is a no-op.
- set_fieldname(name, override=False)
  Sets the fieldname associated with this node. If override is False (the default), the fieldname will only be replaced if this node does not already have a fieldname set.
  For nodes that don’t have a fieldname, this is a no-op.
- set_range(startchar, endchar)
  Sets the character range associated with this node.
Nodes
- class whoosh.qparser.FieldnameNode(fieldname, original)
  Abstract syntax tree node for field name assignments.
- class whoosh.qparser.TextNode(text)
  Intermediate base class for basic nodes that search for text, such as term queries, wildcards, prefixes, etc.
  Instances have the following attributes:
  - qclass – If a subclass does not override query(), the base class will use this class to construct the query.
  - tokenize – If True and the subclass does not override query(), the node’s text will be tokenized before constructing the query.
  - removestops – If True and the subclass does not override query(), and the field’s analyzer has a stop word filter, stop words will be removed from the text before constructing the query.
- class whoosh.qparser.WordNode(text)
  Syntax node for term queries.
- class whoosh.qparser.RangeNode(start, end, startexcl, endexcl)
  Syntax node for range queries.
- class whoosh.qparser.MarkerNode
  Base class for nodes that only exist to mark places in the tree.
Group nodes
- class whoosh.qparser.GroupNode(nodes=None, boost=1.0, **kwargs)
  Base class for abstract syntax tree node types that group together sub-nodes.
  Instances have the following attributes:
  - merging – True if side-by-side instances of this group can be merged into a single group.
  - qclass – If a subclass doesn’t override query(), the base class will simply wrap this class around the queries returned by the subnodes.
  This class implements a number of list methods for operating on the subnodes.
- class whoosh.qparser.BinaryGroup(nodes=None, boost=1.0, **kwargs)
  Intermediate base class for group nodes that have two subnodes and whose qclass initializer takes two arguments instead of a list.
- class whoosh.qparser.ErrorNode(message, node=None)
- class whoosh.qparser.AndGroup(nodes=None, boost=1.0, **kwargs)
- class whoosh.qparser.OrGroup(nodes=None, boost=1.0, **kwargs)
- class whoosh.qparser.AndNotGroup(nodes=None, boost=1.0, **kwargs)
- class whoosh.qparser.AndMaybeGroup(nodes=None, boost=1.0, **kwargs)
- class whoosh.qparser.DisMaxGroup(nodes=None, boost=1.0, **kwargs)
- class whoosh.qparser.RequireGroup(nodes=None, boost=1.0, **kwargs)
- class whoosh.qparser.NotGroup(nodes=None, boost=1.0, **kwargs)
Operators
- class whoosh.qparser.Operator(text, grouptype, leftassoc=True)
  Base class for PrefixOperator, PostfixOperator, and InfixOperator.
  Operators work by moving the nodes they apply to (e.g. for a prefix operator, the following node; for an infix operator, the nodes on either side; etc.) into a group node. The group provides the code for what to do with the nodes.
  Parameters:
  - text – the text of the operator in the query string.
  - grouptype – the type of group to create in place of the operator and the node(s) it operates on.
  - leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.
- class whoosh.qparser.PrefixOperator(text, grouptype, leftassoc=True)
  Parameters:
  - text – the text of the operator in the query string.
  - grouptype – the type of group to create in place of the operator and the node(s) it operates on.
  - leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.
- class whoosh.qparser.PostfixOperator(text, grouptype, leftassoc=True)
  Parameters:
  - text – the text of the operator in the query string.
  - grouptype – the type of group to create in place of the operator and the node(s) it operates on.
  - leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.
- class whoosh.qparser.InfixOperator(text, grouptype, leftassoc=True)
  Parameters:
  - text – the text of the operator in the query string.
  - grouptype – the type of group to create in place of the operator and the node(s) it operates on.
  - leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.