spelling module
See correcting errors in user queries.
This module contains helper functions for correcting typos in user queries.
Corrector objects
class whoosh.spelling.Corrector
Base class for spelling correction objects. Concrete subclasses should implement the _suggestions method.
suggest(text, limit=5, maxdist=2, prefix=0)
Parameters:
- text – the text to check. This word will not be added to the suggestions, even if it appears in the word graph.
- limit – only return up to this many suggestions. If there are not enough terms in the field within maxdist of the given word, the returned list will be shorter than this number.
- maxdist – the largest edit distance from the given word to look at. Values higher than 2 are not very effective or efficient.
- prefix – require suggestions to share a prefix of this length with the given word. This is often justifiable since most misspellings do not involve the first letter of the word. Using a prefix dramatically decreases the time it takes to generate the list of words.
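To make the interaction of limit, maxdist, and prefix concrete, here is a minimal pure-Python sketch. It is not Whoosh's implementation, which searches a word graph rather than scanning a list. The names edit_distance and sketch_suggest and the sample vocabulary are assumptions made purely for illustration.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def sketch_suggest(words, text, limit=5, maxdist=2, prefix=0):
    """Hypothetical stand-in for Corrector.suggest over a plain word list."""
    scored = []
    for word in words:
        if word == text:
            continue  # the checked word itself is never suggested
        if word[:prefix] != text[:prefix]:
            continue  # must share the first `prefix` characters
        dist = edit_distance(text, word)
        if dist <= maxdist:
            scored.append((dist, word))
    scored.sort()  # closest first, ties broken alphabetically
    return [word for dist, word in scored[:limit]]


# Invented sample vocabulary
vocabulary = ["zebra", "specials", "special", "spacial", "specail"]

print(sketch_suggest(vocabulary, "specail"))
# ['special', 'spacial', 'specials']
print(sketch_suggest(vocabulary, "specail", prefix=3))
# ['special', 'specials']
```

In this sketch the prefix check runs before the distance computation, which is one plausible reason a non-zero prefix saves time: candidates that fail it are discarded without any edit-distance work.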
class whoosh.spelling.ReaderCorrector(reader, fieldname, fieldobj)
Suggests corrections based on the content of a field in a reader.
Ranks suggestions by edit distance, then by frequency from highest to lowest.
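The ranking rule described above (edit distance first, then frequency from highest to lowest) can be sketched as a two-part sort key. The function name rank_candidates and the sample tuples are invented for illustration. This is not ReaderCorrector's actual code.

```python
def rank_candidates(candidates):
    """Order (word, distance, frequency) tuples: closest first, and among
    equally close words, most frequent first."""
    return sorted(candidates, key=lambda c: (c[1], -c[2]))


# Invented sample data: (word, edit distance from "raeder", field frequency)
candidates = [
    ("render", 2, 40),
    ("reader", 1, 12),
    ("leader", 2, 95),
    ("raider", 1, 30),
]
ranked_words = [word for word, dist, freq in rank_candidates(candidates)]
print(ranked_words)
# ['raider', 'reader', 'leader', 'render']
```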
class whoosh.spelling.MultiCorrector(correctors, op)
Merges suggestions from a list of sub-correctors.
QueryCorrector objects
class whoosh.spelling.QueryCorrector(fieldname)
Base class for objects that correct words in a user query.
correct_query(q, qstring)
Returns a Correction object representing the corrected form of the given query.
Parameters:
- q – the original whoosh.query.Query tree to be corrected.
- qstring – the original user query. This may be None if the original query string is not available, in which case the Correction.string attribute will also be None.
Return type: Correction
class whoosh.spelling.SimpleQueryCorrector(correctors, terms, aliases=None, prefix=0, maxdist=2)
A simple query corrector based on a mapping of field names to Corrector objects, and a list of ("fieldname", "text") tuples to correct. Any terms in the query that appear in the list of term tuples are corrected using the appropriate corrector.
Parameters:
- correctors – a dictionary mapping field names to Corrector objects.
- terms – a sequence of ("fieldname", "text") tuples representing terms to be corrected.
- aliases – a dictionary mapping field names in the query to field names for spelling suggestions.
- prefix – suggested replacement words must share this number of initial characters with the original word. Increasing this even to just 1 can dramatically speed up suggestions, and may be justifiable since spelling mistakes rarely involve the first letter of a word.
- maxdist – the maximum number of “edits” (insertions, deletions, substitutions, or transpositions of letters) allowed between the original word and any suggestion. Values higher than 2 may be slow.
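The way correctors, terms, and aliases fit together can be sketched in plain Python. This is a simplified illustration under stated assumptions, not SimpleQueryCorrector's code: the query is modeled as a flat list of ("fieldname", "text") tuples rather than a whoosh.query.Query tree, each corrector is a plain callable rather than a Corrector object, and the name correct_terms is hypothetical.

```python
def correct_terms(query_terms, correctors, terms, aliases=None):
    """Replace each query term listed in `terms` with the first suggestion
    from the corrector registered for its (possibly aliased) field."""
    aliases = aliases or {}
    to_fix = set(terms)
    corrected = []
    for fieldname, text in query_terms:
        if (fieldname, text) in to_fix:
            # Map the query's field name to the field used for spelling
            spell_field = aliases.get(fieldname, fieldname)
            suggest = correctors.get(spell_field)
            suggestions = suggest(text) if suggest else []
            if suggestions:
                text = suggestions[0]
        corrected.append((fieldname, text))
    return corrected


# Hypothetical stand-in for a Corrector: looks up canned suggestions.
known_fixes = {"specail": ["special"], "ofer": ["offer", "over"]}
correctors = {"body": lambda text: known_fixes.get(text, [])}

query_terms = [("title", "specail"), ("body", "ofer"), ("body", "today")]
terms = [("title", "specail"), ("body", "ofer")]
aliases = {"title": "body"}  # correct "title" terms using the "body" field

corrected = correct_terms(query_terms, correctors, terms, aliases)
print(corrected)
# [('title', 'special'), ('body', 'offer'), ('body', 'today')]
```

Terms that are not in the terms list, such as ("body", "today") here, pass through unchanged.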
class whoosh.spelling.Correction(q, qstring, corr_q, tokens)
Represents the corrected version of a user query string. Has the following attributes:
- query – The corrected whoosh.query.Query object.
- string – The corrected user query string.
- original_query – The original whoosh.query.Query object that was corrected.
- original_string – The original user query string.
- tokens – A list of token objects representing the corrected words.
You can also use the Correction.format_string() method to reformat the corrected query string using a whoosh.highlight.Formatter object. For example, to display the corrected query string as HTML with the changed words emphasized:

    from whoosh import highlight

    # q is the parsed query and qstring is the original user query string
    correction = mysearcher.correct_query(q, qstring)
    hf = highlight.HtmlFormatter(classname="change")
    html = correction.format_string(hf)