spelling module

See also "Correcting errors in user queries".

This module contains helper functions for correcting typos in user queries.

Corrector objects

class whoosh.spelling.Corrector

Base class for spelling correction objects. Concrete sub-classes should implement the _suggestions method.

suggest(text, limit=5, maxdist=2, prefix=0)
Parameters:
  • text – the text to check. This word will not be added to the suggestions, even if it appears in the word graph.
  • limit – only return up to this many suggestions. If there are not enough terms in the field within maxdist of the given word, the returned list will be shorter than this number.
  • maxdist – the largest edit distance from the given word to look at. Values higher than 2 are not very effective or efficient.
  • prefix – require suggestions to share a prefix of this length with the given word. This is often justifiable since most misspellings do not involve the first letter of the word. Using a prefix dramatically decreases the time it takes to generate the list of words.
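
To make the roles of limit, maxdist, and prefix concrete, here is a minimal pure-Python sketch. It is not Whoosh's implementation (which searches the field's terms rather than scanning a list); the edit_distance and suggest functions and the vocabulary list are hypothetical stand-ins written only for illustration.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            best = min(prev[j] + 1,          # deletion
                       cur[j - 1] + 1,       # insertion
                       prev[j - 1] + cost)   # substitution (or match)
            if (prev2 is not None and j > 1
                    and ca == b[j - 2] and a[i - 2] == cb):
                best = min(best, prev2[j - 2] + 1)  # adjacent transposition
            cur.append(best)
        prev2, prev = prev, cur
    return prev[-1]


def suggest(vocabulary, text, limit=5, maxdist=2, prefix=0):
    """Illustrative stand-in for Corrector.suggest over a plain word list."""
    head = text[:prefix]
    scored = []
    for word in vocabulary:
        # Skip the checked word itself and anything without the shared prefix
        if word == text or not word.startswith(head):
            continue
        dist = edit_distance(text, word)
        if dist <= maxdist:
            scored.append((dist, word))
    scored.sort()  # closest first; ties broken alphabetically (an assumption)
    return [word for _, word in scored[:limit]]


vocabulary = ["special", "specials", "spectral", "social", "specail"]
print(suggest(vocabulary, "specail"))             # ['special', 'specials']
print(suggest(vocabulary, "specail", maxdist=1))  # ['special']
print(suggest(["cat", "sit", "set"], "sat", prefix=1))  # ['set', 'sit']
```

In the last call, "cat" is within one edit of "sat" but is dropped because prefix=1 requires the first letter to match. In this sketch the prefix check only skips the distance computation for each rejected word; a real corrector can use it to narrow the set of terms it examines at all.
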
class whoosh.spelling.ReaderCorrector(reader, fieldname, fieldobj)

Suggests corrections based on the content of a field in a reader.

Ranks suggestions by edit distance, smallest first, then by frequency, highest first.
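
The ordering described above can be sketched with a two-part sort key. The candidate tuples below are made-up data, and this shows only the documented ordering, not how ReaderCorrector computes its scores internally.

```python
# Hypothetical candidates: (word, edit distance from the query, frequency)
candidates = [
    ("render", 2, 40),
    ("reader", 1, 12),
    ("leader", 1, 30),
    ("header", 1, 25),
]

# Smaller distance sorts first; within one distance, larger frequency first
ranked = sorted(candidates, key=lambda c: (c[1], -c[2]))
words = [word for word, _, _ in ranked]
print(words)  # ['leader', 'header', 'reader', 'render']
```

Note that "render" comes last despite being the most frequent word, because distance takes priority over frequency.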

class whoosh.spelling.MultiCorrector(correctors, op)

Merges suggestions from a list of sub-correctors.
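
The text does not say what op does. One plausible reading, assumed here, is that op is a two-argument function used to combine the scores of a word suggested by more than one sub-corrector. The sketch below illustrates that idea in plain Python; merge_suggestions and the (score, word) pairs are hypothetical and are not taken from the library.

```python
import operator


def merge_suggestions(sources, op):
    """Combine (score, word) pairs from several sources. When a word is
    suggested more than once, fold its scores together with op."""
    merged = {}
    for pairs in sources:
        for score, word in pairs:
            if word in merged:
                merged[word] = op(merged[word], score)
            else:
                merged[word] = score
    return merged


# Made-up suggestions from two fields
title_sugs = [(3, "whoosh"), (1, "whoops")]
body_sugs = [(2, "whoosh"), (4, "swoosh")]

summed = merge_suggestions([title_sugs, body_sugs], operator.add)
print(summed)  # {'whoosh': 5, 'whoops': 1, 'swoosh': 4}

best = merge_suggestions([title_sugs, body_sugs], max)
print(best["whoosh"])  # 3
```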

QueryCorrector objects

class whoosh.spelling.QueryCorrector(fieldname)

Base class for objects that correct words in a user query.

correct_query(q, qstring)

Returns a Correction object representing the corrected form of the given query.

Parameters:
  • q – the original whoosh.query.Query tree to be corrected.
  • qstring – the original user query. This may be None if the original query string is not available, in which case the Correction.string attribute will also be None.
Return type: Correction

class whoosh.spelling.SimpleQueryCorrector(correctors, terms, aliases=None, prefix=0, maxdist=2)

A simple query corrector based on a mapping of field names to Corrector objects, and a list of ("fieldname", "text") tuples to correct. Any terms in the query that appear in the list of term tuples are corrected using the appropriate corrector.

Parameters:
  • correctors – a dictionary mapping field names to Corrector objects.
  • terms – a sequence of ("fieldname", "text") tuples representing terms to be corrected.
  • aliases – a dictionary mapping field names in the query to field names for spelling suggestions.
  • prefix – suggested replacement words must share this number of initial characters with the original word. Increasing this even to just 1 can dramatically speed up suggestions, and may be justifiable since spelling mistakes rarely involve the first letter of a word.
  • maxdist – the maximum number of “edits” (insertions, deletions, substitutions, or transpositions of letters) allowed between the original word and any suggestion. Values higher than 2 may be slow.
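
How the correctors, terms, and aliases parameters relate can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: WordListCorrector and correct_terms are hypothetical, the sketch takes only the top suggestion for each term, and the real class rewrites a query tree and returns a Correction rather than a dictionary.

```python
class WordListCorrector:
    """Hypothetical stand-in for a Corrector backed by a fixed lookup table."""

    def __init__(self, known):
        self.known = known  # misspelling -> list of suggestions

    def suggest(self, text, limit=5, maxdist=2, prefix=0):
        # maxdist and prefix are accepted for signature parity but unused here
        return self.known.get(text, [])[:limit]


def correct_terms(correctors, terms, aliases=None, prefix=0, maxdist=2):
    """Map each correctable (fieldname, text) tuple to its top suggestion."""
    aliases = aliases or {}
    fixes = {}
    for fieldname, text in terms:
        # A query field may borrow another field's corrector via aliases
        corrector = correctors.get(aliases.get(fieldname, fieldname))
        if corrector is None:
            continue  # no corrector registered for this field
        sugs = corrector.suggest(text, limit=1, maxdist=maxdist, prefix=prefix)
        if sugs:
            fixes[(fieldname, text)] = sugs[0]
    return fixes


correctors = {
    "content": WordListCorrector({"serch": ["search"], "indx": ["index"]}),
}
terms = [("content", "serch"), ("body", "indx"), ("title", "whosh")]
aliases = {"body": "content"}

fixes = correct_terms(correctors, terms, aliases)
print(fixes)
# {('content', 'serch'): 'search', ('body', 'indx'): 'index'}
```

The "body" term is corrected through the "content" corrector because of the alias, while the "title" term is left alone since no corrector is registered for that field.
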
class whoosh.spelling.Correction(q, qstring, corr_q, tokens)

Represents the corrected version of a user query string. Has the following attributes:

query
The corrected whoosh.query.Query object.
string
The corrected user query string.
original_query
The original whoosh.query.Query object that was corrected.
original_string
The original user query string.
tokens
A list of token objects representing the corrected words.

You can also use the Correction.format_string() method to reformat the corrected query string using a whoosh.highlight.Formatter class. For example, to display the corrected query string as HTML with the changed words emphasized:

from whoosh import highlight

correction = mysearcher.correct_query(q, qstring)

hf = highlight.HtmlFormatter(classname="change")
html = correction.format_string(hf)