support.charset module

This module contains tools for working with Sphinx charset table files. These files are useful for doing case and accent folding. See whoosh.analysis.CharsetTokenizer and whoosh.analysis.CharsetFilter.

whoosh.support.charset.default_charset

An extensive case- and accent-folding charset table, taken from http://speeple.com/unicode-maps.txt

whoosh.support.charset.charset_table_to_dict(tablestring)

Takes a string containing the contents of a Sphinx charset table file and returns a mapping object (specifically, a defaultdict) of the kind expected by the unicode.translate() method (str.translate() in Python 3): it maps a character number (code point) to a unicode character, or to None if the character is not a valid word character.
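To illustrate the kind of mapping described above, here is a minimal sketch that uses only the standard library and does not call Whoosh. The table is built by hand and is hypothetical, and it assumes a defaultdict whose default value is None. It only shows how translate() treats such a mapping: mapped characters are folded, and characters that map to None are removed.

```python
from collections import defaultdict
import string

# Hypothetical hand-built table (not produced by Whoosh): keys are
# character numbers, values are replacement characters. Any character
# number that is not listed falls back to None.
table = defaultdict(lambda: None)

for ch in string.ascii_lowercase:
    table[ord(ch)] = ch          # lowercase letters map to themselves
    table[ord(ch.upper())] = ch  # uppercase letters fold to lowercase

table[0x00E9] = "e"  # U+00E9 (e with acute accent) folds to plain "e"
table[0x00C9] = "e"  # U+00C9 (E with acute accent) folds to plain "e"

# translate() looks up each character number in the table; a None
# value deletes the character from the result.
folded = "Caf\u00e9!".translate(table)
print(folded)
```

In this sketch the exclamation mark has no entry, so the default of None applies and it is dropped from the result. A table returned by charset_table_to_dict would be passed to translate() in the same way.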

The Sphinx charset table format is described at http://www.sphinxsearch.com/docs/current.html#conf-charset-table.