support.charset module
This module contains tools for working with Sphinx charset table files. These files
are useful for doing case and accent folding.
See whoosh.analysis.CharsetTokenizer and whoosh.analysis.CharsetFilter.
whoosh.support.charset.default_charset
An extensive case- and accent-folding charset table. Taken from http://speeple.com/unicode-maps.txt
whoosh.support.charset.charset_table_to_dict(tablestring)
Takes a string with the contents of a Sphinx charset table file and returns a mapping object (a defaultdict, actually) of the kind expected by the unicode.translate() method: that is, it maps a character number to a unicode character, or to None if the character is not a valid word character.
The Sphinx charset table format is described at http://www.sphinxsearch.com/docs/current.html#conf-charset-table.
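
To make the shape of the returned mapping concrete, here is a minimal standalone sketch of the idea. It is not Whoosh's implementation: simple_charset_table_to_dict is a hypothetical helper, and it assumes only a subset of the Sphinx table syntax (single characters, a..z ranges, U+XXXX code points, and -> mappings). Other forms, such as the /2 checkerboard ranges, are not handled.

```python
import re
from collections import defaultdict

# A character token is either a U+XXXX code point or one literal character.
_CHAR = r"(?:U\+[0-9A-Fa-f]+|.)"
# Entry shapes handled here: "a", "a..z", "A->a", "A..Z->a..z".
_ENTRY = re.compile(
    rf"^({_CHAR})(?:\.\.({_CHAR}))?(?:->({_CHAR})(?:\.\.({_CHAR}))?)?$"
)


def _codepoint(token):
    """Convert 'U+41' or 'A' to an integer code point."""
    if len(token) > 1:
        return int(token[2:], 16)
    return ord(token)


def simple_charset_table_to_dict(tablestring):
    """Hypothetical, simplified stand-in for charset_table_to_dict.

    Returns a defaultdict mapping code points to replacement characters.
    Code points not declared in the table map to None, so str.translate()
    drops them.
    """
    table = defaultdict(lambda: None)
    for entry in tablestring.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"Unsupported charset table entry: {entry!r}")
        src_start, src_end, dst_start, dst_end = match.groups()

        start = _codepoint(src_start)
        end = _codepoint(src_end) if src_end else start
        # Without "->", the characters are valid and map to themselves.
        target = _codepoint(dst_start) if dst_start else start
        if dst_end and _codepoint(dst_end) - target != end - start:
            raise ValueError(f"Range lengths differ in entry: {entry!r}")

        for offset in range(end - start + 1):
            table[start + offset] = chr(target + offset)
    return table


table = simple_charset_table_to_dict("0..9, A..Z->a..z, a..z, U+C9->e, U+E9->e")
print("Café 42!".translate(table))  # prints "cafe42"
```

The space and the exclamation mark disappear because they are not declared in the table and therefore map to None. In Whoosh itself, the corresponding mapping would come from calling charset_table_to_dict on a real table such as default_charset.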