
`sorting` module

Base types

class whoosh.sorting.FacetType

Base class for “facets”, aspects that can be sorted/faceted.

categorizer(global_searcher)

Returns a Categorizer corresponding to this facet.

Parameters:

global_searcher – a parent searcher. You can use this searcher if you need global document ID references.

class whoosh.sorting.Categorizer

Base class for categorizer objects which compute a key value for a document based on certain criteria, for use in sorting/faceting.

Categorizers are created by FacetType objects through the FacetType.categorizer() method. The whoosh.searching.Searcher object passed to the categorizer method may be a composite searcher (that is, wrapping a multi-reader), but categorizers are always run per-segment, with segment-relative document numbers.

The collector will call a categorizer’s set_searcher method as it searches each segment to let the categorizer set up whatever segment-specific data it needs.

The categorizer’s allow_overlap attribute should be True if the caller can use the keys_for method instead of key_for to group documents into potentially overlapping groups. The default is False.

If a categorizer subclass can categorize the document using only the document number, it should set needs_current to False (this is the default) and must not use the given matcher in the key_for or keys_for methods, since in that case segment_docnum is not guaranteed to be consistent with the given matcher. If a categorizer subclass needs to access information on the matcher, it should set needs_current to True. This will prevent the caller from using optimizations that might leave the matcher in an inconsistent state.

key_for(matcher, segment_docnum)

Returns a key for the current match.

Parameters:

matcher – a whoosh.matching.Matcher object. If self.needs_current is False, DO NOT use this object, since it may be inconsistent. Use the given segment_docnum instead.
segment_docnum – the segment-relative document number of the current match.

key_to_name(key)

Returns a representation of the key to be used as a dictionary key in faceting. For example, the sorting key for date fields is a large integer; this method translates it into a datetime object to make the groupings clearer.

keys_for(matcher, segment_docnum)

Yields a series of keys for the current match.

This method will be called instead of key_for if self.allow_overlap is True.

Parameters:

matcher – a whoosh.matching.Matcher object. If self.needs_current is False, DO NOT use this object, since it may be inconsistent. Use the given segment_docnum instead.
segment_docnum – the segment-relative document number of the current match.

set_searcher(segment_searcher, docoffset)

Called by the collector when the collector moves to a new segment. The segment_searcher will be atomic. The docoffset is the offset of the segment’s document numbers relative to the entire index. You can use the offset to get absolute index docnums by adding the offset to segment-relative docnums.
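The per-segment protocol described above can be sketched without Whoosh itself. The class below is a plain-Python stand-in, not a whoosh.sorting.Categorizer subclass; ParityCategorizer and simulate are hypothetical names used only for illustration. It mirrors the set_searcher and key_for signatures and shows how docoffset turns segment-relative document numbers into absolute ones.

```python
class ParityCategorizer:
    """Stand-in that mirrors the Categorizer method names; not a Whoosh class."""

    # Same defaults the text above gives for categorizers
    allow_overlap = False
    needs_current = False

    def __init__(self):
        self.docoffset = 0

    def set_searcher(self, segment_searcher, docoffset):
        # Called once per segment: remember where this segment starts
        self.docoffset = docoffset

    def key_for(self, matcher, segment_docnum):
        # needs_current is False, so the matcher is deliberately ignored
        absolute_docnum = self.docoffset + segment_docnum
        return "even" if absolute_docnum % 2 == 0 else "odd"


def simulate(segment_sizes):
    """Drive the stand-in the way a collector would, one segment at a time."""
    categorizer = ParityCategorizer()
    keys = []
    offset = 0
    for size in segment_sizes:
        categorizer.set_searcher(None, offset)
        for segment_docnum in range(size):
            keys.append(categorizer.key_for(None, segment_docnum))
        offset += size
    return keys
```

With two segments of three and two documents, the second segment's document 0 is treated as absolute document 3, which is why the offset has to be stored in set_searcher.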

Facet types

class whoosh.sorting.FieldFacet(fieldname, reverse=False, allow_overlap=False, maptype=None)

Sorts/facets by the contents of a field.

For example, to sort by the contents of the “path” field in reverse order, and facet by the contents of the “tag” field:

paths = FieldFacet("path", reverse=True)
tags = FieldFacet("tag")
results = searcher.search(myquery, sortedby=paths, groupedby=tags)

This facet returns different categorizers based on the field type.

Parameters:

fieldname – the name of the field to sort/facet on.
reverse – if True, when sorting, reverse the sort order of this facet.
allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field.

class whoosh.sorting.QueryFacet(querydict, other=None, allow_overlap=False, maptype=None)

Sorts/facets based on the results of a series of queries.

Parameters:

querydict – a dictionary mapping keys to whoosh.query.Query objects.
other – the key to use for documents that don’t match any of the queries.
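As a rough mental model of the querydict and other parameters, the sketch below groups plain dictionaries using Python predicates in place of whoosh.query.Query objects. It does not use Whoosh; query_facet_groups is a hypothetical helper, and the "first matching key wins" rule is an assumption of this sketch, not documented Whoosh behavior.

```python
def query_facet_groups(docs, predicates, other=None):
    """Group document numbers by the first predicate that matches.

    predicates maps group keys to callables standing in for queries.
    Documents matching no predicate are grouped under the `other` key.
    """
    groups = {}
    for docnum, doc in enumerate(docs):
        key = other
        for name, matches in predicates.items():
            if matches(doc):
                key = name
                break  # assumption of this sketch: first match wins
        groups.setdefault(key, []).append(docnum)
    return groups


docs = [{"name": "apple"}, {"name": "zebra"}, {"name": "42"}]
predicates = {
    "a-m": lambda d: "a" <= d["name"][0] <= "m",
    "n-z": lambda d: "n" <= d["name"][0] <= "z",
}
groups = query_facet_groups(docs, predicates, other="misc")
```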

class whoosh.sorting.RangeFacet(fieldname, start, end, gap, hardend=False, maptype=None)

Sorts/facets based on numeric ranges. For textual ranges, use QueryFacet.

For example, to facet the “price” field into $100 buckets, up to $1000:

prices = RangeFacet("price", 0, 1000, 100)
results = searcher.search(myquery, groupedby=prices)

The ranges/buckets are always inclusive at the start and exclusive at the end.

Parameters:


fieldname – the numeric field to sort/facet on.
start – the start of the entire range.
end – the end of the entire range.
gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[1,5,10] will use 1 as the size of the first bucket, 5 as the size of the second bucket, and 10 as the size of all subsequent buckets.
hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.
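The interaction of gap and hardend is easier to see with concrete numbers. The following is a plain-Python model of the bucket rules as described above (inclusive start, exclusive end, a gap sequence whose last size repeats, and optional clamping). It is not Whoosh's implementation, and range_buckets and bucket_for are hypothetical names.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (low, high) bucket bounds following the rules described above."""
    gaps = list(gap) if isinstance(gap, (list, tuple)) else [gap]
    buckets = []
    low = start
    index = 0
    while low < end:
        # Once the gap sequence is exhausted, keep using its last size
        size = gaps[min(index, len(gaps) - 1)]
        high = low + size
        if hardend and high > end:
            high = end  # clamp the last bucket to `end`
        buckets.append((low, high))
        low = high
        index += 1
    return buckets


def bucket_for(value, buckets):
    """Find the bucket containing value: inclusive start, exclusive end."""
    for low, high in buckets:
        if low <= value < high:
            return (low, high)
    return None


soft = range_buckets(0, 20, [1, 5, 10])
hard = range_buckets(0, 20, [1, 5, 10], hardend=True)
```

Here soft ends with the bucket (16, 26), which runs past end, while hard ends with (16, 20).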

class whoosh.sorting.DateRangeFacet(fieldname, start, end, gap, hardend=False, maptype=None)

Sorts/facets based on date ranges. This is the same as RangeFacet except you are expected to use datetime objects as the start and end of the range, and timedelta or relativedelta objects as the gap(s), and it generates DateRange queries instead of TermRange queries.

For example, to facet a “birthday” range into 5 year buckets:

from datetime import datetime
from whoosh.sorting import DateRangeFacet
from whoosh.support.relativedelta import relativedelta

# Month and day must be at least 1 in a datetime
startdate = datetime(1920, 1, 1)
enddate = datetime.now()
gap = relativedelta(years=5)
bdays = DateRangeFacet("birthday", startdate, enddate, gap)
results = searcher.search(myquery, groupedby=bdays)

The ranges/buckets are always inclusive at the start and exclusive at the end.

Parameters:


fieldname – the numeric field to sort/facet on.
start – the start of the entire range.
end – the end of the entire range.
gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[1,5,10] will use 1 as the size of the first bucket, 5 as the size of the second bucket, and 10 as the size of all subsequent buckets.
hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.
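The same bucket arithmetic carries over to dates. The sketch below uses only the standard library's datetime and timedelta with a single gap; it is an illustrative model rather than Whoosh code, and date_buckets is a hypothetical name.

```python
from datetime import datetime, timedelta


def date_buckets(start, end, gap, hardend=False):
    """Return (low, high) datetime bounds, stepping by a single timedelta gap."""
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end  # clamp the final bucket to `end`
        buckets.append((low, high))
        low = high
    return buckets


start = datetime(2024, 1, 1)
end = datetime(2024, 1, 8)
soft = date_buckets(start, end, timedelta(days=3))
hard = date_buckets(start, end, timedelta(days=3), hardend=True)
```

With a seven-day span and a three-day gap, the last unclamped bucket ends on January 10, while the clamped one ends on January 8.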

class whoosh.sorting.ScoreFacet

Uses a document’s score as a sorting criterion.

For example, to sort by the tag field, and then within that by relative score:

tag_score = MultiFacet(["tag", ScoreFacet()])
results = searcher.search(myquery, sortedby=tag_score)

class whoosh.sorting.FunctionFacet(fn, maptype=None)

This facet type is low-level. In most cases you should use TranslateFacet instead.

This facet type lets you pass an arbitrary function that will compute the key. This may be easier than subclassing FacetType and Categorizer to set up the desired behavior.

The function is called with the arguments (searcher, docid), where the searcher may be a composite searcher, and the docid is an absolute index document number (not segment-relative).

For example, to use the number of words in the document’s “content” field as the sorting/faceting key:

fn = lambda s, docid: s.doc_field_length(docid, "content")
lengths = FunctionFacet(fn)

class whoosh.sorting.MultiFacet(items=None, maptype=None)

Sorts/facets by the combination of multiple “sub-facets”.

For example, to sort by the value of the “tag” field, and then (for documents where the tag is the same) by the value of the “path” field:

facet = MultiFacet([FieldFacet("tag"), FieldFacet("path")])
results = searcher.search(myquery, sortedby=facet)

As a shortcut, you can pass strings; they are assumed to be field names and are turned into FieldFacet objects:

facet = MultiFacet(["tag", "path"])

You can also use the add_* methods to add criteria to the multifacet:

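from whoosh.query import TermRange
from whoosh.sorting import MultiFacet

# Build the multifacet incrementally instead of passing a list
facet = MultiFacet()
facet.add_field("tag")
facet.add_field("path", reverse=True)
facet.add_query({"a-m": TermRange("name", "a", "m"),
                 "n-z": TermRange("name", "n", "z")})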
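One way to picture the ordering a multifacet produces is as a sort on a tuple of sub-keys. The sketch below models that with plain dictionaries and Python's stable sort; it does not call Whoosh, and the sample documents are invented for illustration.

```python
docs = [
    {"tag": "b", "path": "/x"},
    {"tag": "a", "path": "/y"},
    {"tag": "a", "path": "/z"},
    {"tag": "b", "path": "/w"},
]

# Sort by tag, then by path within equal tags (both ascending)
by_tag_then_path = sorted(docs, key=lambda d: (d["tag"], d["path"]))

# Reverse only the second criterion: sort by the secondary key first,
# then rely on sort stability when sorting by the primary key
by_path_desc = sorted(docs, key=lambda d: d["path"], reverse=True)
by_tag_then_path_desc = sorted(by_path_desc, key=lambda d: d["tag"])

ascending_paths = [d["path"] for d in by_tag_then_path]
descending_paths = [d["path"] for d in by_tag_then_path_desc]
```

The second ordering corresponds to the idea behind add_field("path", reverse=True): documents stay grouped by tag, but paths run in descending order inside each tag.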

class whoosh.sorting.StoredFieldFacet(fieldname, allow_overlap=False, split_fn=None, maptype=None)

Lets you sort/group using the value in an unindexed, stored field (e.g. whoosh.fields.STORED). This is usually slower than using an indexed field.

For fields where the stored value is a space-separated list of keywords (e.g. "tag1 tag2 tag3"), you can use the allow_overlap keyword argument to allow overlapped faceting on the result of calling the split() method on the field value (or calling a custom split function if one is supplied).

Parameters:


fieldname – the name of the stored field.
allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field. The categorizer uses string.split() or the custom split_fn to convert the stored value into a list of facet values.
split_fn – a custom function to split a stored field value into multiple facet values when allow_overlap is True. If not supplied, the categorizer simply calls the value’s split() method.
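To illustrate what overlapped grouping on a split stored value means, here is a small plain-Python model. It is not Whoosh code; overlapping_groups is a hypothetical helper that applies either str.split() or a caller-supplied split function, as the parameter descriptions above outline.

```python
def overlapping_groups(stored_values, split_fn=None):
    """Map each facet value to the document numbers whose stored value contains it."""
    groups = {}
    for docnum, value in enumerate(stored_values):
        # Default: whitespace split; otherwise use the custom function
        keys = split_fn(value) if split_fn else value.split()
        for key in keys:
            groups.setdefault(key, []).append(docnum)
    return groups


by_space = overlapping_groups(["red blue", "blue", "green red"])
by_comma = overlapping_groups(["a,b", "b"], split_fn=lambda v: v.split(","))
```

Document 0 appears under both "red" and "blue" in by_space, which is the overlap that allow_overlap=True permits.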

Facets object

class whoosh.sorting.Facets(x=None)

Maps facet names to FacetType objects, for creating multiple groupings of documents.

For example, to group by tag, and also group by price range:

facets = Facets()
facets.add_field("tag")
facets.add_facet("price", RangeFacet("price", 0, 1000, 100))
results = searcher.search(myquery, groupedby=facets)

tag_groups = results.groups("tag")
price_groups = results.groups("price")

(To group by the combination of multiple facets, use MultiFacet.)

add_facet(name, facet)

Adds a FacetType object under the given name.

add_facets(facets, replace=True)

Adds the contents of the given Facets or dict object to this object.

add_field(fieldname, **kwargs)

Adds a FieldFacet for the given field name (the field name is automatically used as the facet name).

add_query(name, querydict, **kwargs)

Adds a QueryFacet under the given name.

Parameters:

name – a name for the facet.
querydict – a dictionary mapping keys to whoosh.query.Query objects.

items()

Returns a list of (facetname, facetobject) tuples for the facets in this object.

names()

Returns an iterator of the facet names in this object.

FacetMap objects

class whoosh.sorting.FacetMap

Base class for objects holding the results of grouping search results by a Facet. Use an object’s as_dict() method to access the results.

You can pass a subclass of this to the maptype keyword argument when creating a FacetType object to specify what information the facet should record about the group. For example:

from whoosh.sorting import Count, FieldFacet, OrderedList

# Record each document in each group in its sorted order
myfacet = FieldFacet("size", maptype=OrderedList)

# Record only the count of documents in each group
myfacet = FieldFacet("size", maptype=Count)

add(groupname, docid, sortkey)

Adds a document to the facet results.

Parameters:

groupname – the name of the group to add this document to.
docid – the document number of the document to add.
sortkey – a value representing the sort position of the document in the full results.

as_dict()

Returns a dictionary object mapping group names to implementation-specific values. For example, the value might be a list of document numbers, or an integer representing the number of documents in the group.
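The add/as_dict contract can be sketched with a small stand-in class. CountAndDocsMap is a hypothetical name and does not inherit from whoosh.sorting.FacetMap; to pass a map as maptype you would subclass FacetMap as described above. The sketch only shows the shape of the two methods and one possible "implementation-specific value".

```python
class CountAndDocsMap:
    """Stand-in with the add/as_dict methods described above; not a Whoosh class."""

    def __init__(self):
        self._groups = {}

    def add(self, groupname, docid, sortkey):
        # sortkey is accepted to match the signature but unused in this sketch
        entry = self._groups.setdefault(groupname, {"count": 0, "docs": []})
        entry["count"] += 1
        entry["docs"].append(docid)

    def as_dict(self):
        # Return copies so callers cannot mutate the internal state
        return {
            name: {"count": entry["count"], "docs": list(entry["docs"])}
            for name, entry in self._groups.items()
        }


facet_map = CountAndDocsMap()
facet_map.add("small", 3, 0)
facet_map.add("large", 7, 1)
facet_map.add("small", 5, 2)
summary = facet_map.as_dict()
```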

class whoosh.sorting.OrderedList

Stores a list of document numbers for each group, in the same order as they appear in the search results.

The as_dict method returns a dictionary mapping group names to lists of document numbers.

class whoosh.sorting.UnorderedList

Stores a list of document numbers for each group, in arbitrary order. This is slightly faster and uses less memory than OrderedList if you don’t care about the ordering of the documents within groups.

The as_dict method returns a dictionary mapping group names to lists of document numbers.

class whoosh.sorting.Count

Stores the number of documents in each group.

The as_dict method returns a dictionary mapping group names to integers.

class whoosh.sorting.Best

Stores the “best” document in each group (that is, the one with the highest sort key).

The as_dict method returns a dictionary mapping group names to document numbers.
