sorting module

Base types

class whoosh.sorting.FacetType

Base class for “facets”, aspects that can be sorted/faceted.

categorizer(global_searcher)

Returns a Categorizer corresponding to this facet.

Parameters:
  • global_searcher – A parent searcher. You can use this searcher if you need global document ID references.

class whoosh.sorting.Categorizer

Base class for categorizer objects which compute a key value for a document based on certain criteria, for use in sorting/faceting.

Categorizers are created by FacetType objects through the FacetType.categorizer() method. The whoosh.searching.Searcher object passed to the categorizer method may be a composite searcher (that is, wrapping a multi-reader), but categorizers are always run per-segment, with segment-relative document numbers.

The collector will call a categorizer’s set_searcher method as it searches each segment to let the categorizer set up whatever segment-specific data it needs.

Categorizer.allow_overlap should be True if the caller can use the keys_for method instead of key_for to group documents into potentially overlapping groups. The default is False.

If a categorizer subclass can categorize the document using only the document number, it should set needs_current to False (this is the default) and NOT USE the given matcher in the key_for or keys_for methods, since in that case segment_docnum is not guaranteed to be consistent with the given matcher. If a categorizer subclass needs to access information on the matcher, it should set needs_current to True. This will prevent the caller from using optimizations that might leave the matcher in an inconsistent state.

key_for(matcher, segment_docnum)

Returns a key for the current match.

Parameters:
  • matcher – a whoosh.matching.Matcher object. If self.needs_current is False, DO NOT use this object, since it may be inconsistent. Use the given segment_docnum instead.
  • segment_docnum – the segment-relative document number of the current match.

key_to_name(key)

Returns a representation of the key to be used as a dictionary key in faceting. For example, the sorting key for date fields is a large integer; this method translates it into a datetime object to make the groupings clearer.
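As a rough illustration of the translation described above, the following standalone sketch turns an opaque integer sort key into a datetime. The microseconds-since-1970 encoding and the EPOCH constant are assumptions made for this example only; Whoosh's internal key format for date fields may differ.

```python
from datetime import datetime, timedelta

# Assumed reference point for this example; not Whoosh's actual encoding.
EPOCH = datetime(1970, 1, 1)


def key_to_name(key):
    """Translate an integer sort key (assumed: microseconds since EPOCH)
    into a datetime, which reads better as a group name."""
    return EPOCH + timedelta(microseconds=key)


# Two opaque integer keys become readable group names.
keys = [0, 86_400_000_000]
names = [key_to_name(k) for k in keys]
```

The grouping dictionary would then be keyed by the datetime values rather than by the raw integers.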

keys_for(matcher, segment_docnum)

Yields a series of keys for the current match.

This method will be called instead of key_for if self.allow_overlap is True.

Parameters:
  • matcher – a whoosh.matching.Matcher object. If self.needs_current is False, DO NOT use this object, since it may be inconsistent. Use the given segment_docnum instead.
  • segment_docnum – the segment-relative document number of the current match.

set_searcher(segment_searcher, docoffset)

Called by the collector when the collector moves to a new segment. The segment_searcher will be atomic. The docoffset is the offset of the segment’s document numbers relative to the entire index. You can use the offset to get absolute index docnums by adding the offset to segment-relative docnums.
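The per-segment protocol and the offset arithmetic can be sketched without Whoosh itself. In the example below, SegmentStub, LengthCategorizer, and absolute_docnum are hypothetical names invented for illustration; only the method names set_searcher and key_for and the two class attributes come from the description above.

```python
class SegmentStub:
    """Hypothetical stand-in for an atomic segment searcher."""

    def __init__(self, lengths):
        # One field length per segment-relative document number.
        self.lengths = lengths


class LengthCategorizer:
    """Keys each document by a per-document length, using only docnums."""

    allow_overlap = False
    needs_current = False  # key_for never touches the matcher

    def set_searcher(self, segment_searcher, docoffset):
        # Called once per segment: remember the segment and its offset.
        self.segment = segment_searcher
        self.docoffset = docoffset

    def key_for(self, matcher, segment_docnum):
        # The matcher is ignored because needs_current is False.
        return self.segment.lengths[segment_docnum]

    def absolute_docnum(self, segment_docnum):
        # Absolute docnum = segment offset + segment-relative docnum.
        return self.docoffset + segment_docnum


# Simulate a collector walking two segments of 3 and 2 documents.
segments = [SegmentStub([10, 20, 30]), SegmentStub([40, 50])]
cat = LengthCategorizer()
seen = []
offset = 0
for seg in segments:
    cat.set_searcher(seg, offset)
    for docnum in range(len(seg.lengths)):
        seen.append((cat.absolute_docnum(docnum), cat.key_for(None, docnum)))
    offset += len(seg.lengths)
```

Document 0 of the second segment ends up with absolute number 3, because the first segment contributed an offset of 3.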

Facet types

class whoosh.sorting.FieldFacet(fieldname, reverse=False, allow_overlap=False, maptype=None)

Sorts/facets by the contents of a field.

For example, to sort by the contents of the “path” field in reverse order, and facet by the contents of the “tag” field:

paths = FieldFacet("path", reverse=True)
tags = FieldFacet("tag")
results = searcher.search(myquery, sortedby=paths, groupedby=tags)

This facet returns different categorizers based on the field type.

Parameters:
  • fieldname – the name of the field to sort/facet on.
  • reverse – if True, when sorting, reverse the sort order of this facet.
  • allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field.

class whoosh.sorting.QueryFacet(querydict, other=None, allow_overlap=False, maptype=None)

Sorts/facets based on the results of a series of queries.

Parameters:
  • querydict – a dictionary mapping keys to whoosh.query.Query objects.
  • other – the key to use for documents that don’t match any of the queries.
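The roles of querydict and other can be pictured with a plain-Python analogy. Here ordinary predicate functions stand in for whoosh.query.Query objects, and group_by_queries is a hypothetical helper; taking the first matching key in dictionary order is an assumption of this sketch, not a statement about how Whoosh resolves multiple matches.

```python
def group_by_queries(docs, querydict, other=None):
    """Group document numbers under the key of the first matching
    predicate, falling back to `other` when nothing matches."""
    groups = {}
    for docnum, doc in enumerate(docs):
        key = other
        for name, matches in querydict.items():
            if matches(doc):
                key = name
                break
        groups.setdefault(key, []).append(docnum)
    return groups


docs = [{"name": "apple"}, {"name": "pear"}, {"name": "42"}]

# Predicates standing in for Query objects.
querydict = {
    "a-m": lambda d: "a" <= d["name"][0] <= "m",
    "n-z": lambda d: "n" <= d["name"][0] <= "z",
}

groups = group_by_queries(docs, querydict, other="misc")
```

The third document matches neither predicate, so it is filed under the other key.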
class whoosh.sorting.RangeFacet(fieldname, start, end, gap, hardend=False, maptype=None)

Sorts/facets based on numeric ranges. For textual ranges, use QueryFacet.

For example, to facet the “price” field into $100 buckets, up to $1000:

prices = RangeFacet("price", 0, 1000, 100)
results = searcher.search(myquery, groupedby=prices)

The ranges/buckets are always inclusive at the start and exclusive at the end.

Parameters:
  • fieldname – the numeric field to sort/facet on.
  • start – the start of the entire range.
  • end – the end of the entire range.
  • gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[1,5,10] will use 1 as the size of the first bucket, 5 as the size of the second bucket, and 10 as the size of all subsequent buckets.
  • hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.
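The interaction of gap sequences and hardend is easier to see with concrete numbers. The following is a minimal standalone sketch of the bucket boundaries as described above; range_buckets and bucket_for are hypothetical helpers written for this example and are not part of the Whoosh API.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (lo, hi) bucket bounds covering start..end.

    `gap` is a single size or a sequence of sizes; the last size in a
    sequence is reused for all remaining buckets.
    """
    gaps = list(gap) if isinstance(gap, (list, tuple)) else [gap]
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gap sizes must be positive")
    buckets = []
    lo = start
    i = 0
    while lo < end:
        size = gaps[min(i, len(gaps) - 1)]
        hi = lo + size
        if hardend:
            # Clamp the final bucket to the end of the overall range.
            hi = min(hi, end)
        buckets.append((lo, hi))
        lo = hi
        i += 1
    return buckets


def bucket_for(value, buckets):
    """Buckets include their start and exclude their end."""
    for lo, hi in buckets:
        if lo <= value < hi:
            return (lo, hi)
    return None


prices = range_buckets(0, 1000, 100)
uneven = range_buckets(0, 20, [1, 5, 10])
clamped = range_buckets(0, 20, [1, 5, 10], hardend=True)
```

With gap=[1, 5, 10] over 0 to 20, the last bucket runs from 16 to 26 unless hardend clamps it to 20. A value of exactly 100 belongs to the bucket starting at 100, not the one ending there.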
class whoosh.sorting.DateRangeFacet(fieldname, start, end, gap, hardend=False, maptype=None)

Sorts/facets based on date ranges. This is the same as RangeFacet except you are expected to use datetime objects as the start and end of the range, and timedelta or relativedelta objects as the gap(s). It generates DateRange queries instead of TermRange queries.

For example, to facet a “birthday” range into 5 year buckets:

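from datetime import datetime
from whoosh.sorting import DateRangeFacet
from whoosh.support.relativedelta import relativedelta

# Month and day are 1-based, so the range starts on January 1st, 1920
startdate = datetime(1920, 1, 1)
enddate = datetime.now()
# Each bucket spans five years
gap = relativedelta(years=5)
bdays = DateRangeFacet("birthday", startdate, enddate, gap)
results = searcher.search(myquery, groupedby=bdays)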

The ranges/buckets are always inclusive at the start and exclusive at the end.

Parameters:
  • fieldname – the date field to sort/facet on.
  • start – the start of the entire range.
  • end – the end of the entire range.
  • gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[1,5,10] will use 1 as the size of the first bucket, 5 as the size of the second bucket, and 10 as the size of all subsequent buckets.
  • hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.

class whoosh.sorting.ScoreFacet

Uses a document’s score as a sorting criterion.

For example, to sort by the tag field, and then within that by relative score:

tag_score = MultiFacet(["tag", ScoreFacet()])
results = searcher.search(myquery, sortedby=tag_score)

class whoosh.sorting.FunctionFacet(fn, maptype=None)

This facet type is low-level. In most cases you should use TranslateFacet instead.

This facet type lets you pass an arbitrary function that will compute the key. This may be easier than subclassing FacetType and Categorizer to set up the desired behavior.

The function is called with the arguments (searcher, docid), where the searcher may be a composite searcher, and the docid is an absolute index document number (not segment-relative).

For example, to use the number of words in the document’s “content” field as the sorting/faceting key:

fn = lambda s, docid: s.doc_field_length(docid, "content")
lengths = FunctionFacet(fn)

class whoosh.sorting.MultiFacet(items=None, maptype=None)

Sorts/facets by the combination of multiple “sub-facets”.

For example, to sort by the value of the “tag” field, and then (for documents where the tag is the same) by the value of the “path” field:

facet = MultiFacet([FieldFacet("tag"), FieldFacet("path")])
results = searcher.search(myquery, sortedby=facet)

As a shortcut, you can pass strings, which are assumed to be field names and turned into FieldFacet objects:

facet = MultiFacet(["tag", "path"])

You can also use the add_* methods to add criteria to the multifacet:

facet = MultiFacet()
facet.add_field("tag")
facet.add_field("path", reverse=True)
facet.add_query({"a-m": TermRange("name", "a", "m"),
                 "n-z": TermRange("name", "n", "z")})
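The "sort by tag, then by path" ordering that MultiFacet describes behaves like sorting on a tuple of keys. The sketch below is a plain-Python analogy under that assumption; multi_key is a hypothetical helper and says nothing about how Whoosh builds its keys internally.

```python
docs = [
    {"tag": "b", "path": "/x"},
    {"tag": "a", "path": "/z"},
    {"tag": "a", "path": "/y"},
]


def multi_key(doc, fieldnames):
    # One key component per sub-facet, compared left to right.
    return tuple(doc[name] for name in fieldnames)


ordered = sorted(docs, key=lambda d: multi_key(d, ["tag", "path"]))
paths = [d["path"] for d in ordered]
```

The two documents tagged "a" come first, and the path value breaks the tie between them.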
class whoosh.sorting.StoredFieldFacet(fieldname, allow_overlap=False, split_fn=None, maptype=None)

Lets you sort/group using the value in an unindexed, stored field (e.g. whoosh.fields.STORED). This is usually slower than using an indexed field.

For fields where the stored value is a space-separated list of keywords (e.g. "tag1 tag2 tag3"), you can use the allow_overlap keyword argument to allow overlapped faceting on the result of calling the split() method on the field value (or calling a custom split function if one is supplied).

Parameters:
  • fieldname – the name of the stored field.
  • allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field. The categorizer uses string.split() or the custom split_fn to convert the stored value into a list of facet values.
  • split_fn – a custom function to split a stored field value into multiple facet values when allow_overlap is True. If not supplied, the categorizer simply calls the value’s split() method.
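Overlapped grouping with and without a custom split function can be sketched as follows. This is a standalone illustration of the behavior the parameters describe; overlapping_groups is a hypothetical helper, and the comma-separated values are invented for the example.

```python
def overlapping_groups(stored_values, split_fn=None):
    """Place each document in one group per value produced by splitting
    its stored field value."""
    groups = {}
    for docnum, value in enumerate(stored_values):
        # Fall back to the value's own split() when no split_fn is given.
        keys = split_fn(value) if split_fn else value.split()
        for key in keys:
            groups.setdefault(key, []).append(docnum)
    return groups


# Default: whitespace-separated keywords.
by_space = overlapping_groups(["tag1 tag2", "tag2 tag3"])

# Custom split function for comma-separated values.
by_comma = overlapping_groups(
    ["red,green", "green"], split_fn=lambda v: v.split(",")
)
```

Document 0 appears under both "tag1" and "tag2", which is what makes the groups overlapping.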

Facets object

class whoosh.sorting.Facets(x=None)

Maps facet names to FacetType objects, for creating multiple groupings of documents.

For example, to group by tag, and also group by price range:

facets = Facets()
facets.add_field("tag")
facets.add_facet("price", RangeFacet("price", 0, 1000, 100))
results = searcher.search(myquery, groupedby=facets)

tag_groups = results.groups("tag")
price_groups = results.groups("price")

(To group by the combination of multiple facets, use MultiFacet.)

add_facet(name, facet)

Adds a FacetType object under the given name.

add_facets(facets, replace=True)

Adds the contents of the given Facets or dict object to this object.

add_field(fieldname, **kwargs)

Adds a FieldFacet for the given field name (the field name is automatically used as the facet name).

add_query(name, querydict, **kwargs)

Adds a QueryFacet under the given name.

Parameters:
  • name – a name for the facet.
  • querydict – a dictionary mapping keys to whoosh.query.Query objects.

items()

Returns a list of (facetname, facetobject) tuples for the facets in this object.

names()

Returns an iterator of the facet names in this object.

FacetMap objects

class whoosh.sorting.FacetMap

Base class for objects holding the results of grouping search results by a Facet. Use an object’s as_dict() method to access the results.

You can pass a subclass of this to the maptype keyword argument when creating a FacetType object to specify what information the facet should record about the group. For example:

# Record each document in each group in its sorted order
myfacet = FieldFacet("size", maptype=OrderedList)

# Record only the count of documents in each group
myfacet = FieldFacet("size", maptype=Count)

add(groupname, docid, sortkey)

Adds a document to the facet results.

Parameters:
  • groupname – the name of the group to add this document to.
  • docid – the document number of the document to add.
  • sortkey – a value representing the sort position of the document in the full results.

as_dict()

Returns a dictionary object mapping group names to implementation-specific values. For example, the value might be a list of document numbers, or an integer representing the number of documents in the group.

class whoosh.sorting.OrderedList

Stores a list of document numbers for each group, in the same order as they appear in the search results.

The as_dict method returns a dictionary mapping group names to lists of document numbers.

class whoosh.sorting.UnorderedList

Stores a list of document numbers for each group, in arbitrary order. This is slightly faster and uses less memory than OrderedList if you don’t care about the ordering of the documents within groups.

The as_dict method returns a dictionary mapping group names to lists of document numbers.

class whoosh.sorting.Count

Stores the number of documents in each group.

The as_dict method returns a dictionary mapping group names to integers.
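The add/as_dict contract can be sketched with two standalone classes that mimic the counting and ordered-list behaviors described above. CountSketch and OrderedListSketch are hypothetical stand-ins written for illustration; they do not subclass or reproduce Whoosh's own classes.

```python
class CountSketch:
    """Records only how many documents fall into each group."""

    def __init__(self):
        self.counts = {}

    def add(self, groupname, docid, sortkey):
        self.counts[groupname] = self.counts.get(groupname, 0) + 1

    def as_dict(self):
        return dict(self.counts)


class OrderedListSketch:
    """Records document numbers per group, ordered by sort key."""

    def __init__(self):
        self.groups = {}

    def add(self, groupname, docid, sortkey):
        self.groups.setdefault(groupname, []).append((sortkey, docid))

    def as_dict(self):
        # Sort each group's (sortkey, docid) pairs, then keep the docids.
        return {
            name: [docid for _, docid in sorted(pairs)]
            for name, pairs in self.groups.items()
        }


# (groupname, docid, sortkey) triples, as a collector might report them.
hits = [("small", 7, 2), ("large", 3, 0), ("small", 5, 1)]

counts = CountSketch()
ordered = OrderedListSketch()
for groupname, docid, sortkey in hits:
    counts.add(groupname, docid, sortkey)
    ordered.add(groupname, docid, sortkey)

count_result = counts.as_dict()
ordered_result = ordered.as_dict()
```

Both objects receive the same add() calls; they differ only in what they choose to keep, which is the point of the maptype argument.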

class whoosh.sorting.Best

Stores the “best” document in each group (that is, the one with the highest sort key).

The as_dict method returns a dictionary mapping group names to document numbers.