sorting module
Base types
class whoosh.sorting.FacetType
Base class for “facets”, aspects that can be sorted/faceted.
categorizer(global_searcher)
Returns a Categorizer corresponding to this facet.
Parameters: global_searcher – a parent searcher. You can use this searcher if you need global document ID references.
class whoosh.sorting.Categorizer
Base class for categorizer objects, which compute a key value for a document based on certain criteria, for use in sorting/faceting.
Categorizers are created by FacetType objects through the FacetType.categorizer() method. The whoosh.searching.Searcher object passed to the categorizer method may be a composite searcher (that is, wrapping a multi-reader), but categorizers are always run per-segment, with segment-relative document numbers.
The collector calls a categorizer’s set_searcher method as it searches each segment, to let the categorizer set up whatever segment-specific data it needs.
Categorizer.allow_overlap should be True if the caller can use the keys_for method instead of key_for to group documents into potentially overlapping groups. The default is False.
If a categorizer subclass can categorize the document using only the document number, it should set Categorizer.needs_current to False (this is the default) and NOT USE the given matcher in the key_for or keys_for methods, since in that case segment_docnum is not guaranteed to be consistent with the given matcher. If a categorizer subclass needs to access information on the matcher, it should set needs_current to True. This prevents the caller from using optimizations that might leave the matcher in an inconsistent state.
key_for(matcher, segment_docnum)
Returns a key for the current match.
Parameters:
- matcher – a whoosh.matching.Matcher object. If self.needs_current is False, DO NOT use this object, since it may be inconsistent. Use the given segment_docnum instead.
- segment_docnum – the segment-relative document number of the current match.
key_to_name(key)
Returns a representation of the key to be used as a dictionary key in faceting. For example, the sorting key for date fields is a large integer; this method translates it into a datetime object to make the groupings clearer.
keys_for(matcher, segment_docnum)
Yields a series of keys for the current match.
This method is called instead of key_for if self.allow_overlap is True.
Parameters:
- matcher – a whoosh.matching.Matcher object. If self.needs_current is False, DO NOT use this object, since it may be inconsistent. Use the given segment_docnum instead.
- segment_docnum – the segment-relative document number of the current match.
set_searcher(segment_searcher, docoffset)
Called by the collector when the collector moves to a new segment. The segment_searcher will be atomic. The docoffset is the offset of the segment’s document numbers relative to the entire index. You can get absolute index docnums by adding the offset to segment-relative docnums.
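The per-segment protocol described above can be modelled without Whoosh installed. The sketch below is a hypothetical stand-in: FakeSegment, LengthCategorizer and run_collector are illustrative names, not Whoosh APIs. It shows a collector calling set_searcher once per segment, then key_for with segment-relative numbers, and adding docoffset to recover absolute document numbers.

```python
from typing import Dict, List


class FakeSegment:
    """Hypothetical stand-in for an atomic segment searcher."""

    def __init__(self, lengths: List[int]):
        # Length of the "content" field for each segment-relative docnum.
        self.lengths = lengths

    def doc_count(self) -> int:
        return len(self.lengths)


class LengthCategorizer:
    """Hypothetical categorizer that keys only on the document number."""

    allow_overlap = False  # the caller uses key_for, not keys_for
    needs_current = False  # the matcher argument is never touched

    def set_searcher(self, segment_searcher: FakeSegment, docoffset: int) -> None:
        # Cache segment-specific data when the collector enters a segment.
        self._lengths = segment_searcher.lengths
        self._offset = docoffset

    def key_for(self, matcher, segment_docnum: int) -> int:
        # needs_current is False, so rely on segment_docnum, not the matcher.
        return self._lengths[segment_docnum]

    def absolute(self, segment_docnum: int) -> int:
        # Absolute index docnum = segment-relative docnum + docoffset.
        return self._offset + segment_docnum


def run_collector(segments: List[FakeSegment]) -> Dict[int, int]:
    """Simulate a collector; returns {absolute docnum: sort key}."""
    cat = LengthCategorizer()
    keys: Dict[int, int] = {}
    offset = 0
    for seg in segments:
        cat.set_searcher(seg, offset)
        for docnum in range(seg.doc_count()):
            keys[cat.absolute(docnum)] = cat.key_for(None, docnum)
        offset += seg.doc_count()
    return keys
```

For two segments holding lengths [5, 2] and [7], run_collector returns {0: 5, 1: 2, 2: 7}: the third document is docnum 0 in its own segment but docnum 2 in the index.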
Facet types
class whoosh.sorting.FieldFacet(fieldname, reverse=False, allow_overlap=False, maptype=None)
Sorts/facets by the contents of a field.
For example, to sort by the contents of the “path” field in reverse order, and facet by the contents of the “tag” field:
    from whoosh.sorting import FieldFacet

    paths = FieldFacet("path", reverse=True)
    tags = FieldFacet("tag")
    results = searcher.search(myquery, sortedby=paths, groupedby=tags)
This facet returns different categorizers based on the field type.
Parameters:
- fieldname – the name of the field to sort/facet on.
- reverse – if True, when sorting, reverse the sort order of this facet.
- allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field.
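To make the combined effect of sortedby and groupedby concrete, here is a plain-Python model over a hypothetical list of dict documents. It is not how Whoosh computes results; it only shows the intended outcome: documents ordered by “path” descending, and document numbers grouped by tag, with a multi-valued tag placing a document in several groups when overlap is allowed. When overlap is off, the sketch just takes the first value, which is an assumption and not necessarily Whoosh's rule.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical documents, indexed by position (the docnum).
DOCS = [
    {"path": "/a", "tags": ["red"]},
    {"path": "/c", "tags": ["blue", "red"]},
    {"path": "/b", "tags": ["blue"]},
]


def sorted_docnums(docs: List[dict], field: str, reverse: bool = False) -> List[int]:
    """Docnums ordered by a single-valued field, as with FieldFacet(field, reverse=...)."""
    return sorted(range(len(docs)), key=lambda n: docs[n][field], reverse=reverse)


def group_docnums(docs: List[dict], field: str,
                  allow_overlap: bool = False) -> Dict[str, List[int]]:
    """Group docnums by a multi-valued field.

    With allow_overlap, a document joins one group per value. Without it,
    this sketch uses only the first value (an assumption of the model).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for n, doc in enumerate(docs):
        values = doc[field] if allow_overlap else doc[field][:1]
        for value in values:
            groups[value].append(n)
    return dict(groups)
```

With overlap enabled, document 1 (tags blue and red) shows up in both the “blue” and “red” groups.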
class whoosh.sorting.QueryFacet(querydict, other=None, allow_overlap=False, maptype=None)
Sorts/facets based on the results of a series of queries.
Parameters:
- querydict – a dictionary mapping keys to whoosh.query.Query objects.
- other – the key to use for documents that don’t match any of the queries.
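The role of other is easiest to see in a small model. In this plain-Python sketch, predicates stand in for whoosh.query.Query objects and query_keys is a hypothetical helper. Without overlap, the first matching key in dict order wins; that tie-breaking rule is an assumption of the sketch, not documented Whoosh behaviour.

```python
from typing import Any, Callable, Dict, List

# A predicate over a document dict stands in for a Query object.
Predicate = Callable[[dict], bool]


def query_keys(doc: dict, querydict: Dict[Any, Predicate],
               other: Any = None, allow_overlap: bool = False) -> List[Any]:
    """Return the facet key(s) a document falls under.

    Documents matching no predicate get the `other` key. Without overlap,
    only the first matching key (in dict order) is kept: an assumption.
    """
    matched = [key for key, pred in querydict.items() if pred(doc)]
    if not matched:
        return [other]
    return matched if allow_overlap else matched[:1]
```

For example, with keys “a-m” and “n-z” testing the first letter of a name, a document named "42" matches neither and lands under whatever other names.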
class whoosh.sorting.RangeFacet(fieldname, start, end, gap, hardend=False, maptype=None)
Sorts/facets based on numeric ranges. For textual ranges, use QueryFacet.
For example, to facet the “price” field into $100 buckets, up to $1000:
    from whoosh.sorting import RangeFacet

    prices = RangeFacet("price", 0, 1000, 100)
    results = searcher.search(myquery, groupedby=prices)
The ranges/buckets are always inclusive at the start and exclusive at the end.
Parameters:
- fieldname – the numeric field to sort/facet on.
- start – the start of the entire range.
- end – the end of the entire range.
- gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[1,5,10] will use 1 as the size of the first bucket, 5 as the size of the second bucket, and 10 as the size of all subsequent buckets.
- hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.
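You can work out the bucket boundaries that follow from these parameters with a small pure-Python sketch. range_buckets is a hypothetical helper written for illustration. It follows the rules stated above (sequence gaps, the last gap size repeating, hardend clamping) but is not Whoosh's own code.

```python
from typing import List, Sequence, Tuple, Union

Number = Union[int, float]


def range_buckets(start: Number, end: Number,
                  gap: Union[Number, Sequence[Number]],
                  hardend: bool = False) -> List[Tuple[Number, Number]]:
    """Compute (bucket_start, bucket_end) pairs as described for RangeFacet.

    A sequence gap uses its sizes in order, then repeats the last size.
    Each bucket is inclusive at the start and exclusive at the end.
    """
    gaps = list(gap) if isinstance(gap, (list, tuple)) else [gap]
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gap sizes must be positive")
    buckets = []
    bucket_end = start
    i = 0
    while bucket_end < end:
        size = gaps[min(i, len(gaps) - 1)]  # reuse the last size forever
        i += 1
        bucket_start, bucket_end = bucket_end, bucket_end + size
        if hardend:
            bucket_end = min(bucket_end, end)  # clamp the final bucket
        buckets.append((bucket_start, bucket_end))
    return buckets
```

For example, range_buckets(0, 25, [1, 5, 10]) gives [(0, 1), (1, 6), (6, 16), (16, 26)]. With hardend=True the last bucket becomes (16, 25).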
class whoosh.sorting.DateRangeFacet(fieldname, start, end, gap, hardend=False, maptype=None)
Sorts/facets based on date ranges. This is the same as RangeFacet, except that you are expected to use datetime objects as the start and end of the range, and timedelta or relativedelta objects as the gap(s), and it generates DateRange queries instead of NumericRange queries.
For example, to facet a “birthday” range into 5-year buckets:
    from datetime import datetime
    from whoosh.sorting import DateRangeFacet
    from whoosh.support.relativedelta import relativedelta

    startdate = datetime(1920, 1, 1)  # month and day are 1-based
    enddate = datetime.now()
    gap = relativedelta(years=5)
    bdays = DateRangeFacet("birthday", startdate, enddate, gap)
    results = searcher.search(myquery, groupedby=bdays)
The ranges/buckets are always inclusive at the start and exclusive at the end.
Parameters:
- fieldname – the date field to sort/facet on.
- start – the start of the entire range.
- end – the end of the entire range.
- gap – the size of each “bucket” in the range. This can be a sequence of sizes. For example, gap=[timedelta(days=1), timedelta(days=5), timedelta(days=10)] will use 1 day as the size of the first bucket, 5 days as the size of the second bucket, and 10 days as the size of all subsequent buckets.
- hardend – if True, the end of the last bucket is clamped to the value of end. If False (the default), the last bucket is always gap sized, even if that means the end of the last bucket is after end.
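Because the buckets are inclusive at the start and exclusive at the end, each date falls into at most one bucket. The following standard-library sketch builds fixed timedelta buckets and uses bisect to find the bucket a given datetime belongs to. date_buckets and bucket_for are hypothetical helpers; sequence gaps and relativedelta are omitted for brevity.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Bucket = Tuple[datetime, datetime]


def date_buckets(start: datetime, end: datetime, gap: timedelta,
                 hardend: bool = False) -> List[Bucket]:
    """Fixed-size date buckets covering [start, end)."""
    if gap <= timedelta(0):
        raise ValueError("gap must be positive")
    buckets = []
    cursor = start
    while cursor < end:
        stop = cursor + gap
        if hardend:
            stop = min(stop, end)  # clamp the final bucket to end
        buckets.append((cursor, stop))
        cursor = stop
    return buckets


def bucket_for(value: datetime, buckets: List[Bucket]) -> Optional[Bucket]:
    """Return the bucket with start <= value < end, or None."""
    starts = [b[0] for b in buckets]
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and value < buckets[i][1]:
        return buckets[i]
    return None
```

A date exactly on a boundary, such as the start of the second bucket, belongs to the second bucket, not the first.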
class whoosh.sorting.ScoreFacet
Uses a document’s score as a sorting criterion.
For example, to sort by the tag field, and then within that by relative score:
    from whoosh.sorting import MultiFacet, ScoreFacet

    tag_score = MultiFacet(["tag", ScoreFacet()])
    results = searcher.search(myquery, sortedby=tag_score)
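The ordering produced by MultiFacet(["tag", ScoreFacet()]) can be pictured as a compound sort key. This plain-Python sketch uses a hypothetical Hit record. It assumes that, within a tag, higher-scoring documents should come first, as in ordinary relevance ranking.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit: document number, tag value and score."""
    docnum: int
    tag: str
    score: float


def tag_then_score(hits: List[Hit]) -> List[int]:
    """Order hits by tag, then by descending score within each tag.

    Negating the score makes higher scores sort first under an ascending
    sort (an assumption about the intended "relative score" order).
    """
    return [h.docnum for h in sorted(hits, key=lambda h: (h.tag, -h.score))]
```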
class whoosh.sorting.FunctionFacet(fn, maptype=None)
This facet type is low-level. In most cases you should use TranslateFacet instead.
This facet type lets you pass an arbitrary function that computes the key. This may be easier than subclassing FacetType and Categorizer to set up the desired behavior.
The function is called with the arguments (searcher, docid), where the searcher may be a composite searcher, and the docid is an absolute index document number (not segment-relative).
For example, to use the number of words in the document’s “content” field as the sorting/faceting key:
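    from whoosh.sorting import FunctionFacet

    def content_length(searcher, docid):
        # Number of words in the document's "content" field
        return searcher.doc_field_length(docid, "content")

    lengths = FunctionFacet(content_length)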
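The key function only has to accept (searcher, docid). To try such a function without an index, the sketch below feeds it a hypothetical StubSearcher that imitates a doc_field_length(docid, fieldname) lookup. sort_with is likewise a made-up helper that stands in for the collector.

```python
from typing import Callable, Dict, List


class StubSearcher:
    """Hypothetical stand-in exposing doc_field_length(docid, fieldname)."""

    def __init__(self, lengths: Dict[int, int]):
        self._lengths = lengths  # absolute docid -> length of "content"

    def doc_field_length(self, docid: int, fieldname: str, default: int = 0) -> int:
        if fieldname != "content":
            return default
        return self._lengths.get(docid, default)


def content_length(searcher, docid: int) -> int:
    # The key function: words in the "content" field of an absolute docid.
    return searcher.doc_field_length(docid, "content")


def sort_with(fn: Callable[[StubSearcher, int], int],
              searcher: StubSearcher, docids: List[int]) -> List[int]:
    """Sort absolute docids by the key the function computes."""
    return sorted(docids, key=lambda docid: fn(searcher, docid))
```

With lengths {0: 12, 1: 3, 2: 7}, sorting docids [0, 1, 2] by content_length gives [1, 2, 0].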
class whoosh.sorting.MultiFacet(items=None, maptype=None)
Sorts/facets by the combination of multiple “sub-facets”.
For example, to sort by the value of the “tag” field, and then (for documents where the tag is the same) by the value of the “path” field:
    from whoosh.sorting import FieldFacet, MultiFacet

    facet = MultiFacet([FieldFacet("tag"), FieldFacet("path")])
    results = searcher.search(myquery, sortedby=facet)
As a shortcut, you can pass strings, which are treated as field names and turned into FieldFacet objects:
    facet = MultiFacet(["tag", "path"])
You can also use the add_* methods to add criteria to the multifacet:
    from whoosh.query import TermRange
    from whoosh.sorting import MultiFacet

    facet = MultiFacet()
    facet.add_field("tag")
    facet.add_field("path", reverse=True)
    facet.add_query({"a-m": TermRange("name", "a", "m"),
                     "n-z": TermRange("name", "n", "z")})
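When criteria run in different directions, as with add_field("path", reverse=True) above, a single tuple key cannot express the ordering for strings. This plain-Python sketch uses a hypothetical multi_sort helper and dict documents to show one standard way to get that ordering: repeated stable sorts. It illustrates only the resulting order, not how Whoosh implements MultiFacet.

```python
from typing import Dict, List, Tuple


def multi_sort(docs: List[Dict[str, str]],
               criteria: List[Tuple[str, bool]]) -> List[int]:
    """Sort docnums by several (field, reverse) criteria, primary first.

    Python's sort is stable (also with reverse=True), so sorting by the
    least significant criterion first and the most significant last gives
    a combined ordering in which each criterion keeps its own direction.
    """
    order = list(range(len(docs)))
    for field, reverse in reversed(criteria):
        order.sort(key=lambda n: docs[n][field], reverse=reverse)
    return order
```

For documents tagged b, a, a, b with paths /1, /1, /2, /3, sorting by tag ascending and then path descending yields [2, 1, 3, 0].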
class whoosh.sorting.StoredFieldFacet(fieldname, allow_overlap=False, split_fn=None, maptype=None)
Lets you sort/group using the value in an unindexed, stored field (e.g. whoosh.fields.STORED). This is usually slower than using an indexed field.
For fields where the stored value is a space-separated list of keywords (e.g. "tag1 tag2 tag3"), you can use the allow_overlap keyword argument to allow overlapped faceting on the result of calling the split() method on the field value (or calling a custom split function if one is supplied).
Parameters:
- fieldname – the name of the stored field.
- allow_overlap – if True, when grouping, allow documents to appear in multiple groups when they have multiple terms in the field. The categorizer uses str.split() or the custom split_fn to convert the stored value into a list of facet values.
- split_fn – a custom function to split a stored field value into multiple facet values when allow_overlap is True. If not supplied, the categorizer simply calls the value’s split() method.
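The splitting rules can be modelled in a few lines. In this sketch, stored_groups is a hypothetical helper over a list of stored string values (indexed by docnum). Using the whole stored value as the key when overlap is off is an assumption of the model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def stored_groups(values: List[str], allow_overlap: bool = False,
                  split_fn: Optional[Callable[[str], List[str]]] = None
                  ) -> Dict[str, List[int]]:
    """Group docnums by a stored string value.

    With allow_overlap, the value is split (str.split() unless split_fn is
    given) and the document joins one group per piece. Otherwise the whole
    stored value is used as the group key (assumed here).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for docnum, value in enumerate(values):
        if allow_overlap:
            keys = split_fn(value) if split_fn else value.split()
        else:
            keys = [value]
        for key in keys:
            groups[key].append(docnum)
    return dict(groups)
```

A comma-separated field can be handled by passing split_fn=lambda v: v.split(",").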
Facets object
class whoosh.sorting.Facets(x=None)
Maps facet names to FacetType objects, for creating multiple groupings of documents.
For example, to group by tag, and also group by price range:
    from whoosh.sorting import Facets, RangeFacet

    facets = Facets()
    facets.add_field("tag")
    facets.add_facet("price", RangeFacet("price", 0, 1000, 100))
    results = searcher.search(myquery, groupedby=facets)
    tag_groups = results.groups("tag")
    price_groups = results.groups("price")
(To group by the combination of multiple facets, use MultiFacet.)
add_facets(facets, replace=True)
Adds the contents of the given Facets or dict object to this object.
add_field(fieldname, **kwargs)
Adds a FieldFacet for the given field name (the field name is automatically used as the facet name).
add_query(name, querydict, **kwargs)
Adds a QueryFacet under the given name.
Parameters:
- name – a name for the facet.
- querydict – a dictionary mapping keys to whoosh.query.Query objects.
items()
Returns a list of (facetname, facetobject) tuples for the facets in this object.
names()
Returns an iterator of the facet names in this object.
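Each facet in a Facets object produces its own independent grouping. That is the difference from MultiFacet, which groups by the combination. The sketch below models this with a hypothetical group_all function over dict documents. The price key function imitates $100 buckets by integer division and is an illustrative assumption, not RangeFacet itself.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

KeyFn = Callable[[dict], Any]


def group_all(docs: List[dict],
              facets: Dict[str, KeyFn]) -> Dict[str, Dict[Any, List[int]]]:
    """Apply each named key function independently, like a Facets object.

    Returns {facet name: {group key: [docnums]}}, so each facet's groups
    can be looked up by name, much as results.groups(name) is used above.
    """
    out: Dict[str, Dict[Any, List[int]]] = {}
    for name, key_fn in facets.items():
        groups: Dict[Any, List[int]] = defaultdict(list)
        for docnum, doc in enumerate(docs):
            groups[key_fn(doc)].append(docnum)
        out[name] = dict(groups)
    return out
```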
FacetMap objects
class whoosh.sorting.FacetMap
Base class for objects holding the results of grouping search results by a facet. Use an object’s as_dict() method to access the results.
You can pass a subclass of this to the maptype keyword argument when creating a FacetType object to specify what information the facet should record about the group. For example:
    from whoosh.sorting import Count, FieldFacet, OrderedList

    # Record each document in each group in its sorted order
    myfacet = FieldFacet("size", maptype=OrderedList)
    # Record only the count of documents in each group
    myfacet = FieldFacet("size", maptype=Count)
add(groupname, docid, sortkey)
Adds a document to the facet results.
Parameters:
- groupname – the name of the group to add this document to.
- docid – the document number of the document to add.
- sortkey – a value representing the sort position of the document in the full results.
as_dict()
Returns a dictionary object mapping group names to implementation-specific values. For example, the value might be a list of document numbers, or an integer representing the number of documents in the group.
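Here is a sketch of a custom map that records the range of sort keys in each group, illustrating the add/as_dict contract. It derives from a hypothetical FacetMapLike base so it runs without Whoosh. A real subclass would derive from whoosh.sorting.FacetMap instead and be passed as maptype. SortKeyRange is an illustrative name, not a Whoosh class.

```python
from typing import Any, Dict, Tuple


class FacetMapLike:
    """Hypothetical stand-in for the FacetMap interface (add / as_dict)."""

    def add(self, groupname: Any, docid: int, sortkey: Any) -> None:
        raise NotImplementedError

    def as_dict(self) -> Dict[Any, Any]:
        raise NotImplementedError


class SortKeyRange(FacetMapLike):
    """Records the smallest and largest sort key seen in each group."""

    def __init__(self) -> None:
        self._ranges: Dict[Any, Tuple[Any, Any]] = {}

    def add(self, groupname: Any, docid: int, sortkey: Any) -> None:
        low, high = self._ranges.get(groupname, (sortkey, sortkey))
        self._ranges[groupname] = (min(low, sortkey), max(high, sortkey))

    def as_dict(self) -> Dict[Any, Tuple[Any, Any]]:
        return dict(self._ranges)
```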
class whoosh.sorting.OrderedList
Stores a list of document numbers for each group, in the same order as they appear in the search results.
The as_dict method returns a dictionary mapping group names to lists of document numbers.
class whoosh.sorting.UnorderedList
Stores a list of document numbers for each group, in arbitrary order. This is slightly faster and uses less memory than OrderedList if you don’t care about the ordering of the documents within groups.
The as_dict method returns a dictionary mapping group names to lists of document numbers.
class whoosh.sorting.Count
Stores the number of documents in each group.
The as_dict method returns a dictionary mapping group names to integers.
class whoosh.sorting.Best
Stores the “best” document in each group (that is, the one with the highest sort key).
The as_dict method returns a dictionary mapping group names to document numbers.
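The difference between these maps shows up most clearly when you feed the same add() calls to each. The sketch below uses hypothetical model classes rather than the Whoosh ones. OrderedListModel keeps each (sortkey, docid) pair and sorts on demand, so it does not depend on add() being called in results order; that design is an assumption of the sketch.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class OrderedListModel:
    """Model of OrderedList: document numbers per group, in sort-key order."""

    def __init__(self) -> None:
        self._items: Dict[Any, List[Tuple[Any, int]]] = defaultdict(list)

    def add(self, groupname: Any, docid: int, sortkey: Any) -> None:
        # Keep the sort key so the results order can be rebuilt later.
        self._items[groupname].append((sortkey, docid))

    def as_dict(self) -> Dict[Any, List[int]]:
        return {name: [docid for _, docid in sorted(pairs)]
                for name, pairs in self._items.items()}


class CountModel:
    """Model of Count: only the number of documents per group."""

    def __init__(self) -> None:
        self._counts: Dict[Any, int] = defaultdict(int)

    def add(self, groupname: Any, docid: int, sortkey: Any) -> None:
        self._counts[groupname] += 1

    def as_dict(self) -> Dict[Any, int]:
        return dict(self._counts)
```

Given the calls ("x", 5, 0.3), ("x", 2, 0.1) and ("y", 7, 0.2), the ordered model reports {"x": [2, 5], "y": [7]} while the count model reports {"x": 2, "y": 1}.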