columns module
The API and implementation of columns may change in the next version of Whoosh!
This module contains “Column” objects which you can use as the argument to a Field object’s sortable= keyword argument. Each field defines a default column type for when the user specifies sortable=True (the object returned by the field’s default_column() method).
The default column type for most fields is VarBytesColumn, although numeric and date fields use NumericColumn. Expert users may use other column types that may be faster or more storage efficient based on the field contents. For example, if a field always contains one of a limited number of possible values, a RefBytesColumn will save space by only storing the values once. If a field’s values are always a fixed length, the FixedBytesColumn saves space by not storing the length of each value.
A Column object exists mainly to store configuration information, and provides two important methods: writer() to return a ColumnWriter object and reader() to return a ColumnReader object.
Base classes
class whoosh.columns.Column
Represents a “column” of rows mapping docnums to document values.
The interface requires that you store the start offset of the column, the length of the column data, and the number of documents (rows) separately, and pass them to the reader object.
default_value(reverse=False)
Returns the default value for this column type.
reader(dbfile, basepos, length, doccount)
Returns a ColumnReader object you can use to read a column of this type from disk.
Parameters:
- dbfile – the StructFile to read from.
- basepos – the offset within the file at which the column starts.
- length – the length in bytes that the column occupies in the file.
- doccount – the number of rows (documents) in the column.
stores_lists()
Returns True if the column stores a list of values for each document instead of a single value.
writer(dbfile)
Returns a ColumnWriter object you can use to create a column of this type on disk.
Parameters:
- dbfile – the StructFile to write to.
class whoosh.columns.ColumnWriter(dbfile)
class whoosh.columns.ColumnReader(dbfile, basepos, length, doccount)
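The division of labour described for the base classes (a column object that holds configuration, a writer that appends rows, and a reader that needs the start offset, byte length, and row count supplied by the caller) can be sketched with standard-library code only. Everything below is hypothetical: the `ToyFixed*` classes, the `add` method, and the use of `io.BytesIO` in place of a StructFile are assumptions made for illustration, not Whoosh's classes or on-disk format.

```python
import io


class ToyFixedWriter:
    """Hypothetical writer: appends one fixed-length value per document."""

    def __init__(self, dbfile, fixedlen, default):
        self.dbfile = dbfile
        self.fixedlen = fixedlen
        self.default = default
        self.count = 0  # number of rows written so far

    def add(self, docnum, value):
        # Fill any skipped document numbers with the default value.
        while self.count < docnum:
            self.dbfile.write(self.default)
            self.count += 1
        assert len(value) == self.fixedlen
        self.dbfile.write(value)
        self.count += 1


class ToyFixedReader:
    """Hypothetical reader: relies on basepos, length, doccount from the caller."""

    def __init__(self, dbfile, basepos, length, doccount, fixedlen):
        self.dbfile = dbfile
        self.basepos = basepos
        self.length = length
        self.doccount = doccount
        self.fixedlen = fixedlen

    def __getitem__(self, docnum):
        if not 0 <= docnum < self.doccount:
            raise IndexError(docnum)
        # Fixed-length rows allow a direct seek to the requested document.
        self.dbfile.seek(self.basepos + docnum * self.fixedlen)
        return self.dbfile.read(self.fixedlen)


class ToyFixedColumn:
    """Holds configuration only; hands out writer and reader objects."""

    def __init__(self, fixedlen):
        self.fixedlen = fixedlen
        self.default = b"\x00" * fixedlen

    def writer(self, dbfile):
        return ToyFixedWriter(dbfile, self.fixedlen, self.default)

    def reader(self, dbfile, basepos, length, doccount):
        return ToyFixedReader(dbfile, basepos, length, doccount, self.fixedlen)


dbfile = io.BytesIO()
dbfile.write(b"HEADER")            # unrelated data stored before the column
column = ToyFixedColumn(fixedlen=2)

basepos = dbfile.tell()            # the caller records where the column starts
w = column.writer(dbfile)
w.add(0, b"aa")
w.add(2, b"cc")                    # document 1 never received a value
doccount = w.count
length = dbfile.tell() - basepos   # and how many bytes the column occupies

r = column.reader(dbfile, basepos, length, doccount)
values = [r[i] for i in range(doccount)]
```

The caller, not the column, is responsible for remembering `basepos`, `length`, and `doccount` between writing and reading, which mirrors the requirement stated for the Column interface.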
Basic columns
class whoosh.columns.VarBytesColumn(allow_offsets=True, write_offsets_cutoff=32768)
Stores variable length byte strings. See also RefBytesColumn.
The current implementation limits the total length of all document values in a segment to 2 GB.
The default value (the value returned for a document that didn’t have a value assigned to it at indexing time) is an empty bytestring (b'').
Parameters:
- allow_offsets – Whether the column should write offsets when there are many rows in the column (this makes opening the column much faster). This argument is mostly for testing.
- write_offsets_cutoff – Write offsets (for speed) when there are more than this many rows in the column. This argument is mostly for testing.
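To see why a table of offsets helps with variable-length values, consider the following sketch. It concatenates the per-document byte strings and keeps each row's end offset, so any row can be sliced out directly instead of walking the lengths of all preceding rows. The function names and the in-memory layout are assumptions for illustration and do not reflect Whoosh's actual file format.

```python
from itertools import accumulate


def build_var_column(values_by_docnum, doccount):
    """Concatenate per-document byte strings and record each row's end offset."""
    # Documents without a value get the empty bytestring, as described above.
    rows = [values_by_docnum.get(i, b"") for i in range(doccount)]
    data = b"".join(rows)
    ends = list(accumulate(len(v) for v in rows))
    return data, ends


def read_row(data, ends, docnum):
    """Slice one row out of the concatenated data using the offsets."""
    start = ends[docnum - 1] if docnum > 0 else 0
    return data[start:ends[docnum]]


data, ends = build_var_column({0: b"apple", 2: b"fig"}, doccount=3)
missing = read_row(data, ends, 1)   # document 1 had no value
last = read_row(data, ends, 2)
```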
class whoosh.columns.FixedBytesColumn(fixedlen, default=None)
Stores fixed-length byte strings.
Parameters:
- fixedlen – the fixed length of byte strings in this column.
- default – the default value to use for documents that don’t specify a value. If you don’t specify a default, the column will use b'\x00' * fixedlen.
class whoosh.columns.RefBytesColumn(fixedlen=0, default=None)
Stores variable-length or fixed-length byte strings, similar to VarBytesColumn and FixedBytesColumn. However, where those columns store a value for each document, this column keeps a list of all the unique values in the field, and for each document stores a short pointer into the unique list. For fields where the number of possible values is smaller than the number of documents (for example, “category” or “chapter”), this saves significant space.
This column type supports a maximum of 65535 unique values across all documents in a segment. You should generally use this column type where the number of unique values is in no danger of approaching that number (for example, a “tags” field). If you try to index too many unique values, the column will convert additional unique values to the default value and issue a warning using the warnings module (this will usually be preferable to crashing the indexer and potentially losing indexed documents).
Parameters:
- fixedlen – an optional fixed length for the values. If you specify a number other than 0, the column will require all values to be the specified length.
- default – a default value to use for documents that don’t specify one. If you don’t specify a default, the column will use an empty bytestring (b''), or if you specify a fixed length, b'\x00' * fixedlen.
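The "unique list plus short pointer" idea can be illustrated as follows. This is a minimal sketch under stated assumptions: pointers are packed as unsigned 16-bit integers (which matches the 65535 limit mentioned above but is not confirmed by the text), slot 0 is reserved for the default value, and `encode_refs` and `decode_ref` are hypothetical helper names.

```python
import struct
import warnings

MAX_UNIQUE = 65535  # limit stated in the description above


def encode_refs(values, default=b"", max_unique=MAX_UNIQUE):
    """Return (unique_values, packed_pointers) for a list of byte strings."""
    uniques = [default]      # slot 0 holds the default (an assumption)
    index = {default: 0}
    refs = []
    for v in values:
        if v not in index:
            if len(uniques) >= max_unique:
                # Too many unique values: fall back to the default and warn.
                warnings.warn("too many unique values; storing default instead")
                v = default
            else:
                index[v] = len(uniques)
                uniques.append(v)
        refs.append(index[v])
    # One 2-byte pointer per document, regardless of the value's length.
    packed = struct.pack("<%dH" % len(refs), *refs)
    return uniques, packed


def decode_ref(uniques, packed, docnum):
    """Look up a document's value through its 2-byte pointer."""
    (ref,) = struct.unpack_from("<H", packed, docnum * 2)
    return uniques[ref]


values = [b"fiction", b"travel", b"fiction", b"fiction", b"travel"]
uniques, packed = encode_refs(values)
decoded = [decode_ref(uniques, packed, i) for i in range(len(values))]
```

Here five documents cost ten bytes of pointers plus one copy of each distinct value, rather than one full copy of the value per document.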
class whoosh.columns.NumericColumn(typecode, default=0)
Stores numbers (integers and floats) as compact binary.
Parameters:
- typecode – a typecode character (as used by the struct module) specifying the number type. For example, "i" for signed integers.
- default – the default value to use for documents that don’t specify one.
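As an illustration of what a struct typecode implies for storage, the sketch below packs a list of numbers with a given typecode and reads one back by document number. The helper names are hypothetical, and the little-endian `<` prefix is an assumption chosen to get platform-independent sizes; it says nothing about the byte order Whoosh uses.

```python
import struct


def pack_numbers(typecode, numbers):
    """Pack all numbers back to back using a single struct typecode."""
    fmt = "<%d%s" % (len(numbers), typecode)
    return struct.pack(fmt, *numbers)


def read_number(typecode, data, docnum):
    """Read the number for one document by computing its byte offset."""
    size = struct.calcsize("<" + typecode)
    (value,) = struct.unpack_from("<" + typecode, data, docnum * size)
    return value


data = pack_numbers("i", [10, -3, 70000])   # "i": signed 4-byte integers
third = read_number("i", data, 2)
```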
Technical columns
class whoosh.columns.BitColumn(compress_at=2048)
Stores a column of True/False values compactly.
Parameters:
- compress_at – columns with this number of values or fewer will be saved compressed on disk, and loaded into RAM for reading. Set this to 0 to disable compression.
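One common way to store True/False values compactly is a bit array, with one bit per document. The sketch below shows that general technique only; whether Whoosh lays out its bits this way is an assumption, and `pack_bits` and `get_bit` are hypothetical names.

```python
def pack_bits(flags):
    """Pack a list of booleans into bytes, one bit per document."""
    out = bytearray((len(flags) + 7) // 8)
    for docnum, flag in enumerate(flags):
        if flag:
            out[docnum >> 3] |= 1 << (docnum & 7)
    return bytes(out)


def get_bit(data, docnum):
    """Return the boolean stored for one document."""
    return bool(data[docnum >> 3] & (1 << (docnum & 7)))


flags = [True, False, True, True, False, False, False, False, True]
data = pack_bits(flags)   # nine values fit in two bytes
restored = [get_bit(data, i) for i in range(len(flags))]
```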
class whoosh.columns.CompressedBytesColumn(level=3, module='zlib')
Stores variable-length byte strings compressed using deflate (by default).
Parameters:
- level – the compression level to use.
- module – a string containing the name of the compression module to use. The default is “zlib”. The module should export “compress” and “decompress” functions.
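The requirement that the named module export "compress" and "decompress" functions suggests a lookup by module name. The following sketch shows one way such a lookup could work with the standard library; `make_codec` is a hypothetical helper, and passing the level as the second positional argument is an assumption that holds for `zlib` and `bz2` but not necessarily for every module.

```python
import importlib


def make_codec(module="zlib", level=3):
    """Return (compress, decompress) callables from the named module."""
    mod = importlib.import_module(module)

    def compress(data):
        # zlib.compress and bz2.compress both accept the level positionally.
        return mod.compress(data, level)

    return compress, mod.decompress


compress, decompress = make_codec()
original = b"whoosh " * 100
stored = compress(original)
roundtrip = decompress(stored)
```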
class whoosh.columns.StructColumn(spec, default)
Parameters:
- spec – a struct module format string describing the fixed-length values stored in this column.
- default – the default value to use for documents that don’t specify a value.
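Assuming spec is a format string as used by the struct module (an assumption based on the parameter name and on the typecode parameter of NumericColumn), each row would be a fixed-size packed record. The sketch below shows what that looks like with the standard library; the format `"<Hf"` and the sample rows are made up for illustration.

```python
import struct

spec = "<Hf"                      # hypothetical: unsigned short + 32-bit float
record = struct.Struct(spec)

rows = [(7, 0.5), (9, 2.0)]
data = b"".join(record.pack(*row) for row in rows)

# Every record has the same size, so a row is found by multiplication.
second = record.unpack_from(data, 1 * record.size)
```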
class whoosh.columns.PickleColumn(child)
Converts arbitrary objects to pickled bytestrings and stores them using the wrapped column (usually a VarBytesColumn or CompressedBytesColumn).
If you can express the value you want to store as a number or bytestring, you should use the appropriate column type to avoid the time and size overhead of pickling and unpickling.
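The wrapping pattern and the size overhead mentioned above can both be seen in a short standard-library sketch. `PickledStore` is a hypothetical stand-in that pickles values into a child store of byte strings (a plain list here); it is not Whoosh's PickleColumn.

```python
import pickle
import struct


class PickledStore:
    """Hypothetical wrapper: pickles values into a child byte-string store."""

    def __init__(self, child):
        self.child = child            # any list-like store of byte strings

    def add(self, value):
        self.child.append(pickle.dumps(value))

    def __getitem__(self, docnum):
        return pickle.loads(self.child[docnum])


store = PickledStore(child=[])
value = {"tags": ["a", "b"], "rank": 3}
store.add(value)
restored = store[0]

# A plain number takes more bytes pickled than packed directly.
packed_number = struct.pack("<i", 70000)
pickled_number = pickle.dumps(70000)
```

Comparing `len(packed_number)` with `len(pickled_number)` shows why a numeric or bytes column is preferable whenever the value fits one.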