columns module

The API and implementation of columns may change in the next version of Whoosh!

This module contains “Column” objects which you can use as the argument to a Field object’s sortable= keyword argument. Each field defines a default column type for when the user specifies sortable=True (the object returned by the field’s default_column() method).

The default column type for most fields is VarBytesColumn, although numeric and date fields use NumericColumn. Expert users may use other column types that may be faster or more storage efficient based on the field contents. For example, if a field always contains one of a limited number of possible values, a RefBytesColumn will save space by only storing the values once. If a field’s values are always a fixed length, the FixedBytesColumn saves space by not storing the length of each value.

A Column object basically exists to store configuration information and provides two important methods: writer() to return a ColumnWriter object and reader() to return a ColumnReader object.

Base classes

class whoosh.columns.Column

Represents a “column” of rows mapping docnums to document values.

The interface requires that you store the start offset of the column, the length of the column data, and the number of documents (rows) separately, and pass them to the reader object.
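
The bookkeeping described above can be sketched with a toy column over an in-memory file. This is a minimal illustration, not Whoosh code: ToyColumn, ToyWriter, ToyReader and the add() method are hypothetical names, and io.BytesIO stands in for a StructFile. The point is that the caller, not the column, records the start offset, byte length and row count and hands them back to reader().

```python
import io


class ToyColumn:
    """Hypothetical stand-in for a Column: it only holds configuration."""

    def __init__(self, fixedlen):
        self.fixedlen = fixedlen

    def writer(self, dbfile):
        return ToyWriter(dbfile, self.fixedlen)

    def reader(self, dbfile, basepos, length, doccount):
        return ToyReader(dbfile, basepos, length, doccount, self.fixedlen)


class ToyWriter:
    def __init__(self, dbfile, fixedlen):
        self.dbfile = dbfile
        self.fixedlen = fixedlen

    def add(self, value):
        # Every row has the same size, so rows are simply appended.
        assert len(value) == self.fixedlen
        self.dbfile.write(value)


class ToyReader:
    def __init__(self, dbfile, basepos, length, doccount, fixedlen):
        self.dbfile = dbfile
        self.basepos = basepos
        self.length = length
        self.doccount = doccount
        self.fixedlen = fixedlen

    def __getitem__(self, docnum):
        if not 0 <= docnum < self.doccount:
            raise IndexError(docnum)
        self.dbfile.seek(self.basepos + docnum * self.fixedlen)
        return self.dbfile.read(self.fixedlen)


dbfile = io.BytesIO()
dbfile.write(b"HEADER")            # unrelated data that precedes the column
column = ToyColumn(fixedlen=2)

basepos = dbfile.tell()            # the caller records where the column starts
writer = column.writer(dbfile)
values = [b"aa", b"bb", b"cc"]
for v in values:
    writer.add(v)
length = dbfile.tell() - basepos   # the caller records the column's byte length
doccount = len(values)             # the caller records the number of rows

reader = column.reader(dbfile, basepos, length, doccount)
second = reader[1]
```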

default_value(reverse=False)

Returns the default value for this column type.

reader(dbfile, basepos, length, doccount)

Returns a ColumnReader object you can use to read a column of this type from disk.

Parameters:
  • dbfile – the StructFile to read from.
  • basepos – the offset within the file at which the column starts.
  • length – the length in bytes that the column occupies in the file.
  • doccount – the number of rows (documents) in the column.
stores_lists()

Returns True if the column stores a list of values for each document instead of a single value.

writer(dbfile)

Returns a ColumnWriter object you can use to create a column of this type on disk.

Parameters:
  • dbfile – the StructFile to write to.
class whoosh.columns.ColumnWriter(dbfile)
class whoosh.columns.ColumnReader(dbfile, basepos, length, doccount)

Basic columns

class whoosh.columns.VarBytesColumn(allow_offsets=True, write_offsets_cutoff=32768)

Stores variable length byte strings. See also RefBytesColumn.

The current implementation limits the total length of all document values in a segment to 2 GB.

The default value (the value returned for a document that didn’t have a value assigned to it at indexing time) is an empty bytestring (b'').

Parameters:
  • allow_offsets – Whether the column should write offsets when there are many rows in the column (this makes opening the column much faster). This argument is mostly for testing.
  • write_offsets_cutoff – Write offsets (for speed) when there are more than this many rows in the column. This argument is mostly for testing.
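
One plausible reading of why written offsets speed up opening a column is sketched here. It assumes a layout in which values are concatenated and their lengths are stored alongside them; this layout is an assumption for illustration and is not taken from the Whoosh source. Without stored offsets, a reader has to sum the lengths to find where each value starts. With stored offsets, that work was done once at write time.

```python
from itertools import accumulate

values = [b"apple", b"", b"kiwi", b"fig"]
data = b"".join(values)                 # all values back to back
lengths = [len(v) for v in values]

# Without stored offsets: derive each start position by summing lengths
# when the column is opened (cost grows with the number of rows).
derived_offsets = [0] + list(accumulate(lengths))[:-1]

# With stored offsets: the writer computes them once and saves them.
stored_offsets = []
pos = 0
for n in lengths:
    stored_offsets.append(pos)
    pos += n


def read(docnum, offsets):
    start = offsets[docnum]
    return data[start:start + lengths[docnum]]
```

Both offset lists are identical; the difference is only whether the reader or the writer pays for computing them.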
class whoosh.columns.FixedBytesColumn(fixedlen, default=None)

Stores fixed-length byte strings.

Parameters:
  • fixedlen – the fixed length of byte strings in this column.
  • default – the default value to use for documents that don’t specify a value. If you don’t specify a default, the column will use b'\x00' * fixedlen.
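
The space saving and the default value can be illustrated with plain byte strings. This is a sketch under assumptions, not the Whoosh on-disk format: the sparse input dictionary and the one-byte length prefix used for the variable-length comparison are both made up for the example.

```python
fixedlen = 4
default = b"\x00" * fixedlen            # the documented fallback default

# Hypothetical sparse input: docnum -> value. Document 1 has no value.
given = {0: b"abcd", 2: b"wxyz"}
doccount = 3
rows = [given.get(docnum, default) for docnum in range(doccount)]

# Fixed-length layout: no per-value length is needed.
fixed_data = b"".join(rows)

# Variable-length layout for comparison, assuming a one-byte length prefix.
var_data = b"".join(bytes([len(v)]) + v for v in rows)


def read_fixed(docnum):
    # The position of a row follows directly from its document number.
    start = docnum * fixedlen
    return fixed_data[start:start + fixedlen]
```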
class whoosh.columns.RefBytesColumn(fixedlen=0, default=None)

Stores variable-length or fixed-length byte strings, similar to VarBytesColumn and FixedBytesColumn. However, where those columns store a value for each document, this column keeps a list of all the unique values in the field, and for each document stores a short pointer into the unique list. For fields where the number of possible values is smaller than the number of documents (for example, “category” or “chapter”), this saves significant space.

This column type supports a maximum of 65535 unique values across all documents in a segment. You should generally use this column type where the number of unique values is in no danger of approaching that number (for example, a “tags” field). If you try to index too many unique values, the column will convert additional unique values to the default value and issue a warning using the warnings module (this will usually be preferable to crashing the indexer and potentially losing indexed documents).

Parameters:
  • fixedlen – an optional fixed length for the values. If you specify a number other than 0, the column will require all values to be the specified length.
  • default – a default value to use for documents that don’t specify one. If you don’t specify a default, the column will use an empty bytestring (b''), or if you specify a fixed length, b'\x00' * fixedlen.
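
The unique-list-plus-pointer idea, including the fallback to the default value once the limit is reached, can be sketched as follows. The encode() function is hypothetical, and keeping the default in slot 0 of the unique list is an assumption made for the example rather than a description of the real file format.

```python
import warnings

MAX_UNIQUE = 65535  # the documented per-segment limit on unique values


def encode(values, default=b"", max_unique=MAX_UNIQUE):
    """Return (unique_list, pointers) for a sequence of byte strings."""
    uniques = [default]           # assumption: slot 0 holds the default
    index = {default: 0}
    pointers = []
    for v in values:
        if v not in index:
            if len(uniques) >= max_unique:
                # Too many unique values: warn and store the default instead.
                warnings.warn("too many unique values; using the default")
                v = default
            else:
                index[v] = len(uniques)
                uniques.append(v)
        pointers.append(index[v])
    return uniques, pointers


docs = [b"fiction", b"poetry", b"fiction", b"fiction", b"poetry"]
uniques, pointers = encode(docs)
decoded = [uniques[p] for p in pointers]
```

Each distinct value is stored once, and every document costs only one small integer.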
class whoosh.columns.NumericColumn(typecode, default=0)

Stores numbers (integers and floats) as compact binary.

Parameters:
  • typecode – a typecode character (as used by the struct module) specifying the number type. For example, "i" for signed integers.
  • default – the default value to use for documents that don’t specify one.
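
To show what a struct typecode implies for storage, the sketch below packs a few integers with the "i" typecode mentioned above. The little-endian byte order ("<") is chosen only to get a platform-independent size; the byte order Whoosh actually uses is not stated here.

```python
import struct

typecode = "i"                      # signed integer, as in the text above
fmt = "<" + typecode                # byte order is an assumption
size = struct.calcsize(fmt)         # bytes used per value

values = [3, -7, 42]
packed = b"".join(struct.pack(fmt, v) for v in values)


def read(docnum):
    # Fixed-size values can be located directly from the document number.
    return struct.unpack_from(fmt, packed, docnum * size)[0]
```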

Technical columns

class whoosh.columns.BitColumn(compress_at=2048)

Stores a column of True/False values compactly.

Parameters:
  • compress_at – columns with this number of values or fewer will be saved compressed on disk, and loaded into RAM for reading. Set this to 0 to disable compression.
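
As a rough illustration of storing True/False values compactly, the sketch below packs eight flags into each byte. The functions pack_bits() and read_bit() are hypothetical, and the bit order is an assumption. It does not model the compress_at behaviour.

```python
def pack_bits(flags):
    """Pack a list of booleans into bytes, eight flags per byte."""
    out = bytearray((len(flags) + 7) // 8)
    for docnum, flag in enumerate(flags):
        if flag:
            out[docnum >> 3] |= 1 << (docnum & 7)
    return bytes(out)


def read_bit(data, docnum):
    """Return the flag stored for the given document number."""
    return bool(data[docnum >> 3] & (1 << (docnum & 7)))


flags = [True, False, True, True, False, False, False, False, True]
data = pack_bits(flags)
```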
class whoosh.columns.CompressedBytesColumn(level=3, module='zlib')

Stores variable-length byte strings compressed using deflate (by default).

Parameters:
  • level – the compression level to use.
  • module – a string containing the name of the compression module to use. The default is “zlib”. The module should export “compress” and “decompress” functions.
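
The module argument suggests that the compression module is looked up by name. A minimal sketch of that idea, using only the standard library, follows. The make_codec() helper is hypothetical, and passing the level as the second positional argument is an assumption that happens to fit zlib.compress.

```python
import importlib


def make_codec(module="zlib", level=3):
    """Return (compress, decompress) callables from a named module."""
    mod = importlib.import_module(module)

    def compress(data):
        return mod.compress(data, level)

    return compress, mod.decompress


compress, decompress = make_codec()
original = b"whoosh " * 100
stored = compress(original)
restored = decompress(stored)
```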
class whoosh.columns.StructColumn(spec, default)
Parameters:
  • spec – a format string (as used by the struct module) describing the fixed-size values stored in this column.
  • default – the default value to use for documents that don’t specify a value.
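
Going by the signature, spec is presumably a struct module format string, which would make every row the same size. That reading is an assumption, and the sketch below only shows what such a format string gives you with the standard library. The "<Hf" spec and the sample rows are made up for the example.

```python
import struct

spec = "<Hf"                        # hypothetical: unsigned short + float
record = struct.Struct(spec)        # record.size is the fixed row length

rows = [(1, 0.5), (2, 1.5), (3, 2.5)]
data = b"".join(record.pack(*row) for row in rows)


def read(docnum):
    # Each row occupies record.size bytes, so rows are addressable by docnum.
    return record.unpack_from(data, docnum * record.size)
```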
class whoosh.columns.PickleColumn(child)

Converts arbitrary objects to pickled bytestrings and stores them using the wrapped column (usually a VarBytesColumn or CompressedBytesColumn).

If you can express the value you want to store as a number or bytestring, you should use the appropriate column type to avoid the time and size overhead of pickling and unpickling.
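
The size overhead mentioned above is easy to see with the standard library alone. The sketch compares a pickled integer with the same integer packed by struct. The value and the "<i" format are arbitrary choices for the example.

```python
import pickle
import struct

value = 1234

pickled = pickle.dumps(value)        # general-purpose, self-describing
packed = struct.pack("<i", value)    # four bytes, type known in advance

# Pickling remains useful for values with no natural numeric or bytes form.
record = {"tags": ["a", "b"], "rank": 3}
roundtrip = pickle.loads(pickle.dumps(record))
```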

Experimental columns

class whoosh.columns.ClampedNumericColumn(child)

An experimental wrapper type for NumericColumn that clamps out-of-range values instead of raising an exception.
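
Clamping can be sketched for integer typecodes as follows. The clamp_for() helper is hypothetical and derives the range from the size of the struct typecode. How the real wrapper determines its limits is not described here.

```python
import struct


def clamp_for(typecode, value):
    """Clamp value to the range of an integer struct typecode."""
    bits = struct.calcsize("<" + typecode) * 8
    if typecode.isupper():            # unsigned codes such as "B", "H", "I"
        lo, hi = 0, (1 << bits) - 1
    else:                             # signed codes such as "b", "h", "i"
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


# Packing an out-of-range value directly raises struct.error;
# clamping first makes the value fit.
too_big = 100000
safe = clamp_for("h", too_big)
packed = struct.pack("<h", safe)
```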