support.bitvector module

An implementation of an object that acts like a collection of on/off bits.

Base classes

class whoosh.idsets.DocIdSet

Base class for a set of non-negative integers, implementing a subset of the built-in set type’s interface with extra docid-related methods.

This is a superclass for alternatives to the built-in set that are more memory-efficient and specialized for storing sorted lists of non-negative integers. Because they are pure Python, they will inevitably be slower than set for most operations.

after(i)

Returns the next integer in the set after i, or None.

before(i)

Returns the previous integer in the set before i, or None.

first()

Returns the first (lowest) integer in the set.

invert_update(size)

Updates the set in place so that it contains exactly the numbers in the range [0, size) that were not previously in the set.

last()

Returns the last (highest) integer in the set.
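The query methods above can be pinned down with a small reference model. The sketch below is hypothetical (the class name `SortedListIdSet` is invented and is not part of Whoosh). It stores IDs in a sorted Python list and uses `bisect`, assuming `after(i)` and `before(i)` are strict (they never return `i` itself). Behavior of `first()`/`last()` on an empty set is not specified above, so this sketch simply raises `IndexError` in that case.

```python
from bisect import bisect_left, bisect_right


class SortedListIdSet:
    """Hypothetical reference model of the DocIdSet query methods (not Whoosh code)."""

    def __init__(self, source=()):
        # Deduplicate and sort once so every query can use binary search.
        self._ids = sorted(set(source))

    def __iter__(self):
        return iter(self._ids)

    def first(self):
        # Lowest ID; this sketch raises IndexError if the set is empty.
        return self._ids[0]

    def last(self):
        # Highest ID; this sketch raises IndexError if the set is empty.
        return self._ids[-1]

    def after(self, i):
        # Smallest ID strictly greater than i, or None if there is none.
        pos = bisect_right(self._ids, i)
        return self._ids[pos] if pos < len(self._ids) else None

    def before(self, i):
        # Largest ID strictly less than i, or None if there is none.
        pos = bisect_left(self._ids, i)
        return self._ids[pos - 1] if pos > 0 else None

    def invert_update(self, size):
        # Replace the contents with every number in [0, size) that was NOT present.
        present = set(self._ids)
        self._ids = [n for n in range(size) if n not in present]
```

For example, with the IDs `[1, 10, 15, 7, 2]`, `after(7)` is 10 and `before(7)` is 2, and `invert_update(6)` leaves `[0, 3, 4, 5]`.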

class whoosh.idsets.BaseBitSet

Implementation classes

class whoosh.idsets.BitSet(source=None, size=0)

A DocIdSet backed by an array of bits. This can also be useful as a bit array (e.g. for a Bloom filter). It is much more memory efficient than a large built-in set of integers, but wastes memory for sparse sets.

Parameters:
  • size – the initial size of the bit array, in bits.
  • source – an iterable of non-negative integers to add to this set.
  • bits – an array of unsigned bytes (“B”) to use as the underlying bit array. This is used by some of the object’s methods.

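The memory trade-off follows from the layout: one bit per possible ID, so a universe of 1,000,000 IDs costs 1,000,000 / 8 = 125,000 bytes no matter how few IDs are actually present. The sketch below is hypothetical (`TinyBitSet` is an invented name, and Whoosh's own bit order within a byte is not stated here). It assumes the common layout where ID `n` lives in byte `n >> 3` at bit `n & 7`, and it only accepts non-negative IDs.

```python
class TinyBitSet:
    """Hypothetical bit-array set (not Whoosh's BitSet); IDs must be non-negative."""

    def __init__(self, source=(), size=0):
        # Preallocate enough bytes for `size` bits; add() grows the array on demand.
        self.bits = bytearray((size + 7) // 8)
        for n in source:
            self.add(n)

    def add(self, n):
        byte = n >> 3
        if byte >= len(self.bits):
            # Extend with zero bytes so that index `byte` exists.
            self.bits.extend(bytes(byte + 1 - len(self.bits)))
        self.bits[byte] |= 1 << (n & 7)

    def __contains__(self, n):
        byte = n >> 3
        return byte < len(self.bits) and bool(self.bits[byte] & (1 << (n & 7)))

    def __iter__(self):
        # Scanning bytes in order, low bit first, yields IDs in ascending order.
        for byte_index, value in enumerate(self.bits):
            for bit in range(8):
                if value & (1 << bit):
                    yield (byte_index << 3) | bit
```

Storing `[1, 10, 15, 7, 2]` needs only two bytes, and iteration returns the IDs sorted. A sparse set such as `{5_000_000}` would still allocate about 625,000 bytes, which is the waste the description above warns about.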
class whoosh.idsets.OnDiskBitSet(dbfile, basepos, bytecount)

A DocIdSet backed by an array of bits on disk.

>>> from whoosh.filedb.filestore import RamStorage
>>> from whoosh.idsets import BitSet, OnDiskBitSet
>>> st = RamStorage()
>>> f = st.create_file("test.bin")
>>> bs = BitSet([1, 10, 15, 7, 2])
>>> bytecount = bs.to_disk(f)
>>> f.close()
>>> # ...
>>> f = st.open_file("test.bin")
>>> odbs = OnDiskBitSet(f, 0, bytecount)
>>> list(odbs)
[1, 2, 7, 10, 15]

Parameters:
  • dbfile – a StructFile object to read from.
  • basepos – the position in dbfile at which the bit array’s bytes begin.
  • bytecount – the number of bytes to use for the bit array.

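The basepos argument exists because the bit array need not start at offset 0 of the file. The sketch below is hypothetical. It uses an in-memory `io.BytesIO` in place of a StructFile, and the helper names `pack_bits`, `write_bits`, `disk_contains` and `disk_iter` are invented. A membership test seeks to `basepos + (n >> 3)` and reads a single byte, so it never loads the whole array. The same bit layout as an in-memory bit array (bit `n & 7` of byte `n >> 3`) is assumed.

```python
import io


def pack_bits(ids):
    """Pack non-negative integers into a bytearray, one bit per ID."""
    ids = list(ids)
    bits = bytearray((max(ids) >> 3) + 1 if ids else 0)
    for n in ids:
        bits[n >> 3] |= 1 << (n & 7)
    return bits


def write_bits(f, bits):
    """Write the bit array at the file's current position; return its byte count."""
    f.write(bytes(bits))
    return len(bits)


def disk_contains(f, basepos, bytecount, n):
    """Test one ID by reading only the byte that holds its bit."""
    byte = n >> 3
    if n < 0 or byte >= bytecount:
        return False
    f.seek(basepos + byte)
    return bool(f.read(1)[0] & (1 << (n & 7)))


def disk_iter(f, basepos, bytecount):
    """Yield every stored ID in ascending order by reading the array once."""
    f.seek(basepos)
    for byte_index, value in enumerate(f.read(bytecount)):
        for bit in range(8):
            if value & (1 << bit):
                yield (byte_index << 3) | bit
```

Writing a three-byte header first (for example `f.write(b"HDR")`) and recording `basepos = f.tell()` before calling `write_bits` shows why the offset must be passed back in when reading.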
class whoosh.idsets.SortedIntSet(source=None, typecode='I')

A DocIdSet backed by a sorted array of integers.
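A sorted array costs memory in proportion to the number of IDs stored (typically 4 bytes each for typecode "I"), not to the largest ID, so it suits sparse sets where a bit array would waste space. The sketch below is hypothetical (`TinySortedIntSet` is an invented name, not Whoosh's implementation). It keeps a standard-library `array` sorted and uses `bisect` for lookups; insertion is O(log n) to find the slot plus O(n) to shift elements.

```python
from array import array
from bisect import bisect_left, insort


class TinySortedIntSet:
    """Hypothetical sorted-array set (not Whoosh's SortedIntSet)."""

    def __init__(self, source=(), typecode="I"):
        # Deduplicate and sort up front; the array stores raw C integers.
        self.data = array(typecode, sorted(set(source)))

    def __contains__(self, n):
        # Binary search for the leftmost slot where n could be.
        i = bisect_left(self.data, n)
        return i < len(self.data) and self.data[i] == n

    def add(self, n):
        # Skip duplicates, then insert at the position that keeps the array sorted.
        if n not in self:
            insort(self.data, n)

    def __iter__(self):
        return iter(self.data)
```

Starting from `[15, 1, 7]`, adding 3 and then 7 again leaves `[1, 3, 7, 15]`.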

class whoosh.idsets.MultiIdSet(idsets, offsets)

Wraps multiple serial (back-to-back) sub-DocIdSet objects and presents them as a single aggregated, read-only set.

Parameters:
  • idsets – a list of DocIdSet objects.
  • offsets – a list of offsets, one for each DocIdSet object in idsets, giving where that set’s IDs start within the aggregated set.
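One plausible way to combine serial sets with offsets is sketched below. It assumes (this is not confirmed above) that each sub-set holds IDs relative to its own offset and that the offsets are ascending. `TinyMultiIdSet` is an invented name, not Whoosh's MultiIdSet. A lookup binary-searches the offsets to find the owning sub-set, then subtracts that offset before testing membership.

```python
from bisect import bisect_right


class TinyMultiIdSet:
    """Hypothetical read-only union of serial sub-sets, each shifted by an offset."""

    def __init__(self, idsets, offsets):
        # Assumes offsets are ascending and each sub-set's local IDs fit before the next offset.
        self.idsets = idsets
        self.offsets = offsets

    def __contains__(self, n):
        # Index of the last sub-set whose offset is <= n (or -1 if none).
        k = bisect_right(self.offsets, n) - 1
        return k >= 0 and (n - self.offsets[k]) in self.idsets[k]

    def __iter__(self):
        # Serial sub-sets yield global IDs in ascending order.
        for idset, offset in zip(self.idsets, self.offsets):
            for local in sorted(idset):
                yield local + offset
```

With sub-sets `{0, 2}` and `{1, 3}` at offsets 0 and 10, the aggregated set is `[0, 2, 11, 13]`. Because the wrapper only maps IDs, it has no add or remove methods, matching the read-only description.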