filedb.filetables module

This module defines writer and reader classes for a fast, immutable on-disk key-value database format. The current format is based heavily on D. J. Bernstein’s CDB format (http://cr.yp.to/cdb.html).

Hash file

class whoosh.filedb.filetables.HashWriter(dbfile, magic=b'HSH3', hashtype=0)

Implements a fast on-disk key-value store. This hash uses a two-level hashing scheme: a key is hashed, and the low eight bits of the hash value are used to index into one of 256 hash tables. This is essentially the CDB algorithm, but unlike CDB, this object writes all data serially (it doesn’t seek backwards to overwrite information at the end).
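The two-level lookup can be sketched in plain Python. This is a minimal illustration and not Whoosh's own code: it assumes the CDB hash function, and it assumes the bits above the low eight choose the starting slot inside the selected table, as in CDB. The names cdb_hash, locate and slots_in_table are hypothetical.

```python
def cdb_hash(key):
    """D. J. Bernstein's CDB hash of a bytes key, kept to 32 bits."""
    h = 5381
    for byte in key:
        # h * 33, XORed with the next byte, truncated to 32 bits
        h = (((h << 5) + h) ^ byte) & 0xFFFFFFFF
    return h


def locate(key, slots_in_table):
    """Return (table index, starting slot) for a key.

    slots_in_table is the size of the selected table (hypothetical
    parameter; a real file stores one size per table).
    """
    h = cdb_hash(key)
    table = h & 0xFF                   # first level: low eight bits pick 1 of 256 tables
    slot = (h >> 8) % slots_in_table   # second level (assumed): remaining bits pick a slot
    return table, slot


table, slot = locate(b"example", 16)
```

Splitting the hash this way keeps each table small, so a lookup only has to probe within one of the 256 tables.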

Also unlike CDB, this format uses 64-bit file pointers, so the file length is essentially unlimited. However, each key and value must be less than 2 GB in length.
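The two limits follow naturally if file positions are stored as 64-bit integers and lengths as signed 32-bit integers. The sketch below shows that arithmetic with the standard struct module. The exact struct formats are an assumption made for illustration, and POINTER, LENGTH and pack_length are hypothetical names.

```python
import struct

POINTER = struct.Struct("!q")  # assumed: signed 64-bit file position
LENGTH = struct.Struct("!i")   # assumed: signed 32-bit key or value length

# Largest values each field can hold
max_pointer = 2 ** (8 * POINTER.size - 1) - 1
max_length = 2 ** (8 * LENGTH.size - 1) - 1   # one byte short of 2 GB


def pack_length(n):
    """Pack a key or value length, rejecting anything that does not fit."""
    try:
        return LENGTH.pack(n)
    except struct.error:
        raise ValueError("length %d does not fit in 32 bits" % n)
```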

Parameters:
  • dbfile – a StructFile object to write to.
  • magic – the format tag bytes to write at the start of the file.
  • hashtype – an integer indicating which hashing algorithm to use. Possible values are 0 (MD5), 1 (CRC32), or 2 (CDB hash).

add(key, value)

Adds a key/value pair to the file. Note that keys DO NOT need to be unique. You can store multiple values under the same key and retrieve them using HashReader.all().

add_all(items)

Convenience method to add a sequence of (key, value) pairs. This is the same as calling HashWriter.add() on each pair in the sequence.
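The semantics of add(), add_all() and duplicate keys can be modelled with an in-memory stand-in. This is only a sketch of the behaviour described above; ToyHash is a hypothetical class that keeps everything in a dictionary and does not touch the disk format.

```python
from collections import defaultdict


class ToyHash:
    """In-memory stand-in that mimics the add/add_all/all behaviour."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, key, value):
        # Keys need not be unique: every value is kept, in insertion order.
        self._values[key].append(value)

    def add_all(self, items):
        # Same as calling add() on each (key, value) pair.
        for key, value in items:
            self.add(key, value)

    def all(self, key):
        # Yield every value stored under the key (nothing if the key is absent).
        for value in self._values.get(key, ()):
            yield value


toy = ToyHash()
toy.add_all([(b"color", b"red"), (b"size", b"large"), (b"color", b"blue")])
colors = list(toy.all(b"color"))
```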

class whoosh.filedb.filetables.HashReader(dbfile, length=None, magic=b'HSH3', startoffset=0)

Reader for the fast on-disk key-value files created by HashWriter.

Parameters:
  • dbfile – a StructFile object to read from.
  • length – the length of the file data. This is necessary since the hashing information is written at the end of the file.
  • magic – the format tag bytes to look for at the start of the file. If the file’s format tag does not match these bytes, the object raises a FileFormatError exception.
  • startoffset – the starting point of the file data.
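The magic and startoffset parameters suggest a header check along the following lines. This is a sketch under stated assumptions, not the library's implementation: the FileFormatError class here is a local stand-in, check_magic is a hypothetical helper, and an io.BytesIO object plays the role of the file.

```python
import io


class FileFormatError(Exception):
    """Local stand-in for the exception named in the parameter description."""


def check_magic(dbfile, magic=b"HSH3", startoffset=0):
    """Verify the format tag at startoffset and return the position after it."""
    dbfile.seek(startoffset)
    found = dbfile.read(len(magic))
    if found != magic:
        raise FileFormatError("Unknown file header %r" % found)
    return dbfile.tell()


# Two bytes of unrelated data precede the hash file, so startoffset is 2.
demo = io.BytesIO(b"xxHSH3payload")
data_start = check_magic(demo, startoffset=2)
```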

all(key)

Yields a sequence of values associated with the given key.

classmethod open(storage, name)

Convenience method to open a hash file given a whoosh.filedb.filestore.Storage object and a name. This takes care of opening the file and passing its length to the initializer.

ranges_for_key(key)

Yields a sequence of (datapos, datalength) tuples associated with the given key.
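The relationship between ranges_for_key() and all() can be illustrated with a toy serial layout: each (datapos, datalength) tuple names a slice of the file that holds one value. The record layout and the helper names write_records and all_values are assumptions made for this sketch, and the index is an ordinary dictionary in place of the on-disk hash tables.

```python
import io


def write_records(items):
    """Write key/value pairs serially; return (data bytes, index dict)."""
    buf = io.BytesIO()
    index = {}
    for key, value in items:
        buf.write(key)
        datapos = buf.tell()          # the value starts right after its key
        buf.write(value)
        index.setdefault(key, []).append((datapos, len(value)))
    return buf.getvalue(), index


def ranges_for_key(index, key):
    """Yield (datapos, datalength) for every value stored under the key."""
    for datapos, datalength in index.get(key, ()):
        yield datapos, datalength


def all_values(data, index, key):
    """Yield the values themselves by slicing each range out of the data."""
    for datapos, datalength in ranges_for_key(index, key):
        yield data[datapos:datapos + datalength]


data, index = write_records([(b"k1", b"alpha"), (b"k2", b"beta"), (b"k1", b"gamma")])
ranges = list(ranges_for_key(index, b"k1"))
values = list(all_values(data, index, b"k1"))
```

Working with ranges instead of values lets a caller decide whether to read the bytes at all, which can matter when values are large.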

Ordered Hash file

class whoosh.filedb.filetables.OrderedHashWriter(dbfile)

Implements an on-disk hash, but requires that keys be added in order. An OrderedHashReader can then look up “nearest keys” based on the ordering.
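Because keys arrive in sorted order, a “nearest key” lookup reduces to a binary search over the key list. The sketch below models that idea in memory. ToyOrderedKeys and its closest_key() method are hypothetical names for illustration, and returning the first key at or after the probe is an assumption about what “nearest” means.

```python
from bisect import bisect_left


class ToyOrderedKeys:
    """In-memory model of an ordered key list with nearest-key lookup."""

    def __init__(self):
        self._keys = []

    def add(self, key):
        # Enforce the ordering requirement: each key must sort after the last.
        if self._keys and key <= self._keys[-1]:
            raise ValueError("Keys must increase: %r then %r"
                             % (self._keys[-1], key))
        self._keys.append(key)

    def closest_key(self, key):
        # First stored key that is equal to or greater than the probe,
        # or None if the probe sorts after every stored key.
        i = bisect_left(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None


ordered = ToyOrderedKeys()
for k in (b"alpha", b"delta", b"omega"):
    ordered.add(k)
nearest = ordered.closest_key(b"beta")
```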

Parameters:
  • dbfile – a StructFile object to write to.
  • magic – the format tag bytes to write at the start of the file.
  • hashtype – an integer indicating which hashing algorithm to use. Possible values are 0 (MD5), 1 (CRC32), or 2 (CDB hash).

class whoosh.filedb.filetables.OrderedHashReader(dbfile, length=None, magic=b'HSH3', startoffset=0)

Reader for the ordered hash files created by OrderedHashWriter.

Parameters:
  • dbfile – a StructFile object to read from.
  • length – the length of the file data. This is necessary since the hashing information is written at the end of the file.
  • magic – the format tag bytes to look for at the start of the file. If the file’s format tag does not match these bytes, the object raises a FileFormatError exception.
  • startoffset – the starting point of the file data.