Glossary
- Analysis
- The process of breaking the text of a field into individual terms to be indexed. This consists of tokenizing the text into terms, and then optionally filtering the tokenized terms (for example, lowercasing and removing stop words). Whoosh includes several different analyzers.
- Corpus
- The set of documents you are indexing.
- Documents
- The individual pieces of content you want to make searchable. The word “documents” might imply files, but the data source could be anything – articles in a content management system, blog posts in a blogging system, chunks of a very large file, rows returned from an SQL query, or individual email messages from a mailbox file. When you get search results from Whoosh, the results are a list of documents, whatever “documents” means in your search engine.
- Fields
- Each document contains a set of fields. Typical fields might be “title”, “content”, “url”, “keywords”, “status”, “date”, etc. Fields can be indexed (so they’re searchable) and/or stored with the document. Storing the field makes it available in search results. For example, you typically want to store the “title” field so your search results can display it.
- Forward index
- A table listing every document and the words that appear in the document. Whoosh lets you store term vectors, which are a kind of forward index.
- Indexing
- The process of examining documents in the corpus and adding them to the reverse index.
- Postings
- The reverse index lists every word in the corpus, and for each word, a list of documents in which that word appears, along with some optional information (such as the number of times the word appears in that document). The items in that list, each containing a document number and any extra information, are called postings. In Whoosh, the information stored in postings is customizable for each field.
- Reverse index
- A table listing every word in the corpus, and for each word, the list of documents in which it appears. It can be more complicated (the index can also list how many times the word appears in each document, the positions at which it appears, etc.), but that is the basic idea.
- Schema
- Whoosh requires that you specify the fields of the index before you begin indexing. The Schema associates field names with metadata about the field, such as the format of the postings and whether the contents of the field are stored in the index.
- Term vector
- A forward index for a certain field in a certain document. You can specify in the Schema that a given field should store term vectors.
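To make the relationship between analysis, postings, and the two kinds of index more concrete, here is a minimal sketch in plain Python. It does not use Whoosh and is not how Whoosh is implemented; the `analyze` and `build_indexes` functions, the stop-word list, and the sample corpus are all hypothetical and exist only for illustration. It assumes each posting carries just a document number and a term frequency.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "the", "and", "of"}


def analyze(text):
    """Analysis: tokenize on whitespace, then lowercase and drop stop words."""
    tokens = text.split()
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


def build_indexes(corpus):
    """Index a list of document strings.

    Returns (reverse_index, forward_index):
      reverse_index maps term -> list of postings (docnum, frequency)
      forward_index maps docnum -> {term: frequency}
    """
    reverse_index = defaultdict(list)
    forward_index = {}
    for docnum, text in enumerate(corpus):
        freqs = Counter(analyze(text))
        # The forward index records which terms appear in this document.
        forward_index[docnum] = dict(freqs)
        for term, freq in freqs.items():
            # A posting: the document number plus extra information
            # (here, only the term frequency).
            reverse_index[term].append((docnum, freq))
    return dict(reverse_index), forward_index


corpus = ["The quick brown fox", "The lazy dog and the quick cat"]
reverse_index, forward_index = build_indexes(corpus)

print(reverse_index["quick"])  # postings for the term "quick"
print(forward_index[0])        # terms found in document 0
```

Looking up a term in `reverse_index` answers "which documents contain this word?", while looking up a document number in `forward_index` answers "which words does this document contain?", which is the role the glossary assigns to term vectors.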