Whoosh 0.3 release notes
Major improvements to reading/writing of postings and query performance.
Changed default post limit (run size) from 4 MB to 32 MB.
Finished migrating backend-specific code into the whoosh.filedb package.
Moved formats from the whoosh.fields module into the new whoosh.formats module.
DocReader and TermReader classes combined into new IndexReader interface. You can get an IndexReader implementation by calling Index.reader(). Searcher is now a wrapper around an IndexReader.
Range query object changed, with a new signature and new syntax in the default query parser. You can now use [start TO end] in the query parser for an inclusive range, and {start TO end} for an exclusive range. You can also mix the delimiters, for example [start TO end} for a range with an inclusive start but exclusive end term.
Added experimental DATETIME field type, which lets you pass a datetime.datetime object as a field value to add_document:

    from datetime import datetime

    from whoosh.fields import Schema, ID, DATETIME
    from whoosh.filedb.filestore import RamStorage

    # Build an in-memory index with an ID field and a DATETIME field
    schema = Schema(id=ID, date=DATETIME)
    storage = RamStorage()
    ix = storage.create_index(schema)

    # Pass the datetime object directly as the field value
    w = ix.writer()
    w.add_document(id=u"A", date=datetime.now())
    w.close()
Internally, the DATETIME field indexes the datetime object as text using the format (4 digit year + 2 digit month + 2 digit day + ‘T’ + 2 digit hour + 2 digit minute + 2 digit second + 6 digit microsecond), for example 20090817T160203109000.
The default query parser now lets you use quoted strings in prefix and range queries, e.g. ["2009-05" TO "2009-12"], "alfa/bravo"*, making it easier to work with terms containing special characters.
DocReader.vector_as(docnum, fieldid, astype) is now IndexReader.vector_as(astype, docnum, fieldid) (i.e. the astype argument has moved from the last to the first argument), e.g. v = ixreader.vector_as("frequency", 102, "content").
Added whoosh.support.charset for translating Sphinx charset table files.
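
The DATETIME text format described above can be reproduced with the standard library alone. The following is a minimal sketch, not Whoosh's own code: the format string and the helper names datetime_to_text and text_to_datetime are assumptions made for illustration. It shows why the fixed-width layout is convenient: the text sorts in the same order as the dates it encodes, which is what range queries over terms rely on.

```python
from datetime import datetime

# Assumed strftime pattern mirroring the documented layout:
# YYYY MM DD "T" HH MM SS + 6-digit microseconds, with no separators.
DATETIME_TEXT_FORMAT = "%Y%m%dT%H%M%S%f"


def datetime_to_text(dt):
    """Render a datetime as fixed-width text (hypothetical helper)."""
    return dt.strftime(DATETIME_TEXT_FORMAT)


def text_to_datetime(text):
    """Parse the fixed-width text back into a datetime (hypothetical helper)."""
    return datetime.strptime(text, DATETIME_TEXT_FORMAT)


example = datetime(2009, 8, 17, 16, 2, 3, 109000)
later = datetime(2009, 12, 1, 0, 0, 0, 0)

print(datetime_to_text(example))  # 20090817T160203109000

# Text order matches chronological order because every part is zero-padded
print(datetime_to_text(example) < datetime_to_text(later))  # True
```
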
Added whoosh.analysis.CharsetTokenizer and CharsetFilter to enable case and accent folding.
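
To make "case and accent folding" concrete, here is a rough standard-library sketch of the idea. It deliberately swaps in a different technique (Unicode decomposition via unicodedata) rather than the charset translation tables that whoosh.support.charset works with, so it should not be read as how CharsetFilter is implemented. The function name fold_case_and_accents is hypothetical.

```python
import unicodedata


def fold_case_and_accents(text):
    """Lowercase text and drop combining accent marks (illustrative only)."""
    # NFKD splits accented characters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


print(fold_case_and_accents("Caf\u00e9"))  # cafe
```

With folding applied at both index time and query time, a search for "cafe" can match a document containing "Café".
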
Added experimental whoosh.ramdb in-memory backend.
Added experimental whoosh.query.FuzzyTerm query type.
Added whoosh.lang.wordnet module containing a Thesaurus object for using the WordNet synonym database.
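
As an illustration of the range delimiters introduced in these notes, the sketch below interprets the bracket styles in plain Python. It is not Whoosh's query parser: the regular expression and the parse_range and in_range helpers are hypothetical, and the pattern only handles unquoted terms without spaces. A square bracket marks an inclusive end, a curly brace an exclusive one.

```python
import re

# Hypothetical pattern: opening delimiter, start term, "TO", end term, closing delimiter
RANGE_PATTERN = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+?)\s*([\]}])$")


def parse_range(text):
    """Return (start, end, start_exclusive, end_exclusive) for a range string."""
    match = RANGE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not a range expression: %r" % text)
    open_delim, start, end, close_delim = match.groups()
    return start, end, open_delim == "{", close_delim == "}"


def in_range(term, start, end, start_excl, end_excl):
    """Check whether a term falls inside the range, honouring exclusivity."""
    above = term > start if start_excl else term >= start
    below = term < end if end_excl else term <= end
    return above and below


terms = ["apple", "banana", "cherry"]
bounds = parse_range("[apple TO cherry}")
print([t for t in terms if in_range(t, *bounds)])  # ['apple', 'banana']
```
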