Introduction to Whoosh
About Whoosh
Whoosh was created by Matt Chaput. It started as a quick-and-dirty search server for the online documentation of the Houdini 3D animation software package. Side Effects Software generously allowed Matt to open-source the code in case it might be useful to anyone else who needs a very flexible or pure-Python search engine (or both!).
- Whoosh is fast, but uses only pure Python, so it will run anywhere Python runs, without requiring a compiler.
- By default, Whoosh uses the Okapi BM25F ranking function, but, like most things in Whoosh, the ranking function can be easily customized.
- Whoosh creates fairly small indexes compared to many other search libraries.
- All indexed text in Whoosh must be Unicode.
- Whoosh lets you store arbitrary Python objects with indexed documents.
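To give a rough feel for the kind of ranking function mentioned in the list above, here is a minimal sketch of textbook single-field Okapi BM25 using only the standard library. It is not Whoosh's code: BM25F additionally weights matches per field, and the function name `bm25_scores`, the sample documents, the IDF variant, and the default values of `k1` and `b` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the `query` terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # One common non-negative IDF variant (an assumption, not Whoosh's).
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical, pre-tokenized sample documents.
docs = [
    "whoosh is a pure python search library".split(),
    "lucene is a search library written in java".split(),
    "python runs anywhere".split(),
]
scores = bm25_scores(docs, ["python", "search"])
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The first document contains both query terms, so it ranks highest; the other two each match only one term.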
What is Whoosh?
Whoosh is a fast, pure Python search engine library.
The primary design impetus of Whoosh is that it is pure Python. You should be able to use Whoosh anywhere you can use Python, with no compiler or Java required.
Like one of its ancestors, Lucene, Whoosh is not really a search engine; it is a programmer's library for creating a search engine [1].
Practically no important behavior of Whoosh is hard-coded. Indexing of text, the level of information stored for each term in each field, parsing of search queries, the types of queries allowed, scoring algorithms, etc. are all customizable, replaceable, and extensible.
[1] It would of course be possible to build a turnkey search engine on top of Whoosh, as Nutch and Solr do with Lucene.
What can Whoosh do for you?
Whoosh lets you index free-form or structured text and then quickly find matching documents based on simple or complex search criteria.
Getting help with Whoosh
You can view outstanding issues on the Whoosh Bitbucket page and get help on the Whoosh mailing list.