Python API Reference
Complete reference for the vectlite Python package.
import vectlite
Module Functions
open
vectlite.open(
path: str,
dimension: int | None = None,
read_only: bool = False,
) -> Database
Open or create a vectlite database.
| Parameter | Type | Description |
|---|---|---|
| path | str | Path to the .vdb file. Created if it does not exist. |
| dimension | int \| None | Vector dimension. Required when creating a new database. Omit when opening an existing one (the stored dimension is used). |
| read_only | bool | Open in read-only mode. Uses shared file locks so multiple readers can access the same file. Write operations raise VectLiteError. |
Returns: Database
open_store
vectlite.open_store(root: str) -> Store
Open or create a collection store (a directory of independent databases).
| Parameter | Type | Description |
|---|---|---|
| root | str | Path to the directory that holds the collections. Created if it does not exist. |
Returns: Store
restore
vectlite.restore(source: str, dest: str) -> Database
Restore a backup to a new database path.
| Parameter | Type | Description |
|---|---|---|
| source | str | Path to the backup directory (created by Database.backup()). |
| dest | str | Path where the restored .vdb file will be written. |
Returns: Database -- the restored database, opened for read-write.
sparse_terms
vectlite.sparse_terms(text: str) -> dict[str, float]
Tokenize and weight text into a sparse term vector suitable for BM25 search.
| Parameter | Type | Description |
|---|---|---|
| text | str | Input text to analyse. |
Returns: dict[str, float] -- mapping of terms to their weights.
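The exact weighting scheme is internal to vectlite; as a rough mental model only, a log-scaled term-frequency weighting might look like this (illustrative sketch, not the actual implementation):

```python
import math
import re

def sparse_terms_sketch(text: str) -> dict[str, float]:
    # Lowercase and split on runs of letters/digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts: dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    # Log-scaled term frequency; real BM25 scoring additionally uses
    # document length and corpus statistics at query time.
    return {term: 1.0 + math.log(c) for term, c in counts.items()}

weights = sparse_terms_sketch("the cat sat on the mat")
```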
upsert_text
vectlite.upsert_text(
db: Database,
id: str,
text: str,
embed: Callable[[str], list[float]],
metadata: Metadata | None = None,
namespace: str | None = None,
) -> None
High-level helper that generates a dense embedding and sparse terms from text, then upserts the record.
| Parameter | Type | Description |
|---|---|---|
| db | Database | Target database. |
| id | str | Record identifier. |
| text | str | Text to embed and index. |
| embed | Callable[[str], list[float]] | Function that converts text to a dense vector. |
| metadata | Metadata \| None | Optional metadata dict. |
| namespace | str \| None | Optional namespace. |
search_text
vectlite.search_text(
db: Database,
query: str,
embed: Callable[[str], list[float]],
*,
k: int = 10,
filter: Filter | None = None,
namespace: str | None = None,
all_namespaces: bool = False,
dense_weight: float = 1.0,
sparse_weight: float = 1.0,
fetch_k: int = 0,
mmr_lambda: float | None = None,
vector_name: str | None = None,
fusion: str = "linear",
rrf_k: int = 60,
explain: bool = False,
rerank: RerankHook | None = None,
rerank_k: int = 0,
) -> list[SearchResult]
High-level hybrid search. Generates a dense embedding and sparse terms from query, then runs a fused search.
| Parameter | Type | Default | Description |
|---|---|---|---|
| db | Database | -- | Target database. |
| query | str | -- | Natural-language query. |
| embed | Callable[[str], list[float]] | -- | Function that converts text to a dense vector. |
| k | int | 10 | Number of results to return. |
| filter | Filter \| None | None | MongoDB-style metadata filter. |
| namespace | str \| None | None | Restrict to a single namespace. |
| all_namespaces | bool | False | Search across all namespaces. |
| dense_weight | float | 1.0 | Weight for the dense score component. |
| sparse_weight | float | 1.0 | Weight for the sparse (BM25) score component. |
| fetch_k | int | 0 | Number of candidates to fetch before re-ranking. 0 uses the engine default. |
| mmr_lambda | float \| None | None | Maximal Marginal Relevance diversity parameter (0 = max diversity, 1 = max relevance). None disables MMR. |
| vector_name | str \| None | None | Search a specific named vector space. |
| fusion | str | "linear" | Fusion strategy: "linear" or "rrf". |
| rrf_k | int | 60 | RRF smoothing constant (only used when fusion="rrf"). |
| explain | bool | False | Include scoring breakdown in results. |
| rerank | RerankHook \| None | None | Optional reranker function. See vectlite.rerankers. |
| rerank_k | int | 0 | Number of candidates to pass to the reranker. 0 uses fetch_k. |
Returns: list[SearchResult]
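To make the two fusion modes concrete, here is a sketch of how linear and Reciprocal Rank Fusion (RRF) combine a dense and a sparse candidate list. This is the standard formulation of each technique; the engine's exact score normalization may differ.

```python
def linear_fuse(dense: dict[str, float], sparse: dict[str, float],
                dense_weight: float = 1.0, sparse_weight: float = 1.0) -> dict[str, float]:
    # dense/sparse map record id -> score; missing ids contribute 0.
    ids = set(dense) | set(sparse)
    return {i: dense_weight * dense.get(i, 0.0) + sparse_weight * sparse.get(i, 0.0)
            for i in ids}

def rrf_fuse(dense_ranked: list[str], sparse_ranked: list[str],
             rrf_k: int = 60) -> dict[str, float]:
    # Each ranked list (best first) contributes 1 / (rrf_k + rank);
    # scores are summed per id, so ranks matter but raw scores do not.
    scores: dict[str, float] = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, i in enumerate(ranked, start=1):
            scores[i] = scores.get(i, 0.0) + 1.0 / (rrf_k + rank)
    return scores

fused = rrf_fuse(["a", "b", "c"], ["b", "c"])
```

A larger rrf_k flattens the contribution differences between adjacent ranks, which is why it acts as a smoothing constant.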
search_text_with_stats
vectlite.search_text_with_stats(
db: Database,
query: str,
embed: Callable[[str], list[float]],
*,
# same parameters as search_text
) -> SearchResponse
Same as search_text but returns a SearchResponse containing both results and query statistics.
Database
Returned by open() and restore().
Properties
| Property | Type | Description |
|---|---|---|
| path | str | Absolute path to the .vdb file. |
| wal_path | str | Path to the write-ahead log file. |
| dimension | int | Vector dimension for this database. |
| read_only | bool | Whether the database was opened in read-only mode. |
count
db.count(namespace: str | None = None) -> int
Return the number of records, optionally scoped to a namespace.
namespaces
db.namespaces() -> list[str]
Return a list of all namespaces present in the database.
transaction
db.transaction() -> Transaction
Begin a new transaction. Use as a context manager for automatic commit/rollback:
with db.transaction() as tx:
tx.upsert("id", vector, metadata)
Returns: Transaction
insert
db.insert(
id: str,
vector: list[float],
metadata: Metadata | None = None,
*,
namespace: str | None = None,
sparse: dict[str, float] | None = None,
vectors: dict[str, list[float]] | None = None,
) -> None
Insert a new record. Raises VectLiteError if a record with the same id (and namespace) already exists.
| Parameter | Type | Description |
|---|---|---|
| id | str | Record identifier. |
| vector | list[float] | Dense embedding vector. |
| metadata | Metadata \| None | Optional metadata dict. |
| namespace | str \| None | Target namespace. |
| sparse | dict[str, float] \| None | Sparse term vector for BM25 search. |
| vectors | dict[str, list[float]] \| None | Additional named vectors. |
upsert
db.upsert(
id: str,
vector: list[float],
metadata: Metadata | None = None,
*,
namespace: str | None = None,
sparse: dict[str, float] | None = None,
vectors: dict[str, list[float]] | None = None,
) -> None
Insert or update a record. If the id already exists the record is replaced.
Parameters are identical to insert().
insert_many
db.insert_many(
records: list[Record],
*,
namespace: str | None = None,
) -> int
Batch insert multiple records. Raises on duplicate IDs.
| Parameter | Type | Description |
|---|---|---|
| records | list[Record] | List of record dicts with keys id, vector, and optionally metadata, sparse, vectors. |
| namespace | str \| None | Target namespace for all records. |
Returns: int -- number of records inserted.
upsert_many
db.upsert_many(
records: list[Record],
*,
namespace: str | None = None,
) -> int
Batch upsert multiple records.
Returns: int -- number of records upserted.
bulk_ingest
db.bulk_ingest(
records: Iterable[Record],
*,
namespace: str | None = None,
batch_size: int = 1000,
on_progress: Callable[[int], None] | None = None,
) -> int
Stream large datasets into the database in batches. Automatically commits every batch_size records.
| Parameter | Type | Description |
|---|---|---|
| records | Iterable[Record] | An iterable (or generator) of record dicts. |
| namespace | str \| None | Target namespace. |
| batch_size | int | Commit every N records. |
| on_progress | Callable[[int], None] \| None | Called after each batch with the cumulative count. |
Returns: int -- total number of records ingested.
get
db.get(
id: str,
*,
namespace: str | None = None,
) -> Record | None
Retrieve a record by ID. Returns None if not found.
delete
db.delete(
id: str,
*,
namespace: str | None = None,
) -> bool
Delete a record by ID. Returns True if the record existed and was deleted.
delete_many
db.delete_many(
ids: list[str],
*,
namespace: str | None = None,
) -> int
Delete multiple records by ID. Returns the number of records actually deleted.
flush
db.flush() -> None
Flush pending writes to the WAL. Data is durable after this call but not yet compacted into the main file.
compact
db.compact() -> None
Merge the WAL into the main .vdb file and rebuild ANN indexes if necessary. Call this periodically or after large batch writes.
snapshot
db.snapshot(dest: str) -> None
Create a self-contained copy of the database at dest. Includes all committed data (call compact() first to include WAL entries).
backup
db.backup(dest: str) -> None
Full backup: copies the .vdb file and all ANN sidecar files to the dest directory.
search
db.search(
vector: list[float] | None = None,
*,
k: int = 10,
filter: Filter | None = None,
namespace: str | None = None,
all_namespaces: bool = False,
sparse: dict[str, float] | None = None,
dense_weight: float = 1.0,
sparse_weight: float = 1.0,
fusion: str = "linear",
rrf_k: int = 60,
fetch_k: int = 0,
mmr_lambda: float | None = None,
vector_name: str | None = None,
query_vectors: dict[str, list[float]] | None = None,
vector_weights: dict[str, float] | None = None,
explain: bool = False,
rerank: RerankHook | None = None,
rerank_k: int = 0,
) -> list[SearchResult]
Run a search query. Supports dense, sparse, hybrid, and multi-vector search modes.
| Parameter | Type | Default | Description |
|---|---|---|---|
| vector | list[float] \| None | None | Dense query vector. Pass None for sparse-only or multi-vector search. |
| k | int | 10 | Number of results to return. |
| filter | Filter \| None | None | MongoDB-style metadata filter. |
| namespace | str \| None | None | Restrict to a namespace. |
| all_namespaces | bool | False | Search all namespaces. |
| sparse | dict[str, float] \| None | None | Sparse term vector for keyword search. |
| dense_weight | float | 1.0 | Weight for the dense component in hybrid search. |
| sparse_weight | float | 1.0 | Weight for the sparse component in hybrid search. |
| fusion | str | "linear" | Fusion strategy: "linear" or "rrf". |
| rrf_k | int | 60 | RRF smoothing constant. |
| fetch_k | int | 0 | Number of candidates to retrieve before reranking. 0 uses the engine default. |
| mmr_lambda | float \| None | None | MMR diversity parameter. None disables MMR. |
| vector_name | str \| None | None | Search a specific named vector space. |
| query_vectors | dict[str, list[float]] \| None | None | Named query vectors for multi-vector search. |
| vector_weights | dict[str, float] \| None | None | Weights for multi-vector search. |
| explain | bool | False | Include scoring breakdown in results. |
| rerank | RerankHook \| None | None | Reranker function. |
| rerank_k | int | 0 | Candidates to pass to the reranker. |
Returns: list[SearchResult]
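The mmr_lambda trade-off can be illustrated with a minimal Maximal Marginal Relevance selection over raw vectors. This is a sketch of the standard MMR algorithm, not vectlite's internal implementation:

```python
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query: list[float], candidates: dict[str, list[float]],
               k: int, mmr_lambda: float) -> list[str]:
    # Greedily pick k items, trading relevance to the query (lambda=1)
    # against redundancy with already-chosen items (lambda=0).
    remaining = dict(candidates)
    chosen: list[str] = []
    while remaining and len(chosen) < k:
        def mmr_score(item):
            cid, vec = item
            rel = cosine(query, vec)
            red = max((cosine(vec, candidates[c]) for c in chosen), default=0.0)
            return mmr_lambda * rel - (1 - mmr_lambda) * red
        best = max(remaining.items(), key=mmr_score)[0]
        chosen.append(best)
        del remaining[best]
    return chosen

# "b" is nearly a duplicate of "a"; a low lambda prefers the diverse "c".
picks = mmr_select([1.0, 0.0],
                   {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.0, 1.0]},
                   k=2, mmr_lambda=0.3)
```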
search_with_stats
db.search_with_stats(
# same parameters as search()
) -> SearchResponse
Same as search() but returns a SearchResponse with results and query statistics.
Store
Returned by open_store(). Manages a directory of independent database collections.
Properties
| Property | Type | Description |
|---|---|---|
| root | str | Absolute path to the store directory. |
collections
store.collections() -> list[str]
List all collection names in the store.
create_collection
store.create_collection(name: str, dimension: int) -> Database
Create a new collection. Raises VectLiteError if it already exists.
open_or_create_collection
store.open_or_create_collection(name: str, dimension: int) -> Database
Open an existing collection or create a new one.
open_collection
store.open_collection(name: str) -> Database
Open an existing collection. Raises VectLiteError if it does not exist.
drop_collection
store.drop_collection(name: str) -> None
Delete a collection and all its data from disk.
Transaction
Returned by Database.transaction(). Supports use as a context manager (with statement) for automatic commit on success and rollback on exception.
Context Manager
with db.transaction() as tx:
tx.upsert("id", vector, metadata)
# auto-commits here; rolls back on exception
insert
tx.insert(
id: str,
vector: list[float],
metadata: Metadata | None = None,
*,
namespace: str | None = None,
sparse: dict[str, float] | None = None,
vectors: dict[str, list[float]] | None = None,
) -> None
Queue an insert within the transaction.
upsert
tx.upsert(
id: str,
vector: list[float],
metadata: Metadata | None = None,
*,
namespace: str | None = None,
sparse: dict[str, float] | None = None,
vectors: dict[str, list[float]] | None = None,
) -> None
Queue an upsert within the transaction.
insert_many
tx.insert_many(
records: list[Record],
*,
namespace: str | None = None,
) -> int
Queue a batch insert. Returns the number of records queued.
upsert_many
tx.upsert_many(
records: list[Record],
*,
namespace: str | None = None,
) -> int
Queue a batch upsert. Returns the number of records queued.
delete
tx.delete(
id: str,
*,
namespace: str | None = None,
) -> None
Queue a delete within the transaction.
commit
tx.commit() -> None
Commit all queued operations atomically.
rollback
tx.rollback() -> None
Discard all queued operations.
__len__
len(tx) -> int
Return the number of queued operations in the transaction.
Types
MetadataValue
MetadataValue = str | int | float | bool | None | list | dict
A single metadata field value.
Metadata
Metadata = dict[str, MetadataValue]
A metadata dictionary attached to a record.
Filter
Filter = dict[str, Any]
MongoDB-style filter expression. See Metadata Filters for the full query syntax.
RerankHook
RerankHook = Callable[[str, list[SearchResult]], list[SearchResult]]
A function that receives the query string and a list of candidate results, and returns a reordered list. Used with the rerank parameter.
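Since a RerankHook is just a plain callable, custom hooks need no base class. A hand-written example (the "year" metadata field and the boost rule are hypothetical, shown against plain result dicts):

```python
def recency_boost(query: str, results: list[dict]) -> list[dict]:
    # Example hook: add a small bonus when the hypothetical "year"
    # metadata field is recent, then re-sort by the adjusted score.
    def adjusted(r: dict) -> float:
        bonus = 0.1 if r["metadata"].get("year", 0) >= 2023 else 0.0
        return r["score"] + bonus
    return sorted(results, key=adjusted, reverse=True)

candidates = [
    {"id": "old", "score": 0.82, "metadata": {"year": 2015}, "namespace": ""},
    {"id": "new", "score": 0.80, "metadata": {"year": 2024}, "namespace": ""},
]
reranked = recency_boost("example query", candidates)
```

Such a function can be passed directly as the rerank parameter of search() or search_text().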
Record
class Record(TypedDict, total=False):
id: str # Required
vector: list[float] # Required
metadata: Metadata
sparse: dict[str, float]
vectors: dict[str, list[float]]
namespace: str
score: float # Present in search results
A record dictionary. Used for batch operations and returned by get().
SearchResult
class SearchResult(TypedDict):
id: str
score: float
metadata: Metadata
namespace: str
explain: ExplainDetails | None
A single search result.
ExplainDetails
class ExplainDetails(TypedDict):
dense_score: float
sparse_score: float
fused_score: float
rerank_score: float | None
Scoring breakdown when explain=True.
SearchStats
class SearchStats(TypedDict):
candidates_evaluated: int
index_type: str # "hnsw" | "flat"
timings: SearchTimings
Engine statistics for a search query.
SearchTimings
class SearchTimings(TypedDict):
total_ms: float
index_ms: float
filter_ms: float
rank_ms: float
rerank_ms: float | None
Timing breakdown in milliseconds.
SearchResponse
class SearchResponse(TypedDict):
results: list[SearchResult]
stats: SearchStats
Returned by search_with_stats() and search_text_with_stats().
Exceptions
VectLiteError
class VectLiteError(Exception): ...
Base exception for all vectlite errors. Raised for:
- Write operations on a read-only database
- Duplicate ID on insert()
- Dimension mismatch
- Corrupt database file
- File lock contention
- I/O errors
Sub-modules
vectlite.analyzers
Text analysis utilities for customizing sparse tokenization.
Analyzer
class Analyzer:
def __init__(
self,
*,
lowercase: bool = True,
stopwords: set[str] | None = None,
stemmer: str | None = None,
min_token_length: int = 1,
max_token_length: int = 40,
) -> None: ...
def tokenize(self, text: str) -> list[str]: ...
def term_frequencies(self, text: str) -> dict[str, float]: ...
| Parameter | Type | Default | Description |
|---|---|---|---|
| lowercase | bool | True | Lowercase tokens before indexing. |
| stopwords | set[str] \| None | None | Set of stopwords to remove. Use the built-in constants or provide your own. |
| stemmer | str \| None | None | Stemmer algorithm name (e.g. "english", "french"). None disables stemming. |
| min_token_length | int | 1 | Discard tokens shorter than this. |
| max_token_length | int | 40 | Discard tokens longer than this. |
Methods:
- tokenize(text) -- returns a list of processed tokens.
- term_frequencies(text) -- returns a term-frequency dict suitable for use as a sparse vector.
Constants
vectlite.analyzers.ENGLISH_STOPWORDS: frozenset[str]
vectlite.analyzers.FRENCH_STOPWORDS: frozenset[str]
Pre-built stopword sets.
vectlite.rerankers
Composable reranking functions for search post-processing.
text_match
vectlite.rerankers.text_match(
field: str = "text",
weight: float = 1.0,
) -> RerankHook
Boost results where the query appears as a substring in the given metadata field.
metadata_boost
vectlite.rerankers.metadata_boost(
field: str,
values: dict[str, float],
default: float = 0.0,
) -> RerankHook
Adjust scores based on a metadata field value. The values dict maps field values to score multipliers.
cross_encoder
vectlite.rerankers.cross_encoder(
model: Any,
query_field: str | None = None,
doc_field: str = "text",
) -> RerankHook
Rerank using a cross-encoder model (e.g. from sentence-transformers). The model must implement a predict(pairs) method.
| Parameter | Type | Description |
|---|---|---|
| model | Any | A cross-encoder model with a predict() method. |
| query_field | str \| None | Metadata field for the query text. None uses the raw query string. |
| doc_field | str | Metadata field containing the document text. |
bi_encoder
vectlite.rerankers.bi_encoder(
model: Any,
doc_field: str = "text",
) -> RerankHook
Rerank using a bi-encoder model. The model must implement an encode(texts) method.
compose
vectlite.rerankers.compose(*hooks: RerankHook) -> RerankHook
Chain multiple rerankers in sequence. Each hook receives the output of the previous one.
reranker = vectlite.rerankers.compose(
vectlite.rerankers.text_match("text", weight=0.5),
vectlite.rerankers.metadata_boost("priority", {"high": 2.0, "low": 0.5}),
)
results = db.search(query_emb, k=10, rerank=reranker, rerank_k=50)
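Under the hood, composing hooks amounts to function chaining; a sketch of how such a compose could be written (illustrative only, not vectlite's source -- the toy hooks here just rescale scores so the chaining order is visible):

```python
from functools import reduce

def compose_sketch(*hooks):
    # Each hook has the RerankHook shape: (query, results) -> results.
    # Apply them left to right, feeding each hook the previous output.
    def composed(query, results):
        return reduce(lambda res, hook: hook(query, res), hooks, results)
    return composed

double = lambda q, res: [dict(r, score=r["score"] * 2) for r in res]
add_one = lambda q, res: [dict(r, score=r["score"] + 1) for r in res]

chained = compose_sketch(double, add_one)
out = chained("q", [{"id": "a", "score": 1.0}])
```

Because application is left to right, double runs before add_one, so a score of 1.0 becomes (1.0 * 2) + 1 rather than (1.0 + 1) * 2.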