Typical usage: configure → build an index from a matrix of embeddings → search → persist → reload. This mirrors how you might call DCEE from a FastAPI worker or batch job.
```python
import numpy as np

from dcee import DCEEConfig, DCEEEngine, is_gpu_available

print("GPU:", is_gpu_available())

# Toy corpus of unit-normalized float32 embeddings.
emb = np.random.randn(10_000, 128).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Heuristic defaults chosen from corpus size and dimensionality.
cfg = DCEEConfig.tuned_for(len(emb), emb.shape[1])
engine = DCEEEngine(cfg)
engine.build(emb)

# Query with a known vector; its own row should rank first.
q = emb[0]
for idx, score in engine.search(q, top_k=5):
    print(idx, score)

# Persist to disk and reload.
engine.save("index.dce2")
loaded = DCEEEngine.from_file("index.dce2")
print(loaded.search(q, top_k=3))
```

Full signatures and more snippets: see the API reference.
- `DCEEConfig.tuned_for(n_vectors, dim)` sets heuristic defaults for cluster count, probes, refinement pool, and keyframe spacing.
- `search` returns a list of `(global_row_index, cosine_similarity)` pairs; vectors are treated as unit-normalized for scoring.
- Use `DCEEEngine.from_file(path, verbose=False)` in production to avoid extra logging during load.
- Set `cfg.verbose = False` for quieter builds (suppresses build logging and tqdm progress bars).

The main project ships helper scripts (run from the repo root after a dev install):
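The cosine-similarity contract above can be sanity-checked against an exact NumPy baseline. This is a sketch, not part of DCEE: `brute_force_search` is a hypothetical helper, and it assumes (as the notes state) that rows and query are unit-normalized, so a dot product equals cosine similarity.

```python
import numpy as np

def brute_force_search(emb, q, top_k):
    """Exact top-k by cosine similarity over unit-normalized rows."""
    scores = emb @ q                    # dot product == cosine for unit vectors
    top = np.argsort(-scores)[:top_k]   # indices of the highest scores
    return [(int(i), float(scores[i])) for i in top]

rng = np.random.default_rng(0)
emb = rng.standard_normal((1_000, 64)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# The query's own row should come back first with similarity ~1.0,
# mirroring the q = emb[0] query in the quick-start snippet.
print(brute_force_search(emb, emb[0], top_k=3)[0])
```

Comparing an engine's `search` output against this baseline on a small corpus is a quick way to confirm an index was built correctly.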
- `benchmark_dcee.py` — compare DCEE to FAISS baselines on the same queries.
- `tune_dcee.py` — random search over hyperparameters (recall vs. latency).
- `test_realworld_dcee.py` — run against synthetic or sentence-transformer corpora.
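Tuning runs of the kind `tune_dcee.py` performs typically score a configuration by recall against an exact baseline. A minimal sketch of that metric (the function name is illustrative, not something the scripts export):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate top-k recovered."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Overlap between {3, 1, 7} and {1, 3, 5} is {1, 3}, i.e. 2 of 3.
print(recall_at_k([3, 1, 7, 9], [1, 3, 5, 9], k=3))
```

Sweeping `k` alongside latency measurements gives the recall-vs-latency trade-off the tuner searches over.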