No description
  • Python 97.9%
  • Makefile 2.1%
Find a file
Glenn c103d82807 feat(document): Enhance document metadata with source information
This change improves the traceability of added documents by
automatically enriching their metadata with details about their origin.
This ensures that regardless of how a document is added (via text input
or file upload), its source type and, if applicable, its source name are
consistently recorded. This prevents ambiguity about the document's
provenance, which could otherwise lead to difficulties in debugging or
auditing document ingestion processes.

- Introduce \_source_type metadata: Automatically adds "\_source_type"
  to document metadata, indicating whether the document originated from
  "file" or "text" input.
- Include \_source_name for file inputs: When a document is added from a
  file, "\_source_name" metadata is added, storing the path of the
  source file.
- Prioritize generated metadata: Ensures that automatically generated
  source metadata (\_source_type, \_source_name) overrides any
  user-provided metadata with the same keys, maintaining data integrity.
- Return metadata in add document response: The svc_add_document
  function now includes the full metadata dictionary in its return
  payload, allowing immediate verification of the added metadata.
- Update query service to include all document fields: The
  search_documents function now explicitly requests "documents",
  "metadatas", and "distances" in its include list, ensuring
  comprehensive results.
- Update FakeCollection query method: The test utility
  FakeCollection.query method now accepts an 'include' parameter to
  align with the updated query service.
- Add comprehensive tests for origin metadata: New tests verify the
  correct population and precedence of \_source_type and \_source_name
  metadata for both text and file-based document additions.

Signed-off-by: Glenn <glenux@glenux.net>
2026-04-26 21:37:01 +02:00
docs docs: Introduce comprehensive CONTRIBUTING and README 2026-04-26 16:10:19 +02:00
src/chroma_toolkit feat(document): Enhance document metadata with source information 2026-04-26 21:37:01 +02:00
tests feat(document): Enhance document metadata with source information 2026-04-26 21:37:01 +02:00
.gitignore feat(typecheck): Add mypy type checking to dev workflow 2026-04-26 12:28:28 +02:00
CONTRIBUTING.md docs: Introduce comprehensive CONTRIBUTING and README 2026-04-26 16:10:19 +02:00
Makefile feat(cli): Introduce flexible output formatting (YAML, JSON, TOML) 2026-04-26 14:09:42 +02:00
pyproject.toml feat(cli): Introduce flexible output formatting (YAML, JSON, TOML) 2026-04-26 14:09:42 +02:00
README.md docs: Introduce comprehensive CONTRIBUTING and README 2026-04-26 16:10:19 +02:00
uv.lock feat(cli): Introduce flexible output formatting (YAML, JSON, TOML) 2026-04-26 14:09:42 +02:00

chroma-toolkit

Python CLI toolkit for ChromaDB operations, with a stable output contract for automation.

Project health

  • Status: v0.1.0.
  • Python support: >=3.10.
  • Backends: local (PersistentClient) and http (HttpClient).
  • Stability target: command behavior and json output shape should stay script-friendly.

What this tool is for

chroma-toolkit helps you manage Chroma resources from the command line:

  • server inspection (heartbeat, version)
  • collection lifecycle (list, create, info, delete, update)
  • document lifecycle (add, get, list, delete, count)
  • semantic search (query search)
  • plugin registry inspection (plugin list)

Prerequisites

  • Python >=3.10
  • uv installed

Installation

uv sync

Run the CLI with:

uv run chroma-toolkit --help

Quickstart (local backend)

uv run chroma-toolkit --path ./.chroma server heartbeat --output json
uv run chroma-toolkit --path ./.chroma collection create --name hub --get-or-create --output json
uv run chroma-toolkit --path ./.chroma document add --collection hub --id doc-1 --document "How to deploy" --output json
uv run chroma-toolkit --path ./.chroma document get --collection hub --id doc-1 --output json
uv run chroma-toolkit --path ./.chroma query search --collection hub --text "deploy" --n-results 3 --output json
uv run chroma-toolkit --path ./.chroma plugin list --output json

Quickstart (HTTP backend)

Use this when your Chroma server is running remotely or in another process.

uv run chroma-toolkit --backend http --host 127.0.0.1 --port 8000 server version --output json
uv run chroma-toolkit --backend http --host 127.0.0.1 --port 8000 collection list --output json

Command overview

Global options:

  • --path (default ./.chroma)
  • --backend {local,http} (default local)
  • --host (default 127.0.0.1, used by http backend)
  • --port (default 8000, used by http backend)

Command groups:

  • server: heartbeat, version
  • collection: list, create, info, delete, update
  • document: add, get, list, delete, count
  • query: search
  • plugin: list

Output contract

--output is a command option (not a global option).

  • Accepted formats: yaml (default), json, toml
  • --output can be provided only once per command

Examples:

uv run chroma-toolkit collection list --output yaml
uv run chroma-toolkit collection list --output json
uv run chroma-toolkit collection list --output toml

All formats use the same logical envelope:

  • Success: {"ok": true, "data": <payload>, "error": null}
  • Error: {"ok": false, "data": null, "error": {"code": <int>, "message": <str>}}

Notes:

  • In toml, null is rendered as the string "null".
  • Unknown backend/runtime failures are prefixed with backend: .

Exit codes

  • 0: success
  • 2: invalid CLI usage (Click)
  • 3: backend/runtime error
  • 4: resource not found
  • 5: validation error

Known limits (v0.1)

  • plugin list is available, but end-to-end ingestion/chunking pipeline is not finalized.
  • collection list sorts by name and applies limit/offset in CLI code for deterministic output.
  • document delete --where may return deleted_count: null when backend delete count is unavailable.

Troubleshooting

  • No such option: --output at root:
    • --output must be set on the command itself, for example server version --output json.
  • backend: embedding function unavailable... on query search:
    • configure an embedding function on the target collection/backend.
  • Collection not found: <name>:
    • check backend selection (local vs http) and collection name.

Contributing

See CONTRIBUTING.md for setup, quality checks, and pull request rules.