No description
- Python 97.9%
- Makefile 2.1%
This change improves the traceability of added documents by automatically enriching their metadata with details about their origin. This ensures that regardless of how a document is added (via text input or file upload), its source type and, if applicable, its source name are consistently recorded. This prevents ambiguity about the document's provenance, which could otherwise lead to difficulties in debugging or auditing document ingestion processes. - Introduce \_source_type metadata: Automatically adds "\_source_type" to document metadata, indicating whether the document originated from "file" or "text" input. - Include \_source_name for file inputs: When a document is added from a file, "\_source_name" metadata is added, storing the path of the source file. - Prioritize generated metadata: Ensures that automatically generated source metadata (\_source_type, \_source_name) overrides any user-provided metadata with the same keys, maintaining data integrity. - Return metadata in add document response: The svc_add_document function now includes the full metadata dictionary in its return payload, allowing immediate verification of the added metadata. - Update query service to include all document fields: The search_documents function now explicitly requests "documents", "metadatas", and "distances" in its include list, ensuring comprehensive results. - Update FakeCollection query method: The test utility FakeCollection.query method now accepts an 'include' parameter to align with the updated query service. - Add comprehensive tests for origin metadata: New tests verify the correct population and precedence of \_source_type and \_source_name metadata for both text and file-based document additions. Signed-off-by: Glenn <glenux@glenux.net> |
||
|---|---|---|
| docs | ||
| src/chroma_toolkit | ||
| tests | ||
| .gitignore | ||
| CONTRIBUTING.md | ||
| Makefile | ||
| pyproject.toml | ||
| README.md | ||
| uv.lock | ||
chroma-toolkit
Python CLI toolkit for ChromaDB operations, with a stable output contract for automation.
Project health
- Status:
v0.1.0. - Python support:
>=3.10. - Backends:
local(PersistentClient) andhttp(HttpClient). - Stability target: command behavior and
jsonoutput shape should stay script-friendly.
What this tool is for
chroma-toolkit helps you manage Chroma resources from the command line:
- server inspection (
heartbeat,version) - collection lifecycle (
list,create,info,delete,update) - document lifecycle (
add,get,list,delete,count) - semantic search (
query search) - plugin registry inspection (
plugin list)
Prerequisites
- Python
>=3.10 uvinstalled
Installation
uv sync
Run the CLI with:
uv run chroma-toolkit --help
Quickstart (local backend)
uv run chroma-toolkit --path ./.chroma server heartbeat --output json
uv run chroma-toolkit --path ./.chroma collection create --name hub --get-or-create --output json
uv run chroma-toolkit --path ./.chroma document add --collection hub --id doc-1 --document "How to deploy" --output json
uv run chroma-toolkit --path ./.chroma document get --collection hub --id doc-1 --output json
uv run chroma-toolkit --path ./.chroma query search --collection hub --text "deploy" --n-results 3 --output json
uv run chroma-toolkit --path ./.chroma plugin list --output json
Quickstart (HTTP backend)
Use this when your Chroma server is running remotely or in another process.
uv run chroma-toolkit --backend http --host 127.0.0.1 --port 8000 server version --output json
uv run chroma-toolkit --backend http --host 127.0.0.1 --port 8000 collection list --output json
Command overview
Global options:
--path(default./.chroma)--backend {local,http}(defaultlocal)--host(default127.0.0.1, used byhttpbackend)--port(default8000, used byhttpbackend)
Command groups:
server:heartbeat,versioncollection:list,create,info,delete,updatedocument:add,get,list,delete,countquery:searchplugin:list
Output contract
--output is a command option (not a global option).
- Accepted formats:
yaml(default),json,toml --outputcan be provided only once per command
Examples:
uv run chroma-toolkit collection list --output yaml
uv run chroma-toolkit collection list --output json
uv run chroma-toolkit collection list --output toml
All formats use the same logical envelope:
- Success:
{"ok": true, "data": <payload>, "error": null} - Error:
{"ok": false, "data": null, "error": {"code": <int>, "message": <str>}}
Notes:
- In
toml,nullis rendered as the string"null". - Unknown backend/runtime failures are prefixed with
backend:.
Exit codes
0: success2: invalid CLI usage (Click)3: backend/runtime error4: resource not found5: validation error
Known limits (v0.1)
plugin listis available, but end-to-end ingestion/chunking pipeline is not finalized.collection listsorts by name and applieslimit/offsetin CLI code for deterministic output.document delete --wheremay returndeleted_count: nullwhen backend delete count is unavailable.
Troubleshooting
No such option: --outputat root:--outputmust be set on the command itself, for exampleserver version --output json.
backend: embedding function unavailable...onquery search:- configure an embedding function on the target collection/backend.
Collection not found: <name>:- check backend selection (
localvshttp) and collection name.
- check backend selection (
Contributing
See CONTRIBUTING.md for setup, quality checks, and pull request rules.