[ENH] [MEGAPR] Blockfile refactor, compactor job, record segment (#2052)
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - /
 - New functionality
	 - This PR contains a very large refactor of the blockfile code to make it
strongly typed and zero-copy. It is a lot of prototype-quality code. I
am getting it merged to test on staging and will clean it all up
afterwards. We should not normalize this; it is an extreme deviation from
the norm.

## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None
HammadB authored May 1, 2024
1 parent f91ea3d commit eb18e5c
Showing 98 changed files with 7,807 additions and 4,176 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
-exclude: 'chromadb/proto/(chroma_pb2|coordinator_pb2|logservice_pb2)\.(py|pyi|py_grpc\.py)' # Generated files
+exclude: 'chromadb/proto/(chroma_pb2|coordinator_pb2|logservice_pb2|chroma_pb2_grpc|coordinator_pb2_grpc|logservice_pb2_grpc)\.(py|pyi)' # Generated files
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.5.0
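The effect of the `exclude` change can be checked with a short script. This is a minimal sketch: the two patterns are copied from the hunk, the helper `is_excluded` and the sample paths are illustrative, and it assumes the pattern is applied as an unanchored `re.search`, which is how pre-commit documents `files`/`exclude` matching.

```python
import re

# Patterns copied from the removed and added `exclude` lines.
OLD_EXCLUDE = r"chromadb/proto/(chroma_pb2|coordinator_pb2|logservice_pb2)\.(py|pyi|py_grpc\.py)"
NEW_EXCLUDE = (
    r"chromadb/proto/(chroma_pb2|coordinator_pb2|logservice_pb2"
    r"|chroma_pb2_grpc|coordinator_pb2_grpc|logservice_pb2_grpc)\.(py|pyi)"
)


def is_excluded(pattern: str, path: str) -> bool:
    """Hypothetical helper: True if the path would be skipped by the hooks."""
    # Assumption: unanchored search, as pre-commit does for `exclude`.
    return re.search(pattern, path) is not None


# Illustrative paths; the _grpc file is the interesting case.
sample_paths = [
    "chromadb/proto/chroma_pb2.py",
    "chromadb/proto/chroma_pb2.pyi",
    "chromadb/proto/chroma_pb2_grpc.py",
]

for path in sample_paths:
    print(path, is_excluded(OLD_EXCLUDE, path), is_excluded(NEW_EXCLUDE, path))
```

Under that assumption, the old pattern never matches a `*_pb2_grpc.py` file, because it requires a literal `.` directly after `pb2`; the new pattern lists the `_grpc` module names explicitly.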
4 changes: 2 additions & 2 deletions chromadb/api/segment.py
@@ -692,9 +692,8 @@ def _query(
         for embedding in query_embeddings:
             self._validate_dimension(coll, len(embedding), update=False)

-        metadata_reader = self._manager.get_segment(collection_id, MetadataReader)
-
         if where or where_document:
+            metadata_reader = self._manager.get_segment(collection_id, MetadataReader)
             records = metadata_reader.get_metadata(
                 where=where, where_document=where_document
             )
@@ -729,6 +728,7 @@ def _query(
             all_ids: Set[str] = set()
             for id_list in ids:
                 all_ids.update(id_list)
+            metadata_reader = self._manager.get_segment(collection_id, MetadataReader)
             records = metadata_reader.get_metadata(ids=list(all_ids))
             metadata_by_id = {r["id"]: r["metadata"] for r in records}
             for id_list in ids:
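The two hunks in `chromadb/api/segment.py` move the metadata reader lookup from an unconditional call into the branches that use it. The PR does not state the motivation, so the following is only a sketch of the resulting pattern: `CountingManager`, `FakeMetadataReader`, and `query` are hypothetical stand-ins, not Chroma's classes, and the `ids` list is a placeholder for vector search results.

```python
from typing import Dict, List, Optional


class FakeMetadataReader:
    """Hypothetical stand-in for a metadata segment reader."""

    def get_metadata(
        self,
        where: Optional[Dict[str, str]] = None,
        ids: Optional[List[str]] = None,
    ) -> List[Dict[str, object]]:
        return [{"id": i, "metadata": {}} for i in (ids or [])]


class CountingManager:
    """Hypothetical stand-in for the segment manager; counts reader lookups."""

    def __init__(self) -> None:
        self.lookups = 0

    def get_segment(self, collection_id: str) -> FakeMetadataReader:
        self.lookups += 1
        return FakeMetadataReader()


def query(
    manager: CountingManager,
    collection_id: str,
    where: Optional[Dict[str, str]] = None,
    include_metadata: bool = False,
) -> Dict[str, object]:
    ids = ["a", "b"]  # placeholder for the ids a vector search would return
    if where:
        # The reader is fetched only when a filter has to be evaluated.
        reader = manager.get_segment(collection_id)
        reader.get_metadata(where=where)
    if include_metadata:
        # Fetched again here, mirroring the second hunk.
        reader = manager.get_segment(collection_id)
        records = reader.get_metadata(ids=ids)
        return {r["id"]: r["metadata"] for r in records}
    return {i: None for i in ids}
```

With this shape, a query that has no filter and requests no metadata never touches the metadata segment, while a query that needs both looks the reader up twice.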
45 changes: 22 additions & 23 deletions chromadb/proto/chroma_pb2.py

Large diffs are not rendered by default.

88 changes: 46 additions & 42 deletions chromadb/proto/chroma_pb2.pyi

Large diffs are not rendered by default.

208 changes: 91 additions & 117 deletions chromadb/proto/chroma_pb2_grpc.py

Some generated files are not rendered by default.

6 changes: 4 additions & 2 deletions chromadb/proto/convert.py
@@ -21,8 +21,6 @@


 # TODO: Unit tests for this file, handling optional states etc
-
-
 def to_proto_vector(vector: Vector, encoding: ScalarEncoding) -> proto.Vector:
     if encoding == ScalarEncoding.FLOAT32:
         as_bytes = array.array("f", vector).tobytes()
@@ -158,6 +156,8 @@ def from_proto_segment_scope(segment_scope: proto.SegmentScope) -> SegmentScope:
         return SegmentScope.VECTOR
     elif segment_scope == proto.SegmentScope.METADATA:
         return SegmentScope.METADATA
+    elif segment_scope == proto.SegmentScope.RECORD:
+        return SegmentScope.RECORD
     else:
         raise RuntimeError(f"Unknown segment scope {segment_scope}")

@@ -167,6 +167,8 @@ def to_proto_segment_scope(segment_scope: SegmentScope) -> proto.SegmentScope:
         return proto.SegmentScope.VECTOR
     elif segment_scope == SegmentScope.METADATA:
         return proto.SegmentScope.METADATA
+    elif segment_scope == SegmentScope.RECORD:
+        return proto.SegmentScope.RECORD
     else:
         raise RuntimeError(f"Unknown segment scope {segment_scope}")

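The file's TODO asks for unit tests, and the new `RECORD` branches in both converters are a natural candidate for a round-trip check. A minimal sketch follows, assuming stand-in enums: `SegmentScope` and `ProtoSegmentScope` below are hypothetical substitutes for the real Chroma and protobuf types, and their values are made up for illustration.

```python
from enum import Enum


class SegmentScope(Enum):
    """Stand-in for the Python-side scope enum (values are illustrative)."""
    VECTOR = "VECTOR"
    METADATA = "METADATA"
    RECORD = "RECORD"


class ProtoSegmentScope(Enum):
    """Stand-in for proto.SegmentScope (numeric values are assumptions)."""
    VECTOR = 0
    METADATA = 1
    RECORD = 2


def to_proto_segment_scope(scope: SegmentScope) -> ProtoSegmentScope:
    # Same if/elif shape as the converter in the diff.
    if scope == SegmentScope.VECTOR:
        return ProtoSegmentScope.VECTOR
    elif scope == SegmentScope.METADATA:
        return ProtoSegmentScope.METADATA
    elif scope == SegmentScope.RECORD:
        return ProtoSegmentScope.RECORD
    else:
        raise RuntimeError(f"Unknown segment scope {scope}")


def from_proto_segment_scope(scope: ProtoSegmentScope) -> SegmentScope:
    if scope == ProtoSegmentScope.VECTOR:
        return SegmentScope.VECTOR
    elif scope == ProtoSegmentScope.METADATA:
        return SegmentScope.METADATA
    elif scope == ProtoSegmentScope.RECORD:
        return SegmentScope.RECORD
    else:
        raise RuntimeError(f"Unknown segment scope {scope}")


def check_round_trip() -> None:
    """Every scope must survive a trip through the proto form and back."""
    for scope in SegmentScope:
        assert from_proto_segment_scope(to_proto_segment_scope(scope)) is scope


check_round_trip()
```

Iterating over the enum rather than listing members by hand means a scope added later without a matching branch raises `RuntimeError` in the check instead of being missed silently.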
9 changes: 4 additions & 5 deletions chromadb/proto/coordinator_pb2.py

Some generated files are not rendered by default.
