
feat: indexing and sparse

Dung Phan requested to merge feature/indexing-and-sparse-embeddings into develop

Indexing and Cloud-based Sparse Embedding

Changes Overview

This PR changes how data is indexed and where sparse embeddings are generated:

  1. Add support for indexing data directly to Zilliz
  2. Remove local sparse embedding logic
  3. Handle sparse embedding entirely on the cloud

Detailed Changes

1. Zilliz Integration

  • Implemented functionality to index data directly into the Zilliz vector database
  • Established connection handling and necessary API calls for uploading vectors
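
As an illustration of the upload step, here is a minimal sketch of batched inserts. It is not the code from this MR: the helper names `batched` and `index_rows`, the batch size, and the collection name `docs` are assumptions made for the example. The only client call it relies on is `insert(collection_name=..., data=...)`, which is the shape of `pymilvus.MilvusClient.insert`.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_rows(
    client: Any,
    collection_name: str,
    rows: Iterable[Dict[str, Any]],
    batch_size: int = 500,  # assumed value, tune to the row size
) -> int:
    """Insert rows batch by batch and return how many rows were sent.

    `client` is any object exposing insert(collection_name=..., data=...),
    for example a pymilvus MilvusClient connected to a Zilliz cluster.
    """
    sent = 0
    for chunk in batched(rows, batch_size):
        client.insert(collection_name=collection_name, data=chunk)
        sent += len(chunk)
    return sent


# Possible usage (endpoint, token and collection name are placeholders):
# from pymilvus import MilvusClient
# client = MilvusClient(uri="<cluster endpoint>", token="<api key>")
# index_rows(client, "docs", rows)
```

Passing the client in as a parameter keeps the batching logic testable without a live cluster.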

2. Remove Local Sparse Embedding

  • Deleted local sparse embedding logic and related files/modules
  • Moved sparse embedding generation fully to cloud services

3. Cloud-based Sparse Embedding

  • Integrated external API/service for sparse vector generation
  • Ensured compatibility with existing dense embedding pipelines
  • Improved scalability and reduced local computation requirements
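
To show how a cloud-generated sparse vector might sit alongside an existing dense vector, here is a hedged sketch. The response shape (parallel `indices` and `values` lists) and the field names `id`, `text`, `dense_vector`, and `sparse_vector` are assumptions for illustration, not the schema used in this MR. The `{dimension index: weight}` dictionary is one of the sparse vector formats Milvus accepts.

```python
from typing import Any, Dict, Sequence


def to_sparse_dict(indices: Sequence[int], values: Sequence[float]) -> Dict[int, float]:
    """Convert parallel index/value lists into a {index: weight} dict.

    Zero weights are dropped, since a sparse vector only stores non-zero entries.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return {int(i): float(v) for i, v in zip(indices, values) if v != 0.0}


def build_row(
    doc_id: int,
    text: str,
    dense: Sequence[float],
    sparse_response: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine a dense vector and a cloud sparse response into one row.

    `sparse_response` is assumed to look like
    {"indices": [...], "values": [...]}; the field names below are hypothetical.
    """
    return {
        "id": doc_id,
        "text": text,
        "dense_vector": list(dense),
        "sparse_vector": to_sparse_dict(
            sparse_response["indices"], sparse_response["values"]
        ),
    }
```

If the sparse vectors are instead produced inside the database (for example by a server-side function over the text field), the row would carry only the raw text and the dense vector.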

Testing

  • Verified successful data indexing to Zilliz
  • Tested cloud embedding integration with sample data
  • Ran full pipeline to validate end-to-end indexing

Cleanup Verification

  • Ensured all removed embedding code is unused and unreferenced
  • Verified new Zilliz indexing logic works as intended
  • Checked logs and cloud responses for consistency and errors
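
One way to check that removed code is unreferenced is a simple text scan, sketched below. The function name `find_references` and the module name `sparse_embedding` in the usage comment are hypothetical; the actual removed modules are not named in this description. The scan matches whole words, so it also flags mentions in comments and strings, which is usually what you want during cleanup.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_references(root: str, module_name: str) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every whole-word mention
    of `module_name` in the .py files under `root`."""
    pattern = re.compile(rf"\b{re.escape(module_name)}\b")
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Possible usage (module name is a placeholder):
# for path, lineno, line in find_references(".", "sparse_embedding"):
#     print(f"{path}:{lineno}: {line}")
```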
Edited by Bùi Minh Quân
