feat: index data to Zilliz and move sparse embedding to the cloud
Indexing and Cloud-based Sparse Embedding
Changes Overview
This PR introduces the following changes to data indexing and embedding handling:
- Add support for indexing data directly to Zilliz
- Remove local sparse embedding logic
- Generate sparse embeddings entirely in the cloud
Detailed Changes
1. Zilliz Integration
- Implemented indexing of data directly into the Zilliz vector database
- Added connection handling and the API calls needed to upload vectors
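The upload path described above can be sketched as batched inserts. This is a minimal illustration, not the code in this PR: it assumes a client exposing `insert(collection_name=..., data=[...])` with row dicts, as pymilvus `MilvusClient.insert` does, and the names `index_records`, `batched`, and `RecordingClient` are hypothetical. The batch size of 500 is an arbitrary placeholder.

```python
from typing import Any, Dict, Iterable, Iterator, List, Protocol


class InsertClient(Protocol):
    """Assumed client shape: anything with insert(collection_name, data),
    for example pymilvus MilvusClient connected to a Zilliz cluster."""

    def insert(self, collection_name: str, data: List[Dict[str, Any]]) -> Any: ...


def batched(records: Iterable[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def index_records(client: InsertClient, collection_name: str,
                  records: Iterable[Dict[str, Any]], batch_size: int = 500) -> int:
    """Upload records batch by batch and return how many rows were sent."""
    sent = 0
    for batch in batched(records, batch_size):
        client.insert(collection_name=collection_name, data=batch)
        sent += len(batch)
    return sent


class RecordingClient:
    """Hypothetical offline stand-in that records calls instead of uploading."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def insert(self, collection_name: str, data: List[Dict[str, Any]]) -> Dict[str, int]:
        self.calls.append((collection_name, len(data)))
        return {"insert_count": len(data)}
```

With five records and `batch_size=2`, `index_records` issues three insert calls of sizes 2, 2 and 1. Batching keeps each request bounded in size; the right limit depends on the cluster's request-size constraints.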
2. Remove Local Sparse Embedding
- Deleted local sparse embedding logic and related files/modules
- Moved sparse embedding generation fully to cloud services
3. Cloud-based Sparse Embedding
- Integrated an external API/service for sparse vector generation
- Ensured compatibility with existing dense embedding pipelines
- Improved scalability and reduced local computation requirements
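One way to keep cloud-generated sparse vectors compatible with an existing dense pipeline is to attach them as an extra field on each record without touching the dense fields. The sketch below assumes a hypothetical `embed_sparse` callable wrapping the cloud service (texts in, one sparse vector per text out) and represents sparse vectors as `{dimension_index: weight}` dicts, the form pymilvus accepts for sparse vector fields; the field names `text` and `sparse` are placeholders, not taken from this PR.

```python
from typing import Any, Callable, Dict, List, Sequence

SparseVector = Dict[int, float]
# Hypothetical signature for the cloud sparse-embedding call.
SparseEmbedFn = Callable[[Sequence[str]], List[SparseVector]]


def attach_sparse(records: List[Dict[str, Any]], embed_sparse: SparseEmbedFn,
                  text_field: str = "text", sparse_field: str = "sparse") -> List[Dict[str, Any]]:
    """Return copies of records with a sparse vector added; dense fields are left as-is."""
    texts = [record[text_field] for record in records]
    vectors = embed_sparse(texts)  # one remote call for the whole batch
    if len(vectors) != len(records):
        raise ValueError(f"expected {len(records)} sparse vectors, got {len(vectors)}")
    enriched = []
    for record, vector in zip(records, vectors):
        # Drop zero weights and normalize key/value types.
        cleaned = {int(i): float(w) for i, w in vector.items() if w != 0}
        # Copy instead of mutating so the dense pipeline's records stay unchanged.
        enriched.append({**record, sparse_field: cleaned})
    return enriched
```

Because the function only adds a field, the output can be passed straight to the same insert path used for dense-only records, provided the collection schema declares the sparse field.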
Testing
- Verified successful data indexing to Zilliz
- Tested cloud embedding integration with sample data
- Ran the full pipeline to validate end-to-end indexing
Cleanup Verification
- Confirmed that no remaining code references the removed embedding modules
- Verified that the new Zilliz indexing logic works as intended
- Checked logs and cloud responses for errors and consistency
Edited by Bùi Minh Quân