consume document failed - Collection field dim is 1024, but entities field dim is 0 #4199

Open
4 tasks done
kerlion opened this issue May 9, 2024 · 1 comment
Labels
🐞 bug Something isn't working

Comments

@kerlion
Contributor

kerlion commented May 9, 2024

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.6

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Knowledge -> Upload documents
-> Chunk settings -> Custom
Segment identifier: \n\n
Maximum chunk length: 500
Chunk overlap: 50
Text preprocessing rules -> check "Replace consecutive spaces, newlines and tabs"

I suspect it is related to "Replace consecutive spaces, newlines and tabs"

It works fine in 0.6.4.
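To illustrate the suspicion, here is a minimal sketch of how whitespace cleanup combined with a "\n\n" segment identifier could leave a blank chunk behind. Both `clean_whitespace` and `split_and_clean` are hypothetical stand-ins written for this report; they are not Dify's actual preprocessing code, and whether 0.6.6 behaves this way is unconfirmed.

```python
import re


def clean_whitespace(text: str) -> str:
    """Hypothetical stand-in for the 'replace consecutive spaces, newlines and tabs' rule."""
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse 3+ newlines into one blank line
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse runs of spaces and tabs
    return text


def split_and_clean(text: str, separator: str = "\n\n") -> list[str]:
    """Split on the segment identifier, then clean and strip each piece."""
    return [clean_whitespace(part).strip() for part in text.split(separator)]


def blank_chunk_indexes(chunks: list[str]) -> list[int]:
    """Return the positions of chunks that ended up empty."""
    return [i for i, chunk in enumerate(chunks) if not chunk]


# A paragraph consisting only of spaces and tabs sits between two real paragraphs.
sample = "First paragraph.\n\n \t \n\nSecond paragraph."
chunks = split_and_clean(sample)
print(chunks)
print(blank_chunk_indexes(chunks))
```

If a blank chunk like this were passed on to the embedding step, an empty vector for it would be one possible explanation for "entities field dim is 0".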

✔️ Expected Behavior

Embedding complete.

❌ Actual Behavior

reported error:
[2024-05-09 01:12:02,952: DEBUG/MainProcess] Prefix dict has been built successfully.
[2024-05-09 01:12:03,340: DEBUG/MainProcess] Created new connection using: 0ef89dbb6b6340f98a776bee7a1e3bea
[2024-05-09 01:12:03,351: DEBUG/MainProcess] Created new connection using: 4f2bd99b26314c989c3ae9d6454ecabf
[2024-05-09 01:12:03,371: ERROR/MainProcess] RPC error: [insert_rows], <ParamError: (code=1, message=Collection field dim is 1024, but entities field dim is 0)>, <Time:{'RPC start': '2024-05-09 01:12:03.366004', 'RPC error': '2024-05-09 01:12:03.371536'}>
[2024-05-09 01:12:03,371: ERROR/MainProcess] Failed to insert batch starting at entity: 0/10
[2024-05-09 01:12:03,371: ERROR/MainProcess] Failed to insert batch starting at entity: 0/10
[2024-05-09 01:12:03,378: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 73, in run
    self._load(
  File "/app/api/core/indexing_runner.py", line 677, in _load
    tokens += future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/api/core/indexing_runner.py", line 732, in _process_chunk
    index_processor.load(dataset, chunk_documents, with_keywords=False)
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 60, in load
    vector.create(documents)
  File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 173, in create
    self._vector_processor.create(
  File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 68, in create
    self.add_texts(texts, embeddings)
  File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 94, in add_texts
    raise e
  File "/app/api/core/rag/datasource/vdb/milvus/milvus_vector.py", line 88, in add_texts
    ids = self._client.insert(collection_name=self._collection_name, data=batch_insert_list)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 206, in insert
    raise ex from ex
  File "/usr/local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 198, in insert
    res = conn.insert_rows(collection_name, insert_batch, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 127, in handler
    raise e from e
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 123, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 162, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 102, in handler
    raise e from e
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 68, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 501, in insert_rows
    request = self._prepare_row_insert_request(
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 482, in _prepare_row_insert_request
    return Prepare.row_insert_param(
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 422, in row_insert_param
    return cls._parse_row_request(request, fields_info, enable_dynamic, entities)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 394, in _parse_row_request
    _, _, auto_id_loc = traverse_rows_info(fields_info, entities)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/utils.py", line 215, in traverse_rows_info
    raise ParamError(
pymilvus.exceptions.ParamError: <ParamError: (code=1, message=Collection field dim is 1024, but entities field dim is 0)>


dosubot bot commented May 9, 2024

The error you're encountering, "Collection field dim is 1024, but entities field dim is 0," indicates a dimension mismatch: the Milvus collection expects a 1024-dimensional vector for each entity, but at least one entity in the inserted batch carries a vector of length 0. This suggests the embedding for that entry is empty, for example because the underlying chunk was empty or improperly formatted.
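As a rough diagnostic, a check like the one below could be run on a batch of embeddings before it is handed to the vector store. It is only a sketch: `find_dim_mismatches` is a hypothetical helper, not part of Dify or pymilvus, and the value 1024 is taken from the error message above.

```python
from typing import Sequence


def find_dim_mismatches(
    embeddings: Sequence[Sequence[float]], expected_dim: int
) -> list[tuple[int, int]]:
    """Return (index, actual_dim) for every vector whose length differs from expected_dim."""
    return [
        (i, len(vec))
        for i, vec in enumerate(embeddings)
        if len(vec) != expected_dim
    ]


EXPECTED_DIM = 1024  # collection field dim reported in the error message

# Illustrative batch: the second vector is empty, as the error suggests.
embeddings = [[0.0] * EXPECTED_DIM, [], [0.1] * EXPECTED_DIM]
mismatches = find_dim_mismatches(embeddings, EXPECTED_DIM)
print(mismatches)
```

Any index reported here points at the chunk whose text is worth inspecting first.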

This issue could be related to the document processing and embedding phase, where the "Replace consecutive spaces, newlines, and tabs" setting might be affecting the output. Since you've identified a change in behavior between versions 0.6.4 and 0.6.6, it's possible that adjustments or updates in the newer version could be impacting how documents are processed and prepared for embedding.

To address this, please ensure that the documents you're trying to upload are correctly formatted and contain the expected content. Also, reviewing the custom chunk settings and how the "Replace consecutive spaces, newlines, and tabs" option is configured could help identify if there's a specific setting causing the issue. Adjusting these settings or testing with a simpler document might help isolate the problem.

If the issue persists, providing more details about the documents (format, size, content) and the exact settings used for chunking and text preprocessing could offer further insights into what might be going wrong.


@dosubot dosubot bot added the 🐞 bug Something isn't working label May 9, 2024