Unable to Add Data to Existing Qdrant Collection on Self-Hosted Instance #4221

Open
zainulabd786 opened this issue May 13, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@zainulabd786

zainulabd786 commented May 13, 2024

Current Behavior

I am experiencing an issue with adding data to an existing collection in my Qdrant vector database, which is running on a self-hosted instance. The collection contains a significant amount of data, and everything was functioning correctly until recently.

Collection Information:

{
  "status": "green",
  "optimizer_status": "ok",
  "vectors_count": 5633,
  "indexed_vectors_count": 5061,
  "points_count": 5632,
  "segments_count": 2,
  "config": {
    "params": {
      "vectors": {
        "size": 3072,
        "distance": "Cosine"
      },
      "shard_number": 1,
      "replication_factor": 1,
      "write_consistency_factor": 1,
      "on_disk_payload": true
    },
    "hnsw_config": {
      "m": 16,
      "ef_construct": 100,
      "full_scan_threshold": 10000,
      "max_indexing_threads": 0,
      "on_disk": false
    },
    "optimizer_config": {
      "deleted_threshold": 0.2,
      "vacuum_min_vector_number": 1000,
      "default_segment_number": 0,
      "max_segment_size": null,
      "memmap_threshold": null,
      "indexing_threshold": 20000,
      "flush_interval_sec": 5,
      "max_optimization_threads": 1
    },
    "wal_config": {
      "wal_capacity_mb": 32,
      "wal_segments_ahead": 0
    },
    "quantization_config": null
  },
  "payload_schema": {
    "metadata.chat_id": {
      "data_type": "keyword",
      "points": 0
    },
    "metadata.user_id": {
      "data_type": "keyword",
      "points": 5632
    }
  }
}

I am attempting to add a new point to the collection using the following request:
Add Data Request:

PUT /collections/{collection_name}/points
{
  "points": [
    {
      "id": 1,
      "payload": {
        "color": "red"
      },
      "vector": [
        -0.037950132,
        -0.00005226054,
        -0.0028657909,
        ...
      ]
    }
  ]
}

Response:

{
  "result": {
    "operation_id": 751,
    "status": "acknowledged"
  },
  "status": "ok",
  "time": 0.000190275
}

Despite receiving an "acknowledged" status, the data does not appear to be added to the collection. When I attempt to retrieve the data, the response indicates that the data is not available.
Retrieve Request:

POST /collections/{collection_name}/points
{
  "ids": [1]
}

Retrieve Response:

{
  "result": [],
  "status": "ok",
  "time": 0.000027523
}

Steps to Reproduce

  1. Execute Add Data Request.
  2. Execute Retrieve Request with the id supplied in Add Data Request.

Expected Behavior

The data should be added to the collection.

Possible Solution

Context (Environment)

I have a Retrieval-Augmented Generation (RAG) application that relies on indexing data into Qdrant. The inability to add new data to the existing collection has resulted in blocking a critical feature of my application. This feature is essential for ensuring that my application can retrieve and utilize the most recent and relevant data efficiently. I am aiming to restore the functionality of data indexing to maintain the application's performance and reliability.

Environment:
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
QDRANT_INIT_FILE_PATH : /qdrant/init/.qdrant-initialized

Detailed Description

Possible Implementation

@zainulabd786 added the bug label (Something isn't working) on May 13, 2024
@generall
Member

Hey @zainulabd786, could you please try inserting the data with ?wait=true? It should give you more verbose output.

@zainulabd786
Author

@generall Thanks for your suggestion. I tried PUT /collections/review_docs_oai/points?wait=true and got the following response:

{
  "result": {
    "operation_id": 755,
    "status": "completed"
  },
  "status": "ok",
  "time": 0.002304571
}

But the retrieval call still returns no data.
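For reference, the two update responses seen so far differ in their `result.status`: without `wait`, Qdrant replies once the operation has been received ("acknowledged"); with `wait=true`, it replies after the operation has been applied ("completed"). A small sketch of how a client might distinguish them follows; the function names are hypothetical.

```python
from urllib.parse import urlencode


def points_path(collection, wait=False):
    """Path for the points endpoint, optionally asking the server to wait."""
    path = f"/collections/{collection}/points"
    if wait:
        path += "?" + urlencode({"wait": "true"})
    return path


def update_was_applied(response):
    """True only when the server reports the update as completed."""
    return (
        response.get("status") == "ok"
        and response.get("result", {}).get("status") == "completed"
    )
```

With the responses quoted in this thread, `update_was_applied` would be false for operation 751 and true for operation 755, which is what makes the empty retrieve result surprising.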

@timvisee
Member

I've not seen this behavior before.

Sorry if this is obvious, but are you using review_docs_oai in exactly the same way on all requests? Also, are you running multiple instances with a load balancer in front?

@zainulabd786
Author

@timvisee Yes, review_docs_oai is used in exactly the same way; I verified that. Also, there's no load balancer, it's just one pod.

@timvisee
Member

timvisee commented May 13, 2024

Thank you for confirming.

I cannot reproduce this locally with your examples above. Would you mind checking whether you see the same behavior in a new collection, or even on a separate instance?

I do get points back without vectors, though. That is expected behavior: you can include vectors by setting with_vector: true. That doesn't solve the issue you reported, however.

@zain-at-edensolutions

When I tried the same PUT request with a different collection, it worked as expected. The issue unexpectedly arose with a specific collection.
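One way to set up such a comparison is to create a scratch collection with the same vector parameters as the affected one. The sketch below builds a create-collection request (PUT /collections/{name}) from the collection information shown at the top of this issue; `build_create_collection` and the scratch collection name are assumptions for illustration.

```python
def build_create_collection(new_name, collection_info):
    """Build a create-collection request that copies size and distance.

    `collection_info` is the collection description as shown in the issue
    (the object containing "config" -> "params" -> "vectors").
    """
    vectors = collection_info["config"]["params"]["vectors"]
    body = {"vectors": {"size": vectors["size"], "distance": vectors["distance"]}}
    return "PUT", f"/collections/{new_name}", body
```

Only the vector size and distance are copied here; HNSW, optimizer, and WAL settings are left at server defaults, so this is not an exact clone of the original collection.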

@timvisee
Member

timvisee commented May 14, 2024

Thank you for confirming. That's weird!

Would it be possible to send the collection over to us for analysis?

@zain-at-edensolutions

Sure. Do you want me to take a snapshot and share it with you?

@timvisee
Member

Yes, that should be good enough. If that doesn't work out for some reason, we might want to copy the whole file tree containing the collection.
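For anyone following along, a collection snapshot can be requested through the REST API (POST /collections/{name}/snapshots) and then downloaded by name. The sketch below only builds the request paths; the helper names are hypothetical, and the response shape (`result.name`) is assumed from the snapshot API rather than shown in this thread.

```python
from urllib.parse import quote


def build_create_snapshot(collection):
    """Request that asks the server to create a snapshot of the collection."""
    return "POST", f"/collections/{collection}/snapshots"


def snapshot_download_path(collection, create_response):
    """Path for downloading the snapshot named in the create response."""
    name = create_response["result"]["name"]  # assumed response field
    return f"/collections/{collection}/snapshots/{quote(name)}"
```

The downloaded file can then be shared or uploaded to another instance for analysis.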

@timvisee
Member

@zain-at-edensolutions @zainulabd786 I cannot reproduce this problem with the collection you provided.

PUT /collections/review_docs_oai/points
{
  "points": [
    {
      "id": 1,
      "payload": {
        "color": "red"
      },
      "vector": {}
    }
  ]
}

POST /collections/review_docs_oai/points
{
  "ids": [1],
  "with_vector": true
}

Gives me:

{
  "result": [
    {
      "id": 1,
      "payload": null,
      "vector": {}
    }
  ],
  "status": "ok",
  "time": 0.000277937
}

@zain-at-edensolutions

That's strange. I tried the same, but no luck. I don't even see any errors in my instance logs.

@zainulabd786
Author

@timvisee I created a cluster on Qdrant cloud and uploaded my collection snapshot. I am able to reproduce the issue on cloud.
https://a0afc391-e8c8-44b4-8b73-f4ff209139fb.us-east4-0.gcp.cloud.qdrant.io:6333/dashboard

I'm happy to share the API key by email if you'd like.

@zain-at-edensolutions

@timvisee I shared the API key by email. Could you please give it a try? You should be able to reproduce the issue I am facing.

@generall
Member

Hey @zain-at-edensolutions, @zainulabd786, I noticed that you are using the old v1.7.2.

The problem you are reporting is indeed reproducible on this version, but not on the latest. Could you please try to upgrade?
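To check which version an instance is running, the root endpoint (GET /) returns a small JSON document that includes a `version` field. The sketch below compares that value against a caller-supplied minimum; the thread does not state which release contains the fix, so the minimum is deliberately left as a parameter, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn "1.7.2" or "v1.7.2" into (1, 7, 2); assumes plain numeric parts."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def version_from_root(root_response):
    """Extract the version string from the reply to GET /."""
    return root_response["version"]


def is_older_than(running, minimum):
    """True when the running version sorts before the required minimum."""
    return parse_version(running) < parse_version(minimum)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.9.0".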
