Feature Request: Support Vector Type #1734

Open
7flash opened this issue Jan 12, 2024 · 9 comments

7flash commented Jan 12, 2024

Cassandra has introduced a vector type, and the Python driver already supports it. I hope it can be added to gocql as well, which would resolve this issue: datastax/gocql-astra#17 (comment)

@jfleming-ic
Contributor

We're also keen to see vector support in gocql, and I think our customers would love to see it as well. Any news on this feature?

nkev commented Feb 28, 2024

Are there plans to implement the many new features in Cassandra 5?

@martin-sucha
Contributor

@nkev Personally, I don't plan to work on Cassandra 5 support; if anyone else wants to, feel free. See a more detailed response in the mailing list.

nkev commented Feb 29, 2024

@martin-sucha Thanks for the update. Let's hope a gopher (or a few) with a deep understanding of C* puts their hand up.

@tengu-alt

Hello! I will try to handle it.

tengu-alt commented May 22, 2024

> Hello! I will try to handle it.

During the implementation of the vector type support I found several issues:

  • Cassandra implements the vector type as a custom type rather than as a native collection type.
  • On SELECT, the data is deserialized differently because of restrictions on vector element length that the official Cassandra documentation does not mention. These restrictions also cause errors when I select values that are longer than Cassandra allows. I reproduced this via cqlsh:
    CREATE TABLE example.vectors (id text, words vector<text, 3>, PRIMARY KEY (id));
    INSERT INTO vectors (id, words ) VALUES ('id', ['AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB','2','1']);
    cqlsh:example> SELECT * FROM vectors ;
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 767, in recv_results_rows
    self.parsed_rows = [decode_row(row) for row in rows]
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 767, in <listcomp>
    self.parsed_rows = [decode_row(row) for row in rows]
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 764, in decode_row
    return tuple(decode_val(val, col_md, col_desc) for val, col_md, col_desc in zip(row, column_metadata, col_descs))
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 764, in <genexpr>
    return tuple(decode_val(val, col_md, col_desc) for val, col_md, col_desc in zip(row, column_metadata, col_descs))
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 761, in decode_val
    return col_type.from_binary(raw_bytes, protocol_version)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 315, in from_binary
    return cls.deserialize(byts, protocol_version)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 1445, in deserialize
    return [cls.subtype.deserialize(byts[idx:idx + 4], protocol_version) for idx in indexes]
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 1445, in <listcomp>
    return [cls.subtype.deserialize(byts[idx:idx + 4], protocol_version) for idx in indexes]
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 769, in deserialize
    return byts.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 772, in recv_results_rows
    decode_val(val, col_md, col_desc)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 761, in decode_val
    return col_type.from_binary(raw_bytes, protocol_version)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 315, in from_binary
    return cls.deserialize(byts, protocol_version)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 1445, in deserialize
    return [cls.subtype.deserialize(byts[idx:idx + 4], protocol_version) for idx in indexes]
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 1445, in <listcomp>
    return [cls.subtype.deserialize(byts[idx:idx + 4], protocol_version) for idx in indexes]
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cqltypes.py", line 769, in deserialize
    return byts.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cassandra/bin/../pylib/cqlshlib/cqlshmain.py", line 990, in perform_simple_statement
    result = future.result()
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/cluster.py", line 4920, in result
    raise self._final_exception
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/connection.py", line 1229, in process_msg
    response = decoder(header.version, self.user_type_map, stream_id,
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 1208, in decode_message
    msg = msg_class.recv_body(body, protocol_version, user_type_map, result_metadata, cls.column_encryption_policy)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 745, in recv_body
    msg.recv(f, protocol_version, user_type_map, result_metadata, column_encryption_policy)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 731, in recv
    self.recv_results_rows(f, protocol_version, user_type_map, result_metadata, column_encryption_policy)
  File "/opt/cassandra/bin/../lib/cassandra-driver-internal-only-3.28.0.zip/cassandra-driver-3.28.0/cassandra/protocol.py", line 774, in recv_results_rows
    raise DriverException('Failed decoding result column "%s" of type %s: %s' % (col_md[2],
cassandra.DriverException: Failed decoding result column "words" of type org.apache.cassandra.db.marshal.VectorType<text, 3>: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
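
One possible reading of the traceback above, offered as a sketch rather than a confirmed diagnosis: the driver code shown slices the value into fixed 4-byte elements (`byts[idx:idx + 4]`), while a `vector<text, 3>` value appears to carry variable-length elements. The Python below assumes that each element is prefixed with its length as an unsigned vint, which would explain the leading `0x81` byte. All function names are hypothetical, and the 300-character string only stands in for the long value used in the INSERT.

```python
def encode_unsigned_vint(value):
    """Encode a non-negative int as an unsigned vint (assumed wire format).

    The number of leading 1 bits in the first byte gives the number of
    extra bytes that follow.
    """
    if value < 0 or value >= 1 << 64:
        raise ValueError("value out of range for an unsigned vint")
    extra = 0
    while extra < 8 and value >= 1 << (7 + 7 * extra):
        extra += 1
    raw = value.to_bytes(extra + 1, "big")
    prefix = (0xFF << (8 - extra)) & 0xFF
    return bytes([raw[0] | prefix]) + raw[1:]


def decode_unsigned_vint(buf, pos):
    """Return (value, next_position) for the unsigned vint starting at pos."""
    first = buf[pos]
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1
    value = first & (0xFF >> (extra + 1))
    for i in range(extra):
        value = (value << 8) | buf[pos + 1 + i]
    return value, pos + 1 + extra


def encode_text_vector(items):
    """Hypothetical encoder: each element is vint(length) followed by UTF-8 bytes."""
    out = bytearray()
    for item in items:
        data = item.encode("utf-8")
        out += encode_unsigned_vint(len(data)) + data
    return bytes(out)


def decode_fixed4(buf, size):
    """Mirror the slicing visible in the traceback: 4 bytes per element."""
    return [buf[i * 4:i * 4 + 4].decode("utf-8") for i in range(size)]


def decode_text_vector(buf, size):
    """Decode by reading each element's length prefix first."""
    pos, out = 0, []
    for _ in range(size):
        length, pos = decode_unsigned_vint(buf, pos)
        out.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return out


# Stand-in for the row inserted above; 300 is an arbitrary length above 255.
payload = encode_text_vector(["A" * 300, "2", "1"])

try:
    decode_fixed4(payload, 3)
except UnicodeDecodeError as exc:
    print("fixed-width decode failed:", exc)

print("length-prefixed decode:", [len(s) for s in decode_text_vector(payload, 3)])
```

Under this assumption the failure depends on the length prefix rather than on a server-side limit: lengths below 128 encode as a single byte that happens to be valid UTF-8, so fixed-width slicing returns garbage silently, while longer elements produce a first byte of `0x80` or above and raise the error seen above.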

@martin-sucha
Contributor

> The Vector type is implemented in Cassandra as not the native collection type but the custom type.

Could this be because gocql uses protocol v4, which does not have native support for the vector type, while protocol v5 does?

@martin-sucha
Contributor

Please open a Cassandra issue about the length issue.

@tengu-alt

> > The Vector type is implemented in Cassandra as not the native collection type but the custom type.
>
> Could this be because gocql uses protocol v4, which does not have native support for the vector type, while protocol v5 does?

Exactly!
I will put this on hold until protocol v5 support is available.
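
For anyone picking this up later: because the column is reported as a custom type, a driver has to recover the element type and dimension from the type name itself. Below is a minimal Python sketch of that step, assuming the name has the form shown in the error message above (`org.apache.cassandra.db.marshal.VectorType<text, 3>`). The form a server actually sends on the wire may differ, and `parse_vector_type` is a hypothetical helper, not part of any driver.

```python
import re

# Matches the type name as printed in the error message above (an assumption
# about the format; the on-wire class name may be spelled differently).
VECTOR_RE = re.compile(
    r"^org\.apache\.cassandra\.db\.marshal\.VectorType"
    r"<\s*(?P<subtype>.+?)\s*,\s*(?P<dim>\d+)\s*>$"
)


def parse_vector_type(name):
    """Return (subtype, dimension) for a vector type name, else None."""
    match = VECTOR_RE.match(name.strip())
    if match is None:
        return None
    return match.group("subtype"), int(match.group("dim"))


parsed = parse_vector_type("org.apache.cassandra.db.marshal.VectorType<text, 3>")
print(parsed)
```

Returning `None` for anything that is not a vector lets a caller fall back to its existing handling of other custom types.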
