dbeaver/dbeaver#23390 Support REAL_VECTOR type in HANA plugin #23391

Open
wants to merge 6 commits into
base: devel

stefanuhrig
Contributor

The new vector data type REAL_VECTOR was introduced with HANA Cloud Database QRC 1/2024. Details about the new type are available in the SAP HANA Database Vector Engine Guide.

HANA's JDBC driver natively supports that type starting with version 2.21.5.

This change introduces a new value handler so that vectors are displayed like arrays. Furthermore, the column type modifiers are adapted to display vector dimension constraints.

@LonwoLonwo
Member

Hello @stefanuhrig

Thanks for your contribution.
We will look into it.

@LonwoLonwo linked an issue on Apr 11, 2024 that may be closed by this pull request
@LonwoLonwo
Member

Please use a meaningful branch name next time.

@LonwoLonwo
Member

Please provide more information for our QA team.
Screenshots of the working feature would also be very helpful.

@@ -72,6 +73,14 @@ protected DBPDataSourceInfo createDataSourceInfo(DBRProgressMonitor monitor, @No
return info;
}

@Override
public DBPDataKind resolveDataKind(String typeName, int valueType) {
if ("REAL_VECTOR".equalsIgnoreCase(typeName)) {
Member

Please create a constant in the HANAConstants class for "REAL_VECTOR".

@stefanuhrig
Contributor Author

@LonwoLonwo

Which additional info do you require?

You can find the full guide to the new HANA feature at https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/sap-hana-cloud-sap-hana-database-vector-engine-guide.

To summarize briefly:

A new datatype has been introduced into SAP HANA Cloud. This datatype has been designed for vector embeddings, i.e. high-dimensional vectors consisting of floating-point numbers. Vector embeddings play an important role in generative AI contexts.

Without this feature, vectors are displayed in their binary format in DBeaver and look like data garbage. With this feature, they are displayed like arrays, and the elements of the vectors can be edited, too.
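
To make the difference between the binary view and the array-like view concrete, here is a minimal sketch of turning a raw vector payload into an array-style string. It is not the code from this PR (which relies on the JDBC driver's native support). The class `VectorDisplaySketch` and its methods are hypothetical, and the byte layout (a 4-byte little-endian element count followed by 32-bit little-endian floats) is an assumption made only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of DBeaver or the HANA JDBC driver.
final class VectorDisplaySketch {

    // ASSUMPTION: payload = 4-byte little-endian element count,
    // followed by that many 32-bit little-endian floats.
    static float[] decode(byte[] raw) {
        if (raw.length < Integer.BYTES) {
            throw new IllegalArgumentException("Payload too short for a dimension header");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        int dim = buf.getInt();
        if (dim < 0 || buf.remaining() != (long) dim * Float.BYTES) {
            throw new IllegalArgumentException("Payload length does not match dimension " + dim);
        }
        float[] values = new float[dim];
        for (int i = 0; i < dim; i++) {
            values[i] = buf.getFloat();
        }
        return values;
    }

    // Inverse of decode, used here only to build sample input.
    static byte[] encode(float[] values) {
        ByteBuffer buf = ByteBuffer
                .allocate(Integer.BYTES + values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(values.length);
        for (float v : values) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // Array-like rendering, e.g. [1.0, 2.5, -3.0]
    static String toDisplayString(float[] values) {
        return Arrays.toString(values);
    }

    public static void main(String[] args) {
        byte[] raw = encode(new float[] {1.0f, 2.5f, -3.0f});
        System.out.println(toDisplayString(decode(raw))); // prints [1.0, 2.5, -3.0]
    }
}
```

The length check rejects payloads whose size does not match the declared dimension, so malformed input fails early rather than being displayed as a truncated vector.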

If you have any further questions or require any further information, please reach out.

Here is a screenshot of the new feature:

[screenshot: vectors]

This is how it looks without this new feature:

[screenshot: before]

@@ -33,6 +33,8 @@ public class HANAValueHandlerProvider implements DBDValueHandlerProvider {
public DBDValueHandler getValueHandler(DBPDataSource dataSource, DBDFormatSettings preferences,
DBSTypedObject typedObject) {
switch (typedObject.getTypeName()) {
case "REAL_VECTOR":
Member

You didn't add a constant here

Member

And in other places.

Contributor Author

@LonwoLonwo
Thanks for your feedback.

I did not use a constant here because there are hard-coded datatype names below.

I considered the following options:

  1. Use the constant and make the code look inconsistent.
  2. Introduce constants for ST_Geometry and ST_Point, which would add refactorings not related to this feature.
  3. Use the hard-coded datatype name.

I opted for 3 but can change it. What's your proposal?

Member

Please also add constants for ST_Geometry and ST_Point (option 2).
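
As a rough illustration of option 2, the sketch below uses named constants as `case` labels in a string `switch`. This works in Java because `static final String` fields initialized with literals are compile-time constants. The class `HanaTypeNamesSketch`, the method `handlerKindFor`, and the returned labels are hypothetical; the constant names only follow the naming pattern discussed for `HANAConstants` in this PR.

```java
// Hypothetical sketch; the real constants live in HANAConstants and the real
// dispatch returns DBDValueHandler instances rather than strings.
final class HanaTypeNamesSketch {

    static final String DATA_TYPE_NAME_REAL_VECTOR = "REAL_VECTOR";
    static final String DATA_TYPE_NAME_ST_GEOMETRY = "ST_GEOMETRY";
    static final String DATA_TYPE_NAME_ST_POINT = "ST_POINT";

    // Maps a type name to a label describing which kind of handler would be chosen.
    static String handlerKindFor(String typeName) {
        switch (typeName) {
            case DATA_TYPE_NAME_REAL_VECTOR:
                return "vector";
            case DATA_TYPE_NAME_ST_GEOMETRY:
            case DATA_TYPE_NAME_ST_POINT:
                return "geometry";
            default:
                return "default";
        }
    }

    public static void main(String[] args) {
        System.out.println(handlerKindFor("REAL_VECTOR")); // prints vector
        System.out.println(handlerKindFor("ST_POINT"));    // prints geometry
    }
}
```

With all three names defined as constants, the `switch` stays consistent and no hard-coded type names remain next to the new one.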

@LonwoLonwo
Member

Thanks for the explanation of the feature.

@@ -24,4 +24,9 @@ public class HANAConstants {

// pseudo schema for PUBLIC SYNONYMs
public static final String SCHEMA_PUBLIC = "PUBLIC";

// Data type names
public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
Member

Suggested change
- public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
+ public static final String DATA_TYPE_NAME_REAL_VECTOR = "REAL_VECTOR";


// Data type names
public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
public static final String DATATYPENAME_ST_GEOMETRY = "ST_GEOMETRY";
Member

Suggested change
- public static final String DATATYPENAME_ST_GEOMETRY = "ST_GEOMETRY";
+ public static final String DATA_TYPE_NAME_ST_GEOMETRY = "ST_GEOMETRY";

// Data type names
public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
public static final String DATATYPENAME_ST_GEOMETRY = "ST_GEOMETRY";
public static final String DATATYPENAME_ST_POINT = "ST_POINT";
Member

Suggested change
- public static final String DATATYPENAME_ST_POINT = "ST_POINT";
+ public static final String DATA_TYPE_NAME_ST_POINT = "ST_POINT";

Change data type constant prefix from DATATYPENAME to DATA_TYPE_NAME.
@stefanuhrig
Contributor Author

@LonwoLonwo

Is there something else I need to take care of so that this change can be merged or is it just a matter of time?

@Matvey16
Member

@stefanuhrig Is there a way to get an instance of HANA Cloud Database for testing? We only have on-premise HANA for our tests and, as I understand it, that does not support this feature. Also, how can I get the 2.21 JDBC driver? I only see version 2.20 on Maven.

@stefanuhrig
Contributor Author

@Matvey16

We have HANA Cloud instances for OSS testing and could grant access to you. We have one in the US West region and one in Germany. Which location is closer to you?

The 2.21 JDBC driver has not been released yet. Unfortunately, I am not allowed to share a preliminary version with you. I can ask the responsible colleagues for an estimated release date, though.

@Matvey16
Member

Matvey16 commented May 17, 2024

You can send me the credentials at matvei.baranov@dbeaver.com. Germany would be better. But I want to ask whether there is any point in testing or merging this without the newest driver. How would this data type be handled by the current driver?

@stefanuhrig
Contributor Author

@Matvey16

Both the driver and the HANA Cloud version supporting the new client/server protocol (HANA Cloud QRC 2/24) will probably be released at the end of June.

If either the driver or the database does not support the new client/server protocol, the database reports vectors as VARBINARY data. So if a vector column is selected directly, the result looks like the second screenshot above.
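
The fallback behavior described here can be sketched as a small decision on the reported column type. This is an illustration only: the class `VectorFallbackSketch`, the `Rendering` enum, and `chooseRendering` are hypothetical names, and the check on `java.sql.Types.VARBINARY` is an assumption about how an older driver or database would surface the column.

```java
import java.sql.Types;

// Hypothetical sketch; not DBeaver's actual handler-selection code.
final class VectorFallbackSketch {

    enum Rendering { VECTOR_AS_ARRAY, RAW_BINARY, DEFAULT }

    // Picks a rendering from the type name and JDBC type code reported for a column.
    static Rendering chooseRendering(String typeName, int jdbcType) {
        if ("REAL_VECTOR".equalsIgnoreCase(typeName)) {
            // Driver and database both speak the new protocol.
            return Rendering.VECTOR_AS_ARRAY;
        }
        if (jdbcType == Types.VARBINARY) {
            // ASSUMPTION: without protocol support the vector arrives as plain VARBINARY
            // and cannot be told apart from any other binary column.
            return Rendering.RAW_BINARY;
        }
        return Rendering.DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(chooseRendering("REAL_VECTOR", Types.OTHER));   // prints VECTOR_AS_ARRAY
        System.out.println(chooseRendering("VARBINARY", Types.VARBINARY)); // prints RAW_BINARY
    }
}
```

In the fallback case nothing breaks. The column is simply shown with the existing binary rendering until both sides support the new type.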

I will send you the credentials for the HANA Cloud instance per mail.
