Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify driver.supportedDataTypes to support installed Postgres extension data type. Bypass validation for a column. #10056

Open
1 of 18 tasks
w1am opened this issue May 17, 2023 · 12 comments · May be fixed by #10138 or #10789
Open
1 of 18 tasks

Comments

@w1am
Copy link

w1am commented May 17, 2023

Feature Description

I'm using the pgvector extension in my database, which allows me to store data in a vector datatype and perform vector operations such as nearest neighbors with SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;. However, TypeORM seems to have an entity validator method that verifies whether the datatype defined in my entity exists.

For instance, the following won't work because the vector datatype doesn't natively exist.

@Entity('items')
export class Item extends BaseEntity {
  @PrimaryGeneratedColumn("increment")
  name: number;

  @Column("vector")
  embedding: number[];
}

In raw SQL, this is how I would achieve it:

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Is there a way to modify or bypass the validation for a column? I could create this column using raw SQL, but that defeats the purpose of using an ORM. Of course, bypassing seems like a terrible idea. Is there a way to do this?

The Solution

I would like to bypass the entity validation for a specific column, as I'm using a custom extension, pgvector, in my database. Defining the type in the TypeORM entity results in a validation error, since the vector datatype isn't natively supported by PostgreSQL.

As a side note, I've installed the extension manually using the command CREATE EXTENSION vector.

Considered Alternatives

I could create this column using raw SQL, but that defeats the purpose of using an ORM.

Additional Context

You may refer to the pgvector documentation https://github.com/pgvector/pgvector

Relevant Database Driver(s)

  • aurora-mysql
  • aurora-postgres
  • better-sqlite3
  • cockroachdb
  • cordova
  • expo
  • mongodb
  • mysql
  • nativescript
  • oracle
  • postgres
  • react-native
  • sap
  • spanner
  • sqlite
  • sqlite-abstract
  • sqljs
  • sqlserver

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

@w1am w1am changed the title Modify driver.supportedDataTypes to support installed Postgres extension data type. Can I bypass validation for a column? Modify driver.supportedDataTypes to support installed Postgres extension data type. Bypass validation for a column. May 17, 2023
@Ginden
Copy link
Collaborator

Ginden commented May 17, 2023

You can use my work from this PR: #8625

I'm too busy to complete it, so feel free to rebase/merge and add some tests.

@purerosefallen
Copy link

How about directly support pgvector in TypeORM? as currently AIGC becoming a thing.

@strongpauly strongpauly linked a pull request Jun 17, 2023 that will close this issue
7 tasks
@dasheck0
Copy link

+1

@Thore1954
Copy link

I could create this column using raw SQL, but that defeats the purpose of using an ORM.

ORMs are simply a mapping between tables and classes. As far as typeorm is concerned, the vector datatype is just a string.
See langchain-ai/langchainjs#1436 for reference.

@Marlinski
Copy link

+1 much needed!

@wpride
Copy link

wpride commented Jul 28, 2023

I could create this column using raw SQL, but that defeats the purpose of using an ORM.

ORMs are simply a mapping between tables and classes. As far as typeorm is concerned, the vector datatype is just a string. See hwchase17/langchainjs#1436 for reference.

Not when it comes to generating migrations

@Thore1954
Copy link

I could create this column using raw SQL, but that defeats the purpose of using an ORM.

ORMs are simply a mapping between tables and classes. As far as typeorm is concerned, the vector datatype is just a string. See hwchase17/langchainjs#1436 for reference.

Not when it comes to generating migrations

Migrations are in fact raw SQL!

@wpride
Copy link

wpride commented Jul 29, 2023

I could create this column using raw SQL, but that defeats the purpose of using an ORM.

ORMs are simply a mapping between tables and classes. As far as typeorm is concerned, the vector datatype is just a string. See hwchase17/langchainjs#1436 for reference.

Not when it comes to generating migrations

Migrations are in fact raw SQL!

That is a fact, I don't know what else they might be. This ORM happens to generate that SQL based on the column annotation. So no, as far as TypeORM is concerned this is not just a string.

@Thore1954
Copy link

Thore1954 commented Jul 29, 2023

I could create this column using raw SQL, but that defeats the purpose of using an ORM.

ORMs are simply a mapping between tables and classes. As far as typeorm is concerned, the vector datatype is just a string. See hwchase17/langchainjs#1436 for reference.

Not when it comes to generating migrations

Migrations are in fact raw SQL!

That is a fact, I don't know what else they might be. This ORM happens to generate that SQL based on the column annotation. So no, as far as TypeORM is concerned this is not just a string.

I happen to write my own migrations (old habits). However, I don't think modifying the generated SQL and replacing character varying with vector would be a problem. Everything else works just the same.

@Thore1954
Copy link

That being said, I do believe it's a good idea to support pgvector out of the box similar to PostGIS (#2423).

@nrathi
Copy link

nrathi commented Sep 3, 2023

What's the right workaround type for a vector column:

@Column({ type: "jsonb" }) or @Column("numeric", { array: true })?

@mlois-efimob
Copy link

I worked with other ORMs, and all of them that I remember have some form to define custom data types, this happens in a lot of databases, PostgreSQL is one of this cases where can be user defined types, extensions, and so on.

Supporting custom column data types allow for all this use cases to fit the common use case for the library, we are exploring this kind of modification. At this time for example we have some tables that have a xid8 data type, that is not supported by typeorm. We workaround by marking the entity as not synced and making all the migration changes by hand, not very fun.

firtoz added a commit to Lunix-AI/typeorm that referenced this issue Mar 24, 2024
@firtoz firtoz linked a pull request Mar 24, 2024 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants