Commit 0612350

authored Jun 13, 2024
feat (core): support https and data url strings in image parts (#1944)
1 parent 17e5bbb commit 0612350

13 files changed, +159 -49 lines changed
 

‎.changeset/smart-ducks-fold.md

+5
@@ -0,0 +1,5 @@
+---
+'ai': patch
+---
+
+feat (core): support https and data url strings in image parts

‎content/docs/03-ai-sdk-core/03-prompts.mdx

+19-10
@@ -76,15 +76,27 @@ const result = await generateText({
 Instead of sending a text in the `content` property, you can send an array of parts that include text and other data types.
 Currently image and text parts are supported.
 
-For models that support multi-modal inputs, user messages can include images. An `image` can be a base64-encoded image (`string`), an `ArrayBuffer`, a `Uint8Array`,
-a `Buffer`, or a `URL` object. It is possible to mix text and multiple images.
+For models that support multi-modal inputs, user messages can include images. An `image` can be one of the following:
+
+- base64-encoded image:
+  - `string` with base-64 encoded content
+  - data URL `string`, e.g. `data:image/png;base64,...`
+- binary image:
+  - `ArrayBuffer`
+  - `Uint8Array`
+  - `Buffer`
+- URL:
+  - http(s) URL `string`, e.g. `https://example.com/image.png`
+  - `URL` object, e.g. `new URL('https://example.com/image.png')`
+
+It is possible to mix text and multiple images.
 
 <Note type="warning">
   Not all models support all types of multi-modal inputs. Check the model's
   capabilities before using this feature.
 </Note>
 
-#### Example: Buffer images
+#### Example: Binary image (Buffer)
 
 ```ts highlight="8-11"
 const result = await generateText({
@@ -104,9 +116,7 @@ const result = await generateText({
 });
 ```
 
-#### Example: Base-64 encoded images
-
-<Note>You do not need a `data:...` prefix for the base64-encoded image.</Note>
+#### Example: Base-64 encoded image (string)
 
 ```ts highlight="8-11"
 const result = await generateText({
@@ -126,9 +136,9 @@ const result = await generateText({
 });
 ```
 
-#### Example: Image URLs
+#### Example: Image URL (string)
 
-```ts highlight="8-13"
+```ts highlight="8-12"
 const result = await generateText({
   model: yourModel,
   messages: [
@@ -138,9 +148,8 @@ const result = await generateText({
         { type: 'text', text: 'Describe the image in detail.' },
         {
           type: 'image',
-          image: new URL(
+          image:
             'https://github.com/vercel/ai/blob/main/examples/ai-core/data/comic-cat.png?raw=true',
-          ),
         },
       ],
     },
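
The data URL form listed in the docs change above (`data:image/png;base64,...`) can be built from raw bytes. The following is a minimal sketch, assuming a runtime with a global `btoa` (browsers and recent Node.js versions); `toDataUrl` and `imagePart` are hypothetical names used for illustration and are not part of the AI SDK.

```ts
// Hypothetical helper (not an AI SDK API): encode raw bytes as a base64 data URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// An image part shaped like the ones in the docs examples above.
// The bytes are the ASCII codes of "test", the same payload the commit's unit test uses.
const imagePart = {
  type: 'image' as const,
  image: toDataUrl(new Uint8Array([116, 101, 115, 116]), 'image/jpg'),
};

console.log(imagePart.image);
```

With this commit, a string like the one produced here should be accepted directly as the `image` value, alongside plain base64 strings and http(s) URL strings.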

‎content/docs/07-reference/ai-sdk-core/01-generate-text.mdx

+1-1
@@ -107,7 +107,7 @@ console.log(text);
     name: 'image',
     type: 'string | Uint8Array | Buffer | ArrayBuffer | URL',
     description:
-      'The image content of the message part. String are base64 encoded content. URLs need to be represented with a URL object',
+      'The image content of the message part. Strings are either base64 encoded content, base64 data URLs, or http(s) URLs.',
   },
 ],
 },

‎content/docs/07-reference/ai-sdk-core/02-stream-text.mdx

+1-1
@@ -109,7 +109,7 @@ for await (const textPart of textStream) {
     name: 'image',
     type: 'string | Uint8Array | Buffer | ArrayBuffer | URL',
     description:
-      'The image content of the message part. String are base64 encoded content. URLs need to be represented with a URL object',
+      'The image content of the message part. Strings are either base64 encoded content, base64 data URLs, or http(s) URLs.',
   },
 ],
 },

‎content/docs/07-reference/ai-sdk-core/03-generate-object.mdx

+1-1
@@ -126,7 +126,7 @@ console.log(JSON.stringify(object, null, 2));
     name: 'image',
     type: 'string | Uint8Array | Buffer | ArrayBuffer | URL',
     description:
-      'The image content of the message part. String are base64 encoded content. URLs need to be represented with a URL object'
+      'The image content of the message part. Strings are either base64 encoded content, base64 data URLs, or http(s) URLs.'
   }
 ]
 }

‎content/docs/07-reference/ai-sdk-core/04-stream-object.mdx

+1-1
@@ -129,7 +129,7 @@ for await (const partialObject of partialObjectStream) {
     name: 'image',
     type: 'string | Uint8Array | Buffer | ArrayBuffer | URL',
     description:
-      'The image content of the message part. String are base64 encoded content. URLs need to be represented with a URL object'
+      'The image content of the message part. Strings are either base64 encoded content, base64 data URLs, or http(s) URLs.'
   }
 ]
 }

‎content/docs/07-reference/ai-sdk-rsc/01-stream-ui.mdx

+1-1
@@ -97,7 +97,7 @@ A helper function to create a streamable UI from LLM providers. This function is
     name: 'image',
     type: 'string | Uint8Array | Buffer | ArrayBuffer | URL',
     description:
-      'The image content of the message part. String are base64 encoded content. URLs need to be represented with a URL object',
+      'The image content of the message part. Strings are either base64 encoded content, base64 data URLs, or http(s) URLs.',
   },
 ],
 },

‎examples/ai-core/src/generate-text/anthropic-multimodal-url.ts

+1-2
@@ -15,9 +15,8 @@ async function main() {
           { type: 'text', text: 'Describe the image in detail.' },
           {
             type: 'image',
-            image: new URL(
+            image:
               'https://github.com/vercel/ai/blob/main/examples/ai-core/data/comic-cat.png?raw=true',
-            ),
           },
         ],
       },

‎examples/ai-core/src/generate-text/google-multimodal-url.ts

+1-2
@@ -15,9 +15,8 @@ async function main() {
           { type: 'text', text: 'Describe the image in detail.' },
           {
             type: 'image',
-            image: new URL(
+            image:
               'https://github.com/vercel/ai/blob/main/examples/ai-core/data/comic-cat.png?raw=true',
-            ),
           },
         ],
       },

‎examples/ai-core/src/generate-text/google-vertex-multimodal-url.ts

+1-2
@@ -14,9 +14,8 @@ async function main() {
           { type: 'text', text: 'Describe the image in detail.' },
           {
             type: 'image',
-            image: new URL(
+            image:
               'https://github.com/vercel/ai/blob/main/examples/ai-core/data/comic-cat.png?raw=true',
-            ),
           },
         ],
       },

‎examples/ai-core/src/generate-text/openai-multimodal-url.ts

+1-2
@@ -15,9 +15,8 @@ async function main() {
           { type: 'text', text: 'Describe the image in detail.' },
           {
             type: 'image',
-            image: new URL(
+            image:
               'https://github.com/vercel/ai/blob/main/examples/ai-core/data/comic-cat.png?raw=true',
-            ),
           },
         ],
       },

‎packages/core/core/prompt/convert-to-language-model-prompt.test.ts

+77-26
@@ -1,34 +1,85 @@
 import { convertToLanguageModelMessage } from './convert-to-language-model-prompt';
 
 describe('convertToLanguageModelMessage', () => {
-  describe('assistant message', () => {
-    it('should ignore empty text parts', async () => {
-      const result = convertToLanguageModelMessage({
-        role: 'assistant',
-        content: [
-          {
-            type: 'text',
-            text: '',
-          },
-          {
-            type: 'tool-call',
-            toolName: 'toolName',
-            toolCallId: 'toolCallId',
-            args: {},
-          },
-        ],
+  describe('user message', () => {
+    describe('image parts', () => {
+      it('should convert image string https url to URL object', async () => {
+        const result = convertToLanguageModelMessage({
+          role: 'user',
+          content: [
+            {
+              type: 'image',
+              image: 'https://example.com/image.jpg',
+            },
+          ],
+        });
+
+        expect(result).toEqual({
+          role: 'user',
+          content: [
+            {
+              type: 'image',
+              image: new URL('https://example.com/image.jpg'),
+            },
+          ],
+        });
+      });
+
+      it('should convert image string data url to base64 content', async () => {
+        const result = convertToLanguageModelMessage({
+          role: 'user',
+          content: [
+            {
+              type: 'image',
+              image: 'data:image/jpg;base64,dGVzdA==',
+            },
+          ],
+        });
+
+        expect(result).toEqual({
+          role: 'user',
+          content: [
+            {
+              type: 'image',
+              image: new Uint8Array([116, 101, 115, 116]),
+              mimeType: 'image/jpg',
+            },
+          ],
+        });
       });
+    });
+  });
+
+  describe('assistant message', () => {
+    describe('text parts', () => {
+      it('should ignore empty text parts', async () => {
+        const result = convertToLanguageModelMessage({
+          role: 'assistant',
+          content: [
+            {
+              type: 'text',
+              text: '',
+            },
+            {
+              type: 'tool-call',
+              toolName: 'toolName',
+              toolCallId: 'toolCallId',
+              args: {},
+            },
+          ],
+        });
 
-      expect(result).toEqual({
-        role: 'assistant',
-        content: [
-          {
-            type: 'tool-call',
-            args: {},
-            toolCallId: 'toolCallId',
-            toolName: 'toolName',
-          },
-        ],
+        expect(result).toEqual({
+          role: 'assistant',
+          content: [
+            {
+              type: 'tool-call',
+              args: {},
+              toolCallId: 'toolCallId',
+              toolName: 'toolName',
+            },
+          ],
+        });
       });
     });
   });

‎packages/core/core/prompt/convert-to-language-model-prompt.ts

+49
@@ -9,6 +9,7 @@ import { detectImageMimeType } from '../util/detect-image-mimetype';
 import { convertDataContentToUint8Array } from './data-content';
 import { ValidatedPrompt } from './get-validated-prompt';
 import { InvalidMessageRoleError } from './invalid-message-role-error';
+import { getErrorMessage } from '@ai-sdk/provider-utils';
 
 export function convertToLanguageModelPrompt(
   prompt: ValidatedPrompt,
@@ -80,6 +81,54 @@ export function convertToLanguageModelMessage(
                 };
               }
 
+              // try to convert string image parts to urls
+              if (typeof part.image === 'string') {
+                try {
+                  const url = new URL(part.image);
+
+                  switch (url.protocol) {
+                    case 'http:':
+                    case 'https:': {
+                      return {
+                        type: 'image',
+                        image: url,
+                        mimeType: part.mimeType,
+                      };
+                    }
+                    case 'data:': {
+                      try {
+                        const [header, base64Content] = part.image.split(',');
+                        const mimeType = header.split(';')[0].split(':')[1];
+
+                        if (mimeType == null || base64Content == null) {
+                          throw new Error('Invalid data URL format');
+                        }
+
+                        return {
+                          type: 'image',
+                          image:
+                            convertDataContentToUint8Array(base64Content),
+                          mimeType,
+                        };
+                      } catch (error) {
+                        throw new Error(
+                          `Error processing data URL: ${getErrorMessage(
+                            error,
+                          )}`,
+                        );
+                      }
+                    }
+                    default: {
+                      throw new Error(
+                        `Unsupported URL protocol: ${url.protocol}`,
+                      );
+                    }
+                  }
+                } catch (_ignored) {
+                  // not a URL
+                }
+              }
+
               const imageUint8 = convertDataContentToUint8Array(part.image);
 
               return {