Description
Hi,
I am trying to use GPT-4o's vision capabilities in Unity with this package. I send an image URL to the model as follows:
public async void SendImageUrlToGPT4(string imageurl)
{
    var userMessage = new ChatMessage
    {
        Role = "user",
        Content = "[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"" + imageurl + "\"}]"
    };
    messages.Add(userMessage);
    var request = new CreateChatCompletionRequest
    {
        Messages = messages,
        Model = "gpt-4o",
        MaxTokens = 300
    };
    var response = await openAI.CreateChatCompletion(request);
    if (response.Choices != null && response.Choices.Count > 0)
    {
        var chatResponse = response.Choices[0].Message;
        Debug.Log(chatResponse.Content);
        OnResponse.Invoke(chatResponse.Content);
        Debug.Log("Response Finished");
    }
    else
    {
        Debug.LogError("No response from GPT-4 Vision.");
    }
}
However, the model always responds with an incorrect description of the image. Could there be an issue with the way the request is sent to the model in Unity?
When I pass the same URL using the Python code snippet provided by OpenAI, the model describes the image accurately. Here is the Python code that I tested:
import openai
import json

# Set your API key
openai.api_key = ""

response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response['choices'][0]['message']['content'])
Here is the JSON dump of the request payload in Unity:
{"Role":"user","Content":"[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129\"}]"}
The JSON dump in Python:
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What\u2019s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}
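Comparing the two dumps, two differences stand out: in Unity the Content field is a single JSON-encoded string rather than an array of parts, and the image part uses a flat "url" key instead of the nested "image_url": {"url": ...} object that the Python request sends. The Python sketch below only illustrates that difference in shape. The placeholder URL and the helper names describe and has_nested_image_url are hypothetical, the sketch does not call the API or the Unity package, and whether this difference actually explains the incorrect descriptions is an assumption.

```python
import json

# Hypothetical placeholder standing in for the real image URL.
IMAGE_URL = "https://example.com/frame2.jpg"

# Shape of the Unity message: Content is one string that happens to contain JSON.
unity_message = {
    "Role": "user",
    "Content": json.dumps([
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "url": IMAGE_URL},
    ]),
}

# Shape of the Python message: content is a real list of typed parts.
python_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "image_url": {"url": IMAGE_URL}},
    ],
}


def describe(content):
    """Say whether a content field is plain text or a list of typed parts."""
    if isinstance(content, str):
        return "plain text"
    return "parts: " + ", ".join(part["type"] for part in content)


def has_nested_image_url(part):
    """Check that an image part carries its URL as image_url -> url."""
    nested = part.get("image_url")
    return part.get("type") == "image_url" and isinstance(nested, dict) and "url" in nested


unity_kind = describe(unity_message["Content"])
python_kind = describe(python_message["content"])

# Decode the Unity string to inspect the image part it contains.
unity_image_part = json.loads(unity_message["Content"])[1]
python_image_part = python_message["content"][1]

print("Unity content is", unity_kind)
print("Python content is", python_kind)
print("Unity image part nested:", has_nested_image_url(unity_image_part))
print("Python image part nested:", has_nested_image_url(python_image_part))
```

If the Unity content is indeed treated as plain text, the model would receive only the characters of the URL and never the image itself, which would be consistent with descriptions that do not match the picture.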
Could you please let me know how to send the image to the model correctly with this package so that it returns an accurate description? I am not sure what is causing the issue, and any insights would be greatly appreciated.