Issue with Sending Image URL to GPT-4o in Unity #129

@Zaf01

Hi,

I am trying to use GPT-4o's vision capabilities in Unity with this package. I send an image URL to the model as follows:

    public async void SendImageUrlToGPT4(string imageurl)
    {
        var userMessage = new ChatMessage
        {
            Role = "user",
            Content = "[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"" + imageurl + "\"}]"
        };


        messages.Add(userMessage);

        var request = new CreateChatCompletionRequest
        {
            Messages = messages,
            Model = "gpt-4o",
            MaxTokens = 300
        };

        var response = await openAI.CreateChatCompletion(request);


        if (response.Choices != null && response.Choices.Count > 0)
        {
            var chatResponse = response.Choices[0].Message;
          
            Debug.Log(chatResponse.Content);

            OnResponse.Invoke(chatResponse.Content);

            Debug.Log("Response Finished");
        }
        else
        {
            Debug.LogError("No response from GPT-4 Vision.");
        }
    }

However, the model always responds with an incorrect description of the image. Perhaps there is a problem with the way the request is sent to the model from Unity?

When I pass the same URL using the Python code snippet provided by OpenAI, the model describes the image accurately. Here is the Python code that I tested:


import openai
import json

# Set your API key
openai.api_key = ""

response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response['choices'][0]['message']['content'])

Here is the JSON dump of the request payload in Unity:

{"Role":"user","Content":"[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129\"}]"}

The JSON dump in Python:

{
      "model": "gpt-4o",
      "messages": [
            {
                  "role": "user",
                  "content": [
                        {
                              "type": "text",
                              "text": "What\u2019s in this image?"
                        },
                        {
                              "type": "image_url",
                              "image_url": {
                                    "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129"
                              }
                        }
                  ]
            }
      ],
      "max_tokens": 300
}
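
To make the difference between the two dumps above easier to see, here is a minimal sketch that compares their structure. It is only an illustration under stated assumptions: `IMAGE_URL` is a placeholder standing in for the Firebase URL, and `content_problems` is a hypothetical helper written for this comparison, not part of this package or of the OpenAI library. It checks each message's content against the shape used by the working Python request (a list of parts, with the image URL nested under `image_url.url`). Whether these differences fully explain the incorrect descriptions is an assumption, not something confirmed here.

```python
import json

# Placeholder URL (assumption); the real request uses the Firebase URL above.
IMAGE_URL = "https://example.com/frame2.jpg"

# Content as sent from Unity: one JSON-encoded string, with "url" directly on the image part.
unity_content = json.dumps([
    {"type": "text", "text": "What do you see in this image? Limit yourself to 15 words."},
    {"type": "image_url", "url": IMAGE_URL},
])

# Content as sent from Python: a real list, with the URL nested under "image_url".
python_content = [
    {"type": "text", "text": "What\u2019s in this image?"},
    {"type": "image_url", "image_url": {"url": IMAGE_URL}},
]


def content_problems(content):
    """Hypothetical helper: list how `content` differs from the working Python shape."""
    problems = []
    if isinstance(content, str):
        problems.append("content is a JSON-encoded string, not a list of parts")
        try:
            content = json.loads(content)
        except json.JSONDecodeError:
            return problems
    if not isinstance(content, list):
        return problems
    for part in content:
        if isinstance(part, dict) and part.get("type") == "image_url":
            nested = part.get("image_url")
            if not (isinstance(nested, dict) and "url" in nested):
                problems.append("image part has no nested image_url.url field")
    return problems


print("Unity: ", content_problems(unity_content))
print("Python:", content_problems(python_content))
```

Run as written, the Unity content is flagged for both differences and the Python content for neither.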

Could you please let me know how to send the image to the model correctly with this package, so that I get an accurate description of the image? I am not sure what is causing this issue. Any insights would be greatly appreciated.
