On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model that features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and handling in-image text generation (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates novel images based on written descriptions called prompts. Although OpenAI released no technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of them licensed from stock websites like Shutterstock. It’s likely DALL-E 3 follows this same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts. While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and convincingly render objects with minimal deformations. Compared to DALL-E 2, OpenAI says that DALL-E 3 refines small details like hands more effectively, creating engaging images by default with “no hacks or prompt engineering required.”

  • Quicky
    9 months ago

    Was the prompt “Woman from China”?

    Edit: I feel like the nuance of this joke may have been lost on some. Whether or not I read the article is irrelevant, since this was not a genuine question, rather a play on words of the double meaning of “china” as in “A woman from (the country) China” and “A woman (emerging) from china (porcelain)”.

    I’ll get my coat.

    • Chariotwheel@kbin.social
      9 months ago

      The prompt is on the picture in the article:

      A DALL-E 3 image provided by OpenAI with the prompt: “A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.”

      Why do we need AI creating text, when nobody is reading?

      • Quicky
        9 months ago

        The next time I make the same joke?

        I reckon I’ll just keep it to myself instead. I already feel ridiculous for having to explain it. Lemmy is harder than real life.