Using AI-generated images to train AI quickly creates a loop where the results get worse in either quality or visual diversity

  • wolfshadowheart@kbin.social · 1 year ago

    No shit?

    I was going to say more, but the article put my thoughts more concisely.

    But Shumailov remains optimistic about using synthetic data to train AIs. The key would be to make sure any AI-generated data is high quality and free from systematic errors that could more severely impact results down the road.

    Essentially, what I was going to say is that it's not necessarily the images themselves but the quality and correctness of the data used for training. Imagine a 5x5 XY grid where you ask the model to show green for circles and red for squares, then keep retraining it on its own slightly-off answers: after 100 rounds you end up with a mishmash of green and red. That's the training data under a microscope.
    What we should be doing for Stable Diffusion models is asking the model to re-draw the circle and the square, and only training on the outputs we confirm are correct (see the toy sketch below).
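    To make that concrete, here's a tiny toy simulation (my own sketch, not anything from the article): a "model" that is just a Gaussian gets refit, generation after generation, only on samples it produced itself. Estimation error compounds, so the distribution drifts and its spread tends to collapse, which is basically the mishmash I'm describing.

    ```python
    import numpy as np

    # Toy sketch of recursive training on self-generated data. The "model"
    # here is just a Gaussian fit; each generation is trained only on the
    # previous generation's outputs, with no fresh real data in the loop.
    rng = np.random.default_rng(0)
    samples = rng.normal(loc=0.0, scale=5.0, size=100)  # gen 0: "real" data

    for gen in range(1, 101):
        mu, sigma = samples.mean(), samples.std()   # "train" on current data
        samples = rng.normal(mu, sigma, size=100)   # "generate" the next dataset
        if gen % 20 == 0:
            print(f"gen {gen:3d}: mean={mu:+5.2f}  std={sigma:4.2f}")

    # Over many generations the std tends to drift well below the original 5.0
    # and the mean wanders away from 0: less diversity, lower fidelity, exactly
    # because nothing confirmed-correct ever re-enters the loop.
    ```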

    If you just feed an AI the images that AI produces, well, no shit the quality will go down? We generate thousands of images for a prompt and choose fewer than 10 that we deem worthwhile enough to edit.

    Like the article said, this will become more feasible once AI can actually verify and output an exact image with 100% reliability. Right now detection rates are super high, I believe around 98% under the right circumstances, but on the output side we're still far from perfect.
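    Here's roughly what that confirmation gate could look like in practice (purely hypothetical sketch; `detector_score` and `human_approved` are placeholder names, since no real detector or curation workflow gets you to 100%):

    ```python
    from dataclasses import dataclass

    @dataclass
    class GeneratedImage:
        path: str
        detector_score: float  # 0.0-1.0 confidence the image is clean/correct
        human_approved: bool   # manual curation, like picking <10 of thousands

    def filter_for_training(candidates, threshold=0.98):
        """Keep only generated images confirmed good enough to re-enter training."""
        return [img for img in candidates
                if img.detector_score >= threshold and img.human_approved]

    batch = [
        GeneratedImage("gen_0001.png", 0.99, True),
        GeneratedImage("gen_0002.png", 0.97, True),   # fails the automated check
        GeneratedImage("gen_0003.png", 0.99, False),  # never curated by a human
    ]
    print([img.path for img in filter_for_training(batch)])  # ['gen_0001.png']
    ```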