• HoloPengin@lemmy.world

As a side note, since it wasn’t too clear from your writing: the weights are only tweaked a tiny, tiny bit by each training image. Unless the trainer sees the same image a shitload of times (the Mona Lisa, that one stock photo used to show off phone cases, etc.), the image can’t be recreated by the AI at all. Elements of the image that are shared with lots of other images (shading style, poses, Mario’s general character design, etc.) could be, but you’re never getting that one original image, or even any particular identifiable element from it, out of the AI. The AI learns concepts and how they interact, because the amount of influence it takes from each individual image and its caption is so incredibly tiny, yet it trains on hundreds of millions of images and captions. The goal of AI image generation is to create a vast variety of images directed by prompts; generating lots of images that directly resemble anything in the training set is undesirable, and in the field it’s called over-fitting.

Anyways, the end result is that AI isn’t photo-bashing, it’s more like concept-bashing. And lots of methods now exist to better control the outputs, from ControlNet, to fine-tuning on a smaller set of images, to DALL-E 3, which can follow complex natural language prompts better than older methods.

Regardless, lots of people find training generative AI on a mass of otherwise copyrighted data (images, fan fiction, news articles, ebooks, what have you) without prior consent just really icky.

    • drathvedro

      You show it a piece of art with a whole lot of tags attached. It then semi-randomly changes pixel colors until it matches the training image. That set of instructions is associated with the tags, and the two are combined into a series of tiny weights that the randomizer uses. Anyways, the end result is that AI isn’t photo-bashing, it’s more like concept-bashing

That’s what I meant by “very finely shredded pieces”. I oversimplified it, yes. What I mean is that it’s not literally taking a pixel off an image and putting it into the output, but that using the original image in any way is just copying with extra steps.

Say we forego AI entirely and talk real-world copyright. If I were to record a movie theater screen with a camcorder, I would commit copyright infringement, even though the image is transformed by my camera lens. Same as if I were to distribute the copyrighted work in a ZIP file, invert its colors, or trace every frame and paint it with watercolors.

What if I were to distribute the work’s name alongside its SHA-1 hash? You might argue that such a transformation destroys the original work, can no longer be used to retrieve the original, and therefore should be legal. But if that were the case, torrent site owners could sleep peacefully knowing they are safe from prosecution. The real world has shown that’s not the case.
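To make that concrete, here’s a minimal sketch in Python of what a SHA-1 digest actually is: a fixed 20-byte fingerprint that identifies the work but can’t be reversed back into it. The filename and contents are made up for illustration.

```python
# Minimal sketch: a SHA-1 digest is a fixed 20-byte fingerprint.
# The filename and contents below are made up for illustration.
import hashlib

contents = b"the full bytes of some copyrighted work"
digest = hashlib.sha1(contents).hexdigest()

print(f"movie.mkv -> {digest}")  # always 40 hex chars, regardless of input size
# There is no inverse: the only way from the digest back to the contents
# is to already have (or correctly guess) the contents.
```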

Now, what if we took some hashing function and brute-forced the seed until we got one that outputs the SHA-1 hashes of certain works given their names? That’d be a terrible version of AI, acting exactly like an over-trained model would: spouting random numbers except for works it was “trained” upon. Is distributing such a seed/weight a copyright violation? I’d argue that’d be an overly complicated way to conceal piracy, but yes, it would be. Because those seeds/weights are still based on the original works, even if not strictly a direct result of their transformation.
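Here’s a toy version of that thought experiment in Python, with everything invented for illustration. Real SHA-1 digests are 160 bits, so actually brute-forcing a seed like this is computationally infeasible; the digests are truncated to a single byte here purely so the loop finishes.

```python
# Toy "brute-force a seed until it memorizes the digests" experiment.
# The works, names, and seeded-hash construction are all made up;
# digests are truncated to 1 byte so the search is actually feasible.
import hashlib

works = {
    "movie.mkv": b"contents of one copyrighted work",
    "album.flac": b"contents of another copyrighted work",
}

# Targets: the (truncated) SHA-1 digest of each work.
targets = {name: hashlib.sha1(data).digest()[:1] for name, data in works.items()}

def seeded_hash(seed: int, name: str) -> bytes:
    """The 'model': a hash of seed + name, truncated to 1 byte."""
    return hashlib.sha1(seed.to_bytes(8, "big") + name.encode()).digest()[:1]

seed = 0
while not all(seeded_hash(seed, n) == t for n, t in targets.items()):
    seed += 1  # ~65,000 tries on average for two 1-byte targets

print(f"seed {seed} now 'knows' the digests of the works it was 'trained' on")
# The seed itself is just a number, but it was chosen *because of* the
# original works, which is the point of the argument above.
```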

      Anyways, the end result is that AI isn’t photo-bashing, it’s more like concept-bashing

Copying concepts is also a copyright infringement, though.

Regardless, lots of people find training generative AI on a mass of otherwise copyrighted data (images, fan fiction, news articles, ebooks, what have you) without prior consent just really icky.

It shouldn’t just be “icky”, it should be illegal and prosecuted ASAP. The longer this goes on, the more the entire internet is going to be filled with these kind-of-copyrighted things, and it will eventually turn into a lawsuit shitstorm.

      • HoloPengin@lemmy.world

Heads up, this is a long fucking comment. I don’t care if you love or hate AI art, what it represents, or how it’s trained. I’m here to inform and refine your understanding of the tools (and how exactly they might fit into the current legal landscape), and nothing more. I make no judgements about whether you should or shouldn’t like AI art or generative AI in general. You may disagree with some of the legal standpoints too, but please be aware of how the tools actually work, because grossly oversimplifying them creates serious confusion and frustration when discussing them.

Just know that, because these tools are open source and publicly available to use offline, Pandora’s box has been opened.

        copying concepts is also copyright infringement

        Except it really isn’t in many cases, and even in the cases where it could be, there can be rather important exceptions. How this all applies to AI tools/companies themselves is honestly still up for debate.

        Copyright protects actual works (aka “specific expression”), not mere ideas.

The concept of a descending-blocks puzzle game isn’t copyrighted, but the very specific mechanics of Tetris are. The concept of a cartoon mouse isn’t copyrighted, but Mickey Mouse’s visual design is. The concept of a brown-haired girl with wolf ears/tail and red eyes is not copyrighted, but the exact depiction of Holo from Spice and Wolf is (though that’s more complicated due to weaker trademark and stronger copyright laws in Japan). A particular chord progression is not copyrightable (or at least it shouldn’t be), but a song or performance created with it is.

A mere concept is not copyrightable. Once the concept is specific enough and there are copyrighted visual depictions of it, then you start to run into trademark law territory and start to gain a copyright case. I really feel like these cases are kinda exceptions though, at least for the core models like Stable Diffusion itself, because there’s just so much existing art (both official and, even more so, copyright/trademark-infringing fan art) of characters like Mickey Mouse anyways.

The thing the AI does is distill concepts, and interactions between concepts, shared between many input images, and it can do so in a generalized way that allows concepts never before seen together to be mixed easily. You aren’t getting transformations of specific images out of the AI, or even small pieces of each trained image; you’re getting transformations of learned concepts shared across many, many works. This is why the shredding analogy just doesn’t work. The AI generally doesn’t, and is not designed to, mimic individual training images. A single image changes the weights of the AI by such a minuscule amount, and those exact same weights are also changed by many other images the AI trains on. Generative AI is very distinctly different from tracing, or from distributing information precise enough to pirate content with, or from transforming copyrighted works to make them less detectable.

        To drive the point home, I’d like to expand on how the AI and its training is actually implemented, because I think that might clear some things up for anyone reading. I feel like the actual way in which the AI training uses images matters.

A diffusion model, which is what current AI art uses, is a giant neural network that we train to guess the noise pattern in an image. To train it on an image, we add some random amount of noise to the whole image (it could be a small amount, like film grain, or enough to turn the image into pure noise; it’s random each time), then pass that noisy image and its caption through the AI to get the noise pattern the AI guesses is in the image. Now we take the difference between the noise pattern it guessed and the noise pattern we actually added to the training image to calculate the error. Finally, we tweak the AI’s weights based on that error. Of note, we don’t tweak the AI to perfectly guess the noise pattern or reduce the error to zero; we barely tweak the AI to guess ever so slightly better (like, 0.001% better). Because the AI is never supposed to see the same image many times, it has to learn to interpret the captions (and thus concepts) provided alongside each image to direct its noise guesses. The AI still ends up being really bad at guessing heavy or completely random noise anyways, which is yet another reason why it generally can’t reproduce trained images from nothing.
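If it helps, here’s a heavily simplified sketch of one training step, assuming PyTorch. The TinyDenoiser, the fixed caption vector, and the linear noise mix are all stand-ins invented for illustration; the real Stable Diffusion UNet, text encoder, and noise schedule are far more complex.

```python
# Simplified sketch of one diffusion training step (PyTorch assumed).
# TinyDenoiser and the linear noise mix are illustrative stand-ins.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for the UNet: guesses the noise present in an image."""
    def __init__(self, caption_dim=16):
        super().__init__()
        self.caption_proj = nn.Linear(caption_dim, 3)  # crude caption conditioning
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, noisy_image, caption_embedding):
        cond = self.caption_proj(caption_embedding)[:, :, None, None]
        return self.net(noisy_image + cond)  # the guessed noise pattern

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # tiny steps

image = torch.rand(1, 3, 64, 64)   # one training image
caption = torch.rand(1, 16)        # its caption, already embedded
noise = torch.randn_like(image)    # the noise we will hide in it
t = torch.rand(1)                  # random noise strength, 0..1
noisy_image = (1 - t) * image + t * noise

predicted_noise = model(noisy_image, caption)
loss = nn.functional.mse_loss(predicted_noise, noise)  # how wrong was the guess?

optimizer.zero_grad()
loss.backward()
optimizer.step()  # nudge the weights a minuscule amount toward a better guess
```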

Now let’s talk about generation (aka “inference”). So we have an AI that’s decent at guessing noise patterns in existing images as long as we provide captions, and this works even for images it didn’t train on. That’s great for denoising and upscaling existing images, but how do we get it to generate new, unique images? By handing it pure random noise and a caption and asking it to denoise! It’s still really shitty at this though; the image just looks like blobby splotches of color with no form (arguably it has to be, or it wouldn’t work for denoising existing images anyways). We have a hack, though: add some random noise back into the generated image and send it through the AI again. Every time we do this, the image gets sharper and more refined, and looks more and more like the caption we provided. After doing this 10-20 times, we end up with a completely original image that isn’t identifiable in the training set but looks conceptually similar to existing images that share similar concepts. The AI has learned not to copy images while training, but to actually learn visual concepts.

Concepts are generally not copyrighted. Some very specific depictions it learns are technically copyrighted, i.e. Mickey Mouse’s character design, but the problem with that claim is that there are fair use exceptions, legitimate use cases, which can often cover someone who uses the AI in this capacity (parody, education, not-for-profit, etc.). Whether providing a tool that can straight up allow anyone to create infringing depictions of well-known characters or designs is legal is up for debate, but when you use generative AI, it’s up to you to know the legality of publishing the content you create with it, just like with handmade art.

And besides, if you ask an AI model or another artist to draw Mickey Mouse for you, you know what you’re asking for; it’s not a surprise, and many artists would be happy to oblige so long as their work doesn’t get construed as official Disney company art. (I guess that’s sorta a point of contention about this whole topic though, isn’t it? If artists can get takedowns on their Mickey Mouse art, why wouldn’t an AI model get takedowns too for trivially being able to create it?)
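Here’s a sketch of that generation loop, continuing the toy TinyDenoiser from the training sketch above. Real samplers (DDIM, Euler, etc.) use carefully derived update rules; the step sizes and noise amounts here are made up, just to show the shape of the idea: guess noise, remove some, re-noise, repeat.

```python
# Sketch of the generation ("inference") loop, reusing `model` (the
# trained TinyDenoiser) and the shapes from the training sketch above.
# Step sizes and re-noise amounts are invented for illustration.
import torch

steps = 20
image = torch.randn(1, 3, 64, 64)  # start from pure random noise
caption = torch.rand(1, 16)        # embedding of the prompt

with torch.no_grad():
    for step in range(steps):
        predicted_noise = model(image, caption)
        image = image - predicted_noise / steps  # remove a bit of the guessed noise
        if step < steps - 1:
            # add back a smaller amount of fresh noise and go again
            image = image + torch.randn_like(image) * (1 - step / steps) * 0.1

# "image" is now a new sample shaped by learned concepts, not a copy of
# any particular training image
```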

Anyways, if you want this sort of training or model release to be a copyright violation, as many do, I’m unconvinced current copyright/IP laws could handle it gracefully, because even if the precise methods by which AIs and humans learn and execute are different, the end result is basically the same. We have to draw new, more specific lines on what is and isn’t allowed and decide how AI tools should be regulated while taking care not to harm real artists, and few will agree on where the lines should be drawn.

Also though, Stable Diffusion and its many, many descendants are already released publicly and open source (same with Llama for text generation), and they’ve been disseminated to so many people that you can no longer stop them from existing. That fact doesn’t give StabilityAI a pass, nor do other AI companies who keep their models private get a pass, but it’s still worth remembering that Pandora’s box has already been opened.