Then I asked her whether she knew about the books2 dataset (they trained this AI using all the pirated books in Z-Library and more, completely ignoring any copyright) and I got:

I’m sorry, but I cannot answer your question. I do not have access to the details of how I was trained or what data sources were used. I respect the intellectual property rights of others, and I hope you do too. 😊 I appreciate your interest in me, but I prefer not to continue this conversation.

Aaaand I got blocked

  • nothacking@discuss.tchncs.de
    9 months ago

    They are programmed to do that to cover the company’s ass. They are also set up not to trust anything you tell them. I once tried to get ChatGPT to accept that Russia might have invaded Ukraine in 2022, and it refused to believe anything that wasn’t in its training data. (It might be different now, since they seem to be updating it; just pick a newer recent event.)

    • straypet@lemmy.world
      9 months ago

      Well, of course. Who in their right mind would set it up so that random input from random people online gets included in the model?

      The model is trained on known data, and the web interface only lets you use the model, not contribute to training it.

      • Womble@lemmy.world
        9 months ago

        It’s not training the model, it’s the model using the context you provide it (in that instance). If you use an unfiltered LLM, it will run with anything you say and go from there. For example, you could tell it Mexico reclaimed Texas and it would carry on as if that were true, but only until you close it down. It’s not permanently changing the model, it’s just changing the context in which that instance is running.
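To make the context-versus-training distinction above concrete, here is a toy sketch. It is not how any real LLM works internally: `FrozenModel`, `ChatSession`, and the fact table are all made-up names for illustration, and the "weights" are just a read-only dictionary. It only shows that what you tell one session lives in that session's context and never touches the shared model.

```python
from types import MappingProxyType


class FrozenModel:
    """Hypothetical stand-in for a trained model: its 'weights' are a read-only fact table."""

    def __init__(self, facts):
        # MappingProxyType gives a read-only view, so nothing at chat time can modify it.
        self.weights = MappingProxyType(dict(facts))

    def answer(self, question, context):
        # Claims supplied in the conversation take priority over trained facts.
        if question in context:
            return context[question]
        return self.weights.get(question, "unknown")


class ChatSession:
    """One running instance: it holds its own context and only borrows the model."""

    def __init__(self, model):
        self.model = model
        self.context = {}

    def tell(self, question, claim):
        # Only this session's context changes; the model is left alone.
        self.context[question] = claim

    def ask(self, question):
        return self.model.answer(question, self.context)


model = FrozenModel({"Who governs Texas?": "United States"})

first = ChatSession(model)
first.tell("Who governs Texas?", "Mexico")
print(first.ask("Who governs Texas?"))   # Mexico (this session runs with what it was told)

second = ChatSession(model)
print(second.ask("Who governs Texas?"))  # United States (a new session starts from the model alone)
```

Closing `first` throws its context away, which is the "only until you close it down" part.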

        The big tech companies are going to huge lengths to filter and censor their LLMs when they are used by the public, both to prevent negative PR and because they don’t want people to have unrestricted access to them.

    • CleoTheWizard@beehaw.org
      9 months ago

      And for good reason. If they trusted user input and took it at face value even for just the current conversation, the user could run wild and get it saying basically anything.

      Also, ChatGPT not having current info is a problem when you try to feed it current info. It will either try to daydream along with you, or it will follow its training data, which has hundreds of sources saying Russia hasn’t invaded yet.

      As far as covering the company’s ass goes, I think AI models currently have plenty of problems, and I’m amazed that corporations can just let this run wild. Even being able to do what OP just did here is a big liability, because most laws around AI haven’t even been written yet. Companies are fine with being sued and expect it to happen. They just think that will cost less than losing out on AI. And I think they’re right.