We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision.

https://arxiv.org/abs/2311.07590

  • rambaroo@lemmy.world
    1 year ago

    ChatGPT is not conscious. It’s literally just a language model that’s spent countless hours learning how to generate human language. It has no awareness of its existence and no capability for metacognition. We know how ChatGPT works; it isn’t a mystery. It can’t do a single thing without human input.

    • lolcatnip@reddthat.com
      1 year ago

      The thing about saying something is or isn’t conscious is that we don’t have any good theory of what consciousness even is. It’s not something we can measure. The only way we can assure ourselves that other people are conscious is that they claim to be conscious in ways we find convincing and otherwise behave in ways we associate with our own consciousness.

      I can’t think of any reason why a lump of silicon should attain consciousness because you ran the right program on it, but I can’t see why a blob of cells should be conscious either. I also can’t think of any reason why we’d be aware of it if a lump of silicon did become conscious.

    • 0ops
      1 year ago

      A.) Do you have proof for all of these claims about what LLMs aren’t, with definitions for key terms? B.) Do you have proof that these claims don’t apply to yourself? We can’t base our understanding of intelligence, artificial or biological, on circular reasoning and ancient assumptions.

      “It can’t do a single thing without human input.”

      That’s correct, which is why I said that ChatGPT isn’t there yet. What are you without input, though? Is a human nervous system floating in a vacuum conscious? What could it have possibly learned? It doesn’t even have the concept of having sensations at all, let alone vision, let alone the ability to visualize anything specific. What are you without an environment to take input from and manipulate/output to in turn?