So, I’ve heard that ML models manipulate tokens, and that for English corpora these tokens take the place of words. If we want a model to be polite and avoid uncomfortable language, we can remove certain words, for example “fuck”, from the internal array where all tokens and their associated data are stored.
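
To make the idea concrete, here is a minimal sketch of one way to "remove" a word at generation time: rather than deleting it from the array, its score is set to negative infinity so it can never be picked. The vocabulary, the scores, and the helper names `softmax` and `mask_banned` are all made up for illustration and are not taken from any real model.

```python
import math

# Toy vocabulary and scores; every value here is invented for illustration.
vocab = ["hello", "thanks", "fuck", "please"]
logits = [1.2, 0.4, 2.5, 0.9]
banned = {"fuck"}


def softmax(scores):
    """Turn raw scores into probabilities; -inf scores become probability 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mask_banned(scores, vocab, banned):
    """Set the score of every banned token to -inf so it can never be chosen."""
    return [float("-inf") if tok in banned else s for tok, s in zip(vocab, scores)]


probs = softmax(mask_banned(logits, vocab, banned))
best = vocab[probs.index(max(probs))]  # most likely token that is still allowed
```

One caveat with this sketch: in practice tokens are often pieces of words rather than whole words, so blocking a single entry does not necessarily stop the same word from being assembled out of smaller pieces.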

  • xerox · 1 year ago

    (or in ChatGPT’s case, 3rd most likely)

    Why 3rd?

    • BURN@lemmy.world · 1 year ago

      I believe it picks the 3rd (or nth) most likely word because that sounds more human. Always taking the statistically most likely word ends up sounding very robotic and forced, whereas the 3rd is still very likely to be correct but leads to variation in responses.

      This is all from what I remember of a mini-paper I read about it, so I could be wrong.
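
A rough sketch of how that kind of variation can be produced: instead of always taking the top word, pick at random among the few highest-scoring candidates (often called top-k sampling). Whether ChatGPT does exactly this is an assumption; the function name `sample_top_k`, the toy vocabulary, and the scores below are hypothetical.

```python
import math
import random


def sample_top_k(logits, k=3, temperature=1.0, rng=random):
    """Pick an index at random among the k highest-scoring candidates."""
    # Keep the indices of the k largest scores.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax-style weights over the kept scores; a temperature below 1
    # favors the top choice more strongly.
    peak = max(logits[i] for i in top)
    weights = [math.exp((logits[i] - peak) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy next-word candidates with made-up scores.
vocab = ["the", "a", "this", "that", "banana"]
logits = [3.0, 2.6, 2.4, 0.5, -1.0]

rng = random.Random(0)  # fixed seed so runs are repeatable
picks = [vocab[sample_top_k(logits, k=3, rng=rng)] for _ in range(200)]
```

With `k=1` this always returns the single most likely word, which is the "robotic" case described above; with `k=3` the output varies between the three best candidates and never reaches the unlikely ones.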