When German journalist Martin Bernklau typed his name and location into Microsoft’s Copilot to see how his articles would be picked up by the chatbot, the answers horrified him. Copilot’s results asserted that Bernklau was an escapee from a psychiatric institution, a convicted child abuser, and a conman preying on widowers. For years, Bernklau had served as a courts reporter, and the AI chatbot had falsely blamed him for the crimes whose trials he had covered.

The accusations against Bernklau weren’t true, of course, and are examples of generative AI’s “hallucinations.” These are inaccurate or nonsensical responses to a prompt provided by the user, and they’re alarmingly common. Anyone attempting to use AI should always proceed with great caution, because information from such systems needs validation and verification by humans before it can be trusted.

But why did Copilot hallucinate these terrible and false accusations?

  • Railcar8095 · 3 months ago

    The example you shared is not an LLM. It’s a classic chatbot with predefined answers. It basically maps keywords to KB articles. If it doesn’t recognize any term, it will say “I don’t know”. It will also suggest the wrong KB article if it picks up on one keyword and ignores the rest of the context. It has no way of knowing whether the answer is correct. At best, somebody will periodically review a sample of the questions where the user didn’t consider the answer correct and adjust the pairings, but it’s not AI, at least not a good one.
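    The keyword-to-article behaviour described above can be sketched in a few lines of Python. The keyword table and article titles are hypothetical, invented for illustration; the sketch only shows the two properties named in the comment: the first known keyword wins regardless of the rest of the question, and there is a canned “I don’t know” when nothing matches.

```python
import re

# Hypothetical, hand-maintained keyword -> KB article table.
KEYWORD_TO_ARTICLE = {
    "refund": "KB-101: How to track your tax refund",
    "deadline": "KB-202: Filing deadlines",
    "vat": "KB-303: VAT registration",
}

FALLBACK = "I don't know"


def answer(question: str) -> str:
    """Return the article for the first known keyword, else the fallback.

    Nothing here models meaning: the rest of the question is ignored, and
    there is no check that the chosen article actually answers it.
    """
    words = re.findall(r"[a-z]+", question.lower())
    for word in words:
        if word in KEYWORD_TO_ARTICLE:
            return KEYWORD_TO_ARTICLE[word]
    return FALLBACK


if __name__ == "__main__":
    # "refund" is matched first, so the user gets the wrong article
    # even though they are asking about VAT registration.
    print(answer("I don't want a refund, I want to register for VAT"))
    # No keyword is known, so the bot falls back to its canned reply.
    print(answer("How do I change my address?"))
```

    Because the table is the whole “model”, fixing a bad pairing means someone editing the table by hand, which is the periodic review the comment describes.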

    • daniskarma@lemmy.dbzer0.com · 3 months ago

      If you read my answers, you’ll see that I said they are not LLMs. They are language models trained on smaller datasets and with smaller neural networks.

      I picked a tax agency in particular because I know first-hand that tax agencies (it would surprise me if the UK didn’t use one) do use language models with neural networks (notice, again, that I’m not saying generative LLMs) to parse the question and select a proper answer, not the keyword method you think they use.
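      For contrast with keyword lookup, here is a minimal sketch of what “parse the question and select a proper answer” can look like with a neural network: a one-layer softmax classifier (multinomial logistic regression, the simplest possible neural network) over bag-of-words features, trained by gradient descent to map a question to an intent, which then selects a pre-written answer. The intents, training questions, and answers are all hypothetical; this is not the agency’s system, which the linked sources describe as built on IBM Watson and which is certainly more elaborate.

```python
import math
import re

# Hypothetical training questions, each labelled with an intent by hand.
TRAINING = [
    ("When is the filing deadline?", "deadline"),
    ("Last day to file my return", "deadline"),
    ("Where is my refund?", "refund"),
    ("My refund has not arrived", "refund"),
    ("How do I register for VAT?", "vat"),
    ("VAT registration form", "vat"),
]

# Hypothetical pre-written answers: the model picks one, it never writes text.
ANSWERS = {
    "deadline": "See the filing calendar for this year's deadlines.",
    "refund": "Refunds are issued once the return has been processed.",
    "vat": "VAT registration is done through the online registration form.",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


VOCAB = sorted({w for q, _ in TRAINING for w in tokenize(q)})
INTENTS = sorted({label for _, label in TRAINING})


def featurize(text):
    """Bag-of-words vector: 1.0 where a vocabulary word occurs in the text."""
    words = set(tokenize(text))
    return [1.0 if w in words else 0.0 for w in VOCAB]


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, weights, biases):
    """One linear layer followed by softmax: a single-layer neural network."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train(epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    weights = [[0.0] * len(VOCAB) for _ in INTENTS]
    biases = [0.0] * len(INTENTS)
    for _ in range(epochs):
        for question, label in TRAINING:
            x = featurize(question)
            probs = forward(x, weights, biases)
            target = INTENTS.index(label)
            for k in range(len(INTENTS)):
                # Gradient of cross-entropy w.r.t. logit k is p_k - y_k.
                grad = probs[k] - (1.0 if k == target else 0.0)
                biases[k] -= lr * grad
                for j, xj in enumerate(x):
                    weights[k][j] -= lr * grad * xj
    return weights, biases


def classify(question, weights, biases):
    """Return (intent, probability) for the most likely intent."""
    probs = forward(featurize(question), weights, biases)
    best = max(range(len(INTENTS)), key=probs.__getitem__)
    return INTENTS[best], probs[best]


def reply(question, weights, biases):
    intent, _ = classify(question, weights, biases)
    return ANSWERS[intent]


if __name__ == "__main__":
    weights, biases = train()
    print(reply("Has my refund arrived yet?", weights, biases))
```

      Unlike the keyword table, every word in the question contributes to the decision through learned weights, and the softmax gives a probability for the chosen intent. The output is still one of the pre-written answers, so nothing is generated.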

      I would have given the first-hand example I know, but it is in Spanish and people may not be able to understand it. I do know that tax agencies in different countries usually use very similar tools, so the UK probably uses one too. If you want to test the Spanish one, here it is, along with sources on what type of AI it uses.

      https://sede.agenciatributaria.gob.es/Sede/ayuda/herramientas-asistencia-virtual.html

      https://es.newsroom.ibm.com/2018-02-28-La-Agencia-Tributaria-utiliza-IBM-Watson-para-ayudar-a-las-empresas-en-la-gestion-del-IVA

      Again, since it seems I need to repeat this so people can properly train on the info I’m writing: not an LLM, not GPT, not a large general-purpose language model. As for confidence: with that number of parameters, cutting the answers it isn’t confident about would probably cut most of them, at least with today’s state of the technology. Things keep improving each year, though.
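      The trade-off in the last paragraph, where refusing low-confidence answers means refusing most questions, can be made concrete with a confidence threshold. In this sketch the (answer, confidence) pairs are invented numbers, not measurements of any real system; the code abstains whenever confidence is below the threshold and reports coverage, the fraction of questions that still get an answer.

```python
from typing import List, Optional, Tuple

Prediction = Tuple[str, float]  # (selected answer, model confidence)


def answer_or_abstain(prediction: Prediction, threshold: float) -> Optional[str]:
    """Return the answer only if its confidence reaches the threshold."""
    answer, confidence = prediction
    return answer if confidence >= threshold else None


def coverage(predictions: List[Prediction], threshold: float) -> float:
    """Fraction of questions that still receive an answer at this threshold."""
    answered = sum(1 for p in predictions
                   if answer_or_abstain(p, threshold) is not None)
    return answered / len(predictions)


# Illustrative confidences for five questions (made-up values).
PREDICTIONS = [
    ("KB-101", 0.92),
    ("KB-202", 0.55),
    ("KB-303", 0.48),
    ("KB-101", 0.61),
    ("KB-202", 0.35),
]

if __name__ == "__main__":
    for t in (0.0, 0.5, 0.9):
        print(f"threshold={t:.1f} coverage={coverage(PREDICTIONS, t):.0%}")
```

      With these numbers, raising the threshold from 0.0 to 0.9 drops coverage from every question to one in five: the fewer answers the bot is confident about, the more a strict cut-off leaves unanswered.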

      Edit: found an English source on the matter: https://www.investinspain.org/en/news/2024/ibm

      The chatbot is still only available in Spanish and the co-official languages.

      • Railcar8095 · 3 months ago

        That’s what you’re missing. Those are not language models, nor do they use neural networks. At best they use NLP classification. They do not generate text; they pick pre-constructed answers based on the inputs. Because of this, there’s no confidence beyond “what’s generally correct for this keyword.”

        I’ve worked with IBM Watson. It existed and was used for basic bots a decade ago. You have to manually map the terms to outputs.

        And I’ve used the tax agency’s website to confirm what I’m saying.