An in-depth explanation of how LLMs work with a minimum of jargon

wisdomchicken · 11 months ago

garyyo@lemmy.world · 11 months ago

We don’t understand it because no one designed it. We designed how to train a neural network (NN), and we designed some parts of the structure, but not the individual parts inside. The largest LLMs have upwards of 70 billion parameters, each an individual number that can be tweaked. There are just too many of them to understand what any individual one does, and since we just let an optimization algorithm do its optimizing, we can’t really even know what groups of them do.
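
To make the "we designed the training, not the numbers" point concrete, here is a toy sketch in Python. It is nothing like a real LLM: the two-parameter model `y = a * b * x`, the function name `train`, the learning rate, and the starting values are all made up for illustration. The only thing it is meant to show is that the optimizer picks the numbers, and that different starting points can end at different numbers that behave the same.

```python
def train(a, b, steps=2000, lr=0.01):
    """Fit y = 2*x with the toy model y = a * b * x using plain gradient descent.

    Hypothetical example: a and b are the only "parameters", and we never
    choose their final values ourselves. The update rule does.
    """
    data = [(x, 2.0 * x) for x in (-2.0, -1.0, 1.0, 2.0)]
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in data:
            err = a * b * x - y          # prediction error on this example
            grad_a += 2 * err * b * x    # d(err^2)/da
            grad_b += 2 * err * a * x    # d(err^2)/db
        a -= lr * grad_a / len(data)
        b -= lr * grad_b / len(data)
    return a, b


# Same training procedure, two different starting points.
a1, b1 = train(1.3, 0.6)
a2, b2 = train(0.6, 1.3)
print(a1, b1, a1 * b1)
print(a2, b2, a2 * b2)
```

Both runs should end with `a * b` very close to 2, but with different values for `a` and `b`. Looking at `a` alone tells you nothing about what the model does, and that is with two parameters instead of 70 billion.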

We can get around this by studying it the way we study the brain. Instead of looking at what an individual part does, we can group parts together and figure out how each group influences things (AI explainability), or even get a different NN to look at it and generate an explanation (post hoc rationale generation). But that’s not really the same as actually understanding what it is doing under the hood. What it is doing under the hood is more or less fundamentally unknowable: there is just too much information, and it’s not organized well enough for us to understand. Maybe one day we will be able to abstract what is going on in there and organize it in an understandable manner, but not yet.
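
One simple flavor of the "look at groups instead of individual parts" idea is to switch a group off and see how much the output changes (often called ablation). The sketch below is an assumption about what such a probe could look like, not a description of any particular explainability tool. The weights `W_IN` and `W_OUT` and the helper names `forward` and `group_effect` are hand-picked and hypothetical.

```python
def forward(x, w_in, w_out, disabled=()):
    """Tiny one-hidden-layer network with ReLU; units listed in `disabled` are zeroed."""
    hidden = [max(0.0, w * x) for w in w_in]
    for i in disabled:
        hidden[i] = 0.0
    return sum(h * w for h, w in zip(hidden, w_out))


# Hypothetical weights chosen by hand for illustration, not taken from a real model.
W_IN = [1.0, -1.0, 0.5, 2.0]
W_OUT = [0.5, 0.5, -1.0, 0.25]


def group_effect(group, inputs=(-2.0, -1.0, 1.0, 2.0)):
    """Average absolute change in output when a group of hidden units is switched off."""
    total = 0.0
    for x in inputs:
        full = forward(x, W_IN, W_OUT)
        ablated = forward(x, W_IN, W_OUT, disabled=group)
        total += abs(full - ablated)
    return total / len(inputs)


print(group_effect((2,)))    # unit 2 alone
print(group_effect((3,)))    # unit 3 alone
print(group_effect((2, 3)))  # units 2 and 3 together
print(group_effect((0, 1)))  # units 0 and 1 together
```

With these made-up weights, switching off unit 2 or unit 3 alone changes the output, but switching off both together changes nothing because they cancel each other out. That is the kind of thing you only see at the group level, and it still only tells you what a group influences, not how the whole thing works.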

Large language models, explained with a minimum of math and jargon