Take a whole bunch of text.
For each word that appears, note down a list of all the words that ever directly follow it - including end-of-sentence.
Now pick a starting word, pick a following-word at random from the list, rinse and repeat.
You can make it fancier if you want by noting how many times each word follows its predecessor in the sample text, and weighting the random choice accordingly.
Either way, the string of almost-language this produces is called a Markov chain.
It’s a bit like constantly picking the middle button in your phone’s autocomplete.
It’s a fun little exercise to knock together in your programming language of choice.
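Here's one way the exercise above might look in Python. This is a minimal sketch, not a canonical implementation; the function names (`build_chain`, `generate`) are my own, and it treats a period as the only sentence boundary for simplicity.

```python
import random

END = None  # sentinel standing in for end-of-sentence

def build_chain(text):
    """Map each word to a list of every word that ever follows it.

    Repeats are kept, so picking uniformly from the list already
    weights choices by how often each successor appeared.
    """
    chain = {}
    for sentence in text.split("."):
        words = sentence.split()
        for current, nxt in zip(words, words[1:] + [END]):
            chain.setdefault(current, []).append(nxt)
    return chain

def generate(chain, start, max_words=30):
    """Walk the chain from `start` until end-of-sentence or a word cap."""
    word, out = start, [start]
    while word in chain and len(out) < max_words:
        word = random.choice(chain[word])
        if word is END:
            break
        out.append(word)
    return " ".join(out)

sample = "the cat sat on the mat. the dog sat on the cat."
chain = build_chain(sample)
print(generate(chain, "the"))
```

Note that the "fancier" weighted version comes for free here: because the successor lists keep duplicates, a word that followed its predecessor three times is three times as likely to be picked.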
If you make a prompt-and-response bot out of it, learning from each input, it’s like talking to an oracular teddy bear. You almost can’t help being nice to it as you teach it to speak; humans will pack-bond with anything.
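A rough sketch of that bot, under my own assumptions about how it picks up each input and where a reply should start (the source doesn't specify an implementation):

```python
import random

chain = {}  # word -> list of observed successors (repeats kept)

def learn(sentence):
    """Fold a new input into the chain, word pair by word pair."""
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        chain.setdefault(current, []).append(nxt)

def respond(prompt, max_words=20):
    """Reply by walking the chain from a word of the prompt it has seen."""
    known = [w for w in prompt.split() if w in chain]
    if not known:
        return "..."
    word = random.choice(known)
    out = [word]
    while word in chain and len(out) < max_words:
        word = random.choice(chain[word])
        out.append(word)
    return " ".join(out)

# Each exchange teaches the bot before it answers:
for prompt in ["hello little bear", "the little bear speaks"]:
    learn(prompt)
    print(respond(prompt))
```

Every conversation feeds the chain, so the bot's vocabulary is exactly the words you've said to it, which is much of the teddy-bear charm.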
LLMs are the distant and very fancy descendants of these - but pack-bonding into an actual romantic relationship with one would be as sad as marrying a doll.