ChatGPT spills its prompt

David Gerard@awful.systems · 4 months ago


intensely_human · 4 months ago

Our brains only emulate precision as well. We’re better at it not at an architectural level but just because we’re configured to use various strategies to check thoughts against other thoughts and observations.

We can’t directly perceive logic. We have heuristics for generating logical steps, and we have heuristics for locating and detecting obvious breaks. But nobody has an algorithm in their own head that rigorously checks all the logic in a single pass through some kind of structure. It’s an asymptote we approach by sort of slashing at logical claims from various angles to see if we can break the structure. We each have a set number of slashes after which we’re biased toward being satisfied: “huh, the logic is sound on that one.”

I think LLMs could be a lot more precise, without much change in the architecture of the neural net parts, if we just wrapped them in some old school code (or, for fun, natural language interpreted like this set of leaked prompt instructions). The code would carry out a strategy of checking A versus B: having different pairs of LLMs debate each other, and having one LLM boss a different one around by writing out lists like “Consider the part with the table. Is there any way that could go wrong?”

Or, even better, recognize that this kind of checking relies on losing the larger prompt context.

Instead of “now go back and review your idea for problems”, you present it to a fresh LLM without knowledge of why it’s being asked:

Does this plan: yadda-yadda, make sense in terms of the sequence of events? If anything is out of order, report bad plan. Else report okay.

A different LLM is asked:

Does this plan: yadda-yadda, make sense in terms of the cash flows in and out? Do they add up? (If there isn’t any money involved in the plan you report it as okay)

Yet another:

Does this plan: yadda-yadda-yadda, make sense in terms of the first step not having any other requirements that aren’t already true?

… etc

Then another LLM is presented with:

Here’s what seventeen different LLMs said about whether this plan makes sense on various dimensions. Your response should just be whether any items on this list read “not okay”:

Sequence: okay

Cash flow: okay

First step immediately doable: not okay

Each step actually required for step right after it: okay

…

And it builds up from there.
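A rough sketch of that pipeline, in case it helps make it concrete. Everything here is an assumption for illustration: `LLM` is just "any function from prompt text to reply text", `stub_llm` returns canned answers instead of calling a real model, and the three check prompts are lifted from the examples above. Each check gets a fresh call with no shared context, and the last call only sees the list of verdicts.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a model call: prompt text in, reply text out.
LLM = Callable[[str], str]

# One context-free question per dimension (wording taken from the examples above).
CHECKS: Dict[str, str] = {
    "Sequence": (
        "Does this plan: {plan}, make sense in terms of the sequence of events? "
        "If anything is out of order, report bad plan. Else report okay."
    ),
    "Cash flow": (
        "Does this plan: {plan}, make sense in terms of the cash flows in and out? "
        "Do they add up? (If there isn't any money involved in the plan you report it as okay)"
    ),
    "First step immediately doable": (
        "Does this plan: {plan}, make sense in terms of the first step not having "
        "any other requirements that aren't already true?"
    ),
}


def run_checks(plan: str, llm: LLM) -> Dict[str, str]:
    """Ask each question in its own call, so no check sees the others."""
    report = {}
    for name, template in CHECKS.items():
        reply = llm(template.format(plan=plan)).strip().lower()
        # Anything other than a plain "okay" is treated as a failure.
        report[name] = "okay" if reply == "okay" else "not okay"
    return report


def aggregate(report: Dict[str, str], llm: LLM) -> str:
    """Hand only the verdict list to one more call and return its reply."""
    lines = "\n".join(f"{name}: {verdict}" for name, verdict in report.items())
    prompt = (
        "Here's what several different LLMs said about whether this plan makes "
        "sense on various dimensions. Your response should just be whether any "
        'items on this list read "not okay":\n\n' + lines
    )
    return llm(prompt)


def stub_llm(prompt: str) -> str:
    """Canned replies for demonstration only; a real model call would go here."""
    if prompt.startswith("Here's what"):
        return "yes" if ": not okay" in prompt else "no"
    if "first step" in prompt:
        return "not okay"
    return "okay"
```

Calling `run_checks("buy flour, bake bread, sell bread", stub_llm)` and then passing the result to `aggregate` walks through the same flow as the list above. Swapping `stub_llm` for a real client is the only part that would touch an actual model.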

You can also have LLMs define these pipelines of things to check before an idea passes as legit. You can even run each exact prompt-check multiple times in parallel and average the outputs to eliminate noise.
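For the "copy the same check several times" part, here is a minimal sketch. It is an assumption on my end that a majority vote is the right stand-in for "averaging" when the outputs are okay / not okay rather than numbers, and the calls run one after another here instead of in parallel to keep it short. `majority_verdict` and `copies` are made-up names, and `llm` is again any function from prompt text to reply text.

```python
from collections import Counter
from typing import Callable


def majority_verdict(prompt: str, llm: Callable[[str], str], copies: int = 5) -> str:
    """Ask the identical prompt `copies` times and return the majority answer."""
    votes = Counter()
    for _ in range(copies):
        reply = llm(prompt).strip().lower()
        # Anything other than a plain "okay" counts as a vote against.
        votes["okay" if reply == "okay" else "not okay"] += 1
    # A tie counts as "not okay", so a doubtful check fails rather than passes.
    return "okay" if votes["okay"] > votes["not okay"] else "not okay"
```

An odd number of copies avoids ties entirely. Whether ties should fail or pass is a design choice; failing is the cautious one.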

intensely_human · 4 months ago

Main point: maybe human logical or design precision comes from being able to do the equivalent of context-free presentation of sub-questions to little LLMs in the mind, divorcing a particular evaluation from the bias introduced by narrative-generation in context.


ChatGPT just (accidentally) shared all of its secret rules – here's what we learned