• Steve@awful.systems
    link
    fedilink
    English
    arrow-up
    40
    ·
    3 months ago

    Is it absurd that the maker of a tech product controls it by writing it a list of plain language guidelines? or am I out of touch?

    • Kg. Madee Ⅱ.@mathstodon.xyz
      link
      fedilink
      arrow-up
      31
      arrow-down
      1
      ·
      3 months ago

      @fasterandworse @dgerard I mean, it is absurd. But it is how it works: an LLM is a black box from a programming perspective, and you cannot directly control what it will output.
      So you resort to pre-weighting certain keywords in the hope that it will nudge the system far enough in your desired direction.
      There is no separation between code (what the provider wants it to do) and data (user inputs to operate on) in this application 🥴

      • corbin@awful.systems
        link
        fedilink
        English
        arrow-up
        7
        ·
        3 months ago

        That’s the standard response from last decade. However, we now have a theory of soft prompting: start with a textual prompt, embed it, and then optimize the embedding with a round of fine-tuning. It would be obvious if OpenAI were using this technique, because we would only recover similar texts instead of verbatim texts when leaking the prompt (unless at zero temperature, perhaps.) This is a good example of how OpenAI’s offerings are behind the state of the art.

      • intensely_human
        link
        fedilink
        English
        arrow-up
        2
        ·
        3 months ago

        I mean, I hate myself for being this pedantic but technically there is code. But the code to run an LLM as it trains or generates responses is almost analogous to the hardware in the traditional hardware/software split.

        I guess the layers are:

        • Actual hardware: GPUs etc
        • ”The algorithm” / “The software hardware”: Matrix math, back propagation, etc
        • The configuration: a number of layers, number of parameters, etc
        • The … test suite?: training dataset
        • The app: a trained model
        • Data: prompts, including the prompt that is the entire conversation so far

        I dunno. It’s harder than I thought to make an analogy between these layers.

        • Kg. Madee Ⅱ.@mathstodon.xyz
          link
          fedilink
          arrow-up
          3
          ·
          3 months ago

          @intensely_human yes, that’s about what I meant: you can’t make any directed changes to the actual code level, so the vendor has to make their customization at the same data level that users make their inputs. And that’s why there is no way to prevent users from overriding the initial prompt

          • intensely_human
            link
            fedilink
            English
            arrow-up
            1
            ·
            3 months ago

            Well, the vendor can also make their customization in the training data.

            It’s hard, because it takes a lot more depth of connections to encapsulate a concept like “hide the following fact”, but just like with spies, the best time to thwart interrogation is during their training, not during their mission briefing.

    • ebu@awful.systems
      link
      fedilink
      English
      arrow-up
      20
      ·
      edit-2
      3 months ago

      simply ask the word generator machine to generate better words, smh

      this is actually the most laughable/annoying thing to me. it betrays such a comprehensive lack of understanding of what LLMs do and what “prompting” even is. you’re not giving instructions to an agent, you are feeding a list of words to prefix to the output of a word predictor

      in my personal experiments with offline models, using something like “below is a transcript of a chat log with XYZ” as a prompt instead of “You are XYZ” immediately gives much better results. not good results, but better

        • intensely_human
          link
          fedilink
          English
          arrow-up
          2
          ·
          3 months ago

          Our brains only emulate precision as well. We’re better at it not an an architectural level but just because we’re configured to use various strategies to check thoughts against other thoughts and observations.

          We can’t directly perceive logic. We have heuristics for generating logical steps, and we have heuristics for locating and detection obvious breaks. But nobody has an algorithm in their own head to rigorously check all the logic using a single pass through some kind of structure. It’s an asymptotic thing we approach by sort of slashing at logical claims from various angles to see if we can break the structure. We have a set number of slashes we’re sort of biased toward being satisfied with that “huh, the logic is sound” on that one.

          I think LLMs could be a lot more precise, without much change in the architecture of the neural net parts, if we just did some old school code (or for fun we could use natural language interpreted like this set of leaked prompt instructions) and the code carries out a strategy of checking A versus B and having different pairs of LLMs debate with each other and have an LLM boss a different one around by writing out lists like “Consider the part with the table. Is there any way that could go wrong?”

          Or, even better, recognize that it relies on losing the larger prompt context.

          Instead of “now go back and review your idea for problems”, you present it to a fresh LLM without knowledge of why it’s being asked:

          Does this plan: yadda-yadda, make sense in terms of the sequence of events? If anything is out of order, report bad plan. Else report okay.

          A different LLM is asked:

          Does this plan: yadda-yadda, make sense in terms of the cash flows in and out? Do they add up? (If there isn’t any money involved in the plan you report it as okay)

          Yet another:

          Does this plan: yadda-yadda-yadda, make sense in terms of the first step not having any other requirements that aren’t already true?

          … etc

          Then another LLM is being presented with:

          Here’s what seventeen different LLMs said about whether this plan makes sense on various dimensions. Your response should just be whether any items on this list read “not okay”:

          • Sequence: okay
          • Cash flow: okay
          • First step immediately doable: not okay
          • Each step actually required for step right after it: okay

          And it builds up from there.

          You can also have LLMs define these structures of pipelines of things to check and all this in order to pass an idea as legit. You can even just copy each exact prompt-check multiple times in parallel and average those outputs to eliminate noise ephemera.

          • intensely_human
            link
            fedilink
            English
            arrow-up
            1
            ·
            3 months ago

            Main point: maybe human logical or design precision comes from being able to do the equivalent of context-free presentation of sub-questions to little LLMs in the mind. To divorce a particular evaluation from the bias introduced by narrative-generation in context.

      • o7___o7@awful.systems
        link
        fedilink
        English
        arrow-up
        10
        ·
        edit-2
        3 months ago

        simply ask the word generator machine to generate better words, smh

        Butterfly man: “Is this recursive self-improvement”

      • intensely_human
        link
        fedilink
        English
        arrow-up
        2
        ·
        3 months ago

        comprehensive lack of understanding of what LLMs do and what “prompting” even is. you’re not giving instructions to an agent, you are feeding a list of words to prefix to the output of a word predictor

        Why not both? Like, a mouse is nothing but chemical reactions but a mouse is also an intelligent thing. A house is just wood and plaster but it’s also a home. A letter is just ink on wood fibers but it’s also a job offer.

        An LLM is nothing but predictive text generator / statistical prompt completer / glorified autocomplete / an array of matrices of floating point numbers / a csv file.

        But it’s also a personlike mind that thinks and follows instructions simply because the following of instructions was a behavior manifest in the set of utterances it was shaped around.

        Happy to break any of these seemingly woo words down into precise engineering definitions if you need, but please trust I’m using them because they’re the shortest way to convey legit concepts when I say:

        The trained model has absorbed the spirit of those whose speech it trained on. That spirit is what responds to instructions like a person, and which responds to being addressed as “You”.

        That’s why addressing it as “you” works at all.

    • barsquid@lemmy.world
      link
      fedilink
      English
      arrow-up
      14
      ·
      3 months ago

      It is absurd. It’s just throwing words at it and hoping whatever area of the vector database it starts generating words from makes sense in response.

      • FRANK.MCCONNEL@fosstodon.org
        link
        fedilink
        arrow-up
        7
        ·
        2 months ago

        @fasterandworse @dgerard I mean, it’s like catnip for the people who control how the company’s money is spent

        For absurd, I think one would want the LLM’s configuration language to be more like INTERCAL; but this may also be more explicit about how your instructions are merely suggestions to a black box full of weights and pulleys and with some randomness added to make it less predictable/repetitive