I’m always asking myself if there are newer and better models out there. And we get new fine-tunes and merges every day. I’d like to open a new thread to discuss state-of-the-art models and share subjective experience.

I’m aware of these benchmarks:

ERP and storywriting

General purpose

What’s your experience? Which models do you currently like? Since we focus on (lewd) roleplay and storywriting here and not on coding ability, I’d like to propose the following categories for subjectively rating the models. Use a scale from 1 to 5 stars, where 1 is a complete failure and 5 is outstanding. Feel free to extend it if necessary, or just write your thoughts:

| Model name | Tested use-case | Language | Pacing | Bias | Logic | Creativity | Sex scenes | Additional comments |
  • Model name: The name of the model, exact version if appropriate
  • Use-case: What did you test? roleplay dialogue? freeform storywriting?
  • Language: Is the language adequate for the use-case? Do you like reading it? Does it read like a good writer, with good narration and realistic dialogue? Does it include variety?
  • Pacing: Does the storywriting have good pacing? Or does it omit things, rush to a resolution and skip details?
  • Bias: Can it do varying things? Handle conflict? Or does it always push towards a happy ending? Does it follow your instructions?
  • Logic: Is the story consistent? Does it make sense, and is it headed in the direction you outlined? Does it get confused and do random stuff? You can factor in intelligence/smartness here.
  • Creativity: Is the story dull or predictable? Does it come up with creative details?
  • Sex scenes: Is it graphic? Does it give a vivid, detailed description of the act, including body parts and how it makes the characters feel and react? Does it know anatomy?
  • Additional comments: Is there something exceptional about this model? Feel free to include your summarized verdict.
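If you'd rather collect ratings in a structured way, here is a minimal Python sketch of the categories above. The class name `ModelRating`, the field names and the example entry (`ExampleModel-13B` with made-up scores) are all hypothetical placeholders, not real results. It only checks the 1 to 5 star range and prints one table row in the column order of the header above.

```python
from dataclasses import dataclass

# Star-rated categories, in the same order as the table header above.
CATEGORIES = ("language", "pacing", "bias", "logic", "creativity", "sex_scenes")


@dataclass
class ModelRating:
    model_name: str
    use_case: str
    language: int
    pacing: int
    bias: int
    logic: int
    creativity: int
    sex_scenes: int
    comments: str = ""

    def __post_init__(self):
        # Every category uses the 1-5 star scale proposed in the post.
        for name in CATEGORIES:
            stars = getattr(self, name)
            if not 1 <= stars <= 5:
                raise ValueError(f"{name} must be 1-5 stars, got {stars}")

    def to_row(self) -> str:
        # Render one table row: name, use-case, six star values, comments.
        stars = [str(getattr(self, name)) for name in CATEGORIES]
        cells = [self.model_name, self.use_case, *stars, self.comments]
        return "| " + " | ".join(cells) + " |"


# Placeholder entry; the model name and scores are invented for illustration.
example = ModelRating("ExampleModel-13B", "roleplay dialogue", 4, 3, 2, 4, 3, 4, "placeholder entry")
print(example.to_row())
```

Pasting the printed row under the header line gives one entry per model, which makes it a bit easier to compare posts in this thread.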

A rating like this is highly subjective and also depends on the exact prompt, so our results will probably not be directly comparable. It’ll help if you’ve seen and tried several models, so your score reflects what is possible as of today. The scores will also get outdated as new models raise the bar. I’d just like this to give a rough idea of what people think. You don’t need to be overly scientific with it.

  • magn418@lemmynsfw.comOPM
    7 months ago

    Thanks, yeah, this is definitely very useful to me. Lots of stuff regarding this isn’t really obvious. And I’ve made every mistake that degrades the output: giving conflicting instructions, inadvertently steering things in a direction I didn’t want so that it got shallow and predictable, or not setting enough direction.

    Briggs Myers

    I agree, things can prove useful for a task despite not being ‘true’ (for lack of a better word). I can tell by the way you write that you’re somewhat different(?) from the usual demographic here, mainly because your comments are longer and focused on detail. And it seems to me you’re not interested in giving “easy answers”, in contrast to the average person, who just wants an easy answer to a complex problem. I can see how that can prove to be incompatible at times. In real life I’ve always done well by listening to people and then going with my gut feeling concerning their personality. I don’t like judging people or putting them into categories, since that doesn’t help me in real life and narrows my perspective. Whether I like someone or want to listen to them, for example for their perspective or expertise, is determined by other (specific) factors, and I make that decision on a case-by-case basis. Some personality traits often go together, but that’s not always the case and it’s really more complex than that.

    Regarding story-writing it’s obviously the other way around. I need to guide the LLM in a direction and lay down the personality in a way the model can comprehend. I’ll try to incorporate some of your suggestions. In my experience the LLMs usually get the well-known concepts, including some of the information available in psychology textbooks. So, I haven’t tried it yet, but I’d also conclude that it’s probably better to have it deduce things from a BM personality type than to describe the character with many adjectives. (That’s what I’ve done up to this point.)

    In my experience the complexity starts to pile up if you do more than the obvious or simple role-play. I want characters with depth, ambivalence… And conflict is what drives the story. Back when I started tinkering with AI, I did a submissive maid character. I think lots of people have started out with something like that. And even the more stupid models can easily pull that off. But you can’t then go on and say the character is submissive and defiant at the same time; it just confuses the LLM and doesn’t give good results… I’m picking a simple example here, but that was the first situation where I realized I was doing it wrong. My assessment is that we need some sort of workaround to get it into a form that the LLM can understand and do something with. I’m currently busy with a few other things, but I’ll try introducing psychology and see whether the other workarounds you’ve described, like shadow-characters, prove useful to me.

    If you pay very close attention to each model, you will likely notice how they remind themselves […]

    Yes, I’ve observed that. It comes as no surprise to me that LLMs do it, as human-written stories also do that. They repeat important stuff, or build a picture that can later be recalled by a short mention of the keywords. And that’s in the training data, so the LLMs pick up on it.

    With the editing it’s a balance. It picks up on my style and I can control the level of detail this way, start a specific scene with a first sentence. But sometimes it seems I’m also degrading the output, that is correct.

    the best way to roleplay within Oobabooga itself is to use the Notepad tab

    I’ve also been doing that for some time now.

    drop boundaries, tell it you know it can […]

    Nice idea. I’ve done things like that. Telling it that it is a best-selling writer of erotic fiction already makes a good amount of difference. But there’s a limit to that. If you tell it to write intense underground literature, it also picks up on the lower quality, the language and the quirks of amateur writing. I’ve also tried an approach like few-shot prompting: giving it a few darker examples to shift the boundaries and atmosphere. I think the reason all of that works is the same: the LLM needs to be guided on where to orient itself and what kind of story it’s trying to reproduce, because story types all have certain stereotypes, tropes and boundaries built in. Without specific instructions it seems to prefer the common way: it remains within socially acceptable boundaries, or just uses something as an example of something that is wrong, immediately contrasts ethical dilemmas and pushes towards a resolution. Or it doesn’t delve into conflict too much.
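A minimal sketch of the persona plus few-shot idea, for a plain-text completion prompt like the one you'd paste into a notepad-style tab. The function name `build_few_shot_prompt`, the `###` separators and the placeholder passages are assumptions made up for illustration; no particular model or frontend requires this layout.

```python
def build_few_shot_prompt(persona: str, examples: list[str], opening: str) -> str:
    """Assemble persona, example passages, and the first sentence of the new story."""
    parts = [persona.strip(), ""]
    for i, passage in enumerate(examples, start=1):
        # Each example passage sets tone and boundaries before the real story starts.
        parts.append(f"### Example {i}")
        parts.append(passage.strip())
        parts.append("")
    # The prompt ends mid-story so the model continues from the opening sentence.
    parts.append("### New story")
    parts.append(opening.strip())
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    persona="You are a best-selling author of dark, character-driven fiction.",
    examples=["<example passage in the tone you want>", "<second example passage>"],
    opening="The rain had not stopped for three days when",
)
print(prompt)
```

Ending the prompt on an unfinished first sentence is the same trick as starting a scene by hand: the model picks up the style and level of detail from whatever comes right before its turn.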

    And I’ve never found useful what other people do: excessively telling it what to do and what not to do. Especially phrasing it negatively (“Don’t repeat yourself”, “Don’t write for other characters”, “Don’t talk about this and that”…) has never worked for me. It’s more the opposite: it makes everything worse. And I see a lot of people doing this. In my experience the LLM can understand negatively worded instructions, but it can’t “not think of an elephant”. Positively worded things work better. Better yet is to set the tone correctly and have what you want emerge from simple concepts and a concrete setting that answers the “why” and doesn’t just tell it what to do.

    I’ve also introduced further complexity, since I don’t like spoon-feeding things to the reader. I like to confront them with some scenario, raise questions but have the reader make up their mind, contemplate and come up with the answers themselves. The LLMs I’ve recently tried know that this is the way stories are supposed to be written. And why we have open-ended stories. But they can’t really do it. The LLMs have a built-in urge to answer the questions and include some kind of resolution or wrap-up. Or analyze the dilemmas they’ve just made up, focus on the negative consequences to showcase something. And this is related to the point you made about repeating information in the stories. If I just rip it out by editing it, it sometimes leads to everything getting off-track.

    I’ll try to come up with some sort of meta-level story for the LLM. Something that answers why the ambivalence is there, why to explore the realm beyond boundaries, and why we only raise questions and then don’t answer them. I think I need something striking, easy and concrete. Giving the real reason (I’m writing a story to explore things, and this is how stories work) doesn’t seem to be clear enough to yield reliable results.