I’m always asking myself if there are newer and better models out there, and we get new fine-tunes and merges every day. I’d like to open a new thread to discuss state-of-the-art models and share subjective experiences.

I’m aware of these benchmarks:

ERP and storywriting

General purpose

What’s your experience? Which models do you currently like? Since we focus on (lewd) roleplay and storywriting here and not coding abilities, I’d like to propose the following categories to subjectively rate the abilities of the models. Use a scale from 1 to 5 stars, where 1 is a complete failure and 5 is outstanding. Feel free to extend it if necessary, or just write your thoughts:

| Model name | Tested use-case | Language | Pacing | Bias | Logic | Creativity | Sex scenes | Additional comments |
  • Model name: The name of the model, exact version if appropriate
  • Tested use-case: What did you test? Roleplay dialogue? Freeform storywriting?
  • Language: Is the language adequate to the use-case? Do you like reading it? Does it read like the work of a good writer, with good narration and realistic dialogue? Does it include variety?
  • Pacing: Does the story have good pacing? Or does it omit things, rush to a resolution and skip details?
  • Bias: Can it do varying things? Handle conflict? Or does it always push towards a happy ending? Does it follow your instructions?
  • Logic: Is the story consistent? Does it make sense, and is it headed in the direction you outlined? Does it get confused and do random stuff? You can factor in intelligence/smartness here.
  • Creativity: Is the story dull or predictable? Does it come up with creative details?
  • Sex scenes: Is it graphic? Does it give a vivid, detailed description of the act, including body parts and how the characters feel and react? Does it know anatomy?
  • Additional comments: Is there something exceptional about this model? Feel free to include your summarized verdict.
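
If you want to keep your ratings consistent across posts, here is a minimal Python sketch that checks the 1 to 5 star range and prints one table row in the column order above. The class name `ModelRating`, its fields, and the model name in the usage note are made up for illustration; the "Tested use-case" column is treated as free text, and nothing here is required for posting.

```python
from dataclasses import dataclass

# Star-rated columns, in the same order as the table header above.
CATEGORIES = ["Language", "Pacing", "Bias", "Logic", "Creativity", "Sex scenes"]


@dataclass
class ModelRating:
    """Hypothetical helper: one row of the rating table."""
    model_name: str
    use_case: str
    stars: dict          # maps each category name to an integer from 1 to 5
    comments: str = ""

    def __post_init__(self):
        # Reject missing categories and values outside the 1 to 5 scale.
        for cat in CATEGORIES:
            value = self.stars.get(cat)
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{cat}: expected 1 to 5 stars, got {value!r}")

    def to_row(self):
        # Build a pipe-separated row matching the header layout.
        cells = [self.model_name, self.use_case]
        cells += ["★" * self.stars[cat] for cat in CATEGORIES]
        cells.append(self.comments)
        return "| " + " | ".join(cells) + " |"
```

For example, `ModelRating("example-model", "roleplay dialogue", {c: 3 for c in CATEGORIES}).to_row()` gives a row you can paste under the header.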

A rating like this is highly subjective and also depends on the exact prompt, so our results will probably not be directly comparable. It’ll help if you’ve seen and tried some models, so your score reflects what is possible as of today. And the scores will get outdated as new models raise the bar. I’d just like this to give a rough idea of what people think. You don’t need to be overly scientific with it.

  • magn418@lemmynsfw.com (OP, M) · 9 months ago

    Thanks! Yeah, you were kind enough to include a bit of extra info in your previous posts. Your stories are somewhat specific and complex. I figured if you like a model… it has to be ‘intelligent’ enough to keep track…

    I wonder if I’d also like that model for my purposes. I’m not sure if I can run the 70B model; I’d have to spin up a RunPod cloud instance for that. But I’ll try the FlatDolphinMaid 8x7B tomorrow.

    You’re right. (Good) AI storywriting and finding good models and settings isn’t easy. I’ve also discarded models and approaches because the prompt (or settings) I used didn’t work that well. Later it turned out I should have done more testing, and I came to like the model: all it needed was different wording or better settings.

    And some models have unique quirks, styles, or things they excel at, which might skew expectations when switching to a different model.