MarinaraSpaghetti's merging experience

3,379 views · 8/20/2024
This is obviously a joke video, all the folks and models mentioned in it are actually wonderful, and I love you all. <3 Keep up the amazing work!

1 Comment

hugalafutro · 1 year ago · −5
I wish more models worked well above 20-30k context; most of the ones whose writing I like are 8k. That's OK for a quick coom-bot chat, but unusable for meaningful longer RP. I've been using Command-R for that; I just wish I could run it with more than 16k context locally without burning my GPU and electric bill in one fell swoop.
Captions (47):
00:00 - 00:05 So I've been doing nothing but NeMo merges for the past 72 hours.
00:05 - 00:07 Trying to pull off the best long context one!
00:07 - 00:10 Shuffling around Shuttle Mini and Magnum 2.5 KTO like a professional
00:11 - 00:16 dealer at an expensive casino, except you can't rely on card counting to win.
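(For anyone curious what "shuffling models around" looks like in practice: a minimal mergekit sketch. The repo ids, merge method, and weights below are illustrative guesses, not the actual recipe from the video.)

```python
# Minimal sketch of a mergekit TIES merge of two NeMo finetunes.
# All repo ids and weights are assumptions, not Marinara's real config.
import subprocess
import textwrap

config = textwrap.dedent("""\
    merge_method: ties
    base_model: mistralai/Mistral-Nemo-Base-2407
    models:
      - model: anthracite-org/magnum-12b-v2.5-kto   # assumed HF id for Magnum 2.5 KTO
        parameters:
          weight: 0.5
          density: 0.5
      - model: Sao10K/MN-12B-Lyra-v1                # assumed HF id for Lyra v1
        parameters:
          weight: 0.5
          density: 0.5
    parameters:
      normalize: true
    dtype: bfloat16
""")

with open("merge.yml", "w") as f:
    f.write(config)

# mergekit-yaml is mergekit's CLI entry point: config in, merged model out.
subprocess.run(["mergekit-yaml", "merge.yml", "./nemo-merge"], check=True)
```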
00:17 - 00:21 And then I found out Magnum shits itself at contexts above 32k.
00:21 - 00:24 Just breaks!
00:29 - 00:31 It BREAKS!
00:31 - 00:34 'Falls off on higher contexts', my ass!
00:38 - 00:40 It spurts nonsense!
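(The context length a model card advertises is at least easy to check against the config it ships with; whether it holds up in practice is the whole rant. A quick sketch, with the repo id only as an example:)

```python
# Print the context length a model's config claims. Finetunes often inherit
# this number from the base model even when they were trained much shorter,
# which is exactly the mismatch being complained about here.
from transformers import AutoConfig

for repo in ("mistralai/Mistral-Nemo-Instruct-2407",):
    cfg = AutoConfig.from_pretrained(repo)
    print(repo, "->", cfg.max_position_embeddings, "positions")
```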
00:41 - 00:47 But that's not the end of the world since I still have Shuttle trained on 128k, right?!
00:47 - 00:50 I praised Shuttle on Drummer's Discord for working on high contexts and
00:50 - 00:56 Fizz suddenly jumps in with "I have no idea HOW."
00:58 - 00:59 "It was trained with 16k!"
00:59 - 01:01 Just like Magnum, yet it works!
01:04 - 01:06 Kalomaze is about to go on suicide watch!
01:06 - 01:08 Meanwhile MistralAI is just taking the piss!
01:13 - 01:14 But that's not all!
01:14 - 01:16 Turns out the best model
01:16 - 01:20 working on high contexts
01:21 - 01:27 is fucking Lyra v1!
01:35 - 01:36 The only one that claims to handle a mere 16k!
01:36 - 01:39 I was weighting it lower,
01:41 - 01:44 even removing it from my merges entirely at some point.
01:44 - 01:47 Because Sao claimed that it fell off after 16k.
01:47 - 01:49 You know, they said they had "tried LoRAs with up to 64k,
01:50 - 01:53 but they just do not work well."
01:54 - 01:56 My ass!
01:56 - 01:58 Lyra is better at recalling stuff
01:58 - 02:02 than the official NeMo Instruct!
02:05 - 02:07 No joke.
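(Recall comparisons like this usually come down to some flavor of needle-in-a-haystack test. A rough sketch against a local OpenAI-compatible server; the URL, port, and filler length are assumptions about your setup.)

```python
# Rough needle-in-a-haystack recall check. Assumes a local OpenAI-compatible
# completions endpoint (llama.cpp, TabbyAPI, etc.) already serving the model.
import requests

NEEDLE = "The secret password is 'spaghetti-42'."
FILLER = "The quick brown fox jumps over the lazy dog. " * 2500  # roughly 25k tokens

def recalls_needle(depth: float) -> bool:
    """Bury the needle at a relative depth in the filler and ask for it back."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + NEEDLE + FILLER[cut:]
    resp = requests.post(
        "http://localhost:5000/v1/completions",  # adjust to your server
        json={
            "prompt": f"{haystack}\n\nWhat is the secret password?\n",
            "max_tokens": 32,
            "temperature": 0.0,
        },
        timeout=600,
    )
    return "spaghetti-42" in resp.json()["choices"][0]["text"]

for depth in (0.1, 0.5, 0.9):
    print(f"needle at {depth:.0%} depth: {'recalled' if recalls_needle(depth) else 'MISSED'}")
```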
02:10 - 02:11 Meanwhile,
02:11 - 02:13 models like Rocinante,
02:15 - 02:17 trained atop Instruct,
02:17 - 02:20 also shit themselves
02:21 - 02:28 at 32k contexts!
02:29 - 02:30 That's just sad.
02:30 - 02:32 The only 'moist' thing about it
02:36 - 02:39 is the tears it brings out of me
02:40 - 02:42 at how much time I wasted
02:50 - 02:55 trying to ram it into my merges!
02:56 - 02:58 It didn't make the prose better?
02:58 - 03:04 No, no, it writes pretty well!
03:04 - 03:09 That is, only if you don't use ChatML!
03:09 - 03:11 And I was trying to pull off a ChatML merge.
03:16 - 03:20 Joke's on me, I guess!
03:23 - 03:26 Back to Mistral's shitty [INST]!
03:31 - 03:36 I'm done with everyone's shit: putting special tokens wherever they want!
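(For anyone who hasn't fought this particular war: the two templates in question, side by side. A sketch; exact whitespace and BOS handling vary by model and tokenizer version.)

```python
# ChatML vs. Mistral's [INST] template, the incompatibility behind the rant.
# A merge whose parent models were trained on different special tokens tends
# to end up confused by both.

# ChatML: explicit role markers, used by many community finetunes.
chatml = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Mistral [INST]: no dedicated system role; the system prompt is typically
# folded into the first user turn.
mistral_inst = "<s>[INST] You are a helpful assistant.\n\nHello! [/INST]"

print(chatml)
print(mistral_inst)
```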