| 00:00 - 00:05 | So I've been doing nothing but NeMo merges for the past 72 hours. |
| 00:05 - 00:07 | Trying to pull off the best long-context one! |
| 00:07 - 00:10 | Shuffling around Shuttle Mini and Magnum 2.5 KTO like a professional |
| 00:11 - 00:16 | dealer at an expensive casino, except you can't rely on card counting to win. |
| 00:17 - 00:21 | And then I found out Magnum shits itself at contexts above 32k. |
| 00:21 - 00:24 | Just breaks! |
| 00:29 - 00:31 | It BREAKS! |
| 00:31 - 00:34 | 'Falls off on higher contexts', my ass! |
| 00:38 - 00:40 | It spurts nonsense! |
| 00:41 - 00:47 | But that's not the end of the world since I still have Shuttle trained on 128k, right?! |
| 00:47 - 00:50 | I praised Shuttle on Drummer's Discord for working on high contexts and |
| 00:50 - 00:56 | Fizz suddenly jumps in with "I have no idea HOW." |
| 00:58 - 00:59 | "It was trained with 16k!" |
| 00:59 - 01:01 | Just like Magnum, yet it works! |
| 01:04 - 01:06 | Kalomaze is about to go on suicide watch! |
| 01:06 - 01:08 | Meanwhile MistralAI is just taking the piss! |
| 01:13 - 01:14 | But that's not all! |
| 01:14 - 01:16 | Turns out the best model |
| 01:16 - 01:20 | working on high contexts |
| 01:21 - 01:27 | is fucking Lyra v1! |
| 01:35 - 01:36 | The only one that claims to handle only up to 16k! |
| 01:36 - 01:39 | I was putting it lower, |
| 01:41 - 01:44 | even removing it at some point from my merges entirely. |
| 01:44 - 01:47 | Because Sao claimed that it fell off after 16k. |
| 01:47 - 01:49 | You know, they said "they have tried LoRAs with up to 64k, |
| 01:50 - 01:53 | but they just do not work well." |
| 01:54 - 01:56 | My ass! |
| 01:56 - 01:58 | Lyra is better at recalling stuff |
| 01:58 - 02:02 | than the official NeMo Instruct! |
| 02:05 - 02:07 | No joke. |
| 02:10 - 02:11 | Meanwhile, |
| 02:11 - 02:13 | models like Rocinante |
| 02:15 - 02:17 | trained atop Instruct, |
| 02:17 - 02:20 | also shit themselves |
| 02:21 - 02:28 | at 32k contexts! |
| 02:29 - 02:30 | That's just sad. |
| 02:30 - 02:32 | The only 'moist' thing about it |
| 02:36 - 02:39 | is the tears it brings out of me |
| 02:40 - 02:42 | at how much time I wasted |
| 02:50 - 02:55 | on trying to ram it into my merges! |
| 02:56 - 02:58 | It didn't make the prose better? |
| 02:58 - 03:04 | No, no, it writes pretty well! |
| 03:04 - 03:09 | That is, only if you don't use ChatML! |
| 03:09 - 03:11 | And I was trying to pull off a ChatML merge. |
| 03:16 - 03:20 | Joke's on me, I guess! |
| 03:23 - 03:26 | Back to Mistral's shitty [INST]! |
| 03:31 - 03:36 | I'm done with everyone's shit, putting special tokens wherever they want! |