WEBVTT

0
00:00.000 --> 00:05.000
So I've been doing nothing but NeMo merges for the past 72 hours.

1
00:05.000 --> 00:07.500
Trying to pull off the best long-context one!

2
00:07.500 --> 00:10.500
Shuffling around Shuttle Mini and Magnum 2.5 KTO like a professional

3
00:11.000 --> 00:16.000
dealer at an expensive casino, except you can't rely on card counting to win.

4
00:17.500 --> 00:21.500
And then I found out Magnum shits itself at contexts above 32k.

5
00:21.500 --> 00:24.000
Just breaks!

6
00:29.500 --> 00:31.000
It BREAKS!

7
00:31.000 --> 00:34.500
'Falls off on higher contexts', my ass!

8
00:38.000 --> 00:40.000
It spouts nonsense!

9
00:41.000 --> 00:47.640
But that's not the end of the world since I still have Shuttle trained on 128k, right?!

10
00:47.640 --> 00:50.750
I praised Shuttle on Drummer's Discord for working on high contexts and

11
00:50.750 --> 00:56.000
Fizz suddenly jumps in with "I have no idea HOW."

12
00:58.000 --> 00:59.500
"It was trained with 16k!"

13
00:59.500 --> 01:01.500
Just like Magnum, yet it works!

14
01:04.200 --> 01:06.000
Kalomaze is about to go on suicide watch!

15
01:06.590 --> 01:08.500
Meanwhile MistralAI is just taking the piss!

16
01:13.000 --> 01:14.500
But that's not all!

17
01:14.500 --> 01:16.500
Turns out the best model

18
01:16.500 --> 01:20.000
working on high contexts

19
01:21.000 --> 01:27.000
is fucking Lyra v1!

20
01:35.000 --> 01:36.500
The one that claims to handle only up to 16k!

21
01:36.500 --> 01:39.700
I was putting it lower,

22
01:41.000 --> 01:44.000
even removing it at some point from my merges entirely.

23
01:44.000 --> 01:47.000
Because Sao claimed that it fell off after 16k.

24
01:47.000 --> 01:49.000
You know, they said "they have tried LoRAs with up to 64k,

25
01:50.000 --> 01:53.000
but they just do not work well."

26
01:54.750 --> 01:56.000
My ass!

27
01:56.000 --> 01:58.000
Lyra is better at recalling stuff

28
01:58.000 --> 02:02.500
than the official NeMo Instruct!

29
02:05.000 --> 02:07.000
No joke.

30
02:10.000 --> 02:11.000
Meanwhile,

31
02:11.000 --> 02:13.000
models like Rocinante

32
02:15.000 --> 02:17.000
trained atop Instruct,

33
02:17.000 --> 02:20.750
also shit themselves

34
02:21.000 --> 02:28.000
at 32k contexts!

35
02:29.000 --> 02:30.000
That's just sad.

36
02:30.000 --> 02:32.800
The only 'moist' thing about it

37
02:36.900 --> 02:39.800
is the tears it brings out of me

38
02:40.000 --> 02:42.750
at how much time I wasted

39
02:50.750 --> 02:55.000
on trying to ram it into my merges!

40
02:56.689 --> 02:58.189
It didn't make the prose better?

41
02:58.189 --> 03:04.000
No, no, it writes pretty good!

42
03:04.000 --> 03:09.000
That is, only if you don't use ChatML!

43
03:09.000 --> 03:11.500
And I was trying to pull off a ChatML merge.

44
03:16.689 --> 03:20.000
Joke's on me, I guess!

45
03:23.900 --> 03:26.500
Back to Mistral's shitty [INST]!

46
03:31.000 --> 03:36.500
I'm done with everyone's shit, putting special tokens wherever they want!