when digging around I happened to find this thread, which has some benchmarks for a different model
it's apples to square fenceposts, of course, since one llm is not another. but it gives us something to extrapolate from. if g4dn.2xl gave them 214 tok/s, and if we make the extremely generous assumption that tok == word (which, well, no; cf. strawberry), then any Use Deserving Of o3 (let's say 5~15k words) would mean you need a token rate of 1000~3000 tok/s for a 'reasonable' response latency ('5-ish seconds')
so you'd need something like 5x g4dn.2xl just to shit out 5000 words with dolphin-llama3 in 'quick' time. which, again, isn't even whatever the fuck people are doing with openai's garbage.
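if anyone wants to poke at the napkin math themselves, here's a rough sketch. the 214 tok/s figure is from that thread; the 5-second target, the tok == word assumption, and the "throughput scales linearly across boxes" assumption are all mine, and all generous:

```python
# back-of-envelope: how many g4dn.2xl-class instances to hit a latency target.
# assumes (generously) 1 token == 1 word and that throughput scales linearly
# across instances -- which it wouldn't for a single generation stream.
# 214 tok/s is the dolphin-llama3 number from the linked thread; the rest is guesswork.

BENCH_TOK_PER_S = 214                   # single g4dn.2xl, per the thread
TARGET_LATENCY_S = 5                    # "5-ish seconds" for a reasonable response
WORDS_PER_RESPONSE = (5_000, 15_000)    # a Use Deserving Of o3

for words in WORDS_PER_RESPONSE:
    needed_tok_per_s = words / TARGET_LATENCY_S       # 1000 .. 3000 tok/s
    instances = needed_tok_per_s / BENCH_TOK_PER_S    # ~4.7 .. ~14 boxes
    print(f"{words} words -> {needed_tok_per_s:.0f} tok/s -> ~{instances:.1f}x g4dn.2xl")
```

(and the linear-scaling bit is its own generosity; a single generation stream doesn't just fan out across boxes like that, so the real picture is uglier.)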
utter, complete, comprehensive clownery. era-redefining clownery.
but some dumb motherfucker in a bar will keep telling me it's the future. and I get to not boop 'em on the nose. le sigh.
that list undercounts far more than I expected it to