- cross-posted to:
- hackernews@lemmy.smeargle.fans
- cross-posted to:
- hackernews@lemmy.smeargle.fans
My kernels go 2x faster than MKL for matrices that fit in L2 cache, which makes them a work in progress, since the speedup works best for prompts having fewer than 1,000 tokens.
You must log in or register to comment.