Hacker News: Front Page, 2026-05-29 19:38
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
Article URL: https://github.com/jmaczan/tiny-vllm
Comments URL: https://news.ycombinator.com/item?id=48328184
Points: 164
Comments: 14
Related articles:
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM
Can LLMs Beat Classical Hyperparameter Optimization Algorithms?
Show HN: Mach – A compiled systems language looking for contributions
Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks