Hi HN, I’m one of the authors of this post.
We’ve updated Docker Model Runner to support vLLM alongside the existing llama.cpp backend. The goal is to bridge the gap between local prototyping (often done with GGUF/llama.cpp) and high-throughput production (often done with Safetensors/vLLM) using a consistent Docker workflow.
Key technical details:
Auto-routing: The tool detects the model format. If you pull a GGUF model, it routes to llama.cpp. If you pull a Safetensors model, it routes to vLLM (see the pull/run example after this list).
API: It exposes an OpenAI-compatible API (/v1/chat/completions), so client code doesn't need to change based on the backend (curl sketch below).
Usage: It’s just docker model run ai/smollm2-vllm.
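To make the auto-routing concrete, here's a minimal sketch. ai/smollm2-vllm is the Safetensors model from above; ai/smollm2 as its GGUF-packaged counterpart is my assumption here, so substitute whichever GGUF model you actually use:

  # GGUF artifact -> Model Runner serves it with llama.cpp
  docker model pull ai/smollm2
  docker model run ai/smollm2 "Give me a one-line summary of vLLM."

  # Safetensors artifact -> Model Runner serves it with vLLM
  docker model pull ai/smollm2-vllm
  docker model run ai/smollm2-vllm "Give me a one-line summary of vLLM."

No flag is needed; the backend is picked from the artifact format.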
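And because the API is OpenAI-compatible, any OpenAI-style client works against it. A hedged curl example, assuming Model Runner is exposed on the host at localhost:12434 with the /engines prefix (that's the default on my setup; adjust the base URL to however it's exposed in yours):

  curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "ai/smollm2-vllm",
      "messages": [
        {"role": "user", "content": "Say hello in one sentence."}
      ]
    }'

The same request works whether the model behind it is served by llama.cpp or vLLM, which is the point of keeping the API surface identical.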
Current Limitations:
Right now, the vLLM backend is optimized for x86_64 with Nvidia GPUs.
We are actively working on WSL2 support for Windows users and DGX Spark compatibility.
Happy to answer any questions about the integration or the roadmap!
https://www.docker.com/blog/docker-model-runner-integrates-v...