LLM 推理 & 部署
10 projects: runtimes and servers for running models (llama.cpp, Ollama, etc.)
What is this
Inference runtimes, servers, and quantization libraries for running open-source large models: they turn model weights into servable HTTP or CLI interfaces.
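To make the "servable HTTP interface" point concrete, here is a minimal sketch that queries Ollama's local REST API. It assumes an Ollama instance is already running on its default port 11434 and that a model has been pulled; the model name `llama3` is a placeholder.

```python
import requests

# Assumes `ollama serve` is running locally and a model has been
# pulled beforehand, e.g. `ollama pull llama3` (model name is a placeholder).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain quantization in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated text
```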
Typical scenarios
- Spinning up a local model for chat with a single command (Ollama)
- Running quantized models on CPU or low-VRAM hardware (llama.cpp / llama-cpp-python); see the sketch after this list
- Multi-GPU cluster inference serving (LocalAI / vLLM)
- Distributed inference across a cluster of everyday devices (exo)
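For the CPU / low-VRAM case above, a minimal sketch with llama-cpp-python, assuming you have already downloaded a quantized GGUF checkpoint; the file path and model are placeholders.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Path to a quantized GGUF checkpoint downloaded beforehand (placeholder).
llm = Llama(
    model_path="./models/qwen2-7b-instruct-q4_k_m.gguf",
    n_ctx=4096,      # context window size
    n_threads=8,     # CPU threads; tune to your machine
    n_gpu_layers=0,  # 0 = pure CPU; raise to offload layers to a GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in French."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```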
Selection criteria
- Deployment complexity: Ollama is the friendliest (a single command to start); llama.cpp takes more effort to build but squeezes out maximum performance.
- Hardware support: on Apple Silicon, pick llama.cpp or Ollama; for multi-GPU NVIDIA setups, look at vLLM or LocalAI.
- Language: llama-cpp-python hooks into the Python ecosystem; for Go or containerized deployments, consider Ollama.
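One practical note for selection: several of these servers (Ollama, LocalAI, vLLM) expose an OpenAI-compatible endpoint, so switching runtimes is often just a base-URL change. A sketch with the official `openai` Python client, assuming an Ollama instance on its default port; the model name is a placeholder, and the LocalAI URL in the comment assumes its default port 8080.

```python
from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at a local runtime.
# Ollama's OpenAI-compatible endpoint (default port assumed); for
# LocalAI the base_url would typically be http://localhost:8080/v1.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="llama3",  # placeholder: whatever model the server has loaded
    messages=[{"role": "user", "content": "One-line summary of GGUF?"}],
)
print(reply.choices[0].message.content)
```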
Popular projects
The 10 inference frameworks and runtimes in this category, ordered by ease of deployment and community activity.
LocalAI
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
exo
Run frontier AI locally.
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
ollama
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
llama-cpp-python
Python bindings for llama.cpp
text-generation-webui
The original local LLM interface. Text, vision, tool-calling, training, and more. 100% offline.
llm.c
LLM training in simple, raw C/CUDA
llama.cpp
LLM inference in C/C++
llama2-webui
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
lightning
Deep learning framework to train, deploy, and ship AI products Lightning fast.