LLM 推理 & 部署
10 projects: runtimes and servers for running models (llama.cpp, Ollama, etc.)
What is this
Inference runtimes, servers, and quantization libraries for running open-source large models: they turn model weights into servable HTTP or CLI interfaces.
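To make the "servable HTTP interface" point concrete, here is a minimal sketch that queries Ollama's local REST API. It assumes an Ollama instance is already running on its default port 11434 and that a model has been pulled; the model name `llama3` is a placeholder.

```python
import requests

# Assumes `ollama serve` is running locally and a model has been
# pulled beforehand, e.g. `ollama pull llama3` (model name is a placeholder).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain quantization in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated text
```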
Typical scenarios
- Spinning up a local model for chat with a single command (Ollama)
- Running quantized models on CPU or low-VRAM hardware (llama.cpp / llama-cpp-python); see the sketch after this list
- Multi-GPU cluster inference serving (LocalAI / vLLM)
- Distributed inference across a cluster of everyday devices (exo)
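For the CPU / low-VRAM case above, a minimal sketch with llama-cpp-python, assuming you have already downloaded a quantized GGUF checkpoint; the file path and model are placeholders.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Path to a quantized GGUF checkpoint downloaded beforehand (placeholder).
llm = Llama(
    model_path="./models/qwen2-7b-instruct-q4_k_m.gguf",
    n_ctx=4096,      # context window size
    n_threads=8,     # CPU threads; tune to your machine
    n_gpu_layers=0,  # 0 = pure CPU; raise to offload layers to a GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in French."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```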
Selection criteria
- Deployment complexity: Ollama is the friendliest (a single command to start); llama.cpp takes more effort to build but squeezes out maximum performance.
- Hardware support: on Apple Silicon, pick llama.cpp or Ollama; for multi-GPU NVIDIA setups, look at vLLM or LocalAI.
- Language: llama-cpp-python hooks into the Python ecosystem; for Go or containerized deployments, consider Ollama.
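One practical note for selection: several of these servers (Ollama, LocalAI, vLLM) expose an OpenAI-compatible endpoint, so switching runtimes is often just a base-URL change. A sketch with the official `openai` Python client, assuming an Ollama instance on its default port; the model name is a placeholder, and the LocalAI URL in the comment assumes its default port 8080.

```python
from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at a local runtime.
# Ollama's OpenAI-compatible endpoint (default port assumed); for
# LocalAI the base_url would typically be http://localhost:8080/v1.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="llama3",  # placeholder: whatever model the server has loaded
    messages=[{"role": "user", "content": "One-line summary of GGUF?"}],
)
print(reply.choices[0].message.content)
```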
Popular projects
The 10 inference frameworks and runtimes in this category, ordered by ease of deployment and community activity.
LocalAI
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
exo
Run frontier AI locally.
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
ollama
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
llama-cpp-python
Python bindings for llama.cpp
text-generation-webui
The original local LLM interface. Text, vision, tool-calling, training, and more. 100% offline.
llm.c
LLM training in simple, raw C/CUDA
llama.cpp
LLM inference in C/C++
llama2-webui
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
lightning
Deep learning framework to train, deploy, and ship AI products Lightning fast.