```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
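One detail worth noting above: `model.generate` returns the prompt tokens followed by the newly generated tokens, so the list comprehension slices each output to keep only the new part. A toy sketch with plain lists (hypothetical token ids, no model required) shows the same slice:

```python
# Toy illustration of the prompt-stripping slice used after generate().
input_ids = [[1, 2, 3]]            # prompt token ids for one sequence
generated = [[1, 2, 3, 7, 8, 9]]   # generate() returns prompt + new tokens
new_tokens = [out[len(inp):] for inp, out in zip(input_ids, generated)]
print(new_tokens)  # [[7, 8, 9]]
```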
### vLLM

We recommend using the latest version of vLLM to build an OpenAI-compatible API service, including tool use support. Start the server with a chat model, e.g. Qwen2.5-7B-Instruct:
```shell
vllm serve Qwen/Qwen2.5-7B-Instruct
```
Then use the chat API as shown below:
```shell
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "messages": [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Tell me something about large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "repetition_penalty": 1.05,
  "max_tokens": 512
}'
```
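The same request body can be built programmatically before sending it; a minimal sketch (standard library only, no server needed) that serializes the payload the curl command posts:

```python
import json

# Build the same JSON body the curl example sends to /v1/chat/completions.
payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "repetition_penalty": 1.05,
    "max_tokens": 512,
}
body = json.dumps(payload)  # string ready to POST with any HTTP client
```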
Or use the OpenAI Python client:

```python
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Send the same request as the curl example above.
chat_response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
)
print("Chat response:", chat_response)
```
### SGLang

SGLang currently provides an OpenAI-compatible API but does not yet support tool use or function calling. Install SGLang from source, then launch a server with `python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000`. In Python, you can interact with the model like this:
```python
from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint

@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are Qwen, created by Alibaba Cloud. You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of China?",
    question_2="List two local attractions.",
)

for m in state.messages():
    print(m["role"], ":", m["content"])

print(state["answer_1"])
```
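Conceptually, each `s += …` call in the sglang program appends one message to the rolling conversation state. A toy stand-in (not the real sglang objects, which require a running server) illustrates the accumulation pattern:

```python
# Toy stand-in for sglang's state object: each += appends a message.
class ToyState:
    def __init__(self):
        self.messages = []
    def __iadd__(self, msg):
        self.messages.append(msg)
        return self

# Hypothetical helpers mirroring sglang's system/user/assistant wrappers.
def system(content): return {"role": "system", "content": content}
def user(content): return {"role": "user", "content": content}
def assistant(content): return {"role": "assistant", "content": content}

s = ToyState()
s += system("You are a helpful assistant.")
s += user("What is the capital of China?")
s += assistant("Beijing.")
print([m["role"] for m in s.messages])  # ['system', 'user', 'assistant']
```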
```
@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},
    url = {https://qwenlm.github.io/blog/qwen2.5/},
    author = {Qwen Team},
    month = {September},
    year = {2024}
}

@article{qwen2,
    title = {Qwen2 Technical Report},
    author = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
    journal = {arXiv preprint arXiv:2407.10671},
    year = {2024}
}
```