# Cloud Voice Service for Drones

A cloud voice-interaction service built on **FastAPI + WebSocket** that provides drones with **LLM intent recognition** and **TTS (text-to-speech)**.

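## 📋 Features

- ✅ **Protocol compliance**: full implementation of `Cloud Voice Protocol v1.0 (text_uplink)`
- ✅ **LLM intent recognition**: Alibaba Cloud Bailian (DashScope) Qwen models distinguish flight-control commands from chitchat
- ✅ **Local TTS inference**: Piper-TTS synthesizes speech locally (24 kHz PCM)
- ✅ **Streaming output**: LLM results and TTS audio chunks are streamed to the client
- ✅ **Concurrency**: up to 4 concurrent drone sessions
- ✅ **Modular architecture**: new LLM/TTS providers are easy to add
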
## 📁 Project Structure

```
voicellmcloud/
├── app/                          # Main application
│   ├── main.py                   # FastAPI entry point
│   ├── config.py                 # Configuration management
│   ├── protocols/                # Protocol layer
│   │   ├── models.py             # Message data models
│   │   └── validators.py         # Protocol validation
│   ├── websocket/                # WebSocket management
│   │   ├── session.py            # Session management
│   │   └── handler.py            # Message handling
│   ├── services/                 # Service interfaces
│   │   ├── llm_service.py        # LLM interface
│   │   ├── tts_service.py        # TTS interface
│   │   └── intent_service.py     # Intent recognition
│   ├── providers/                # Third-party service implementations
│   │   ├── dashscope_llm.py      # Alibaba Cloud LLM
│   │   └── piper_tts.py          # Piper TTS
│   └── utils/                    # Utilities
│       ├── audio.py              # Audio processing
│       └── logger.py             # Logging
├── models/                       # TTS model files
├── requirements.txt              # Python dependencies
├── .env                          # Environment configuration
├── .env.example                  # Example configuration
├── start.sh / start.bat          # Startup scripts
└── README.md
```

## 🚀 Quick Start

### 1. Prepare the environment

```bash
# Requires Python 3.10+
python --version

# Install dependencies
pip install -r requirements.txt
```

### 2. Download the Piper TTS model

```bash
# Download the Chinese voice model into the models/ directory
python -m piper.download_voice zh_CN-huayan-medium

# Or download it manually from
# https://huggingface.co/rhasspy/piper-voices/tree/v1.0.0/zh_CN/huayan/medium
# and place zh_CN-huayan-medium.onnx and its .json file in models/
```

### 3. Configure environment variables

```bash
# Copy the example configuration
cp .env.example .env

# Edit .env and set the following:
# - DASHSCOPE_API_KEY: Alibaba Cloud Bailian (DashScope) API key (pre-filled)
# - TTS_MODEL_DIR: Piper model directory (default: models)
# - BEARER_TOKEN: auth token (must match the one the client sends)
```

### 4. Start the service

**Linux/macOS:**
```bash
chmod +x start.sh
./start.sh
```

**Windows:**
```bat
start.bat
```

**Or run it directly:**
```bash
python -m uvicorn app.main:app --host 0.0.0.0 --port 8765 --reload
```

### 5. Verify the service

```bash
# Health check
curl http://localhost:8765/health

# Expected response:
# {"status":"ok","active_sessions":0,"llm_provider":"dashscope","tts_provider":"piper"}
```

## 🔧 Configuration

All settings are read from the `.env` file or from environment variables:

| Setting | Default | Description |
|---------|---------|-------------|
| `WS_HOST` | `0.0.0.0` | WebSocket listen address |
| `WS_PORT` | `8765` | WebSocket port |
| `BEARER_TOKEN` | `drone-voice-cloud-token-2024` | Auth token |
| `DASHSCOPE_API_KEY` | - | Alibaba Cloud Bailian (DashScope) API key |
| `LLM_MODEL` | `qwen-plus` | LLM model (qwen-turbo/plus/max) |
| `LLM_CONTEXT_TURNS` | `4` | Number of past dialog turns kept as context |
| `TTS_PROVIDER` | `piper` | TTS provider |
| `TTS_VOICE_NAME` | `zh_CN-huayan-medium` | Piper voice name |
| `MAX_CONCURRENT_SESSIONS` | `4` | Maximum number of concurrent sessions |
| `LOG_LEVEL` | `INFO` | Log level |

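
The table above maps naturally onto a typed settings object. The following is a minimal sketch, assuming only the variable names and defaults listed in the table; the `Settings` class and `load_settings` function are hypothetical and are not the contents of `app/config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; fields mirror the configuration table."""
    ws_host: str
    ws_port: int
    bearer_token: str
    dashscope_api_key: str
    llm_model: str
    llm_context_turns: int
    tts_provider: str
    tts_voice_name: str
    max_concurrent_sessions: int
    log_level: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default), using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        ws_host=env.get("WS_HOST", "0.0.0.0"),
        ws_port=int(env.get("WS_PORT", "8765")),
        bearer_token=env.get("BEARER_TOKEN", "drone-voice-cloud-token-2024"),
        dashscope_api_key=env.get("DASHSCOPE_API_KEY", ""),  # no default in the table
        llm_model=env.get("LLM_MODEL", "qwen-plus"),
        llm_context_turns=int(env.get("LLM_CONTEXT_TURNS", "4")),
        tts_provider=env.get("TTS_PROVIDER", "piper"),
        tts_voice_name=env.get("TTS_VOICE_NAME", "zh_CN-huayan-medium"),
        max_concurrent_sessions=int(env.get("MAX_CONCURRENT_SESSIONS", "4")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in unit tests, for example `load_settings({"WS_PORT": "9000"})`.
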
## 📡 WebSocket Protocol

The full protocol is specified in `CLOUD_VOICE_PROTOCOL_v1_text_uplink.md`.

### Connection URL
```
ws://<server-ip>:8765/v1/voice/session
```

### Basic Sequence

```
Client                               Server
  |                                    |
  |------ session.start -------------->|
  |                                    |
  |<----- session.ready ---------------|
  |                                    |
  |------ turn.text ------------------>|
  |                                    |
  |<----- dialog_result ---------------|
  |<----- tts_audio_chunk (text) ------|
  |<----- tts_audio_chunk (binary) ----|
  |<----- turn.complete ---------------|
  |                                    |
  |------ session.end ---------------->|
```

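
The sequence above can be read as a small state machine. The sketch below checks that a list of message types follows that order. The state names and the transition table are assumptions derived only from the diagram (for example, that several turns may follow one `session.ready`); the protocol document remains the authority.

```python
# Hypothetical transition table: (current state, message type) -> next state.
TRANSITIONS = {
    ("init", "session.start"): "starting",
    ("starting", "session.ready"): "ready",
    ("ready", "turn.text"): "in_turn",
    ("in_turn", "dialog_result"): "in_turn",
    ("in_turn", "tts_audio_chunk"): "in_turn",
    ("in_turn", "turn.complete"): "ready",
    ("ready", "session.end"): "ended",
}


def final_state(message_types):
    """Walk the message types in order and return the resulting state.

    Raises ValueError on the first message that is not allowed in the current state.
    """
    state = "init"
    for msg_type in message_types:
        key = (state, msg_type)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {msg_type!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

A client or test harness could feed it the `type` field of every message it sends and receives, and treat a `ValueError` as a protocol violation.
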
### Example Messages

**session.start:**
```json
{
  "type": "session.start",
  "proto_version": "1.0",
  "transport_profile": "text_uplink",
  "session_id": "uuid-v4",
  "auth_token": "your-token",
  "client": {
    "device_id": "drone-001",
    "locale": "zh-CN",
    "capabilities": {
      "playback_sample_rate_hz": 24000,
      "prefer_tts_codec": "pcm_s16le"
    }
  }
}
```

**turn.text** (the text means "take off, then hover ten meters ahead"):
```json
{
  "type": "turn.text",
  "proto_version": "1.0",
  "transport_profile": "text_uplink",
  "turn_id": "uuid-v4",
  "text": "起飞然后在前方十米悬停",
  "is_final": true,
  "source": "device_stt"
}
```

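
Both example messages share `proto_version` and `transport_profile`, and both carry a UUID v4 identifier, so a client may prefer to build them with small helpers. This is a minimal sketch; the function names `build_session_start` and `build_turn_text` are hypothetical, and the field set is copied from the examples above rather than from the full protocol document.

```python
import uuid

PROTO_VERSION = "1.0"
TRANSPORT_PROFILE = "text_uplink"


def build_session_start(auth_token, device_id, locale="zh-CN",
                        sample_rate_hz=24000, codec="pcm_s16le"):
    """Return a session.start message as a dict with a fresh UUID v4 session_id."""
    return {
        "type": "session.start",
        "proto_version": PROTO_VERSION,
        "transport_profile": TRANSPORT_PROFILE,
        "session_id": str(uuid.uuid4()),
        "auth_token": auth_token,
        "client": {
            "device_id": device_id,
            "locale": locale,
            "capabilities": {
                "playback_sample_rate_hz": sample_rate_hz,
                "prefer_tts_codec": codec,
            },
        },
    }


def build_turn_text(text, source="device_stt"):
    """Return a final turn.text message as a dict with a fresh UUID v4 turn_id."""
    return {
        "type": "turn.text",
        "proto_version": PROTO_VERSION,
        "transport_profile": TRANSPORT_PROFILE,
        "turn_id": str(uuid.uuid4()),
        "text": text,
        "is_final": True,
        "source": source,
    }
```

Serialize the result with `json.dumps(message, ensure_ascii=False)` before sending it over the WebSocket.
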
## 🧪 Testing

### Test Cases

1. **Chitchat**: "今天天气怎么样" ("How is the weather today?")
   - Expected: `routing=chitchat`; TTS speaks the chitchat reply

2. **Flight-control command**: "起飞然后在前方十米悬停" ("Take off, then hover ten meters ahead")
   - Expected: `routing=flight_intent`, `actions=[takeoff, goto]`; TTS speaks the summary

3. **Return to home**: "返航" ("Return home")
   - Expected: `routing=flight_intent`, `actions=[return_home]`

4. **Unsupported audio message**: send `turn.audio_chunk`
   - Expected: `error code=INVALID_MESSAGE`

5. **Authentication failure**: use a wrong token
   - Expected: `error code=UNAUTHORIZED`

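
The dialog test cases above lend themselves to a table-driven check. The sketch below is an illustration only: it assumes the caller has already extracted the routing value and the list of action names from a `dialog_result` message, because the exact layout of that message is defined in the dialog specification and is not reproduced here. `ExpectedResult` and `check_result` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedResult:
    """One expected outcome: input text, routing, and (optionally) ordered action names."""
    text: str
    routing: str
    actions: tuple = ()


CASES = [
    ExpectedResult("今天天气怎么样", "chitchat"),
    ExpectedResult("起飞然后在前方十米悬停", "flight_intent", ("takeoff", "goto")),
    ExpectedResult("返航", "flight_intent", ("return_home",)),
]


def check_result(expected, routing, action_names):
    """Return a list of mismatch descriptions; an empty list means the case passed."""
    problems = []
    if routing != expected.routing:
        problems.append(f"routing: expected {expected.routing!r}, got {routing!r}")
    # Actions are only compared when the case lists any.
    if expected.actions and tuple(action_names) != expected.actions:
        problems.append(
            f"actions: expected {list(expected.actions)}, got {list(action_names)}"
        )
    return problems
```
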
### Python Test Client

```python
import asyncio
import json
import websockets

async def test_client():
    uri = "ws://localhost:8765/v1/voice/session"

    async with websockets.connect(uri) as ws:
        # 1. Send session.start
        await ws.send(json.dumps({
            "type": "session.start",
            "proto_version": "1.0",
            "transport_profile": "text_uplink",
            "session_id": "test-session-001",
            "auth_token": "drone-voice-cloud-token-2024",
            "client": {
                "device_id": "test-drone",
                "locale": "zh-CN",
                "capabilities": {
                    "playback_sample_rate_hz": 24000,
                    "prefer_tts_codec": "pcm_s16le"
                }
            }
        }))

        # Receive session.ready
        ready = await ws.recv()
        print(f"← {ready}")

        # 2. Send turn.text ("Hello, how is the weather today?")
        await ws.send(json.dumps({
            "type": "turn.text",
            "proto_version": "1.0",
            "transport_profile": "text_uplink",
            "turn_id": "test-turn-001",
            "text": "你好,今天天气怎么样?",
            "is_final": True,
            "source": "device_stt"
        }))

        # 3. Receive responses until the turn completes
        while True:
            msg = await ws.recv()
            if isinstance(msg, bytes):
                # Binary frames carry TTS audio
                print(f"← audio data ({len(msg)} bytes)")
            else:
                data = json.loads(msg)
                print(f"← {data['type']}: {json.dumps(data, ensure_ascii=False)}")

                if data.get('type') == 'turn.complete':
                    break

if __name__ == "__main__":
    asyncio.run(test_client())
```

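
The test client only prints the size of each binary frame. To listen to the result, the frames can be collected and wrapped in a WAV container. The sketch below uses the standard-library `wave` module and assumes that every binary frame is raw PCM S16LE mono at 24000 Hz with no extra per-frame header; that assumption should be checked against the protocol document. `pcm_chunks_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate_hz=24000, channels=1, sample_width_bytes=2):
    """Concatenate raw PCM chunks and return the bytes of a complete WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate_hz)
        for chunk in chunks:
            wav.writeframes(chunk)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

In the receive loop, append each `bytes` message to a list, then write `pcm_chunks_to_wav(chunks)` to a `.wav` file after `turn.complete` arrives.
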
## 🏗 Architecture

### Module Layers

```
┌─────────────────────────────────────────┐
│           FastAPI Application           │
│              (app/main.py)              │
├─────────────────────────────────────────┤
│            WebSocket Handler            │
│       (app/websocket/handler.py)        │
├──────────┬──────────┬───────────────────┤
│   LLM    │   TTS    │  Intent Service   │
│ Service  │ Service  │ intent recognition│
├──────────┼──────────┼───────────────────┤
│DashScope │  Piper   │ Protocol models / │
│ (Aliyun) │ (local)  │ validation        │
└──────────┴──────────┴───────────────────┘
```

### Adding a New LLM/TTS Provider

Implement the corresponding interface and register the new class:

```python
# 1. Implement the interface (signatures must match LLMServiceInterface)
class MyLLMService(LLMServiceInterface):
    async def chat(self, *args, **kwargs): ...
    async def initialize(self, *args, **kwargs): ...
    async def shutdown(self, *args, **kwargs): ...

# 2. Register it in app/main.py
if settings.LLM_PROVIDER == "my_llm":
    llm_service = MyLLMService()
```

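
As the number of providers grows, the `if` chain in `app/main.py` could be replaced by a name-to-class registry. The sketch below shows that alternative pattern; it is not how the project currently selects providers. `LLMService` is a stand-in for `LLMServiceInterface` with assumed method signatures, and `register_llm`, `create_llm`, and `EchoLLM` are hypothetical names.

```python
import asyncio
from abc import ABC, abstractmethod


class LLMService(ABC):
    """Stand-in for LLMServiceInterface; the real signatures may differ."""

    @abstractmethod
    async def initialize(self) -> None: ...

    @abstractmethod
    async def chat(self, text: str) -> str: ...

    @abstractmethod
    async def shutdown(self) -> None: ...


_REGISTRY = {}


def register_llm(name):
    """Class decorator that records a provider class under the given name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def create_llm(name):
    """Instantiate the provider registered under name, or raise ValueError."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None


@register_llm("echo")
class EchoLLM(LLMService):
    """Toy provider used only to demonstrate registration."""

    async def initialize(self) -> None:
        pass

    async def chat(self, text: str) -> str:
        return f"echo: {text}"

    async def shutdown(self) -> None:
        pass
```

With this pattern, startup code would call something like `create_llm(settings.LLM_PROVIDER)`, and adding a provider would no longer require editing the selection logic.
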
## 📊 Performance Targets

| Metric | Expected value |
|--------|----------------|
| LLM inference latency | 1-3 s (Alibaba Cloud qwen-plus) |
| TTS time to first byte | <200 ms (local Piper) |
| Audio sample rate | 24000 Hz |
| Audio format | PCM S16LE (mono) |
| Maximum concurrency | 4 sessions |

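
The audio figures in the table determine the downlink bandwidth. A short worked calculation, assuming uncompressed PCM with no framing overhead (the helper names are illustrative):

```python
def pcm_bytes_per_second(sample_rate_hz=24000, bytes_per_sample=2, channels=1):
    """Raw PCM data rate in bytes per second (S16LE uses 2 bytes per sample)."""
    return sample_rate_hz * bytes_per_sample * channels


def pcm_duration_ms(num_bytes, sample_rate_hz=24000, bytes_per_sample=2, channels=1):
    """Playback duration in milliseconds of a PCM buffer of num_bytes bytes."""
    return num_bytes * 1000 / pcm_bytes_per_second(sample_rate_hz, bytes_per_sample, channels)


def aggregate_kbit_per_second(sessions=4):
    """Total audio bit rate in kbit/s when every session streams at once."""
    return pcm_bytes_per_second() * 8 * sessions / 1000
```

At 24000 Hz, 16-bit mono, one session streams 48000 bytes per second, so a 4800-byte chunk holds 100 ms of audio, and four simultaneous sessions need about 1536 kbit/s for audio payload alone.
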
## 🔮 Roadmap

- [ ] Local LLM deployment on H200 (vLLM/TGI)
- [ ] Multilingual TTS support
- [ ] WebSocket TLS (WSS) support
- [ ] Prometheus metrics
- [ ] Session persistence and reconnection after disconnects
- [ ] Docker containerized deployment

## 📝 Development Notes

### Adding a New Module

```bash
# Create the module directory
mkdir app/new_module
touch app/new_module/__init__.py
touch app/new_module/module.py
```

### Log Level

```bash
# Edit .env and keep one of the following
LOG_LEVEL=DEBUG  # verbose logs
LOG_LEVEL=INFO   # production
```

### Debugging Tips

```python
# Add a breakpoint in handler.py
import pdb; pdb.set_trace()
```

## ❓ FAQ

**Q: Piper TTS fails to initialize?**
```bash
# Check that the model file exists
ls -lh models/zh_CN-huayan-medium.onnx

# Download it again
python -m piper.download_voice zh_CN-huayan-medium
```

**Q: LLM calls time out?**
```bash
# Check the API key
echo $DASHSCOPE_API_KEY

# Increase the timeout (in seconds), e.g. in .env
LLM_TIMEOUT=60
```

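
For context, a timeout around an asynchronous LLM call is commonly enforced with `asyncio.wait_for`. The sketch below shows that general pattern only; how this service actually applies `LLM_TIMEOUT` is not documented here, and `call_with_timeout` is a hypothetical helper.

```python
import asyncio


async def call_with_timeout(coro_factory, timeout_s, fallback):
    """Await coro_factory() for at most timeout_s seconds.

    Returns the coroutine's result, or fallback if the timeout expires.
    The pending call is cancelled by asyncio.wait_for on timeout.
    """
    try:
        return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback
```

A caller would pass a zero-argument function that creates the LLM coroutine, together with a spoken fallback such as a short apology message, so the turn can still complete when the upstream model is slow.
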
**Q: The client connection is rejected?**
```bash
# Check that the tokens match:
# BEARER_TOKEN in the server's .env must equal the auth_token sent by the client
```

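
When comparing tokens on the server side, a constant-time comparison avoids leaking information through timing. This is a general sketch using the standard-library `hmac.compare_digest`; whether the service compares tokens this way is an assumption, and `token_matches` is a hypothetical name.

```python
import hmac


def token_matches(expected: str, presented: str) -> bool:
    """Compare two tokens in constant time; both are encoded as UTF-8 bytes first."""
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

The comparison is exact, so stray whitespace or quotes around the value in `.env` or in the client configuration will cause a mismatch.
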
## Further Reading

- [Project summary and deployment guide](./PROJECT_SUMMARY_AND_DEPLOYMENT.md): overall capabilities, end-to-end architecture, production deployment, and troubleshooting
- [Flight intent schema v1](./FLIGHT_INTENT_SCHEMA_v1.md), [implementation plan](./FLIGHT_INTENT_IMPLEMENTATION_PLAN.md)
- [dialog_result v1 + confirm](./CLOUD_VOICE_DIALOG_v1.md): the **signed-off baseline** (`protocol`, `confirm`, and `user_input` strings)
- [Flight-control gating, historical alternative](./CLOUD_VOICE_FLIGHT_CONFIRM_v1.md): `flight_intent_pending`, `turn.confirmation`

## 📄 License

Internal project - cloud voice-interaction service for drones

## 🤝 Contributing

Open an issue or a pull request to improve this project.

---

**Version**: v1.0.0

**Last updated**: 2024-04-07

**Protocol version**: Cloud Voice Protocol v1.0 (text_uplink)