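7.4 Deployment
Writing the code is only the beginning. The work counts as delivered only once it runs reliably on a server. This chapter covers the full release process, from Docker containerization to automated CI/CD deployment.
Study time: 2-3 weeks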
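Why is deploying an AI application more complex than deploying a traditional web app?
Traditional web: code → Docker → server → done
AI application: code → Docker → GPU / vector database / model files / API key management → multi-container orchestration → done
Additional concerns:
1. Storing and loading large model files (when models are deployed locally)
2. Persisting the vector database (pgvector / Milvus data volumes)
3. Managing API keys securely (they must never be hard-coded)
4. Configuring the reverse proxy for streaming responses (Nginx buffering must be turned off)
5. Allocating GPU resources (for local model inference)
7.4.1 Docker Containerization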
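1. Dockerfile best practices for AI applications
dockerfile
# === Backend service (FastAPI + LLM) ===
# Use a slim base image to keep the image small
FROM python:3.11-slim AS base
# Write Python output straight to the container log instead of buffering it
ENV PYTHONUNBUFFERED=1
# Set the working directory
WORKDIR /app
# Install system dependencies (needed by psycopg2 and similar packages; curl is used by HEALTHCHECK)
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc libpq-dev curl \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency list first to take advantage of Docker's layer cache
# (requirements.txt must include gunicorn and uvicorn for the CMD below)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Then copy the application code (code changes no longer trigger a dependency reinstall)
COPY . .
# Run as a non-root user (security best practice)
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
EXPOSE 8000
# In production, run Gunicorn with Uvicorn workers.
# -w: number of worker processes; Gunicorn's docs suggest (2 x CPU cores) + 1 as a starting point
# --timeout: seconds after which Gunicorn kills and restarts a silent worker
CMD ["gunicorn", "main:app", \
     "-w", "4", \
     "-k", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8000", \
     "--timeout", "120"]
2. .dockerignore (reduce image size)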
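# Patterns are matched from the build-context root; **/ is needed to match at any depth
**/__pycache__
**/*.pyc
.git
# Keep secrets out of the build context and the image
.env
.venv
**/node_modules
*.md
tests/
3. Multi-stage build (projects with a separate frontend and backend)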
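dockerfile
# Stage 1: build the frontend
FROM node:20-slim AS frontend
WORKDIR /app/frontend
COPY frontend/package*.json ./
# Install devDependencies too, since the build tooling usually lives there
RUN npm ci --include=dev
COPY frontend/ .
RUN npm run build
# Stage 2: build the backend (only stage 1's build output is carried over)
FROM python:3.11-slim AS backend
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ ./backend/
# Copy the frontend build output (dist/, e.g. from Vite) into the backend's static directory;
# the backend must serve ./static itself
COPY --from=frontend /app/frontend/dist ./static/
EXPOSE 8000
CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
7.4.2 Docker Compose Multi-Container Orchestration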
Typical three-container architecture of an AI application:
┌──────────────────────────────────────────────────┐
│                  Docker Compose                  │
│                                                  │
│  ┌──────────┐  ┌──────────┐  ┌────────────┐      │
│  │  Nginx   │──│ FastAPI  │──│ PostgreSQL │      │
│  │ :80/:443 │  │  :8000   │  │ + pgvector │      │
│  │ reverse  │  │    AI    │  │   :5432    │      │
│  │  proxy   │  │ backend  │  │            │      │
│  └──────────┘  └──────────┘  └────────────┘      │
│       ↑                            ↑             │
│  SSL termination             persistent volume   │
│  static files                                    │
└──────────────────────────────────────────────────┘
yaml
# docker-compose.yml
services:
  # === Reverse proxy ===
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro              # SSL certificates
      - ./frontend/dist:/usr/share/nginx/html:ro   # frontend static files
    depends_on:
      backend:
        condition: service_healthy
    restart: always

  # === AI backend ===
  backend:
    build:
      context: .
      dockerfile: Dockerfile
    env_file: .env   # API keys and other secrets
    environment:
      # Values set here take precedence over the same keys in env_file
      - DATABASE_URL=postgresql+asyncpg://user:pass@db:5432/aiapp
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: always
    # No ports: entry, so the backend is reachable only through Nginx

  # === Database (PostgreSQL + pgvector) ===
  db:
    image: pgvector/pgvector:pg16
    environment:
      # Demo credentials; use a strong password in production
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: aiapp
    volumes:
      - pgdata:/var/lib/postgresql/data                  # persist data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql  # init script (runs only when the data directory is empty)
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d aiapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: always

  # === Cache (optional) ===
  redis:
    image: redis:7-alpine
    volumes:
      - redisdata:/data
    restart: always

volumes:
  pgdata:
  redisdata:
Nginx configuration (AI-specific)
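nginx
# nginx/nginx.conf
events {
    worker_connections 1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    # === Upstream: the FastAPI container (Compose service name + port) ===
    upstream backend {
        server backend:8000;
    }

    server {
        listen 80;
        server_name yourdomain.com;
        # For HTTPS, add a server block with "listen 443 ssl;" and ssl_certificate /
        # ssl_certificate_key pointing at the files mounted under /etc/nginx/ssl

        # Frontend static files (unknown paths fall back to index.html for the SPA)
        location / {
            root /usr/share/nginx/html;
            try_files $uri $uri/ /index.html;
        }

        # API proxy: the trailing "/" in proxy_pass strips the /api/ prefix,
        # so /api/users reaches the backend as /users
        location /api/ {
            proxy_pass http://backend/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # ⚠️ Key for AI streaming: buffering must be off
            proxy_buffering off;
            proxy_cache off;
            proxy_read_timeout 300s;    # LLM responses can be slow
        }

        # SSE streaming endpoint (dedicated settings; the longer prefix wins over /api/)
        location /api/chat/stream {
            proxy_pass http://backend/chat/stream;
            proxy_buffering off;
            proxy_cache off;
            proxy_set_header Connection '';
            proxy_http_version 1.1;
            chunked_transfer_encoding off;
            proxy_read_timeout 600s;    # timeout for long conversations
        }

        # WebSocket support
        location /ws/ {
            proxy_pass http://backend/ws/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 3600s;
        }
    }
}
⚠️ The most common pitfall in AI deployments: Nginx enables proxy_buffering by default. As a result, SSE streaming responses are buffered and users never see the token-by-token typing effect. You must turn buffering off explicitly.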
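With buffering off, Nginx forwards each event as soon as the backend writes it. In the SSE format (content type `text/event-stream`), each event is a block of `data:` lines terminated by a blank line. Nginx also honours an `X-Accel-Buffering: no` response header, which disables buffering for that single response. The stdlib sketch below shows the framing a backend would emit. The helper names and the `[DONE]` end marker are illustrative assumptions, not part of any particular API.

```python
from typing import Iterable, Iterator, Optional

# Response headers a streaming endpoint would typically send (illustrative)
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",  # tells Nginx not to buffer this response
}


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event: an optional 'event:' line, one 'data:' line
    per line of payload, and a blank line that ends the event."""
    lines = [] if event is None else [f"event: {event}"]
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then a final 'done' event (hypothetical convention)."""
    for token in tokens:
        yield format_sse(token)
    yield format_sse("[DONE]", event="done")


for frame in stream_tokens(["Hel", "lo"]):
    print(repr(frame))
# 'data: Hel\n\n'
# 'data: lo\n\n'
# 'event: done\ndata: [DONE]\n\n'
```

In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`. The per-response header is useful when you cannot edit the proxy config. The `proxy_buffering off` setting in Nginx covers every endpoint at once.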
7.4.3 Environment Variables and Secret Management
1. Managing API keys with a .env file
bash
# .env (never commit this file to Git!)
# LLM API keys
OPENAI_API_KEY=sk-xxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxx
DASHSCOPE_API_KEY=sk-xxxxx
# Database
DATABASE_URL=postgresql+asyncpg://user:pass@db:5432/aiapp
# Redis
REDIS_URL=redis://redis:6379/0
# Application settings
APP_ENV=production
LOG_LEVEL=INFO
CORS_ORIGINS=https://yourdomain.com
gitignore
# .gitignore
.env
.env.*
!.env.example
2. .env.example (committed to Git as a reference for the team)
bash
# .env.example: copy to .env and fill in the real values
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/aiapp
REDIS_URL=redis://localhost:6379/0
APP_ENV=development
3. Reading configuration safely in Python
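Whichever library loads the settings, the idea is the same: fail at startup when a required key is missing, rather than at the first LLM call in production. Below is a stdlib-only sketch of that check. It assumes the plain `KEY=value` format used by `.env.example` above, with no quoting or `export` handling. The function names are illustrative.

```python
import os
from typing import List, Mapping


def parse_env_keys(text: str) -> List[str]:
    """Variable names defined in simple KEY=value dotenv text
    (comments and blank lines are skipped; no quoting or 'export' support)."""
    keys = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Keys listed in .env.example that are unset or empty in `env`."""
    return [key for key in parse_env_keys(example_text) if not env.get(key)]


example = "OPENAI_API_KEY=sk-your-key-here\n# comment\nAPP_ENV=development\n"
print(missing_keys(example, {"APP_ENV": "production"}))  # ['OPENAI_API_KEY']
```

Call this at startup with the contents of `.env.example`. If the result is non-empty, exit with `SystemExit`; the container then stops before it accepts traffic. This sketch treats every key in `.env.example` as required. A settings class can be more selective and give optional fields a default.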
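python
# config.py
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings (loaded from environment variables and .env)."""

    # Field names match env vars case-insensitively: OPENAI_API_KEY → openai_api_key
    model_config = SettingsConfigDict(
        env_file=".env",
        extra="ignore",  # .env also holds keys not declared here (e.g. DASHSCOPE_API_KEY)
    )

    # LLM
    openai_api_key: str  # required: Settings() raises a validation error if it is missing
    anthropic_api_key: str = ""
    # Database and cache
    database_url: str
    redis_url: str = "redis://localhost:6379/0"
    # Application
    app_env: str = "development"
    log_level: str = "INFO"
    cors_origins: str = "http://localhost:3000"


@lru_cache()
def get_settings() -> Settings:
    # Cached, so the environment and .env are read only once per process
    return Settings()


# Usage
settings = get_settings()
# settings.openai_api_key ← read automatically from the environment or .env
7.4.4 CI/CD Automated Deployment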
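GitHub Actions deployment pipeline
yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Deploy to server via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.SERVER_HOST }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            set -e
            cd /opt/ai-app
            # Pull the latest code
            git pull origin main
            # Rebuild and restart. COPY invalidates the layer cache whenever the code
            # changes, so --no-cache is only needed to force a clean rebuild
            docker compose build backend
            docker compose up -d
            # Wait for the health check. Port 8000 is not published to the host,
            # so run curl inside the backend container and retry for up to ~60 s
            for i in $(seq 1 12); do
              docker compose exec -T backend curl -fsS http://localhost:8000/health && break
              [ "$i" -eq 12 ] && exit 1
              sleep 5
            done
            echo "✅ Deployment succeeded"
GitHub Secrets configuration:
Settings → Secrets and variables → Actions → New repository secret
SERVER_HOST = your-server-ip
SERVER_USER = deploy
SSH_PRIVATE_KEY = (contents of the SSH private key)
Webhook: a lightweight alternative to GitHub Actions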
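GitHub signs every webhook delivery. It computes an HMAC-SHA256 over the raw request body using the webhook secret. It then sends the hex digest, prefixed with `sha256=`, in the `X-Hub-Signature-256` header. The server below recomputes that value and compares the two. To exercise the endpoint locally, you can produce the same header yourself. This stdlib sketch does that; the secret and payload are illustrative placeholders.

```python
import hashlib
import hmac


def github_signature(secret: str, body: bytes) -> str:
    """The value GitHub would send in X-Hub-Signature-256 for this exact body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison, the same check the webhook server performs."""
    return hmac.compare_digest(github_signature(secret, body), header_value)


body = b'{"ref": "refs/heads/main"}'  # illustrative payload
sig = github_signature("your-webhook-secret", body)
print(verify_signature("your-webhook-secret", body, sig))  # True
print(verify_signature("wrong-secret", body, sig))         # False
```

The signature covers the exact bytes sent. When testing with curl, send the file with `--data-binary @payload.json`; plain `-d @file` strips newlines and would break the match. Also set the `X-Hub-Signature-256` header (and `X-GitHub-Event: push`) yourself.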
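python
# webhook_server.py: runs on the server and listens for GitHub push events
import hashlib
import hmac
import os
import subprocess

from fastapi import BackgroundTasks, FastAPI, HTTPException, Request

app = FastAPI()
# Read the secret from the environment instead of hard-coding it
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]


def run_deploy() -> None:
    """Run the deploy script; the result ends up in the server's log output."""
    result = subprocess.run(
        ["bash", "/opt/ai-app/deploy.sh"],
        capture_output=True, text=True, timeout=600,
    )
    status = "success" if result.returncode == 0 else "failed"
    print(f"deploy {status}: {result.stdout[-500:]}{result.stderr[-500:]}")


@app.post("/webhook/deploy")
async def deploy(request: Request, background_tasks: BackgroundTasks):
    """GitHub webhook that triggers an automatic deployment."""
    body = await request.body()
    # Verify the signature (HMAC-SHA256 of the raw body)
    signature = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(
        WEBHOOK_SECRET.encode(), body, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        # Returning a (dict, 403) tuple does not set the status code in FastAPI
        raise HTTPException(status_code=403, detail="Invalid signature")
    # Only push events deploy (GitHub also sends e.g. a "ping" event)
    if request.headers.get("X-GitHub-Event") != "push":
        return {"status": "ignored"}
    # GitHub expects a response within 10 s, so deploy in the background
    background_tasks.add_task(run_deploy)
    return {"status": "accepted"}
bash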
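#!/bin/bash
# deploy.sh
set -euo pipefail   # stop at the first failing command
cd /opt/ai-app
git pull origin main
docker compose build backend   # add --no-cache only to force a clean rebuild
docker compose up -d
docker compose ps
echo "Deployment finished: $(date)"
7.4.5 Logging and Monitoring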
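1. Structured logging
python
# logging_config.py
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Formats each record as one JSON line (easy for ELK/Loki to ingest)."""

    def format(self, record: logging.LogRecord) -> str:
        log_data = {
            # Record creation time in UTC (datetime.utcnow() is deprecated since Python 3.12)
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # request_id is attached with logger.info(..., extra={"request_id": ...})
        if hasattr(record, "request_id"):
            log_data["request_id"] = record.request_id
        if record.exc_info:
            log_data["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_data, ensure_ascii=False)


def setup_logging(level: str = "INFO") -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(JSONFormatter())
    logging.root.handlers = [handler]
    logging.root.setLevel(level)


# Usage (e.g. at application startup):
# setup_logging("INFO")
# logging.getLogger(__name__).info("chat request", extra={"request_id": "abc123"})
2. Docker Compose log collection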
yaml
# Add a logging section to docker-compose.yml
services:
  backend:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"   # rotate each log file at 50 MB
        max-file: "5"     # keep at most 5 files (at most 5 x 50 MB = 250 MB per container)
bash
# Common log commands
docker compose logs backend --tail 100          # last 100 lines
docker compose logs backend -f                  # follow in real time
docker compose logs backend --since 1h          # last hour
docker compose logs backend 2>&1 | grep ERROR   # filter errors
3. Health checks and alerting
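A health endpoint only reports state. Something outside the application has to poll it and raise an alert. A hosted monitoring service is the usual choice, but the core logic is small. The stdlib sketch below assumes an alert should fire only after several consecutive failures, to avoid noise from a single slow request, and once more on recovery. `send_alert` is a placeholder for whatever channel you use, such as email or a chat webhook.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def check_health(url: str, timeout: float = 5.0) -> bool:
    """True if GET url answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (e.g. a 503) and timeouts are all OSError subclasses
        return False


class AlertTracker:
    """Alert once after `threshold` consecutive failures, and once on recovery."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0
        self.alerting = False

    def record(self, healthy: bool) -> Optional[str]:
        if healthy:
            self.failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True
            return "down"
        return None


def poll(url: str, send_alert: Callable[[str], None], interval: float = 30.0) -> None:
    """Poll forever; send_alert is a placeholder for email, chat webhooks, etc."""
    tracker = AlertTracker()
    while True:
        event = tracker.record(check_health(url))
        if event is not None:
            send_alert(f"{url} is {event}")
        time.sleep(interval)
```

Consider running `poll("https://yourdomain.com/api/health", send_alert=print)` from a separate machine. That call checks the whole path through Nginx, because the `/api/` prefix is stripped before the request reaches the backend's `/health`. This only detects problems if the endpoint answers with a non-2xx status when something is broken.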
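python
# health.py: a comprehensive health-check endpoint
import os
from datetime import datetime, timezone

import asyncpg
import redis.asyncio as aioredis
from fastapi import APIRouter
from fastapi.responses import JSONResponse

router = APIRouter()
# Both URLs are injected by docker-compose.yml (environment / env_file)
DATABASE_URL = os.environ["DATABASE_URL"]
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")


@router.get("/health")
async def health_check():
    """Aggregate health check for the database and Redis."""
    checks = {}
    # Database: asyncpg expects a plain postgresql:// DSN, not the SQLAlchemy-style URL
    try:
        dsn = DATABASE_URL.replace("postgresql+asyncpg://", "postgresql://", 1)
        conn = await asyncpg.connect(dsn, timeout=5)
        try:
            await conn.execute("SELECT 1")
        finally:
            await conn.close()
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = f"error: {e}"
    # Redis: use the asyncio client so the check does not block the event loop
    try:
        async with aioredis.Redis.from_url(REDIS_URL) as r:
            await r.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = f"error: {e}"

    all_ok = all(v == "ok" for v in checks.values())
    # 503 when degraded, so Docker HEALTHCHECK and `curl -f` notice the failure
    return JSONResponse(
        status_code=200 if all_ok else 503,
        content={
            "status": "healthy" if all_ok else "degraded",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "checks": checks,
        },
    )
7.4.6 Deployment Checklist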
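Several of the "Environment configuration" items in the checklist below can be verified mechanically before each release. Here is a stdlib sketch of such an audit. It assumes simple `KEY=value` lines, and it treats the placeholder strings used in this chapter's sample files (`xxxxx`, `your-key-here`) as "not filled in". Both the function names and the specific rules are illustrative.

```python
from typing import Dict, List

# Placeholder fragments used in this chapter's sample .env files (assumption)
PLACEHOLDER_MARKERS = ("xxxxx", "your-key-here")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines (no quoting or 'export' support)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def audit_env(text: str) -> List[str]:
    """Return the problems found for a few production checklist items."""
    env = parse_dotenv(text)
    problems = []
    if env.get("APP_ENV") != "production":
        problems.append("APP_ENV is not 'production'")
    if "localhost" in env.get("CORS_ORIGINS", "localhost"):
        problems.append("CORS_ORIGINS is missing or points at localhost")
    if "localhost" in env.get("DATABASE_URL", "localhost"):
        problems.append("DATABASE_URL is missing or points at localhost")
    for key, value in env.items():
        if key.endswith("_API_KEY") and (not value or any(m in value for m in PLACEHOLDER_MARKERS)):
            problems.append(f"{key} still holds a placeholder value")
    return problems


sample = (
    "APP_ENV=production\n"
    "CORS_ORIGINS=https://yourdomain.com\n"
    "DATABASE_URL=postgresql+asyncpg://user:pass@db:5432/aiapp\n"
    "OPENAI_API_KEY=sk-xxxxx\n"
)
print(audit_env(sample))  # ['OPENAI_API_KEY still holds a placeholder value']
```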
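Check before going live:
Environment configuration
────────────────────────────────
✅ .env contains every required API key
✅ .env is listed in .gitignore
✅ DATABASE_URL points to the production database
✅ CORS_ORIGINS is set to the real domain
✅ APP_ENV=production
Docker
────────────────────────────────
✅ The Dockerfile runs as a non-root user
✅ HEALTHCHECK is configured
✅ Data volumes are mounted (database, Redis)
✅ docker compose up -d starts cleanly
Nginx
────────────────────────────────
✅ proxy_buffering off (required for SSE streaming)
✅ WebSocket upgrade is configured
✅ proxy_read_timeout ≥ 120s
✅ SSL certificate is configured (HTTPS)
Security
────────────────────────────────
✅ No API keys are hard-coded in the source
✅ The database uses a strong password
✅ No unnecessary ports are exposed
✅ The database is backed up regularly
Monitoring
────────────────────────────────
✅ The /health endpoint responds normally
✅ Logs go to files or a log collector
✅ The CI/CD pipeline's tests pass
✅ After deployment, curl checks confirm the features work
Learning resources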