docs: add VPS deployment + CI notes [CI SKIP]

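Consolidated archive of the ai.puro.im deployment record:
- VPS architecture + CI flow diagram
- First-time manual deployment procedure (the production counterpart of LOCAL_SETUP_NOTES)
- Drone pipeline details + skip-CI convention
- 7 pitfalls hit and their fixes (Setup Wizard port/permissions, the three account-pool settings, run_mode, balance cache, /setup routes)
- Current snapshot, common ops commands, rebuild-from-scratch procedure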

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
puro ci
2026-04-19 13:05:30 +08:00
parent 9c34e9619c
commit ac3417a964

VPS_DEPLOYMENT_NOTES.md Normal file

@@ -0,0 +1,541 @@
# Sub2API VPS Deployment + CI/CD Notes
> 2026-04-19 @ `ai.puro.im` · 217.216.32.230 · Ubuntu 22.04 x86_64
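This document is the production counterpart of `LOCAL_SETUP_NOTES.md` (the Mac local development environment). It covers the full path from zero to a reachable site, hooking up Codex CLI, and setting up automated deployment with Drone CI.
---
## Overview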
```
                 Internet
                    │  HTTPS (Let's Encrypt, Caddy auto-TLS)
             ai.puro.im :443
             ┌──────┴──────┐
             │    Caddy    │  host, /etc/caddy/conf.d/sub2api.conf
             │  reverse_   │  → 127.0.0.1:8081
             │   proxy     │
             └──────┬──────┘
                    │  docker port map 127.0.0.1:8081 → container 8080
        ┌───────────▼───────────┐
        │ sub2api-net (compose) │
        │  sub2api (Go)         │  /opt/sub2api/ (persistent)
        │  sub2api-pg (PG15)    │   ├─ app-data/   (config, .installed, pricing, logs)
        │  sub2api-redis        │   ├─ pg-data/
        │   (redis:7-alpine)    │   └─ redis-data/
        └───────────┬───────────┘
                    │  VPS egress
   api.anthropic.com / chatgpt.com / api.openai.com
```
### CI flow
```
Mac (local repo) ─ git push gitea main ─▶ Gitea (git.puro.im) ──webhook──▶ Drone (devops.puro.im)
                                                                          ┌───────────┴────────────┐
                                                                          │ build-frontend (pnpm)  │
                                                                          │ build-backend (go)     │
                                                                          │ deploy                 │
                                                                          │  - cp bin + Dockerfile │
                                                                          │  - docker compose up   │
                                                                          │      --build sub2api   │
                                                                          └────────────────────────┘
```
---
## 1. VPS Environment
| Item | Value | Notes |
|---|---|---|
| Host | 217.216.32.230 (Singapore) | `ssh vps` |
| OS | Ubuntu 22.04 LTS x86_64 | |
| Caddy | installed on the host, `/etc/caddy/conf.d/*.conf` auto-included | already serving puro.im / git.puro.im / devops.puro.im / erp.puro.im / ... |
| Docker | installed on the host + `devops-net` network | the other app containers all live on devops-net |
| Gitea | `git.puro.im:2222` (SSH), 3000 (HTTP) | `purovps/sub2api` repository |
| Drone | `devops.puro.im`, OAuth via Gitea | drone-server + drone-runner-docker |
| Sub2API dedicated docker-compose | separate `sub2api-net` bridge network (**not joined to devops-net**) | isolates the PG/Redis lifecycle |
| DNS | Cloudflare **DNS only** (not proxied) | `ai.puro.im` A record → VPS IP |
| Port conflict | host `:8080` is taken by drone-server | Sub2API is mapped to host `127.0.0.1:8081` |
---
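## 2. First-Time Deployment (manual path, first successful run)
> Once CI works, this section mainly serves as a first-bootstrap / disaster-recovery reference. Day-to-day code changes go through CI (next section).
### 1. Cross-compile a linux/amd64 binary locally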
```bash
cd /Users/mini/Work/dev/sub2api/backend
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -tags embed -ldflags='-s -w' -o sub2api-linux ./cmd/server
# Output: ~75MB, static ELF x86-64
```
Note: the local machine is macOS arm64 and the VPS is x86_64, so **cross-compiling is required**. `-tags embed` embeds the frontend dist into the binary (run `pnpm run build` first).
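
Before copying the binary to the VPS, it can help to confirm that the cross-compile actually produced a linux/amd64 ELF rather than a native macOS build. The following is a minimal sketch that reads the ELF header directly with `od`; `elf_machine` is a hypothetical helper name, and it only distinguishes the two architectures relevant here.

```bash
#!/bin/sh
# elf_machine FILE: print the target architecture of an ELF binary.
# Prints "not-elf" and returns 1 for anything else (e.g. a macOS Mach-O build).
elf_machine() {
  # The first 4 bytes of every ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not-elf"
    return 1
  fi
  # e_machine is a 2-byte little-endian field at offset 18.
  case "$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')" in
    3e00) echo "x86_64" ;;   # EM_X86_64 = 62
    b700) echo "aarch64" ;;  # EM_AARCH64 = 183
    *)    echo "other" ;;
  esac
}
```

Running `elf_machine backend/sub2api-linux` should print `x86_64` for a build made with the flags above; `not-elf` suggests `GOOS=linux` was not in effect.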
### 2. Create directories on the VPS + scp the files
```bash
ssh vps "mkdir -p /opt/sub2api/{app-data,pg-data,redis-data}"
# Prepare the deploy files (local staging directory)
cp backend/sub2api-linux /Users/mini/Work/dev/sub2api-deploy/
# Dockerfile + docker-compose.yml: see the following sections
scp /Users/mini/Work/dev/sub2api-deploy/{sub2api-linux,Dockerfile,docker-compose.yml} \
  vps:/opt/sub2api/
# distroless:nonroot needs the directory writable by UID 65532
ssh vps "chown -R 65532:65532 /opt/sub2api/app-data"
```
### 3. Dockerfile (`.ci/Dockerfile` is the same file)
```dockerfile
FROM gcr.io/distroless/static-debian12:nonroot
WORKDIR /app
COPY sub2api-linux /app/sub2api
EXPOSE 8080
ENTRYPOINT ["/app/sub2api"]
```
### 4. docker-compose.yml
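Key points:
- `sub2api` is published as `127.0.0.1:8081:8080` (avoids host port 8080, which Drone occupies)
- All of `/app/data` is mounted to `./app-data` (contains config.yaml / .installed / pricing / logs)
- The PG password is passed via the `POSTGRES_PASSWORD` env var and stays out of the repo
- Separate `sub2api-net` bridge, not mixed with devops-net

Production file location: `/opt/sub2api/docker-compose.yml` (**not checked into git**, contains the PG password). Template structure: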
```yaml
services:
  postgres:
    image: postgres:15
    container_name: sub2api-pg
    environment:
      POSTGRES_DB: sub2api
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: "<32-hex strong password>"
    volumes: [./pg-data:/var/lib/postgresql/data]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d sub2api"]
    restart: unless-stopped
    networks: [sub2api-net]
  redis:
    image: redis:7-alpine
    container_name: sub2api-redis
    command: ["redis-server", "--appendonly", "yes"]
    volumes: [./redis-data:/data]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
    restart: unless-stopped
    networks: [sub2api-net]
  sub2api:
    build: .
    image: sub2api:local
    container_name: sub2api
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    volumes:
      - ./app-data:/app/data
    ports: ["127.0.0.1:8081:8080"]
    restart: unless-stopped
    networks: [sub2api-net]
networks:
  sub2api-net:
    driver: bridge
```
### 5. Bring up the stack
```bash
ssh vps "cd /opt/sub2api && docker compose up -d --build"
# sub2api only starts once PG/Redis are healthy (guaranteed by depends_on + healthcheck)
```
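
When scripting around this step, it can be useful to wait explicitly until the PG and Redis healthchecks pass. The sketch below polls `docker inspect` for the container's health status; `wait_healthy` is a hypothetical helper, and the retry count and delay defaults are arbitrary assumptions. It only works for containers that define a healthcheck, which in the compose template above means `sub2api-pg` and `sub2api-redis`.

```bash
#!/bin/sh
# wait_healthy NAME [TRIES] [DELAY_SECONDS]
# Returns 0 as soon as the container reports "healthy", 1 if it never does.
wait_healthy() {
  name=$1
  tries=${2:-30}
  delay=${3:-2}
  status=unknown
  i=0
  while [ "$i" -lt "$tries" ]; do
    # .State.Health.Status is only populated when a healthcheck is defined.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status=unknown
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$name not healthy after $tries checks (last status: $status)" >&2
  return 1
}
```

On the VPS this could be used as `wait_healthy sub2api-pg && wait_healthy sub2api-redis` right after `docker compose up -d --build`.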
### 6. Caddy reverse proxy
`/etc/caddy/conf.d/sub2api.conf`:
```caddy
ai.puro.im {
    reverse_proxy 127.0.0.1:8081 {
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
    encode gzip zstd
    log {
        output file /var/log/caddy/ai.puro.im.log
        format json
    }
}
```
```bash
ssh vps "systemctl reload caddy"
```
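### 7. First visit → Setup Wizard
**Do not pre-write `/opt/sub2api/app-data/config.yaml`.** Let sub2api start in wizard mode and generate the config itself. Steps:
1. Open `https://ai.puro.im` in a browser → it enters the Setup Wizard automatically
2. Step 1, database: host=`postgres` / port=`5432` / user=`postgres` / password=<the one from compose> / dbname=`sub2api` / sslmode=`disable`
3. Step 2, Redis: host=`redis` / port=`6379` / password empty / db=`0`
4. Step 3, admin: email + a password of at least 8 characters (pick one and write it down)
5. Step 4, submit → the wizard writes `app-data/config.yaml` + `app-data/.installed` and creates the admin in the DB → the process exits on its own, and after compose restarts it, it runs in normal mode
### 8. Add an OpenAI OAuth account (account pool)
Take `tokens.refresh_token` from the local `~/.codex/auth.json` and paste it into:
- Admin panel → Account Management → Add Account → Platform: OpenAI / Type: OAuth → "Paste Refresh Token" → Verify
- It succeeded if the ChatGPT email + plan are displayed
### 9. Three mandatory settings (otherwise 503 "no available OpenAI accounts")
See Pitfall 3 in Section 4. In the admin UI:
- **Turn on the scheduling switch**
- **Disable auto_pause_on_expired**
- **Clear expires_at** (or set it far in the future)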
---
## 3. CI/CD Pipeline (Drone)
### Repositories
| Location | Purpose |
|---|---|
| GitHub: `Wei-Shaw/sub2api` | upstream, the `origin` remote |
| Gitea: `git.puro.im/purovps/sub2api` | our fork, the **Drone trigger source**, remote name `gitea` |

Local development workflow:
```bash
# Day to day
git checkout main
# edit code...
git commit
git push gitea main # triggers CI
# Optional: sync with upstream
git fetch origin
git merge origin/main # decide manually whether to merge upstream
```
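### Activating Drone
First-time activation: https://devops.puro.im → Sync → enable `purovps/sub2api` → Settings → turn on **Trusted** (the pipeline needs to mount the host Docker socket and `/opt/sub2api`).

After activation, Gitea is configured with a webhook automatically (`POST https://devops.puro.im/hook`, events `push, pull_request, ...`).
### `.drone.yml`: three-step pipeline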
```yaml
kind: pipeline
type: docker
name: default
trigger: { branch: [main], event: [push] }
steps:
  - name: build-frontend   # node:18-alpine + pnpm 10.33
    # pnpm install --frozen-lockfile && pnpm run build
    # output → backend/internal/web/dist/
  - name: build-backend    # golang:1.23-alpine + GOTOOLCHAIN=auto
    # automatically pulls the Go 1.26.2 toolchain
    # CGO_ENABLED=0 GOOS=linux GOARCH=amd64 → sub2api-linux
  - name: deploy           # docker:cli
    # cp backend/sub2api-linux /opt/sub2api/sub2api-linux
    # cp .ci/Dockerfile /opt/sub2api/Dockerfile
    # cd /opt/sub2api && docker compose up -d --build sub2api
    # verify the sub2api container is running
```
Three kinds of host volumes are mounted:
- `/var/run/docker.sock`: uses the host Docker daemon for build + restart
- `/opt/sub2api`: to update the binary/Dockerfile
- `/opt/drone/cache/{pnpm-store,go-build,go-mod}`: speeds up subsequent builds
### Build duration
| Step | First run | After cache hit |
|---|---|---|
| build-frontend | ~90s | ~30s |
| build-backend | ~60s + first-time download of the Go 1.26.2 toolchain | ~40s |
| deploy | ~20s | ~20s |
| **Total** | **~3 min** | **~90s** |
### What CI does not touch (secrets / data)
| File/Directory | Location | Why |
|---|---|---|
| `docker-compose.yml` | `/opt/sub2api/` | contains the PG password |
| `app-data/config.yaml` | `/opt/sub2api/app-data/` | JWT secret / runtime state |
| `pg-data/` | `/opt/sub2api/` | PG data |
| `redis-data/` | `/opt/sub2api/` | Redis data |

CI only:
- Overwrites `/opt/sub2api/sub2api-linux` (the binary)
- Overwrites `/opt/sub2api/Dockerfile`
- Runs `docker compose up -d --build sub2api`, which rebuilds only the `sub2api` service (PG/Redis are left alone)
### Skip CI
To skip the build for docs-only or no-code changes, add `[CI SKIP]` to the commit message (Drone's official convention).
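
To check locally whether a commit message would skip the build before pushing, a small helper along these lines may be handy. This is a sketch under two assumptions that should be confirmed against the Drone documentation: that the recognised markers are `[ci skip]` and `[skip ci]`, and that matching is case-insensitive. `has_skip_marker` is a hypothetical name.

```bash
#!/bin/sh
# has_skip_marker MESSAGE: exit 0 if MESSAGE contains a skip-CI marker.
# Assumption: markers are matched as case-insensitive substrings.
has_skip_marker() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"[ci skip]"*|*"[skip ci]"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_skip_marker "$(git log -1 --format=%B)" && echo "this push will not trigger a build"`.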
---
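## 4. Pitfalls and Fixes
### Pitfall 1: Setup Wizard writes `server.port: 443` by default
**Symptom**: the wizard finishes → the container restarts → `127.0.0.1:8081` does not respond (the app started but is listening on :443 inside the container).

**Root cause**: the wizard defaults `server.port` to 443 (aimed at setups where the container exposes HTTPS directly, which does not fit our "8080 inside the container + Caddy on 443 outside" model).

**Fix**: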
```bash
ssh vps "sed -i 's/port: 443/port: 8080/' /opt/sub2api/app-data/config.yaml && docker restart sub2api"
```
CI never rewrites config.yaml afterwards, so this is a one-time fix.
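
A quick check after the wizard run can confirm which port ended up under `server:`. The sketch below assumes config.yaml is plain block-style YAML with a top-level `server:` key and an indented `port:` line beneath it; `server_port` is a hypothetical helper, not something sub2api ships.

```bash
#!/bin/sh
# server_port FILE: print the port configured under the top-level "server:" key.
# Other sections (database, redis) also have a "port:" line, so the match is
# restricted to lines that follow "server:" and precede the next top-level key.
server_port() {
  awk '
    /^server:/   { in_server = 1; next }   # entering the server section
    /^[^ \t#]/   { in_server = 0 }         # any other top-level key ends it
    in_server && $1 == "port:" { print $2; exit }
  ' "$1"
}
```

On the VPS, `server_port /opt/sub2api/app-data/config.yaml` printing anything other than `8080` means the port mapping `127.0.0.1:8081:8080` will not reach the app.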
---
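### Pitfall 2: distroless:nonroot log directory permissions
**Symptom**: the container startup log is flooded with `write error: can't open new logfile: open /app/data/logs/sub2api.log: permission denied`.

**Root cause**: the process in `gcr.io/distroless/static-debian12:nonroot` runs as UID 65532, while the host directory `/opt/sub2api/app-data/logs` is owned by root.

**Fix**: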
```bash
ssh vps "chown -R 65532:65532 /opt/sub2api/app-data"
```
One-time fix. CI does not touch app-data, so it does not recur.
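
To confirm the fix took effect, or to spot files that were later created by root (for example after manual edits over SSH), the ownership of the whole tree can be audited. A minimal sketch, with `not_owned_by` as a hypothetical helper; it relies on `find -user` accepting a numeric UID when no user of that name exists.

```bash
#!/bin/sh
# not_owned_by UID DIR: list every path under DIR whose owner is not UID.
# Empty output means the whole tree is owned by UID.
not_owned_by() {
  find "$2" ! -user "$1"
}
```

On the VPS, `not_owned_by 65532 /opt/sub2api/app-data` should print nothing once the `chown -R` above has run.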
---
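### Pitfall 3: a newly created OAuth account is "unschedulable" from the start
**Symptom**: the refresh_token is pasted, verification passes, and the UI shows "normal", but curl `/responses` → `503 "Service temporarily unavailable"`, with `openai.account_select_failed: no available OpenAI accounts` in the log.

**Root cause (three factors compounding)**:
1. The UI defaults to `schedulable=false`
2. The wizard/backend sets `expires_at` to the **access_token's short-lived expiry** (~7 minutes out), while `auto_pause_on_expired=true` is on by default
3. Every sub2api restart → `[AccountExpiry] Auto paused 1 expired accounts` → schedulable is reset to false

**Fix** (either SQL or the UI works):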
```sql
UPDATE accounts
SET schedulable=true, expires_at=NULL, auto_pause_on_expired=false
WHERE id=<n>;
```
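**Saved to memory**: `feedback_sub2api_account_pitfalls.md`. Long-term fix: change the account-creation logic in the fork so that an OAuth account's `expires_at` refers to the refresh_token expiry or the subscription end date, not the access_token expiry.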
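
For the SQL route, the statement can be generated per account and piped into `psql` inside the PG container. The sketch below only builds the statement and rejects non-numeric IDs so that nothing unexpected gets interpolated; `unpause_account_sql` is a hypothetical helper, and the table and column names are the ones used in the fix above.

```bash
#!/bin/sh
# unpause_account_sql ID: print the UPDATE statement for one account.
# Refuses anything that is not a plain positive integer.
unpause_account_sql() {
  case "$1" in
    ''|*[!0-9]*)
      echo "account id must be a positive integer" >&2
      return 1
      ;;
  esac
  printf 'UPDATE accounts SET schedulable=true, expires_at=NULL, auto_pause_on_expired=false WHERE id=%s;\n' "$1"
}
```

One way to apply it: `unpause_account_sql 1 | ssh vps "docker exec -i sub2api-pg psql -U postgres -d sub2api"`.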
---
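### Pitfall 4: `run_mode: simple` hides the SaaS menus
**Symptom**: after temporarily switching to `run_mode: simple` to get around `INSUFFICIENT_BALANCE`, logging back into the admin panel shows that **User Management / Group Management / Channel Management / Subscription Management / Redeem Codes / Promo Codes** have all disappeared.

**Root cause**: this is **by design**, not a bug. `simple` mode targets "single-user / internal team tool" use and deliberately hides the SaaS admin panels; in the frontend `stores/auth.ts`, `isSimpleMode = computed(() => runMode === 'simple')` controls route visibility.

**Fix**:
- If you need to manage groups/subscriptions/billing/redeem codes → stay on `run_mode: standard`
- Alternative for `INSUFFICIENT_BALANCE`: give the admin a large balance (see Pitfall 5)
---
### Pitfall 5: `standard` mode + admin balance=0 → 403
**Symptom**: after switching back to `standard`, curl `/responses` → `INSUFFICIENT_BALANCE`.

**Root cause**: the middleware at `api_key_auth.go:198` checks `apiKey.User.Balance <= 0` → 403. The admin's default balance is 0.

**Fix**: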
```sql
UPDATE users SET balance = 1000000000 WHERE id=1;
```
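One billion lasts a long time. After integrating iShare, switch to the iShare subscription mode (sub2api marks it as "subscription mode", which enforces subscription quotas instead of checking the balance).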
---
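### Pitfall 6: stale Redis L2 cache
**Symptom**: after running the SQL from Pitfall 5 and then `docker restart sub2api`, curl still returns `INSUFFICIENT_BALANCE`.

**Root cause**: Sub2API's API key authentication has a two-level cache:
- L1: in-process LRU, TTL 15s
- L2: Redis, keys `sub2api:apikey:*`, **TTL 300s**

`docker restart` only clears L1. L2 lives in a separate Redis container and keeps the old `balance=0` cache entry.

**Fix**: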
```bash
ssh vps "docker exec sub2api-redis redis-cli FLUSHDB && docker restart sub2api"
```
**Rule of thumb**: after changing user / api_key / account rows in the DB, restart sub2api **and** clear Redis. Restarting sub2api alone is not enough.
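
`FLUSHDB` drops every key in the Redis database, not just the API key cache. If that is too broad, a narrower purge can be sketched as follows, assuming the `sub2api:apikey:*` pattern noted above covers all the relevant entries. `purge_apikey_cache` is a hypothetical helper; its optional argument is the command used to reach `redis-cli`.

```bash
#!/bin/sh
# purge_apikey_cache [REDIS_CMD]
# Deletes only keys matching sub2api:apikey:* and prints each deleted key.
# REDIS_CMD defaults to "redis-cli" and is deliberately left unquoted below
# so that a multi-word command such as "docker exec c redis-cli" works.
purge_apikey_cache() {
  redis_cmd=${1:-redis-cli}
  # --scan iterates with SCAN instead of the blocking KEYS command.
  $redis_cmd --scan --pattern 'sub2api:apikey:*' | while IFS= read -r key; do
    $redis_cmd DEL "$key" >/dev/null
    echo "deleted $key"
  done
}
```

On the VPS this could be run as `purge_apikey_cache "docker exec sub2api-redis redis-cli" && docker restart sub2api`. Entries also expire on their own after the 300s TTL, so waiting five minutes is an alternative when nothing is urgent.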
---
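### Pitfall 7: `/setup/*` routes are registered only when needed
**Symptom**: config.yaml was initially hand-written to get the container running; later, when trying to reconfigure through the wizard, the frontend reached the `/setup` page, but clicking "Test connection" after filling in the form → `Request failed with status code 404`.

**Root cause**: in `cmd/server/main.go`, only the `NeedsSetup() == true` branch (which runs `runSetupServer()`) calls `setup.RegisterRoutes(r)`. When config.yaml already exists, `runMainServer()` runs instead and the setup routes are not registered → the frontend's `/setup/test-db` → 404.

**Fix**: make `NeedsSetup()` return true, which requires that **neither** `/app/data/config.yaml` **nor** `/app/data/.installed` exists. Concretely: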
```bash
ssh vps "cd /opt/sub2api && mv app-data/config.yaml app-data/config.yaml.bak; \
mv app-data/.installed app-data/.installed.bak 2>/dev/null; \
docker restart sub2api"
```
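
Before restarting, the precondition can be checked from the host side. This sketch mirrors the condition described above (both files absent) against the mounted data directory; `wizard_will_run` is a hypothetical helper and does not call into sub2api itself.

```bash
#!/bin/sh
# wizard_will_run DATA_DIR: exit 0 only if neither config.yaml nor .installed
# exists in DATA_DIR, i.e. the condition under which the setup routes register.
wizard_will_run() {
  [ ! -e "$1/config.yaml" ] && [ ! -e "$1/.installed" ]
}
```

On the VPS: `wizard_will_run /opt/sub2api/app-data && echo "next start enters the wizard"`.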
---
## 5. Current State Snapshot (2026-04-19)
```
Domain      https://ai.puro.im (HTTP/2, Let's Encrypt via Caddy)
VPS         217.216.32.230 (Singapore, Ubuntu 22.04)
Image       sub2api:local (Drone CI-built, distroless + static Go binary)
Containers on VPS:
  sub2api         ← rebuilt by Drone on every push
  sub2api-pg      postgres:15 (unrelated to devops-net, separate sub2api-net)
  sub2api-redis   redis:7-alpine
Accounts:
  admin@puro.im   (id=1, role=admin, balance=1e9)
  test_myopenai   (OpenAI OAuth, group=test_codex, schedulable=true)
API keys:
  sk-d2132de2f0b4c1ab64ef7241a16d254cab483f1f8afd47ad4a89e39cf6e2345a
    (user=admin, group=test_codex)
Config:
  run_mode: standard
  server.port: 8080 (inside the container)
  database.host: postgres (compose DNS)
  redis.host: redis (compose DNS)
CI:
  Drone job trigger = git push gitea main
  Gitea webhook → Drone webhook (active)
  Latest build: commit 9c34e961 ✓ (frontend ~90s + backend ~60s + deploy ~20s)
```
---
## 6. Common Ops Commands
### Day to day
```bash
# Status
ssh vps "cd /opt/sub2api && docker compose ps"
# Logs
ssh vps "docker logs sub2api --tail 100 -f"
ssh vps "tail -f /var/log/caddy/ai.puro.im.log"
# Force-restart sub2api (leaves PG/Redis alone)
ssh vps "docker restart sub2api"
# Reload Caddy
ssh vps "systemctl reload caddy"
# Clear the Redis cache (required after DB changes)
ssh vps "docker exec sub2api-redis redis-cli FLUSHDB"
```
### Trigger a CI rebuild
```bash
git commit --allow-empty -m "ci: rebuild" && git push gitea main
# Follow build progress (optional)
ssh vps "docker ps --filter 'name=^drone-'"
```
### Data backup
```bash
# PG dump
ssh vps "docker exec sub2api-pg pg_dump -U postgres sub2api" \
> ~/backups/sub2api-$(date +%Y%m%d).sql
# The whole /opt/sub2api/ directory
ssh vps "tar -czf /tmp/sub2api-full-$(date +%Y%m%d).tgz -C /opt sub2api"
scp vps:/tmp/sub2api-full-*.tgz ~/backups/
```
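
Because the dump is redirected over SSH, a dropped connection can leave a truncated or empty `.sql` file that still looks like a backup. A minimal sanity check is sketched below, assuming the plain-text `pg_dump` format, whose output ends with a "PostgreSQL database dump complete" comment; `dump_is_complete` is a hypothetical helper.

```bash
#!/bin/sh
# dump_is_complete FILE: exit 0 if FILE is non-empty and its tail contains
# the completion marker that plain-format pg_dump writes at the very end.
dump_is_complete() {
  [ -s "$1" ] && tail -n 20 "$1" | grep -q 'PostgreSQL database dump complete'
}
```

For example: `dump_is_complete ~/backups/sub2api-$(date +%Y%m%d).sql || echo "backup looks truncated" >&2`.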
### Query key DB information
```sql
-- Account pool
SELECT id, name, platform, type, status, schedulable, expires_at FROM accounts;
-- API keys
SELECT id, user_id, name, status, group_id, quota, quota_used FROM api_keys;
-- Users + balances
SELECT id, email, role, balance FROM users;
-- Usage over the last 24 hours
SELECT model_name, SUM(total_tokens), SUM(total_cost)
FROM usage_logs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY model_name
ORDER BY 2 DESC;
```
---
## 7. Cleanup / Starting Over
```bash
# Stop + remove all containers
ssh vps "cd /opt/sub2api && docker compose down"
# Wipe data (permanently deletes wizard data, users, and the account pool!)
ssh vps "rm -rf /opt/sub2api/{app-data,pg-data,redis-data}"
ssh vps "mkdir -p /opt/sub2api/{app-data,pg-data,redis-data}"
ssh vps "chown -R 65532:65532 /opt/sub2api/app-data"
# Bring it back up (goes through the wizard)
ssh vps "cd /opt/sub2api && docker compose up -d"
# Caddy stays as is, nothing to do
```
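
Since the wipe above is irreversible, wrapping it in a typed confirmation may be worthwhile when these commands live in a script. This is only a sketch of one possible guard; `confirm_wipe` is a hypothetical helper that requires the operator to type the site name exactly.

```bash
#!/bin/sh
# confirm_wipe EXPECTED: prompt on stderr and read one line from stdin.
# Exit 0 only if the typed line equals EXPECTED exactly.
confirm_wipe() {
  printf 'Type "%s" to wipe its data: ' "$1" >&2
  IFS= read -r answer || return 1
  [ "$answer" = "$1" ]
}
```

For example: `confirm_wipe ai.puro.im && ssh vps "rm -rf /opt/sub2api/{app-data,pg-data,redis-data}"`.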
**"Pure rebuild" without wiping data** (equivalent to one CI run):
```bash
git commit --allow-empty -m "ci: rebuild" && git push gitea main
```
---
## References
- Local development: `LOCAL_SETUP_NOTES.md`
- CI config: `.drone.yml`, `.ci/Dockerfile`, `.ci/README.md`
- Upstream: https://github.com/Wei-Shaw/sub2api
- Fork: https://git.puro.im/purovps/sub2api