# Sub2API VPS Deployment + CI/CD Notes
> 2026-04-19 @ `ai.puro.im` · 217.216.32.230 · Ubuntu 22.04 x86_64
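This document is the production counterpart of `LOCAL_SETUP_NOTES.md` (the local Mac development environment). It covers the full path from an empty server to a reachable site, hooking up the Codex CLI, and setting up automatic deployment with Drone CI.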
---
## Overview
```
          Internet
             │ HTTPS (Let's Encrypt, Caddy auto-TLS)
       ai.puro.im :443
      ┌──────┴──────┐
      │    Caddy    │  host, /etc/caddy/conf.d/sub2api.conf
      │  reverse_   │  → 127.0.0.1:8081
      │    proxy    │
      └──────┬──────┘
             │ docker port map 127.0.0.1:8081 → container 8080
    ┌────────▼──────────────┐
    │ sub2api-net (compose) │
    │  sub2api (Go)         │  /opt/sub2api/ (persistent)
    │  sub2api-pg (PG15)    │   ├─ app-data/   (config, .installed, pricing, logs)
    │  sub2api-redis        │   ├─ pg-data/
    │   (redis:7-alpine)    │   └─ redis-data/
    └───────────────────────┘
             │ VPS egress
    api.anthropic.com / chatgpt.com / api.openai.com
```
### CI flow
```
Mac (local repo) ─ git push gitea main ─▶ Gitea (git.puro.im) ──webhook──▶ Drone (devops.puro.im)
                                                                          ┌───────────┴────────────┐
                                                                          │ build-frontend (pnpm)  │
                                                                          │ build-backend (go)     │
                                                                          │ deploy                 │
                                                                          │   - cp bin + Dockerfile│
                                                                          │   - docker compose up  │
                                                                          │     --build sub2api    │
                                                                          └────────────────────────┘
```
---
## 1. VPS environment
| Item | Value | Notes |
|---|---|---|
| Host | 217.216.32.230 (Singapore) | `ssh vps` |
| OS | Ubuntu 22.04 LTS x86_64 | |
| Caddy | Installed on the host; `/etc/caddy/conf.d/*.conf` is auto-included | Already serving puro.im / git.puro.im / devops.puro.im / erp.puro.im / ... |
| Docker | Installed on the host, with a `devops-net` network | The other application containers all live on devops-net |
| Gitea | `git.puro.im:2222` (SSH), 3000 (HTTP) | `purovps/sub2api` repository |
| Drone | `devops.puro.im`, OAuth via Gitea | drone-server + drone-runner-docker |
| Sub2API's dedicated docker-compose | Separate `sub2api-net` bridge network (**not attached to devops-net**) | Isolates the PG/Redis lifecycle |
| DNS | Cloudflare **DNS only** (not proxied) | `ai.puro.im` A record → VPS IP |
| Port conflict | Host `:8080` is taken by drone-server | Sub2API is mapped to host `127.0.0.1:8081` |
---
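## 2. Initial deployment (manual path, first successful run)
> Once CI works, this section mainly serves as a reference for first-time bootstrap and disaster recovery. Day-to-day code changes go through CI (see the next section).
### 1. Cross-compile a linux/amd64 binary locally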
```bash
cd /Users/mini/Work/dev/sub2api/backend
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -tags embed -ldflags='-s -w' -o sub2api-linux ./cmd/server
# Output: ~75MB, static ELF x86-64
```
Note: the local machine is macOS arm64 and the VPS is x86_64, so **cross-compiling is required**. `-tags embed` bundles the frontend dist into the binary (run `pnpm run build` first).
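
A quick sanity check before uploading can catch a binary built for the wrong platform. The sketch below assumes the `file` utility is available locally. The helper name `looks_like_static_amd64` is made up for illustration, and the match is a loose pattern over the `file -b` description, not a strict parser.

```bash
# Hypothetical helper: succeeds if a `file -b` description reads like a
# statically linked x86-64 ELF binary, fails otherwise.
looks_like_static_amd64() {
  case "$1" in
    *ELF*x86-64*"statically linked"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Possible usage from the repository root: `looks_like_static_amd64 "$(file -b backend/sub2api-linux)" || echo "wrong build target" >&2`.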
### 2. Create directories on the VPS + scp the files
```bash
ssh vps "mkdir -p /opt/sub2api/{app-data,pg-data,redis-data}"
# Stage the deploy files (local staging directory)
cp backend/sub2api-linux /Users/mini/Work/dev/sub2api-deploy/
# Dockerfile + docker-compose.yml: see the sections below
scp /Users/mini/Work/dev/sub2api-deploy/{sub2api-linux,Dockerfile,docker-compose.yml} \
    vps:/opt/sub2api/
# distroless:nonroot needs the directory to be writable by UID 65532
ssh vps "chown -R 65532:65532 /opt/sub2api/app-data"
```
### 3. Dockerfile (`.ci/Dockerfile` is the same file)
```dockerfile
FROM gcr.io/distroless/static-debian12:nonroot
WORKDIR /app
COPY sub2api-linux /app/sub2api
EXPOSE 8080
ENTRYPOINT ["/app/sub2api"]
```
### 4. docker-compose.yml
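Key points:
- `sub2api` is published as `127.0.0.1:8081:8080` (avoiding host port 8080, which Drone occupies)
- The whole of `/app/data` is mounted to `./app-data` (including config.yaml / .installed / pricing / logs)
- The PG password is passed via the `POSTGRES_PASSWORD` env var and stays out of the repository
- A separate `sub2api-net` bridge, not mixed with devops-net

Production file location: `/opt/sub2api/docker-compose.yml` (**not checked into git**, since it contains the PG password). Template structure: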
```yaml
services:
  postgres:
    image: postgres:15
    container_name: sub2api-pg
    environment:
      POSTGRES_DB: sub2api
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: "<32-hex strong password>"
    volumes: [./pg-data:/var/lib/postgresql/data]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d sub2api"]
    restart: unless-stopped
    networks: [sub2api-net]
  redis:
    image: redis:7-alpine
    container_name: sub2api-redis
    command: ["redis-server", "--appendonly", "yes"]
    volumes: [./redis-data:/data]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
    restart: unless-stopped
    networks: [sub2api-net]
  sub2api:
    build: .
    image: sub2api:local
    container_name: sub2api
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    volumes:
      - ./app-data:/app/data
    ports: ["127.0.0.1:8081:8080"]
    restart: unless-stopped
    networks: [sub2api-net]
networks:
  sub2api-net:
    driver: bridge
```
### 5. Bring up the stack
```bash
ssh vps "cd /opt/sub2api && docker compose up -d --build"
# sub2api starts only after PG/Redis are healthy (guaranteed by depends_on + healthcheck)
```
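
Because `docker compose up -d` returns before the healthchecks have passed, a script that continues straight away may want to wait explicitly. The helper below is a minimal polling sketch, not part of the original setup. `wait_for_output` is a hypothetical name, and it assumes one-second polling is acceptable.

```bash
# Hypothetical helper: run a command once per second until it prints the
# expected value, or give up after the timeout (in seconds).
wait_for_output() {
  local expected=$1 timeout=$2 elapsed=0
  shift 2
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

On the VPS this could be called as `wait_for_output healthy 60 docker inspect --format '{{.State.Health.Status}}' sub2api-pg`, and likewise for `sub2api-redis`.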
### 6. Caddy reverse proxy
`/etc/caddy/conf.d/sub2api.conf`
```caddy
ai.puro.im {
    reverse_proxy 127.0.0.1:8081 {
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
    encode gzip zstd
    log {
        output file /var/log/caddy/ai.puro.im.log
        format json
    }
}
```
```bash
ssh vps "systemctl reload caddy"
```
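
Since this Caddy instance also serves the other sites on the host, it may be worth validating the configuration before reloading. The sketch below assumes the main config lives at `/etc/caddy/Caddyfile` and imports `conf.d/*.conf`; that path is an assumption, not something recorded in these notes, and `reload_caddy_checked` is a hypothetical helper name.

```bash
# Hypothetical helper: validate the Caddy config on the remote host and
# reload only if validation succeeds.
# Assumption: the main config file is /etc/caddy/Caddyfile.
reload_caddy_checked() {
  local host=${1:-vps}
  ssh "$host" "caddy validate --config /etc/caddy/Caddyfile && systemctl reload caddy"
}
```

Calling `reload_caddy_checked` with no argument targets the `vps` SSH alias used throughout this document.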
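### 7. First visit → Setup Wizard
**Do not pre-write `/opt/sub2api/app-data/config.yaml`.** Let sub2api start in wizard mode and generate the config itself. Steps:
1. Open `https://ai.puro.im` in a browser → it enters the Setup Wizard automatically
2. Step 1, database: host=`postgres` / port=`5432` / user=`postgres` / password=<the one in the compose file> / dbname=`sub2api` / sslmode=`disable`
3. Step 2, Redis: host=`redis` / port=`6379` / password empty / db=`0`
4. Step 3, admin: email + a password of at least 8 characters (pick one and write it down)
5. Step 4, submit → the wizard writes `app-data/config.yaml` + `app-data/.installed` and creates the admin in the DB → the process exits on its own, and after the compose restart it runs in normal mode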
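### 8. Add an OpenAI OAuth account (account pool)
Take `tokens.refresh_token` from the local `~/.codex/auth.json` and paste it into:
- Admin console → Account management → Add account → Platform: OpenAI / Type: OAuth → "Paste Refresh Token" → Verify
- If the ChatGPT email + plan pop up, it worked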
### 9. Three mandatory adjustments (otherwise 503 "no available OpenAI accounts")
See Pitfall 3 in section 4. In the admin UI:
- **Turn on the scheduling switch**
- **Turn off auto_pause_on_expired**
- **Clear expires_at** (or set it far in the future)
---
## 3. CI/CD pipeline (Drone)
### Repositories
| Location | Purpose |
|---|---|
| GitHub: `Wei-Shaw/sub2api` | Upstream, `origin` remote |
| Gitea: `git.puro.im/purovps/sub2api` | Our fork, **Drone trigger source**, remote name `gitea` |

Local development workflow:
```bash
# Day-to-day
git checkout main
# edit code...
git commit
git push gitea main # triggers CI
# Optionally sync with upstream
git fetch origin
git merge origin/main # decide manually whether to merge upstream
```
### Activating Drone
First activation: https://devops.puro.im → Sync → enable `purovps/sub2api` → Settings → turn on **Trusted** (the pipeline needs to mount the host Docker socket and `/opt/sub2api`).
After activation, the webhook is configured in Gitea automatically (`POST https://devops.puro.im/hook`, events `push, pull_request, ...`).
### `.drone.yml`: three-step pipeline
```yaml
# Outline only: images, commands, and volumes are summarized in comments.
kind: pipeline
type: docker
name: default
trigger: { branch: [main], event: [push] }
steps:
  - name: build-frontend   # node:18-alpine + pnpm 10.33
    # pnpm install --frozen-lockfile && pnpm run build
    # output → backend/internal/web/dist/
  - name: build-backend    # golang:1.23-alpine + GOTOOLCHAIN=auto
    # pulls the Go 1.26.2 toolchain automatically
    # CGO_ENABLED=0 GOOS=linux GOARCH=amd64 → sub2api-linux
  - name: deploy           # docker:cli
    # cp backend/sub2api-linux /opt/sub2api/sub2api-linux
    # cp .ci/Dockerfile /opt/sub2api/Dockerfile
    # cd /opt/sub2api && docker compose up -d --build sub2api
    # verify that the sub2api container is running
```
Three kinds of host volumes are mounted:
- `/var/run/docker.sock`: use the host Docker daemon for the image build and the restart
- `/opt/sub2api`: update the binary and Dockerfile
- `/opt/drone/cache/{pnpm-store,go-build,go-mod}`: speed up later builds
### Build duration
| Step | First run | After cache hits |
|---|---|---|
| build-frontend | ~90s | ~30s |
| build-backend | ~60s+ (the first run downloads the Go 1.26.2 toolchain) | ~40s |
| deploy | ~20s | ~20s |
| **Total** | **~3 min** | **~90s** |
### What CI does not touch (secrets / data)
| File/directory | Location | Why |
|---|---|---|
| `docker-compose.yml` | `/opt/sub2api/` | Contains the PG password |
| `app-data/config.yaml` | `/opt/sub2api/app-data/` | JWT secret / runtime state |
| `pg-data/` | `/opt/sub2api/` | PG data |
| `redis-data/` | `/opt/sub2api/` | Redis data |

CI only:
- Overwrites `/opt/sub2api/sub2api-linux` (the binary)
- Overwrites `/opt/sub2api/Dockerfile`
- Runs `docker compose up -d --build sub2api`, which rebuilds only the `sub2api` service (PG/Redis are left alone)
### Skip CI
To skip the build for docs-only changes or commits with no code changes, add `[CI SKIP]` to the commit message (an official Drone convention).
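
To check locally whether a commit message carries the marker before pushing, something like the following could be used. It is a sketch that assumes Drone accepts `[ci skip]` and `[skip ci]` case-insensitively; `has_skip_marker` is a hypothetical helper, not part of Drone.

```bash
# Hypothetical helper: succeeds if the commit message contains a skip marker.
# Assumption: "[ci skip]" and "[skip ci]" are matched case-insensitively.
has_skip_marker() {
  printf '%s\n' "$1" | grep -qiE '\[(ci skip|skip ci)\]'
}
```

For the latest commit: `has_skip_marker "$(git log -1 --pretty=%B)" && echo "CI will be skipped"`.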
---
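## 4. Pitfalls and fixes
### Pitfall 1: Setup Wizard writes `server.port: 443` by default
**Symptom**: the wizard finishes → the container restarts → `127.0.0.1:8081` does not respond (the app started but is listening on :443 inside the container).
**Root cause**: the wizard defaults `server.port` to 443 (aimed at setups where the container exposes HTTPS directly, which does not fit our model of 8080 inside the container with Caddy on 443 outside).
**Fix**: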
```bash
ssh vps "sed -i 's/port: 443/port: 8080/' /opt/sub2api/app-data/config.yaml && docker restart sub2api"
```
Later CI runs do not rewrite config.yaml, so this is a one-time fix.
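
To confirm which port the wizard actually wrote, before or after the fix, the value can be read from the file. This is a sketch that assumes config.yaml has a top-level `server:` block with an indented `port:` key; `server_port` is a hypothetical helper and the awk script is a naive reader, not a YAML parser.

```bash
# Hypothetical helper: print the port listed under the top-level "server:"
# block of a YAML file. Assumes space indentation and "port: <n>" on one line.
server_port() {
  awk '
    /^server:/ { in_server = 1; next }   # entering the server block
    /^[^ #]/   { in_server = 0 }         # any other top-level key ends it
    in_server && $1 == "port:" { print $2; exit }
  ' "$1"
}
```

Run on the VPS as `server_port /opt/sub2api/app-data/config.yaml`; after the fix it should print `8080`.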
---
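### Pitfall 2: log directory permissions under distroless:nonroot
**Symptom**: the container startup log is flooded with `write error: can't open new logfile: open /app/data/logs/sub2api.log: permission denied`
**Root cause**: the `gcr.io/distroless/static-debian12:nonroot` process runs as UID=65532, while the host directory `/opt/sub2api/app-data/logs` is owned by root.
**Fix**: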
```bash
ssh vps "chown -R 65532:65532 /opt/sub2api/app-data"
```
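A one-time fix. CI never touches app-data, so it does not recur.
---
### Pitfall 3: a newly created OAuth account is "unschedulable" right away
**Symptom**: the refresh_token is pasted, verification passes, and the UI shows "normal", but curl `/responses` returns `503 "Service temporarily unavailable"` and the log shows `openai.account_select_failed: no available OpenAI accounts`
**Root cause (three factors stacked)**:
1. The UI defaults to `schedulable=false`
2. The wizard/backend writes the **short-lived access_token expiry** (about 7 minutes out) into `expires_at`, and `auto_pause_on_expired=true` is on by default
3. Every sub2api restart → `[AccountExpiry] Auto paused 1 expired accounts` → schedulable is reset to false

**Fix** (SQL or UI both work):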
```sql
UPDATE accounts
SET schedulable=true, expires_at=NULL, auto_pause_on_expired=false
WHERE id=<n>;
```
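**Saved to memory**: `feedback_sub2api_account_pitfalls.md`. Long-term fix: change the account creation logic in the fork so that an OAuth account's `expires_at` refers to the refresh_token expiry or the subscription end date, not the access_token expiry.
---
### Pitfall 4: `run_mode: simple` hides the SaaS menus
**Symptom**: after temporarily switching to `run_mode: simple` to bypass `INSUFFICIENT_BALANCE`, logging back into the admin console shows that **User management / Group management / Channel management / Subscription management / Redeem codes / Promo codes** have all disappeared.
**Root cause**: this is **by design**, not a bug. `simple` mode targets "single-user / internal team tool" use and deliberately hides the SaaS admin panels; in the frontend `stores/auth.ts`, `isSimpleMode = computed(() => runMode === 'simple')` controls route visibility.
**Fix**:
- If you need to manage groups / subscriptions / billing / redeem codes → keep `run_mode: standard`
- Alternative for `INSUFFICIENT_BALANCE`: give the admin a large balance (see Pitfall 5)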
---
### Pitfall 5: `standard` mode + admin balance=0 → 403
**Symptom**: after switching back to `standard`, curl `/responses` returns `INSUFFICIENT_BALANCE`.
**Root cause**: the middleware at `api_key_auth.go:198` checks `apiKey.User.Balance <= 0` → 403. The admin's default balance is 0.
**Fix**:
```sql
UPDATE users SET balance = 1000000000 WHERE id=1;
```
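One billion lasts a long time. After integrating iShare, switch to iShare subscription mode (sub2api marks it as "subscription mode", which enforces subscription quotas instead of checking the balance).
---
### Pitfall 6: stale Redis L2 cache
**Symptom**: after running the SQL from Pitfall 5 and then `docker restart sub2api`, curl still reports `INSUFFICIENT_BALANCE`.
**Root cause**: Sub2API's API key authentication has a two-level cache:
- L1: in-process LRU, TTL 15s
- L2: Redis, key `sub2api:apikey:*`, **TTL 300s**

`docker restart` only clears L1. L2 lives in a separate Redis container and keeps the old `balance=0` cache entry.
**Fix**: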
```bash
ssh vps "docker exec sub2api-redis redis-cli FLUSHDB && docker restart sub2api"
```
**Rule of thumb**: after changing user / api_key / account rows in the DB, restart sub2api **and** clear Redis. Restarting sub2api alone is not enough.
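
`FLUSHDB` wipes everything in the Redis database. If only the API key cache needs to go, a narrower variant could delete just the `sub2api:apikey:*` keys named above. This is a sketch under the assumption that no other cache keys hold the stale data, which these notes do not confirm; `clear_apikey_cache` is a hypothetical helper, meant to be run on the VPS.

```bash
# Hypothetical helper: delete only the API-key cache entries inside the
# Redis container, leaving other keys untouched.
# Assumption: the stale data lives only under sub2api:apikey:*.
clear_apikey_cache() {
  local container=${1:-sub2api-redis}
  docker exec "$container" sh -c \
    "redis-cli --scan --pattern 'sub2api:apikey:*' | xargs -r redis-cli DEL"
}
```

As with the full flush, follow it with `docker restart sub2api` so the in-process L1 cache is dropped too.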
---
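### Pitfall 7: `/setup/*` routes are registered on demand
**Symptom**: config.yaml was initially written by hand to get the container running. Later, when trying to reconfigure through the wizard, the frontend reached the `/setup` page, but clicking "Test connection" after filling in the form → `Request failed with status code 404`.
**Root cause**: in `cmd/server/main.go`, only the `NeedsSetup() == true` branch (which runs `runSetupServer()`) calls `setup.RegisterRoutes(r)`. When config.yaml already exists, `runMainServer()` runs instead and the setup routes are not registered → frontend `/setup/test-db` → 404.
**Fix**: make `NeedsSetup()` return true, meaning neither `/app/data/config.yaml` **nor** `/app/data/.installed` exists. Concretely: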
```bash
ssh vps "cd /opt/sub2api && mv app-data/config.yaml app-data/config.yaml.bak; \
mv app-data/.installed app-data/.installed.bak 2>/dev/null; \
docker restart sub2api"
```
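
Before restarting, the condition described above can be checked from the host. The helper below is a shell approximation of that condition only; the real `NeedsSetup()` lives in the Go code and may check more than this. `needs_setup` is a hypothetical name.

```bash
# Hypothetical helper: succeeds when neither config.yaml nor .installed
# exists in the given data directory, mirroring the condition described
# for NeedsSetup(). It does not call or inspect the Go code.
needs_setup() {
  local data_dir=$1
  [ ! -e "$data_dir/config.yaml" ] && [ ! -e "$data_dir/.installed" ]
}
```

On the VPS: `needs_setup /opt/sub2api/app-data && echo "wizard mode expected on next start"`.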
---
## 5. Current state snapshot (2026-04-19)
```
Domain https://ai.puro.im (HTTP/2, Let's Encrypt via Caddy)
VPS 217.216.32.230 (Singapore, Ubuntu 22.04)
Image sub2api:local (Drone CI-built, distroless + static Go binary)
Containers on VPS:
sub2api ← rebuilt by Drone on every push
sub2api-pg postgres:15 (unrelated to devops-net, on its own sub2api-net)
sub2api-redis redis:7-alpine
Accounts:
admin@puro.im (id=1, role=admin, balance=1e9)
test_myopenai (OpenAI OAuth, group=test_codex, schedulable=true)
API keys:
sk-d2132de2f0b4c1ab64ef7241a16d254cab483f1f8afd47ad4a89e39cf6e2345a
(user=admin, group=test_codex)
Config:
run_mode: standard
server.port: 8080 (inside the container)
database.host: postgres (compose DNS)
redis.host: redis (compose DNS)
CI:
Drone job trigger = git push gitea main
Gitea webhook → Drone webhook (active)
Latest build: commit 9c34e961 ✓ (frontend ~90s + backend ~60s + deploy ~20s)
```
---
## 6. Common operations commands
### Day-to-day
```bash
# Status
ssh vps "cd /opt/sub2api && docker compose ps"
# Logs
ssh vps "docker logs sub2api --tail 100 -f"
ssh vps "tail -f /var/log/caddy/ai.puro.im.log"
# Force-restart sub2api (PG/Redis untouched)
ssh vps "docker restart sub2api"
# Reload Caddy
ssh vps "systemctl reload caddy"
# Clear the Redis cache (needed after changing the DB)
ssh vps "docker exec sub2api-redis redis-cli FLUSHDB"
```
### Trigger a CI rebuild
```bash
git commit --allow-empty -m "ci: rebuild" && git push gitea main
# Follow build progress (optional)
ssh vps "docker ps --filter 'name=^drone-'"
```
### Data backup
```bash
# PG dump
ssh vps "docker exec sub2api-pg pg_dump -U postgres sub2api" \
> ~/backups/sub2api-$(date +%Y%m%d).sql
# The whole /opt/sub2api/ directory
ssh vps "tar -czf /tmp/sub2api-full-$(date +%Y%m%d).tgz -C /opt sub2api"
scp vps:/tmp/sub2api-full-*.tgz ~/backups/
```
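
A dump streamed over SSH can be cut short without an obvious error, so a quick completeness check on the local file may be worthwhile. The sketch below relies on plain-format `pg_dump` output ending with a "PostgreSQL database dump complete" comment; `dump_looks_complete` is a hypothetical helper and is no substitute for a test restore.

```bash
# Hypothetical helper: succeeds if the dump file is non-empty and its tail
# contains pg_dump's closing comment.
dump_looks_complete() {
  [ -s "$1" ] && tail -n 20 "$1" | grep -q 'PostgreSQL database dump complete'
}
```

For example: `dump_looks_complete ~/backups/sub2api-$(date +%Y%m%d).sql || echo "dump looks truncated" >&2`.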
### Inspect key DB information
```sql
-- Account pool
SELECT id, name, platform, type, status, schedulable, expires_at FROM accounts;
-- API keys
SELECT id, user_id, name, status, group_id, quota, quota_used FROM api_keys;
-- Users + balances
SELECT id, email, role, balance FROM users;
-- Usage over the last 24 hours
SELECT model_name, SUM(total_tokens), SUM(total_cost)
FROM usage_logs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY model_name
ORDER BY 2 DESC;
```
---
## 7. Cleanup / starting from scratch
```bash
# Stop + remove all containers
ssh vps "cd /opt/sub2api && docker compose down"
# Wipe data (permanently deletes wizard data, users, and the account pool!)
ssh vps "rm -rf /opt/sub2api/{app-data,pg-data,redis-data}"
ssh vps "mkdir -p /opt/sub2api/{app-data,pg-data,redis-data}"
ssh vps "chown -R 65532:65532 /opt/sub2api/app-data"
# Bring it back up (goes through the wizard)
ssh vps "cd /opt/sub2api && docker compose up -d"
# Caddy stays as is, nothing to do
```
**"Pure rebuild" without wiping data** (equivalent to one CI run):
```bash
git commit --allow-empty -m "ci: rebuild" && git push gitea main
```
---
## References
- Local development: `LOCAL_SETUP_NOTES.md`
- CI configuration: `.drone.yml`, `.ci/Dockerfile`, `.ci/README.md`
- Upstream: https://github.com/Wei-Shaw/sub2api
- Fork: https://git.puro.im/purovps/sub2api