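feat: optimize the async thread pool configuration by raising the core and max thread counts of the device-detection and general task thread pools to improve concurrent processing; update the database connection pool configuration to improve connection management and performance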
306
DEPLOYMENT_CHECKLIST.md
Normal file
@@ -0,0 +1,306 @@
# Concurrency Optimization Deployment Checklist

## 📋 Pre-deployment checks

### 1. Confirm code changes

- [x] `application.yml` - Tomcat thread pool configuration
- [x] `application.yml` - HikariCP connection pool configuration
- [x] `application.yml` - Transaction timeout configuration
- [x] `application.yml` - Monitoring endpoint configuration
- [x] `ScriptClient.java` - HTTP connection pool configuration
- [x] `AsyncConfig.java` - Async thread pool configuration

### 2. Dependency check

Make sure `pom.xml` contains the required dependencies:

```xml
<dependency>
    <groupId>io.projectreactor.netty</groupId>
    <artifactId>reactor-netty</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```

### 3. Compile check

```bash
mvn clean compile
# Make sure there are no compilation errors
```

---

## 🚀 Deployment steps

### 1. Back up the current version

```bash
# Back up the configuration file
cp src/main/resources/application.yml application.yml.backup

# Back up the current JAR
cp target/gameplatform-server-*.jar gameplatform-server-backup.jar
```

### 2. Build the new version

```bash
mvn clean package -DskipTests
```

### 3. Stop the old service

```bash
# Find the process
ps aux | grep gameplatform-server

# Graceful stop (if graceful shutdown is configured)
kill -TERM <PID>

# Or force stop
kill -9 <PID>
```
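
The two `kill` commands above can be combined so that the forced stop is used only when the graceful one does not finish in time. A minimal sketch, assuming a hypothetical helper name `stop_gracefully` and a 30-second default grace period (neither comes from the project):

```bash
# Hypothetical helper: send SIGTERM, wait up to <timeout> seconds, then send SIGKILL.
# Returns 0 if the process exited on its own (or could not be signalled at all),
# 1 if it had to be killed.
stop_gracefully() {
  local pid=$1
  local timeout=${2:-30}   # assumed default grace period, in seconds
  local waited=0

  # kill fails if no such process exists or we may not signal it;
  # either way there is nothing more to do here.
  kill -TERM "$pid" 2>/dev/null || return 0

  # kill -0 delivers no signal; it only reports whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A call such as `stop_gracefully <PID> 30 || echo "had to force-stop"` keeps `kill -9` as a last resort, so in-flight requests get a chance to finish.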

### 4. Start the new service

```bash
# JVM options must come before -jar; anything after the JAR path is passed to the application instead
java -Xms2g -Xmx4g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -jar target/gameplatform-server-*.jar \
  > logs/app.log 2>&1 &
```

### 5. Check the startup log

```bash
tail -f logs/app.log
```

**Expected log lines** (shown verbatim; the application's own messages are in Chinese and report, in order, that ScriptClient has been initialized with its maximum connection count, that the device-detection thread pool has been initialized, and that the general async task thread pool has been initialized):

```
ScriptClient 初始化完成: baseUrl=..., 最大连接数=100
设备检测线程池已初始化: coreSize=10, maxSize=50, queueCapacity=500
通用异步任务线程池已初始化: coreSize=20, maxSize=100, queueCapacity=1000
HikariPool-1 - Starting...
HikariPool-1 - Start completed.
Tomcat started on port(s): 18080
Started GamePlatformServerApplication in X.XX seconds
```
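
Rather than watching `tail -f` by eye, the log can be checked for the expected lines once startup has finished. A rough sketch, assuming a hypothetical helper name `missing_markers`; the marker strings in the example call are copied from the expected log above:

```bash
# Hypothetical helper: print every marker string that does NOT occur in the log file.
# Returns 0 when all markers were found, 1 otherwise.
missing_markers() {
  local log=$1
  shift
  local marker status=0
  for marker in "$@"; do
    # -F: match the marker literally; -q: no output, exit status only
    if ! grep -Fq -- "$marker" "$log"; then
      printf 'missing: %s\n' "$marker"
      status=1
    fi
  done
  return "$status"
}

# Example call (commented out so the snippet has no side effects):
# missing_markers logs/app.log \
#   "HikariPool-1 - Start completed." \
#   "Started GamePlatformServerApplication"
```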

---

## ✅ Post-deployment verification

### 1. Health check

```bash
# Check that the service is up
curl http://localhost:18080/actuator/health

# Expected output:
# {"status":"UP"}
```
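
Because the service needs some time to start, a single `curl` right after launch may fail even when the deployment is fine. The sketch below polls the health endpoint until it responds. The helper name `wait_for_health`, the attempt count, and the delay are illustrative assumptions. It relies on `curl -f`, which exits non-zero on HTTP error responses (Spring Boot Actuator answers 503 by default when the status is `DOWN`).

```bash
# Hypothetical helper: poll a health URL until it answers successfully
# or the attempts run out. Returns 0 on success, 1 on giving up.
wait_for_health() {
  local url=$1
  local attempts=${2:-30}   # assumed default: 30 tries
  local delay=${3:-2}       # assumed default: 2 seconds between tries
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: no progress bar, -S: still show errors
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_health http://localhost:18080/actuator/health 30 2 && echo "service is UP"` waits up to roughly a minute before giving up.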

### 2. Functional verification

```bash
# Test a key endpoint
curl -X POST http://localhost:18080/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"test","password":"test"}'

# Test the device status endpoint
curl http://localhost:18080/api/devices/status
```

### 3. Check monitoring metrics

```bash
# The metric names available on this instance are listed at /actuator/metrics

# Check the database connection pool
curl http://localhost:18080/actuator/metrics/hikaricp.connections.active
curl http://localhost:18080/actuator/metrics/hikaricp.connections

# Check Tomcat threads
curl http://localhost:18080/actuator/metrics/tomcat.threads.busy
curl http://localhost:18080/actuator/metrics/tomcat.threads.current

# Check HTTP client metrics
curl http://localhost:18080/actuator/metrics/reactor.netty.connection.provider
```

### 4. Performance baseline test

```bash
# Simple load test: 100 concurrent connections, stopping after 10,000 requests
# or 30 seconds, whichever comes first.
# -t must come before -n, because -t implies -n 50000 and would otherwise override it.
ab -t 30 -n 10000 -c 100 http://localhost:18080/api/health

# What to look for:
# - Success rate should be > 99%
# - Average response time should be < 100ms
# - No connection errors
```
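
The success-rate criterion listed above can be computed from the `ab` report instead of being read off by eye. A small sketch, assuming the report is piped in or saved to a file, and that a failed request is whatever `ab` counts under `Failed requests`; the function name `ab_success_rate` is made up for illustration:

```bash
# Hypothetical helper: read an ab report on stdin and print the success rate in percent.
# Exits non-zero if the report contains no completed requests.
ab_success_rate() {
  awk -F: '
    /^Complete requests/ { complete = $2 + 0 }
    /^Failed requests/   { failed   = $2 + 0 }
    END {
      if (complete == 0) exit 1
      printf "%.2f\n", (complete - failed) * 100 / complete
    }'
}

# Example call (commented out so the snippet does not generate load by itself):
# ab -t 30 -n 10000 -c 100 http://localhost:18080/api/health | ab_success_rate
```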

---

## 📊 Monitoring (first 24 hours)

### 1. Key metrics

| Metric | Normal range | Alert threshold |
|--------|--------------|-----------------|
| CPU usage | < 60% | > 80% |
| Memory usage | < 70% | > 85% |
| Database connections | < 80 | > 90 |
| Tomcat threads | < 150 | > 180 |
| P95 response time | < 500ms | > 1000ms |
| Error rate | < 0.1% | > 1% |
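
Each row of the table above leaves a band between the normal range and the alert threshold. One way to turn a row into a scripted check is sketched below. The function name `classify_metric` and the label `WATCH` for the in-between band are assumptions, not project conventions.

```bash
# Hypothetical helper: classify a metric value against one table row.
#   $1 = measured value, $2 = upper bound of the normal range, $3 = alert threshold
# Prints OK (below the normal bound), ALERT (above the alert threshold),
# or WATCH (anything in between, including values equal to either bound).
classify_metric() {
  awk -v v="$1" -v normal="$2" -v alert="$3" 'BEGIN {
    if (v + 0 > alert + 0)       print "ALERT"
    else if (v + 0 < normal + 0) print "OK"
    else                         print "WATCH"
  }'
}
```

For instance, `classify_metric 70 60 80` evaluates a CPU usage of 70% against the first row and prints `WATCH`. Units are left out of the arguments, so percentages and milliseconds are passed as plain numbers.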

### 2. Log monitoring

Watch for the following errors:

```bash
# Connection pool exhausted
grep "Connection is not available" logs/app.log

# Thread pool rejections
grep "Task rejected" logs/app.log

# Timeout errors
grep "timeout" logs/app.log

# Database lock waits
grep "Lock wait timeout" logs/app.log
```
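
During the observation window it is often more convenient to see one count per pattern than four separate `grep` outputs. A minimal sketch using the same four patterns; the helper name `scan_log` and the `count pattern` output format are assumptions:

```bash
# Hypothetical helper: count the log lines matching each error pattern from the checklist.
# Prints one "<count> <pattern>" line per pattern.
scan_log() {
  local log=$1
  local pattern count
  for pattern in \
    "Connection is not available" \
    "Task rejected" \
    "timeout" \
    "Lock wait timeout"; do
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    count=$(grep -c -F -- "$pattern" "$log" || true)
    printf '%s %s\n' "$count" "$pattern"
  done
}
```

Calling `scan_log logs/app.log` prints four lines. Note that every `Lock wait timeout` line is also counted under `timeout`, because the latter is a substring of the former.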

### 3. Business metrics

- Link generation success rate
- Device allocation success rate
- Zone selection success rate
- Average processing time

---

## 🔧 Troubleshooting common problems

### Problem 1: Startup fails because the port is in use

```bash
# Find the process occupying the port
lsof -i :18080
netstat -tulpn | grep 18080

# Stop that process
kill -9 <PID>
```

### Problem 2: Out of memory

```bash
# Check available memory
free -m

# Adjust the JVM options (they must come before -jar)
java -Xms1g -Xmx2g -jar app.jar  # lower memory settings
```

### Problem 3: Database connection fails

```bash
# Test the database connection
mysql -h 192.140.164.137 -u login_task_db -p

# Check that the port is reachable through the firewall
telnet 192.140.164.137 3306
```

### Problem 4: Performance is below expectations

**Troubleshooting steps**:
1. Check the JVM GC log
2. Review a thread dump
3. Check for slow database queries
4. Check the response times of external APIs

```bash
# Export a thread dump
curl http://localhost:18080/actuator/threaddump > threaddump.json

# Check GC activity
jstat -gc <PID> 1000 10
```

---

## 🔄 Rollback procedure

If a serious problem occurs and a rollback is needed:

### 1. Quick rollback

```bash
# Stop the new service
kill -TERM <PID>

# Start the backup version
java -jar gameplatform-server-backup.jar > logs/app.log 2>&1 &
```

### 2. Full rollback

```bash
# Restore the code (assumes the optimization commit is HEAD)
git checkout HEAD~1 -- src/main/resources/application.yml
git checkout HEAD~1 -- src/main/java/com/gameplatform/server/service/external/ScriptClient.java
git checkout HEAD~1 -- src/main/java/com/gameplatform/server/config/AsyncConfig.java

# Rebuild
mvn clean package -DskipTests

# Deploy the old version
# ... redeploy by following the deployment steps
```

---

## 📈 Performance comparison record

Record performance data after deployment for comparison:

### Baseline before optimization (recorded on: ____)
- Throughput: ___ req/s
- Average response time: ___ ms
- P95 response time: ___ ms
- Error rate: ___%

### Metrics after optimization (recorded on: ____)
- Throughput: ___ req/s
- Average response time: ___ ms
- P95 response time: ___ ms
- Error rate: ___%

### Improvement
- Throughput increase: ___%
- Response time reduction: ___%
- Error rate change: ___%
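
The improvement percentages in the list above are relative changes against the pre-optimization baseline. A small sketch of that arithmetic; the helper name `pct_change` is an assumption, and the numbers in the example are invented placeholders, not measurements from this project:

```bash
# Hypothetical helper: percentage change from a "before" value to an "after" value.
# Positive means the value went up, negative means it went down.
pct_change() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before + 0 == 0) exit 1          # undefined when the baseline is zero
    printf "%.1f\n", (after - before) * 100 / before
  }'
}

# Examples with made-up numbers:
# pct_change 200 300   # throughput 200 -> 300 req/s, prints 50.0
# pct_change 120 90    # average response time 120 -> 90 ms, prints -25.0
```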

---

## ✅ Final confirmation

After deployment, confirm each of the following:

- [ ] Service started normally
- [ ] Health check passes
- [ ] Key endpoints work correctly
- [ ] Monitoring metrics are normal
- [ ] No severe errors in the logs
- [ ] Load test passed
- [ ] Business functions verified
- [ ] Monitoring alerts configured
- [ ] Performance baseline data recorded
- [ ] Relevant people notified

---

## 📞 Contacts

If you run into problems, contact:
- Technical lead: ___________
- Operations lead: ___________
- Emergency phone: ___________

---

**Deployment date**: ___________
**Deployed by**: ___________
**Reviewed by**: ___________