[BUG]heartbeat timeout related repair #3404

@ylinzhu

Description

1. There are two heartbeat instances for the same connection

arthas instruction
[arthas@566784]$ watch *MySQLHeartbeat heartbeat "{params,target.source.dbGroupConfig,target.isStop,target.hashcode}" "target.source.config.instanceName=='hostM8'" -x 1 -b
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 150 ms, listenerId: 29
method=com.actiontech.dble.backend.heartbeat.MySQLHeartbeat.heartbeat location=AtEnter
ts=2021-09-10 10:54:07; [cost=3.71E-4ms] result=@ArrayList[
    @Object[][isEmpty=true;size=0],
    @DbGroupConfig[com.actiontech.dble.config.model.db.DbGroupConfig@560f3ad8],
    @Boolean[false],
    @Integer[149924462],
]
method=com.actiontech.dble.backend.heartbeat.MySQLHeartbeat.heartbeat location=AtEnter
ts=2021-09-10 10:54:15; [cost=0.002031ms] result=@ArrayList[
    @Object[][isEmpty=true;size=0],
    @DbGroupConfig[com.actiontech.dble.config.model.db.DbGroupConfig@1a263566],
    @Boolean[true],
    @Integer[68519983],
]
method=com.actiontech.dble.backend.heartbeat.MySQLHeartbeat.heartbeat location=AtEnter
ts=2021-09-10 10:54:17; [cost=2.35E-4ms] result=@ArrayList[
    @Object[][isEmpty=true;size=0],
    @DbGroupConfig[com.actiontech.dble.config.model.db.DbGroupConfig@560f3ad8],
    @Boolean[false],
    @Integer[149924462],
]
method=com.actiontech.dble.backend.heartbeat.MySQLHeartbeat.heartbeat location=AtEnter
ts=2021-09-10 10:54:25; [cost=0.001981ms] result=@ArrayList[
    @Object[][isEmpty=true;size=0],
    @DbGroupConfig[com.actiontech.dble.config.model.db.DbGroupConfig@1a263566],
    @Boolean[true],
    @Integer[68519983],
]
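
In the output above, two different hash codes (149924462 and 68519983) enter heartbeat() for the same instance hostM8, each about every 10 seconds, and the second one reports isStop=true, so an instance that was already stopped is still being scheduled. The following is only a minimal sketch of one way to guard against that, not dble's actual code: the class StoppableHeartbeat and all of its members are hypothetical names. It assumes the heartbeat is driven by a ScheduledExecutorService; stop() both sets the flag and cancels the scheduled task, and heartbeat() returns early if the flag is set.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration; it is not part of dble.
public class StoppableHeartbeat {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final AtomicInteger beats = new AtomicInteger();
    private volatile ScheduledFuture<?> future;

    // Schedule the periodic heartbeat and remember the handle so it can be cancelled.
    public void start(ScheduledExecutorService scheduler, long periodMillis) {
        future = scheduler.scheduleAtFixedRate(this::heartbeat, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // A stopped instance does nothing, even if a stale schedule still calls it.
    public void heartbeat() {
        if (stopped.get()) {
            return;
        }
        beats.incrementAndGet(); // placeholder for sending the real heartbeat SQL
    }

    // Set the flag first, then cancel the periodic task so it is not run again.
    public void stop() {
        stopped.set(true);
        ScheduledFuture<?> f = future;
        if (f != null) {
            f.cancel(false);
        }
    }

    public int beats() {
        return beats.get();
    }
}
```

With a guard like this, a stale instance left over from an earlier configuration would stop producing heartbeat calls once stop() has been invoked on it.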

2. The heartbeat thread hangs, and the heartbeat causes a memory leak

jstack instructions
"complexQueryExecutor12" daemon prio=5 tid=96 WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
       Local Variable: java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#45
       Local Variable: java.util.concurrent.locks.AbstractQueuedSynchronizer$Node#14
    at com.actiontech.dble.config.helper.GetAndSyncDbInstanceKeyVariables.call(GetAndSyncDbInstanceKeyVariables.java:69)
       Local Variable: java.lang.String[]#944
       Local Variable: java.lang.StringBuilder#717
       Local Variable: com.actiontech.dble.sqlengine.OneRawSQLQueryResultHandler#22
       Local Variable: com.actiontech.dble.sqlengine.OneTimeConnJob#2
    at com.actiontech.dble.backend.heartbeat.MySQLDetector.checkRecoverFail(MySQLDetector.java:141)
       Local Variable: com.actiontech.dble.config.helper.GetAndSyncDbInstanceKeyVariables#2
    at com.actiontech.dble.backend.heartbeat.MySQLDetector.setStatusBySlave(MySQLDetector.java:204)
    at com.actiontech.dble.backend.heartbeat.MySQLDetector.onResult(MySQLDetector.java:99)
       Local Variable: java.util.HashMap#587
       Local Variable: com.actiontech.dble.backend.mysql.nio.MySQLInstance#1
    at com.actiontech.dble.backend.heartbeat.MySQLDetector.onResult(MySQLDetector.java:30)
       Local Variable: com.actiontech.dble.backend.heartbeat.MySQLDetector#10
    at com.actiontech.dble.sqlengine.OneRawSQLQueryResultHandler.finished(OneRawSQLQueryResultHandler.java:89)
       Local Variable: com.actiontech.dble.sqlengine.SQLQueryResult#2
       Local Variable: com.actiontech.dble.sqlengine.OneRawSQLQueryResultHandler#10
    at com.actiontech.dble.backend.heartbeat.HeartbeatSQLJob.doFinished(HeartbeatSQLJob.java:79)
    at com.actiontech.dble.backend.heartbeat.HeartbeatSQLJob.rowEofResponse(HeartbeatSQLJob.java:129)
       Local Variable: com.actiontech.dble.backend.heartbeat.HeartbeatSQLJob#10
    at com.actiontech.dble.net.response.DefaultResponseHandler.handleRowEofPacket(DefaultResponseHandler.java:145)
    at com.actiontech.dble.net.response.DefaultResponseHandler.eof(DefaultResponseHandler.java:87)
       Local Variable: com.actiontech.dble.net.response.DefaultResponseHandler#34
    at com.actiontech.dble.services.BackendService.handleInnerData(BackendService.java:202)
    at com.actiontech.dble.net.service.AbstractService.consumeSingleTask(AbstractService.java:184)
       Local Variable: byte[]#9415
    at com.actiontech.dble.services.BackendService.handleTaskQueue(BackendService.java:169)
       Local Variable: com.actiontech.dble.net.service.NormalServiceTask#7537
    at com.actiontech.dble.services.BackendService.access$300(BackendService.java:48)
       Local Variable: com.actiontech.dble.net.executor.ThreadContext#40
       Local Variable: com.actiontech.dble.services.mysqlsharding.MySQLResponseService#33
    at com.actiontech.dble.services.BackendService$BackendOnetimeRunnable.run(BackendService.java:621)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       Local Variable: com.actiontech.dble.util.NameableExecutor#7
       Local Variable: com.actiontech.dble.services.BackendService$BackendOnetimeRunnable#33
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       Local Variable: java.util.concurrent.ThreadPoolExecutor$Worker#26
    at java.lang.Thread.run(Thread.java:745)
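
The trace shows the executor thread in state WAITING (not TIMED_WAITING), parked in ConditionObject.await from GetAndSyncDbInstanceKeyVariables.call, which was reached from the heartbeat result callback. That pattern suggests an untimed wait: if the signal never arrives, the worker thread never returns. The following is a minimal sketch of a bounded wait, not dble's actual fix: BoundedWait, signalDone, and awaitDone are hypothetical names, and the timeout value is an assumption the caller would have to choose.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper for illustration; it is not part of dble.
public class BoundedWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition finished = lock.newCondition();
    private boolean done;

    // Called by whoever completes the work, e.g. a query result handler.
    public void signalDone() {
        lock.lock();
        try {
            done = true;
            finished.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the work completed, false on timeout or interruption.
    public boolean awaitDone(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (!done) {
                if (nanos <= 0L) {
                    return false; // give up instead of parking forever
                }
                // awaitNanos returns the remaining time, which also handles spurious wakeups.
                nanos = finished.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A caller that gets false back can release the resources it holds and report a failure, rather than keeping the thread and everything it references alive indefinitely.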

3. The TCP half-open connection queue (SYN queue) is full, causing the heartbeat to time out.

Only the timeout is printed in the log; no other exception occurs.
