Background
After some amount of uptime, my node starts having channels to several peers deactivated, one after the other, each to be reactivated within a few minutes at most. At the same time, the node becomes unable to route HTLCs: bos bot stops notifying me of forwards, and one of my peers who tracks these things reports a spike in pending outgoing HTLCs from their node to mine whenever this happens. Those HTLCs slowly resolve themselves by failing.
Restarting lnd solves the issue, until next time this happens.
I haven't been able to form a solid hypothesis about why this happens, but here are all the details I can provide in case they give you ideas of your own.
I run the sqlite backend and increased the timeout to 10m to avoid SQLITE_BUSY errors. I don't remember this happening before, and I am 90% sure it started after I began connecting to peers other than my direct channel peers, in order to get gossip updates from the network faster. At that time I didn't know about active and passive sync peers, so the many peers I was connecting to were all passive. I later caught up and increased my active peers value, but none of this seems to have had any influence on the issue.
What I seemed to notice, other than seeing this problem arise after I increased the number of peers I connect to, is that the more peers I have, the sooner this happens. Using persistent connections or not doesn't appear to change anything.
I attached a log for one node, picked from among those my nodes detected disconnections to the last time this happened. I had raised the PEER log level to debug and zgrep'd the logs for its pubkey. I have since restored the info log level for everything.
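For anyone who wants to pull the same kind of per-peer excerpt, here is a minimal sketch of the zgrep step in Python. It assumes the log directory holds a mix of plain and gzip-compressed files; the function name `grep_logs` and its arguments are hypothetical and not part of lnd.

```python
import gzip
from pathlib import Path
from typing import Iterator


def grep_logs(log_dir: str, needle: str) -> Iterator[str]:
    """Yield every log line in log_dir that contains needle (e.g. a peer pubkey)."""
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # Compressed files are read through gzip, everything else as plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    yield line.rstrip("\n")
```

Calling `list(grep_logs("/path/to/logs", "<pubkey>"))` returns the matching lines in file-name order; both arguments are placeholders.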
I have disabled, for the time being, my script that connects to more peers, to be able to report what happens in the upcoming days.
Your environment
- version of lnd: 0.16.4
- which operating system (uname -a on *Nix): 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux
- version of btcd, bitcoind, or other backend: 24.1
- any other relevant environment details: sqlite backend, 12-core Xeon with 64GB of ECC RAM and 6-ssd zpool mirror pool
Steps to reproduce
Run the sqlite backend (no idea if this is necessary), have an active routing node with 40-something channels, and connect to many peers (above 300 to hit the problem faster) with lncli connect <pubkey>@<address>:<port>
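The bulk connection step could be sketched roughly as follows. This is an illustration only, not the script used on the affected node: it assumes a list of entries shaped like `<pubkey>@<address>:<port>`, and the helper names (`parse_peers`, `connect_command`, `connect_all`) are hypothetical. The only lncli call it relies on is `lncli connect`, and it defaults to a dry run that just prints the commands.

```python
import subprocess
from typing import Dict, Iterable, List


def parse_peers(lines: Iterable[str]) -> List[str]:
    """Keep entries shaped like <pubkey>@<address>:<port>; skip blanks and comments."""
    peers = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        pubkey, sep, addr = entry.partition("@")
        # A compressed node pubkey is 33 bytes, i.e. 66 hex characters.
        if sep and len(pubkey) == 66 and ":" in addr:
            peers.append(entry)
    return peers


def connect_command(peer: str) -> List[str]:
    """Build the argument list for a single lncli connect call."""
    return ["lncli", "connect", peer]


def connect_all(peers: Iterable[str], dry_run: bool = True) -> Dict[str, int]:
    """Run (or, in dry-run mode, only print) one lncli connect per peer.

    Returns a mapping of peer entry to process exit code (0 in dry-run mode).
    """
    results = {}
    for peer in peers:
        cmd = connect_command(peer)
        if dry_run:
            print(" ".join(cmd))
            results[peer] = 0
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[peer] = proc.returncode
    return results
```

With a hypothetical `peers.txt` holding one entry per line, `connect_all(parse_peers(open("peers.txt")), dry_run=False)` would attempt every connection in turn; a non-zero exit code marks a peer that lncli could not connect to.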
Expected behaviour
lnd continues operating normally, managing forwards like a champ
Actual behaviour
Channels are disconnected at random, and HTLCs are not being processed.