From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20180524154623.GA19254@redhat.com>
References: <20180524154623.GA19254@redhat.com>
From: Damon Wang
Date: Fri, 25 May 2018 00:50:27 +0800
Subject: Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: David Teigland
Cc: LVM general discussion and development

Thank you for your reply!

I'll try sanlock-3.6.0 first (I'm currently using 3.5.0) and see whether
it happens again.

Damon

2018-05-24 23:46 GMT+08:00 David Teigland :
> On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
>> Hi all,
>>
>> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually
>> intensive operations), it shows vglock is failed:
>
> Hi, thanks for this report.
>
>> /var/log/messages:
>>
>> May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
>> [1112]: r627 paxos_release 8255 other lver 8258
>
> I believe this is the sanlock bug that was fixed here:
> https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master
>
> By itself, the bug isn't a big problem, the lock was released but sanlock
> returns an error. The bigger problem is that lvmlockd then believes that
> the lock was not released:
>
>> 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
>> unlock_san release error -1
>
> so subsequent requests for the lock get backed up in lvmlockd:
>
>> [root@dev1 ~]# lvmlockctl -i
>> LW VG sh ver 0 pid 34216 (lvchange)
>> LW VG sh ver 0 pid 75685 (lvs)
>> LW VG sh ver 0 pid 83741 (lvdisplay)
>> LW VG sh ver 0 pid 90569 (lvchange)
>> LW VG sh ver 0 pid 92735 (lvchange)
>> LW VG sh ver 0 pid 99982 (lvs)
>> LW VG sh ver 0 pid 14069 (lvchange)
>
>> My questions are:
>>
>> 1. why VGLK failed, is it because network failure(cause iSCSI fail and
>> sanlock could not find VGLK volume), can I find a direct proof?
>
> I believe the bug. Failures of the storage network can also cause similar
> issues, but you would see error messages related to i/o timeouts.
>
>> 2. Is it recoverable? I have tried kill all hung commands but new
>> command still hung forever.
>
> There are recently added options for this kind of situation, but I don't
> believe there is an lvm release with those yet.
>
> If you are prepared to build your own version of lvm, build lvm release
> 2.02.178 (which should be ready shortly, if it's not, take git master
> branch). Be sure to configure with --enable-lvmlockd-sanlock. Then try:
>
>   lvchange -an --lockopt skipvg
>   lvmlockctl --drop
>   stop lvmlockd, stop sanlock
>   restart everything as usual
>
> If that doesn't work, or if you don't want to build lvm, then unmount file
> systems, kill lvmlockd, kill sanlock, you might need to do some dm cleanup
> if LVs were active (or perhaps just reboot the machine.) Restart
> everything as usual.
>
> Dave
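
For reference, the requests backed up behind the stuck VGLK can be tallied
from the `lvmlockctl -i` output quoted above with a few lines of Python.
This is only a sketch: it assumes the waiter lines look exactly like the
ones in this thread (`LW VG sh ver 0 pid N (command)`), and the names
`WAITER_RE`, `parse_waiters` and `SAMPLE` are made up for illustration,
they are not part of lvm. Other lvm versions may print these lines
differently.

```python
import re
from collections import Counter

# Assumed line shape, copied from the output quoted in this thread:
#   LW VG sh ver 0 pid 34216 (lvchange)
# The field names below are illustrative labels, not official lvm terms.
WAITER_RE = re.compile(
    r"^\s*LW\s+(?P<rtype>\S+)\s+(?P<mode>\S+)\s+ver\s+(?P<ver>\d+)"
    r"\s+pid\s+(?P<pid>\d+)\s+\((?P<cmd>[^)]*)\)\s*$"
)


def parse_waiters(text):
    """Return one dict per line that matches the assumed waiter format."""
    waiters = []
    for line in text.splitlines():
        m = WAITER_RE.match(line)
        if m:
            waiters.append({
                "rtype": m.group("rtype"),
                "mode": m.group("mode"),
                "ver": int(m.group("ver")),
                "pid": int(m.group("pid")),
                "cmd": m.group("cmd"),
            })
    return waiters


# The waiter lines exactly as they appear in the original report.
SAMPLE = """\
LW VG sh ver 0 pid 34216 (lvchange)
LW VG sh ver 0 pid 75685 (lvs)
LW VG sh ver 0 pid 83741 (lvdisplay)
LW VG sh ver 0 pid 90569 (lvchange)
LW VG sh ver 0 pid 92735 (lvchange)
LW VG sh ver 0 pid 99982 (lvs)
LW VG sh ver 0 pid 14069 (lvchange)
"""

if __name__ == "__main__":
    waiters = parse_waiters(SAMPLE)
    print(len(waiters), "requests queued")
    # Tally the queued requests by the command that issued them.
    for cmd, count in Counter(w["cmd"] for w in waiters).most_common():
        print(f"  {cmd}: {count}")
```

On a live system the text would come from running `lvmlockctl -i` rather
than from `SAMPLE`. A queue that keeps growing while nothing holds the lock
would be consistent with the situation Dave describes, where lvmlockd
believes the lock was never released.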