* [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
@ 2018-05-24 14:44 Damon Wang
  2018-05-24 15:46 ` David Teigland
  0 siblings, 1 reply; 4+ messages in thread

From: Damon Wang @ 2018-05-24 14:44 UTC (permalink / raw)
To: LVM general discussion and development

Hi all,

I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually during
intensive operations) the VG lock fails:

/var/log/messages:

May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1112]: r627 paxos_release 8255 other lver 8258
May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1112]: r627 release_token release leader -250
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -250
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -250
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -1
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -1
May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1111]: cmd_release 2,9,1061 no resource VGLK

Meanwhile, lvmlockctl -d shows:

1527167668 close lvchange[34042] cl 476 fd 9
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_lock rv 0 read vb 101 0 1761
1527167669 send lvcreate[34073] cl 478 lock vg rv 0
1527167669 recv lvcreate[34073] cl 478 find_free_lock vg "ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find free lock
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find_free_lock_san found unused area at 127926272
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find free lock 0 offset 127926272
1527167669 send lvcreate[34073] cl 478 find_free_lock vg rv 0
1527167669 recv lvcreate[34073] cl 478 init lv "ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
1527167669 work init_lv ff35ecc8217543e0a5be9cbe935ffc84/c0dda9c02a7347a098203fbe2a7df1d0 uuid Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 init_lv_san Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK found unused area at 127926272
1527167669 send lvcreate[34073] cl 478 init lv rv 0 vg_args 1.0.0:lvmlock lv_args 1.0.0:127926272
1527167669 recv lvcreate[34073] cl 478 update vg "ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK action update iv
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_update cl 478 lk version to 1762
1527167669 send lvcreate[34073] cl 478 update vg rv 0
1527167669 recv lvcreate[34073] cl 478 lock lv "ff35ecc8217543e0a5be9cbe935ffc84" mode un flags 1
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK action lock un
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK res_unlock cl 478 no locks
1527167669 send lvcreate[34073] cl 478 lock lv rv -2
1527167669 recv lvcreate[34073] cl 478 lock vg "ff35ecc8217543e0a5be9cbe935ffc84" mode un flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK action lock un
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock cl 478
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock r_version new 1762
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san ex r_version 1762 flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san set r_version 1762
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -250
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -250
1527167669 send lvcreate[34073] cl 478 lock vg rv -250
1527167669 close lvcreate[34073] cl 478 fd 11
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock cl 478 from close
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san ex r_version 0 flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -1
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -1
1527167670 new cl 479 pi 2 fd 9

After that, any lvm-related command hangs forever:

[root@dev1 ~]# lvmlockctl -i
VG ff35ecc8217543e0a5be9cbe935ffc84 lock_type=sanlock Nz4PmR-Bwfi-pItr-C3om-Ewnj-SICA-5rFuFg
LS sanlock lvm_ff35ecc8217543e0a5be9cbe935ffc84
LK VG ex ver 1762 pid 0 ()
LW VG sh ver 0 pid 34216 (lvchange)
LW VG sh ver 0 pid 75685 (lvs)
LW VG sh ver 0 pid 83741 (lvdisplay)
LW VG sh ver 0 pid 90569 (lvchange)
LW VG sh ver 0 pid 92735 (lvchange)
LW VG sh ver 0 pid 99982 (lvs)
LW VG sh ver 0 pid 14069 (lvchange)
LK LV sh onuPlt-YXI0-nYtv-CV8Q-yS93-6Zt9-QvlfyH
LK GL sh ver 0 pid 75685 (lvs)
LK GL sh ver 0 pid 99982 (lvs)
LK LV un XOgw6I-3Nuh-ejIP-ydPq-rr7g-LMjb-oklfup

My questions are:

1. Why did the VGLK operation fail? Is it because of a network failure
   (causing iSCSI to fail, so that sanlock could not reach the VGLK
   volume)? Can I find direct proof of that?

2. Is it recoverable? I have tried killing all the hung commands, but
   new commands still hang forever.

Thanks a lot,
Damon

^ permalink raw reply	[flat|nested] 4+ messages in thread
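[Editorial note: the `lvmlockctl -i` output above can be read mechanically:
`LK` lines are held locks and `LW` lines are waiters queued behind them. A
minimal sketch of tallying the waiters, not part of the original thread,
using a sample excerpted from the output above:]

```python
# Sketch: tally waiting commands ("LW" lines) per resource kind in
# `lvmlockctl -i` output. A healthy lockspace shows few or no waiters;
# a long LW queue behind one LK entry means the lock is stuck.
from collections import Counter

sample = """\
LK VG ex ver 1762 pid 0 ()
LW VG sh ver 0 pid 34216 (lvchange)
LW VG sh ver 0 pid 75685 (lvs)
LW VG sh ver 0 pid 83741 (lvdisplay)
LW VG sh ver 0 pid 90569 (lvchange)
LW VG sh ver 0 pid 92735 (lvchange)
LW VG sh ver 0 pid 99982 (lvs)
LW VG sh ver 0 pid 14069 (lvchange)
LK GL sh ver 0 pid 75685 (lvs)
"""

def count_waiters(text):
    """Return {resource kind: number of 'LW' (lock waiter) entries}."""
    waiters = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "LW":
            waiters[fields[1]] += 1   # fields[1] is VG, GL, or LV
    return dict(waiters)

print(count_waiters(sample))  # {'VG': 7}: seven commands queued behind VGLK
```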
* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
  2018-05-24 14:44 [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever Damon Wang
@ 2018-05-24 15:46 ` David Teigland
  2018-05-24 16:50   ` Damon Wang
  0 siblings, 1 reply; 4+ messages in thread

From: David Teigland @ 2018-05-24 15:46 UTC (permalink / raw)
To: Damon Wang; +Cc: linux-lvm

On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
> Hi all,
>
> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually
> intensive operations), it shows vglock is failed:

Hi, thanks for this report.

> /var/log/messages:
>
> May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
> [1112]: r627 paxos_release 8255 other lver 8258

I believe this is the sanlock bug that was fixed here:
https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master

By itself, the bug isn't a big problem: the lock was released, but
sanlock returns an error. The bigger problem is that lvmlockd then
believes that the lock was not released:

> 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
> unlock_san release error -1

so subsequent requests for the lock get backed up in lvmlockd:

> [root@dev1 ~]# lvmlockctl -i
> LW VG sh ver 0 pid 34216 (lvchange)
> LW VG sh ver 0 pid 75685 (lvs)
> LW VG sh ver 0 pid 83741 (lvdisplay)
> LW VG sh ver 0 pid 90569 (lvchange)
> LW VG sh ver 0 pid 92735 (lvchange)
> LW VG sh ver 0 pid 99982 (lvs)
> LW VG sh ver 0 pid 14069 (lvchange)

> My questions are:
>
> 1. why VGLK failed, is it because network failure(cause iSCSI fail and
> sanlock could not find VGLK volume), can I find a direct proof?

I believe it was the bug. Failures of the storage network can also
cause similar issues, but in that case you would see error messages
related to i/o timeouts.

> 2. Is it recoverable? I have tried kill all hung commands but new
> command still hung forever.

There are recently added options for this kind of situation, but I
don't believe there is an lvm release with them yet.

If you are prepared to build your own version of lvm, build lvm
release 2.02.178 (which should be ready shortly; if it's not, take the
git master branch). Be sure to configure with
--enable-lvmlockd-sanlock. Then try:

  lvchange -an --lockopt skipvg <vgname>
  lvmlockctl --drop <vgname>
  stop lvmlockd, stop sanlock
  restart everything as usual

If that doesn't work, or if you don't want to build lvm, then unmount
file systems, kill lvmlockd, kill sanlock; you might need to do some
dm cleanup if LVs were active (or perhaps just reboot the machine).
Restart everything as usual.

Dave
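[Editorial note: the recovery sequence above can be sketched as a script.
The systemd unit names (`lvmlockd`, `sanlock`), the final `vgchange
--lock-start` step, and the dry-run wrapper are assumptions for
illustration, not from the thread; by default the script only prints
what it would run, so the sequence can be reviewed before executing.]

```shell
#!/bin/sh
# Sketch of the recovery sequence described above. Assumes an lvm build
# with --lockopt support (>= 2.02.178, configured with
# --enable-lvmlockd-sanlock) and systemd units named "lvmlockd" and
# "sanlock" -- check your distribution's actual unit names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.

recover_vg() {
    vg="$1"
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $*"
        else
            "$@"
        fi
    }
    run lvchange -an --lockopt skipvg "$vg"   # deactivate LVs, bypassing the stuck VG lock
    run lvmlockctl --drop "$vg"               # drop the wedged lockspace state in lvmlockd
    run systemctl stop lvmlockd sanlock       # stop lvmlockd, then sanlock
    run systemctl start sanlock lvmlockd      # restart everything as usual
    run vgchange --lock-start "$vg"           # rejoin the VG's sanlock lockspace
}

recover_vg "vg_example"
```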
* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
  2018-05-24 15:46 ` David Teigland
@ 2018-05-24 16:50   ` Damon Wang
  2018-05-30 13:17     ` Damon Wang
  0 siblings, 1 reply; 4+ messages in thread

From: Damon Wang @ 2018-05-24 16:50 UTC (permalink / raw)
To: David Teigland; +Cc: LVM general discussion and development

Thank you for your reply!

I'll try sanlock 3.6.0 first (I'm currently using 3.5.0) and see
whether it happens again.

Damon

2018-05-24 23:46 GMT+08:00 David Teigland <teigland@redhat.com>:
> On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
>> Hi all,
>>
>> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually
>> intensive operations), it shows vglock is failed:
>
> Hi, thanks for this report.
>
>> /var/log/messages:
>>
>> May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
>> [1112]: r627 paxos_release 8255 other lver 8258
>
> I believe this is the sanlock bug that was fixed here:
> https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master
>
> By itself, the bug isn't a big problem, the lock was released but sanlock
> returns an error.  The bigger problem is that lvmlockd then believes that
> the lock was not released:
>
>> 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
>> unlock_san release error -1
>
> so subsequent requests for the lock get backed up in lvmlockd:
>
>> [root@dev1 ~]# lvmlockctl -i
>> LW VG sh ver 0 pid 34216 (lvchange)
>> LW VG sh ver 0 pid 75685 (lvs)
>> LW VG sh ver 0 pid 83741 (lvdisplay)
>> LW VG sh ver 0 pid 90569 (lvchange)
>> LW VG sh ver 0 pid 92735 (lvchange)
>> LW VG sh ver 0 pid 99982 (lvs)
>> LW VG sh ver 0 pid 14069 (lvchange)
>
>> My questions are:
>>
>> 1. why VGLK failed, is it because network failure(cause iSCSI fail and
>> sanlock could not find VGLK volume), can I find a direct proof?
>
> I believe the bug.  Failures of the storage network can also cause similar
> issues, but you would see error messages related to i/o timeouts.
>
>> 2. Is it recoverable? I have tried kill all hung commands but new
>> command still hung forever.
>
> There are recently added options for this kind of situation, but I don't
> believe there is an lvm release with those yet.
>
> If you are prepared to build your own version of lvm, build lvm release
> 2.02.178 (which should be ready shortly, if it's not, take git master
> branch).  Be sure to configure with --enable-lvmlockd-sanlock.  Then try:
>
> lvchange -an --lockopt skipvg <vgname>
> lvmlockctl --drop <vgname>
> stop lvmlockd, stop sanlock
> restart everything as usual
>
> If that doesn't work, or if you don't want to build lvm, then unmount file
> systems, kill lvmlockd, kill sanlock, you might need to do some dm cleanup
> if LVs were active (or perhaps just reboot the machine.)  Restart
> everything as usual.
>
> Dave
* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
  2018-05-24 16:50   ` Damon Wang
@ 2018-05-30 13:17     ` Damon Wang
  0 siblings, 0 replies; 4+ messages in thread

From: Damon Wang @ 2018-05-30 13:17 UTC (permalink / raw)
To: David Teigland; +Cc: LVM general discussion and development

After days of testing, I'm pretty sure the problem has been solved by
upgrading sanlock to 3.6.0. Thanks, Dave!

Damon

2018-05-25 0:50 GMT+08:00 Damon Wang <damon.devops@gmail.com>:
> Thank you for your reply!
>
> I'll try to sanlock-3.6.0 first (currently I'm using 3.5.0) and try
> whether it happen again
>
> Damon
>
> 2018-05-24 23:46 GMT+08:00 David Teigland <teigland@redhat.com>:
>> [...]
end of thread, other threads: [~2018-05-30 13:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 14:44 [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever Damon Wang
2018-05-24 15:46 ` David Teigland
2018-05-24 16:50   ` Damon Wang
2018-05-30 13:17     ` Damon Wang