* [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
From: Damon Wang @ 2018-05-24 14:44 UTC (permalink / raw)
To: LVM general discussion and development
Hi all,
I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually during
intensive operations) the VG lock fails:
/var/log/messages:
May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1112]: r627 paxos_release 8255 other lver 8258
May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1112]: r627 release_token release leader -250
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -250
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -250
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -1
May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -1
May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1111]: cmd_release 2,9,1061 no resource VGLK
Meanwhile, lvmlockctl -d shows:
1527167668 close lvchange[34042] cl 476 fd 9
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_lock rv 0 read vb 101 0 1761
1527167669 send lvcreate[34073] cl 478 lock vg rv 0
1527167669 recv lvcreate[34073] cl 478 find_free_lock vg "ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find free lock
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find_free_lock_san found unused area at 127926272
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find free lock 0 offset 127926272
1527167669 send lvcreate[34073] cl 478 find_free_lock vg rv 0
1527167669 recv lvcreate[34073] cl 478 init lv "ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
1527167669 work init_lv ff35ecc8217543e0a5be9cbe935ffc84/c0dda9c02a7347a098203fbe2a7df1d0 uuid Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 init_lv_san Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK found unused area at 127926272
1527167669 send lvcreate[34073] cl 478 init lv rv 0 vg_args 1.0.0:lvmlock lv_args 1.0.0:127926272
1527167669 recv lvcreate[34073] cl 478 update vg "ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK action update iv
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_update cl 478 lk version to 1762
1527167669 send lvcreate[34073] cl 478 update vg rv 0
1527167669 recv lvcreate[34073] cl 478 lock lv "ff35ecc8217543e0a5be9cbe935ffc84" mode un flags 1
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK action lock un
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK res_unlock cl 478 no locks
1527167669 send lvcreate[34073] cl 478 lock lv rv -2
1527167669 recv lvcreate[34073] cl 478 lock vg "ff35ecc8217543e0a5be9cbe935ffc84" mode un flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK action lock un
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock cl 478
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock r_version new 1762
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san ex r_version 1762 flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san set r_version 1762
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -250
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -250
1527167669 send lvcreate[34073] cl 478 lock vg rv -250
1527167669 close lvcreate[34073] cl 478 fd 11
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock cl 478 from close
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san ex r_version 0 flags 0
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -1
1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -1
1527167670 new cl 479 pi 2 fd 9
After that, any LVM-related command hangs forever:
[root@dev1 ~]# lvmlockctl -i
VG ff35ecc8217543e0a5be9cbe935ffc84 lock_type=sanlock
Nz4PmR-Bwfi-pItr-C3om-Ewnj-SICA-5rFuFg
LS sanlock lvm_ff35ecc8217543e0a5be9cbe935ffc84
LK VG ex ver 1762 pid 0 ()
LW VG sh ver 0 pid 34216 (lvchange)
LW VG sh ver 0 pid 75685 (lvs)
LW VG sh ver 0 pid 83741 (lvdisplay)
LW VG sh ver 0 pid 90569 (lvchange)
LW VG sh ver 0 pid 92735 (lvchange)
LW VG sh ver 0 pid 99982 (lvs)
LW VG sh ver 0 pid 14069 (lvchange)
LK LV sh onuPlt-YXI0-nYtv-CV8Q-yS93-6Zt9-QvlfyH
LK GL sh ver 0 pid 75685 (lvs)
LK GL sh ver 0 pid 99982 (lvs)
LK LV un XOgw6I-3Nuh-ejIP-ydPq-rr7g-LMjb-oklfup
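As an aside, the backlog in a dump like this can be spotted mechanically by counting the LW (lock-waiter) lines per resource. The helper below is a sketch of mine, not part of the lvm tools; the line format is inferred from the output above:

```python
# Sketch: summarize lock waiters in `lvmlockctl -i` output.
# Illustrative helper, not part of the lvm tools; the line format
# is inferred from the dump above.

def count_waiters(dump: str) -> dict:
    """Count 'LW' (lock-waiter) lines per resource type (VG, LV, GL)."""
    waiters = {}
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "LW":
            waiters[fields[1]] = waiters.get(fields[1], 0) + 1
    return waiters

# Shortened sample taken from the dump above:
sample = """\
LK VG ex ver 1762 pid 0 ()
LW VG sh ver 0 pid 34216 (lvchange)
LW VG sh ver 0 pid 75685 (lvs)
LK GL sh ver 0 pid 75685 (lvs)
"""

print(count_waiters(sample))  # -> {'VG': 2}
```

A healthy system shows few or no LW entries; seven commands queued behind one VG lock, as above, indicates the lock is stuck.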
My questions are:
1. Why did VGLK fail? Is it because of a network failure (which would make
iSCSI fail, so sanlock could not reach the VGLK volume)? Can I find direct proof?
2. Is it recoverable? I have tried killing all hung commands, but new
commands still hang forever.
Thanks a lot,
Damon
* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
From: David Teigland @ 2018-05-24 15:46 UTC (permalink / raw)
To: Damon Wang; +Cc: linux-lvm
On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
> Hi all,
>
> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually during
> intensive operations) the VG lock fails:
Hi, thanks for this report.
> /var/log/messages:
>
> May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471 [1112]: r627 paxos_release 8255 other lver 8258
I believe this is the sanlock bug that was fixed here:
https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master
By itself, the bug isn't a big problem: the lock was released, but sanlock
returned an error. The bigger problem is that lvmlockd then believes that
the lock was not released:
> 1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error -1
so subsequent requests for the lock get backed up in lvmlockd:
> [root@dev1 ~]# lvmlockctl -i
> LW VG sh ver 0 pid 34216 (lvchange)
> LW VG sh ver 0 pid 75685 (lvs)
> LW VG sh ver 0 pid 83741 (lvdisplay)
> LW VG sh ver 0 pid 90569 (lvchange)
> LW VG sh ver 0 pid 92735 (lvchange)
> LW VG sh ver 0 pid 99982 (lvs)
> LW VG sh ver 0 pid 14069 (lvchange)
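The backed-up state can be pictured with a toy model (illustrative only, not
lvmlockd's actual code): once a release reports an error, the manager still
considers the lock held, so every later request queues behind it:

```python
# Toy model of the hang -- not lvmlockd's real logic, just an
# illustration of why one spurious release error backs up every
# subsequent lock request.

class VGLock:
    def __init__(self):
        self.held = False
        self.waiters = []

    def acquire(self, pid):
        if self.held:
            self.waiters.append(pid)   # request queues (the "LW" state)
            return "waiting"
        self.held = True
        return "acquired"

    def release(self, lm_error=0):
        if lm_error:                   # lock manager said -250/-1:
            return "release failed"    # keep the lock record, as lvmlockd did
        self.held = False
        return "released"

lock = VGLock()
lock.acquire(34073)                    # lvcreate takes the VG lock
lock.release(lm_error=-250)            # release "fails" -> still held
print(lock.acquire(75685))             # -> waiting (lvs queues forever)
```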
> My questions are:
>
> 1. Why did VGLK fail? Is it because of a network failure (which would make
> iSCSI fail, so sanlock could not reach the VGLK volume)? Can I find direct proof?
I believe it's the bug above. Failures of the storage network can also cause
similar issues, but in that case you would see error messages related to I/O
timeouts.
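If you want direct proof of a storage-side cause, one approach is to scan the
logs for timeout evidence. The sketch below uses example patterns (exact
message wording varies by sanlock and kernel version, so treat the patterns as
assumptions to adapt) against a synthetic log:

```shell
# Sketch: scan a log for storage-level timeout evidence.
# The grep patterns are examples to adapt, not an exhaustive list,
# and the demo log below is synthetic.

scan_log() {
    grep -E -i 'i/o (error|timeout)|delta_renew|connection reset' "$1" || true
}

# Synthetic demo input (a real check would use /var/log/messages):
cat > /tmp/demo.log <<'EOF'
May 24 21:10:01 dev1 kernel: sd 3:0:0:1: I/O error, dev sdb
May 24 21:14:29 dev1 sanlock[1108]: r627 paxos_release 8255 other lver 8258
EOF

scan_log /tmp/demo.log   # prints only the I/O error line
```

If nothing storage-related turns up around the failure time, that supports the
sanlock-bug explanation over a network fault.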
> 2. Is it recoverable? I have tried killing all hung commands, but new
> commands still hang forever.
There are recently added options for this kind of situation, but I don't
believe there is an lvm release that includes them yet.
If you are prepared to build your own version of lvm, build lvm release
2.02.178 (which should be ready shortly; if it's not, use the git master
branch). Be sure to configure with --enable-lvmlockd-sanlock. Then try:
lvchange -an --lockopt skipvg <vgname>
lvmlockctl --drop <vgname>
stop lvmlockd, stop sanlock
restart everything as usual
If that doesn't work, or if you don't want to build lvm, then unmount file
systems, kill lvmlockd, and kill sanlock. You may need to do some dm cleanup
if LVs were active (or perhaps just reboot the machine). Then restart
everything as usual.
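That first sequence might be sketched as a script. The systemd unit names and
the final lock-start step are assumptions for a typical setup; with DRY_RUN=1
(the default here) the commands are only printed, not executed:

```shell
# Sketch of the recovery sequence above. The systemd unit names and
# the vgchange lock-start step are assumptions for a typical setup.
# With DRY_RUN=1 (default), commands are only printed.
VG=${VG:-myvg}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run lvchange -an --lockopt skipvg "$VG"
run lvmlockctl --drop "$VG"
run systemctl stop lvm2-lvmlockd sanlock
# ...then restart everything as usual:
run systemctl start sanlock lvm2-lvmlockd
run vgchange --lock-start "$VG"
```

Review the printed commands, then rerun with DRY_RUN=0 once they look right
for your environment.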
Dave
* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
From: Damon Wang @ 2018-05-24 16:50 UTC (permalink / raw)
To: David Teigland; +Cc: LVM general discussion and development
Thank you for your reply!
I'll try upgrading to sanlock 3.6.0 first (I'm currently on 3.5.0) and see
whether it happens again.
Damon
* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
From: Damon Wang @ 2018-05-30 13:17 UTC (permalink / raw)
To: David Teigland; +Cc: LVM general discussion and development
After several days of testing, I'm pretty sure the problem has been solved by
upgrading sanlock to 3.6.0. Thanks, Dave!
Damon