linux-lvm.redhat.com archive mirror
* [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
@ 2018-05-24 14:44 Damon Wang
  2018-05-24 15:46 ` David Teigland
  0 siblings, 1 reply; 4+ messages in thread
From: Damon Wang @ 2018-05-24 14:44 UTC (permalink / raw)
  To: LVM general discussion and development

Hi all,

I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually under
intensive operations) the VG lock fails:

/var/log/messages:

    May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
[1112]: r627 paxos_release 8255 other lver 8258
    May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
[1112]: r627 release_token release leader -250
    May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S
lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error
-250
    May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S
lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -250
    May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S
lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK unlock_san release error
-1
    May 24 21:14:29 dev1 lvmlockd[1061]: 1527167669 S
lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock lm error -1
    May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
[1111]: cmd_release 2,9,1061 no resource VGLK


meanwhile, lvmlockctl -d shows:

    1527167668 close lvchange[34042] cl 476 fd 9
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_lock
rv 0 read vb 101 0 1761
    1527167669 send lvcreate[34073] cl 478 lock vg rv 0
    1527167669 recv lvcreate[34073] cl 478 find_free_lock vg
"ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find free lock
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84
find_free_lock_san found unused area at 127926272
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 find free lock 0
offset 127926272
    1527167669 send lvcreate[34073] cl 478 find_free_lock vg rv 0
    1527167669 recv lvcreate[34073] cl 478 init lv
"ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
    1527167669 work init_lv
ff35ecc8217543e0a5be9cbe935ffc84/c0dda9c02a7347a098203fbe2a7df1d0 uuid
Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 init_lv_san
Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK found unused area at 127926272
    1527167669 send lvcreate[34073] cl 478 init lv rv 0 vg_args
1.0.0:lvmlock lv_args 1.0.0:127926272
    1527167669 recv lvcreate[34073] cl 478 update vg
"ff35ecc8217543e0a5be9cbe935ffc84" mode iv flags 0
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK action update iv
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
res_update cl 478 lk version to 1762
    1527167669 send lvcreate[34073] cl 478 update vg rv 0
    1527167669 recv lvcreate[34073] cl 478 lock lv
"ff35ecc8217543e0a5be9cbe935ffc84" mode un flags 1
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R
Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK action lock un
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R
Wnj8vm-2Hiy-gdqs-Uh1i-fUiA-W3mj-zdASzK res_unlock cl 478 no locks
    1527167669 send lvcreate[34073] cl 478 lock lv rv -2
    1527167669 recv lvcreate[34073] cl 478 lock vg
"ff35ecc8217543e0a5be9cbe935ffc84" mode un flags 0
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK action lock un
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK res_unlock cl 478
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
res_unlock r_version new 1762
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
unlock_san ex r_version 1762 flags 0
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
unlock_san set r_version 1762
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
unlock_san release error -250
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
res_unlock lm error -250
    1527167669 send lvcreate[34073] cl 478 lock vg rv -250
    1527167669 close lvcreate[34073] cl 478 fd 11
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
res_unlock cl 478 from close
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
unlock_san ex r_version 0 flags 0
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
unlock_san release error -1
    1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
res_unlock lm error -1
    1527167670 new cl 479 pi 2 fd 9


and then any LVM-related command hangs forever:

    [root@dev1 ~]# lvmlockctl -i
    VG ff35ecc8217543e0a5be9cbe935ffc84 lock_type=sanlock
Nz4PmR-Bwfi-pItr-C3om-Ewnj-SICA-5rFuFg
    LS sanlock lvm_ff35ecc8217543e0a5be9cbe935ffc84
    LK VG ex ver 1762 pid 0 ()
    LW VG sh ver 0 pid 34216 (lvchange)
    LW VG sh ver 0 pid 75685 (lvs)
    LW VG sh ver 0 pid 83741 (lvdisplay)
    LW VG sh ver 0 pid 90569 (lvchange)
    LW VG sh ver 0 pid 92735 (lvchange)
    LW VG sh ver 0 pid 99982 (lvs)
    LW VG sh ver 0 pid 14069 (lvchange)
    LK LV sh onuPlt-YXI0-nYtv-CV8Q-yS93-6Zt9-QvlfyH
    LK GL sh ver 0 pid 75685 (lvs)
    LK GL sh ver 0 pid 99982 (lvs)
    LK LV un XOgw6I-3Nuh-ejIP-ydPq-rr7g-LMjb-oklfup

My questions are:

1. Why did the VGLK fail? Is it because of a network failure (causing
iSCSI to fail so that sanlock could not reach the VGLK volume)? Can I
find direct proof?
2. Is it recoverable? I have tried killing all of the hung commands,
but new commands still hang forever.

Thanks a lot,
Damon

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
  2018-05-24 14:44 [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever Damon Wang
@ 2018-05-24 15:46 ` David Teigland
  2018-05-24 16:50   ` Damon Wang
  0 siblings, 1 reply; 4+ messages in thread
From: David Teigland @ 2018-05-24 15:46 UTC (permalink / raw)
  To: Damon Wang; +Cc: linux-lvm

On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
> Hi all,
> 
> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually under
> intensive operations) the VG lock fails:

Hi, thanks for this report.

> /var/log/messages:
> 
>     May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
> [1112]: r627 paxos_release 8255 other lver 8258

I believe this is the sanlock bug that was fixed here:
https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master

By itself, the bug isn't a big problem: the lock was released, but
sanlock returns an error.  The bigger problem is that lvmlockd then
believes that the lock was not released:

>     1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
> unlock_san release error -1

so subsequent requests for the lock get backed up in lvmlockd:

>     [root@dev1 ~]# lvmlockctl -i
>     LW VG sh ver 0 pid 34216 (lvchange)
>     LW VG sh ver 0 pid 75685 (lvs)
>     LW VG sh ver 0 pid 83741 (lvdisplay)
>     LW VG sh ver 0 pid 90569 (lvchange)
>     LW VG sh ver 0 pid 92735 (lvchange)
>     LW VG sh ver 0 pid 99982 (lvs)
>     LW VG sh ver 0 pid 14069 (lvchange)
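In other words, lvmlockd's in-memory state says the lock is still held
even though the on-disk lease is free. A toy sketch of that wedge
follows (illustrative only; this is not lvmlockd's actual code, and the
pids are just the ones from the log above):

```shell
# Toy model of the wedge (illustrative only; NOT lvmlockd's real code):
# the lock manager releases the lease on disk but returns an error, the
# front end keeps believing the lock is held, and every later request
# queues forever (the "LW" lines in lvmlockctl -i).

holder=""      # pid believed to hold VGLK
waiters=""     # pids queued behind it

lock() {
    if [ -z "$holder" ]; then
        holder=$1
        echo "granted $1"
    else
        waiters="$waiters $1"
        echo "waiting $1"           # would show up as LW in lvmlockctl -i
    fi
}

unlock() {     # $1 = pid, $2 = return code from the lock manager
    if [ "$2" -ne 0 ]; then
        echo "res_unlock lm error $2"   # lock stays "held" in memory
        return 1
    fi
    holder=""
}

lock 34073            # lvcreate takes VGLK
lock 34216            # lvchange queues behind it
unlock 34073 -250     # lease released on disk, but error reported
echo "holder=$holder" # still 34073: the waiter hangs forever
```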

> My questions are:
> 
> 1. why VGLK failed, is it because network failure(cause iSCSI fail and
> sanlock could not find VGLK volume), can I find a direct proof?

I believe it's the bug above.  Failures of the storage network can also
cause similar issues, but then you would see error messages related to
I/O timeouts.

> 2. Is it recoverable? I have tried kill all hung commands but new
> command still hung forever.

There are recently added options for this kind of situation, but I don't
believe there is an lvm release with those yet.

If you are prepared to build your own version of lvm, build lvm release
2.02.178 (which should be ready shortly; if it's not, take the git master
branch).  Be sure to configure with --enable-lvmlockd-sanlock.  Then try:

  lvchange -an --lockopt skipvg <vgname>
  lvmlockctl --drop <vgname>
  stop lvmlockd, stop sanlock
  restart everything as usual

If that doesn't work, or if you don't want to build lvm, then unmount the
file systems, kill lvmlockd, and kill sanlock; you might need to do some
dm cleanup if LVs were active (or perhaps just reboot the machine).
Restart everything as usual.
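The stop/kill/restart sequence might look like the following on a
systemd host; the unit names, the mount point, and the dm mapping name
are assumptions for a typical setup, so adapt them to your machine:

```shell
# Hedged sketch of the fallback recovery (assumed unit names "lvmlockd"
# and "sanlock"; example mount point and LV mapping name).
umount /mnt/vgdata              # unmount anything sitting on the VG's LVs
systemctl stop lvmlockd         # or: pkill lvmlockd
systemctl stop sanlock          # or: pkill sanlock
dmsetup ls                      # list any leftover device-mapper nodes
dmsetup remove vgdata-lv1       # clean up each still-active LV mapping
systemctl start sanlock
systemctl start lvmlockd
vgchange --lock-start           # rejoin the sanlock lockspace as usual
```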

Dave


* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
  2018-05-24 15:46 ` David Teigland
@ 2018-05-24 16:50   ` Damon Wang
  2018-05-30 13:17     ` Damon Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Damon Wang @ 2018-05-24 16:50 UTC (permalink / raw)
  To: David Teigland; +Cc: LVM general discussion and development

Thank you for your reply!

I'll try sanlock 3.6.0 first (I'm currently using 3.5.0) and see
whether it happens again.

Damon

2018-05-24 23:46 GMT+08:00 David Teigland <teigland@redhat.com>:
> On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
>> Hi all,
>>
>> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually under
>> intensive operations) the VG lock fails:
>
> Hi, thanks for this report.
>
>> /var/log/messages:
>>
>>     May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
>> [1112]: r627 paxos_release 8255 other lver 8258
>
> I believe this is the sanlock bug that was fixed here:
> https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master
>
> By itself, the bug isn't a big problem: the lock was released, but
> sanlock returns an error.  The bigger problem is that lvmlockd then
> believes that the lock was not released:
>
>>     1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
>> unlock_san release error -1
>
> so subsequent requests for the lock get backed up in lvmlockd:
>
>>     [root@dev1 ~]# lvmlockctl -i
>>     LW VG sh ver 0 pid 34216 (lvchange)
>>     LW VG sh ver 0 pid 75685 (lvs)
>>     LW VG sh ver 0 pid 83741 (lvdisplay)
>>     LW VG sh ver 0 pid 90569 (lvchange)
>>     LW VG sh ver 0 pid 92735 (lvchange)
>>     LW VG sh ver 0 pid 99982 (lvs)
>>     LW VG sh ver 0 pid 14069 (lvchange)
>
>> My questions are:
>>
>> 1. Why did the VGLK fail? Is it because of a network failure (causing
>> iSCSI to fail so that sanlock could not reach the VGLK volume)? Can I
>> find direct proof?
>
> I believe it's the bug above.  Failures of the storage network can also
> cause similar issues, but then you would see error messages related to
> I/O timeouts.
>
>> 2. Is it recoverable? I have tried killing all of the hung commands,
>> but new commands still hang forever.
>
> There are recently added options for this kind of situation, but I don't
> believe there is an lvm release with those yet.
>
> If you are prepared to build your own version of lvm, build lvm release
> 2.02.178 (which should be ready shortly, if it's not, take git master
> branch).  Be sure to configure with --enable-lvmlockd-sanlock.  Then try:
>
>   lvchange -an --lockopt skipvg <vgname>
>   lvmlockctl --drop <vgname>
>   stop lvmlockd, stop sanlock
>   restart everything as usual
>
> If that doesn't work, or if you don't want to build lvm, then unmount file
> systems, kill lvmlockd, kill sanlock, you might need to do some dm cleanup
> if LVs were active (or perhaps just reboot the machine.) Restart
> everything as usual.
>
> Dave


* Re: [linux-lvm] [lvmlockd] "VGLK res_unlock lm error -250" and lvm command hung forever
  2018-05-24 16:50   ` Damon Wang
@ 2018-05-30 13:17     ` Damon Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Damon Wang @ 2018-05-30 13:17 UTC (permalink / raw)
  To: David Teigland; +Cc: LVM general discussion and development

After days of testing, I'm pretty sure the problem has been solved by
upgrading sanlock to 3.6.0. Thanks, Dave!

Damon

2018-05-25 0:50 GMT+08:00 Damon Wang <damon.devops@gmail.com>:
> Thank you for your reply!
>
> I'll try sanlock 3.6.0 first (I'm currently using 3.5.0) and see
> whether it happens again.
>
> Damon
>
> 2018-05-24 23:46 GMT+08:00 David Teigland <teigland@redhat.com>:
>> On Thu, May 24, 2018 at 10:44:05PM +0800, Damon Wang wrote:
>>> Hi all,
>>>
>>> I'm using lvmlockd + sanlock on iSCSI, and sometimes (usually under
>>> intensive operations) the VG lock fails:
>>
>> Hi, thanks for this report.
>>
>>> /var/log/messages:
>>>
>>>     May 24 21:14:29 dev1 sanlock[1108]: 2018-05-24 21:14:29 605471
>>> [1112]: r627 paxos_release 8255 other lver 8258
>>
>> I believe this is the sanlock bug that was fixed here:
>> https://pagure.io/sanlock/c/735781d683e99cccb3be7ffe8b4fff1392a2a4c8?branch=master
>>
>> By itself, the bug isn't a big problem: the lock was released, but
>> sanlock returns an error.  The bigger problem is that lvmlockd then
>> believes that the lock was not released:
>>
>>>     1527167669 S lvm_ff35ecc8217543e0a5be9cbe935ffc84 R VGLK
>>> unlock_san release error -1
>>
>> so subsequent requests for the lock get backed up in lvmlockd:
>>
>>>     [root@dev1 ~]# lvmlockctl -i
>>>     LW VG sh ver 0 pid 34216 (lvchange)
>>>     LW VG sh ver 0 pid 75685 (lvs)
>>>     LW VG sh ver 0 pid 83741 (lvdisplay)
>>>     LW VG sh ver 0 pid 90569 (lvchange)
>>>     LW VG sh ver 0 pid 92735 (lvchange)
>>>     LW VG sh ver 0 pid 99982 (lvs)
>>>     LW VG sh ver 0 pid 14069 (lvchange)
>>
>>> My questions are:
>>>
>>> 1. Why did the VGLK fail? Is it because of a network failure (causing
>>> iSCSI to fail so that sanlock could not reach the VGLK volume)? Can I
>>> find direct proof?
>>
>> I believe it's the bug above.  Failures of the storage network can also
>> cause similar issues, but then you would see error messages related to
>> I/O timeouts.
>>
>>> 2. Is it recoverable? I have tried killing all of the hung commands,
>>> but new commands still hang forever.
>>
>> There are recently added options for this kind of situation, but I don't
>> believe there is an lvm release with those yet.
>>
>> If you are prepared to build your own version of lvm, build lvm release
>> 2.02.178 (which should be ready shortly, if it's not, take git master
>> branch).  Be sure to configure with --enable-lvmlockd-sanlock.  Then try:
>>
>>   lvchange -an --lockopt skipvg <vgname>
>>   lvmlockctl --drop <vgname>
>>   stop lvmlockd, stop sanlock
>>   restart everything as usual
>>
>> If that doesn't work, or if you don't want to build lvm, then unmount file
>> systems, kill lvmlockd, kill sanlock, you might need to do some dm cleanup
>> if LVs were active (or perhaps just reboot the machine.) Restart
>> everything as usual.
>>
>> Dave

