linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM hangs
@ 2017-11-13 13:41 Alexander 'Leo' Bergolth
  2017-11-13 14:51 ` Zdenek Kabelac
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander 'Leo' Bergolth @ 2017-11-13 13:41 UTC (permalink / raw)
  To: LVM general discussion and development

Hi!

I have an EL7 desktop box with two SATA hard disks and two SSDs in an
LVM raid1 - thin pool - cache configuration. (Just migrated to this
setup a few weeks ago.)

After some days, individual processes start to block in disk wait.
I don't know if the problem resides in the cache-, thin- or raid1-layer
but the underlying block-devices are fully responsive.
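
For reference, a minimal sketch of how tasks stuck in disk wait could be
listed. It assumes the procps `ps` shipped with EL7; the helper name
`list_blocked` is made up for illustration and is not part of the report.

```shell
# list_blocked (hypothetical helper): keep the header line plus every task
# whose state starts with D (uninterruptible sleep, i.e. "disk wait").
list_blocked() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# WCHAN shows the kernel function each blocked task is sleeping in.
if command -v ps >/dev/null 2>&1; then
  ps -eo pid,stat,wchan:32,comm | list_blocked
fi
```

On kernels that provide it, /proc/<pid>/stack (readable as root) gives the
full kernel stack of a task from that list.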

I have prepared some info at:
  http://leo.kloburg.at/tmp/lvm-blocks/

Do the stack backtraces provide enough information to locate the source
of the blocks?

I'd be happy to provide additional info, if necessary.
Meanwhile I'll disable the LVM cache layer to eliminate this potential
candidate.

Cheers,
--leo

Kernel is 3.10.0-693.5.2.el7.x86_64
filesystem is XFS
lvm2-2.02.171-8.el7.x86_64

-- 
e-mail   ::: Leo.Bergolth (at) wu.ac.at
fax      ::: +43-1-31336-906050
location ::: IT-Services | Vienna University of Economics | Austria


* Re: [linux-lvm] LVM hangs
  2017-11-13 13:41 [linux-lvm] LVM hangs Alexander 'Leo' Bergolth
@ 2017-11-13 14:51 ` Zdenek Kabelac
  2017-11-13 15:12   ` Alexander 'Leo' Bergolth
  2017-11-16 11:02   ` Alexander 'Leo' Bergolth
  0 siblings, 2 replies; 9+ messages in thread
From: Zdenek Kabelac @ 2017-11-13 14:51 UTC (permalink / raw)
  To: LVM general discussion and development, Alexander 'Leo' Bergolth

On 13.11.2017 at 14:41, Alexander 'Leo' Bergolth wrote:
> Hi!
> 
> I have an EL7 desktop box with two SATA hard disks and two SSDs in an
> LVM raid1 - thin pool - cache configuration. (Just migrated to this
> setup a few weeks ago.)
> 
> After some days, individual processes start to block in disk wait.
> I don't know if the problem resides in the cache-, thin- or raid1-layer
> but the underlying block-devices are fully responsive.
> 
> I have prepared some info at:
>    http://leo.kloburg.at/tmp/lvm-blocks/
> 
> Do the stack backtraces provide enough information to locate the source
> of the blocks?
> 
> I'd be happy to provide additional info, if necessary.
> Meanwhile I'll disable the LVM cache layer to eliminate this potential
> candidate.
> 

Hi


It would probably be nice to see the result of 'dmsetup status'.

I'd guess you are probably hitting the 'frozen' raid state,
which is an unfortunate existing upstream bug.


Regards


Zdenek

* Re: [linux-lvm] LVM hangs
  2017-11-13 14:51 ` Zdenek Kabelac
@ 2017-11-13 15:12   ` Alexander 'Leo' Bergolth
  2017-11-13 15:20     ` Zdenek Kabelac
  2017-11-16 11:02   ` Alexander 'Leo' Bergolth
  1 sibling, 1 reply; 9+ messages in thread
From: Alexander 'Leo' Bergolth @ 2017-11-13 15:12 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

Hi!

On 11/13/2017 03:51 PM, Zdenek Kabelac wrote:
> On 13.11.2017 at 14:41, Alexander 'Leo' Bergolth wrote:
>> I have an EL7 desktop box with two SATA hard disks and two SSDs in an
>> LVM raid1 - thin pool - cache configuration. (Just migrated to this
>> setup a few weeks ago.)
>>
>> After some days, individual processes start to block in disk wait.
>> I don't know if the problem resides in the cache-, thin- or raid1-layer
>> but the underlying block-devices are fully responsive.
>>
>> I have prepared some info at:
>>   http://leo.kloburg.at/tmp/lvm-blocks/
>>
>> Do the stack backtraces provide enough information to locate the source
>> of the blocks?
>>
>> I'd be happy to provide additional info, if necessary.
>> Meanwhile I'll disable the LVM cache layer to eliminate this potential
>> candidate.
> 
> It would probably be nice to see the result of 'dmsetup status'.


OK. Will be included next time.


> I'd guess you are probably hitting the 'frozen' raid state,
> which is an unfortunate existing upstream bug.

Are you talking about RH bug 1388632?
https://bugzilla.redhat.com/show_bug.cgi?id=1388632

Unfortunately I can only view the google-cached version of the bugzilla
page, since the bug is restricted to internal view only.

But the google-cached version suggests that the bug is mainly hit when
removing the raid-backed cache pool under IO.

In my scenario, no modification (like cache removal) of the LVM setup was
done when the blocks occurred.

Cheers,
--leo
-- 
e-mail   ::: Leo.Bergolth (at) wu.ac.at
fax      ::: +43-1-31336-906050
location ::: IT-Services | Vienna University of Economics | Austria

* Re: [linux-lvm] LVM hangs
  2017-11-13 15:12   ` Alexander 'Leo' Bergolth
@ 2017-11-13 15:20     ` Zdenek Kabelac
  2017-11-13 17:41       ` Gionatan Danti
  0 siblings, 1 reply; 9+ messages in thread
From: Zdenek Kabelac @ 2017-11-13 15:20 UTC (permalink / raw)
  To: LVM general discussion and development, Alexander 'Leo' Bergolth

On 13.11.2017 at 16:12, Alexander 'Leo' Bergolth wrote:
> Hi!
> 
> On 11/13/2017 03:51 PM, Zdenek Kabelac wrote:
>> On 13.11.2017 at 14:41, Alexander 'Leo' Bergolth wrote:
>>> I have an EL7 desktop box with two SATA hard disks and two SSDs in an
>>> LVM raid1 - thin pool - cache configuration. (Just migrated to this
>>> setup a few weeks ago.)
>>>
>>> After some days, individual processes start to block in disk wait.
>>> I don't know if the problem resides in the cache-, thin- or raid1-layer
>>> but the underlying block-devices are fully responsive.
>>>
>>> I have prepared some info at:
>>>    http://leo.kloburg.at/tmp/lvm-blocks/
>>>
>>> Do the stack backtraces provide enough information to locate the source
>>> of the blocks?
>>>
>>> I'd be happy to provide additional info, if necessary.
>>> Meanwhile I'll disable the LVM cache layer to eliminate this potential
>>> candidate.
>>
>> It would probably be nice to see the result of 'dmsetup status'.
> 
> 
> OK. Will be included next time.
> 
> 
>> I'd guess you are probably hitting the 'frozen' raid state,
>> which is an unfortunate existing upstream bug.
> 
> Are you talking about RH bug 1388632?
> https://bugzilla.redhat.com/show_bug.cgi?id=1388632
> 
> Unfortunately I can only view the google-cached version of the bugzilla
> page, since the bug is restricted to internal view only.
> 

That could be a similar issue, yes.

> But the google-cached version suggests that the bug is mainly hit when
> removing the raid-backed cache pool under IO.
> 
> In my scenario, no modification (like cache removal) of the LVM setup was
> done when the blocks occurred.

The easiest thing is to check 'dmsetup status', just to rule out the frozen raid case.
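
A minimal sketch of that check, assuming the raid target reports its sync
action (idle, frozen, resync, ...) as a separate word in its status line.
The helper name `find_frozen_raid` and the sample line in the comment are
made up for illustration; they are not taken from the affected system.

```shell
# find_frozen_raid (hypothetical helper): read `dmsetup status` output on
# stdin and print the name of every raid-target device whose status line
# contains the word "frozen".
# Made-up example of such a line:
#   vg-data: 0 4194304 raid raid1 2 AA 4194304/4194304 frozen 0 0 -
find_frozen_raid() {
  awk '$4 == "raid" {
    for (i = 5; i <= NF; i++)
      if ($i == "frozen") { name = $1; sub(/:$/, "", name); print name; break }
  }'
}

# Needs root; prints nothing when no raid device reports "frozen".
if command -v dmsetup >/dev/null 2>&1; then
  dmsetup status | find_frozen_raid
fi
```

Empty output only rules out this one state; it says nothing about the
cache or thin-pool layers.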


Zdenek

* Re: [linux-lvm] LVM hangs
  2017-11-13 15:20     ` Zdenek Kabelac
@ 2017-11-13 17:41       ` Gionatan Danti
  2017-11-13 21:56         ` Zdenek Kabelac
  0 siblings, 1 reply; 9+ messages in thread
From: Gionatan Danti @ 2017-11-13 17:41 UTC (permalink / raw)
  To: LVM general discussion and development, Zdenek Kabelac,
	Alexander 'Leo' Bergolth

On 13/11/2017 16:20, Zdenek Kabelac wrote:
>>
>> Are you talking about RH bug 1388632?
>> https://bugzilla.redhat.com/show_bug.cgi?id=1388632
>>
>> Unfortunately I can only view the google-cached version of the bugzilla
>> page, since the bug is restricted to internal view only.
>>
> 
> that could be similar issue yes
> 
>> But the google-cached version suggests that the bug is mainly hit when
>> removing the raid-backed cache pool under IO.
>>
>> In my scenario, no modification (like cache removal) of the LVM setup was
>> done when the blocks occurred.
> 
> The easiest thing is to check 'dmsetup status', just to rule out the frozen 
> raid case.

Hi Zdenek,
given how easy it is to trigger the bug, it seems a very serious problem 
to me. As the bug report is for internal use only, can you shed some 
light on what causes it and how to avoid it?

Specifically can you confirm that, if using an "old-school" mdadm RAID 
device, the bug does not apply?

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

* Re: [linux-lvm] LVM hangs
  2017-11-13 17:41       ` Gionatan Danti
@ 2017-11-13 21:56         ` Zdenek Kabelac
  0 siblings, 0 replies; 9+ messages in thread
From: Zdenek Kabelac @ 2017-11-13 21:56 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti,
	Zdenek Kabelac, Alexander 'Leo' Bergolth

On 13.11.2017 at 18:41, Gionatan Danti wrote:
> On 13/11/2017 16:20, Zdenek Kabelac wrote:
>>>
>>> Are you talking about RH bug 1388632?
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1388632
>>>
>>> Unfortunately I can only view the google-cached version of the bugzilla
>>> page, since the bug is restricted to internal view only.
>>>
>>
>> that could be similar issue yes
>>
>>> But the google-cached version suggests that the bug is mainly hit when
>>> removing the raid-backed cache pool under IO.
>>>
>>> In my scenario, no modification (like cache removal) of the LVM setup was
>>> done when the blocks occurred.
>>
>> The easiest thing is to check 'dmsetup status', just to rule out the frozen raid 
>> case.
> 
> Hi Zdenek,
> given how easy it is to trigger the bug, it seems a very serious problem to me. 
> As the bug report is for internal use only, can you shed some light on what 
> causes it and how to avoid it?
> 
> Specifically can you confirm that, if using an "old-school" mdadm RAID device, 
> the bug does not apply?


IMHO this particular issue is probably not triggerable (at least not so 
easily) with mdadm.

Compared to mdadm, lvm2 has a particular problem here: it is able to 
'generate' more device state changes per second than mdadm.

The BZ is still being examined, AFAIK.


Zdenek

* Re: [linux-lvm] LVM hangs
  2017-11-13 14:51 ` Zdenek Kabelac
  2017-11-13 15:12   ` Alexander 'Leo' Bergolth
@ 2017-11-16 11:02   ` Alexander 'Leo' Bergolth
  2017-11-16 11:47     ` Zdenek Kabelac
  1 sibling, 1 reply; 9+ messages in thread
From: Alexander 'Leo' Bergolth @ 2017-11-16 11:02 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

On 2017-11-13 15:51, Zdenek Kabelac wrote:
> On 13.11.2017 at 14:41, Alexander 'Leo' Bergolth wrote:
>> I have an EL7 desktop box with two SATA hard disks and two SSDs in an
>> LVM raid1 - thin pool - cache configuration. (Just migrated to this
>> setup a few weeks ago.)
>>
>> After some days, individual processes start to block in disk wait.
>> I don't know if the problem resides in the cache-, thin- or raid1-layer
>> but the underlying block-devices are fully responsive.
>>
> It would probably be nice to see the result of 'dmsetup status'.
> 
> I'd guess you are probably hitting the 'frozen' raid state,
> which is an unfortunate existing upstream bug.

As it just happened again, I have collected some additional info like
dmsetup status
dmsetup info -c (do the event counts look suspicious?)

https://leo.kloburg.at/tmp/lvm-blocks/2017-11-16/

I don't see any volume in "frozen" state.
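
In case it helps with reading the event counts, here is a small sketch that
ranks the dm devices by event count. What count would actually be suspicious
is not established in this thread; the helper name `rank_events` is made up
for illustration.

```shell
# rank_events (hypothetical helper): read "name events" pairs on stdin and
# sort them so the device with the most events comes first.
rank_events() {
  sort -k2,2nr
}

# Needs root; prints one "name events" line per dm device.
if command -v dmsetup >/dev/null 2>&1; then
  dmsetup info -c --noheadings --separator ' ' -o name,events | rank_events
fi
```

Running it twice some minutes apart and comparing the numbers would show
which devices are still generating events while the processes are blocked.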

I haven't rebooted the box yet. Should I provide some more info?

Cheers,
--leo
-- 
e-mail   ::: Leo.Bergolth (at) wu.ac.at
fax      ::: +43-1-31336-906050
location ::: IT-Services | Vienna University of Economics | Austria

* Re: [linux-lvm] LVM hangs
  2017-11-16 11:02   ` Alexander 'Leo' Bergolth
@ 2017-11-16 11:47     ` Zdenek Kabelac
  2017-11-16 14:16       ` Alexander 'Leo' Bergolth
  0 siblings, 1 reply; 9+ messages in thread
From: Zdenek Kabelac @ 2017-11-16 11:47 UTC (permalink / raw)
  To: Alexander 'Leo' Bergolth, LVM general discussion and development

On 16.11.2017 at 12:02, Alexander 'Leo' Bergolth wrote:
> On 2017-11-13 15:51, Zdenek Kabelac wrote:
>> On 13.11.2017 at 14:41, Alexander 'Leo' Bergolth wrote:
>>> I have an EL7 desktop box with two SATA hard disks and two SSDs in an
>>> LVM raid1 - thin pool - cache configuration. (Just migrated to this
>>> setup a few weeks ago.)
>>>
>>> After some days, individual processes start to block in disk wait.
>>> I don't know if the problem resides in the cache-, thin- or raid1-layer
>>> but the underlying block-devices are fully responsive.
>>>
>> It would probably be nice to see the result of 'dmsetup status'.
>>
>> I'd guess you are probably hitting the 'frozen' raid state,
>> which is an unfortunate existing upstream bug.
> 
> As it just happened again, I have collected some additional info like
> dmsetup status
> dmsetup info -c (do the event counts look suspicious?)
> 
> https://leo.kloburg.at/tmp/lvm-blocks/2017-11-16/
> 
> I don't see any volume in "frozen" state.
> 
> I haven't rebooted the box yet. Should I provide some more info?
> 


From a plain look over those files, it doesn't even seem that anything is 
wrong with the dm devices as such.

So it looks like XFS possibly got into some unhappy state.

I'd recommend opening a regular Bugzilla case and attaching the files from 
your directory.

You can check whether individual devices in the 'stack' are blocked,
i.e. try a 'dd' read from every 'dm' device to see if something is blocked.


From the status, all devices look fully operational, and the process stack 
traces also look reasonably idle.


I'm not sure how 'afs' is involved here - can you reproduce without afs?


Zdenek

* Re: [linux-lvm] LVM hangs
  2017-11-16 11:47     ` Zdenek Kabelac
@ 2017-11-16 14:16       ` Alexander 'Leo' Bergolth
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander 'Leo' Bergolth @ 2017-11-16 14:16 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

On 2017-11-16 12:47, Zdenek Kabelac wrote:
> On 16.11.2017 at 12:02, Alexander 'Leo' Bergolth wrote:
>> On 2017-11-13 15:51, Zdenek Kabelac wrote:
>>> On 13.11.2017 at 14:41, Alexander 'Leo' Bergolth wrote:
>>>> I have an EL7 desktop box with two SATA hard disks and two SSDs in an
>>>> LVM raid1 - thin pool - cache configuration. (Just migrated to this
>>>> setup a few weeks ago.)
>>>>
>>>> After some days, individual processes start to block in disk wait.
>>>> I don't know if the problem resides in the cache-, thin- or raid1-layer
>>>> but the underlying block-devices are fully responsive.
> 
> From a plain look over those files, it doesn't even seem that
> anything is wrong with the dm devices as such.
> 
> So it looks like XFS possibly got into some unhappy state.
>
> I'd recommend opening a regular Bugzilla case and attaching the files
> from your directory.

OK.

> You can check whether individual devices in the 'stack' are blocked,
> i.e. try a 'dd' read from every 'dm' device to see if something is blocked.

No device is currently blocking. I can read from all LV devices
(including meta devices), all underlying PVs and all filesystems:

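# Read about 40 MB directly from every LV, including hidden meta devices.
for dev in $(lvs -a -olv_dm_path --noheadings); do
  echo "$dev"
  dd if="$dev" of=/dev/null bs=4k count=10000 iflag=direct
done
# Same direct read from every underlying PV.
for pv in $(pvs -oname --noheadings); do
  echo "$pv"
  dd if="$pv" of=/dev/null bs=4k count=10000 iflag=direct
done
# Drop the page cache so the filesystem reads below have to hit the devices.
echo 3 >/proc/sys/vm/drop_caches
# Read up to 1 GiB of file data from every mounted XFS/ext4 filesystem.
for mp in $(findmnt -t xfs,ext4 -o TARGET -l -n); do
  echo "$mp"
  tar -cf- --one-file-system "$mp" 2>/dev/null | head -c $((1024**3)) >/dev/null
done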

> From the status, all devices look fully operational, and the process stack
> traces also look reasonably idle.
> 
> I'm not sure how 'afs' is involved here - can you reproduce without afs?

OK. I'll try.

Thanks for your help!

--leo
-- 
e-mail   ::: Leo.Bergolth (at) wu.ac.at
fax      ::: +43-1-31336-906050
location ::: IT-Services | Vienna University of Economics | Austria
