linux-lvm.redhat.com archive mirror
* [linux-lvm] lvmpolld causes IO performance issue
@ 2022-08-16  9:28 Heming Zhao
  2022-08-16  9:38 ` Zdenek Kabelac
  0 siblings, 1 reply; 23+ messages in thread
From: Heming Zhao @ 2022-08-16  9:28 UTC (permalink / raw)
  To: linux-lvm, martin.wilck; +Cc: teigland, zdenek.kabelac

Hello maintainers & list,

I'd like to report an issue:
A SUSE customer hit an lvmpolld problem which caused a dramatic decrease in IO
performance.

How to trigger:
When the machine is connected to a large number of LUNs (e.g. 80~200), running
pvmove (e.g. moving a single disk to a new one, with a command like: pvmove
disk1 disk2) puts the system under high cpu load. With only ~10 LUNs connected,
performance is fine.

We found two workarounds:
1. Set 'activation/polling_interval=120' in lvm.conf (a config excerpt is
   sketched below).
2. Write a special udev rule which makes udev ignore the watch events for
   mpath devices:
   echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
    /etc/udev/rules.d/90-dm-watch.rules

Applying either one of the two makes the performance issue disappear.
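
For reference, workaround 1 corresponds to an lvm.conf fragment roughly like
the following (a minimal sketch; lvmconfig is only used to verify the value):

   # /etc/lvm/lvm.conf (excerpt) - sketch of workaround 1
   activation {
       # Poll pvmove/lvconvert progress every 120s instead of the much
       # shorter default, so the metadata is rewritten far less often.
       polling_interval = 120
   }
   # Verify the effective value:
   #   lvmconfig activation/polling_interval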

** the root cause **

lvmpolld periodically requests status information in order to update the pvmove
progress.

At every polling_interval, lvm2 updates the VG metadata. The update job ends
with a close() on the device, which triggers a systemd-udevd IN_CLOSE_WRITE
event, e.g.:
  2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
(8 is IN_CLOSE_WRITE.)

The underlying devices of these VGs are multipath devices. So whenever lvm2
updates the metadata, even though pvmove only writes a small amount of data,
the close() triggers udev's "watch" mechanism, which gets notified about a
process that has written to the device and closed it. This causes frequent,
pointless re-evaluation of the udev rules for these devices.
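
A rough way to observe this rule storm on an affected system (a sketch using
standard udev tooling; the dm-179 name from the log above is only illustrative):

   # Terminal 1: watch udev events for block devices while the pvmove runs:
   udevadm monitor --udev --subsystem-match=block
   # Terminal 2: inspect which rules fire for one of the affected dm nodes:
   udevadm test --action=change /sys/class/block/dm-179 2>&1 | tail -n 30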

My question: do the LVM2 maintainers have any idea how to fix this?

In my view, could lvm2 keep the VG device fds open (instead of closing them)
until the pvmove finishes?

Thanks,
Heming

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



* Re: [linux-lvm] lvmpolld causes IO performance issue
  2022-08-16  9:28 [linux-lvm] lvmpolld causes IO performance issue Heming Zhao
@ 2022-08-16  9:38 ` Zdenek Kabelac
  2022-08-16 10:08   ` [linux-lvm] lvmpolld causes high cpu load issue Heming Zhao
  0 siblings, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-16  9:38 UTC (permalink / raw)
  To: Heming Zhao, linux-lvm, martin.wilck; +Cc: teigland

On 16. 08. 22 at 11:28, Heming Zhao wrote:
> Hello maintainers & list,
> 
> I bring a story:
> One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
> decrease.
> 
> How to trigger:
> When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
> disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
> cpu load. But when system connects ~10 LUNs, the performance is fine.
> 
> We found two work arounds:
> 1. set lvm.conf 'activation/polling_interval=120'.
> 2. write a speical udev rule, which make udev ignore the event for mpath devices.
>     echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
>      /etc/udev/rules.d/90-dm-watch.rules
> 
> Run above any one of two can make the performance issue disappear.
> 
> ** the root cause **
> 
> lvmpolld will do interval requeset info job for updating the pvmove status
> 
> On every polling_interval time, lvm2 will update vg metadata. The update job will
> call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
>    2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
> (8 is IN_CLOSE_WRITE.)
> 
> These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
> even if pvmove write a few data, the sys_close action trigger udev's "watch"
> mechanism to gets notified frequently about a process that has written to the
> device and closed it. This causes frequent, pointless re-evaluation of the udev
> rules for these devices.
> 
> My question: Does LVM2 maintainers have any idea to fix this bug?
> 
> In my view, does lvm2 could drop VGs devices fds until pvmove finish?

Hi

Please provide more info about the lvm2 metadata and also an 'lvs -avvvvv'
trace so we can get a better picture of the layout - also the versions of
lvm2, systemd and the kernel in use.

pvmove progresses by mirroring each segment of an LV - so if there are a lot
of segments, each such update may trigger a udev watch rule event.
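
To get an idea of the segment counts, something like this should work
(seg_count and segtype are standard lvs(8) report fields):

   lvs -o lv_name,segtype,seg_count <vgname>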

But ATM I can hardly imagine how this could cause a 'dramatic' performance
decrease - maybe there is something wrong with the udev rules on the system?

What is the actual impact ?

Note - pvmove was never designed as a high-performance operation (in fact it
tries not to eat all the disk bandwidth).


Regards

Zdenek


* [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-16  9:38 ` Zdenek Kabelac
@ 2022-08-16 10:08   ` Heming Zhao
  2022-08-16 10:26     ` Zdenek Kabelac
  0 siblings, 1 reply; 23+ messages in thread
From: Heming Zhao @ 2022-08-16 10:08 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-lvm, teigland, martin.wilck

Ooh, very sorry, the subject is wrong: it is not an IO performance issue but
high cpu load that is triggered by pvmove.

On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
> Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
> > Hello maintainers & list,
> > 
> > I bring a story:
> > One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
> > decrease.
> > 
> > How to trigger:
> > When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
> > disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
> > cpu load. But when system connects ~10 LUNs, the performance is fine.
> > 
> > We found two work arounds:
> > 1. set lvm.conf 'activation/polling_interval=120'.
> > 2. write a speical udev rule, which make udev ignore the event for mpath devices.
> >     echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
> >      /etc/udev/rules.d/90-dm-watch.rules
> > 
> > Run above any one of two can make the performance issue disappear.
> > 
> > ** the root cause **
> > 
> > lvmpolld will do interval requeset info job for updating the pvmove status
> > 
> > On every polling_interval time, lvm2 will update vg metadata. The update job will
> > call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
> >    2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
> > (8 is IN_CLOSE_WRITE.)
> > 
> > These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
> > even if pvmove write a few data, the sys_close action trigger udev's "watch"
> > mechanism to gets notified frequently about a process that has written to the
> > device and closed it. This causes frequent, pointless re-evaluation of the udev
> > rules for these devices.
> > 
> > My question: Does LVM2 maintainers have any idea to fix this bug?
> > 
> > In my view, does lvm2 could drop VGs devices fds until pvmove finish?
> 
> Hi
> 
> Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
> trace so we can get better picture about the layout - also version of
> lvm2,systemd,kernel in use.
> 
> pvmove is progressing by mirroring each segment of an LV - so if there would
> be a lot of segments - then each such update may trigger udev watch rule
> event.
> 
> But ATM I could hardly imagine how this could cause some 'dramatic'
> performance decrease -  maybe there is something wrong with udev rules on
> the system ?
> 
> What is the actual impact ?
> 
> Note - pvmove was never designed as a high performance operation (in fact it
> tries to not eat all the disk bandwidth as such)
> 
> Regards
> Zdenek

My mistake, let me restate it here:
The subject was wrong; it is not IO performance but high cpu load that is
triggered by pvmove.

There is no IO performance issue.

When the system is connected to 80~200 LUNs, the cpu load increases by 15~20
and cpu usage by ~20%, which corresponds to about 5-6 cores and at times led
to those cores being fully utilized.
In other words: a single pvmove run costs 5-6 (sometimes 10) cores of
utilization. That is abnormal and unacceptable.

The lvm2 version is 2.03.05, the kernel is 5.3, and systemd is v246.

BTW:
I have changed the mail subject from "lvmpolld causes IO performance issue"
to "lvmpolld causes high cpu load issue".
Please use this thread for further discussion.


- Heming


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-16 10:08   ` [linux-lvm] lvmpolld causes high cpu load issue Heming Zhao
@ 2022-08-16 10:26     ` Zdenek Kabelac
  2022-08-17  2:03       ` Heming Zhao
  0 siblings, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-16 10:26 UTC (permalink / raw)
  To: Heming Zhao; +Cc: linux-lvm, teigland, martin.wilck

On 16. 08. 22 at 12:08, Heming Zhao wrote:
> Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
> is triggered by pvmove.
> 
> On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
>> Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
>>> Hello maintainers & list,
>>>
>>> I bring a story:
>>> One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
>>> decrease.
>>>
>>> How to trigger:
>>> When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
>>> disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
>>> cpu load. But when system connects ~10 LUNs, the performance is fine.
>>>
>>> We found two work arounds:
>>> 1. set lvm.conf 'activation/polling_interval=120'.
>>> 2. write a speical udev rule, which make udev ignore the event for mpath devices.
>>>      echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
>>>       /etc/udev/rules.d/90-dm-watch.rules
>>>
>>> Run above any one of two can make the performance issue disappear.
>>>
>>> ** the root cause **
>>>
>>> lvmpolld will do interval requeset info job for updating the pvmove status
>>>
>>> On every polling_interval time, lvm2 will update vg metadata. The update job will
>>> call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
>>>     2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
>>> (8 is IN_CLOSE_WRITE.)
>>>
>>> These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
>>> even if pvmove write a few data, the sys_close action trigger udev's "watch"
>>> mechanism to gets notified frequently about a process that has written to the
>>> device and closed it. This causes frequent, pointless re-evaluation of the udev
>>> rules for these devices.
>>>
>>> My question: Does LVM2 maintainers have any idea to fix this bug?
>>>
>>> In my view, does lvm2 could drop VGs devices fds until pvmove finish?
>>
>> Hi
>>
>> Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
>> trace so we can get better picture about the layout - also version of
>> lvm2,systemd,kernel in use.
>>
>> pvmove is progressing by mirroring each segment of an LV - so if there would
>> be a lot of segments - then each such update may trigger udev watch rule
>> event.
>>
>> But ATM I could hardly imagine how this could cause some 'dramatic'
>> performance decrease -  maybe there is something wrong with udev rules on
>> the system ?
>>
>> What is the actual impact ?
>>
>> Note - pvmove was never designed as a high performance operation (in fact it
>> tries to not eat all the disk bandwidth as such)
>>
>> Regards
>> Zdenek
> 
> My mistake, I write here again:
> The subject is wrong, not IO performance but cpu high load is triggered by pvmove.
> 
> There is no IO performance issue.
> 
> When system is connecting 80~200, the cpu load increase by 15~20, the
> cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
> times to the cores fully utilized.
> In another word: a single pvmove process cost 5-6 (sometime 10) cores
> utilization. It's abnormal & unaccepted.
> 
> The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.
> 
> BTW:
> I change this mail subject from:  lvmpolld causes IO performance issue
> to: lvmpolld causes high cpu load issue
> Please use this mail for later discussing.


Hi

Could you please retest with a recent version of lvm2? There have certainly
been some improvements in scanning - the older releases might have shown
higher CPU usage with a larger set of devices.

Regards

Zdenek


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-16 10:26     ` Zdenek Kabelac
@ 2022-08-17  2:03       ` Heming Zhao
  2022-08-17  8:06         ` Zdenek Kabelac
  0 siblings, 1 reply; 23+ messages in thread
From: Heming Zhao @ 2022-08-17  2:03 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-lvm, teigland, martin.wilck

On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
> Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):
> > Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
> > is triggered by pvmove.
> > 
> > On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
> > > Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
> > > > Hello maintainers & list,
> > > > 
> > > > I bring a story:
> > > > One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
> > > > decrease.
> > > > 
> > > > How to trigger:
> > > > When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
> > > > disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
> > > > cpu load. But when system connects ~10 LUNs, the performance is fine.
> > > > 
> > > > We found two work arounds:
> > > > 1. set lvm.conf 'activation/polling_interval=120'.
> > > > 2. write a speical udev rule, which make udev ignore the event for mpath devices.
> > > >      echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
> > > >       /etc/udev/rules.d/90-dm-watch.rules
> > > > 
> > > > Run above any one of two can make the performance issue disappear.
> > > > 
> > > > ** the root cause **
> > > > 
> > > > lvmpolld will do interval requeset info job for updating the pvmove status
> > > > 
> > > > On every polling_interval time, lvm2 will update vg metadata. The update job will
> > > > call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
> > > >     2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
> > > > (8 is IN_CLOSE_WRITE.)
> > > > 
> > > > These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
> > > > even if pvmove write a few data, the sys_close action trigger udev's "watch"
> > > > mechanism to gets notified frequently about a process that has written to the
> > > > device and closed it. This causes frequent, pointless re-evaluation of the udev
> > > > rules for these devices.
> > > > 
> > > > My question: Does LVM2 maintainers have any idea to fix this bug?
> > > > 
> > > > In my view, does lvm2 could drop VGs devices fds until pvmove finish?
> > > 
> > > Hi
> > > 
> > > Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
> > > trace so we can get better picture about the layout - also version of
> > > lvm2,systemd,kernel in use.
> > > 
> > > pvmove is progressing by mirroring each segment of an LV - so if there would
> > > be a lot of segments - then each such update may trigger udev watch rule
> > > event.
> > > 
> > > But ATM I could hardly imagine how this could cause some 'dramatic'
> > > performance decrease -  maybe there is something wrong with udev rules on
> > > the system ?
> > > 
> > > What is the actual impact ?
> > > 
> > > Note - pvmove was never designed as a high performance operation (in fact it
> > > tries to not eat all the disk bandwidth as such)
> > > 
> > > Regards
> > > Zdenek
> > 
> > My mistake, I write here again:
> > The subject is wrong, not IO performance but cpu high load is triggered by pvmove.
> > 
> > There is no IO performance issue.
> > 
> > When system is connecting 80~200, the cpu load increase by 15~20, the
> > cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
> > times to the cores fully utilized.
> > In another word: a single pvmove process cost 5-6 (sometime 10) cores
> > utilization. It's abnormal & unaccepted.
> > 
> > The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.
> > 
> > BTW:
> > I change this mail subject from:  lvmpolld causes IO performance issue
> > to: lvmpolld causes high cpu load issue
> > Please use this mail for later discussing.
> 
> 
> Hi
> 
> Could you please retest with recent version of lvm2. There have been
> certainly some improvements in scanning - which might have caused in the
> older releases some higher CPU usage with longer set of devices.
> 
> Regards
> 
> Zdenek

The highest lvm2 version in SUSE products is lvm2-2.03.15 - does this version
include those improvements?
Would you mind pointing out which commits are related to the improvements?
I don't have an environment that reproduces the problem, so I need a little
more detail before asking the customer to try a new version.

Thanks,
Heming


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17  2:03       ` Heming Zhao
@ 2022-08-17  8:06         ` Zdenek Kabelac
  2022-08-17  8:43           ` Heming Zhao
  0 siblings, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17  8:06 UTC (permalink / raw)
  To: Heming Zhao; +Cc: linux-lvm, teigland, martin.wilck

On 17. 08. 22 at 4:03, Heming Zhao wrote:
> On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
>> Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):
>>> Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
>>> is triggered by pvmove.
>>>
>>> On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
>>>> Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
>>>>> Hello maintainers & list,
>>>>>
>>>>> I bring a story:
>>>>> One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
>>>>> decrease.
>>>>>
>>>>> How to trigger:
>>>>> When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
>>>>> disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
>>>>> cpu load. But when system connects ~10 LUNs, the performance is fine.
>>>>>
>>>>> We found two work arounds:
>>>>> 1. set lvm.conf 'activation/polling_interval=120'.
>>>>> 2. write a speical udev rule, which make udev ignore the event for mpath devices.
>>>>>       echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
>>>>>        /etc/udev/rules.d/90-dm-watch.rules
>>>>>
>>>>> Run above any one of two can make the performance issue disappear.
>>>>>
>>>>> ** the root cause **
>>>>>
>>>>> lvmpolld will do interval requeset info job for updating the pvmove status
>>>>>
>>>>> On every polling_interval time, lvm2 will update vg metadata. The update job will
>>>>> call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
>>>>>      2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
>>>>> (8 is IN_CLOSE_WRITE.)
>>>>>
>>>>> These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
>>>>> even if pvmove write a few data, the sys_close action trigger udev's "watch"
>>>>> mechanism to gets notified frequently about a process that has written to the
>>>>> device and closed it. This causes frequent, pointless re-evaluation of the udev
>>>>> rules for these devices.
>>>>>
>>>>> My question: Does LVM2 maintainers have any idea to fix this bug?
>>>>>
>>>>> In my view, does lvm2 could drop VGs devices fds until pvmove finish?
>>>>
>>>> Hi
>>>>
>>>> Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
>>>> trace so we can get better picture about the layout - also version of
>>>> lvm2,systemd,kernel in use.
>>>>
>>>> pvmove is progressing by mirroring each segment of an LV - so if there would
>>>> be a lot of segments - then each such update may trigger udev watch rule
>>>> event.
>>>>
>>>> But ATM I could hardly imagine how this could cause some 'dramatic'
>>>> performance decrease -  maybe there is something wrong with udev rules on
>>>> the system ?
>>>>
>>>> What is the actual impact ?
>>>>
>>>> Note - pvmove was never designed as a high performance operation (in fact it
>>>> tries to not eat all the disk bandwidth as such)
>>>>
>>>> Regards
>>>> Zdenek
>>>
>>> My mistake, I write here again:
>>> The subject is wrong, not IO performance but cpu high load is triggered by pvmove.
>>>
>>> There is no IO performance issue.
>>>
>>> When system is connecting 80~200, the cpu load increase by 15~20, the
>>> cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
>>> times to the cores fully utilized.
>>> In another word: a single pvmove process cost 5-6 (sometime 10) cores
>>> utilization. It's abnormal & unaccepted.
>>>
>>> The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.
>>>
>>> BTW:
>>> I change this mail subject from:  lvmpolld causes IO performance issue
>>> to: lvmpolld causes high cpu load issue
>>> Please use this mail for later discussing.
>>
>>
>> Hi
>>
>> Could you please retest with recent version of lvm2. There have been
>> certainly some improvements in scanning - which might have caused in the
>> older releases some higher CPU usage with longer set of devices.
>>
>> Regards
>>
>> Zdenek
> 
> The highest lvm2 version in SUSE products is lvm2-2.03.15, does this
> version include the improvements change?
> Could you mind to point out which commits related with the improvements?
> I don't have the reproducible env, I need to get a little detail before
> asking customer to try new version.
> 


Please try to reproduce your customer's problem and see if the newer version
solves the issue. Otherwise we could waste hours on theoretical discussions of
what might or might not have helped with this problem. Having a reproducer is
the starting point for fixing it, if the problem is still there.

Here is one commit that may possibly affect CPU load:

d2522f4a05aa027bcc911ecb832450bc19b7fb57
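
If it helps, whether a given release already contains that commit can be
checked directly with git (a sketch; it assumes a local clone of the upstream
lvm2 tree and its usual release tag naming, e.g. v2_03_15):

   git clone https://sourceware.org/git/lvm2.git && cd lvm2
   git merge-base --is-ancestor \
       d2522f4a05aa027bcc911ecb832450bc19b7fb57 v2_03_15 \
       && echo "contained in 2.03.15" || echo "not in 2.03.15"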


Regards

Zdenek


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17  8:06         ` Zdenek Kabelac
@ 2022-08-17  8:43           ` Heming Zhao
  2022-08-17  9:46             ` Zdenek Kabelac
  0 siblings, 1 reply; 23+ messages in thread
From: Heming Zhao @ 2022-08-17  8:43 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-lvm, teigland, martin.wilck

On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:
> Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):
> > On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
> > > Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):
> > > > Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
> > > > is triggered by pvmove.
> > > > 
> > > > On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
> > > > > Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
> > > > > > Hello maintainers & list,
> > > > > > 
> > > > > > I bring a story:
> > > > > > One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
> > > > > > decrease.
> > > > > > 
> > > > > > How to trigger:
> > > > > > When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
> > > > > > disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
> > > > > > cpu load. But when system connects ~10 LUNs, the performance is fine.
> > > > > > 
> > > > > > We found two work arounds:
> > > > > > 1. set lvm.conf 'activation/polling_interval=120'.
> > > > > > 2. write a speical udev rule, which make udev ignore the event for mpath devices.
> > > > > >       echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
> > > > > >        /etc/udev/rules.d/90-dm-watch.rules
> > > > > > 
> > > > > > Run above any one of two can make the performance issue disappear.
> > > > > > 
> > > > > > ** the root cause **
> > > > > > 
> > > > > > lvmpolld will do interval requeset info job for updating the pvmove status
> > > > > > 
> > > > > > On every polling_interval time, lvm2 will update vg metadata. The update job will
> > > > > > call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
> > > > > >      2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
> > > > > > (8 is IN_CLOSE_WRITE.)
> > > > > > 
> > > > > > These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
> > > > > > even if pvmove write a few data, the sys_close action trigger udev's "watch"
> > > > > > mechanism to gets notified frequently about a process that has written to the
> > > > > > device and closed it. This causes frequent, pointless re-evaluation of the udev
> > > > > > rules for these devices.
> > > > > > 
> > > > > > My question: Does LVM2 maintainers have any idea to fix this bug?
> > > > > > 
> > > > > > In my view, does lvm2 could drop VGs devices fds until pvmove finish?
> > > > > 
> > > > > Hi
> > > > > 
> > > > > Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
> > > > > trace so we can get better picture about the layout - also version of
> > > > > lvm2,systemd,kernel in use.
> > > > > 
> > > > > pvmove is progressing by mirroring each segment of an LV - so if there would
> > > > > be a lot of segments - then each such update may trigger udev watch rule
> > > > > event.
> > > > > 
> > > > > But ATM I could hardly imagine how this could cause some 'dramatic'
> > > > > performance decrease -  maybe there is something wrong with udev rules on
> > > > > the system ?
> > > > > 
> > > > > What is the actual impact ?
> > > > > 
> > > > > Note - pvmove was never designed as a high performance operation (in fact it
> > > > > tries to not eat all the disk bandwidth as such)
> > > > > 
> > > > > Regards
> > > > > Zdenek
> > > > 
> > > > My mistake, I write here again:
> > > > The subject is wrong, not IO performance but cpu high load is triggered by pvmove.
> > > > 
> > > > There is no IO performance issue.
> > > > 
> > > > When system is connecting 80~200, the cpu load increase by 15~20, the
> > > > cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
> > > > times to the cores fully utilized.
> > > > In another word: a single pvmove process cost 5-6 (sometime 10) cores
> > > > utilization. It's abnormal & unaccepted.
> > > > 
> > > > The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.
> > > > 
> > > > BTW:
> > > > I change this mail subject from:  lvmpolld causes IO performance issue
> > > > to: lvmpolld causes high cpu load issue
> > > > Please use this mail for later discussing.
> > > 
> > > 
> > > Hi
> > > 
> > > Could you please retest with recent version of lvm2. There have been
> > > certainly some improvements in scanning - which might have caused in the
> > > older releases some higher CPU usage with longer set of devices.
> > > 
> > > Regards
> > > 
> > > Zdenek
> > 
> > The highest lvm2 version in SUSE products is lvm2-2.03.15, does this
> > version include the improvements change?
> > Could you mind to point out which commits related with the improvements?
> > I don't have the reproducible env, I need to get a little detail before
> > asking customer to try new version.
> > 
> 
> 
> Please try to reproduce your customer's problem and see if the newer version
> solves the issue.   Otherwise we could waste hours on theoretical
> discussions what might or might not have helped with this problem. Having a
> reproducer is a starting point for fixing it, if the problem is still there.
> 
> Here is one commit that may possibly affect CPU load:
> 
> d2522f4a05aa027bcc911ecb832450bc19b7fb57
> 
> 
> Regards
> 
> Zdenek

I gave a brief explanation of the root cause in my previous mail, and
workaround <2> also matches my analysis.

The machine is connected to lots of LUNs. A pvmove of a single disk triggers
lvm2 to update all the underlying mpath devices (80~200). I guess the update
job is vg_commit(), which writes the latest metadata, and the metadata lives
on all PVs. The update job finishes with close(2), which triggers an
IN_CLOSE_WRITE udevd event on hundreds of devices. Every IN_CLOSE_WRITE
triggers the multipath udev rules (11-dm-mpath.rules) to start scanning
devices. So in the real world this floods the system with hundreds of
multipath processes and the cpu load becomes high.
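
A rough way to quantify that flood while the pvmove is running (a sketch; the
capture duration and log path are arbitrary):

   # Capture udev events for 60s during the pvmove:
   udevadm monitor --udev --subsystem-match=block > /tmp/udev-events.log &
   MONPID=$!
   sleep 60; kill $MONPID
   grep -c ' change ' /tmp/udev-events.log
   # Rough count of concurrently running multipath helpers, sampled each second:
   for i in $(seq 1 10); do ps -eo comm | grep -c '^multipath$'; sleep 1; done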

Thanks,
Heming


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17  8:43           ` Heming Zhao
@ 2022-08-17  9:46             ` Zdenek Kabelac
  2022-08-17 10:47               ` Heming Zhao
  0 siblings, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17  9:46 UTC (permalink / raw)
  To: Heming Zhao; +Cc: linux-lvm, teigland, martin.wilck

On 17. 08. 22 at 10:43, Heming Zhao wrote:
> On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:
>> Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):
>>> On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
>>>> Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):
>>>>> Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
>>>>> is triggered by pvmove.
>>>>>
>>>>> On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
>>>>>> Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
>>>>>>> Hello maintainers & list,
>>>>>>>
>>>>>>> I bring a story:
>>>>>>> One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
>>>>>>> decrease.
>>>>>>>
>>>>>>> How to trigger:
>>>>>>> When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
>>>>>>> disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
>>>>>>> cpu load. But when system connects ~10 LUNs, the performance is fine.
>>>>>>>
>>>>>>> We found two work arounds:
>>>>>>> 1. set lvm.conf 'activation/polling_interval=120'.
>>>>>>> 2. write a speical udev rule, which make udev ignore the event for mpath devices.
>>>>>>>        echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
>>>>>>>         /etc/udev/rules.d/90-dm-watch.rules
>>>>>>>
>>>>>>> Run above any one of two can make the performance issue disappear.
>>>>>>>
>>>>>>> ** the root cause **
>>>>>>>
>>>>>>> lvmpolld will do interval requeset info job for updating the pvmove status
>>>>>>>
>>>>>>> On every polling_interval time, lvm2 will update vg metadata. The update job will
>>>>>>> call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
>>>>>>>       2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
>>>>>>> (8 is IN_CLOSE_WRITE.)
>>>>>>>
>>>>>>> These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
>>>>>>> even if pvmove write a few data, the sys_close action trigger udev's "watch"
>>>>>>> mechanism to gets notified frequently about a process that has written to the
>>>>>>> device and closed it. This causes frequent, pointless re-evaluation of the udev
>>>>>>> rules for these devices.
>>>>>>>
>>>>>>> My question: Does LVM2 maintainers have any idea to fix this bug?
>>>>>>>
>>>>>>> In my view, does lvm2 could drop VGs devices fds until pvmove finish?
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
>>>>>> trace so we can get better picture about the layout - also version of
>>>>>> lvm2,systemd,kernel in use.
>>>>>>
>>>>>> pvmove is progressing by mirroring each segment of an LV - so if there would
>>>>>> be a lot of segments - then each such update may trigger udev watch rule
>>>>>> event.
>>>>>>
>>>>>> But ATM I could hardly imagine how this could cause some 'dramatic'
>>>>>> performance decrease -  maybe there is something wrong with udev rules on
>>>>>> the system ?
>>>>>>
>>>>>> What is the actual impact ?
>>>>>>
>>>>>> Note - pvmove was never designed as a high performance operation (in fact it
>>>>>> tries to not eat all the disk bandwidth as such)
>>>>>>
>>>>>> Regards
>>>>>> Zdenek
>>>>>
>>>>> My mistake, I write here again:
>>>>> The subject is wrong, not IO performance but cpu high load is triggered by pvmove.
>>>>>
>>>>> There is no IO performance issue.
>>>>>
>>>>> When system is connecting 80~200, the cpu load increase by 15~20, the
>>>>> cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
>>>>> times to the cores fully utilized.
>>>>> In another word: a single pvmove process cost 5-6 (sometime 10) cores
>>>>> utilization. It's abnormal & unaccepted.
>>>>>
>>>>> The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.
>>>>>
>>>>> BTW:
>>>>> I change this mail subject from:  lvmpolld causes IO performance issue
>>>>> to: lvmpolld causes high cpu load issue
>>>>> Please use this mail for later discussing.
>>>>
>>>>
>>>> Hi
>>>>
>>>> Could you please retest with recent version of lvm2. There have been
>>>> certainly some improvements in scanning - which might have caused in the
>>>> older releases some higher CPU usage with longer set of devices.
>>>>
>>>> Regards
>>>>
>>>> Zdenek
>>>
>>> The highest lvm2 version in SUSE products is lvm2-2.03.15, does this
>>> version include the improvements change?
>>> Could you mind to point out which commits related with the improvements?
>>> I don't have the reproducible env, I need to get a little detail before
>>> asking customer to try new version.
>>>
>>
>>
>> Please try to reproduce your customer's problem and see if the newer version
>> solves the issue.   Otherwise we could waste hours on theoretical
>> discussions what might or might not have helped with this problem. Having a
>> reproducer is a starting point for fixing it, if the problem is still there.
>>
>> Here is one commit that may possibly affect CPU load:
>>
>> d2522f4a05aa027bcc911ecb832450bc19b7fb57
>>
>>
>> Regards
>>
>> Zdenek
> 
> I gave a little bit explain for the root cause in previous mail, And the
> work around <2> also matchs my analysis.
> 
> The machine connects lots of LUNs. pvmove one disk will trigger lvm2
> update all underlying mpath devices (80~200). I guess the update job is
> vg_commit() which updates latest metadata info, and the metadata locates in
> all PVs. The update job finished with close(2) which trigger hundreds
> devices udevd IN_CLOSE_WRITE event. every IN_CLOSE_WRITE will trigger
> mpathd udev rules (11-dm-mpath.rules) to start scanning devices. So the
> real world will flooding hundreds of multipath processes, the cpus load
> become high.


Your 'guess explanation' is not as useful as you might think - we do not know
the layout of the lvm2 metadata, how many disks are involved in the operation,
the number of segments and many other things (in RHEL we have 'sosreport' to
harvest all the needed info).

ATM I'm not even sure whether you are complaining about high CPU usage of
lvmpolld or just about huge udev rule processing overhead.

If you have too many disks in the VG (again, it is unclear how many of them
are paths and how many are distinct PVs) - the user may *significantly* reduce
the burden associated with metadata updating by reducing the number of
'actively' maintained metadata areas in the VG. I.e. if you have 100 PVs in
the VG, you may keep metadata on only 5-10 PVs and still have 'enough'
duplicate copies of the lvm2 metadata within the VG (vgchange
--metadatacopies X) - clearly it depends on the use case and how many PVs are
added/removed from the VG over its lifetime....
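
A hedged sketch of what that could look like (the numbers are illustrative;
see vgchange(8)/pvchange(8) for the exact semantics):

   # Keep lvm2 metadata on only ~8 of the PVs in the VG:
   vgchange --metadatacopies 8 <vgname>
   # Or mark individual PVs so their metadata areas are ignored:
   pvchange --metadataignore y /dev/mapper/<some_mpath_pv>
   # Check the result:
   vgs -o vg_name,pv_count,vg_mda_count,vg_mda_used_count,vg_mda_copies <vgname>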

There are IMHO still too many variations to guess from - so if you can't
reveal more physical info about the customer case, it is easier to create the
closest possible reproducer. The lvm2 test suite has a lot of power to emulate
most system setup combinations (it's easy to put 100 fake PVs there and
prepare a metadata set similar to the customer's one) - once we have a local
reproducer it's easier to look for a solution.


Zdenek


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17  9:46             ` Zdenek Kabelac
@ 2022-08-17 10:47               ` Heming Zhao
  2022-08-17 11:13                 ` Zdenek Kabelac
  2022-08-17 12:39                 ` Martin Wilck
  0 siblings, 2 replies; 23+ messages in thread
From: Heming Zhao @ 2022-08-17 10:47 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-lvm, teigland, martin.wilck

On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:
> Dne 17. 08. 22 v 10:43 Heming Zhao napsal(a):
> > On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:
> > > Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):
> > > > On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
> > > > > Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):
> > > > > > Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
> > > > > > is triggered by pvmove.
> > > > > > 
> > > > > > On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
> > > > > > > Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
> > > > > > > > Hello maintainers & list,
> > > > > > > > 
> > > > > > > > I bring a story:
> > > > > > > > One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
> > > > > > > > decrease.
> > > > > > > > 
> > > > > > > > How to trigger:
> > > > > > > > When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
> > > > > > > > disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
> > > > > > > > cpu load. But when system connects ~10 LUNs, the performance is fine.
> > > > > > > > 
> > > > > > > > We found two work arounds:
> > > > > > > > 1. set lvm.conf 'activation/polling_interval=120'.
> > > > > > > > 2. write a speical udev rule, which make udev ignore the event for mpath devices.
> > > > > > > >        echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
> > > > > > > >         /etc/udev/rules.d/90-dm-watch.rules
> > > > > > > > 
> > > > > > > > Run above any one of two can make the performance issue disappear.
> > > > > > > > 
> > > > > > > > ** the root cause **
> > > > > > > > 
> > > > > > > > lvmpolld will do interval requeset info job for updating the pvmove status
> > > > > > > > 
> > > > > > > > On every polling_interval time, lvm2 will update vg metadata. The update job will
> > > > > > > > call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
> > > > > > > >       2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
> > > > > > > > (8 is IN_CLOSE_WRITE.)
> > > > > > > > 
> > > > > > > > These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
> > > > > > > > even if pvmove write a few data, the sys_close action trigger udev's "watch"
> > > > > > > > mechanism to gets notified frequently about a process that has written to the
> > > > > > > > device and closed it. This causes frequent, pointless re-evaluation of the udev
> > > > > > > > rules for these devices.
> > > > > > > > 
> > > > > > > > My question: Does LVM2 maintainers have any idea to fix this bug?
> > > > > > > > 
> > > > > > > > In my view, does lvm2 could drop VGs devices fds until pvmove finish?
> > > > > > > 
> > > > > > > Hi
> > > > > > > 
> > > > > > > Please provide more info about lvm2  metadata and also some  'lvs -avvvvv'
> > > > > > > trace so we can get better picture about the layout - also version of
> > > > > > > lvm2,systemd,kernel in use.
> > > > > > > 
> > > > > > > pvmove is progressing by mirroring each segment of an LV - so if there would
> > > > > > > be a lot of segments - then each such update may trigger udev watch rule
> > > > > > > event.
> > > > > > > 
> > > > > > > But ATM I could hardly imagine how this could cause some 'dramatic'
> > > > > > > performance decrease -  maybe there is something wrong with udev rules on
> > > > > > > the system ?
> > > > > > > 
> > > > > > > What is the actual impact ?
> > > > > > > 
> > > > > > > Note - pvmove was never designed as a high performance operation (in fact it
> > > > > > > tries to not eat all the disk bandwidth as such)
> > > > > > > 
> > > > > > > Regards
> > > > > > > Zdenek
> > > > > > 
> > > > > > My mistake, I write here again:
> > > > > > The subject is wrong, not IO performance but cpu high load is triggered by pvmove.
> > > > > > 
> > > > > > There is no IO performance issue.
> > > > > > 
> > > > > > When system is connecting 80~200, the cpu load increase by 15~20, the
> > > > > > cpu usage by ~20%, which corresponds to about ~5,6 cores and led at
> > > > > > times to the cores fully utilized.
> > > > > > In another word: a single pvmove process cost 5-6 (sometime 10) cores
> > > > > > utilization. It's abnormal & unaccepted.
> > > > > > 
> > > > > > The lvm2 is 2.03.05, kernel is 5.3. systemd is v246.
> > > > > > 
> > > > > > BTW:
> > > > > > I change this mail subject from:  lvmpolld causes IO performance issue
> > > > > > to: lvmpolld causes high cpu load issue
> > > > > > Please use this mail for later discussing.
> > > > > 
> > > > > 
> > > > > Hi
> > > > > 
> > > > > Could you please retest with recent version of lvm2. There have been
> > > > > certainly some improvements in scanning - which might have caused in the
> > > > > older releases some higher CPU usage with longer set of devices.
> > > > > 
> > > > > Regards
> > > > > 
> > > > > Zdenek
> > > > 
> > > > The highest lvm2 version in SUSE products is lvm2-2.03.15, does this
> > > > version include the improvements change?
> > > > Could you mind to point out which commits related with the improvements?
> > > > I don't have the reproducible env, I need to get a little detail before
> > > > asking customer to try new version.
> > > > 
> > > 
> > > 
> > > Please try to reproduce your customer's problem and see if the newer version
> > > solves the issue.   Otherwise we could waste hours on theoretical
> > > discussions what might or might not have helped with this problem. Having a
> > > reproducer is a starting point for fixing it, if the problem is still there.
> > > 
> > > Here is one commit that may possibly affect CPU load:
> > > 
> > > d2522f4a05aa027bcc911ecb832450bc19b7fb57
> > > 
> > > 
> > > Regards
> > > 
> > > Zdenek
> > 
> > I gave a little bit explain for the root cause in previous mail, And the
> > work around <2> also matchs my analysis.
> > 
> > The machine connects lots of LUNs. pvmove one disk will trigger lvm2
> > update all underlying mpath devices (80~200). I guess the update job is
> > vg_commit() which updates latest metadata info, and the metadata locates in
> > all PVs. The update job finished with close(2) which trigger hundreds
> > devices udevd IN_CLOSE_WRITE event. every IN_CLOSE_WRITE will trigger
> > mpathd udev rules (11-dm-mpath.rules) to start scanning devices. So the
> > real world will flooding hundreds of multipath processes, the cpus load
> > become high.
> 
> 
> Your 'guess explanation' is not as useful as you might think - as we do not
> know the layout of lvm2 metadata, how many disks are involved into the
> operation, number of segments  and many other things (in RHEL we have
> 'sosreport' to harvest all the needed info).

SUSE has similar data collection software. But we can't share any customer
info publicly unless the customer agrees to it.
Feel free to ask about whatever you are interested in; I will mask the private
info before sharing.

The machine is connected to more than 250 disks. The VG has 103 PVs & 79 LVs.

# /sbin/vgs
  VG           #PV #LV #SN Attr   VSize   VFree
  <vgname>     103  79   0 wz--n-  52t    17t

It is not a clustered environment; all the operations are local.
Each PV is an mpath device with at least two legs; some PVs have three legs.
The LVs are plain dm-linear.
The operation: pvmove one PV to another PV in the same VG.
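
In case it is useful, the layout details can be collected with the standard
reporting commands (a sketch; the output can be masked before sharing):

   vgs -o vg_name,pv_count,lv_count,vg_mda_count,vg_mda_copies <vgname>
   pvs -o pv_name,vg_name,pv_mda_count,pv_mda_used_count
   lvs --segments -o lv_name,segtype,seg_size,devices <vgname>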

> 
> ATM I'm not even sure if you are complaining about how CPU usage of lvmpolld
> or just huge udev rules processing overhead.

The load is generated by multipath; the IN_CLOSE_WRITE events caused by
lvmpolld are the trigger.

> 
> If you have too many disks in VG  (again unclear how many there are paths
> and how many distinct PVs) - user may *significantly* reduce burden
> associated with metadata updating by reducing number of 'actively'
> maintained metadata areas in VG - so i.e. if you have 100PVs in VG - you may
> keep metadata only on 5-10 PVs to have 'enough' duplicate copies of lvm2
> metadata within VG (vgchange --metadaatacopies X) - clearly it depends on
> the use case and how many PVs are added/removed from a VG over the
> lifetime....

Thanks for the important info. I also found the related VG config in
/etc/lvm/backup/<vgname>; this file shows 'metadata_copies = 0'.

This could be another solution. But why doesn't lvm2 take this behaviour by
default, or at least print a notification when the PV count goes beyond some
threshold while the user executes pvs/vgs/lvs or pvmove?
There are too many magic switches, and users don't know how to adjust them for
better performance.

> 
> There are IMHO still too many variations to guess from - so it's easier to
> create the most similar reproducer to your customer case if you can't reveal
> more physical info about it  and  lvm2 test suite has lot of power to
> emulate most of your system setup combination (it's easy to put there 100
> fake PVs and prepare metadata set similar to customer's one - once we will
> have a local reproducer it's easier to seek for solution.
> 

Sorry from my side; I can share any info you are interested in, as long as the
customer data is kept safe.

I'm busy with many bugs and still can't find a time slot to set up an
environment. Since this performance issue is related to mpath, I can't find an
easy way to set one up. (I suspect the issue may also be triggered by setting
up ~300 fake PVs without mpath and then running the pvmove command; see the
sketch below.)
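
For what it's worth, a minimal reproduction sketch without multipath could
look roughly like this (counts, sizes and paths are arbitrary placeholders):

   # Create ~100 fake PVs backed by sparse files:
   for i in $(seq 1 100); do
       truncate -s 1G /var/tmp/pv$i.img
       losetup -f /var/tmp/pv$i.img
   done
   PVS=$(losetup -ln -O NAME,BACK-FILE | awk '/\/var\/tmp\/pv/ {print $1}')
   vgcreate vgtest $PVS
   # Create enough LVs that the metadata is non-trivial:
   for i in $(seq 1 70); do lvcreate -y -L 64m -n lv$i vgtest; done
   # Move everything off the first PV; watch cpu load and udev events meanwhile:
   pvmove $(echo $PVS | awk '{print $1}')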

- Heming


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 10:47               ` Heming Zhao
@ 2022-08-17 11:13                 ` Zdenek Kabelac
  2022-08-17 12:39                 ` Martin Wilck
  1 sibling, 0 replies; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17 11:13 UTC (permalink / raw)
  To: Heming Zhao; +Cc: linux-lvm, teigland, martin.wilck

On 17. 08. 22 at 12:47, Heming Zhao wrote:
> On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:
>> Dne 17. 08. 22 v 10:43 Heming Zhao napsal(a):
>>> On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:
>>>> Dne 17. 08. 22 v 4:03 Heming Zhao napsal(a):
>>>>> On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
>>>>>> Dne 16. 08. 22 v 12:08 Heming Zhao napsal(a):
>>>>>>> Ooh, very sorry, the subject is wrong, not IO performance but cpu high load
>>>>>>> is triggered by pvmove.
>>>>>>>
> The machine connecting disks are more than 250. The VG has 103 PVs & 79 LVs.
> 
> # /sbin/vgs
>    VG           #PV #LV #SN Attr   VSize   VFree
>    <vgname>     103  79   0 wz--n-  52t    17t

Ok - so the main issue could be too many PVs combined with the relatively high
latency of mpath devices (all of which could actually be simulated easily in
the lvm2 test suite).

> The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE action
> which is the trigger.
> 

I'll check whether lvmpolld is using correct locking while checking the
operational state - you may possibly extend the polling interval (although
that's where the mentioned patchset has enhanced a couple of things).


>>
>> If you have too many disks in VG  (again unclear how many there are paths
>> and how many distinct PVs) - user may *significantly* reduce burden
>> associated with metadata updating by reducing number of 'actively'
>> maintained metadata areas in VG - so i.e. if you have 100PVs in VG - you may
>> keep metadata only on 5-10 PVs to have 'enough' duplicate copies of lvm2
>> metadata within VG (vgchange --metadaatacopies X) - clearly it depends on
>> the use case and how many PVs are added/removed from a VG over the
>> lifetime....
> 
> Thanks for the important info. I also found the related VG config from
> /etc/lvm/backup/<vgname>, this file shows 'metadata_copies = 0'.
> 
> This should be another solution. But why not lvm2 takes this behavior by
> default, or give a notification when pv number beyond a threshold when user
> executing pvs/vgs/lvs or pvmove.
> There are too many magic switch, users don't know how to adjust them for
> better performance.

The problem is always the same - selecting the right 'default' :) What suits
user A is sometimes a 'no go' for user B. So ATM it's more 'secure/safe' to
keep metadata on each PV - then, when a PV is discovered, it's known what the
VG using that PV looks like. When only a fraction of the PVs carry the info,
the VG is far more fragile with respect to damage when disks are lost, i.e.
there is no 'smart' mechanism to pick disks in different racks....

So this option is there for administrators who are 'clever' enough to deal
with the new set of problems it may create for them.

Yes - lvm2 has a lot of options - but that's usually necessary when we want to
be able to provide an optimal solution for a really wide variety of setups -
so I think spending a couple of minutes reading the man pages pays off,
especially if you had to spend 'days' building your disk racks ;)

And yes, we may add a few more hints - but then the 'second' group of users
('skilled admins') asks us why we print so many dumb messages every time they
do some simple operation :)

> I'm busy with many bugs, still can't find a time slot to set up a env.
> For this performance issue, it relates with mpath, I can't find a easy
> way to set up a env. (I suspect it may trigger this issue by setting up
> 300 fake PVs without mpath, then do pvmove cmd.)


'Fragmented' LVs with small segment sizes may significantly raise the number
of metadata updates needed during a pvmove operation, as each individual LV
segment is mirrored by its own mirror.
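
The fragmentation can be inspected before the pvmove, e.g. (a sketch; field
names per lvs(8)):

   # LVs consisting of many small segments are the expensive case for pvmove:
   lvs --segments -o lv_name,seg_start_pe,seg_size_pe,devices <vgname>
   lvs -o lv_name,seg_count --sort -seg_count <vgname> | head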


Zdenek


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 10:47               ` Heming Zhao
  2022-08-17 11:13                 ` Zdenek Kabelac
@ 2022-08-17 12:39                 ` Martin Wilck
  2022-08-17 12:54                   ` Zdenek Kabelac
  2022-08-18 21:13                   ` Martin Wilck
  1 sibling, 2 replies; 23+ messages in thread
From: Martin Wilck @ 2022-08-17 12:39 UTC (permalink / raw)
  To: Heming Zhao, zdenek.kabelac; +Cc: teigland, linux-lvm

On Wed, 2022-08-17 at 18:47 +0800, Heming Zhao wrote:
> On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:
> 
> 
> > 
> > ATM I'm not even sure if you are complaining about how CPU usage of
> > lvmpolld
> > or just huge udev rules processing overhead.
> 
> The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE
> action
> which is the trigger.

Let's be clear here: every close-after-write operation triggers udev's
"watch" mechanism for block devices, which causes the udev rules to be
executed for the device. That is not a cheap operation. In the case at
hand, the customer was observing a lot of "multipath -U" commands. So
apparently a significant part of the udev rule processing was spent in
"multipath -U". Running "multipath -U" is important, because the rule
could have been triggered by a change of the number of available paths
devices, and later commands run from udev rules might hang indefinitely
if the multipath device had no usable paths any more. "multipath -U" is
already quite well optimized, but it needs to do some I/O to complete
it's work, thus it takes a few milliseconds to run.

IOW, it would be misleading to point at multipath. close-after-write
operations on block devices should be avoided if possible. As you
probably know, the purpose of udev's "watch" operation is to be able to
determine changes on layered devices, e.g. newly created LVs or the
like. "pvmove" is special, because by definition it will usually not
cause any changes in higher layers. Therefore it might make sense to
disable the udev watch on the affected PVs while pvmove is running, and
trigger a single change event (re-enabling the watch) after the pvmove
has finished. If that is impossible, lvmpolld and other lvm tools that
are involved in the pvmove operation should avoid calling close() on
the PVs, IOW keep the fds open until the operation is finished.
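
To illustrate the idea (this is not an official LVM mechanism, just a manual
sketch of the above using existing udev tooling; the rule file name and device
names are placeholders):

   # Suppress the watch on mpath devices for the duration of the pvmove:
   cat > /etc/udev/rules.d/89-pvmove-nowatch.rules <<'EOF'
   ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"
   EOF
   udevadm control --reload
   pvmove /dev/mapper/lun_a /dev/mapper/lun_b
   # Re-enable the watch and re-run the rules once afterwards:
   rm /etc/udev/rules.d/89-pvmove-nowatch.rules
   udevadm control --reload
   udevadm trigger --action=change --subsystem-match=block --sysname-match='dm-*'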

Regards
Martin


* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 12:39                 ` Martin Wilck
@ 2022-08-17 12:54                   ` Zdenek Kabelac
  2022-08-17 13:41                     ` Martin Wilck
  2022-08-18 21:13                   ` Martin Wilck
  1 sibling, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17 12:54 UTC (permalink / raw)
  To: Martin Wilck, Heming Zhao; +Cc: teigland, linux-lvm

On 17. 08. 22 at 14:39, Martin Wilck wrote:
> On Wed, 2022-08-17 at 18:47 +0800, Heming Zhao wrote:
>> On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:
>>
>>
>>>
>>> ATM I'm not even sure if you are complaining about how CPU usage of
>>> lvmpolld
>>> or just huge udev rules processing overhead.
>>
>> The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE
>> action
>> which is the trigger.
> 
> Let's be clear here: every close-after-write operation triggers udev's
> "watch" mechanism for block devices, which causes the udev rules to be
> executed for the device. That is not a cheap operation. In the case at
> hand, the customer was observing a lot of "multipath -U" commands. So
> apparently a significant part of the udev rule processing was spent in
> "multipath -U". Running "multipath -U" is important, because the rule
> could have been triggered by a change of the number of available paths
> devices, and later commands run from udev rules might hang indefinitely
> if the multipath device had no usable paths any more. "multipath -U" is
> already quite well optimized, but it needs to do some I/O to complete
> it's work, thus it takes a few milliseconds to run.
> 
> IOW, it would be misleading to point at multipath. close-after-write
> operations on block devices should be avoided if possible. As you
> probably know, the purpose udev's "watch" operation is to be able to
> determine changes on layered devices, e.g. newly created LVs or the
> like. "pvmove" is special, because by definition it will usually not
> cause any changes in higher layers. Therefore it might make sense to
> disable the udev watch on the affected PVs while pvmove is running, and
> trigger a single change event (re-enabling the watch) after the pvmove
> has finished. If that is impossible, lvmpolld and other lvm tools that
> are involved in the pvmove operation should avoid calling close() on
> the PVs, IOW keep the fds open until the operation is finished.

Hi

Let's make it clear that we are very well aware of all the constraints associated
with the udev rule logic (and we tried quite hard to minimize the impact -
however, the udevd developers kind of 'misunderstood' how badly they would be
impacting the system's performance with the existing watch rule logic - and the
story kind of 'continues' with 'systemd's' & dBus services, unfortunately...).

However, let's focus on 'pvmove', as it is a potentially very lengthy operation -
so it's not feasible to keep the VG locked/blocked across an operation which
might take even days with slower storage and big moved sizes (a write
access/lock disables all readers...).

So lvm2 does try to minimize the locking time. We will re-validate whether only
the necessary 'vg updating' operations are using 'write' access - since
occasionally, due to some unrelated code changes, it might result in an unwanted
'write' VG open - but we can't keep the operation blocking a whole VG because of
slow udev rule processing.

In normal circumstances a udev rule should be processed very fast - unless
something mis-designed is causing CPU overload.

But as mentioned already a few times - without more knowledge about the case we
can hardly guess the exact reasoning. We already provided a useful suggestion on
how to reduce the number of devices 'processed' by udev, namely reducing the
number of 'lvm2 metadata PVs' - another big reason for frequent metadata updates
would be heavy segmentation of the LV - but this we will not know without seeing
the user's VG 'metadata' in this case...
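
(For illustration only - the VG/PV names below are invented - the metadata PV
count can be inspected and reduced with standard lvm2 commands, e.g.:

    # show how many metadata areas each PV carries / uses
    pvs -o pv_name,vg_name,pv_mda_count,pv_mda_used_count
    # stop keeping a metadata copy on this particular PV
    pvchange --metadataignore y /dev/mapper/mpathc
)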


Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 12:54                   ` Zdenek Kabelac
@ 2022-08-17 13:41                     ` Martin Wilck
  2022-08-17 15:11                       ` David Teigland
  2022-08-17 15:26                       ` Zdenek Kabelac
  0 siblings, 2 replies; 23+ messages in thread
From: Martin Wilck @ 2022-08-17 13:41 UTC (permalink / raw)
  To: Heming Zhao, zdenek.kabelac; +Cc: teigland, linux-lvm

On Wed, 2022-08-17 at 14:54 +0200, Zdenek Kabelac wrote:
> Dne 17. 08. 22 v 14:39 Martin Wilck napsal(a):
> 
> 
> Let's make clear we are very well aware of all the constrains
> associated with 
> udev rule logic  (and we tried quite hard to minimize impact -
> however udevd 
> developers kind of 'misunderstood'  how badly they will be impacting
> system's 
> performance with the existing watch rule logic - and the story kind
> of 
> 'continues' with  'systemd's' & dBus services unfortunatelly...

I dimly remember you dislike udev ;-)

I like the general idea of the udev watch. It is the magic that causes
newly created partitions to appear in the system automatically, which is
very convenient for users and wouldn't work otherwise. I can see that
it might be inappropriate for LVM PVs. We can discuss changing the
rules such that the watch is disabled for LVM devices (both PV and LV).
I don't claim to foresee all possible side effects, but it might be
worth a try. It would mean that newly created LVs, LV size changes etc.
would not be visible in the system immediately. I suppose you could
work around that in the LVM tools by triggering change events after
operations like lvcreate.

> However let's focus on 'pvmove' as it is potentially very lengthy
> operation - 
> so it's not feasible to keep the  VG locked/blocked  across an
> operation which 
> might take even days with slower storage and big moved sizes (write 
> access/lock disables all readers...)

So these close-after-write operations are caused by locking/unlocking
the PVs?

Note: We were observing that watch events were triggered every 30s, for
every PV, simultaneously. (@Heming correct me if I'm wrong here).
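
(If it helps, that pattern can be re-checked on a test system with a generic
observation command like

    # print udev events for block devices as they are processed
    udevadm monitor --udev --property --subsystem-match=block

which is not specific to this setup, just a way to see which devices get
"change" events and how often.)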

> So the lvm2 does try to minimize locking time. We will re validate
> whether 
> just necessary  'vg updating' operation are using 'write' access -
> since 
> occasionally due to some unrelated code changes it might eventually
> result 
> sometimes in unwanted 'write' VG open - but we can't keep the
> operation 
> blocking  a whole VG because of slow udev rule processing.

> In normal circumstances udev rule should be processed very fast -
> unless there 
> is something mis-designe causing a CPU overloading.
> 

IIRC there is no evidence that the udev rules are really processed
"slowly". udev isn't efficient; a run time on the order of 10 ms is
expected for a worker. We tried different tracing approaches, but we
never saw "multipath -U" hanging on a lock or a resource shortage. It
seems to be the sheer amount of events and processes that is causing
trouble. The customer had a very lengthy "multipath.conf" file (~50k
lines), which needs to be parsed by every new multipath instance; that
was slowing things down somewhat. Still, the runtime of "multipath -U"
would be no more than 100 ms, AFAICT.

Martin

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 13:41                     ` Martin Wilck
@ 2022-08-17 15:11                       ` David Teigland
  2022-08-18  8:06                         ` Martin Wilck
  2022-08-17 15:26                       ` Zdenek Kabelac
  1 sibling, 1 reply; 23+ messages in thread
From: David Teigland @ 2022-08-17 15:11 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zdenek.kabelac, linux-lvm, Heming Zhao

On Wed, Aug 17, 2022 at 01:41:17PM +0000, Martin Wilck wrote:
> I like the general idea of the udev watch. It is the magic that causes
> newly created partitions to magically appear in the system, which is
> very convenient for users and wouldn't work otherwise. I can see that
> it might be inappropriate for LVM PVs. We can discuss changing the
> rules such that the watch is disabled for LVM devices (both PV and LV).
> I don't claim to overlook all possible side effects, but it might be
> worth a try. It would mean that newly created LVs, LV size changes etc.
> would not be visible in the system immediately. I suppose you could
> work around that in the LVM tools by triggering change events after
> operations like lvcreate.

I think it's worth looking into at least.  udev causes most of our major
problems, and causes things to fall apart everywhere at scale.


> Note: We were observing that watch events were triggered every 30s, for
> every PV, simultaneously.

Can you see what parts of the VG metadata are changing each time?  If it's
just metadata related to pvmove progress, then things are probably working
as designed, and we'd just be looking for optimizations (perhaps by some
code changes, or by reducing copies of metadata as suggested by Zdenek.)
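
(A simple way to check - the file paths and VG name below are just examples -
is to dump the metadata text a couple of polling intervals apart and diff the
dumps:

    # snapshot the current metadata of VG 'vg0' into a text file
    vgcfgbackup -f /tmp/vg0-meta-1.txt vg0
    # ...wait for the next pvmove poll, take a second snapshot, then compare
    vgcfgbackup -f /tmp/vg0-meta-2.txt vg0
    diff -u /tmp/vg0-meta-1.txt /tmp/vg0-meta-2.txt
)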

Dave
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 13:41                     ` Martin Wilck
  2022-08-17 15:11                       ` David Teigland
@ 2022-08-17 15:26                       ` Zdenek Kabelac
  2022-08-17 15:58                         ` Demi Marie Obenour
  2022-08-17 17:35                         ` Gionatan Danti
  1 sibling, 2 replies; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17 15:26 UTC (permalink / raw)
  To: Martin Wilck, Heming Zhao; +Cc: teigland, linux-lvm

Dne 17. 08. 22 v 15:41 Martin Wilck napsal(a):
> On Wed, 2022-08-17 at 14:54 +0200, Zdenek Kabelac wrote:
>> Dne 17. 08. 22 v 14:39 Martin Wilck napsal(a):
>>
>>
>> Let's make clear we are very well aware of all the constrains
>> associated with
>> udev rule logic  (and we tried quite hard to minimize impact -
>> however udevd
>> developers kind of 'misunderstood'  how badly they will be impacting
>> system's
>> performance with the existing watch rule logic - and the story kind
>> of
>> 'continues' with  'systemd's' & dBus services unfortunatelly...
> 
> I dimly remember you dislike udev ;-)

Well, it's not 'a dislike' from my side - the architecture is simply lacking in
many areas...

Dave is a complete disliker of udev & systemd altogether :)....


> 
> I like the general idea of the udev watch. It is the magic that causes
> newly created partitions to magically appear in the system, which is

The tragedy of the design comes from the plain fact that there are only 'very
occasional' consumers of all this 'collected' data - but gathering all the
info and keeping all of it 'up-to-date' gets very, very expensive and can
basically 'neutralize' a lot of your CPU if you have too many resources to
watch and keep updated.....


> very convenient for users and wouldn't work otherwise. I can see that
> it might be inappropriate for LVM PVs. We can discuss changing the
> rules such that the watch is disabled for LVM devices (both PV and LV).

It's really not fixable as it is - because of the complete lack of 'error'
handling for devices in the udev DB (i.e. duplicate devices...., various frozen
devices...).

There is an ongoing 'SID' project - that might push the logic somewhat further,
but the existing 'device' support logic as it is today is an unfortunate 'trace'
of how the design should not have been made - and since all the 'original'
programmers left the project a long time ago, it's non-trivial to push things
forward.

> I don't claim to overlook all possible side effects, but it might be
> worth a try. It would mean that newly created LVs, LV size changes etc.
> would not be visible in the system immediately. I suppose you could
> work around that in the LVM tools by triggering change events after
> operations like lvcreate.

We just hope SID will make some progress (although probably a small one at
the beginning).


>> However let's focus on 'pvmove' as it is potentially very lengthy
>> operation -
>> so it's not feasible to keep the  VG locked/blocked  across an
>> operation which
>> might take even days with slower storage and big moved sizes (write
>> access/lock disables all readers...)
> 
> So these close-after-write operations are caused by locking/unlocking
> the PVs?
> 
> Note: We were observing that watch events were triggered every 30s, for
> every PV, simultaneously. (@Heming correct me if I'mn wrong here).

That's why we would like to see the 'metadata' and also check whether the issue
still appears with the latest version of lvm2.


Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 15:26                       ` Zdenek Kabelac
@ 2022-08-17 15:58                         ` Demi Marie Obenour
  2022-08-18  7:37                           ` Martin Wilck
  2022-08-17 17:35                         ` Gionatan Danti
  1 sibling, 1 reply; 23+ messages in thread
From: Demi Marie Obenour @ 2022-08-17 15:58 UTC (permalink / raw)
  To: LVM general discussion and development, Martin Wilck, Heming Zhao
  Cc: teigland

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Wed, Aug 17, 2022 at 05:26:08PM +0200, Zdenek Kabelac wrote:
> Dne 17. 08. 22 v 15:41 Martin Wilck napsal(a):
> > On Wed, 2022-08-17 at 14:54 +0200, Zdenek Kabelac wrote:
> > > Dne 17. 08. 22 v 14:39 Martin Wilck napsal(a):
> > > 
> > > 
> > > Let's make clear we are very well aware of all the constrains
> > > associated with
> > > udev rule logic  (and we tried quite hard to minimize impact -
> > > however udevd
> > > developers kind of 'misunderstood'  how badly they will be impacting
> > > system's
> > > performance with the existing watch rule logic - and the story kind
> > > of
> > > 'continues' with  'systemd's' & dBus services unfortunatelly...
> > 
> > I dimly remember you dislike udev ;-)
> 
> Well it's not 'a dislike' from my side - but the architecture alone is just
> missing in many areas...
> 
> Dave is a complete disliker of udev & systemd all together :)....

I find udev useful for physical devices, but for virtual devices it is a
terrible fit.  It is far too slow and full of race conditions.

Ideally, device-mapper ioctls would use diskseq instead of major+minor
number everywhere, and devices would be named after the diskseq.

> > I like the general idea of the udev watch. It is the magic that causes
> > newly created partitions to magically appear in the system, which is
> 
> Tragedy of design comes from the plain fact that there are only 'very
> occasional' consumers of all these 'collected' data - but gathering all the
> info and keeping all of it 'up-to-date' is getting very very expensive and
> can basically 'neutralize' a lot of your CPU if you have too many resources
> to watch and keep update.....
> 
> 
> > very convenient for users and wouldn't work otherwise. I can see that
> > it might be inappropriate for LVM PVs. We can discuss changing the
> > rules such that the watch is disabled for LVM devices (both PV and LV).
> 
> It's really not fixable as is - since of the complete lack of 'error'
> handling of device in udev DB (i.e. duplicate devices...., various frozen
> devices...)
> 
> There is on going  'SID' project - that might push the logic somewhat
> further, but existing 'device' support logic as is today is unfortunate
> 'trace' of how the design should not have been made - and since all
> 'original' programmers left the project long time ago - it's non-trivial to
> push things forward.

What is the SID project, what are its goals, and how does it plan to
achieve them?

- -- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEdodNnxM2uiJZBxxxsoi1X/+cIsEFAmL9EEEACgkQsoi1X/+c
IsEkLhAA0/7E1rP44bEFCK76JzKvevdlxe7sRBEPvgn/1m9SiEntC47QiEjnQeNi
cI9RLmHUlpYRghfDQMq8vhKk6a+NbnGsWTx3jciqQph+5SSIfPW9VuNi9w0nlvwS
GPHLweMadCblWqXh8XP2RvJx1Z1QeXZ6kYbfMjhZdxY7a/vg0rXTh0XghSyrgfYs
lgFbcqdJbEX5q70OGds8rhxAbTiBKnPHh3z5aFTCN7ILXO4blRWcqhDvAk0w3SQf
lt5WgDBjZ+5gv2pNiNuwZIzqsgL6FDE4CcR+7JWlAakC1GcocVp87aoiR1hNGMob
ZQoGaivvIjqYwSkWUDUArS8ntcKRBr/mYBcm6WuGZFbWja6NT2tEVJ8vcXdr2x5W
DoPk7Vkj/Y9pOn2kcYQMKR1mGOQhq1AwimSHuzPOeWifUWM5BOkH7hS46Tyq2bZJ
BM/QjUQcnckyAgPRYu+OWP3IvfOU+bFdTKabaoNgtCT85mfgL65sr8kx23ikQeZb
RQ9VcbQnJceKrNsqBnCDE4Xegh96er4Gm+68Crdgs0adHOTcyC5937PPSVy99ls8
MbkdPEVGHe4L1TS8XhI6+NCf0oaFCVE/1vKeS4yO28VbSn/N3pbhiNF6cpc0sWDg
NA0mbIsl19t4j8CtXVCjPeh1+RULvXqhedQIC/xJF3FserAInkc=
=J7YY
-----END PGP SIGNATURE-----

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 15:26                       ` Zdenek Kabelac
  2022-08-17 15:58                         ` Demi Marie Obenour
@ 2022-08-17 17:35                         ` Gionatan Danti
  2022-08-17 18:54                           ` Zdenek Kabelac
  1 sibling, 1 reply; 23+ messages in thread
From: Gionatan Danti @ 2022-08-17 17:35 UTC (permalink / raw)
  To: LVM general discussion and development
  Cc: Heming Zhao, teigland, Martin Wilck

Il 2022-08-17 17:26 Zdenek Kabelac ha scritto:
>> I like the general idea of the udev watch. It is the magic that causes
>> newly created partitions to magically appear in the system, which is

Would disabling the watch rule be a reasonable approach in this case? If
the user wants to scan a new device, they only need to issue partprobe or
kpartx, or am I missing something?
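
(For example - device names invented for illustration:

    # re-read the partition table of a plain disk
    partprobe /dev/sdc
    # or update the partition mappings of a multipath map
    kpartx -u /dev/mapper/mpatha
)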

> There is on going  'SID' project - that might push the logic somewhat
> further, but existing 'device' support logic as is today is
> unfortunate 'trace' of how the design should not have been made - and
> since all 'original' programmers left the project long time ago - it's
> non-trivial to push things forward.

Well, this is not good news. Just for my education, is it possible to
run a modern linux distro without udev at all? I still remember when the
new cool thing for device autodiscovery was devfs (with some distros -
like gentoo - taking the alternative approach of simply tarring &
untarring much of the entire /dev/ tree to prepopulate the major+minor
numbers...)

> We just hope the SID will make some progress (although probably small
> one at the beginning).

Any info on the project?
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 17:35                         ` Gionatan Danti
@ 2022-08-17 18:54                           ` Zdenek Kabelac
  2022-08-17 18:54                             ` Zdenek Kabelac
  2022-08-17 19:13                             ` Gionatan Danti
  0 siblings, 2 replies; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17 18:54 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti
  Cc: Martin Wilck, teigland, Heming Zhao

Dne 17. 08. 22 v 19:35 Gionatan Danti napsal(a):
> Il 2022-08-17 17:26 Zdenek Kabelac ha scritto:
>>> I like the general idea of the udev watch. It is the magic that causes
>>> newly created partitions to magically appear in the system, which is
> 
> Would disabling the watch rule be a reasonable approach in this case? If the 
> user want to scan a new device, it only needs to issue partprobe or kpartx, or 
> am I missing something?

Before diving into these 'deep waters' - I'd really like to first see if the
problem is still such an issue with our upstream code base.

There have been a lot of minor optimizations committed over time - so the
number of fired 'watch' rules should be considerably smaller than with the
version mentioned in the customer issue.

>> There is on going  'SID' project - that might push the logic somewhat
>> further, but existing 'device' support logic as is today is
>> unfortunate 'trace' of how the design should not have been made - and
>> since all 'original' programmers left the project long time ago - it's
>> non-trivial to push things forward.
> 
> Well, this is not good news. Just for my education, it is possibile to run a 
> modern linux distro without udev at all? I still remember when the new cool 
> thing for device autodiscovery was devfs (with some distro - like gentoo - 
> taking the alternative approach to simply tarrig & untarring much of the 
> entire /dev/ files to prepopulate the major+minor number... >
>> We just hope the SID will make some progress (although probably small
>> one at the beginning).
> 
> Any info on the project?

https://github.com/prajnoha/sid


Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 18:54                           ` Zdenek Kabelac
@ 2022-08-17 18:54                             ` Zdenek Kabelac
  2022-08-17 19:13                             ` Gionatan Danti
  1 sibling, 0 replies; 23+ messages in thread
From: Zdenek Kabelac @ 2022-08-17 18:54 UTC (permalink / raw)
  To: linux-lvm; +Cc: Martin Wilck, teigland, Heming Zhao

Dne 17. 08. 22 v 19:35 Gionatan Danti napsal(a):
> Il 2022-08-17 17:26 Zdenek Kabelac ha scritto:
>>> I like the general idea of the udev watch. It is the magic that causes
>>> newly created partitions to magically appear in the system, which is
> 
> Would disabling the watch rule be a reasonable approach in this case? If the 
> user want to scan a new device, it only needs to issue partprobe or kpartx, or 
> am I missing something?

Before diving into these 'deep waters' - I'd really like to first see if the
problem is still such an issue with our upstream code base.

There have been a lot of minor optimizations committed over time - so the
number of fired 'watch' rules should be considerably smaller than with the
version mentioned in the customer issue.

>> There is on going  'SID' project - that might push the logic somewhat
>> further, but existing 'device' support logic as is today is
>> unfortunate 'trace' of how the design should not have been made - and
>> since all 'original' programmers left the project long time ago - it's
>> non-trivial to push things forward.
> 
> Well, this is not good news. Just for my education, it is possibile to run a 
> modern linux distro without udev at all? I still remember when the new cool 
> thing for device autodiscovery was devfs (with some distro - like gentoo - 
> taking the alternative approach to simply tarrig & untarring much of the 
> entire /dev/ files to prepopulate the major+minor number... >
>> We just hope the SID will make some progress (although probably small
>> one at the beginning).
> 
> Any info on the project?

https://github.com/prajnoha/sid


Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 18:54                           ` Zdenek Kabelac
  2022-08-17 18:54                             ` Zdenek Kabelac
@ 2022-08-17 19:13                             ` Gionatan Danti
  1 sibling, 0 replies; 23+ messages in thread
From: Gionatan Danti @ 2022-08-17 19:13 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Heming Zhao, teigland, Martin Wilck,
	LVM general discussion and development

Il 2022-08-17 20:54 Zdenek Kabelac ha scritto:
> https://github.com/prajnoha/sid

Thanks for sharing. From the linked page:

"SID positions itself on top of udev, reacting to uevents. It is closely 
interlinked and cooperating with udev daemon. The udev daemon is 
enhanced with specialized sid udev builtin command that is used to 
communicate with SID. SID also listens to udev uevents issued by udev 
daemon which in turn triggers further processing."

To my untrained eyes, it seems to be an *additional* component relying on
udev itself, so I am not sure how it would solve the issue at hand.

But hey - anything that improve current sore spots is more than 
welcomed!

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 15:58                         ` Demi Marie Obenour
@ 2022-08-18  7:37                           ` Martin Wilck
  0 siblings, 0 replies; 23+ messages in thread
From: Martin Wilck @ 2022-08-18  7:37 UTC (permalink / raw)
  To: Heming Zhao, demi, linux-lvm; +Cc: teigland

On Wed, 2022-08-17 at 11:58 -0400, Demi Marie Obenour wrote:
> > 
> > Dave is a complete disliker of udev & systemd all together :)....
> 
> I find udev useful for physical devices, but for virtual devices it
> is a
> terrible fit.  It is far too slow and full of race conditions.
> 
> Ideally, device-mapper ioctls would use diskseq instead of
> major+minor
> number everywhere, and devices would be named after the diskseq.

This is almost guaranteed to open up a big can of new issues. I can see
that diskseq makes sense for loop devices, but general dm is a very
different matter. Not to mention that changing the device names would be a
nightmare.

Martin

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 15:11                       ` David Teigland
@ 2022-08-18  8:06                         ` Martin Wilck
  0 siblings, 0 replies; 23+ messages in thread
From: Martin Wilck @ 2022-08-18  8:06 UTC (permalink / raw)
  To: Heming Zhao, teigland; +Cc: linux-lvm, zdenek.kabelac

On Wed, 2022-08-17 at 10:11 -0500, David Teigland wrote:
> On Wed, Aug 17, 2022 at 01:41:17PM +0000, Martin Wilck wrote:
> > I like the general idea of the udev watch. It is the magic that
> > causes
> > newly created partitions to magically appear in the system, which
> > is
> > very convenient for users and wouldn't work otherwise. I can see
> > that
> > it might be inappropriate for LVM PVs. We can discuss changing the
> > rules such that the watch is disabled for LVM devices (both PV and
> > LV).
> > I don't claim to overlook all possible side effects, but it might
> > be
> > worth a try. It would mean that newly created LVs, LV size changes
> > etc.
> > would not be visible in the system immediately. I suppose you could
> > work around that in the LVM tools by triggering change events after
> > operations like lvcreate.
> 
> I think it's worth looking into at least.  udev causes most of our
> major
> problems, and causes things to fall apart everywhere at scale.

My first proposal for a quick workaround for our customer looked like
this:

echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
    /run/udev/rules.d/90-dm-watch.rules

It made the issue disappear. This is obviously not a general solution.
Rather than applying the rule to mpath devices, it would be useful to
apply it to PVs. 
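
A hedged sketch of a PV-targeted variant (untested; it assumes ID_FS_TYPE has
already been filled in by the blkid import rules by the time this rule runs):

    # disable the inotify watch for anything blkid identified as an LVM2 PV
    echo 'ACTION=="add|change", ENV{ID_FS_TYPE}=="LVM2_member", OPTIONS+="nowatch"' >\
        /run/udev/rules.d/89-pv-nowatch.rules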

lvm could create a temporary rule similar to this one before starting a
pvmove operation (and possibly other operations that involve a lot of
metadata writes). It could actually create a rule specific to those PVs
it's going to change, with minimal side effects. When cleaning up after
the operation, the temporary rule would be removed again, and one
uevent triggered in case some other process had made changes while the
events were blocked. IMO that would be quite a simple solution to the
issue.
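
As a rough, untested sketch of that lifecycle (device names are placeholders),
the sequence around a pvmove could look like:

    RULE=/run/udev/rules.d/89-pvmove-nowatch.rules
    # temporarily disable the watch on exactly the two PVs involved
    printf '%s\n' \
        'ENV{DM_NAME}=="mpatha", OPTIONS+="nowatch"' \
        'ENV{DM_NAME}=="mpathb", OPTIONS+="nowatch"' > "$RULE"
    udevadm control --reload
    pvmove /dev/mapper/mpatha /dev/mapper/mpathb
    # clean up and emit one change event per PV to catch up on anything missed
    rm -f "$RULE"
    udevadm control --reload
    udevadm trigger --action=change /dev/mapper/mpatha /dev/mapper/mpathb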

Using --vgmetadatacopies is also a nice workaround, but apparently the
default is storing metadata on all PVs.
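
(For example - untested, VG name invented - keeping only two managed metadata
copies in the VG would be something like:

    # let lvm2 choose which 2 PVs of vg0 carry the metadata
    vgchange --vgmetadatacopies 2 vg0
)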

Regards
Martin

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [linux-lvm] lvmpolld causes high cpu load issue
  2022-08-17 12:39                 ` Martin Wilck
  2022-08-17 12:54                   ` Zdenek Kabelac
@ 2022-08-18 21:13                   ` Martin Wilck
  1 sibling, 0 replies; 23+ messages in thread
From: Martin Wilck @ 2022-08-18 21:13 UTC (permalink / raw)
  To: Heming Zhao, zdenek.kabelac; +Cc: teigland, linux-lvm

On Wed, 2022-08-17 at 14:39 +0200, Martin Wilck wrote:



> "multipath -U" is
> already quite well optimized, but it needs to do some I/O to complete
> it's work, thus it takes a few milliseconds to run.
> 
> IOW, it would be misleading to point at multipath. 

This is what I thought. I had another look today, and I found that the
case of a very large "multipath.conf" file is indeed not well
optimized. I just sent a patch set to dm-devel ("multipath:
optimizations for large mptable") which should address the issue discussed
in this thread on the multipath side.

The pvmove / watch problem could still deserve some attention, but the
urgency should be much lower.

Martin


_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-08-23  8:29 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16  9:28 [linux-lvm] lvmpolld causes IO performance issue Heming Zhao
2022-08-16  9:38 ` Zdenek Kabelac
2022-08-16 10:08   ` [linux-lvm] lvmpolld causes high cpu load issue Heming Zhao
2022-08-16 10:26     ` Zdenek Kabelac
2022-08-17  2:03       ` Heming Zhao
2022-08-17  8:06         ` Zdenek Kabelac
2022-08-17  8:43           ` Heming Zhao
2022-08-17  9:46             ` Zdenek Kabelac
2022-08-17 10:47               ` Heming Zhao
2022-08-17 11:13                 ` Zdenek Kabelac
2022-08-17 12:39                 ` Martin Wilck
2022-08-17 12:54                   ` Zdenek Kabelac
2022-08-17 13:41                     ` Martin Wilck
2022-08-17 15:11                       ` David Teigland
2022-08-18  8:06                         ` Martin Wilck
2022-08-17 15:26                       ` Zdenek Kabelac
2022-08-17 15:58                         ` Demi Marie Obenour
2022-08-18  7:37                           ` Martin Wilck
2022-08-17 17:35                         ` Gionatan Danti
2022-08-17 18:54                           ` Zdenek Kabelac
2022-08-17 18:54                             ` Zdenek Kabelac
2022-08-17 19:13                             ` Gionatan Danti
2022-08-18 21:13                   ` Martin Wilck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).