From: Zdenek Kabelac <zdenek.kabelac@gmail.com>
To: Heming Zhao <heming.zhao@suse.com>
Cc: linux-lvm@redhat.com, teigland@redhat.com, martin.wilck@suse.com
Subject: Re: [linux-lvm] lvmpolld causes high cpu load issue
Date: Wed, 17 Aug 2022 10:06:35 +0200
Message-ID: <6fa27852-e898-659f-76a5-52f50f0de898@gmail.com>
In-Reply-To: <20220817020225.gf6ooxobdf5xhpxe@c73>

On 17. 08. 22 at 4:03, Heming Zhao wrote:
> On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
>> On 16. 08. 22 at 12:08, Heming Zhao wrote:
>>> Oh, very sorry, the subject is wrong: it is not IO performance but high
>>> CPU load that is triggered by pvmove.
>>>
>>> On Tue, Aug 16, 2022 at 11:38:52AM +0200, Zdenek Kabelac wrote:
>>>> On 16. 08. 22 at 11:28, Heming Zhao wrote:
>>>>> Hello maintainers & list,
>>>>>
>>>>> I bring a story:
>>>>> One SUSE customer suffered an lvmpolld issue which caused a dramatic
>>>>> IO performance decrease.
>>>>>
>>>>> How to trigger:
>>>>> When the machine is connected to a large number of LUNs (eg 80~200) and
>>>>> pvmove runs (eg moving a single disk to a new one, with a cmd like:
>>>>> pvmove disk1 disk2), the system suffers high CPU load. But when the
>>>>> system is connected to ~10 LUNs, the performance is fine.
>>>>>
>>>>> We found two workarounds:
>>>>> 1. Set lvm.conf 'activation/polling_interval=120' (see the sketch after
>>>>>    this list).
>>>>> 2. Write a special udev rule which makes udev ignore the events for
>>>>>    mpath devices:
>>>>>       echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
>>>>>        /etc/udev/rules.d/90-dm-watch.rules
>>>>>
>>>>> Applying either one of the two makes the performance issue disappear.
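>>>>>
>>>>> A minimal sketch of workaround 1 (the 120s value comes from the report
>>>>> above; the usual default is 15s):
>>>>>      # /etc/lvm/lvm.conf
>>>>>      activation {
>>>>>          # poll pvmove progress every 120s instead of every 15s,
>>>>>          # reducing metadata updates and the resulting udev events
>>>>>          polling_interval = 120
>>>>>      }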
>>>>>
>>>>> ** the root cause **
>>>>>
>>>>> lvmpolld periodically requests status info to update the pvmove progress.
>>>>>
>>>>> On every polling_interval, lvm2 updates the VG metadata. The update
>>>>> calls sys_close, which triggers a systemd-udevd IN_CLOSE_WRITE event, eg:
>>>>>      2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
>>>>> (8 is IN_CLOSE_WRITE.)
>>>>>
>>>>> The VGs' underlying devices are multipath devices. So when lvm2 updates
>>>>> metadata, even if pvmove writes only a little data, the sys_close action
>>>>> triggers udev's "watch" mechanism, which gets notified frequently about
>>>>> a process that has written to the device and closed it. This causes
>>>>> frequent, pointless re-evaluation of the udev rules for these devices
>>>>> (observable as sketched below).
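>>>>>
>>>>> A minimal sketch for watching the event storm live while pvmove runs
>>>>> (the subsystem filter is optional and just narrows the output):
>>>>>      # print each udev event as lvmpolld's metadata updates close the
>>>>>      # devices; expect frequent "change" events on the dm-* nodes
>>>>>      udevadm monitor --udev --property --subsystem-match=block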
>>>>>
>>>>> My question: do the LVM2 maintainers have any idea how to fix this bug?
>>>>>
>>>>> In my view, could lvm2 drop the VG devices' fds until pvmove finishes?
>>>>
>>>> Hi
>>>>
>>>> Please provide more info about the lvm2 metadata and also an 'lvs -avvvvv'
>>>> trace so we can get a better picture of the layout - also the versions of
>>>> lvm2, systemd and the kernel in use.
>>>>
>>>> pvmove progresses by mirroring each segment of an LV - so if there are a
>>>> lot of segments, each such update may trigger a udev watch rule event
>>>> (see the sketch below for checking segment counts).
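>>>>
>>>> A minimal sketch for checking how segmented the affected LVs are (the VG
>>>> name vg0 is a placeholder):
>>>>      # one row per segment; many rows mean many mirror updates
>>>>      # (and udev watch events) during pvmove
>>>>      lvs --segments -o lv_name,seg_start,seg_size vg0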
>>>>
>>>> But ATM I can hardly imagine how this could cause some 'dramatic'
>>>> performance decrease - maybe there is something wrong with the udev rules
>>>> on the system?
>>>>
>>>> What is the actual impact?
>>>>
>>>> Note - pvmove was never designed as a high performance operation (in fact
>>>> it tries not to eat all the disk bandwidth as such)
>>>>
>>>> Regards
>>>> Zdenek
>>>
>>> My mistake, I'll write it here again:
>>> The subject is wrong: it is not IO performance but high CPU load that is
>>> triggered by pvmove.
>>>
>>> There is no IO performance issue.
>>>
>>> When the system is connected to 80~200 LUNs, the CPU load increases by
>>> 15~20 and the CPU usage by ~20%, which corresponds to about 5-6 cores and
>>> at times leads to those cores being fully utilized.
>>> In other words: a single pvmove process costs 5-6 (sometimes 10) cores of
>>> utilization. That is abnormal and unacceptable.
>>>
>>> The lvm2 version is 2.03.05, the kernel is 5.3 and systemd is v246.
>>>
>>> BTW:
>>> I changed this mail's subject from 'lvmpolld causes IO performance issue'
>>> to 'lvmpolld causes high cpu load issue'.
>>> Please use this mail for further discussion.
>>
>>
>> Hi
>>
>> Could you please retest with a recent version of lvm2? There have certainly
>> been some improvements in scanning - older releases might have shown higher
>> CPU usage with larger sets of devices.
>>
>> Regards
>>
>> Zdenek
> 
> The highest lvm2 version in SUSE products is lvm2-2.03.15; does this version
> include the improvement changes?
> Would you mind pointing out which commits are related to the improvements?
> I don't have a reproducer environment, so I need a little more detail before
> asking the customer to try a new version.
> 


Please try to reproduce your customer's problem and see if the newer version
solves the issue. Otherwise we could waste hours on theoretical discussions
about what might or might not have helped with this problem. Having a
reproducer is the starting point for fixing it, if the problem is still there.

Here is one commit that may affect CPU load:

d2522f4a05aa027bcc911ecb832450bc19b7fb57
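
To inspect it (a minimal sketch; assumes a local clone of the lvm2 source
tree):

   # show the commit message and diff for the scanning change
   git show d2522f4a05aa027bcc911ecb832450bc19b7fb57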


Regards

Zdenek


