linux-lvm.redhat.com archive mirror
* [linux-lvm] Discussion: performance issue on event activation mode
@ 2021-06-06  6:15 heming.zhao
  2021-06-06 16:35 ` Roger Heflin
                   ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: heming.zhao @ 2021-06-06  6:15 UTC (permalink / raw)
  To: Zdenek Kabelac, David Teigland, Martin Wilck,
	LVM general discussion and development

Hello David & Zdenek,

I am sending this mail about a well-known performance issue:
  when a system has a huge number of devices attached (e.g. 1000+ disks),
  the lvm2-pvscan@.service instances cost too much time, systemd very easily
  times out, and the system ends up in the emergency shell.

This performance topic has been discussed here several times, and the issue has
existed for many years. Even with the latest lvm2 code, it still can't be fixed
completely. The latest code adds a new function, _pvscan_aa_quick(), which
largely reduces the booting time but still can't fix the issue entirely.

In my test env (x86 qemu-kvm machine, 6 vCPUs, 22 GB mem, 1015 PVs/VGs/LVs),
comparing boots with and without the _pvscan_aa_quick() code, the booting time
dropped from "9min 51s" to "2min 6s". But after switching to direct activation,
the booting time is 8.7s (for the longest lvm2 service:
lvm2-activation-early.service).

The hot spot of event activation is dev_cache_scan, whose time complexity is
O(n^2). At the same time, the systemd-udev workers generate/run
lvm2-pvscan@.service for every detected disk. So the overall complexity is O(n^3).

```
dev_cache_scan //order: O(n^2)
  + _insert_dirs //O(n)
  | if obtain_device_list_from_udev() true
  |   _insert_udev_dir //O(n)
  |
  + dev_cache_index_devs //O(n)

There are 'n' lvm2-pvscan@.service running: O(n)
Overall: O(n) * O(n^2) => O(n^3)
```

Question/topic:
Could we find a final solution that performs well and scales well under
event-based activation?

Maybe two solutions (Martin & I discussed):

1. During the boot phase, lvm2 automatically switches to direct activation mode
("event_activation = 0"). After boot, it switches back to event activation mode.

The booting phase is a special stage. *During boot*, we could "pretend" that
direct activation (event_activation=0) is set, and rely on
lvm2-activation-*.service for PV detection. Once lvm2-activation-net.service
has finished, we could "switch on" event activation.

More precisely: pvscan --cache would look at some file under /run,
e.g. /run/lvm2/boot-finished, and quit immediately if the file doesn't exist
(as if event_activation=0 was set). In lvm2-activation-net.service, we would add
something like:

```
ExecStartPost=/bin/touch /run/lvm2/boot-finished
```

... so that, from this point in time onward, "pvscan --cache" would _not_ quit
immediately any more, but run normally (assuming that the global
event_activation setting is 1). This way we'd get the benefit of using the
static activation services during boot (good performance) while still being able
to react to udev events after booting has finished.

This idea could be implemented with very few code changes.
The result would be a huge step forward in booting time.
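A minimal sketch of the proposed early-exit check, written here as a shell
wrapper for illustration only (the real change would live inside pvscan's C
code; the flag path matches the proposal above, and maybe_pvscan is a
hypothetical stand-in):

```shell
#!/bin/sh
# Sketch: boot-phase early exit for "pvscan --cache".
# FLAG defaults to the proposed path; it can be overridden for testing.
FLAG=${FLAG:-/run/lvm2/boot-finished}

maybe_pvscan() {
    if [ ! -e "$FLAG" ]; then
        # Boot not finished yet: behave as if event_activation=0 and quit.
        echo "skipped"
        return 0
    fi
    # From here on, run the normal scan (stand-in for "pvscan --cache").
    echo "scanning"
}
```

Once lvm2-activation-net.service touches /run/lvm2/boot-finished via
ExecStartPost, later invocations fall through to the normal scan.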


2. Change the lvm2-pvscan@.service running mode from parallel to serial.

This idea looks a little weird; it goes against the trend of today's
programming technologies: parallel programming on multiple cores.

Idea:
the way lvm2 scans "/dev" is hard to change, but the lvm2-pvscan@.service
instances around it could be changed from running in parallel to running serially.

For example, a running pvscan instance could set a "running" flag in tmpfs (ie.
/run/lvm/) indicating that no other pvscan process should be called in parallel.
If another pvscan is invoked and sees "running", it would create a "pending"
flag, and quit. Any other pvscan process seeing the "pending" flag would
just quit. If the first instance sees the "pending" flag, it would
atomically remove "pending" and restart itself, in order to catch any device
that might have appeared since the previous sysfs scan.
In most conditions all devices will already have been found by a single pvscan
scan, so the next pvscan scan should run in O(n), because the target devices
have already been inserted into the internal cache tree. And overall, only a
single pvscan process would be running at any given time.

We could then create a list of pending to-be-scanned devices (e.g. directory
entries in some tmpfs directory). On exit, pvscan could check this directory
and restart if it's non-empty.
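The running/pending protocol above can be sketched in shell, using mkdir/rmdir
as the atomic flag operations on tmpfs. Everything here is illustrative, not
real lvm2 code: the flag directory is a temp dir and do_scan merely logs,
standing in for "pvscan --cache".

```shell
#!/bin/sh
# Sketch of the proposed pvscan serialization (illustrative only).
FLAGDIR=$(mktemp -d)          # stands in for a tmpfs dir such as /run/lvm
RUN="$FLAGDIR/running"
PEND="$FLAGDIR/pending"

do_scan() {
    # Stand-in for the real "pvscan --cache" invocation.
    echo scan >> "$FLAGDIR/log"
}

serial_pvscan() {
    # mkdir is atomic: only one instance can create the "running" flag.
    if ! mkdir "$RUN" 2>/dev/null; then
        # Another instance is running: leave a "pending" mark and quit.
        mkdir "$PEND" 2>/dev/null
        return 0
    fi
    while :; do
        do_scan
        # If a "pending" mark appeared meanwhile, consume it atomically and
        # rescan to catch devices that showed up since the previous scan.
        rmdir "$PEND" 2>/dev/null || break
    done
    rmdir "$RUN"
}
```

With this scheme at most one scanner runs at a time, and a burst of uevents
coalesces into a few full scans instead of one scan per event.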


Thanks
Heming

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-06  6:15 [linux-lvm] Discussion: performance issue on event activation mode heming.zhao
@ 2021-06-06 16:35 ` Roger Heflin
  2021-06-07 10:27   ` Martin Wilck
  2021-06-07 15:48 ` Martin Wilck
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Roger Heflin @ 2021-06-06 16:35 UTC (permalink / raw)
  To: LVM general discussion and development
  Cc: Martin Wilck, David Teigland, Zdenek Kabelac



This might be a simpler way to control the number of threads at the same
time.

On large machines (CPU-wise, memory-wise, and disk-wise) I have only seen
lvm time out when udev's children_max is left at the default. The default
seems to be set wrong: it appears tuned for a case where a large number of
the disks on the machine are going to time out (or be otherwise really,
really slow), so a huge number of threads is needed to support that case.
I found that with the default on a close-to-100-core machine, udev got
about 87 minutes of CPU time during boot-up (roughly 2 minutes of wall
time). Changing the number of children to 4 resulted in udev getting
around 2-3 minutes in the same window, and actually gave a much faster and
much more reliable boot-up (no timeouts). We experienced these timeouts on
a number of the larger machines (70 cores or more) before we debugged and
determined what was going on. It would appear that the udev threads on
giant machines with a lot of disks overwhelm each other in some sort of
tight loop (process-creation system time or some other resource
constraint), causing contention and doing very little useful work.

Just an observation; this may have nothing to do with what you have going
on, but what you are describing sounds very close to what I debugged. We
ran "ps axuwwS | grep -i udev" just after boot-up to determine how much
CPU time udev was getting during boot, and found that as we lowered the
number of children the time got smaller, the boot-up got faster, and the
timeouts stopped. And since udev was getting 90 minutes of CPU time in an
elapsed time of around 120 seconds, it had to be using a significant
number of threads during boot-up. I believe these same udev threads call
the pvscans.

Below is one case, but I know there are several similar cases for other
distributions. Note the default number of workers, 8 + number_of_cpus * 64,
which is going to be a disaster: it results in one thread per disk/LUN
being started at the same time, up to the maximum number of workers.
Either way, that means a high degree of non-productive system contention
on a machine with a significant number of LUNs.
https://www.suse.com/support/kb/doc/?id=000019156
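For reference, the worker limit Roger describes can usually be capped either
persistently or on the kernel command line. The option names below are as
documented in udev.conf(5) and systemd-udevd.service(8); exact defaults and
spellings have changed across systemd versions, so check yours:

```
# /etc/udev/udev.conf
children_max=4

# or on the kernel command line:
#   udev.children_max=4
```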

On Sun, Jun 6, 2021 at 1:16 AM heming.zhao@suse.com <heming.zhao@suse.com>
wrote:

> [full quote of the original message trimmed]


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-06 16:35 ` Roger Heflin
@ 2021-06-07 10:27   ` Martin Wilck
  2021-06-07 15:30     ` heming.zhao
  2021-06-07 21:30     ` David Teigland
  0 siblings, 2 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-07 10:27 UTC (permalink / raw)
  To: Heming Zhao, linux-lvm, rogerheflin; +Cc: teigland, zkabelac

On So, 2021-06-06 at 11:35 -0500, Roger Heflin wrote:
> This might be a simpler way to control the number of threads at the
> same time.
> 
> On large machines (cpu wise, memory wise and disk wise).   I have
> only seen lvm timeout when udev_children is set to default.   The
> default seems to be set wrong, and the default seemed to be tuned for
> a case where a large number of the disks on the machine were going to
> be timing out (or otherwise really really slow), so to support this
> case a huge number of threads was required..    I found that with it
> set to default on a close to 100 core machine that udev got about 87
> minutes of time during the boot up (about 2 minutes).  Changing the
> number of children to =4 resulted in udev getting around 2-3 minutes
> in the same window, and actually resulted in a much faster boot up
> and a much more reliable boot up (no timeouts).

Wow, setting the number of children to 4 is pretty radical. We often
decrease this parameter on large machines, but we never went all the way
down to a single-digit number. If that's really necessary under whatever
circumstances, it's clear evidence of udev's deficiencies.

I am not sure if it's better than Heming's suggestion though. It would
affect every device in the system. It wouldn't even be possible to
process more than 4 totally different events at the same time.

Most importantly, this thread is about LVM2's scanning of physical
volumes. The number of udev workers has very little influence on PV
scanning, because the udev rules only activate a systemd service. The
actual scanning takes place in lvm2-pvscan@.service, and unlike udev,
there's no limit on the number of instances of a given systemd service
template that can run at any given time.

Note that there have been various changes in the way udev calculates
the default number of workers; what udev will use by default depends on
the systemd version and may even be patched by the distribution.

> Below is one case, but I know there are several other similar cases
> for other distributions.    Note the number of default workers = 8 +
> number_of_cpus * 64 which is going to be a disaster as it will result
> in one thread per disk/lun being started at the same time or the
> max_number_of_workers. 

What distribution are you using? This is not the default formula for
children-max any more, and hasn't been for a while.

Regards
Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 10:27   ` Martin Wilck
@ 2021-06-07 15:30     ` heming.zhao
  2021-06-07 15:45       ` Martin Wilck
  2021-06-07 20:52       ` Roger Heflin
  2021-06-07 21:30     ` David Teigland
  1 sibling, 2 replies; 77+ messages in thread
From: heming.zhao @ 2021-06-07 15:30 UTC (permalink / raw)
  To: Martin Wilck, linux-lvm, rogerheflin; +Cc: teigland, zkabelac

On 6/7/21 6:27 PM, Martin Wilck wrote:
> On So, 2021-06-06 at 11:35 -0500, Roger Heflin wrote:
>> [...]
> 
> Wow, setting the number of children to 4 is pretty radical. We decrease
> this parameter often on large machines, but we never went all the way
> down to a single-digit number. If that's really necessary under
> whatever circumstances, it's clear evidence of udev's deficiencies.
> 
> I am not sure if it's better than Heming's suggestion though. It would
> affect every device in the system. It wouldn't even be possible to
> process more than 4 totally different events at the same time.
> 

Hello,

I tested udev.children_max with values 1, 2 & 4. The results showed it
didn't take effect, and the booting time was even longer than before.
This solution may suit some special cases.

(my env: kvm-qemu vm, 6vpu, 22G mem, 1015 disks)

Regards
heming



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 15:30     ` heming.zhao
@ 2021-06-07 15:45       ` Martin Wilck
  2021-06-07 20:52       ` Roger Heflin
  1 sibling, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-07 15:45 UTC (permalink / raw)
  To: Heming Zhao, linux-lvm, rogerheflin; +Cc: teigland, zkabelac

On Mo, 2021-06-07 at 23:30 +0800, heming.zhao@suse.com wrote:
> On 6/7/21 6:27 PM, Martin Wilck wrote:
> > [...]
> 
> hello
> 
> I tested udev.children_max with value 1, 2 & 4. The results showed it
> didn't take effect, and the booting time even longer than before.
> This solution may suite for some special cases.

Thanks, good to know. There may be other scenarios where Roger's
suggestion might help. But it should be clear that no distribution will
ever use such low limits, because it'd slow down booting on other
systems unnecessarily.

Thanks,
Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-06  6:15 [linux-lvm] Discussion: performance issue on event activation mode heming.zhao
  2021-06-06 16:35 ` Roger Heflin
@ 2021-06-07 15:48 ` Martin Wilck
  2021-06-07 16:31   ` Zdenek Kabelac
  2021-06-07 21:48   ` David Teigland
  2021-06-07 16:40 ` David Teigland
  2021-07-02 21:09 ` David Teigland
  3 siblings, 2 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-07 15:48 UTC (permalink / raw)
  To: Heming Zhao, zkabelac, teigland, linux-lvm

On So, 2021-06-06 at 14:15 +0800, heming.zhao@suse.com wrote:
> 
> 1. During boot phase, lvm2 automatically swithes to direct activation
> mode
> ("event_activation = 0"). After booted, switch back to the event
> activation mode.
> 
> Booting phase is a speical stage. *During boot*, we could "pretend"
> that direct
> activation (event_activation=0) is set, and rely on lvm2-activation-
> *.service
> for PV detection. Once lvm2-activation-net.service has finished, we
> could
> "switch on" event activation.

I like this idea. Alternatively, we could discuss disabling event
activation only in the "coldplug" phase after switching root (i.e.
between the start of systemd-udev-trigger.service and
lvm2-activation.service), because that's the critical time span during
which thousands of events can happen simultaneously.

Regards
Martin




* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 15:48 ` Martin Wilck
@ 2021-06-07 16:31   ` Zdenek Kabelac
  2021-06-07 21:48   ` David Teigland
  1 sibling, 0 replies; 77+ messages in thread
From: Zdenek Kabelac @ 2021-06-07 16:31 UTC (permalink / raw)
  To: Martin Wilck, Heming Zhao, teigland, linux-lvm

Dne 07. 06. 21 v 17:48 Martin Wilck napsal(a):
> On So, 2021-06-06 at 14:15 +0800, heming.zhao@suse.com wrote:
>> 1. During boot phase, lvm2 automatically swithes to direct activation
>> mode
>> ("event_activation = 0"). After booted, switch back to the event
>> activation mode.
>>
>> Booting phase is a speical stage. *During boot*, we could "pretend"
>> that direct
>> activation (event_activation=0) is set, and rely on lvm2-activation-
>> *.service
>> for PV detection. Once lvm2-activation-net.service has finished, we
>> could
>> "switch on" event activation.
> I like this idea. Alternatively, we could discuss disabling event
> activation only in the "coldplug" phase after switching root (i.e.
> between start of systemd-udev-trigger.service and lvm2-
> activation.service), because that's the critical time span during which
> 1000s of events can happen simultaneously.


Hello,

In lvm2 we never actually suggested using 'autoactivation' during boot -
this case doesn't make much sense - since it's already known ahead of time
which device IDs need to be activated. So whoever started using
autoactivation during boot did it their own distro's way. Neither Fedora
nor RHEL uses this. On a second note, David is currently trying to optimize
and rework dracut's booting, as it hasn't aged well and there are several
weak points to be fixed.



Regards

Zdenek



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-06  6:15 [linux-lvm] Discussion: performance issue on event activation mode heming.zhao
  2021-06-06 16:35 ` Roger Heflin
  2021-06-07 15:48 ` Martin Wilck
@ 2021-06-07 16:40 ` David Teigland
  2021-07-02 21:09 ` David Teigland
  3 siblings, 0 replies; 77+ messages in thread
From: David Teigland @ 2021-06-07 16:40 UTC (permalink / raw)
  To: heming.zhao
  Cc: Martin Wilck, LVM general discussion and development, Zdenek Kabelac


Hi Heming,

Thanks for the analysis and tying things together for us so clearly, and I
like the ideas you've outlined.

On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
> I send this mail for a well known performance issue:
>  when system is attached huge numbers of devices. (ie. 1000+ disks),
>  the lvm2-pvscan@.service costs too much time and systemd is very easy to
>  time out, and enter emergency shell in the end.
> 
> This performance topic had been discussed in there some times, and the issue was
> lasting for many years. From the lvm2 latest code, this issue still can't be fix
> completely. The latest code add new function _pvscan_aa_quick(), which makes the
> booting time largely reduce but still can's fix this issue utterly.
> 
> In my test env, x86 qemu-kvm machine, 6vcpu, 22GB mem, 1015 pv/vg/lv, comparing
> with/without _pvscan_aa_quick() code, booting time reduce from "9min 51s" to
> "2min 6s". But after switching to direct activation, the booting time is 8.7s
> (for longest lvm2 services: lvm2-activation-early.service).

Interesting, it's good to see the "quick" optimization is so effective.
Another optimization that should be helping in many cases is the
"vgs_online" file which will prevent concurrent pvscans from all
attempting to autoactivate a VG.

> The hot spot of event activation is dev_cache_scan, which time complexity is
> O(n^2). And at the same time, systemd-udev worker will generate/run
> lvm2-pvscan@.service on all detecting disks. So the overall is O(n^3).
> 
> ```
> dev_cache_scan //order: O(n^2)
>  + _insert_dirs //O(n)
>  | if obtain_device_list_from_udev() true
>  |   _insert_udev_dir //O(n)
>  |
>  + dev_cache_index_devs //O(n)
> 
> There are 'n' lvm2-pvscan@.service running: O(n)
> Overall: O(n) * O(n^2) => O(n^3)
> ```

I knew the dev_cache_scan was inefficient, but didn't realize it was
having such a negative impact, especially since it isn't reading devices.
Some details I'm interested to look at more closely (and perhaps you
already have some answers here):

1. Does obtain_device_list_from_udev=0 improve things?  I recently noticed
that 0 appeared to be faster (anecdotally), and proposed we change the
default to 0 (also because I'm biased toward avoiding udev whenever
possible.)

2. We should probably move or improve the "index_devs" step; it's not the
main job of dev_cache_scan and I suspect this could be done more
efficiently, or avoided in many cases.

3. pvscan --cache is supposed to be scalable because it only (usually)
reads the single device that is passed to it, until activation is needed,
at which point all devices are read to perform a proper VG activation.
However, pvscan does not attempt to reduce dev_cache_scan since I didn't
know it was a problem.  It probably makes sense to avoid a full
dev_cache_scan when pvscan is only processing one device (e.g.
setup_device() rather than setup_devices().)
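For anyone who wants to try the obtain_device_list_from_udev=0 suggestion from
point 1, the setting lives in the devices section of lvm.conf:

```
# /etc/lvm/lvm.conf
devices {
    obtain_device_list_from_udev = 0
}
```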

> Question/topic:
> Could we find out a final solution to have a good performance & scale well under
> event-based activation?

First, you might not have seen my recently added udev rule for
autoactivation, I apologize it's been sitting in the "dev-next" branch
since we've not figured out a good a branching strategy for this change.
We just began getting some feedback on this change last week:

https://sourceware.org/git/?p=lvm2.git;a=blob;f=udev/69-dm-lvm.rules.in;h=03c8fbbd6870bbd925c123d66b40ac135b295574;hb=refs/heads/dev-next

There's a similar change I'm working on for dracut:
https://github.com/dracutdevs/dracut/pull/1506

Each device uevent still triggers a pvscan --cache, reading just the one
device, but when a VG is complete, the udev rule runs systemd-run vgchange
-aay VG.  Since it's not changing dev_cache_scan usage, the issues you're
describing will still need to be looked at.

> Maybe two solutions (Martin & I discussed):
> 
> 1. During boot phase, lvm2 automatically swithes to direct activation mode
> ("event_activation = 0"). After booted, switch back to the event activation mode.
> 
> Booting phase is a speical stage. *During boot*, we could "pretend" that direct
> activation (event_activation=0) is set, and rely on lvm2-activation-*.service
> for PV detection. Once lvm2-activation-net.service has finished, we could
> "switch on" event activation.
> 
> More precisely: pvscan --cache would look at some file under /run,
> e.g. /run/lvm2/boot-finished, and quit immediately if the file doesn't exist
> (as if event_activation=0 was set). In lvm2-activation-net.service, we would add
> something like:
> 
> ```
> ExecStartPost=/bin/touch /run/lvm2/boot-finished
> ```
> 
> ... so that, from this point in time onward, "pvscan --cache" would _not_ quit
> immediately any more, but run normally (assuming that the global
> event_activation setting is 1). This way we'd get the benefit of using the
> static activation services during boot (good performance) while still being able
> to react to udev events after booting has finished.
> 
> This idea would be worked out with very few code changes.
> The result would be a huge step forward on booting time.

This sounds appealing to me. I've always found it somewhat dubious how we
pretend each device is newly attached, and process it individually, even
if all devices are already present.  We should be taking advantage of the
common case where many or most devices are already present, which is what
you're doing here.  Async/event-based processing has its place, but it's
surely not always the best answer.  I will think some more about the
details of how this might work; it seems promising.

> 2. change lvm2-pvscan@.service running mode from parallel to serival.
> 
> This idea looks a little weird, it goes the opposite trend of today's
> programming technologies: parallel programming on multi-cores.
> 
> idea:
> the action of lvm2 scaning "/dev" is hard to change, the outside parallel
> lvm2-pvscan@.service could change from parallel to serial.
>
> For example, a running pvscan instance could set a "running" flag in tmpfs (ie.
> /run/lvm/) indicating that no other pvscan process should be called in parallel.
> If another pvscan is invoked and sees "running", it would create a "pending"
> flag, and quit. Any other pvscan process seeing the "pending" flag would
> just quit. If the first instance sees the "pending" flag, it would
> atomically remove "pending" and restart itself, in order to catch any device
> that might have appeared since the previous sysfs scan.
> In most condition, devices had been found by once pvscan scanning,
> then next time of pvscan scanning should work with order O(n), because the
> target device had been inserted internal cache tree already. and on overall,
> there is only a single pvscan process would be running at any given time.
> 
> We could create a list of pending to-be-scanned devices then (might be directory
> entries in some tmpfs directory). On exit, pvscan could check this dir and
> restart if it's non-empty.

The present design is based on pvscan --cache reading only the one device
that has been attached, and I think that's good.  I'd expect that also
lends itself to running pvscans in parallel, since they are all reading
different devices.  If it's just dev_cache_scan that needs optimizing, I
expect there are better ways to do that than adding serialization.  This
is also related to the number of udev workers as mentioned in the next
email.  So I think we need to narrow down the problem a little more before
we know if serializing is going to be the right answer, or where/how to do
it.

Dave

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 15:30     ` heming.zhao
  2021-06-07 15:45       ` Martin Wilck
@ 2021-06-07 20:52       ` Roger Heflin
  1 sibling, 0 replies; 77+ messages in thread
From: Roger Heflin @ 2021-06-07 20:52 UTC (permalink / raw)
  To: heming.zhao; +Cc: zkabelac, linux-lvm, teigland, Martin Wilck



The case we had was large physical machines with around 1000 disks.  We did
not see the issue on the smaller (cpu- and disk-wise) physical machines or
VMs.  It seemed like both high cpu counts and high disk counts were needed,
but in our environment both of those usually go together.  The smallest
machines that had the issue had 72 threads (36 actual cores).  The disk
devices were all SSD SAN LUNs, so I would expect all of the devices to
respond to and return IO requests in under 0.3ms under normal conditions.
They were also all partitioned and multipathed.  90% of the disks would not
have had any LVM on them at all, but would have been at least initially
scanned by something; still, the systemd LVM parts were what was timing out,
and based on the cpu time udev accumulated in the 90-120 second boot window
(90 minutes of cpu time), it very much seemed to be having serious cpu time
issues doing something.

I have done some simple tests forking off a bunch of
/usr/sbin/lvm pvscan --cache major:minor processes in the background and in
parallel, rapidly, and cannot get it to really act badly except with numbers
that are >20000.
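For reference, the skeleton of such a test looks roughly like this (the device numbers and the stand-in command are fake; the real test ran /usr/sbin/lvm pvscan --cache major:minor):

```shell
#!/bin/sh
# Fork N background "pvscan" stand-ins in parallel and wait for them all.
N=${N:-100}

scan_one() {
    # placeholder for: /usr/sbin/lvm pvscan --cache "8:$1"
    :
}

i=1
while [ "$i" -le "$N" ]; do
    scan_one "$i" &
    i=$((i + 1))
done
wait
echo "spawned $N scans"
```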

And if I am reading the fast direct-activation case correctly, about the
only thing that differs is that it does not spawn off lots of processes and
events, and just does the pvscan once.  Between udev and systemd I am not
clear on how many different events have to be handled, and how many of those
events need to spawn new threads and/or fork new processes.
Something doing one or both of those two things would seem to have been the
cause of the issue I saw in the past.

When it has difficulty booting up like this, what does ps axuS | grep udev
look like time wise?


On Mon, Jun 7, 2021 at 10:30 AM heming.zhao@suse.com <heming.zhao@suse.com>
wrote:

> On 6/7/21 6:27 PM, Martin Wilck wrote:
> > On So, 2021-06-06 at 11:35 -0500, Roger Heflin wrote:
> >> This might be a simpler way to control the number of threads at the
> >> same time.
> >>
> >> On large machines (cpu wise, memory wise and disk wise).   I have
> >> only seen lvm timeout when udev_children is set to default.   The
> >> default seems to be set wrong, and the default seemed to be tuned for
> >> a case where a large number of the disks on the machine were going to
> >> be timing out (or otherwise really really slow), so to support this
> >> case a huge number of threads was required..    I found that with it
> >> set to default on a close to 100 core machine that udev got about 87
> >> minutes of time during the boot up (about 2 minutes).  Changing the
> >> number of children to =4 resulted in udev getting around 2-3 minutes
> >> in the same window, and actually resulted in a much faster boot up
> >> and a much more reliable boot up (no timeouts).
> >
> > Wow, setting the number of children to 4 is pretty radical. We decrease
> > this parameter often on large machines, but we never went all the way
> > down to a single-digit number. If that's really necessary under
> > whatever circumstances, it's clear evidence of udev's deficiencies.
> >
> > I am not sure if it's better than Heming's suggestion though. It would
> > affect every device in the system. It wouldn't even be possible to
> > process more than 4 totally different events at the same time.
> >
>
> hello
>
> I tested udev.children_max with values 1, 2 & 4. The results showed it
> didn't take effect, and the booting time was even longer than before.
> This solution may suit some special cases.
>
> (my env: kvm-qemu vm, 6vcpu, 22G mem, 1015 disks)
>
> Regards
> heming
>
>



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 10:27   ` Martin Wilck
  2021-06-07 15:30     ` heming.zhao
@ 2021-06-07 21:30     ` David Teigland
  2021-06-08  8:26       ` Martin Wilck
  2021-06-08 16:18       ` heming.zhao
  1 sibling, 2 replies; 77+ messages in thread
From: David Teigland @ 2021-06-07 21:30 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, rogerheflin, linux-lvm, Heming Zhao

On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
> Most importantly, this was about LVM2 scanning of physical volumes. The
> number of udev workers has very little influence on PV scanning,
> because the udev rules only activate a systemd service. The actual
> scanning takes place in lvm2-pvscan@.service. And unlike udev, there's
> no limit for the number of instances of a given systemd service
> template that can run at any given time.

Excessive device scanning has been the historical problem in this area,
but Heming mentioned dev_cache_scan() specifically as a problem.  That was
surprising to me since it doesn't scan/read devices, it just creates a
list of device names on the system (either readdir in /dev or udev
listing.)  If there are still problems with excessive scanning/reading,
we'll need some more diagnosis of what's happening; there could be some
cases we've missed.
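To illustrate, the non-udev branch amounts to little more than enumerating device nodes under /dev; a rough shell approximation (not lvm2's actual readdir-based C code):

```shell
# list block device nodes the way a readdir-based scan would find them;
# the recursion mirrors dev_cache_scan descending into /dev subdirectories
find /dev -type b 2>/dev/null | sort | head -n 20
```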

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 15:48 ` Martin Wilck
  2021-06-07 16:31   ` Zdenek Kabelac
@ 2021-06-07 21:48   ` David Teigland
  2021-06-08 12:29     ` Peter Rajnoha
  1 sibling, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-06-07 21:48 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, linux-lvm, Heming Zhao

On Mon, Jun 07, 2021 at 03:48:30PM +0000, Martin Wilck wrote:
> On So, 2021-06-06 at 14:15 +0800, heming.zhao@suse.com wrote:
> > 
> > 1. During boot phase, lvm2 automatically switches to direct activation
> > mode
> > ("event_activation = 0"). After booted, switch back to the event
> > activation mode.
> > 
> > Booting phase is a special stage. *During boot*, we could "pretend"
> > that direct
> > activation (event_activation=0) is set, and rely on lvm2-activation-
> > *.service
> > for PV detection. Once lvm2-activation-net.service has finished, we
> > could
> > "switch on" event activation.
> 
> I like this idea. Alternatively, we could discuss disabling event
> activation only in the "coldplug" phase after switching root (i.e.
> between start of systemd-udev-trigger.service and lvm2-
> activation.service), because that's the critical time span during which
> 1000s of events can happen simultaneously.

If there are say 1000 PVs already present on the system, there could be
real savings in having one lvm command process all 1000, and then switch
over to processing uevents for any further devices afterward.  The switch
over would be delicate because of the obvious races involved with new devs
appearing, but probably feasible.  I'd like to see the scale of
improvement from a proof of concept before we spend too much time on the
details.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 21:30     ` David Teigland
@ 2021-06-08  8:26       ` Martin Wilck
  2021-06-08 15:39         ` David Teigland
  2021-06-08 16:49         ` heming.zhao
  2021-06-08 16:18       ` heming.zhao
  1 sibling, 2 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-08  8:26 UTC (permalink / raw)
  To: teigland, linux-lvm; +Cc: rogerheflin, Heming Zhao, zkabelac

On Mo, 2021-06-07 at 16:30 -0500, David Teigland wrote:
> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
> > Most importantly, this was about LVM2 scanning of physical volumes.
> > The
> > number of udev workers has very little influence on PV scanning,
> > because the udev rules only activate a systemd service. The actual
> > scanning takes place in lvm2-pvscan@.service. And unlike udev,
> > there's
> > no limit for the number of instances of a given systemd service
> > template that can run at any given time.
> 
> Excessive device scanning has been the historical problem in this area,
> but Heming mentioned dev_cache_scan() specifically as a problem.  That
> was
> surprising to me since it doesn't scan/read devices, it just creates a
> list of device names on the system (either readdir in /dev or udev
> listing.)  If there are still problems with excessive
> scanning/reading,
> we'll need some more diagnosis of what's happening, there could be some
> cases we've missed.

Heming didn't include his measurement results in the initial post.
Here's a small summary. Heming will be able to provide more details.
You'll see that the effects are quite drastic: a factor of 3-4 between
each step below, and a factor of >60 between best and worst. I'd say these
results are typical of what we also observe on real-world systems.

kvm-qemu, 6 vcpu, 20G memory, 1258 scsi disks, 1015 vg/lv
Shown is "systemd-analyze blame" output.

 1) lvm2 2.03.05 (SUSE SLE15-SP2),
    obtain_device_list_from_udev=1 & event_activation=1
        9min 51.782s lvm2-pvscan@253:2.service
        9min 51.626s lvm2-pvscan@65:96.service
    (many other lvm2-pvscan@ services follow)
 2) lvm2 latest master
    obtain_device_list_from_udev=1 & event_activation=1
        2min 6.736s lvm2-pvscan@70:384.service         
        2min 6.628s lvm2-pvscan@70:400.service
 3) lvm2 latest master
    obtain_device_list_from_udev=0 & event_activation=1
            40.589s lvm2-pvscan@131:976.service
            40.589s lvm2-pvscan@131:928.service
 4) lvm2 latest master
    obtain_device_list_from_udev=0 & event_activation=0,
            21.034s dracut-initqueue.service
             8.674s lvm2-activation-early.service

IIUC, 2) is the effect of _pvscan_aa_quick(). 3) is surprising;
apparently libudev's device detection causes a factor 3 slowdown.
While 40s is not bad, you can see that event-based activation still
performs far worse than the "serial" device detection of
lvm2-activation-early.service.

Personally, I'm sort of wary about obtain_device_list_from_udev=0
because I'm uncertain whether it might break multipath/MD detection.
Perhaps you can clarify that.
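For reference, the two settings varied between these runs live in lvm.conf (the comments are mine):

```
devices {
    # 0 = build the device list by scanning /dev directly,
    # 1 = ask libudev for the list of block devices
    obtain_device_list_from_udev = 0
}
global {
    # 0 = direct activation via lvm2-activation-*.service,
    # 1 = event-based activation via lvm2-pvscan@.service
    event_activation = 0
}
```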

Regards
Martin




* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 21:48   ` David Teigland
@ 2021-06-08 12:29     ` Peter Rajnoha
  2021-06-08 13:23       ` Martin Wilck
  0 siblings, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-06-08 12:29 UTC (permalink / raw)
  To: LVM general discussion and development
  Cc: zkabelac, Heming Zhao, Martin Wilck

On Mon 07 Jun 2021 16:48, David Teigland wrote:
> On Mon, Jun 07, 2021 at 03:48:30PM +0000, Martin Wilck wrote:
> > On So, 2021-06-06 at 14:15 +0800, heming.zhao@suse.com wrote:
> > > 
> > > 1. During boot phase, lvm2 automatically switches to direct activation
> > > mode
> > > ("event_activation = 0"). After booted, switch back to the event
> > > activation mode.
> > > 
> > > Booting phase is a special stage. *During boot*, we could "pretend"
> > > that direct
> > > activation (event_activation=0) is set, and rely on lvm2-activation-
> > > *.service
> > > for PV detection. Once lvm2-activation-net.service has finished, we
> > > could
> > > "switch on" event activation.
> > 
> > I like this idea. Alternatively, we could discuss disabling event
> > activation only in the "coldplug" phase after switching root (i.e.
> > between start of systemd-udev-trigger.service and lvm2-
> > activation.service), because that's the critical time span during which
> > 1000s of events can happen simultaneously.
> 
> If there are say 1000 PVs already present on the system, there could be
> real savings in having one lvm command process all 1000, and then switch
> over to processing uevents for any further devices afterward.  The switch
> over would be delicate because of the obvious races involved with new devs
> appearing, but probably feasible.

Maybe to avoid the race, we could possibly write the proposed
"/run/lvm2/boot-finished" right before we initiate scanning in "vgchange
-aay" that is a part of the lvm2-activation-net.service (the last
service to do the direct activation).

A few event-based pvscans could fire during the window between
"scan initiated phase" in lvm2-activation-net.service's "ExecStart=vgchange -aay..."
and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-finished",
but I think that's still better than missing important uevents completely in
this window.
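In unit-file terms, the proposal would roughly amount to something like this (the flag path and options come from the proposal above; this is an illustrative sketch, not an existing lvm2 unit):

```
# hypothetical drop-in for lvm2-activation-net.service
[Service]
Type=oneshot
# write the flag just before scanning starts, so event-based pvscans that
# race with this unit can only over-scan, never miss a device entirely
ExecStartPre=/bin/mkdir -p /run/lvm2
ExecStartPre=/bin/touch /run/lvm2/boot-finished
ExecStart=/usr/sbin/vgchange -aay
```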

Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 12:29     ` Peter Rajnoha
@ 2021-06-08 13:23       ` Martin Wilck
  2021-06-08 13:41         ` Peter Rajnoha
  2021-09-09 19:44         ` David Teigland
  0 siblings, 2 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-08 13:23 UTC (permalink / raw)
  To: linux-lvm, prajnoha; +Cc: Heming Zhao, zkabelac

On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > 
> > If there are say 1000 PVs already present on the system, there
> > could be
> > real savings in having one lvm command process all 1000, and then
> > switch
> > over to processing uevents for any further devices afterward.  The
> > switch
> > over would be delicate because of the obvious races involved with
> > new devs
> > appearing, but probably feasible.
> 
> Maybe to avoid the race, we could possibly write the proposed
> "/run/lvm2/boot-finished" right before we initiate scanning in
> "vgchange
> -aay" that is a part of the lvm2-activation-net.service (the last
> service to do the direct activation).
> 
> A few event-based pvscans could fire during the window between
> "scan initiated phase" in lvm2-activation-net.service's
> "ExecStart=vgchange -aay..."
> and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
> finished",
> but I think still better than missing important uevents completely in
> this window.

That sounds reasonable. I was thinking along similar lines. Note that
in the case where we had problems lately, all actual activation (and
slowness) happened in lvm2-activation-early.service.

Regards,
Martin




* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 13:23       ` Martin Wilck
@ 2021-06-08 13:41         ` Peter Rajnoha
  2021-06-08 13:46           ` Zdenek Kabelac
  2021-09-09 19:44         ` David Teigland
  1 sibling, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-06-08 13:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, Heming Zhao, linux-lvm

On Tue 08 Jun 2021 13:23, Martin Wilck wrote:
> On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> > On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > > 
> > > If there are say 1000 PVs already present on the system, there
> > > could be
> > > real savings in having one lvm command process all 1000, and then
> > > switch
> > > over to processing uevents for any further devices afterward.  The
> > > switch
> > > over would be delicate because of the obvious races involved with
> > > new devs
> > > appearing, but probably feasible.
> > 
> > Maybe to avoid the race, we could possibly write the proposed
> > "/run/lvm2/boot-finished" right before we initiate scanning in
> > "vgchange
> > -aay" that is a part of the lvm2-activation-net.service (the last
> > service to do the direct activation).
> > 
> > A few event-based pvscans could fire during the window between
> > "scan initiated phase" in lvm2-activation-net.service's
> > "ExecStart=vgchange -aay..."
> > and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
> > finished",
> > but I think still better than missing important uevents completely in
> > this window.
> 
> That sounds reasonable. I was thinking along similar lines. Note that
> in the case where we had problems lately, all actual activation (and
> slowness) happened in lvm2-activation-early.service.
> 

Yes, I think most of the activations are covered by the first service,
where most of the devices are already present; the rest is then covered
by the other two services.

Anyway, I'd still like to know why exactly
obtain_device_list_from_udev=1 is so slow. The only thing that it does
is that it calls libudev's enumeration for "block" subsystem devs. We
don't even check if the device is initialized in udev in this case, if I
remember correctly, so if there's any udev processing happening in parallel,
it shouldn't be slowing down. BUT we're waiting for udev records to
get initialized for filtering reasons, like mpath and MD component detection.
We should probably inspect this in detail and see where the time is really
taken underneath before we do any further changes...

Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 13:41         ` Peter Rajnoha
@ 2021-06-08 13:46           ` Zdenek Kabelac
  2021-06-08 13:56             ` Peter Rajnoha
  0 siblings, 1 reply; 77+ messages in thread
From: Zdenek Kabelac @ 2021-06-08 13:46 UTC (permalink / raw)
  To: Peter Rajnoha, Martin Wilck; +Cc: Heming Zhao, linux-lvm

Dne 08. 06. 21 v 15:41 Peter Rajnoha napsal(a):
> On Tue 08 Jun 2021 13:23, Martin Wilck wrote:
>> On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
>>> On Mon 07 Jun 2021 16:48, David Teigland wrote:
>>>> If there are say 1000 PVs already present on the system, there
>>>> could be
>>>> real savings in having one lvm command process all 1000, and then
>>>> switch
>>>> over to processing uevents for any further devices afterward.  The
>>>> switch
>>>> over would be delicate because of the obvious races involved with
>>>> new devs
>>>> appearing, but probably feasible.
>>> Maybe to avoid the race, we could possibly write the proposed
>>> "/run/lvm2/boot-finished" right before we initiate scanning in
>>> "vgchange
>>> -aay" that is a part of the lvm2-activation-net.service (the last
>>> service to do the direct activation).
>>>
>>> A few event-based pvscans could fire during the window between
>>> "scan initiated phase" in lvm2-activation-net.service's
>>> "ExecStart=vgchange -aay..."
>>> and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
>>> finished",
>>> but I think still better than missing important uevents completely in
>>> this window.
>> That sounds reasonable. I was thinking along similar lines. Note that
>> in the case where we had problems lately, all actual activation (and
>> slowness) happened in lvm2-activation-early.service.
>>
> Yes, I think most of the activations are covered with the first service
> where most of the devices are already present, then the rest is covered
> by the other two services.
>
> Anyway, I'd still like to know why exactly
> obtain_device_list_from_udev=1 is so slow. The only thing that it does
> is that it calls libudev's enumeration for "block" subsystem devs. We
> don't even check if the device is initialized in udev in this case, if I
> remember correctly, so if there's any udev processing happening in parallel,
> it shouldn't be slowing down. BUT we're waiting for udev records to
> get initialized for filtering reasons, like mpath and MD component detection.
> We should probably inspect this in detail and see where the time is really
> taken underneath before we do any further changes...


This reminds me - did we already fix the annoying problem of 'repeated' sleep
for every 'unfinished' udev initialization?

I believe there should be exactly one sleep to wait for udev and if it
doesn't work - go without.

But I've seen some trace where the sleep was repeated for each device where
udev was 'uninitialized'.

Clearly this doesn't fix the problem of 'uninitialized udev' but at least
avoids an extremely lengthy sleeping lvm command.
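A "wait just once" policy could look roughly like this (udevadm settle is real; using a single bounded up-front wait instead of per-device sleeps is the illustrative part):

```shell
# wait for udev once, with a bounded timeout, before processing any device;
# per-device code would then use whatever udev info exists without sleeping
udev_wait_once() {
    if ! udevadm settle --timeout=10 2>/dev/null; then
        echo "udev not settled; proceeding with possibly incomplete info" >&2
    fi
}

udev_wait_once
```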

Zdenek



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 13:46           ` Zdenek Kabelac
@ 2021-06-08 13:56             ` Peter Rajnoha
  2021-06-08 14:23               ` Zdenek Kabelac
  2021-06-08 14:48               ` Martin Wilck
  0 siblings, 2 replies; 77+ messages in thread
From: Peter Rajnoha @ 2021-06-08 13:56 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-lvm, teigland, Heming Zhao, Martin Wilck

On Tue 08 Jun 2021 15:46, Zdenek Kabelac wrote:
> Dne 08. 06. 21 v 15:41 Peter Rajnoha napsal(a):
> > On Tue 08 Jun 2021 13:23, Martin Wilck wrote:
> > > On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> > > > On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > > > > If there are say 1000 PVs already present on the system, there
> > > > > could be
> > > > > real savings in having one lvm command process all 1000, and then
> > > > > switch
> > > > > over to processing uevents for any further devices afterward.  The
> > > > > switch
> > > > > over would be delicate because of the obvious races involved with
> > > > > new devs
> > > > > appearing, but probably feasible.
> > > > Maybe to avoid the race, we could possibly write the proposed
> > > > "/run/lvm2/boot-finished" right before we initiate scanning in
> > > > "vgchange
> > > > -aay" that is a part of the lvm2-activation-net.service (the last
> > > > service to do the direct activation).
> > > > 
> > > > A few event-based pvscans could fire during the window between
> > > > "scan initiated phase" in lvm2-activation-net.service's
> > > > "ExecStart=vgchange -aay..."
> > > > and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
> > > > finished",
> > > > but I think still better than missing important uevents completely in
> > > > this window.
> > > That sounds reasonable. I was thinking along similar lines. Note that
> > > in the case where we had problems lately, all actual activation (and
> > > slowness) happened in lvm2-activation-early.service.
> > > 
> > Yes, I think most of the activations are covered with the first service
> > where most of the devices are already present, then the rest is covered
> > by the other two services.
> > 
> > Anyway, I'd still like to know why exactly
> > obtain_device_list_from_udev=1 is so slow. The only thing that it does
> > is that it calls libudev's enumeration for "block" subsystem devs. We
> > don't even check if the device is initialized in udev in this case, if I
> > remember correctly, so if there's any udev processing happening in parallel,
> > it shouldn't be slowing down. BUT we're waiting for udev records to
> > get initialized for filtering reasons, like mpath and MD component detection.
> > We should probably inspect this in detail and see where the time is really
> > taken underneath before we do any further changes...
> 
> 
> This reminds me - did we already fix the annoying problem of 'repeated' sleep
> for every 'unfinished' udev initialization?
> 
> I believe there should be exactly one sleep to wait for udev and if it
> doesn't work - go without.
> 
> But I've seen some trace where the sleep was repeated for each device where
> udev was 'uninitialized'.
> 
> Clearly this doesn't fix the problem of 'uninitialized udev' but at least
> avoids an extremely lengthy sleeping lvm command.

The sleep + iteration is still there!

The issue is that we're relying now on udev db records that contain
info about mpath and MD components - without this, the detection (and
hence filtering) could fail in certain cases. So if we go without checking
udev db, that'll be a step back. As an alternative, we'd need to call
out mpath and MD directly from LVM2 if we really wanted to avoid
checking udev db (but then, we're checking the same thing that is
already checked by udev means).

Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 13:56             ` Peter Rajnoha
@ 2021-06-08 14:23               ` Zdenek Kabelac
  2021-06-08 14:48               ` Martin Wilck
  1 sibling, 0 replies; 77+ messages in thread
From: Zdenek Kabelac @ 2021-06-08 14:23 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: linux-lvm, teigland, Heming Zhao, Martin Wilck

Dne 08. 06. 21 v 15:56 Peter Rajnoha napsal(a):
> On Tue 08 Jun 2021 15:46, Zdenek Kabelac wrote:
>> Dne 08. 06. 21 v 15:41 Peter Rajnoha napsal(a):
>>> On Tue 08 Jun 2021 13:23, Martin Wilck wrote:
>>>> On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
>>>>> On Mon 07 Jun 2021 16:48, David Teigland wrote:
>>>>>> If there are say 1000 PVs already present on the system, there
>>>>>> could be
>>>>>> real savings in having one lvm command process all 1000, and then
>>>>>> switch
>>>>>> over to processing uevents for any further devices afterward.  The
>>>>>> switch
>>>>>> over would be delicate because of the obvious races involved with
>>>>>> new devs
>>>>>> appearing, but probably feasible.
>>>>> Maybe to avoid the race, we could possibly write the proposed
>>>>> "/run/lvm2/boot-finished" right before we initiate scanning in
>>>>> "vgchange
>>>>> -aay" that is a part of the lvm2-activation-net.service (the last
>>>>> service to do the direct activation).
>>>>>
>>>>> A few event-based pvscans could fire during the window between
>>>>> "scan initiated phase" in lvm2-activation-net.service's
>>>>> "ExecStart=vgchange -aay..."
>>>>> and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
>>>>> finished",
>>>>> but I think still better than missing important uevents completely in
>>>>> this window.
>>>> That sounds reasonable. I was thinking along similar lines. Note that
>>>> in the case where we had problems lately, all actual activation (and
>>>> slowness) happened in lvm2-activation-early.service.
>>>>
>>> Yes, I think most of the activations are covered with the first service
>>> where most of the devices are already present, then the rest is covered
>>> by the other two services.
>>>
>>> Anyway, I'd still like to know why exactly
>>> obtain_device_list_from_udev=1 is so slow. The only thing that it does
>>> is that it calls libudev's enumeration for "block" subsystem devs. We
>>> don't even check if the device is initialized in udev in this case, if I
>>> remember correctly, so if there's any udev processing happening in parallel,
>>> it shouldn't be slowing down. BUT we're waiting for udev records to
>>> get initialized for filtering reasons, like mpath and MD component detection.
>>> We should probably inspect this in detail and see where the time is really
>>> taken underneath before we do any further changes...
>>
>> This reminds me - did we already fix the annoying problem of 'repeated' sleep
>> for every 'unfinished' udev initialization?
>>
>> I believe there should be exactly one sleep to wait for udev and if it
>> doesn't work - go without.
>>
>> But I've seen some trace where the sleep was repeated for each device where
>> udev was 'uninitialized'.
>>
>> Clearly this doesn't fix the problem of 'uninitialized udev' but at least
>> avoids an extremely lengthy sleeping lvm command.
> The sleep + iteration is still there!
>
> The issue is that we're relying now on udev db records that contain
> info about mpath and MD components - without this, the detection (and
> hence filtering) could fail in certain cases. So if we go without checking
> udev db, that'll be a step back. As an alternative, we'd need to call
> out mpath and MD directly from LVM2 if we really wanted to avoid
> checking udev db (but then, we're checking the same thing that is
> already checked by udev means).


A few things here: I've already seen traces where we've been waiting for udev
basically 'endlessly' - as if sleeping actually does not help at all.

So either our command holds some lock - preventing the 'udev' rule from
finishing - or some other trouble is blocking it.

My point about why we should wait 'just once' is that if the 1st sleep didn't
help, the next sleeps for other devices likely won't help either.

So we may report some 'garbage' if we don't have all the info from udev we
need - but at least it won't take so many minutes, and in some cases the
device isn't actually needed for successful command completion.

But of course we should figure out why udev isn't initialized in time.


Zdenek



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 13:56             ` Peter Rajnoha
  2021-06-08 14:23               ` Zdenek Kabelac
@ 2021-06-08 14:48               ` Martin Wilck
  2021-06-08 15:19                 ` Peter Rajnoha
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Wilck @ 2021-06-08 14:48 UTC (permalink / raw)
  To: zkabelac, prajnoha; +Cc: bmarzins, linux-lvm, teigland, Heming Zhao

On Di, 2021-06-08 at 15:56 +0200, Peter Rajnoha wrote:
> 
> The issue is that we're relying now on udev db records that contain
> info about mpath and MD components - without this, the detection (and
> hence filtering) could fail in certain cases. So if we go without
> checking
> udev db, that'll be a step back. As an alternative, we'd need to call
> out mpath and MD directly from LVM2 if we really wanted to avoid
> checking udev db (but then, we're checking the same thing that is
> already checked by udev means).

Recent multipath-tools ships the "libmpathvalid" library that
could be used for this purpose, to make the logic comply with what
multipathd itself uses. It could be used as an alternative to libudev
for this part of the equation.

Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 14:48               ` Martin Wilck
@ 2021-06-08 15:19                 ` Peter Rajnoha
  2021-06-08 15:39                   ` Martin Wilck
  0 siblings, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-06-08 15:19 UTC (permalink / raw)
  To: Martin Wilck; +Cc: bmarzins, Heming Zhao, teigland, linux-lvm, zkabelac

On Tue 08 Jun 2021 14:48, Martin Wilck wrote:
> On Di, 2021-06-08 at 15:56 +0200, Peter Rajnoha wrote:
> > 
> > The issue is that we're relying now on udev db records that contain
> > info about mpath and MD components - without this, the detection (and
> > hence filtering) could fail in certain cases. So if we go without
> > checking
> > udev db, that'll be a step back. As an alternative, we'd need to call
> > out mpath and MD directly from LVM2 if we really wanted to avoid
> > checking udev db (but then, we're checking the same thing that is
> > already checked by udev means).
> 
> Recent multipath-tools ships the "libmpathvalid" library that
> could be used for this purpose, to make the logic comply with what
> multipathd itself uses. It could be used as an alternative to libudev
> for this part of the equation.

Ah, yes, sure! Still, we have the MD in play...

Out of curiosity - if you disable mpath and MD component detection in
lvm.conf, can you still hit the issue? (Of course, if you're not using
any of the two in your stack.)

Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 15:19                 ` Peter Rajnoha
@ 2021-06-08 15:39                   ` Martin Wilck
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-08 15:39 UTC (permalink / raw)
  To: prajnoha; +Cc: bmarzins, zkabelac, teigland, linux-lvm, Heming Zhao

On Di, 2021-06-08 at 17:19 +0200, Peter Rajnoha wrote:
> On Tue 08 Jun 2021 14:48, Martin Wilck wrote:
> > 
> > Recent multipath-tools ships the "libmpathvalid" library that
> > could be used for this purpose, to make the logic comply with what
> > multipathd itself uses. It could be used as an alternative to
> > libudev
> > for this part of the equation.
> 
> Ah, yes, sure! Still, we have the MD in play...

That's easier than multipath, because you can check for RAID metadata.

> Out of curiosity - if you disable mpath and MD component detection in
> lvm.conf, can you still hit the issue? (Of course, if you're not
> using
> any of the two in your stack.)

The performance issue, you mean? Heming would need to answer that.

Thanks,
Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08  8:26       ` Martin Wilck
@ 2021-06-08 15:39         ` David Teigland
  2021-06-08 15:47           ` Martin Wilck
  2021-06-15 17:03           ` David Teigland
  2021-06-08 16:49         ` heming.zhao
  1 sibling, 2 replies; 77+ messages in thread
From: David Teigland @ 2021-06-08 15:39 UTC (permalink / raw)
  To: Martin Wilck; +Cc: rogerheflin, zkabelac, prajnoha, Heming Zhao, linux-lvm

On Tue, Jun 08, 2021 at 08:26:01AM +0000, Martin Wilck wrote:
> IIUC, 2) is the effect of _pvscan_aa_quick(). 3) is surprising;
> apparently libudev's device detection causes a factor 3 slowdown.
> While 40s is not bad, you can see that event based activation still
> performs far worse than "serial" device detection lvm2-activation-
> early.service.
> 
> Personally, I'm sort of wary about obtain_device_list_from_udev=0
> because I'm uncertain whether it might break multipath/MD detection.
> Perhaps you can clarify that.

Yes, that's an issue, but it's something we've needed to clean up for a
while, and I made a small start on it a while back.

obtain_device_list_from_udev is supposed to only control whether lvm gets
a list of device names from readdir /dev, or from libudev.  My preference
is default 0, readdir /dev.  This avoids the performance problem and makes
lvm less dependent on the vagaries of udev in general.

But, as you and Peter mentioned, obtain_device_list_from_udev also became
entangled with md/mpath detection methods, which is more closely related
to external_device_info_source=udev|none.

I think it would be an improvement to:

. Make obtain_device_list_from_udev only control how we get the device
  list. Then we can easily default to 0 and readdir /dev if it's better.

. Use both native md/mpath detection *and* udev info when it's readily
  available (don't wait for it), instead of limiting ourselves to one
  source of info.  If either source indicates an md/mpath component,
  then we consider it true.

The second point means we are free to change obtain_device_list_from_udev
as we wish, without affecting md/mpath detection.  It may also improve
md/mpath detection overall.

A third related improvement that could follow is to add stronger native
mpath detection, in which lvm uses /etc/multipath/wwids, directly or
through a multipath library, to identify mpath components.  This would
supplement the existing sysfs and udev sources, and address the difficult
case where the mpath device is not yet set up.
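The either-source rule from the second point could be sketched like this (illustrative Python, not lvm2 code; treating `MD_LEVEL` as the udev property of interest is an assumption):

```python
# Sketch: consult both native probing and the udev record (only when it
# is readily available -- never wait for it), and treat the device as an
# md component if EITHER source says so.
def is_md_component(dev, native_probe, udev_record):
    native_hit = native_probe(dev)
    # udev_record is None when the udev db entry isn't ready yet
    udev_hit = bool(udev_record) and udev_record.get("MD_LEVEL") is not None
    return native_hit or udev_hit

if __name__ == "__main__":
    native = lambda d: d == "sdb"   # stand-in for a sysfs/metadata probe
    print(is_md_component("sdb", native, None))                   # native hit only
    print(is_md_component("sdc", native, {"MD_LEVEL": "raid1"}))  # udev hit only
    print(is_md_component("sdd", native, {}))                     # neither
```

Using OR over the two sources means a stale or missing udev db entry can only cause a false negative in one source, which the other can still cover.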

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 15:39         ` David Teigland
@ 2021-06-08 15:47           ` Martin Wilck
  2021-06-08 16:02             ` Zdenek Kabelac
  2021-06-08 16:03             ` David Teigland
  2021-06-15 17:03           ` David Teigland
  1 sibling, 2 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-08 15:47 UTC (permalink / raw)
  To: teigland; +Cc: rogerheflin, Heming Zhao, prajnoha, linux-lvm, zkabelac

On Di, 2021-06-08 at 10:39 -0500, David Teigland wrote:
> 
> . Use both native md/mpath detection *and* udev info when it's
> readily
>   available (don't wait for it), instead of limiting ourselves to one
>   source of info.  If either source indicates an md/mpath component,
>   then we consider it true.

Hm. You can boot with "multipath=off" which udev would take into
account. What would you do in that case? Native mpath detection would
probably not figure it out.

multipath-tools itself follows the "try udev and fall back to native if
it fails" approach, which isn't always perfect, either.

> A third related improvement that could follow is to add stronger
> native
> mpath detection, in which lvm uses /etc/multipath/wwids,
> directly or
> through a multipath library, to identify mpath components.  This
> would
> supplement the existing sysfs and udev sources, and address the
> difficult
> case where the mpath device is not yet set up.
> 

Please don't. Use libmpathvalid if you want to improve in this area.
That's what it was made for.

Regards,
Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 15:47           ` Martin Wilck
@ 2021-06-08 16:02             ` Zdenek Kabelac
  2021-06-08 16:05               ` Martin Wilck
  2021-06-08 16:03             ` David Teigland
  1 sibling, 1 reply; 77+ messages in thread
From: Zdenek Kabelac @ 2021-06-08 16:02 UTC (permalink / raw)
  To: Martin Wilck, teigland; +Cc: rogerheflin, linux-lvm, prajnoha, Heming Zhao

Dne 08. 06. 21 v 17:47 Martin Wilck napsal(a):
> On Di, 2021-06-08 at 10:39 -0500, David Teigland wrote:
>> . Use both native md/mpath detection *and* udev info when it's
>> readily
>>    available (don't wait for it), instead of limiting ourselves to one
>>    source of info.  If either source indicates an md/mpath component,
>>    then we consider it true.
> Hm. You can boot with "multipath=off" which udev would take into
> account. What would you do in that case? Native mpath detection would
> probably not figure it out.
>
> multipath-tools itself follows the "try udev and fall back to native if
> it fails" approach, which isn't always perfect, either.
>
>> A third related improvement that could follow is to add stronger
>> native
>> mpath detection, in which lvm uses /etc/multipath/wwids,
>> directly or
>> through a multipath library, to identify mpath components.  This
>> would
>> supplement the existing sysfs and udev sources, and address the
>> difficult
>> case where the mpath device is not yet set up.
>>
> Please don't. Use libmpathvalid if you want to improve in this area.
> That's what it was made for.

The problem is the addition of another dependency here.

We may think about using 'dlopen' and, if the library is present, using it, 
but IMHO libmpathvalid should be integrated into libblkid in some way - 
linking another library into the many projects that need to detect MP devices 
really complicates this a lot.  libblkid should be able to decode this and 
make things much cleaner.

I'd also vote for an lvm2 plugin for blkid, as forking thousands of processes will 
simply always take a lot of time (but this would require quite some code 
shuffling within the lvm codebase).
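The dlopen idea could be sketched as follows, using Python's ctypes as a stand-in for a C dlopen/dlsym sequence (the soname "libmpathvalid.so.0" and the symbol name "mpathvalid_is_path" are assumptions, not verified API):

```python
# Sketch: load libmpathvalid at runtime if present, otherwise fall back,
# so the library never becomes a hard build/runtime dependency.
import ctypes

def mpath_checker():
    try:
        lib = ctypes.CDLL("libmpathvalid.so.0")        # dlopen() equivalent
        fn = getattr(lib, "mpathvalid_is_path")        # dlsym() equivalent; assumed symbol
        return "libmpathvalid", fn
    except (OSError, AttributeError):
        # library or symbol missing: use the existing sysfs/udev detection
        return "fallback", None

if __name__ == "__main__":
    source, _ = mpath_checker()
    print("detection source:", source)
```

Whether the fallback or the library path is taken is decided once at startup, so the common code never needs to know which variant is in use.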

Zdenek


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 15:47           ` Martin Wilck
  2021-06-08 16:02             ` Zdenek Kabelac
@ 2021-06-08 16:03             ` David Teigland
  2021-06-08 16:07               ` Martin Wilck
  1 sibling, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-06-08 16:03 UTC (permalink / raw)
  To: Martin Wilck; +Cc: rogerheflin, Heming Zhao, prajnoha, linux-lvm, zkabelac

On Tue, Jun 08, 2021 at 03:47:39PM +0000, Martin Wilck wrote:
> Hm. You can boot with "multipath=off" which udev would take into
> account. What would you do in that case? Native mpath detection would
> probably not figure it out.

libmpathvalid will still know?

> Please don't. Use libmpathvalid if you want to improve in this area.
> That's what it was made for.

Yes, that sounds better (it didn't exist when I last experimented with
this idea.)

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 16:02             ` Zdenek Kabelac
@ 2021-06-08 16:05               ` Martin Wilck
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-08 16:05 UTC (permalink / raw)
  To: zkabelac, teigland; +Cc: rogerheflin, linux-lvm, prajnoha, Heming Zhao

On Di, 2021-06-08 at 18:02 +0200, Zdenek Kabelac wrote:
> > 
> > > A third related improvement that could follow is to add stronger
> > > native
> > > mpath detection, in which lvm uses /etc/multipath/wwids,
> > > directly or
> > > through a multipath library, to identify mpath components.  This
> > > would
> > > supplement the existing sysfs and udev sources, and address the
> > > difficult
> > > case where the mpath device is not yet set up.
> > > 
> > Please don't. Use libmpathvalid if you want to improve in this
> > area.
> > That's what it was made for.
> 
> Problem is addition of another dependency here.
> 
> We may probably think about using  'dlopen' and if library is present
> use it, 
> but IMHO  libmpathvalid should be integrated into libblkid  in some
> way  - 
> linking another library to many other projects that needs to detect
> MP devices 
> really complicates this a lot.  libblkid should be able to decode
> this and 
> make things much cleaner.

Fair enough. I just wanted to say "don't start hardcoding anything new
in lvm2". Currently, you won't find libmpathvalid on many
distributions.

Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 16:03             ` David Teigland
@ 2021-06-08 16:07               ` Martin Wilck
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-06-08 16:07 UTC (permalink / raw)
  To: teigland; +Cc: rogerheflin, Heming Zhao, prajnoha, linux-lvm, zkabelac

On Di, 2021-06-08 at 11:03 -0500, David Teigland wrote:
> On Tue, Jun 08, 2021 at 03:47:39PM +0000, Martin Wilck wrote:
> > Hm. You can boot with "multipath=off" which udev would take into
> > account. What would you do in that case? Native mpath detection
> > would
> > probably not figure it out.
> 
> libmpathvalid will still know?

No. You got me :-)
But it would be the right place for implementing this if required.


Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-07 21:30     ` David Teigland
  2021-06-08  8:26       ` Martin Wilck
@ 2021-06-08 16:18       ` heming.zhao
  2021-06-09  4:01         ` heming.zhao
  1 sibling, 1 reply; 77+ messages in thread
From: heming.zhao @ 2021-06-08 16:18 UTC (permalink / raw)
  To: David Teigland, Martin Wilck; +Cc: rogerheflin, zkabelac, linux-lvm

On 6/8/21 5:30 AM, David Teigland wrote:
> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
>> Most importantly, this was about LVM2 scanning of physical volumes. The
>> number of udev workers has very little influence on PV scanning,
>> because the udev rules only activate systemd service. The actual
>> scanning takes place in lvm2-pvscan@.service. And unlike udev, there's
>> no limit for the number of instances of a given systemd service
>> template that can run at any given time.
> 
> Excessive device scanning has been the historical problem in this area,
> but Heming mentioned dev_cache_scan() specifically as a problem.  That was
> surprising to me since it doesn't scan/read devices, it just creates a
> list of device names on the system (either readdir in /dev or udev
> listing.)  If there are still problems with excessive scannning/reading,
> we'll need some more diagnosis of what's happening, there could be some
> cases we've missed.
> 

dev_cache_scan doesn't issue direct disk IOs itself, but libudev will scan/read
the udev db, which does issue real disk IOs (the location is /run/udev/data).
We can see that the combination "obtain_device_list_from_udev=0 &
event_activation=1" largely reduces the booting time, from 2min 6s to 40s.
The key is that dev_cache_scan() then scans the devices by itself (scanning "/dev").

I am not very familiar with systemd-udev, but below is a little more info
about the libudev path. The top function is _insert_udev_dir, which:
1. scans/reads /sys/class/block/. O(n)
2. scans/reads the udev db (/run/udev/data). may be O(n)
    udev will call device_read_db => handle_db_line to handle every
    line of a db file.
3. sorts (qsort) & deduplicates the device list.  O(n) + O(n)
4. does lots of "memory alloc" & "string copy" actions while working.
    it takes too much memory; on the host side, 'top' shows:
    - direct activation only used 2G memory during boot
    - event activation cost ~20G memory.

I didn't test the related udev code, but I guess <2> takes too much time.
There are a thousand scanning jobs running in parallel on /run/udev/data, while
at the same time many devices need to generate udev db files in the same dir. I am
not sure the filesystem can handle this scenario perfectly.
The other code path, obtain_device_list_from_udev=0, triggers a
scan/read of "/dev", a dir with far fewer write IOs than /run/udev/data.
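The two listing strategies can be contrasted with a toy sketch (Python for brevity; the paths are the real ones discussed above, the counting itself is only illustrative):

```python
# Sketch: obtain_device_list_from_udev=0 amounts to a single
# readdir()-style pass over /dev, while the udev-backed path also ends
# up reading one db file per device under /run/udev/data.
import os

def list_dev(path="/dev"):
    # one O(n) directory scan, no per-device file reads
    return sorted(os.listdir(path)) if os.path.isdir(path) else []

def list_udev_db(path="/run/udev/data"):
    # the udev path: every db file listed here would additionally be
    # opened and parsed line by line (device_read_db / handle_db_line)
    return sorted(os.listdir(path)) if os.path.isdir(path) else []

if __name__ == "__main__":
    print("/dev entries:", len(list_dev()))
    print("udev db entries:", len(list_udev_db()))
```

The difference is that the second listing implies n extra file opens/reads in a directory that is concurrently being written by udev workers, which matches the contention described above.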

Regards
heming



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08  8:26       ` Martin Wilck
  2021-06-08 15:39         ` David Teigland
@ 2021-06-08 16:49         ` heming.zhao
  1 sibling, 0 replies; 77+ messages in thread
From: heming.zhao @ 2021-06-08 16:49 UTC (permalink / raw)
  To: Martin Wilck, teigland, linux-lvm; +Cc: rogerheflin, zkabelac

On 6/8/21 4:26 PM, Martin Wilck wrote:
> On Mo, 2021-06-07 at 16:30 -0500, David Teigland wrote:
>> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
>>> Most importantly, this was about LVM2 scanning of physical volumes.
>>> The
>>> number of udev workers has very little influence on PV scanning,
>>> because the udev rules only activate systemd service. The actual
>>> scanning takes place in lvm2-pvscan@.service. And unlike udev,
>>> there's
>>> no limit for the number of instances of a given systemd service
>>> template that can run at any given time.
>>
>> Excessive device scanning has been the historical problem in this area,
>> but Heming mentioned dev_cache_scan() specifically as a problem.  That
>> was
>> surprising to me since it doesn't scan/read devices, it just creates a
>> list of device names on the system (either readdir in /dev or udev
>> listing.)  If there are still problems with excessive
>> scannning/reading,
>> we'll need some more diagnosis of what's happening, there could be some
>> cases we've missed.
> 
> Heming didn't include his measurement results in the initial post.
> Here's a small summary. Heming will be able to provide more details.
> You'll see that the effects are quite drastic, factors 3-4 between
> every step below, factor >60 between best and worst. I'd say these
> results are typical for what we observe also on real-world systems.
> 
> kvm-qemu, 6 vcpu, 20G memory, 1258 scsi disks, 1015 vg/lv
> Shown is "systemd-analyze blame" output.
> 
>   1) lvm2 2.03.05 (SUSE SLE15-SP2),
>      obtain_device_list_from_udev=1 & event_activation=1
>          9min 51.782s lvm2-pvscan@253:2.service
>          9min 51.626s lvm2-pvscan@65:96.service
>      (many other lvm2-pvscan@ services follow)
>   2) lvm2 latest master
>      obtain_device_list_from_udev=1 & event_activation=1
>          2min 6.736s lvm2-pvscan@70:384.service
>          2min 6.628s lvm2-pvscan@70:400.service
>   3) lvm2 latest master
>      obtain_device_list_from_udev=0 & event_activation=1
>              40.589s lvm2-pvscan@131:976.service
>              40.589s lvm2-pvscan@131:928.service
>   4) lvm2 latest master
>      obtain_device_list_from_udev=0 & event_activation=0,
>              21.034s dracut-initqueue.service
>               8.674s lvm2-activation-early.service
> 
> IIUC, 2) is the effect of _pvscan_aa_quick(). 3) is surprising;
> apparently libudev's device detection causes a factor 3 slowdown.
> While 40s is not bad, you can see that event based activation still
> performs far worse than "serial" device detection lvm2-activation-
> early.service.
> 
> Personally, I'm sort of wary about obtain_device_list_from_udev=0
> because I'm uncertain whether it might break multipath/MD detection.
> Perhaps you can clarify that.
> 
> Regards
> Martin
> 
> 

My latest test results. These combine 3 config items:
devices/obtain_device_list_from_udev
global/event_activation
activation/udev_sync

<0> is under lvm2-2.03.05+
<1> ~ <8> are under lvm2-2.03.12+

All results are from "systemd-analyze blame", and I only
post the top N services.


0>
with SUSE 15 SP2 lvm2 version: lvm2-2.03.05+
"systemd-analyze blame" shows the top services:

devices/obtain_device_list_from_udev=1
global/event_activation=1
activation/udev_sync=1

     9min 51.782s lvm2-pvscan@253:2.service <===
     9min 51.626s lvm2-pvscan@65:96.service
     9min 51.625s lvm2-pvscan@65:208.service
     9min 51.624s lvm2-pvscan@65:16.service
     9min 51.622s lvm2-pvscan@8:176.service
     9min 51.614s lvm2-pvscan@65:144.service

1>
devices/obtain_device_list_from_udev=1
global/event_activation=0
activation/udev_sync=0

          18.307s dracut-initqueue.service
           6.168s btrfsmaintenance-refresh.service
           4.327s systemd-udev-settle.service
           3.633s wicked.service
           2.976s lvm2-activation-early.service  <===
           1.560s lvm2-pvscan@135:832.service
           1.559s lvm2-pvscan@135:816.service
           1.558s lvm2-pvscan@135:784.service
           1.558s lvm2-pvscan@134:976.service
           1.557s lvm2-pvscan@134:832.service
           1.556s dev-system-swap.swap
           1.554s lvm2-pvscan@134:992.service
           1.553s lvm2-pvscan@134:1008.service

2>
devices/obtain_device_list_from_udev=0
global/event_activation=0
activation/udev_sync=0

          17.164s dracut-initqueue.service
          10.420s wicked.service
           7.109s btrfsmaintenance-refresh.service
           4.471s systemd-udev-settle.service
           3.415s lvm2-activation-early.service <===
           1.679s lvm2-pvscan@135:816.service
           1.678s lvm2-pvscan@135:832.service
           1.677s lvm2-pvscan@134:992.service
           1.675s lvm2-pvscan@135:784.service
           1.674s lvm2-pvscan@134:928.service
           1.673s lvm2-pvscan@134:896.service
           1.673s dev-system-swap.swap
           1.672s lvm2-pvscan@134:1008.service


3>
devices/obtain_device_list_from_udev=1
global/event_activation=0
activation/udev_sync=1

          17.552s dracut-initqueue.service
           7.401s lvm2-activation-early.service <====
           6.519s btrfsmaintenance-refresh.service
           5.375s systemd-udev-settle.service
           3.588s wicked.service
           1.723s wickedd-nanny.service
           1.686s wickedd.service
           1.655s lvm2-pvscan@129:992.service
           1.654s lvm2-pvscan@129:960.service
           1.653s lvm2-pvscan@129:896.service
           1.652s lvm2-pvscan@130:784.service
           1.651s lvm2-pvscan@130:768.service


4>
devices/obtain_device_list_from_udev=0
global/event_activation=0
activation/udev_sync=1

          17.975s dracut-initqueue.service
          10.162s wicked.service
           8.238s lvm2-activation-early.service  <===
           6.955s btrfsmaintenance-refresh.service
           4.444s systemd-udev-settle.service
           1.800s rsyslog.service
           1.768s wickedd.service
           1.751s kbdsettings.service
           1.751s kdump-early.service
           1.602s lvm2-pvscan@135:832.service
           1.601s lvm2-pvscan@135:816.service
           1.601s lvm2-pvscan@135:784.service
           1.600s lvm2-pvscan@134:1008.service
           1.599s dev-system-swap.swap
           1.598s lvm2-pvscan@134:832.service

5>
devices/obtain_device_list_from_udev=0
global/event_activation=1
activation/udev_sync=1

          34.908s dracut-initqueue.service
          25.440s systemd-udev-settle.service
          23.335s lvm2-pvscan@66:832.service  <===
          23.335s lvm2-pvscan@65:976.service
          23.335s lvm2-pvscan@66:784.service
          23.335s lvm2-pvscan@65:816.service
          23.335s lvm2-pvscan@8:976.service
          23.327s lvm2-pvscan@66:864.service
          23.323s lvm2-pvscan@66:848.service
          23.316s lvm2-pvscan@65:800.service

6>
devices/obtain_device_list_from_udev=0
global/event_activation=1
activation/udev_sync=0

          36.222s lvm2-pvscan@134:912.service <===
          36.222s lvm2-pvscan@134:816.service
          36.222s lvm2-pvscan@134:784.service
          36.221s lvm2-pvscan@133:816.service
          36.221s lvm2-pvscan@133:848.service
          36.220s lvm2-pvscan@133:928.service
          36.220s lvm2-pvscan@133:768.service
          36.219s lvm2-pvscan@133:992.service
          36.218s lvm2-pvscan@133:784.service
          36.218s lvm2-pvscan@134:800.service
          36.218s lvm2-pvscan@133:864.service
          36.217s lvm2-pvscan@133:896.service
          36.209s lvm2-pvscan@133:960.service
          36.197s lvm2-pvscan@134:1008.service


7>
devices/obtain_device_list_from_udev=1
global/event_activation=1
activation/udev_sync=1

      2min 6.736s lvm2-pvscan@70:384.service <===
      2min 6.628s lvm2-pvscan@70:400.service
      2min 6.554s lvm2-pvscan@69:432.service
      2min 6.518s lvm2-pvscan@69:480.service
      2min 6.478s lvm2-pvscan@69:416.service
      2min 6.277s lvm2-pvscan@69:464.service
      2min 5.791s lvm2-pvscan@69:544.service


8>
devices/obtain_device_list_from_udev=1
global/event_activation=1
activation/udev_sync=0

     2min 27.091s lvm2-pvscan@129:944.service <===
     2min 26.952s lvm2-pvscan@129:912.service
     2min 26.950s lvm2-pvscan@129:880.service
     2min 26.947s lvm2-pvscan@129:960.service
     2min 26.947s lvm2-pvscan@129:928.service
     2min 26.947s lvm2-pvscan@129:832.service
     2min 26.938s lvm2-pvscan@129:848.service
     2min 26.733s lvm2-pvscan@129:864.service
     2min 16.241s lvm2-pvscan@66:976.service
     2min 15.166s lvm2-pvscan@66:992.service



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 16:18       ` heming.zhao
@ 2021-06-09  4:01         ` heming.zhao
  2021-06-09  5:37           ` Heming Zhao
  0 siblings, 1 reply; 77+ messages in thread
From: heming.zhao @ 2021-06-09  4:01 UTC (permalink / raw)
  To: David Teigland, Martin Wilck; +Cc: rogerheflin, zkabelac, linux-lvm

On 6/9/21 12:18 AM, heming.zhao@suse.com wrote:
> On 6/8/21 5:30 AM, David Teigland wrote:
>> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
>>> Most importantly, this was about LVM2 scanning of physical volumes. The
>>> number of udev workers has very little influence on PV scanning,
>>> because the udev rules only activate systemd service. The actual
>>> scanning takes place in lvm2-pvscan@.service. And unlike udev, there's
>>> no limit for the number of instances of a given systemd service
>>> template that can run at any given time.
>>
>> Excessive device scanning has been the historical problem in this area,
>> but Heming mentioned dev_cache_scan() specifically as a problem.  That was
>> surprising to me since it doesn't scan/read devices, it just creates a
>> list of device names on the system (either readdir in /dev or udev
>> listing.)  If there are still problems with excessive scannning/reading,
>> we'll need some more diagnosis of what's happening, there could be some
>> cases we've missed.
>>
> 
> the dev_cache_scan doesn't have direct disk IOs, but libudev will scan/read
> udev db which issue real disk IOs (location is /run/udev/data).
> we can see with combination "obtain_device_list_from_udev=0 &
> event_activation=1" could largely reduce booting time from 2min6s to 40s.
> the key is dev_cache_scan() does the scan device by itself (scaning "/dev").
> 
> I am not very familiar with systemd-udev, below shows a little more info
> about libudev path. the top function is _insert_udev_dir, this function:
> 1. scans/reads /sys/class/block/. O(n)
> 2. scans/reads udev db (/run/udev/data). may O(n)
>    udev will call device_read_db => handle_db_line to handle every
>    line of a db file.
> 3. does qsort & deduplication the devices list.  O(n) + O(n)
> 4. has lots of "memory alloc" & "string copy" actions during working.
>    it takes too much memory, from the host side, use 'top' can see:
>    - direct activation only used 2G memory during boot
>    - event activation cost ~20G memory.
> 
> I didn't test the related udev code, and guess the <2> takes too much time.
> And there are thousand scanning job parallel in /run/udev/data, meanwhile
> there are many devices need to generate udev db file in the same dir. I am
> not sure if the filesystem can perfect handle this scenario.
> the another code path, obtain_device_list_from_udev=0, which triggers to
> scan/read "/dev", this dir has less write IOs than /run/udev/data.
> 
> Regards
> heming

A correction to my last mail: the qsort time in <3> above is O(n log n).

More info about my analysis:
I set a filter in lvm.conf with the rule: filter = [ "a|/dev/vda2|", "r|.*|" ]
The booting time reduced a little, from 2min 6s to 1min 42s.

The vm vda2 layout:
# lsblk | egrep -A 4 "^vd"
vda                 253:0     0  40G  0 disk
├─vda1              253:1     0   8M  0 part
└─vda2              253:2     0  40G  0 part
   ├─system-swap     254:0     0   2G  0 lvm  [SWAP]
   └─system-root     254:1     0  35G  0 lvm  /

The filter rule denies all devices except the PV backing the rootfs LVs.

The rule makes _pvscan_cache_args() remove devs from devl->list via the nodata filters,
so the hot spot narrows to setup_devices (calling dev_cache_scan()).
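For reference, a hypothetical lvm.conf fragment combining the filter above with the settings discussed in this thread might look like:

```
devices {
    # accept only the rootfs PV, reject everything else
    filter = [ "a|/dev/vda2|", "r|.*|" ]
    obtain_device_list_from_udev = 0
}
global {
    event_activation = 1
}
```

This is the configuration under test here, not a general recommendation; the filter values obviously depend on the local disk layout.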




* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-09  4:01         ` heming.zhao
@ 2021-06-09  5:37           ` Heming Zhao
  2021-06-09 18:59             ` David Teigland
  0 siblings, 1 reply; 77+ messages in thread
From: Heming Zhao @ 2021-06-09  5:37 UTC (permalink / raw)
  To: David Teigland, Martin Wilck; +Cc: rogerheflin, zkabelac, linux-lvm

Either my mailbox or the lvm mailing list is broken; I can't see my last two mails appearing on the list.

In this mail I will mention another issue, about lvm2-pvscan@.service.
Both event activation and direct activation have the same issue:
shutdown takes much time.

The code logic in pvscan_cache_cmd() only takes effect on activation jobs:
```
    if (do_activate &&
        !find_config_tree_bool(cmd, global_event_activation_CFG, NULL)) {
        log_verbose("Ignoring pvscan --cache -aay because event_activation is disabled.");
        return ECMD_PROCESSED;
    }
```

I also have a question about the lvm2-pvscan@.service unit:
why does it run a scan job when stopping as well? Could we remove/modify this line?
  ```
  ExecStop=@SBINDIR@/lvm pvscan --cache %i
  ```
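(If the re-activation behavior turns out to be unnecessary, one untested local workaround would be a systemd drop-in that clears ExecStop; the file name here is hypothetical:)

```
# /etc/systemd/system/lvm2-pvscan@.service.d/no-stop-scan.conf
# (hypothetical drop-in; run `systemctl daemon-reload` afterwards)
[Service]
ExecStop=
```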

Regards
heming

On 6/9/21 12:18 AM, heming.zhao@suse.com wrote:
> On 6/8/21 5:30 AM, David Teigland wrote:
>> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
>>> Most importantly, this was about LVM2 scanning of physical volumes. The
>>> number of udev workers has very little influence on PV scanning,
>>> because the udev rules only activate systemd service. The actual
>>> scanning takes place in lvm2-pvscan@.service. And unlike udev, there's
>>> no limit for the number of instances of a given systemd service
>>> template that can run at any given time.
>>
>> Excessive device scanning has been the historical problem in this area,
>> but Heming mentioned dev_cache_scan() specifically as a problem.  That was
>> surprising to me since it doesn't scan/read devices, it just creates a
>> list of device names on the system (either readdir in /dev or udev
>> listing.)  If there are still problems with excessive scannning/reading,
>> we'll need some more diagnosis of what's happening, there could be some
>> cases we've missed.
>>
> 
> dev_cache_scan() doesn't issue disk IOs directly, but libudev scans/reads the
> udev db, which does issue real disk IOs (under /run/udev/data).
> We can see that the combination "obtain_device_list_from_udev=0 &
> event_activation=1" largely reduces boot time, from 2min 6s to 40s.
> The key is that dev_cache_scan() then scans devices by itself (reading "/dev").
> 
> I am not very familiar with systemd-udev; below is a little more info
> about the libudev path. The top function is _insert_udev_dir, which:
> 1. scans/reads /sys/class/block/.  O(n)
> 2. scans/reads the udev db (/run/udev/data).  maybe O(n)
>    udev calls device_read_db => handle_db_line to handle every
>    line of a db file.
> 3. qsorts & deduplicates the device list.  O(n) + O(n)
> 4. performs lots of memory allocations & string copies while working.
>    It uses too much memory; on the host side, 'top' shows:
>    - direct activation used only ~2G of memory during boot
>    - event activation cost ~20G of memory.
> 
> I didn't profile the related udev code; my guess is that <2> takes the most time.
> There are thousands of scanning jobs running in parallel on /run/udev/data, while
> many devices are generating udev db files in the same dir at the same time. I am
> not sure the filesystem handles this scenario well.
> The other code path, obtain_device_list_from_udev=0, triggers a scan/read
> of "/dev" instead; that directory sees far fewer write IOs than /run/udev/data.
> 
> Regards
> heming
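The contention pattern described above can be illustrated with a toy simulation: each of N concurrent pvscan instances lists a db directory holding N entries, so directory reads alone grow as N^2 (plain files under a temp dir stand in for /run/udev/data; this is an illustration, not a benchmark):

```shell
# N "pvscan" passes each list a directory of N fake udev db files.
db=$(mktemp -d)
N=50
for i in $(seq 1 "$N"); do
    printf 'E:ID_FS_TYPE=LVM2_member\n' > "$db/b253:$i"
done
total=0
for i in $(seq 1 "$N"); do
    # each simulated pvscan walks the whole db directory
    reads=$(ls "$db" | wc -l)
    total=$((total + reads))
done
echo "$total"    # 50 * 50 = 2500 directory entries read in total
rm -rf "$db"
```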




* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-09  5:37           ` Heming Zhao
@ 2021-06-09 18:59             ` David Teigland
  2021-06-10 17:23               ` heming.zhao
  0 siblings, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-06-09 18:59 UTC (permalink / raw)
  To: Heming Zhao; +Cc: zkabelac, rogerheflin, linux-lvm, Martin Wilck

On Wed, Jun 09, 2021 at 05:37:47AM +0000, Heming Zhao wrote:
> In this mail I will raise another issue with lvm2-pvscan@.service.
> Event activation and direct activation share the same problem:
> shutdown takes a long time.
> 
> The code logic in pvscan_cache_cmd() only takes effect for activation jobs:
> ```
>     if (do_activate &&
>         !find_config_tree_bool(cmd, global_event_activation_CFG, NULL)) {
>         log_verbose("Ignoring pvscan --cache -aay because event_activation is disabled.");
>         return ECMD_PROCESSED;
>     }
> ```

Good point, event_activation=0 should also apply to pvscan on device
removal.

> I also have a question about the lvm2-pvscan@.service unit:
> why does it run a scan job when stopping as well? Could we remove/modify this line?
>   ```
>   ExecStop=@SBINDIR@/lvm pvscan --cache %i
>   ```

This removes the /run/lvm/pvs_online/ file for the device.  If the PVs for
the VG are all removed, and then they are all reattached, pvscan will
autoactivate the VG again.  This reactivation isn't a core capability, or
one that we've explicitly mentioned or supported, but it's there.  I can
imagine that reactivation may be undesirable in many cases, and it's
certainly reasonable to remove the ExecStop as needed.  Some time ago I
suggested that we stop doing repeated autoactivation entirely, which would
let us remove the ExecStop.  But we don't know the extent to which users
depend on this behavior, so we haven't considered it further.  Perhaps
pvscan could detect system shutdown and exit directly without doing
anything?
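A minimal sketch of that shutdown check (hypothetical, not lvm2 code; `stopping` is the state `systemctl is-system-running` reports while shutting down):

```shell
# Decide whether an ExecStop-time pvscan should be skipped, given the
# system state string (as `systemctl is-system-running` would print it).
should_skip_stop() {
    case "$1" in
        stopping) return 0 ;;   # shutdown in progress: do nothing
        *)        return 1 ;;   # normal operation: run the stop scan
    esac
}
should_skip_stop stopping && echo "skip pvscan"
should_skip_stop running  || echo "run pvscan"
```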

For the future, the new udev rule I mentioned in an earlier message no
longer does this and removes the lvm2-pvscan service.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-09 18:59             ` David Teigland
@ 2021-06-10 17:23               ` heming.zhao
  0 siblings, 0 replies; 77+ messages in thread
From: heming.zhao @ 2021-06-10 17:23 UTC (permalink / raw)
  To: David Teigland; +Cc: zkabelac, rogerheflin, linux-lvm, Martin Wilck

On 6/10/21 2:59 AM, David Teigland wrote:
> On Wed, Jun 09, 2021 at 05:37:47AM +0000, Heming Zhao wrote:
>> This mail I will mention another issue about lvm2-pvscan@.service.
>> both event activation and direct activation have same issue:
>> the shutdown take much time.
>>
>> the code logic in pvscan_cache_cmd() only takes effect on activation job:
>> ```
>>      if (do_activate &&
>>          !find_config_tree_bool(cmd, global_event_activation_CFG, NULL)) {
>>          log_verbose("Ignoring pvscan --cache -aay because event_activation is disabled.");
>>          return ECMD_PROCESSED;
>>      }
>> ```
> 
> Good point, event_activation=0 should also apply to pvscan on device
> removal.
> 
>> and I have a question about the script  lvm2-pvscan@.service:
>> why there also does a scan job when stopping? could we remove/modify this line?
>>    ```
>>    ExecStop=@SBINDIR@/lvm pvscan --cache %i
>>    ```
> 
> This removes the /run/lvm/pvs_online/ file for the device.  If the PVs for
> the VG are all removed, and then they are all reattached, pvscan will
> autoactivate the VG again.  This reactivation isn't a core capability, or
> one that we've explicitly mentioned or supported, but it's there.  I can
> imagine that reactivation may be undesirable in many cases, and it's
> certainly reasonable to remove the ExecStop as needed.  Some time ago I
> suggested that we stop doing repeated autoactivation entirely, which would
> let us remove the ExecStop.  But we don't know the extent to which users
> depend on this behavior, so we haven't considered it further.  Perhaps
> pvscan could detect system shutdown and exit directly without doing
> anything?
> 
> For the future, the new udev rule I mentioned in an earlier message no
> longer does this and removes the lvm2-pvscan service.
> 
> Dave
> 

Thank you for your explanation. I am reading your code in the dev-next branch.

Heming



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 15:39         ` David Teigland
  2021-06-08 15:47           ` Martin Wilck
@ 2021-06-15 17:03           ` David Teigland
  2021-06-15 18:21             ` Zdenek Kabelac
  2021-06-16 16:18             ` heming.zhao
  1 sibling, 2 replies; 77+ messages in thread
From: David Teigland @ 2021-06-15 17:03 UTC (permalink / raw)
  To: Martin Wilck; +Cc: rogerheflin, prajnoha, linux-lvm, Heming Zhao, zkabelac

On Tue, Jun 08, 2021 at 10:39:37AM -0500, David Teigland wrote:
> I think it would be an improvement to:
> 
> . Make obtain_device_list_from_udev only control how we get the device
>   list. Then we can easily default to 0 and readdir /dev if it's better.
> 
> . Use both native md/mpath detection *and* udev info when it's readily
>   available (don't wait for it), instead of limiting ourselves to one
>   source of info.  If either source indicates an md/mpath component,
>   then we consider it true.
> 
> The second point means we are free to change obtain_device_list_from_udev
> as we wish, without affecting md/mpath detection.  It may also improve
> md/mpath detection overall.

Here are the initial patches I'm testing (libmpathvalid not yet added)
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-device-info-1

devices: rework native and udev device info

. Make the obtain_device_list_from_udev setting
  affect only the choice of readdir /dev vs libudev
  for a device name listing.  The setting no longer
  controls if udev is used for device type checks.

. Change obtain_device_list_from_udev default to 0.
  A list of device names is obtained from readdir /dev
  by default, which is faster than libudev (especially
  with many devices.)

. Change external_device_info_source="none" behavior:
  remove libudev device info queries for "none", so
  libudev usage will be avoided entirely. "none"
  remains the default setting.

. Change external_device_info_source="udev" behavior:
  information from libudev is added to the native
  device info rather than replacing the native device
  info. This may be useful if there is some gap in
  the lvm native device info.

. Remove sleep/retry loop when attempting libudev
  queries for device info.  udev info will simply
  be skipped if it's not immediately available.
  Because udev info is supplementary to native
  info, it's not essential to get it.
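Taken together, the libudev-avoiding defaults described above correspond to this lvm.conf fragment:

```
devices {
    obtain_device_list_from_udev = 0
    external_device_info_source = "none"
}
```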

filter-usable: remove udev dev size check

For the pv_min_size check, always use dev_get_size()
which is commonly used elsewhere, and don't bother
asking libudev for the device size when
external_device_info_source=udev.


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-15 17:03           ` David Teigland
@ 2021-06-15 18:21             ` Zdenek Kabelac
  2021-06-16 16:18             ` heming.zhao
  1 sibling, 0 replies; 77+ messages in thread
From: Zdenek Kabelac @ 2021-06-15 18:21 UTC (permalink / raw)
  To: LVM general discussion and development, David Teigland, Martin Wilck
  Cc: zkabelac, rogerheflin, prajnoha, Heming Zhao

Dne 15. 06. 21 v 19:03 David Teigland napsal(a):
> On Tue, Jun 08, 2021 at 10:39:37AM -0500, David Teigland wrote:
>> I think it would be an improvement to:
>>
>> . Make obtain_device_list_from_udev only control how we get the device
>>    list. Then we can easily default to 0 and readdir /dev if it's better.
>>
>> . Use both native md/mpath detection *and* udev info when it's readily
>>    available (don't wait for it), instead of limiting ourselves to one
>>    source of info.  If either source indicates an md/mpath component,
>>    then we consider it true.
>>
>> The second point means we are free to change obtain_device_list_from_udev
>> as we wish, without affecting md/mpath detection.  It may also improve
>> md/mpath detection overall.
> 
> Here are the initial patches I'm testing (libmpathvalid not yet added)
> https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-device-info-1
> 
> devices: rework native and udev device info
> 
> . Make the obtain_device_list_from_udev setting
>    affect only the choice of readdir /dev vs libudev
>    for a device name listing.  The setting no longer
>    controls if udev is used for device type checks.
> 


While local testing may suggest that devices (e.g. on laptops) are always
fast, in some cases it is actually more expensive to check the physical
device than to check the contents of the udev DB.

So for some users this may result in a performance regression: the
udev DB lives in a ramdisk, while there are devices whose opening
may take seconds depending on operating status (disk suspend,
disk firmware upgrade....)

(one of lvmetad aspects should have been to avoid waking suspended device)

> . Change obtain_device_list_from_udev default to 0.
>    A list of device names is obtained from readdir /dev
>    by default, which is faster than libudev (especially
>    with many devices.)

So we need at least a backward-compatible setting for those users
for whom this change would cause a performance regression.


Zdenek


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-15 17:03           ` David Teigland
  2021-06-15 18:21             ` Zdenek Kabelac
@ 2021-06-16 16:18             ` heming.zhao
  2021-06-16 16:38               ` David Teigland
  1 sibling, 1 reply; 77+ messages in thread
From: heming.zhao @ 2021-06-16 16:18 UTC (permalink / raw)
  To: David Teigland, linux-lvm; +Cc: rogerheflin, prajnoha, Martin Wilck, zkabelac

On 6/16/21 1:03 AM, David Teigland wrote:
> On Tue, Jun 08, 2021 at 10:39:37AM -0500, David Teigland wrote:
>> I think it would be an improvement to:
>>
>> . Make obtain_device_list_from_udev only control how we get the device
>>    list. Then we can easily default to 0 and readdir /dev if it's better.
>>
>> . Use both native md/mpath detection *and* udev info when it's readily
>>    available (don't wait for it), instead of limiting ourselves to one
>>    source of info.  If either source indicates an md/mpath component,
>>    then we consider it true.
>>
>> The second point means we are free to change obtain_device_list_from_udev
>> as we wish, without affecting md/mpath detection.  It may also improve
>> md/mpath detection overall.
> 
> Here are the initial patches I'm testing (libmpathvalid not yet added)
> https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-device-info-1
> 

I compiled & tested the code.
Unless I missed something, the results show no progress; with
"devices/obtain_device_list_from_udev=0" they even regressed: from 23.3s to 39.8s.

The lvm2 version built from the dev-dct-device-info-1 branch:
```
sle15sp2-base40g:~ # lvm version
   LVM version:     2.03.13(2)-git (2021-05-07)
   Library version: 1.02.179-git (2021-05-07)
   Driver version:  4.40.0
   Configuration:   ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --enable-dmeventd --enable-cmdlib --enable-udev_rules --enable-udev_sync --with-udev-prefix=/usr/ --enable-selinux --enable-pkgconfig --with-usrlibdir=/usr/lib64 --with-usrsbindir=/usr/sbin --with-default-dm-run-dir=/run --with-tmpfilesdir=/usr/lib/tmpfiles.d --with-thin=internal --with-device-gid=6 --with-device-mode=0640 --with-device-uid=0 --with-dmeventd-path=/usr/sbin/dmeventd --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-blkid_wiping --enable-lvmpolld --enable-rea
 ltime --with-cache=internal --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --enable-fsadm --disable-silent-rules --enable-write_install --with-vdo=none
```

Installation with cmd: make install && dracut -f --add "lvm"

top 10 blame services
```
external_device_info_source = "none"
obtain_device_list_from_udev = 1
event_activation = 1
udev_sync = 1

sle15sp2-base40g:~ # systemd-analyze blame | head -n 10
      2min 4.515s lvm2-pvscan@135:720.service
      2min 4.332s lvm2-pvscan@135:704.service
      2min 3.162s lvm2-pvscan@8:768.service
      2min 2.168s lvm2-pvscan@135:672.service
      2min 2.166s lvm2-pvscan@135:688.service
     1min 55.275s lvm2-pvscan@130:688.service
     1min 52.541s lvm2-pvscan@135:656.service
     1min 52.483s lvm2-pvscan@135:640.service
     1min 51.066s lvm2-pvscan@128:688.service
     1min 51.065s lvm2-pvscan@128:704.service

devices/obtain_device_list_from_udev=0
global/event_activation=1
activation/udev_sync=1

sle15sp2-base40g:~ # systemd-analyze blame | head -n 10
          39.845s lvm2-pvscan@133:576.service
          39.830s lvm2-pvscan@133:640.service
          39.829s lvm2-pvscan@133:720.service
          39.827s lvm2-pvscan@132:736.service
          39.825s lvm2-pvscan@132:656.service
          39.823s lvm2-pvscan@132:672.service
          39.821s lvm2-pvscan@132:720.service
          39.820s lvm2-pvscan@132:544.service
          39.819s lvm2-pvscan@132:624.service
          39.808s lvm2-pvscan@132:576.service
```

*** compare with my previous test results (listed below) ***

```
external_device_info_source = "none"
devices/obtain_device_list_from_udev=1
global/event_activation=1
activation/udev_sync=1

      2min 6.736s lvm2-pvscan@70:384.service
      2min 6.628s lvm2-pvscan@70:400.service
      2min 6.554s lvm2-pvscan@69:432.service
      2min 6.518s lvm2-pvscan@69:480.service
      2min 6.478s lvm2-pvscan@69:416.service
      2min 6.277s lvm2-pvscan@69:464.service
      2min 5.791s lvm2-pvscan@69:544.service

devices/obtain_device_list_from_udev=0
global/event_activation=1
activation/udev_sync=1

          34.908s dracut-initqueue.service
          25.440s systemd-udev-settle.service
          23.335s lvm2-pvscan@66:832.service
          23.335s lvm2-pvscan@65:976.service
          23.335s lvm2-pvscan@66:784.service
          23.335s lvm2-pvscan@65:816.service
          23.335s lvm2-pvscan@8:976.service
          23.327s lvm2-pvscan@66:864.service
          23.323s lvm2-pvscan@66:848.service
          23.316s lvm2-pvscan@65:800.service
```

Thanks
Heming


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-16 16:18             ` heming.zhao
@ 2021-06-16 16:38               ` David Teigland
  2021-06-17  3:46                 ` heming.zhao
  0 siblings, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-06-16 16:38 UTC (permalink / raw)
  To: heming.zhao; +Cc: rogerheflin, zkabelac, Martin Wilck, prajnoha, linux-lvm

On Thu, Jun 17, 2021 at 12:18:47AM +0800, heming.zhao@suse.com wrote:
> Unless I missed something, the results show no progress; with
> "devices/obtain_device_list_from_udev=0" they even regressed: from 23.3s to 39.8s.
> 
> The lvm2 version built from the dev-dct-device-info-1 branch:
> ```
> sle15sp2-base40g:~ # lvm version
>   LVM version:     2.03.13(2)-git (2021-05-07)
>   Library version: 1.02.179-git (2021-05-07)
>   Driver version:  4.40.0
>   Configuration:   ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --enable-dmeventd --enable-cmdlib --enable-udev_rules --enable-udev_sync --with-udev-prefix=/usr/ --enable-selinux --enable-pkgconfig --with-usrlibdir=/usr/lib64 --with-usrsbindir=/usr/sbin --with-default-dm-run-dir=/run --with-tmpfilesdir=/usr/lib/tmpfiles.d --with-thin=internal --with-device-gid=6 --with-device-mode=0640 --with-device-uid=0 --with-dmeventd-path=/usr/sbin/dmeventd --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-blkid_wiping --enable-lvmpolld --enable-re
 altime --with-cache=internal --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --enable-fsadm --disable-silent-rules --enable-write_install --with-vdo=none
> ```
> 
> Installation with cmd: make install && dracut -f --add "lvm"
> 
> top 10 blame services
> ```
> external_device_info_source = "none"
> obtain_device_list_from_udev = 1
> event_activation = 1
> udev_sync = 1

Thanks for running that again.  From your previous testing, my conclusion
was that libudev caused the slowdown.  So, the patch is supposed to avoid
all libudev calls by default, and rely only on lvm's native device type
detection.  The default settings to avoid libudev, are:

  obtain_device_list_from_udev = 0
  external_device_info_source = "none"

I'm not sure which of your test results match those settings, but if those
results are not improved, then we should look further to see if the
slowdown is caused by something other than libudev calls.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-16 16:38               ` David Teigland
@ 2021-06-17  3:46                 ` heming.zhao
  2021-06-17 15:27                   ` David Teigland
  0 siblings, 1 reply; 77+ messages in thread
From: heming.zhao @ 2021-06-17  3:46 UTC (permalink / raw)
  To: David Teigland; +Cc: rogerheflin, zkabelac, Martin Wilck, prajnoha, linux-lvm

On 6/17/21 12:38 AM, David Teigland wrote:
> On Thu, Jun 17, 2021 at 12:18:47AM +0800, heming.zhao@suse.com wrote:
>> Unless I missed something, the results show no progress; with
>> "devices/obtain_device_list_from_udev=0" they even regressed: from 23.3s to 39.8s.
>>
>> The lvm2 version built from the dev-dct-device-info-1 branch:
>> ```
>> sle15sp2-base40g:~ # lvm version
>>    LVM version:     2.03.13(2)-git (2021-05-07)
>>    Library version: 1.02.179-git (2021-05-07)
>>    Driver version:  4.40.0
>>    Configuration:   ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --enable-dmeventd --enable-cmdlib --enable-udev_rules --enable-udev_sync --with-udev-prefix=/usr/ --enable-selinux --enable-pkgconfig --with-usrlibdir=/usr/lib64 --with-usrsbindir=/usr/sbin --with-default-dm-run-dir=/run --with-tmpfilesdir=/usr/lib/tmpfiles.d --with-thin=internal --with-device-gid=6 --with-device-mode=0640 --with-device-uid=0 --with-dmeventd-path=/usr/sbin/dmeventd --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-blkid_wiping --enable-lvmpolld --enable-
 realtime --with-cache=internal --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --enable-fsadm --disable-silent-rules --enable-write_install --with-vdo=none
>> ```
>>
>> Installation with cmd: make install && dracut -f --add "lvm"
>>
>> top 10 blame services
>> ```
>> external_device_info_source = "none"
>> obtain_device_list_from_udev = 1
>> event_activation = 1
>> udev_sync = 1
> 
> Thanks for running that again.  From your previous testing, my conclusion
> was that libudev caused the slowdown.  So, the patch is supposed to avoid
> all libudev calls by default, and rely only on lvm's native device type
> detection.  The default settings to avoid libudev, are:
> 
>    obtain_device_list_from_udev = 0
>    external_device_info_source = "none"
> 
> I'm not sure which of your test results match those settings, but if those
> results are not improved, then we should look further to see if the
> slowdown is caused by something other than libudev calls.
> 

The default value of external_device_info_source is "none" and I didn't change it.
So the result below (from my last mail) is the one you asked about; it's worse than before (23.3s vs 39.8s).

```
devices/obtain_device_list_from_udev=0
global/event_activation=1
activation/udev_sync=1

sle15sp2-base40g:~ # systemd-analyze blame | head -n 10
          39.845s lvm2-pvscan@133:576.service
          39.830s lvm2-pvscan@133:640.service
          39.829s lvm2-pvscan@133:720.service
          39.827s lvm2-pvscan@132:736.service
          39.825s lvm2-pvscan@132:656.service
          39.823s lvm2-pvscan@132:672.service
          39.821s lvm2-pvscan@132:720.service
          39.820s lvm2-pvscan@132:544.service
          39.819s lvm2-pvscan@132:624.service
          39.808s lvm2-pvscan@132:576.service
```

The boot timeout problem has two causes:
1> libudev takes too much time
2> thousands of lvm2-pvscan@.service instances run at the same time

The new patch, like the existing code with obtain_device_list_from_udev=0, avoids <1>,
but <2> is still unavoidable. (Direct activation avoids it.)

Thanks,
heming


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-17  3:46                 ` heming.zhao
@ 2021-06-17 15:27                   ` David Teigland
  0 siblings, 0 replies; 77+ messages in thread
From: David Teigland @ 2021-06-17 15:27 UTC (permalink / raw)
  To: heming.zhao; +Cc: rogerheflin, zkabelac, Martin Wilck, prajnoha, linux-lvm

On Thu, Jun 17, 2021 at 11:46:55AM +0800, heming.zhao@suse.com wrote:
> The boot timeout problem has two causes:
> 1> libudev takes too much time
> 2> thousands of lvm2-pvscan@.service instances run at the same time
> 
> The new patch, like the existing code with obtain_device_list_from_udev=0, avoids <1>,
> but <2> is still unavoidable. (Direct activation avoids it.)

Thanks, I'll set up a similar test; hopefully we can avoid such extreme
delays with the current activation mechanism.  I expect we'll also look at
combining a fixed lvm-activation service followed by uevent-driven
activations, which should be more efficient.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-06  6:15 [linux-lvm] Discussion: performance issue on event activation mode heming.zhao
                   ` (2 preceding siblings ...)
  2021-06-07 16:40 ` David Teigland
@ 2021-07-02 21:09 ` David Teigland
  2021-07-02 21:22   ` Martin Wilck
  2021-07-02 21:31   ` Tom Yan
  3 siblings, 2 replies; 77+ messages in thread
From: David Teigland @ 2021-07-02 21:09 UTC (permalink / raw)
  To: heming.zhao
  Cc: Martin Wilck, LVM general discussion and development, Zdenek Kabelac

[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]

On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
> dev_cache_scan //order: O(n^2)
>  + _insert_dirs //O(n)
>  | if obtain_device_list_from_udev() true
>  |   _insert_udev_dir //O(n)
>  |
>  + dev_cache_index_devs //O(n)

I've been running some experiments and trying some patches to improve
this.  By setting obtain_device_list_from_udev=0, and using the attached
patch to disable dev_cache_index_devs, the pvscan is much better.

systemctl status lvm2-pvscan appears to show that the pvscan command
itself runs for only 2-4 seconds, while the service as a whole takes
around 15 seconds.  See the 16 sec gap below from the end of pvscan
to the systemd Started message.  If that's accurate, the remaining delay
would lie outside lvm.

Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event activation on device 253:1710...
Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9 is complete.
Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now active
Jul 02 15:28:16 localhost.localdomain systemd[1]: Started LVM event activation on device 253:1710.
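The 16-second gap can be confirmed directly from the two timestamps in the excerpt:

```shell
# seconds between the last pvscan line (15:28:00) and the
# "Started" line (15:28:16) in the journal excerpt above
gap=$(echo "15:28:00 15:28:16" | awk '{
    split($1, a, ":"); split($2, b, ":")
    print (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
}')
echo "$gap"    # 16
```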


[-- Attachment #2: 0001-pvscan-skip-indexing-devices-used-by-LVs.patch --]
[-- Type: text/plain, Size: 5318 bytes --]

>From f862af22f499e2828cf14ae52b9f3382816b66c5 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Thu, 1 Jul 2021 17:25:43 -0500
Subject: [PATCH] pvscan: skip indexing devices used by LVs

dev_cache_index_devs() is taking a large amount of time
when there are many PVs.  The index keeps track of
devices that are currently in use by active LVs.  This
info is used to print warnings for users in some limited
cases.

The checks/warnings that are enabled by the index are not
needed by pvscan --cache, so disable it in this case.

This may be expanded to other cases in future commits.
dev_cache_index_devs should also be improved in another
commit to avoid the extreme delays with many devices.
---
 lib/commands/toolcontext.c | 1 +
 lib/commands/toolcontext.h | 1 +
 lib/device/dev-cache.c     | 9 +++++----
 lib/device/dev-cache.h     | 2 +-
 lib/metadata/metadata.c    | 3 ++-
 tools/lvmdevices.c         | 2 +-
 tools/pvscan.c             | 2 ++
 7 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/lib/commands/toolcontext.c b/lib/commands/toolcontext.c
index e2be89d0f480..b295a20efe52 100644
--- a/lib/commands/toolcontext.c
+++ b/lib/commands/toolcontext.c
@@ -1608,6 +1608,7 @@ struct cmd_context *create_toolcontext(unsigned is_clvmd,
 	cmd->handles_missing_pvs = 0;
 	cmd->handles_unknown_segments = 0;
 	cmd->hosttags = 0;
+	cmd->check_devs_used = 1;
 	dm_list_init(&cmd->arg_value_groups);
 	dm_list_init(&cmd->formats);
 	dm_list_init(&cmd->segtypes);
diff --git a/lib/commands/toolcontext.h b/lib/commands/toolcontext.h
index a47b7d760317..34808ce46b41 100644
--- a/lib/commands/toolcontext.h
+++ b/lib/commands/toolcontext.h
@@ -192,6 +192,7 @@ struct cmd_context {
 	unsigned filter_nodata_only:1;          /* only use filters that do not require data from the dev */
 	unsigned run_by_dmeventd:1;		/* command is being run by dmeventd */
 	unsigned sysinit:1;			/* --sysinit is used */
+	unsigned check_devs_used:1;		/* check devs used by LVs */
 
 	/*
 	 * Devices and filtering.
diff --git a/lib/device/dev-cache.c b/lib/device/dev-cache.c
index d7fa93fd6fdb..bb0d0f211d36 100644
--- a/lib/device/dev-cache.c
+++ b/lib/device/dev-cache.c
@@ -1139,7 +1139,7 @@ static int _insert(const char *path, const struct stat *info,
 	return 1;
 }
 
-void dev_cache_scan(void)
+void dev_cache_scan(struct cmd_context *cmd)
 {
 	log_debug_devs("Creating list of system devices.");
 
@@ -1147,7 +1147,8 @@ void dev_cache_scan(void)
 
 	_insert_dirs(&_cache.dirs);
 
-	(void) dev_cache_index_devs();
+	if (cmd->check_devs_used)
+		(void) dev_cache_index_devs();
 }
 
 int dev_cache_has_scanned(void)
@@ -1583,7 +1584,7 @@ struct device *dev_cache_get_by_devt(struct cmd_context *cmd, dev_t dev, struct
 
 		log_debug_devs("Device num not found in dev_cache repeat dev_cache_scan for %d:%d",
 				(int)MAJOR(dev), (int)MINOR(dev));
-		dev_cache_scan();
+		dev_cache_scan(cmd);
 		d = (struct device *) btree_lookup(_cache.devices, (uint32_t) dev);
 
 		if (!d)
@@ -1953,7 +1954,7 @@ int setup_devices(struct cmd_context *cmd)
 	 * This will not open or read any devices, but may look at sysfs properties.
 	 * This list of devs comes from looking /dev entries, or from asking libudev.
 	 */
-	dev_cache_scan();
+	dev_cache_scan(cmd);
 
 	/*
 	 * Match entries from cmd->use_devices with device structs in dev-cache.
diff --git a/lib/device/dev-cache.h b/lib/device/dev-cache.h
index 9b7e39d33cf1..c3f5eddda802 100644
--- a/lib/device/dev-cache.h
+++ b/lib/device/dev-cache.h
@@ -48,7 +48,7 @@ int dev_cache_exit(void);
  */
 int dev_cache_check_for_open_devices(void);
 
-void dev_cache_scan(void);
+void dev_cache_scan(struct cmd_context *cmd);
 int dev_cache_has_scanned(void);
 
 int dev_cache_add_dir(const char *path);
diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index d5b28a58f200..0cbf678690b7 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -5132,7 +5132,8 @@ struct volume_group *vg_read(struct cmd_context *cmd, const char *vg_name, const
 	if (!check_pv_dev_sizes(vg))
 		log_warn("WARNING: One or more devices used as PVs in VG %s have changed sizes.", vg->name);
 
-	_check_devs_used_correspond_with_vg(vg);
+	if (cmd->check_devs_used)
+		_check_devs_used_correspond_with_vg(vg);
 
 	if (!_access_vg_lock_type(cmd, vg, lockd_state, &failure)) {
 		/* Either FAILED_LOCK_TYPE or FAILED_LOCK_MODE were set. */
diff --git a/tools/lvmdevices.c b/tools/lvmdevices.c
index 3448bdd14722..5117277a091d 100644
--- a/tools/lvmdevices.c
+++ b/tools/lvmdevices.c
@@ -171,7 +171,7 @@ int lvmdevices(struct cmd_context *cmd, int argc, char **argv)
 		log_error("Failed to read the devices file.");
 		return ECMD_FAILED;
 	}
-	dev_cache_scan();
+	dev_cache_scan(cmd);
 	device_ids_match(cmd);
 
 	if (arg_is_set(cmd, check_ARG) || arg_is_set(cmd, update_ARG)) {
diff --git a/tools/pvscan.c b/tools/pvscan.c
index f8d27372b4ae..464501ad54c4 100644
--- a/tools/pvscan.c
+++ b/tools/pvscan.c
@@ -1627,6 +1627,8 @@ int pvscan_cache_cmd(struct cmd_context *cmd, int argc, char **argv)
 
 	dm_list_init(&complete_vgnames);
 
+	cmd->check_devs_used = 0;
+
 	if (do_activate &&
 	    !find_config_tree_bool(cmd, global_event_activation_CFG, NULL)) {
 		log_verbose("Ignoring pvscan --cache -aay because event_activation is disabled.");
-- 
2.10.1



_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-07-02 21:09 ` David Teigland
@ 2021-07-02 21:22   ` Martin Wilck
  2021-07-02 22:02     ` David Teigland
  2021-07-02 21:31   ` Tom Yan
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Wilck @ 2021-07-02 21:22 UTC (permalink / raw)
  To: Heming Zhao, teigland; +Cc: linux-lvm, zkabelac

On Fr, 2021-07-02 at 16:09 -0500, David Teigland wrote:
> On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
> > dev_cache_scan //order: O(n^2)
> >  + _insert_dirs //O(n)
> >  | if obtain_device_list_from_udev() true
> >  |   _insert_udev_dir //O(n)
> >  |
> >  + dev_cache_index_devs //O(n)
> 
> I've been running some experiments and trying some patches to improve
> this.  By setting obtain_device_list_from_udev=0, and using the
> attached
> patch to disable dev_cache_index_devs, the pvscan is much better.
> 
> systemctl status lvm2-pvscan appears to show that the pvscan command
> itself runs for only 2-4 seconds, while the service as a whole takes
> around 15 seconds.  See the 16 sec gap below from the end of pvscan
> to the systemd Started message.  If that's accurate, the remaining
> delay
> would lie outside lvm.
> 
> Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event
> activation on device 253:1710...
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV
> /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9
> is complete.
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG
> 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical
> volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now
> active

Printing this message is really the last thing that pvscan does?

> Jul 02 15:28:16 localhost.localdomain systemd[1]: Started LVM event
> activation on device 253:1710.

If systemd is very busy, it might take some time until it sees the
completion of the unit. We may need to involve systemd experts. Anyway,
what counts is the behavior if we have lots of parallel pvscan
processes.

Thanks,
Martin

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-07-02 21:09 ` David Teigland
  2021-07-02 21:22   ` Martin Wilck
@ 2021-07-02 21:31   ` Tom Yan
  1 sibling, 0 replies; 77+ messages in thread
From: Tom Yan @ 2021-07-02 21:31 UTC (permalink / raw)
  To: LVM general discussion and development
  Cc: Zdenek Kabelac, Martin Wilck, heming.zhao

On Sat, 3 Jul 2021 at 05:17, David Teigland <teigland@redhat.com> wrote:
>
> On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
> > dev_cache_scan //order: O(n^2)
> >  + _insert_dirs //O(n)
> >  | if obtain_device_list_from_udev() true
> >  |   _insert_udev_dir //O(n)
> >  |
> >  + dev_cache_index_devs //O(n)
>
> I've been running some experiments and trying some patches to improve
> this.  By setting obtain_device_list_from_udev=0, and using the attached
> patch to disable dev_cache_index_devs, the pvscan is much better.
>
> systemctl status lvm2-pvscan appears to show that the pvscan command
> itself runs for only 2-4 seconds, while the service as a whole takes
> around 15 seconds.  See the 16 sec gap below from the end of pvscan
> to the systemd Started message.  If that's accurate, the remaining delay
> would lie outside lvm.

Not necessarily. It could be pvscan stalling for 16s after printing
the "now active" message for some reason before exiting (with 0).

>
> Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event activation on device 253:1710...
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9 is complete.
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now active
> Jul 02 15:28:16 localhost.localdomain systemd[1]: Started LVM event activation on device 253:1710.

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-07-02 21:22   ` Martin Wilck
@ 2021-07-02 22:02     ` David Teigland
  2021-07-03 11:49       ` heming.zhao
  0 siblings, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-07-02 22:02 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, linux-lvm, Heming Zhao

On Fri, Jul 02, 2021 at 09:22:03PM +0000, Martin Wilck wrote:
> On Fr, 2021-07-02 at 16:09 -0500, David Teigland wrote:
> > On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
> > > dev_cache_scan //order: O(n^2)
> > >  + _insert_dirs //O(n)
> > >  | if obtain_device_list_from_udev() true
> > >  |   _insert_udev_dir //O(n)
> > >  |
> > >  + dev_cache_index_devs //O(n)
> > 
> > I've been running some experiments and trying some patches to improve
> > this.  By setting obtain_device_list_from_udev=0, and using the
> > attached
> > patch to disable dev_cache_index_devs, the pvscan is much better.
> > 
> > systemctl status lvm2-pvscan appears to show that the pvscan command
> > itself runs for only 2-4 seconds, while the service as a whole takes
> > around 15 seconds.  See the 16 sec gap below from the end of pvscan
> > to the systemd Started message.  If that's accurate, the remaining
> > delay
> > would lie outside lvm.
> > 
> > Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event
> > activation on device 253:1710...
> > Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV
> > /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9
> > is complete.
> > Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG
> > 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
> > Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical
> > volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now
> > active
> 
> Printing this message is really the last thing that pvscan does?

I've not seen anything measurable after that message.  However, digging
through the command init/exit paths I did find libudev setup/destroy calls
that can also be skipped when the command is not accessing libudev info.
A quick check seemed to show some further improvement from dropping those.
(That will be part of the larger patch isolating libudev usage.)

I'm still seeing significant delay between the apparent command exit and
the systemd "Started" message, but this will require a little more work to
prove.

Dave

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-07-02 22:02     ` David Teigland
@ 2021-07-03 11:49       ` heming.zhao
  2021-07-08 10:10         ` Tom Yan
  0 siblings, 1 reply; 77+ messages in thread
From: heming.zhao @ 2021-07-03 11:49 UTC (permalink / raw)
  To: David Teigland, Martin Wilck; +Cc: linux-lvm, zkabelac

On 7/3/21 6:02 AM, David Teigland wrote:
> On Fri, Jul 02, 2021 at 09:22:03PM +0000, Martin Wilck wrote:
>> On Fr, 2021-07-02 at 16:09 -0500, David Teigland wrote:
>>> On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
>>>> dev_cache_scan //order: O(n^2)
>>>>   + _insert_dirs //O(n)
>>>>   | if obtain_device_list_from_udev() true
>>>>   |   _insert_udev_dir //O(n)
>>>>   |
>>>>   + dev_cache_index_devs //O(n)
>>>
>>> I've been running some experiments and trying some patches to improve
>>> this.  By setting obtain_device_list_from_udev=0, and using the
>>> attached
>>> patch to disable dev_cache_index_devs, the pvscan is much better.
>>>
>>> systemctl status lvm2-pvscan appears to show that the pvscan command
>>> itself runs for only 2-4 seconds, while the service as a whole takes
>>> around 15 seconds.  See the 16 sec gap below from the end of pvscan
>>> to the systemd Started message.  If that's accurate, the remaining
>>> delay
>>> would lie outside lvm.
>>>
>>> Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event
>>> activation on device 253:1710...
>>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV
>>> /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9
>>> is complete.
>>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG
>>> 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
>>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical
>>> volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now
>>> active
>>
>> Printing this message is really the last thing that pvscan does?
> 
> I've not seen anything measurable after that message.  However, digging
> through the command init/exit paths I did find libudev setup/destroy calls
> that can also be skipped when the command is not accessing libudev info.
> A quick check seemed to show some further improvement from dropping those.
> (That will be part of the larger patch isolating libudev usage.)
> 
> I'm still seeing significant delay between the apparent command exit and
> the systemd "Started" message, but this will require a little more work to
> prove.
> 

I applied the patch and got a very similar result.


```with patch
# systemd-analyze blame
          27.679s lvm2-pvscan@134:896.service
          27.664s lvm2-pvscan@134:800.service
          27.648s lvm2-pvscan@134:960.service
          27.645s lvm2-pvscan@132:816.service
          27.589s lvm2-pvscan@134:816.service
          27.582s lvm2-pvscan@133:992.service
          ... ...

# systemctl status lvm2-pvscan@134:896.service
... ...
Jul 03 19:43:02 sle15sp2-base40g systemd[1]: Starting LVM event activation on device 134:896...
Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   pvscan[24817] PV /dev/sdalm online, VG vg_sdalm is complet>
Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   pvscan[24817] VG vg_sdalm run autoactivation.
Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   1 logical volume(s) in volume group "vg_sdalm" now active
Jul 03 19:43:30 sle15sp2-base40g systemd[1]: Started LVM event activation on device 134:896.
```

The 27.679s comes from 19:43:30 minus 19:43:02.

And 27s is 10s quicker than lvm without the patch (by today's test result):
```without patch
# systemd-analyze blame
          37.650s lvm2-pvscan@133:992.service
          37.650s lvm2-pvscan@133:1008.service
          37.649s lvm2-pvscan@133:896.service
          37.649s lvm2-pvscan@134:960.service
          37.612s lvm2-pvscan@133:880.service
          37.612s lvm2-pvscan@133:864.service
          ... ...

# systemctl status lvm2-pvscan@133:992.service
... ...
Jul 03 19:31:28 sle15sp2-base40g systemd[1]: Starting LVM event activation on device 133:992...
Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   pvscan[24243] PV /dev/sdalc online, VG vg_sdalc is complet>
Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   pvscan[24243] VG vg_sdalc run autoactivation.
Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   1 logical volume(s) in volume group "vg_sdalc" now active
Jul 03 19:32:05 sle15sp2-base40g systemd[1]: Started LVM event activation on device 133:992.
```

I added logging in the lvm2 code; the timestamp of the log line "1 logical
volume(s) in volume group "vg_sdalm" now active" is the real time pvscan
finished. Martin's comment in the previous mail may be right: systemd may
be busy and delayed in detecting the pvscan service completion.
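
The gap computation above can be scripted. A minimal sketch, assuming GNU
`date` (the journal timestamps lack a year, so one is supplied explicitly):

```shell
# gap_seconds: print the difference in seconds between two timestamps
# that GNU `date -d` can parse, e.g. pvscan's last journal line and
# systemd's "Started" line.
gap_seconds() {
    start=$(date -d "$1" +%s)
    end=$(date -d "$2" +%s)
    echo $((end - start))
}

# From the log above: pvscan logged "now active" at 19:43:03 and the
# unit reached "Started" at 19:43:30.
gap_seconds "2021-07-03 19:43:03" "2021-07-03 19:43:30"   # prints 27
```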

Thanks,
heming

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-07-03 11:49       ` heming.zhao
@ 2021-07-08 10:10         ` Tom Yan
  0 siblings, 0 replies; 77+ messages in thread
From: Tom Yan @ 2021-07-08 10:10 UTC (permalink / raw)
  To: LVM general discussion and development
  Cc: zkabelac, David Teigland, Martin Wilck

I wonder if a shell wrapper that records when pvscan actually exits, to see
whether that is much earlier than the unit finishing, would be a more
accurate approach.
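
A minimal sketch of such a wrapper, assuming plain POSIX sh; the log path
and the wrapped command are placeholders, not the real unit's invocation:

```shell
# timed_run LOG CMD [ARGS...]: run CMD, then append its exit status and
# exit time (epoch seconds) to LOG. Pointing the unit's ExecStart at a
# wrapper like this would show exactly when pvscan itself exited,
# independent of when systemd notices the unit completing.
timed_run() {
    log=$1; shift
    "$@"
    rc=$?
    printf '%s exited rc=%d at %s\n' "$1" "$rc" "$(date +%s)" >>"$log"
    return $rc
}
```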

On Sat, 3 Jul 2021 at 19:50, heming.zhao@suse.com <heming.zhao@suse.com> wrote:
>
> On 7/3/21 6:02 AM, David Teigland wrote:
> > On Fri, Jul 02, 2021 at 09:22:03PM +0000, Martin Wilck wrote:
> >> On Fr, 2021-07-02 at 16:09 -0500, David Teigland wrote:
> >>> On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao@suse.com wrote:
> >>>> dev_cache_scan //order: O(n^2)
> >>>>   + _insert_dirs //O(n)
> >>>>   | if obtain_device_list_from_udev() true
> >>>>   |   _insert_udev_dir //O(n)
> >>>>   |
> >>>>   + dev_cache_index_devs //O(n)
> >>>
> >>> I've been running some experiments and trying some patches to improve
> >>> this.  By setting obtain_device_list_from_udev=0, and using the
> >>> attached
> >>> patch to disable dev_cache_index_devs, the pvscan is much better.
> >>>
> >>> systemctl status lvm2-pvscan appears to show that the pvscan command
> >>> itself runs for only 2-4 seconds, while the service as a whole takes
> >>> around 15 seconds.  See the 16 sec gap below from the end of pvscan
> >>> to the systemd Started message.  If that's accurate, the remaining
> >>> delay
> >>> would lie outside lvm.
> >>>
> >>> Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event
> >>> activation on device 253:1710...
> >>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV
> >>> /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9
> >>> is complete.
> >>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG
> >>> 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
> >>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical
> >>> volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now
> >>> active
> >>
> >> Printing this message is really the last thing that pvscan does?
> >
> > I've not seen anything measurable after that message.  However, digging
> > through the command init/exit paths I did find libudev setup/destroy calls
> > that can also be skipped when the command is not accessing libudev info.
> > A quick check seemed to show some further improvement from dropping those.
> > (That will be part of the larger patch isolating libudev usage.)
> >
> > I'm still seeing significant delay between the apparent command exit and
> > the systemd "Started" message, but this will require a little more work to
> > prove.
> >
>
> I applied the patch and got a very similar result.
>
>
> ```with patch
> # systemd-analyze blame
>           27.679s lvm2-pvscan@134:896.service
>           27.664s lvm2-pvscan@134:800.service
>           27.648s lvm2-pvscan@134:960.service
>           27.645s lvm2-pvscan@132:816.service
>           27.589s lvm2-pvscan@134:816.service
>           27.582s lvm2-pvscan@133:992.service
>           ... ...
>
> # systemctl status lvm2-pvscan@134:896.service
> ... ...
> Jul 03 19:43:02 sle15sp2-base40g systemd[1]: Starting LVM event activation on device 134:896...
> Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   pvscan[24817] PV /dev/sdalm online, VG vg_sdalm is complet>
> Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   pvscan[24817] VG vg_sdalm run autoactivation.
> Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   1 logical volume(s) in volume group "vg_sdalm" now active
> Jul 03 19:43:30 sle15sp2-base40g systemd[1]: Started LVM event activation on device 134:896.
> ```
>
> The 27.679s comes from 19:43:30 minus 19:43:02.
>
> And 27s is 10s quicker than lvm without the patch (by today's test result):
> ```without patch
> # systemd-analyze blame
>           37.650s lvm2-pvscan@133:992.service
>           37.650s lvm2-pvscan@133:1008.service
>           37.649s lvm2-pvscan@133:896.service
>           37.649s lvm2-pvscan@134:960.service
>           37.612s lvm2-pvscan@133:880.service
>           37.612s lvm2-pvscan@133:864.service
>           ... ...
>
> # systemctl status lvm2-pvscan@133:992.service
> ... ...
> Jul 03 19:31:28 sle15sp2-base40g systemd[1]: Starting LVM event activation on device 133:992...
> Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   pvscan[24243] PV /dev/sdalc online, VG vg_sdalc is complet>
> Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   pvscan[24243] VG vg_sdalc run autoactivation.
> Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   1 logical volume(s) in volume group "vg_sdalc" now active
> Jul 03 19:32:05 sle15sp2-base40g systemd[1]: Started LVM event activation on device 133:992.
> ```
>
> I added logging in the lvm2 code; the timestamp of the log line "1 logical
> volume(s) in volume group "vg_sdalm" now active" is the real time pvscan
> finished. Martin's comment in the previous mail may be right: systemd may
> be busy and delayed in detecting the pvscan service completion.
>
> Thanks,
> heming

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-06-08 13:23       ` Martin Wilck
  2021-06-08 13:41         ` Peter Rajnoha
@ 2021-09-09 19:44         ` David Teigland
  2021-09-10 17:38           ` Martin Wilck
  2021-09-27 10:00           ` Peter Rajnoha
  1 sibling, 2 replies; 77+ messages in thread
From: David Teigland @ 2021-09-09 19:44 UTC (permalink / raw)
  To: linux-lvm; +Cc: zkabelac, bmarzins, prajnoha, heming.zhao, martin.wilck

On Tue, Jun 08, 2021 at 01:23:33PM +0000, Martin Wilck wrote:
> On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> > On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > > 
> > > If there are say 1000 PVs already present on the system, there
> > > could be
> > > real savings in having one lvm command process all 1000, and then
> > > switch
> > > over to processing uevents for any further devices afterward.  The
> > > switch
> > > over would be delicate because of the obvious races involved with
> > > new devs
> > > appearing, but probably feasible.
> > 
> > Maybe to avoid the race, we could possibly write the proposed
> > "/run/lvm2/boot-finished" right before we initiate scanning in
> > "vgchange
> > -aay" that is a part of the lvm2-activation-net.service (the last
> > service to do the direct activation).
> > 
> > A few event-based pvscans could fire during the window between
> > "scan initiated phase" in lvm2-activation-net.service's
> > "ExecStart=vgchange -aay..."
> > and the originally proposed "ExecStartPost=/bin/touch /run/lvm2/boot-
> > finished",
> > but I think still better than missing important uevents completely in
> > this window.
> 
> That sounds reasonable. I was thinking along similar lines. Note that
> in the case where we had problems lately, all actual activation (and
> slowness) happened in lvm2-activation-early.service.

I've implemented a solution like this and would like any thoughts,
improvements, or testing to verify it can help:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-1

I've taken some direction from the lvm activation generator, but there are
details of that I'm not too familiar with, so I may be missing something
(in particular it has three activation points but I'm showing two below.)
This new method would probably let us drop the activation-generator, since
we could easily configure an equivalent using this new method.

Here's how it works:

uevents for PVs run pvscan with the new option --eventactivation check.
This makes pvscan check if the /run/lvm/event-activation-on file exists.
If not, pvscan does nothing.

lvm-activate-vgs-main.service
. always runs (not generated)
. does not wait for other virtual block device systems to start
. runs vgchange -aay to activate any VGs already present

lvm-activate-vgs-last.service
. always runs (not generated)
. runs after other systems, like multipathd, have started (we want it
  to find as many VGs to activate as possible)
. runs vgchange -aay --eventactivation enable
. the --eventactivation enable creates /run/lvm/event-activation-on,
  which enables the traditional pvscan activations from uevents.
. this vgchange also creates pv online files for existing PVs.
  (Future pvscans will need the online files to know when VGs are
  completed, i.e. for VGs that are partially complete at the point
  of switching to event-based activation.)

uevents for PVs continue to run pvscan with the new option
--eventactivation check, but the check now sees the event-activation-on
temp file, so they will do activation as they have before.
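
The gate and the switch-over can be sketched in shell. The flag path is
taken from the description above; the real logic lives in the
pvscan/vgchange C code, so this is only an illustration:

```shell
# Sketch of the event-activation switch-over. RUNDIR defaults to
# /run/lvm; override it for testing.
RUNDIR="${RUNDIR:-/run/lvm}"
flag="$RUNDIR/event-activation-on"

# pvscan --eventactivation check: do nothing until the flag exists.
event_activation_enabled() {
    [ -e "$flag" ]
}

# vgchange -aay --eventactivation enable: create the flag *before*
# activating anything, so no uevent falls into a gap. Both paths may
# then try to activate the same VG; the real code resolves that with
# /run/lvm/vgs_online.
enable_event_activation() {
    : >"$flag"
}
```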

Notes:

- To avoid missing VGs during the transition to event-based, the vgchange
in lvm-activate-vgs-last will create event-activation-on before doing
anything else.  This means for a period of time both vgchange and pvscan
may attempt to activate the same VG.  These commits use the existing
mechanism to resolve this (the --vgonline option and /run/lvm/vgs_online).

- We could use the new lvm-activate-* services to replace the activation
generator when lvm.conf event_activation=0.  This would be done by simply
not creating the event-activation-on file when event_activation=0.

- To do the reverse, and use only event based activation without any
lvm-activate-vgs services, a new lvm.conf setting could be used, e.g.
event_activation_switch=0 and disabling lvm-activate-vgs services.

Dave

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-09 19:44         ` David Teigland
@ 2021-09-10 17:38           ` Martin Wilck
  2021-09-12 16:51             ` heming.zhao
  2021-09-27 10:00           ` Peter Rajnoha
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Wilck @ 2021-09-10 17:38 UTC (permalink / raw)
  To: Heming Zhao, teigland, linux-lvm; +Cc: bmarzins, prajnoha, zkabelac

On Thu, 2021-09-09 at 14:44 -0500, David Teigland wrote:
> On Tue, Jun 08, 2021 at 01:23:33PM +0000, Martin Wilck wrote:
> > On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> > > On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > > > 
> > > > If there are say 1000 PVs already present on the system, there
> > > > could be
> > > > real savings in having one lvm command process all 1000, and
> > > > then
> > > > switch
> > > > over to processing uevents for any further devices afterward. 
> > > > The
> > > > switch
> > > > over would be delicate because of the obvious races involved
> > > > with
> > > > new devs
> > > > appearing, but probably feasible.
> > > 
> > > Maybe to avoid the race, we could possibly write the proposed
> > > "/run/lvm2/boot-finished" right before we initiate scanning in
> > > "vgchange
> > > -aay" that is a part of the lvm2-activation-net.service (the last
> > > service to do the direct activation).
> > > 
> > > A few event-based pvscans could fire during the window between
> > > "scan initiated phase" in lvm2-activation-net.service's
> > > "ExecStart=vgchange -aay..."
> > > and the originally proposed "ExecStartPost=/bin/touch
> > > /run/lvm2/boot-
> > > finished",
> > > but I think still better than missing important uevents
> > > completely in
> > > this window.
> > 
> > That sounds reasonable. I was thinking along similar lines. Note
> > that
> > in the case where we had problems lately, all actual activation
> > (and
> > slowness) happened in lvm2-activation-early.service.
> 
> I've implemented a solution like this and would like any thoughts,
> improvements, or testing to verify it can help:
> https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-1
> 
> I've taken some direction from the lvm activation generator, but
> there are
> details of that I'm not too familiar with, so I may be missing
> something
> (in particular it has three activation points but I'm showing two
> below.)
> This new method would probably let us drop the activation-generator,
> since
> we could easily configure an equivalent using this new method.
> 
> Here's how it works:
> 
> uevents for PVs run pvscan with the new option --eventactivation
> check.
> This makes pvscan check if the /run/lvm/event-activation-on file
> exists.
> If not, pvscan does nothing.
> 
> lvm-activate-vgs-main.service
> . always runs (not generated)
> . does not wait for other virtual block device systems to start
> . runs vgchange -aay to activate any VGs already present
> 
> lvm-activate-vgs-last.service
> . always runs (not generated)
> . runs after other systems, like multipathd, have started (we want it
>   to find as many VGs to activate as possible)
> . runs vgchange -aay --eventactivation enable
> . the --eventactivation enable creates /run/lvm/event-activation-on,
>   which enables the traditional pvscan activations from uevents.
> . this vgchange also creates pv online files for existing PVs.
>   (Future pvscans will need the online files to know when VGs are
>   completed, i.e. for VGs that are partially complete at the point
>   of switching to event-based activation.)
> 
> uevents for PVs continue to run pvscan with the new option
> --eventactivation check, but the check now sees the event-activation-
> on
> temp file, so they will do activation as they have before.
> 
> Notes:
> 
> - To avoid missing VGs during the transition to event-based, the
> vgchange
> in lvm-activate-vgs-last will create event-activation-on before doing
> anything else.  This means for a period of time both vgchange and
> pvscan
> may attempt to activate the same VG.  These commits use the existing
> mechanism to resolve this (the --vgonline option and
> /run/lvm/vgs_online).
> 
> - We could use the new lvm-activate-* services to replace the
> activation
> generator when lvm.conf event_activation=0.  This would be done by
> simply
> not creating the event-activation-on file when event_activation=0.
> 
> - To do the reverse, and use only event based activation without any
> lvm-activate-vgs services, a new lvm.conf setting could be used, e.g.
> event_activation_switch=0 and disabling lvm-activate-vgs services.

This last idea sounds awkward to me. But the rest is very nice. 
Heming, do you agree we should give it a try?

Thanks,
Martin

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-10 17:38           ` Martin Wilck
@ 2021-09-12 16:51             ` heming.zhao
  0 siblings, 0 replies; 77+ messages in thread
From: heming.zhao @ 2021-09-12 16:51 UTC (permalink / raw)
  To: Martin Wilck, teigland, linux-lvm; +Cc: bmarzins, prajnoha, zkabelac

On 9/11/21 1:38 AM, Martin Wilck wrote:
> On Thu, 2021-09-09 at 14:44 -0500, David Teigland wrote:
>> On Tue, Jun 08, 2021 at 01:23:33PM +0000, Martin Wilck wrote:
>>> On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
>>>> On Mon 07 Jun 2021 16:48, David Teigland wrote:
>>>>>
>>>>> If there are say 1000 PVs already present on the system, there
>>>>> could be
>>>>> real savings in having one lvm command process all 1000, and
>>>>> then
>>>>> switch
>>>>> over to processing uevents for any further devices afterward.
>>>>> The
>>>>> switch
>>>>> over would be delicate because of the obvious races involved
>>>>> with
>>>>> new devs
>>>>> appearing, but probably feasible.
>>>>
>>>> Maybe to avoid the race, we could possibly write the proposed
>>>> "/run/lvm2/boot-finished" right before we initiate scanning in
>>>> "vgchange
>>>> -aay" that is a part of the lvm2-activation-net.service (the last
>>>> service to do the direct activation).
>>>>
>>>> A few event-based pvscans could fire during the window between
>>>> "scan initiated phase" in lvm2-activation-net.service's
>>>> "ExecStart=vgchange -aay..."
>>>> and the originally proposed "ExecStartPost=/bin/touch
>>>> /run/lvm2/boot-
>>>> finished",
>>>> but I think still better than missing important uevents
>>>> completely in
>>>> this window.
>>>
>>> That sounds reasonable. I was thinking along similar lines. Note
>>> that
>>> in the case where we had problems lately, all actual activation
>>> (and
>>> slowness) happened in lvm2-activation-early.service.
>>
>> I've implemented a solution like this and would like any thoughts,
>> improvements, or testing to verify it can help:
>> https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-1
>>
>> I've taken some direction from the lvm activation generator, but
>> there are
>> details of that I'm not too familiar with, so I may be missing
>> something
>> (in particular it has three activation points but I'm showing two
>> below.)
>> This new method would probably let us drop the activation-generator,
>> since
>> we could easily configure an equivalent using this new method.
>>
>> Here's how it works:
>>
>> uevents for PVs run pvscan with the new option --eventactivation
>> check.
>> This makes pvscan check if the /run/lvm/event-activation-on file
>> exists.
>> If not, pvscan does nothing.
>>
>> lvm-activate-vgs-main.service
>> . always runs (not generated)
>> . does not wait for other virtual block device systems to start
>> . runs vgchange -aay to activate any VGs already present
>>
>> lvm-activate-vgs-last.service
>> . always runs (not generated)
>> . runs after other systems, like multipathd, have started (we want it
>>    to find as many VGs to activate as possible)
>> . runs vgchange -aay --eventactivation enable
>> . the --eventactivation enable creates /run/lvm/event-activation-on,
>>    which enables the traditional pvscan activations from uevents.
>> . this vgchange also creates pv online files for existing PVs.
>>    (Future pvscans will need the online files to know when VGs are
>>    completed, i.e. for VGs that are partially complete at the point
>>    of switching to event based activation.)
>>
>> uevents for PVs continue to run pvscan with the new option
>> --eventactivation check, but the check now sees the event-activation-on
>> temp file, so they will do activation as they have before.
>>
>> Notes:
>>
>> - To avoid missing VGs during the transition to event-based, the vgchange
>> in lvm-activate-vgs-last will create event-activation-on before doing
>> anything else.  This means for a period of time both vgchange and pvscan
>> may attempt to activate the same VG.  These commits use the existing
>> mechanism to resolve this (the --vgonline option and /run/lvm/vgs_online).
>>
>> - We could use the new lvm-activate-* services to replace the activation
>> generator when lvm.conf event_activation=0.  This would be done by simply
>> not creating the event-activation-on file when event_activation=0.
>>
>> - To do the reverse, and use only event based activation without any
>> lvm-activate-vgs services, a new lvm.conf setting could be used, e.g.
>> event_activation_switch=0 and disabling lvm-activate-vgs services.
> 
> This last idea sounds awkward to me. But the rest is very nice.
> Heming, do you agree we should give it a try?
> 

The last note is about compatibility. We can't imagine, let alone test, all
the use cases, so adding a switch is a good idea.
But believe me: apart from lvm2 developers, nobody understands the event/direct
activation story. A new config item (event_activation_switch) whose meaning
depends on another item (event_activation) will confuse users.

We should help users make the best-performing choice. So we could reuse the
item "event_activation": its current values are 0 and 1, and we could add a
new value '2', i.e.:
0 - disable event activation (use direct activation)
1 - new behaviour (direct activation during boot, then switch to event mode)
2 - old/legacy event-activation behaviour

The default value stays 1, but the lvm behaviour changes; anyone who wants
the old behaviour can set this item to '2'.
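
As a minimal sketch of the tri-state idea (hypothetical code, not lvm2's; only
the /run/lvm/event-activation-on flag file comes from David's branch):

```shell
#!/bin/sh
# Hypothetical sketch of how a uevent-driven pvscan gate could interpret
# a tri-state event_activation value. Not lvm2 code.
decide() {  # decide <event_activation-value> <flag-file>
    case "$1" in
        0) echo "skip (direct activation services handle activation)" ;;
        2) echo "activate (legacy per-uevent behaviour)" ;;
        1) if [ -e "$2" ]; then
               echo "activate (boot services finished, event mode on)"
           else
               echo "skip (still in boot-time direct activation phase)"
           fi ;;
    esac
}

flag=$(mktemp -u)   # stand-in for /run/lvm/event-activation-on
echo "before switch: $(decide 1 "$flag")"
touch "$flag"       # what 'vgchange -aay --eventactivation enable' does
echo "after switch: $(decide 1 "$flag")"
rm -f "$flag"
```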

-------
I have verified this new feature in my env; it is a great improvement.

new feature with lvm config:
obtain_device_list_from_udev = 1
event_activation = 1
udev_sync = 1

systemd-analyze blame: (top 9 items)
          20.809s lvm2-pvscan@134:544.service
          20.808s lvm2-pvscan@134:656.service
          20.808s lvm2-pvscan@134:528.service
          20.807s lvm2-pvscan@133:640.service
          20.806s lvm2-pvscan@133:672.service
          20.785s lvm2-pvscan@134:672.service
          20.784s lvm2-pvscan@134:624.service
          20.784s lvm2-pvscan@128:1008.service
          20.783s lvm2-pvscan@128:832.service

The same lvm config previously cost 2min 6.736s (you can find this result in my previous mail).

The shortest time in that earlier mail was 17.552s, with the config:
obtain_device_list_from_udev=1, event_activation=0, udev_sync=1

The new result (20.809s) is very close to direct activation, which is
reasonable: lvm first uses direct mode, then switches to event mode.

Thanks,
Heming



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-09 19:44         ` David Teigland
  2021-09-10 17:38           ` Martin Wilck
@ 2021-09-27 10:00           ` Peter Rajnoha
  2021-09-27 15:38             ` David Teigland
  1 sibling, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-09-27 10:00 UTC (permalink / raw)
  To: David Teigland; +Cc: zkabelac, bmarzins, martin.wilck, heming.zhao, linux-lvm

Hi!

The patches and logic look promising; there's just one thing I'm
worried about...

On Thu 09 Sep 2021 14:44, David Teigland wrote:
> I've implemented a solution like this and would like any thoughts,
> improvements, or testing to verify it can help:
> https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-1
> 
> I've taken some direction from the lvm activation generator, but there are
> details of that I'm not too familiar with, so I may be missing something
> (in particular it has three activation points but I'm showing two below.)
> This new method would probably let us drop the activation-generator, since
> we could easily configure an equivalent using this new method.
> 
> Here's how it works:
> 
> uevents for PVs run pvscan with the new option --eventactivation check.
> This makes pvscan check if the /run/lvm/event-activation-on file exists.
> If not, pvscan does nothing.
> 
> lvm-activate-vgs-main.service
> . always runs (not generated)
...
> lvm-activate-vgs-last.service
> . always runs (not generated)
...
> - We could use the new lvm-activate-* services to replace the activation
> generator when lvm.conf event_activation=0.  This would be done by simply
> not creating the event-activation-on file when event_activation=0.

...the issue I see here is around the systemd-udev-settle:

  - the setup where lvm-activate-vgs*.service are always there (not
    generated only on event_activation=0 as it was before with the
    original lvm2-activation-*.service) practically means we always
    make a dependency on systemd-udev-settle.service, which we shouldn't
    do in case we have event_activation=1.

  - If we want to make sure that we run our "non-event-based activation"
    after systemd-udev-settle.service, we also need to use
    "After=systemd-udev-settle.service" (the "Wants" will only make the
    udev settle service executed, but it doesn't order it with respect
    to our activation services, so it can happen in parallel - we want
    it to happen after the udev settle).
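
The distinction can be illustrated with a hypothetical unit fragment (the unit
and its contents are assumed here, not copied from the actual lvm2 services):

```ini
# Hypothetical fragment for an lvm-activate-vgs*.service.
# "Wants=" alone only pulls systemd-udev-settle.service into the transaction;
# the two units could still run in parallel. Adding "After=" enforces that
# activation starts only once udev settle has completed.
[Unit]
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
```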

Now the question is whether we really need the systemd-udev-settle at
all, even for that non-event-based lvm activation. The udev-settle is
just to make sure that all the udev processing and udev db content is
complete for all triggered devices. But if we're not reading udev db and
we're OK that those devices might be open in parallel to lvm activation
period (e.g. because there's blkid scan done on disks/PVs), we should be
OK even without that settle. However, we're reading some info from udev db,
right? (like the multipath component state etc.)

-- 
Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-27 10:00           ` Peter Rajnoha
@ 2021-09-27 15:38             ` David Teigland
  2021-09-28  6:34               ` Martin Wilck
  2021-09-29 21:39               ` Peter Rajnoha
  0 siblings, 2 replies; 77+ messages in thread
From: David Teigland @ 2021-09-27 15:38 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: zkabelac, bmarzins, martin.wilck, heming.zhao, linux-lvm

On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > - We could use the new lvm-activate-* services to replace the activation
> > generator when lvm.conf event_activation=0.  This would be done by simply
> > not creating the event-activation-on file when event_activation=0.
> 
> ...the issue I see here is around the systemd-udev-settle:

Thanks, I have a couple questions about the udev-settle to understand that
better, although it seems we may not need it.

>   - the setup where lvm-activate-vgs*.service are always there (not
>     generated only on event_activation=0 as it was before with the
>     original lvm2-activation-*.service) practically means we always
>     make a dependency on systemd-udev-settle.service, which we shouldn't
>     do in case we have event_activation=1.

Why wouldn't the event_activation=1 case want a dependency on udev-settle?

>   - If we want to make sure that we run our "non-event-based activation"
>     after systemd-udev-settle.service, we also need to use
>     "After=systemd-udev-settle.service" (the "Wants" will only make the
>     udev settle service executed, but it doesn't order it with respect
>     to our activation services, so it can happen in parallel - we want
>     it to happen after the udev settle).

So we may not fully benefit from settling unless we use After (although
the benefits are uncertain as mentioned below.)

> Now the question is whether we really need the systemd-udev-settle at
> all, even for that non-event-based lvm activation. The udev-settle is
> just to make sure that all the udev processing and udev db content is
> complete for all triggered devices. But if we're not reading udev db and
> we're OK that those devices might be open in parallel to lvm activation
> period (e.g. because there's blkid scan done on disks/PVs), we should be
> OK even without that settle. However, we're reading some info from udev db,
> right? (like the multipath component state etc.)

- Reading the udev db: with the default external_device_info_source=none
  we no longer ask the udev db for any info about devs.  (We now follow
  that setting strictly, and only ask udev when source=udev.)

- Concurrent blkid and activation: I can't find an issue with this
  (couldn't force any interference with some quick tests.)

- I wonder if After=udev-settle could have an incidental but meaningful
  effect of more PVs being in place before the service runs?

I'll try dropping udev-settle in all cases to see how things look.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-27 15:38             ` David Teigland
@ 2021-09-28  6:34               ` Martin Wilck
  2021-09-28 14:42                 ` David Teigland
  2021-09-29 21:53                 ` Peter Rajnoha
  2021-09-29 21:39               ` Peter Rajnoha
  1 sibling, 2 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-28  6:34 UTC (permalink / raw)
  To: teigland, prajnoha; +Cc: bmarzins, zkabelac, linux-lvm, Heming Zhao

Hello David and Peter,

On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > - We could use the new lvm-activate-* services to replace the
> > > activation
> > > generator when lvm.conf event_activation=0.  This would be done by
> > > simply
> > > not creating the event-activation-on file when event_activation=0.
> > 
> > ...the issue I see here is around the systemd-udev-settle:
> 
> Thanks, I have a couple questions about the udev-settle to understand
> that
> better, although it seems we may not need it.
> 
> >   - the setup where lvm-activate-vgs*.service are always there (not
> >     generated only on event_activation=0 as it was before with the
> >     original lvm2-activation-*.service) practically means we always
> >     make a dependency on systemd-udev-settle.service, which we
> > shouldn't
> >     do in case we have event_activation=1.
> 
> Why wouldn't the event_activation=1 case want a dependency on udev-
> settle?

You said it should wait for multipathd, which in turn waits for udev
settle. And indeed it makes some sense. After all: the idea was to
avoid locking issues or general resource starvation during uevent
storms, which typically occur in the coldplug phase, and for which the
completion of "udev settle" is the best available indicator.

> 
> >   - If we want to make sure that we run our "non-event-based
> > activation"
> >     after systemd-udev-settle.service, we also need to use
> >     "After=systemd-udev-settle.service" (the "Wants" will only make
> > the
> >     udev settle service executed, but it doesn't order it with
> > respect
> >     to our activation services, so it can happen in parallel - we
> > want
> >     it to happen after the udev settle).
> 
> So we may not fully benefit from settling unless we use After
> (although
> the benefits are uncertain as mentioned below.)

Side note: You may be aware that the systemd people are deprecating
this service (e.g.
https://github.com/opensvc/multipath-tools/issues/3).
I'm arguing against it (perhaps you want to join in :-), but odds are
that it'll disappear sooner or later. For the time being, I don't see a
good alternative.

The dependency type you have to use depends on what you need. Do you
really only depend on udev settle because of multipathd? I don't think
so; even without multipath, thousands of PVs being probed
simultaneously can bring the performance of parallel pvscans down. That
was the original motivation for this discussion, after all. If this is
so, you should use both "Wants" and "After". Otherwise, using only
"After" might be sufficient.

> 
> > Now the question is whether we really need the systemd-udev-settle
> > at
> > all, even for that non-event-based lvm activation. The udev-settle
> > is
> > just to make sure that all the udev processing and udev db content
> > is
> > complete for all triggered devices. But if we're not reading udev
> > db and
> > we're OK that those devices might be open in parallel to lvm
> > activation
> > period (e.g. because there's blkid scan done on disks/PVs), we
> > should be
> > OK even without that settle. However, we're reading some info from
> > udev db,
> > right? (like the multipath component state etc.)
> 
> - Reading the udev db: with the default
> external_device_info_source=none
>   we no longer ask the udev db for any info about devs.  (We now
> follow
>   that setting strictly, and only ask udev when source=udev.)

This is a different discussion, but if you don't ask udev, how do you
determine (reliably, and consistently with other services) whether a
given device will be part of a multipath device or a MD Raid member?

I know well there are arguments both for and against using udev in this
context, but whatever optimizations you implement, they should work
both ways.

> - Concurrent blkid and activation: I can't find an issue with this
>   (couldn't force any interference with some quick tests.)

In the past, there were issues with either pvscan or blkid (or
multipath) failing to open a device while another process had opened it
exclusively. I've never understood all the subtleties. See systemd
commit 3ebdb81 ("udev: serialize/synchronize block device event
handling with file locks").
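
The idea from that commit can be sketched roughly as follows (a temp file
stands in for the real device node, and the locking policy is simplified
relative to systemd's actual convention):

```shell
#!/bin/sh
# Rough sketch of BSD-file-lock serialization in the spirit of systemd
# commit 3ebdb81: a tool writing to a block device holds an exclusive
# flock on the node, and other programs try a non-blocking lock and back
# off (udev re-queues the event) if it is held. Simplified illustration.
dev=$(mktemp)

# "mkfs"-style writer: hold an exclusive lock briefly.
( flock -x 9; sleep 1 ) 9>"$dev" &
writer=$!
sleep 0.2

# "udev worker"-style probe: non-blocking shared lock.
if flock -n -s 8 8<"$dev"; then first=ok; else first=busy; fi
wait "$writer"
if flock -n -s 8 8<"$dev"; then second=ok; else second=busy; fi

echo "probe while writer holds lock: $first"   # busy -> event re-queued
echo "probe after writer released:  $second"   # ok -> safe to scan
rm -f "$dev"
```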

> - I wonder if After=udev-settle could have an incidental but
> meaningful
>   effect of more PVs being in place before the service runs?

After=udev-settle will make sure that you're past a coldplug uevent
storm during boot. IMO this is the most important part of the equation.
I'd be happy to find a solution for this that doesn't rely on udev
settle, but I don't see any.

Regards
Martin

> 
> I'll try dropping udev-settle in all cases to see how things look.
> 
> Dave
> 



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28  6:34               ` Martin Wilck
@ 2021-09-28 14:42                 ` David Teigland
  2021-09-28 15:16                   ` Martin Wilck
  2021-09-29 21:53                 ` Peter Rajnoha
  1 sibling, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-09-28 14:42 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, bmarzins, prajnoha, linux-lvm, Heming Zhao

On Tue, Sep 28, 2021 at 06:34:06AM +0000, Martin Wilck wrote:
> Hello David and Peter,
> 
> On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > - We could use the new lvm-activate-* services to replace the
> > > > activation
> > > > generator when lvm.conf event_activation=0.  This would be done by
> > > > simply
> > > > not creating the event-activation-on file when event_activation=0.
> > > 
> > > ...the issue I see here is around the systemd-udev-settle:
> > 
> > Thanks, I have a couple questions about the udev-settle to understand
> > that
> > better, although it seems we may not need it.
> > 
> > >   - the setup where lvm-activate-vgs*.service are always there (not
> > >     generated only on event_activation=0 as it was before with the
> > >     original lvm2-activation-*.service) practically means we always
> > >     make a dependency on systemd-udev-settle.service, which we
> > > shouldn't
> > >     do in case we have event_activation=1.
> > 
> > Why wouldn't the event_activation=1 case want a dependency on udev-
> > settle?
> 
> You said it should wait for multipathd, which in turn waits for udev
> settle. And indeed it makes some sense. After all: the idea was to
> avoid locking issues or general resource starvation during uevent
> storms, which typically occur in the coldplug phase, and for which the
> completion of "udev settle" is the best available indicator.

Hi Martin, thanks, you have some interesting details here.

Right, the idea is for lvm-activate-vgs-last to wait for other services
like multipath (or anything else that a PV would typically sit on), so
that it will be able to activate as many VGs as it can that are present at
startup.  And we avoid responding to individual coldplug events for PVs,
saving time/effort/etc.

> I'm arguing against it (perhaps you want to join in :-), but odds are
> that it'll disappear sooner or later. For the time being, I don't see a
> good alternative.

multipath has more complex udev dependencies, I'll be interested to see
how you manage to reduce those, since I've been reducing/isolating our
udev usage also.

> The dependency type you have to use depends on what you need. Do you
> really only depend on udev settle because of multipathd? I don't think
> so; even without multipath, thousands of PVs being probed
> simultaneously can bring the performance of parallel pvscans down. That
> was the original motivation for this discussion, after all. If this is
> so, you should use both "Wants" and "After". Otherwise, using only
> "After" might be sufficient.

I don't think we really need the settle.  If device nodes for PVs are
present, then vgchange -aay from lvm-activate-vgs* will see them and
activate VGs from them, regardless of what udev has or hasn't done with
them yet.

> > - Reading the udev db: with the default
> > external_device_info_source=none
> > we no longer ask the udev db for any info about devs.  (We now
> > follow that setting strictly, and only ask udev when source=udev.)
> 
> This is a different discussion, but if you don't ask udev, how do you
> determine (reliably, and consistently with other services) whether a
> given device will be part of a multipath device or a MD Raid member?

Firstly, with the new devices file, only the actual md/mpath device will
be in the devices file, the components will not be, so lvm will never
attempt to look at an md or mpath component device.

Otherwise, when the devices file is not used,
md: from reading the md headers from the disk
mpath: from reading sysfs links and /etc/multipath/wwids
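
One udev-independent check along those lines (a hedged sketch, not lvm2's
actual code) is to look at a device's sysfs holders for a device-mapper
holder whose uuid carries the "mpath-" prefix:

```shell
#!/bin/sh
# Hedged sketch (not lvm2 code): decide from sysfs alone whether a block
# device is an mpath component, by checking whether one of its holders is
# a device-mapper device whose uuid starts with "mpath-". The sysfs root
# is a parameter so the logic can be exercised against a fake tree.
is_mpath_component() {  # is_mpath_component <sysfs-root> <kernel-name>
    for holder in "$1"/block/"$2"/holders/*; do
        [ -e "$holder" ] || continue
        uuid=$(cat "$holder/dm/uuid" 2>/dev/null) || continue
        case "$uuid" in
            mpath-*) return 0 ;;
        esac
    done
    return 1
}

# Build a tiny fake sysfs tree: sda is held by dm-0 (a multipath map),
# sdb has no holders.
root=$(mktemp -d)
mkdir -p "$root/block/sda/holders/dm-0/dm" "$root/block/sdb/holders"
printf 'mpath-3600a098038302d7a' > "$root/block/sda/holders/dm-0/dm/uuid"

is_mpath_component "$root" sda && echo "sda: mpath component"
is_mpath_component "$root" sdb || echo "sdb: not a component"
rm -rf "$root"
```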

> In the past, there were issues with either pvscan or blkid (or
> multipath) failing to open a device while another process had opened it
> exclusively. I've never understood all the subtleties. See systemd
> commit 3ebdb81 ("udev: serialize/synchronize block device event
> handling with file locks").

Those locks look like a fine solution if a problem comes up like that.
I suspect the old issues may have been caused by a program using an
exclusive open when it shouldn't.

> After=udev-settle will make sure that you're past a coldplug uevent
> storm during boot. IMO this is the most important part of the equation.
> I'd be happy to find a solution for this that doesn't rely on udev
> settle, but I don't see any.

I don't think multipathd is listening to uevents directly?  If it were,
you might use a heuristic to detect a change in uevents (e.g. their volume)
and conclude that coldplug is finished.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 14:42                 ` David Teigland
@ 2021-09-28 15:16                   ` Martin Wilck
  2021-09-28 15:31                     ` Martin Wilck
                                       ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-28 15:16 UTC (permalink / raw)
  To: teigland; +Cc: bmarzins, zkabelac, prajnoha, linux-lvm, Heming Zhao

On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> On Tue, Sep 28, 2021 at 06:34:06AM +0000, Martin Wilck wrote:
> > Hello David and Peter,
> > 
> > On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > > - We could use the new lvm-activate-* services to replace the
> > > > > activation
> > > > > generator when lvm.conf event_activation=0.  This would be
> > > > > done by
> > > > > simply
> > > > > not creating the event-activation-on file when
> > > > > event_activation=0.
> > > > 
> > > > ...the issue I see here is around the systemd-udev-settle:
> > > 
> > > Thanks, I have a couple questions about the udev-settle to
> > > understand
> > > that
> > > better, although it seems we may not need it.
> > > 
> > > >   - the setup where lvm-activate-vgs*.service are always there
> > > > (not
> > > >     generated only on event_activation=0 as it was before with
> > > > the
> > > >     original lvm2-activation-*.service) practically means we
> > > > always
> > > >     make a dependency on systemd-udev-settle.service, which we
> > > > shouldn't
> > > >     do in case we have event_activation=1.
> > > 
> > > Why wouldn't the event_activation=1 case want a dependency on
> > > udev-
> > > settle?
> > 
> > You said it should wait for multipathd, which in turn waits for
> > udev
> > settle. And indeed it makes some sense. After all: the idea was to
> > avoid locking issues or general resource starvation during uevent
> > storms, which typically occur in the coldplug phase, and for which
> > the
> > completion of "udev settle" is the best available indicator.
> 
> Hi Martin, thanks, you have some interesting details here.
> 
> Right, the idea is for lvm-activate-vgs-last to wait for other
> services
> like multipath (or anything else that a PV would typically sit on),
> so
> that it will be able to activate as many VGs as it can that are
> present at
> startup.  And we avoid responding to individual coldplug events for
> PVs,
> saving time/effort/etc.
> 
> > I'm arguing against it (perhaps you want to join in :-), but odds
> > are
> > that it'll disappear sooner or later. For the time being, I don't
> > see a
> > good alternative.
> 
> multipath has more complex udev dependencies, I'll be interested to
> see
> how you manage to reduce those, since I've been reducing/isolating
> our
> udev usage also.

I have pondered this quite a bit, but I can't say I have a concrete
plan.

To avoid depending on "udev settle", multipathd needs to partially
revert to udev-independent device detection. At least during initial
startup, we may encounter multipath maps with members that don't exist
in the udev db, and we need to deal with this situation gracefully. We
currently don't, and it's a tough problem to solve cleanly. Not relying
on udev opens up a Pandora's box wrt WWID determination, for example.
Any such change would without doubt carry a large risk of regressions
in some scenarios, which we wouldn't want to happen in our large
customer's data centers.

I also looked into Lennart's "storage daemon" concept where multipathd
would continue running over the initramfs/rootfs switch, but that would
be yet another step with even higher risk.

> 
> > The dependency type you have to use depends on what you need. Do
> > you
> > really only depend on udev settle because of multipathd? I don't
> > think
> > so; even without multipath, thousands of PVs being probed
> > simultaneously can bring the performance of parallel pvscans down.
> > That
> > was the original motivation for this discussion, after all. If this
> > is
> > so, you should use both "Wants" and "After". Otherwise, using only
> > "After" might be sufficient.
> 
> I don't think we really need the settle.  If device nodes for PVs are
> present, then vgchange -aay from lvm-activate-vgs* will see them and
> activate VGs from them, regardless of what udev has or hasn't done
> with
> them yet.

Hm. This would mean that the switch to event-based PV detection could
happen before "udev settle" ends. A coldplug storm of uevents could
create 1000s of PVs in a blink after event-based detection was enabled.
Wouldn't that resurrect the performance issues that you are trying to
fix with this patch set?

> 
> > > - Reading the udev db: with the default
> > > external_device_info_source=none
> > > we no longer ask the udev db for any info about devs.  (We now
> > > follow that setting strictly, and only ask udev when
> > > source=udev.)
> > 
> > This is a different discussion, but if you don't ask udev, how do
> > you
> > determine (reliably, and consistently with other services) whether
> > a
> > given device will be part of a multipath device or a MD Raid
> > member?
> 
> Firstly, with the new devices file, only the actual md/mpath device
> will
> be in the devices file, the components will not be, so lvm will never
> attempt to look at an md or mpath component device.

I have to look more closely into the devices file and how it's created
and used. 

> Otherwise, when the devices file is not used,
> md: from reading the md headers from the disk
> mpath: from reading sysfs links and /etc/multipath/wwids

Ugh. Reading sysfs links means that you're indirectly depending on
udev, because udev creates those. It's *more* fragile than calling into
libudev directly, IMO. Using /etc/multipath/wwids is plain wrong in
general. It works only on distros that use "find_multipaths strict",
like RHEL. Not to mention that the path can be customized in
multipath.conf.

> 
> > In the past, there were issues with either pvscan or blkid (or
> > multipath) failing to open a device while another process had
> > opened it
> > exclusively. I've never understood all the subtleties. See systemd
> > commit 3ebdb81 ("udev: serialize/synchronize block device event
> > handling with file locks").
> 
> Those locks look like a fine solution if a problem comes up like
> that.
> I suspect the old issues may have been caused by a program using an
> exclusive open when it shouldn't.

Possible. I haven't seen many of these issues recently. Very rarely, I
see reports of a mount command mysteriously, sporadically failing
during boot. It's very hard to figure out why that happens if it does.
I suspect some transient effect of this kind.

> 
> > After=udev-settle will make sure that you're past a coldplug uevent
> > storm during boot. IMO this is the most important part of the
> > equation.
> > I'd be happy to find a solution for this that doesn't rely on udev
> > settle, but I don't see any.
> 
> I don't think multipathd is listening to uevents directly?
>   If it were,
> you might use a heuristic to detect a change in uevents (e.g. the
> volume)
> and conclude coldplug is finished.

multipathd does listen to uevents (only "udev" events, not "kernel").
But that doesn't help us on startup. Currently we try hard to start up
after coldplug is finished. multipathd doesn't have a concurrency issue
like LVM2 (at least I hope so; it handles events with just two threads,
a producer and a consumer). The problem is rather that dm devices
survive the initramfs->rootfs switch, while member devices don't (see
above).

Cheers,
Martin


> 
> Dave
> 



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 15:16                   ` Martin Wilck
@ 2021-09-28 15:31                     ` Martin Wilck
  2021-09-28 15:56                     ` David Teigland
  2021-09-28 17:42                     ` Benjamin Marzinski
  2 siblings, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-28 15:31 UTC (permalink / raw)
  To: teigland; +Cc: bmarzins, zkabelac, prajnoha, linux-lvm, Heming Zhao

On Tue, 2021-09-28 at 17:16 +0200, Martin Wilck wrote:
> On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> 
> > 
> > Firstly, with the new devices file, only the actual md/mpath device
> > will
> > be in the devices file, the components will not be, so lvm will
> > never
> > attempt to look at an md or mpath component device.
> 
> I have to look more closely into the devices file and how it's
> created
> and used. 
> 
> > Otherwise, when the devices file is not used,
> > md: from reading the md headers from the disk
> > mpath: from reading sysfs links and /etc/multipath/wwids
> 
> Ugh. Reading sysfs links means that you're indirectly depending on
> udev, because udev creates those. It's *more* fragile than calling
> into
> libudev directly, IMO.

Bah. Mental short-circuit. You wrote "sysfs symlinks" and I read
"/dev/disk symlinks". Sorry! Then, I'm not quite sure what symlinks you
are talking about though.

Martin





* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 15:16                   ` Martin Wilck
  2021-09-28 15:31                     ` Martin Wilck
@ 2021-09-28 15:56                     ` David Teigland
  2021-09-28 18:03                       ` Benjamin Marzinski
  2021-09-28 17:42                     ` Benjamin Marzinski
  2 siblings, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-09-28 15:56 UTC (permalink / raw)
  To: Martin Wilck; +Cc: bmarzins, zkabelac, prajnoha, linux-lvm, Heming Zhao

On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> Hm. This would mean that the switch to event-based PV detection could
> happen before "udev settle" ends. A coldplug storm of uevents could
> create 1000s of PVs in a blink after event-based detection was enabled.
> Wouldn't that resurrect the performance issues that you are trying to
> fix with this patch set?

Possibly, I'm unsure how this looks in practice, so I need to try it.
The point at which the device node exists will make a difference, not
only when the uevent occurs.

> > Otherwise, when the devices file is not used,
> > md: from reading the md headers from the disk
> > mpath: from reading sysfs links and /etc/multipath/wwids
> 
> Ugh. Reading sysfs links means that you're indirectly depending on
> udev, because udev creates those. It's *more* fragile than calling into
> libudev directly, IMO.

I meant /sys/dev/block/... (some of those files are links).
We don't look at /dev symlinks created by udev.

> Using /etc/multipath/wwids is plain wrong in
> general. It works only on distros that use "find_multipaths strict",
> like RHEL. Not to mention that the path can be customized in
> multipath.conf.

Right, it's not great, and I held off for a couple of years before adding
that.  As a practical matter it can at least help.  There's a constant
stream of problems with mpath component detection, so I'm interested in
anything that can help.  I expect we could be more intelligent about
understanding multipath config to handle more cases.

> multipathd does listen to uevents (only "udev" events, not "kernel").
> But that doesn't help us on startup. Currently we try hard to start up
> after coldplug is finished. multipathd doesn't have a concurrency issue
> like LVM2 (at least I hope so; it handles events with just two threads,
> a producer and a consumer). The problem is rather that dm devices
> survive the initramfs->rootfs switch, while member devices don't (see
> above).

The other day I suggested that multipath devices not be set up in
the initramfs at all.  If the root fs requires mpath, then handle that
as a special one-off setup.  Then the transition problems go away.
But, I know basically nothing about this, so I won't be surprised if
there are reasons it's done this way.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 15:16                   ` Martin Wilck
  2021-09-28 15:31                     ` Martin Wilck
  2021-09-28 15:56                     ` David Teigland
@ 2021-09-28 17:42                     ` Benjamin Marzinski
  2021-09-28 19:15                       ` Martin Wilck
  2021-09-29 22:06                       ` Peter Rajnoha
  2 siblings, 2 replies; 77+ messages in thread
From: Benjamin Marzinski @ 2021-09-28 17:42 UTC (permalink / raw)
  To: Martin Wilck; +Cc: prajnoha, zkabelac, teigland, linux-lvm, Heming Zhao

On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> > On Tue, Sep 28, 2021 at 06:34:06AM +0000, Martin Wilck wrote:
> > > Hello David and Peter,
> > > 
> > > On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > > > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > > > - We could use the new lvm-activate-* services to replace the
> > > > > > activation
> > > > > > generator when lvm.conf event_activation=0.  This would be
> > > > > > done by
> > > > > > simply
> > > > > > not creating the event-activation-on file when
> > > > > > event_activation=0.
> > > > > 
> > > > > ...the issue I see here is around the systemd-udev-settle:
> > > > 
> > > > Thanks, I have a couple questions about the udev-settle to
> > > > understand
> > > > that
> > > > better, although it seems we may not need it.
> > > > 
> > > > >   - the setup where lvm-activate-vgs*.service are always there
> > > > > (not
> > > > >     generated only on event_activation=0 as it was before with
> > > > > the
> > > > >     original lvm2-activation-*.service) practically means we
> > > > > always
> > > > >     make a dependency on systemd-udev-settle.service, which we
> > > > > shouldn't
> > > > >     do in case we have event_activation=1.
> > > > 
> > > > Why wouldn't the event_activation=1 case want a dependency on
> > > > udev-
> > > > settle?
> > > 
> > > You said it should wait for multipathd, which in turn waits for
> > > udev
> > > settle. And indeed it makes some sense. After all: the idea was to
> > > avoid locking issues or general resource starvation during uevent
> > > storms, which typically occur in the coldplug phase, and for which
> > > the
> > > completion of "udev settle" is the best available indicator.
> > 
> > Hi Martin, thanks, you have some interesting details here.
> > 
> > Right, the idea is for lvm-activate-vgs-last to wait for other
> > services
> > like multipath (or anything else that a PV would typically sit on),
> > so
> > that it will be able to activate as many VGs as it can that are
> > present at
> > startup.  And we avoid responding to individual coldplug events for
> > PVs,
> > saving time/effort/etc.
> > 
> > > I'm arguing against it (perhaps you want to join in :-), but odds
> > > are
> > > that it'll disappear sooner or later. Fot the time being, I don't
> > > see a
> > > good alternative.
> > 
> > multipath has more complex udev dependencies, I'll be interested to
> > see
> > how you manage to reduce those, since I've been reducing/isolating
> > our
> > udev usage also.
> 
> I have pondered this quite a bit, but I can't say I have a concrete
> plan.
> 
> To avoid depending on "udev settle", multipathd needs to partially
> revert to udev-independent device detection. At least during initial
> startup, we may encounter multipath maps with members that don't exist
> in the udev db, and we need to deal with this situation gracefully. We
> currently don't, and it's a tough problem to solve cleanly. Not relying
> on udev opens up a Pandora's box wrt WWID determination, for example.
> Any such change would without doubt carry a large risk of regressions
> in some scenarios, which we wouldn't want to happen in our large
> customer's data centers.

I'm not actually sure that it's as bad as all that. We just may need a
way for multipathd to detect if the coldplug has happened.  I'm sure if
we say we need it to remove the udev settle, we can get some method to
check this. Perhaps there is one already, that I don't know about. If
multipathd starts up and the coldplug hasn't happened, we can just
assume the existing devices are correct, and set up the paths enough to
check them, until we are notified that the coldplug has finished. Then
we just run reconfigure, and continue along like everything currently
is.  The basic idea is to have multipathd run in a mode where its only
concern is monitoring the paths of the existing devices, until we're
notified that the coldplug has completed. The important thing would be
to make sure that we can't accidentally miss the notification that the
coldplug has completed. But we could always time out if it takes too
long, and we haven't gotten any uevents recently.
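The startup logic described above could be sketched as a small state machine (a hypothetical illustration, not multipathd code; the class and method names are invented): start in a monitor-only mode, switch to full operation when a coldplug-complete notification arrives, and fall back to a quiet-period timeout so a missed notification can't wedge the daemon:

```python
import time

class ColdplugGate:
    """Hypothetical sketch: stay in monitor-only mode (watch existing
    paths, don't reconfigure maps) until coldplug is known complete,
    then run a full reconfigure. A quiet-period timeout serves as the
    fallback if the completion notification never arrives."""

    def __init__(self, quiet_period=30.0, now=time.monotonic):
        self.now = now
        self.quiet_period = quiet_period  # seconds without uevents => assume done
        self.monitor_only = True
        self.last_uevent = self.now()

    def on_uevent(self):
        # Any incoming uevent resets the quiet-period clock.
        self.last_uevent = self.now()

    def on_coldplug_complete(self):
        # Explicit notification that the coldplug uevent storm is over.
        self._leave_monitor_only()

    def poll(self):
        # Timeout fallback: no uevents recently => treat coldplug as done.
        if self.monitor_only and self.now() - self.last_uevent > self.quiet_period:
            self._leave_monitor_only()

    def _leave_monitor_only(self):
        if self.monitor_only:
            self.monitor_only = False
            self.reconfigure()

    def reconfigure(self):
        # Placeholder for a full multipathd reconfigure.
        pass
```

The `now` parameter is injectable only to make the timeout logic easy to exercise without real waiting.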
 
> I also looked into Lennart's "storage daemon" concept where multipathd
> would continue running over the initramfs/rootfs switch, but that would
> be yet another step with even higher risk.

This is the "set argv[0][0] = '@' to disble initramfs daemon killing"
concept, right? We still have the problem where the udev database gets
cleared, so if we ever need to look at that while processing the
coldplug events, we'll have problems.

> > 
> > > The dependency type you have to use depends on what you need. Do
> > > you
> > > really only depend on udev settle because of multipathd? I don't
> > > think
> > > so; even without multipath, thousands of PVs being probed
> > > simultaneously can bring the performance of parallel pvscans down.
> > > That
> > > was the original motivation for this discussion, after all. If this
> > > is
> > > so, you should use both "Wants" and "After". Otherwise, using only
> > > "After" might be sufficient.
> > 
> > I don't think we really need the settle.  If device nodes for PVs are
> > present, then vgchange -aay from lvm-activate-vgs* will see them and
> > activate VGs from them, regardless of what udev has or hasn't done
> > with
> > them yet.
> 
> Hm. This would mean that the switch to event-based PV detection could
> happen before "udev settle" ends. A coldplug storm of uevents could
> create 1000s of PVs in a blink after event-based detection was enabled.
> Wouldn't that resurrect the performance issues that you are trying to
> fix with this patch set?
> 
> > 
> > > > - Reading the udev db: with the default
> > > > external_device_info_source=none
> > > > we no longer ask the udev db for any info about devs.  (We now
> > > > follow that setting strictly, and only ask udev when
> > > > source=udev.)
> > > 
> > > This is a different discussion, but if you don't ask udev, how do
> > > you
> > > determine (reliably, and consistently with other services) whether
> > > a
> > > given device will be part of a multipath device or a MD Raid
> > > member?
> > 
> > Firstly, with the new devices file, only the actual md/mpath device
> > will
> > be in the devices file, the components will not be, so lvm will never
> > attempt to look at an md or mpath component device.
> 
> I have to look more closely into the devices file and how it's created
> and used. 
> 
> > Otherwise, when the devices file is not used,
> > md: from reading the md headers from the disk
> > mpath: from reading sysfs links and /etc/multipath/wwids
> 
> Ugh. Reading sysfs links means that you're indirectly depending on
> udev, because udev creates those. It's *more* fragile than calling into
> libudev directly, IMO. Using /etc/multipath/wwids is plain wrong in
> general. It works only on distros that use "find_multipaths strict",
> like RHEL. Not to mention that the path can be customized in
> multipath.conf.

I admit that a wwid being in the wwids file doesn't mean that it is
definitely a multipath path device (it could always still be blacklisted
for instance). Also, the ability to move the wwids file is unfortunate,
and probably never used. But it is the case that every wwid in the wwids
file has had a multipath device successfully created for it. This is
true regardless of the find_multipaths setting, and seems to me to be a
good hint. Conversely, if a device wwid isn't in the wwids file, then it
very likely has never been multipathed before (assuming that the wwids
file is on a writable filesystem).

So relying on it being correct is wrong, but it certainly provides
useful hints.
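Treating the wwids file as a hint set rather than an authority could look like the following (a hypothetical parser; it assumes the common on-disk format where each entry is a WWID wrapped in slashes and `#` starts a comment line):

```python
def load_wwid_hints(text):
    """Parse /etc/multipath/wwids content into a set of WWIDs.
    Assumed format: one '/<wwid>/' entry per line, '#' comments.
    The result is only a hint: membership suggests the WWID has been
    multipathed before; absence suggests it has not. Neither is a
    guarantee (entries can be added by hand with 'multipath -a', or
    the device may since have been blacklisted)."""
    hints = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("/") and line.endswith("/") and len(line) > 2:
            hints.add(line[1:-1])
    return hints
```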

> > 
> > > In the past, there were issues with either pvscan or blkid (or
> > > multipath) failing to open a device while another process had
> > > opened it
> > > exclusively. I've never understood all the subtleties. See systemd
> > > commit 3ebdb81 ("udev: serialize/synchronize block device event
> > > handling with file locks").
> > 
> > Those locks look like a fine solution if a problem comes up like
> > that.
> > I suspect the old issues may have been caused by a program using an
> > exclusive open when it shouldn't.
> 
> Possible. I haven't seen many of these issues recently. Very rarely, I
> see reports of a mount command mysteriously, sporadically failing
> during boot. It's very hard to figure out why that happens if it does.
> I suspect some transient effect of this kind.
> 
> > 
> > > After=udev-settle will make sure that you're past a coldplug uevent
> > > storm during boot. IMO this is the most important part of the
> > > equation.
> > > I'd be happy to find a solution for this that doesn't rely on udev
> > > settle, but I don't see any.
> > 
> > I don't think multipathd is listening to uevents directly?
> >   If it were,
> > you might use a heuristic to detect a change in uevents (e.g. the
> > volume)
> > and conclude coldplug is finished.
> 
> multipathd does listen to uevents (only "udev" events, not "kernel").
> But that doesn't help us on startup. Currently we try hard to start up
> after coldplug is finished. multipathd doesn't have a concurrency issue
> like LVM2 (at least I hope so; it handles events with just two threads,
> a producer and a consumer). The problem is rather that dm devices
> survive the initramfs->rootfs switch, while member devices don't (see
> above).
> 
> Cheers,
> Martin
> 
> 
> > 
> > Dave
> > 


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 15:56                     ` David Teigland
@ 2021-09-28 18:03                       ` Benjamin Marzinski
  0 siblings, 0 replies; 77+ messages in thread
From: Benjamin Marzinski @ 2021-09-28 18:03 UTC (permalink / raw)
  To: David Teigland; +Cc: zkabelac, Heming Zhao, prajnoha, linux-lvm, Martin Wilck

On Tue, Sep 28, 2021 at 10:56:09AM -0500, David Teigland wrote:
> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > Hm. This would mean that the switch to event-based PV detection could
> > happen before "udev settle" ends. A coldplug storm of uevents could
> > create 1000s of PVs in a blink after event-based detection was enabled.
> > Wouldn't that resurrect the performance issues that you are trying to
> > fix with this patch set?
> 
> Possibly, I'm unsure how this looks in practice, so I need to try it.
> When the device node exists will make a difference, not only when the
> uevent occurs.
> 
> > > Otherwise, when the devices file is not used,
> > > md: from reading the md headers from the disk
> > > mpath: from reading sysfs links and /etc/multipath/wwids
> > 
> > Ugh. Reading sysfs links means that you're indirectly depending on
> > udev, because udev creates those. It's *more* fragile than calling into
> > libudev directly, IMO.
> 
> I meant /sys/dev/block/... (some of those files are links).
> We don't look at /dev symlinks created by udev.
> 
> > Using /etc/multipath/wwids is plain wrong in
> > general. It works only on distros that use "find_multipaths strict",
> > like RHEL. Not to mention that the path can be customized in
> > multipath.conf.
> 
> Right, it's not great and I held off for a couple years adding that.
> As a practical matter it can at least help.  There's a constant stream
> of problems with mpath component detection, so anything that can help I'm
> interested in.  I expect we could be more intelligent understanding
> multipath config to handle more cases.
> 
> > multipathd does listen to uevents (only "udev" events, not "kernel").
> > But that doesn't help us on startup. Currently we try hard to start up
> > after coldplug is finished. multipathd doesn't have a concurrency issue
> > like LVM2 (at least I hope so; it handles events with just two threads,
> > a producer and a consumer). The problem is rather that dm devices
> > survive the initramfs->rootfs switch, while member devices don't (see
> > above).
> 
> The other day I suggested that multipath devices not be set up in
> the initramfs at all.  If the root fs requires mpath, then handle that
> as a special one-off setup.  Then the transition problems go away.
> But, I know basically nothing about this, so I won't be surprised if
> there are reasons it's done this way.


If you don't need the device to pivot to the real filesystem, and LVM,
MD, etc. don't activate those devices in the initramfs, you don't need
to include the multipath module when building the initramfs in dracut.
Many existing setups with multipath already work this way. The problem
we need to solve is the setups that DO need the multipath device to
exist before other devices get stacked on top or filesystems get
mounted in the initramfs.

-Ben

> 
> Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 17:42                     ` Benjamin Marzinski
@ 2021-09-28 19:15                       ` Martin Wilck
  2021-09-29 22:06                       ` Peter Rajnoha
  1 sibling, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-28 19:15 UTC (permalink / raw)
  To: bmarzins; +Cc: prajnoha, dm-devel, teigland, zkabelac, linux-lvm, Heming Zhao

On Tue, 2021-09-28 at 12:42 -0500, Benjamin Marzinski wrote:
> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> > 
> > 
> > I have pondered this quite a bit, but I can't say I have a concrete
> > plan.
> > 
> > To avoid depending on "udev settle", multipathd needs to partially
> > revert to udev-independent device detection. At least during
> > initial
> > startup, we may encounter multipath maps with members that don't
> > exist
> > in the udev db, and we need to deal with this situation gracefully.
> > We
> > currently don't, and it's a tough problem to solve cleanly. Not
> > relying
> > on udev opens up a Pandora's box wrt WWID determination, for
> > example.
> > Any such change would without doubt carry a large risk of
> > regressions
> > in some scenarios, which we wouldn't want to happen in our large
> > customer's data centers.
> 
> I'm not actually sure that it's as bad as all that. We just may need
> a
> way for multipathd to detect if the coldplug has happened.  I'm sure
> if
> we say we need it to remove the udev settle, we can get some method
> to
> check this. Perhaps there is one already, that I don't know about.

Our ideas are not so far apart, but this is the wrong thread on the
wrong mailing list :-) Adding dm-devel.

My thinking is: if during startup multipathd encounters existing maps
with member devices missing in udev, it can test the existence of the
devices in sysfs, and if the devices are present there, it shouldn't
flush the maps. This should probably be a general principle, not only
during startup or "boot" (wondering if it makes sense to try and add a
concept like "started during boot" to multipathd - I'd rather try to
keep it generic). Anyway, however you put it, that means that we'd
deviate at least to some extent from the current "always rely on udev"
principle. That's what I meant. Perhaps I exaggerated the difficulties.
Anyway, details need to be worked out, and I expect some rough edges.
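A minimal sketch of that principle (hypothetical helper functions, not multipathd code): before flushing a map whose members are missing from the udev db, check whether the member devices still exist in sysfs under `/sys/dev/block/<major>:<minor>`:

```python
import os

def map_members_gone(member_devnos, sysfs_block="/sys/dev/block"):
    """Hypothetical check: return True only if *no* member of a
    multipath map is present in sysfs. A member missing from the udev
    db may still exist in the kernel (e.g. between pivot and coldplug),
    in which case the map must not be flushed. 'member_devnos' are
    "major:minor" strings as found under /sys/dev/block."""
    return all(
        not os.path.exists(os.path.join(sysfs_block, devno))
        for devno in member_devnos
    )

def should_flush_map(member_devnos, sysfs_block="/sys/dev/block"):
    # Flush only when every member is really gone from the kernel's view,
    # independently of the state of the udev db.
    return map_members_gone(member_devnos, sysfs_block)
```

The `sysfs_block` parameter is only there so the logic can be exercised against a fake sysfs tree.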

> > I also looked into Lennart's "storage daemon" concept where
> > multipathd
> > would continue running over the initramfs/rootfs switch, but that
> > would
> > be yet another step with even higher risk.
> 
This is the "set argv[0][0] = '@' to disable initramfs daemon killing"
> concept, right? We still have the problem where the udev database
> gets
> cleared, so if we ever need to look at that while processing the
> coldplug events, we'll have problems.

If multipathd had started during initrd processing, it would have seen
the uevents for the member devices. There are no "remove" events, so
multipathd might not even notice that the devices are gone. But libudev
queries on the devices could fail between pivot and coldplug, which is
perhaps even nastier... Also, a daemon running like this would live in
a separate, detached mount namespace. It couldn't just reread its
configuration file or the wwids file; it would have no access to the
ordinary root FS. 

> > 
> > > Otherwise, when the devices file is not used,
> > > md: from reading the md headers from the disk
> > > mpath: from reading sysfs links and /etc/multipath/wwids
> > 
> > Ugh. Reading sysfs links means that you're indirectly depending on
> > udev, because udev creates those. It's *more* fragile than calling
> > into
> > libudev directly, IMO. Using /etc/multipath/wwids is plain wrong in
> > general. It works only on distros that use "find_multipaths
> > strict",
> > like RHEL. Not to mention that the path can be customized in
> > multipath.conf.
> 
> I admit that a wwid being in the wwids file doesn't mean that it is
> definitely a multipath path device (it could always still be
> blacklisted
> for instance). Also, the ability to move the wwids file is
> unfortunate,
> and probably never used. But it is the case that every wwid in the
> wwids
> file has had a multipath device successfully created for it. This is
> true regardless of the find_multipaths setting, and seems to me to be
> a
> good hint. Conversely, if a device wwid isn't in the wwids file, then
> it
> very likely has never been multipathed before (assuming that the
> wwids
> file is on a writable filesystem).

Hm. I hear you, but I am able to run "multipath -a" and add a wwid to
the file without it being created. Actually I'm able to add bogus wwids
to the file in this way.

Regards,
Martin
> 



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-27 15:38             ` David Teigland
  2021-09-28  6:34               ` Martin Wilck
@ 2021-09-29 21:39               ` Peter Rajnoha
  2021-09-30  7:22                 ` Martin Wilck
  2021-09-30 15:55                 ` David Teigland
  1 sibling, 2 replies; 77+ messages in thread
From: Peter Rajnoha @ 2021-09-29 21:39 UTC (permalink / raw)
  To: David Teigland; +Cc: zkabelac, bmarzins, martin.wilck, heming.zhao, linux-lvm

On Mon 27 Sep 2021 10:38, David Teigland wrote:
> On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > - We could use the new lvm-activate-* services to replace the activation
> > > generator when lvm.conf event_activation=0.  This would be done by simply
> > > not creating the event-activation-on file when event_activation=0.
> > 
> > ...the issue I see here is around the systemd-udev-settle:
> 
> Thanks, I have a couple questions about the udev-settle to understand that
> better, although it seems we may not need it.
> 
> >   - the setup where lvm-activate-vgs*.service are always there (not
> >     generated only on event_activation=0 as it was before with the
> >     original lvm2-activation-*.service) practically means we always
> >     make a dependency on systemd-udev-settle.service, which we shouldn't
> >     do in case we have event_activation=1.
> 
> Why wouldn't the event_activation=1 case want a dependency on udev-settle?
> 

For event-based activation, I'd expect it to really behave in an
event-based manner, that is, to respond to events as soon as they come
and not wait for all the other devices unnecessarily.

The use of udev-settle is always a pain - for example, if there's a mount
point defined on top of an LV, with udev-settle as a dependency, we
practically wait for all devices to settle. With 'all', I mean even
devices which are not block devices and which are not even related to
any of that LVM layout and the stack underneath. So we could simply be
waiting uselessly, and we could increase the possibility of a timeout
(...for the mount point etc.).

With the settle in play, we'd have this sequence/ordering with the
services/executions:

systemd-udev-settle.service --> lvm-activate-vgs-main.service -->
lvm-activate-vgs-last.service --> event-based pvscans

> >   - If we want to make sure that we run our "non-event-based activation"
> >     after systemd-udev-settle.service, we also need to use
> >     "After=systemd-udev-settle.service" (the "Wants" will only make the
> >     udev settle service executed, but it doesn't order it with respect
> >     to our activation services, so it can happen in parallel - we want
> >     it to happen after the udev settle).
> 
> So we may not fully benefit from settling unless we use After (although
> the benefits are uncertain as mentioned below.)
> 
> > Now the question is whether we really need the systemd-udev-settle at
> > all, even for that non-event-based lvm activation. The udev-settle is
> > just to make sure that all the udev processing and udev db content is
> > complete for all triggered devices. But if we're not reading udev db and
> > we're OK that those devices might be open in parallel to lvm activation
> > period (e.g. because there's blkid scan done on disks/PVs), we should be
> > OK even without that settle. However, we're reading some info from udev db,
> > right? (like the multipath component state etc.)
> 
> - Reading the udev db: with the default external_device_info_source=none
>   we no longer ask the udev db for any info about devs.  (We now follow
>   that setting strictly, and only ask udev when source=udev.)

Hmm, thinking about this, I've just realized one more important and related
thing here that I didn't notice before - the LVM regex filters! These may contain
symlink names as one can find them in /dev. But for those symlinks, we need
to be sure that the rules are already applied. This practically means that:

  - For non-event-based activation, we need udev-settle (so we're sure
    all the rules are applied for all devices we might be scanning).

  - For event-based activation, we need to be sure that we use "RUN"
    rule, not any of "IMPORT{program}" or "PROGRAM" rule. The difference
    is that the "RUN" rules are applied after all the other udev rules are
    already applied for current uevent, including creation of symlinks. And
    currently, we have IMPORT{program}="pvscan..." in our rule,
    unfortunately...

So what if someone defines an LVM regex filter that accepts only the
symlink name which is just to be created based on udev rules we're
processing right now?

(The nodes under /dev are OK because they're created in devtmpfs as
soon as the devices are created in kernel, but those symlinks in /dev
are always created by udev based on udev rules.)
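Schematically, the difference between the two rule types (an illustration only; these are not the shipped lvm2 udev rules, and the exact pvscan arguments are placeholders):

```text
# Illustration only -- not the shipped lvm2 udev rules file.

# IMPORT{program} runs while the rules for this uevent are still being
# processed, i.e. possibly before SYMLINK+= rules have created the
# /dev/disk/... symlinks an LVM regex filter might match against:
IMPORT{program}="/usr/sbin/lvm pvscan --cache $env{DEVNAME}"

# A RUN program is queued and executed only after all rules for the
# uevent have been applied (symlink creation included), so a filter
# that refers to symlink names would already see them:
RUN+="/usr/sbin/lvm pvscan --cache --activate ay $env{DEVNAME}"
```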

> 
> - Concurrent blkid and activation: I can't find an issue with this
>   (couldn't force any interference with some quick tests.)
> 
> - I wonder if After=udev-settle could have an incidental but meaningful
>   effect of more PVs being in place before the service runs?
> 

The nodes are already there, but the symlinks could be missing because
the udev rules haven't been processed yet.

Non-event-based LVM activation needs to wait for settle for sure (because
there's full scan across all devices).

Event-based LVM activation just needs to be sure that:

  - the pvscan only scans the single device (the one for which there's
    the uevent currently being processed),

  - the pvscan should be called in a way that we have all the symlinks
    in place so the regex filter still works for symlinks (== putting
    the pvscan onto a RUN exec queue).

> I'll try dropping udev-settle in all cases to see how things look.
> 
> Dave
> 

-- 
Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28  6:34               ` Martin Wilck
  2021-09-28 14:42                 ` David Teigland
@ 2021-09-29 21:53                 ` Peter Rajnoha
  2021-09-30  7:45                   ` Martin Wilck
  1 sibling, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-09-29 21:53 UTC (permalink / raw)
  To: Martin Wilck; +Cc: bmarzins, zkabelac, teigland, linux-lvm, Heming Zhao

On Tue 28 Sep 2021 06:34, Martin Wilck wrote:
> Hello David and Peter,
> 
> On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > - We could use the new lvm-activate-* services to replace the
> > > > activation
> > > > generator when lvm.conf event_activation=0.  This would be done by
> > > > simply
> > > > not creating the event-activation-on file when event_activation=0.
> > > 
> > > ...the issue I see here is around the systemd-udev-settle:
> > 
> > Thanks, I have a couple questions about the udev-settle to understand
> > that
> > better, although it seems we may not need it.
> > 
> > >   - the setup where lvm-activate-vgs*.service are always there (not
> > >     generated only on event_activation=0 as it was before with the
> > >     original lvm2-activation-*.service) practically means we always
> > >     make a dependency on systemd-udev-settle.service, which we
> > > shouldn't
> > >     do in case we have event_activation=1.
> > 
> > Why wouldn't the event_activation=1 case want a dependency on udev-
> > settle?
> 
> You said it should wait for multipathd, which in turn waits for udev
> settle. And indeed it makes some sense. After all: the idea was to
> avoid locking issues or general resource starvation during uevent
> storms, which typically occur in the coldplug phase, and for which the
> completion of "udev settle" is the best available indicator.
> 

Udevd already limits the number of concurrent worker processes
processing the udev rules for each uevent. So even if we trigger all the
uevents, they are not all processed in parallel; there's some queueing.

However, whether this is good or not depends on perspective - you could
have massive parallelism and a risk of resource starvation or, on the
other side, you could have timeouts because something wasn't processed
in time for other parts of the system which are waiting for dependencies.

Also, the situation might differ based on the fact whether during the
uevent processing we're only looking at that concrete single device for
which we've just received an event or whether we also need to look at
other devices.

-- 
Peter

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-28 17:42                     ` Benjamin Marzinski
  2021-09-28 19:15                       ` Martin Wilck
@ 2021-09-29 22:06                       ` Peter Rajnoha
  2021-09-30  7:51                         ` Martin Wilck
  1 sibling, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-09-29 22:06 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: zkabelac, Heming Zhao, teigland, linux-lvm, Martin Wilck

On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > I have pondered this quite a bit, but I can't say I have a concrete
> > plan.
> > 
> > To avoid depending on "udev settle", multipathd needs to partially
> > revert to udev-independent device detection. At least during initial
> > startup, we may encounter multipath maps with members that don't exist
> > in the udev db, and we need to deal with this situation gracefully. We
> > currently don't, and it's a tough problem to solve cleanly. Not relying
> > on udev opens up a Pandora's box wrt WWID determination, for example.
> > Any such change would without doubt carry a large risk of regressions
> > in some scenarios, which we wouldn't want to happen in our large
> > customer's data centers.
> 
> I'm not actually sure that it's as bad as all that. We just may need a
> way for multipathd to detect if the coldplug has happened.  I'm sure if
> we say we need it to remove the udev settle, we can get some method to
> check this. Perhaps there is one already, that I don't know about. If

The coldplug events are synthesized and, as such, they all now contain
a SYNTH_UUID=<UUID> key-value pair with kernel >= 4.13:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent

I've already tried to propose a patch for systemd/udev that would mark
all uevents coming from the trigger (including the one used at boot for
coldplug) with an extra key-value pair that we could easily match in rules,
but that was not accepted. So right now, we could detect that a
synthesized uevent happened, though we can't be sure it was the actual
udev trigger at boot. For that, we'd need the extra marks. I can give it
another try though; maybe if there are more people asking for this
functionality, we'll be in a better position for this to be accepted.
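
To illustrate the detection described above, here is a minimal sketch of checking for the SYNTH_UUID key (the function name and the "KEY=VALUE lines" environment format are illustrative; in actual udev rules this would be an ENV{SYNTH_UUID} match):

```shell
# Sketch: classify a uevent as synthesized by the presence of the
# SYNTH_UUID key, which kernels >= 4.13 add to events written through
# /sys/.../uevent.  The uevent environment is passed as "KEY=VALUE"
# lines in a single string.
is_synthetic_uevent() {
    printf '%s\n' "$1" | grep -q '^SYNTH_UUID='
}
```

Note this only tells us the event was synthesized, not that it came from the boot-time coldplug trigger specifically, which is exactly the gap the extra marks would close.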

-- 
Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-29 21:39               ` Peter Rajnoha
@ 2021-09-30  7:22                 ` Martin Wilck
  2021-09-30 14:26                   ` David Teigland
  2021-09-30 15:55                 ` David Teigland
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Wilck @ 2021-09-30  7:22 UTC (permalink / raw)
  To: teigland, prajnoha; +Cc: bmarzins, zkabelac, linux-lvm, Heming Zhao

On Wed, 2021-09-29 at 23:39 +0200, Peter Rajnoha wrote:
> On Mon 27 Sep 2021 10:38, David Teigland wrote:
> > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > - We could use the new lvm-activate-* services to replace the
> > > > activation
> > > > generator when lvm.conf event_activation=0.  This would be done
> > > > by simply
> > > > not creating the event-activation-on file when
> > > > event_activation=0.
> > > 
> > > ...the issue I see here is around the systemd-udev-settle:
> > 
> > Thanks, I have a couple questions about the udev-settle to
> > understand that
> > better, although it seems we may not need it.
> > 
> > >   - the setup where lvm-activate-vgs*.service are always there
> > > (not
> > >     generated only on event_activation=0 as it was before with
> > > the
> > >     original lvm2-activation-*.service) practically means we
> > > always
> > >     make a dependency on systemd-udev-settle.service, which we
> > > shouldn't
> > >     do in case we have event_activation=1.
> > 
> > Why wouldn't the event_activation=1 case want a dependency on udev-
> > settle?
> > 
> 
> For event-based activation, I'd expect it to really behave in event-
> based
> manner, that is, to respond to events as soon as they come and not
> wait
> for all the other devices unnecessarily.

I may be missing something here. Perhaps I misunderstood David's
concept. Of course event-based activation is best - in theory.
The reason we're having this discussion is that it may cause thousands
of event handlers being executed in parallel, and that we have seen
cases where this was causing the system to stall during boot for
minutes, or even forever. The ideal solution for that would be to
figure out how to avoid the contention, but I thought you and David had
given up on that.

Heming has shown that the "static" activation didn't suffer from this
problem. So, to my understanding, David was seeking a way to
reconcile these two concepts, by starting out statically and switching
to event-based activation when we can without the risk of stalling. To
do that, we must figure out when to switch, and (like it or not) udev
settle is the best indicator we have.

Also IMO David was striving for a solution that "just works"
efficiently on both small and big systems, without the admin having to
adjust configuration files.

> The use of udev-settle is always a pain - for example, if there's a
> mount
> point defined on top of an LV, with udev-settle as dependency, we
> practically
> wait for all devices to settle. With 'all', I mean even devices which
> are not
> block devices and which are not event related to any of that LVM
> layout and
> the stack underneath. So simply we could be waiting uselessly and we
> could increase possibility of a timeout (...for the mount point
> etc.).

True, but is there anything better?
> 

> Non-event-based LVM activation needs to wait for settle for sure
> (because
> there's full scan across all devices).
> 
> Event-based LVM activation just needs to be sure that:
> 
>   - the pvscan only scans the single device (the one for which
> there's
>     the uevent currently being processed),

If that really worked without taking any locks (e.g. on the data
structures about VGs), it would be the answer.


Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-29 21:53                 ` Peter Rajnoha
@ 2021-09-30  7:45                   ` Martin Wilck
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-30  7:45 UTC (permalink / raw)
  To: prajnoha; +Cc: bmarzins, zkabelac, teigland, linux-lvm, Heming Zhao

On Wed, 2021-09-29 at 23:53 +0200, Peter Rajnoha wrote:
> On Tue 28 Sep 2021 06:34, Martin Wilck wrote:
> > 
> > You said it should wait for multipathd, which in turn waits for
> > udev
> > settle. And indeed it makes some sense. After all: the idea was to
> > avoid locking issues or general resource starvation during uevent
> > storms, which typically occur in the coldplug phase, and for which
> > the
> > completion of "udev settle" is the best available indicator.
> > 
> 
> Udevd already limits the number of concurrent worker processes
> processing the udev rules for each uevent. So even if we trigger all
> the
> uevents, they are not processed all in parallel, there's some
> queueing.
> 

This is true, but there are situations where reducing the number of
workers to anything reasonable hasn't helped avoid contention
(udev.children-max=1 is unrealistic :-) ). Heming can fill in the
details, I believe. When contention happens, it's very difficult to
debug what's going on, as it's usually during boot, the system is
unresponsive, and it only happens on very large installations that
developers rarely have access to. But Heming went quite a long way
analyzing this.

> However, whether this is good or not depends on perspective - you
> could
> have massive parallelism and a risk of resource starvation or, from
> the
> other side, you could have timeouts because something wasn't
> processed
> in time for other parts of the system which are waiting for
> dependencies.
> 
> Also, the situation might differ based on the fact whether during the
> uevent processing we're only looking at that concrete single device
> for
> which we've just received an event or whether we also need to look at
> other devices.

Yes, "it depends". We are looking for a solution that "works well" for
any setup without specific tuning. Meaning that the system doesn't
stall for substantial amounts of time during boot.

Regards
Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-29 22:06                       ` Peter Rajnoha
@ 2021-09-30  7:51                         ` Martin Wilck
  2021-09-30  8:07                           ` heming.zhao
                                             ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-30  7:51 UTC (permalink / raw)
  To: bmarzins, prajnoha; +Cc: zkabelac, teigland, linux-lvm, Heming Zhao

On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
> On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
> > On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > > I have pondered this quite a bit, but I can't say I have a
> > > concrete
> > > plan.
> > > 
> > > To avoid depending on "udev settle", multipathd needs to
> > > partially
> > > revert to udev-independent device detection. At least during
> > > initial
> > > startup, we may encounter multipath maps with members that don't
> > > exist
> > > in the udev db, and we need to deal with this situation
> > > gracefully. We
> > > currently don't, and it's a tough problem to solve cleanly. Not
> > > relying
> > > on udev opens up a Pandora's box wrt WWID determination, for
> > > example.
> > > Any such change would without doubt carry a large risk of
> > > regressions
> > > in some scenarios, which we wouldn't want to happen in our large
> > > customer's data centers.
> > 
> > I'm not actually sure that it's as bad as all that. We just may
> > need a
> > way for multipathd to detect if the coldplug has happened.  I'm
> > sure if
> > we say we need it to remove the udev settle, we can get some method
> > to
> > check this. Perhaps there is one already, that I don't know about.
> > If
> 
> The coldplug events are synthesized and as such, they all now contain
> SYNTH_UUID=<UUID> key-value pair with kernel>=4.13:
> 
>  
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
> 
> I've already tried to propose a patch for systemd/udev that would
> mark
> all uevents coming from the trigger (including the one used at boot
> for
> coldplug) with an extra key-value pair that we could easily match in
> rules,
> but that was not accepted. So right now, we could detect that
> synthesized uevent happened, though we can't be sure it was the
> actual
> udev trigger at boot. For that, we'd need the extra marks. I can give
> it
> another try though, maybe if there are more people asking for this
> functionality, we'll be at better position for this to be accepted.

That would allow us to discern synthetic events, but I'm unsure how
this would help us. Here, what matters is to figure out when we don't
expect any more of them to arrive.

I guess it would be possible to compare the list of (interesting)
devices in sysfs with the list of devices in the udev db. For
multipathd, we could

 - scan set U of udev devices on startup
 - scan set S of sysfs devices on startup
 - listen for uevents for updating both S and U
 - after each uevent, check if the difference set of S and U is empty
 - if yes, coldplug has finished
 - otherwise, continue waiting, possibly until some timeout expires.

It's more difficult for LVM because you have no daemon maintaining
state.
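
The steps above can be sketched roughly as follows (plain word lists stand in for the real sets; in practice S would come from /sys/class/block and U from the udev db, e.g. via libudev enumeration — names here are illustrative):

```shell
# Sketch of the "S \ U is empty" check: returns success (0) once every
# device visible in sysfs is also present in the udev db, i.e. the
# coldplug replay has been fully processed.
coldplug_done() {
    sysfs_set="$1"   # set S: devices visible in sysfs
    udev_set="$2"    # set U: devices already in the udev db
    for dev in $sysfs_set; do
        case " $udev_set " in
            *" $dev "*) ;;      # this device was already processed by udev
            *) return 1 ;;      # still waiting for its uevent
        esac
    done
    return 0
}
```

Re-running this check after each uevent updates U would implement the "check after each uevent" step, with a timeout as the fallback described above.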

Martin





> 



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30  7:51                         ` Martin Wilck
@ 2021-09-30  8:07                           ` heming.zhao
  2021-09-30  9:31                             ` Martin Wilck
  2021-09-30 11:41                             ` Peter Rajnoha
  2021-09-30 11:29                           ` Peter Rajnoha
  2021-09-30 14:41                           ` Benjamin Marzinski
  2 siblings, 2 replies; 77+ messages in thread
From: heming.zhao @ 2021-09-30  8:07 UTC (permalink / raw)
  To: Martin Wilck, bmarzins, prajnoha; +Cc: linux-lvm, teigland, zkabelac

On 9/30/21 3:51 PM, Martin Wilck wrote:
> On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
>> On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
>>> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
>>>> I have pondered this quite a bit, but I can't say I have a
>>>> concrete
>>>> plan.
>>>>
>>>> To avoid depending on "udev settle", multipathd needs to
>>>> partially
>>>> revert to udev-independent device detection. At least during
>>>> initial
>>>> startup, we may encounter multipath maps with members that don't
>>>> exist
>>>> in the udev db, and we need to deal with this situation
>>>> gracefully. We
>>>> currently don't, and it's a tough problem to solve cleanly. Not
>>>> relying
>>>> on udev opens up a Pandora's box wrt WWID determination, for
>>>> example.
>>>> Any such change would without doubt carry a large risk of
>>>> regressions
>>>> in some scenarios, which we wouldn't want to happen in our large
>>>> customer's data centers.
>>>
>>> I'm not actually sure that it's as bad as all that. We just may
>>> need a
>>> way for multipathd to detect if the coldplug has happened.  I'm
>>> sure if
>>> we say we need it to remove the udev settle, we can get some method
>>> to
>>> check this. Perhaps there is one already, that I don't know about.
>>> If
>>
>> The coldplug events are synthesized and as such, they all now contain
>> SYNTH_UUID=<UUID> key-value pair with kernel>=4.13:
>>
>>   
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
>>
>> I've already tried to propose a patch for systemd/udev that would
>> mark
>> all uevents coming from the trigger (including the one used at boot
>> for
>> coldplug) with an extra key-value pair that we could easily match in
>> rules,
>> but that was not accepted. So right now, we could detect that
>> synthesized uevent happened, though we can't be sure it was the
>> actual
>> udev trigger at boot. For that, we'd need the extra marks. I can give
>> it
>> another try though, maybe if there are more people asking for this
>> functionality, we'll be at better position for this to be accepted.
> 
> That would allow us to discern synthetic events, but I'm unsure how
> this would help us. Here, what matters is to figure out when we don't
> expect any more of them to arrive.
> 
> I guess it would be possible to compare the list of (interesting)
> devices in sysfs with the list of devices in the udev db. For
> multipathd, we could
> 
>   - scan set U of udev devices on startup
>   - scan set S of sysfs devices on startup
>   - listen for uevents for updating both S and U
>   - after each uevent, check if the difference set of S and U is empty
>   - if yes, coldplug has finished
>   - otherwise, continue waiting, possibly until some timeout expires.
> 
> It's more difficult for LVM because you have no daemon maintaining
> state.
> 

Another performance story:
With legacy lvm2 (2.02.xx) and the lvmetad daemon, event-activation mode
is very likely to time out with a large number of PVs.
When customers hit this issue, we suggested that they disable lvmetad.

Heming



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30  8:07                           ` heming.zhao
@ 2021-09-30  9:31                             ` Martin Wilck
  2021-09-30 11:41                             ` Peter Rajnoha
  1 sibling, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-09-30  9:31 UTC (permalink / raw)
  To: linux-lvm

On Thu, 2021-09-30 at 16:07 +0800, heming.zhao@suse.com wrote:
> On 9/30/21 3:51 PM, Martin Wilck wrote:
> 
> 
> Another performance story:
> With legacy lvm2 (2.02.xx) and the lvmetad daemon, the event-activation
> mode
> is very likely to time out with a large number of PVs.
> When customers hit this issue, we suggested that they disable lvmetad.
> 

Right. IIRC, that used to be a common suggestion to make without having
detailed clues about the issue at hand... it would help more often than
not.

In theory, I believe that a well-written daemon maintaining a
consistent internal state (and possibly manipulating it) would scale
better than thousands of clients trying to access state in some shared
fashion (database, filesystem tree, whatever). I have no clue why that
didn't work with lvmetad. It had other issues I never clearly
understood, either. No need to discuss it further, as it has been
abandoned anyway.

Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30  7:51                         ` Martin Wilck
  2021-09-30  8:07                           ` heming.zhao
@ 2021-09-30 11:29                           ` Peter Rajnoha
  2021-09-30 16:04                             ` David Teigland
  2021-09-30 14:41                           ` Benjamin Marzinski
  2 siblings, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-09-30 11:29 UTC (permalink / raw)
  To: Martin Wilck, bmarzins; +Cc: zkabelac, teigland, linux-lvm, Heming Zhao

On 9/30/21 09:51, Martin Wilck wrote:
> On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
>> On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
>>> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
>>>> I have pondered this quite a bit, but I can't say I have a
>>>> concrete
>>>> plan.
>>>>
>>>> To avoid depending on "udev settle", multipathd needs to
>>>> partially
>>>> revert to udev-independent device detection. At least during
>>>> initial
>>>> startup, we may encounter multipath maps with members that don't
>>>> exist
>>>> in the udev db, and we need to deal with this situation
>>>> gracefully. We
>>>> currently don't, and it's a tough problem to solve cleanly. Not
>>>> relying
>>>> on udev opens up a Pandora's box wrt WWID determination, for
>>>> example.
>>>> Any such change would without doubt carry a large risk of
>>>> regressions
>>>> in some scenarios, which we wouldn't want to happen in our large
>>>> customer's data centers.
>>>
>>> I'm not actually sure that it's as bad as all that. We just may
>>> need a
>>> way for multipathd to detect if the coldplug has happened.  I'm
>>> sure if
>>> we say we need it to remove the udev settle, we can get some method
>>> to
>>> check this. Perhaps there is one already, that I don't know about.
>>> If
>>
>> The coldplug events are synthesized and as such, they all now contain
>> SYNTH_UUID=<UUID> key-value pair with kernel>=4.13:
>>
>>   
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
>>
>> I've already tried to propose a patch for systemd/udev that would
>> mark
>> all uevents coming from the trigger (including the one used at boot
>> for
>> coldplug) with an extra key-value pair that we could easily match in
>> rules,
>> but that was not accepted. So right now, we could detect that
>> synthesized uevent happened, though we can't be sure it was the
>> actual
>> udev trigger at boot. For that, we'd need the extra marks. I can give
>> it
>> another try though, maybe if there are more people asking for this
>> functionality, we'll be at better position for this to be accepted.
> 
> That would allow us to discern synthetic events, but I'm unsure how
> this would help us. Here, what matters is to figure out when we don't
> expect any more of them to arrive.
> 

I think this would require a different approach on the systemd/udev side. Currently, 
"udevadm trigger --settle" uses a different UUID for each synthesized uevent's 
SYNTH_UUID. This is actually not exactly how it was meant to be used. Instead, 
the SYNTH_UUID was also meant to be used as a form of grouping - so in the case of 
"udevadm trigger", there should be a single UUID used to group all the 
generated uevents based on that UUID. Then, this logic could be enhanced in a 
way that there would be a different SYNTH_UUID used for each subsystem (e.g. 
block), hence we could wait for each subsystem's devices separately, not being 
dragged down by waiting for anything else.

So then we could have services like:
   systemd-udev-settle-block.service
   systemd-udev-settle-othersubsystem.service
   ...

And then place our services after that. We'd need to elaborate a bit on whether 
more fine-grained separation would be needed or not...

If we see this udev settle as the key point, then I think we should probably 
concentrate on enhancing systemd/udev to provide this functionality (and 
primarily the udevadm trigger functionality and waiting for related 
synthesized events). I think the infrastructure to accomplish this is already 
there. It just needs suitable user-space changes (the udevadm trigger).
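
As a rough approximation of that idea with today's tools, udevadm can already scope a trigger to one subsystem and wait for the events it generated (a command sketch, not something this thread's participants proposed; --settle requires a reasonably recent udevadm):

```shell
# Replay and wait for block-device uevents only, instead of settling
# on the whole device tree (needs root; illustrative invocation):
#
#   udevadm trigger --subsystem-match=block --action=change --settle
```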

> I guess it would be possible to compare the list of (interesting)
> devices in sysfs with the list of devices in the udev db. For
> multipathd, we could
> 
>   - scan set U of udev devices on startup
>   - scan set S of sysfs devices on startup

Well, I think that's exactly the functionality that could be provided by the 
settle separation as described above... And then everybody could benefit from 
this.

>   - listen for uevents for updating both S and U
>   - after each uevent, check if the difference set of S and U is empty
>   - if yes, coldplug has finished
>   - otherwise, continue waiting, possibly until some timeout expires.
> 
> It's more difficult for LVM because you have no daemon maintaining
> state.
> 
> Martin
> 
> 
> 
> 
> 
>>
> 


-- 
Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30  8:07                           ` heming.zhao
  2021-09-30  9:31                             ` Martin Wilck
@ 2021-09-30 11:41                             ` Peter Rajnoha
  2021-09-30 15:32                               ` heming.zhao
  1 sibling, 1 reply; 77+ messages in thread
From: Peter Rajnoha @ 2021-09-30 11:41 UTC (permalink / raw)
  To: heming.zhao, Martin Wilck, bmarzins; +Cc: linux-lvm, teigland, zkabelac

On 9/30/21 10:07, heming.zhao@suse.com wrote:
> On 9/30/21 3:51 PM, Martin Wilck wrote:
>> On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
>>> On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
>>>> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
>>>>> I have pondered this quite a bit, but I can't say I have a
>>>>> concrete
>>>>> plan.
>>>>>
>>>>> To avoid depending on "udev settle", multipathd needs to
>>>>> partially
>>>>> revert to udev-independent device detection. At least during
>>>>> initial
>>>>> startup, we may encounter multipath maps with members that don't
>>>>> exist
>>>>> in the udev db, and we need to deal with this situation
>>>>> gracefully. We
>>>>> currently don't, and it's a tough problem to solve cleanly. Not
>>>>> relying
>>>>> on udev opens up a Pandora's box wrt WWID determination, for
>>>>> example.
>>>>> Any such change would without doubt carry a large risk of
>>>>> regressions
>>>>> in some scenarios, which we wouldn't want to happen in our large
>>>>> customer's data centers.
>>>>
>>>> I'm not actually sure that it's as bad as all that. We just may
>>>> need a
>>>> way for multipathd to detect if the coldplug has happened.  I'm
>>>> sure if
>>>> we say we need it to remove the udev settle, we can get some method
>>>> to
>>>> check this. Perhaps there is one already, that I don't know about.
>>>> If
>>>
>>> The coldplug events are synthesized and as such, they all now contain
>>> SYNTH_UUID=<UUID> key-value pair with kernel>=4.13:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent 
>>>
>>>
>>> I've already tried to propose a patch for systemd/udev that would
>>> mark
>>> all uevents coming from the trigger (including the one used at boot
>>> for
>>> coldplug) with an extra key-value pair that we could easily match in
>>> rules,
>>> but that was not accepted. So right now, we could detect that
>>> synthesized uevent happened, though we can't be sure it was the
>>> actual
>>> udev trigger at boot. For that, we'd need the extra marks. I can give
>>> it
>>> another try though, maybe if there are more people asking for this
>>> functionality, we'll be at better position for this to be accepted.
>>
>> That would allow us to discern synthetic events, but I'm unsure how
>> this would help us. Here, what matters is to figure out when we don't
>> expect any more of them to arrive.
>>
>> I guess it would be possible to compare the list of (interesting)
>> devices in sysfs with the list of devices in the udev db. For
>> multipathd, we could
>>
>>   - scan set U of udev devices on startup
>>   - scan set S of sysfs devices on startup
>>   - listen for uevents for updating both S and U
>>   - after each uevent, check if the difference set of S and U is empty
>>   - if yes, coldplug has finished
>>   - otherwise, continue waiting, possibly until some timeout expires.
>>
>> It's more difficult for LVM because you have no daemon maintaining
>> state.
>>
> 
> Another performance story:
> With legacy lvm2 (2.02.xx) and the lvmetad daemon, the event-activation mode
> is very likely to time out with a large number of PVs.
> When customers hit this issue, we suggested that they disable lvmetad.

We've already dumped lvmetad. Has this also been an issue with lvm versions 
without lvmetad, but still using the event-activation mode? (...the lvm 
versions where instead of lvmetad, we use the helper files under /run/lvm to 
track the state of incoming PVs and VG completeness)

Also, when I tried bootup with over 1000 devices in place (though in a VM, I 
don't have access to a real machine with so many devices), I've noticed a 
performance regression in libudev itself with the interface to enumerate 
devices (which is the default obtain_device_list_from_udev=1 in lvm.conf):
https://bugzilla.redhat.com/show_bug.cgi?id=1986158

It's very important to measure what exactly is causing the delays. And it's also 
important how we measure it - I'm not that trusting of systemd-analyze blame, 
as it's unclear what it is actually measuring.

I just want to say that some of the issues might simply be regressions/issues 
with systemd/udev that could be fixed. We, as providers of block device 
abstractions that sometimes need to handle thousands of devices, might 
be the first ones to hit these issues.

-- 
Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30  7:22                 ` Martin Wilck
@ 2021-09-30 14:26                   ` David Teigland
  0 siblings, 0 replies; 77+ messages in thread
From: David Teigland @ 2021-09-30 14:26 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, bmarzins, prajnoha, linux-lvm, Heming Zhao

On Thu, Sep 30, 2021 at 07:22:29AM +0000, Martin Wilck wrote:
> On Wed, 2021-09-29 at 23:39 +0200, Peter Rajnoha wrote:
> > For event-based activation, I'd expect it to really behave in event-
> > based manner, that is, to respond to events as soon as they come and not
> > wait for all the other devices unnecessarily.
> 
> I may be missing something here. Perhaps I misunderstood David's
> concept. Of course event-based activation is best - in theory.
> The reason we're having this discussion is that it may cause thousands
> of event handlers being executed in parallel, and that we have seen
> cases where this was causing the system to stall during boot for
> minutes, or even forever. The ideal solution for that would be to
> figure out how to avoid the contention, but I thought you and David had
> given up on that.
> 
> Heming has shown that the "static" activation didn't suffer from this
> problem. So, to my understanding, David was seeking a way to
> reconcile these two concepts, by starting out statically and switching
> to event-based activation when we can without the risk of stalling. To
> do that, we must figure out when to switch, and (like it or not) udev
> settle is the best indicator we have.
> 
> Also IMO David was striving for a solution that "just works"
> efficiently on both small and big systems, without the admin having to
> adjust configuration files.

Right, this is not entirely event based any longer, so there could be some
advantage of an event-based system that we sacrifice.  I think that will
be a good tradeoff for the large majority of cases, and will make a good
default.

> > The use of udev-settle is always a pain - for example, if there's a mount
> > point defined on top of an LV, with udev-settle as dependency, we practically
> > wait for all devices to settle. With 'all', I mean even devices which are not
> > block devices and which are not event related to any of that LVM
> > layout and the stack underneath. So simply we could be waiting uselessly and we
> > could increase possibility of a timeout (...for the mount point etc.).

One theoretical advantage of an event-based system is that it reacts
immediately, so you get faster results.  In practice it's often anything
but immediate, largely because of extra work and moving parts in the
event-based scheme, processing each event individually.  So, the simpler
non-event-based method will often be faster I think, and more robust (all
the moving parts are where things break, so best to minimize them.)

You've filled in some interesting details about udev-settle for me, and it
sounds like there are some ideas forming about an alternative, which would
offer us a better way to switch to event-based mode.  I'd like to be able
to simply replace the systemd-udev-settle dependency with an improved
"new-settle" dependency when that's ready.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30  7:51                         ` Martin Wilck
  2021-09-30  8:07                           ` heming.zhao
  2021-09-30 11:29                           ` Peter Rajnoha
@ 2021-09-30 14:41                           ` Benjamin Marzinski
  2021-10-01  7:42                             ` Martin Wilck
  2 siblings, 1 reply; 77+ messages in thread
From: Benjamin Marzinski @ 2021-09-30 14:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: zkabelac, prajnoha, teigland, linux-lvm, Heming Zhao

On Thu, Sep 30, 2021 at 07:51:08AM +0000, Martin Wilck wrote:
> On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
> > On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
> > > On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > > > I have pondered this quite a bit, but I can't say I have a
> > > > concrete
> > > > plan.
> > > > 
> > > > To avoid depending on "udev settle", multipathd needs to
> > > > partially
> > > > revert to udev-independent device detection. At least during
> > > > initial
> > > > startup, we may encounter multipath maps with members that don't
> > > > exist
> > > > in the udev db, and we need to deal with this situation
> > > > gracefully. We
> > > > currently don't, and it's a tough problem to solve cleanly. Not
> > > > relying
> > > > on udev opens up a Pandora's box wrt WWID determination, for
> > > > example.
> > > > Any such change would without doubt carry a large risk of
> > > > regressions
> > > > in some scenarios, which we wouldn't want to happen in our large
> > > > customer's data centers.
> > > 
> > > I'm not actually sure that it's as bad as all that. We just may
> > > need a
> > > way for multipathd to detect if the coldplug has happened.  I'm
> > > sure if
> > > we say we need it to remove the udev settle, we can get some method
> > > to
> > > check this. Perhaps there is one already, that I don't know about.
> > > If
> > 
> > The coldplug events are synthesized and as such, they all now contain
> > SYNTH_UUID=<UUID> key-value pair with kernel>=4.13:
> > 
> >  
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
> > 
> I've already tried to propose a patch for systemd/udev that would
> > mark
> > all uevents coming from the trigger (including the one used at boot
> > for
> > coldplug) with an extra key-value pair that we could easily match in
> > rules,
> > but that was not accepted. So right now, we could detect that
> > synthesized uevent happened, though we can't be sure it was the
> > actual
> > udev trigger at boot. For that, we'd need the extra marks. I can give
> > it
> > another try though, maybe if there are more people asking for this
> functionality, we'll be in a better position for this to be accepted.
> 
> That would allow us to discern synthetic events, but I'm unsure how
> this would help us. Here, what matters is to figure out when we don't
> expect any more of them to arrive.
> 
> I guess it would be possible to compare the list of (interesting)
> devices in sysfs with the list of devices in the udev db. For
> multipathd, we could
> 
>  - scan set U of udev devices on startup
>  - scan set S of sysfs devices on startup
>  - listen for uevents for updating both S and U
>  - after each uevent, check if the difference set of S and U is empty
>  - if yes, coldplug has finished

For multipathd, we don't even need to care when all the block devices
have been processed.  We only need to care about devices that are
currently multipathed. If multipathd starts up and notices devices that
are in S and not in U, but those devices aren't currently part of a
multipath device, it can ignore them. 

>  - otherwise, continue waiting, possibly until some timeout expires.
> 
> It's more difficult for LVM because you have no daemon maintaining
> state.
> 
> Martin
> 
> 
> 
> 
> 
> > 
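The S/U difference-set bookkeeping Martin describes above can be sketched as follows. This is a minimal illustration with hypothetical names and stubbed data; a real implementation would enumerate /sys/class/block and the udev db (e.g. via libudev) and update both sets from incoming uevents.

```python
# Minimal sketch of the "compare sysfs (S) with the udev db (U)" idea.
# Names and data are hypothetical, not real device enumeration.

def coldplug_settled(sysfs_devs: set, udev_devs: set) -> bool:
    """Coldplug is considered finished once every device visible in
    sysfs has also been processed into the udev db (S - U is empty)."""
    return not (sysfs_devs - udev_devs)

# Simulated startup: sdb exists in sysfs but udev hasn't processed it yet.
S = {"sda", "sdb", "dm-0"}
U = {"sda", "dm-0"}
assert not coldplug_settled(S, U)

# A later uevent for sdb updates U; the difference set becomes empty,
# i.e. coldplug has finished.
U.add("sdb")
assert coldplug_settled(S, U)
```

As Ben notes, a daemon like multipathd could further restrict S to devices it actually cares about before doing this comparison.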


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30 11:41                             ` Peter Rajnoha
@ 2021-09-30 15:32                               ` heming.zhao
  2021-10-01  7:41                                 ` Martin Wilck
  0 siblings, 1 reply; 77+ messages in thread
From: heming.zhao @ 2021-09-30 15:32 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: bmarzins, linux-lvm, teigland, Martin Wilck, zkabelac

On 9/30/21 7:41 PM, Peter Rajnoha wrote:
> On 9/30/21 10:07, heming.zhao@suse.com wrote:
>> On 9/30/21 3:51 PM, Martin Wilck wrote:
>>> On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
>>>> On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
>>>>> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
>>>>>> I have pondered this quite a bit, but I can't say I have a
>>>>>> concrete
>>>>>> plan.
>>>>>>
>>>>>> To avoid depending on "udev settle", multipathd needs to
>>>>>> partially
>>>>>> revert to udev-independent device detection. At least during
>>>>>> initial
>>>>>> startup, we may encounter multipath maps with members that don't
>>>>>> exist
>>>>>> in the udev db, and we need to deal with this situation
>>>>>> gracefully. We
>>>>>> currently don't, and it's a tough problem to solve cleanly. Not
>>>>>> relying
>>>>>> on udev opens up a Pandora's box wrt WWID determination, for
>>>>>> example.
>>>>>> Any such change would without doubt carry a large risk of
>>>>>> regressions
>>>>>> in some scenarios, which we wouldn't want to happen in our large
>>>>>> customer's data centers.
>>>>>
>>>>> I'm not actually sure that it's as bad as all that. We just may
>>>>> need a
>>>>> way for multipathd to detect if the coldplug has happened.  I'm
>>>>> sure if
>>>>> we say we need it to remove the udev settle, we can get some method
>>>>> to
>>>>> check this. Perhaps there is one already, that I don't know about.
>>>>> If
>>>>
>>>> The coldplug events are synthesized and as such, they all now contain
>>>> SYNTH_UUID=<UUID> key-value pair with kernel>=4.13:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
>>>>
>>>> I've already tried to propose a patch for systemd/udev that would
>>>> mark
>>>> all uevents coming from the trigger (including the one used at boot
>>>> for
>>>> coldplug) with an extra key-value pair that we could easily match in
>>>> rules,
>>>> but that was not accepted. So right now, we could detect that
>>>> synthesized uevent happened, though we can't be sure it was the
>>>> actual
>>>> udev trigger at boot. For that, we'd need the extra marks. I can give
>>>> it
>>>> another try though, maybe if there are more people asking for this
>>>> functionality, we'll be in a better position for this to be accepted.
>>>
>>> That would allow us to discern synthetic events, but I'm unsure how
>>> this would help us. Here, what matters is to figure out when we don't
>>> expect any more of them to arrive.
>>>
>>> I guess it would be possible to compare the list of (interesting)
>>> devices in sysfs with the list of devices in the udev db. For
>>> multipathd, we could
>>>
>>>   - scan set U of udev devices on startup
>>>   - scan set S of sysfs devices on startup
>>>   - listen for uevents for updating both S and U
>>>   - after each uevent, check if the difference set of S and U is empty
>>>   - if yes, coldplug has finished
>>>   - otherwise, continue waiting, possibly until some timeout expires.
>>>
>>> It's more difficult for LVM because you have no daemon maintaining
>>> state.
>>>
>>
>> Another performance story:
>> With legacy lvm2 (2.02.xx) and the lvmetad daemon, event-activation mode
>> was very likely to time out with a large number of PVs.
>> When customers hit this issue, we suggested disabling lvmetad.
> 
> We've already dumped lvmetad. Has this also been an issue with lvm versions without lvmetad, but still using the event-activation mode? (...the lvm versions where instead of lvmetad, we use the helper files under /run/lvm to track the state of incoming PVs and VG completeness)
> 
> Also, when I tried bootup with over 1000 devices in place (though in a VM, I don't have access to real machine with so many devices), I've noticed a performance regression in libudev itself with the interface to enumerate devices (which is the default obtain_device_list_from_udev=1 in lvm.conf):
> https://bugzilla.redhat.com/show_bug.cgi?id=1986158
> 
> It's very important to measure what exactly is causing the delays, and also how we measure it - I'm not that trustful of systemd-analyze blame, as it's unclear what it is actually measuring.
> 
> I just want to say that some of the issues might simply be regressions/issues with systemd/udev that could be fixed. We, as providers of block device abstractions that sometimes need to handle thousands of devices, might be the first ones to hit these issues.
> 

The rhel8 callgrind picture (https://prajnoha.fedorapeople.org/bz1986158/rhel8_libudev_critical_cost.png)
corresponds to my analysis:
https://listman.redhat.com/archives/linux-lvm/2021-June/msg00022.html
handle_db_line took too much time and became the hotspot.

Heming



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-29 21:39               ` Peter Rajnoha
  2021-09-30  7:22                 ` Martin Wilck
@ 2021-09-30 15:55                 ` David Teigland
  2021-10-01  8:00                   ` Peter Rajnoha
  1 sibling, 1 reply; 77+ messages in thread
From: David Teigland @ 2021-09-30 15:55 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: zkabelac, bmarzins, martin.wilck, heming.zhao, linux-lvm

On Wed, Sep 29, 2021 at 11:39:52PM +0200, Peter Rajnoha wrote:
> Hmm, thinking about this, I've just realized one more important and related
> thing here I didn't realize before - the LVM regex filters! These may contain
> symlink names as one can find them in /dev. But for those symlinks, we need
> to be sure that the rules are already applied. This practically means that:

At least at RH we're enabling the devices file by default (meaning no
filter) in the same timeframe that we're looking at these activation
services.  So, I don't think this is a big factor.

>   - For non-event-based activation, we need udev-settle (so we're sure
>     all the rules are applied for all devices we might be scanning).
> 
>   - For event-based activation, we need to be sure that we use "RUN"
>     rule, not any of "IMPORT{program}" or "PROGRAM" rule. The difference
>     is that the "RUN" rules are applied after all the other udev rules are
>     already applied for current uevent, including creation of symlinks. And
>     currently, we have IMPORT{program}="pvscan..." in our rule,
>     unfortunately...

That's pretty subtle, I'm wary about propagating such specific and
delicate behavior, seems fragile.

> The nodes are already there, the symlinks could be missing because the
> udev rules haven't been processed yet.
> 
> Non-event-based LVM activation needs to wait for settle for sure (because
> there's full scan across all devices).
> 
> Event-based LVM activation just needs to be sure that:
> 
>   - the pvscan only scans the single device (the one for which there's
>     the uevent currently being processed),
> 
>   - the pvscan should be called in a way that we have all the symlinks
>     in place so the regex filter still works for symlinks (== putting
>     the pvscan onto a RUN exec queue).

I think we're looking at a udev-settle dependency (or alternative) for all
cases, best to just make that explicit and consistent, and isolated in one
place.  I'm not really seeing a downside to that.  Then, focus efforts on
refining a replacement.  If the subtle dependencies are spread around then
it's hard to extract and improve the unwanted parts.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30 11:29                           ` Peter Rajnoha
@ 2021-09-30 16:04                             ` David Teigland
  0 siblings, 0 replies; 77+ messages in thread
From: David Teigland @ 2021-09-30 16:04 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: zkabelac, bmarzins, Heming Zhao, linux-lvm, Martin Wilck

On Thu, Sep 30, 2021 at 01:29:07PM +0200, Peter Rajnoha wrote:
> If we see this udev settle as the key point, then I think we should probably
> concentrate on enhancing systemd/udev to provide this functionality (and
> primarily the udevadm trigger functionality and waiting for related
> synthesized events). I think the infrastructure to accomplish this is
> already there. It just needs suitable user-space changes (the udevadm
> trigger).

I think it would be nice if the waiting/settling concept was generalized a
little more than we have now, to make it easier to improve or replace the
implementation, either as we find better mechanisms or as user needs vary.

Dave


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30 15:32                               ` heming.zhao
@ 2021-10-01  7:41                                 ` Martin Wilck
  2021-10-01  8:08                                   ` Peter Rajnoha
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Wilck @ 2021-10-01  7:41 UTC (permalink / raw)
  To: linux-lvm, prajnoha; +Cc: bmarzins, teigland, zkabelac

On Thu, 2021-09-30 at 23:32 +0800, heming.zhao@suse.com wrote:
> > I just want to say that some of the issues might simply be
> > regressions/issues with systemd/udev that could be fixed. We, as
> > providers of block device abstractions that sometimes need to handle
> > thousands of devices, might be the first ones to hit these
> > issues.
> > 
> 
> The rhel8 callgrind picture
> (https://prajnoha.fedorapeople.org/bz1986158/rhel8_libudev_critical_cost.png
> )
> corresponds to my analysis:
> https://listman.redhat.com/archives/linux-lvm/2021-June/msg00022.html
> handle_db_line took too much time and became the hotspot.

I missed that post. You wrote

> the dev_cache_scan doesn't have direct disk IOs, but libudev will
> scan/read udev db which issues real disk IOs (location is /run/udev/data).
> ...
> 2. scans/reads udev db (/run/udev/data). may O(n)
>  udev will call device_read_db => handle_db_line to handle every
>    line of a db file.
> ...
> I didn't test the related udev code, and guess the <2> takes too much
> time.

... but note that /run/udev is on tmpfs, not on a real disk. So the
accesses should be very fast unless there's some locking happening.

Regards
Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30 14:41                           ` Benjamin Marzinski
@ 2021-10-01  7:42                             ` Martin Wilck
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Wilck @ 2021-10-01  7:42 UTC (permalink / raw)
  To: bmarzins; +Cc: prajnoha, zkabelac, teigland, linux-lvm, Heming Zhao

On Thu, 2021-09-30 at 09:41 -0500, Benjamin Marzinski wrote:
> On Thu, Sep 30, 2021 at 07:51:08AM +0000, Martin Wilck wrote:
> 
> 
> For multipathd, we don't even need to care when all the block devices
> have been processed.  We only need to care about devices that are
> currently multipathed. If multipathd starts up and notices devices
> that
> are in S and not in U, but those devices aren't currently part of a
> multipath device, it can ignore them. 

True. Sorry for being imprecise.

Martin



* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-09-30 15:55                 ` David Teigland
@ 2021-10-01  8:00                   ` Peter Rajnoha
  0 siblings, 0 replies; 77+ messages in thread
From: Peter Rajnoha @ 2021-10-01  8:00 UTC (permalink / raw)
  To: David Teigland; +Cc: zkabelac, bmarzins, martin.wilck, heming.zhao, linux-lvm

On Thu 30 Sep 2021 10:55, David Teigland wrote:
> On Wed, Sep 29, 2021 at 11:39:52PM +0200, Peter Rajnoha wrote:
> > Hmm, thinking about this, I've just realized one more important and related
> > thing here I didn't realize before - the LVM regex filters! These may contain
> > symlink names as one can find them in /dev. But for those symlinks, we need
> > to be sure that the rules are already applied. This practically means that:
> 
> At least at RH we're enabling the devices file by default (meaning no
> filter) in the same timeframe that we're looking at these activation
> services.  So, I don't think this is a big factor.
> 

OK, if we don't need access to all the aliases (all the symlink names under /dev),
then that's great. We just need to make sure the devices file is then
always used with pvscan called out of the udev rule.

> >   - For non-event-based activation, we need udev-settle (so we're sure
> >     all the rules are applied for all devices we might be scanning).
> > 
> >   - For event-based activation, we need to be sure that we use "RUN"
> >     rule, not any of "IMPORT{program}" or "PROGRAM" rule. The difference
> >     is that the "RUN" rules are applied after all the other udev rules are
> >     already applied for current uevent, including creation of symlinks. And
> >     currently, we have IMPORT{program}="pvscan..." in our rule,
> >     unfortunately...
> 
> That's pretty subtle, I'm wary about propagating such specific and
> delicate behavior, seems fragile.
> 

Yeah, that was just because I didn't realize we don't actually need those
symlinks and the classic filter is ignored in this case. I thought we
had a bug in our current 69-dm-lvm.rules where we call IMPORT{program}=pvscan
in which case all the symlinks are not created yet. I'm glad it's not
the case.
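For illustration, the distinction being drawn is between these two rule shapes; the paths and pvscan arguments here are placeholders, not the actual 69-dm-lvm.rules content:

```
# IMPORT{program} runs the command while the rules for this uevent are
# still being processed, i.e. before later rules have created the /dev
# symlinks:
ACTION=="add|change", SUBSYSTEM=="block", IMPORT{program}="/usr/sbin/lvm pvscan --cache ..."

# RUN+= only queues the command; it is executed after all rules for the
# uevent have been applied, so the symlinks already exist:
ACTION=="add|change", SUBSYSTEM=="block", RUN+="/usr/sbin/lvm pvscan --cache ..."
```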

> > The nodes are already there, the symlinks could be missing because the
> > udev rules haven't been processed yet.
> > 
> > Non-event-based LVM activation needs to wait for settle for sure (because
> > there's full scan across all devices).
> > 
> > Event-based LVM activation just needs to be sure that:
> > 
> >   - the pvscan only scans the single device (the one for which there's
> >     the uevent currently being processed),
> > 
> >   - the pvscan should be called in a way that we have all the symlinks
> >     in place so the regex filter still works for symlinks (== putting
> >     the pvscan onto a RUN exec queue).
> 
> I think we're looking at a udev-settle dependency (or alternative) for all
> cases, best to just make that explicit and consistent, and isolated in one
> place.  I'm not really seeing a downside to that.  Then, focus efforts on
> refining a replacement.  If the subtle dependencies are spread around then
> it's hard to extract and improve the unwanted parts.

So then we can concentrate on minimizing the set of devices we need to
"settle". Right now, I don't see technical obstacles to implementing this,
but the separation of systemd-udev-settle.service into pieces would be a
first step for this to perform better, I think.

The only problematic part here might be systemd/udev's aversion to
anything involving "udev-settle". But I think if we're able to show
them the difference and advantage of using the settle here by presenting
real numbers and analysis, there might be a chance. The separation of
udev-settle into pieces would surely help too, because we could isolate
the settle to devices we're only interested in.

We can surely discuss that with them. Would be great if we find consensus
here on all sides, I don't want to fight with or completely ignore
systemd/udev guys and their opinions...

-- 
Peter


* Re: [linux-lvm] Discussion: performance issue on event activation mode
  2021-10-01  7:41                                 ` Martin Wilck
@ 2021-10-01  8:08                                   ` Peter Rajnoha
  0 siblings, 0 replies; 77+ messages in thread
From: Peter Rajnoha @ 2021-10-01  8:08 UTC (permalink / raw)
  To: Martin Wilck; +Cc: bmarzins, zkabelac, teigland, linux-lvm

On Fri 01 Oct 2021 07:41, Martin Wilck wrote:
> On Thu, 2021-09-30 at 23:32 +0800, heming.zhao@suse.com wrote:
> > > I just want to say that some of the issues might simply be
> > > regressions/issues with systemd/udev that could be fixed. We, as
> > > providers of block device abstractions that sometimes need to handle
> > > thousands of devices, might be the first ones to hit these
> > > issues.
> > > 
> > 
> > The rhel8 callgrind picture
> > (https://prajnoha.fedorapeople.org/bz1986158/rhel8_libudev_critical_cost.png
> > )
> > corresponds to my analysis:
> > https://listman.redhat.com/archives/linux-lvm/2021-June/msg00022.html
> > handle_db_line took too much time and became the hotspot.
> 
> I missed that post. You wrote
> 
> > the dev_cache_scan doesn't have direct disk IOs, but libudev will
> > scan/read udev db which issues real disk IOs (location is /run/udev/data).
> > ...
> > 2. scans/reads udev db (/run/udev/data). may O(n)
> >  udev will call device_read_db => handle_db_line to handle every
> >    line of a db file.
> > ...
> > I didn't test the related udev code, and guess the <2> takes too much
> > time.
> 
> ... but note that /run/udev is on tmpfs, not on a real disk. So the
> accesses should be very fast unless there's some locking happening.

Yes, indeed! I think this is a regression.

The results/graphs show that lots of time is spent on some internal
hashmap handling. I don't see this in older versions of udev like the
one bundled with systemd v219 (I compared RHEL7 and 8, haven't done
detailed bisection yet). My suspicion is that some of the code in udev
got more shared with native systemd code, like that hash usage, so this
might be the clue, but someone from systemd/udev should look more closely
into this.

-- 
Peter


end of thread, other threads:[~2021-10-01  8:09 UTC | newest]

Thread overview: 77+ messages
2021-06-06  6:15 [linux-lvm] Discussion: performance issue on event activation mode heming.zhao
2021-06-06 16:35 ` Roger Heflin
2021-06-07 10:27   ` Martin Wilck
2021-06-07 15:30     ` heming.zhao
2021-06-07 15:45       ` Martin Wilck
2021-06-07 20:52       ` Roger Heflin
2021-06-07 21:30     ` David Teigland
2021-06-08  8:26       ` Martin Wilck
2021-06-08 15:39         ` David Teigland
2021-06-08 15:47           ` Martin Wilck
2021-06-08 16:02             ` Zdenek Kabelac
2021-06-08 16:05               ` Martin Wilck
2021-06-08 16:03             ` David Teigland
2021-06-08 16:07               ` Martin Wilck
2021-06-15 17:03           ` David Teigland
2021-06-15 18:21             ` Zdenek Kabelac
2021-06-16 16:18             ` heming.zhao
2021-06-16 16:38               ` David Teigland
2021-06-17  3:46                 ` heming.zhao
2021-06-17 15:27                   ` David Teigland
2021-06-08 16:49         ` heming.zhao
2021-06-08 16:18       ` heming.zhao
2021-06-09  4:01         ` heming.zhao
2021-06-09  5:37           ` Heming Zhao
2021-06-09 18:59             ` David Teigland
2021-06-10 17:23               ` heming.zhao
2021-06-07 15:48 ` Martin Wilck
2021-06-07 16:31   ` Zdenek Kabelac
2021-06-07 21:48   ` David Teigland
2021-06-08 12:29     ` Peter Rajnoha
2021-06-08 13:23       ` Martin Wilck
2021-06-08 13:41         ` Peter Rajnoha
2021-06-08 13:46           ` Zdenek Kabelac
2021-06-08 13:56             ` Peter Rajnoha
2021-06-08 14:23               ` Zdenek Kabelac
2021-06-08 14:48               ` Martin Wilck
2021-06-08 15:19                 ` Peter Rajnoha
2021-06-08 15:39                   ` Martin Wilck
2021-09-09 19:44         ` David Teigland
2021-09-10 17:38           ` Martin Wilck
2021-09-12 16:51             ` heming.zhao
2021-09-27 10:00           ` Peter Rajnoha
2021-09-27 15:38             ` David Teigland
2021-09-28  6:34               ` Martin Wilck
2021-09-28 14:42                 ` David Teigland
2021-09-28 15:16                   ` Martin Wilck
2021-09-28 15:31                     ` Martin Wilck
2021-09-28 15:56                     ` David Teigland
2021-09-28 18:03                       ` Benjamin Marzinski
2021-09-28 17:42                     ` Benjamin Marzinski
2021-09-28 19:15                       ` Martin Wilck
2021-09-29 22:06                       ` Peter Rajnoha
2021-09-30  7:51                         ` Martin Wilck
2021-09-30  8:07                           ` heming.zhao
2021-09-30  9:31                             ` Martin Wilck
2021-09-30 11:41                             ` Peter Rajnoha
2021-09-30 15:32                               ` heming.zhao
2021-10-01  7:41                                 ` Martin Wilck
2021-10-01  8:08                                   ` Peter Rajnoha
2021-09-30 11:29                           ` Peter Rajnoha
2021-09-30 16:04                             ` David Teigland
2021-09-30 14:41                           ` Benjamin Marzinski
2021-10-01  7:42                             ` Martin Wilck
2021-09-29 21:53                 ` Peter Rajnoha
2021-09-30  7:45                   ` Martin Wilck
2021-09-29 21:39               ` Peter Rajnoha
2021-09-30  7:22                 ` Martin Wilck
2021-09-30 14:26                   ` David Teigland
2021-09-30 15:55                 ` David Teigland
2021-10-01  8:00                   ` Peter Rajnoha
2021-06-07 16:40 ` David Teigland
2021-07-02 21:09 ` David Teigland
2021-07-02 21:22   ` Martin Wilck
2021-07-02 22:02     ` David Teigland
2021-07-03 11:49       ` heming.zhao
2021-07-08 10:10         ` Tom Yan
2021-07-02 21:31   ` Tom Yan
