The case we had was large physical machines with around 1000 disks. We did not see the issue on physicals with fewer CPUs/disks or on VMs. It seemed like both high CPU counts and high disk counts were needed, but in our environment those two usually go together. The smallest machines that had the issue had 72 threads (36 actual cores). The disk devices were all SSD SAN LUNs, so I would expect all of the devices to respond to and return IO requests in under 0.3 ms under normal conditions. They were also all partitioned and multipathed. 90% of the disks would not have had any LVM on them at all, but would have at least been initially scanned by something; however, the systemd LVM units were what was timing out, and based on the time udev was racking up (timeouts in the 90-120 second range, roughly 90 minutes of time overall), it very much seemed to be having serious CPU time issues doing something.
I have done some simple tests forking off a bunch of /usr/sbin/lvm pvscan --cache major:minor processes in the background, in parallel and rapidly, and cannot get it to really act badly except with counts greater than 20000.
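For reference, a minimal sketch of that kind of fork stress test. /bin/true stands in for the actual pvscan invocation so the script is runnable anywhere; on a real system you would substitute the lvm command with a real major:minor, as noted in the comment:

```shell
#!/bin/sh
# Fork N short-lived processes in parallel to stress fork/exec and scheduler
# behavior. This is a sketch; /bin/true is a stand-in for something like:
#   /usr/sbin/lvm pvscan --cache "$MAJOR:$MINOR" &
N=2000   # push this toward 20000 to approximate the load described above
i=0
while [ "$i" -lt "$N" ]; do
    /bin/true &        # stand-in for the real pvscan invocation
    i=$((i + 1))
done
wait                   # block until every background child has exited
echo "forked $N processes"
```

On a large machine, watching total CPU time (e.g. via time or ps) while this runs gives a rough sense of the per-fork overhead.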
And if I am reading the direct case correctly, the pvscan that is fast differs mainly in that it does not spawn off lots of processes and events; it just does the pvscan once. Between udev and systemd I am not clear on how many different events have to be handled, and how many of those events need to spawn new threads and/or fork new processes. Something doing one or both of those two things would seem to have been the cause of the issue I saw in the past.
When it has difficulty booting up like this, what does ps axuS | grep udev look like, time-wise?