This might be a simpler way to control the number of udev worker threads running at the same time.
On large machines (CPU-wise, memory-wise, and disk-wise), I have only seen LVM time out when udev's children setting is left at the default. The default seems to be set wrong: it appears tuned for a case where a large number of the machine's disks are going to time out (or otherwise be really, really slow), so a huge number of worker threads is required to cover them. With the default on a close-to-100-core machine, udev accumulated about 87 minutes of CPU time during boot (an elapsed window of about 2 minutes). Setting the number of children to 4 cut that to around 2-3 minutes of CPU time in the same window, and actually produced a much faster and much more reliable boot (no timeouts). We hit these timeouts on a number of the larger machines (70 cores or more) before we debugged it and determined what was going on. It appears that the udev workers on giant machines with a lot of disks overwhelm each other in some sort of tight loop (either process-creation system time or some other resource constraint), causing contention and doing very little useful work.
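For reference, here is a sketch of the usual places the worker cap can be lowered. Option names vary by udev/systemd version, so treat these as assumptions to verify against your distribution's documentation rather than exact settings:

```shell
# Runtime change (takes effect immediately, not persistent across reboots):
udevadm control --children-max=4

# Persistent, via the kernel command line (read by systemd-udevd at boot):
#   udev.children_max=4

# Persistent, via /etc/udev/udev.conf on versions that support it:
#   children_max=4
```

The runtime `udevadm control` form is handy for experimenting with different values before committing one to the boot configuration.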
Just an observation, and this may have nothing to do with what you have going on, but what you are describing sounds very close to what I debugged. We ran "ps axuwwS | grep -i udev" just after boot to see how much CPU time udev had accumulated during boot, and found that as we lowered the number of children, that time dropped, the boot got faster, and the timeouts stopped. Since udev was racking up 90 minutes of CPU time in an elapsed window of around 120 seconds, it had to be running a significant number of workers during boot. I believe these same udev workers are what invoke the pvscans.
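To turn that ps output into a single number, the TIME column (field 10 in standard `ps aux`-style output, MM:SS) can be summed across the udev workers. A sketch, using fabricated sample lines in place of real `ps axuwwS | grep -i udev` output:

```shell
#!/bin/sh
# Sum the TIME column (field 10, MM:SS) across matching ps lines.
sum_udev_time() {
  awk '{ split($10, t, ":"); total += t[1] * 60 + t[2] }
       END { printf "%d seconds\n", total }'
}

# Hypothetical sample lines standing in for real ps output:
printf '%s\n' \
  'root 612 3.0 0.0 47000 3900 ? S 10:01 43:10 /usr/lib/systemd/systemd-udevd' \
  'root 613 2.9 0.0 47000 3800 ? S 10:01 44:05 /usr/lib/systemd/systemd-udevd' \
  | sum_udev_time
# -> 5235 seconds
```

Comparing that total across boots with different children settings makes the effect easy to see.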
Below is one case, but I know there are several other similar cases for other distributions. Note the default number of workers = 8 + number_of_cpus * 64, which is going to be a disaster: it results in either one worker per disk/LUN being started at the same time, or the maximum number of workers, and either of those produces a high degree of unproductive system contention on a machine with a significant number of LUNs.
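To make the scaling concrete, here is what that formula produces at a few core counts (just evaluating the expression from the post, not any particular distribution's actual code):

```shell
#!/bin/sh
# workers = 8 + number_of_cpus * 64, per the default described above.
for cpus in 4 16 70 96; do
  echo "$cpus cpus -> $((8 + cpus * 64)) max udev workers"
done
# 4 cpus   ->  264 max udev workers
# 70 cpus  -> 4488 max udev workers
# 96 cpus  -> 6152 max udev workers
```

Several thousand workers on a 70+ core machine is far more concurrency than the storage can usefully absorb, which matches the contention behavior described above.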