* [dm-devel] [Question] multipathd add/remove paths takes a long time
@ 2022-07-19 12:13 Wu Guanghao
  2022-07-19 16:55 ` Roger Heflin
  2022-07-21 14:34 ` Benjamin Marzinski
  0 siblings, 2 replies; 5+ messages in thread
From: Wu Guanghao @ 2022-07-19 12:13 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, mwilck, dm-devel
  Cc: lixiaokeng, liuxing108, zhangying134, chenmao2, linfeilong,
	liuzhiqiang26

The system has 1K multipath devices, and each device has 16 paths.
Adding or removing a path, whether with multipathd add/multipathd
remove or through uev_add_path()/uev_remove_path(), takes over 20s,
which is too long. What's more, the next checkerloop pass may start
immediately after the previous one finishes.

We found that time was mostly spent waiting for locks.

checkerloop(){
	...
	lock(&vecs->lock);
	vector_foreach_slot (vecs->pathvec, pp, i) {
		rc = check_path(...); // with many paths, this takes a long time
		...
	}
	lock_cleanup_pop(vecs->lock);
	...
}

Can the scope of vecs->lock be narrowed to reduce the time spent
waiting when adding/removing paths?


* Re: [dm-devel] [Question] multipathd add/remove paths takes a long time
  2022-07-19 12:13 [dm-devel] [Question] multipathd add/remove paths takes a long time Wu Guanghao
@ 2022-07-19 16:55 ` Roger Heflin
  2022-07-21 11:53   ` Wu Guanghao
  2022-07-21 14:34 ` Benjamin Marzinski
  1 sibling, 1 reply; 5+ messages in thread
From: Roger Heflin @ 2022-07-19 16:55 UTC (permalink / raw)
  To: Wu Guanghao
  Cc: lixiaokeng, liuxing108, zhangying134, chenmao2, liuzhiqiang (I),
	linfeilong, device-mapper development, Christophe Varoqui,
	Martin Wilck

What does the cpu time look like when you are seeing this issue?

I have seen cases where large numbers of scsi devices coming in and
multipaths getting built cause the system to waste time.  With a high
number of udev children (I believe the default is pretty high), udev
can use excessive cpu on a big machine with a lot of paths and
appears to interfere with itself.

In testing I was involved in, we found that setting the number of
udev children to 4 produced consistently fast behavior, whereas
leaving it at the default (lots of workers on large machines; the
exact number varies with machine size/distribution/udev version)
sometimes produced systemd timeouts when paths were brought in
(>90 seconds to find the PVs for required LVs).

The giveaway was that udev accumulated 50-90 minutes of cpu time
within a couple of minutes of boot-up with the default number of
children, but with it set to only 4 the paths were processed faster,
the machine booted faster, and udev did the same real work with much
less cpu time (2-3 minutes).

This is the option:
/usr/lib/systemd/systemd-udevd --children-max=4
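
For reference, a couple of other ways to set it (a sketch only; the
exact option spellings depend on the udev version, so double-check
udevadm(8) and udev.conf(5) on your distribution):

# change it at runtime on a recent udev
udevadm control --children-max=4

# or persistently in /etc/udev/udev.conf
children_max=4

# or on the kernel command line (rd.udev.children_max in the initrd)
udev.children_max=4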

On Tue, Jul 19, 2022 at 7:33 AM Wu Guanghao <wuguanghao3@huawei.com> wrote:
>
> The system has 1K multipath devices, and each device has 16 paths.
> Adding or removing a path, whether with multipathd add/multipathd
> remove or through uev_add_path()/uev_remove_path(), takes over 20s,
> which is too long. What's more, the next checkerloop pass may start
> immediately after the previous one finishes.
>
> We found that time was mostly spent waiting for locks.
>
> checkerloop(){
>         ...
>         lock(&vecs->lock);
>         vector_foreach_slot (vecs->pathvec, pp, i) {
>                 rc = check_path(...); // with many paths, this takes a long time
>                 ...
>         }
>         lock_cleanup_pop(vecs->lock);
>         ...
> }
>
> Can the scope of vecs->lock be narrowed to reduce the time spent
> waiting when adding/removing paths?


* Re: [dm-devel] [Question] multipathd add/remove paths takes a long time
  2022-07-19 16:55 ` Roger Heflin
@ 2022-07-21 11:53   ` Wu Guanghao
  0 siblings, 0 replies; 5+ messages in thread
From: Wu Guanghao @ 2022-07-21 11:53 UTC (permalink / raw)
  To: Roger Heflin
  Cc: lixiaokeng, liuxing108, zhangying134, chenmao2, liuzhiqiang (I),
	linfeilong, device-mapper development, Christophe Varoqui,
	Martin Wilck



On 2022/7/20 0:55, Roger Heflin wrote:
> What does the cpu time look like when you are seeing this issue?
> 
> I have seen cases where large numbers of scsi devices coming in and
> multipaths getting built cause the system to waste time.  With a
> high number of udev children (I believe the default is pretty high),
> udev can use excessive cpu on a big machine with a lot of paths and
> appears to interfere with itself.

Our problem may be a little different: a large number of multipath
devices have already been created, and we are only adding a single
multipath device, so there shouldn't be a lot of udev events.

> In testing I was involved in, we found that setting the number of
> udev children to 4 produced consistently fast behavior, whereas
> leaving it at the default (lots of workers on large machines; the
> exact number varies with machine size/distribution/udev version)
> sometimes produced systemd timeouts when paths were brought in
> (>90 seconds to find the PVs for required LVs).
> 
> The giveaway was that udev accumulated 50-90 minutes of cpu time
> within a couple of minutes of boot-up with the default number of
> children, but with it set to only 4 the paths were processed faster,
> the machine booted faster, and udev did the same real work with much
> less cpu time (2-3 minutes).
> 
> This is the option:
> /usr/lib/systemd/systemd-udevd --children-max=4

We modified this as you suggested, but it doesn't help much in our case.

> On Tue, Jul 19, 2022 at 7:33 AM Wu Guanghao <wuguanghao3@huawei.com> wrote:
>>
>> The system has 1K multipath devices, and each device has 16 paths.
>> Adding or removing a path, whether with multipathd add/multipathd
>> remove or through uev_add_path()/uev_remove_path(), takes over 20s,
>> which is too long. What's more, the next checkerloop pass may start
>> immediately after the previous one finishes.
>>
>> We found that time was mostly spent waiting for locks.
>>
>> checkerloop(){
>>         ...
>>         lock(&vecs->lock);
>>         vector_foreach_slot (vecs->pathvec, pp, i) {
>>                 rc = check_path(...); // with many paths, this takes a long time
>>                 ...
>>         }
>>         lock_cleanup_pop(vecs->lock);
>>         ...
>> }
>>
>> Can the scope of vecs->lock be narrowed to reduce the time spent
>> waiting when adding/removing paths?

In our test environment, it takes over 40s for checkerloop() to check
all paths, and vecs->lock is not released during that time. So if we
issue commands to add or remove paths while the checker is running,
in the worst case we wait more than 40s to acquire vecs->lock.
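
One way to see the stall from outside the daemon (a rough check,
assuming a multipathd new enough to accept commands directly on its
command line): a client command such as "show paths" has to take the
same vecs->lock, so its latency tracks the checker pass.

# run while the checker is mid-pass; most of the time is the lock wait
time multipathd show paths > /dev/null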


* Re: [dm-devel] [Question] multipathd add/remove paths takes a long time
  2022-07-19 12:13 [dm-devel] [Question] multipathd add/remove paths takes a long time Wu Guanghao
  2022-07-19 16:55 ` Roger Heflin
@ 2022-07-21 14:34 ` Benjamin Marzinski
  2022-07-22  6:28   ` Wu Guanghao
  1 sibling, 1 reply; 5+ messages in thread
From: Benjamin Marzinski @ 2022-07-21 14:34 UTC (permalink / raw)
  To: Wu Guanghao
  Cc: lixiaokeng, liuxing108, zhangying134, chenmao2, liuzhiqiang26,
	linfeilong, dm-devel, Christophe Varoqui, mwilck

On Tue, Jul 19, 2022 at 08:13:39PM +0800, Wu Guanghao wrote:
> The system has 1K multipath devices, and each device has 16 paths.
> Adding or removing a path, whether with multipathd add/multipathd
> remove or through uev_add_path()/uev_remove_path(), takes over 20s,
> which is too long. What's more, the next checkerloop pass may start
> immediately after the previous one finishes.
> 
> We found that time was mostly spent waiting for locks.
> 
> checkerloop(){
> 	...
> 	lock(&vecs->lock);
> 	vector_foreach_slot (vecs->pathvec, pp, i) {
> 		rc = check_path(...); // with many paths, this takes a long time
> 		...
> 	}
> 	lock_cleanup_pop(vecs->lock);
> 	...
> }
> 
> Can the scope of vecs->lock be narrowed to reduce the time spent
> waiting when adding/removing paths?

As long as we make sure not to skip any paths or double-check any
paths, we don't need to hold the vecs->lock between checking paths.
There is certainly some optimization that could be done here.
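
Roughly, something along these lines (a standalone, simplified sketch
of the idea only -- not multipathd code; the pathtab/check_one/
checker_tick names are invented, and the real check_path() would also
need to revalidate the path under the re-taken lock):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATHS 64

struct path_entry {
	char name[16];
	int checked;			/* already checked this tick? */
};

struct pathtab {
	pthread_mutex_t lock;
	int count;
	struct path_entry paths[MAX_PATHS];
};

static void check_one(const char *name)
{
	/* stand-in for check_path(): may block on I/O for a while */
	printf("checked %s\n", name);
}

static void checker_tick(struct pathtab *t)
{
	int i;

	pthread_mutex_lock(&t->lock);
	for (i = 0; i < t->count; i++)
		t->paths[i].checked = 0;

	for (;;) {
		char name[16];
		int found = -1;

		/* find the first path not yet checked this tick */
		for (i = 0; i < t->count; i++) {
			if (!t->paths[i].checked) {
				found = i;
				break;
			}
		}
		if (found < 0)
			break;	/* every surviving path checked exactly once */

		t->paths[found].checked = 1;
		memcpy(name, t->paths[found].name, sizeof(name));

		/* drop the lock: waiting add/remove commands can run
		 * between two path checks instead of after all 16K */
		pthread_mutex_unlock(&t->lock);
		check_one(name);
		pthread_mutex_lock(&t->lock);
		/* 'found' may be stale now; the rescan re-derives it,
		 * so removed paths are not touched and newly added
		 * paths (checked == 0) are picked up, with no path
		 * skipped or checked twice */
	}
	pthread_mutex_unlock(&t->lock);
}

int main(void)
{
	struct pathtab t = { .lock = PTHREAD_MUTEX_INITIALIZER };

	for (t.count = 0; t.count < 4; t.count++)
		snprintf(t.paths[t.count].name,
			 sizeof(t.paths[t.count].name), "sd%c", 'a' + t.count);
	checker_tick(&t);
	return 0;
}

The linear rescan is O(n^2) per tick; real code could keep a cursor
or a per-tick generation number instead, but the invariant is the
same: decide "has this path been checked this tick?" from state
stored on the path itself, not from a vector index held across the
unlock.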

could you post the output of:

# multipath -l <sample_multipath_device>
# multipathd show config local

-Ben

* Re: [dm-devel] [Question] multipathd add/remove paths takes a long time
  2022-07-21 14:34 ` Benjamin Marzinski
@ 2022-07-22  6:28   ` Wu Guanghao
  0 siblings, 0 replies; 5+ messages in thread
From: Wu Guanghao @ 2022-07-22  6:28 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: lixiaokeng, liuxing108, zhangying134, chenmao2, liuzhiqiang26,
	linfeilong, dm-devel, Christophe Varoqui, mwilck



On 2022/7/21 22:34, Benjamin Marzinski wrote:
> On Tue, Jul 19, 2022 at 08:13:39PM +0800, Wu Guanghao wrote:
>> The system has 1K multipath devices, and each device has 16 paths.
>> Adding or removing a path, whether with multipathd add/multipathd
>> remove or through uev_add_path()/uev_remove_path(), takes over 20s,
>> which is too long. What's more, the next checkerloop pass may start
>> immediately after the previous one finishes.
>>
>> We found that time was mostly spent waiting for locks.
>>
>> checkerloop(){
>> 	...
>> 	lock(&vecs->lock);
>> 	vector_foreach_slot (vecs->pathvec, pp, i) {
>> 		rc = check_path(...); // with many paths, this takes a long time
>> 		...
>> 	}
>> 	lock_cleanup_pop(vecs->lock);
>> 	...
>> }
>>
>> Can the scope of vecs->lock be narrowed to reduce the time spent
>> waiting when adding/removing paths?
> 
> As long as we make sure not to skip any paths or double-check any
> paths, we don't need to hold the vecs->lock between checking paths.
> There is certainly some optimization that could be done here.
> 
> could you post the output of:
> 
> # multipath -l <sample_multipath_device>
> # multipathd show config local
> 

This is the output and timing of 'multipath -l':

# time multipath -l 364cf55b10097699e01a9c4d4000003c4
174181.329435 | loading /lib64/multipath/libchecktur.so checker
364cf55b10097699e01a9c4d4000003c4 dm-604 HUAWEI,XSG1
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 27:0:0:591  sdlgf 70:8368   active undef running
  |- 28:0:0:591  sdlhs 129:8224  active undef running
  |- 39:0:0:591  sdlst 131:8464  active undef running
  |- 24:0:0:591  sdmdv 133:8720  active undef running
  |- 40:0:0:591  sdmej 133:8944  active undef running
  |- 29:0:0:591  sdmes 134:8832  active undef running
  |- 37:0:0:591  sdmhl 66:9200   active undef running
  |- 38:0:0:591  sdmju 70:9152   active undef running
  |- 26:0:0:591  sdmjx 70:9200   active undef running
  |- 41:0:0:591  sdnre 133:9728  active undef running
  |- 43:0:0:591  sdnrz 134:9808  active undef running
  `- 42:0:0:591  sdnsp 135:9808  active undef running

real    0m37.570s
user    0m28.676s
sys     0m8.112s

This is the output of 'multipathd show config local':

timeout
defaults {
	verbosity 2
	polling_interval 5
	max_polling_interval 20
	reassign_maps "no"
	multipath_dir "/lib64/multipath"
	path_selector "service-time 0"
	path_grouping_policy "group_by_prio"
	uid_attribute "ID_SERIAL"
	prio "const"
	prio_args ""
	features "0"
	path_checker "tur"
	alias_prefix "mpath"
	failback "immediate"
	rr_min_io 1000
	rr_min_io_rq 1
	max_fds "max"
	rr_weight "uniform"
	no_path_retry 18
	queue_without_daemon "no"
	flush_on_last_del "no"
	user_friendly_names "no"
	fast_io_fail_tmo 5
	bindings_file "/etc/multipath/bindings"
	wwids_file "/etc/multipath/wwids"
	prkeys_file "/etc/multipath/prkeys"
	log_checker_err once
	reservation_key file
	all_tg_pt "no"
	retain_attached_hw_handler "yes"
	detect_prio "yes"
	detect_checker "yes"
	force_sync "no"
	strict_timing "no"
	deferred_remove "yes"
	config_dir "/etc/multipath/conf.d"
	delay_watch_checks "no"
	delay_wait_checks "no"
	san_path_err_threshold "no"
	san_path_err_forget_rate "no"
	san_path_err_recovery_time "no"
	marginal_path_err_sample_time "no"
	marginal_path_err_rate_threshold "no"
	marginal_path_err_recheck_gap_time "no"
	marginal_path_double_failed_time "no"
	find_multipaths "off"
	uxsock_timeout 4000
	retrigger_tries 3
	retrigger_delay 10
	missing_uev_wait_timeout 30
	skip_kpartx "no"
	disable_changed_wwids ignored
	remove_retries 0
	ghost_delay "no"
	find_multipaths_timeout -10
	enable_foreign ""
	marginal_pathgroups "no"
}
blacklist {
	devnode "^(ram|zram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]"
	devnode "^(td|hd|vd)[a-z]"
	device {
		vendor "SGI"
		product "Universal Xport"
	}
	device {
		vendor "^DGC"
		product "LUNZ"
	}
	device {
		vendor "EMC"
		product "LUNZ"
	}
	device {
		vendor "DELL"
		product "Universal Xport"
	}
	device {
		vendor "IBM"
		product "Universal Xport"
	}
	device {
		vendor "LENOVO"
		product "Universal Xport"
	}
	device {
		vendor "(NETAPP|LSI|ENGENIO)"
		product "Universal Xport"
	}
	device {
		vendor "STK"
		product "Universal Xport"
	}
	device {
		vendor "SUN"
		product "Universal Xport"
	}
	device {
		vendor "(Intel|INTEL)"
		product "VTrak V-LUN"
	}
	device {
		vendor "Promise"
		product "VTrak V-LUN"
	}
	device {
		vendor "Promise"
		product "Vess V-LUN"
	}
	device {
		vendor "IBM"
		product "S/390.*"
	}
}
blacklist_exceptions {
}
devices {
	device {
		vendor "HUAWEI"
		product "XSG1"
		path_grouping_policy "group_by_prio"
		prio "alua"
	}
}
overrides {
}
multipaths {
	multipath {
		wwid "364cf55b10097699e0197320300000174"
	}
	multipath {
		wwid "364cf55b10097699e018d9d9500000171"
	}
	...
}
> -Ben


end of thread [~2022-07-22  6:29 UTC]

Thread overview: 5+ messages
2022-07-19 12:13 [dm-devel] [Question] multipathd add/remove paths takes a long time Wu Guanghao
2022-07-19 16:55 ` Roger Heflin
2022-07-21 11:53   ` Wu Guanghao
2022-07-21 14:34 ` Benjamin Marzinski
2022-07-22  6:28   ` Wu Guanghao
