LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
* missing doorbell interrupt when onlining cpu
@ 2019-09-04 22:51 Nathan Lynch
  2019-09-04 23:18 ` Nathan Lynch
  0 siblings, 1 reply; 3+ messages in thread
From: Nathan Lynch @ 2019-09-04 22:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gautham R Shenoy, Nicholas Piggin

I'm hoping for some help investigating a behavior I see when doing cpu
hotplug under load on P9 and P8 LPARs. Occasionally, while coming online
a cpu will seem to get "stuck" in idle, with a pending doorbell
interrupt unserviced (cpu 12 here):

cpuhp/12-70    [012] 46133.602202: cpuhp_enter:          cpu: 0012 target: 205 step: 174 (0xc000000000028920s)
 load.sh-8201  [014] 46133.602248: sched_waking:         comm=cpuhp/12 pid=70 prio=120 target_cpu=012
 load.sh-8201  [014] 46133.602251: smp_send_reschedule:  (c000000000052868) cpu=12
  <idle>-0     [012] 46133.602252: do_idle:              (c000000000162e08)
 load.sh-8201  [014] 46133.602252: smp_muxed_ipi_message_pass: (c0000000000527e8) cpu=12 msg=1
 load.sh-8201  [014] 46133.602253: doorbell_core_ipi:    (c00000000004d3e8) cpu=12
  <idle>-0     [012] 46133.602257: arch_cpu_idle:        (c000000000022d08)
  <idle>-0     [012] 46133.602259: pseries_lpar_idle:    (c0000000000d43c8)

This leaves the task initiating the online blocked in a state like this:

[<0>] __switch_to+0x2dc/0x430
[<0>] __cpuhp_kick_ap+0x78/0xa0
[<0>] cpuhp_kick_ap+0x60/0xf0
[<0>] cpuhp_invoke_callback+0xf4/0x780
[<0>] _cpu_up+0x138/0x260
[<0>] do_cpu_up+0x130/0x160
[<0>] cpu_subsys_online+0x68/0xe0
[<0>] device_online+0xb4/0x120
[<0>] online_store+0xb4/0xc0
[<0>] dev_attr_store+0x3c/0x60
[<0>] sysfs_kf_write+0x70/0xb0
[<0>] kernfs_fop_write+0x17c/0x250
[<0>] __vfs_write+0x40/0x80
[<0>] vfs_write+0xd4/0x250
[<0>] ksys_write+0x74/0x130
[<0>] system_call+0x5c/0x70

This trace is from a 5.2.10 kernel, and I've observed the problem on a
4.12 vendor kernel as well.

The issue always occurs before the cpu has completed all the cpuhp
callbacks that need to run on that cpu. Often it occurs before it even
runs a task (rcu_sched, migration, or cpuhp kthreads are the first to
run). But sometimes it will have run a task or two, as in this case.

It seems specific to doorbell i.e. intra-core IPIs; I have not observed
IPIs between cores getting dropped.

sysrq-l gets the newly onlined cpu unstuck.

The cpu can get in this state even after servicing doorbells earlier in
the online process.

This is using the default cede offline state, not stop-self (which I
haven't tried).

Ideas?

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-04 22:51 missing doorbell interrupt when onlining cpu Nathan Lynch
2019-09-04 23:18 ` Nathan Lynch
2019-09-10 23:04   ` Nathan Lynch

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org linuxppc-dev@archiver.kernel.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox