* Question: handling early hotplug interrupts
@ 2017-08-29 20:43 Daniel Henrique Barboza
  2017-08-29 21:55 ` Benjamin Herrenschmidt
  2017-08-30  5:59 ` Michael Ellerman
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Henrique Barboza @ 2017-08-29 20:43 UTC (permalink / raw)
  To: linuxppc-dev

Hi,

This is a scenario I've been facing when working on early device 
hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn 
the guest of the event; the kernel then fetches it by calling 
'check_exception' and handles it. If the hotplug is done too early 
(before SLOF, for example), the pulse is ignored and the hotplug event 
is left unprocessed in the events queue.

One solution would be to pulse the hotplug queue interrupt after CAS, 
when we are sure that the hotplug queue is negotiated. However, this 
panics the kernel with a sig 11 'kernel access of bad area' oops, which 
suggests that the kernel wasn't quite ready to handle it.

In my experiments using upstream 4.13 I saw that there is a 'safe time' 
to pulse the queue, sometime after CAS and before mounting the root fs, 
but I wasn't able to pinpoint it. From QEMU's perspective, the last hcall 
done (an h_set_mode) is still too early to pulse it and the kernel 
panics. Looking at the kernel source I saw that the IRQ handling is 
initiated quite early in the init process.

So my question (ok, actually 2 questions):

- Is my analysis correct? Is there an unsafe time to fire an IRQ pulse 
before CAS that can break the kernel, or am I overlooking/doing something 
wrong?
- Is there a reliable way to know when the kernel can safely handle the 
hotplug interrupt?


Thanks,


Daniel


* Re: Question: handling early hotplug interrupts
  2017-08-29 20:43 Question: handling early hotplug interrupts Daniel Henrique Barboza
@ 2017-08-29 21:55 ` Benjamin Herrenschmidt
  2017-08-29 23:53   ` Daniel Henrique Barboza
  2017-08-30  5:59 ` Michael Ellerman
  1 sibling, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-29 21:55 UTC (permalink / raw)
  To: Daniel Henrique Barboza, linuxppc-dev; +Cc: Nathan Fontenot

On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is a scenario I've been facing when working on early device 
> hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn 
> the guest of the event; the kernel then fetches it by calling 
> 'check_exception' and handles it. If the hotplug is done too early 
> (before SLOF, for example), the pulse is ignored and the hotplug event 
> is left unprocessed in the events queue.
> 
> One solution would be to pulse the hotplug queue interrupt after CAS, 
> when we are sure that the hotplug queue is negotiated. However, this 
> panics the kernel with a sig 11 'kernel access of bad area' oops, which 
> suggests that the kernel wasn't quite ready to handle it.

That's not right. This is a bug that needs fixing. The interrupt should
be masked anyway but still.

Tell us more about the crash (backtrace etc.); this definitely needs
fixing.

> In my experiments using upstream 4.13 I saw that there is a 'safe time' 
> to pulse the queue, sometime after CAS and before mounting the root fs, 
> but I wasn't able to pinpoint it. From QEMU's perspective, the last hcall 
> done (an h_set_mode) is still too early to pulse it and the kernel 
> panics. Looking at the kernel source I saw that the IRQ handling is 
> initiated quite early in the init process.
> 
> So my question (ok, actually 2 questions):
> 
> - Is my analysis correct? Is there an unsafe time to fire an IRQ pulse 
> before CAS that can break the kernel, or am I overlooking/doing something 
> wrong?
> - Is there a reliable way to know when the kernel can safely handle the 
> hotplug interrupt?

So I don't think that's the right approach. Virtual interrupts are
edge-sensitive and we will potentially lose them if they occur early.
I think what needs to happen is:

 - Fix whatever's causing the above crash

and

 - The hotplug code should check for pending events (check_exception?)
at boot time to enqueue whatever's there. It needs to do that after
unmasking the interrupt, and in a way that is protected from races with
said interrupt.

Cheers,
Ben.
 

> 
> Thanks,
> 
> 
> Daniel


* Re: Question: handling early hotplug interrupts
  2017-08-29 21:55 ` Benjamin Herrenschmidt
@ 2017-08-29 23:53   ` Daniel Henrique Barboza
  2017-08-30  6:09     ` Michael Ellerman
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Henrique Barboza @ 2017-08-29 23:53 UTC (permalink / raw)
  To: benh, linuxppc-dev; +Cc: Nathan Fontenot, David Gibson

Hi Ben,

On 08/29/2017 06:55 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
>> Hi,
>>
>> This is a scenario I've been facing when working on early device
>> hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn
>> the guest of the event; the kernel then fetches it by calling
>> 'check_exception' and handles it. If the hotplug is done too early
>> (before SLOF, for example), the pulse is ignored and the hotplug event
>> is left unprocessed in the events queue.
>>
>> One solution would be to pulse the hotplug queue interrupt after CAS,
>> when we are sure that the hotplug queue is negotiated. However, this
>> panics the kernel with a sig 11 'kernel access of bad area' oops, which
>> suggests that the kernel wasn't quite ready to handle it.
> That's not right. This is a bug that needs fixing. The interrupt should
> be masked anyway but still.
>
> Tell us more about the crash (backtrace etc.); this definitely needs
> fixing.

This is the backtrace using a 4.13.0-rc3 guest:

---------
[    0.008913] Unable to handle kernel paging request for data at address 0x00000100
[    0.008989] Faulting instruction address: 0xc00000000012c318
[    0.009046] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.009092] SMP NR_CPUS=1024
[    0.009092] NUMA
[    0.009128] pSeries
[    0.009173] Modules linked in:
[    0.009210] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3+ #1
[    0.009268] task: c0000000feb02580 task.stack: c0000000fe108000
[    0.009325] NIP: c00000000012c318 LR: c00000000012c9c4 CTR: 0000000000000000
[    0.009394] REGS: c0000000fffef910 TRAP: 0380   Not tainted (4.13.0-rc3+)
[    0.009450] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>
[    0.009454]   CR: 28000822  XER: 20000000
[    0.009554] CFAR: c00000000012c9c0 SOFTE: 0
[    0.009554] GPR00: c00000000012c9c4 c0000000fffefb90 c00000000141f100 0000000000000400
[    0.009554] GPR04: 0000000000000000 c0000000fe1851c0 0000000000000000 00000000fee60000
[    0.009554] GPR08: 0000000fffffffe1 0000000000000000 0000000000000001 0000000002001001
[    0.009554] GPR12: 0000000000000040 c00000000fd80000 c00000000000db58 0000000000000000
[    0.009554] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.009554] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[    0.009554] GPR24: 0000000000000002 0000000000000013 c0000000fe14bc00 0000000000000400
[    0.009554] GPR28: 0000000000000400 0000000000000000 c0000000fe1851c0 0000000000000001
[    0.010121] NIP [c00000000012c318] __queue_work+0x48/0x640
[    0.010168] LR [c00000000012c9c4] queue_work_on+0xb4/0xf0
[    0.010213] Call Trace:
[    0.010239] [c0000000fffefb90] [c00000000000db58] kernel_init+0x8/0x160 (unreliable)
[    0.010308] [c0000000fffefc70] [c00000000012c9c4] queue_work_on+0xb4/0xf0
[    0.010368] [c0000000fffefcb0] [c0000000000c4608] queue_hotplug_event+0xd8/0x150
[    0.010435] [c0000000fffefd00] [c0000000000c30d0] ras_hotplug_interrupt+0x140/0x190
[    0.010505] [c0000000fffefd90] [c00000000018c8b0] __handle_irq_event_percpu+0x90/0x310
[    0.010573] [c0000000fffefe50] [c00000000018cb6c] handle_irq_event_percpu+0x3c/0x90
[    0.010642] [c0000000fffefe90] [c00000000018cc24] handle_irq_event+0x64/0xc0
[    0.010710] [c0000000fffefec0] [c0000000001928b0] handle_fasteoi_irq+0xc0/0x230
[    0.010779] [c0000000fffefef0] [c00000000018ae14] generic_handle_irq+0x54/0x80
[    0.010847] [c0000000fffeff20] [c0000000000189f0] __do_irq+0x90/0x210
[    0.010904] [c0000000fffeff90] [c00000000002e730] call_do_irq+0x14/0x24
[    0.010961] [c0000000fe10b640] [c000000000018c10] do_IRQ+0xa0/0x130
[    0.011021] [c0000000fe10b6a0] [c000000000008c58] hardware_interrupt_common+0x158/0x160
[    0.011090] --- interrupt: 501 at __replay_interrupt+0x38/0x3c
[    0.011090]     LR = arch_local_irq_restore+0x74/0x90
[    0.011179] [c0000000fe10b990] [c0000000fe10b9e0] 0xc0000000fe10b9e0 (unreliable)
[    0.011249] [c0000000fe10b9b0] [c000000000b967fc] _raw_spin_unlock_irqrestore+0x4c/0xb0
[    0.011316] [c0000000fe10b9e0] [c00000000018ff50] __setup_irq+0x630/0x9e0
[    0.011374] [c0000000fe10ba90] [c00000000019054c] request_threaded_irq+0x13c/0x250
[    0.011441] [c0000000fe10baf0] [c0000000000c2cd0] request_event_sources_irqs+0x100/0x180
[    0.011511] [c0000000fe10bc10] [c000000000eceda8] __machine_initcall_pseries_init_ras_IRQ+0xc4/0x12c
[    0.011591] [c0000000fe10bc40] [c00000000000d8c8] do_one_initcall+0x68/0x1e0
[    0.011659] [c0000000fe10bd00] [c000000000eb4484] kernel_init_freeable+0x284/0x370
[    0.011725] [c0000000fe10bdc0] [c00000000000db7c] kernel_init+0x2c/0x160
[    0.011782] [c0000000fe10be30] [c00000000000bc9c] ret_from_kernel_thread+0x5c/0xc0
[    0.011848] Instruction dump:
[    0.011885] fbc1fff0 f8010010 f821ff21 7c7c1b78 7c9d2378 7cbe2b78 787b0020 60000000
[    0.011955] 60000000 892d028a 2fa90000 409e04bc <813d0100> 75290001 408204c0 3d2061c8
[    0.012026] ---[ end trace e0b4d36daf3f8b2a ]---
[    0.013850]
[    2.013962] Kernel panic - not syncing: Fatal exception in interrupt
-------------

To reproduce it, what I did was to fire a pulse in the hotplug queue 
right after CAS by hacking QEMU code.

However, this can also be reproduced without changing QEMU by simply 
hotplugging a CPU/LMB after CAS using device_add.


[adding dgibson in CC in case he wants to comment]


Thanks,


Daniel

>
>> In my experiments using upstream 4.13 I saw that there is a 'safe time'
>> to pulse the queue, sometime after CAS and before mounting the root fs,
>> but I wasn't able to pinpoint it. From QEMU's perspective, the last hcall
>> done (an h_set_mode) is still too early to pulse it and the kernel
>> panics. Looking at the kernel source I saw that the IRQ handling is
>> initiated quite early in the init process.
>>
>> So my question (ok, actually 2 questions):
>>
>> - Is my analysis correct? Is there an unsafe time to fire an IRQ pulse
>> before CAS that can break the kernel, or am I overlooking/doing something
>> wrong?
>> - Is there a reliable way to know when the kernel can safely handle the
>> hotplug interrupt?
> So I don't think that's the right approach. Virtual interrupts are
> edge-sensitive and we will potentially lose them if they occur early.
> I think what needs to happen is:
>
>   - Fix whatever's causing the above crash
>
> and
>
>   - The hotplug code should check for pending events (check_exception ?)
> at boot time to enqueue whatever's there. It needs to do that after
> unmasking the interrupt and in a way that is protected from races with
> said interrupt.
>
> Cheers,
> Ben.
>
>
>> Thanks,
>>
>>
>> Daniel


* Re: Question: handling early hotplug interrupts
  2017-08-29 20:43 Question: handling early hotplug interrupts Daniel Henrique Barboza
  2017-08-29 21:55 ` Benjamin Herrenschmidt
@ 2017-08-30  5:59 ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2017-08-30  5:59 UTC (permalink / raw)
  To: Daniel Henrique Barboza, linuxppc-dev

Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> writes:

> Hi,
>
> This is a scenario I've been facing when working on early device 
> hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn 
> the guest of the event; the kernel then fetches it by calling 
> 'check_exception' and handles it. If the hotplug is done too early 
> (before SLOF, for example), the pulse is ignored and the hotplug event 
> is left unprocessed in the events queue.
>
> One solution would be to pulse the hotplug queue interrupt after CAS, 
> when we are sure that the hotplug queue is negotiated. However, this 
> panics the kernel with a sig 11 'kernel access of bad area' oops, which 
> suggests that the kernel wasn't quite ready to handle it.
>
> In my experiments using upstream 4.13 I saw that there is a 'safe time' 
> to pulse the queue, sometime after CAS and before mounting the root fs, 
> but I wasn't able to pinpoint it. From QEMU's perspective, the last hcall 
> done (an h_set_mode) is still too early to pulse it and the kernel 
> panics. Looking at the kernel source I saw that the IRQ handling is 
> initiated quite early in the init process.
>
> So my question (ok, actually 2 questions):
>
> - Is my analysis correct? Is there an unsafe time to fire an IRQ pulse 
> before CAS that can break the kernel, or am I overlooking/doing something 
> wrong?
> - Is there a reliable way to know when the kernel can safely handle the 
> hotplug interrupt?

In addition to Ben's comments, you need to think about this differently.

The operating system you're booting may not be Linux.

Whatever Qemu does needs to make sense without reference to the exact
details or ordering of the Linux code. Qemu needs to provide a mechanism
that any operating system could use, and then we can make it work with
Linux.

cheers


* Re: Question: handling early hotplug interrupts
  2017-08-29 23:53   ` Daniel Henrique Barboza
@ 2017-08-30  6:09     ` Michael Ellerman
  2017-08-30 14:37       ` Nathan Fontenot
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2017-08-30  6:09 UTC (permalink / raw)
  To: Daniel Henrique Barboza, benh, linuxppc-dev; +Cc: Nathan Fontenot, David Gibson

Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> writes:

> Hi Ben,
>
> On 08/29/2017 06:55 PM, Benjamin Herrenschmidt wrote:
>> On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
>>> Hi,
>>>
>>> This is a scenario I've been facing when working on early device
>>> hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn
>>> the guest of the event; the kernel then fetches it by calling
>>> 'check_exception' and handles it. If the hotplug is done too early
>>> (before SLOF, for example), the pulse is ignored and the hotplug event
>>> is left unprocessed in the events queue.
>>>
>>> One solution would be to pulse the hotplug queue interrupt after CAS,
>>> when we are sure that the hotplug queue is negotiated. However, this
>>> panics the kernel with a sig 11 'kernel access of bad area' oops, which
>>> suggests that the kernel wasn't quite ready to handle it.
>> That's not right. This is a bug that needs fixing. The interrupt should
>> be masked anyway but still.
>>
>> Tell us more about the crash (backtrace etc.); this definitely needs
>> fixing.
>
> This is the backtrace using a 4.13.0-rc3 guest:
>
> ---------
> [    0.008913] Unable to handle kernel paging request for data at address 0x00000100
> [    0.008989] Faulting instruction address: 0xc00000000012c318
> [    0.009046] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.009092] SMP NR_CPUS=1024
> [    0.009092] NUMA
> [    0.009128] pSeries
> [    0.009173] Modules linked in:
> [    0.009210] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3+ #1
> [    0.009268] task: c0000000feb02580 task.stack: c0000000fe108000
> [    0.009325] NIP: c00000000012c318 LR: c00000000012c9c4 CTR: 0000000000000000
> [    0.009394] REGS: c0000000fffef910 TRAP: 0380   Not tainted (4.13.0-rc3+)
> [    0.009450] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>
> [    0.009454]   CR: 28000822  XER: 20000000
> [    0.009554] CFAR: c00000000012c9c0 SOFTE: 0
> [    0.009554] GPR00: c00000000012c9c4 c0000000fffefb90 c00000000141f100 0000000000000400
> [    0.009554] GPR04: 0000000000000000 c0000000fe1851c0 0000000000000000 00000000fee60000
> [    0.009554] GPR08: 0000000fffffffe1 0000000000000000 0000000000000001 0000000002001001
> [    0.009554] GPR12: 0000000000000040 c00000000fd80000 c00000000000db58 0000000000000000
> [    0.009554] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.009554] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> [    0.009554] GPR24: 0000000000000002 0000000000000013 c0000000fe14bc00 0000000000000400
> [    0.009554] GPR28: 0000000000000400 0000000000000000 c0000000fe1851c0 0000000000000001
> [    0.010121] NIP [c00000000012c318] __queue_work+0x48/0x640
> [    0.010168] LR [c00000000012c9c4] queue_work_on+0xb4/0xf0
> [    0.010213] Call Trace:
> [    0.010239] [c0000000fffefb90] [c00000000000db58] kernel_init+0x8/0x160 (unreliable)
> [    0.010308] [c0000000fffefc70] [c00000000012c9c4] queue_work_on+0xb4/0xf0
> [    0.010368] [c0000000fffefcb0] [c0000000000c4608] queue_hotplug_event+0xd8/0x150
> [    0.010435] [c0000000fffefd00] [c0000000000c30d0] ras_hotplug_interrupt+0x140/0x190
> [    0.010505] [c0000000fffefd90] [c00000000018c8b0] __handle_irq_event_percpu+0x90/0x310
> [    0.010573] [c0000000fffefe50] [c00000000018cb6c] handle_irq_event_percpu+0x3c/0x90
> [    0.010642] [c0000000fffefe90] [c00000000018cc24] handle_irq_event+0x64/0xc0
> [    0.010710] [c0000000fffefec0] [c0000000001928b0] handle_fasteoi_irq+0xc0/0x230
> [    0.010779] [c0000000fffefef0] [c00000000018ae14] generic_handle_irq+0x54/0x80
> [    0.010847] [c0000000fffeff20] [c0000000000189f0] __do_irq+0x90/0x210
> [    0.010904] [c0000000fffeff90] [c00000000002e730] call_do_irq+0x14/0x24
> [    0.010961] [c0000000fe10b640] [c000000000018c10] do_IRQ+0xa0/0x130
> [    0.011021] [c0000000fe10b6a0] [c000000000008c58] hardware_interrupt_common+0x158/0x160
> [    0.011090] --- interrupt: 501 at __replay_interrupt+0x38/0x3c
> [    0.011090]     LR = arch_local_irq_restore+0x74/0x90
> [    0.011179] [c0000000fe10b990] [c0000000fe10b9e0] 0xc0000000fe10b9e0 (unreliable)
> [    0.011249] [c0000000fe10b9b0] [c000000000b967fc] _raw_spin_unlock_irqrestore+0x4c/0xb0
> [    0.011316] [c0000000fe10b9e0] [c00000000018ff50] __setup_irq+0x630/0x9e0
> [    0.011374] [c0000000fe10ba90] [c00000000019054c] request_threaded_irq+0x13c/0x250
> [    0.011441] [c0000000fe10baf0] [c0000000000c2cd0] request_event_sources_irqs+0x100/0x180
> [    0.011511] [c0000000fe10bc10] [c000000000eceda8] __machine_initcall_pseries_init_ras_IRQ+0xc4/0x12c
> [    0.011591] [c0000000fe10bc40] [c00000000000d8c8] do_one_initcall+0x68/0x1e0
> [    0.011659] [c0000000fe10bd00] [c000000000eb4484] kernel_init_freeable+0x284/0x370
> [    0.011725] [c0000000fe10bdc0] [c00000000000db7c] kernel_init+0x2c/0x160
> [    0.011782] [c0000000fe10be30] [c00000000000bc9c] ret_from_kernel_thread+0x5c/0xc0
> [    0.011848] Instruction dump:
> [    0.011885] fbc1fff0 f8010010 f821ff21 7c7c1b78 7c9d2378 7cbe2b78 787b0020 60000000
> [    0.011955] 60000000 892d028a 2fa90000 409e04bc <813d0100> 75290001 408204c0 3d2061c8
> [    0.012026] ---[ end trace e0b4d36daf3f8b2a ]---
> [    0.013850]
> [    2.013962] Kernel panic - not syncing: Fatal exception in interrupt
> -------------
>
> To reproduce it, what I did was to fire a pulse in the hotplug queue 
> right after CAS by hacking QEMU code.

That's not right after CAS, that's much later.

It appears the interrupt has been queued and we've taken it immediately
on the first unmask of interrupts after registering the
ras_hotplug_interrupt() IRQ.

That happens at subsys initcall time.

ras_hotplug_interrupt() calls queue_hotplug_event() which does:

	queue_work(pseries_hp_wq, (struct work_struct *)work);

Where pseries_hp_wq is initialised in:

  static int __init pseries_dlpar_init(void)
  {
  	pseries_hp_wq = alloc_workqueue("pseries hotplug workqueue",
  					WQ_UNBOUND, 1);
  	return sysfs_create_file(kernel_kobj, &class_attr_dlpar.attr);
  }
  machine_device_initcall(pseries, pseries_dlpar_init);


The ordering of subsys vs device init call is:

  #define subsys_initcall(fn)		__define_initcall(fn, 4)
  #define fs_initcall(fn)			__define_initcall(fn, 5)
  #define device_initcall(fn)		__define_initcall(fn, 6)


So this is simply a case of the init calls being out of order.

We either need to create the pseries_hp_wq earlier, or register the
event sources IRQs later. I'm not sure which is better.

cheers


* Re: Question: handling early hotplug interrupts
  2017-08-30  6:09     ` Michael Ellerman
@ 2017-08-30 14:37       ` Nathan Fontenot
  2017-08-31  9:53         ` Michael Ellerman
  0 siblings, 1 reply; 7+ messages in thread
From: Nathan Fontenot @ 2017-08-30 14:37 UTC (permalink / raw)
  To: Michael Ellerman, Daniel Henrique Barboza, benh, linuxppc-dev
  Cc: David Gibson

On 08/30/2017 01:09 AM, Michael Ellerman wrote:
> Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> writes:
> 
>> Hi Ben,
>>
>> On 08/29/2017 06:55 PM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
>>>> Hi,
>>>>
>>>> This is a scenario I've been facing when working on early device
>>>> hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn
>>>> the guest of the event; the kernel then fetches it by calling
>>>> 'check_exception' and handles it. If the hotplug is done too early
>>>> (before SLOF, for example), the pulse is ignored and the hotplug event
>>>> is left unprocessed in the events queue.
>>>>
>>>> One solution would be to pulse the hotplug queue interrupt after CAS,
>>>> when we are sure that the hotplug queue is negotiated. However, this
>>>> panics the kernel with a sig 11 'kernel access of bad area' oops, which
>>>> suggests that the kernel wasn't quite ready to handle it.
>>> That's not right. This is a bug that needs fixing. The interrupt should
>>> be masked anyway but still.
>>>
>>> Tell us more about the crash (backtrace etc.); this definitely needs
>>> fixing.
>>
>> This is the backtrace using a 4.13.0-rc3 guest:
>>
>> ---------
>> [    0.008913] Unable to handle kernel paging request for data at address 0x00000100
>> [    0.008989] Faulting instruction address: 0xc00000000012c318
>> [    0.009046] Oops: Kernel access of bad area, sig: 11 [#1]
>> [    0.009092] SMP NR_CPUS=1024
>> [    0.009092] NUMA
>> [    0.009128] pSeries
>> [    0.009173] Modules linked in:
>> [    0.009210] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3+ #1
>> [    0.009268] task: c0000000feb02580 task.stack: c0000000fe108000
>> [    0.009325] NIP: c00000000012c318 LR: c00000000012c9c4 CTR: 0000000000000000
>> [    0.009394] REGS: c0000000fffef910 TRAP: 0380   Not tainted (4.13.0-rc3+)
>> [    0.009450] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>
>> [    0.009454]   CR: 28000822  XER: 20000000
>> [    0.009554] CFAR: c00000000012c9c0 SOFTE: 0
>> [    0.009554] GPR00: c00000000012c9c4 c0000000fffefb90 c00000000141f100 0000000000000400
>> [    0.009554] GPR04: 0000000000000000 c0000000fe1851c0 0000000000000000 00000000fee60000
>> [    0.009554] GPR08: 0000000fffffffe1 0000000000000000 0000000000000001 0000000002001001
>> [    0.009554] GPR12: 0000000000000040 c00000000fd80000 c00000000000db58 0000000000000000
>> [    0.009554] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [    0.009554] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
>> [    0.009554] GPR24: 0000000000000002 0000000000000013 c0000000fe14bc00 0000000000000400
>> [    0.009554] GPR28: 0000000000000400 0000000000000000 c0000000fe1851c0 0000000000000001
>> [    0.010121] NIP [c00000000012c318] __queue_work+0x48/0x640
>> [    0.010168] LR [c00000000012c9c4] queue_work_on+0xb4/0xf0
>> [    0.010213] Call Trace:
>> [    0.010239] [c0000000fffefb90] [c00000000000db58] kernel_init+0x8/0x160 (unreliable)
>> [    0.010308] [c0000000fffefc70] [c00000000012c9c4] queue_work_on+0xb4/0xf0
>> [    0.010368] [c0000000fffefcb0] [c0000000000c4608] queue_hotplug_event+0xd8/0x150
>> [    0.010435] [c0000000fffefd00] [c0000000000c30d0] ras_hotplug_interrupt+0x140/0x190
>> [    0.010505] [c0000000fffefd90] [c00000000018c8b0] __handle_irq_event_percpu+0x90/0x310
>> [    0.010573] [c0000000fffefe50] [c00000000018cb6c] handle_irq_event_percpu+0x3c/0x90
>> [    0.010642] [c0000000fffefe90] [c00000000018cc24] handle_irq_event+0x64/0xc0
>> [    0.010710] [c0000000fffefec0] [c0000000001928b0] handle_fasteoi_irq+0xc0/0x230
>> [    0.010779] [c0000000fffefef0] [c00000000018ae14] generic_handle_irq+0x54/0x80
>> [    0.010847] [c0000000fffeff20] [c0000000000189f0] __do_irq+0x90/0x210
>> [    0.010904] [c0000000fffeff90] [c00000000002e730] call_do_irq+0x14/0x24
>> [    0.010961] [c0000000fe10b640] [c000000000018c10] do_IRQ+0xa0/0x130
>> [    0.011021] [c0000000fe10b6a0] [c000000000008c58] hardware_interrupt_common+0x158/0x160
>> [    0.011090] --- interrupt: 501 at __replay_interrupt+0x38/0x3c
>> [    0.011090]     LR = arch_local_irq_restore+0x74/0x90
>> [    0.011179] [c0000000fe10b990] [c0000000fe10b9e0] 0xc0000000fe10b9e0 (unreliable)
>> [    0.011249] [c0000000fe10b9b0] [c000000000b967fc] _raw_spin_unlock_irqrestore+0x4c/0xb0
>> [    0.011316] [c0000000fe10b9e0] [c00000000018ff50] __setup_irq+0x630/0x9e0
>> [    0.011374] [c0000000fe10ba90] [c00000000019054c] request_threaded_irq+0x13c/0x250
>> [    0.011441] [c0000000fe10baf0] [c0000000000c2cd0] request_event_sources_irqs+0x100/0x180
>> [    0.011511] [c0000000fe10bc10] [c000000000eceda8] __machine_initcall_pseries_init_ras_IRQ+0xc4/0x12c
>> [    0.011591] [c0000000fe10bc40] [c00000000000d8c8] do_one_initcall+0x68/0x1e0
>> [    0.011659] [c0000000fe10bd00] [c000000000eb4484] kernel_init_freeable+0x284/0x370
>> [    0.011725] [c0000000fe10bdc0] [c00000000000db7c] kernel_init+0x2c/0x160
>> [    0.011782] [c0000000fe10be30] [c00000000000bc9c] ret_from_kernel_thread+0x5c/0xc0
>> [    0.011848] Instruction dump:
>> [    0.011885] fbc1fff0 f8010010 f821ff21 7c7c1b78 7c9d2378 7cbe2b78 787b0020 60000000
>> [    0.011955] 60000000 892d028a 2fa90000 409e04bc <813d0100> 75290001 408204c0 3d2061c8
>> [    0.012026] ---[ end trace e0b4d36daf3f8b2a ]---
>> [    0.013850]
>> [    2.013962] Kernel panic - not syncing: Fatal exception in interrupt
>> -------------
>>
>> To reproduce it, what I did was to fire a pulse in the hotplug queue 
>> right after CAS by hacking QEMU code.
> 
> That's not right after CAS, that's much later.
> 
> It appears the interrupt has been queued and we've taken it immediately
> on the first unmask of interrupts after registering the
> ras_hotplug_interrupt() IRQ.
> 
> That happens at subsys initcall time.
> 
> ras_hotplug_interrupt() calls queue_hotplug_event() which does:
> 
> 	queue_work(pseries_hp_wq, (struct work_struct *)work);
> 
> Where pseries_hp_wq is initialised in:
> 
>   static int __init pseries_dlpar_init(void)
>   {
>   	pseries_hp_wq = alloc_workqueue("pseries hotplug workqueue",
>   					WQ_UNBOUND, 1);
>   	return sysfs_create_file(kernel_kobj, &class_attr_dlpar.attr);
>   }
>   machine_device_initcall(pseries, pseries_dlpar_init);
> 
> 
> The ordering of subsys vs device init call is:
> 
>   #define subsys_initcall(fn)		__define_initcall(fn, 4)
>   #define fs_initcall(fn)			__define_initcall(fn, 5)
>   #define device_initcall(fn)		__define_initcall(fn, 6)
> 
> 
> So this is simply a case of the init calls being out of order.
> 
> We either need to create the pseries_hp_wq earlier, or register the
> event sources IRQs later. I'm not sure which is better.

Perhaps I'm erring on the side of caution, but I think registering
the IRQs later would be better. I think this would give the kernel
more time to come up and better handle a hotplug request.

-Nathan 
> 
> cheers
> 


* Re: Question: handling early hotplug interrupts
  2017-08-30 14:37       ` Nathan Fontenot
@ 2017-08-31  9:53         ` Michael Ellerman
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2017-08-31  9:53 UTC (permalink / raw)
  To: Nathan Fontenot, Daniel Henrique Barboza, benh, linuxppc-dev; +Cc: David Gibson

Nathan Fontenot <nfont@linux.vnet.ibm.com> writes:
> On 08/30/2017 01:09 AM, Michael Ellerman wrote:
...
>> 
>> So this is simply a case of the init calls being out of order.
>> 
>> We either need to create the pseries_hp_wq earlier, or register the
>> event sources IRQs later. I'm not sure which is better.
>
> Perhaps I'm erring on the side of caution, but I think registering
> the IRQs later would be better. I think this would give the kernel
> more time to come up and better handle a hotplug request.

Yeah I agree, unless there's some other reason why we must register
those IRQs early, but I don't know of one.

cheers


