All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine.
@ 2016-10-12  7:58 Lan Tianyu
  2016-10-12  7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu
  2016-10-12  7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu
  0 siblings, 2 replies; 9+ messages in thread
From: Lan Tianyu @ 2016-10-12  7:58 UTC (permalink / raw)
  To: xen-devel, xen-devel
  Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, jbeulich

This patchset is to fix triggering NMI watchdog during dump timer info
on the huge machine with a mount of physical cpus. Detail please see
change log of Patch 1.

Previous discussion:
https://patchwork.kernel.org/patch/9328449/

Change since V1:
        Add "async" param for handle_keypress() to identify
whether run nonirq keyhandler in tasklet or not. This is to
avoid processing debugkey sysctl asynchronously.


Lan Tianyu (2):
  Xen/Keyhandler: Rework process of nonirq keyhandler
  Xen/timer: Process softirq during dumping timer info

 xen/common/keyhandler.c      |    8 +++++---
 xen/common/sysctl.c          |    2 +-
 xen/common/timer.c           |    1 +
 xen/drivers/char/console.c   |    2 +-
 xen/include/xen/keyhandler.h |    4 +++-
 5 files changed, 11 insertions(+), 6 deletions(-)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler
  2016-10-12  7:58 [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine Lan Tianyu
@ 2016-10-12  7:58 ` Lan Tianyu
  2016-10-12 13:19   ` Jan Beulich
  2016-10-12  7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu
  1 sibling, 1 reply; 9+ messages in thread
From: Lan Tianyu @ 2016-10-12  7:58 UTC (permalink / raw)
  To: xen-devel, xen-devel
  Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, jbeulich


Keyhandler may run for a long time in serial port driver's
timer handler on the large machine with a lot of physical
cpus(e,g dump_timerq()) when serial port driver works in
the poll mode(via the exception mechanism).

If a timer handler runs a long time, it will block nmi_timer_fn()
to feed NMI watchdog and cause Xen hypervisor panic. Inserting
process_pending_softirqs() in timer handler will not help. when timer
interrupt arrives, timer subsystem calls all expired timer handlers
before programming next timer interrupt. There is no timer interrupt
arriving to trigger timer softirq during run a timer handler.

This patch is to fix the issue to make nonirq keyhandler run in
tasklet when receive debug key from serial port.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/common/keyhandler.c      |    8 +++++---
 xen/common/sysctl.c          |    2 +-
 xen/drivers/char/console.c   |    2 +-
 xen/include/xen/keyhandler.h |    4 +++-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 16de6e8..3d50041 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -75,19 +75,21 @@ static struct keyhandler {
 
 static void keypress_action(unsigned long unused)
 {
-    handle_keypress(keypress_key, NULL);
+    console_start_log_everything();
+    key_table[keypress_key].fn(keypress_key);
+    console_end_log_everything();
 }
 
 static DECLARE_TASKLET(keypress_tasklet, keypress_action, 0);
 
-void handle_keypress(unsigned char key, struct cpu_user_regs *regs)
+void handle_keypress(unsigned char key, struct cpu_user_regs *regs, bool async)
 {
     struct keyhandler *h;
 
     if ( key >= ARRAY_SIZE(key_table) || !(h = &key_table[key])->fn )
         return;
 
-    if ( !in_irq() || h->irq_callback )
+    if ( h->irq_callback || !async )
     {
         console_start_log_everything();
         h->irq_callback ? h->irq_fn(key, regs) : h->fn(key);
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 8aea6ef..1eb7bad 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -136,7 +136,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         {
             if ( copy_from_guest_offset(&c, op->u.debug_keys.keys, i, 1) )
                 goto out;
-            handle_keypress(c, guest_cpu_user_regs());
+            handle_keypress(c, guest_cpu_user_regs(), false);
         }
         ret = 0;
         copyback = 0;
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 55ae31a..184b523 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -347,7 +347,7 @@ static void switch_serial_input(void)
 static void __serial_rx(char c, struct cpu_user_regs *regs)
 {
     if ( xen_rx )
-        return handle_keypress(c, regs);
+        return handle_keypress(c, regs, true);
 
     /* Deliver input to guest buffer, unless it is already full. */
     if ( (serial_rx_prod-serial_rx_cons) != SERIAL_RX_SIZE )
diff --git a/xen/include/xen/keyhandler.h b/xen/include/xen/keyhandler.h
index 06c05c8..e9595bd 100644
--- a/xen/include/xen/keyhandler.h
+++ b/xen/include/xen/keyhandler.h
@@ -46,7 +46,9 @@ void register_irq_keyhandler(unsigned char key,
                              bool_t diagnostic);
 
 /* Inject a keypress into the key-handling subsystem. */
-extern void handle_keypress(unsigned char key, struct cpu_user_regs *regs);
+extern void handle_keypress(unsigned char key,
+			    struct cpu_user_regs *regs,
+			    bool async);
 
 /* Scratch space is available for use of any keyhandler. */
 extern char keyhandler_scratch[1024];
-- 
1.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info
  2016-10-12  7:58 [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine Lan Tianyu
  2016-10-12  7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu
@ 2016-10-12  7:58 ` Lan Tianyu
  2016-10-21 17:27   ` Wei Liu
  1 sibling, 1 reply; 9+ messages in thread
From: Lan Tianyu @ 2016-10-12  7:58 UTC (permalink / raw)
  To: xen-devel, xen-devel
  Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, jbeulich

Dumping timer info may run for a long time on the huge machine with
a lot of physical cpus. To avoid triggering NMI watchdog, add
process_pending_softirqs() in the loop of dumping timer info.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/common/timer.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/xen/common/timer.c b/xen/common/timer.c
index 29a60a9..ab6bca0 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -530,6 +530,7 @@ static void dump_timerq(unsigned char key)
     {
         ts = &per_cpu(timers, i);
 
+        process_pending_softirqs();
         printk("CPU%02d:\n", i);
         spin_lock_irqsave(&ts->lock, flags);
         for ( j = 1; j <= GET_HEAP_SIZE(ts->heap); j++ )
-- 
1.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler
  2016-10-12  7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu
@ 2016-10-12 13:19   ` Jan Beulich
  2016-10-12 14:30     ` Lan, Tianyu
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-12 13:19 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 12.10.16 at 09:58, <tianyu.lan@intel.com> wrote:
> --- a/xen/drivers/char/console.c
> +++ b/xen/drivers/char/console.c
> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>  {
>      if ( xen_rx )
> -        return handle_keypress(c, regs);
> +        return handle_keypress(c, regs, true);

I think it would be nice to pass true here only when in polling mode,
unless you know or can deduce that the a similar problem also exists
in IRQ mode. Perhaps you could simply move the !in_irq() here? (Of
course the new function parameter would then want to be renamed.)

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler
  2016-10-12 13:19   ` Jan Beulich
@ 2016-10-12 14:30     ` Lan, Tianyu
  2016-10-12 16:03       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Lan, Tianyu @ 2016-10-12 14:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 10/12/2016 9:19 PM, Jan Beulich wrote:
>>>> On 12.10.16 at 09:58, <tianyu.lan@intel.com> wrote:
>> --- a/xen/drivers/char/console.c
>> +++ b/xen/drivers/char/console.c
>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>  {
>>      if ( xen_rx )
>> -        return handle_keypress(c, regs);
>> +        return handle_keypress(c, regs, true);
>
> I think it would be nice to pass true here only when in polling mode,
> unless you know or can deduce that the a similar problem also exists
> in IRQ mode. Perhaps you could simply move the !in_irq() here?

That's a good idea. Thanks.

>(Of course the new function parameter would then want to be renamed.)

Since the issue happens when handle_keypress() runs in a timer handler,
how about to name new parameter "intimer"? __serial_rx() is called in a 
timer handler or interrupt handler. Or do you have other suggestion?

>
> Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler
  2016-10-12 14:30     ` Lan, Tianyu
@ 2016-10-12 16:03       ` Jan Beulich
  2016-10-13  1:15         ` Lan Tianyu
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-12 16:03 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 12.10.16 at 16:30, <tianyu.lan@intel.com> wrote:

> 
> On 10/12/2016 9:19 PM, Jan Beulich wrote:
>>>>> On 12.10.16 at 09:58, <tianyu.lan@intel.com> wrote:
>>> --- a/xen/drivers/char/console.c
>>> +++ b/xen/drivers/char/console.c
>>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>>  {
>>>      if ( xen_rx )
>>> -        return handle_keypress(c, regs);
>>> +        return handle_keypress(c, regs, true);
>>
>> I think it would be nice to pass true here only when in polling mode,
>> unless you know or can deduce that the a similar problem also exists
>> in IRQ mode. Perhaps you could simply move the !in_irq() here?
> 
> That's a good idea. Thanks.
> 
>>(Of course the new function parameter would then want to be renamed.)
> 
> Since the issue happens when handle_keypress() runs in a timer handler,
> how about to name new parameter "intimer"? __serial_rx() is called in a 
> timer handler or interrupt handler. Or do you have other suggestion?

I think "intimer" can be confusing (to be mixed up with timer interrupt).
How about "force_tasklet"?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler
  2016-10-12 16:03       ` Jan Beulich
@ 2016-10-13  1:15         ` Lan Tianyu
  0 siblings, 0 replies; 9+ messages in thread
From: Lan Tianyu @ 2016-10-13  1:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 2016年10月13日 00:03, Jan Beulich wrote:
>>>> On 12.10.16 at 16:30, <tianyu.lan@intel.com> wrote:
>>
>> Since the issue happens when handle_keypress() runs in a timer handler,
>> how about to name new parameter "intimer"? __serial_rx() is called in a 
>> timer handler or interrupt handler. Or do you have other suggestion?
> 
> I think "intimer" can be confusing (to be mixed up with timer interrupt).
> How about "force_tasklet"?

OK. I will update.
-- 
Best regards
Tianyu Lan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info
  2016-10-12  7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu
@ 2016-10-21 17:27   ` Wei Liu
  2016-10-22  3:52     ` Lan, Tianyu
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Liu @ 2016-10-21 17:27 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich, xen-devel

On Wed, Oct 12, 2016 at 03:58:24PM +0800, Lan Tianyu wrote:
> Dumping timer info may run for a long time on the huge machine with
> a lot of physical cpus. To avoid triggering NMI watchdog, add
> process_pending_softirqs() in the loop of dumping timer info.
> 
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  xen/common/timer.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/xen/common/timer.c b/xen/common/timer.c
> index 29a60a9..ab6bca0 100644
> --- a/xen/common/timer.c
> +++ b/xen/common/timer.c
> @@ -530,6 +530,7 @@ static void dump_timerq(unsigned char key)
>      {
>          ts = &per_cpu(timers, i);
>  
> +        process_pending_softirqs();

This is causing issues in ARM (x86 has a similar issue):

Oct 20 01:43:31.410010 (XEN) Xen call trace:
Oct 20 01:43:31.410048 (XEN)    [<00233920>] process_pending_softirqs+0x34/0x5c (PC)
Oct 20 01:43:31.417990 (XEN)    [<00237c6c>] timer.c#dump_timerq+0x9c/0x1fc (LR)
Oct 20 01:43:31.418030 (XEN)    [<00218658>] handle_keypress+0xc0/0xf4
Oct 20 01:43:31.426001 (XEN)    [<002490c8>] console.c#__serial_rx+0x4c/0x9c
Oct 20 01:43:31.433970 (XEN)    [<00249b74>] console.c#serial_rx+0xcc/0xe4
Oct 20 01:43:31.434007 (XEN)    [<0024b6ec>] serial_rx_interrupt+0xcc/0xf8
Oct 20 01:43:31.441964 (XEN)    [<0024ae54>] exynos4210-uart.c#exynos4210_uart_interrupt+0xf8/0x160
Oct 20 01:43:31.450001 (XEN)    [<00256338>] do_IRQ+0x1a0/0x228
Oct 20 01:43:31.450040 (XEN)    [<00254074>] gic_interrupt+0x58/0xfc
Oct 20 01:43:31.457985 (XEN)    [<00260f98>] do_trap_irq+0x24/0x38
Oct 20 01:43:31.458022 (XEN)    [<00264970>] entry.o#return_from_trap+0/0x4
Oct 20 01:43:31.466010 (XEN)    [<0030a240>] 0030a240
Oct 20 01:43:31.466044 (XEN) 
Oct 20 01:43:31.466066 (XEN) 
Oct 20 01:43:31.466099 (XEN) ****************************************
Oct 20 01:43:31.473998 (XEN) Panic on CPU 0:
Oct 20 01:43:31.474029 (XEN) Assertion '!in_irq() && local_irq_is_enabled()' failed at softirq.c:57
Oct 20 01:43:31.481982 (XEN) ****************************************

See
http://logs.test-lab.xenproject.org/osstest/logs/101571/test-armhf-armhf-libvirt/serial-arndale-bluewater.log

I've reverted this patch in staging.

Wei.

>          printk("CPU%02d:\n", i);
>          spin_lock_irqsave(&ts->lock, flags);
>          for ( j = 1; j <= GET_HEAP_SIZE(ts->heap); j++ )
> -- 
> 1.7.1
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info
  2016-10-21 17:27   ` Wei Liu
@ 2016-10-22  3:52     ` Lan, Tianyu
  0 siblings, 0 replies; 9+ messages in thread
From: Lan, Tianyu @ 2016-10-22  3:52 UTC (permalink / raw)
  To: Wei Liu
  Cc: sstabellini, George.Dunlap, andrew.cooper3, ian.jackson,
	xen-devel, jbeulich



On 10/22/2016 1:27 AM, Wei Liu wrote:
> On Wed, Oct 12, 2016 at 03:58:24PM +0800, Lan Tianyu wrote:
>> Dumping timer info may run for a long time on the huge machine with
>> a lot of physical cpus. To avoid triggering NMI watchdog, add
>> process_pending_softirqs() in the loop of dumping timer info.
>>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  xen/common/timer.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/xen/common/timer.c b/xen/common/timer.c
>> index 29a60a9..ab6bca0 100644
>> --- a/xen/common/timer.c
>> +++ b/xen/common/timer.c
>> @@ -530,6 +530,7 @@ static void dump_timerq(unsigned char key)
>>      {
>>          ts = &per_cpu(timers, i);
>>
>> +        process_pending_softirqs();
>
> This is causing issues in ARM (x86 has a similar issue):
>
> Oct 20 01:43:31.410010 (XEN) Xen call trace:
> Oct 20 01:43:31.410048 (XEN)    [<00233920>] process_pending_softirqs+0x34/0x5c (PC)
> Oct 20 01:43:31.417990 (XEN)    [<00237c6c>] timer.c#dump_timerq+0x9c/0x1fc (LR)
> Oct 20 01:43:31.418030 (XEN)    [<00218658>] handle_keypress+0xc0/0xf4
> Oct 20 01:43:31.426001 (XEN)    [<002490c8>] console.c#__serial_rx+0x4c/0x9c
> Oct 20 01:43:31.433970 (XEN)    [<00249b74>] console.c#serial_rx+0xcc/0xe4
> Oct 20 01:43:31.434007 (XEN)    [<0024b6ec>] serial_rx_interrupt+0xcc/0xf8
> Oct 20 01:43:31.441964 (XEN)    [<0024ae54>] exynos4210-uart.c#exynos4210_uart_interrupt+0xf8/0x160
> Oct 20 01:43:31.450001 (XEN)    [<00256338>] do_IRQ+0x1a0/0x228
> Oct 20 01:43:31.450040 (XEN)    [<00254074>] gic_interrupt+0x58/0xfc
> Oct 20 01:43:31.457985 (XEN)    [<00260f98>] do_trap_irq+0x24/0x38
> Oct 20 01:43:31.458022 (XEN)    [<00264970>] entry.o#return_from_trap+0/0x4
> Oct 20 01:43:31.466010 (XEN)    [<0030a240>] 0030a240
> Oct 20 01:43:31.466044 (XEN)
> Oct 20 01:43:31.466066 (XEN)
> Oct 20 01:43:31.466099 (XEN) ****************************************
> Oct 20 01:43:31.473998 (XEN) Panic on CPU 0:
> Oct 20 01:43:31.474029 (XEN) Assertion '!in_irq() && local_irq_is_enabled()' failed at softirq.c:57
> Oct 20 01:43:31.481982 (XEN) ****************************************
>
> See
> http://logs.test-lab.xenproject.org/osstest/logs/101571/test-armhf-armhf-libvirt/serial-arndale-bluewater.log
>
> I've reverted this patch in staging.
>
> Wei.

dump_timerq() or other non-irq keyhandlers should not run in irq context 
and has sent out a fix patch.

https://lists.xen.org/archives/html/xen-devel/2016-10/msg01391.html



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-10-22  3:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-12  7:58 [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine Lan Tianyu
2016-10-12  7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu
2016-10-12 13:19   ` Jan Beulich
2016-10-12 14:30     ` Lan, Tianyu
2016-10-12 16:03       ` Jan Beulich
2016-10-13  1:15         ` Lan Tianyu
2016-10-12  7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu
2016-10-21 17:27   ` Wei Liu
2016-10-22  3:52     ` Lan, Tianyu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.