All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Xen/timer: Disable watchdog during dumping timer queues
@ 2016-09-13  7:12 Lan Tianyu
       [not found] ` <57D7DCC2020000780010E553@prv-mh.provo.novell.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Lan Tianyu @ 2016-09-13  7:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, jbeulich

On a machine with a mount of cpus, dump_timerq() lasts several seconds
which may exceed watchdog timeout and cause Xen hyperviosr reboot.
This patch is to disable watchdog when dump timer queues to fix the
issue.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/common/timer.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/xen/common/timer.c b/xen/common/timer.c
index 29a60a9..2d9d828 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -21,6 +21,7 @@
 #include <xen/cpu.h>
 #include <xen/rcupdate.h>
 #include <xen/symbols.h>
+#include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/desc.h>
 #include <asm/atomic.h>
@@ -524,6 +525,7 @@ static void dump_timerq(unsigned char key)
     s_time_t       now = NOW();
     int            i, j;
 
+    watchdog_disable();
     printk("Dumping timer queues:\n");
 
     for_each_online_cpu( i )
@@ -538,6 +540,8 @@ static void dump_timerq(unsigned char key)
             dump_timer(t, now);
         spin_unlock_irqrestore(&ts->lock, flags);
     }
+
+    watchdog_enable();
 }
 
 static void migrate_timers_from_cpu(unsigned int old_cpu)
-- 
1.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
       [not found]   ` <c7777159-7863-99fc-76ab-e8f4195286e1@intel.com>
@ 2016-09-13 15:25     ` Jan Beulich
  2016-09-15 14:16       ` Lan, Tianyu
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2016-09-13 15:25 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 13.09.16 at 16:56, <tianyu.lan@intel.com> wrote:

Seems like mistakenly dropped xen-devel in my earlier reply; re-added.

> On 9/13/2016 5:02 PM, Jan Beulich wrote:
>>>>> On 13.09.16 at 09:12, <tianyu.lan@intel.com> wrote:
>>> On a machine with a mount of cpus, dump_timerq() lasts several seconds
>>> which may exceed watchdog timeout and cause Xen hyperviosr reboot.
>>> This patch is to disable watchdog when dump timer queues to fix the
>>> issue.
>>
>> But disabling the watchdog is not the resolution. Instead you
>> want to invoke process_pending_softirqs() every once in a while,
>> just like certain other key handlers do.
> 
> Actually, I have tried that way but it's not enough.
> Keyhandler may run in the timer handler and the following log shows
> calltrace. The timer subsystem run all expired timers' handler
> before programing next timer event. If keyhandler runs longer than
> timeout, there will be no chance to configure timer before triggering
> watchdog and hypervisor rebooting.
> 
> (XEN)    [<ffff82d0801145be>] handle_keypress+0x7b/0xa5
> (XEN)    [<ffff82d080142dff>] console.c#__serial_rx+0x16/0x54
> (XEN)    [<ffff82d080142ed1>] console.c#serial_rx+0x94/0x9f
> (XEN)    [<ffff82d080145641>] serial_rx_interrupt+0xa9/0xbf
> (XEN)    [<ffff82d080143b5f>] ns16550.c#__ns16550_poll+0x53/0xb4
> (XEN)    [<ffff82d080197c1c>] do_invalid_op+0x26d/0x49f
> (XEN)    [<ffff82d08023cfb0>] cpufreq.c#handle_exception_saved+0x2e/0x6c
> (XEN)    [<ffff82d080143594>] ns16550.c#ns16550_poll+0x20/0x24
> (XEN)    [<ffff82d080130769>] timer.c#execute_timer+0x4e/0x6c
> (XEN)    [<ffff82d0801308b7>] timer.c#timer_softirq_action+0xea/0x222
> (XEN)    [<ffff82d08012cc8a>] softirq.c#__do_softirq+0x9b/0xac
> (XEN)    [<ffff82d08012ccae>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08016400a>] domain.c#idle_loop+0x78/0x88

Wait - what is do_invalid_op() doing on the stack? I don't think it
belongs there, and hence I wonder whether the keypress
happened after some already fatal event (in which case all bets
are off anyway).

> Another solution is to schedule a tasklet to run keyhandler in timer
> handler and invoke process_pending_softirqs() in the dump_timerq().
> This also works but it requires to rework keyhandler mechanism.
> 
> Disable watchdog seems to be simpler and I found dump_registers() also
> used the same way to deal with the issue.

That's true. Just that on large machines it defaults to the
alternative model, for which I'm not sure it actually needs the
watchdog disabled (as data for a single CPU shouldn't exceed
the threshold).

Jan

> Here is my draft patch of reworking keyhandler.
> diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
> index 16de6e8..579d076 100644
> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -75,9 +75,10 @@ static struct keyhandler {
> 
>   static void keypress_action(unsigned long unused)
>   {
> -    handle_keypress(keypress_key, NULL);
> +    console_start_log_everything();
> +    h->fn(keypress_key);
> +    console_end_log_everything();
>   }
> -
>   static DECLARE_TASKLET(keypress_tasklet, keypress_action, 0);
> 
>   void handle_keypress(unsigned char key, struct cpu_user_regs *regs)
> @@ -87,10 +88,10 @@ void handle_keypress(unsigned char key, struct 
> cpu_user_regs *regs)
>       if ( key >= ARRAY_SIZE(key_table) || !(h = &key_table[key])->fn )
>           return;
> 
> -    if ( !in_irq() || h->irq_callback )
> +    if (h->irq_callback))
>       {
>           console_start_log_everything();
> -        h->irq_callback ? h->irq_fn(key, regs) : h->fn(key);
> +        h->irq_fn(key, regs);
>           console_end_log_everything();
>       }
>       else
> 
> 
>>
>> Jan
>>




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-13 15:25     ` Jan Beulich
@ 2016-09-15 14:16       ` Lan, Tianyu
  2016-09-15 14:32         ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Lan, Tianyu @ 2016-09-15 14:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 9/13/2016 11:25 PM, Jan Beulich wrote:
> Wait - what is do_invalid_op() doing on the stack? I don't think it
> belongs there, and hence I wonder whether the keypress
> happened after some already fatal event (in which case all bets
> are off anyway).

Not clear why do_invalid_op() on the stack. There is no other fatal
event. The issue disappears when set watchdog_timeout to 10s.

>> > Another solution is to schedule a tasklet to run keyhandler in timer
>> > handler and invoke process_pending_softirqs() in the dump_timerq().
>> > This also works but it requires to rework keyhandler mechanism.
>> >
>> > Disable watchdog seems to be simpler and I found dump_registers() also
>> > used the same way to deal with the issue.
> That's true. Just that on large machines it defaults to the
> alternative model, for which I'm not sure it actually needs the
> watchdog disabled (as data for a single CPU shouldn't exceed
> the threshold).
>

It seems not to be necessary to disable watchdog in alternative model
since dumping a single cpu's status will not last a long time.


For the issue in the dump timer info handler, disabling watchdog is ok
for you or you have other suggestions to resolve the issue?

I also found other places where dump a lot of logs disable watchdog.
(E,G run_all_keyhandlers(), debugtrace_dump() debugtrace_toggle() and so
on). This seems a common solution.






> Jan
>
>> > Here is my draft patch of reworking keyhandler.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-15 14:16       ` Lan, Tianyu
@ 2016-09-15 14:32         ` Jan Beulich
  2016-09-19 13:57           ` Lan, Tianyu
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2016-09-15 14:32 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 15.09.16 at 16:16, <tianyu.lan@intel.com> wrote:
> On 9/13/2016 11:25 PM, Jan Beulich wrote:
>> Wait - what is do_invalid_op() doing on the stack? I don't think it
>> belongs there, and hence I wonder whether the keypress
>> happened after some already fatal event (in which case all bets
>> are off anyway).
> 
> Not clear why do_invalid_op() on the stack. There is no other fatal
> event. The issue disappears when set watchdog_timeout to 10s.
> 
>>> > Another solution is to schedule a tasklet to run keyhandler in timer
>>> > handler and invoke process_pending_softirqs() in the dump_timerq().
>>> > This also works but it requires to rework keyhandler mechanism.
>>> >
>>> > Disable watchdog seems to be simpler and I found dump_registers() also
>>> > used the same way to deal with the issue.
>> That's true. Just that on large machines it defaults to the
>> alternative model, for which I'm not sure it actually needs the
>> watchdog disabled (as data for a single CPU shouldn't exceed
>> the threshold).
>>
> 
> It seems not to be necessary to disable watchdog in alternative model
> since dumping a single cpu's status will not last a long time.
> 
> 
> For the issue in the dump timer info handler, disabling watchdog is ok
> for you or you have other suggestions to resolve the issue?

Well, without a clear understanding of why the issue occurs (for
which I need to refer you back to the questionable stack dump)
I'm hesitant to agree to this step, yet ...

> I also found other places where dump a lot of logs disable watchdog.
> (E,G run_all_keyhandlers(), debugtrace_dump() debugtrace_toggle() and so
> on). This seems a common solution.

... I'm also not entirely against it considering the various other
examples. I.e. as almost always: As long as the need for the
change can be properly explained, I won't stand in the way of
getting it in.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-15 14:32         ` Jan Beulich
@ 2016-09-19 13:57           ` Lan, Tianyu
  2016-09-19 14:46             ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Lan, Tianyu @ 2016-09-19 13:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 9/15/2016 10:32 PM, Jan Beulich wrote:
>>>> On 15.09.16 at 16:16, <tianyu.lan@intel.com> wrote:
>> On 9/13/2016 11:25 PM, Jan Beulich wrote:
>>> Wait - what is do_invalid_op() doing on the stack? I don't think it
>>> belongs there, and hence I wonder whether the keypress
>>> happened after some already fatal event (in which case all bets
>>> are off anyway).
>>
>> Not clear why do_invalid_op() on the stack. There is no other fatal
>> event. The issue disappears when set watchdog_timeout to 10s.
>>
>>>>> Another solution is to schedule a tasklet to run keyhandler in timer
>>>>> handler and invoke process_pending_softirqs() in the dump_timerq().
>>>>> This also works but it requires to rework keyhandler mechanism.
>>>>>
>>>>> Disable watchdog seems to be simpler and I found dump_registers() also
>>>>> used the same way to deal with the issue.
>>> That's true. Just that on large machines it defaults to the
>>> alternative model, for which I'm not sure it actually needs the
>>> watchdog disabled (as data for a single CPU shouldn't exceed
>>> the threshold).
>>>
>>
>> It seems not to be necessary to disable watchdog in alternative model
>> since dumping a single cpu's status will not last a long time.
>>
>>
>> For the issue in the dump timer info handler, disabling watchdog is ok
>> for you or you have other suggestions to resolve the issue?
>
> Well, without a clear understanding of why the issue occurs (for
> which I need to refer you back to the questionable stack dump)
> I'm hesitant to agree to this step, yet ...

After some researches, I found do_invalid_op() on the stack dump is
caused by run_in_exception_handler(__ns16550_poll) in the ns16550_poll()
rather than fatal event. The timeout issue still exists when run
__ns16550_poll() directly in the ns16550_poll().


>
>> I also found other places where dump a lot of logs disable watchdog.
>> (E,G run_all_keyhandlers(), debugtrace_dump() debugtrace_toggle() and so
>> on). This seems a common solution.
>
> ... I'm also not entirely against it considering the various other
> examples. I.e. as almost always: As long as the need for the
> change can be properly explained, I won't stand in the way of
> getting it in.
>
> Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-19 13:57           ` Lan, Tianyu
@ 2016-09-19 14:46             ` Jan Beulich
  2016-09-20 14:52               ` Lan, Tianyu
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2016-09-19 14:46 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.09.16 at 15:57, <tianyu.lan@intel.com> wrote:

> 
> On 9/15/2016 10:32 PM, Jan Beulich wrote:
>>>>> On 15.09.16 at 16:16, <tianyu.lan@intel.com> wrote:
>>> On 9/13/2016 11:25 PM, Jan Beulich wrote:
>>>> Wait - what is do_invalid_op() doing on the stack? I don't think it
>>>> belongs there, and hence I wonder whether the keypress
>>>> happened after some already fatal event (in which case all bets
>>>> are off anyway).
>>>
>>> Not clear why do_invalid_op() on the stack. There is no other fatal
>>> event. The issue disappears when set watchdog_timeout to 10s.
>>>
>>>>>> Another solution is to schedule a tasklet to run keyhandler in timer
>>>>>> handler and invoke process_pending_softirqs() in the dump_timerq().
>>>>>> This also works but it requires to rework keyhandler mechanism.
>>>>>>
>>>>>> Disable watchdog seems to be simpler and I found dump_registers() also
>>>>>> used the same way to deal with the issue.
>>>> That's true. Just that on large machines it defaults to the
>>>> alternative model, for which I'm not sure it actually needs the
>>>> watchdog disabled (as data for a single CPU shouldn't exceed
>>>> the threshold).
>>>>
>>>
>>> It seems not to be necessary to disable watchdog in alternative model
>>> since dumping a single cpu's status will not last a long time.
>>>
>>>
>>> For the issue in the dump timer info handler, disabling watchdog is ok
>>> for you or you have other suggestions to resolve the issue?
>>
>> Well, without a clear understanding of why the issue occurs (for
>> which I need to refer you back to the questionable stack dump)
>> I'm hesitant to agree to this step, yet ...
> 
> After some researches, I found do_invalid_op() on the stack dump is
> caused by run_in_exception_handler(__ns16550_poll) in the ns16550_poll()
> rather than fatal event. The timeout issue still exists when run
> __ns16550_poll() directly in the ns16550_poll().

Well, I then still don't see why e.g. dump_domains() doesn't also need
it. Earlier you did say:

  Keyhandler may run in the timer handler and the following log shows
  calltrace. The timer subsystem run all expired timers' handler
  before programing next timer event. If keyhandler runs longer than
  timeout, there will be no chance to configure timer before triggering
  watchdog and hypervisor rebooting.

The fact that using debug keys may adversely affect the rest of the
system is known. And the nesting of process_pending_softirqs()
inside do_softirq() should, from looking at them, work fine. So I
continue to have trouble seeing the specific reason for the problem
you say you observe.

And as a separate note - dump_registers() is quite an exception
among the key handlers, and that's for a good reason (as the
comment there says). So I continue to be hesitant to see this
spread to other key handlers.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-19 14:46             ` Jan Beulich
@ 2016-09-20 14:52               ` Lan, Tianyu
  2016-09-20 15:36                 ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Lan, Tianyu @ 2016-09-20 14:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 9/19/2016 10:46 PM, Jan Beulich wrote:
>>> Well, without a clear understanding of why the issue occurs (for
>>> >> which I need to refer you back to the questionable stack dump)
>>> >> I'm hesitant to agree to this step, yet ...
>> >
>> > After some researches, I found do_invalid_op() on the stack dump is
>> > caused by run_in_exception_handler(__ns16550_poll) in the ns16550_poll()
>> > rather than fatal event. The timeout issue still exists when run
>> > __ns16550_poll() directly in the ns16550_poll().
> Well, I then still don't see why e.g. dump_domains() doesn't also need
> it.

After testing, dump_domains() also has such issue after I create two VM
with 128 vcpus.

> Earlier you did say:
>
>   Keyhandler may run in the timer handler and the following log shows
>   calltrace. The timer subsystem run all expired timers' handler
>   before programing next timer event. If keyhandler runs longer than
>   timeout, there will be no chance to configure timer before triggering
>   watchdog and hypervisor rebooting.
>
> The fact that using debug keys may adversely affect the rest of the
> system is known. And the nesting of process_pending_softirqs()
> inside do_softirq() should, from looking at them, work fine. So I
> continue to have trouble seeing the specific reason for the problem
> you say you observe.

The precondition of process_pending_softirq() working in the debug key
handler is that timer interrupt arrives on time and nmi_timer_fn() can
run to update nmi_timer_ticks before watchdog timeout.

When a timer interrupt arrives, timer_softirq_action() will run all
expired timer handlers before programing next timer interrupt via
reprogram_timer(). If a timer handler runs too long E,G >5s(Time for
watchdog timeout is default to be 5s.), this will cause no timer
interrupt arriving within 5s and nmi_timer_fn() also won't be called.
Does this make sense to you?


> And as a separate note - dump_registers() is quite an exception
> among the key handlers, and that's for a good reason (as the
> comment there says). So I continue to be hesitant to see this
> spread to other key handlers.
>
> Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-20 14:52               ` Lan, Tianyu
@ 2016-09-20 15:36                 ` Jan Beulich
  2016-09-21  1:54                   ` Lan Tianyu
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2016-09-20 15:36 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 20.09.16 at 16:52, <tianyu.lan@intel.com> wrote:
> On 9/19/2016 10:46 PM, Jan Beulich wrote:
>>>> Well, without a clear understanding of why the issue occurs (for
>>>> >> which I need to refer you back to the questionable stack dump)
>>>> >> I'm hesitant to agree to this step, yet ...
>>> >
>>> > After some researches, I found do_invalid_op() on the stack dump is
>>> > caused by run_in_exception_handler(__ns16550_poll) in the ns16550_poll()
>>> > rather than fatal event. The timeout issue still exists when run
>>> > __ns16550_poll() directly in the ns16550_poll().
>> Well, I then still don't see why e.g. dump_domains() doesn't also need
>> it.
> 
> After testing, dump_domains() also has such issue after I create two VM
> with 128 vcpus.
> 
>> Earlier you did say:
>>
>>   Keyhandler may run in the timer handler and the following log shows
>>   calltrace. The timer subsystem run all expired timers' handler
>>   before programing next timer event. If keyhandler runs longer than
>>   timeout, there will be no chance to configure timer before triggering
>>   watchdog and hypervisor rebooting.
>>
>> The fact that using debug keys may adversely affect the rest of the
>> system is known. And the nesting of process_pending_softirqs()
>> inside do_softirq() should, from looking at them, work fine. So I
>> continue to have trouble seeing the specific reason for the problem
>> you say you observe.
> 
> The precondition of process_pending_softirq() working in the debug key
> handler is that timer interrupt arrives on time and nmi_timer_fn() can
> run to update nmi_timer_ticks before watchdog timeout.

Precondition?

> When a timer interrupt arrives, timer_softirq_action() will run all
> expired timer handlers before programing next timer interrupt via
> reprogram_timer(). If a timer handler runs too long E,G >5s(Time for
> watchdog timeout is default to be 5s.), this will cause no timer
> interrupt arriving within 5s and nmi_timer_fn() also won't be called.
> Does this make sense to you?

Partly. I continue to think that the sequence

some keyhandler
	timer interrupt
keyhandler continues
keyhandler calls process_pending_softirq()

should, among other things, result in timer_softirq_action() to get
run. And I don't see the _timer_ handler running for to long here,
only a key handler. Are you perhaps instead suffering from the
nested instance of timer_softirq_action() not being able to acquire
its lock? That would be an entirely different issue than you had
described so far.

And irrespective of this it is of course quite clear that timers aren't
meant to run heavyweight work like key handlers, so the way
ns16550_poll() works right now is probably what we'll want to alter.
Which btw raises another question: Why are you in polling mode in
the first place? Do you have a UART without working interrupt?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-20 15:36                 ` Jan Beulich
@ 2016-09-21  1:54                   ` Lan Tianyu
  2016-09-21  9:25                     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Lan Tianyu @ 2016-09-21  1:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 2016年09月20日 23:36, Jan Beulich wrote:
>> The precondition of process_pending_softirq() working in the debug key
>> > handler is that timer interrupt arrives on time and nmi_timer_fn() can
>> > run to update nmi_timer_ticks before watchdog timeout.
> Precondition?

Process_pending_softirq() in debug key handler is mainly to deal with
timer softirq to update nmi_timer_ticks in order to avoid NMI watchdog.
If there is no timer interrupt arriving for long time,
process_pending_softirq() here is meaningless and NMI watchdog still
will be timeout.

> 
>> > When a timer interrupt arrives, timer_softirq_action() will run all
>> > expired timer handlers before programing next timer interrupt via
>> > reprogram_timer(). If a timer handler runs too long E,G >5s(Time for
>> > watchdog timeout is default to be 5s.), this will cause no timer
>> > interrupt arriving within 5s and nmi_timer_fn() also won't be called.
>> > Does this make sense to you?
> Partly. I continue to think that the sequence
> 
> some keyhandler
> 	timer interrupt
> keyhandler continues
> keyhandler calls process_pending_softirq()
> 

Question for your sequence is why there is timer interrupt before
programing timer interrupt.

Actually the sequence in this case is
timer interrupt
run key handlers in timer handler
program next timer interrupt
...



> should, among other things, result in timer_softirq_action() to get
> run. And I don't see the _timer_ handler running for to long here,
> only a key handler.

Key handler may run a long time(E,G >5s) on machine with amount of cpus
or create huge VM. If keyhandler doesn't run for long time,
timer_softirq_action() would also be not necessary since the default
timeout is 5s and nmi timer's interval is 1s.


> Are you perhaps instead suffering from the
> nested instance of timer_softirq_action() not being able to acquire
> its lock?

No, the serial port continues printing timer info before watchdog timeout.


-- 
Best regards
Tianyu Lan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-21  1:54                   ` Lan Tianyu
@ 2016-09-21  9:25                     ` Jan Beulich
  2016-09-22 14:18                       ` Lan Tianyu
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2016-09-21  9:25 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 21.09.16 at 03:54, <tianyu.lan@intel.com> wrote:
> On 2016年09月20日 23:36, Jan Beulich wrote:
>>> The precondition of process_pending_softirq() working in the debug key
>>> > handler is that timer interrupt arrives on time and nmi_timer_fn() can
>>> > run to update nmi_timer_ticks before watchdog timeout.
>> Precondition?
> 
> Process_pending_softirq() in debug key handler is mainly to deal with
> timer softirq to update nmi_timer_ticks in order to avoid NMI watchdog.
> If there is no timer interrupt arriving for long time,
> process_pending_softirq() here is meaningless and NMI watchdog still
> will be timeout.

Oh, right. Still I continue to be unconvinced that disabling the
watchdog is the right answer (not running timers for a long time
has other undesirable consequence), or if it is, then it being
needed in only this one key handler. So perhaps you should
really consider submitting your generic key handler adjustment
as an alternative.

But please also answer the earlier question, which you did strip
from your reply:

>Which btw raises another question: Why are you in polling mode in
>the first place? Do you have a UART without working interrupt?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-21  9:25                     ` Jan Beulich
@ 2016-09-22 14:18                       ` Lan Tianyu
  2016-09-22 14:26                         ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Lan Tianyu @ 2016-09-22 14:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 9/21/2016 5:25 PM, Jan Beulich wrote:
>>>> On 21.09.16 at 03:54, <tianyu.lan@intel.com> wrote:
>> On 2016年09月20日 23:36, Jan Beulich wrote:
>>>> The precondition of process_pending_softirq() working in the debug key
>>>>> handler is that timer interrupt arrives on time and nmi_timer_fn() can
>>>>> run to update nmi_timer_ticks before watchdog timeout.
>>> Precondition?
>>
>> Process_pending_softirq() in debug key handler is mainly to deal with
>> timer softirq to update nmi_timer_ticks in order to avoid NMI watchdog.
>> If there is no timer interrupt arriving for long time,
>> process_pending_softirq() here is meaningless and NMI watchdog still
>> will be timeout.
>
> Oh, right. Still I continue to be unconvinced that disabling the
> watchdog is the right answer (not running timers for a long time
> has other undesirable consequence), or if it is, then it being
> needed in only this one key handler. So perhaps you should
> really consider submitting your generic key handler adjustment
> as an alternative.
>

Disable watchdog is common solution for such kind of issues in current
codes and so I chose it. I also proposed another solution in previous
mail that run keyhandler always in a tasklet and insert
process_pending_softirq() in the keyhandler.

> But please also answer the earlier question, which you did strip
> from your reply:
>
>> Which btw raises another question: Why are you in polling mode in
>> the first place? Do you have a UART without working interrupt?
>

I found there was no interrupt with Xen ns16550 dirver while
linux kernel's serial driver can receive interrupt.


> Jan
>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-22 14:18                       ` Lan Tianyu
@ 2016-09-22 14:26                         ` Jan Beulich
  2016-09-22 15:04                           ` Lan, Tianyu
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2016-09-22 14:26 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 22.09.16 at 16:18, <tianyu.lan@intel.com> wrote:
> On 9/21/2016 5:25 PM, Jan Beulich wrote:
>>>>> On 21.09.16 at 03:54, <tianyu.lan@intel.com> wrote:
>>> On 2016年09月20日 23:36, Jan Beulich wrote:
>>>>> The precondition of process_pending_softirq() working in the debug key
>>>>>> handler is that timer interrupt arrives on time and nmi_timer_fn() can
>>>>>> run to update nmi_timer_ticks before watchdog timeout.
>>>> Precondition?
>>>
>>> Process_pending_softirq() in debug key handler is mainly to deal with
>>> timer softirq to update nmi_timer_ticks in order to avoid NMI watchdog.
>>> If there is no timer interrupt arriving for long time,
>>> process_pending_softirq() here is meaningless and NMI watchdog still
>>> will be timeout.
>>
>> Oh, right. Still I continue to be unconvinced that disabling the
>> watchdog is the right answer (not running timers for a long time
>> has other undesirable consequence), or if it is, then it being
>> needed in only this one key handler. So perhaps you should
>> really consider submitting your generic key handler adjustment
>> as an alternative.
> 
> Disable watchdog is common solution for such kind of issues in current
> codes and so I chose it. I also proposed another solution in previous
> mail that run keyhandler always in a tasklet and insert
> process_pending_softirq() in the keyhandler.

Yes, that's the patch I've been referring to in my previous answer.

>> But please also answer the earlier question, which you did strip
>> from your reply:
>>
>>> Which btw raises another question: Why are you in polling mode in
>>> the first place? Do you have a UART without working interrupt?
> 
> I found there was no interrupt with Xen ns16550 dirver while
> linux kernel's serial driver can receive interrupt.

And do you know the reason? Is it perhaps a PCI plug in card, and
you don't specify the IRQ on the command line? Or the kernel
doesn't provide the necessary information (from ACPI) for Xen to
set up that IRQ?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-22 14:26                         ` Jan Beulich
@ 2016-09-22 15:04                           ` Lan, Tianyu
  2016-09-22 15:35                             ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Lan, Tianyu @ 2016-09-22 15:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 9/22/2016 10:26 PM, Jan Beulich wrote:
>>> But please also answer the earlier question, which you did strip
>>> >> from your reply:
>>> >>
>>>> >>> Which btw raises another question: Why are you in polling mode in
>>>> >>> the first place? Do you have a UART without working interrupt?
>> >
>> > I found there was no interrupt with Xen ns16550 dirver while
>> > linux kernel's serial driver can receive interrupt.
> And do you know the reason? Is it perhaps a PCI plug in card, and
> you don't specify the IRQ on the command line? Or the kernel
> doesn't provide the necessary information (from ACPI) for Xen to
> set up that IRQ?

No, I am not familiar serial device. But it's a ACPI device from linux
sysfs node and serial drivers use irq 4 for their interrupt both on
linux and Xen.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Xen/timer: Disable watchdog during dumping timer queues
  2016-09-22 15:04                           ` Lan, Tianyu
@ 2016-09-22 15:35                             ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2016-09-22 15:35 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 22.09.16 at 17:04, <tianyu.lan@intel.com> wrote:
> On 9/22/2016 10:26 PM, Jan Beulich wrote:
>>>> But please also answer the earlier question, which you did strip
>>>> >> from your reply:
>>>> >>
>>>>> >>> Which btw raises another question: Why are you in polling mode in
>>>>> >>> the first place? Do you have a UART without working interrupt?
>>> >
>>> > I found there was no interrupt with Xen ns16550 dirver while
>>> > linux kernel's serial driver can receive interrupt.
>> And do you know the reason? Is it perhaps a PCI plug in card, and
>> you don't specify the IRQ on the command line? Or the kernel
>> doesn't provide the necessary information (from ACPI) for Xen to
>> set up that IRQ?
> 
> No, I am not familiar serial device. But it's a ACPI device from linux
> sysfs node and serial drivers use irq 4 for their interrupt both on
> linux and Xen.

Hmm - I do not see why IRQ4 would not work, especially if it does
under Linux.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-09-22 15:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-13  7:12 [PATCH] Xen/timer: Disable watchdog during dumping timer queues Lan Tianyu
     [not found] ` <57D7DCC2020000780010E553@prv-mh.provo.novell.com>
     [not found]   ` <c7777159-7863-99fc-76ab-e8f4195286e1@intel.com>
2016-09-13 15:25     ` Jan Beulich
2016-09-15 14:16       ` Lan, Tianyu
2016-09-15 14:32         ` Jan Beulich
2016-09-19 13:57           ` Lan, Tianyu
2016-09-19 14:46             ` Jan Beulich
2016-09-20 14:52               ` Lan, Tianyu
2016-09-20 15:36                 ` Jan Beulich
2016-09-21  1:54                   ` Lan Tianyu
2016-09-21  9:25                     ` Jan Beulich
2016-09-22 14:18                       ` Lan Tianyu
2016-09-22 14:26                         ` Jan Beulich
2016-09-22 15:04                           ` Lan, Tianyu
2016-09-22 15:35                             ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.