linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Grygorii Strashko <grygorii.strashko@ti.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	<stable@vger.kernel.org>
Subject: Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq
Date: Fri, 24 Aug 2018 11:14:29 -0500	[thread overview]
Message-ID: <64587070-b9b1-4bc4-0f2e-59d33fe68f67@ti.com> (raw)
In-Reply-To: <20180824061750.GA20523@kroah.com>



On 08/24/2018 01:17 AM, Greg KH wrote:
> On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
>> Hi
>>
>> On 07/31/2018 05:52 PM, Frederic Weisbecker wrote:
>>> Before updating the full nohz tick or the idle time on IRQ exit, we
>>> check first if we are not in a nesting interrupt, whether the inner
>>> interrupt is a hard or a soft IRQ.
>>>
>>> There is a historical reason for that: the dyntick idle mode used to
>>> reprogram the tick on IRQ exit, after softirq processing, and there was
>>> no point in doing that job in the outer nesting interrupt because the
>>> tick update will be performed through the end of the inner interrupt
>>> eventually, with even potential new timer updates.
>>>
>>> One corner case could show up though: if an idle tick interrupts a softirq
>>> executing inline in the idle loop (through a call to local_bh_enable())
>>> after we entered in dynticks mode, the IRQ won't reprogram the tick
>>> because it assumes the softirq executes on an inner IRQ-tail. As a
>>> result we might put the CPU in sleep mode with the tick completely
>>> stopped whereas a timer can still be enqueued. Indeed there is no tick
>>> reprogramming in local_bh_enable(). We probably asssumed there was no bh
>>> disabled section in idle, although there didn't seem to be debug code
>>> ensuring that.
>>>
>>> Nowadays the nesting interrupt optimization still stands but only concern
>>> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
>>> and we want to wait for the end of the inner IRQ to reprogramm the tick.
>>> But in_interrupt() doesn't make a difference between softirqs executing
>>> on IRQ tail and those executing inline. What was to be considered a
>>> corner case in dynticks-idle mode now becomes a serious opportunity for
>>> a bug in full dynticks mode: if a tick interrupts a task executing
>>> softirq inline, the tick reprogramming will be ignored and we may exit
>>> to userspace after local_bh_enable() with an enqueued timer that will
>>> never fire.
>>>
>>> To fix this, simply keep reprogramming the tick if we are in a hardirq
>>> interrupting softirq. We can still figure out a way later to restore
>>> this optimization while excluding inline softirq processing.
>>>
>>> Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
>>> ---
>>>    kernel/softirq.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>>> index 900dcfe..0980a81 100644
>>> --- a/kernel/softirq.c
>>> +++ b/kernel/softirq.c
>>> @@ -386,7 +386,7 @@ static inline void tick_irq_exit(void)
>>>    
>>>    	/* Make sure that timer wheel updates are propagated */
>>>    	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>>> -		if (!in_interrupt())
>>> +		if (!in_irq())
>>>    			tick_nohz_irq_exit();
>>>    	}
>>>    #endif
>>>
>>
>> This patch was back ported to the Stable linux-4.14.y and It causes regression -
>>   flood of "NOHZ: local_softirq_pending" messages on all TI boards during boot (NFS boot):
>>
>> [    4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
>> [    4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
>>
>> the same is not reproducible with LKML - seems due to changes in tick-sched.c
>> __tick_nohz_idle_enter()/tick_nohz_irq_exit().
> 
> What changes do you think fixed this?

not sure. But it seems set of changes from Rafael J. Wysocki:

ff7de62 nohz: Avoid duplication of code related to got_idle_tick
296bb1e cpuidle: menu: Refine idle state selection for running tick
554c8aa sched: idle: Select idle state before stopping the tick
23a8d88 time: tick-sched: Split tick_nohz_stop_sched_tick()
45f1ff5 cpuidle: Return nohz hint from cpuidle_select()
2aaf709 sched: idle: Do not stop the tick upfront in the idle loop
0e77676 time: tick-sched: Reorganize idle tick management code
b7eaf1a cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely

> 
>> I've generated backtrace from  can_stop_idle_tick() (see below) and seems this
>> patch makes tick_nohz_irq_exit() call unconditional in case of nested interrupt:
>>
>> gic_handle_irq
>>   |- irq_exit
>>      |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
>>      |-__do_softirq
>> 	<irqs enabled>
>> 	|- gic_handle_irq()
>> 	   |- irq_exit()
>> 		|- tick_irq_exit()
>> 		   if (!in_irq()) <-- My understanding is that this condition will be always true due to [1]
>> 			tick_nohz_irq_exit();
>> 			|-__tick_nohz_idle_enter()
>> 			  |- can_stop_idle_tick()
>>
>> Sry, not sure if my conclusion is right and how can it be fixed.
> 
> Any pointers to a patch that might need to be backported would be
> appreciated.
> 

commit 
Author: Frederic Weisbecker <frederic@kernel.org>
Date: Fri Aug 3 15:31:34 2018 +0200

nohz: Fix missing tick reprogram when interrupting an inline softirq

commit 0a0e0829f990120cef165bbb804237f400953ec2 upstream.



-- 
regards,
-grygorii

      parent reply	other threads:[~2018-08-24 16:14 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-31 22:52 [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq Frederic Weisbecker
2018-08-01 12:03 ` Anna-Maria Gleixner
2018-08-01 17:46 ` Thomas Gleixner
2018-08-01 21:08   ` Frederic Weisbecker
2018-08-03 11:42     ` Thomas Gleixner
2018-08-23 22:57 ` Grygorii Strashko
2018-08-24  6:17   ` Greg KH
2018-08-24  7:01     ` Thomas Gleixner
2018-08-24 14:27       ` Frederic Weisbecker
2018-08-24 16:10       ` Grygorii Strashko
2018-08-24 18:41         ` Frederic Weisbecker
2018-08-28 17:56           ` Grygorii Strashko
2018-08-30 14:10             ` John Crispin
2018-08-30 14:29               ` Thomas Gleixner
2018-08-24 16:14     ` Grygorii Strashko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64587070-b9b1-4bc4-0f2e-59d33fe68f67@ti.com \
    --to=grygorii.strashko@ti.com \
    --cc=anna-maria@linutronix.de \
    --cc=frederic@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).