All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] nohz_full state entry failure in preempt_rt and proposed fix
@ 2020-12-23  9:20 ramesh.thomas
  2020-12-23  9:20 ` [PATCH 1/1] dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt ramesh.thomas
  0 siblings, 1 reply; 3+ messages in thread
From: ramesh.thomas @ 2020-12-23  9:20 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Ramesh Thomas, williams, frederic, bigeasy

From: Ramesh Thomas <ramesh.thomas@intel.com>

Hello,

This addresses an issue we have been facing with preempt_rt kernels not
able to enter nohz_full state consistently. Following are my debug
findings and details of a tool I had devloped that can help reproduce
the issue. Following patch has a proposed fix or at least a pointer to
areas worth looking into. We need preempt_rt for determinism and being
able to use nohz_full along with it is very valuable.

Problem:
Sometimes nohz_full state is never entered even when all necessary
conditions are met. It is easier to reproduce the issue in preempt_rt
kernel, however it may not be limited to preempt_rt.

Debug findings and proposed fix:
Observed that in the failure condition, entry into nohz_full state is
repeatedly aborted due to the detection of a pending timer event in the
next period in tick_nohz_next_event(). The issue is not reproduceable if
tick stoppage is not bailed out here. The skipping of the bailing out
code is done only if CONFIG_NO_HZ_FULL is defined. Since in nohz_full
mode, idle state is not entered when ticks are being stopped, aborting
tick stoppage may not be necessary. It is simpler to let the common code
that handles reprogramming of the timer at tick_nohz_stop_tick() handle
the next tick. 

Environment to reproduce:
Used Intel NUCs with 4 cores (Apollo Lake and Tiger Lake). Easier to
reproduce in embedded platforms. 

Kernel version:5.10.1-rt19 (This is not a new issue and I have seen it
in rt kernel version 4.17)

Relevant kernel config flags:
- NO_HZ_FULL=y
- PREEMPT_RT=y
- CPU_ISOLATION=y
- RCU_NOCB_CPU=y

Relevant kernel boot parameters:
- isolcpus=nohz,domain,1,3 nohz_full=1,3 rcu_nocbs=1,3 irqaffinity=0
- cpufreq.off=1 idle=poll cpuidle.off=1

Steps to reproduce:
1. Disable rt throttling assigning 100% scheduler period to rt tasks
2. Set cpu affinity to one of the nohz_full cpus
3. Set scheduling policy to sched_fifo with max priority
4. Wait till "tick_stopped" gets set in /proc/timer_list
5. Return failure if tick is not stopped in 15 seconds
6. Run above steps in a loop to stress it. It may take a while to
reproduce.

The above can be done using a tool I had developed as part of a
framework to assist setting up CPU thread isolation and measuring
jitter. It can be found at https://github.com/intel/tif

Build the tif_test app and run as follows using the included script to
run it in a loop till the failure is reproduced. 

make test
./tif_stress.sh

e.g. output. 
"Test# 818
Successfully entered nohz state in 724us

Error entering nohz state after 15000102us
Reproduced NOHZ_FULL failure after 818 tries!!!
Test elapsed time: 835 seconds"

(PS: The framework has a workaround for the issue which is not used in
the test. The workaround that helped was, switching CPU affinity in and
out of the nohz_full CPU giving the scheduler a fresh start)


Ramesh Thomas (1):
  dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt

 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt
  2020-12-23  9:20 [PATCH 0/1] nohz_full state entry failure in preempt_rt and proposed fix ramesh.thomas
@ 2020-12-23  9:20 ` ramesh.thomas
  2021-01-14 14:48   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 3+ messages in thread
From: ramesh.thomas @ 2020-12-23  9:20 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Ramesh Thomas, williams, frederic, bigeasy

From: Ramesh Thomas <ramesh.thomas@intel.com>

After all conditions to enter nohz_full state are met, if a timer event
is detected in the next period, nohz_full entry is aborted in
tick_nohz_next_event(). In the failure condition, this scenario keeps
repeating. This has been reproduced in preempt_rt kernel, however it may
not be limited to it.

Not bailing out if CONFIG_NO_HZ_FULL is defined, fixes the issue. Since
in NO_HZ_FULL mode, idle state is not entered at tick stoppage, the
pending timer event can be handled where the next timer is programmed in
tick_nohz_stop_tick(). This also simplifies the logic by keeping a
common path to handle the entry/exit of nohz_full state, whether the
tick was already stopped or not.

(The entire check for the presence of a timer event in the next period
can possibly be skipped for NO_HZ_FULL)

Signed-off-by: Ramesh Thomas <ramesh.thomas@intel.com>
---
 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 308a98c28ee8..3798dbc5f082 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -739,10 +739,12 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 		 * We've not stopped the tick yet, and there's a timer in the
 		 * next period, so no point in stopping it either, bail.
 		 */
+#ifndef CONFIG_NO_HZ_FULL
 		if (!ts->tick_stopped) {
 			ts->timer_expires = 0;
-			goto out;
+			return 0;
 		}
+#endif
 	}
 
 	/*
@@ -763,7 +765,6 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 
 	ts->timer_expires = min_t(u64, expires, next_tick);
 
-out:
 	return ts->timer_expires;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt
  2020-12-23  9:20 ` [PATCH 1/1] dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt ramesh.thomas
@ 2021-01-14 14:48   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-01-14 14:48 UTC (permalink / raw)
  To: ramesh.thomas; +Cc: linux-rt-users, williams, frederic, tglx

On 2020-12-23 04:20:36 [-0500], ramesh.thomas@intel.com wrote:
> From: Ramesh Thomas <ramesh.thomas@intel.com>
> 
> After all conditions to enter nohz_full state are met, if a timer event
> is detected in the next period, nohz_full entry is aborted in
> tick_nohz_next_event(). In the failure condition, this scenario keeps
> repeating. This has been reproduced in preempt_rt kernel, however it may
> not be limited to it.
 
Which timer is? Why does it make sense to enter NOHZ_FULL if the timer
expires shortly after (in next jiffy window if I understood it
properly).

> Not bailing out if CONFIG_NO_HZ_FULL is defined, fixes the issue. Since
> in NO_HZ_FULL mode, idle state is not entered at tick stoppage, the
> pending timer event can be handled where the next timer is programmed in
> tick_nohz_stop_tick(). This also simplifies the logic by keeping a
> common path to handle the entry/exit of nohz_full state, whether the
> tick was already stopped or not.
> 
> (The entire check for the presence of a timer event in the next period
> can possibly be skipped for NO_HZ_FULL)
> 
> Signed-off-by: Ramesh Thomas <ramesh.thomas@intel.com>
> ---
>  kernel/time/tick-sched.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 308a98c28ee8..3798dbc5f082 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -739,10 +739,12 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
>  		 * We've not stopped the tick yet, and there's a timer in the
>  		 * next period, so no point in stopping it either, bail.
>  		 */
> +#ifndef CONFIG_NO_HZ_FULL
>  		if (!ts->tick_stopped) {
>  			ts->timer_expires = 0;
> -			goto out;
> +			return 0;
>  		}
> +#endif
>  	}
>  
>  	/*
> @@ -763,7 +765,6 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
>  
>  	ts->timer_expires = min_t(u64, expires, next_tick);
>  
> -out:
>  	return ts->timer_expires;
>  }
>  
> -- 
> 2.26.2
> 

Sebastian

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-01-14 14:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-23  9:20 [PATCH 0/1] nohz_full state entry failure in preempt_rt and proposed fix ramesh.thomas
2020-12-23  9:20 ` [PATCH 1/1] dynticks/preempt_rt: Fix a nohz_full entry failure in preempt_rt ramesh.thomas
2021-01-14 14:48   ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.