Linux-rt-users archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH] tick/sched: update full_nohz status after SCHED dep is cleared
@ 2020-05-20 14:04 Juri Lelli
  2020-05-20 16:24 ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Juri Lelli @ 2020-05-20 14:04 UTC (permalink / raw)
  To: fweisbec, tglx, mingo; +Cc: linux-rt-users, peterz, linux-kernel, Juri Lelli

After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz
dependency is checked (via sched_update_tick_dependency()). In case tick
can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED
dependency for such CPU is cleared. However, this new information is not
used right away to actually stop the tick.

In CONFIG_PREEMPT systems booted with threadirqs option, sched clock
tick is serviced by an actual task (ksoftirqd corresponding to the CPU
where tick timer fired). So, in case a CPU was running a single task,
servicing the timer involves exiting full nozh mode. Problem at this
point is that we might lose chances to enter back into full nozh mode,
since info about ksoftirqd thread going back to sleep is not used (as
mentioned above).

Fix it by calling tick_nozh_full_update_tick(_cpu)() right after SCHED
dependency is cleared, so that tick can be promptly stopped.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
Hi,

I noticed what seems to be the problem described in the changelog while
running sysjitter [1] on a PREEMPT system setup for isolation and full
nozh, i.e. ... skew_tick=1 nohz=on nohz_full=4-35 rcu_nocbs=4-35
threadirqs (36 CPUs box).

Starting sysjitter with something like the following

perf stat -C 4-35 -e irq_vectors:local_timer_entry taskset \
--cpu-list 4-35 ./sysjitter/sysjitter --runtime 30 200

with vanilla kernel was returning ~5k local_timer_entry events, while
that number goes down to ~100 or so with the proposed patch applied.

The following trace snippet also highlight the problematic situation:

...
ksoftirqd/19-125   [019]   170.700754: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.700755: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.700756: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.700757: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.700759: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.701740: local_timer_entry:    vector=236
   sysjitter-2459  [019]   170.701742: softirq_raise:        vec=1 [action=TIMER]
   sysjitter-2459  [019]   170.701743: softirq_raise:        vec=7 [action=SCHED]
   sysjitter-2459  [019]   170.701744: local_timer_exit:     vector=236
   sysjitter-2459  [019]   170.701747: sched_wakeup:         ksoftirqd/19:125 [120] success=1 CPU:019
   sysjitter-2459  [019]   170.701748: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.701749: irq_work_entry:       vector=246
   sysjitter-2459  [019]   170.701750: irq_work_exit:        vector=246
   sysjitter-2459  [019]   170.701751: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.701753: sched_switch:         sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125   [019]   170.701754: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.701756: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.701756: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.701758: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.701759: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.702740: local_timer_entry:    vector=236
   sysjitter-2459  [019]   170.702742: softirq_raise:        vec=1 [action=TIMER]
   sysjitter-2459  [019]   170.702743: softirq_raise:        vec=7 [action=SCHED]
   sysjitter-2459  [019]   170.702744: local_timer_exit:     vector=236
   sysjitter-2459  [019]   170.702747: sched_wakeup:         ksoftirqd/19:125 [120] success=1 CPU:019
   sysjitter-2459  [019]   170.702748: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.702749: irq_work_entry:       vector=246
   sysjitter-2459  [019]   170.702750: irq_work_exit:        vector=246
   sysjitter-2459  [019]   170.702751: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.702753: sched_switch:         sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125   [019]   170.702755: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.702756: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.702757: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.702758: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.702760: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.703740: local_timer_entry:    vector=236
   sysjitter-2459  [019]   170.703742: softirq_raise:        vec=1 [action=TIMER]
   sysjitter-2459  [019]   170.703743: softirq_raise:        vec=7 [action=SCHED]
   sysjitter-2459  [019]   170.703745: local_timer_exit:     vector=236
   sysjitter-2459  [019]   170.703747: sched_wakeup:         ksoftirqd/19:125 [120] success=1 CPU:019
   sysjitter-2459  [019]   170.703748: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.703749: irq_work_entry:       vector=246
   sysjitter-2459  [019]   170.703750: irq_work_exit:        vector=246
   sysjitter-2459  [019]   170.703751: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.703753: sched_switch:         sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125   [019]   170.703755: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.703756: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.703757: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.703758: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.703759: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.704500: call_function_single_entry: vector=251
   sysjitter-2459  [019]   170.704502: call_function_single_exit: vector=251
   sysjitter-2459  [019]   170.704504: tick_stop:            success=1 dependency=NONE
...

Luckily sometimes it happens (like in the above) that an interrupt fires
on a CPU while ksoftirqd is not executing, SCHED dep is seen as cleared
and the CPU enters full nohz. I've however observed pathological cases
where CPUs aren't able to enter back full nohz for a long time.

Hope my understanding is correct. Please find below a naive attempt to
fix the problem. I'm pretty sure I'm missing details, e.g. I'm not sure
it is safe to possibly stop the tick remotely. Consider the following as
not more than another way of pointing the problem out. :-)

Best,

Juri

1 - https://github.com/alexeiz/sysjitter
---
 kernel/time/tick-sched.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..0d696e55d774 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -342,11 +342,16 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu);
 
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu);
+
 void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
 {
 	struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
 
 	atomic_andnot(BIT(bit), &ts->tick_dep_mask);
+
+	if (!in_irq())
+		tick_nohz_full_update_tick_cpu(ts, cpu);
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
 
@@ -870,11 +875,9 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_nohz_restart(ts, now);
 }
 
-static void tick_nohz_full_update_tick(struct tick_sched *ts)
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	int cpu = smp_processor_id();
-
 	if (!tick_nohz_full_cpu(cpu))
 		return;
 
@@ -888,6 +891,15 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 #endif
 }
 
+static void tick_nohz_full_update_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	int cpu = smp_processor_id();
+
+	tick_nohz_full_update_tick_cpu(ts, cpu);
+#endif
+}
+
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
-- 
2.25.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-20 14:04 [RFC PATCH] tick/sched: update full_nohz status after SCHED dep is cleared Juri Lelli
2020-05-20 16:24 ` Frederic Weisbecker
2020-05-20 16:49   ` Juri Lelli
2020-05-20 17:02     ` Frederic Weisbecker
2020-05-20 18:47       ` Juri Lelli
2020-05-21  0:44         ` Frederic Weisbecker
2020-05-21 17:00           ` Juri Lelli
2020-06-15 15:05             ` Juri Lelli
2020-06-15 21:07             ` Frederic Weisbecker
2020-06-16  6:57               ` Juri Lelli
2020-06-16 20:46                 ` Frederic Weisbecker
2020-06-29 12:36                   ` Juri Lelli
2020-06-29 12:42                     ` Frederic Weisbecker
2020-06-29 12:53                       ` Juri Lelli

Linux-rt-users archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git