Linux-rt-users archive on lore.kernel.org
 help / color / Atom feed
From: Juri Lelli <juri.lelli@redhat.com>
To: fweisbec@gmail.com, tglx@linutronix.de, mingo@kernel.org
Cc: linux-rt-users@vger.kernel.org, peterz@infradead.org,
	linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>
Subject: [RFC PATCH] tick/sched: update full_nohz status after SCHED dep is cleared
Date: Wed, 20 May 2020 16:04:02 +0200
Message-ID: <20200520140402.358880-1-juri.lelli@redhat.com> (raw)

After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz
dependency is checked (via sched_update_tick_dependency()). In case tick
can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED
dependency for such CPU is cleared. However, this new information is not
used right away to actually stop the tick.

In CONFIG_PREEMPT systems booted with threadirqs option, sched clock
tick is serviced by an actual task (ksoftirqd corresponding to the CPU
where tick timer fired). So, in case a CPU was running a single task,
servicing the timer involves exiting full nozh mode. Problem at this
point is that we might lose chances to enter back into full nozh mode,
since info about ksoftirqd thread going back to sleep is not used (as
mentioned above).

Fix it by calling tick_nozh_full_update_tick(_cpu)() right after SCHED
dependency is cleared, so that tick can be promptly stopped.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
Hi,

I noticed what seems to be the problem described in the changelog while
running sysjitter [1] on a PREEMPT system setup for isolation and full
nozh, i.e. ... skew_tick=1 nohz=on nohz_full=4-35 rcu_nocbs=4-35
threadirqs (36 CPUs box).

Starting sysjitter with something like the following

perf stat -C 4-35 -e irq_vectors:local_timer_entry taskset \
--cpu-list 4-35 ./sysjitter/sysjitter --runtime 30 200

with vanilla kernel was returning ~5k local_timer_entry events, while
that number goes down to ~100 or so with the proposed patch applied.

The following trace snippet also highlight the problematic situation:

...
ksoftirqd/19-125   [019]   170.700754: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.700755: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.700756: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.700757: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.700759: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.701740: local_timer_entry:    vector=236
   sysjitter-2459  [019]   170.701742: softirq_raise:        vec=1 [action=TIMER]
   sysjitter-2459  [019]   170.701743: softirq_raise:        vec=7 [action=SCHED]
   sysjitter-2459  [019]   170.701744: local_timer_exit:     vector=236
   sysjitter-2459  [019]   170.701747: sched_wakeup:         ksoftirqd/19:125 [120] success=1 CPU:019
   sysjitter-2459  [019]   170.701748: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.701749: irq_work_entry:       vector=246
   sysjitter-2459  [019]   170.701750: irq_work_exit:        vector=246
   sysjitter-2459  [019]   170.701751: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.701753: sched_switch:         sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125   [019]   170.701754: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.701756: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.701756: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.701758: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.701759: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.702740: local_timer_entry:    vector=236
   sysjitter-2459  [019]   170.702742: softirq_raise:        vec=1 [action=TIMER]
   sysjitter-2459  [019]   170.702743: softirq_raise:        vec=7 [action=SCHED]
   sysjitter-2459  [019]   170.702744: local_timer_exit:     vector=236
   sysjitter-2459  [019]   170.702747: sched_wakeup:         ksoftirqd/19:125 [120] success=1 CPU:019
   sysjitter-2459  [019]   170.702748: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.702749: irq_work_entry:       vector=246
   sysjitter-2459  [019]   170.702750: irq_work_exit:        vector=246
   sysjitter-2459  [019]   170.702751: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.702753: sched_switch:         sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125   [019]   170.702755: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.702756: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.702757: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.702758: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.702760: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.703740: local_timer_entry:    vector=236
   sysjitter-2459  [019]   170.703742: softirq_raise:        vec=1 [action=TIMER]
   sysjitter-2459  [019]   170.703743: softirq_raise:        vec=7 [action=SCHED]
   sysjitter-2459  [019]   170.703745: local_timer_exit:     vector=236
   sysjitter-2459  [019]   170.703747: sched_wakeup:         ksoftirqd/19:125 [120] success=1 CPU:019
   sysjitter-2459  [019]   170.703748: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.703749: irq_work_entry:       vector=246
   sysjitter-2459  [019]   170.703750: irq_work_exit:        vector=246
   sysjitter-2459  [019]   170.703751: tick_stop:            success=0 dependency=SCHED
   sysjitter-2459  [019]   170.703753: sched_switch:         sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120]
ksoftirqd/19-125   [019]   170.703755: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.703756: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/19-125   [019]   170.703757: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.703758: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/19-125   [019]   170.703759: sched_switch:         ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120]
   sysjitter-2459  [019]   170.704500: call_function_single_entry: vector=251
   sysjitter-2459  [019]   170.704502: call_function_single_exit: vector=251
   sysjitter-2459  [019]   170.704504: tick_stop:            success=1 dependency=NONE
...

Luckily sometimes it happens (like in the above) that an interrupt fires
on a CPU while ksoftirqd is not executing, SCHED dep is seen as cleared
and the CPU enters full nohz. I've however observed pathological cases
where CPUs aren't able to enter back full nohz for a long time.

Hope my understanding is correct. Please find below a naive attempt to
fix the problem. I'm pretty sure I'm missing details, e.g. I'm not sure
it is safe to possibly stop the tick remotely. Consider the following as
not more than another way of pointing the problem out. :-)

Best,

Juri

1 - https://github.com/alexeiz/sysjitter
---
 kernel/time/tick-sched.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..0d696e55d774 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -342,11 +342,16 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu);
 
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu);
+
 void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
 {
 	struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
 
 	atomic_andnot(BIT(bit), &ts->tick_dep_mask);
+
+	if (!in_irq())
+		tick_nohz_full_update_tick_cpu(ts, cpu);
 }
 EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
 
@@ -870,11 +875,9 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_nohz_restart(ts, now);
 }
 
-static void tick_nohz_full_update_tick(struct tick_sched *ts)
+static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	int cpu = smp_processor_id();
-
 	if (!tick_nohz_full_cpu(cpu))
 		return;
 
@@ -888,6 +891,15 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 #endif
 }
 
+static void tick_nohz_full_update_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	int cpu = smp_processor_id();
+
+	tick_nohz_full_update_tick_cpu(ts, cpu);
+#endif
+}
+
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
-- 
2.25.3


             reply index

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-20 14:04 Juri Lelli [this message]
2020-05-20 16:24 ` Frederic Weisbecker
2020-05-20 16:49   ` Juri Lelli
2020-05-20 17:02     ` Frederic Weisbecker
2020-05-20 18:47       ` Juri Lelli
2020-05-21  0:44         ` Frederic Weisbecker
2020-05-21 17:00           ` Juri Lelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200520140402.358880-1-juri.lelli@redhat.com \
    --to=juri.lelli@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-rt-users archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git