* [RFC PATCH v4 0/3] Softirq -rt Optimizations
@ 2022-10-03 23:20 John Stultz
  2022-10-03 23:20 ` [RFC PATCH v4 1/3] softirq: Add generic accessor to percpu softirq_pending data John Stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: John Stultz @ 2022-10-03 23:20 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz

Hey all,

This series optimizes scheduler decisions around realtime tasks and
softirqs. It is a rebased and reworked set of changes that have been
shipping on Android devices for a number of years, originally created
to resolve audio glitches seen on devices, caused by softirqs for
network or storage drivers.

Long-running softirqs cause issues because they are not currently
taken into account when a realtime task is woken up; if a realtime
task is placed on a cpu currently running a softirq, it will be
delayed from running.

This can easily be seen on some devices by running cyclictest* along
with some heavy background filesystem noise:

Without the patches:
T: 0 ( 7596) P:99 I:1000 C:  59980 Min:      7 Act:   13 Avg:   29 Max: 4107
T: 1 ( 7597) P:99 I:1500 C:  39990 Min:     14 Act:   26 Avg:   36 Max: 8994
T: 2 ( 7598) P:99 I:2000 C:  29995 Min:      7 Act:   22 Avg:   35 Max: 3616
T: 3 ( 7599) P:99 I:2500 C:  23915 Min:      7 Act:   25 Avg:   49 Max: 40273
T: 4 ( 7600) P:99 I:3000 C:  19995 Min:      8 Act:   22 Avg:   38 Max: 10510
T: 5 ( 7601) P:99 I:3500 C:  17135 Min:      7 Act:   26 Avg:   39 Max: 13194
T: 6 ( 7602) P:99 I:4000 C:  14990 Min:      7 Act:   26 Avg:   40 Max: 9470
T: 7 ( 7603) P:99 I:4500 C:  13318 Min:      8 Act:   29 Avg:   44 Max: 20101

Which you can visually see in the image here:
 https://github.com/johnstultz-work/misc/raw/main/images/2022-08-09-softirq-rt-big-latency.png

Which is from the perfetto trace captured here:
 https://ui.perfetto.dev/#!/?s=33661aec8ec82c2da0a59263f36f7d72b4a2f4e7a99b28b222bd12ad872f

The first patch adds a bit of generic infrastructure to get the per-cpu
softirq_pending flag.

The second patch in the series adds logic to account for when
softirqs are running, and then, conditionally based on
CONFIG_RT_SOFTIRQ_OPTIMIZATION, makes rt-task placement aware of
whether a currently running softirq might be a long-running one, so
the rt task can potentially be placed on another free core.

The third patch in the series adds logic in __do_softirq(), also under
CONFIG_RT_SOFTIRQ_OPTIMIZATION, to defer some of the potentially
long-running softirqs to ksoftirqd if a -rt task is currently running
on the cpu. This patch also includes a folded-in fix that stubs out
ksoftirqd_running() based on CONFIG_RT_SOFTIRQ_OPTIMIZATION, since in
changing to more frequently defer long-running softirqs, the logic
using ksoftirqd_running() would end up being too conservative and
needlessly delay shorter-running softirqs.
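
To sketch the combined effect in code (illustrative only; the actual
logic is in the patches below):

  /* Placement side (patch 2): avoid cpus busy with long softirqs */
  static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
  {
	return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);
  }

  /* Deferral side (patch 3): from __do_softirq(), push potentially
   * long-running softirqs out to ksoftirqd while a -rt task runs:
   */
  if (rt_task(current)) {
	deferred = pending & LONG_SOFTIRQ_MASK;
	pending &= ~LONG_SOFTIRQ_MASK;
  }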

With these patches we see dramatic improvements in the worst case
latencies in the cyclictest* + filesystem noise test above:

With the patches:
T: 0 ( 7527) P:99 I:1000 C:  59998 Min:      6 Act:   29 Avg:   35 Max: 1734
T: 1 ( 7528) P:99 I:1500 C:  40000 Min:      7 Act:   39 Avg:   35 Max: 1181
T: 2 ( 7529) P:99 I:2000 C:  30000 Min:      7 Act:   25 Avg:   25 Max: 444
T: 3 ( 7530) P:99 I:2500 C:  24000 Min:      7 Act:   34 Avg:   36 Max: 1729
T: 4 ( 7531) P:99 I:3000 C:  20000 Min:      7 Act:   36 Avg:   25 Max: 406
T: 5 ( 7532) P:99 I:3500 C:  17143 Min:      7 Act:   38 Avg:   34 Max: 1264
T: 6 ( 7533) P:99 I:4000 C:  15000 Min:      7 Act:   27 Avg:   33 Max: 2351
T: 7 ( 7534) P:99 I:4500 C:  13334 Min:      7 Act:   41 Avg:   29 Max: 2285

Since these patches have been carried along for years, and have at times
badly collided with upstream, I wanted to submit them for some initial
review, discussion and feedback so we could hopefully eventually find a
reasonable solution that might land upstream.


* Unfortunately cyclictest had a bug that caused it to always affine
threads to cpus, preventing them from being migrated, so you'll need
to update to the latest version (which includes a fix) to reproduce.

Let me know what you think!

thanks
-john


Changes in v4:
* Drop TASKLET_SOFTIRQ from LONG_SOFTIRQS (suggested by Qais)
* Depend on !PREEMPT_RT (Suggested by Qais)
* Larger simplification of logic (suggested by Qais)
* Rework LONG_SOFTIRQS to use BIT() macros
* Rename task_may_preempt() to cpu_busy_with_softirqs()
* Fix commit message to accurately note long-running softirqs
  (suggested by Qais)
* Switch to using rt_task(current) (suggested by Qais)


Connor O'Brien (1):
  sched: Avoid placing RT threads on cores handling long softirqs

John Stultz (1):
  softirq: Add generic accessor to percpu softirq_pending data

Pavankumar Kondeti (1):
  softirq: defer softirq processing to ksoftirqd if CPU is busy with RT

 arch/s390/include/asm/hardirq.h |  6 ++++
 include/linux/interrupt.h       | 17 +++++++++
 init/Kconfig                    | 10 ++++++
 kernel/sched/rt.c               | 61 ++++++++++++++++++++++++++++-----
 kernel/softirq.c                | 38 ++++++++++++++++++--
 5 files changed, 121 insertions(+), 11 deletions(-)

-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [RFC PATCH v4 1/3] softirq: Add generic accessor to percpu softirq_pending data
  2022-10-03 23:20 [RFC PATCH v4 0/3] Softirq -rt Optimizations John Stultz
@ 2022-10-03 23:20 ` John Stultz
  2022-10-03 23:20 ` [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs John Stultz
  2022-10-03 23:20 ` [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT John Stultz
  2 siblings, 0 replies; 29+ messages in thread
From: John Stultz @ 2022-10-03 23:20 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, John Dias, Connor O'Brien, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner,
	Heiko Carstens, Vasily Gorbik, kernel-team, kernel test robot

In a previous iteration of this patch series, I was checking:

   per_cpu(irq_stat, cpu).__softirq_pending

which resulted in build errors on s390.

This patch tries to create a generic accessor to this percpu
softirq_pending data.

This interface is inherently racy as it's reading percpu data
without a lock. However, being able to peek at the softirq
pending data allows us to make better decisions about rt task
placement vs just ignoring it.
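
For reference, the second patch in this series ends up using it
roughly like so when sizing up a remote cpu:

   u32 softirqs = per_cpu(active_softirqs, cpu) |
		  __cpu_softirq_pending(cpu);

   /* a cpu with (softirqs & LONG_SOFTIRQ_MASK) set is avoided */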

On s390 this call returns 0, which maybe isn't ideal but
results in no functional change from what we do now.

TODO: Heiko suggested changing s390 to use a proper per-cpu
irqstat variable instead.

Feedback or suggestions for better approach here would be
welcome!

Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: kernel-team@android.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: John Stultz <jstultz@google.com>
---
 arch/s390/include/asm/hardirq.h |  6 ++++++
 include/linux/interrupt.h       | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
index 58668ffb5488..cd9cc11588ab 100644
--- a/arch/s390/include/asm/hardirq.h
+++ b/arch/s390/include/asm/hardirq.h
@@ -16,6 +16,12 @@
 #define local_softirq_pending() (S390_lowcore.softirq_pending)
 #define set_softirq_pending(x) (S390_lowcore.softirq_pending = (x))
 #define or_softirq_pending(x)  (S390_lowcore.softirq_pending |= (x))
+/*
+ *  Not sure what the right thing is here  for s390,
+ *  but returning 0 will result in no logical change
+ *  from what happens now
+ */
+#define __cpu_softirq_pending(x) (0)
 
 #define __ARCH_IRQ_STAT
 #define __ARCH_IRQ_EXIT_IRQS_DISABLED
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce40b04b..a749a8663841 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -527,6 +527,17 @@ DECLARE_STATIC_KEY_FALSE(force_irqthreads_key);
 #define set_softirq_pending(x)	(__this_cpu_write(local_softirq_pending_ref, (x)))
 #define or_softirq_pending(x)	(__this_cpu_or(local_softirq_pending_ref, (x)))
 
+/**
+ * __cpu_softirq_pending() - Checks to see if softirq is pending on a cpu
+ *
+ * This helper is inherently racy, as we're accessing per-cpu data w/o locks.
+ * But peeking at the flag can still be useful when deciding where to place a
+ * task.
+ */
+static inline u32 __cpu_softirq_pending(int cpu)
+{
+	return (u32)per_cpu(local_softirq_pending_ref, cpu);
+}
 #endif /* local_softirq_pending */
 
 /* Some architectures might implement lazy enabling/disabling of
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-03 23:20 [RFC PATCH v4 0/3] Softirq -rt Optimizations John Stultz
  2022-10-03 23:20 ` [RFC PATCH v4 1/3] softirq: Add generic accessor to percpu softirq_pending data John Stultz
@ 2022-10-03 23:20 ` John Stultz
       [not found]   ` <20221004013611.1822-1-hdanton@sina.com>
                     ` (2 more replies)
  2022-10-03 23:20 ` [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT John Stultz
  2 siblings, 3 replies; 29+ messages in thread
From: John Stultz @ 2022-10-03 23:20 UTC (permalink / raw)
  To: LKML
  Cc: Connor O'Brien, John Dias, Rick Yiu, John Kacur, Qais Yousef,
	Chris Redpath, Abhijeet Dharmapurikar, Peter Zijlstra,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Thomas Gleixner, kernel-team, J . Avila,
	John Stultz

From: Connor O'Brien <connoro@google.com>

In certain audio use cases, scheduling RT threads on cores that
are handling softirqs can lead to glitches. Prevent this
behavior in cases where the softirq is likely to take a long
time. To avoid unnecessary migrations, the old behavior is
preserved for RCU, SCHED and TIMER irqs which are expected to be
relatively quick.

This patch reworks and combines two related changes originally
by John Dias <joaodias@google.com>.

Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com
Signed-off-by: John Dias <joaodias@google.com>
[elavila: Port to mainline, amend commit text]
Signed-off-by: J. Avila <elavila@google.com>
[connoro: Reworked, simplified, and merged two patches together]
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: Further simplified and fixed issues, reworded commit
 message, removed arm64-isms]
Signed-off-by: John Stultz <jstultz@google.com>
---
v2:
* Reformatted Kconfig entry to match coding style
  (Reported-by: Randy Dunlap <rdunlap@infradead.org>)
* Made rt_task_fits_capacity_and_may_preempt static to
  avoid warnings (Reported-by: kernel test robot <lkp@intel.com>)
* Rework to use preempt_count and drop kconfig dependency on ARM64
v3:
* Use introduced __cpu_softirq_pending() to avoid s390 build
  issues (Reported-by: kernel test robot <lkp@intel.com>)
v4:
* Drop TASKLET_SOFTIRQ from LONG_SOFTIRQS (suggested by Qais)
* Depend on !PREEMPT_RT (Suggested by Qais)
* Larger simplification of logic (suggested by Qais)
* Rework LONG_SOFTIRQS to use BIT() macros
* Rename task_may_preempt() to cpu_busy_with_softirqs()
---
 include/linux/interrupt.h |  6 ++++
 init/Kconfig              | 10 +++++++
 kernel/sched/rt.c         | 61 +++++++++++++++++++++++++++++++++------
 kernel/softirq.c          |  9 ++++++
 4 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a749a8663841..e3a4add67e8c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -582,6 +582,11 @@ enum
  * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
  */
 #define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ))
+/* Softirq's where the handling might be long: */
+#define LONG_SOFTIRQ_MASK (BIT(NET_TX_SOFTIRQ)    | \
+			   BIT(NET_RX_SOFTIRQ)    | \
+			   BIT(BLOCK_SOFTIRQ)     | \
+			   BIT(IRQ_POLL_SOFTIRQ))
 
 /* map softirq index to softirq name. update 'softirq_to_name' in
  * kernel/softirq.c when adding a new softirq.
@@ -617,6 +622,7 @@ extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 
 DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+DECLARE_PER_CPU(u32, active_softirqs);
 
 static inline struct task_struct *this_cpu_ksoftirqd(void)
 {
diff --git a/init/Kconfig b/init/Kconfig
index 532362fcfe31..3d1de6edcfa1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1284,6 +1284,16 @@ config SCHED_AUTOGROUP
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config RT_SOFTIRQ_OPTIMIZATION
+	bool "Improve RT scheduling during long softirq execution"
+	depends on SMP && !PREEMPT_RT
+	default n
+	help
+	  Enable an optimization which tries to avoid placing RT tasks on CPUs
+	  occupied by nonpreemptible tasks, such as a long softirq or CPUs
+	  which may soon block preemptions, such as a CPU running a ksoftirq
+	  thread which handles slow softirqs.
+
 config SYSFS_DEPRECATED
 	bool "Enable deprecated sysfs features to support old userspace tools"
 	depends on SYSFS
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 55f39c8f4203..3c628db807c8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1599,6 +1599,44 @@ static void yield_task_rt(struct rq *rq)
 #ifdef CONFIG_SMP
 static int find_lowest_rq(struct task_struct *task);
 
+#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
+#define __use_softirq_opt 1
+/*
+ * Return whether the given cpu is currently non-preemptible
+ * while handling a potentially long softirq, or if the current
+ * task is likely to block preemptions soon because it is a
+ * ksoftirq thread that is handling slow softirq.
+ */
+static bool cpu_busy_with_softirqs(int cpu)
+{
+	u32 softirqs = per_cpu(active_softirqs, cpu) |
+		       __cpu_softirq_pending(cpu);
+	struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
+	struct task_struct *curr;
+	struct rq *rq = cpu_rq(cpu);
+	int ret;
+
+	rcu_read_lock();
+	curr = READ_ONCE(rq->curr); /* unlocked access */
+	ret = (softirqs & LONG_SOFTIRQ_MASK) &&
+		 (curr == cpu_ksoftirqd ||
+		  preempt_count() & SOFTIRQ_MASK);
+	rcu_read_unlock();
+	return ret;
+}
+#else
+#define __use_softirq_opt 0
+static bool cpu_busy_with_softirqs(int cpu)
+{
+	return false;
+}
+#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
+
+static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
+{
+	return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);
+}
+
 static int
 select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 {
@@ -1637,22 +1675,24 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 	 * This test is optimistic, if we get it wrong the load-balancer
 	 * will have to sort it out.
 	 *
-	 * We take into account the capacity of the CPU to ensure it fits the
-	 * requirement of the task - which is only important on heterogeneous
-	 * systems like big.LITTLE.
+	 * We use rt_task_fits_cpu() to evaluate if the CPU is busy with
+	 * potentially long-running softirq work, as well as take into
+	 * account the capacity of the CPU to ensure it fits the
+	 * requirement of the task - which is only important on
+	 * heterogeneous systems like big.LITTLE.
 	 */
 	test = curr &&
 	       unlikely(rt_task(curr)) &&
 	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
 
-	if (test || !rt_task_fits_capacity(p, cpu)) {
+	if (test || !rt_task_fits_cpu(p, cpu)) {
 		int target = find_lowest_rq(p);
 
 		/*
 		 * Bail out if we were forcing a migration to find a better
 		 * fitting CPU but our search failed.
 		 */
-		if (!test && target != -1 && !rt_task_fits_capacity(p, target))
+		if (!test && target != -1 && !rt_task_fits_cpu(p, target))
 			goto out_unlock;
 
 		/*
@@ -1894,14 +1934,17 @@ static int find_lowest_rq(struct task_struct *task)
 		return -1; /* No other targets possible */
 
 	/*
-	 * If we're on asym system ensure we consider the different capacities
-	 * of the CPUs when searching for the lowest_mask.
+	 * If we're using the softirq optimization or if we are
+	 * on asym system, ensure we consider the softirq processing
+	 * or different capacities of the CPUs when searching for the
+	 * lowest_mask.
 	 */
-	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
+	if (__use_softirq_opt ||
+	    static_branch_unlikely(&sched_asym_cpucapacity)) {
 
 		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri,
 					  task, lowest_mask,
-					  rt_task_fits_capacity);
+					  rt_task_fits_cpu);
 	} else {
 
 		ret = cpupri_find(&task_rq(task)->rd->cpupri,
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8a6913c067d..35ee79dd8786 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -60,6 +60,13 @@ static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
+/*
+ * active_softirqs -- per cpu, a mask of softirqs that are being handled,
+ * with the expectation that approximate answers are acceptable and therefore
+ * no synchronization.
+ */
+DEFINE_PER_CPU(u32, active_softirqs);
+
 const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
 	"TASKLET", "SCHED", "HRTIMER", "RCU"
@@ -551,6 +558,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 restart:
 	/* Reset the pending bitmask before enabling irqs */
 	set_softirq_pending(0);
+	__this_cpu_write(active_softirqs, pending);
 
 	local_irq_enable();
 
@@ -580,6 +588,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		pending >>= softirq_bit;
 	}
 
+	__this_cpu_write(active_softirqs, 0);
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
 	    __this_cpu_read(ksoftirqd) == current)
 		rcu_softirq_qs();
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-03 23:20 [RFC PATCH v4 0/3] Softirq -rt Optimizations John Stultz
  2022-10-03 23:20 ` [RFC PATCH v4 1/3] softirq: Add generic accessor to percpu softirq_pending data John Stultz
  2022-10-03 23:20 ` [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs John Stultz
@ 2022-10-03 23:20 ` John Stultz
  2022-10-04 10:45   ` David Laight
  2022-10-10 16:09   ` Qais Yousef
  2 siblings, 2 replies; 29+ messages in thread
From: John Stultz @ 2022-10-03 23:20 UTC (permalink / raw)
  To: LKML
  Cc: Pavankumar Kondeti, John Dias, Connor O'Brien, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	Satya Durga Srinivasu Prabhala, J . Avila, John Stultz

From: Pavankumar Kondeti <pkondeti@codeaurora.org>

Defer the softirq processing to ksoftirqd if an RT task is
running or queued on the current CPU. This complements the RT
task placement algorithm which tries to find a CPU that is not
currently busy with softirqs.

Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
are deferred, as they can potentially run for a long time.

Additionally, this patch stubs out ksoftirqd_running() logic,
in the CONFIG_RT_SOFTIRQ_OPTIMIZATION case, as deferring
potentially long-running softirqs will cause the logic to not
process shorter-running softirqs immediately. By stubbing it out
the potentially long running softirqs are deferred, but the
shorter running ones can still run immediately.

This patch includes folded-in fixes by:
  Lingutla Chandrasekhar <clingutla@codeaurora.org>
  Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
  J. Avila <elavila@google.com>

Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution.]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline, squash with bugfix]
Signed-off-by: J. Avila <elavila@google.com>
[jstultz: Rebase to linus/HEAD, minor rearranging of code,
 included bug fix Reported-by: Qais Yousef <qais.yousef@arm.com> ]
Signed-off-by: John Stultz <jstultz@google.com>
---
v4:
* Fix commit message to accurately note long-running softirqs
  (suggested by Qais)
* Switch to using rt_task(current) (suggested by Qais)
---
 kernel/softirq.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 35ee79dd8786..c8ce12bbab04 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -87,6 +87,7 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
+#ifndef CONFIG_RT_SOFTIRQ_OPTIMIZATION
 /*
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
  * right now. Let ksoftirqd handle this at its own rate, to get fairness,
@@ -101,6 +102,9 @@ static bool ksoftirqd_running(unsigned long pending)
 		return false;
 	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
 }
+#else
+#define ksoftirqd_running(pending) (false)
+#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
@@ -532,6 +536,21 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
+static __u32 softirq_deferred_for_rt(__u32 *pending)
+{
+	__u32 deferred = 0;
+
+	if (rt_task(current)) {
+		deferred = *pending & LONG_SOFTIRQ_MASK;
+		*pending &= ~LONG_SOFTIRQ_MASK;
+	}
+	return deferred;
+}
+#else
+#define softirq_deferred_for_rt(x) (0)
+#endif
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
@@ -539,6 +558,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
 	bool in_hardirq;
+	__u32 deferred;
 	__u32 pending;
 	int softirq_bit;
 
@@ -550,14 +570,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
 
 	softirq_handle_begin();
+
 	in_hardirq = lockdep_softirq_start();
 	account_softirq_enter(current);
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	set_softirq_pending(deferred);
 	__this_cpu_write(active_softirqs, pending);
 
 	local_irq_enable();
@@ -596,13 +618,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	local_irq_disable();
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
+
 	if (pending) {
 		if (time_before(jiffies, end) && !need_resched() &&
 		    --max_restart)
 			goto restart;
+	}
 
+	if (pending | deferred)
 		wakeup_softirqd();
-	}
 
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* Re: [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
       [not found]   ` <20221004013611.1822-1-hdanton@sina.com>
@ 2022-10-04  2:29     ` John Stultz
       [not found]       ` <20221005002149.1876-1-hdanton@sina.com>
  0 siblings, 1 reply; 29+ messages in thread
From: John Stultz @ 2022-10-04  2:29 UTC (permalink / raw)
  To: Hillf Danton
  Cc: LKML, linux-mm, Connor O'Brien, Qais Yousef, Peter Zijlstra,
	Steven Rostedt

On Mon, Oct 3, 2022 at 6:56 PM Hillf Danton <hdanton@sina.com> wrote:
> On 3 Oct 2022 23:20:32 +0000 John Stultz <jstultz@google.com>
> > +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> > +#define __use_softirq_opt 1
> > +/*
> > + * Return whether the given cpu is currently non-preemptible
> > + * while handling a potentially long softirq, or if the current
> > + * task is likely to block preemptions soon because it is a
> > + * ksoftirq thread that is handling slow softirq.
> > + */
> > +static bool cpu_busy_with_softirqs(int cpu)
> > +{
> > +     u32 softirqs = per_cpu(active_softirqs, cpu) |
> > +                    __cpu_softirq_pending(cpu);
> > +     struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
> > +     struct task_struct *curr;
> > +     struct rq *rq = cpu_rq(cpu);
> > +     int ret;
> > +
> > +     rcu_read_lock();
> > +     curr = READ_ONCE(rq->curr); /* unlocked access */
> > +     ret = (softirqs & LONG_SOFTIRQ_MASK) &&
> > +              (curr == cpu_ksoftirqd ||
> > +               preempt_count() & SOFTIRQ_MASK);
> > +     rcu_read_unlock();
> > +     return ret;
> > +}
> > +#else
> > +#define __use_softirq_opt 0
> > +static bool cpu_busy_with_softirqs(int cpu)
> > +{
> > +     return false;
> > +}
> > +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
> > +
> > +static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
> > +{
> > +     return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);
> > +}
>
> On one hand, RT task is not latency sensitive enough if it fails to preempt
> ksoftirqd. On the other, deferring softirq to ksoftirqd barely makes sense
> in 3/3 if it preempts the current RT task.

Apologies, I'm not sure I'm following you here. Why would ksoftirqd
preempt the rt task?

thanks
-john


* RE: [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-03 23:20 ` [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT John Stultz
@ 2022-10-04 10:45   ` David Laight
  2022-10-04 19:19     ` John Stultz
  2022-10-10 16:09   ` Qais Yousef
  1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2022-10-04 10:45 UTC (permalink / raw)
  To: 'John Stultz', LKML
  Cc: Pavankumar Kondeti, John Dias, Connor O'Brien, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	Satya Durga Srinivasu Prabhala, J . Avila

From: John Stultz
> Sent: 04 October 2022 00:21
> 
> From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> 
> Defer the softirq processing to ksoftirqd if an RT task is
> running or queued on the current CPU. This complements the RT
> task placement algorithm which tries to find a CPU that is not
> currently busy with softirqs.
> 
> Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> are deferred, as they can potentially run for a long time.

Deferring NET_RX to ksoftirqd stops the NET_RX code from
running until the RT process completes.

This has exactly the same problems as the softint taking priority
over the RT task - just the other way around.

Under very high traffic loads receive packets get lost.
In many cases that is actually far worse than the wakeup
of an RT process being delayed slightly.

There is no 'one size fits all' answer to the problem.

Plausibly, making this depend on the priority of the RT task
might be useful.
But sometimes it depends on the actual reason for the
wakeup.
For instance a wakeup from an ISR or a high-res timer
might need lower latency than one from a futex.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-04 10:45   ` David Laight
@ 2022-10-04 19:19     ` John Stultz
  0 siblings, 0 replies; 29+ messages in thread
From: John Stultz @ 2022-10-04 19:19 UTC (permalink / raw)
  To: David Laight
  Cc: LKML, John Dias, Connor O'Brien, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Tue, Oct 4, 2022 at 3:45 AM David Laight <David.Laight@aculab.com> wrote:
> From: John Stultz
> > Sent: 04 October 2022 00:21
> >
> > From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> >
> > Defer the softirq processing to ksoftirqd if an RT task is
> > running or queued on the current CPU. This complements the RT
> > task placement algorithm which tries to find a CPU that is not
> > currently busy with softirqs.
> >
> > Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> > are deferred, as they can potentially run for a long time.
>
> Deferring NET_RX to ksoftirqd stops the NET_RX code from
> running until the RT process completes.
>
> This has exactly the same problems as the softint taking priority
> over the RT task - just the other way around.
>
> Under very high traffic loads receive packets get lost.
> In many cases that is actually far worse than the wakeup
> of an RT process being delayed slightly.
>
> There is no 'one size fits all' answer to the problem.

I'm not claiming there is a one size fits all solution, just like
there are cases where PREEMPT_RT is the right choice and cases where
it is not.

I'm only proposing we add this build-time configurable option, because
the issues they address do affect Android devices and we need some
sort of solution (though I'm open to alternatives).

> Plausibly depending on the priority of the RT task
> might be useful.
> But sometimes it depends on the actual reason for the
> wakeup.
> For instance a wakeup from an ISR or a hish-res timer
> might need lower latency than one from a futex.

I mean, with the PREEMPT_RT series years ago there used to be
configuration guides on setting the rtprio for each softirq thread to
provide tuning knobs like you mention, but I'm not sure what the
current state of that is.

If you're wanting more flexibility from the proposed patch, I guess we
could have a knob to runtime set which softirqs would be deferred by
rt tasks, but I'm not sure if folks want to expose that level of
detail of the kernel's workings as a UABI, so having it be a kernel
build option avoids that.

thanks
-john


* Re: [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
       [not found]       ` <20221005002149.1876-1-hdanton@sina.com>
@ 2022-10-05  1:13         ` John Stultz
       [not found]         ` <20221005060155.1571-1-hdanton@sina.com>
  1 sibling, 0 replies; 29+ messages in thread
From: John Stultz @ 2022-10-05  1:13 UTC (permalink / raw)
  To: Hillf Danton
  Cc: LKML, linux-mm, Connor O'Brien, Qais Yousef, Peter Zijlstra,
	Steven Rostedt

On Tue, Oct 4, 2022 at 5:22 PM Hillf Danton <hdanton@sina.com> wrote:
> On 3 Oct 2022 19:29:36 -0700 John Stultz <jstultz@google.com>
> >
> > Why would ksoftirqd preempt the rt task?
> >
> For example the kthread becomes sensitive to latency.

Again, my apologies, I'm still not sure I am following. Could you
expand a bit on the case you're concerned about? Is it the case where
the ksoftirqd thread is configured to run at higher rtprio?

thanks
-john


* Re: [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
       [not found]         ` <20221005060155.1571-1-hdanton@sina.com>
@ 2022-10-10 15:42           ` Qais Yousef
       [not found]             ` <20221011111846.284-1-hdanton@sina.com>
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2022-10-10 15:42 UTC (permalink / raw)
  To: Hillf Danton
  Cc: John Stultz, LKML, linux-mm, Connor O'Brien, Peter Zijlstra,
	Steven Rostedt

On 10/05/22 14:01, Hillf Danton wrote:
> On 4 Oct 2022 18:13:52 -0700 John Stultz <jstultz@google.com>
> > On Tue, Oct 4, 2022 at 5:22 PM Hillf Danton <hdanton@sina.com> wrote:
> > > On 3 Oct 2022 19:29:36 -0700 John Stultz <jstultz@google.com>
> > > >
> > > > Why would ksoftirqd preempt the rt task?
> > > >
> > > For example the kthread becomes sensitive to latency.
> > 
> > Is it the case where
> > the ksoftirqd thread is configured to run at higher rtprio?
> > 
> Yes, you are right.

I don't see a problem here. If a sys-admin configures their ksoftirqds to be
higher priority RT tasks than the audio threads, then they better know what
they're doing :-)

The issue at hand here is that softirq boundedness is hard to control. And
the scheduling delays that ensue are hard for any sys-admin to deal with.

Networking has actually introduced some knobs to help control that - but the
tricky bit is still being able to deliver high throughput networking while
keeping the softirq bounded to minimize scheduling delays/latencies. I think
even for PREEMPT_RT, high performance networking could be impacted to achieve
the required low latency.

See this paper which explores this duality:

	https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.702.7571&rep=rep1&type=pdf


With WiFi 6 and 5G mobile networks, phones are actually expected to deliver
multi-gigabit network throughputs.


Cheers

--
Qais Yousef


* Re: [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-03 23:20 ` [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT John Stultz
  2022-10-04 10:45   ` David Laight
@ 2022-10-10 16:09   ` Qais Yousef
  2022-10-17 14:44     ` Qais Yousef
  1 sibling, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2022-10-10 16:09 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Pavankumar Kondeti, John Dias, Connor O'Brien,
	Rick Yiu, John Kacur, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	Satya Durga Srinivasu Prabhala, J . Avila

Hi John

On 10/03/22 23:20, John Stultz wrote:
> From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> 
> Defer the softirq processing to ksoftirqd if an RT task is
> running or queued on the current CPU. This complements the RT
> task placement algorithm which tries to find a CPU that is not
> currently busy with softirqs.
> 
> Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> are deferred, as they can potentially run for a long time.
> 
> Additionally, this patch stubs out ksoftirqd_running() logic,
> in the CONFIG_RT_SOFTIRQ_OPTIMIZATION case, as deferring
> potentially long-running softirqs will cause the logic to not
> process shorter-running softirqs immediately. By stubbing it out
> the potentially long running softirqs are deferred, but the
> shorter running ones can still run immediately.

The cover letter didn't make it to my inbox (nor to others AFAICS from lore),
so replying here.

The series looks good to me. It offers a compromise to avoid an existing
conflict between RT and softirqs without disrupting much how both inherently
work. I guess it's up to the maintainers to decide if this direction is
acceptable or not.

I've seen Thomas had a comment on another softirq series (one which
attempts to throttle them ;-) that is worth looking at, by the way:

	https://lore.kernel.org/lkml/877d81jc13.ffs@tglx/


Meanwhile, I did run a few tests on a test laptop with a 2-core SMT2
i7 (since I assume you tested on Android)

I set priority to 1 for all of these cyclic tests.

First I ran without applying your patch to fix the affinity problem in
cyclictest:

I had a 1 hour run of 4 threads - 4 iperf threads and 4 dd threads are
doing work in the background:

                   |     vanilla     | with softirq patches v4 |
-------------------|-----------------|-------------------------|
t0 max delay (us)  |      6728       |         2096            |
t1 max delay (us)  |      2545       |         1990            |
t2 max delay (us)  |      2282       |         2094            |
t3 max delay (us)  |      6038       |         2162            |

Which shows max latency is improved a lot. Though because I missed applying
your cyclictest patch, I believe this can be attributed only to patch 3, which
defers the softirq if the current task is an RT one.


I applied your patch to cyclictest to NOT force affinity when specifying the
-t option.



Ran cyclictest for 4 hours, -t 3, 3 iperf threads and 3 dd threads running in
the background:

                   |     vanilla     | with softirq patches v4 |
-------------------|-----------------|-------------------------|
t0 max delay (us)  |      2656       |         2164            |
t1 max delay (us)  |      2773       |         1982            |
t2 max delay (us)  |      2272       |         2278            |

I didn't hit a big max delay on this case. It shows things are better still.



Ran another cyclictest for 4 hours, -t 4, 4 iperf threads and 4 dd threads in
the background:

                   |     vanilla     | with softirq patches v4 |
-------------------|-----------------|-------------------------|
t0 max delay (us)  |      4012       |         7074            |
t1 max delay (us)  |      2460       |         9088            |
t2 max delay (us)  |      3755       |         2784            |
t3 max delay (us)  |      4392       |         2295            |

Here the results worryingly show that applying the patches makes things much
worse.

I still have to investigate why. I'll have another run to see if the results
look different, then try to dig more.

All results are from the cyclictest json dump.


Cheers

--
Qais Yousef


* Re: [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
       [not found]             ` <20221011111846.284-1-hdanton@sina.com>
@ 2022-10-12 14:10               ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2022-10-12 14:10 UTC (permalink / raw)
  To: Hillf Danton
  Cc: John Stultz, LKML, linux-mm, Connor O'Brien, Peter Zijlstra,
	Steven Rostedt

On 10/11/22 19:18, Hillf Danton wrote:

[...]

> > The issue at hand here is that softirq boundedness is hard to control. And
> > the scheduling delays that ensue are hard for any sys-admin to deal with.
> > 
> Given "The RT scheduler is for tasks that require strick scheduling
> requirements over all else, including performance." [1], I would invite
> Steve to shed some more light on the relation between RT scheduler and
> performance/network throughputs.
> 
> Bonus question, why softirq took no care of RT tasks, given strict
> scheduling requirements above?
> 
> [1] https://lore.kernel.org/lkml/257E96C2-6ABD-4DD6-9B7F-771DA3D1BAAA@goodmis.org/

I think you're mixing the two patches up. That patch is to help describe some
latency requirements for CFS tasks. It has nothing to do with RT. Your
suggestion to use the RT scheduler is not valid as these tasks can't be
converted to RT. Which is what Steve was trying to say, IIUC.

Generally converting normal application tasks into RT is a recipe for disaster
because:

	1. Misbehaving RT tasks (busy looping indefinitely) can grind the system
	   to a halt.
	2. How do you manage priorities of all these pseudo-random RT tasks
	   each application spawns so you end up with correct resource sharing?

ie: using RT policy is a privileged operation for a reason :-)

The target audience for latency_nice is the average Joe task from any
application that might have some latency requirements to deliver a better user
experience. ie: it's not a strict latency requirement. We just want to handle
delays within the _CFS_ part of the scheduler.

By the way, your emails don't seem to make it to LKML for some reason; they
show as '[not found]' on lore.

	https://lore.kernel.org/lkml/20221010154216.6mw7fszdaoajurvm@wubuntu/#r


Thanks

--
Qais Yousef


* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-03 23:20 ` [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs John Stultz
       [not found]   ` <20221004013611.1822-1-hdanton@sina.com>
@ 2022-10-17 12:40   ` Alexander Gordeev
  2022-10-18  3:42     ` John Stultz
  2022-10-22 18:28   ` [RFC PATCH " Joel Fernandes
  2 siblings, 1 reply; 29+ messages in thread
From: Alexander Gordeev @ 2022-10-17 12:40 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Mon, Oct 03, 2022 at 11:20:32PM +0000, John Stultz wrote:
> From: Connor O'Brien <connoro@google.com>

Hi John, Connor,

I took a cursory look and have a couple of hopefully meaningful
comments, but mostly questions.

[...]

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 55f39c8f4203..3c628db807c8 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1599,6 +1599,44 @@ static void yield_task_rt(struct rq *rq)
>  #ifdef CONFIG_SMP
>  static int find_lowest_rq(struct task_struct *task);
>  
> +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> +#define __use_softirq_opt 1
> +/*
> + * Return whether the given cpu is currently non-preemptible
> + * while handling a potentially long softirq, or if the current
> + * task is likely to block preemptions soon because it is a
> + * ksoftirq thread that is handling slow softirq.

What are slow softirqs in this context compared to long?

> + */
> +static bool cpu_busy_with_softirqs(int cpu)
> +{
> +	u32 softirqs = per_cpu(active_softirqs, cpu) |
> +		       __cpu_softirq_pending(cpu);
> +	struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
> +	struct task_struct *curr;
> +	struct rq *rq = cpu_rq(cpu);
> +	int ret;
> +
> +	rcu_read_lock();
> +	curr = READ_ONCE(rq->curr); /* unlocked access */

select_task_rq_rt() takes the lock and reads curr already,
before calling this function. I think there is a way to
decompose it in a better way.

> +	ret = (softirqs & LONG_SOFTIRQ_MASK) &&
> +		 (curr == cpu_ksoftirqd ||

EOL is extra.

> +		  preempt_count() & SOFTIRQ_MASK);

Could you please clarify this whole check in more detail?

What is the point in checking if a remote CPU is handling
a "long" softirq while the local one is handling any softirq?

> +	rcu_read_unlock();

Why does ret need to be calculated under the lock?

> +	return ret;
> +}
> +#else
> +#define __use_softirq_opt 0
> +static bool cpu_busy_with_softirqs(int cpu)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
> +
> +static bool rt_task_fits_cpu(struct task_struct *p, int cpu)

To me, the new name is unfortunate, since it strips the notion
of the reason. Instead of "CPU un/fits, because of capacity", it
reads as "CPU un/fits, because of ..." what?

> +{
> +	return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);

I guess the order needs to be swapped, as rt_task_fits_capacity()
is rather "quick" while cpu_busy_with_softirqs() is rather "slow".

> +}
> +
>  static int
>  select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  {
> @@ -1894,14 +1934,17 @@ static int find_lowest_rq(struct task_struct *task)
>  		return -1; /* No other targets possible */
>  
>  	/*
> -	 * If we're on asym system ensure we consider the different capacities
> -	 * of the CPUs when searching for the lowest_mask.
> +	 * If we're using the softirq optimization or if we are
> +	 * on asym system, ensure we consider the softirq processing
> +	 * or different capacities of the CPUs when searching for the
> +	 * lowest_mask.
>  	 */
> -	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> +	if (__use_softirq_opt ||

Why use __use_softirq_opt and not IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION)?

> +	    static_branch_unlikely(&sched_asym_cpucapacity)) {
>  
>  		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri,
>  					  task, lowest_mask,
> -					  rt_task_fits_capacity);
> +					  rt_task_fits_cpu);
>  	} else {
>  
>  		ret = cpupri_find(&task_rq(task)->rd->cpupri,
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..35ee79dd8786 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -60,6 +60,13 @@ static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp
>  
>  DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
>  
> +/*
> + * active_softirqs -- per cpu, a mask of softirqs that are being handled,
> + * with the expectation that approximate answers are acceptable and therefore
> + * no synchronization.
> + */
> +DEFINE_PER_CPU(u32, active_softirqs);

I guess all active_softirqs uses need to be coupled with
IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION) check.

>  const char * const softirq_to_name[NR_SOFTIRQS] = {
>  	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
>  	"TASKLET", "SCHED", "HRTIMER", "RCU"
> @@ -551,6 +558,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  restart:
>  	/* Reset the pending bitmask before enabling irqs */
>  	set_softirq_pending(0);
> +	__this_cpu_write(active_softirqs, pending);
>  
>  	local_irq_enable();
>  
> @@ -580,6 +588,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  		pending >>= softirq_bit;
>  	}
>  
> +	__this_cpu_write(active_softirqs, 0);
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
>  	    __this_cpu_read(ksoftirqd) == current)
>  		rcu_softirq_qs();

Thanks!


* Re: [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-10 16:09   ` Qais Yousef
@ 2022-10-17 14:44     ` Qais Yousef
  2022-10-18  0:04       ` John Stultz
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2022-10-17 14:44 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Pavankumar Kondeti, John Dias, Connor O'Brien,
	Rick Yiu, John Kacur, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	Satya Durga Srinivasu Prabhala, J . Avila

On 10/10/22 17:09, Qais Yousef wrote:
> Hi John
> 
> On 10/03/22 23:20, John Stultz wrote:
> > From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> > 
> > Defer the softirq processing to ksoftirqd if an RT task is
> > running or queued on the current CPU. This complements the RT
> > task placement algorithm which tries to find a CPU that is not
> > currently busy with softirqs.
> > 
> > Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> > are deferred, as they can potentially run for a long time.
> > 
> > Additionally, this patch stubs out ksoftirqd_running() logic,
> > in the CONFIG_RT_SOFTIRQ_OPTIMIZATION case, as deferring
> > potentially long-running softirqs will cause the logic to not
> > process shorter-running softirqs immediately. By stubbing it out
> > the potentially long running softirqs are deferred, but the
> > shorter running ones can still run immediately.
> 
> The cover letter didn't make it to my inbox (nor to others AFAICS from lore),
> so replying here.
> 
> The series looks good to me. It offers a compromise to avoid an existing
> conflict between RT and softirqs without disrupting much how both inherently
> work. I guess it's up to the maintainers to decide if this direction is
> acceptable or not.
> 
> I've seen Thomas had a comment on another softirq series (one which
> attempts to throttle them ;-) that is worth looking at, by the way:
> 
> 	https://lore.kernel.org/lkml/877d81jc13.ffs@tglx/
> 
> 
> Meanwhile, I did run a few tests on a test laptop with a 2-core SMT2
> i7 (since I assume you tested on Android)
> 
> I set priority to 1 for all of these cyclic tests.
> 
> First I ran without applying your patch to fix the affinity problem in
> cyclictest:
> 
> I had a 1 hour run of 4 threads - 4 iperf threads and 4 dd threads are
> doing work in the background:
> 
>                    |     vanilla     | with softirq patches v4 |
> -------------------|-----------------|-------------------------|
> t0 max delay (us)  |      6728       |         2096            |
> t1 max delay (us)  |      2545       |         1990            |
> t2 max delay (us)  |      2282       |         2094            |
> t3 max delay (us)  |      6038       |         2162            |
> 
> Which shows max latency is improved a lot. Though because I missed applying
> your cyclictest patch, I believe this can be attributed only to patch 3, which
> defers the softirq if the current task is an RT one.
> 
> 
> I applied your patch to cyclictest to NOT force affinity when specifying the
> -t option.
> 
> 
> 
> Ran cyclictest for 4 hours, -t 3, 3 iperf threads and 3 dd threads running in
> the background:
> 
>                    |     vanilla     | with softirq patches v4 |
> -------------------|-----------------|-------------------------|
> t0 max delay (us)  |      2656       |         2164            |
> t1 max delay (us)  |      2773       |         1982            |
> t2 max delay (us)  |      2272       |         2278            |
> 
> I didn't hit a big max delay on this case. It shows things are better still.
> 
> 
> 
> Ran another cyclictest for 4 hours, -t 4, 4 iperf threads and 4 dd threads in
> the background:
> 
>                    |     vanilla     | with softirq patches v4 |
> -------------------|-----------------|-------------------------|
> t0 max delay (us)  |      4012       |         7074            |
> t1 max delay (us)  |      2460       |         9088            |
> t2 max delay (us)  |      3755       |         2784            |
> t3 max delay (us)  |      4392       |         2295            |
> 
> Here the results worryingly show that applying the patches makes things much
> worse.
> 
> I still have to investigate why. I'll have another run to see if the results
> look different, then try to dig more.
> 
> All results are from the cyclictest json dump.

Actually scrap those results. I stupidly forgot to enable
CONFIG_RT_SOFTIRQ_OPTIMIZATION.


I repeated the 4hours, 4-threads test 3 times:

                   |       vanilla      | with softirq patches v4  |
-------------------|--------------------|--------------------------|
                   |  #1  |  #2  |  #3  |   #1   |   #2   |   #3   |
-------------------|------|------|------|--------|--------|--------|
t0 max delay (us)  | 9594*| 2246 | 2317 |  2763  |  2274  |  2623  |
t1 max delay (us)  | 3236 | 2356 | 2816 |  2675  |  2962  |  2944  |
t2 max delay (us)  | 2271 | 2622 | 2829 |  2274  |  2848  |  2400  |
t3 max delay (us)  | 2216 | 6587*| 2724 |  2631  |  2753  |  3034  |

Worst case scenario is reduced to 3034us instead of 9594us.

I repeated the 1 hour 3 threads tests again too:

                   |       vanilla      | with softirq patches v4  |
-------------------|--------------------|--------------------------|
                   |  #1  |  #2  |  #3  |   #1   |   #2   |   #3   |
-------------------|------|------|------|--------|--------|--------|
t0 max delay (us)  |18059 | 2251 | 2365 |  2684  |  2779  |  2838  |
t1 max delay (us)  |16311 | 2261 | 2477 |  2963  |  3020  |  3226  |
t2 max delay (us)  | 8887 | 2259 | 2432 |  2904  |  2833  |  3016  |

Worst case scenario is 3226us for softirq compared to 18059 for vanilla 6.0.

This time I paid attention to the average, as the best case number for the
vanilla kernel is better:

                   |       vanilla      | with softirq patches v4  |
-------------------|--------------------|--------------------------|
                   |  #1  |  #2  |  #3  |   #1   |   #2   |   #3   |
-------------------|------|------|------|--------|--------|--------|
t0 avg delay (us)  |31.59 |22.94 |26.50 | 31.81  | 33.57  | 34.90  |
t1 avg delay (us)  |16.85 |16.32 |37.16 | 29.05  | 30.51  | 31.65  |
t2 avg delay (us)  |25.34 |32.12 |17.40 | 26.76  | 28.28  | 28.56  |

It shows that we largely hover around 30us with the patches compared to 16-26us
being more prevalent for vanilla kernels.

I am not sure I can draw a concrete conclusion from these numbers. It seems
I need to run longer than 4 hours to hit the worst case scenario every run on
the vanilla kernel. There's an indication that the worst case scenario is
harder to hit, and it looks there's a hit on the average delay.

I'm losing access to this system from today. I think I'll wait for more
feedback on this RFC; and do another round of testing for longer periods of
time once there's clearer sense this is indeed the direction we'll be going
for.

HTH.


Cheers

--
Qais Yousef


* Re: [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-17 14:44     ` Qais Yousef
@ 2022-10-18  0:04       ` John Stultz
  2022-10-19 11:01         ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: John Stultz @ 2022-10-18  0:04 UTC (permalink / raw)
  To: Qais Yousef
  Cc: LKML, John Dias, Connor O'Brien, Rick Yiu, John Kacur,
	Chris Redpath, Abhijeet Dharmapurikar, Peter Zijlstra,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Thomas Gleixner, kernel-team, J . Avila

On Mon, Oct 17, 2022 at 7:45 AM Qais Yousef <qais.yousef@arm.com> wrote:
> This time I paid attention to the average, as the best case number for the
> vanilla kernel is better:
>
>                    |       vanilla      | with softirq patches v4  |
> -------------------|--------------------|--------------------------|
>                    |  #1  |  #2  |  #3  |   #1   |   #2   |   #3   |
> -------------------|------|------|------|--------|--------|--------|
> t0 avg delay (us)  |31.59 |22.94 |26.50 | 31.81  | 33.57  | 34.90  |
> t1 avg delay (us)  |16.85 |16.32 |37.16 | 29.05  | 30.51  | 31.65  |
> t2 avg delay (us)  |25.34 |32.12 |17.40 | 26.76  | 28.28  | 28.56  |
>
> It shows that we largely hover around 30us with the patches compared to 16-26us
> being more prevalent for vanilla kernels.
>
> I am not sure I can draw a concrete conclusion from these numbers. It seems
> I need to run longer than 4 hours to hit the worst case scenario every run on
> the vanilla kernel. There's an indication that the worst case scenario is
> harder to hit, and it looks there's a hit on the average delay.

Thanks so much for running these tests and capturing these detailed numbers!

I'll have to look further into the average case going up here.

> I'm losing access to this system from today. I think I'll wait for more
> feedback on this RFC; and do another round of testing for longer periods of
> time once there's clearer sense this is indeed the direction we'll be going
> for.

Do you mind sending me the script you used to run the test, and I'll
try to reproduce on some x86 hardware locally?

thanks
-john


* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-17 12:40   ` [PATCH RFC " Alexander Gordeev
@ 2022-10-18  3:42     ` John Stultz
  2022-10-18  3:59       ` John Stultz
                         ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: John Stultz @ 2022-10-18  3:42 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

 On Mon, Oct 17, 2022 at 5:40 AM Alexander Gordeev
<agordeev@linux.ibm.com> wrote:
>
> On Mon, Oct 03, 2022 at 11:20:32PM +0000, John Stultz wrote:
> > From: Connor O'Brien <connoro@google.com>
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 55f39c8f4203..3c628db807c8 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1599,6 +1599,44 @@ static void yield_task_rt(struct rq *rq)
> >  #ifdef CONFIG_SMP
> >  static int find_lowest_rq(struct task_struct *task);
> >
> > +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> > +#define __use_softirq_opt 1
> > +/*
> > + * Return whether the given cpu is currently non-preemptible
> > + * while handling a potentially long softirq, or if the current
> > + * task is likely to block preemptions soon because it is a
> > + * ksoftirq thread that is handling slow softirq.
>
> What are slow softirqs in this context, compared to long ones?

So the long softirqs are any of the softirqs in the LONG_SOFTIRQ_MASK
(net, block, irqpoll).
The "slow softirq" just refers to softirq processing that has already
been pushed out to ksoftirqd on the target cpu.

By default ksoftirqd's priority is likely not high enough to prevent
the rt task from running; however, it does disable preemption while
it's running the softirqs, so it effectively could block the rt task
from running for a while.


> > + */
> > +static bool cpu_busy_with_softirqs(int cpu)
> > +{
> > +     u32 softirqs = per_cpu(active_softirqs, cpu) |
> > +                    __cpu_softirq_pending(cpu);
> > +     struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
> > +     struct task_struct *curr;
> > +     struct rq *rq = cpu_rq(cpu);
> > +     int ret;
> > +
> > +     rcu_read_lock();
> > +     curr = READ_ONCE(rq->curr); /* unlocked access */
>
> select_task_rq_rt() takes the lock and reads curr already,
> before calling this function. I think there is a way to
> decompose it in a better way.

Hrm. Suggestions? As select_task_rq_rt() is only one of the callers.
Trying to pass curr into cpu_busy_with_softirqs() would mean
cpupri_find_fitness() would need to read the cpu_rq(cpu)->curr for the
specified cpu and pass that in.

The original patch actually was


> > +     ret = (softirqs & LONG_SOFTIRQ_MASK) &&
> > +              (curr == cpu_ksoftirqd ||
>
> EOL is extra.

Removed - thanks!

>
> > +               preempt_count() & SOFTIRQ_MASK);
>
> Could you please clarify this whole check in more detail?
>
> What is the point in checking if a remote CPU is handling
> a "long" softirq while the local one is handling any softirq?

Good question! This looks like an error from my rework of the earlier
version of the patch.
In v1 it was:
   task_thread_info(curr)->preempt_count & SOFTIRQ_MASK));
And it looks like I erroneously condensed that to preempt_count() &
SOFTIRQ_MASK, treating curr (from the target cpu rq) as if it were
current. :P

I'll fix this. Thank you for catching it!

Just to expand what it should be in detail:
1:  (softirqs & LONG_SOFTIRQ_MASK) &&
2:  (curr == cpu_ksoftirqd ||
3:  task_thread_info(curr)->preempt_count & SOFTIRQ_MASK)

Where we're checking:
1) that the active_softirqs and __cpu_softirq_pending() values indicate
the target cpu is running (or about to run) a long softirq,
AND (
2) the current task on the target cpu is ksoftirqd,
OR
3) the preempt_count of the current task on the target cpu has SOFTIRQ entries
)
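
Or the same thing as a hedged code sketch, reusing the names from the patch
(this just transcribes the enumeration above, it isn't the posted fix):

u32 softirqs = per_cpu(active_softirqs, cpu) |
	       __cpu_softirq_pending(cpu);

/* 1: a long softirq is active or pending on the target cpu, AND */
ret = (softirqs & LONG_SOFTIRQ_MASK) &&
      /* 2: the target cpu's current task is ksoftirqd, OR */
      (curr == cpu_ksoftirqd ||
      /* 3: the target cpu's current task has softirq counts */
       task_thread_info(curr)->preempt_count & SOFTIRQ_MASK);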

> > +     rcu_read_unlock();
>
> Why does ret need to be calculated under the lock?

I believe it's for the rq->curr == cpu_ksoftirqd (and the erroneously
removed task_thread_info(curr)) checks?
Part of me wonders if it should use rcu_dereference(rq->curr) as well?

> > +     return ret;
> > +}
> > +#else
> > +#define __use_softirq_opt 0
> > +static bool cpu_busy_with_softirqs(int cpu)
> > +{
> > +     return false;
> > +}
> > +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
> > +
> > +static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
>
> To me, the new name is unfortunate, since it strips out the notion
> of the reason. Instead of "CPU un/fits, because of capacity", it
> reads as "CPU un/fits, because of ..." what?

As with all complicated questions: "It depends" :) On the kernel
config specifically.

The earlier version of the patch had:
rt_task_fits_capacity_and_may_preempt() but Qais suggested it be
switched to the generic "fits_cpu" as it would depend on the kernel
config as to which aspect of "fit" was being considered.

I guess it could be rt_task_fits_capacity_and_softirq_free()?

> > +{
> > +     return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);
>
> I guess the order needs to be swapped, as rt_task_fits_capacity()
> is rather "quick" while cpu_busy_with_softirqs() is rather "slow".

Fair point. I've gone ahead and tweaked that. Thanks!
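
So presumably the tweaked line now reads along these lines (a sketch of the
reordering only, not the posted v5):

	return rt_task_fits_capacity(p, cpu) && !cpu_busy_with_softirqs(cpu);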

> > @@ -1894,14 +1934,17 @@ static int find_lowest_rq(struct task_struct *task)
> >               return -1; /* No other targets possible */
> >
> >       /*
> > -      * If we're on asym system ensure we consider the different capacities
> > -      * of the CPUs when searching for the lowest_mask.
> > +      * If we're using the softirq optimization or if we are
> > +      * on asym system, ensure we consider the softirq processing
> > +      * or different capacities of the CPUs when searching for the
> > +      * lowest_mask.
> >        */
> > -     if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> > +     if (__use_softirq_opt ||
>
> Why use __use_softirq_opt and not IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION)?

Because IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION) seemed a bit long
and less easy to read?
I'm ambivalent if folks would rather have the CONFIG switch be
explicit here, but to me it seemed nicer to read the flag variable.

> > @@ -60,6 +60,13 @@ static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp
> >
> >  DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
> >
> > +/*
> > + * active_softirqs -- per cpu, a mask of softirqs that are being handled,
> > + * with the expectation that approximate answers are acceptable and therefore
> > + * no synchronization.
> > + */
> > +DEFINE_PER_CPU(u32, active_softirqs);
>
> I guess all active_softirqs uses need to be coupled with
> IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION) check.

I guess it could be wrapped. I didn't see the per-cpu accounting as
troublesome enough to conditionalize, but the extra per-cpu data isn't
great if no one uses it. So I'll do that.

Thanks so much for the review and feedback! I really appreciate it.
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-18  3:42     ` John Stultz
@ 2022-10-18  3:59       ` John Stultz
  2022-10-18  5:32       ` John Stultz
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: John Stultz @ 2022-10-18  3:59 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Mon, Oct 17, 2022 at 8:42 PM John Stultz <jstultz@google.com> wrote:
>  On Mon, Oct 17, 2022 at 5:40 AM Alexander Gordeev
> <agordeev@linux.ibm.com> wrote:
> > select_task_rq_rt() takes the lock and reads curr already,
> > before calling this function. I think there is a way to
> > decompose it in a better way.
>
> Hrm. Suggestions? As select_task_rq_rt() is only one of the callers.
> Trying to pass curr into cpu_busy_with_softirqs() would mean
> cpupri_find_fitness() would need to read the cpu_rq(cpu)->curr for the
> specified cpu and pass that in.
>
> The original patch actually was
>

Whoops, I didn't finish my thought there. I was going to say the
original patch did something similar to your suggestion, passing the
target cpu's curr task value in from select_task_rq_rt().
However, it didn't use cpupri_find_fitness() and duplicated a lot of
similar logic in a way that is not as nice as what the current version
uses. But I'm very much open to suggestions for other ways to simplify
that, as you suggest.

thanks
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-18  3:42     ` John Stultz
  2022-10-18  3:59       ` John Stultz
@ 2022-10-18  5:32       ` John Stultz
  2022-10-19  9:11       ` Alexander Gordeev
  2022-10-19 11:23       ` Qais Yousef
  3 siblings, 0 replies; 29+ messages in thread
From: John Stultz @ 2022-10-18  5:32 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Mon, Oct 17, 2022 at 8:42 PM John Stultz <jstultz@google.com> wrote:
>  On Mon, Oct 17, 2022 at 5:40 AM Alexander Gordeev
> <agordeev@linux.ibm.com> wrote:
> > On Mon, Oct 03, 2022 at 11:20:32PM +0000, John Stultz wrote:
> > > +               preempt_count() & SOFTIRQ_MASK);
> >
> > Could you please clarify this whole check in more detail?
> >
> > What is the point in checking if a remote CPU is handling
> > a "long" softirq while the local one is handling any softirq?
>
> Good question! This looks like an error from my rework of the earlier
> version of the patch.
> In v1 it was:
>    task_thread_info(curr)->preempt_count & SOFTIRQ_MASK));
> And it looks like I erroneously condensed that to preempt_count() &
> SOFTIRQ_MASK, treating curr (from the target cpu rq) as if it were
> current. :P
>
> I'll fix this. Thank you for catching it!

Ah, it's not so trivial to fix: as I now see, part of my motivation for
condensing it was that task_thread_info(curr)->preempt_count doesn't
build on x86 (which uses percpu values instead of threadinfo).

So I'll have to think a bit more about how to do this generically, and
whether the conditionalization can be otherwise simplified.
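
For reference, the shape of the problem (a hedged sketch from my reading of
the arch headers, not code from this series):

/*
 * arm64: thread_info is embedded in task_struct and carries a
 * preempt count, so a remote task's count is reachable:
 */
task_thread_info(curr)->preempt_count & SOFTIRQ_MASK;
/*
 * x86: preempt_count is the per-cpu variable __preempt_count, and
 * struct thread_info has no preempt_count field, so the expression
 * above fails to build there, while preempt_count() only ever reads
 * the local cpu's value.
 */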

Thanks again for pointing the issue out!
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-18  3:42     ` John Stultz
  2022-10-18  3:59       ` John Stultz
  2022-10-18  5:32       ` John Stultz
@ 2022-10-19  9:11       ` Alexander Gordeev
  2022-10-19 22:09         ` John Stultz
  2022-10-19 11:23       ` Qais Yousef
  3 siblings, 1 reply; 29+ messages in thread
From: Alexander Gordeev @ 2022-10-19  9:11 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Mon, Oct 17, 2022 at 08:42:53PM -0700, John Stultz wrote:
> Hrm. Suggestions? As select_task_rq_rt() is only one of the callers.
> Trying to pass curr into cpu_busy_with_softirqs() would mean
> cpupri_find_fitness() would need to read the cpu_rq(cpu)->curr for the
> specified cpu and pass that in.

Maybe you could have a lightweight checker that accepts rq and curr
and gets called from select_task_rq_rt(). Then you could call that
same checker from cpu_busy_with_softirqs().
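
Something along these lines, perhaps (a rough sketch; the helper names are
made up):

static bool __cpu_busy_with_softirqs(int cpu, struct task_struct *curr)
{
	u32 softirqs = per_cpu(active_softirqs, cpu) |
		       __cpu_softirq_pending(cpu);

	return (softirqs & LONG_SOFTIRQ_MASK) &&
	       (curr == per_cpu(ksoftirqd, cpu) ||
		task_thread_info(curr)->preempt_count & SOFTIRQ_MASK);
}

static bool cpu_busy_with_softirqs(int cpu)
{
	struct task_struct *curr;
	bool ret;

	rcu_read_lock();
	curr = READ_ONCE(cpu_rq(cpu)->curr);
	ret = __cpu_busy_with_softirqs(cpu, curr);
	rcu_read_unlock();
	return ret;
}

Then select_task_rq_rt(), which already holds the lock and has curr, could
call __cpu_busy_with_softirqs() directly.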

> Just to expand what it should be in detail:
> 1:  (softirqs & LONG_SOFTIRQ_MASK) &&
> 2:  (curr == cpu_ksoftirqd ||
> 3:  task_thread_info(curr)->preempt_count & SOFTIRQ_MASK)
> 
> Where we're checking:
> 1) that the active_softirqs and __cpu_softirq_pending() values indicate
> the target cpu is running (or about to run) a long softirq,
> AND (
> 2) the current task on the target cpu is ksoftirqd,
> OR
> 3) the preempt_count of the current task on the target cpu has SOFTIRQ entries
> )

2) When the target CPU is already handling or about to handle long
softirqs, what is the difference if it is also running ksoftirqd or not?

3) What is the point of this check when 1) is true already?

Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
  2022-10-18  0:04       ` John Stultz
@ 2022-10-19 11:01         ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2022-10-19 11:01 UTC (permalink / raw)
  To: John Stultz
  Cc: Qais Yousef, LKML, John Dias, Connor O'Brien, Rick Yiu,
	John Kacur, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

(note the change in email)

On 10/17/22 17:04, John Stultz wrote:
> On Mon, Oct 17, 2022 at 7:45 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > This time I paid attention to the average, as the best case number for the
> > vanilla kernel is better:
> >
> >                    |       vanilla      | with softirq patches v4  |
> > -------------------|--------------------|--------------------------|
> >                    |  #1  |  #2  |  #3  |   #1   |   #2   |   #3   |
> > -------------------|------|------|------|--------|--------|--------|
> > t0 avg delay (us)  |31.59 |22.94 |26.50 | 31.81  | 33.57  | 34.90  |
> > t1 avg delay (us)  |16.85 |16.32 |37.16 | 29.05  | 30.51  | 31.65  |
> > t2 avg delay (us)  |25.34 |32.12 |17.40 | 26.76  | 28.28  | 28.56  |
> >
> > It shows that we largely hover around 30us with the patches, compared to the
> > 16-26us values that are more prevalent for vanilla kernels.
> >
> > I am not sure I can draw a concrete conclusion from these numbers. It seems
> > I need to run longer than 4 hours to hit the worst case scenario on every run
> > with the vanilla kernel. There's an indication that the worst case scenario is
> > harder to hit, and it looks like there's a hit on the average delay.
> 
> Thanks so much for running these tests and capturing these detailed numbers!
> 
> I'll have to look further into the average case going up here.
> 
> > I'm losing access to this system from today. I think I'll wait for more
> > feedback on this RFC, and do another round of testing for longer periods of
> > time once there's a clearer sense this is indeed the direction we'll be going
> > for.
> 
> Do you mind sending me the script you used to run the test, and I'll
> try to reproduce on some x86 hardware locally?

I ran that in a personal CI setup. I basically run the following 3
'scripts' in parallel:

cyclictest.sh [1]:

	cyclictest -t 3 -p 99 -D 3600 -i 1000 --json=cyclictest.json

iperf.sh [2]:

	iperf -s -D
	iperf -c localhost -u -b 10g -t 3600 -i 1 -P 3

dd.sh [3]:

	while true
	do
		cyclictest_running=`ps -e | grep cyclictest || true`
		if [ "x$cyclictest_running" == "x" ]; then
			break
		fi

		#
		# Run dd
		#
		file="/tmp/myci.dd.file"
		for i in $(seq 3)
		do
			dd if=/dev/zero  of=$file.$i bs=1M count=2048 &
		done
		wait
		rm -f $file*

		sleep 3
	done

[1] https://github.com/qais-yousef/myci-sched-tests/blob/dev/vars/run_cyclictest.groovy
[2] https://github.com/qais-yousef/myci-sched-tests/blob/dev/vars/run_iperf_parallel.groovy
[3] https://github.com/qais-yousef/myci-sched-tests/blob/dev/vars/run_dd_parallel.groovy
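
A top-level wrapper along these lines ties them together (a hypothetical
reconstruction; the actual glue lives in the groovy files above):

	#!/bin/bash
	# kick off the network and filesystem noise in the background,
	# run cyclictest in the foreground, then wait for the noise
	# generators to notice cyclictest exited
	./iperf.sh &
	./dd.sh &
	./cyclictest.sh
	wait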


Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-18  3:42     ` John Stultz
                         ` (2 preceding siblings ...)
  2022-10-19  9:11       ` Alexander Gordeev
@ 2022-10-19 11:23       ` Qais Yousef
  3 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2022-10-19 11:23 UTC (permalink / raw)
  To: John Stultz
  Cc: Alexander Gordeev, LKML, Connor O'Brien, John Dias, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On 10/17/22 20:42, John Stultz wrote:

> > > +     return ret;
> > > +}
> > > +#else
> > > +#define __use_softirq_opt 0
> > > +static bool cpu_busy_with_softirqs(int cpu)
> > > +{
> > > +     return false;
> > > +}
> > > +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
> > > +
> > > +static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
> >
> > To me, the new name is unfortunate, since it strips out the notion
> > of the reason. Instead of "CPU un/fits, because of capacity", it
> > reads as "CPU un/fits, because of ..." what?
> 
> As with all complicated questions: "It depends" :) On the kernel
> config specifically.
> 
> The earlier version of the patch had:
> rt_task_fits_capacity_and_may_preempt() but Qais suggested it be
> switched to the generic "fits_cpu" as it would depend on the kernel
> config as to which aspect of "fit" was being considered.
> 
> I guess it could be rt_task_fits_capacity_and_softirq_free()?

My line of thinking is that we have multiple reasons which could potentially
come and go. Having a simple generic name is future-proof against the
details of the reason.

For example, in the future we might find a better way to handle the softirq
conflict and remove this reason; or we might find a new 'reason' is required.
Documenting that in the function header (if required; I see the one-line
implementation is self-descriptive so far) rather than in the name made more
sense to me, hence the suggestion.

I am already aware of another new reason required to handle thermal pressure
when checking for capacity [1]. The discussion has stalled, but the problem is
still there and we'd need to address it. So I expect this code will be massaged
somewhat in the near future.

Sticking to rt_task_fits_cpu() will reduce the churn IMHO. But if you really
prefer something else, works for me :-)

[1] https://lore.kernel.org/lkml/20220514235513.jm7ul2y6uddj6eh2@airbuntu/


Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-19  9:11       ` Alexander Gordeev
@ 2022-10-19 22:09         ` John Stultz
  2022-10-20  0:15           ` Qais Yousef
  2022-10-20 12:47           ` Alexander Gordeev
  0 siblings, 2 replies; 29+ messages in thread
From: John Stultz @ 2022-10-19 22:09 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Wed, Oct 19, 2022 at 2:11 AM Alexander Gordeev
<agordeev@linux.ibm.com> wrote:
>
> On Mon, Oct 17, 2022 at 08:42:53PM -0700, John Stultz wrote:
> > Hrm. Suggestions? As select_task_rq_rt() is only one of the callers.
> > Trying to pass curr into cpu_busy_with_softirqs() would mean
> > cpupri_find_fitness() would need to read the cpu_rq(cpu)->curr for the
> > specified cpu and pass that in.
>
> Maybe you could have a lightweight checker that accepts rq and curr
> and gets called from select_task_rq_rt(). Then you could call that
> same checker from cpu_busy_with_softirqs().

Fair enough. Though your other questions are making me wonder if this
is necessary.

> > Just to expand what it should be in detail:
> > 1:  (softirqs & LONG_SOFTIRQ_MASK) &&
> > 2:  (curr == cpu_ksoftirqd ||
> > 3:  task_thread_info(curr)->preempt_count & SOFTIRQ_MASK)
> >
> > Where we're checking:
> > 1) that the active_softirqs and __cpu_softirq_pending() values indicate
> > the target cpu is running (or about to run) a long softirq,
> > AND (
> > 2) the current task on the target cpu is ksoftirqd,
> > OR
> > 3) the preempt_count of the current task on the target cpu has SOFTIRQ entries
> > )
>
> 2) When the target CPU is already handling or about to handle long
> softirqs, what is the difference if it is also running ksoftirqd or not?

Again, a good question! From my understanding, the original patch was
basically checking just #2 and #3 above, then additional logic was
added to narrow it to only the LONG_SOFTIRQ_MASK values, so that may
make the older part of the check redundant.

I fret there are some edge cases where softirqs might be pending on the
target cpu but ksoftirqd isn't running yet (maybe due to a lowish-prio
rt task), such that the cpu could still be considered a good target.
But this seems a bit of a stretch.

> 3) What is the point of this check when 1) is true already?

Yeah, the more I think about this, the more duplicative it seems.
Again, there are some edge-case details about the preempt_count being set
before the active_softirq accounting is set, but the whole decision
here about the target cpus is a bit racy to begin with, so I'm not
sure if that is significant.

So I'll go ahead and simplify the check to just the LONG_SOFTIRQ_MASK
& (active | pending softirqs) check. This should avoid the need to
pull the cpu_rq(cpu)->curr value and simplify things.
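
In code, the simplified check would presumably collapse to something like
this (a sketch only, pending the validation mentioned below):

static bool cpu_busy_with_softirqs(int cpu)
{
	u32 softirqs = per_cpu(active_softirqs, cpu) |
		       __cpu_softirq_pending(cpu);

	return softirqs & LONG_SOFTIRQ_MASK;
}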

Will send out a new version once I've been able to validate that the
simplification doesn't introduce a regression.

Thanks so much for the feedback and suggestions!
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-19 22:09         ` John Stultz
@ 2022-10-20  0:15           ` Qais Yousef
  2022-10-20 12:47           ` Alexander Gordeev
  1 sibling, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2022-10-20  0:15 UTC (permalink / raw)
  To: John Stultz
  Cc: Alexander Gordeev, LKML, Connor O'Brien, John Dias, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On 10/19/22 15:09, John Stultz wrote:

> I fret there are some edge cases where softirqs might be pending on the
> target cpu but ksoftirqd isn't running yet (maybe due to a lowish-prio
> rt task), such that the cpu could still be considered a good target.
> But this seems a bit of a stretch.

Could using ksoftirqd_running() instead help with robustness here?


Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-19 22:09         ` John Stultz
  2022-10-20  0:15           ` Qais Yousef
@ 2022-10-20 12:47           ` Alexander Gordeev
  2022-10-22 18:34             ` Joel Fernandes
  1 sibling, 1 reply; 29+ messages in thread
From: Alexander Gordeev @ 2022-10-20 12:47 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Wed, Oct 19, 2022 at 03:09:15PM -0700, John Stultz wrote:

Hi John,

[...]

> So I'll go ahead and simplify the check to just the LONG_SOFTIRQ_MASK
> & (active | pending softirqs) check. This should avoid the need to
> pull the cpu_rq(cpu)->curr value and simplify things.

In my reading of your approach if you find a way to additionally
indicate long softirqs being handled by the remote ksoftirqd, it
would cover all obvious/not-corner cases.

> -john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-03 23:20 ` [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs John Stultz
       [not found]   ` <20221004013611.1822-1-hdanton@sina.com>
  2022-10-17 12:40   ` [PATCH RFC " Alexander Gordeev
@ 2022-10-22 18:28   ` Joel Fernandes
  2022-11-15 21:36     ` John Stultz
  2 siblings, 1 reply; 29+ messages in thread
From: Joel Fernandes @ 2022-10-22 18:28 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

Hi,

On Mon, Oct 03, 2022 at 11:20:32PM +0000, John Stultz wrote:
> From: Connor O'Brien <connoro@google.com>
> 
> In certain audio use cases, scheduling RT threads on cores that
> are handling softirqs can lead to glitches. Prevent this
> behavior in cases where the softirq is likely to take a long
> time. To avoid unnecessary migrations, the old behavior is
> preserved for RCU, SCHED and TIMER irqs which are expected to be
> relatively quick.
> 
> This patch reworks and combines two related changes originally
> by John Dias <joaodias@google.com>
[..]
> ---
> v2:
> * Reformatted Kconfig entry to match coding style
>   (Reported-by: Randy Dunlap <rdunlap@infradead.org>)
> * Made rt_task_fits_capacity_and_may_preempt static to
>   avoid warnings (Reported-by: kernel test robot <lkp@intel.com>)
> * Rework to use preempt_count and drop kconfig dependency on ARM64
> v3:
> * Use introduced __cpu_softirq_pending() to avoid s390 build
>   issues (Reported-by: kernel test robot <lkp@intel.com>)
> v4:
> * Drop TASKLET_SOFTIRQ from LONG_SOFTIRQS (suggested by Qais)
> * Depend on !PREEMPT_RT (Suggested by Qais)
> * Larger simplification of logic (suggested by Qais)
> * Rework LONG_SOFTIRQS to use BIT() macros
> * Rename task_may_preempt() to cpu_busy_with_softirqs()
> ---
>  include/linux/interrupt.h |  6 ++++
>  init/Kconfig              | 10 +++++++
>  kernel/sched/rt.c         | 61 +++++++++++++++++++++++++++++++++------
>  kernel/softirq.c          |  9 ++++++
>  4 files changed, 77 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a749a8663841..e3a4add67e8c 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -582,6 +582,11 @@ enum
>   * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
>   */
>  #define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ))
> +/* Softirq's where the handling might be long: */
> +#define LONG_SOFTIRQ_MASK (BIT(NET_TX_SOFTIRQ)    | \
> +			   BIT(NET_RX_SOFTIRQ)    | \
> +			   BIT(BLOCK_SOFTIRQ)     | \
> +			   BIT(IRQ_POLL_SOFTIRQ))

Instead of a mask, I wonder if a better approach is to have a heuristic based
on measurement of softirq load. There is already code to measure irq load
(see update_irq_load_avg). Perhaps, Vincent would be willing to add the
softirq load in it as well if asked nicely ;)

That's much better because:
1. Any new softirqs added will also trigger this optimization.
2. Systems where these softirqs are quiet will not unnecessarily trigger it.
3. You can also include irq load so that things like IRQ storms also don't
cause RT issues (when there are better "uncompromised" idle CPU candidates).
4. A CONFIG option may not be needed at all if the optimization makes
sense for everything. Think of all the systems not running Android.
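
To make that concrete, a rough sketch of what a load-based gate might look
like (hypothetical: it assumes the irq PELT signal were extended to fold in
softirq time as suggested above, and the threshold is invented):

static bool cpu_busy_with_softirqs(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	/* invented threshold: >1/16th of capacity spent in (soft)irq */
	return cpu_util_irq(rq) > (arch_scale_cpu_capacity(cpu) >> 4);
}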

>  /* map softirq index to softirq name. update 'softirq_to_name' in
>   * kernel/softirq.c when adding a new softirq.
> @@ -617,6 +622,7 @@ extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  
>  DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
> +DECLARE_PER_CPU(u32, active_softirqs);
>  
>  static inline struct task_struct *this_cpu_ksoftirqd(void)
>  {
> diff --git a/init/Kconfig b/init/Kconfig
> index 532362fcfe31..3d1de6edcfa1 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1284,6 +1284,16 @@ config SCHED_AUTOGROUP
>  	  desktop applications.  Task group autogeneration is currently based
>  	  upon task session.
>  
> +config RT_SOFTIRQ_OPTIMIZATION

I much dislike this config option name because it does not self-explain what
the option is, and it is too long. What's more, any future such optimizations,
even if rare, will have to come up with another name, causing more confusion.
Instead, CONFIG_RT_AVOID_SOFTIRQS seems cleaner, but I'll defer to scheduler
maintainers since they have to take the patch ultimately. Though I'll give my
tag for a rename / better name. More below:

> +	bool "Improve RT scheduling during long softirq execution"
> +	depends on SMP && !PREEMPT_RT
> +	default n
> +	help
> +	  Enable an optimization which tries to avoid placing RT tasks on CPUs
> +	  occupied by nonpreemptible tasks, such as a long softirq or CPUs
> +	  which may soon block preemptions, such as a CPU running a ksoftirq
> +	  thread which handles slow softirqs.
> +
>  config SYSFS_DEPRECATED
>  	bool "Enable deprecated sysfs features to support old userspace tools"
>  	depends on SYSFS
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 55f39c8f4203..3c628db807c8 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1599,6 +1599,44 @@ static void yield_task_rt(struct rq *rq)
>  #ifdef CONFIG_SMP
>  static int find_lowest_rq(struct task_struct *task);
>  
> +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> +#define __use_softirq_opt 1

Why not just use IS_ENABLED(CONFIG_RT_SOFTIRQ_OPT..) instead of defining new
macros?

> + * Return whether the given cpu is currently non-preemptible
> + * while handling a potentially long softirq, or if the current
> + * task is likely to block preemptions soon because it is a
> + * ksoftirq thread that is handling slow softirq.
> + */
> +static bool cpu_busy_with_softirqs(int cpu)
> +{
> +	u32 softirqs = per_cpu(active_softirqs, cpu) |
> +		       __cpu_softirq_pending(cpu);
> +	struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
> +	struct task_struct *curr;
> +	struct rq *rq = cpu_rq(cpu);
> +	int ret;
> +
> +	rcu_read_lock();

Could you clarify what the RCU read-side critical section is protecting?

> +	curr = READ_ONCE(rq->curr); /* unlocked access */
> +	ret = (softirqs & LONG_SOFTIRQ_MASK) &&
> +		 (curr == cpu_ksoftirqd ||
> +		  preempt_count() & SOFTIRQ_MASK);

If the goal is to avoid ksoftirqd when it is running an actual softirq,
doesn't ksoftirqd already run softirqs under local_bh_disable()? If so, then
SOFTIRQ_MASK should already be set. If ksoftirqd is in between softirqs (and
bh is enabled there), then scheduling the RT task on the CPU may preempt
ksoftirqd and give the RT task a chance to run anyway, right?

> +	rcu_read_unlock();
> +	return ret;
> +}
> +#else
> +#define __use_softirq_opt 0

define not needed if using IS_ENABLED.

more comments later :)

thanks,

 - Joel


> +static bool cpu_busy_with_softirqs(int cpu)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
> +
> +static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
> +{
> +	return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);
> +}
> +
>  static int
>  select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  {
> @@ -1637,22 +1675,24 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  	 * This test is optimistic, if we get it wrong the load-balancer
>  	 * will have to sort it out.
>  	 *
> -	 * We take into account the capacity of the CPU to ensure it fits the
> -	 * requirement of the task - which is only important on heterogeneous
> -	 * systems like big.LITTLE.
> +	 * We use rt_task_fits_cpu() to evaluate if the CPU is busy with
> +	 * potentially long-running softirq work, as well as take into
> +	 * account the capacity of the CPU to ensure it fits the
> +	 * requirement of the task - which is only important on
> +	 * heterogeneous systems like big.LITTLE.
>  	 */
>  	test = curr &&
>  	       unlikely(rt_task(curr)) &&
>  	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
>  
> -	if (test || !rt_task_fits_capacity(p, cpu)) {
> +	if (test || !rt_task_fits_cpu(p, cpu)) {
>  		int target = find_lowest_rq(p);
>  
>  		/*
>  		 * Bail out if we were forcing a migration to find a better
>  		 * fitting CPU but our search failed.
>  		 */
> -		if (!test && target != -1 && !rt_task_fits_capacity(p, target))
> +		if (!test && target != -1 && !rt_task_fits_cpu(p, target))
>  			goto out_unlock;
>  
>  		/*
> @@ -1894,14 +1934,17 @@ static int find_lowest_rq(struct task_struct *task)
>  		return -1; /* No other targets possible */
>  
>  	/*
> -	 * If we're on asym system ensure we consider the different capacities
> -	 * of the CPUs when searching for the lowest_mask.
> +	 * If we're using the softirq optimization or if we are
> +	 * on asym system, ensure we consider the softirq processing
> +	 * or different capacities of the CPUs when searching for the
> +	 * lowest_mask.
>  	 */
> -	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> +	if (__use_softirq_opt ||

replace with IS_ENABLED.

> +	    static_branch_unlikely(&sched_asym_cpucapacity)) {
>  
>  		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri,
>  					  task, lowest_mask,
> -					  rt_task_fits_capacity);
> +					  rt_task_fits_cpu);
>  	} else {
>  
>  		ret = cpupri_find(&task_rq(task)->rd->cpupri,
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..35ee79dd8786 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -60,6 +60,13 @@ static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp
>  
>  DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
>  
> +/*
> + * active_softirqs -- per cpu, a mask of softirqs that are being handled,
> + * with the expectation that approximate answers are acceptable and therefore
> + * no synchronization.
> + */
> +DEFINE_PER_CPU(u32, active_softirqs);
> +
>  const char * const softirq_to_name[NR_SOFTIRQS] = {
>  	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
>  	"TASKLET", "SCHED", "HRTIMER", "RCU"
> @@ -551,6 +558,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  restart:
>  	/* Reset the pending bitmask before enabling irqs */
>  	set_softirq_pending(0);
> +	__this_cpu_write(active_softirqs, pending);
>  
>  	local_irq_enable();
>  
> @@ -580,6 +588,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  		pending >>= softirq_bit;
>  	}
>  
> +	__this_cpu_write(active_softirqs, 0);
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
>  	    __this_cpu_read(ksoftirqd) == current)
>  		rcu_softirq_qs();
> -- 
> 2.38.0.rc1.362.ged0d419d3c-goog
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-20 12:47           ` Alexander Gordeev
@ 2022-10-22 18:34             ` Joel Fernandes
  2022-10-23  7:45               ` Alexander Gordeev
  0 siblings, 1 reply; 29+ messages in thread
From: Joel Fernandes @ 2022-10-22 18:34 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: John Stultz, LKML, Connor O'Brien, John Dias, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Thu, Oct 20, 2022 at 02:47:54PM +0200, Alexander Gordeev wrote:
> On Wed, Oct 19, 2022 at 03:09:15PM -0700, John Stultz wrote:
> 
> Hi John,
> 
> [...]
> 
> > So I'll go ahead and simplify the check to just the LONG_SOFTIRQ_MASK
> > & (active | pending softirqs) check. This should avoid the need to
> > pull the cpu_rq(cpu)->curr value and simplify things.
> 
> In my reading of your approach if you find a way to additionally
> indicate long softirqs being handled by the remote ksoftirqd, it
> would cover all obvious/not-corner cases.

How will that help? The long softirq executing inside ksoftirqd will disable
preemption and prevent any RT task from executing.

Did I miss something?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-22 18:34             ` Joel Fernandes
@ 2022-10-23  7:45               ` Alexander Gordeev
  2022-11-15  7:08                 ` John Stultz
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Gordeev @ 2022-10-23  7:45 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: John Stultz, LKML, Connor O'Brien, John Dias, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Sat, Oct 22, 2022 at 06:34:37PM +0000, Joel Fernandes wrote:
> > In my reading of your approach if you find a way to additionally
> > indicate long softirqs being handled by the remote ksoftirqd, it
> > would cover all obvious/not-corner cases.
> 
> How will that help? The long softirq executing inside ksoftirqd will disable
> preemption and prevent any RT task from executing.

Right. So the check to deem a remote CPU unfit would (logically) look like this:

(active | pending | ksoftirqd) & LONG_SOFTIRQ_MASK

> Did I miss something?

Or me :)

> thanks,
> 
>  - Joel
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-23  7:45               ` Alexander Gordeev
@ 2022-11-15  7:08                 ` John Stultz
  2022-11-15 12:55                   ` Alexander Gordeev
  0 siblings, 1 reply; 29+ messages in thread
From: John Stultz @ 2022-11-15  7:08 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Joel Fernandes, LKML, Connor O'Brien, John Dias, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Sun, Oct 23, 2022 at 12:45 AM Alexander Gordeev
<agordeev@linux.ibm.com> wrote:
>
> On Sat, Oct 22, 2022 at 06:34:37PM +0000, Joel Fernandes wrote:
> > > In my reading of your approach if you find a way to additionally
> > > indicate long softirqs being handled by the remote ksoftirqd, it
> > > would cover all obvious/not-corner cases.
> >
> > How will that help? The long softirq executing inside ksoftirqd will disable
> > preemption and prevent any RT task from executing.
>
> Right. So the check to deem a remote CPU unfit would (logically) look like this:
>
> (active | pending | ksoftirqd) & LONG_SOFTIRQ_MASK
>

Alexander,
  Apologies for the late response here; some other work took priority for a bit.

Thanks again for the feedback. But I wanted to follow up on your
suggestion here, as I'm not quite sure I see why we need a separate
ksoftirqd bitmask.

As run_ksoftirqd() basically looks at the pending set and calls
__do_softirq(), which then moves the bits from the pending mask to the
active mask while they are being run.
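
That is, condensing the relevant lines of the patched __do_softirq()
(illustrative excerpt, not the complete function):

restart:
	/* Reset the pending bitmask before enabling irqs */
	set_softirq_pending(0);
	/* publish the bits we are about to run as "active" */
	__this_cpu_write(active_softirqs, pending);

	/* ... softirq handlers run here ... */

	/* done: nothing active any more */
	__this_cpu_write(active_softirqs, 0);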

So (pending|active)&LONG_SOFTIRQ_MASK seems like it should be a
sufficient check regardless of if the remote cpu is in softirq or
ksoftirqd, no?

thanks
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-11-15  7:08                 ` John Stultz
@ 2022-11-15 12:55                   ` Alexander Gordeev
  0 siblings, 0 replies; 29+ messages in thread
From: Alexander Gordeev @ 2022-11-15 12:55 UTC (permalink / raw)
  To: John Stultz
  Cc: Joel Fernandes, LKML, Connor O'Brien, John Dias, Rick Yiu,
	John Kacur, Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Mon, Nov 14, 2022 at 11:08:36PM -0800, John Stultz wrote:

Hi John,
...
> > Right. So the check to deem a remote CPU unfit would (logically) look like this:
> >
> > (active | pending | ksoftirqd) & LONG_SOFTIRQ_MASK
...
> As run_ksoftirqd() basically looks at the pending set and calls
> __do_softirq() which then moves the bits from the pending mask  to
> active mask while they are being run.
> 
> So (pending|active)&LONG_SOFTIRQ_MASK seems like it should be a
> sufficient check regardless of if the remote cpu is in softirq or
> ksoftirqd, no?

I did not realize run_ksoftirqd()->__do_softirq() covers it.
Sorry for the noise.

Thanks!

> thanks
> -john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs
  2022-10-22 18:28   ` [RFC PATCH " Joel Fernandes
@ 2022-11-15 21:36     ` John Stultz
  0 siblings, 0 replies; 29+ messages in thread
From: John Stultz @ 2022-11-15 21:36 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Connor O'Brien, John Dias, Rick Yiu, John Kacur,
	Qais Yousef, Chris Redpath, Abhijeet Dharmapurikar,
	Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Thomas Gleixner, kernel-team,
	J . Avila

On Sat, Oct 22, 2022 at 11:28 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> On Mon, Oct 03, 2022 at 11:20:32PM +0000, John Stultz wrote:
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index a749a8663841..e3a4add67e8c 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -582,6 +582,11 @@ enum
> >   * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
> >   */
> >  #define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ))
> > +/* Softirq's where the handling might be long: */
> > +#define LONG_SOFTIRQ_MASK (BIT(NET_TX_SOFTIRQ)    | \
> > +                        BIT(NET_RX_SOFTIRQ)    | \
> > +                        BIT(BLOCK_SOFTIRQ)     | \
> > +                        BIT(IRQ_POLL_SOFTIRQ))
>
> Instead of a mask, I wonder if a better approach is to have a heuristic based
> on measurement of softirq load. There is already code to measure irq load
> (see update_irq_load_avg). Perhaps, Vincent would be willing to add the
> softirq load in it as well if asked nicely ;)
>
> That's much better because:
> 1. Any new softirqs added will also trigger this optimization.
> 2. Systems where these softirqs are quiet will not unnecessarily trigger it.
> 3. You can also include irq load so that things like IRQ storms also don't
> cause RT issues (when there are better "uncompromised" idle CPU candidates).
> 4. A CONFIG option may not be needed at all if the optimization makes
> sense for everything. Think of all the systems not running Android.

Hey Joel,
  Thanks again for the feedback, and sorry to take a while to get back to you.

I'll have to think about this some more, but I'm hesitant about
switching to a load-based heuristic approach, mostly as responding to
"load" feels a bit fuzzy to me.
If we suddenly get a burst of softirqs preempting scheduling, we don't
want to have to wait for the load tracking to recognize this, as the
effect of the blocked audio processing will be immediate.

That's why this optimization just avoids scheduling rt tasks on cpus
that are running potentially long-running softirqs, as we can't be
sure in that instance we'll be able to run soon.


> >  /* map softirq index to softirq name. update 'softirq_to_name' in
> >   * kernel/softirq.c when adding a new softirq.
> > @@ -617,6 +622,7 @@ extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >
> >  DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
> > +DECLARE_PER_CPU(u32, active_softirqs);
> >
> >  static inline struct task_struct *this_cpu_ksoftirqd(void)
> >  {
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 532362fcfe31..3d1de6edcfa1 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1284,6 +1284,16 @@ config SCHED_AUTOGROUP
> >         desktop applications.  Task group autogeneration is currently based
> >         upon task session.
> >
> > +config RT_SOFTIRQ_OPTIMIZATION
>
> I much dislike this config option name because it does not self-explain what
> the option is, and it is too long. What's more, any future such optimizations,
> even if rare, will have to come up with another name, causing more confusion.
> Instead, CONFIG_RT_AVOID_SOFTIRQS seems cleaner, but I'll defer to scheduler
> maintainers since they have to take the patch ultimately. Though I'll give my
> tag for a rename / better name. More below:

That's a fair point. RT_SOFTIRQ_AWARE_SCHED, maybe?


> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 55f39c8f4203..3c628db807c8 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1599,6 +1599,44 @@ static void yield_task_rt(struct rq *rq)
> >  #ifdef CONFIG_SMP
> >  static int find_lowest_rq(struct task_struct *task);
> >
> > +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> > +#define __use_softirq_opt 1
>
> Why not just use IS_ENABLED(CONFIG_RT_SOFTIRQ_OPT..) instead of defining new
> macros?

Mostly for readability. The lines where it's added are already fairly long, e.g.:
  if (static_branch_unlikely(&sched_asym_cpucapacity)) {

So it seemed nicer to have the shorter macro. But I'm not strongly
opinionated on this.

> > + * Return whether the given cpu is currently non-preemptible
> > + * while handling a potentially long softirq, or if the current
> > + * task is likely to block preemptions soon because it is a
> > + * ksoftirq thread that is handling slow softirq.
> > + */
> > +static bool cpu_busy_with_softirqs(int cpu)
> > +{
> > +     u32 softirqs = per_cpu(active_softirqs, cpu) |
> > +                    __cpu_softirq_pending(cpu);
> > +     struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
> > +     struct task_struct *curr;
> > +     struct rq *rq = cpu_rq(cpu);
> > +     int ret;
> > +
> > +     rcu_read_lock();
>
> Could you clarify what is the RCU read-side critical section protecting?
>
> > +     curr = READ_ONCE(rq->curr); /* unlocked access */
> > +     ret = (softirqs & LONG_SOFTIRQ_MASK) &&
> > +              (curr == cpu_ksoftirqd ||
> > +               preempt_count() & SOFTIRQ_MASK);
>
> If the goal is to avoid ksoftirqd when it is running an actual softirq,
> doesn't ksoftirqd already run softirqs under local_bh_disable()? If so, then
> SOFTIRQ_MASK should already be set. If ksoftirqd is in between softirqs (and
> bh is enabled there), then scheduling the RT task on the CPU may preempt
> ksoftirqd and give the RT task a chance to run anyway, right?

Yeah. I'm refactoring/simplifying this for v5, so hopefully it will be
more straightforward.

Thanks again for the feedback!
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-11-15 21:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-03 23:20 [RFC PATCH v4 0/3] Softirq -rt Optimizations John Stultz
2022-10-03 23:20 ` [RFC PATCH v4 1/3] softirq: Add generic accessor to percpu softirq_pending data John Stultz
2022-10-03 23:20 ` [RFC PATCH v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs John Stultz
     [not found]   ` <20221004013611.1822-1-hdanton@sina.com>
2022-10-04  2:29     ` John Stultz
     [not found]       ` <20221005002149.1876-1-hdanton@sina.com>
2022-10-05  1:13         ` John Stultz
     [not found]         ` <20221005060155.1571-1-hdanton@sina.com>
2022-10-10 15:42           ` Qais Yousef
     [not found]             ` <20221011111846.284-1-hdanton@sina.com>
2022-10-12 14:10               ` Qais Yousef
2022-10-17 12:40   ` [PATCH RFC " Alexander Gordeev
2022-10-18  3:42     ` John Stultz
2022-10-18  3:59       ` John Stultz
2022-10-18  5:32       ` John Stultz
2022-10-19  9:11       ` Alexander Gordeev
2022-10-19 22:09         ` John Stultz
2022-10-20  0:15           ` Qais Yousef
2022-10-20 12:47           ` Alexander Gordeev
2022-10-22 18:34             ` Joel Fernandes
2022-10-23  7:45               ` Alexander Gordeev
2022-11-15  7:08                 ` John Stultz
2022-11-15 12:55                   ` Alexander Gordeev
2022-10-19 11:23       ` Qais Yousef
2022-10-22 18:28   ` [RFC PATCH " Joel Fernandes
2022-11-15 21:36     ` John Stultz
2022-10-03 23:20 ` [RFC PATCH v4 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT John Stultz
2022-10-04 10:45   ` David Laight
2022-10-04 19:19     ` John Stultz
2022-10-10 16:09   ` Qais Yousef
2022-10-17 14:44     ` Qais Yousef
2022-10-18  0:04       ` John Stultz
2022-10-19 11:01         ` Qais Yousef
