All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] softirq: Introduce softirq throttling
@ 2022-04-06  2:52 Liao Chang
  2022-04-06  2:52 ` [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq Liao Chang
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Liao Chang @ 2022-04-06  2:52 UTC (permalink / raw)
  To: mcgrof, keescook, yzaikin, liaochang1, tglx, clg, nitesh,
	edumazet, peterz, joshdon, masahiroy, nathan, akpm, vbabka,
	gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

Kernel check for pending softirqs periodically, they are performed in a
few points of kernel code, such as irq_exit() and __local_bh_enable_ip(),
softirqs that have been activated by a given CPU must be executed on the
same CPU, this characteristic of softirq is always a potentially
"dangerous" operation, because one CPU might be end up very busy while
the other are most idle.

Above concern is proven in a networking user case: recenlty, we
engineer find out the time used for connection re-establishment on
kernel v5.10 is 300 times larger than v4.19, meanwhile, softirq
monopolize almost 99% of CPU. This problem stem from that the connection
between Sender and Receiver node get lost, the NIC driver on Sender node
will keep raising NET_TX softirq before connection recovery. The system
log show that most of softirq is performed from __local_bh_enable_ip(),
since __local_bh_enable_ip is used widley in kernel code, it is very
easy to run out most of CPU, and the user-mode application can't obtain
enough CPU cycles to establish connection as soon as possible.

Although kernel limit the running time of __do_softirq(), it does not
control the running time of entire softirqs on given CPU, so this
patchset introduce a safeguard mechanism that allows the system
administrator to allocate bandwidth for used by softirqs, this safeguard
mechanism is known as Sofitrq Throttling and is controlled by two
parameters in the /proc file system:

/proc/sys/kernel/sofitrq_period_ms
  Defines the period in ms(millisecond) to be considered as 100% of CPU
  bandwidth, the default value is 1,000 ms(1second). Changes to the
  value of the period must be very well thought out, as too long or too
  short are beyond one's expectation.

/proc/sys/kernel/softirq_runtime_ms
  Define the bandwidth available to softirqs on each CPU, the default
  values is 950 ms(0.95 second) or, in other words, 95% of the CPU
  bandwidth. Setting negative integer to this value means that softirqs
  my use up to 100% CPU times.

The default values for softirq throttling mechanism define that 95% of
the CPU time can be used by softirqs. The remaing 5% will be devoted to
other kinds of tasks, such as syscall, interrupt, exception, real-time
processes and normal processes when the softirqs workload in system are
very heavy. System administrator can tune above two parameters to
satifies the need of system performance and stability.

Liao Chang (3):
  softirq: Add two parameters to control CPU bandwidth for use by
    softirq
  softirq: Do throttling when softirqs use up its bandwidth
  softirq: Introduce statistics about softirq throttling

 fs/proc/softirqs.c          |  18 +++++
 include/linux/interrupt.h   |   7 ++
 include/linux/kernel_stat.h |  27 +++++++
 init/Kconfig                |  10 +++
 kernel/softirq.c            | 155 ++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c             |  16 ++++
 6 files changed, 233 insertions(+)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq
  2022-04-06  2:52 [RFC 0/3] softirq: Introduce softirq throttling Liao Chang
@ 2022-04-06  2:52 ` Liao Chang
  2022-04-06  2:52 ` [RFC 2/3] softirq: Do throttling when softirqs use up its bandwidth Liao Chang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Liao Chang @ 2022-04-06  2:52 UTC (permalink / raw)
  To: mcgrof, keescook, yzaikin, liaochang1, tglx, clg, nitesh,
	edumazet, peterz, joshdon, masahiroy, nathan, akpm, vbabka,
	gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

Although kernel merely allow __do_softirq() run for 2ms at most, it does
not control the total running time of softirqs on given CPU. In order to
prevent softirqs from using up all CPU bandwidth and cause other task
starved, Sofitrq Throttling mechanism introduces two parameters in the
/proc file system:

/proc/sys/kernel/sofitrq_period_ms
  Defines the period in ms(millisecond) to be considered as 100% of CPU
  bandwidth, the default value is 1,000 ms(1second). Changes to the
  value of the period must be very well thought out, as too long or too
  short are beyond one's expectation.

/proc/sys/kernel/softirq_runtime_ms
  Define the bandwidth available to softirqs on each CPU, the default
  values is 950 ms(0.95 second) or, in other words, 95% of the CPU
  bandwidth. Setting this value to -1 means that softirqs might use up
  to 100% CPU cycles.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 include/linux/interrupt.h |  7 ++++
 init/Kconfig              | 10 ++++++
 kernel/softirq.c          | 74 +++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c           | 16 +++++++++
 4 files changed, 107 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9367f1cb2e3c..de6973bf72e5 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -605,6 +605,13 @@ extern void __raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+extern unsigned int sysctl_softirq_period_ms;
+extern int sysctl_softirq_runtime_ms;
+int softirq_throttle_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos);
+#endif
+
 DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
 
 static inline struct task_struct *this_cpu_ksoftirqd(void)
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..a63ebc88a199 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2393,3 +2393,13 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 # <asm/syscall_wrapper.h>.
 config ARCH_HAS_SYSCALL_WRAPPER
 	def_bool n
+
+config SOFTIRQ_THROTTLE
+	bool "Softirq Throttling Feature"
+	help
+	  Allow to allocate bandwidth for use by softirq handling. This
+	  saftguard machanism is known as softirq throttling and is controlled
+	  by two parameters in the /proc/ file system:
+
+	  /proc/sysctl/kernel/softirq_period_ms
+	  /proc/sysctl/kernel/softirq_runtime_ms
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 41f470929e99..8aac9e2631fd 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -65,6 +65,76 @@ const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"TASKLET", "SCHED", "HRTIMER", "RCU"
 };
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+unsigned int sysctl_softirq_period_ms = 1000;
+int sysctl_softirq_runtime_ms = 950;
+
+struct softirq_throttle {
+	unsigned int period;
+	unsigned int runtime;
+	raw_spinlock_t lock;
+} si_throttle;
+
+static int softirq_throttle_validate(void)
+{
+	if (((int)sysctl_softirq_period_ms <= 0) ||
+		((sysctl_softirq_runtime_ms != -1) &&
+		 ((unsigned int)sysctl_softirq_runtime_ms > sysctl_softirq_period_ms)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void softirq_throttle_update(void)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&si_throttle.lock, flags);
+	si_throttle.period = sysctl_softirq_period_ms;
+	si_throttle.runtime = sysctl_softirq_runtime_ms;
+	raw_spin_unlock_irqrestore(&si_throttle.lock, flags);
+}
+
+int softirq_throttle_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	unsigned int old_period, old_runtime;
+	static DEFINE_MUTEX(mutex);
+	int ret;
+
+	mutex_lock(&mutex);
+	old_period = sysctl_softirq_period_ms;
+	old_runtime = sysctl_softirq_runtime_ms;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret)
+		goto undo;
+	if (!write)
+		goto done;
+
+	ret = softirq_throttle_validate();
+	if (ret)
+		goto undo;
+
+	softirq_throttle_update();
+	goto done;
+
+undo:
+	sysctl_softirq_period_ms = old_period;
+	sysctl_softirq_runtime_ms = old_runtime;
+done:
+	mutex_unlock(&mutex);
+	return ret;
+}
+
+static void softirq_throttle_init(void)
+{
+	si_throttle.period = sysctl_softirq_period_ms;
+	si_throttle.runtime = sysctl_softirq_runtime_ms;
+	raw_spin_lock_init(&si_throttle.lock);
+}
+#endif
+
 /*
  * we cannot loop indefinitely here to avoid userspace starvation,
  * but we also don't want to introduce a worst case 1/HZ latency
@@ -894,6 +964,10 @@ void __init softirq_init(void)
 {
 	int cpu;
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	softirq_throttle_init();
+#endif
+
 	for_each_possible_cpu(cpu) {
 		per_cpu(tasklet_vec, cpu).tail =
 			&per_cpu(tasklet_vec, cpu).head;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5ae443b2882e..e5a9ad391cca 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1771,6 +1771,22 @@ static struct ctl_table kern_table[] = {
 		.extra1		= SYSCTL_ONE,
 	},
 #endif
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	{
+		.procname	= "softirq_period_ms",
+		.data		= &sysctl_softirq_period_ms,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= softirq_throttle_handler,
+	},
+	{
+		.procname	= "softirq_runtime_ms",
+		.data		= &sysctl_softirq_runtime_ms,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= softirq_throttle_handler,
+	},
+#endif
 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
 	{
 		.procname	= "sched_energy_aware",
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 2/3] softirq: Do throttling when softirqs use up its bandwidth
  2022-04-06  2:52 [RFC 0/3] softirq: Introduce softirq throttling Liao Chang
  2022-04-06  2:52 ` [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq Liao Chang
@ 2022-04-06  2:52 ` Liao Chang
  2022-04-06  2:52 ` [RFC 3/3] softirq: Introduce statistics about softirq throttling Liao Chang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Liao Chang @ 2022-04-06  2:52 UTC (permalink / raw)
  To: mcgrof, keescook, yzaikin, liaochang1, tglx, clg, nitesh,
	edumazet, peterz, joshdon, masahiroy, nathan, akpm, vbabka,
	gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

When kernel is about to handle pending softirqs, it will check firstly
whether softirq already use up its CPU bandwidth, Once the total duration
of softirq handling exceed the max value in the user-specified time window,
softirq will be throttled for a while, the throttling will be removed
when time window expires.

On then other hand, kernel will update the runtime of softirq on given CPU
before __do_softirq() function returns.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 kernel/softirq.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 8aac9e2631fd..6de6db794ac5 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -75,6 +75,49 @@ struct softirq_throttle {
 	raw_spinlock_t lock;
 } si_throttle;
 
+struct softirq_runtime {
+	bool	throttled;
+	unsigned long	duration;
+	unsigned long	expires;
+	raw_spinlock_t lock;
+};
+static DEFINE_PER_CPU(struct softirq_runtime, softirq_runtime);
+
+static void forward_softirq_expires(struct softirq_runtime *si_runtime)
+{
+	si_runtime->throttled = false;
+	si_runtime->duration = 0UL;
+	si_runtime->expires = jiffies +
+		msecs_to_jiffies(si_throttle.period - si_throttle.runtime);
+}
+
+static void update_softirq_runtime(unsigned long duration)
+{
+	struct softirq_runtime *si_runtime = this_cpu_ptr(&softirq_runtime);
+
+	raw_spin_lock(&si_runtime->lock);
+	si_runtime->duration += jiffies_to_msecs(duration);
+	if ((si_runtime->duration >= si_throttle.runtime) &&
+		time_before(jiffies, si_runtime->expires)) {
+		si_runtime->throttled = true;
+	}
+	raw_spin_unlock(&si_runtime->lock);
+}
+
+static bool softirq_runtime_exceeded(void)
+{
+	struct softirq_runtime *si_runtime = this_cpu_ptr(&softirq_runtime);
+
+	if ((unsigned int)si_throttle.runtime >= si_throttle.period)
+		return false;
+
+	raw_spin_lock(&si_runtime->lock);
+	if (!time_before(jiffies, si_runtime->expires))
+		forward_softirq_expires(si_runtime);
+	raw_spin_unlock(&si_runtime->lock);
+	return si_runtime->throttled;
+}
+
 static int softirq_throttle_validate(void)
 {
 	if (((int)sysctl_softirq_period_ms <= 0) ||
@@ -88,10 +131,18 @@ static int softirq_throttle_validate(void)
 static void softirq_throttle_update(void)
 {
 	unsigned long flags;
+	struct softirq_runtime *si_runtime;
 
 	raw_spin_lock_irqsave(&si_throttle.lock, flags);
 	si_throttle.period = sysctl_softirq_period_ms;
 	si_throttle.runtime = sysctl_softirq_runtime_ms;
+
+	for_each_possible_cpu(cpu, &) {
+		si_runtime = per_cpu_ptr(&softirq_runtime, cpu);
+		raw_spin_lock(&si_runtime->lock);
+		forward_softirq_expires(si_runtime);
+		raw_spin_unlock(&si_runtime->lock);
+	}
 	raw_spin_unlock_irqrestore(&si_throttle.lock, flags);
 }
 
@@ -129,9 +180,17 @@ int softirq_throttle_handler(struct ctl_table *table, int write, void *buffer,
 
 static void softirq_throttle_init(void)
 {
+	struct softirq_runtime *si_runtime;
+
 	si_throttle.period = sysctl_softirq_period_ms;
 	si_throttle.runtime = sysctl_softirq_runtime_ms;
 	raw_spin_lock_init(&si_throttle.lock);
+
+	for_each_possible_cpu(cpu) {
+		si_runtime = per_cpu_ptr(&softirq_runtime, cpu);
+		forward_softirq_expires(si_runtime);
+		raw_spin_lock_init(&si_runtime->lock);
+	}
 }
 #endif
 
@@ -592,6 +651,13 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	__u32 pending;
 	int softirq_bit;
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	bool exceeded = softirq_runtime_exceeded();
+
+	if (exceeded)
+		return;
+#endif
+
 	/*
 	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
 	 * softirq. A softirq handled, such as network RX, might set PF_MEMALLOC
@@ -652,6 +718,10 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		wakeup_softirqd();
 	}
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	update_softirq_runtime(jiffies - (end - MAX_SOFTIRQ_TIME));
+#endif
+
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
 	softirq_handle_end();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 3/3] softirq: Introduce statistics about softirq throttling
  2022-04-06  2:52 [RFC 0/3] softirq: Introduce softirq throttling Liao Chang
  2022-04-06  2:52 ` [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq Liao Chang
  2022-04-06  2:52 ` [RFC 2/3] softirq: Do throttling when softirqs use up its bandwidth Liao Chang
@ 2022-04-06  2:52 ` Liao Chang
  2022-04-06 14:54 ` [RFC 0/3] softirq: Introduce " Matthew Wilcox
  2022-04-07 10:57 ` Thomas Gleixner
  4 siblings, 0 replies; 9+ messages in thread
From: Liao Chang @ 2022-04-06  2:52 UTC (permalink / raw)
  To: mcgrof, keescook, yzaikin, liaochang1, tglx, clg, nitesh,
	edumazet, peterz, joshdon, masahiroy, nathan, akpm, vbabka,
	gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

This patch introduces counting the number of time spent on softirqs for
each CPU and the number of time that softirqs has been throttled for
each CPU, which are reported in /proc/softirqs, for example:

$cat /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:       1088        855        197       4862
      NET_TX:          0          0          0          0
      NET_RX:         15          1          0          0
       BLOCK:         14         11         86         75
    IRQ_POLL:          0          0          0          0
     TASKLET:    5926026    6133070    5646523    6149053
       SCHED:      18061      15939      15746      16004
     HRTIMER:          0          0          0          0
         RCU:        668        778        939        720
                    CPU0       CPU1       CPU2       CPU3
 DURATION_MS:      91556      69888      66784      73772
 THROTTLE_MS:      77820       7328       5828       8904

Row starts with "DURATION_MS:" indicates how many milliseconds used for
softirqs on each CPU. Row starts with "THROTTLE_MS:" indicates how many
milliseconds softirq throttling lasted on each CPU.

Notice: the rate of "THROTTLE_MS" increase is controlled by parameter
"kernel.softirq_period_ms" and "kernel.softirq_runtime_ms", generally
speaking, the smaller softirq CPU bandwidth is, the faster "THROTTLE_MS"
increase, especially when pending softirq workload is very heavy.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 fs/proc/softirqs.c          | 18 ++++++++++++++++++
 include/linux/kernel_stat.h | 27 +++++++++++++++++++++++++++
 kernel/softirq.c            | 15 +++++++++++++--
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/fs/proc/softirqs.c b/fs/proc/softirqs.c
index 12901dcf57e2..5ea3ede9833e 100644
--- a/fs/proc/softirqs.c
+++ b/fs/proc/softirqs.c
@@ -22,6 +22,24 @@ static int show_softirqs(struct seq_file *p, void *v)
 			seq_printf(p, " %10u", kstat_softirqs_cpu(i, j));
 		seq_putc(p, '\n');
 	}
+
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	seq_puts(p, "                    ");
+	for_each_possible_cpu(i)
+		seq_printf(p, "CPU%-8d", i);
+	seq_putc(p, '\n');
+
+	seq_printf(p, "%12s:", "DURATION_MS");
+	for_each_possible_cpu(j)
+		seq_printf(p, " %10lu", kstat_softirq_duration(j));
+	seq_putc(p, '\n');
+
+	seq_printf(p, "%12s:", "THROTTLE_MS");
+	for_each_possible_cpu(j)
+		seq_printf(p, " %10lu", kstat_softirq_throttle(j));
+	seq_putc(p, '\n');
+#endif
+
 	return 0;
 }
 
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 69ae6b278464..bbb52c55aad4 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -38,6 +38,11 @@ struct kernel_cpustat {
 struct kernel_stat {
 	unsigned long irqs_sum;
 	unsigned int softirqs[NR_SOFTIRQS];
+
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	unsigned long softirq_duration;
+	unsigned long softirq_throttle;
+#endif
 };
 
 DECLARE_PER_CPU(struct kernel_stat, kstat);
@@ -64,6 +69,28 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
        return kstat_cpu(cpu).softirqs[irq];
 }
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+static inline unsigned long kstat_softirq_duration(int cpu)
+{
+	return jiffies_to_msecs(kstat_cpu(cpu).softirq_duration);
+}
+
+static inline unsigned long kstat_softirq_throttle(int cpu)
+{
+	return jiffies_to_msecs(kstat_cpu(cpu).softirq_throttle);
+}
+
+static inline unsigned long kstat_incr_softirq_duration(unsigned long delta)
+{
+	return kstat_this_cpu->softirq_duration += delta;
+}
+
+static inline unsigned long kstat_incr_softirq_throttle(unsigned long delta)
+{
+	return kstat_this_cpu->softirq_throttle += delta;
+}
+#endif
+
 /*
  * Number of interrupts per specific IRQ source, since bootup
  */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 6de6db794ac5..7fc0dc39f788 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -79,6 +79,7 @@ struct softirq_runtime {
 	bool	throttled;
 	unsigned long	duration;
 	unsigned long	expires;
+	unsigned long	throttled_ts;
 	raw_spinlock_t lock;
 };
 static DEFINE_PER_CPU(struct softirq_runtime, softirq_runtime);
@@ -94,12 +95,16 @@ static void forward_softirq_expires(struct softirq_runtime *si_runtime)
 static void update_softirq_runtime(unsigned long duration)
 {
 	struct softirq_runtime *si_runtime = this_cpu_ptr(&softirq_runtime);
+	unsigned long now = jiffies;
+
+	kstat_incr_softirq_duration(duration);
 
 	raw_spin_lock(&si_runtime->lock);
 	si_runtime->duration += jiffies_to_msecs(duration);
 	if ((si_runtime->duration >= si_throttle.runtime) &&
-		time_before(jiffies, si_runtime->expires)) {
+		time_before(now, si_runtime->expires)) {
 		si_runtime->throttled = true;
+		si_runtime->throttled_ts = now;
 	}
 	raw_spin_unlock(&si_runtime->lock);
 }
@@ -107,13 +112,17 @@ static void update_softirq_runtime(unsigned long duration)
 static bool softirq_runtime_exceeded(void)
 {
 	struct softirq_runtime *si_runtime = this_cpu_ptr(&softirq_runtime);
+	unsigned long now = jiffies;
 
 	if ((unsigned int)si_throttle.runtime >= si_throttle.period)
 		return false;
 
 	raw_spin_lock(&si_runtime->lock);
-	if (!time_before(jiffies, si_runtime->expires))
+	if (!time_before(now, si_runtime->expires)) {
+		if (si_runtime->throttled)
+			kstat_incr_softirq_throttle(now - si_runtime->throttled_ts);
 		forward_softirq_expires(si_runtime);
+	}
 	raw_spin_unlock(&si_runtime->lock);
 	return si_runtime->throttled;
 }
@@ -140,6 +149,8 @@ static void softirq_throttle_update(void)
 	for_each_possible_cpu(cpu, &) {
 		si_runtime = per_cpu_ptr(&softirq_runtime, cpu);
 		raw_spin_lock(&si_runtime->lock);
+		if (si_runtime->throttled)
+			kstat_incr_softirq_throttle(jiffies - si_runtime->throttled_ts);
 		forward_softirq_expires(si_runtime);
 		raw_spin_unlock(&si_runtime->lock);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] softirq: Introduce softirq throttling
  2022-04-06  2:52 [RFC 0/3] softirq: Introduce softirq throttling Liao Chang
                   ` (2 preceding siblings ...)
  2022-04-06  2:52 ` [RFC 3/3] softirq: Introduce statistics about softirq throttling Liao Chang
@ 2022-04-06 14:54 ` Matthew Wilcox
  2022-04-07 12:47   ` Thomas Gleixner
  2022-04-07 10:57 ` Thomas Gleixner
  4 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2022-04-06 14:54 UTC (permalink / raw)
  To: Liao Chang
  Cc: mcgrof, keescook, yzaikin, tglx, clg, nitesh, edumazet, peterz,
	joshdon, masahiroy, nathan, akpm, vbabka, gustavoars, arnd,
	chris, dmitry.torokhov, linux, daniel, john.ogness, will, dave,
	frederic, linux-kernel, linux-fsdevel, heying24, guohanjun,
	weiyongjun1

On Wed, Apr 06, 2022 at 10:52:38AM +0800, Liao Chang wrote:
> Kernel check for pending softirqs periodically, they are performed in a
> few points of kernel code, such as irq_exit() and __local_bh_enable_ip(),
> softirqs that have been activated by a given CPU must be executed on the
> same CPU, this characteristic of softirq is always a potentially
> "dangerous" operation, because one CPU might be end up very busy while
> the other are most idle.
> 
> Above concern is proven in a networking user case: recenlty, we
> engineer find out the time used for connection re-establishment on
> kernel v5.10 is 300 times larger than v4.19, meanwhile, softirq
> monopolize almost 99% of CPU. This problem stem from that the connection
> between Sender and Receiver node get lost, the NIC driver on Sender node
> will keep raising NET_TX softirq before connection recovery. The system
> log show that most of softirq is performed from __local_bh_enable_ip(),
> since __local_bh_enable_ip is used widley in kernel code, it is very
> easy to run out most of CPU, and the user-mode application can't obtain
> enough CPU cycles to establish connection as soon as possible.

Shouldn't you fix that bug instead?  This seems like papering over the
bad effects of a bug and would make it harder to find bugs like this in
the future.  Essentially, it's the same as a screaming hardware interrupt,
except that it's a software interrupt, so we can fix the bug instead of
working around broken hardware.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] softirq: Introduce softirq throttling
  2022-04-06  2:52 [RFC 0/3] softirq: Introduce softirq throttling Liao Chang
                   ` (3 preceding siblings ...)
  2022-04-06 14:54 ` [RFC 0/3] softirq: Introduce " Matthew Wilcox
@ 2022-04-07 10:57 ` Thomas Gleixner
  4 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2022-04-07 10:57 UTC (permalink / raw)
  To: Liao Chang, mcgrof, keescook, yzaikin, liaochang1, clg, nitesh,
	edumazet, peterz, joshdon, masahiroy, nathan, akpm, vbabka,
	gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

On Wed, Apr 06 2022 at 10:52, Liao Chang wrote:

Why are you sending this twice within a few hours? See
Documentation/process/

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] softirq: Introduce softirq throttling
  2022-04-06 14:54 ` [RFC 0/3] softirq: Introduce " Matthew Wilcox
@ 2022-04-07 12:47   ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2022-04-07 12:47 UTC (permalink / raw)
  To: Matthew Wilcox, Liao Chang
  Cc: mcgrof, keescook, yzaikin, clg, nitesh, edumazet, peterz,
	joshdon, masahiroy, nathan, akpm, vbabka, gustavoars, arnd,
	chris, dmitry.torokhov, linux, daniel, john.ogness, will, dave,
	frederic, linux-kernel, linux-fsdevel, heying24, guohanjun,
	weiyongjun1

On Wed, Apr 06 2022 at 15:54, Matthew Wilcox wrote:
> On Wed, Apr 06, 2022 at 10:52:38AM +0800, Liao Chang wrote:
>> Kernel check for pending softirqs periodically, they are performed in a
>> few points of kernel code, such as irq_exit() and __local_bh_enable_ip(),
>> softirqs that have been activated by a given CPU must be executed on the
>> same CPU, this characteristic of softirq is always a potentially
>> "dangerous" operation, because one CPU might be end up very busy while
>> the other are most idle.
>> 
>> Above concern is proven in a networking user case: recenlty, we
>> engineer find out the time used for connection re-establishment on
>> kernel v5.10 is 300 times larger than v4.19, meanwhile, softirq
>> monopolize almost 99% of CPU. This problem stem from that the connection
>> between Sender and Receiver node get lost, the NIC driver on Sender node
>> will keep raising NET_TX softirq before connection recovery. The system
>> log show that most of softirq is performed from __local_bh_enable_ip(),
>> since __local_bh_enable_ip is used widley in kernel code, it is very
>> easy to run out most of CPU, and the user-mode application can't obtain
>> enough CPU cycles to establish connection as soon as possible.
>
> Shouldn't you fix that bug instead?  This seems like papering over the
> bad effects of a bug and would make it harder to find bugs like this in
> the future.  Essentially, it's the same as a screaming hardware interrupt,
> except that it's a software interrupt, so we can fix the bug instead of
> working around broken hardware.

It's not necessarily broken hardware. It's a fundamental issue of our
softirq processing magic which can happen in those contexts:

   1) On return from interrupt

   2) In local_bh_enable()

   3) In ksoftirqd

We have heuristics in place which delegate processing to ksoftirqd,
which brings softirq processing under scheduler control to some extent,
but those heuristics are rather easy to evade.

Delegation to ksoftirqd happens when the runtime of the __do_softirq()
loop exceeds a threshold. But if that is not exceeded then you still can
get into a situation where softirq processing eats up a large quantity
of CPU time in #1 and #2 which is the real problem because it prevents
the scheduler from applying fairness.

That's a known issue and attempts to fix that are popping up on a
regular base. There are several issues here:

  1) The runtime check in __do_softirq() is jiffies based and depending
     on CONFIG_HZ it's easy to stay under the threshold for one
     invocation, but still eat a large amount of CPU time.

     Also the runtime check happens at the end of the loop, which means
     that if a single softirq callback runs too long we still process
     all other pending ones.

  2) The decision to process softirqs directly on return from interrupt
     or in local_bh_enable() is error prone.

     The logic is:

         if (!ksoftirq_running() ||
             local_softirq_pending() & (SOFTIRQ_HI | SOFTIRQ_TASKLET))
         	process_direct();

     The reason for the HI/TASKLET exception is documented here:

         3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")

     But this is nasty because tasklets are widely used in networking,
     crypto and quite some of them are self rearming when they take too
     long. See mlx5_cq_tasklet_cb() as one example, which also uses
     jiffies for time limiting...

     With the HI/TASKLET exception this means that there is no
     delegation to ksoftirqd simply because on the next return from
     interrupt or the next local_bh_enable() in task context softirqs
     are processed. And this processes _all_ pending bits not only the
     HI/TASKLET ones...

The approach of just not running softirqs at all via a throttle
mechanism as proposed with these patches here, is definitely wrong and
going nowhere.

The proper solution for load accounting is a moving average with
exponential decay based on sched_clock() and not on jiffies. That gives
a reasonable decision to enforce ksoftirqd processing, but of course
that does neither solve the tasklet issues nor any of the other problems
vs. softirqs at all.

We've tried splitting ksoftirqd into different threads a couple of years
ago in RT, but that turned out to be problematic in some cases. Frederic
did some experiments to make local_bh_disable() take a mask argument to
disable only particular softirqs, but that ran into a dead end and is
problematic because quite some code, esp. networking relies on multiple
softirqs being disabled.

Softirqs are semantically ill defined and that's known for a very long
time, but of course they are conveniant and with a few hacks piled on
top to address the most urgent horrors they work by some definition of
work. IOW, we are accumulating technical depth with a fast pace.

TBH, I have no real good plan how to address this proper, but it's about
time to tackle this in a concerted effort.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq
  2022-04-06  2:27 ` [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq Liao Chang
@ 2022-04-06 22:02   ` Randy Dunlap
  0 siblings, 0 replies; 9+ messages in thread
From: Randy Dunlap @ 2022-04-06 22:02 UTC (permalink / raw)
  To: Liao Chang, mcgrof, keescook, yzaikin, tglx, nitesh, edumazet,
	clg, tannerlove, peterz, joshdon, masahiroy, nathan, vbabka,
	akpm, gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

Hi--

On 4/5/22 19:27, Liao Chang wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..a63ebc88a199 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2393,3 +2393,13 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
>  # <asm/syscall_wrapper.h>.
>  config ARCH_HAS_SYSCALL_WRAPPER
>  	def_bool n
> +
> +config SOFTIRQ_THROTTLE
> +	bool "Softirq Throttling Feature"
> +	help
> +	  Allow to allocate bandwidth for use by softirq handling. This
> +	  saftguard machanism is known as softirq throttling and is controlled

typos:
	  safeguard mechanism

> +	  by two parameters in the /proc/ file system:
> +
> +	  /proc/sysctl/kernel/softirq_period_ms
> +	  /proc/sysctl/kernel/softirq_runtime_ms

These should be documented in...
I guess    Documentation/admin-guide/sysctl/kernel.rst

thanks.
-- 
~Randy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq
  2022-04-06  2:27 Liao Chang
@ 2022-04-06  2:27 ` Liao Chang
  2022-04-06 22:02   ` Randy Dunlap
  0 siblings, 1 reply; 9+ messages in thread
From: Liao Chang @ 2022-04-06  2:27 UTC (permalink / raw)
  To: mcgrof, keescook, yzaikin, liaochang1, tglx, nitesh, edumazet,
	clg, tannerlove, peterz, joshdon, masahiroy, nathan, vbabka,
	akpm, gustavoars, arnd, chris, dmitry.torokhov, linux, daniel,
	john.ogness, will, dave, frederic
  Cc: linux-kernel, linux-fsdevel, heying24, guohanjun, weiyongjun1

Although kernel merely allow __do_softirq() run for 2ms at most, it does
not control the total running time of softirqs on given CPU. In order to
prevent softirqs from using up all CPU bandwidth and cause other task
starved, Sofitrq Throttling mechanism introduces two parameters in the
/proc file system:

/proc/sys/kernel/sofitrq_period_ms
  Defines the period in ms(millisecond) to be considered as 100% of CPU
  bandwidth, the default value is 1,000 ms(1second). Changes to the
  value of the period must be very well thought out, as too long or too
  short are beyond one's expectation.

/proc/sys/kernel/softirq_runtime_ms
  Define the bandwidth available to softirqs on each CPU, the default
  values is 950 ms(0.95 second) or, in other words, 95% of the CPU
  bandwidth. Setting this value to -1 means that softirqs might use up
  to 100% CPU cycles.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 include/linux/interrupt.h |  7 ++++
 init/Kconfig              | 10 ++++++
 kernel/softirq.c          | 74 +++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c           | 16 +++++++++
 4 files changed, 107 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9367f1cb2e3c..de6973bf72e5 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -605,6 +605,13 @@ extern void __raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+extern unsigned int sysctl_softirq_period_ms;
+extern int sysctl_softirq_runtime_ms;
+int softirq_throttle_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos);
+#endif
+
 DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
 
 static inline struct task_struct *this_cpu_ksoftirqd(void)
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..a63ebc88a199 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2393,3 +2393,13 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 # <asm/syscall_wrapper.h>.
 config ARCH_HAS_SYSCALL_WRAPPER
 	def_bool n
+
+config SOFTIRQ_THROTTLE
+	bool "Softirq Throttling Feature"
+	help
+	  Allow to allocate bandwidth for use by softirq handling. This
+	  saftguard machanism is known as softirq throttling and is controlled
+	  by two parameters in the /proc/ file system:
+
+	  /proc/sysctl/kernel/softirq_period_ms
+	  /proc/sysctl/kernel/softirq_runtime_ms
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 41f470929e99..8aac9e2631fd 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -65,6 +65,76 @@ const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"TASKLET", "SCHED", "HRTIMER", "RCU"
 };
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+unsigned int sysctl_softirq_period_ms = 1000;
+int sysctl_softirq_runtime_ms = 950;
+
+struct softirq_throttle {
+	unsigned int period;
+	unsigned int runtime;
+	raw_spinlock_t lock;
+} si_throttle;
+
+static int softirq_throttle_validate(void)
+{
+	if (((int)sysctl_softirq_period_ms <= 0) ||
+		((sysctl_softirq_runtime_ms != -1) &&
+		 ((unsigned int)sysctl_softirq_runtime_ms > sysctl_softirq_period_ms)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void softirq_throttle_update(void)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&si_throttle.lock, flags);
+	si_throttle.period = sysctl_softirq_period_ms;
+	si_throttle.runtime = sysctl_softirq_runtime_ms;
+	raw_spin_unlock_irqrestore(&si_throttle.lock, flags);
+}
+
+int softirq_throttle_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	unsigned int old_period, old_runtime;
+	static DEFINE_MUTEX(mutex);
+	int ret;
+
+	mutex_lock(&mutex);
+	old_period = sysctl_softirq_period_ms;
+	old_runtime = sysctl_softirq_runtime_ms;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret)
+		goto undo;
+	if (!write)
+		goto done;
+
+	ret = softirq_throttle_validate();
+	if (ret)
+		goto undo;
+
+	softirq_throttle_update();
+	goto done;
+
+undo:
+	sysctl_softirq_period_ms = old_period;
+	sysctl_softirq_runtime_ms = old_runtime;
+done:
+	mutex_unlock(&mutex);
+	return ret;
+}
+
+static void softirq_throttle_init(void)
+{
+	si_throttle.period = sysctl_softirq_period_ms;
+	si_throttle.runtime = sysctl_softirq_runtime_ms;
+	raw_spin_lock_init(&si_throttle.lock);
+}
+#endif
+
 /*
  * we cannot loop indefinitely here to avoid userspace starvation,
  * but we also don't want to introduce a worst case 1/HZ latency
@@ -894,6 +964,10 @@ void __init softirq_init(void)
 {
 	int cpu;
 
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	softirq_throttle_init();
+#endif
+
 	for_each_possible_cpu(cpu) {
 		per_cpu(tasklet_vec, cpu).tail =
 			&per_cpu(tasklet_vec, cpu).head;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5ae443b2882e..e5a9ad391cca 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1771,6 +1771,22 @@ static struct ctl_table kern_table[] = {
 		.extra1		= SYSCTL_ONE,
 	},
 #endif
+#ifdef CONFIG_SOFTIRQ_THROTTLE
+	{
+		.procname	= "softirq_period_ms",
+		.data		= &sysctl_softirq_period_ms,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= softirq_throttle_handler,
+	},
+	{
+		.procname	= "softirq_runtime_ms",
+		.data		= &sysctl_softirq_runtime_ms,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= softirq_throttle_handler,
+	},
+#endif
 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
 	{
 		.procname	= "sched_energy_aware",
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-04-07 12:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06  2:52 [RFC 0/3] softirq: Introduce softirq throttling Liao Chang
2022-04-06  2:52 ` [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq Liao Chang
2022-04-06  2:52 ` [RFC 2/3] softirq: Do throttling when softirqs use up its bandwidth Liao Chang
2022-04-06  2:52 ` [RFC 3/3] softirq: Introduce statistics about softirq throttling Liao Chang
2022-04-06 14:54 ` [RFC 0/3] softirq: Introduce " Matthew Wilcox
2022-04-07 12:47   ` Thomas Gleixner
2022-04-07 10:57 ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2022-04-06  2:27 Liao Chang
2022-04-06  2:27 ` [RFC 1/3] softirq: Add two parameters to control CPU bandwidth for use by softirq Liao Chang
2022-04-06 22:02   ` Randy Dunlap

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.