* [RFC 0/6] Multi-thread per-cpu ksoftirqd
@ 2018-01-18 16:12 Dmitry Safonov
  2018-01-18 16:12 ` [RFC 1/6] softirq: Add softirq_groups boot parameter Dmitry Safonov
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

Another attempt to solve softirq deferring problems.
There are at least two problems, AFAIK:
o Deferring one softirq to ksoftirqd introduces latencies for other
  (different-type) softirqs, because the ksoftirqd_running() check
  defers them all instead of servicing them.
o The logic in __do_softirq() that re-checks if (pending) after 2ms of
  processing doesn't work on some machines during e.g. a UDP storm.

So, what's done here in an attempt to improve this:
- add a boot parameter that separates softirqs into defer-groups
- run one ksoftirqd per softirq-group (still per-cpu)

The last two patches might be just a brain fart; there I tried to
improve the metric on which the decision to defer is based.
I measure the time spent servicing each softirq and account that time
to the ksoftirqd thread of the softirq's group. The decision to
serve/defer a softirq is then based on the comparison:
(current->vruntime < ksoftirqd->vruntime)
Ugh, measuring the time and updating the ksoftirqd's cpu time each tick
might be costly... And it looks like it doesn't work as expected: a new
task starts with a normalized vruntime (min_vruntime), which is lower
than ksoftirqd's, while the time spent servicing softirqs is still
bigger than that of any running task.
Anyway, sending this as an RFC; maybe someone will like the approach
(or suggest other ideas).
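
To make the heuristic concrete, here is a minimal sketch of the
comparison above in CFS terms (serve_inline() is a hypothetical helper
just for illustration; the series open-codes this check):

static bool serve_inline(struct task_struct *curr, struct task_struct *ktsk)
{
	/* Serve the softirq on curr's stack only if curr is not "behind"
	 * the group's ksoftirqd in weighted runtime; otherwise defer. */
	return (s64)(curr->se.vruntime - ktsk->se.vruntime) >= 0;
}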

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com> 
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> 
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radu Rendec <rrendec@arista.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>

Dmitry Safonov (6):
  softirq: Add softirq_groups boot parameter
  softirq: Introduce mask for __do_softirq()
  softirq: Add reverse group-to-softirq map
  softirq: Run per-group per-cpu ksoftirqd thread
  softirq: Add time accounting per-softirq type
  softirq/sched: Account si cpu time to ksoftirqd(s)

 Documentation/admin-guide/kernel-parameters.txt |  16 ++
 include/linux/hardirq.h                         |   2 +-
 include/linux/interrupt.h                       |  26 +-
 include/linux/vtime.h                           |  10 +-
 init/Kconfig                                    |  10 +
 kernel/sched/cputime.c                          |  60 +++-
 kernel/sched/fair.c                             |  38 +++
 kernel/sched/sched.h                            |  20 ++
 kernel/softirq.c                                | 362 ++++++++++++++++++++----
 net/ipv4/tcp_output.c                           |   2 +-
 10 files changed, 464 insertions(+), 82 deletions(-)

-- 
2.13.6


* [RFC 1/6] softirq: Add softirq_groups boot parameter
  2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
@ 2018-01-18 16:12 ` Dmitry Safonov
  2018-01-18 16:12 ` [RFC 2/6] softirq: Introduce mask for __do_softirq() Dmitry Safonov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

The ksoftirqd thread allows deferring softirqs when the system is under
a storm. While it prevents userspace from being starved of cpu time, it
increases latencies for other softirqs (those not raised by the storm).

As creating one ksoftirqd thread per softirq per cpu would be insane on
huge machines, separate softirqs into groups instead.
That allows deferring the softirqs of one group while continuing to
service the others: under a storm of one group's softirqs, softirqs
from other groups are still serviced as they come and don't suffer
latency issues.
For each softirq group a per-cpu kthread is created which processes
the deferred softirqs of that group.

The new boot parameter lets an admin define how many ksoftirqd threads
are created on each cpu and which softirqs share the same
deferring group.
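
A hypothetical example with the format documented below (group numbers
are assigned in the order the groups appear on the command line; the
implicit default group, if needed, takes the next free number):

    softirq_groups=HI/TIMER/HRTIMER,NET_TX/NET_RX,BLOCK

    group 0: HI, TIMER, HRTIMER
    group 1: NET_TX, NET_RX
    group 2: BLOCK
    group 3: IRQ_POLL, TASKLET, SCHED, RCU    (default group)

i.e. four ksoftirqd threads per cpu once the per-group threads are
spawned by a later patch in the series.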

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 +++++
 include/linux/interrupt.h                       |  1 +
 kernel/softirq.c                                | 87 +++++++++++++++++++++++++
 3 files changed, 104 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 46b26bfee27b..d5c44703a299 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3940,6 +3940,22 @@
 			Format: <integer>
 			Default: -1 (no limit)
 
+	softirq_groups=
+			[KNL] The count and contents of softirq groups.
+			Format:[group1],[group2],[groupN]
+			where group is <softirq1>/<softirq2>/<softirqM>
+			E.g: softirq_groups=HI/TIMER/HRTIMER,NET_TX/NET_RX,BLOCK
+
+			Defines how many ksoftirqd threads to create *per-cpu*.
+			One ksoftirqd thread is created for each group.
+			The total number of threads created is
+			(NR_CPUS * NR_SOFTIRQ_GROUPS).
+			An admin can place one softirq in several softirq
+			groups. Softirqs that have no group defined are put
+			into the default softirq group. If all softirqs have
+			been placed into groups, the default group is not
+			created.
+
 	softlockup_panic=
 			[KNL] Should the soft-lockup detector generate panics.
 			Format: <integer>
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 69c238210325..5bb6b435f0bb 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -486,6 +486,7 @@ extern const char * const softirq_to_name[NR_SOFTIRQS];
 struct softirq_action
 {
 	void	(*action)(struct softirq_action *);
+	u32	group_mask;
 };
 
 asmlinkage void do_softirq(void);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 2f5e87f1bae2..c9aecdd57107 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -54,6 +54,7 @@ EXPORT_SYMBOL(irq_stat);
 #endif
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
+static unsigned __initdata nr_softirq_groups = 0;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
@@ -635,10 +636,25 @@ void tasklet_hrtimer_init(struct tasklet_hrtimer *ttimer,
 }
 EXPORT_SYMBOL_GPL(tasklet_hrtimer_init);
 
+static void __init setup_default_softirq_group(unsigned nr)
+{
+	unsigned i;
+
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		u32 *gr_mask = &softirq_vec[i].group_mask;
+
+		if (!*gr_mask)
+			*gr_mask |= (1 << nr);
+		pr_debug("softirq-%s: %#x\n", softirq_to_name[i], *gr_mask);
+	}
+}
+
 void __init softirq_init(void)
 {
 	int cpu;
 
+	setup_default_softirq_group(nr_softirq_groups++);
+
 	for_each_possible_cpu(cpu) {
 		per_cpu(tasklet_vec, cpu).tail =
 			&per_cpu(tasklet_vec, cpu).head;
@@ -750,6 +766,77 @@ static __init int spawn_ksoftirqd(void)
 }
 early_initcall(spawn_ksoftirqd);
 
+static __init __u32 parse_softirq_name(char *name, size_t len)
+{
+	__u32 i;
+
+	for (i = 0; i < NR_SOFTIRQS; i++)
+		if (strncmp(name, softirq_to_name[i], len) == 0)
+			return i;
+
+	pr_warn("softirq: Ignored `%.*s' in softirq group", (int)len, name);
+
+	return NR_SOFTIRQS;
+}
+
+static bool __init parse_softirq_group(char *start, char *end, u32 group)
+{
+	char *next_softirq = strchrnul(start, '/');
+	bool is_empty = true;
+	u32 softirq_nr;
+
+	if (next_softirq == start)
+		return !is_empty;
+
+	do {
+		next_softirq = min(next_softirq, end);
+
+		softirq_nr = parse_softirq_name(start, next_softirq - start);
+		if (softirq_nr < NR_SOFTIRQS) {
+			softirq_vec[softirq_nr].group_mask |= (1 << group);
+			is_empty = false;
+		}
+
+		if (next_softirq == end)
+			break;
+
+		start = next_softirq + 1;
+		next_softirq = strchrnul(start, '/');
+	} while (1);
+
+	return !is_empty;
+}
+
+/*
+ * Format e.g.:
+ * softirq_groups=HI/TIMER/HRTIMER,NET_TX/NET_RX,BLOCK,TASKLET
+ * Admin *can* define one softirq in different groups.
+ * Softirqs those have no group defined will be put in default softirq_group.
+ * If all softirqs have been placed into groups, default group is not created.
+ */
+static int __init setup_softirq_groups(char *s)
+{
+	char *next_group = strchrnul(s, ',');
+	unsigned i = 0;
+
+	do {
+		/* Skip empty softirq groups. */
+		if (parse_softirq_group(s, next_group, i))
+			i++;
+
+		if (*next_group == '\0')
+			break;
+
+		s = next_group + 1;
+		next_group = strchrnul(s, ',');
+	} while(i < 31); /* if there is default softirq group it's nr 31 */
+
+	nr_softirq_groups = i;
+
+	return 0;
+}
+early_param("softirq_groups", setup_softirq_groups);
+
 /*
  * [ These __weak aliases are kept in a separate compilation unit, so that
  *   GCC does not inline them incorrectly. ]
-- 
2.13.6


* [RFC 2/6] softirq: Introduce mask for __do_softirq()
  2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
  2018-01-18 16:12 ` [RFC 1/6] softirq: Add softirq_groups boot parameter Dmitry Safonov
@ 2018-01-18 16:12 ` Dmitry Safonov
  2018-01-18 16:12 ` [RFC 3/6] softirq: Add reverse group-to-softirq map Dmitry Safonov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

__do_softirq() currently serves all pending softirqs.
As we need to separate softirqs into groups, we must be able to serve
softirqs from one group while deferring softirqs from the others.
Change __do_softirq() to take a mask of the softirqs it should serve,
instead of servicing all pending softirqs.
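
A minimal sketch of the new semantics (values picked for illustration
only): with a mask covering just NET_RX, a simultaneously pending TIMER
softirq is written back as still pending instead of being served:

	__u32 pending = local_softirq_pending();  /* TIMER | NET_RX pending */
	__u32 mask = 1 << NET_RX_SOFTIRQ;         /* this caller's softirqs */

	set_softirq_pending(pending & ~mask);     /* TIMER stays pending    */
	pending &= mask;                          /* only NET_RX is served  */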

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/interrupt.h |  8 ++++----
 kernel/softirq.c          | 27 ++++++++++++++-------------
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5bb6b435f0bb..2ea09896bd6e 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -490,14 +490,14 @@ struct softirq_action
 };
 
 asmlinkage void do_softirq(void);
-asmlinkage void __do_softirq(void);
+asmlinkage void __do_softirq(__u32 mask);
 
 #ifdef __ARCH_HAS_DO_SOFTIRQ
-void do_softirq_own_stack(void);
+void do_softirq_own_stack(__u32 mask);
 #else
-static inline void do_softirq_own_stack(void)
+static inline void do_softirq_own_stack(__u32 mask)
 {
-	__do_softirq();
+	__do_softirq(mask);
 }
 #endif
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c9aecdd57107..ca8c3db4570d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -240,7 +240,7 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
-asmlinkage __visible void __softirq_entry __do_softirq(void)
+asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 {
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	unsigned long old_flags = current->flags;
@@ -265,7 +265,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	set_softirq_pending(pending & ~mask);
+	pending &= mask;
 
 	local_irq_enable();
 
@@ -299,7 +300,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-	if (pending) {
+	if (pending & mask) {
 		if (time_before(jiffies, end) && !need_resched() &&
 		    --max_restart)
 			goto restart;
@@ -316,18 +317,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 asmlinkage __visible void do_softirq(void)
 {
-	__u32 pending;
+	__u32 pending = local_softirq_pending();
 	unsigned long flags;
 
-	if (in_interrupt())
+	if (in_interrupt() || !pending)
 		return;
 
 	local_irq_save(flags);
 
-	pending = local_softirq_pending();
-
-	if (pending && !ksoftirqd_running())
-		do_softirq_own_stack();
+	if (!ksoftirqd_running())
+		do_softirq_own_stack(pending);
 
 	local_irq_restore(flags);
 }
@@ -353,7 +352,9 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (ksoftirqd_running())
+	__u32 pending = local_softirq_pending();
+
+	if (!pending || ksoftirqd_running())
 		return;
 
 	if (!force_irqthreads) {
@@ -363,14 +364,14 @@ static inline void invoke_softirq(void)
 		 * it is the irq stack, because it should be near empty
 		 * at this stage.
 		 */
-		__do_softirq();
+		__do_softirq(pending);
 #else
 		/*
 		 * Otherwise, irq_exit() is called on the task stack that can
 		 * be potentially deep already. So call softirq in its own stack
 		 * to prevent from any overrun.
 		 */
-		do_softirq_own_stack();
+		do_softirq_own_stack(pending);
 #endif
 	} else {
 		wakeup_softirqd();
@@ -679,7 +680,7 @@ static void run_ksoftirqd(unsigned int cpu)
 		 * We can safely run softirq on inline stack, as we are not deep
 		 * in the task stack here.
 		 */
-		__do_softirq();
+		__do_softirq(~0);
 		local_irq_enable();
 		cond_resched_rcu_qs();
 		return;
-- 
2.13.6


* [RFC 3/6] softirq: Add reverse group-to-softirq map
  2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
  2018-01-18 16:12 ` [RFC 1/6] softirq: Add softirq_groups boot parameter Dmitry Safonov
  2018-01-18 16:12 ` [RFC 2/6] softirq: Introduce mask for __do_softirq() Dmitry Safonov
@ 2018-01-18 16:12 ` Dmitry Safonov
  2018-01-18 16:12 ` [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread Dmitry Safonov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

Add a reverse group-to-softirq map for faster operation with the
pending mask:
pending &= group_to_softirqs[group_nr];

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 kernel/softirq.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ca8c3db4570d..7de5791c08f9 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -54,6 +54,7 @@ EXPORT_SYMBOL(irq_stat);
 #endif
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
+static unsigned group_to_softirqs[sizeof(softirq_vec[0].group_mask)] __cacheline_aligned_in_smp;
 static unsigned __initdata nr_softirq_groups = 0;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
@@ -650,11 +651,28 @@ static void __init setup_default_softirq_group(unsigned nr)
 	}
 }
 
+static void __init fill_group_to_softirq_maps(void)
+{
+	unsigned i;
+
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		u32 mask = softirq_vec[i].group_mask;
+		unsigned j, group = 0;
+
+		while ((j = ffs(mask))) {
+			group += j - 1;
+			group_to_softirqs[group] |= (1 << i);
+			mask >>= j;
+		}
+	}
+}
+
 void __init softirq_init(void)
 {
 	int cpu;
 
 	setup_default_softirq_group(nr_softirq_groups++);
+	fill_group_to_softirq_maps();
 
 	for_each_possible_cpu(cpu) {
 		per_cpu(tasklet_vec, cpu).tail =
-- 
2.13.6


* [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread
  2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
                   ` (2 preceding siblings ...)
  2018-01-18 16:12 ` [RFC 3/6] softirq: Add reverse group-to-softirq map Dmitry Safonov
@ 2018-01-18 16:12 ` Dmitry Safonov
  2018-01-18 17:00   ` Mike Galbraith
  2018-01-18 16:12 ` [RFC 5/6] softirq: Add time accounting per-softirq type Dmitry Safonov
  2018-01-18 16:12 ` [RFC 6/6] softirq/sched: Account si cpu time to ksoftirqd(s) Dmitry Safonov
  5 siblings, 1 reply; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

Running one ksoftirqd per cpu allows deferring softirq processing under
a storm. But having only one ksoftirqd thread for that makes it worse
for the other kinds of softirqs: as we check ksoftirqd_running() and
defer all softirqs until ksoftirqd's time-slice, latencies are
introduced for every softirq type.
While that is acceptable for the softirqs causing the storm, the other
softirqs are impeded.

For each softirq-group create a ksoftirqd thread which will serve the
deferred softirqs of that group. Softirqs of other groups will be
served as they come.

Without the kernel parameter it works as before: a default
softirq-group is created which includes all softirqs, and only one
ksoftirqd thread runs per cpu.
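
For illustration, a hypothetical thread layout on a 2-cpu box booted
with two explicit groups (the comm format "ksoftirqd-g%d/%u" comes from
register_ksoftirqd_group() below):

	ksoftirqd-g0/0	ksoftirqd-g0/1
	ksoftirqd-g1/0	ksoftirqd-g1/1
	ksoftirqd-g2/0	ksoftirqd-g2/1	(default group)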

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/interrupt.h |  16 +++-
 kernel/sched/cputime.c    |  27 ++++---
 kernel/softirq.c          | 187 ++++++++++++++++++++++++++++++++++++----------
 net/ipv4/tcp_output.c     |   2 +-
 4 files changed, 177 insertions(+), 55 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 2ea09896bd6e..17e1a04445fa 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -508,11 +508,21 @@ extern void __raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 
-DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+extern struct task_struct *__percpu **ksoftirqd;
+extern unsigned nr_softirq_groups;
 
-static inline struct task_struct *this_cpu_ksoftirqd(void)
+extern bool servicing_softirq(unsigned nr);
+static inline bool current_is_ksoftirqd(void)
 {
-	return this_cpu_read(ksoftirqd);
+	unsigned i;
+
+	if (!ksoftirqd)
+		return false;
+
+	for (i = 0; i < nr_softirq_groups; i++)
+		if (*this_cpu_ptr(ksoftirqd[i]) == current)
+			return true;
+	return false;
 }
 
 /* Tasklets --- multithreaded analogue of BHs.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index bac6ac9a4ec7..faacba00a153 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -46,6 +46,18 @@ static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
 	u64_stats_update_end(&irqtime->sync);
 }
 
+static void irqtime_account_softirq(struct irqtime *irqtime, s64 delta)
+{
+	/*
+	 * We do not account for softirq time from ksoftirqd here.
+	 * We want to continue accounting softirq time to ksoftirqd thread
+	 * in that case, so as not to confuse scheduler with a special task
+	 * that do not consume any time, but still wants to run.
+	 */
+	if (!current_is_ksoftirqd())
+		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+}
+
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
@@ -63,16 +75,11 @@ void irqtime_account_irq(struct task_struct *curr)
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
 
-	/*
-	 * We do not account for softirq time from ksoftirqd here.
-	 * We want to continue accounting softirq time to ksoftirqd thread
-	 * in that case, so as not to confuse scheduler with a special task
-	 * that do not consume any time, but still wants to run.
-	 */
-	if (hardirq_count())
+	if (hardirq_count()) {
 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
-		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+	} else if (in_serving_softirq()) {
+		irqtime_account_softirq(irqtime, delta);
+	}
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
@@ -375,7 +382,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 
 	cputime -= other;
 
-	if (this_cpu_ksoftirqd() == p) {
+	if (current_is_ksoftirqd()) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
 		 * So, we have to handle it separately here.
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 7de5791c08f9..fdde3788afba 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -55,28 +55,56 @@ EXPORT_SYMBOL(irq_stat);
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 static unsigned group_to_softirqs[sizeof(softirq_vec[0].group_mask)] __cacheline_aligned_in_smp;
-static unsigned __initdata nr_softirq_groups = 0;
-
-DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+struct task_struct *__percpu **ksoftirqd = 0;
+unsigned nr_softirq_groups = 0;
 
 const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
 	"TASKLET", "SCHED", "HRTIMER", "RCU"
 };
 
+bool servicing_softirq(unsigned nr)
+{
+	u32 group_mask = softirq_vec[nr].group_mask;
+	unsigned i, group = 0;
+
+	if (!ksoftirqd)
+		return false;
+
+	while ((i = ffs(group_mask))) {
+		group += i - 1;
+		if (*this_cpu_ptr(ksoftirqd[group]) == current)
+			return true;
+		group_mask >>= i;
+	}
+
+	return false;
+}
+
 /*
  * we cannot loop indefinitely here to avoid userspace starvation,
  * but we also don't want to introduce a worst case 1/HZ latency
  * to the pending events, so lets the scheduler to balance
  * the softirq load for us.
  */
-static void wakeup_softirqd(void)
+static void wakeup_softirqd(u32 softirq_mask)
 {
-	/* Interrupts are disabled: no need to stop preemption */
-	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
+	unsigned i;
+
+	if (!ksoftirqd)
+		return;
 
-	if (tsk && tsk->state != TASK_RUNNING)
-		wake_up_process(tsk);
+	for (i = 0; i < nr_softirq_groups; i++) {
+		if (softirq_mask & group_to_softirqs[i]) {
+			struct task_struct *tsk;
+
+			/* Interrupts disabled: no need to stop preemption */
+			tsk = *this_cpu_ptr(ksoftirqd[i]);
+			if (tsk && tsk->state != TASK_RUNNING)
+				wake_up_process(tsk);
+		}
+
+	}
 }
 
 /*
@@ -85,9 +113,27 @@ static void wakeup_softirqd(void)
  */
 static bool ksoftirqd_running(void)
 {
-	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
+	/* We rely that there are pending softirqs */
+	__u32 pending = local_softirq_pending();
+	unsigned i;
 
-	return tsk && (tsk->state == TASK_RUNNING);
+	if (!ksoftirqd)
+		return false;
+
+	for (i = 0; i < nr_softirq_groups && pending; i++) {
+		/* Interrupts are disabled: no need to stop preemption */
+		struct task_struct *tsk = *this_cpu_ptr(ksoftirqd[i]);
+
+		if (!(pending & group_to_softirqs[i]))
+			continue;
+
+		if (!tsk || tsk->state != TASK_RUNNING)
+			continue;
+
+		pending &= ~group_to_softirqs[i];
+	}
+
+	return !pending;
 }
 
 /*
@@ -306,7 +352,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 		    --max_restart)
 			goto restart;
 
-		wakeup_softirqd();
+		/* XXX: not fair ATM, next patches will fix that */
+		wakeup_softirqd(pending);
 	}
 
 	lockdep_softirq_end(in_hardirq);
@@ -375,7 +422,7 @@ static inline void invoke_softirq(void)
 		do_softirq_own_stack(pending);
 #endif
 	} else {
-		wakeup_softirqd();
+		wakeup_softirqd(local_softirq_pending());
 	}
 }
 
@@ -429,7 +476,7 @@ inline void raise_softirq_irqoff(unsigned int nr)
 	 * schedule the softirq soon.
 	 */
 	if (!in_interrupt())
-		wakeup_softirqd();
+		wakeup_softirqd(local_softirq_pending());
 }
 
 void raise_softirq(unsigned int nr)
@@ -685,27 +732,6 @@ void __init softirq_init(void)
 	open_softirq(HI_SOFTIRQ, tasklet_hi_action);
 }
 
-static int ksoftirqd_should_run(unsigned int cpu)
-{
-	return local_softirq_pending();
-}
-
-static void run_ksoftirqd(unsigned int cpu)
-{
-	local_irq_disable();
-	if (local_softirq_pending()) {
-		/*
-		 * We can safely run softirq on inline stack, as we are not deep
-		 * in the task stack here.
-		 */
-		__do_softirq(~0);
-		local_irq_enable();
-		cond_resched_rcu_qs();
-		return;
-	}
-	local_irq_enable();
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 /*
  * tasklet_kill_immediate is called to remove a tasklet which can already be
@@ -768,18 +794,97 @@ static int takeover_tasklets(unsigned int cpu)
 #define takeover_tasklets	NULL
 #endif /* CONFIG_HOTPLUG_CPU */
 
-static struct smp_hotplug_thread softirq_threads = {
-	.store			= &ksoftirqd,
-	.thread_should_run	= ksoftirqd_should_run,
-	.thread_fn		= run_ksoftirqd,
-	.thread_comm		= "ksoftirqd/%u",
-};
+static int ksoftirqd_should_run(unsigned int cpu)
+{
+	__u32 pending = local_softirq_pending();
+	unsigned group;
+
+	if (!ksoftirqd)
+		return 0;
+
+	for (group = 0; group < nr_softirq_groups; group++)
+		if (*this_cpu_ptr(ksoftirqd[group]) == current)
+			break;
+
+	if (WARN_ON_ONCE(group == nr_softirq_groups))
+		return 0;
+
+	return pending & group_to_softirqs[group];
+}
+
+static void run_ksoftirqd(unsigned int cpu)
+{
+	unsigned group;
+
+	for (group = 0; group < nr_softirq_groups; group++)
+		if (*this_cpu_ptr(ksoftirqd[group]) == current)
+			break;
+
+	local_irq_disable();
+	if (local_softirq_pending()) {
+		/*
+		 * We can safely run softirq on inline stack, as we are not deep
+		 * in the task stack here.
+		 */
+		__do_softirq(group_to_softirqs[group]);
+		local_irq_enable();
+		cond_resched_rcu_qs();
+		return;
+	}
+	local_irq_enable();
+}
+
+static __init
+int register_ksoftirqd_group(unsigned nr, struct task_struct *__percpu **tsk)
+{
+	struct smp_hotplug_thread *thread;
+	char *thread_comm;
+
+	thread = kzalloc(sizeof(struct smp_hotplug_thread), GFP_KERNEL);
+	if (WARN_ON_ONCE(!thread))
+		return 1;
+
+	thread_comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL);
+	if (WARN_ON_ONCE(!thread_comm))
+		return 1;
+
+	*tsk = alloc_percpu(struct task_struct*);
+	if (WARN_ON(!*tsk))
+		return 1;
+
+	snprintf(thread_comm, TASK_COMM_LEN, "ksoftirqd-g%d/%%u", nr);
+
+	thread->thread_comm		= thread_comm;
+	thread->store			= *tsk;
+	thread->thread_should_run	= ksoftirqd_should_run;
+	thread->thread_fn		= run_ksoftirqd;
+
+	if (WARN_ON_ONCE(smpboot_register_percpu_thread(thread)))
+		return 1;
+
+	return 0;
+}
 
 static __init int spawn_ksoftirqd(void)
 {
+	size_t k_groups_sz = sizeof(struct task_struct *__percpu *);
+	struct task_struct *__percpu **tmp;
+	unsigned group;
+
 	cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL,
 				  takeover_tasklets);
-	BUG_ON(smpboot_register_percpu_thread(&softirq_threads));
+
+	tmp = kmalloc_array(nr_softirq_groups, k_groups_sz, GFP_KERNEL);
+	if (WARN_ON(!tmp))
+		return 1;
+
+	for (group = 0; group < nr_softirq_groups; group++) {
+		if (register_ksoftirqd_group(group, &tmp[group]))
+			return 1;
+	}
+
+	smp_wmb();
+	ksoftirqd = tmp;
 
 	return 0;
 }
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a4d214c7b506..bb403be3987e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -919,7 +919,7 @@ void tcp_wfree(struct sk_buff *skb)
 	 * - chance for incoming ACK (processed by another cpu maybe)
 	 *   to migrate this flow (skb->ooo_okay will be eventually set)
 	 */
-	if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
+	if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) && servicing_softirq(NET_TX_SOFTIRQ))
 		goto out;
 
 	for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
-- 
2.13.6


* [RFC 5/6] softirq: Add time accounting per-softirq type
  2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
                   ` (3 preceding siblings ...)
  2018-01-18 16:12 ` [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread Dmitry Safonov
@ 2018-01-18 16:12 ` Dmitry Safonov
  2018-01-18 16:12 ` [RFC 6/6] softirq/sched: Account si cpu time to ksoftirqd(s) Dmitry Safonov
  5 siblings, 0 replies; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

Warning: not merge-ready in any sense

As discussed, softirqs will be deferred or processed right away
according to how much time this type of softirq has spent on the CPU.
This will improve e.g. handling of net-rx softirqs during a packet
storm and also give a userspace process a fair slice of cpu time to
serve the incoming packets.

A time-based decision should work better than checking for a re-raised
softirq after processing the previous one, because that check might not
trigger even under a softirq storm if softirqs are raised too slowly
(e.g. because of hw).
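
The measurement itself is a per-vector delta taken around each handler
in __do_softirq(); roughly (time_softirq() returns 0 unless
CONFIG_FAIR_SOFTIRQ_SCHEDULE is enabled):

	u64 start = time_softirq(0);		 /* local_clock() snapshot   */

	h->action(h);				 /* run one softirq handler  */
	si_times[vec_nr] += time_softirq(start); /* accumulate per-type time */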

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/hardirq.h |  2 +-
 include/linux/vtime.h   | 10 +++++-----
 init/Kconfig            | 10 ++++++++++
 kernel/sched/cputime.c  | 41 ++++++++++++++++++++++++++++++++++-------
 kernel/sched/sched.h    |  1 +
 kernel/softirq.c        | 16 ++++++++++++++--
 6 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 0fbbcdf0c178..8f42581ef38b 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -51,7 +51,7 @@ extern void irq_enter(void);
 #define __irq_exit()					\
 	do {						\
 		trace_hardirq_exit();			\
-		account_irq_exit_time(current);		\
+		account_irq_exit_time(current, 0);	\
 		preempt_count_sub(HARDIRQ_OFFSET);	\
 	} while (0)
 
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index a26ed10a4eac..ebe140e2a84f 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -97,21 +97,21 @@ static inline void vtime_flush(struct task_struct *tsk) { }
 
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, u64 *si_times);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, u64 *si_times) { }
 #endif
 
 static inline void account_irq_enter_time(struct task_struct *tsk)
 {
 	vtime_account_irq_enter(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, 0);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk, u64 *si_times)
 {
 	vtime_account_irq_exit(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, si_times);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/init/Kconfig b/init/Kconfig
index a9a2e2c86671..9d09aa753299 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -387,6 +387,16 @@ config IRQ_TIME_ACCOUNTING
 
 	  If in doubt, say N here.
 
+config FAIR_SOFTIRQ_SCHEDULE
+	bool "Fair schedule softirqs on process context"
+	depends on IRQ_TIME_ACCOUNTING
+	default n
+	help
+	  Account softirq CPU time per softirq-type. Process pending softirq
+	  on current context only if it'll be fair for the task.
+
+	  If in doubt, say N here.
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	depends on MULTIUSER
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index faacba00a153..4da1df879c8a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -34,35 +34,61 @@ void disable_sched_clock_irqtime(void)
 	sched_clock_irqtime = 0;
 }
 
-static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
+static void __irqtime_account_delta(struct irqtime *irqtime, u64 delta,
 				  enum cpu_usage_stat idx)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
-	u64_stats_update_begin(&irqtime->sync);
 	cpustat[idx] += delta;
 	irqtime->total += delta;
 	irqtime->tick_delta += delta;
+}
+
+
+static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
+				  enum cpu_usage_stat idx)
+{
+	u64_stats_update_begin(&irqtime->sync);
+	__irqtime_account_delta(irqtime, delta, idx);
 	u64_stats_update_end(&irqtime->sync);
 }
 
-static void irqtime_account_softirq(struct irqtime *irqtime, s64 delta)
+static void irqtime_account_softirq(struct irqtime *irqtime, u64 *si_times, s64 delta)
 {
+	unsigned i;
+
+	u64_stats_update_begin(&irqtime->sync);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (!current_is_ksoftirqd())
-		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+	if (!IS_ENABLED(CONFIG_FAIR_SOFTIRQ_SCHEDULE)) {
+		if (!current_is_ksoftirqd())
+			__irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+		goto out;
+	}
+
+	if (!si_times)
+		goto out;
+
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		if (servicing_softirq(i))
+			continue;
+		/* A ksoftirqd's own softirq type stays accounted to the thread itself */
+		__irqtime_account_delta(irqtime, si_times[i], CPUTIME_SOFTIRQ);
+		irqtime->total_si[i] += si_times[i];
+	}
+out:
+	u64_stats_update_end(&irqtime->sync);
 }
 
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, u64 *si_times)
 {
 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
 	s64 delta;
@@ -76,9 +102,10 @@ void irqtime_account_irq(struct task_struct *curr)
 	irqtime->irq_start_time += delta;
 
 	if (hardirq_count()) {
+		WARN_ON_ONCE(si_times);
 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
 	} else if (in_serving_softirq()) {
-		irqtime_account_softirq(irqtime, delta);
+		irqtime_account_softirq(irqtime, si_times, delta);
 	}
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a212de..14e154c86dc5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2055,6 +2055,7 @@ struct irqtime {
 	u64			total;
 	u64			tick_delta;
 	u64			irq_start_time;
+	u64			total_si[NR_SOFTIRQS];
 	struct u64_stats_sync	sync;
 };
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index fdde3788afba..516e31d3d5b4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -22,6 +22,7 @@
 #include <linux/kthread.h>
 #include <linux/rcupdate.h>
 #include <linux/ftrace.h>
+#include <linux/sched/clock.h>
 #include <linux/smp.h>
 #include <linux/smpboot.h>
 #include <linux/tick.h>
@@ -287,6 +288,14 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+static inline u64 time_softirq(u64 start)
+{
+#ifdef CONFIG_FAIR_SOFTIRQ_SCHEDULE
+	return local_clock() - start;
+#endif
+	return 0;
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 {
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
@@ -296,6 +305,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 	bool in_hardirq;
 	__u32 pending;
 	int softirq_bit;
+	u64 si_times[NR_SOFTIRQS] = {0};
 
 	/*
 	 * Mask out PF_MEMALLOC s current task context is borrowed for the
@@ -322,6 +332,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 	while ((softirq_bit = ffs(pending))) {
 		unsigned int vec_nr;
 		int prev_count;
+		u64 start_time = time_softirq(0);
 
 		h += softirq_bit - 1;
 
@@ -341,6 +352,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 		}
 		h++;
 		pending >>= softirq_bit;
+		si_times[vec_nr] += time_softirq(start_time);
 	}
 
 	rcu_bh_qs();
@@ -357,7 +369,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(__u32 mask)
 	}
 
 	lockdep_softirq_end(in_hardirq);
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, si_times);
 	__local_bh_enable(SOFTIRQ_OFFSET);
 	WARN_ON_ONCE(in_interrupt());
 	current_restore_flags(old_flags, PF_MEMALLOC);
@@ -449,7 +461,7 @@ void irq_exit(void)
 #else
 	lockdep_assert_irqs_disabled();
 #endif
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, 0);
 	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
-- 
2.13.6


* [RFC 6/6] softirq/sched: Account si cpu time to ksoftirqd(s)
  2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
                   ` (4 preceding siblings ...)
  2018-01-18 16:12 ` [RFC 5/6] softirq: Add time accounting per-softirq type Dmitry Safonov
@ 2018-01-18 16:12 ` Dmitry Safonov
  5 siblings, 0 replies; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Andrew Morton, David Miller, Eric Dumazet,
	Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar, Levin,
	Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Mike Galbraith,
	Paolo Abeni, Paul E. McKenney, Peter Zijlstra, Radu Rendec,
	Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

Warning: not merge-ready in any sense

Under CONFIG_FAIR_SOFTIRQ_SCHEDULE each sched tick accounts the cpu
time spent processing softirqs to the ksoftirqd of the softirq's group:
ksoftirqd->se.sum_exec_runtime is updated and ksoftirqd->se.vruntime
is recalculated accordingly.

Use CFS's vruntime to decide whether a softirq needs to be served or
deferred. This can be tuned via the ksoftirqd's nice value.
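
The per-tick accounting step then amounts to (names as in
update_ksoftirqd() below; calc_delta_fair() is the existing CFS weight
scaling):

	tsk->se.sum_exec_runtime += delta;			/* raw softirq time */
	tsk->se.vruntime += calc_delta_fair(delta, &tsk->se);	/* weighted */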

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/interrupt.h |  1 +
 kernel/sched/fair.c       | 38 ++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h      | 19 +++++++++++++++++++
 kernel/softirq.c          | 45 +++++++++++++++++++++++++++++++++++++--------
 4 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 17e1a04445fa..a0b5c24c088a 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -512,6 +512,7 @@ extern struct task_struct *__percpu **ksoftirqd;
 extern unsigned nr_softirq_groups;
 
 extern bool servicing_softirq(unsigned nr);
+extern unsigned group_softirqs(unsigned nr);
 static inline bool current_is_ksoftirqd(void)
 {
 	unsigned i;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2fe3aa853e4d..d0105739551f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -813,6 +813,42 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
 }
 #endif /* CONFIG_SMP */
 
+static void update_ksoftirqd(struct cfs_rq *cfs_rq)
+{
+#ifdef CONFIG_FAIR_SOFTIRQ_SCHEDULE
+	int rq_cpu = cpu_of(rq_of(cfs_rq));
+	u64 si_times[NR_SOFTIRQS], delta[NR_SOFTIRQS];
+	unsigned i;
+
+	if (unlikely(!ksoftirqd))
+		return;
+
+	softirq_time_read(rq_cpu, si_times);
+
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		delta[i] = si_times[i] - cfs_rq->prev_si_time[i];
+		cfs_rq->prev_si_time[i] = si_times[i];
+		if (unlikely((s64)delta[i] < 0))
+			delta[i] = 0;
+	}
+
+	for (i = 0; i < nr_softirq_groups; i++) {
+		unsigned j, softirq = 0, group_mask = group_softirqs(i);
+		struct task_struct *tsk = *this_cpu_ptr(ksoftirqd[i]);
+		u64 sum_delta = 0;
+
+		while ((j = ffs(group_mask))) {
+			softirq += j - 1;
+			group_mask >>= j;
+			sum_delta += delta[softirq];
+		}
+
+		tsk->se.sum_exec_runtime += sum_delta;
+		tsk->se.vruntime += calc_delta_fair(sum_delta, &tsk->se);
+	}
+#endif
+}
+
 /*
  * Update the current task's runtime statistics.
  */
@@ -822,6 +858,8 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	u64 now = rq_clock_task(rq_of(cfs_rq));
 	u64 delta_exec;
 
+	update_ksoftirqd(cfs_rq);
+
 	if (unlikely(!curr))
 		return;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14e154c86dc5..e95d8d4f9146 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -487,6 +487,10 @@ struct cfs_rq {
 	struct list_head leaf_cfs_rq_list;
 	struct task_group *tg;	/* group that "owns" this runqueue */
 
+#ifdef CONFIG_FAIR_SOFTIRQ_SCHEDULE
+	u64 prev_si_time[NR_SOFTIRQS];
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 	int runtime_enabled;
 	u64 runtime_expires;
@@ -2081,6 +2085,21 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
 
+static inline void softirq_time_read(int cpu, u64 si_times[NR_SOFTIRQS])
+{
+#ifdef CONFIG_FAIR_SOFTIRQ_SCHEDULE
+	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
+	unsigned int seq, i;
+
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		do {
+			seq = __u64_stats_fetch_begin(&irqtime->sync);
+			si_times[i] = irqtime->total_si[i];
+		} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
+	}
+#endif
+}
+
 #ifdef CONFIG_CPU_FREQ
 DECLARE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 516e31d3d5b4..a123bafa11c2 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -82,6 +82,11 @@ bool servicing_softirq(unsigned nr)
 	return false;
 }
 
+unsigned group_softirqs(unsigned nr)
+{
+	return group_to_softirqs[nr];
+}
+
 /*
  * we cannot loop indefinitely here to avoid userspace starvation,
  * but we also don't want to introduce a worst case 1/HZ latency
@@ -112,15 +117,10 @@ static void wakeup_softirqd(u32 softirq_mask)
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
  * right now. Let ksoftirqd handle this at its own rate, to get fairness.
  */
-static bool ksoftirqd_running(void)
+static bool ksoftirqd_running(__u32 pending)
 {
-	/* We rely that there are pending softirqs */
-	__u32 pending = local_softirq_pending();
 	unsigned i;
 
-	if (!ksoftirqd)
-		return false;
-
 	for (i = 0; i < nr_softirq_groups && pending; i++) {
 		/* Interrupts are disabled: no need to stop preemption */
 		struct task_struct *tsk = *this_cpu_ptr(ksoftirqd[i]);
@@ -137,6 +137,33 @@ static bool ksoftirqd_running(void)
 	return !pending;
 }
 
+static __u32 softirqs_to_serve(__u32 pending)
+{
+	unsigned i;
+	__u32 unserve = pending;
+
+	if (!ksoftirqd || !current || is_idle_task(current))
+		return pending;
+
+	if (!IS_ENABLED(CONFIG_FAIR_SOFTIRQ_SCHEDULE))
+		return ksoftirqd_running(pending) ? 0 : pending;
+
+	for (i = 0; i < nr_softirq_groups && unserve; i++) {
+		/* Interrupts are disabled: no need to stop preemption */
+		struct task_struct *tsk = *this_cpu_ptr(ksoftirqd[i]);
+
+		if (tsk && (s64)(current->se.vruntime - tsk->se.vruntime) < 0) {
+			if (tsk->state != TASK_RUNNING)
+				wake_up_process(tsk);
+			continue;
+		}
+
+		unserve &= ~group_to_softirqs[i];
+	}
+
+	return pending & ~unserve;
+}
+
 /*
  * preempt_count and SOFTIRQ_OFFSET usage:
  * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
@@ -385,7 +412,8 @@ asmlinkage __visible void do_softirq(void)
 
 	local_irq_save(flags);
 
-	if (!ksoftirqd_running())
+	pending = softirqs_to_serve(pending);
+	if (pending)
 		do_softirq_own_stack(pending);
 
 	local_irq_restore(flags);
@@ -414,7 +442,8 @@ static inline void invoke_softirq(void)
 {
 	__u32 pending = local_softirq_pending();
 
-	if (!pending || ksoftirqd_running())
+	pending = softirqs_to_serve(pending);
+	if (!pending)
 		return;
 
 	if (!force_irqthreads) {
-- 
2.13.6


* Re: [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread
  2018-01-18 16:12 ` [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread Dmitry Safonov
@ 2018-01-18 17:00   ` Mike Galbraith
  2018-01-18 17:53     ` Dmitry Safonov
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2018-01-18 17:00 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel
  Cc: Andrew Morton, David Miller, Eric Dumazet, Frederic Weisbecker,
	Hannes Frederic Sowa, Ingo Molnar, Levin, Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Paolo Abeni,
	Paul E. McKenney, Peter Zijlstra, Radu Rendec, Rik van Riel,
	Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

On Thu, 2018-01-18 at 16:12 +0000, Dmitry Safonov wrote:
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 2ea09896bd6e..17e1a04445fa 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -508,11 +508,21 @@ extern void __raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  
> -DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
> +extern struct task_struct *__percpu **ksoftirqd;
> +extern unsigned nr_softirq_groups;
>  
> -static inline struct task_struct *this_cpu_ksoftirqd(void)
> +extern bool servicing_softirq(unsigned nr);
> +static inline bool current_is_ksoftirqd(void)
>  {
> -	return this_cpu_read(ksoftirqd);
> +	unsigned i;
> +
> +	if (!ksoftirqd)
> +		return false;
> +
> +	for (i = 0; i < nr_softirq_groups; i++)
> +		if (*this_cpu_ptr(ksoftirqd[i]) ==
> current)
> +			return true;
> +	return false;
>  }

I haven't read all this, but in a quick drive-by this poked me in the
eye.  For RT tree fully threaded softirqs, I stole a ->flags bit to
identify threads ala PF_KTHREAD (PF_KSOFTIRQD).  In previous versions,
I added a bit field to do the same; either is quicker than rummaging.
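
Something like the sketch below, i.e. a single flag test instead of the
per-group loop (PF_KSOFTIRQD is the RT-tree flag bit, not defined in
mainline):

static inline bool current_is_ksoftirqd(void)
{
	return current->flags & PF_KSOFTIRQD;
}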

	-Mike


* Re: [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread
  2018-01-18 17:00   ` Mike Galbraith
@ 2018-01-18 17:53     ` Dmitry Safonov
  2018-01-18 18:28       ` Mike Galbraith
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Safonov @ 2018-01-18 17:53 UTC (permalink / raw)
  To: Mike Galbraith, linux-kernel
  Cc: Andrew Morton, David Miller, Eric Dumazet, Frederic Weisbecker,
	Hannes Frederic Sowa, Ingo Molnar, Levin, Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Paolo Abeni,
	Paul E. McKenney, Peter Zijlstra, Radu Rendec, Rik van Riel,
	Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

On Thu, 2018-01-18 at 18:00 +0100, Mike Galbraith wrote:
> On Thu, 2018-01-18 at 16:12 +0000, Dmitry Safonov wrote:
> > 
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index 2ea09896bd6e..17e1a04445fa 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -508,11 +508,21 @@ extern void __raise_softirq_irqoff(unsigned
> > int nr);
> >  extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >  
> > -DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
> > +extern struct task_struct *__percpu **ksoftirqd;
> > +extern unsigned nr_softirq_groups;
> >  
> > -static inline struct task_struct *this_cpu_ksoftirqd(void)
> > +extern bool servicing_softirq(unsigned nr);
> > +static inline bool current_is_ksoftirqd(void)
> >  {
> > -	return this_cpu_read(ksoftirqd);
> > +	unsigned i;
> > +
> > +	if (!ksoftirqd)
> > +		return false;
> > +
> > +	for (i = 0; i < nr_softirq_groups; i++)
> > +		if (*this_cpu_ptr(ksoftirqd[i]) ==
> > current)
> > +			return true;
> > +	return false;
> >  }
> 
> I haven't read all this, but in a quick drive-by this poked me in the
> eye.  For RT tree fully threaded softirqs, I stole a ->flags bit to
> identify threads ala PF_KTHREAD (PF_KSOFTIRQD).  In previous
> versions,
> I added a bit field to do the same, either is quicker than rummaging.

Yeah, thank you. It makes perfect sense to use a flag to identify a
ksoftirqd thread. How do you distinguish one ksoftirqd thread from
another in RT? I mean, how do you find which softirq nr the thread
is servicing?

-- 
Thanks,
             Dmitry


* Re: [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread
  2018-01-18 17:53     ` Dmitry Safonov
@ 2018-01-18 18:28       ` Mike Galbraith
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Galbraith @ 2018-01-18 18:28 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel
  Cc: Andrew Morton, David Miller, Eric Dumazet, Frederic Weisbecker,
	Hannes Frederic Sowa, Ingo Molnar, Levin, Alexander (Sasha Levin),
	Linus Torvalds, Mauro Carvalho Chehab, Paolo Abeni,
	Paul E. McKenney, Peter Zijlstra, Radu Rendec, Rik van Riel,
	Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

On Thu, 2018-01-18 at 17:53 +0000, Dmitry Safonov wrote:
> How do you identify in RT one ksoftirqd thread from
> another? I mean, to find which softirq nr the thread is servicing?

static void do_raise_softirq_irqoff(unsigned int nr)
{
	struct task_struct *tsk = __this_cpu_ksoftirqd(nr);
	unsigned int mask = 1UL << nr;

	trace_softirq_raise(nr);
	or_softirq_pending(mask);

	/*
	 * If we are not in a hard interrupt and inside a bh disabled
	 * region, we simply raise the flag on current. local_bh_enable()
	 * will make sure that the softirq is executed. Otherwise we
	 * delegate it to the proper softirqd thread for this softirq.
	 */
	if (!in_irq() && current->softirq_nestcnt) {
		if (!(current->flags & PF_KSOFTIRQD) || current == tsk)
			current->softirqs_raised |= mask;
		else if (tsk) {
			tsk->softirqs_raised |= mask;
			wakeup_softirqd(nr);
		}
	} else if (tsk)
		tsk->softirqs_raised |= mask;
}


Thread overview: 10+ messages
2018-01-18 16:12 [RFC 0/6] Multi-thread per-cpu ksoftirqd Dmitry Safonov
2018-01-18 16:12 ` [RFC 1/6] softirq: Add softirq_groups boot parameter Dmitry Safonov
2018-01-18 16:12 ` [RFC 2/6] softirq: Introduce mask for __do_softirq() Dmitry Safonov
2018-01-18 16:12 ` [RFC 3/6] softirq: Add reverse group-to-softirq map Dmitry Safonov
2018-01-18 16:12 ` [RFC 4/6] softirq: Run per-group per-cpu ksoftirqd thread Dmitry Safonov
2018-01-18 17:00   ` Mike Galbraith
2018-01-18 17:53     ` Dmitry Safonov
2018-01-18 18:28       ` Mike Galbraith
2018-01-18 16:12 ` [RFC 5/6] softirq: Add time accounting per-softirq type Dmitry Safonov
2018-01-18 16:12 ` [RFC 6/6] softirq/sched: Account si cpu time to ksoftirqd(s) Dmitry Safonov
