From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: peterz@infradead.org, davem@davemloft.net, daniel@iogearbox.net,
	edumazet@google.com, tom@quantonium.net, ktkhai@virtuozzo.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC] net;sched: Try to find idle cpu for RPS to handle packets
Date: Wed, 19 Sep 2018 15:28:54 +0300	[thread overview]
Message-ID: <153736009982.24033.13696245431713246950.stgit@localhost.localdomain> (raw)

Many workloads operate in a polling mode: the application
checks for incoming packets from time to time, but it also
has work to do when there are no packets. This RFC
develops the idea of queueing RPS packets on an idle
CPU in the L3 domain of the consumer, so that backlog
processing of the packets and the application can execute
in parallel.

We need this when the network card does not have
enough RX queues to cover all online CPUs (which seems
to be the case for most cards), get_rps_cpu() actually chooses
a remote cpu, and an SMP interrupt is sent. Here we may try
our best to find an idle CPU near the consumer's CPU.
Note that when the consumer works in poll mode and does
not wait for incoming packets, its CPU will not be idle,
while the CPU of a sleeping consumer may be idle. So
non-polling consumers will still have their skbs
handled on their own CPU.
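
[Not part of the patch, just an illustration of what "idle CPU
 near the consumer's CPU" means here: a CPU that shares the last
 level cache with the consumer and is currently idle. A minimal
 sketch using existing kernel helpers (the helper name is made up;
 the patch itself reuses select_idle_sibling() instead):

	#include <linux/cpumask.h>
	#include <linux/sched.h>
	#include <linux/sched/topology.h>

	/* Hypothetical helper: pick an idle CPU sharing the LLC with
	 * @target, falling back to @target when none is idle.
	 */
	static int pick_idle_llc_cpu(int target)
	{
		int cpu;

		if (available_idle_cpu(target))
			return target;

		for_each_online_cpu(cpu) {
			if (cpus_share_cache(cpu, target) &&
			    available_idle_cpu(cpu))
				return cpu;
		}
		return target;
	}
]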

When the network card has many queues, the device
interrupts arrive on the consumer's CPU, and this patch
won't try to find an idle cpu for them.

I've tried a simple netperf test for this:
netserver -p 1234
netperf -L 127.0.0.1 -p 1234 -l 100

Before (columns are netperf's recv socket size, send socket size
and message size in bytes, elapsed time in seconds, and throughput
in 10^6 bits/sec):
 87380  16384  16384    100.00   60323.56
 87380  16384  16384    100.00   60388.46
 87380  16384  16384    100.00   60217.68
 87380  16384  16384    100.00   57995.41
 87380  16384  16384    100.00   60659.00

After:
 87380  16384  16384    100.00   64569.09
 87380  16384  16384    100.00   64569.25
 87380  16384  16384    100.00   64691.63
 87380  16384  16384    100.00   64930.14
 87380  16384  16384    100.00   62670.15

The difference between the best runs is +7%,
and between the worst runs it is +8%.

What do you think about moving forward in this direction?

[This also requires a pre-patch, which exports
 select_idle_sibling() and teaches it to handle
 a NULL task argument, but since it's not very
 interesting to look at, I've skipped sending it].
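
[Purely as an assumption about its shape, not the real pre-patch:
 it would presumably make select_idle_sibling() visible outside
 kernel/sched/fair.c and let it tolerate a NULL task by skipping
 the task-affinity heuristics, roughly:

	/* include/linux/sched.h (assumed placement):
	 * @p may be NULL when there is no task context, as in the
	 * RPS caller in net/core/dev.c below.
	 */
	extern int select_idle_sibling(struct task_struct *p,
				       int prev, int target);

	/* kernel/sched/fair.c: the function loses "static" and is
	 * exported next to its definition.
	 */
	EXPORT_SYMBOL_GPL(select_idle_sibling);
]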

Kirill
---
 net/core/dev.c |   34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 559a91271f82..9a867ff34622 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3738,13 +3738,12 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		       struct rps_dev_flow **rflowp)
 {
-	const struct rps_sock_flow_table *sock_flow_table;
+	struct rps_sock_flow_table *sock_flow_table;
 	struct netdev_rx_queue *rxqueue = dev->_rx;
 	struct rps_dev_flow_table *flow_table;
 	struct rps_map *map;
+	u32 tcpu, hash, val;
 	int cpu = -1;
-	u32 tcpu;
-	u32 hash;
 
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
@@ -3774,6 +3773,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	sock_flow_table = rcu_dereference(rps_sock_flow_table);
 	if (flow_table && sock_flow_table) {
 		struct rps_dev_flow *rflow;
+		bool want_new_cpu = false;
+		unsigned long flags;
+		unsigned int qhead;
 		u32 next_cpu;
 		u32 ident;
 
@@ -3801,12 +3803,26 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		 *     This guarantees that all previous packets for the flow
 		 *     have been dequeued, thus preserving in order delivery.
 		 */
-		if (unlikely(tcpu != next_cpu) &&
-		    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
-		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
-		      rflow->last_qtail)) >= 0)) {
-			tcpu = next_cpu;
-			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
+		if (tcpu != next_cpu) {
+			qhead = per_cpu(softnet_data, tcpu).input_queue_head;
+			if (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
+			    (int)(qhead - rflow->last_qtail) >= 0)
+				want_new_cpu = true;
+		} else if (tcpu < nr_cpu_ids && cpu_online(tcpu) &&
+			   tcpu != smp_processor_id() && !available_idle_cpu(tcpu)) {
+			want_new_cpu = true;
+		}
+
+		if (want_new_cpu) {
+			local_irq_save(flags);
+			next_cpu = select_idle_sibling(NULL, next_cpu, next_cpu);
+			local_irq_restore(flags);
+			if (tcpu != next_cpu) {
+				tcpu = next_cpu;
+				rflow = set_rps_cpu(dev, skb, rflow, tcpu);
+				val = (hash & ~rps_cpu_mask) | tcpu;
+				sock_flow_table->ents[hash & sock_flow_table->mask] = val;
+			}
 		}
 
 		if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {

