* [PATCH net-next v7 0/3] implement kthread based napi poll
@ 2021-01-20  3:34 Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 1/3] net: extract napi poll functionality to __napi_poll() Wei Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Wei Wang @ 2021-01-20  3:34 UTC (permalink / raw)
  To: David Miller, netdev, Jakub Kicinski
  Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau,
	Alexander Duyck

The idea of moving napi poll processing out of softirq context into a
kernel-thread-based context is not new.
Paolo Abeni and Hannes Frederic Sowa proposed patches to move napi poll
to a kthread back in 2016, and just a few weeks ago Felix Fietkau
proposed patches along similar lines that use a workqueue to process
napi poll.

The main reason we'd like to push forward with this idea is that the
scheduler has poor visibility into CPU cycles spent in softirq context,
and cannot make optimal scheduling decisions for user threads.
For example, in one application benchmark with high network load, the
CPUs handling network softirqs run at ~80% utilization, yet user threads
are still scheduled onto those CPUs even though more idle CPUs are
available in the system, and we see very high tail latencies. In this
case, we have to explicitly pin user threads away from the CPUs handling
network softirqs to get good performance.
With napi poll moved to a kthread, the scheduler is in charge of scheduling
both the kthreads handling the network load and the user threads, and can
make better decisions. In the previous benchmark, if we do this and pin
the kthreads processing napi poll to specific CPUs, the scheduler
automatically schedules user threads away from those CPUs.

The reason we prefer one kthread per napi instance, instead of one
workqueue entity per host, is that a kthread is more configurable than a
workqueue: we can leverage existing thread tuning tools such as taskset
and chrt to adjust the scheduling class, CPU affinity, and so on. Another
reason is that, if we eventually want to provide a busy-poll feature using
kernel threads for napi poll, a kthread seems more suitable than a
workqueue. Furthermore, on large platforms with two NICs attached to two
sockets, kthreads are more flexible to pin to different sets of CPUs.
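
As an illustration of the kind of tuning this enables, below is a minimal
userspace sketch of what taskset and chrt do for a single napi kthread. It
is only an example: the pid of the napi/<dev>-<napi_id> kthread would be
looked up via ps or /proc, and the CPU mask and scheduling priority chosen
here are arbitrary.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        pid_t pid;
        cpu_set_t set;
        struct sched_param sp = { .sched_priority = 1 };

        if (argc < 2) {
                fprintf(stderr, "usage: %s <napi-kthread-pid>\n", argv[0]);
                return 1;
        }
        pid = atoi(argv[1]);

        /* Pin the kthread to CPUs 0 and 1, as taskset would. */
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        CPU_SET(1, &set);
        if (sched_setaffinity(pid, sizeof(set), &set))
                perror("sched_setaffinity");

        /* Move it to SCHED_FIFO priority 1, as chrt would. */
        if (sched_setscheduler(pid, SCHED_FIFO, &sp))
                perror("sched_setscheduler");

        return 0;
}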

In this patch series, I revived Paolo and Hannes's 2016 patch and made
modifications to it. On top of that are changes proposed by Felix, Jakub,
Paolo and myself, with suggestions from Eric Dumazet.

In terms of performance, I ran tcp_rr tests with 1000 flows and various
request/response sizes, with RFS/RPS disabled, and compared softirq vs
kthread vs workqueue (the patchset proposed by Felix Fietkau).
The host has 56 hyperthreads and a 100Gbps NIC with 8 rx queues, on a
single NUMA node. All threads are unpinned.

          req/resp     QPS   50%tile   90%tile   99%tile   99.9%tile
softirq      1B/1B   2.75M     337us     376us    1.04ms      3.69ms
kthread      1B/1B   2.67M     371us     408us     455us       550us
workq        1B/1B   2.56M     384us     435us     673us       822us

softirq    5KB/5KB   1.46M     678us     750us     969us      2.78ms
kthread    5KB/5KB   1.44M     695us     789us     891us      1.06ms
workq      5KB/5KB   1.34M     720us     905us    1.06ms      1.57ms

softirq    1MB/1MB   11.0K      79ms     166ms     306ms       630ms
kthread    1MB/1MB   11.0K      75ms     177ms     303ms       596ms
workq      1MB/1MB   11.0K      79ms     180ms     303ms       587ms

When running the workqueue implementation, I found that the number of
threads used is usually about twice that of the kthread implementation.
This probably introduces higher scheduling cost, which results in higher
tail latencies in most cases.

I also ran an application benchmark that performs fixed-QPS remote SSD
read/write operations with various sizes, again with RFS/RPS disabled.
The results are as follows:
          op_size      QPS   50%tile   95%tile   99%tile   99.9%tile
softirq        4K   572.6K     385us     1.5ms    3.16ms      6.41ms
kthread        4K   572.6K     390us     803us    2.21ms      6.83ms
workq          4K   572.6K     384us     763us    3.12ms      6.87ms

softirq       64K   157.9K     736us    1.17ms    3.40ms     13.75ms
kthread       64K   157.9K     745us    1.23ms    2.76ms      9.87ms
workq         64K   157.9K     746us    1.23ms    2.76ms      9.96ms

softirq        1M   10.98K    2.03ms    3.10ms     3.7ms     11.56ms
kthread        1M   10.98K    2.13ms    3.21ms    4.02ms     13.3ms
workq          1M   10.98K    2.13ms    3.20ms    3.99ms     14.12ms

In this set of tests, the latency is dominated by the SSD operation. Also,
the user threads are much busier than in the tcp_rr tests. We have to pin
the kthreads/workqueue threads to a few CPUs so as not to disturb the user
threads and to provide some isolation.

Changes since v6:
Added a memory barrier in napi_set_threaded().
Changed /sys/class/net/<dev>/threaded to a ternary value.
Changed dev->threaded to a single bit instead of a bool.

Changes since v5:
Removed ASSERT_RTNL() from napi_set_threaded() and removed rtnl_lock()
operation from napi_enable().

Changes since v4:
Recorded the threaded setting in dev and restored it in napi_enable().

Changes since v3:
Merged and rearranged patches in a logical order for easier review. 
Changed sysfs control to be per device.

Changes since v2:
Corrected typo in patch 1, and updated the cover letter with more
detailed and updated test results.

Changes since v1:
Replaced kthread_create() with kthread_run() in patch 5 as suggested by
Felix Fietkau.

Changes since RFC:
Renamed the kthreads to be napi/<dev>-<napi_id> in patch 5 as suggested
by Hannes Frederic Sowa.

Felix Fietkau (1):
  net: extract napi poll functionality to __napi_poll()

Wei Wang (2):
  net: implement threaded-able napi poll loop support
  net: add sysfs attribute to control napi threaded mode

 include/linux/netdevice.h |  21 ++--
 net/core/dev.c            | 200 ++++++++++++++++++++++++++++++++++++--
 net/core/net-sysfs.c      |  68 +++++++++++++
 3 files changed, 265 insertions(+), 24 deletions(-)

-- 
2.30.0.284.gd98b1dd5eaa7-goog


* [PATCH net-next v7 1/3] net: extract napi poll functionality to __napi_poll()
  2021-01-20  3:34 [PATCH net-next v7 0/3] implement kthread based napi poll Wei Wang
@ 2021-01-20  3:34 ` Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 2/3] net: implement threaded-able napi poll loop support Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
  2 siblings, 0 replies; 10+ messages in thread
From: Wei Wang @ 2021-01-20  3:34 UTC (permalink / raw)
  To: David Miller, netdev, Jakub Kicinski
  Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau,
	Alexander Duyck

From: Felix Fietkau <nbd@nbd.name> 

This commit introduces a new function, __napi_poll(), which contains the
main logic of the existing napi_poll() function and will be called by
other functions in later commits.
The idea and implementation are by Felix Fietkau <nbd@nbd.name> and were
proposed as part of the patch to move napi work to workqueue context.
By itself, this commit is purely a code restructuring.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
---
 net/core/dev.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0332f2e8f7da..7d23bff03864 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6768,15 +6768,10 @@ void __netif_napi_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(__netif_napi_del);
 
-static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+static int __napi_poll(struct napi_struct *n, bool *repoll)
 {
-	void *have;
 	int work, weight;
 
-	list_del_init(&n->poll_list);
-
-	have = netpoll_poll_lock(n);
-
 	weight = n->weight;
 
 	/* This NAPI_STATE_SCHED test is for avoiding a race
@@ -6796,7 +6791,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 			    n->poll, work, weight);
 
 	if (likely(work < weight))
-		goto out_unlock;
+		return work;
 
 	/* Drivers must not modify the NAPI state if they
 	 * consume the entire weight.  In such cases this code
@@ -6805,7 +6800,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	 */
 	if (unlikely(napi_disable_pending(n))) {
 		napi_complete(n);
-		goto out_unlock;
+		return work;
 	}
 
 	/* The NAPI context has more processing work, but busy-polling
@@ -6818,7 +6813,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 			 */
 			napi_schedule(n);
 		}
-		goto out_unlock;
+		return work;
 	}
 
 	if (n->gro_bitmask) {
@@ -6836,9 +6831,29 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	if (unlikely(!list_empty(&n->poll_list))) {
 		pr_warn_once("%s: Budget exhausted after napi rescheduled\n",
 			     n->dev ? n->dev->name : "backlog");
-		goto out_unlock;
+		return work;
 	}
 
+	*repoll = true;
+
+	return work;
+}
+
+static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+{
+	bool do_repoll = false;
+	void *have;
+	int work;
+
+	list_del_init(&n->poll_list);
+
+	have = netpoll_poll_lock(n);
+
+	work = __napi_poll(n, &do_repoll);
+
+	if (!do_repoll)
+		goto out_unlock;
+
 	list_add_tail(&n->poll_list, repoll);
 
 out_unlock:
-- 
2.30.0.284.gd98b1dd5eaa7-goog


* [PATCH net-next v7 2/3] net: implement threaded-able napi poll loop support
  2021-01-20  3:34 [PATCH net-next v7 0/3] implement kthread based napi poll Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 1/3] net: extract napi poll functionality to __napi_poll() Wei Wang
@ 2021-01-20  3:34 ` Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
  2 siblings, 0 replies; 10+ messages in thread
From: Wei Wang @ 2021-01-20  3:34 UTC (permalink / raw)
  To: David Miller, netdev, Jakub Kicinski
  Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau,
	Alexander Duyck

This patch allows running each napi poll loop inside its own
kernel thread.
The threaded mode can be enabled through the napi_set_threaded()
API and does not require a device down/up cycle. The kthread gets
created on demand when napi_set_threaded() is called, and is
eventually shut down in napi_disable().

Once threaded mode is enabled and the kthread is started,
napi_schedule() wakes up that thread instead of scheduling
the softirq.

The threaded poll loop behaves much like net_rx_action(), but it
does not have to manipulate local irqs and uses an explicit
scheduling point based on napi->weight.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
---
 include/linux/netdevice.h |  19 ++----
 net/core/dev.c            | 137 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 02dcef4d66e2..8cb8d43ea5fa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct task_struct	*thread;
 };
 
 enum {
@@ -358,6 +359,7 @@ enum {
 	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
+	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 };
 
 enum {
@@ -369,6 +371,7 @@ enum {
 	NAPIF_STATE_NO_BUSY_POLL	= BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
+	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -503,20 +506,7 @@ static inline bool napi_complete(struct napi_struct *n)
  */
 void napi_disable(struct napi_struct *n);
 
-/**
- *	napi_enable - enable NAPI scheduling
- *	@n: NAPI context
- *
- * Resume NAPI from being scheduled on this context.
- * Must be paired with napi_disable.
- */
-static inline void napi_enable(struct napi_struct *n)
-{
-	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	smp_mb__before_atomic();
-	clear_bit(NAPI_STATE_SCHED, &n->state);
-	clear_bit(NAPI_STATE_NPSVC, &n->state);
-}
+void napi_enable(struct napi_struct *n);
 
 /**
  *	napi_synchronize - wait until NAPI is not running
@@ -2143,6 +2133,7 @@ struct net_device {
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+	unsigned		threaded:1;
 
 	struct list_head	net_notifier_list;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 7d23bff03864..7ffa91475856 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -91,6 +91,7 @@
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 #include <linux/skbuff.h>
+#include <linux/kthread.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <net/net_namespace.h>
@@ -1493,6 +1494,37 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+static int napi_threaded_poll(void *data);
+
+static int napi_kthread_create(struct napi_struct *n)
+{
+	int err = 0;
+
+	/* Create and wake up the kthread once to put it in
+	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
+	 * warning and work with loadavg.
+	 */
+	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+				n->dev->name, n->napi_id);
+	if (IS_ERR(n->thread)) {
+		err = PTR_ERR(n->thread);
+		pr_err("kthread_run failed with err %d\n", err);
+		n->thread = NULL;
+	}
+
+	return err;
+}
+
+static void napi_kthread_stop(struct napi_struct *n)
+{
+	if (!n->thread)
+		return;
+
+	kthread_stop(n->thread);
+	clear_bit(NAPI_STATE_THREADED, &n->state);
+	n->thread = NULL;
+}
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -4252,6 +4284,21 @@ int gro_normal_batch __read_mostly = 8;
 static inline void ____napi_schedule(struct softnet_data *sd,
 				     struct napi_struct *napi)
 {
+	struct task_struct *thread;
+
+	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
+		/* Paired with smp_mb__before_atomic() in
+		 * napi_set_threaded(). Use READ_ONCE() to guarantee
+		 * a complete read on napi->thread. Only call
+		 * wake_up_process() when it's not NULL.
+		 */
+		thread = READ_ONCE(napi->thread);
+		if (thread) {
+			wake_up_process(thread);
+			return;
+		}
+	}
+
 	list_add_tail(&napi->poll_list, &sd->poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
@@ -6693,6 +6740,33 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+static int napi_set_threaded(struct napi_struct *n, bool threaded)
+{
+	int err = 0;
+
+	if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
+		return 0;
+
+	if (!threaded) {
+		clear_bit(NAPI_STATE_THREADED, &n->state);
+		return 0;
+	}
+
+	if (!n->thread) {
+		err = napi_kthread_create(n);
+		if (err)
+			return err;
+	}
+
+	/* Make sure kthread is created before THREADED bit
+	 * is set.
+	 */
+	smp_mb__before_atomic();
+	set_bit(NAPI_STATE_THREADED, &n->state);
+
+	return 0;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6734,12 +6808,30 @@ void napi_disable(struct napi_struct *n)
 		msleep(1);
 
 	hrtimer_cancel(&n->timer);
+	napi_kthread_stop(n);
 
 	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
 
+/**
+ *	napi_enable - enable NAPI scheduling
+ *	@n: NAPI context
+ *
+ * Resume NAPI from being scheduled on this context.
+ * Must be paired with napi_disable.
+ */
+void napi_enable(struct napi_struct *n)
+{
+	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
+	WARN_ON(napi_set_threaded(n, n->dev->threaded));
+	smp_mb__before_atomic();
+	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+}
+EXPORT_SYMBOL(napi_enable);
+
 static void flush_gro_hash(struct napi_struct *napi)
 {
 	int i;
@@ -6862,6 +6954,51 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	return work;
 }
 
+static int napi_thread_wait(struct napi_struct *napi)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
+		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+			WARN_ON(!list_empty(&napi->poll_list));
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
+
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return -1;
+}
+
+static int napi_threaded_poll(void *data)
+{
+	struct napi_struct *napi = data;
+	void *have;
+
+	while (!napi_thread_wait(napi)) {
+		for (;;) {
+			bool repoll = false;
+
+			local_bh_disable();
+
+			have = netpoll_poll_lock(napi);
+			__napi_poll(napi, &repoll);
+			netpoll_poll_unlock(have);
+
+			__kfree_skb_flush();
+			local_bh_enable();
+
+			if (!repoll)
+				break;
+
+			cond_resched();
+		}
+	}
+	return 0;
+}
+
 static __latent_entropy void net_rx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
-- 
2.30.0.284.gd98b1dd5eaa7-goog


* [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-20  3:34 [PATCH net-next v7 0/3] implement kthread based napi poll Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 1/3] net: extract napi poll functionality to __napi_poll() Wei Wang
  2021-01-20  3:34 ` [PATCH net-next v7 2/3] net: implement threaded-able napi poll loop support Wei Wang
@ 2021-01-20  3:34 ` Wei Wang
  2021-01-20 16:12   ` Alexander Duyck
  2021-01-20 17:28   ` Florian Fainelli
  2 siblings, 2 replies; 10+ messages in thread
From: Wei Wang @ 2021-01-20  3:34 UTC (permalink / raw)
  To: David Miller, netdev, Jakub Kicinski
  Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau,
	Alexander Duyck

This patch adds a new sysfs attribute to the network device class.
Said attribute provides a per-device control to enable/disable the
threaded mode for all the napi instances of the given network device.
User sets it to 1 or 0 to enable or disable threaded mode per device.
However, when user reads from this sysfs entry, it could return:
  1: means all napi instances belonging to this device have threaded
mode enabled.
  0: means all napi instances belonging to this device have threaded
mode disabled.
  -1: means the system fails to enable threaded mode for certain napi
instances when user requests to enable threaded mode. This happens
when the kthread fails to be created for certain napi instances.
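
For illustration only, here is a minimal userspace sketch (not part of this
patch, and assuming a hypothetical device named eth0 with napi instances)
that toggles the new attribute and reads back the resulting value:

#include <stdio.h>

int main(void)
{
        const char *path = "/sys/class/net/eth0/threaded";
        FILE *f;
        int val;

        f = fopen(path, "w");
        if (!f) {
                perror(path);
                return 1;
        }
        fputs("1\n", f);        /* request threaded mode for all napi of eth0 */
        fclose(f);

        f = fopen(path, "r");
        if (!f) {
                perror(path);
                return 1;
        }
        if (fscanf(f, "%d", &val) == 1)
                printf("threaded = %d\n", val); /* 1, 0 or -1 as described above */
        fclose(f);

        return 0;
}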

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 28 ++++++++++++++++
 net/core/net-sysfs.c      | 68 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8cb8d43ea5fa..26c3e8cf4c01 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
+int dev_set_threaded(struct net_device *dev, bool threaded);
+
 /**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: NAPI context
diff --git a/net/core/dev.c b/net/core/dev.c
index 7ffa91475856..e71c2fd91595 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6767,6 +6767,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
 	return 0;
 }
 
+static void dev_disable_threaded_all(struct net_device *dev)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &dev->napi_list, dev_list)
+		napi_set_threaded(napi, false);
+}
+
+int dev_set_threaded(struct net_device *dev, bool threaded)
+{
+	struct napi_struct *napi;
+	int ret;
+
+	dev->threaded = threaded;
+	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+		ret = napi_set_threaded(napi, threaded);
+		if (ret) {
+			/* Error occurred on one of the napi,
+			 * reset threaded mode on all napi.
+			 */
+			dev_disable_threaded_all(dev);
+			break;
+		}
+	}
+
+	return ret;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index daf502c13d6d..bf62a20fbb81 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -538,6 +538,73 @@ static ssize_t phys_switch_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_switch_id);
 
+static ssize_t threaded_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	struct napi_struct *n;
+	int enabled;
+	int ret;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (!dev_isalive(netdev)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	if (list_empty(&netdev->napi_list)) {
+		ret = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	/* Check if there is inconsistency between napi_state and
+	 * dev->threaded for each napi first.
+	 * The inconsistency could happen when the device driver calls
+	 * napi_disable()/napi_enable() with dev->threaded set to true,
+	 * but napi_kthread_create() fails for certain napi instances.
+	 * Return -1 in this inconsistent state.
+	 * Otherwise, return what is set in dev->threaded.
+	 */
+	list_for_each_entry(n, &netdev->napi_list, dev_list) {
+		enabled = !!test_bit(NAPI_STATE_THREADED, &n->state);
+		if (enabled != netdev->threaded) {
+			enabled = -1;
+			break;
+		}
+	}
+
+	ret = sprintf(buf, fmt_dec, enabled);
+
+unlock:
+	rtnl_unlock();
+	return ret;
+}
+
+static int modify_napi_threaded(struct net_device *dev, unsigned long val)
+{
+	int ret;
+
+	if (list_empty(&dev->napi_list))
+		return -EOPNOTSUPP;
+
+	if (val != 0 && val != 1)
+		return -EOPNOTSUPP;
+
+	ret = dev_set_threaded(dev, val);
+
+	return ret;
+}
+
+static ssize_t threaded_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, modify_napi_threaded);
+}
+static DEVICE_ATTR_RW(threaded);
+
 static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -570,6 +637,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_proto_down.attr,
 	&dev_attr_carrier_up_count.attr,
 	&dev_attr_carrier_down_count.attr,
+	&dev_attr_threaded.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
-- 
2.30.0.284.gd98b1dd5eaa7-goog


* Re: [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-20  3:34 ` [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
@ 2021-01-20 16:12   ` Alexander Duyck
  2021-01-20 18:07     ` Wei Wang
  2021-01-20 17:28   ` Florian Fainelli
  1 sibling, 1 reply; 10+ messages in thread
From: Alexander Duyck @ 2021-01-20 16:12 UTC (permalink / raw)
  To: Wei Wang
  Cc: David Miller, Netdev, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Hannes Frederic Sowa, Felix Fietkau

On Tue, Jan 19, 2021 at 7:35 PM Wei Wang <weiwan@google.com> wrote:
>
> This patch adds a new sysfs attribute to the network device class.
> Said attribute provides a per-device control to enable/disable the
> threaded mode for all the napi instances of the given network device.
> User sets it to 1 or 0 to enable or disable threaded mode per device.
> However, when user reads from this sysfs entry, it could return:
>   1: means all napi instances belonging to this device have threaded
> mode enabled.
>   0: means all napi instances belonging to this device have threaded
> mode disabled.
>   -1: means the system fails to enable threaded mode for certain napi
> instances when user requests to enable threaded mode. This happens
> when the kthread fails to be created for certain napi instances.
>
> Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Co-developed-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Wei Wang <weiwan@google.com>
> ---
>  include/linux/netdevice.h |  2 ++
>  net/core/dev.c            | 28 ++++++++++++++++
>  net/core/net-sysfs.c      | 68 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 98 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8cb8d43ea5fa..26c3e8cf4c01 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
>         return napi_complete_done(n, 0);
>  }
>
> +int dev_set_threaded(struct net_device *dev, bool threaded);
> +
>  /**
>   *     napi_disable - prevent NAPI from scheduling
>   *     @n: NAPI context
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7ffa91475856..e71c2fd91595 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6767,6 +6767,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
>         return 0;
>  }
>
> +static void dev_disable_threaded_all(struct net_device *dev)
> +{
> +       struct napi_struct *napi;
> +
> +       list_for_each_entry(napi, &dev->napi_list, dev_list)
> +               napi_set_threaded(napi, false);
> +}
> +
> +int dev_set_threaded(struct net_device *dev, bool threaded)
> +{
> +       struct napi_struct *napi;
> +       int ret;
> +
> +       dev->threaded = threaded;
> +       list_for_each_entry(napi, &dev->napi_list, dev_list) {
> +               ret = napi_set_threaded(napi, threaded);
> +               if (ret) {
> +                       /* Error occurred on one of the napi,
> +                        * reset threaded mode on all napi.
> +                        */
> +                       dev_disable_threaded_all(dev);
> +                       break;
> +               }
> +       }
> +
> +       return ret;
> +}
> +

So I have a question about this function. Is there any reason why
napi_set_threaded couldn't be broken into two pieces and handled in
two passes with the first allocating the kthread and the second
setting the threaded bit assuming the allocations all succeeded? The
setting or clearing of the bit shouldn't need any return value since
it is void and the allocation of the kthread is the piece that can
fail. So it seems like it would make sense to see if you can allocate
all of the kthreads first before you go through and attempt to enable
threaded NAPI.

Then you should only need to make a change to netif_napi_add that will
allocate the kthread if adding a new instance on a device that is
running in threaded mode and if a thread allocation fails you could
clear dev->threaded so that when napi_enable is called we don't bother
enabling any threaded setups since some of the threads are
non-functional.

Doing so would guarantee all-or-nothing behavior and you could then
just use the dev->threaded to signal if the device is running threaded
or not as you could just clear it if the kthread allocation fails.
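
Roughly, that would look something like the sketch below (an untested
illustration reusing napi_kthread_create() and NAPI_STATE_THREADED from
patch 2, not a finished implementation):

int dev_set_threaded(struct net_device *dev, bool threaded)
{
        struct napi_struct *napi;
        int err = 0;

        /* Pass 1: allocate all the kthreads first; nothing is enabled yet. */
        if (threaded) {
                list_for_each_entry(napi, &dev->napi_list, dev_list) {
                        if (napi->thread)
                                continue;
                        err = napi_kthread_create(napi);
                        if (err) {
                                threaded = false;
                                break;
                        }
                }
        }

        dev->threaded = threaded;

        /* Pass 2: flip the bit on every napi; this step cannot fail. */
        list_for_each_entry(napi, &dev->napi_list, dev_list) {
                if (threaded) {
                        /* Paired with the READ_ONCE() in ____napi_schedule() */
                        smp_mb__before_atomic();
                        set_bit(NAPI_STATE_THREADED, &napi->state);
                } else {
                        clear_bit(NAPI_STATE_THREADED, &napi->state);
                }
        }

        return err;
}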

* Re: [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-20  3:34 ` [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
  2021-01-20 16:12   ` Alexander Duyck
@ 2021-01-20 17:28   ` Florian Fainelli
  2021-01-20 18:07     ` Wei Wang
  1 sibling, 1 reply; 10+ messages in thread
From: Florian Fainelli @ 2021-01-20 17:28 UTC (permalink / raw)
  To: Wei Wang, David Miller, netdev, Jakub Kicinski
  Cc: Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau,
	Alexander Duyck



On 1/19/2021 7:34 PM, Wei Wang wrote:
> This patch adds a new sysfs attribute to the network device class.
> Said attribute provides a per-device control to enable/disable the
> threaded mode for all the napi instances of the given network device.
> User sets it to 1 or 0 to enable or disable threaded mode per device.
> However, when user reads from this sysfs entry, it could return:
>   1: means all napi instances belonging to this device have threaded
> mode enabled.
>   0: means all napi instances belonging to this device have threaded
> mode disabled.
>   -1: means the system fails to enable threaded mode for certain napi
> instances when user requests to enable threaded mode. This happens
> when the kthread fails to be created for certain napi instances.
> 
> Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Co-developed-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Wei Wang <weiwan@google.com>

Can you document the new threaded sysfs attribute under
Documentation/ABI/testing/sysfs-class-net?
-- 
Florian

* Re: [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-20 16:12   ` Alexander Duyck
@ 2021-01-20 18:07     ` Wei Wang
  2021-01-21  3:35       ` Alexander Duyck
  0 siblings, 1 reply; 10+ messages in thread
From: Wei Wang @ 2021-01-20 18:07 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Hannes Frederic Sowa, Felix Fietkau

On Wed, Jan 20, 2021 at 8:13 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Tue, Jan 19, 2021 at 7:35 PM Wei Wang <weiwan@google.com> wrote:
> >
> > This patch adds a new sysfs attribute to the network device class.
> > Said attribute provides a per-device control to enable/disable the
> > threaded mode for all the napi instances of the given network device.
> > User sets it to 1 or 0 to enable or disable threaded mode per device.
> > However, when user reads from this sysfs entry, it could return:
> >   1: means all napi instances belonging to this device have threaded
> > mode enabled.
> >   0: means all napi instances belonging to this device have threaded
> > mode disabled.
> >   -1: means the system fails to enable threaded mode for certain napi
> > instances when user requests to enable threaded mode. This happens
> > when the kthread fails to be created for certain napi instances.
> >
> > Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Co-developed-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Wei Wang <weiwan@google.com>
> > ---
> >  include/linux/netdevice.h |  2 ++
> >  net/core/dev.c            | 28 ++++++++++++++++
> >  net/core/net-sysfs.c      | 68 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 98 insertions(+)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 8cb8d43ea5fa..26c3e8cf4c01 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
> >         return napi_complete_done(n, 0);
> >  }
> >
> > +int dev_set_threaded(struct net_device *dev, bool threaded);
> > +
> >  /**
> >   *     napi_disable - prevent NAPI from scheduling
> >   *     @n: NAPI context
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 7ffa91475856..e71c2fd91595 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6767,6 +6767,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
> >         return 0;
> >  }
> >
> > +static void dev_disable_threaded_all(struct net_device *dev)
> > +{
> > +       struct napi_struct *napi;
> > +
> > +       list_for_each_entry(napi, &dev->napi_list, dev_list)
> > +               napi_set_threaded(napi, false);
> > +}
> > +
> > +int dev_set_threaded(struct net_device *dev, bool threaded)
> > +{
> > +       struct napi_struct *napi;
> > +       int ret;
> > +
> > +       dev->threaded = threaded;
> > +       list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > +               ret = napi_set_threaded(napi, threaded);
> > +               if (ret) {
> > +                       /* Error occurred on one of the napi,
> > +                        * reset threaded mode on all napi.
> > +                        */
> > +                       dev_disable_threaded_all(dev);
> > +                       break;
> > +               }
> > +       }
> > +
> > +       return ret;
> > +}
> > +
>
> So I have a question about this function. Is there any reason why
> napi_set_threaded couldn't be broken into two pieces and handled in
> two passes with the first allocating the kthread and the second
> setting the threaded bit assuming the allocations all succeeded? The
> setting or clearing of the bit shouldn't need any return value since
> it is void and the allocation of the kthread is the piece that can
> fail. So it seems like it would make sense to see if you can allocate
> all of the kthreads first before you go through and attempt to enable
> threaded NAPI.
>
> Then you should only need to make a change to netif_napi_add that will
> allocate the kthread if adding a new instance on a device that is
> running in threaded mode and if a thread allocation fails you could
> clear dev->threaded so that when napi_enable is called we don't bother
> enabling any threaded setups since some of the threads are
> non-functional.
>
If we create kthreads during netif_napi_add() when dev->threaded is
set, that means the user has to bring down the device, set
dev->threaded and then bring up the device in order to use threaded
mode. I think this brings extra steps, while we could have made it
easier. I believe being able to live switch between on and off without
device up/down is a good and useful thing.
The other way to do this might be that we always create the kthread
during netif_napi_add() regardless of the dev->threaded value. And
when user requests to enable threaded mode, we only enable it, after
checking every napi has its kthread created.
But this also has its drawbacks. First, it means there will be several
idle kthreads hanging around even if the user never enables threaded
mode. Also, there is still the possibility that the kthread creation
fails. But since netif_napi_add() does not have a return value, no one
will handle it. Do we just never enable threaded mode in this case? Or
do we try to recreate the thread when the user tries to enable
threaded mode through the sysfs interface?


> Doing so would guarantee all-or-nothing behavior and you could then
> just use the dev->threaded to signal if the device is running threaded
> or not as you could just clear it if the kthread allocation fails.

* Re: [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-20 17:28   ` Florian Fainelli
@ 2021-01-20 18:07     ` Wei Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Wang @ 2021-01-20 18:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David Miller, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa, Felix Fietkau,
	Alexander Duyck

On Wed, Jan 20, 2021 at 9:29 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/19/2021 7:34 PM, Wei Wang wrote:
> > This patch adds a new sysfs attribute to the network device class.
> > Said attribute provides a per-device control to enable/disable the
> > threaded mode for all the napi instances of the given network device.
> > User sets it to 1 or 0 to enable or disable threaded mode per device.
> > However, when user reads from this sysfs entry, it could return:
> >   1: means all napi instances belonging to this device have threaded
> > mode enabled.
> >   0: means all napi instances belonging to this device have threaded
> > mode disabled.
> >   -1: means the system fails to enable threaded mode for certain napi
> > instances when user requests to enable threaded mode. This happens
> > when the kthread fails to be created for certain napi instances.
> >
> > Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Co-developed-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Felix Fietkau <nbd@nbd.name>
> > Signed-off-by: Wei Wang <weiwan@google.com>
>
> Can you document the new threaded sysfs attribute under
> Documentation/ABI/testing/sysfs-class-net?
OK. Will do.


> --
> Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-20 18:07     ` Wei Wang
@ 2021-01-21  3:35       ` Alexander Duyck
  2021-01-21 17:45         ` Wei Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Duyck @ 2021-01-21  3:35 UTC (permalink / raw)
  To: Wei Wang
  Cc: David Miller, Netdev, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Hannes Frederic Sowa, Felix Fietkau

On Wed, Jan 20, 2021 at 10:07 AM Wei Wang <weiwan@google.com> wrote:
>
> On Wed, Jan 20, 2021 at 8:13 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Tue, Jan 19, 2021 at 7:35 PM Wei Wang <weiwan@google.com> wrote:
> > >
> > > This patch adds a new sysfs attribute to the network device class.
> > > Said attribute provides a per-device control to enable/disable the
> > > threaded mode for all the napi instances of the given network device.
> > > User sets it to 1 or 0 to enable or disable threaded mode per device.
> > > However, when user reads from this sysfs entry, it could return:
> > >   1: means all napi instances belonging to this device have threaded
> > > mode enabled.
> > >   0: means all napi instances belonging to this device have threaded
> > > mode disabled.
> > >   -1: means the system fails to enable threaded mode for certain napi
> > > instances when user requests to enable threaded mode. This happens
> > > when the kthread fails to be created for certain napi instances.
> > >
> > > Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > Co-developed-by: Felix Fietkau <nbd@nbd.name>
> > > Signed-off-by: Felix Fietkau <nbd@nbd.name>
> > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > ---
> > >  include/linux/netdevice.h |  2 ++
> > >  net/core/dev.c            | 28 ++++++++++++++++
> > >  net/core/net-sysfs.c      | 68 +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 98 insertions(+)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 8cb8d43ea5fa..26c3e8cf4c01 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
> > >         return napi_complete_done(n, 0);
> > >  }
> > >
> > > +int dev_set_threaded(struct net_device *dev, bool threaded);
> > > +
> > >  /**
> > >   *     napi_disable - prevent NAPI from scheduling
> > >   *     @n: NAPI context
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 7ffa91475856..e71c2fd91595 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -6767,6 +6767,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
> > >         return 0;
> > >  }
> > >
> > > +static void dev_disable_threaded_all(struct net_device *dev)
> > > +{
> > > +       struct napi_struct *napi;
> > > +
> > > +       list_for_each_entry(napi, &dev->napi_list, dev_list)
> > > +               napi_set_threaded(napi, false);
> > > +}
> > > +
> > > +int dev_set_threaded(struct net_device *dev, bool threaded)
> > > +{
> > > +       struct napi_struct *napi;
> > > +       int ret;
> > > +
> > > +       dev->threaded = threaded;
> > > +       list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > > +               ret = napi_set_threaded(napi, threaded);
> > > +               if (ret) {
> > > +                       /* Error occurred on one of the napi,
> > > +                        * reset threaded mode on all napi.
> > > +                        */
> > > +                       dev_disable_threaded_all(dev);
> > > +                       break;
> > > +               }
> > > +       }
> > > +
> > > +       return ret;
> > > +}
> > > +
> >
> > So I have a question about this function. Is there any reason why
> > napi_set_threaded couldn't be broken into two pieces and handled in
> > two passes with the first allocating the kthread and the second
> > setting the threaded bit assuming the allocations all succeeded? The
> > setting or clearing of the bit shouldn't need any return value since
> > it is void and the allocation of the kthread is the piece that can
> > fail. So it seems like it would make sense to see if you can allocate
> > all of the kthreads first before you go through and attempt to enable
> > threaded NAPI.
> >
> > Then you should only need to make a change to netif_napi_add that will
> > allocate the kthread if adding a new instance on a device that is
> > running in threaded mode and if a thread allocation fails you could
> > clear dev->threaded so that when napi_enable is called we don't bother
> > enabling any threaded setups since some of the threads are
> > non-functional.
> >
> If we create kthreads during netif_napi_add() when dev->threaded is
> set, that means the user has to bring down the device, set
> dev->threaded and then bring up the device in order to use threaded
> mode. I think this brings extra steps, while we could have made it
> easier. I believe being able to live switch between on and off without
> device up/down is a good and useful thing.

I think you missed my point. I wasn't saying  you needed to change
this code to call netif_napi_add. What I was suggesting is breaking
this into two passes. The first pass would allocate the kthreads and
call the function you would likely use in the netif_napi_add call, and
the second would set the threaded bit. That way you don't have queues
in limbo during the process where you enable them and then disable
them.

Right now if the device doesn't have any napi instances enabled and
you make this call all it will be doing is setting dev->threaded. So
the thought is to try to reproduce this two step approach at a driver
level.

Then when napi queues are added you would have to add the kthread to
the new napi threads. You are currently doing that in the napi_enable
call. I would argue that it should happen in the netif_napi_add so you
can add them long before the napi is enabled. Unfortunately that still
isn't perfect as it seems some drivers go immediately from
netif_napi_add to napi_enable, however many will add all of their NAPI
instances before they enable them so in that case you could have
similar functionality to this enable call where first pass does the
allocation, and then if it all succeeded you then enable the feature
on all the queues when you call napi_enable and only modify the bit
there instead of trying to do the allocation.

> The other way to do this might be that we always create the kthread
> during netif_napi_add() regardless of the dev->threaded value. And
> when user requests to enable threaded mode, we only enable it, after
> checking every napi has its kthread created.

It doesn't make sense to always allocate the thread. I would only do
that if the netdev has dev->threaded set. Otherwise you can take care
of allocating it here when you toggle the value and leave it allocated
until the napi instance is deleted.

> But this also has its drawbacks. First, it means there will be several
> idle kthreads hanging around even if the user never enables threaded
> mode. Also, there is still the possibility that the kthread creation

As I said, I would only do the allocation if dev->threaded is set
which means the device wants the threaded NAPI.

Also if any allocation fails we should clear the bit on all existing
napi instances and clear dev->threaded. As far as freeing the kthreads
that would probably need to wait until we delete the NAPI instance. So
we would need to add a check somewhere to make sure we aren't
overwriting existing kthreads in the allocation.

> fails. But since netif_napi_add() does not have a return value, no one
> will handle it. Do we just never enable threaded mode in this case? Or
> do we try to recreate the thread when the user tries to enable
> threaded mode through the sysfs interface?

Right now we could handle this the same way we do for napi_enable
which is to not enable the feature if the thread isn't there. The only
difference is I would take it one step further and probably require
that dev->threaded be cleared on an allocation failure. Then before we
set the threaded bit we have to test for it being set and the kthread
is allocated before we allow setting the bit. Otherwise we clear the
threaded bit.
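
In netif_napi_add() that would boil down to something along these lines
(an untested sketch, reusing napi_kthread_create() from patch 2):

        /* Pre-allocate the kthread when the device is already in threaded
         * mode; on failure fall back to softirq mode for the whole device.
         */
        if (dev->threaded && !napi->thread && napi_kthread_create(napi))
                dev->threaded = 0;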

* Re: [PATCH net-next v7 3/3] net: add sysfs attribute to control napi threaded mode
  2021-01-21  3:35       ` Alexander Duyck
@ 2021-01-21 17:45         ` Wei Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Wang @ 2021-01-21 17:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Hannes Frederic Sowa, Felix Fietkau

On Wed, Jan 20, 2021 at 7:36 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, Jan 20, 2021 at 10:07 AM Wei Wang <weiwan@google.com> wrote:
> >
> > On Wed, Jan 20, 2021 at 8:13 AM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Tue, Jan 19, 2021 at 7:35 PM Wei Wang <weiwan@google.com> wrote:
> > > >
> > > > This patch adds a new sysfs attribute to the network device class.
> > > > Said attribute provides a per-device control to enable/disable the
> > > > threaded mode for all the napi instances of the given network device.
> > > > User sets it to 1 or 0 to enable or disable threaded mode per device.
> > > > However, when user reads from this sysfs entry, it could return:
> > > >   1: means all napi instances belonging to this device have threaded
> > > > mode enabled.
> > > >   0: means all napi instances belonging to this device have threaded
> > > > mode disabled.
> > > >   -1: means the system fails to enable threaded mode for certain napi
> > > > instances when user requests to enable threaded mode. This happens
> > > > when the kthread fails to be created for certain napi instances.
> > > >
> > > > Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Co-developed-by: Felix Fietkau <nbd@nbd.name>
> > > > Signed-off-by: Felix Fietkau <nbd@nbd.name>
> > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > ---
> > > >  include/linux/netdevice.h |  2 ++
> > > >  net/core/dev.c            | 28 ++++++++++++++++
> > > >  net/core/net-sysfs.c      | 68 +++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 98 insertions(+)
> > > >
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 8cb8d43ea5fa..26c3e8cf4c01 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
> > > >         return napi_complete_done(n, 0);
> > > >  }
> > > >
> > > > +int dev_set_threaded(struct net_device *dev, bool threaded);
> > > > +
> > > >  /**
> > > >   *     napi_disable - prevent NAPI from scheduling
> > > >   *     @n: NAPI context
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 7ffa91475856..e71c2fd91595 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -6767,6 +6767,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static void dev_disable_threaded_all(struct net_device *dev)
> > > > +{
> > > > +       struct napi_struct *napi;
> > > > +
> > > > +       list_for_each_entry(napi, &dev->napi_list, dev_list)
> > > > +               napi_set_threaded(napi, false);
> > > > +}
> > > > +
> > > > +int dev_set_threaded(struct net_device *dev, bool threaded)
> > > > +{
> > > > +       struct napi_struct *napi;
> > > > +       int ret;
> > > > +
> > > > +       dev->threaded = threaded;
> > > > +       list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > > > +               ret = napi_set_threaded(napi, threaded);
> > > > +               if (ret) {
> > > > +                       /* Error occurred on one of the napi,
> > > > +                        * reset threaded mode on all napi.
> > > > +                        */
> > > > +                       dev_disable_threaded_all(dev);
> > > > +                       break;
> > > > +               }
> > > > +       }
> > > > +
> > > > +       return ret;
> > > > +}
> > > > +
> > >
> > > So I have a question about this function. Is there any reason why
> > > napi_set_threaded couldn't be broken into two pieces and handled in
> > > two passes with the first allocating the kthread and the second
> > > setting the threaded bit assuming the allocations all succeeded? The
> > > setting or clearing of the bit shouldn't need any return value since
> > > it is void and the allocation of the kthread is the piece that can
> > > fail. So it seems like it would make sense to see if you can allocate
> > > all of the kthreads first before you go through and attempt to enable
> > > threaded NAPI.
> > >
> > > Then you should only need to make a change to netif_napi_add that will
> > > allocate the kthread if adding a new instance on a device that is
> > > running in threaded mode and if a thread allocation fails you could
> > > clear dev->threaded so that when napi_enable is called we don't bother
> > > enabling any threaded setups since some of the threads are
> > > non-functional.
> > >
> > If we create kthreads during netif_napi_add() when dev->threaded is
> > set, that means the user has to bring down the device, set
> > dev->threaded and then bring up the device in order to use threaded
> > mode. I think this brings extra steps, while we could have made it
> > easier. I believe being able to live switch between on and off without
> > device up/down is a good and useful thing.
>
> I think you missed my point. I wasn't saying  you needed to change
> this code to call netif_napi_add. What I was suggesting is breaking
> this into two passes. The first pass would allocate the kthreads and
> call the function you would likely use in the netif_napi_add call, and
> the second would set the threaded bit. That way you don't have queues
> in limbo during the process where you enable them and then disable
> them.
>
> Right now if the device doesn't have any napi instances enabled and
> you make this call all it will be doing is setting dev->threaded. So
> the thought is to try to reproduce this two step approach at a driver
> level.
>
> Then when napi queues are added you would have to add the kthread to
> the new napi threads. You are currently doing that in the napi_enable
> call. I would argue that it should happen in the netif_napi_add so you
> can add them long before the napi is enabled. Unfortunately that still
> isn't perfect as it seems some drivers go immediately from
> netif_napi_add to napi_enable, however many will add all of their NAPI
> instances before they enable them so in that case you could have
> similar functionality to this enable call where first pass does the
> allocation, and then if it all succeeded you then enable the feature
> on all the queues when you call napi_enable and only modify the bit
> there instead of trying to do the allocation.
>
> > The other way to do this might be that we always create the kthread
> > during netif_napi_add() regardless of the dev->threaded value. And
> > when user requests to enable threaded mode, we only enable it, after
> > checking every napi has its kthread created.
>
> It doesn't make sense to always allocate the thread. I would only do
> that if the netdev has dev->threaded set. Otherwise you can take care
> of allocating it here when you toggle the value and leave it allocated
> until the napi instance is deleted.
>
> > But this also has its drawbacks. First, it means there will be several
> > idle kthreads hanging around even if the user never enables threaded
> > mode. Also, there is still the possibility that the kthread creation
>
> As I said, I would only do the allocation if dev->threaded is set
> which means the device wants the threaded NAPI.
>
> Also if any allocation fails we should clear the bit on all existing
> napi instances and clear dev->threaded. As far as freeing the kthreads
> that would probably need to wait until we delete the NAPI instance. So
> we would need to add a check somewhere to make sure we aren't
> overwriting existing kthreads in the allocation.
>
> > fails. But since netif_napi_add() does not have a return value, no one
> > will handle it. Do we just never enable threaded mode in this case? Or
> > do we try to recreate the thread when the user tries to enable
> > threaded mode through the sysfs interface?
>
> Right now we could handle this the same way we do for napi_enable
> which is to not enable the feature if the thread isn't there. The only
> difference is I would take it one step further and probably require
> that dev->threaded be cleared on an allocation failure. Then before we
> set the threaded bit we have to test for it being set and the kthread
> is allocated before we allow setting the bit. Otherwise we clear the
> threaded bit.

Thanks for the clarification. I think I got it. The only inconsistency
would be the case you mentioned, where the driver calls napi_enable()
immediately after netif_napi_add() without adding all the napi instances
in advance. I would ignore that case and still use dev->threaded to
indicate whether threaded mode is enabled or not. This way, we could get
rid of the ternary value for dev->threaded.
