* [PATCH net-next v4 1/3] net: extract napi poll functionality to __napi_poll()
2020-12-09 0:54 [PATCH net-next v4 0/3] implement kthread based napi poll Wei Wang
@ 2020-12-09 0:54 ` Wei Wang
2020-12-09 0:54 ` [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support Wei Wang
2020-12-09 0:54 ` [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
2 siblings, 0 replies; 13+ messages in thread
From: Wei Wang @ 2020-12-09 0:54 UTC (permalink / raw)
To: David Miller, netdev, Jakub Kicinski
Cc: Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau,
Hillf Danton
From: Felix Fietkau <nbd@nbd.name>
This commit introduces a new function __napi_poll() which does the main
logic of the existing napi_poll() function, and will be called by other
functions in later commits.
This idea and implementation is done by Felix Fietkau <nbd@nbd.name> and
is proposed as part of the patch to move napi work to work_queue
context.
This commit by itself is a code restructure.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index ce8fea2e2788..8064af1dd03c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6765,15 +6765,10 @@ void __netif_napi_del(struct napi_struct *napi)
}
EXPORT_SYMBOL(__netif_napi_del);
-static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+static int __napi_poll(struct napi_struct *n, bool *repoll)
{
- void *have;
int work, weight;
- list_del_init(&n->poll_list);
-
- have = netpoll_poll_lock(n);
-
weight = n->weight;
/* This NAPI_STATE_SCHED test is for avoiding a race
@@ -6793,7 +6788,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
n->poll, work, weight);
if (likely(work < weight))
- goto out_unlock;
+ return work;
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
@@ -6802,7 +6797,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
*/
if (unlikely(napi_disable_pending(n))) {
napi_complete(n);
- goto out_unlock;
+ return work;
}
/* The NAPI context has more processing work, but busy-polling
@@ -6815,7 +6810,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
*/
napi_schedule(n);
}
- goto out_unlock;
+ return work;
}
if (n->gro_bitmask) {
@@ -6833,9 +6828,29 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
if (unlikely(!list_empty(&n->poll_list))) {
pr_warn_once("%s: Budget exhausted after napi rescheduled\n",
n->dev ? n->dev->name : "backlog");
- goto out_unlock;
+ return work;
}
+ *repoll = true;
+
+ return work;
+}
+
+static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+{
+ bool do_repoll = false;
+ void *have;
+ int work;
+
+ list_del_init(&n->poll_list);
+
+ have = netpoll_poll_lock(n);
+
+ work = __napi_poll(n, &do_repoll);
+
+ if (!do_repoll)
+ goto out_unlock;
+
list_add_tail(&n->poll_list, repoll);
out_unlock:
--
2.29.2.576.ga3fc446d84-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-09 0:54 [PATCH net-next v4 0/3] implement kthread based napi poll Wei Wang
2020-12-09 0:54 ` [PATCH net-next v4 1/3] net: extract napi poll functionality to __napi_poll() Wei Wang
@ 2020-12-09 0:54 ` Wei Wang
2020-12-12 22:50 ` Jakub Kicinski
2020-12-09 0:54 ` [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
2 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2020-12-09 0:54 UTC (permalink / raw)
To: David Miller, netdev, Jakub Kicinski
Cc: Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau,
Hillf Danton
This patch allows running each napi poll loop inside its own
kernel thread.
The threaded mode could be enabled through napi_set_threaded()
api, and does not require a device up/down. The kthread gets
created on demand when napi_set_threaded() is called, and gets
shut down eventually in napi_disable().
Once that threaded mode is enabled and the kthread is
started, napi_schedule() will wake-up such thread instead
of scheduling the softirq.
The threaded poll loop behaves quite likely the net_rx_action,
but it does not have to manipulate local irqs and uses
an explicit scheduling point based on netdev_budget.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
include/linux/netdevice.h | 5 ++
net/core/dev.c | 105 ++++++++++++++++++++++++++++++++++++++
2 files changed, 110 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bf167993c05..abd3b52b7da6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
struct list_head dev_list;
struct hlist_node napi_hash_node;
unsigned int napi_id;
+ struct task_struct *thread;
};
enum {
@@ -358,6 +359,7 @@ enum {
NAPI_STATE_NO_BUSY_POLL, /* Do not add in napi_hash, no busy polling */
NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */
NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/
+ NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/
};
enum {
@@ -369,6 +371,7 @@ enum {
NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
+ NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
};
enum gro_result {
@@ -495,6 +498,8 @@ static inline bool napi_complete(struct napi_struct *n)
return napi_complete_done(n, 0);
}
+int napi_set_threaded(struct napi_struct *n, bool threaded);
+
/**
* napi_disable - prevent NAPI from scheduling
* @n: NAPI context
diff --git a/net/core/dev.c b/net/core/dev.c
index 8064af1dd03c..cb6c4e2363a4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -91,6 +91,7 @@
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
#include <linux/skbuff.h>
+#include <linux/kthread.h>
#include <linux/bpf.h>
#include <linux/bpf_trace.h>
#include <net/net_namespace.h>
@@ -1475,6 +1476,36 @@ void netdev_notify_peers(struct net_device *dev)
}
EXPORT_SYMBOL(netdev_notify_peers);
+static int napi_threaded_poll(void *data);
+
+static int napi_kthread_create(struct napi_struct *n)
+{
+ int err = 0;
+
+ /* Create and wake up the kthread once to put it in
+ * TASK_INTERRUPTIBLE mode to avoid the blocked task
+ * warning and work with loadavg.
+ */
+ n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+ n->dev->name, n->napi_id);
+ if (IS_ERR(n->thread)) {
+ err = PTR_ERR(n->thread);
+ pr_err("kthread_run failed with err %d\n", err);
+ n->thread = NULL;
+ }
+
+ return err;
+}
+
+static void napi_kthread_stop(struct napi_struct *n)
+{
+ if (!n->thread)
+ return;
+ kthread_stop(n->thread);
+ clear_bit(NAPI_STATE_THREADED, &n->state);
+ n->thread = NULL;
+}
+
static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
{
const struct net_device_ops *ops = dev->netdev_ops;
@@ -4234,6 +4265,11 @@ int gro_normal_batch __read_mostly = 8;
static inline void ____napi_schedule(struct softnet_data *sd,
struct napi_struct *napi)
{
+ if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
+ wake_up_process(napi->thread);
+ return;
+ }
+
list_add_tail(&napi->poll_list, &sd->poll_list);
__raise_softirq_irqoff(NET_RX_SOFTIRQ);
}
@@ -6690,6 +6726,29 @@ static void init_gro_hash(struct napi_struct *napi)
napi->gro_bitmask = 0;
}
+int napi_set_threaded(struct napi_struct *n, bool threaded)
+{
+ int err = 0;
+
+ ASSERT_RTNL();
+
+ if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
+ return 0;
+ if (threaded) {
+ if (!n->thread) {
+ err = napi_kthread_create(n);
+ if (err)
+ goto out;
+ }
+ set_bit(NAPI_STATE_THREADED, &n->state);
+ } else {
+ clear_bit(NAPI_STATE_THREADED, &n->state);
+ }
+
+out:
+ return err;
+}
+
void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
int (*poll)(struct napi_struct *, int), int weight)
{
@@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
msleep(1);
hrtimer_cancel(&n->timer);
+ napi_kthread_stop(n);
clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
clear_bit(NAPI_STATE_DISABLE, &n->state);
@@ -6859,6 +6919,51 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
return work;
}
+static int napi_thread_wait(struct napi_struct *napi)
+{
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ while (!kthread_should_stop() && !napi_disable_pending(napi)) {
+ if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+ WARN_ON(!list_empty(&napi->poll_list));
+ __set_current_state(TASK_RUNNING);
+ return 0;
+ }
+
+ schedule();
+ set_current_state(TASK_INTERRUPTIBLE);
+ }
+ __set_current_state(TASK_RUNNING);
+ return -1;
+}
+
+static int napi_threaded_poll(void *data)
+{
+ struct napi_struct *napi = data;
+ void *have;
+
+ while (!napi_thread_wait(napi)) {
+ for (;;) {
+ bool repoll = false;
+
+ local_bh_disable();
+
+ have = netpoll_poll_lock(napi);
+ __napi_poll(napi, &repoll);
+ netpoll_poll_unlock(have);
+
+ __kfree_skb_flush();
+ local_bh_enable();
+
+ if (!repoll)
+ break;
+
+ cond_resched();
+ }
+ }
+ return 0;
+}
+
static __latent_entropy void net_rx_action(struct softirq_action *h)
{
struct softnet_data *sd = this_cpu_ptr(&softnet_data);
--
2.29.2.576.ga3fc446d84-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-09 0:54 ` [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support Wei Wang
@ 2020-12-12 22:50 ` Jakub Kicinski
2020-12-12 22:55 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-12-12 22:50 UTC (permalink / raw)
To: Wei Wang
Cc: David Miller, netdev, Paolo Abeni, Hannes Frederic Sowa,
Eric Dumazet, Felix Fietkau, Hillf Danton
On Tue, 8 Dec 2020 16:54:43 -0800 Wei Wang wrote:
> This patch allows running each napi poll loop inside its own
> kernel thread.
> The threaded mode could be enabled through napi_set_threaded()
> api, and does not require a device up/down. The kthread gets
> created on demand when napi_set_threaded() is called, and gets
> shut down eventually in napi_disable().
>
> Once that threaded mode is enabled and the kthread is
> started, napi_schedule() will wake-up such thread instead
> of scheduling the softirq.
>
> The threaded poll loop behaves quite likely the net_rx_action,
> but it does not have to manipulate local irqs and uses
> an explicit scheduling point based on netdev_budget.
>
> Co-developed-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Co-developed-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> @@ -4234,6 +4265,11 @@ int gro_normal_batch __read_mostly = 8;
> static inline void ____napi_schedule(struct softnet_data *sd,
> struct napi_struct *napi)
> {
> + if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
> + wake_up_process(napi->thread);
FTR your implementation depends on the fact that this is the only
place that can wake the worker and not set kthread_should_stop().
Which I trust you is the case :) maybe I already mentioned this..
> + return;
> + }
> +
> list_add_tail(&napi->poll_list, &sd->poll_list);
> __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> }
> void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
> int (*poll)(struct napi_struct *, int), int weight)
> {
> @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
> msleep(1);
>
> hrtimer_cancel(&n->timer);
> + napi_kthread_stop(n);
I'm surprised that we stop the thread on napi_disable() but there is no
start/create in napi_enable(). NAPIs can (and do get) disabled and
enabled again. But that'd make your code crash with many popular
drivers if you tried to change rings with threaded napi enabled so I
feel like I must be missing something..
> clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
> clear_bit(NAPI_STATE_DISABLE, &n->state);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-12 22:50 ` Jakub Kicinski
@ 2020-12-12 22:55 ` Jakub Kicinski
2020-12-14 17:59 ` Wei Wang
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-12-12 22:55 UTC (permalink / raw)
To: Wei Wang
Cc: David Miller, netdev, Paolo Abeni, Hannes Frederic Sowa,
Eric Dumazet, Felix Fietkau, Hillf Danton
On Sat, 12 Dec 2020 14:50:22 -0800 Jakub Kicinski wrote:
> > @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
> > msleep(1);
> >
> > hrtimer_cancel(&n->timer);
> > + napi_kthread_stop(n);
>
> I'm surprised that we stop the thread on napi_disable() but there is no
> start/create in napi_enable(). NAPIs can (and do get) disabled and
> enabled again. But that'd make your code crash with many popular
> drivers if you tried to change rings with threaded napi enabled so I
> feel like I must be missing something..
Ah, not crash, 'cause the flag gets cleared. Is it intentional that any
changes that disable NAPIs cause us to go back to non-threaded NAPI?
I think I had the "threaded" setting stored in struct netdevice in my
patches, is there a reason not to do that?
In fact your patches may _require_ the device to be up to enable
threaded NAPI if NAPIs are allocated in open.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-12 22:55 ` Jakub Kicinski
@ 2020-12-14 17:59 ` Wei Wang
2020-12-14 19:02 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2020-12-14 17:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Miller, Linux Kernel Network Developers, Paolo Abeni,
Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Hillf Danton
On Sat, Dec 12, 2020 at 2:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 12 Dec 2020 14:50:22 -0800 Jakub Kicinski wrote:
> > > @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
> > > msleep(1);
> > >
> > > hrtimer_cancel(&n->timer);
> > > + napi_kthread_stop(n);
> >
> > I'm surprised that we stop the thread on napi_disable() but there is no
> > start/create in napi_enable(). NAPIs can (and do get) disabled and
> > enabled again. But that'd make your code crash with many popular
> > drivers if you tried to change rings with threaded napi enabled so I
> > feel like I must be missing something..
>
> Ah, not crash, 'cause the flag gets cleared. Is it intentional that any
> changes that disable NAPIs cause us to go back to non-threaded NAPI?
> I think I had the "threaded" setting stored in struct netdevice in my
> patches, is there a reason not to do that?
>
Thanks for the comments!
The reason that I did not record it in dev is: there is a slight
chance that during creation of the kthreads, failures occur and we
flip back all NAPIs to use non-threaded mode. I am not sure the
recorded value in dev should be what user desires, or what the actual
situation is. Same as after the driver does a
napi_disabe()/napi_enable(). It might occur that the dev->threaded =
true, but the operation to re-create the kthreads fail and we flip
back to non-thread mode. This seems to get things more complicated.
What I expect is the user only enables the threaded mode after the
device is up and alive, with all NAPIs attached to dev, and enabled.
And user has to check the sysfs to make sure that the operation
succeeds.
And any operation that brings down the device, will flip this back to
default, which is non-threaded mode.
> In fact your patches may _require_ the device to be up to enable
> threaded NAPI if NAPIs are allocated in open.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-14 17:59 ` Wei Wang
@ 2020-12-14 19:02 ` Jakub Kicinski
2020-12-14 19:45 ` Wei Wang
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-12-14 19:02 UTC (permalink / raw)
To: Wei Wang
Cc: David Miller, Linux Kernel Network Developers, Paolo Abeni,
Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Hillf Danton
On Mon, 14 Dec 2020 09:59:21 -0800 Wei Wang wrote:
> On Sat, Dec 12, 2020 at 2:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 12 Dec 2020 14:50:22 -0800 Jakub Kicinski wrote:
> > > > @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
> > > > msleep(1);
> > > >
> > > > hrtimer_cancel(&n->timer);
> > > > + napi_kthread_stop(n);
> > >
> > > I'm surprised that we stop the thread on napi_disable() but there is no
> > > start/create in napi_enable(). NAPIs can (and do get) disabled and
> > > enabled again. But that'd make your code crash with many popular
> > > drivers if you tried to change rings with threaded napi enabled so I
> > > feel like I must be missing something..
> >
> > Ah, not crash, 'cause the flag gets cleared. Is it intentional that any
> > changes that disable NAPIs cause us to go back to non-threaded NAPI?
> > I think I had the "threaded" setting stored in struct netdevice in my
> > patches, is there a reason not to do that?
>
> Thanks for the comments!
>
> The reason that I did not record it in dev is: there is a slight
> chance that during creation of the kthreads, failures occur and we
> flip back all NAPIs to use non-threaded mode. I am not sure the
> recorded value in dev should be what user desires, or what the actual
> situation is. Same as after the driver does a
> napi_disabe()/napi_enable(). It might occur that the dev->threaded =
> true, but the operation to re-create the kthreads fail and we flip
> back to non-thread mode. This seems to get things more complicated.
> What I expect is the user only enables the threaded mode after the
> device is up and alive, with all NAPIs attached to dev, and enabled.
> And user has to check the sysfs to make sure that the operation
> succeeds.
> And any operation that brings down the device, will flip this back to
> default, which is non-threaded mode.
It is quite an annoying problem to address, given all relevant NAPI
helpers seem to return void :/ But we're pushing the problem onto the
user just because of internal API structure.
This reminds me of PTP / timestamping issues some NICs had once upon
a time. The timing application enables HW time stamping, then later some
other application / orchestration changes a seemingly unrelated config,
and since NIC has to reset itself it looses the timestamping config.
Now the time app stops getting HW time stamps, but those are best
effort anyway, so it just assumes the NIC couldn't stamp given frame
(for every frame), not that config got completely broken. The system
keeps running with suboptimal time for months.
What does the deployment you're expecting to see looks like? What
entity controls enabling the threaded mode on a system? Application?
Orchestration? What's the flow?
"Forgetting" config based on driver-dependent events feels very fragile.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-14 19:02 ` Jakub Kicinski
@ 2020-12-14 19:45 ` Wei Wang
2020-12-14 20:33 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2020-12-14 19:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Miller, Linux Kernel Network Developers, Paolo Abeni,
Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Hillf Danton
On Mon, Dec 14, 2020 at 11:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 14 Dec 2020 09:59:21 -0800 Wei Wang wrote:
> > On Sat, Dec 12, 2020 at 2:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sat, 12 Dec 2020 14:50:22 -0800 Jakub Kicinski wrote:
> > > > > @@ -6731,6 +6790,7 @@ void napi_disable(struct napi_struct *n)
> > > > > msleep(1);
> > > > >
> > > > > hrtimer_cancel(&n->timer);
> > > > > + napi_kthread_stop(n);
> > > >
> > > > I'm surprised that we stop the thread on napi_disable() but there is no
> > > > start/create in napi_enable(). NAPIs can (and do get) disabled and
> > > > enabled again. But that'd make your code crash with many popular
> > > > drivers if you tried to change rings with threaded napi enabled so I
> > > > feel like I must be missing something..
> > >
> > > Ah, not crash, 'cause the flag gets cleared. Is it intentional that any
> > > changes that disable NAPIs cause us to go back to non-threaded NAPI?
> > > I think I had the "threaded" setting stored in struct netdevice in my
> > > patches, is there a reason not to do that?
> >
> > Thanks for the comments!
> >
> > The reason that I did not record it in dev is: there is a slight
> > chance that during creation of the kthreads, failures occur and we
> > flip back all NAPIs to use non-threaded mode. I am not sure the
> > recorded value in dev should be what user desires, or what the actual
> > situation is. Same as after the driver does a
> > napi_disabe()/napi_enable(). It might occur that the dev->threaded =
> > true, but the operation to re-create the kthreads fail and we flip
> > back to non-thread mode. This seems to get things more complicated.
> > What I expect is the user only enables the threaded mode after the
> > device is up and alive, with all NAPIs attached to dev, and enabled.
> > And user has to check the sysfs to make sure that the operation
> > succeeds.
> > And any operation that brings down the device, will flip this back to
> > default, which is non-threaded mode.
>
> It is quite an annoying problem to address, given all relevant NAPI
> helpers seem to return void :/ But we're pushing the problem onto the
> user just because of internal API structure.
>
> This reminds me of PTP / timestamping issues some NICs had once upon
> a time. The timing application enables HW time stamping, then later some
> other application / orchestration changes a seemingly unrelated config,
> and since NIC has to reset itself it looses the timestamping config.
> Now the time app stops getting HW time stamps, but those are best
> effort anyway, so it just assumes the NIC couldn't stamp given frame
> (for every frame), not that config got completely broken. The system
> keeps running with suboptimal time for months.
>
> What does the deployment you're expecting to see looks like? What
> entity controls enabling the threaded mode on a system? Application?
> Orchestration? What's the flow?
>
I see your point. In our deployment, we have a system daemon which is
responsible for setting up all the system tunings after the host boots
up (before application starts to run). If certain operation fails, it
prints out error msg, and will exit with error. For applications that
require threaded mode, I think a check to the sysfs entry to make sure
it is enabled is necessary at the startup phase.
> "Forgetting" config based on driver-dependent events feels very fragile.
I think we could add a recorded value in dev to represent the user
setting, and try to enable threaded mode after napi_disable/enable.
But I think user/application still has to check the sysfs entry value
to make sure if it is enabled successfully.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-14 19:45 ` Wei Wang
@ 2020-12-14 20:33 ` Jakub Kicinski
2020-12-14 21:12 ` Wei Wang
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-12-14 20:33 UTC (permalink / raw)
To: Wei Wang
Cc: David Miller, Linux Kernel Network Developers, Paolo Abeni,
Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Hillf Danton
On Mon, 14 Dec 2020 11:45:43 -0800 Wei Wang wrote:
> > It is quite an annoying problem to address, given all relevant NAPI
> > helpers seem to return void :/ But we're pushing the problem onto the
> > user just because of internal API structure.
> >
> > This reminds me of PTP / timestamping issues some NICs had once upon
> > a time. The timing application enables HW time stamping, then later some
> > other application / orchestration changes a seemingly unrelated config,
> > and since NIC has to reset itself it looses the timestamping config.
> > Now the time app stops getting HW time stamps, but those are best
> > effort anyway, so it just assumes the NIC couldn't stamp given frame
> > (for every frame), not that config got completely broken. The system
> > keeps running with suboptimal time for months.
> >
> > What does the deployment you're expecting to see looks like? What
> > entity controls enabling the threaded mode on a system? Application?
> > Orchestration? What's the flow?
> >
> I see your point. In our deployment, we have a system daemon which is
> responsible for setting up all the system tunings after the host boots
> up (before application starts to run). If certain operation fails, it
> prints out error msg, and will exit with error. For applications that
> require threaded mode, I think a check to the sysfs entry to make sure
> it is enabled is necessary at the startup phase.
That assumes no workload stacking, and dynamic changes after the
workload has started? Or does the daemon have enough clever logic
to resolve config changes?
> > "Forgetting" config based on driver-dependent events feels very fragile.
> I think we could add a recorded value in dev to represent the user
> setting, and try to enable threaded mode after napi_disable/enable.
> But I think user/application still has to check the sysfs entry value
> to make sure if it is enabled successfully.
In case of an error you're thinking of resetting, still, and returning
disabled from sysfs? I guess that's fine, we can leave failing the bad
reconfig operation (rather than resetting config) as a future extension.
Let's add a WARN_ON, tho, so the failures don't get missed.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support
2020-12-14 20:33 ` Jakub Kicinski
@ 2020-12-14 21:12 ` Wei Wang
0 siblings, 0 replies; 13+ messages in thread
From: Wei Wang @ 2020-12-14 21:12 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Miller, Linux Kernel Network Developers, Paolo Abeni,
Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Hillf Danton
On Mon, Dec 14, 2020 at 12:33 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 14 Dec 2020 11:45:43 -0800 Wei Wang wrote:
> > > It is quite an annoying problem to address, given all relevant NAPI
> > > helpers seem to return void :/ But we're pushing the problem onto the
> > > user just because of internal API structure.
> > >
> > > This reminds me of PTP / timestamping issues some NICs had once upon
> > > a time. The timing application enables HW time stamping, then later some
> > > other application / orchestration changes a seemingly unrelated config,
> > > and since NIC has to reset itself it looses the timestamping config.
> > > Now the time app stops getting HW time stamps, but those are best
> > > effort anyway, so it just assumes the NIC couldn't stamp given frame
> > > (for every frame), not that config got completely broken. The system
> > > keeps running with suboptimal time for months.
> > >
> > > What does the deployment you're expecting to see looks like? What
> > > entity controls enabling the threaded mode on a system? Application?
> > > Orchestration? What's the flow?
> > >
> > I see your point. In our deployment, we have a system daemon which is
> > responsible for setting up all the system tunings after the host boots
> > up (before application starts to run). If certain operation fails, it
> > prints out error msg, and will exit with error. For applications that
> > require threaded mode, I think a check to the sysfs entry to make sure
> > it is enabled is necessary at the startup phase.
>
> That assumes no workload stacking, and dynamic changes after the
> workload has started? Or does the daemon have enough clever logic
> to resolve config changes?
>
The former. There should not be any dynamic changes after the workload
has started. At least for now.
> > > "Forgetting" config based on driver-dependent events feels very fragile.
> > I think we could add a recorded value in dev to represent the user
> > setting, and try to enable threaded mode after napi_disable/enable.
> > But I think user/application still has to check the sysfs entry value
> > to make sure if it is enabled successfully.
>
> In case of an error you're thinking of resetting, still, and returning
> disabled from sysfs? I guess that's fine, we can leave failing the bad
> reconfig operation (rather than resetting config) as a future extension.
> Let's add a WARN_ON, tho, so the failures don't get missed.
Yes. In terms of error, all napi will be reset to using non-threaded
mode, and sysfs returns disabled.
OK. Will add a WARN_ON.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode
2020-12-09 0:54 [PATCH net-next v4 0/3] implement kthread based napi poll Wei Wang
2020-12-09 0:54 ` [PATCH net-next v4 1/3] net: extract napi poll functionality to __napi_poll() Wei Wang
2020-12-09 0:54 ` [PATCH net-next v4 2/3] net: implement threaded-able napi poll loop support Wei Wang
@ 2020-12-09 0:54 ` Wei Wang
2020-12-12 22:59 ` Jakub Kicinski
2 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2020-12-09 0:54 UTC (permalink / raw)
To: David Miller, netdev, Jakub Kicinski
Cc: Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau,
Hillf Danton
This patch adds a new sysfs attribute to the network device class.
Said attribute provides a per-device control to enable/disable the
threaded mode for all the napi instances of the given network device.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
net/core/net-sysfs.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 94fff0700bdd..d14dc1da4c97 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -538,6 +538,75 @@ static ssize_t phys_switch_id_show(struct device *dev,
}
static DEVICE_ATTR_RO(phys_switch_id);
+static ssize_t threaded_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct net_device *netdev = to_net_dev(dev);
+ struct napi_struct *n;
+ bool enabled;
+ int ret;
+
+ if (!rtnl_trylock())
+ return restart_syscall();
+
+ if (!dev_isalive(netdev)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (list_empty(&netdev->napi_list)) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+
+ n = list_first_entry(&netdev->napi_list, struct napi_struct, dev_list);
+ enabled = !!test_bit(NAPI_STATE_THREADED, &n->state);
+
+ ret = sprintf(buf, fmt_dec, enabled);
+
+unlock:
+ rtnl_unlock();
+ return ret;
+}
+
+static void dev_disable_threaded_all(struct net_device *dev)
+{
+ struct napi_struct *napi;
+
+ list_for_each_entry(napi, &dev->napi_list, dev_list)
+ napi_set_threaded(napi, false);
+}
+
+static int modify_napi_threaded(struct net_device *dev, unsigned long val)
+{
+ struct napi_struct *napi;
+ int ret;
+
+ if (list_empty(&dev->napi_list))
+ return -EOPNOTSUPP;
+
+ list_for_each_entry(napi, &dev->napi_list, dev_list) {
+ ret = napi_set_threaded(napi, !!val);
+ if (ret) {
+ /* Error occurred on one of the napi,
+ * reset threaded mode on all napi.
+ */
+ dev_disable_threaded_all(dev);
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static ssize_t threaded_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return netdev_store(dev, attr, buf, len, modify_napi_threaded);
+}
+static DEVICE_ATTR_RW(threaded);
+
static struct attribute *net_class_attrs[] __ro_after_init = {
&dev_attr_netdev_group.attr,
&dev_attr_type.attr,
@@ -570,6 +639,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
&dev_attr_proto_down.attr,
&dev_attr_carrier_up_count.attr,
&dev_attr_carrier_down_count.attr,
+ &dev_attr_threaded.attr,
NULL,
};
ATTRIBUTE_GROUPS(net_class);
--
2.29.2.576.ga3fc446d84-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode
2020-12-09 0:54 ` [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode Wei Wang
@ 2020-12-12 22:59 ` Jakub Kicinski
2020-12-14 18:02 ` Wei Wang
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-12-12 22:59 UTC (permalink / raw)
To: Wei Wang
Cc: David Miller, netdev, Paolo Abeni, Hannes Frederic Sowa,
Eric Dumazet, Felix Fietkau, Hillf Danton
On Tue, 8 Dec 2020 16:54:44 -0800 Wei Wang wrote:
> +static void dev_disable_threaded_all(struct net_device *dev)
> +{
> + struct napi_struct *napi;
> +
> + list_for_each_entry(napi, &dev->napi_list, dev_list)
> + napi_set_threaded(napi, false);
> +}
This is an implementation detail which should be hidden in dev.c IMHO.
Since the sysfs knob is just a device global on/off the iteration over
napis and all is best hidden away.
(sorry about the delayed review BTW, hope we can do a minor revision
and still hit 5.12)
> +static int modify_napi_threaded(struct net_device *dev, unsigned long val)
> +{
> + struct napi_struct *napi;
> + int ret;
> +
> + if (list_empty(&dev->napi_list))
> + return -EOPNOTSUPP;
> +
> + list_for_each_entry(napi, &dev->napi_list, dev_list) {
> + ret = napi_set_threaded(napi, !!val);
> + if (ret) {
> + /* Error occurred on one of the napi,
> + * reset threaded mode on all napi.
> + */
> + dev_disable_threaded_all(dev);
> + break;
> + }
> + }
> +
> + return ret;
> +}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 3/3] net: add sysfs attribute to control napi threaded mode
2020-12-12 22:59 ` Jakub Kicinski
@ 2020-12-14 18:02 ` Wei Wang
0 siblings, 0 replies; 13+ messages in thread
From: Wei Wang @ 2020-12-14 18:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Miller, Linux Kernel Network Developers, Paolo Abeni,
Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Hillf Danton
On Sat, Dec 12, 2020 at 2:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 8 Dec 2020 16:54:44 -0800 Wei Wang wrote:
> > +static void dev_disable_threaded_all(struct net_device *dev)
> > +{
> > + struct napi_struct *napi;
> > +
> > + list_for_each_entry(napi, &dev->napi_list, dev_list)
> > + napi_set_threaded(napi, false);
> > +}
>
> This is an implementation detail which should be hidden in dev.c IMHO.
> Since the sysfs knob is just a device global on/off the iteration over
> napis and all is best hidden away.
OK. Will move it.
>
> (sorry about the delayed review BTW, hope we can do a minor revision
> and still hit 5.12)
>
> > +static int modify_napi_threaded(struct net_device *dev, unsigned long val)
> > +{
> > + struct napi_struct *napi;
> > + int ret;
> > +
> > + if (list_empty(&dev->napi_list))
> > + return -EOPNOTSUPP;
> > +
> > + list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > + ret = napi_set_threaded(napi, !!val);
> > + if (ret) {
> > + /* Error occurred on one of the napi,
> > + * reset threaded mode on all napi.
> > + */
> > + dev_disable_threaded_all(dev);
> > + break;
> > + }
> > + }
> > +
> > + return ret;
> > +}
^ permalink raw reply [flat|nested] 13+ messages in thread