* [RFC PATCH bpf-next 0/9] Introduce biased busy-polling
@ 2020-10-28 13:34 Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 1/9] net: introduce " Björn Töpel
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, bjorn.topel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
Jakub suggested in [1] a "strict busy-polling mode without
interrupts". This is a first stab at that.
This series adds a new NAPI mode, called biased busy-polling, which is
an extension to the existing busy-polling mode. The new mode is
enabled on the socket layer, where a socket setting this option
"promises" to busy-poll the NAPI context via a system call. When this
mode is enabled, the NAPI context will operate in a mode with
interrupts disabled. The kernel monitors that the busy-polling promise
is fulfilled by an internal watchdog. If the socket fails to perform,
or stops performing, the busy-polling, the mode will be disabled.
Biased busy-polling follows the same mechanism as the existing
busy-poll: the napi_id is reported to the socket via the skbuff. Later
commits will extend napi_id reporting to XDP, in order to work
correctly with XDP sockets.
Let us walk through a flow of execution:
1. A socket sets the new SO_BIAS_BUSY_POLL socket option to true. The
socket now shows an intent to busy-poll. No data has been
received on the socket yet, so the napi_id of the socket is still 0
(invalid). As usual for busy-polling, the SO_BUSY_POLL option
also has to be non-zero for biased busy-polling.
2. Data is received on the socket, changing the napi_id to non-zero.
3. The socket does a system call that has the busy-polling logic wired
up, e.g. recvfrom() for UDP sockets. The NAPI context is now marked
as biased busy-poll. The kernel watchdog is armed. If the NAPI
context is already running, it will try to finish as soon as
possible and move to busy-polling. If the NAPI context is not
running, it will execute the NAPI poll function for the
corresponding napi_id.
4. Goto 3, or wait until the watchdog timeout.
The series is outlined as follows:
Patch 1-2: Biased busy-polling, and an option to set the busy-poll budget.
Patch 3-6: Busy-poll plumbing for XDP sockets.
Patch 7-9: Add busy-polling support to the xdpsock sample.
Performance UDP sockets:
I hacked netperf to use non-blocking sockets and to loop over
recvfrom(). The following command line was used:
$ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
-o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
Non-blocking:
16,18.45,195,0.94,54070.369
Non-blocking with biased busy-polling:
15,16.59,38,0.70,60086.313
Performance XDP sockets:
Today, when the XDP sockets sample runs on the same core as the softirq
handling, performance tanks, mainly because we do not yield to
user-space when the XDP socket Rx queue is full.
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r
Rx: 64Kpps
# # biased busy-polling, budget 8
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8
Rx: 9.9Mpps
# # biased busy-polling, budget 64
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64
Rx: 19.3Mpps
# # biased busy-polling, budget 256
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256
Rx: 21.4Mpps
# # biased busy-polling, budget 512
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512
Rx: 21.4Mpps
Compared to the two-core case:
# taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r
Rx: 20.7Mpps
We're getting better performance with a single core than with two, for
this naïve drop scenario.
The above tests were done with the 'ice' driver.
Some outstanding questions:
* Does biased busy-polling make sense for non-XDP sockets? For a
dedicated queue, biased busy-polling has a strong case. When the
NAPI is shared with other sockets, it can affect the latencies of
sockets that were not explicitly busy-poll enabled. Note that this
is true for regular busy-polling as well, but the biased version is
stricter.
* Currently busy-polling for UDP/TCP is only wired up in the recvmsg()
path. Does it make sense to extend that to sendmsg() as well?
* Biased busy-polling only makes sense for non-blocking sockets. Reject
enabling of biased busy-polling unless the socket is non-blocking?
* The watchdog is 200 ms. Should it be configurable?
* Extending xdp_rxq_info_reg() with napi_id touches a lot of drivers,
and I've only verified the Intel ones. Some drivers initialize NAPI
(generating the napi_id) after the xdp_rxq_info_reg() call, which
might call for another API? I did not send this RFC to all
the driver authors. I'll do that for a proper patch series.
* Today, enabling busy-polling requires CAP_NET_ADMIN. For a NAPI
context that services multiple sockets, this makes sense because one
socket can affect the performance of other sockets. Now, for a
*dedicated* queue for, say, an XDP socket, would it be OK to drop
CAP_NET_ADMIN, because it cannot affect other sockets/users?
@Jakub Thanks for the early comments. I left the check in
napi_schedule_prep(), because I hit that case with the Intel i40e
driver when forcing busy-polling on a core outside the interrupt
affinity mask.
[1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Björn Töpel (9):
net: introduce biased busy-polling
net: add SO_BUSY_POLL_BUDGET socket option
xsk: add support for recvmsg()
xsk: check need wakeup flag in sendmsg()
xsk: add busy-poll support for {recv,send}msg()
xsk: propagate napi_id to XDP socket Rx path
samples/bpf: use recvfrom() in xdpsock
samples/bpf: add busy-poll support to xdpsock
samples/bpf: add option to set the busy-poll budget
arch/alpha/include/uapi/asm/socket.h | 3 +
arch/mips/include/uapi/asm/socket.h | 3 +
arch/parisc/include/uapi/asm/socket.h | 3 +
arch/sparc/include/uapi/asm/socket.h | 3 +
drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
.../ethernet/cavium/thunder/nicvf_queues.c | 2 +-
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/ice/ice_base.c | 4 +-
drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
.../ethernet/netronome/nfp/nfp_net_common.c | 2 +-
drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
drivers/net/ethernet/sfc/rx_common.c | 2 +-
drivers/net/ethernet/socionext/netsec.c | 2 +-
drivers/net/ethernet/ti/cpsw_priv.c | 2 +-
drivers/net/hyperv/netvsc.c | 2 +-
drivers/net/tun.c | 2 +-
drivers/net/veth.c | 2 +-
drivers/net/virtio_net.c | 2 +-
drivers/net/xen-netfront.c | 2 +-
fs/eventpoll.c | 3 +-
include/linux/netdevice.h | 33 +++---
include/net/busy_poll.h | 42 +++++--
include/net/sock.h | 4 +
include/net/xdp.h | 3 +-
include/uapi/asm-generic/socket.h | 3 +
net/core/dev.c | 111 +++++++++++++++---
net/core/sock.c | 19 +++
net/core/xdp.c | 3 +-
net/xdp/xsk.c | 36 +++++-
net/xdp/xsk_buff_pool.c | 13 +-
samples/bpf/xdpsock_user.c | 53 +++++++--
37 files changed, 296 insertions(+), 85 deletions(-)
base-commit: 3cb12d27ff655e57e8efe3486dca2a22f4e30578
--
2.27.0
* [RFC PATCH bpf-next 1/9] net: introduce biased busy-polling
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
This change adds a new NAPI mode, called biased busy-polling, which is
an extension to the existing busy-polling mode. The new mode is
enabled on the socket layer, where a socket setting this option
"promises" to busy-poll the NAPI context via a system call. When this
mode is enabled, the NAPI context will operate in a mode with
interrupts disabled. The kernel monitors that the busy-polling promise
is fulfilled by an internal watchdog. If the socket fails to perform,
or stops performing, the busy-polling, the mode will be disabled. The
watchdog timeout is currently 200 ms.
Biased busy-polling follows the same mechanism as the existing
busy-poll: the napi_id is reported to the socket via the skbuff. Later
commits will extend napi_id reporting to XDP, in order to work
correctly with XDP sockets.
Let us walk through a flow of execution:
1. A socket sets the new SO_BIAS_BUSY_POLL socket option to true. The
socket now shows an intent to busy-poll. No data has been
received on the socket yet, so the napi_id of the socket is still 0
(invalid). As usual for busy-polling, the SO_BUSY_POLL option
also has to be non-zero for biased busy-polling.
2. Data is received on the socket, changing the napi_id to non-zero.
3. The socket does a system call that has the busy-polling logic wired
up, e.g. recvfrom() for UDP sockets. The NAPI context is now marked
as biased busy-poll. The kernel watchdog is armed. If the NAPI
context is already running, it will try to finish as soon as
possible and move to busy-polling. If the NAPI context is not
running, it will execute the NAPI poll function for the
corresponding napi_id.
4. Goto 3, or wait until the watchdog timeout.
Given the nature of busy-polling, this mode only makes sense for
non-blocking sockets.
When the NAPI context is in biased busy-polling mode, it will not
allow a NAPI to be scheduled using the
napi_schedule_prep()/napi_scheduleXXX() combo.
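The interaction between the new state bit, napi_schedule_prep() and the
watchdog can be pictured with a small user-space model. This is not the
kernel code: the MODEL_* bits and model_* functions are hypothetical
stand-ins, C11 atomics replace cmpxchg(), and the disable check in the
watchdog is approximated from a single snapshot of the state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NAPIF_STATE_* bits. */
enum {
	MODEL_SCHED   = 1 << 0,
	MODEL_MISSED  = 1 << 1,
	MODEL_DISABLE = 1 << 2,
	MODEL_BIAS    = 1 << 3,
};

struct model_napi {
	_Atomic unsigned long state;
};

/* Models the patched napi_schedule_prep(): scheduling is refused while
 * the context is disabled or in biased busy-poll mode.
 */
static bool model_schedule_prep(struct model_napi *n)
{
	unsigned long val = atomic_load(&n->state);
	unsigned long new;

	do {
		if (val & (MODEL_DISABLE | MODEL_BIAS))
			return false;
		new = val | MODEL_SCHED;
		if (val & MODEL_SCHED)
			new |= MODEL_MISSED;
		/* On failure, val is reloaded with the current state. */
	} while (!atomic_compare_exchange_weak(&n->state, &val, new));

	return !(val & MODEL_SCHED);
}

/* Models napi_bias_busy_poll(): returns true if the bit was newly set,
 * which is when the watchdog would be armed.
 */
static bool model_bias_enable(struct model_napi *n)
{
	unsigned long old = atomic_fetch_or(&n->state, MODEL_BIAS);

	return !(old & MODEL_BIAS);
}

/* Models the watchdog callback: leave biased mode and fall back to
 * regular NAPI scheduling. Returns true if the caller should schedule.
 */
static bool model_watchdog_expire(struct model_napi *n)
{
	unsigned long old;

	old = atomic_fetch_and(&n->state, ~(unsigned long)MODEL_BIAS);
	if (!(old & MODEL_BIAS))
		return false;	/* the kernel code warns in this case */
	if (old & MODEL_DISABLE)
		return false;
	old = atomic_fetch_or(&n->state, MODEL_SCHED);
	return !(old & MODEL_SCHED);
}
```

In this model, a driver interrupt calling model_schedule_prep() while
MODEL_BIAS is set gets false back, and only the watchdog expiry hands
the context back to the regular scheduling path.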
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
include/linux/netdevice.h | 33 +++++-----
include/net/busy_poll.h | 17 ++++-
include/net/sock.h | 3 +
include/uapi/asm-generic/socket.h | 2 +
net/core/dev.c | 89 +++++++++++++++++++++++++--
net/core/sock.c | 9 +++
10 files changed, 140 insertions(+), 21 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index de6c4df61082..0f776668fb09 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -124,6 +124,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_BIAS_BUSY_POLL 69
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d0a9ed2ca2d6..d23984731504 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -135,6 +135,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_BIAS_BUSY_POLL 69
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 10173c32195e..49469713ed2a 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -116,6 +116,8 @@
#define SO_DETACH_REUSEPORT_BPF 0x4042
+#define SO_BIAS_BUSY_POLL 0x4043
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 8029b681fc7c..009aba6f7a54 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -117,6 +117,8 @@
#define SO_DETACH_REUSEPORT_BPF 0x0047
+#define SO_BIAS_BUSY_POLL 0x0048
+
#if !defined(__KERNEL__)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964b494b0e8d..9bdc84d3d6b8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -344,29 +344,32 @@ struct napi_struct {
struct list_head rx_list; /* Pending GRO_NORMAL skbs */
int rx_count; /* length of rx_list */
struct hrtimer timer;
+ struct hrtimer bp_watchdog;
struct list_head dev_list;
struct hlist_node napi_hash_node;
unsigned int napi_id;
};
enum {
- NAPI_STATE_SCHED, /* Poll is scheduled */
- NAPI_STATE_MISSED, /* reschedule a napi */
- NAPI_STATE_DISABLE, /* Disable pending */
- NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */
- NAPI_STATE_LISTED, /* NAPI added to system lists */
- NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
- NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+ NAPI_STATE_SCHED, /* Poll is scheduled */
+ NAPI_STATE_MISSED, /* reschedule a napi */
+ NAPI_STATE_DISABLE, /* Disable pending */
+ NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */
+ NAPI_STATE_LISTED, /* NAPI added to system lists */
+ NAPI_STATE_NO_BUSY_POLL, /* Do not add in napi_hash, no busy polling */
+ NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */
+ NAPI_STATE_BIAS_BUSY_POLL, /* biased busy-polling */
};
enum {
- NAPIF_STATE_SCHED = BIT(NAPI_STATE_SCHED),
- NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED),
- NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE),
- NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC),
- NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED),
- NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
- NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+ NAPIF_STATE_SCHED = BIT(NAPI_STATE_SCHED),
+ NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED),
+ NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE),
+ NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC),
+ NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED),
+ NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
+ NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+ NAPIF_STATE_BIAS_BUSY_POLL = BIT(NAPI_STATE_BIAS_BUSY_POLL),
};
enum gro_result {
@@ -555,6 +558,8 @@ static inline bool napi_if_scheduled_mark_missed(struct napi_struct *n)
return true;
}
+void napi_bias_busy_poll(unsigned int napi_id);
+
enum netdev_queue_state_t {
__QUEUE_STATE_DRV_XOFF,
__QUEUE_STATE_STACK_XOFF,
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index b001fa91c14e..9738923ed17b 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -23,6 +23,9 @@
*/
#define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))
+/* Biased busy-poll watchdog timeout in ms */
+#define BIASED_BUSY_POLL_TIMEOUT 200
+
#ifdef CONFIG_NET_RX_BUSY_POLL
struct napi_struct;
@@ -99,13 +102,25 @@ static inline bool sk_busy_loop_timeout(struct sock *sk,
return true;
}
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static inline void __sk_bias_busy_poll(struct sock *sk, unsigned int napi_id)
+{
+ if (likely(!READ_ONCE(sk->sk_bias_busy_poll)))
+ return;
+
+ napi_bias_busy_poll(napi_id);
+}
+#endif
+
static inline void sk_busy_loop(struct sock *sk, int nonblock)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
- if (napi_id >= MIN_NAPI_ID)
+ if (napi_id >= MIN_NAPI_ID) {
+ __sk_bias_busy_poll(sk, napi_id);
napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
+ }
#endif
}
diff --git a/include/net/sock.h b/include/net/sock.h
index a5c6ae78df77..cf71834fb601 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -479,6 +479,9 @@ struct sock {
u32 sk_ack_backlog;
u32 sk_max_ack_backlog;
kuid_t sk_uid;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ u8 sk_bias_busy_poll;
+#endif
struct pid *sk_peer_pid;
const struct cred *sk_peer_cred;
long sk_rcvtimeo;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 77f7c1638eb1..8a2b37ccd9d5 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -119,6 +119,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_BIAS_BUSY_POLL 69
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/dev.c b/net/core/dev.c
index 9499a414d67e..a29e4c4a35f6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6378,6 +6378,9 @@ bool napi_schedule_prep(struct napi_struct *n)
val = READ_ONCE(n->state);
if (unlikely(val & NAPIF_STATE_DISABLE))
return false;
+ if (unlikely(val & NAPIF_STATE_BIAS_BUSY_POLL))
+ return false;
+
new = val | NAPIF_STATE_SCHED;
/* Sets STATE_MISSED bit if STATE_SCHED was already set
@@ -6458,12 +6461,14 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
/* If STATE_MISSED was set, leave STATE_SCHED set,
* because we will call napi->poll() one more time.
- * This C code was suggested by Alexander Duyck to help gcc.
*/
- new |= (val & NAPIF_STATE_MISSED) / NAPIF_STATE_MISSED *
- NAPIF_STATE_SCHED;
+ if (val & NAPIF_STATE_MISSED && !(val & NAPIF_STATE_BIAS_BUSY_POLL))
+ new |= NAPIF_STATE_SCHED;
} while (cmpxchg(&n->state, val, new) != val);
+ if (unlikely(val & NAPIF_STATE_BIAS_BUSY_POLL))
+ return false;
+
if (unlikely(val & NAPIF_STATE_MISSED)) {
__napi_schedule(n);
return false;
@@ -6497,6 +6502,20 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
{
int rc;
+ clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
+
+ local_bh_disable();
+ /* If we're biased towards busy poll, clear the sched flags,
+ * so that we can enter again.
+ */
+ if (READ_ONCE(napi->state) & NAPIF_STATE_BIAS_BUSY_POLL) {
+ netpoll_poll_unlock(have_poll_lock);
+ napi_complete(napi);
+ __kfree_skb_flush();
+ local_bh_enable();
+ return;
+ }
+
/* Busy polling means there is a high chance device driver hard irq
* could not grab NAPI_STATE_SCHED, and that NAPI_STATE_MISSED was
* set in napi_schedule_prep().
@@ -6507,9 +6526,6 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
* to perform these two clear_bit()
*/
clear_bit(NAPI_STATE_MISSED, &napi->state);
- clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
-
- local_bh_disable();
/* All we really want here is to re-enable device interrupts.
* Ideally, a new ndo_busy_poll_stop() could avoid another round.
@@ -6569,6 +6585,11 @@ void napi_busy_loop(unsigned int napi_id,
goto count;
have_poll_lock = netpoll_poll_lock(napi);
napi_poll = napi->poll;
+ if (val & NAPIF_STATE_BIAS_BUSY_POLL) {
+ hrtimer_start(&napi->bp_watchdog,
+ ms_to_ktime(BIASED_BUSY_POLL_TIMEOUT),
+ HRTIMER_MODE_REL_PINNED);
+ }
}
work = napi_poll(napi, BUSY_POLL_BUDGET);
trace_napi_poll(napi, work, BUSY_POLL_BUDGET);
@@ -6652,6 +6673,53 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+static enum hrtimer_restart napi_biased_busy_poll_watchdog(struct hrtimer *timer)
+{
+ struct napi_struct *napi;
+ unsigned long val, new;
+
+ napi = container_of(timer, struct napi_struct, bp_watchdog);
+
+ do {
+ val = READ_ONCE(napi->state);
+ if (WARN_ON_ONCE(!(val & NAPIF_STATE_BIAS_BUSY_POLL)))
+ return HRTIMER_NORESTART;
+
+ new = val & ~NAPIF_STATE_BIAS_BUSY_POLL;
+ } while (cmpxchg(&napi->state, val, new) != val);
+
+ if (!napi_disable_pending(napi) &&
+ !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
+ __napi_schedule_irqoff(napi);
+
+ return HRTIMER_NORESTART;
+}
+
+void napi_bias_busy_poll(unsigned int napi_id)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ struct napi_struct *napi;
+ unsigned long val, new;
+
+ napi = napi_by_id(napi_id);
+ if (!napi)
+ return;
+
+ do {
+ val = READ_ONCE(napi->state);
+ if (val & NAPIF_STATE_BIAS_BUSY_POLL)
+ return;
+
+ new = val | NAPIF_STATE_BIAS_BUSY_POLL;
+ } while (cmpxchg(&napi->state, val, new) != val);
+
+ hrtimer_start(&napi->bp_watchdog, ms_to_ktime(BIASED_BUSY_POLL_TIMEOUT),
+ HRTIMER_MODE_REL_PINNED);
+#endif
+}
+EXPORT_SYMBOL(napi_bias_busy_poll);
+
+
static void init_gro_hash(struct napi_struct *napi)
{
int i;
@@ -6673,6 +6741,8 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
INIT_HLIST_NODE(&napi->napi_hash_node);
hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
napi->timer.function = napi_watchdog;
+ hrtimer_init(&napi->bp_watchdog, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+ napi->bp_watchdog.function = napi_biased_busy_poll_watchdog;
init_gro_hash(napi);
napi->skb = NULL;
INIT_LIST_HEAD(&napi->rx_list);
@@ -6704,7 +6774,9 @@ void napi_disable(struct napi_struct *n)
msleep(1);
hrtimer_cancel(&n->timer);
+ hrtimer_cancel(&n->bp_watchdog);
+ clear_bit(NAPI_STATE_BIAS_BUSY_POLL, &n->state);
clear_bit(NAPI_STATE_DISABLE, &n->state);
}
EXPORT_SYMBOL(napi_disable);
@@ -6767,6 +6839,11 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
if (likely(work < weight))
goto out_unlock;
+ if (unlikely(n->state & NAPIF_STATE_BIAS_BUSY_POLL)) {
+ napi_complete(n);
+ goto out_unlock;
+ }
+
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..686eb5549b79 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1159,6 +1159,12 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
sk->sk_ll_usec = val;
}
break;
+ case SO_BIAS_BUSY_POLL:
+ if (valbool && !capable(CAP_NET_ADMIN))
+ ret = -EPERM;
+ else
+ sk->sk_bias_busy_poll = valbool;
+ break;
#endif
case SO_MAX_PACING_RATE:
@@ -1523,6 +1529,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
case SO_BUSY_POLL:
v.val = sk->sk_ll_usec;
break;
+ case SO_BIAS_BUSY_POLL:
+ v.val = sk->sk_bias_busy_poll;
+ break;
#endif
case SO_MAX_PACING_RATE:
--
2.27.0
* [RFC PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
This option lets a user set a per-socket NAPI budget for
busy-polling. If the option is not set, the default of 8 is used.
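The budget handling can be summarized with the following user-space
model. It only mirrors the checks visible in the sock_setsockopt() hunk
and the fallback in sk_busy_loop(); the struct and function names are
hypothetical, and the capable(CAP_NET_ADMIN) check is replaced by a
plain boolean.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DEFAULT_BUDGET 8	/* BUSY_POLL_BUDGET in the patch */

/* Hypothetical stand-in for the new field in struct sock. */
struct model_sock {
	uint16_t busy_poll_budget;	/* 0 means "not set" */
};

/* Models the SO_BUSY_POLL_BUDGET case: raising the budget needs
 * CAP_NET_ADMIN, negative values are rejected. Note that the value is
 * stored in a 16-bit field, so larger values would be truncated.
 */
static int model_set_budget(struct model_sock *sk, int val, bool net_admin)
{
	if (val > sk->busy_poll_budget && !net_admin)
		return -EPERM;
	if (val < 0)
		return -EINVAL;
	sk->busy_poll_budget = (uint16_t)val;
	return 0;
}

/* Models "sk->sk_busy_poll_budget ?: BUSY_POLL_BUDGET". */
static int model_effective_budget(const struct model_sock *sk)
{
	return sk->busy_poll_budget ? sk->busy_poll_budget
				    : MODEL_DEFAULT_BUDGET;
}
```

One consequence of the model: since an unset budget is stored as 0, any
positive value counts as raising it, so the first call always needs
CAP_NET_ADMIN.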
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
arch/alpha/include/uapi/asm/socket.h | 1 +
arch/mips/include/uapi/asm/socket.h | 1 +
arch/parisc/include/uapi/asm/socket.h | 1 +
arch/sparc/include/uapi/asm/socket.h | 1 +
fs/eventpoll.c | 3 ++-
include/net/busy_poll.h | 6 ++++--
include/net/sock.h | 1 +
include/uapi/asm-generic/socket.h | 1 +
net/core/dev.c | 20 +++++++++-----------
net/core/sock.c | 10 ++++++++++
10 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 0f776668fb09..4ea972b7b711 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -125,6 +125,7 @@
#define SO_DETACH_REUSEPORT_BPF 68
#define SO_BIAS_BUSY_POLL 69
+#define SO_BUSY_POLL_BUDGET 70
#if !defined(__KERNEL__)
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d23984731504..13eaffbfbe50 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -136,6 +136,7 @@
#define SO_DETACH_REUSEPORT_BPF 68
#define SO_BIAS_BUSY_POLL 69
+#define SO_BUSY_POLL_BUDGET 70
#if !defined(__KERNEL__)
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 49469713ed2a..036e42dac6b3 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -117,6 +117,7 @@
#define SO_DETACH_REUSEPORT_BPF 0x4042
#define SO_BIAS_BUSY_POLL 0x4043
+#define SO_BUSY_POLL_BUDGET 0x4044
#if !defined(__KERNEL__)
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 009aba6f7a54..bc482dc93bd4 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -118,6 +118,7 @@
#define SO_DETACH_REUSEPORT_BPF 0x0047
#define SO_BIAS_BUSY_POLL 0x0048
+#define SO_BUSY_POLL_BUDGET 0x0049
#if !defined(__KERNEL__)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..fa00a0640264 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -397,7 +397,8 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock)
unsigned int napi_id = READ_ONCE(ep->napi_id);
if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
- napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep);
+ napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep,
+ BUSY_POLL_BUDGET);
}
static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 9738923ed17b..c6c413d3824d 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -25,6 +25,7 @@
/* Biased busy-poll watchdog timeout in ms */
#define BIASED_BUSY_POLL_TIMEOUT 200
+#define BUSY_POLL_BUDGET 8
#ifdef CONFIG_NET_RX_BUSY_POLL
@@ -46,7 +47,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg);
+ void *loop_end_arg, u16 budget);
#else /* CONFIG_NET_RX_BUSY_POLL */
static inline unsigned long net_busy_loop_on(void)
@@ -119,7 +120,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
if (napi_id >= MIN_NAPI_ID) {
__sk_bias_busy_poll(sk, napi_id);
- napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
+ napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
+ sk->sk_busy_poll_budget ?: BUSY_POLL_BUDGET);
}
#endif
}
diff --git a/include/net/sock.h b/include/net/sock.h
index cf71834fb601..3caf53b6bd71 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -481,6 +481,7 @@ struct sock {
kuid_t sk_uid;
#ifdef CONFIG_NET_RX_BUSY_POLL
u8 sk_bias_busy_poll;
+ u16 sk_busy_poll_budget;
#endif
struct pid *sk_peer_pid;
const struct cred *sk_peer_cred;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 8a2b37ccd9d5..9dc1f35fe77f 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -120,6 +120,7 @@
#define SO_DETACH_REUSEPORT_BPF 68
#define SO_BIAS_BUSY_POLL 69
+#define SO_BUSY_POLL_BUDGET 70
#if !defined(__KERNEL__)
diff --git a/net/core/dev.c b/net/core/dev.c
index a29e4c4a35f6..b34520acaa7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6496,9 +6496,7 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
#if defined(CONFIG_NET_RX_BUSY_POLL)
-#define BUSY_POLL_BUDGET 8
-
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, u16 budget)
{
int rc;
@@ -6530,14 +6528,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
/* All we really want here is to re-enable device interrupts.
* Ideally, a new ndo_busy_poll_stop() could avoid another round.
*/
- rc = napi->poll(napi, BUSY_POLL_BUDGET);
+ rc = napi->poll(napi, budget);
/* We can't gro_normal_list() here, because napi->poll() might have
* rearmed the napi (napi_complete_done()) in which case it could
* already be running on another CPU.
*/
- trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
+ trace_napi_poll(napi, rc, budget);
netpoll_poll_unlock(have_poll_lock);
- if (rc == BUSY_POLL_BUDGET) {
+ if (rc == budget) {
/* As the whole budget was spent, we still own the napi so can
* safely handle the rx_list.
*/
@@ -6549,7 +6547,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg)
+ void *loop_end_arg, u16 budget)
{
unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
int (*napi_poll)(struct napi_struct *napi, int budget);
@@ -6591,8 +6589,8 @@ void napi_busy_loop(unsigned int napi_id,
HRTIMER_MODE_REL_PINNED);
}
}
- work = napi_poll(napi, BUSY_POLL_BUDGET);
- trace_napi_poll(napi, work, BUSY_POLL_BUDGET);
+ work = napi_poll(napi, budget);
+ trace_napi_poll(napi, work, budget);
gro_normal_list(napi);
count:
if (work > 0)
@@ -6605,7 +6603,7 @@ void napi_busy_loop(unsigned int napi_id,
if (unlikely(need_resched())) {
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock);
+ busy_poll_stop(napi, have_poll_lock, budget);
preempt_enable();
rcu_read_unlock();
cond_resched();
@@ -6616,7 +6614,7 @@ void napi_busy_loop(unsigned int napi_id,
cpu_relax();
}
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock);
+ busy_poll_stop(napi, have_poll_lock, budget);
preempt_enable();
out:
rcu_read_unlock();
diff --git a/net/core/sock.c b/net/core/sock.c
index 686eb5549b79..799125de4add 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1165,6 +1165,16 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
else
sk->sk_bias_busy_poll = valbool;
break;
+ case SO_BUSY_POLL_BUDGET:
+ if ((val > sk->sk_busy_poll_budget) && !capable(CAP_NET_ADMIN))
+ ret = -EPERM;
+ else {
+ if (val < 0)
+ ret = -EINVAL;
+ else
+ sk->sk_busy_poll_budget = val;
+ }
+ break;
#endif
case SO_MAX_PACING_RATE:
--
2.27.0
* [RFC PATCH bpf-next 3/9] xsk: add support for recvmsg()
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Add support for non-blocking recvmsg() to XDP sockets. Previously,
only sendmsg() was supported by XDP sockets. Now, for symmetry and the
upcoming busy-polling support, recvmsg() is added.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b71a32eeae65..17d51d1a5752 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -474,6 +474,26 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
return __xsk_sendmsg(sk);
}
+static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
+{
+ bool need_wait = !(flags & MSG_DONTWAIT);
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+
+ if (unlikely(!(xs->dev->flags & IFF_UP)))
+ return -ENETDOWN;
+ if (unlikely(!xs->rx))
+ return -ENOBUFS;
+ if (unlikely(!xsk_is_bound(xs)))
+ return -ENXIO;
+ if (unlikely(need_wait))
+ return -EOPNOTSUPP;
+
+ if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
+ return xsk_wakeup(xs, XDP_WAKEUP_RX);
+ return 0;
+}
+
static __poll_t xsk_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait)
{
@@ -1134,7 +1154,7 @@ static const struct proto_ops xsk_proto_ops = {
.setsockopt = xsk_setsockopt,
.getsockopt = xsk_getsockopt,
.sendmsg = xsk_sendmsg,
- .recvmsg = sock_no_recvmsg,
+ .recvmsg = xsk_recvmsg,
.mmap = xsk_mmap,
.sendpage = sock_no_sendpage,
};
--
2.27.0
* [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg()
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Add a check for the need wakeup flag in sendmsg(), so that if a user
calls sendmsg() when no wakeup is needed, no wakeup is triggered.
To simplify the need wakeup check in the syscall, unconditionally
enable the need wakeup flag for Tx. This has a side effect for poll():
if poll() is called for a socket without need wakeup enabled, a Tx
wakeup is unconditionally performed.
The wakeup matrix for AF_XDP now looks like:
need wakeup | poll() | sendmsg() | recvmsg()
------------+------------------------+---------------------+---------------------
disabled | wake Tx | wake Tx | nop
enabled | check flag; wake Tx/Rx | check flag; wake Tx | check flag; wake Rx
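The matrix above can be restated as a small userspace model, which may
be easier to check against than the table. This is only a sketch: the
enum names, flag values and the wakeups_for() function are made up for
illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's XDP_WAKEUP_RX/TX are not used here. */
enum { WAKE_NONE = 0, WAKE_RX = 1 << 0, WAKE_TX = 1 << 1 };

enum xsk_call { CALL_POLL, CALL_SENDMSG, CALL_RECVMSG };

/*
 * Model of the table: returns the set of wakeups a call performs.
 * need_wakeup: whether the socket was bound with need wakeup enabled.
 * cached:      the flags currently requested (only consulted when enabled).
 */
static int wakeups_for(enum xsk_call call, bool need_wakeup, int cached)
{
	int may_wake = WAKE_NONE;

	switch (call) {
	case CALL_POLL:
		may_wake = WAKE_RX | WAKE_TX;
		break;
	case CALL_SENDMSG:
		may_wake = WAKE_TX;
		break;
	case CALL_RECVMSG:
		may_wake = WAKE_RX;
		break;
	}

	/* Disabled row: the Tx flag is always set, the Rx flag never is. */
	if (!need_wakeup)
		cached = WAKE_TX;

	return may_wake & cached;
}
```

The zero-copy condition that the recvmsg() path also checks (xs->zc) is
left out of the model to keep it aligned with the table.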
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 6 +++++-
net/xdp/xsk_buff_pool.c | 13 ++++++-------
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 17d51d1a5752..2e5b9f27c7a3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -465,13 +465,17 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
struct sock *sk = sock->sk;
struct xdp_sock *xs = xdp_sk(sk);
+ struct xsk_buff_pool *pool;
if (unlikely(!xsk_is_bound(xs)))
return -ENXIO;
if (unlikely(need_wait))
return -EOPNOTSUPP;
- return __xsk_sendmsg(sk);
+ pool = xs->pool;
+ if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
+ return __xsk_sendmsg(sk);
+ return 0;
}
static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 64c9e55d4d4e..a4acb5e9576f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -144,14 +144,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
if (err)
return err;
- if (flags & XDP_USE_NEED_WAKEUP) {
+ if (flags & XDP_USE_NEED_WAKEUP)
pool->uses_need_wakeup = true;
- /* Tx needs to be explicitly woken up the first time.
- * Also for supporting drivers that do not implement this
- * feature. They will always have to call sendto().
- */
- pool->cached_need_wakeup = XDP_WAKEUP_TX;
- }
+ /* Tx needs to be explicitly woken up the first time. Also
+ * for supporting drivers that do not implement this
+ * feature. They will always have to call sendto() or poll().
+ */
+ pool->cached_need_wakeup = XDP_WAKEUP_TX;
dev_hold(netdev);
--
2.27.0
* [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg()
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
` (3 preceding siblings ...)
2020-10-28 13:34 ` [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Wire up XDP socket busy-poll support for recvmsg() and sendmsg().
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2e5b9f27c7a3..da649b4f377c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -23,6 +23,7 @@
#include <linux/netdevice.h>
#include <linux/rculist.h>
#include <net/xdp_sock_drv.h>
+#include <net/busy_poll.h>
#include <net/xdp.h>
#include "xsk_queue.h"
@@ -472,6 +473,9 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
if (unlikely(need_wait))
return -EOPNOTSUPP;
+ if (sk_can_busy_loop(sk))
+ sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
pool = xs->pool;
if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
return __xsk_sendmsg(sk);
@@ -493,6 +497,9 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
if (unlikely(need_wait))
return -EOPNOTSUPP;
+ if (sk_can_busy_loop(sk))
+ sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
return xsk_wakeup(xs, XDP_WAKEUP_RX);
return 0;
--
2.27.0
* [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
` (4 preceding siblings ...)
2020-10-28 13:34 ` [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
TODO: Only verified for the Intel drivers. I'll reach out to the
driver authors for a potential non-RFC version.
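To make the propagation concrete, here is a simplified userspace model
of the path: registration stores the NAPI id with the Rx queue, and the
Rx path latches it into the socket the first time it sees a non-zero
value. All model_* names are hypothetical stand-ins, not the kernel
structures, and C11 atomics are used here in place of
READ_ONCE()/WRITE_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_rxq {
	unsigned int napi_id;
};

struct model_xdp_buff {
	const struct model_rxq *rxq;
};

struct model_sock {
	_Atomic unsigned int napi_id;
};

/* Registration records the NAPI id with the queue; 0 means "not known". */
static void model_rxq_reg(struct model_rxq *rxq, unsigned int napi_id)
{
	rxq->napi_id = napi_id;
}

/*
 * Rx path: copy the id only if the socket has none yet. As with the
 * load/store pair in the patch, this is not a compare-and-swap; it only
 * avoids rewriting an id that is already set.
 */
static void model_mark_napi_id_once(struct model_sock *sk,
				    const struct model_xdp_buff *xdp)
{
	if (!atomic_load_explicit(&sk->napi_id, memory_order_relaxed))
		atomic_store_explicit(&sk->napi_id, xdp->rxq->napi_id,
				      memory_order_relaxed);
}
```

In this model a queue registered with napi_id 0, as most non-Intel
drivers are in this RFC, leaves the socket's id at 0, so busy-polling
would not find a NAPI context for it.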
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
.../ethernet/cavium/thunder/nicvf_queues.c | 2 +-
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/ice/ice_base.c | 4 ++--
drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
.../ethernet/netronome/nfp/nfp_net_common.c | 2 +-
drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
drivers/net/ethernet/sfc/rx_common.c | 2 +-
drivers/net/ethernet/socionext/netsec.c | 2 +-
drivers/net/ethernet/ti/cpsw_priv.c | 2 +-
drivers/net/hyperv/netvsc.c | 2 +-
drivers/net/tun.c | 2 +-
drivers/net/veth.c | 2 +-
drivers/net/virtio_net.c | 2 +-
drivers/net/xen-netfront.c | 2 +-
include/net/busy_poll.h | 19 +++++++++++++++----
include/net/xdp.h | 3 ++-
net/core/dev.c | 2 +-
net/core/xdp.c | 3 ++-
net/xdp/xsk.c | 1 +
26 files changed, 44 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index e8131dadc22c..6ad59f0068f6 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -416,7 +416,7 @@ static int ena_xdp_register_rxq_info(struct ena_ring *rx_ring)
{
int rc;
- rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid);
+ rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid, 0);
if (rc) {
netif_err(rx_ring->adapter, ifup, rx_ring->netdev,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index fa147865e33f..5df13387ab74 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2894,7 +2894,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
if (rc)
return rc;
- rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
+ rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
if (rc < 0)
return rc;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index 7a141ce32e86..f782e6af45e9 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -770,7 +770,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
rq->caching = 1;
/* Driver have no proper error path for failed XDP RX-queue info reg */
- WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);
+ WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx, 0) < 0);
/* Send a mailbox msg to PF to config RQ */
mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index cf9400a9886d..40953980e846 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -3334,7 +3334,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv,
return 0;
err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
- fq->flowid);
+ fq->flowid, 0);
if (err) {
dev_err(dev, "xdp_rxq_info_reg failed\n");
return err;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index d43ce13a93c9..a3d5bdaca2f5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1436,7 +1436,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
/* XDP RX-queue info only needed for RX rings exposed to XDP */
if (rx_ring->vsi->type == I40E_VSI_MAIN) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
- rx_ring->queue_index);
+ rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
if (err < 0)
return err;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index fe4320e2d1f2..3124a3bf519a 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -306,7 +306,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
/* coverity[check_return] */
xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
- ring->q_index);
+ ring->q_index, ring->q_vector->napi.napi_id);
ring->xsk_pool = ice_xsk_pool(ring);
if (ring->xsk_pool) {
@@ -333,7 +333,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
/* coverity[check_return] */
xdp_rxq_info_reg(&ring->xdp_rxq,
ring->netdev,
- ring->q_index);
+ ring->q_index, ring->q_vector->napi.napi_id);
err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
MEM_TYPE_PAGE_SHARED,
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index eae75260fe20..77d5eae6b4c2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -483,7 +483,7 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
if (rx_ring->vsi->type == ICE_VSI_PF &&
!xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
- rx_ring->q_index))
+ rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
goto err;
return 0;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 45ae33e15303..50e6b8b6ba7b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6577,7 +6577,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
- rx_ring->queue_index) < 0)
+ rx_ring->queue_index, rx_ring->q_vector->napi.napi_id) < 0)
goto err;
rx_ring->xdp_prog = adapter->xdp_prog;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 54b0bf574c05..7d0098f4ef9d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3219,7 +3219,7 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
return err;
}
- err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
+ err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
if (err < 0)
goto err_free_pp;
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index f6616c8933ca..ff8729b6c414 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2606,11 +2606,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);
if (priv->percpu_pools) {
- err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id);
+ err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0);
if (err < 0)
goto err_free_dma;
- err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id);
+ err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0);
if (err < 0)
goto err_unregister_rxq_short;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 502d1b97855c..f561979e5731 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -283,7 +283,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->log_stride = ffs(ring->stride) - 1;
ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
- if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
+ if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
goto err_ring;
tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b150da43adb2..b4acf2f41e84 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2533,7 +2533,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
if (dp->netdev) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
- rx_ring->idx);
+ rx_ring->idx, rx_ring->r_vec->napi.napi_id);
if (err < 0)
return err;
}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 05e3a3b60269..b73e95329acd 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1762,7 +1762,7 @@ static void qede_init_fp(struct qede_dev *edev)
/* Driver have no error path from here */
WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
- fp->rxq->rxq_id) < 0);
+ fp->rxq->rxq_id, 0) < 0);
if (xdp_rxq_info_reg_mem_model(&fp->rxq->xdp_rxq,
MEM_TYPE_PAGE_ORDER0,
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 19cf7cac1e6e..68fc7d317693 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -262,7 +262,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
/* Initialise XDP queue information */
rc = xdp_rxq_info_reg(&rx_queue->xdp_rxq_info, efx->net_dev,
- rx_queue->core_index);
+ rx_queue->core_index, 0);
if (rc) {
netif_err(efx, rx_err, efx->net_dev,
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 1503cc9ec6e2..80ab24658e87 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1304,7 +1304,7 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
goto err_out;
}
- err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
+ err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0, 0);
if (err)
goto err_out;
diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
index 51cc29f39038..d8f287c88d77 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.c
+++ b/drivers/net/ethernet/ti/cpsw_priv.c
@@ -1189,7 +1189,7 @@ static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
pool = cpsw->page_pool[ch];
rxq = &priv->xdp_rxq[ch];
- ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
+ ret = xdp_rxq_info_reg(rxq, priv->ndev, ch, 0);
if (ret)
return ret;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0c3de94b5178..fa8341f8359a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1499,7 +1499,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
u64_stats_init(&nvchan->tx_stats.syncp);
u64_stats_init(&nvchan->rx_stats.syncp);
- ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
+ ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i, 0);
if (ret) {
netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index be69d272052f..f2541d645707 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -791,7 +791,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
} else {
/* Setup XDP RX-queue info, for new tfile getting attached */
err = xdp_rxq_info_reg(&tfile->xdp_rxq,
- tun->dev, tfile->queue_index);
+ tun->dev, tfile->queue_index, 0);
if (err < 0)
goto out;
err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c737668008a..04d20e9d8431 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -926,7 +926,7 @@ static int veth_enable_xdp(struct net_device *dev)
for (i = 0; i < dev->real_num_rx_queues; i++) {
struct veth_rq *rq = &priv->rq[i];
- err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
+ err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, 0);
if (err < 0)
goto err_rxq_reg;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 21b71148c532..d71fe41595b7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1485,7 +1485,7 @@ static int virtnet_open(struct net_device *dev)
if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
- err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
+ err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, 0);
if (err < 0)
return err;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3e9895bec15f..28714a48f5d0 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -2014,7 +2014,7 @@ static int xennet_create_page_pool(struct netfront_queue *queue)
}
err = xdp_rxq_info_reg(&queue->xdp_rxq, queue->info->netdev,
- queue->id);
+ queue->id, 0);
if (err) {
netdev_err(queue->info->netdev, "xdp_rxq_info_reg failed\n");
goto err_free_pp;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c6c413d3824d..262f60065355 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -148,14 +148,25 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
sk_rx_queue_set(sk, skb);
}
-/* variant used for unconnected sockets */
-static inline void sk_mark_napi_id_once(struct sock *sk,
- const struct sk_buff *skb)
+static inline void __sk_mark_napi_id_once_xdp(struct sock *sk, unsigned int napi_id)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
if (!READ_ONCE(sk->sk_napi_id))
- WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
+ WRITE_ONCE(sk->sk_napi_id, napi_id);
#endif
}
+/* variant used for unconnected sockets */
+static inline void sk_mark_napi_id_once(struct sock *sk,
+ const struct sk_buff *skb)
+{
+ __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
+}
+
+static inline void sk_mark_napi_id_once_xdp(struct sock *sk,
+ const struct xdp_buff *xdp)
+{
+ __sk_mark_napi_id_once_xdp(sk, xdp->rxq->napi_id);
+}
+
#endif /* _LINUX_NET_BUSY_POLL_H */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3814fb631d52..4d4255a94773 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -59,6 +59,7 @@ struct xdp_rxq_info {
u32 queue_index;
u32 reg_state;
struct xdp_mem_info mem;
+ unsigned int napi_id;
} ____cacheline_aligned; /* perf critical, avoid false-sharing */
struct xdp_txq_info {
@@ -211,7 +212,7 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
}
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
- struct net_device *dev, u32 queue_index);
+ struct net_device *dev, u32 queue_index, unsigned int napi_id);
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
diff --git a/net/core/dev.c b/net/core/dev.c
index b34520acaa7f..ad3261be5e21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9834,7 +9834,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
rx[i].dev = dev;
/* XDP RX-queue setup */
- err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
+ err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
if (err < 0)
goto err_rxq_info;
}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 48aba933a5a8..7cca7cb5b65f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -158,7 +158,7 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
/* Returns 0 on success, negative on failure */
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
- struct net_device *dev, u32 queue_index)
+ struct net_device *dev, u32 queue_index, unsigned int napi_id)
{
if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
WARN(1, "Driver promised not to register this");
@@ -179,6 +179,7 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
xdp_rxq_info_init(xdp_rxq);
xdp_rxq->dev = dev;
xdp_rxq->queue_index = queue_index;
+ xdp_rxq->napi_id = napi_id;
xdp_rxq->reg_state = REG_STATE_REGISTERED;
return 0;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index da649b4f377c..0b825612d895 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -233,6 +233,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
return -EINVAL;
+ sk_mark_napi_id_once_xdp(&xs->sk, xdp);
len = xdp->data_end - xdp->data;
return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
--
2.27.0
* [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
` (5 preceding siblings ...)
2020-10-28 13:34 ` [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Start using recvfrom() in the rxdrop scenario.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
samples/bpf/xdpsock_user.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 1149e94ca32f..96d0b6482ac4 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1172,7 +1172,7 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
}
}
-static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
+static void rx_drop(struct xsk_socket_info *xsk)
{
unsigned int rcvd, i;
u32 idx_rx = 0, idx_fq = 0;
@@ -1182,7 +1182,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
if (!rcvd) {
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
- ret = poll(fds, num_socks, opt_timeout);
+ recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
return;
}
@@ -1193,7 +1193,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
exit_with_error(-ret);
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.fill_fail_polls++;
- ret = poll(fds, num_socks, opt_timeout);
+ recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
}
@@ -1235,7 +1235,7 @@ static void rx_drop_all(void)
}
for (i = 0; i < num_socks; i++)
- rx_drop(xsks[i], fds);
+ rx_drop(xsks[i]);
if (benchmark_done)
break;
--
2.27.0
* [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
` (6 preceding siblings ...)
2020-10-28 13:34 ` [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
2020-10-28 14:13 ` [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Eric Dumazet
9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Add a new option to xdpsock, 'B', for busy-polling. With this option,
the batching size ('b' option) is also used as the busy-poll budget.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
samples/bpf/xdpsock_user.c | 40 +++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 96d0b6482ac4..7ef2c01a1094 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -95,6 +95,7 @@ static int opt_timeout = 1000;
static bool opt_need_wakeup = true;
static u32 opt_num_xsks = 1;
static u32 prog_id;
+static bool opt_busy_poll;
struct xsk_ring_stats {
unsigned long rx_npkts;
@@ -911,6 +912,7 @@ static struct option long_options[] = {
{"quiet", no_argument, 0, 'Q'},
{"app-stats", no_argument, 0, 'a'},
{"irq-string", no_argument, 0, 'I'},
+ {"busy-poll", no_argument, 0, 'B'},
{0, 0, 0, 0}
};
@@ -949,6 +951,7 @@ static void usage(const char *prog)
" -Q, --quiet Do not display any stats.\n"
" -a, --app-stats Display application (syscall) statistics.\n"
" -I, --irq-string Display driver interrupt statistics for interface associated with irq-string.\n"
+ " -B, --busy-poll Busy poll.\n"
"\n";
fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE,
opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE,
@@ -964,7 +967,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
- c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:",
+ c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:B",
long_options, &option_index);
if (c == -1)
break;
@@ -1062,7 +1065,9 @@ static void parse_command_line(int argc, char **argv)
fprintf(stderr, "ERROR: Failed to get irqs for %s\n", opt_irq_str);
usage(basename(argv[0]));
}
-
+ break;
+ case 'B':
+ opt_busy_poll = 1;
break;
default:
usage(basename(argv[0]));
@@ -1132,7 +1137,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
- if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&umem->fq)) {
xsk->app_stats.fill_fail_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
@@ -1180,7 +1185,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
if (!rcvd) {
- if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
@@ -1191,7 +1196,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
- if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.fill_fail_polls++;
recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
@@ -1342,7 +1347,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
if (!rcvd) {
- if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
@@ -1354,7 +1359,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
if (ret < 0)
exit_with_error(-ret);
complete_tx_l2fwd(xsk, fds);
- if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->tx)) {
xsk->app_stats.tx_wakeup_sendtos++;
kick_tx(xsk);
}
@@ -1461,6 +1466,24 @@ static void enter_xsks_into_map(struct bpf_object *obj)
}
}
+static void apply_setsockopt(struct xsk_socket_info *xsk)
+{
+ int sock_opt;
+
+ if (!opt_busy_poll)
+ return;
+
+ sock_opt = 1;
+ if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BIAS_BUSY_POLL,
+ (void *)&sock_opt, sizeof(sock_opt)) < 0)
+ exit_with_error(errno);
+
+ sock_opt = 20; // randomly picked :-P
+ if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
+ (void *)&sock_opt, sizeof(sock_opt)) < 0)
+ exit_with_error(errno);
+}
+
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -1502,6 +1525,9 @@ int main(int argc, char **argv)
for (i = 0; i < opt_num_xsks; i++)
xsks[num_socks++] = xsk_configure_socket(umem, rx, tx);
+ for (i = 0; i < opt_num_xsks; i++)
+ apply_setsockopt(xsks[i]);
+
if (opt_bench == BENCH_TXONLY) {
gen_eth_hdr_data();
--
2.27.0
* [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
` (7 preceding siblings ...)
2020-10-28 13:34 ` [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
2020-10-28 14:13 ` [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Eric Dumazet
9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon
From: Björn Töpel <bjorn.topel@intel.com>
Add support for the SO_BUSY_POLL_BUDGET setsockopt, set via the
batching option ('b').
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
samples/bpf/xdpsock_user.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7ef2c01a1094..948faada96d5 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1482,6 +1482,11 @@ static void apply_setsockopt(struct xsk_socket_info *xsk)
if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
(void *)&sock_opt, sizeof(sock_opt)) < 0)
exit_with_error(errno);
+
+ sock_opt = opt_batch_size;
+ if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL_BUDGET,
+ (void *)&sock_opt, sizeof(sock_opt)) < 0)
+ exit_with_error(errno);
}
int main(int argc, char **argv)
--
2.27.0
* Re: [RFC PATCH bpf-next 0/9] Introduce biased busy-polling
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
` (8 preceding siblings ...)
2020-10-28 13:34 ` [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
@ 2020-10-28 14:13 ` Eric Dumazet
9 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2020-10-28 14:13 UTC (permalink / raw)
To: Björn Töpel
Cc: netdev, bpf, Björn Töpel, magnus.karlsson,
Alexei Starovoitov, Daniel Borkmann, maciej.fijalkowski,
Samudrala, Sridhar, Jesse Brandeburg, qi.z.zhang, Jakub Kicinski,
intel-wired-lan, Jonathan Lemon
On Wed, Oct 28, 2020 at 2:35 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> Jakub suggested in [1] a "strict busy-polling mode without
> interrupts". This is a first stab at that.
>
> This series adds a new NAPI mode, called biased busy-polling, which is
> an extension to the existing busy-polling mode. The new mode is
> enabled on the socket layer, where a socket setting this option
> "promises" to busy-poll the NAPI context via a system call. When this
> mode is enabled, the NAPI context will operate in a mode with
> interrupts disabled. The kernel monitors that the busy-polling promise
> is fulfilled by an internal watchdog. If the socket fails or stops
> performing the busy-polling, the mode will be disabled.
>
> Biased busy-polling follows the same mechanism as the existing
> busy-poll: the napi_id is reported to the socket via the skbuff. Later
> commits will extend napi_id reporting to XDP, in order to work
> correctly with XDP sockets.
>
> Let us walk through a flow of execution:
>
> 1. A socket sets the new SO_BIAS_BUSY_POLL socket option to true. The
> socket now shows an intent of doing busy-polling. No data has been
> received to the socket, so the napi_id of the socket is still 0
> (non-valid). As usual for busy-polling, the SO_BUSY_POLL option
> also has to be non-zero for biased busy-polling.
>
> 2. Data is received on the socket changing the napi_id to non-zero.
>
> 3. The socket does a system call that has the busy-polling logic wired
> up, e.g. recvfrom() for UDP sockets. The NAPI context is now marked
> as biased busy-poll. The kernel watchdog is armed. If the NAPI
> context is already running, it will try to finish as soon as
> possible and move to busy-polling. If the NAPI context is not
> running, it will execute the NAPI poll function for the
> corresponding napi_id.
>
> 4. Goto 3, or wait until the watchdog timeout.
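
Seen from user space, the flow above might look roughly like the sketch
below. It is only an illustration under stated assumptions:
request_busy_poll() and busy_recv() are hypothetical helpers, MSG_DONTWAIT
stands in for a socket opened as non-blocking, and the new SO_BIAS_BUSY_POLL
option is left out because its numeric value is not given in this message
(it would be set with the same kind of setsockopt() call).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumption: fallback for older libc headers (asm-generic value). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif

/* Step 1 (partial): SO_BUSY_POLL has to be non-zero for biased
 * busy-polling. Returns 0 on success or a negative errno.
 */
static int request_busy_poll(int fd, int usecs)
{
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &usecs, sizeof(usecs)) < 0)
		return -errno;
	return 0;
}

/* Steps 3-4: keep calling recvfrom() without blocking until data
 * arrives or max_tries attempts have been made. Returns the number of
 * bytes received, or a negative errno (-EAGAIN if nothing arrived).
 */
static ssize_t busy_recv(int fd, void *buf, size_t len,
			 unsigned long max_tries)
{
	unsigned long i;

	for (i = 0; i < max_tries; i++) {
		ssize_t n = recvfrom(fd, buf, len, MSG_DONTWAIT, NULL, NULL);

		if (n >= 0)
			return n;
		if (errno != EAGAIN && errno != EINTR)
			return -errno;
	}
	return -EAGAIN;
}
```

Each recvfrom() call is the system call of step 3; if the application stops
making such calls, the watchdog described above is what ends the mode.
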
>
> The series is outlined as follows:
> Patch 1-2: Biased busy-polling, and option to set busy-poll budget.
> Patch 3-6: Busy-poll plumbing for XDP sockets
> Patch 7-9: Add busy-polling support to the xdpsock sample
>
> Performance UDP sockets:
>
> I hacked netperf to use non-blocking sockets, and looping over
> recvfrom(). The following command-line was used:
> $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
> -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
>
> Non-blocking:
> 16,18.45,195,0.94,54070.369
> Non-blocking with biased busy-polling:
> 15,16.59,38,0.70,60086.313
>
But a fair comparison should be done using the current busy-polling mode,
which does not require netperf to use non-blocking mode in the first place?

Would disabling/rearming interrupts about 60,000 times per second
bring any benefit?

Additional questions:

- What happens to the gro_flush_timeout and accumulated TCP segments
  in the GRO engine while biased busy-polling is in use?

- What mechanism would avoid a potential 200 ms latency when the
  application wants to exit cleanly?
  Presumably when/if SO_BIAS_BUSY_POLL is used to clear
  sk->sk_bias_busy_poll we need to make sure device interrupts are
  re-enabled.

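As a rough cross-check of the figures discussed here (arithmetic only, not a
new measurement): the quoted transaction rates correspond to about 18.5 us
and 16.6 us per transaction, close to the reported mean latencies, and a
200 ms watchdog period spans roughly twelve thousand transactions at the
biased busy-polling rate. The helper names below are hypothetical.

```c
/* Mean time per transaction in microseconds for a given rate. */
static double usecs_per_transaction(double transactions_per_sec)
{
	return 1e6 / transactions_per_sec;
}

/* Number of transactions that complete within one watchdog period. */
static double transactions_per_watchdog(double transactions_per_sec,
					double watchdog_ms)
{
	return transactions_per_sec * watchdog_ms / 1e3;
}
```

For example, usecs_per_transaction(60086.313) is the per-transaction time for
the biased busy-polling run, and transactions_per_watchdog(60086.313, 200.0)
is the number of transactions within one watchdog period.
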
> Performance XDP sockets:
>
> Today, running XDP sockets sample on the same core as the softirq
> handling, performance tanks mainly because we do not yield to
> user-space when the XDP socket Rx queue is full.
> # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r
> Rx: 64Kpps
>
> # # biased busy-polling, budget 8
> # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8
> Rx: 9.9Mpps
> # # biased busy-polling, budget 64
> # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64
> Rx: 19.3Mpps
> # # biased busy-polling, budget 256
> # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256
> Rx: 21.4Mpps
> # # biased busy-polling, budget 512
> # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512
> Rx: 21.4Mpps
>
> Compared to the two-core case:
> # taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r
> Rx: 20.7Mpps
>
> We're getting better single-core performance than two, for this naïve
> drop scenario.
>
> The above tests were done for the 'ice' driver.
>
> Some outstanding questions:
>
> * Does biased busy-polling make sense for non-XDP sockets? For a
> dedicated queue, biased busy-polling has a strong case. When the
> NAPI is shared with other sockets, it can affect the latencies of
> sockets that were not explicitly busy-poll enabled. Note that this is
> true for regular busy-polling as well, but the biased version is
> stricter.
>
> * Currently busy-polling for UDP/TCP is only wired up in the recvmsg()
> path. Does it make sense to extend that to sendmsg() as well?
>
> * Biased busy-polling only makes sense for non-blocking sockets. Reject
> enabling of biased busy-polling unless the socket is non-blocking?
>
> * The watchdog is 200 ms. Should it be configurable?
>
> * Extending xdp_rxq_info_reg() with napi_id touches a lot of drivers,
> and I've only verified the Intel ones. Some drivers initialize NAPI
> (generating the napi_id) after the xdp_rxq_info_reg() call, which
> maybe would open up for another API? I did not send this RFC to all
> the driver authors. I'll do that for a proper patch series.
>
> * Today, enabling busy-polling requires CAP_NET_ADMIN. For a NAPI
> context that services multiple sockets, this makes sense because one
> socket can affect performance of other sockets. Now, for a
> *dedicated* queue for say XDP socket, would it be OK to drop
> CAP_NET_ADMIN, because it cannot affect other sockets/users?
>
> @Jakub Thanks for the early comments. I left the check in
> napi_schedule_prep(), because I hit that for the Intel i40e driver;
> forcing busy-polling on a core outside the interrupt affinity mask.
>
> [1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
>
> Björn Töpel (9):
> net: introduce biased busy-polling
> net: add SO_BUSY_POLL_BUDGET socket option
> xsk: add support for recvmsg()
> xsk: check need wakeup flag in sendmsg()
> xsk: add busy-poll support for {recv,send}msg()
> xsk: propagate napi_id to XDP socket Rx path
> samples/bpf: use recvfrom() in xdpsock
> samples/bpf: add busy-poll support to xdpsock
> samples/bpf: add option to set the busy-poll budget
>
> arch/alpha/include/uapi/asm/socket.h | 3 +
> arch/mips/include/uapi/asm/socket.h | 3 +
> arch/parisc/include/uapi/asm/socket.h | 3 +
> arch/sparc/include/uapi/asm/socket.h | 3 +
> drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> .../ethernet/cavium/thunder/nicvf_queues.c | 2 +-
> .../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +-
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
> drivers/net/ethernet/intel/ice/ice_base.c | 4 +-
> drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
> drivers/net/ethernet/marvell/mvneta.c | 2 +-
> .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 +-
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> .../ethernet/netronome/nfp/nfp_net_common.c | 2 +-
> drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
> drivers/net/ethernet/sfc/rx_common.c | 2 +-
> drivers/net/ethernet/socionext/netsec.c | 2 +-
> drivers/net/ethernet/ti/cpsw_priv.c | 2 +-
> drivers/net/hyperv/netvsc.c | 2 +-
> drivers/net/tun.c | 2 +-
> drivers/net/veth.c | 2 +-
> drivers/net/virtio_net.c | 2 +-
> drivers/net/xen-netfront.c | 2 +-
> fs/eventpoll.c | 3 +-
> include/linux/netdevice.h | 33 +++---
> include/net/busy_poll.h | 42 +++++--
> include/net/sock.h | 4 +
> include/net/xdp.h | 3 +-
> include/uapi/asm-generic/socket.h | 3 +
> net/core/dev.c | 111 +++++++++++++++---
> net/core/sock.c | 19 +++
> net/core/xdp.c | 3 +-
> net/xdp/xsk.c | 36 +++++-
> net/xdp/xsk_buff_pool.c | 13 +-
> samples/bpf/xdpsock_user.c | 53 +++++++--
> 37 files changed, 296 insertions(+), 85 deletions(-)
>
>
> base-commit: 3cb12d27ff655e57e8efe3486dca2a22f4e30578
> --
> 2.27.0
>
Thread overview: 11+ messages
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 1/9] net: introduce " Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 3/9] xsk: add support for recvmsg() Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
2020-10-28 14:13 ` [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Eric Dumazet