bpf.vger.kernel.org archive mirror
* [RFC PATCH bpf-next 0/9] Introduce biased busy-polling
@ 2020-10-28 13:34 Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 1/9] net: introduce " Björn Töpel
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, bjorn.topel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

Jakub suggested in [1] a "strict busy-polling mode without
interrupts". This is a first stab at that.

This series adds a new NAPI mode, called biased busy-polling, which is
an extension to the existing busy-polling mode. The new mode is
enabled on the socket layer, where a socket setting this option
"promises" to busy-poll the NAPI context via a system call. When this
mode is enabled, the NAPI context will operate with interrupts
disabled. The kernel monitors that the busy-polling promise is
fulfilled by an internal watchdog. If the socket fails to keep the
promise, i.e. stops busy-polling, the mode is disabled.

Biased busy-polling follows the same mechanism as the existing
busy-polling: the napi_id is reported to the socket via the skbuff.
Later commits extend napi_id reporting to XDP, so that it also works
correctly with XDP sockets.
    
Let us walk through a flow of execution; a userspace sketch follows
the steps:
    
1. A socket sets the new SO_BIAS_BUSY_POLL socket option to true. The
   socket now shows an intent of doing busy-polling. No data has been
   received on the socket, so the napi_id of the socket is still 0
   (invalid). As usual for busy-polling, the SO_BUSY_POLL option
   also has to be non-zero for biased busy-polling.

2. Data is received on the socket changing the napi_id to non-zero.

3. The socket does a system call that has the busy-polling logic wired
   up, e.g. recvfrom() for UDP sockets. The NAPI context is now marked
   as biased busy-poll. The kernel watchdog is armed. If the NAPI
   context is already running, it will try to finish as soon as
   possible and move to busy-polling. If the NAPI context is not
   running, it will execute the NAPI poll function for the
   corresponding napi_id.

4. Goto 3, or wait until the watchdog timeout.
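
To make the walkthrough concrete, below is a minimal, untested
userspace sketch for a UDP socket. The SO_BIAS_BUSY_POLL/SO_BUSY_POLL
numbers are the asm-generic values from this series and today's
kernel; the port, buffer size and 50 usec busy-poll time are arbitrary
examples, and setting the options requires CAP_NET_ADMIN.

  /* gcc -O2 -o bp_udp bp_udp.c ; run with CAP_NET_ADMIN */
  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdlib.h>
  #include <arpa/inet.h>
  #include <sys/socket.h>
  #include <unistd.h>

  #ifndef SO_BUSY_POLL
  #define SO_BUSY_POLL 46
  #endif
  #ifndef SO_BIAS_BUSY_POLL
  #define SO_BIAS_BUSY_POLL 69    /* asm-generic value from patch 1 */
  #endif

  int main(void)
  {
          int one = 1, usecs = 50;        /* example busy-poll time */
          struct sockaddr_in addr = {
                  .sin_family = AF_INET,
                  .sin_port = htons(7777),
                  .sin_addr.s_addr = htonl(INADDR_ANY),
          };
          char buf[2048];
          int fd;

          fd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK, 0);
          if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)))
                  exit(1);

          /* Step 1: state the busy-poll intent; both options are needed. */
          setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
          setsockopt(fd, SOL_SOCKET, SO_BIAS_BUSY_POLL, &one, sizeof(one));

          /* Steps 2-4: keep the promise by looping over recvfrom(). */
          for (;;) {
                  ssize_t n = recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT,
                                       NULL, NULL);
                  if (n < 0 && errno != EAGAIN)
                          break;
                  /* ... process n bytes, or spin again on EAGAIN ... */
          }
          close(fd);
          return 0;
  }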

The series is outlined as follows:
  Patch 1-2: Biased busy-polling, and option to set busy-poll budget.
  Patch 3-6: Busy-poll plumbing for XDP sockets
  Patch 7-9: Add busy-polling support to the xdpsock sample

Performance UDP sockets:

I hacked netperf to use non-blocking sockets, looping over
recvfrom(). The following command-line was used:
  $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
      -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate

                                      min   mean   max  stddev  trans/s
Non-blocking:                          16  18.45   195    0.94  54070.369
Non-blocking, biased busy-polling:     15  16.59    38    0.70  60086.313

Performance XDP sockets:

Today, when running the XDP sockets sample on the same core as the
softirq handling, performance tanks mainly because we do not yield to
user-space when the XDP socket Rx queue is full.
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r
  Rx: 64Kpps
  
  # # biased busy-polling, budget 8
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8
  Rx: 9.9Mpps
  # # biased busy-polling, budget 64
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64
  Rx: 19.3Mpps
  # # biased busy-polling, budget 256
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256
  Rx: 21.4Mpps
  # # biased busy-polling, budget 512
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512
  Rx: 21.4Mpps

Compared to the two-core case:
  # taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r
  Rx: 20.7Mpps

We are getting better single-core performance than the two-core case,
for this naïve drop scenario.

The above tests were done with the 'ice' driver.

Some outstanding questions:

* Does biased busy-polling make sense for non-XDP sockets? For a
  dedicated queue, biased busy-polling has a strong case. When the
  NAPI context is shared with other sockets, it can affect the
  latencies of sockets that were not explicitly busy-poll enabled.
  Note that this is true for regular busy-polling as well, but the
  biased version is stricter.

* Currently busy-polling for UDP/TCP is only wired up in the recvmsg()
  path. Does it make sense to extend that to sendmsg() as well?

* Biased busy-polling only makes sense for non-blocking sockets. Reject
  enabling of biased busy-polling unless the socket is non-blocking?

* The watchdog is 200 ms. Should it be configurable?

* Extending xdp_rxq_info_reg() with napi_id touches a lot of drivers,
  and I've only verified the Intel ones. Some drivers initialize NAPI
  (generating the napi_id) after the xdp_rxq_info_reg() call, which
  might call for another API? I did not send this RFC to all the
  driver authors. I'll do that for a proper patch series.

* Today, enabling busy-polling requires CAP_NET_ADMIN. For a NAPI
  context that services multiple sockets, this makes sense because one
  socket can affect the performance of other sockets. Now, for a
  *dedicated* queue for, say, an XDP socket, would it be OK to drop
  the CAP_NET_ADMIN requirement, because it cannot affect other
  sockets/users?

@Jakub Thanks for the early comments. I left the check in
napi_schedule_prep(), because I hit it with the Intel i40e driver when
forcing busy-polling on a core outside the interrupt affinity mask.

[1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

Björn Töpel (9):
  net: introduce biased busy-polling
  net: add SO_BUSY_POLL_BUDGET socket option
  xsk: add support for recvmsg()
  xsk: check need wakeup flag in sendmsg()
  xsk: add busy-poll support for {recv,send}msg()
  xsk: propagate napi_id to XDP socket Rx path
  samples/bpf: use recvfrom() in xdpsock
  samples/bpf: add busy-poll support to xdpsock
  samples/bpf: add option to set the busy-poll budget

 arch/alpha/include/uapi/asm/socket.h          |   3 +
 arch/mips/include/uapi/asm/socket.h           |   3 +
 arch/parisc/include/uapi/asm/socket.h         |   3 +
 arch/sparc/include/uapi/asm/socket.h          |   3 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +-
 .../ethernet/cavium/thunder/nicvf_queues.c    |   2 +-
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   2 +-
 drivers/net/ethernet/intel/ice/ice_base.c     |   4 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |   2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   2 +-
 drivers/net/ethernet/marvell/mvneta.c         |   2 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |   2 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |   2 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c  |   2 +-
 drivers/net/ethernet/sfc/rx_common.c          |   2 +-
 drivers/net/ethernet/socionext/netsec.c       |   2 +-
 drivers/net/ethernet/ti/cpsw_priv.c           |   2 +-
 drivers/net/hyperv/netvsc.c                   |   2 +-
 drivers/net/tun.c                             |   2 +-
 drivers/net/veth.c                            |   2 +-
 drivers/net/virtio_net.c                      |   2 +-
 drivers/net/xen-netfront.c                    |   2 +-
 fs/eventpoll.c                                |   3 +-
 include/linux/netdevice.h                     |  33 +++---
 include/net/busy_poll.h                       |  42 +++++--
 include/net/sock.h                            |   4 +
 include/net/xdp.h                             |   3 +-
 include/uapi/asm-generic/socket.h             |   3 +
 net/core/dev.c                                | 111 +++++++++++++++---
 net/core/sock.c                               |  19 +++
 net/core/xdp.c                                |   3 +-
 net/xdp/xsk.c                                 |  36 +++++-
 net/xdp/xsk_buff_pool.c                       |  13 +-
 samples/bpf/xdpsock_user.c                    |  53 +++++++--
 37 files changed, 296 insertions(+), 85 deletions(-)


base-commit: 3cb12d27ff655e57e8efe3486dca2a22f4e30578
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 1/9] net: introduce biased busy-polling
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

This change adds a new NAPI mode, called biased busy-polling, which is
an extension to the existing busy-polling mode. The new mode is
enabled on the socket layer, where a socket setting this option
"promises" to busy-poll the NAPI context via a system call. When this
mode is enabled, the NAPI context will operate with interrupts
disabled. The kernel monitors that the busy-polling promise is
fulfilled by an internal watchdog. If the socket fails to keep the
promise, i.e. stops busy-polling, the mode is disabled. The watchdog
timeout is currently 200 ms.

Biased busy-polling follows the same mechanism as the existing
busy-polling: the napi_id is reported to the socket via the skbuff.
Later commits extend napi_id reporting to XDP, so that it also works
correctly with XDP sockets.

Let us walk through a flow of execution:

1. A socket sets the new SO_BIAS_BUSY_POLL socket option to true. The
   socket now shows an intent of doing busy-polling. No data has been
   received on the socket, so the napi_id of the socket is still 0
   (invalid). As usual for busy-polling, the SO_BUSY_POLL option
   also has to be non-zero for biased busy-polling.

2. Data is received on the socket changing the napi_id to non-zero.

3. The socket does a system call that has the busy-polling logic wired
   up, e.g. recvfrom() for UDP sockets. The NAPI context is now marked
   as biased busy-poll. The kernel watchdog is armed. If the NAPI
   context is already running, it will try to finish as soon as
   possible and move to busy-polling. If the NAPI context is not
   running, it will execute the NAPI poll function for the
   corresponding napi_id.

4. Goto 3, or wait until the watchdog timeout.

Given the nature of busy-polling, this mode only makes sense for
non-blocking sockets.

When the NAPI context is in biased busy-polling mode, it will not
allow a NAPI to be scheduled using the
napi_schedule_prep()/napi_scheduleXXX() combo.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  2 +
 arch/mips/include/uapi/asm/socket.h   |  2 +
 arch/parisc/include/uapi/asm/socket.h |  2 +
 arch/sparc/include/uapi/asm/socket.h  |  2 +
 include/linux/netdevice.h             | 33 +++++-----
 include/net/busy_poll.h               | 17 ++++-
 include/net/sock.h                    |  3 +
 include/uapi/asm-generic/socket.h     |  2 +
 net/core/dev.c                        | 89 +++++++++++++++++++++++++--
 net/core/sock.c                       |  9 +++
 10 files changed, 140 insertions(+), 21 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index de6c4df61082..0f776668fb09 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -124,6 +124,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 68
 
+#define SO_BIAS_BUSY_POLL	69
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d0a9ed2ca2d6..d23984731504 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -135,6 +135,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 68
 
+#define SO_BIAS_BUSY_POLL	69
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 10173c32195e..49469713ed2a 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -116,6 +116,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 0x4042
 
+#define SO_BIAS_BUSY_POLL	0x4043
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 8029b681fc7c..009aba6f7a54 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -117,6 +117,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF  0x0047
 
+#define SO_BIAS_BUSY_POLL	 0x0048
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964b494b0e8d..9bdc84d3d6b8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -344,29 +344,32 @@ struct napi_struct {
 	struct list_head	rx_list; /* Pending GRO_NORMAL skbs */
 	int			rx_count; /* length of rx_list */
 	struct hrtimer		timer;
+	struct hrtimer		bp_watchdog;
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
 };
 
 enum {
-	NAPI_STATE_SCHED,	/* Poll is scheduled */
-	NAPI_STATE_MISSED,	/* reschedule a napi */
-	NAPI_STATE_DISABLE,	/* Disable pending */
-	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
-	NAPI_STATE_LISTED,	/* NAPI added to system lists */
-	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
-	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+	NAPI_STATE_SCHED,		/* Poll is scheduled */
+	NAPI_STATE_MISSED,		/* reschedule a napi */
+	NAPI_STATE_DISABLE,		/* Disable pending */
+	NAPI_STATE_NPSVC,		/* Netpoll - don't dequeue from poll_list */
+	NAPI_STATE_LISTED,		/* NAPI added to system lists */
+	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
+	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
+	NAPI_STATE_BIAS_BUSY_POLL,	/* biased busy-polling */
 };
 
 enum {
-	NAPIF_STATE_SCHED	 = BIT(NAPI_STATE_SCHED),
-	NAPIF_STATE_MISSED	 = BIT(NAPI_STATE_MISSED),
-	NAPIF_STATE_DISABLE	 = BIT(NAPI_STATE_DISABLE),
-	NAPIF_STATE_NPSVC	 = BIT(NAPI_STATE_NPSVC),
-	NAPIF_STATE_LISTED	 = BIT(NAPI_STATE_LISTED),
-	NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
-	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+	NAPIF_STATE_SCHED	   = BIT(NAPI_STATE_SCHED),
+	NAPIF_STATE_MISSED	   = BIT(NAPI_STATE_MISSED),
+	NAPIF_STATE_DISABLE	   = BIT(NAPI_STATE_DISABLE),
+	NAPIF_STATE_NPSVC	   = BIT(NAPI_STATE_NPSVC),
+	NAPIF_STATE_LISTED	   = BIT(NAPI_STATE_LISTED),
+	NAPIF_STATE_NO_BUSY_POLL   = BIT(NAPI_STATE_NO_BUSY_POLL),
+	NAPIF_STATE_IN_BUSY_POLL   = BIT(NAPI_STATE_IN_BUSY_POLL),
+	NAPIF_STATE_BIAS_BUSY_POLL = BIT(NAPI_STATE_BIAS_BUSY_POLL),
 };
 
 enum gro_result {
@@ -555,6 +558,8 @@ static inline bool napi_if_scheduled_mark_missed(struct napi_struct *n)
 	return true;
 }
 
+void napi_bias_busy_poll(unsigned int napi_id);
+
 enum netdev_queue_state_t {
 	__QUEUE_STATE_DRV_XOFF,
 	__QUEUE_STATE_STACK_XOFF,
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index b001fa91c14e..9738923ed17b 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -23,6 +23,9 @@
  */
 #define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))
 
+/* Biased busy-poll watchdog timeout in ms */
+#define BIASED_BUSY_POLL_TIMEOUT 200
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 
 struct napi_struct;
@@ -99,13 +102,25 @@ static inline bool sk_busy_loop_timeout(struct sock *sk,
 	return true;
 }
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static inline void __sk_bias_busy_poll(struct sock *sk, unsigned int napi_id)
+{
+	if (likely(!READ_ONCE(sk->sk_bias_busy_poll)))
+		return;
+
+	napi_bias_busy_poll(napi_id);
+}
+#endif
+
 static inline void sk_busy_loop(struct sock *sk, int nonblock)
 {
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
 
-	if (napi_id >= MIN_NAPI_ID)
+	if (napi_id >= MIN_NAPI_ID) {
+		__sk_bias_busy_poll(sk, napi_id);
 		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
+	}
 #endif
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index a5c6ae78df77..cf71834fb601 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -479,6 +479,9 @@ struct sock {
 	u32			sk_ack_backlog;
 	u32			sk_max_ack_backlog;
 	kuid_t			sk_uid;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	u8			sk_bias_busy_poll;
+#endif
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 77f7c1638eb1..8a2b37ccd9d5 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -119,6 +119,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 68
 
+#define SO_BIAS_BUSY_POLL	69
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/dev.c b/net/core/dev.c
index 9499a414d67e..a29e4c4a35f6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6378,6 +6378,9 @@ bool napi_schedule_prep(struct napi_struct *n)
 		val = READ_ONCE(n->state);
 		if (unlikely(val & NAPIF_STATE_DISABLE))
 			return false;
+		if (unlikely(val & NAPIF_STATE_BIAS_BUSY_POLL))
+			return false;
+
 		new = val | NAPIF_STATE_SCHED;
 
 		/* Sets STATE_MISSED bit if STATE_SCHED was already set
@@ -6458,12 +6461,14 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 
 		/* If STATE_MISSED was set, leave STATE_SCHED set,
 		 * because we will call napi->poll() one more time.
-		 * This C code was suggested by Alexander Duyck to help gcc.
 		 */
-		new |= (val & NAPIF_STATE_MISSED) / NAPIF_STATE_MISSED *
-						    NAPIF_STATE_SCHED;
+		if (val & NAPIF_STATE_MISSED && !(val & NAPIF_STATE_BIAS_BUSY_POLL))
+			new |= NAPIF_STATE_SCHED;
 	} while (cmpxchg(&n->state, val, new) != val);
 
+	if (unlikely(val & NAPIF_STATE_BIAS_BUSY_POLL))
+		return false;
+
 	if (unlikely(val & NAPIF_STATE_MISSED)) {
 		__napi_schedule(n);
 		return false;
@@ -6497,6 +6502,20 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 {
 	int rc;
 
+	clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
+
+	local_bh_disable();
+	/* If we're biased towards busy poll, clear the sched flags,
+	 * so that we can enter again.
+	 */
+	if (READ_ONCE(napi->state) & NAPIF_STATE_BIAS_BUSY_POLL) {
+		netpoll_poll_unlock(have_poll_lock);
+		napi_complete(napi);
+		__kfree_skb_flush();
+		local_bh_enable();
+		return;
+	}
+
 	/* Busy polling means there is a high chance device driver hard irq
 	 * could not grab NAPI_STATE_SCHED, and that NAPI_STATE_MISSED was
 	 * set in napi_schedule_prep().
@@ -6507,9 +6526,6 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 	 * to perform these two clear_bit()
 	 */
 	clear_bit(NAPI_STATE_MISSED, &napi->state);
-	clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
-
-	local_bh_disable();
 
 	/* All we really want here is to re-enable device interrupts.
 	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
@@ -6569,6 +6585,11 @@ void napi_busy_loop(unsigned int napi_id,
 				goto count;
 			have_poll_lock = netpoll_poll_lock(napi);
 			napi_poll = napi->poll;
+			if (val & NAPIF_STATE_BIAS_BUSY_POLL) {
+				hrtimer_start(&napi->bp_watchdog,
+					      ms_to_ktime(BIASED_BUSY_POLL_TIMEOUT),
+					      HRTIMER_MODE_REL_PINNED);
+			}
 		}
 		work = napi_poll(napi, BUSY_POLL_BUDGET);
 		trace_napi_poll(napi, work, BUSY_POLL_BUDGET);
@@ -6652,6 +6673,53 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+static enum hrtimer_restart napi_biased_busy_poll_watchdog(struct hrtimer *timer)
+{
+	struct napi_struct *napi;
+	unsigned long val, new;
+
+	napi = container_of(timer, struct napi_struct, bp_watchdog);
+
+	do {
+		val = READ_ONCE(napi->state);
+		if (WARN_ON_ONCE(!(val & NAPIF_STATE_BIAS_BUSY_POLL)))
+			return HRTIMER_NORESTART;
+
+		new = val & ~NAPIF_STATE_BIAS_BUSY_POLL;
+	} while (cmpxchg(&napi->state, val, new) != val);
+
+	if (!napi_disable_pending(napi) &&
+	    !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
+		__napi_schedule_irqoff(napi);
+
+	return HRTIMER_NORESTART;
+}
+
+void napi_bias_busy_poll(unsigned int napi_id)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	struct napi_struct *napi;
+	unsigned long val, new;
+
+	napi = napi_by_id(napi_id);
+	if (!napi)
+		return;
+
+	do {
+		val = READ_ONCE(napi->state);
+		if (val & NAPIF_STATE_BIAS_BUSY_POLL)
+			return;
+
+		new = val | NAPIF_STATE_BIAS_BUSY_POLL;
+	} while (cmpxchg(&napi->state, val, new) != val);
+
+	hrtimer_start(&napi->bp_watchdog, ms_to_ktime(BIASED_BUSY_POLL_TIMEOUT),
+		      HRTIMER_MODE_REL_PINNED);
+#endif
+}
+EXPORT_SYMBOL(napi_bias_busy_poll);
+
+
 static void init_gro_hash(struct napi_struct *napi)
 {
 	int i;
@@ -6673,6 +6741,8 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 	INIT_HLIST_NODE(&napi->napi_hash_node);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
+	hrtimer_init(&napi->bp_watchdog, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	napi->bp_watchdog.function = napi_biased_busy_poll_watchdog;
 	init_gro_hash(napi);
 	napi->skb = NULL;
 	INIT_LIST_HEAD(&napi->rx_list);
@@ -6704,7 +6774,9 @@ void napi_disable(struct napi_struct *n)
 		msleep(1);
 
 	hrtimer_cancel(&n->timer);
+	hrtimer_cancel(&n->bp_watchdog);
 
+	clear_bit(NAPI_STATE_BIAS_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
@@ -6767,6 +6839,11 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	if (likely(work < weight))
 		goto out_unlock;
 
+	if (unlikely(n->state & NAPIF_STATE_BIAS_BUSY_POLL)) {
+		napi_complete(n);
+		goto out_unlock;
+	}
+
 	/* Drivers must not modify the NAPI state if they
 	 * consume the entire weight.  In such cases this code
 	 * still "owns" the NAPI instance and therefore can
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..686eb5549b79 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1159,6 +1159,12 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 				sk->sk_ll_usec = val;
 		}
 		break;
+	case SO_BIAS_BUSY_POLL:
+		if (valbool && !capable(CAP_NET_ADMIN))
+			ret = -EPERM;
+		else
+			sk->sk_bias_busy_poll = valbool;
+		break;
 #endif
 
 	case SO_MAX_PACING_RATE:
@@ -1523,6 +1529,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 	case SO_BUSY_POLL:
 		v.val = sk->sk_ll_usec;
 		break;
+	case SO_BIAS_BUSY_POLL:
+		v.val = sk->sk_bias_busy_poll;
+		break;
 #endif
 
 	case SO_MAX_PACING_RATE:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 1/9] net: introduce " Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 3/9] xsk: add support for recvmsg() Björn Töpel
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

This option lets a user set a per-socket NAPI budget for
busy-polling. If the option is not set, the default of 8 is used.
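
As a hedged illustration (not part of the patch), raising the budget
from userspace could look like the sketch below; the value 70 is the
asm-generic number added here, and raising the budget above its
current value requires CAP_NET_ADMIN per the sock.c hunk further down.

  #include <sys/socket.h>

  #ifndef SO_BUSY_POLL_BUDGET
  #define SO_BUSY_POLL_BUDGET 70  /* asm-generic value from this patch */
  #endif

  /* Ask for up to "budget" packets per busy-poll iteration on this
   * socket. Raising the budget needs CAP_NET_ADMIN; e.g. budget = 64.
   */
  static int set_busy_poll_budget(int fd, int budget)
  {
          return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                            &budget, sizeof(budget));
  }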

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  1 +
 arch/mips/include/uapi/asm/socket.h   |  1 +
 arch/parisc/include/uapi/asm/socket.h |  1 +
 arch/sparc/include/uapi/asm/socket.h  |  1 +
 fs/eventpoll.c                        |  3 ++-
 include/net/busy_poll.h               |  6 ++++--
 include/net/sock.h                    |  1 +
 include/uapi/asm-generic/socket.h     |  1 +
 net/core/dev.c                        | 20 +++++++++-----------
 net/core/sock.c                       | 10 ++++++++++
 10 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 0f776668fb09..4ea972b7b711 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -125,6 +125,7 @@
 #define SO_DETACH_REUSEPORT_BPF 68
 
 #define SO_BIAS_BUSY_POLL	69
+#define SO_BUSY_POLL_BUDGET	70
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d23984731504..13eaffbfbe50 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -136,6 +136,7 @@
 #define SO_DETACH_REUSEPORT_BPF 68
 
 #define SO_BIAS_BUSY_POLL	69
+#define SO_BUSY_POLL_BUDGET	70
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 49469713ed2a..036e42dac6b3 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -117,6 +117,7 @@
 #define SO_DETACH_REUSEPORT_BPF 0x4042
 
 #define SO_BIAS_BUSY_POLL	0x4043
+#define SO_BUSY_POLL_BUDGET	0x4044
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 009aba6f7a54..bc482dc93bd4 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -118,6 +118,7 @@
 #define SO_DETACH_REUSEPORT_BPF  0x0047
 
 #define SO_BIAS_BUSY_POLL	 0x0048
+#define SO_BUSY_POLL_BUDGET	 0x0049
 
 #if !defined(__KERNEL__)
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..fa00a0640264 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -397,7 +397,8 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock)
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
 	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
-		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep);
+		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep,
+			       BUSY_POLL_BUDGET);
 }
 
 static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 9738923ed17b..c6c413d3824d 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -25,6 +25,7 @@
 
 /* Biased busy-poll watchdog timeout in ms */
 #define BIASED_BUSY_POLL_TIMEOUT 200
+#define BUSY_POLL_BUDGET 8
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 
@@ -46,7 +47,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
 
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
-		    void *loop_end_arg);
+		    void *loop_end_arg, u16 budget);
 
 #else /* CONFIG_NET_RX_BUSY_POLL */
 static inline unsigned long net_busy_loop_on(void)
@@ -119,7 +120,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 
 	if (napi_id >= MIN_NAPI_ID) {
 		__sk_bias_busy_poll(sk, napi_id);
-		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
+		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
+			       sk->sk_busy_poll_budget ?: BUSY_POLL_BUDGET);
 	}
 #endif
 }
diff --git a/include/net/sock.h b/include/net/sock.h
index cf71834fb601..3caf53b6bd71 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -481,6 +481,7 @@ struct sock {
 	kuid_t			sk_uid;
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	u8			sk_bias_busy_poll;
+	u16			sk_busy_poll_budget;
 #endif
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 8a2b37ccd9d5..9dc1f35fe77f 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -120,6 +120,7 @@
 #define SO_DETACH_REUSEPORT_BPF 68
 
 #define SO_BIAS_BUSY_POLL	69
+#define SO_BUSY_POLL_BUDGET	70
 
 #if !defined(__KERNEL__)
 
diff --git a/net/core/dev.c b/net/core/dev.c
index a29e4c4a35f6..b34520acaa7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6496,9 +6496,7 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 
 #if defined(CONFIG_NET_RX_BUSY_POLL)
 
-#define BUSY_POLL_BUDGET 8
-
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, u16 budget)
 {
 	int rc;
 
@@ -6530,14 +6528,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 	/* All we really want here is to re-enable device interrupts.
 	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
 	 */
-	rc = napi->poll(napi, BUSY_POLL_BUDGET);
+	rc = napi->poll(napi, budget);
 	/* We can't gro_normal_list() here, because napi->poll() might have
 	 * rearmed the napi (napi_complete_done()) in which case it could
 	 * already be running on another CPU.
 	 */
-	trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
+	trace_napi_poll(napi, rc, budget);
 	netpoll_poll_unlock(have_poll_lock);
-	if (rc == BUSY_POLL_BUDGET) {
+	if (rc == budget) {
 		/* As the whole budget was spent, we still own the napi so can
 		 * safely handle the rx_list.
 		 */
@@ -6549,7 +6547,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
-		    void *loop_end_arg)
+		    void *loop_end_arg, u16 budget)
 {
 	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
 	int (*napi_poll)(struct napi_struct *napi, int budget);
@@ -6591,8 +6589,8 @@ void napi_busy_loop(unsigned int napi_id,
 					      HRTIMER_MODE_REL_PINNED);
 			}
 		}
-		work = napi_poll(napi, BUSY_POLL_BUDGET);
-		trace_napi_poll(napi, work, BUSY_POLL_BUDGET);
+		work = napi_poll(napi, budget);
+		trace_napi_poll(napi, work, budget);
 		gro_normal_list(napi);
 count:
 		if (work > 0)
@@ -6605,7 +6603,7 @@ void napi_busy_loop(unsigned int napi_id,
 
 		if (unlikely(need_resched())) {
 			if (napi_poll)
-				busy_poll_stop(napi, have_poll_lock);
+				busy_poll_stop(napi, have_poll_lock, budget);
 			preempt_enable();
 			rcu_read_unlock();
 			cond_resched();
@@ -6616,7 +6614,7 @@ void napi_busy_loop(unsigned int napi_id,
 		cpu_relax();
 	}
 	if (napi_poll)
-		busy_poll_stop(napi, have_poll_lock);
+		busy_poll_stop(napi, have_poll_lock, budget);
 	preempt_enable();
 out:
 	rcu_read_unlock();
diff --git a/net/core/sock.c b/net/core/sock.c
index 686eb5549b79..799125de4add 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1165,6 +1165,16 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		else
 			sk->sk_bias_busy_poll = valbool;
 		break;
+	case SO_BUSY_POLL_BUDGET:
+		if ((val > sk->sk_busy_poll_budget) && !capable(CAP_NET_ADMIN))
+			ret = -EPERM;
+		else {
+			if (val < 0)
+				ret = -EINVAL;
+			else
+				sk->sk_busy_poll_budget = val;
+		}
+		break;
 #endif
 
 	case SO_MAX_PACING_RATE:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 3/9] xsk: add support for recvmsg()
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 1/9] net: introduce " Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Add support for non-blocking recvmsg() to XDP sockets. Previously,
only sendmsg() was supported by XDP sockets. Now, for symmetry and the
upcoming busy-polling support, recvmsg() is added.
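
As a rough userspace sketch (not from the patch), a zero-length
non-blocking receive can now be used to drive the Rx path of an AF_XDP
socket; recvfrom() ends up in the same xsk_recvmsg() handler.

  #include <sys/socket.h>

  /* xsk_fd is a bound AF_XDP socket. A zero-length MSG_DONTWAIT receive
   * copies no data; it just kicks the driver Rx path (and, with later
   * patches in this series, busy-polls the NAPI context).
   */
  static void kick_rx(int xsk_fd)
  {
          recvfrom(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);
  }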

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b71a32eeae65..17d51d1a5752 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -474,6 +474,26 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	return __xsk_sendmsg(sk);
 }
 
+static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
+{
+	bool need_wait = !(flags & MSG_DONTWAIT);
+	struct sock *sk = sock->sk;
+	struct xdp_sock *xs = xdp_sk(sk);
+
+	if (unlikely(!(xs->dev->flags & IFF_UP)))
+		return -ENETDOWN;
+	if (unlikely(!xs->rx))
+		return -ENOBUFS;
+	if (unlikely(!xsk_is_bound(xs)))
+		return -ENXIO;
+	if (unlikely(need_wait))
+		return -EOPNOTSUPP;
+
+	if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
+		return xsk_wakeup(xs, XDP_WAKEUP_RX);
+	return 0;
+}
+
 static __poll_t xsk_poll(struct file *file, struct socket *sock,
 			     struct poll_table_struct *wait)
 {
@@ -1134,7 +1154,7 @@ static const struct proto_ops xsk_proto_ops = {
 	.setsockopt	= xsk_setsockopt,
 	.getsockopt	= xsk_getsockopt,
 	.sendmsg	= xsk_sendmsg,
-	.recvmsg	= sock_no_recvmsg,
+	.recvmsg	= xsk_recvmsg,
 	.mmap		= xsk_mmap,
 	.sendpage	= sock_no_sendpage,
 };
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg()
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (2 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 3/9] xsk: add support for recvmsg() Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Add a check of the need-wakeup flag in sendmsg(), so that a call to
sendmsg() when no wakeup is needed does not trigger a wakeup.

To simplify the need-wakeup check in the syscall, unconditionally
enable the need-wakeup flag for Tx. This has a side-effect for poll():
if poll() is called for a socket without need-wakeup enabled, a Tx
wakeup is unconditionally performed.

The wakeup matrix for AF_XDP now looks like:

need wakeup | poll()                 | sendmsg()           | recvmsg()
------------+------------------------+---------------------+---------------------
disabled    | wake Tx                | wake Tx             | nop
enabled     | check flag; wake Tx/Rx | check flag; wake Tx | check flag; wake Rx
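
The userspace counterpart of the matrix above, using the libbpf xsk
helpers that the xdpsock sample already relies on, might look like
this sketch (illustrative only):

  #include <sys/socket.h>
  #include <bpf/xsk.h>

  /* Kick Tx only when the kernel asked for a wakeup. With this patch
   * the Tx need-wakeup flag is always set, so this also covers drivers
   * that do not implement the need-wakeup feature.
   */
  static void kick_tx(struct xsk_socket *xsk, struct xsk_ring_prod *tx)
  {
          if (xsk_ring_prod__needs_wakeup(tx))
                  sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT,
                         NULL, 0);
  }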

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c           |  6 +++++-
 net/xdp/xsk_buff_pool.c | 13 ++++++-------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 17d51d1a5752..2e5b9f27c7a3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -465,13 +465,17 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
+	struct xsk_buff_pool *pool;
 
 	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;
 
-	return __xsk_sendmsg(sk);
+	pool = xs->pool;
+	if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
+		return __xsk_sendmsg(sk);
+	return 0;
 }
 
 static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 64c9e55d4d4e..a4acb5e9576f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -144,14 +144,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 	if (err)
 		return err;
 
-	if (flags & XDP_USE_NEED_WAKEUP) {
+	if (flags & XDP_USE_NEED_WAKEUP)
 		pool->uses_need_wakeup = true;
-		/* Tx needs to be explicitly woken up the first time.
-		 * Also for supporting drivers that do not implement this
-		 * feature. They will always have to call sendto().
-		 */
-		pool->cached_need_wakeup = XDP_WAKEUP_TX;
-	}
+	/* Tx needs to be explicitly woken up the first time.  Also
+	 * for supporting drivers that do not implement this
+	 * feature. They will always have to call sendto() or poll().
+	 */
+	pool->cached_need_wakeup = XDP_WAKEUP_TX;
 
 	dev_hold(netdev);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg()
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (3 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Wire-up XDP socket busy-poll support for recvmsg() and sendmsg().
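
A hedged sketch (not part of the patch) of how an application could
opt an AF_XDP socket into biased busy-polling after this change; the
socket option numbers come from patches 1 and 2, and the values passed
are arbitrary examples.

  #include <sys/socket.h>

  #ifndef SO_BUSY_POLL
  #define SO_BUSY_POLL            46
  #endif
  #ifndef SO_BIAS_BUSY_POLL
  #define SO_BIAS_BUSY_POLL       69
  #endif
  #ifndef SO_BUSY_POLL_BUDGET
  #define SO_BUSY_POLL_BUDGET     70
  #endif

  /* fd is the AF_XDP socket fd, e.g. xsk_socket__fd(xsk). Requires
   * CAP_NET_ADMIN. The busy-poll time and budget are example values.
   */
  static void enable_biased_busy_poll(int fd)
  {
          int usecs = 20, one = 1, budget = 64;

          setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
          setsockopt(fd, SOL_SOCKET, SO_BIAS_BUSY_POLL, &one, sizeof(one));
          setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget,
                     sizeof(budget));
  }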

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2e5b9f27c7a3..da649b4f377c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -23,6 +23,7 @@
 #include <linux/netdevice.h>
 #include <linux/rculist.h>
 #include <net/xdp_sock_drv.h>
+#include <net/busy_poll.h>
 #include <net/xdp.h>
 
 #include "xsk_queue.h"
@@ -472,6 +473,9 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;
 
+	if (sk_can_busy_loop(sk))
+		sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
 	pool = xs->pool;
 	if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
 		return __xsk_sendmsg(sk);
@@ -493,6 +497,9 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;
 
+	if (sk_can_busy_loop(sk))
+		sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
 	if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
 		return xsk_wakeup(xs, XDP_WAKEUP_RX);
 	return 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (4 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.

TODO: Only verified for the Intel drivers. I'll reach out to the
      driver authors for a potential non-RFC version.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 .../ethernet/cavium/thunder/nicvf_queues.c    |  2 +-
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 +-
 drivers/net/ethernet/intel/ice/ice_base.c     |  4 ++--
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  2 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |  2 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +-
 drivers/net/ethernet/sfc/rx_common.c          |  2 +-
 drivers/net/ethernet/socionext/netsec.c       |  2 +-
 drivers/net/ethernet/ti/cpsw_priv.c           |  2 +-
 drivers/net/hyperv/netvsc.c                   |  2 +-
 drivers/net/tun.c                             |  2 +-
 drivers/net/veth.c                            |  2 +-
 drivers/net/virtio_net.c                      |  2 +-
 drivers/net/xen-netfront.c                    |  2 +-
 include/net/busy_poll.h                       | 19 +++++++++++++++----
 include/net/xdp.h                             |  3 ++-
 net/core/dev.c                                |  2 +-
 net/core/xdp.c                                |  3 ++-
 net/xdp/xsk.c                                 |  1 +
 26 files changed, 44 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index e8131dadc22c..6ad59f0068f6 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -416,7 +416,7 @@ static int ena_xdp_register_rxq_info(struct ena_ring *rx_ring)
 {
 	int rc;
 
-	rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid);
+	rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid, 0);
 
 	if (rc) {
 		netif_err(rx_ring->adapter, ifup, rx_ring->netdev,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index fa147865e33f..5df13387ab74 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2894,7 +2894,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 		if (rc)
 			return rc;
 
-		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
+		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
 		if (rc < 0)
 			return rc;
 
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index 7a141ce32e86..f782e6af45e9 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -770,7 +770,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
 	rq->caching = 1;
 
 	/* Driver have no proper error path for failed XDP RX-queue info reg */
-	WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);
+	WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx, 0) < 0);
 
 	/* Send a mailbox msg to PF to config RQ */
 	mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index cf9400a9886d..40953980e846 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -3334,7 +3334,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv,
 		return 0;
 
 	err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
-			       fq->flowid);
+			       fq->flowid, 0);
 	if (err) {
 		dev_err(dev, "xdp_rxq_info_reg failed\n");
 		return err;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index d43ce13a93c9..a3d5bdaca2f5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1436,7 +1436,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	/* XDP RX-queue info only needed for RX rings exposed to XDP */
 	if (rx_ring->vsi->type == I40E_VSI_MAIN) {
 		err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-				       rx_ring->queue_index);
+				       rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index fe4320e2d1f2..3124a3bf519a 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -306,7 +306,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 		if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
 			/* coverity[check_return] */
 			xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
-					 ring->q_index);
+					 ring->q_index, ring->q_vector->napi.napi_id);
 
 		ring->xsk_pool = ice_xsk_pool(ring);
 		if (ring->xsk_pool) {
@@ -333,7 +333,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 				/* coverity[check_return] */
 				xdp_rxq_info_reg(&ring->xdp_rxq,
 						 ring->netdev,
-						 ring->q_index);
+						 ring->q_index, ring->q_vector->napi.napi_id);
 
 			err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 							 MEM_TYPE_PAGE_SHARED,
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index eae75260fe20..77d5eae6b4c2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -483,7 +483,7 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
 	if (rx_ring->vsi->type == ICE_VSI_PF &&
 	    !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
 		if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-				     rx_ring->q_index))
+				     rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
 			goto err;
 	return 0;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 45ae33e15303..50e6b8b6ba7b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6577,7 +6577,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 
 	/* XDP RX-queue info */
 	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
-			     rx_ring->queue_index) < 0)
+			     rx_ring->queue_index, rx_ring->q_vector->napi.napi_id) < 0)
 		goto err;
 
 	rx_ring->xdp_prog = adapter->xdp_prog;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 54b0bf574c05..7d0098f4ef9d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3219,7 +3219,7 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
 		return err;
 	}
 
-	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
+	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
 	if (err < 0)
 		goto err_free_pp;
 
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index f6616c8933ca..ff8729b6c414 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2606,11 +2606,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
 	mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);
 
 	if (priv->percpu_pools) {
-		err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id);
+		err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0);
 		if (err < 0)
 			goto err_free_dma;
 
-		err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id);
+		err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0);
 		if (err < 0)
 			goto err_unregister_rxq_short;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 502d1b97855c..f561979e5731 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -283,7 +283,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 	ring->log_stride = ffs(ring->stride) - 1;
 	ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
 
-	if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
+	if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
 		goto err_ring;
 
 	tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b150da43adb2..b4acf2f41e84 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2533,7 +2533,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
 
 	if (dp->netdev) {
 		err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
-				       rx_ring->idx);
+				       rx_ring->idx, rx_ring->r_vec->napi.napi_id);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 05e3a3b60269..b73e95329acd 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1762,7 +1762,7 @@ static void qede_init_fp(struct qede_dev *edev)
 
 			/* Driver have no error path from here */
 			WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
-						 fp->rxq->rxq_id) < 0);
+						 fp->rxq->rxq_id, 0) < 0);
 
 			if (xdp_rxq_info_reg_mem_model(&fp->rxq->xdp_rxq,
 						       MEM_TYPE_PAGE_ORDER0,
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 19cf7cac1e6e..68fc7d317693 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -262,7 +262,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
 
 	/* Initialise XDP queue information */
 	rc = xdp_rxq_info_reg(&rx_queue->xdp_rxq_info, efx->net_dev,
-			      rx_queue->core_index);
+			      rx_queue->core_index, 0);
 
 	if (rc) {
 		netif_err(efx, rx_err, efx->net_dev,
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 1503cc9ec6e2..80ab24658e87 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1304,7 +1304,7 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
 		goto err_out;
 	}
 
-	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
+	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0, 0);
 	if (err)
 		goto err_out;
 
diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
index 51cc29f39038..d8f287c88d77 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.c
+++ b/drivers/net/ethernet/ti/cpsw_priv.c
@@ -1189,7 +1189,7 @@ static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
 	pool = cpsw->page_pool[ch];
 	rxq = &priv->xdp_rxq[ch];
 
-	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
+	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch, 0);
 	if (ret)
 		return ret;
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0c3de94b5178..fa8341f8359a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1499,7 +1499,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
 		u64_stats_init(&nvchan->tx_stats.syncp);
 		u64_stats_init(&nvchan->rx_stats.syncp);
 
-		ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
+		ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i, 0);
 
 		if (ret) {
 			netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index be69d272052f..f2541d645707 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -791,7 +791,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	} else {
 		/* Setup XDP RX-queue info, for new tfile getting attached */
 		err = xdp_rxq_info_reg(&tfile->xdp_rxq,
-				       tun->dev, tfile->queue_index);
+				       tun->dev, tfile->queue_index, 0);
 		if (err < 0)
 			goto out;
 		err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c737668008a..04d20e9d8431 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -926,7 +926,7 @@ static int veth_enable_xdp(struct net_device *dev)
 		for (i = 0; i < dev->real_num_rx_queues; i++) {
 			struct veth_rq *rq = &priv->rq[i];
 
-			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
+			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, 0);
 			if (err < 0)
 				goto err_rxq_reg;
 
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 21b71148c532..d71fe41595b7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1485,7 +1485,7 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
+		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, 0);
 		if (err < 0)
 			return err;
 
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3e9895bec15f..28714a48f5d0 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -2014,7 +2014,7 @@ static int xennet_create_page_pool(struct netfront_queue *queue)
 	}
 
 	err = xdp_rxq_info_reg(&queue->xdp_rxq, queue->info->netdev,
-			       queue->id);
+			       queue->id, 0);
 	if (err) {
 		netdev_err(queue->info->netdev, "xdp_rxq_info_reg failed\n");
 		goto err_free_pp;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c6c413d3824d..262f60065355 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -148,14 +148,25 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 	sk_rx_queue_set(sk, skb);
 }
 
-/* variant used for unconnected sockets */
-static inline void sk_mark_napi_id_once(struct sock *sk,
-					const struct sk_buff *skb)
+static inline void __sk_mark_napi_id_once_xdp(struct sock *sk, unsigned int napi_id)
 {
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	if (!READ_ONCE(sk->sk_napi_id))
-		WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
+		WRITE_ONCE(sk->sk_napi_id, napi_id);
 #endif
 }
 
+/* variant used for unconnected sockets */
+static inline void sk_mark_napi_id_once(struct sock *sk,
+					const struct sk_buff *skb)
+{
+	__sk_mark_napi_id_once_xdp(sk, skb->napi_id);
+}
+
+static inline void sk_mark_napi_id_once_xdp(struct sock *sk,
+					    const struct xdp_buff *xdp)
+{
+	__sk_mark_napi_id_once_xdp(sk, xdp->rxq->napi_id);
+}
+
 #endif /* _LINUX_NET_BUSY_POLL_H */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3814fb631d52..4d4255a94773 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -59,6 +59,7 @@ struct xdp_rxq_info {
 	u32 queue_index;
 	u32 reg_state;
 	struct xdp_mem_info mem;
+	unsigned int napi_id;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
 struct xdp_txq_info {
@@ -211,7 +212,7 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
 }
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
-		     struct net_device *dev, u32 queue_index);
+		     struct net_device *dev, u32 queue_index, unsigned int napi_id);
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
 bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
diff --git a/net/core/dev.c b/net/core/dev.c
index b34520acaa7f..ad3261be5e21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9834,7 +9834,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 		rx[i].dev = dev;
 
 		/* XDP RX-queue setup */
-		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
+		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
 		if (err < 0)
 			goto err_rxq_info;
 	}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 48aba933a5a8..7cca7cb5b65f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -158,7 +158,7 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
 
 /* Returns 0 on success, negative on failure */
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
-		     struct net_device *dev, u32 queue_index)
+		     struct net_device *dev, u32 queue_index, unsigned int napi_id)
 {
 	if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
 		WARN(1, "Driver promised not to register this");
@@ -179,6 +179,7 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
 	xdp_rxq_info_init(xdp_rxq);
 	xdp_rxq->dev = dev;
 	xdp_rxq->queue_index = queue_index;
+	xdp_rxq->napi_id = napi_id;
 
 	xdp_rxq->reg_state = REG_STATE_REGISTERED;
 	return 0;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index da649b4f377c..0b825612d895 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -233,6 +233,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
+	sk_mark_napi_id_once_xdp(&xs->sk, xdp);
 	len = xdp->data_end - xdp->data;
 
 	return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (5 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Start using recvfrom() in the rxdrop scenario.
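
The idea is that a non-blocking recvfrom() on the XDP socket replaces the
poll() wakeup; the call used in the diff below is simply:

  recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);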

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 1149e94ca32f..96d0b6482ac4 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1172,7 +1172,7 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
 	}
 }
 
-static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
+static void rx_drop(struct xsk_socket_info *xsk)
 {
 	unsigned int rcvd, i;
 	u32 idx_rx = 0, idx_fq = 0;
@@ -1182,7 +1182,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 	if (!rcvd) {
 		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
-			ret = poll(fds, num_socks, opt_timeout);
+			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
 		return;
 	}
@@ -1193,7 +1193,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 			exit_with_error(-ret);
 		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.fill_fail_polls++;
-			ret = poll(fds, num_socks, opt_timeout);
+			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
 		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
 	}
@@ -1235,7 +1235,7 @@ static void rx_drop_all(void)
 		}
 
 		for (i = 0; i < num_socks; i++)
-			rx_drop(xsks[i], fds);
+			rx_drop(xsks[i]);
 
 		if (benchmark_done)
 			break;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (6 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 13:34 ` [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
  2020-10-28 14:13 ` [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Eric Dumazet
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Add a new option, 'B', to xdpsock for enabling busy-polling. The batch
size ('b' option) will also be used as the busy-poll budget.
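
For example, enabling busy-polling in the rxdrop benchmark could look like
the following (interface and queue values are illustrative, borrowed from
the cover-letter measurements):

  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B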

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 40 +++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 96d0b6482ac4..7ef2c01a1094 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -95,6 +95,7 @@ static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
 static u32 opt_num_xsks = 1;
 static u32 prog_id;
+static bool opt_busy_poll;
 
 struct xsk_ring_stats {
 	unsigned long rx_npkts;
@@ -911,6 +912,7 @@ static struct option long_options[] = {
 	{"quiet", no_argument, 0, 'Q'},
 	{"app-stats", no_argument, 0, 'a'},
 	{"irq-string", no_argument, 0, 'I'},
+	{"busy-poll", no_argument, 0, 'B'},
 	{0, 0, 0, 0}
 };
 
@@ -949,6 +951,7 @@ static void usage(const char *prog)
 		"  -Q, --quiet          Do not display any stats.\n"
 		"  -a, --app-stats	Display application (syscall) statistics.\n"
 		"  -I, --irq-string	Display driver interrupt statistics for interface associated with irq-string.\n"
+		"  -B, --busy-poll      Busy poll.\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE,
 		opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE,
@@ -964,7 +967,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:",
+		c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:B",
 				long_options, &option_index);
 		if (c == -1)
 			break;
@@ -1062,7 +1065,9 @@ static void parse_command_line(int argc, char **argv)
 				fprintf(stderr, "ERROR: Failed to get irqs for %s\n", opt_irq_str);
 				usage(basename(argv[0]));
 			}
-
+			break;
+		case 'B':
+			opt_busy_poll = 1;
 			break;
 		default:
 			usage(basename(argv[0]));
@@ -1132,7 +1137,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
+			if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&umem->fq)) {
 				xsk->app_stats.fill_fail_polls++;
 				ret = poll(fds, num_socks, opt_timeout);
 			}
@@ -1180,7 +1185,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
 
 	rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
 	if (!rcvd) {
-		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
 			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
@@ -1191,7 +1196,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
 	while (ret != rcvd) {
 		if (ret < 0)
 			exit_with_error(-ret);
-		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.fill_fail_polls++;
 			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
@@ -1342,7 +1347,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 
 	rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
 	if (!rcvd) {
-		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
 			ret = poll(fds, num_socks, opt_timeout);
 		}
@@ -1354,7 +1359,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 		if (ret < 0)
 			exit_with_error(-ret);
 		complete_tx_l2fwd(xsk, fds);
-		if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->tx)) {
 			xsk->app_stats.tx_wakeup_sendtos++;
 			kick_tx(xsk);
 		}
@@ -1461,6 +1466,24 @@ static void enter_xsks_into_map(struct bpf_object *obj)
 	}
 }
 
+static void apply_setsockopt(struct xsk_socket_info *xsk)
+{
+	int sock_opt;
+
+	if (!opt_busy_poll)
+		return;
+
+	sock_opt = 1;
+	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BIAS_BUSY_POLL,
+		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
+		exit_with_error(errno);
+
+	sock_opt = 20; // randomly picked :-P
+	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
+		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
+		exit_with_error(errno);
+}
+
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -1502,6 +1525,9 @@ int main(int argc, char **argv)
 	for (i = 0; i < opt_num_xsks; i++)
 		xsks[num_socks++] = xsk_configure_socket(umem, rx, tx);
 
+	for (i = 0; i < opt_num_xsks; i++)
+		apply_setsockopt(xsks[i]);
+
 	if (opt_bench == BENCH_TXONLY) {
 		gen_eth_hdr_data();
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (7 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
@ 2020-10-28 13:34 ` Björn Töpel
  2020-10-28 14:13 ` [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Eric Dumazet
  9 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-10-28 13:34 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, intel-wired-lan, jonathan.lemon

From: Björn Töpel <bjorn.topel@intel.com>

Add support for the SO_BUSY_POLL_BUDGET setsockopt; the budget is taken
from the batching option ('b').
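
With this, the batch size doubles as the busy-poll budget. A possible
invocation, mirroring the cover-letter runs (interface and queue values
are illustrative):

  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256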

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7ef2c01a1094..948faada96d5 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1482,6 +1482,11 @@ static void apply_setsockopt(struct xsk_socket_info *xsk)
 	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
 		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
 		exit_with_error(errno);
+
+	sock_opt = opt_batch_size;
+	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL_BUDGET,
+		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
+		exit_with_error(errno);
 }
 
 int main(int argc, char **argv)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH bpf-next 0/9] Introduce biased busy-polling
  2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
                   ` (8 preceding siblings ...)
  2020-10-28 13:34 ` [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
@ 2020-10-28 14:13 ` Eric Dumazet
  9 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2020-10-28 14:13 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson,
	Alexei Starovoitov, Daniel Borkmann, maciej.fijalkowski,
	Samudrala, Sridhar, Jesse Brandeburg, qi.z.zhang, Jakub Kicinski,
	intel-wired-lan, Jonathan Lemon

On Wed, Oct 28, 2020 at 2:35 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> Jakub suggested in [1] a "strict busy-polling mode with out
> interrupts". This is a first stab at that.
>
> This series adds a new NAPI mode, called biased busy-polling, which is
> an extension to the existing busy-polling mode. The new mode is
> enabled on the socket layer, where a socket setting this option
> "promisies" to busy-poll the NAPI context via a system call. When this
> mode is enabled, the NAPI context will operate in a mode with
> interrupts disabled. The kernel monitors that the busy-polling promise
> is fulfilled by an internal watchdog. If the socket fail/stop
> performing the busy-polling, the mode will be disabled.
>
> Biased busy-polling follows the same mechanism as the existing
> busy-poll; The napi_id is reported to the socket via the skbuff. Later
> commits will extend napi_id reporting to XDP, in order to work
> correctly with XDP sockets.
>
> Let us walk through a flow of execution:
>
> 1. A socket sets the new SO_BIAS_BUSY_POLL socket option to true. The
>    socket now shows an intent of doing busy-polling. No data has been
>    received to the socket, so the napi_id of the socket is still 0
>    (non-valid). As usual for busy-polling, the SO_BUSY_POLL option
>    also has to be non-zero for biased busy-polling.
>
> 2. Data is received on the socket changing the napi_id to non-zero.
>
> 3. The socket does a system call that has the busy-polling logic wired
>    up, e.g. recvfrom() for UDP sockets. The NAPI context is now marked
>    as biased busy-poll. The kernel watchdog is armed. If the NAPI
>    context is already running, it will try to finish as soon as
>    possible and move to busy-polling. If the NAPI context is not
>    running, it will execute the NAPI poll function for the
>    corresponding napi_id.
>
> 4. Goto 3, or wait until the watchdog timeout.
>
> The series is outlined as following:
>   Patch 1-2: Biased busy-polling, and option to set busy-poll budget.
>   Patch 3-6: Busy-poll plumbing for XDP sockets
>   Patch 7-9: Add busy-polling support to the xdpsock sample
>
> Performance UDP sockets:
>
> I hacked netperf to use non-blocking sockets, and looping over
> recvfrom(). The following command-line was used:
>   $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
>       -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
>
> Non-blocking:
>   16,18.45,195,0.94,54070.369
> Non-blocking with biased busy-polling:
>   15,16.59,38,0.70,60086.313
>

But a fair comparison should be done using the current busy-polling mode,
which does not require netperf to use non-blocking mode in the first place?

Would disabling/rearming interrupts about 60,000 times per second
bring any benefit?


Additional questions:

- What happens to the gro_flush_timeout and the TCP segments accumulated
in the GRO engine while biased busy-polling is in use?

- What mechanism would avoid a potential 200 ms latency when the
application wants to exit cleanly?
  Presumably, when/if SO_BIAS_BUSY_POLL is used to clear
sk->sk_bias_busy_poll, we need to make sure device interrupts are
re-enabled.


> Performance XDP sockets:
>
> Today, running XDP sockets sample on the same core as the softirq
> handling, performance tanks mainly because we do not yield to
> user-space when the XDP socket Rx queue is full.
>   # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r
>   Rx: 64Kpps
>
>   # # biased busy-polling, budget 8
>   # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8
>   Rx 9.9Mpps
>   # # biased busy-polling, budget 64
>   # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64
>   Rx: 19.3Mpps
>   # # biased busy-polling, budget 256
>   # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256
>   Rx: 21.4Mpps
>   # # biased busy-polling, budget 512
>   # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512
>   Rx: 21.4Mpps
>
> Compared to the two-core case:
>   # taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r
>   Rx: 20.7Mpps
>
> We're getting better performance on a single core than on two, for this
> naïve drop scenario.
>
> The above tests were done for the 'ice' driver.
>
> Some outstanding questions:
>
> * Does biased busy-polling make sense for non-XDP sockets? For a
>   dedicated queue, biased busy-polling has a strong case. When the
>   NAPI is shared with other sockets, it can affect the latencies of
>   sockets that were not explicitly busy-poll enabled. Note that this is
>   true for regular busy-polling as well, but the biased version is
>   stricter.
>
> * Currently busy-polling for UDP/TCP is only wired up in the recvmsg()
>   path. Does it make sense to extend that to sendmsg() as well?
>
> * Biased busy-polling only makes sense for non-blocking sockets. Reject
>   enabling of biased busy-polling unless the socket is non-blocking?
>
> * The watchdog is 200 ms. Should it be configurable?
>
> * Extending xdp_rxq_info_reg() with napi_id touches a lot of drivers,
>   and I've only verified the Intel ones. Some drivers initialize NAPI
>   (generating the napi_id) after the xdp_rxq_info_reg() call, which
>   maybe would open up for another API? I did not send this RFC to all
>   the driver authors. I'll do that for a proper patch series.
>
> * Today, enabling busy-polling requires CAP_NET_ADMIN. For a NAPI
>   context that services multiple sockets, this makes sense because one
>   socket can affect the performance of other sockets. Now, for a
>   *dedicated* queue for, say, an XDP socket, would it be OK to drop
>   CAP_NET_ADMIN, because it cannot affect other sockets/users?
>
> @Jakub Thanks for the early comments. I left the check in
> napi_schedule_prep(), because I hit that for the Intel i40e driver;
> forcing busy-polling on a core outside the interrupt affinity mask.
>
> [1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
>
> Björn Töpel (9):
>   net: introduce biased busy-polling
>   net: add SO_BUSY_POLL_BUDGET socket option
>   xsk: add support for recvmsg()
>   xsk: check need wakeup flag in sendmsg()
>   xsk: add busy-poll support for {recv,send}msg()
>   xsk: propagate napi_id to XDP socket Rx path
>   samples/bpf: use recvfrom() in xdpsock
>   samples/bpf: add busy-poll support to xdpsock
>   samples/bpf: add option to set the busy-poll budget
>
>  arch/alpha/include/uapi/asm/socket.h          |   3 +
>  arch/mips/include/uapi/asm/socket.h           |   3 +
>  arch/parisc/include/uapi/asm/socket.h         |   3 +
>  arch/sparc/include/uapi/asm/socket.h          |   3 +
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  |   2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +-
>  .../ethernet/cavium/thunder/nicvf_queues.c    |   2 +-
>  .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |   2 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   2 +-
>  drivers/net/ethernet/intel/ice/ice_base.c     |   4 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.c     |   2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   2 +-
>  drivers/net/ethernet/marvell/mvneta.c         |   2 +-
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   4 +-
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c    |   2 +-
>  .../ethernet/netronome/nfp/nfp_net_common.c   |   2 +-
>  drivers/net/ethernet/qlogic/qede/qede_main.c  |   2 +-
>  drivers/net/ethernet/sfc/rx_common.c          |   2 +-
>  drivers/net/ethernet/socionext/netsec.c       |   2 +-
>  drivers/net/ethernet/ti/cpsw_priv.c           |   2 +-
>  drivers/net/hyperv/netvsc.c                   |   2 +-
>  drivers/net/tun.c                             |   2 +-
>  drivers/net/veth.c                            |   2 +-
>  drivers/net/virtio_net.c                      |   2 +-
>  drivers/net/xen-netfront.c                    |   2 +-
>  fs/eventpoll.c                                |   3 +-
>  include/linux/netdevice.h                     |  33 +++---
>  include/net/busy_poll.h                       |  42 +++++--
>  include/net/sock.h                            |   4 +
>  include/net/xdp.h                             |   3 +-
>  include/uapi/asm-generic/socket.h             |   3 +
>  net/core/dev.c                                | 111 +++++++++++++++---
>  net/core/sock.c                               |  19 +++
>  net/core/xdp.c                                |   3 +-
>  net/xdp/xsk.c                                 |  36 +++++-
>  net/xdp/xsk_buff_pool.c                       |  13 +-
>  samples/bpf/xdpsock_user.c                    |  53 +++++++--
>  37 files changed, 296 insertions(+), 85 deletions(-)
>
>
> base-commit: 3cb12d27ff655e57e8efe3486dca2a22f4e30578
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-10-29  1:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-28 13:34 [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 1/9] net: introduce " Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 3/9] xsk: add support for recvmsg() Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
2020-10-28 13:34 ` [RFC PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
2020-10-28 14:13 ` [RFC PATCH bpf-next 0/9] Introduce biased busy-polling Eric Dumazet
