* [PATCH bpf-next v3 00/10] Introduce preferred busy-polling
@ 2020-11-19  8:30 Björn Töpel
  2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, bjorn.topel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

This series introduces three new features:

1. A new "heavy traffic" busy-polling variant that works in concert
   with the existing napi_defer_hard_irqs and gro_flush_timeout knobs.

2. A new socket option that lets a user change the busy-polling NAPI
   budget.

3. Support for busy-polling on XDP sockets.

The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
opportunistic. That means that if the NAPI context is not scheduled,
busy-polling will poll it. If, after busy-polling, the budget is
exceeded, the busy-polling logic will schedule the NAPI onto the
regular softirq handling.

One implication of the behavior above is that a busy/heavily loaded
NAPI context will never enter/allow for busy-polling. Some applications
prefer that most NAPI processing be done by busy-polling.

This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
feature"), and allow a user to defer the re-enabling of interrupts and
instead schedule the NAPI context from a watchdog timer. When a user
enables SO_PREFER_BUSY_POLL, again with the other knobs enabled,
and the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.

If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will time out, and
regular softirq handling will resume.

In summary: heavy-traffic applications that prefer busy-polling over
softirq processing should use this option.

Patch 6 touches a lot of drivers, so the Cc: list is grossly long.


Example usage:

  $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
  $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout

Note that the timeout (gro_flush_timeout is given in nanoseconds, so
200000 is 200 usec) should be larger than the userspace processing
window, otherwise the watchdog will time out and fall back to regular
softirq processing.

Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.


Performance simple UDP ping-pong:

A packet generator blasts UDP packets to a certain
{src,dst}IP/port, so a dedicated ksoftirqd will be busy
handling the packets on a certain core.

A simple UDP test program that just does recvfrom/sendto is running
at the host end. Throughput in pps and RTT latency are measured at the
packet generator.
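
The actual test program is not included in this series. As a rough
sketch only, one iteration of such a non-blocking recvfrom/sendto loop
might look like the following. The function name, buffer size, and
return convention are assumptions.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one datagram without blocking and send
 * it back to its sender. Returns the number of bytes echoed, 0 if no
 * datagram was queued (or the datagram was empty), or a negative errno
 * value on error. */
ssize_t echo_once(int fd)
{
	char buf[2048];
	struct sockaddr_storage peer;
	socklen_t peer_len = sizeof(peer);
	ssize_t n;

	n = recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT,
		     (struct sockaddr *)&peer, &peer_len);
	if (n < 0) {
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0;	/* nothing to read yet */
		return -errno;
	}

	if (sendto(fd, buf, (size_t)n, 0,
		   (struct sockaddr *)&peer, peer_len) < 0)
		return -errno;

	return n;
}
```

A caller would invoke echo_once() in a tight loop on a bound UDP
socket; with busy-polling enabled on the socket, each recvfrom() call
is where the NAPI context gets polled.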

/proc/sys/net/core/busy_read is set to 20.

Min       Max       Avg (usec)

1. Blocking, 2-cores:                      490Kpps
 1218.192  1335.427  1271.083

2. Blocking, 1-core:                       155Kpps
 1327.195 17294.855  4761.367

3. Non-blocking, 2-cores:                  475Kpps
 1221.197  1330.465  1270.740

4. Non-blocking, 1-core:                     3Kpps
29006.482 37260.465 33128.367

5. Non-blocking, prefer busy-poll, 1-core: 420Kpps
 1202.535  5494.052  4885.443 

Scenarios 2 and 5 show when the new option should be used. Throughput
goes from 155 to 420Kpps, average latencies are similar, but the tail
latencies are much better for the latter.


Performance XDP sockets:

Again, a packet generator blasts UDP packets to a certain
{src,dst}IP/port.

Today, when running the XDP sockets sample on the same core as the
softirq handling, performance tanks, mainly because we do not yield to
user-space when the XDP socket Rx queue is full.

  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r
  Rx: 64Kpps
  
  # # biased busy-polling, budget 8
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8
  Rx: 9.9Mpps
  # # biased busy-polling, budget 64
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64
  Rx: 19.3Mpps
  # # biased busy-polling, budget 256
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256
  Rx: 21.4Mpps
  # # biased busy-polling, budget 512
  # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512
  Rx: 21.7Mpps

Compared to the two-core case:
  # taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r
  Rx: 20.7Mpps

We're getting better single-core performance than two, for this naïve
drop scenario.


Performance netperf UDP_RR:

Note that netperf UDP_RR is not a heavy-traffic test, and preferred
busy-polling is not typically something we want to use here.

  $ echo 20 | sudo tee /proc/sys/net/core/busy_read
  $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
      -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate

busy-polling blocking sockets:            12,13.33,224,0.63,74731.177

I hacked netperf to use non-blocking sockets and re-ran:

busy-polling non-blocking sockets:        12,13.46,218,0.72,73991.172
prefer busy-polling non-blocking sockets: 12,13.62,221,0.59,73138.448

Using the preferred busy-polling mode does not impact performance.

The above tests were done for the 'ice' driver.

Thanks to Jakub for suggesting this busy-polling addition [1], and
Eric for all input/review!


Changes:

rfc-v1 [2] -> rfc-v2:
  * Changed name from bias to prefer.
  * Base the work on Eric's/Luigi's defer irq/gro timeout work.
  * Proper GRO flushing.
  * Build issues for some XDP drivers.

rfc-v2 [3] -> v1:
  * Fixed broken qlogic build.
  * Do not trigger an IPI (XDP socket wakeup) when busy-polling is
    enabled.

v1 [4] -> v2:
  * Added napi_id to socionext driver, and added Ilias Acked-by:. (Ilias)
  * Added a samples patch to improve busy-polling for xdpsock/l2fwd.
  * Correctly mark atomic operations with {WRITE,READ}_ONCE, to make
    KCSAN and the code readers happy. (Eric)
  * Check NAPI budget not to exceed U16_MAX. (Eric)
  * Added kdoc.

v2 [5] -> v3:
  * Collected Acked-by:.
  * Check NAPI disable prior to prefer busy-polling. (Jakub)
  * Added napi_id registration for virtio-net. (Michael)
  * Added napi_id registration for veth.

[1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
[2] https://lore.kernel.org/bpf/20201028133437.212503-1-bjorn.topel@gmail.com/
[3] https://lore.kernel.org/bpf/20201105102812.152836-1-bjorn.topel@gmail.com/
[4] https://lore.kernel.org/bpf/20201112114041.131998-1-bjorn.topel@gmail.com/
[5] https://lore.kernel.org/bpf/20201116110416.10719-1-bjorn.topel@gmail.com/

Björn Töpel (10):
  net: introduce preferred busy-polling
  net: add SO_BUSY_POLL_BUDGET socket option
  xsk: add support for recvmsg()
  xsk: check need wakeup flag in sendmsg()
  xsk: add busy-poll support for {recv,send}msg()
  xsk: propagate napi_id to XDP socket Rx path
  samples/bpf: use recvfrom() in xdpsock/rxdrop
  samples/bpf: use recvfrom() in xdpsock/l2fwd
  samples/bpf: add busy-poll support to xdpsock
  samples/bpf: add option to set the busy-poll budget

 arch/alpha/include/uapi/asm/socket.h          |  3 +
 arch/mips/include/uapi/asm/socket.h           |  3 +
 arch/parisc/include/uapi/asm/socket.h         |  3 +
 arch/sparc/include/uapi/asm/socket.h          |  3 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 .../ethernet/cavium/thunder/nicvf_queues.c    |  2 +-
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 +-
 drivers/net/ethernet/intel/ice/ice_base.c     |  4 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  2 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |  2 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +-
 drivers/net/ethernet/sfc/rx_common.c          |  2 +-
 drivers/net/ethernet/socionext/netsec.c       |  2 +-
 drivers/net/ethernet/ti/cpsw_priv.c           |  2 +-
 drivers/net/hyperv/netvsc.c                   |  2 +-
 drivers/net/tun.c                             |  2 +-
 drivers/net/veth.c                            | 12 ++-
 drivers/net/virtio_net.c                      |  2 +-
 drivers/net/xen-netfront.c                    |  2 +-
 fs/eventpoll.c                                |  3 +-
 include/linux/netdevice.h                     | 35 +++++---
 include/net/busy_poll.h                       | 27 ++++--
 include/net/sock.h                            |  6 ++
 include/net/xdp.h                             |  3 +-
 include/uapi/asm-generic/socket.h             |  3 +
 net/core/dev.c                                | 89 ++++++++++++++-----
 net/core/sock.c                               | 19 ++++
 net/core/xdp.c                                |  3 +-
 net/xdp/xsk.c                                 | 53 ++++++++++-
 net/xdp/xsk_buff_pool.c                       | 13 ++-
 samples/bpf/xdpsock_user.c                    | 78 ++++++++++------
 40 files changed, 299 insertions(+), 107 deletions(-)


base-commit: 4e99d115d865d45e17e83478d757b58d8fa66d3c
-- 
2.27.0



* [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-24  0:04   ` Jakub Kicinski
                     ` (2 more replies)
  2020-11-19  8:30 ` [PATCH bpf-next v3 02/10] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
opportunistic. That means that if the NAPI context is not scheduled,
busy-polling will poll it. If, after busy-polling, the budget is
exceeded, the busy-polling logic will schedule the NAPI onto the
regular softirq handling.

One implication of the behavior above is that a busy/heavily loaded
NAPI context will never enter/allow for busy-polling. Some applications
prefer that most NAPI processing be done by busy-polling.

This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
feature"), and allow a user to defer the re-enabling of interrupts and
instead schedule the NAPI context from a watchdog timer. When a user
enables SO_PREFER_BUSY_POLL, again with the other knobs enabled,
and the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.

If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will time out, and
regular softirq handling will resume.

In summary: heavy-traffic applications that prefer busy-polling over
softirq processing should use this option.

Example usage:

  $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
  $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout

Note that the timeout should be larger than the userspace processing
window, otherwise the watchdog will time out and fall back to regular
softirq processing.

Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  2 +
 arch/mips/include/uapi/asm/socket.h   |  2 +
 arch/parisc/include/uapi/asm/socket.h |  2 +
 arch/sparc/include/uapi/asm/socket.h  |  2 +
 fs/eventpoll.c                        |  2 +-
 include/linux/netdevice.h             | 35 +++++++-----
 include/net/busy_poll.h               |  5 +-
 include/net/sock.h                    |  4 ++
 include/uapi/asm-generic/socket.h     |  2 +
 net/core/dev.c                        | 78 +++++++++++++++++++++------
 net/core/sock.c                       |  9 ++++
 11 files changed, 111 insertions(+), 32 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index de6c4df61082..538359642554 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -124,6 +124,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 68
 
+#define SO_PREFER_BUSY_POLL	69
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d0a9ed2ca2d6..e406e73b5e6e 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -135,6 +135,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 68
 
+#define SO_PREFER_BUSY_POLL	69
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 10173c32195e..1bc46200889d 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -116,6 +116,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 0x4042
 
+#define SO_PREFER_BUSY_POLL	0x4043
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 8029b681fc7c..99688cf673a4 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -117,6 +117,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF  0x0047
 
+#define SO_PREFER_BUSY_POLL	 0x0048
+
 #if !defined(__KERNEL__)
 
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..e11fab3a0b9e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -397,7 +397,7 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock)
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
 	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
-		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep);
+		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false);
 }
 
 static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7ce648a564f7..52d1cc2bd8a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -350,23 +350,25 @@ struct napi_struct {
 };
 
 enum {
-	NAPI_STATE_SCHED,	/* Poll is scheduled */
-	NAPI_STATE_MISSED,	/* reschedule a napi */
-	NAPI_STATE_DISABLE,	/* Disable pending */
-	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
-	NAPI_STATE_LISTED,	/* NAPI added to system lists */
-	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
-	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+	NAPI_STATE_SCHED,		/* Poll is scheduled */
+	NAPI_STATE_MISSED,		/* reschedule a napi */
+	NAPI_STATE_DISABLE,		/* Disable pending */
+	NAPI_STATE_NPSVC,		/* Netpoll - don't dequeue from poll_list */
+	NAPI_STATE_LISTED,		/* NAPI added to system lists */
+	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
+	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
+	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 };
 
 enum {
-	NAPIF_STATE_SCHED	 = BIT(NAPI_STATE_SCHED),
-	NAPIF_STATE_MISSED	 = BIT(NAPI_STATE_MISSED),
-	NAPIF_STATE_DISABLE	 = BIT(NAPI_STATE_DISABLE),
-	NAPIF_STATE_NPSVC	 = BIT(NAPI_STATE_NPSVC),
-	NAPIF_STATE_LISTED	 = BIT(NAPI_STATE_LISTED),
-	NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
-	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+	NAPIF_STATE_SCHED		= BIT(NAPI_STATE_SCHED),
+	NAPIF_STATE_MISSED		= BIT(NAPI_STATE_MISSED),
+	NAPIF_STATE_DISABLE		= BIT(NAPI_STATE_DISABLE),
+	NAPIF_STATE_NPSVC		= BIT(NAPI_STATE_NPSVC),
+	NAPIF_STATE_LISTED		= BIT(NAPI_STATE_LISTED),
+	NAPIF_STATE_NO_BUSY_POLL	= BIT(NAPI_STATE_NO_BUSY_POLL),
+	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
+	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 };
 
 enum gro_result {
@@ -437,6 +439,11 @@ static inline bool napi_disable_pending(struct napi_struct *n)
 	return test_bit(NAPI_STATE_DISABLE, &n->state);
 }
 
+static inline bool napi_prefer_busy_poll(struct napi_struct *n)
+{
+	return test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
+}
+
 bool napi_schedule_prep(struct napi_struct *n);
 
 /**
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index b001fa91c14e..0292b8353d7e 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -43,7 +43,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
 
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
-		    void *loop_end_arg);
+		    void *loop_end_arg, bool prefer_busy_poll);
 
 #else /* CONFIG_NET_RX_BUSY_POLL */
 static inline unsigned long net_busy_loop_on(void)
@@ -105,7 +105,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 	unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
 
 	if (napi_id >= MIN_NAPI_ID)
-		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
+		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
+			       READ_ONCE(sk->sk_prefer_busy_poll));
 #endif
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index a5c6ae78df77..d49b89b071b6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -301,6 +301,7 @@ struct bpf_local_storage;
   *	@sk_ack_backlog: current listen backlog
   *	@sk_max_ack_backlog: listen backlog set in listen()
   *	@sk_uid: user id of owner
+  *	@sk_prefer_busy_poll: prefer busypolling over softirq processing
   *	@sk_priority: %SO_PRIORITY setting
   *	@sk_type: socket type (%SOCK_STREAM, etc)
   *	@sk_protocol: which protocol this socket belongs in this network family
@@ -479,6 +480,9 @@ struct sock {
 	u32			sk_ack_backlog;
 	u32			sk_max_ack_backlog;
 	kuid_t			sk_uid;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	u8			sk_prefer_busy_poll;
+#endif
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 77f7c1638eb1..7dd02408b7ce 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -119,6 +119,8 @@
 
 #define SO_DETACH_REUSEPORT_BPF 68
 
+#define SO_PREFER_BUSY_POLL	69
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/dev.c b/net/core/dev.c
index 60d325bda0d7..6f8d2cffb7c5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6458,7 +6458,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 
 		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
 
-		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED);
+		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+			      NAPIF_STATE_PREFER_BUSY_POLL);
 
 		/* If STATE_MISSED was set, leave STATE_SCHED set,
 		 * because we will call napi->poll() one more time.
@@ -6497,8 +6498,29 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 
 #define BUSY_POLL_BUDGET 8
 
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
+static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 {
+	if (!skip_schedule) {
+		gro_normal_list(napi);
+		__napi_schedule(napi);
+		return;
+	}
+
+	if (napi->gro_bitmask) {
+		/* flush too old packets
+		 * If HZ < 1000, flush all packets.
+		 */
+		napi_gro_flush(napi, HZ >= 1000);
+	}
+
+	gro_normal_list(napi);
+	clear_bit(NAPI_STATE_SCHED, &napi->state);
+}
+
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll)
+{
+	bool skip_schedule = false;
+	unsigned long timeout;
 	int rc;
 
 	/* Busy polling means there is a high chance device driver hard irq
@@ -6515,6 +6537,15 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 
 	local_bh_disable();
 
+	if (prefer_busy_poll) {
+		napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
+		timeout = READ_ONCE(napi->dev->gro_flush_timeout);
+		if (napi->defer_hard_irqs_count && timeout) {
+			hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
+			skip_schedule = true;
+		}
+	}
+
 	/* All we really want here is to re-enable device interrupts.
 	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
 	 */
@@ -6525,19 +6556,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 	 */
 	trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
 	netpoll_poll_unlock(have_poll_lock);
-	if (rc == BUSY_POLL_BUDGET) {
-		/* As the whole budget was spent, we still own the napi so can
-		 * safely handle the rx_list.
-		 */
-		gro_normal_list(napi);
-		__napi_schedule(napi);
-	}
+	if (rc == BUSY_POLL_BUDGET)
+		__busy_poll_stop(napi, skip_schedule);
 	local_bh_enable();
 }
 
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
-		    void *loop_end_arg)
+		    void *loop_end_arg, bool prefer_busy_poll)
 {
 	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
 	int (*napi_poll)(struct napi_struct *napi, int budget);
@@ -6565,12 +6591,18 @@ void napi_busy_loop(unsigned int napi_id,
 			 * we avoid dirtying napi->state as much as we can.
 			 */
 			if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
-				   NAPIF_STATE_IN_BUSY_POLL))
+				   NAPIF_STATE_IN_BUSY_POLL)) {
+				if (prefer_busy_poll)
+					set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
 				goto count;
+			}
 			if (cmpxchg(&napi->state, val,
 				    val | NAPIF_STATE_IN_BUSY_POLL |
-					  NAPIF_STATE_SCHED) != val)
+					  NAPIF_STATE_SCHED) != val) {
+				if (prefer_busy_poll)
+					set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
 				goto count;
+			}
 			have_poll_lock = netpoll_poll_lock(napi);
 			napi_poll = napi->poll;
 		}
@@ -6588,7 +6620,7 @@ void napi_busy_loop(unsigned int napi_id,
 
 		if (unlikely(need_resched())) {
 			if (napi_poll)
-				busy_poll_stop(napi, have_poll_lock);
+				busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
 			preempt_enable();
 			rcu_read_unlock();
 			cond_resched();
@@ -6599,7 +6631,7 @@ void napi_busy_loop(unsigned int napi_id,
 		cpu_relax();
 	}
 	if (napi_poll)
-		busy_poll_stop(napi, have_poll_lock);
+		busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
 	preempt_enable();
 out:
 	rcu_read_unlock();
@@ -6650,8 +6682,10 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	 * NAPI_STATE_MISSED, since we do not react to a device IRQ.
 	 */
 	if (!napi_disable_pending(napi) &&
-	    !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
+	    !test_and_set_bit(NAPI_STATE_SCHED, &napi->state)) {
+		clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
 		__napi_schedule_irqoff(napi);
+	}
 
 	return HRTIMER_NORESTART;
 }
@@ -6709,6 +6743,7 @@ void napi_disable(struct napi_struct *n)
 
 	hrtimer_cancel(&n->timer);
 
+	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
@@ -6781,6 +6816,19 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 		goto out_unlock;
 	}
 
+	/* The NAPI context has more processing work, but busy-polling
+	 * is preferred. Exit early.
+	 */
+	if (napi_prefer_busy_poll(n)) {
+		if (napi_complete_done(n, work)) {
+			/* If timeout is not set, we need to make sure
+			 * that the NAPI is re-scheduled.
+			 */
+			napi_schedule(n);
+		}
+		goto out_unlock;
+	}
+
 	if (n->gro_bitmask) {
 		/* flush too old packets
 		 * If HZ < 1000, flush all packets.
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..e05f2e52b5a8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1159,6 +1159,12 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 				sk->sk_ll_usec = val;
 		}
 		break;
+	case SO_PREFER_BUSY_POLL:
+		if (valbool && !capable(CAP_NET_ADMIN))
+			ret = -EPERM;
+		else
+			WRITE_ONCE(sk->sk_prefer_busy_poll, valbool);
+		break;
 #endif
 
 	case SO_MAX_PACING_RATE:
@@ -1523,6 +1529,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 	case SO_BUSY_POLL:
 		v.val = sk->sk_ll_usec;
 		break;
+	case SO_PREFER_BUSY_POLL:
+		v.val = READ_ONCE(sk->sk_prefer_busy_poll);
+		break;
 #endif
 
 	case SO_MAX_PACING_RATE:
-- 
2.27.0



* [PATCH bpf-next v3 02/10] net: add SO_BUSY_POLL_BUDGET socket option
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
  2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-24 16:21   ` Jakub Kicinski
  2020-11-19  8:30 ` [PATCH bpf-next v3 03/10] xsk: add support for recvmsg() Björn Töpel
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

This option lets a user set a per-socket NAPI budget for
busy-polling. If the option is not set, it will use the default of 8.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  1 +
 arch/mips/include/uapi/asm/socket.h   |  1 +
 arch/parisc/include/uapi/asm/socket.h |  1 +
 arch/sparc/include/uapi/asm/socket.h  |  1 +
 fs/eventpoll.c                        |  3 ++-
 include/net/busy_poll.h               |  7 +++++--
 include/net/sock.h                    |  2 ++
 include/uapi/asm-generic/socket.h     |  1 +
 net/core/dev.c                        | 21 ++++++++++-----------
 net/core/sock.c                       | 10 ++++++++++
 10 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 538359642554..57420356ce4c 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -125,6 +125,7 @@
 #define SO_DETACH_REUSEPORT_BPF 68
 
 #define SO_PREFER_BUSY_POLL	69
+#define SO_BUSY_POLL_BUDGET	70
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index e406e73b5e6e..2d949969313b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -136,6 +136,7 @@
 #define SO_DETACH_REUSEPORT_BPF 68
 
 #define SO_PREFER_BUSY_POLL	69
+#define SO_BUSY_POLL_BUDGET	70
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 1bc46200889d..f60904329bbc 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -117,6 +117,7 @@
 #define SO_DETACH_REUSEPORT_BPF 0x4042
 
 #define SO_PREFER_BUSY_POLL	0x4043
+#define SO_BUSY_POLL_BUDGET	0x4044
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 99688cf673a4..848a22fbac20 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -118,6 +118,7 @@
 #define SO_DETACH_REUSEPORT_BPF  0x0047
 
 #define SO_PREFER_BUSY_POLL	 0x0048
+#define SO_BUSY_POLL_BUDGET	 0x0049
 
 #if !defined(__KERNEL__)
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e11fab3a0b9e..73c346e503d7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -397,7 +397,8 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock)
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
 	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
-		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false);
+		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
+			       BUSY_POLL_BUDGET);
 }
 
 static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 0292b8353d7e..2f8f51807b83 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -23,6 +23,8 @@
  */
 #define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))
 
+#define BUSY_POLL_BUDGET 8
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 
 struct napi_struct;
@@ -43,7 +45,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
 
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
-		    void *loop_end_arg, bool prefer_busy_poll);
+		    void *loop_end_arg, bool prefer_busy_poll, u16 budget);
 
 #else /* CONFIG_NET_RX_BUSY_POLL */
 static inline unsigned long net_busy_loop_on(void)
@@ -106,7 +108,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 
 	if (napi_id >= MIN_NAPI_ID)
 		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
-			       READ_ONCE(sk->sk_prefer_busy_poll));
+			       READ_ONCE(sk->sk_prefer_busy_poll),
+			       READ_ONCE(sk->sk_busy_poll_budget) ?: BUSY_POLL_BUDGET);
 #endif
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index d49b89b071b6..77ba2c2737db 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -302,6 +302,7 @@ struct bpf_local_storage;
   *	@sk_max_ack_backlog: listen backlog set in listen()
   *	@sk_uid: user id of owner
   *	@sk_prefer_busy_poll: prefer busypolling over softirq processing
+  *	@sk_busy_poll_budget: napi processing budget when busypolling
   *	@sk_priority: %SO_PRIORITY setting
   *	@sk_type: socket type (%SOCK_STREAM, etc)
   *	@sk_protocol: which protocol this socket belongs in this network family
@@ -482,6 +483,7 @@ struct sock {
 	kuid_t			sk_uid;
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	u8			sk_prefer_busy_poll;
+	u16			sk_busy_poll_budget;
 #endif
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 7dd02408b7ce..4dcd13d097a9 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -120,6 +120,7 @@
 #define SO_DETACH_REUSEPORT_BPF 68
 
 #define SO_PREFER_BUSY_POLL	69
+#define SO_BUSY_POLL_BUDGET	70
 
 #if !defined(__KERNEL__)
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 6f8d2cffb7c5..7a1e5936c67f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6496,8 +6496,6 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 
 #if defined(CONFIG_NET_RX_BUSY_POLL)
 
-#define BUSY_POLL_BUDGET 8
-
 static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 {
 	if (!skip_schedule) {
@@ -6517,7 +6515,8 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 	clear_bit(NAPI_STATE_SCHED, &napi->state);
 }
 
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll)
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll,
+			   u16 budget)
 {
 	bool skip_schedule = false;
 	unsigned long timeout;
@@ -6549,21 +6548,21 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 	/* All we really want here is to re-enable device interrupts.
 	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
 	 */
-	rc = napi->poll(napi, BUSY_POLL_BUDGET);
+	rc = napi->poll(napi, budget);
 	/* We can't gro_normal_list() here, because napi->poll() might have
 	 * rearmed the napi (napi_complete_done()) in which case it could
 	 * already be running on another CPU.
 	 */
-	trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
+	trace_napi_poll(napi, rc, budget);
 	netpoll_poll_unlock(have_poll_lock);
-	if (rc == BUSY_POLL_BUDGET)
+	if (rc == budget)
 		__busy_poll_stop(napi, skip_schedule);
 	local_bh_enable();
 }
 
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
-		    void *loop_end_arg, bool prefer_busy_poll)
+		    void *loop_end_arg, bool prefer_busy_poll, u16 budget)
 {
 	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
 	int (*napi_poll)(struct napi_struct *napi, int budget);
@@ -6606,8 +6605,8 @@ void napi_busy_loop(unsigned int napi_id,
 			have_poll_lock = netpoll_poll_lock(napi);
 			napi_poll = napi->poll;
 		}
-		work = napi_poll(napi, BUSY_POLL_BUDGET);
-		trace_napi_poll(napi, work, BUSY_POLL_BUDGET);
+		work = napi_poll(napi, budget);
+		trace_napi_poll(napi, work, budget);
 		gro_normal_list(napi);
 count:
 		if (work > 0)
@@ -6620,7 +6619,7 @@ void napi_busy_loop(unsigned int napi_id,
 
 		if (unlikely(need_resched())) {
 			if (napi_poll)
-				busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
+				busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
 			preempt_enable();
 			rcu_read_unlock();
 			cond_resched();
@@ -6631,7 +6630,7 @@ void napi_busy_loop(unsigned int napi_id,
 		cpu_relax();
 	}
 	if (napi_poll)
-		busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
+		busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
 	preempt_enable();
 out:
 	rcu_read_unlock();
diff --git a/net/core/sock.c b/net/core/sock.c
index e05f2e52b5a8..d422a6808405 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1165,6 +1165,16 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		else
 			WRITE_ONCE(sk->sk_prefer_busy_poll, valbool);
 		break;
+	case SO_BUSY_POLL_BUDGET:
+		if (val > READ_ONCE(sk->sk_busy_poll_budget) && !capable(CAP_NET_ADMIN)) {
+			ret = -EPERM;
+		} else {
+			if (val < 0 || val > U16_MAX)
+				ret = -EINVAL;
+			else
+				WRITE_ONCE(sk->sk_busy_poll_budget, val);
+		}
+		break;
 #endif
 
 	case SO_MAX_PACING_RATE:
-- 
2.27.0



* [PATCH bpf-next v3 03/10] xsk: add support for recvmsg()
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
  2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
  2020-11-19  8:30 ` [PATCH bpf-next v3 02/10] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  6:55   ` Magnus Karlsson
  2020-11-19  8:30 ` [PATCH bpf-next v3 04/10] xsk: check need wakeup flag in sendmsg() Björn Töpel
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Add support for non-blocking recvmsg() to XDP sockets. Previously,
only sendmsg() was supported by XDP sockets. Now, for symmetry and for
the upcoming busy-polling support, recvmsg() is added.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b0141973f23e..56a52ec75696 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -531,6 +531,26 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	return __xsk_sendmsg(sk);
 }
 
+static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
+{
+	bool need_wait = !(flags & MSG_DONTWAIT);
+	struct sock *sk = sock->sk;
+	struct xdp_sock *xs = xdp_sk(sk);
+
+	if (unlikely(!(xs->dev->flags & IFF_UP)))
+		return -ENETDOWN;
+	if (unlikely(!xs->rx))
+		return -ENOBUFS;
+	if (unlikely(!xsk_is_bound(xs)))
+		return -ENXIO;
+	if (unlikely(need_wait))
+		return -EOPNOTSUPP;
+
+	if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
+		return xsk_wakeup(xs, XDP_WAKEUP_RX);
+	return 0;
+}
+
 static __poll_t xsk_poll(struct file *file, struct socket *sock,
 			     struct poll_table_struct *wait)
 {
@@ -1191,7 +1211,7 @@ static const struct proto_ops xsk_proto_ops = {
 	.setsockopt	= xsk_setsockopt,
 	.getsockopt	= xsk_getsockopt,
 	.sendmsg	= xsk_sendmsg,
-	.recvmsg	= sock_no_recvmsg,
+	.recvmsg	= xsk_recvmsg,
 	.mmap		= xsk_mmap,
 	.sendpage	= sock_no_sendpage,
 };
-- 
2.27.0



* [PATCH bpf-next v3 04/10] xsk: check need wakeup flag in sendmsg()
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (2 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 03/10] xsk: add support for recvmsg() Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  7:16   ` Magnus Karlsson
  2020-11-19  8:30 ` [PATCH bpf-next v3 05/10] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Add a check for the need wakeup flag in sendmsg(), so that no wakeup
is triggered when a user calls sendmsg() and no wakeup is needed.

To simplify the need wakeup check in the syscall, unconditionally
enable the need wakeup flag for Tx. This has a side effect for poll():
if poll() is called for a socket that has not enabled need wakeup, a
Tx wakeup is unconditionally performed.

The wakeup matrix for AF_XDP now looks like:

need wakeup | poll()       | sendmsg()   | recvmsg()
------------+--------------+-------------+------------
disabled    | wake Tx      | wake Tx     | nop
enabled     | check flag;  | check flag; | check flag;
            |   wake Tx/Rx |   wake Tx   |   wake Rx
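
The matrix above can be read as a small pure function. The sketch
below is a user-space model, not kernel code: the names WAKE_RX,
WAKE_TX, xsk_call and xsk_wakeups() are made up for illustration, and
it assumes that the cached flag stays at its initial Tx value when
need wakeup is disabled. It also ignores that recvmsg() only wakes Rx
in zero-copy mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the kernel flags are XDP_WAKEUP_RX/XDP_WAKEUP_TX. */
enum { WAKE_RX = 1 << 0, WAKE_TX = 1 << 1 };

enum xsk_call { CALL_POLL, CALL_SENDMSG, CALL_RECVMSG };

/*
 * Model of the wakeup matrix: returns the set of wakeups that a
 * syscall performs.
 *
 * @uses_need_wakeup: the socket was bound with need wakeup enabled
 * @cached: the flags currently requested by the driver; only
 *          consulted when need wakeup is enabled
 */
static unsigned int xsk_wakeups(enum xsk_call call, bool uses_need_wakeup,
				unsigned int cached)
{
	unsigned int allowed = 0;

	assert((cached & ~(unsigned int)(WAKE_RX | WAKE_TX)) == 0);

	/* Assumption: without need wakeup, the flag is pinned to Tx. */
	if (!uses_need_wakeup)
		cached = WAKE_TX;

	switch (call) {
	case CALL_POLL:
		allowed = WAKE_TX | WAKE_RX;
		break;
	case CALL_SENDMSG:
		allowed = WAKE_TX;
		break;
	case CALL_RECVMSG:
		allowed = WAKE_RX;
		break;
	}

	/* Each syscall only acts on the directions it is responsible for. */
	return cached & allowed;
}
```

For example, xsk_wakeups(CALL_RECVMSG, false, 0) returns 0 (the "nop"
cell), while xsk_wakeups(CALL_POLL, false, 0) returns WAKE_TX.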

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c           |  6 +++++-
 net/xdp/xsk_buff_pool.c | 13 ++++++-------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 56a52ec75696..bf0f5c34af6c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -522,13 +522,17 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
+	struct xsk_buff_pool *pool;
 
 	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;
 
-	return __xsk_sendmsg(sk);
+	pool = xs->pool;
+	if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
+		return __xsk_sendmsg(sk);
+	return 0;
 }
 
 static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 8a3bf4e1318e..96bb607853ad 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -144,14 +144,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 	if (err)
 		return err;
 
-	if (flags & XDP_USE_NEED_WAKEUP) {
+	if (flags & XDP_USE_NEED_WAKEUP)
 		pool->uses_need_wakeup = true;
-		/* Tx needs to be explicitly woken up the first time.
-		 * Also for supporting drivers that do not implement this
-		 * feature. They will always have to call sendto().
-		 */
-		pool->cached_need_wakeup = XDP_WAKEUP_TX;
-	}
+	/* Tx needs to be explicitly woken up the first time.  Also
+	 * for supporting drivers that do not implement this
+	 * feature. They will always have to call sendto() or poll().
+	 */
+	pool->cached_need_wakeup = XDP_WAKEUP_TX;
 
 	dev_hold(netdev);
 
-- 
2.27.0



* [PATCH bpf-next v3 05/10] xsk: add busy-poll support for {recv,send}msg()
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (3 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 04/10] xsk: check need wakeup flag in sendmsg() Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  7:58   ` Magnus Karlsson
  2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Wire up XDP socket busy-poll support for recvmsg() and sendmsg(). If
the XDP socket prefers busy-polling, make sure that no wakeup/IPI is
performed.
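
For reference, a socket would opt in from user space roughly as
sketched below. This is an illustration only: the helper name
xsk_enable_busy_poll() is made up, the fallback option numbers are the
asm-generic values (some architectures use different ones), and the
caller may need CAP_NET_ADMIN for these options. The kernel still has
to see a valid NAPI id on the socket before the wakeup is skipped.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that predate this series (asm-generic values). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/*
 * Hypothetical helper: enable preferred busy-polling on a socket.
 * @usecs:  busy-poll time in microseconds; must be non-zero, since
 *          xsk_no_wakeup() checks sk_ll_usec
 * @budget: NAPI budget used while busy-polling
 *
 * Returns 0 on success, or a negative errno value on failure.
 */
static int xsk_enable_busy_poll(int fd, int usecs, int budget)
{
	int one = 1;

	/* Rejecting non-positive values is a choice of this helper. */
	assert(usecs > 0 && budget > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget)) < 0)
		return -errno;
	return 0;
}
```

The helper stops at the first failing option, so the caller can tell
from the returned errno which precondition was not met.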

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index bf0f5c34af6c..ecc4579e41ee 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -23,6 +23,7 @@
 #include <linux/netdevice.h>
 #include <linux/rculist.h>
 #include <net/xdp_sock_drv.h>
+#include <net/busy_poll.h>
 #include <net/xdp.h>
 
 #include "xsk_queue.h"
@@ -517,6 +518,17 @@ static int __xsk_sendmsg(struct sock *sk)
 	return xs->zc ? xsk_zc_xmit(xs) : xsk_generic_xmit(sk);
 }
 
+static bool xsk_no_wakeup(struct sock *sk)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	/* Prefer busy-polling, skip the wakeup. */
+	return READ_ONCE(sk->sk_prefer_busy_poll) && READ_ONCE(sk->sk_ll_usec) &&
+		READ_ONCE(sk->sk_napi_id) >= MIN_NAPI_ID;
+#else
+	return false;
+#endif
+}
+
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
@@ -529,6 +541,12 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;
 
+	if (sk_can_busy_loop(sk))
+		sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
+	if (xsk_no_wakeup(sk))
+		return 0;
+
 	pool = xs->pool;
 	if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
 		return __xsk_sendmsg(sk);
@@ -550,6 +568,12 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
 	if (unlikely(need_wait))
 		return -EOPNOTSUPP;
 
+	if (sk_can_busy_loop(sk))
+		sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
+	if (xsk_no_wakeup(sk))
+		return 0;
+
 	if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
 		return xsk_wakeup(xs, XDP_WAKEUP_RX);
 	return 0;
-- 
2.27.0



* [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (4 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 05/10] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25 14:47   ` Magnus Karlsson
                     ` (3 more replies)
  2020-11-19  8:30 ` [PATCH bpf-next v3 07/10] samples/bpf: use recvfrom() in xdpsock/rxdrop Björn Töpel
                   ` (5 subsequent siblings)
  11 siblings, 4 replies; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi,
	intel-wired-lan, netanel, akiyano, michael.chan, sgoutham,
	ioana.ciornei, ruxandra.radulescu, thomas.petazzoni, mcroce,
	saeedm, tariqt, aelior, ecree, ilias.apalodimas,
	grygorii.strashko, sthemmin, mst, kda

From: Björn Töpel <bjorn.topel@intel.com>

Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
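
The "pick up once" behaviour can be modelled in a few lines of
user-space C. The sketch below is not kernel code: the struct and
function names are stand-ins, and MODEL_MIN_NAPI_ID is a made-up
threshold playing the role of MIN_NAPI_ID. It shows why a queue
registered with napi_id 0 never makes the socket eligible for
busy-polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel structures; only the fields used here. */
struct rxq_model { unsigned int napi_id; };
struct sock_model { unsigned int napi_id; };

/* Hypothetical value; the kernel compares against MIN_NAPI_ID. */
#define MODEL_MIN_NAPI_ID 1025u

/* Record the NAPI id of the receiving queue, but only the first time. */
static void mark_napi_id_once(struct sock_model *sk,
			      const struct rxq_model *rxq)
{
	assert(sk && rxq);
	if (!sk->napi_id)
		sk->napi_id = rxq->napi_id;
}

/* A socket is only busy-pollable once it holds a valid NAPI id. */
static bool can_busy_poll(const struct sock_model *sk)
{
	return sk->napi_id >= MODEL_MIN_NAPI_ID;
}
```

In this model, marking from a queue with napi_id 0 leaves the socket
unmarked, the first non-zero id sticks, and later ids are ignored.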

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 .../ethernet/cavium/thunder/nicvf_queues.c    |  2 +-
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 +-
 drivers/net/ethernet/intel/ice/ice_base.c     |  4 ++--
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  2 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |  2 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +-
 drivers/net/ethernet/sfc/rx_common.c          |  2 +-
 drivers/net/ethernet/socionext/netsec.c       |  2 +-
 drivers/net/ethernet/ti/cpsw_priv.c           |  2 +-
 drivers/net/hyperv/netvsc.c                   |  2 +-
 drivers/net/tun.c                             |  2 +-
 drivers/net/veth.c                            | 12 ++++++++----
 drivers/net/virtio_net.c                      |  2 +-
 drivers/net/xen-netfront.c                    |  2 +-
 include/net/busy_poll.h                       | 19 +++++++++++++++----
 include/net/xdp.h                             |  3 ++-
 net/core/dev.c                                |  2 +-
 net/core/xdp.c                                |  3 ++-
 net/xdp/xsk.c                                 |  1 +
 29 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index e8131dadc22c..6ad59f0068f6 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -416,7 +416,7 @@ static int ena_xdp_register_rxq_info(struct ena_ring *rx_ring)
 {
 	int rc;
 
-	rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid);
+	rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid, 0);
 
 	if (rc) {
 		netif_err(rx_ring->adapter, ifup, rx_ring->netdev,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7975f59735d6..725d929eddb1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2884,7 +2884,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 		if (rc)
 			return rc;
 
-		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
+		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
 		if (rc < 0)
 			return rc;
 
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index 7a141ce32e86..f782e6af45e9 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -770,7 +770,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
 	rq->caching = 1;
 
 	/* Driver have no proper error path for failed XDP RX-queue info reg */
-	WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);
+	WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx, 0) < 0);
 
 	/* Send a mailbox msg to PF to config RQ */
 	mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index cf9400a9886d..40953980e846 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -3334,7 +3334,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv,
 		return 0;
 
 	err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
-			       fq->flowid);
+			       fq->flowid, 0);
 	if (err) {
 		dev_err(dev, "xdp_rxq_info_reg failed\n");
 		return err;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index c21548c71bb1..9f73cd7aee09 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1447,7 +1447,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	/* XDP RX-queue info only needed for RX rings exposed to XDP */
 	if (rx_ring->vsi->type == I40E_VSI_MAIN) {
 		err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-				       rx_ring->queue_index);
+				       rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index fe4320e2d1f2..3124a3bf519a 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -306,7 +306,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 		if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
 			/* coverity[check_return] */
 			xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
-					 ring->q_index);
+					 ring->q_index, ring->q_vector->napi.napi_id);
 
 		ring->xsk_pool = ice_xsk_pool(ring);
 		if (ring->xsk_pool) {
@@ -333,7 +333,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 				/* coverity[check_return] */
 				xdp_rxq_info_reg(&ring->xdp_rxq,
 						 ring->netdev,
-						 ring->q_index);
+						 ring->q_index, ring->q_vector->napi.napi_id);
 
 			err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 							 MEM_TYPE_PAGE_SHARED,
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index eae75260fe20..77d5eae6b4c2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -483,7 +483,7 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
 	if (rx_ring->vsi->type == ICE_VSI_PF &&
 	    !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
 		if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-				     rx_ring->q_index))
+				     rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
 			goto err;
 	return 0;
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 5fc2c381da55..6a4ef4934fcf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4352,7 +4352,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
 
 	/* XDP RX-queue info */
 	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-			     rx_ring->queue_index) < 0)
+			     rx_ring->queue_index, 0) < 0)
 		goto err;
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 45ae33e15303..50e6b8b6ba7b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6577,7 +6577,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 
 	/* XDP RX-queue info */
 	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
-			     rx_ring->queue_index) < 0)
+			     rx_ring->queue_index, rx_ring->q_vector->napi.napi_id) < 0)
 		goto err;
 
 	rx_ring->xdp_prog = adapter->xdp_prog;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 82fce27f682b..4061cd7db5dd 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3493,7 +3493,7 @@ int ixgbevf_setup_rx_resources(struct ixgbevf_adapter *adapter,
 
 	/* XDP RX-queue info */
 	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
-			     rx_ring->queue_index) < 0)
+			     rx_ring->queue_index, 0) < 0)
 		goto err;
 
 	rx_ring->xdp_prog = adapter->xdp_prog;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 183530ed4d1d..ba6dcb19bb1d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3227,7 +3227,7 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
 		return err;
 	}
 
-	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
+	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
 	if (err < 0)
 		goto err_free_pp;
 
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 3069e192d773..5504cbc24970 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2614,11 +2614,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
 	mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);
 
 	if (priv->percpu_pools) {
-		err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id);
+		err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0);
 		if (err < 0)
 			goto err_free_dma;
 
-		err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id);
+		err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0);
 		if (err < 0)
 			goto err_unregister_rxq_short;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b0f79a5151cf..40775cb8fb2a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -283,7 +283,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 	ring->log_stride = ffs(ring->stride) - 1;
 	ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
 
-	if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
+	if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
 		goto err_ring;
 
 	tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 527c5f12c5af..427fc376fe1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -434,7 +434,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	rq_xdp_ix = rq->ix;
 	if (xsk)
 		rq_xdp_ix += params->num_channels * MLX5E_RQ_GROUP_XSK;
-	err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix);
+	err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix, 0);
 	if (err < 0)
 		goto err_rq_xdp_prog;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b150da43adb2..b4acf2f41e84 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2533,7 +2533,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
 
 	if (dp->netdev) {
 		err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
-				       rx_ring->idx);
+				       rx_ring->idx, rx_ring->r_vec->napi.napi_id);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 05e3a3b60269..9cf960a6d007 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1762,7 +1762,7 @@ static void qede_init_fp(struct qede_dev *edev)
 
 			/* Driver have no error path from here */
 			WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
-						 fp->rxq->rxq_id) < 0);
+						 fp->rxq->rxq_id, 0) < 0);
 
 			if (xdp_rxq_info_reg_mem_model(&fp->rxq->xdp_rxq,
 						       MEM_TYPE_PAGE_ORDER0,
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 19cf7cac1e6e..68fc7d317693 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -262,7 +262,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
 
 	/* Initialise XDP queue information */
 	rc = xdp_rxq_info_reg(&rx_queue->xdp_rxq_info, efx->net_dev,
-			      rx_queue->core_index);
+			      rx_queue->core_index, 0);
 
 	if (rc) {
 		netif_err(efx, rx_err, efx->net_dev,
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 1503cc9ec6e2..27d3c9d9210e 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1304,7 +1304,7 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
 		goto err_out;
 	}
 
-	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
+	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0, priv->napi.napi_id);
 	if (err)
 		goto err_out;
 
diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
index 31c5e36ff706..6dd73bd0f458 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.c
+++ b/drivers/net/ethernet/ti/cpsw_priv.c
@@ -1186,7 +1186,7 @@ static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
 	pool = cpsw->page_pool[ch];
 	rxq = &priv->xdp_rxq[ch];
 
-	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
+	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch, 0);
 	if (ret)
 		return ret;
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0c3de94b5178..fa8341f8359a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1499,7 +1499,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
 		u64_stats_init(&nvchan->tx_stats.syncp);
 		u64_stats_init(&nvchan->rx_stats.syncp);
 
-		ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
+		ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i, 0);
 
 		if (ret) {
 			netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3d45d56172cb..8867d39db6ac 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -780,7 +780,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	} else {
 		/* Setup XDP RX-queue info, for new tfile getting attached */
 		err = xdp_rxq_info_reg(&tfile->xdp_rxq,
-				       tun->dev, tfile->queue_index);
+				       tun->dev, tfile->queue_index, 0);
 		if (err < 0)
 			goto out;
 		err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c737668008a..9bd37c7151f8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -884,7 +884,6 @@ static int veth_napi_add(struct net_device *dev)
 	for (i = 0; i < dev->real_num_rx_queues; i++) {
 		struct veth_rq *rq = &priv->rq[i];
 
-		netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
 		napi_enable(&rq->xdp_napi);
 	}
 
@@ -926,7 +925,8 @@ static int veth_enable_xdp(struct net_device *dev)
 		for (i = 0; i < dev->real_num_rx_queues; i++) {
 			struct veth_rq *rq = &priv->rq[i];
 
-			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
+			netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
+			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, rq->xdp_napi.napi_id);
 			if (err < 0)
 				goto err_rxq_reg;
 
@@ -952,8 +952,12 @@ static int veth_enable_xdp(struct net_device *dev)
 err_reg_mem:
 	xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
 err_rxq_reg:
-	for (i--; i >= 0; i--)
-		xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
+	for (i--; i >= 0; i--) {
+		struct veth_rq *rq = &priv->rq[i];
+
+		xdp_rxq_info_unreg(&rq->xdp_rxq);
+		netif_napi_del(&rq->xdp_napi);
+	}
 
 	return err;
 }
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 21b71148c532..052975ea0af4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1485,7 +1485,7 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
+		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
 		if (err < 0)
 			return err;
 
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 920cac4385bf..b01848ef4649 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -2014,7 +2014,7 @@ static int xennet_create_page_pool(struct netfront_queue *queue)
 	}
 
 	err = xdp_rxq_info_reg(&queue->xdp_rxq, queue->info->netdev,
-			       queue->id);
+			       queue->id, 0);
 	if (err) {
 		netdev_err(queue->info->netdev, "xdp_rxq_info_reg failed\n");
 		goto err_free_pp;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 2f8f51807b83..45b3e04b99d3 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -135,14 +135,25 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 	sk_rx_queue_set(sk, skb);
 }
 
-/* variant used for unconnected sockets */
-static inline void sk_mark_napi_id_once(struct sock *sk,
-					const struct sk_buff *skb)
+static inline void __sk_mark_napi_id_once_xdp(struct sock *sk, unsigned int napi_id)
 {
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	if (!READ_ONCE(sk->sk_napi_id))
-		WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
+		WRITE_ONCE(sk->sk_napi_id, napi_id);
 #endif
 }
 
+/* variant used for unconnected sockets */
+static inline void sk_mark_napi_id_once(struct sock *sk,
+					const struct sk_buff *skb)
+{
+	__sk_mark_napi_id_once_xdp(sk, skb->napi_id);
+}
+
+static inline void sk_mark_napi_id_once_xdp(struct sock *sk,
+					    const struct xdp_buff *xdp)
+{
+	__sk_mark_napi_id_once_xdp(sk, xdp->rxq->napi_id);
+}
+
 #endif /* _LINUX_NET_BUSY_POLL_H */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 7d48b2ae217a..700ad5db7f5d 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -59,6 +59,7 @@ struct xdp_rxq_info {
 	u32 queue_index;
 	u32 reg_state;
 	struct xdp_mem_info mem;
+	unsigned int napi_id;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
 struct xdp_txq_info {
@@ -226,7 +227,7 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
 }
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
-		     struct net_device *dev, u32 queue_index);
+		     struct net_device *dev, u32 queue_index, unsigned int napi_id);
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
 bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
diff --git a/net/core/dev.c b/net/core/dev.c
index 7a1e5936c67f..3b6b0e175fe7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9810,7 +9810,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 		rx[i].dev = dev;
 
 		/* XDP RX-queue setup */
-		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
+		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
 		if (err < 0)
 			goto err_rxq_info;
 	}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 3d330ebda893..17ffd33c6b18 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -158,7 +158,7 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
 
 /* Returns 0 on success, negative on failure */
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
-		     struct net_device *dev, u32 queue_index)
+		     struct net_device *dev, u32 queue_index, unsigned int napi_id)
 {
 	if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
 		WARN(1, "Driver promised not to register this");
@@ -179,6 +179,7 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
 	xdp_rxq_info_init(xdp_rxq);
 	xdp_rxq->dev = dev;
 	xdp_rxq->queue_index = queue_index;
+	xdp_rxq->napi_id = napi_id;
 
 	xdp_rxq->reg_state = REG_STATE_REGISTERED;
 	return 0;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ecc4579e41ee..d4cb1c5c1abf 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -233,6 +233,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
+	sk_mark_napi_id_once_xdp(&xs->sk, xdp);
 	len = xdp->data_end - xdp->data;
 
 	return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
-- 
2.27.0


* [PATCH bpf-next v3 07/10] samples/bpf: use recvfrom() in xdpsock/rxdrop
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (5 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  7:59   ` Magnus Karlsson
  2020-11-19  8:30 ` [PATCH bpf-next v3 08/10] samples/bpf: use recvfrom() in xdpsock/l2fwd Björn Töpel
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Start using recvfrom() in the rxdrop scenario.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 2567f0db5aca..f90111b95b2e 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1170,7 +1170,7 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
 	}
 }
 
-static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
+static void rx_drop(struct xsk_socket_info *xsk)
 {
 	unsigned int rcvd, i;
 	u32 idx_rx = 0, idx_fq = 0;
@@ -1180,7 +1180,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 	if (!rcvd) {
 		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
-			ret = poll(fds, num_socks, opt_timeout);
+			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
 		return;
 	}
@@ -1191,7 +1191,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
 			exit_with_error(-ret);
 		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.fill_fail_polls++;
-			ret = poll(fds, num_socks, opt_timeout);
+			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
 		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
 	}
@@ -1233,7 +1233,7 @@ static void rx_drop_all(void)
 		}
 
 		for (i = 0; i < num_socks; i++)
-			rx_drop(xsks[i], fds);
+			rx_drop(xsks[i]);
 
 		if (benchmark_done)
 			break;
-- 
2.27.0


* [PATCH bpf-next v3 08/10] samples/bpf: use recvfrom() in xdpsock/l2fwd
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (6 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 07/10] samples/bpf: use recvfrom() in xdpsock/rxdrop Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  8:00   ` Magnus Karlsson
  2020-11-19  8:30 ` [PATCH bpf-next v3 09/10] samples/bpf: add busy-poll support to xdpsock Björn Töpel
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Start using recvfrom() in the l2fwd scenario instead of poll(), which
is more expensive and needs additional knobs for busy-polling.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index f90111b95b2e..24aa7511c4c8 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1098,8 +1098,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
 	exit_with_error(errno);
 }
 
-static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
-				     struct pollfd *fds)
+static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 {
 	struct xsk_umem_info *umem = xsk->umem;
 	u32 idx_cq = 0, idx_fq = 0;
@@ -1134,7 +1133,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 				exit_with_error(-ret);
 			if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
 				xsk->app_stats.fill_fail_polls++;
-				ret = poll(fds, num_socks, opt_timeout);
+				recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 			}
 			ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
 		}
@@ -1331,19 +1330,19 @@ static void tx_only_all(void)
 		complete_tx_only_all();
 }
 
-static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
+static void l2fwd(struct xsk_socket_info *xsk)
 {
 	unsigned int rcvd, i;
 	u32 idx_rx = 0, idx_tx = 0;
 	int ret;
 
-	complete_tx_l2fwd(xsk, fds);
+	complete_tx_l2fwd(xsk);
 
 	rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
 	if (!rcvd) {
 		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
-			ret = poll(fds, num_socks, opt_timeout);
+			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
 		return;
 	}
@@ -1353,7 +1352,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
 	while (ret != rcvd) {
 		if (ret < 0)
 			exit_with_error(-ret);
-		complete_tx_l2fwd(xsk, fds);
+		complete_tx_l2fwd(xsk);
 		if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
 			xsk->app_stats.tx_wakeup_sendtos++;
 			kick_tx(xsk);
@@ -1388,22 +1387,20 @@ static void l2fwd_all(void)
 	struct pollfd fds[MAX_SOCKS] = {};
 	int i, ret;
 
-	for (i = 0; i < num_socks; i++) {
-		fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
-		fds[i].events = POLLOUT | POLLIN;
-	}
-
 	for (;;) {
 		if (opt_poll) {
-			for (i = 0; i < num_socks; i++)
+			for (i = 0; i < num_socks; i++) {
+				fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
+				fds[i].events = POLLOUT | POLLIN;
 				xsks[i]->app_stats.opt_polls++;
+			}
 			ret = poll(fds, num_socks, opt_timeout);
 			if (ret <= 0)
 				continue;
 		}
 
 		for (i = 0; i < num_socks; i++)
-			l2fwd(xsks[i], fds);
+			l2fwd(xsks[i]);
 
 		if (benchmark_done)
 			break;
-- 
2.27.0


* [PATCH bpf-next v3 09/10] samples/bpf: add busy-poll support to xdpsock
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (7 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 08/10] samples/bpf: use recvfrom() in xdpsock/l2fwd Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  8:19   ` Magnus Karlsson
  2020-11-19  8:30 ` [PATCH bpf-next v3 10/10] samples/bpf: add option to set the busy-poll budget Björn Töpel
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Add a new option to xdpsock, 'B', for busy-polling. This option will
also set the batching size, 'b' option, to the busy-poll budget.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 40 +++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 24aa7511c4c8..cb1eaee8a32b 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -95,6 +95,7 @@ static int opt_timeout = 1000;
 static bool opt_need_wakeup = true;
 static u32 opt_num_xsks = 1;
 static u32 prog_id;
+static bool opt_busy_poll;
 
 struct xsk_ring_stats {
 	unsigned long rx_npkts;
@@ -911,6 +912,7 @@ static struct option long_options[] = {
 	{"quiet", no_argument, 0, 'Q'},
 	{"app-stats", no_argument, 0, 'a'},
 	{"irq-string", no_argument, 0, 'I'},
+	{"busy-poll", no_argument, 0, 'B'},
 	{0, 0, 0, 0}
 };
 
@@ -949,6 +951,7 @@ static void usage(const char *prog)
 		"  -Q, --quiet          Do not display any stats.\n"
 		"  -a, --app-stats	Display application (syscall) statistics.\n"
 		"  -I, --irq-string	Display driver interrupt statistics for interface associated with irq-string.\n"
+		"  -B, --busy-poll      Busy poll.\n"
 		"\n";
 	fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE,
 		opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE,
@@ -964,7 +967,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:",
+		c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:B",
 				long_options, &option_index);
 		if (c == -1)
 			break;
@@ -1062,7 +1065,9 @@ static void parse_command_line(int argc, char **argv)
 				fprintf(stderr, "ERROR: Failed to get irqs for %s\n", opt_irq_str);
 				usage(basename(argv[0]));
 			}
-
+			break;
+		case 'B':
+			opt_busy_poll = 1;
 			break;
 		default:
 			usage(basename(argv[0]));
@@ -1131,7 +1136,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 		while (ret != rcvd) {
 			if (ret < 0)
 				exit_with_error(-ret);
-			if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
+			if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&umem->fq)) {
 				xsk->app_stats.fill_fail_polls++;
 				recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 			}
@@ -1177,7 +1182,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
 
 	rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
 	if (!rcvd) {
-		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
 			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
@@ -1188,7 +1193,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
 	while (ret != rcvd) {
 		if (ret < 0)
 			exit_with_error(-ret);
-		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.fill_fail_polls++;
 			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
@@ -1340,7 +1345,7 @@ static void l2fwd(struct xsk_socket_info *xsk)
 
 	rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
 	if (!rcvd) {
-		if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
 			xsk->app_stats.rx_empty_polls++;
 			recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
 		}
@@ -1353,7 +1358,7 @@ static void l2fwd(struct xsk_socket_info *xsk)
 		if (ret < 0)
 			exit_with_error(-ret);
 		complete_tx_l2fwd(xsk);
-		if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
+		if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->tx)) {
 			xsk->app_stats.tx_wakeup_sendtos++;
 			kick_tx(xsk);
 		}
@@ -1458,6 +1463,24 @@ static void enter_xsks_into_map(struct bpf_object *obj)
 	}
 }
 
+static void apply_setsockopt(struct xsk_socket_info *xsk)
+{
+	int sock_opt;
+
+	if (!opt_busy_poll)
+		return;
+
+	sock_opt = 1;
+	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_PREFER_BUSY_POLL,
+		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
+		exit_with_error(errno);
+
+	sock_opt = 20;
+	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
+		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
+		exit_with_error(errno);
+}
+
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -1499,6 +1522,9 @@ int main(int argc, char **argv)
 	for (i = 0; i < opt_num_xsks; i++)
 		xsks[num_socks++] = xsk_configure_socket(umem, rx, tx);
 
+	for (i = 0; i < opt_num_xsks; i++)
+		apply_setsockopt(xsks[i]);
+
 	if (opt_bench == BENCH_TXONLY) {
 		gen_eth_hdr_data();
 
-- 
2.27.0


* [PATCH bpf-next v3 10/10] samples/bpf: add option to set the busy-poll budget
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (8 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 09/10] samples/bpf: add busy-poll support to xdpsock Björn Töpel
@ 2020-11-19  8:30 ` Björn Töpel
  2020-11-25  8:23   ` Magnus Karlsson
  2020-11-23 13:31 ` [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
  2020-11-24  0:14 ` Jakub Kicinski
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-19  8:30 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi

From: Björn Töpel <bjorn.topel@intel.com>

Add support for the SO_BUSY_POLL_BUDGET setsockopt, via the batching
option ('b').

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 samples/bpf/xdpsock_user.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index cb1eaee8a32b..deba623e9003 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1479,6 +1479,11 @@ static void apply_setsockopt(struct xsk_socket_info *xsk)
 	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
 		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
 		exit_with_error(errno);
+
+	sock_opt = opt_batch_size;
+	if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL_BUDGET,
+		       (void *)&sock_opt, sizeof(sock_opt)) < 0)
+		exit_with_error(errno);
 }
 
 int main(int argc, char **argv)
-- 
2.27.0


* Re: [PATCH bpf-next v3 00/10] Introduce preferred busy-polling
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (9 preceding siblings ...)
  2020-11-19  8:30 ` [PATCH bpf-next v3 10/10] samples/bpf: add option to set the busy-poll budget Björn Töpel
@ 2020-11-23 13:31 ` Björn Töpel
  2020-11-23 23:54   ` Jakub Kicinski
  2020-11-24  0:14 ` Jakub Kicinski
  11 siblings, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2020-11-23 13:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski
  Cc: Björn Töpel, Netdev, Karlsson, Magnus, bpf,
	Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jonathan Lemon, maximmi

On Thu, 19 Nov 2020 at 09:30, Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> This series introduces three new features:
>
> 1. A new "heavy traffic" busy-polling variant that works in concert
>    with the existing napi_defer_hard_irqs and gro_flush_timeout knobs.
>
> 2. A new socket option that let a user change the busy-polling NAPI
>    budget.
>
> 3. Allow busy-polling to be performed on XDP sockets.
>
> The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
> option or system-wide using the /proc/sys/net/core/busy_read knob, is
> an opportunistic. That means that if the NAPI context is not
> scheduled, it will poll it. If, after busy-polling, the budget is
> exceeded the busy-polling logic will schedule the NAPI onto the
> regular softirq handling.
>
> One implication of the behavior above is that a busy/heavy loaded NAPI
> context will never enter/allow for busy-polling. Some applications
> prefer that most NAPI processing would be done by busy-polling.
>
> This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
> in concert with the napi_defer_hard_irqs and gro_flush_timeout
> knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
> introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
> feature"), and allows for a user to defer interrupts to be enabled and
> instead schedule the NAPI context from a watchdog timer. When a user
> enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
> and the NAPI context is being processed by a softirq, the softirq NAPI
> processing will exit early to allow the busy-polling to be performed.
>
> If the application stops performing busy-polling via a system call,
> the watchdog timer defined by gro_flush_timeout will timeout, and
> regular softirq handling will resume.
>
> In summary; Heavy traffic applications that prefer busy-polling over
> softirq processing should use this option.
>

Eric/Jakub, any more thoughts/input? Tomatoes? :-P


Thank you,
Björn

* Re: [PATCH bpf-next v3 00/10] Introduce preferred busy-polling
  2020-11-23 13:31 ` [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
@ 2020-11-23 23:54   ` Jakub Kicinski
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Kicinski @ 2020-11-23 23:54 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Eric Dumazet, Björn Töpel, Netdev, Karlsson, Magnus,
	bpf, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jonathan Lemon, maximmi

On Mon, 23 Nov 2020 14:31:14 +0100 Björn Töpel wrote:
> Eric/Jakub, any more thoughts/input? Tomatoes? :-P

Looking now, sorry for the delay. Somehow patches without net in their
tag feel like they can wait..

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
  2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
@ 2020-11-24  0:04   ` Jakub Kicinski
  2020-11-24  7:58     ` Björn Töpel
  2020-11-24  0:11   ` Jakub Kicinski
  2020-11-24 16:21   ` Jakub Kicinski
  2 siblings, 1 reply; 33+ messages in thread
From: Jakub Kicinski @ 2020-11-24  0:04 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi

On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
> +	/* The NAPI context has more processing work, but busy-polling
> +	 * is preferred. Exit early.
> +	 */
> +	if (napi_prefer_busy_poll(n)) {
> +		if (napi_complete_done(n, work)) {
> +			/* If timeout is not set, we need to make sure
> +			 * that the NAPI is re-scheduled.
> +			 */
> +			napi_schedule(n);
> +		}
> +		goto out_unlock;
> +	}

Do we really need to go through napi_complete_done() here?

Isn't it sufficient to check:

	if (napi_prefer_busy_poll(n) && 
	    hrtimer_active(&n->timer)) // not 100% sure this is the
	                               // right helper for the check

If the timer is scheduled it will fire, and in the worst case softirq
will kick back in after the timeout. napi_complete_done() should have
been called by the driver already to schedule the timer. If the driver
doesn't call napi_complete_done() we should not allow it to use
busy_poll() anyway.

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
  2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
  2020-11-24  0:04   ` Jakub Kicinski
@ 2020-11-24  0:11   ` Jakub Kicinski
  2020-11-24  8:47     ` Björn Töpel
  2020-11-24 16:21   ` Jakub Kicinski
  2 siblings, 1 reply; 33+ messages in thread
From: Jakub Kicinski @ 2020-11-24  0:11 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi

On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
> @@ -105,7 +105,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
>  	unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
>  
>  	if (napi_id >= MIN_NAPI_ID)
> -		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
> +		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
> +			       READ_ONCE(sk->sk_prefer_busy_poll));

Perhaps a noob question, but aren't all accesses to the new sk members
under the socket lock? Do we really need the READ_ONCE() / WRITE_ONCE()?

* Re: [PATCH bpf-next v3 00/10] Introduce preferred busy-polling
  2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
                   ` (10 preceding siblings ...)
  2020-11-23 13:31 ` [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
@ 2020-11-24  0:14 ` Jakub Kicinski
  11 siblings, 0 replies; 33+ messages in thread
From: Jakub Kicinski @ 2020-11-24  0:14 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, bjorn.topel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi

On Thu, 19 Nov 2020 09:30:14 +0100 Björn Töpel wrote:
> Performance netperf UDP_RR:
> 
> Note that netperf UDP_RR is not a heavy traffic tests, and preferred
> busy-polling is not typically something we want to use here.
> 
>   $ echo 20 | sudo tee /proc/sys/net/core/busy_read
>   $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
>       -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
> 
> busy-polling blocking sockets:            12,13.33,224,0.63,74731.177
> 
> I hacked netperf to use non-blocking sockets and re-ran:
> 
> busy-polling non-blocking sockets:        12,13.46,218,0.72,73991.172
> prefer busy-polling non-blocking sockets: 12,13.62,221,0.59,73138.448
> 
> Using the preferred busy-polling mode does not impact performance.
> 
> The above tests was done for the 'ice' driver.

Any interest in this work from ADQ folks? I recall they were using
memcache with busy polling for their tests, it'd be cool to see how
much this helps memcache on P99+ latency!

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
  2020-11-24  0:04   ` Jakub Kicinski
@ 2020-11-24  7:58     ` Björn Töpel
  0 siblings, 0 replies; 33+ messages in thread
From: Björn Töpel @ 2020-11-24  7:58 UTC (permalink / raw)
  To: Jakub Kicinski, Björn Töpel
  Cc: netdev, bpf, magnus.karlsson, ast, daniel, maciej.fijalkowski,
	sridhar.samudrala, jesse.brandeburg, qi.z.zhang, edumazet,
	jonathan.lemon, maximmi

On 2020-11-24 01:04, Jakub Kicinski wrote:
> On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
>> +	/* The NAPI context has more processing work, but busy-polling
>> +	 * is preferred. Exit early.
>> +	 */
>> +	if (napi_prefer_busy_poll(n)) {
>> +		if (napi_complete_done(n, work)) {
>> +			/* If timeout is not set, we need to make sure
>> +			 * that the NAPI is re-scheduled.
>> +			 */
>> +			napi_schedule(n);
>> +		}
>> +		goto out_unlock;
>> +	}
> 
> Do we really need to go through napi_complete_done() here?
> 
> Isn't it sufficient to check:
> 
> 	if (napi_prefer_busy_poll(n) &&
> 	    hrtimer_active(&n->timer)) // not 100% sure this is the
> 	                               // right helper for the check
> 
> If timer is scheduled it will fire and worst case sirq will kick back
> in after timeout. napi_complete_done() should had been called by the
> driver already to schedule the timer. If the driver doesn't call
> napi_complete_done() we should not allow it to use busy_poll() anyway.
> 

No, it's not. Under a heavy traffic load, the driver will never call
napi_complete_done(); it will just keep spinning in ksoftirqd. This
code forces an exit from that loop, so we need to call
napi_complete_done() explicitly (which will set the timeout).

Without the explicit napi_complete_done(), ksoftirqd will not stop,
and busy-polling will never be allowed to enter.
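
To make the argument above concrete, here is a toy userspace model of
it. This is only a sketch: every toy_* name is hypothetical, none of it
is kernel code, and it leaves out the re-schedule that the patch does
when no timeout is configured. It only shows that a poll loop which
always consumes its full budget never reaches the driver's own
completion path, so an explicit early exit is needed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context; not the kernel struct. */
struct toy_napi {
	int pending;            /* packets waiting on the queue */
	bool prefer_busy_poll;  /* models the "prefer busy-poll" state */
	bool timer_armed;       /* models the watchdog timer being set */
};

/* Models a driver poll: consume at most 'budget' packets. */
static int toy_driver_poll(struct toy_napi *n, int budget)
{
	int work = n->pending < budget ? n->pending : budget;

	n->pending -= work;
	return work;
}

/* Models softirq processing; returns how many polls were done. */
static int toy_softirq_run(struct toy_napi *n, int budget, int max_rounds)
{
	int rounds = 0;

	while (rounds < max_rounds) {
		int work = toy_driver_poll(n, budget);

		rounds++;
		if (work < budget) {
			/* Queue drained: the driver completes on its own. */
			n->timer_armed = true;
			break;
		}
		if (n->prefer_busy_poll) {
			/* More work remains, but force completion so the
			 * busy-polling caller gets a chance to run.
			 */
			n->timer_armed = true;
			break;
		}
	}
	return rounds;
}
```

With a large backlog and prefer_busy_poll unset, the model spins for
all max_rounds and never arms the timer; with it set, it exits after a
single poll with the timer armed.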


Björn

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
  2020-11-24  0:11   ` Jakub Kicinski
@ 2020-11-24  8:47     ` Björn Töpel
  0 siblings, 0 replies; 33+ messages in thread
From: Björn Töpel @ 2020-11-24  8:47 UTC (permalink / raw)
  To: Jakub Kicinski, Björn Töpel
  Cc: netdev, bpf, magnus.karlsson, ast, daniel, maciej.fijalkowski,
	sridhar.samudrala, jesse.brandeburg, qi.z.zhang, edumazet,
	jonathan.lemon, maximmi


On 2020-11-24 01:11, Jakub Kicinski wrote:
> On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
>> @@ -105,7 +105,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
>>   	unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
>>   
>>   	if (napi_id >= MIN_NAPI_ID)
>> -		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
>> +		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
>> +			       READ_ONCE(sk->sk_prefer_busy_poll));
> 
> Perhaps a noob question, but aren't all accesses to the new sk members
> under the socket lock? Do we really need the READ_ONCE() / WRITE_ONCE()?
> 

No, only when setting them via sock_setsockopt. Reading is done outside
the lock.
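
As a rough userspace analogy of that pattern (writer under a lock,
reader outside it), here is a minimal C11 sketch. The toy_* names are
hypothetical, and a relaxed atomic load/store is only an approximation
of what READ_ONCE()/WRITE_ONCE() provide in the kernel; the point is
that the lockless reader still needs an access the compiler may not
tear or re-read.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct toy_sock {
	atomic_flag lock;              /* models the socket lock */
	atomic_bool prefer_busy_poll;  /* written locked, read lockless */
};

static void toy_sock_init(struct toy_sock *sk)
{
	atomic_flag_clear(&sk->lock);
	atomic_init(&sk->prefer_busy_poll, false);
}

/* Writer side: models the setsockopt path, which holds the lock. */
static void toy_setsockopt_prefer(struct toy_sock *sk, bool val)
{
	while (atomic_flag_test_and_set_explicit(&sk->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	atomic_store_explicit(&sk->prefer_busy_poll, val,
			      memory_order_relaxed);
	atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/* Reader side: models the busy-loop path, which takes no lock. */
static bool toy_busy_loop_prefer(struct toy_sock *sk)
{
	return atomic_load_explicit(&sk->prefer_busy_poll,
				    memory_order_relaxed);
}
```

The lock serializes concurrent writers only; since the reader never
takes it, the annotated access is what keeps the read well-defined.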


Björn

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
  2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
  2020-11-24  0:04   ` Jakub Kicinski
  2020-11-24  0:11   ` Jakub Kicinski
@ 2020-11-24 16:21   ` Jakub Kicinski
  2 siblings, 0 replies; 33+ messages in thread
From: Jakub Kicinski @ 2020-11-24 16:21 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi

On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
> option or system-wide using the /proc/sys/net/core/busy_read knob, is
> an opportunistic. That means that if the NAPI context is not
> scheduled, it will poll it. If, after busy-polling, the budget is
> exceeded the busy-polling logic will schedule the NAPI onto the
> regular softirq handling.
> 
> One implication of the behavior above is that a busy/heavy loaded NAPI
> context will never enter/allow for busy-polling. Some applications
> prefer that most NAPI processing would be done by busy-polling.
> 
> This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
> in concert with the napi_defer_hard_irqs and gro_flush_timeout
> knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
> introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
> feature"), and allows for a user to defer interrupts to be enabled and
> instead schedule the NAPI context from a watchdog timer. When a user
> enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
> and the NAPI context is being processed by a softirq, the softirq NAPI
> processing will exit early to allow the busy-polling to be performed.
> 
> If the application stops performing busy-polling via a system call,
> the watchdog timer defined by gro_flush_timeout will timeout, and
> regular softirq handling will resume.
> 
> In summary; Heavy traffic applications that prefer busy-polling over
> softirq processing should use this option.
> 
> Example usage:
> 
>   $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
>   $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
> 
> Note that the timeout should be larger than the userspace processing
> window, otherwise the watchdog will timeout and fall back to regular
> softirq processing.
> 
> Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

* Re: [PATCH bpf-next v3 02/10] net: add SO_BUSY_POLL_BUDGET socket option
  2020-11-19  8:30 ` [PATCH bpf-next v3 02/10] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
@ 2020-11-24 16:21   ` Jakub Kicinski
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Kicinski @ 2020-11-24 16:21 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi

On Thu, 19 Nov 2020 09:30:16 +0100 Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> This option lets a user set a per socket NAPI budget for
> busy-polling. If the options is not set, it will use the default of 8.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
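
For readers who want to try the options from patches 01 and 02 on an
ordinary socket, a minimal sketch follows. Assumptions: the fallback
values 69 and 70 are the asm-generic numbers for SO_PREFER_BUSY_POLL
and SO_BUSY_POLL_BUDGET (some architectures use different ones, and
older headers lack them), and the toy_* helpers are hypothetical, not
part of this series.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for older userspace headers (asm-generic values assumed). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

struct toy_sockopt {
	int name;
	int value;
};

/* Fill 'out' with the three options to set; returns the count. */
static int toy_busy_poll_plan(int usecs, int budget,
			      struct toy_sockopt out[3])
{
	out[0] = (struct toy_sockopt){ SO_PREFER_BUSY_POLL, 1 };
	out[1] = (struct toy_sockopt){ SO_BUSY_POLL, usecs };
	out[2] = (struct toy_sockopt){ SO_BUSY_POLL_BUDGET, budget };
	return 3;
}

/* Apply the plan to 'fd'; returns 0, or -errno on the first failure. */
static int toy_busy_poll_apply(int fd, int usecs, int budget)
{
	struct toy_sockopt plan[3];
	int i, n = toy_busy_poll_plan(usecs, budget, plan);

	for (i = 0; i < n; i++) {
		if (setsockopt(fd, SOL_SOCKET, plan[i].name,
			       &plan[i].value, sizeof(plan[i].value)) < 0)
			return -errno;
	}
	return 0;
}
```

A call such as toy_busy_poll_apply(fd, 20, 8) would request a 20 usec
busy-poll time and the default budget of 8. Depending on the kernel
version and the caller's privileges, the setsockopt() calls may fail,
so the return value should be checked.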

* Re: [PATCH bpf-next v3 03/10] xsk: add support for recvmsg()
  2020-11-19  8:30 ` [PATCH bpf-next v3 03/10] xsk: add support for recvmsg() Björn Töpel
@ 2020-11-25  6:55   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  6:55 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:32 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add support for non-blocking recvmsg() to XDP sockets. Previously,
> only sendmsg() was supported by XDP socket. Now, for symmetry and the
> upcoming busy-polling support, recvmsg() is added.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  net/xdp/xsk.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index b0141973f23e..56a52ec75696 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -531,6 +531,26 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>         return __xsk_sendmsg(sk);
>  }
>
> +static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
> +{
> +       bool need_wait = !(flags & MSG_DONTWAIT);
> +       struct sock *sk = sock->sk;
> +       struct xdp_sock *xs = xdp_sk(sk);
> +
> +       if (unlikely(!(xs->dev->flags & IFF_UP)))
> +               return -ENETDOWN;
> +       if (unlikely(!xs->rx))
> +               return -ENOBUFS;
> +       if (unlikely(!xsk_is_bound(xs)))
> +               return -ENXIO;
> +       if (unlikely(need_wait))
> +               return -EOPNOTSUPP;
> +
> +       if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
> +               return xsk_wakeup(xs, XDP_WAKEUP_RX);
> +       return 0;
> +}
> +
>  static __poll_t xsk_poll(struct file *file, struct socket *sock,
>                              struct poll_table_struct *wait)
>  {
> @@ -1191,7 +1211,7 @@ static const struct proto_ops xsk_proto_ops = {
>         .setsockopt     = xsk_setsockopt,
>         .getsockopt     = xsk_getsockopt,
>         .sendmsg        = xsk_sendmsg,
> -       .recvmsg        = sock_no_recvmsg,
> +       .recvmsg        = xsk_recvmsg,
>         .mmap           = xsk_mmap,
>         .sendpage       = sock_no_sendpage,
>  };
> --
> 2.27.0
>

* Re: [PATCH bpf-next v3 04/10] xsk: check need wakeup flag in sendmsg()
  2020-11-19  8:30 ` [PATCH bpf-next v3 04/10] xsk: check need wakeup flag in sendmsg() Björn Töpel
@ 2020-11-25  7:16   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  7:16 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:33 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add a check for need wakeup in sendmsg(), so that if a user calls
> sendmsg() when no wakeup is needed, no wakeup is triggered.
>
> To simplify the need wakeup check in the syscall, unconditionally
> enable the need wakeup flag for Tx. This has a side effect for poll():
> if poll() is called on a socket with need wakeup disabled, a Tx
> wakeup is unconditionally performed.
>
> The wakeup matrix for AF_XDP now looks like:
>
> need wakeup | poll()       | sendmsg()   | recvmsg()
> ------------+--------------+-------------+------------
> disabled    | wake Tx      | wake Tx     | nop
> enabled     | check flag;  | check flag; | check flag;
>             |   wake Tx/Rx |   wake Tx   |   wake Rx
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  net/xdp/xsk.c           |  6 +++++-
>  net/xdp/xsk_buff_pool.c | 13 ++++++-------
>  2 files changed, 11 insertions(+), 8 deletions(-)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
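
The wakeup matrix in the commit message can be read as "mask the cached flag with the directions a syscall is allowed to kick". A minimal user-space sketch of that reading follows. The names model_wakeups(), MODEL_WAKEUP_RX/TX and enum xsk_call are hypothetical and exist only in this model; the kernel's XDP_WAKEUP_RX/XDP_WAKEUP_TX play the corresponding role. The extra xs->zc condition on the recvmsg() path is left out for brevity.

```c
#include <stdbool.h>

/* Model-only flag bits; stand-ins for the kernel's wakeup flags. */
#define MODEL_WAKEUP_RX (1u << 0)
#define MODEL_WAKEUP_TX (1u << 1)

/* Hypothetical enum naming the three syscalls in the matrix. */
enum xsk_call { CALL_POLL, CALL_SENDMSG, CALL_RECVMSG };

/*
 * Return the set of wakeups a syscall would perform.
 *
 * With need wakeup disabled, the cached flag stays at its initial
 * value (Tx only), because the driver never updates it. With need
 * wakeup enabled, driver_flags is whatever the driver last set.
 */
unsigned int model_wakeups(enum xsk_call call, bool uses_need_wakeup,
                           unsigned int driver_flags)
{
        unsigned int cached = MODEL_WAKEUP_TX;
        unsigned int mask = 0;

        if (uses_need_wakeup)
                cached = driver_flags;

        switch (call) {
        case CALL_POLL:
                mask = MODEL_WAKEUP_RX | MODEL_WAKEUP_TX;
                break;
        case CALL_SENDMSG:
                mask = MODEL_WAKEUP_TX;
                break;
        case CALL_RECVMSG:
                mask = MODEL_WAKEUP_RX;
                break;
        }

        return cached & mask;
}
```

Under this model the "disabled" row falls out of the unconditional XDP_WAKEUP_TX initialization: poll() and sendmsg() see the Tx bit, recvmsg() masks it away.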

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 56a52ec75696..bf0f5c34af6c 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -522,13 +522,17 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>         bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
>         struct sock *sk = sock->sk;
>         struct xdp_sock *xs = xdp_sk(sk);
> +       struct xsk_buff_pool *pool;
>
>         if (unlikely(!xsk_is_bound(xs)))
>                 return -ENXIO;
>         if (unlikely(need_wait))
>                 return -EOPNOTSUPP;
>
> -       return __xsk_sendmsg(sk);
> +       pool = xs->pool;
> +       if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
> +               return __xsk_sendmsg(sk);
> +       return 0;
>  }
>
>  static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 8a3bf4e1318e..96bb607853ad 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -144,14 +144,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
>         if (err)
>                 return err;
>
> -       if (flags & XDP_USE_NEED_WAKEUP) {
> +       if (flags & XDP_USE_NEED_WAKEUP)
>                 pool->uses_need_wakeup = true;
> -               /* Tx needs to be explicitly woken up the first time.
> -                * Also for supporting drivers that do not implement this
> -                * feature. They will always have to call sendto().
> -                */
> -               pool->cached_need_wakeup = XDP_WAKEUP_TX;
> -       }
> +       /* Tx needs to be explicitly woken up the first time.  Also
> +        * for supporting drivers that do not implement this
> +        * feature. They will always have to call sendto() or poll().
> +        */
> +       pool->cached_need_wakeup = XDP_WAKEUP_TX;
>
>         dev_hold(netdev);
>
> --
> 2.27.0
>


* Re: [PATCH bpf-next v3 05/10] xsk: add busy-poll support for {recv,send}msg()
  2020-11-19  8:30 ` [PATCH bpf-next v3 05/10] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
@ 2020-11-25  7:58   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  7:58 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:33 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Wire up XDP socket busy-poll support for recvmsg() and sendmsg(). If
> the XDP socket prefers busy-polling, make sure that no wakeup/IPI is
> performed.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  net/xdp/xsk.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
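
The xsk_no_wakeup() predicate added by this patch needs all three conditions to hold before the wakeup is skipped. A minimal user-space sketch of that logic is below. struct sock_model, model_no_wakeup() and MODEL_MIN_NAPI_ID are hypothetical names for this model, and the value 65 is an assumption chosen for illustration only; the kernel's MIN_NAPI_ID depends on the build configuration.

```c
#include <stdbool.h>

/* Assumed threshold for this model only; not the kernel's value. */
#define MODEL_MIN_NAPI_ID 65u

/* Hypothetical stand-in for the three struct sock fields the patch reads. */
struct sock_model {
        bool prefer_busy_poll;  /* set via SO_PREFER_BUSY_POLL */
        unsigned int ll_usec;   /* set via SO_BUSY_POLL        */
        unsigned int napi_id;   /* picked up in the Rx path    */
};

/* True when the syscall should return without issuing a wakeup. */
bool model_no_wakeup(const struct sock_model *sk)
{
        return sk->prefer_busy_poll && sk->ll_usec &&
               sk->napi_id >= MODEL_MIN_NAPI_ID;
}
```

A socket that prefers busy-polling but has not yet seen a valid napi_id still takes the regular wakeup path in this model.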

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index bf0f5c34af6c..ecc4579e41ee 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -23,6 +23,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/rculist.h>
>  #include <net/xdp_sock_drv.h>
> +#include <net/busy_poll.h>
>  #include <net/xdp.h>
>
>  #include "xsk_queue.h"
> @@ -517,6 +518,17 @@ static int __xsk_sendmsg(struct sock *sk)
>         return xs->zc ? xsk_zc_xmit(xs) : xsk_generic_xmit(sk);
>  }
>
> +static bool xsk_no_wakeup(struct sock *sk)
> +{
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +       /* Prefer busy-polling, skip the wakeup. */
> +       return READ_ONCE(sk->sk_prefer_busy_poll) && READ_ONCE(sk->sk_ll_usec) &&
> +               READ_ONCE(sk->sk_napi_id) >= MIN_NAPI_ID;
> +#else
> +       return false;
> +#endif
> +}
> +
>  static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>  {
>         bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> @@ -529,6 +541,12 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>         if (unlikely(need_wait))
>                 return -EOPNOTSUPP;
>
> +       if (sk_can_busy_loop(sk))
> +               sk_busy_loop(sk, 1); /* only support non-blocking sockets */
> +
> +       if (xsk_no_wakeup(sk))
> +               return 0;
> +
>         pool = xs->pool;
>         if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
>                 return __xsk_sendmsg(sk);
> @@ -550,6 +568,12 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
>         if (unlikely(need_wait))
>                 return -EOPNOTSUPP;
>
> +       if (sk_can_busy_loop(sk))
> +               sk_busy_loop(sk, 1); /* only support non-blocking sockets */
> +
> +       if (xsk_no_wakeup(sk))
> +               return 0;
> +
>         if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
>                 return xsk_wakeup(xs, XDP_WAKEUP_RX);
>         return 0;
> --
> 2.27.0
>


* Re: [PATCH bpf-next v3 07/10] samples/bpf: use recvfrom() in xdpsock/rxdrop
  2020-11-19  8:30 ` [PATCH bpf-next v3 07/10] samples/bpf: use recvfrom() in xdpsock/rxdrop Björn Töpel
@ 2020-11-25  7:59   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  7:59 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Start using recvfrom() in the rxdrop scenario.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>

> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index 2567f0db5aca..f90111b95b2e 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -1170,7 +1170,7 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
>         }
>  }
>
> -static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
> +static void rx_drop(struct xsk_socket_info *xsk)
>  {
>         unsigned int rcvd, i;
>         u32 idx_rx = 0, idx_fq = 0;
> @@ -1180,7 +1180,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
>         if (!rcvd) {
>                 if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
>                         xsk->app_stats.rx_empty_polls++;
> -                       ret = poll(fds, num_socks, opt_timeout);
> +                       recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                 }
>                 return;
>         }
> @@ -1191,7 +1191,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
>                         exit_with_error(-ret);
>                 if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
>                         xsk->app_stats.fill_fail_polls++;
> -                       ret = poll(fds, num_socks, opt_timeout);
> +                       recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                 }
>                 ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
>         }
> @@ -1233,7 +1233,7 @@ static void rx_drop_all(void)
>                 }
>
>                 for (i = 0; i < num_socks; i++)
> -                       rx_drop(xsks[i], fds);
> +                       rx_drop(xsks[i]);
>
>                 if (benchmark_done)
>                         break;
> --
> 2.27.0
>


* Re: [PATCH bpf-next v3 08/10] samples/bpf: use recvfrom() in xdpsock/l2fwd
  2020-11-19  8:30 ` [PATCH bpf-next v3 08/10] samples/bpf: use recvfrom() in xdpsock/l2fwd Björn Töpel
@ 2020-11-25  8:00   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  8:00 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:33 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Start using recvfrom() in the l2fwd scenario instead of poll(), which
> is more expensive and needs additional knobs for busy-polling.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>

> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index f90111b95b2e..24aa7511c4c8 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -1098,8 +1098,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
>         exit_with_error(errno);
>  }
>
> -static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
> -                                    struct pollfd *fds)
> +static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
>  {
>         struct xsk_umem_info *umem = xsk->umem;
>         u32 idx_cq = 0, idx_fq = 0;
> @@ -1134,7 +1133,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
>                                 exit_with_error(-ret);
>                         if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
>                                 xsk->app_stats.fill_fail_polls++;
> -                               ret = poll(fds, num_socks, opt_timeout);
> +                               recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                         }
>                         ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
>                 }
> @@ -1331,19 +1330,19 @@ static void tx_only_all(void)
>                 complete_tx_only_all();
>  }
>
> -static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
> +static void l2fwd(struct xsk_socket_info *xsk)
>  {
>         unsigned int rcvd, i;
>         u32 idx_rx = 0, idx_tx = 0;
>         int ret;
>
> -       complete_tx_l2fwd(xsk, fds);
> +       complete_tx_l2fwd(xsk);
>
>         rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
>         if (!rcvd) {
>                 if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
>                         xsk->app_stats.rx_empty_polls++;
> -                       ret = poll(fds, num_socks, opt_timeout);
> +                       recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                 }
>                 return;
>         }
> @@ -1353,7 +1352,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
>         while (ret != rcvd) {
>                 if (ret < 0)
>                         exit_with_error(-ret);
> -               complete_tx_l2fwd(xsk, fds);
> +               complete_tx_l2fwd(xsk);
>                 if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
>                         xsk->app_stats.tx_wakeup_sendtos++;
>                         kick_tx(xsk);
> @@ -1388,22 +1387,20 @@ static void l2fwd_all(void)
>         struct pollfd fds[MAX_SOCKS] = {};
>         int i, ret;
>
> -       for (i = 0; i < num_socks; i++) {
> -               fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
> -               fds[i].events = POLLOUT | POLLIN;
> -       }
> -
>         for (;;) {
>                 if (opt_poll) {
> -                       for (i = 0; i < num_socks; i++)
> +                       for (i = 0; i < num_socks; i++) {
> +                               fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
> +                               fds[i].events = POLLOUT | POLLIN;
>                                 xsks[i]->app_stats.opt_polls++;
> +                       }
>                         ret = poll(fds, num_socks, opt_timeout);
>                         if (ret <= 0)
>                                 continue;
>                 }
>
>                 for (i = 0; i < num_socks; i++)
> -                       l2fwd(xsks[i], fds);
> +                       l2fwd(xsks[i]);
>
>                 if (benchmark_done)
>                         break;
> --
> 2.27.0
>


* Re: [PATCH bpf-next v3 09/10] samples/bpf: add busy-poll support to xdpsock
  2020-11-19  8:30 ` [PATCH bpf-next v3 09/10] samples/bpf: add busy-poll support to xdpsock Björn Töpel
@ 2020-11-25  8:19   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  8:19 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:33 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add a new option to xdpsock, 'B', for busy-polling. The batching
> size (the 'b' option) is also used as the busy-poll budget.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 40 +++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
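
For readers who want to try the two socket options outside xdpsock, the sequence in apply_setsockopt() can be sketched as a small table-driven helper. struct busy_poll_opt, busy_poll_opts() and apply_busy_poll() are hypothetical names. The fallback numeric values below are the asm-generic ones and are an assumption for toolchains whose headers predate this series; some architectures use different numbers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Fallbacks for older headers (asm-generic values, assumed here). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif

struct busy_poll_opt {
        int optname;
        int value;
};

/* Fill out[] with the options to set; returns how many were written. */
size_t busy_poll_opts(bool enabled, int usecs, struct busy_poll_opt out[2])
{
        if (!enabled)
                return 0;

        out[0].optname = SO_PREFER_BUSY_POLL;
        out[0].value = 1;
        out[1].optname = SO_BUSY_POLL;
        out[1].value = usecs;
        return 2;
}

/* Apply the options to fd; returns 0 or a negative errno value. */
int apply_busy_poll(int fd, bool enabled, int usecs)
{
        struct busy_poll_opt opts[2];
        size_t i, n = busy_poll_opts(enabled, usecs, opts);

        for (i = 0; i < n; i++) {
                if (setsockopt(fd, SOL_SOCKET, opts[i].optname,
                               &opts[i].value, sizeof(opts[i].value)) < 0)
                        return -errno;
        }
        return 0;
}
```

Calling apply_busy_poll(fd, true, 20) matches the values the sample uses. The calls may fail with EPERM for unprivileged processes, so the return value is worth checking.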

> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index 24aa7511c4c8..cb1eaee8a32b 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -95,6 +95,7 @@ static int opt_timeout = 1000;
>  static bool opt_need_wakeup = true;
>  static u32 opt_num_xsks = 1;
>  static u32 prog_id;
> +static bool opt_busy_poll;
>
>  struct xsk_ring_stats {
>         unsigned long rx_npkts;
> @@ -911,6 +912,7 @@ static struct option long_options[] = {
>         {"quiet", no_argument, 0, 'Q'},
>         {"app-stats", no_argument, 0, 'a'},
>         {"irq-string", no_argument, 0, 'I'},
> +       {"busy-poll", no_argument, 0, 'B'},
>         {0, 0, 0, 0}
>  };
>
> @@ -949,6 +951,7 @@ static void usage(const char *prog)
>                 "  -Q, --quiet          Do not display any stats.\n"
>                 "  -a, --app-stats      Display application (syscall) statistics.\n"
>                 "  -I, --irq-string     Display driver interrupt statistics for interface associated with irq-string.\n"
> +               "  -B, --busy-poll      Busy poll.\n"
>                 "\n";
>         fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE,
>                 opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE,
> @@ -964,7 +967,7 @@ static void parse_command_line(int argc, char **argv)
>         opterr = 0;
>
>         for (;;) {
> -               c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:",
> +               c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:B",
>                                 long_options, &option_index);
>                 if (c == -1)
>                         break;
> @@ -1062,7 +1065,9 @@ static void parse_command_line(int argc, char **argv)
>                                 fprintf(stderr, "ERROR: Failed to get irqs for %s\n", opt_irq_str);
>                                 usage(basename(argv[0]));
>                         }
> -
> +                       break;
> +               case 'B':
> +                       opt_busy_poll = 1;
>                         break;
>                 default:
>                         usage(basename(argv[0]));
> @@ -1131,7 +1136,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
>                 while (ret != rcvd) {
>                         if (ret < 0)
>                                 exit_with_error(-ret);
> -                       if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
> +                       if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&umem->fq)) {
>                                 xsk->app_stats.fill_fail_polls++;
>                                 recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                         }
> @@ -1177,7 +1182,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
>
>         rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
>         if (!rcvd) {
> -               if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
> +               if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
>                         xsk->app_stats.rx_empty_polls++;
>                         recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                 }
> @@ -1188,7 +1193,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
>         while (ret != rcvd) {
>                 if (ret < 0)
>                         exit_with_error(-ret);
> -               if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
> +               if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
>                         xsk->app_stats.fill_fail_polls++;
>                         recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                 }
> @@ -1340,7 +1345,7 @@ static void l2fwd(struct xsk_socket_info *xsk)
>
>         rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
>         if (!rcvd) {
> -               if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
> +               if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
>                         xsk->app_stats.rx_empty_polls++;
>                         recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
>                 }
> @@ -1353,7 +1358,7 @@ static void l2fwd(struct xsk_socket_info *xsk)
>                 if (ret < 0)
>                         exit_with_error(-ret);
>                 complete_tx_l2fwd(xsk);
> -               if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
> +               if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->tx)) {
>                         xsk->app_stats.tx_wakeup_sendtos++;
>                         kick_tx(xsk);
>                 }
> @@ -1458,6 +1463,24 @@ static void enter_xsks_into_map(struct bpf_object *obj)
>         }
>  }
>
> +static void apply_setsockopt(struct xsk_socket_info *xsk)
> +{
> +       int sock_opt;
> +
> +       if (!opt_busy_poll)
> +               return;
> +
> +       sock_opt = 1;
> +       if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_PREFER_BUSY_POLL,
> +                      (void *)&sock_opt, sizeof(sock_opt)) < 0)
> +               exit_with_error(errno);
> +
> +       sock_opt = 20;
> +       if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
> +                      (void *)&sock_opt, sizeof(sock_opt)) < 0)
> +               exit_with_error(errno);
> +}
> +
>  int main(int argc, char **argv)
>  {
>         struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
> @@ -1499,6 +1522,9 @@ int main(int argc, char **argv)
>         for (i = 0; i < opt_num_xsks; i++)
>                 xsks[num_socks++] = xsk_configure_socket(umem, rx, tx);
>
> +       for (i = 0; i < opt_num_xsks; i++)
> +               apply_setsockopt(xsks[i]);
> +
>         if (opt_bench == BENCH_TXONLY) {
>                 gen_eth_hdr_data();
>
> --
> 2.27.0
>


* Re: [PATCH bpf-next v3 10/10] samples/bpf: add option to set the busy-poll budget
  2020-11-19  8:30 ` [PATCH bpf-next v3 10/10] samples/bpf: add option to set the busy-poll budget Björn Töpel
@ 2020-11-25  8:23   ` Magnus Karlsson
  0 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25  8:23 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy

On Thu, Nov 19, 2020 at 9:33 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add support for the SO_BUSY_POLL_BUDGET setsockopt, via the batching
> option ('b').
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 5 +++++
>  1 file changed, 5 insertions(+)

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
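
The sample passes the batch size straight through as the budget. A caller that takes the value from user input may want a range check before the setsockopt() call; a minimal sketch follows. set_busy_poll_budget() and MODEL_BUDGET_MAX are hypothetical, the 0..65535 range is an assumption about what the kernel accepts, and the fallback option number is the asm-generic value, assumed for older headers.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallback for older headers (asm-generic value, assumed here). */
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Assumed upper bound for this sketch; verify against the kernel. */
#define MODEL_BUDGET_MAX 65535L

/*
 * Validate batch_size locally, then set it as the busy-poll budget.
 * Returns 0 on success or a negative errno value.
 */
int set_busy_poll_budget(int fd, long batch_size)
{
        int budget;

        if (batch_size < 0 || batch_size > MODEL_BUDGET_MAX)
                return -EINVAL;

        budget = (int)batch_size;
        if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                       &budget, sizeof(budget)) < 0)
                return -errno;
        return 0;
}
```

The local check runs before the syscall, so an out-of-range batch size is reported without touching the socket.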

> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index cb1eaee8a32b..deba623e9003 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -1479,6 +1479,11 @@ static void apply_setsockopt(struct xsk_socket_info *xsk)
>         if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
>                        (void *)&sock_opt, sizeof(sock_opt)) < 0)
>                 exit_with_error(errno);
> +
> +       sock_opt = opt_batch_size;
> +       if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL_BUDGET,
> +                      (void *)&sock_opt, sizeof(sock_opt)) < 0)
> +               exit_with_error(errno);
>  }
>
>  int main(int argc, char **argv)
> --
> 2.27.0
>


* Re: [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
@ 2020-11-25 14:47   ` Magnus Karlsson
  2020-11-25 21:14   ` Michael S. Tsirkin
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Magnus Karlsson @ 2020-11-25 14:47 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, bpf, Björn Töpel, Karlsson,
	Magnus, Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z,
	Jakub Kicinski, Eric Dumazet, Jonathan Lemon, Maxim Mikityanskiy,
	intel-wired-lan, netanel, akiyano, michael.chan, sgoutham,
	ioana.ciornei, ruxandra.radulescu, thomas.petazzoni, mcroce,
	saeedm, tariqt, aelior, ecree, ilias.apalodimas,
	grygorii.strashko, sthemmin, Michael S. Tsirkin, kda

On Thu, Nov 19, 2020 at 9:33 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add napi_id to the xdp_rxq_info structure, and make sure the XDP
> socket picks up the napi_id in the Rx path. The napi_id is used to find
> the corresponding NAPI structure for socket busy polling.
>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Acked-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  |  2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
>  .../ethernet/cavium/thunder/nicvf_queues.c    |  2 +-
>  .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  2 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 +-
>  drivers/net/ethernet/intel/ice/ice_base.c     |  4 ++--
>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
>  drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 +-
>  drivers/net/ethernet/marvell/mvneta.c         |  2 +-
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 ++--
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
>  .../ethernet/netronome/nfp/nfp_net_common.c   |  2 +-
>  drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +-
>  drivers/net/ethernet/sfc/rx_common.c          |  2 +-
>  drivers/net/ethernet/socionext/netsec.c       |  2 +-
>  drivers/net/ethernet/ti/cpsw_priv.c           |  2 +-
>  drivers/net/hyperv/netvsc.c                   |  2 +-
>  drivers/net/tun.c                             |  2 +-
>  drivers/net/veth.c                            | 12 ++++++++----
>  drivers/net/virtio_net.c                      |  2 +-
>  drivers/net/xen-netfront.c                    |  2 +-
>  include/net/busy_poll.h                       | 19 +++++++++++++++----
>  include/net/xdp.h                             |  3 ++-
>  net/core/dev.c                                |  2 +-
>  net/core/xdp.c                                |  3 ++-
>  net/xdp/xsk.c                                 |  1 +
>  29 files changed, 54 insertions(+), 36 deletions(-)

For Intel drivers:

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
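
One consequence worth spelling out: drivers that pass 0 as the new xdp_rxq_info_reg() argument leave the socket without a usable napi_id, so busy-polling stays off for them. The user-space model below sketches that flow. All names here (rxq_info_model, xsk_model, the model_* helpers, MODEL_MIN_NAPI_ID and its value 65) are hypothetical, and the "mark once" behaviour is an assumption about how the Rx path records the id.

```c
#include <stdbool.h>

/* Assumed threshold for this model only; not the kernel's value. */
#define MODEL_MIN_NAPI_ID 65u

struct rxq_info_model {
        unsigned int queue_index;
        unsigned int napi_id;   /* 0 means "driver did not provide one" */
};

struct xsk_model {
        unsigned int napi_id;
};

/* Model of registration: the driver hands over its napi_id (or 0). */
void model_rxq_info_reg(struct rxq_info_model *rxq,
                        unsigned int queue_index, unsigned int napi_id)
{
        rxq->queue_index = queue_index;
        rxq->napi_id = napi_id;
}

/* Model of the Rx path: record the id only if none is set yet. */
void model_mark_napi_id_once(struct xsk_model *xs,
                             const struct rxq_info_model *rxq)
{
        if (!xs->napi_id)
                xs->napi_id = rxq->napi_id;
}

/* Busy-polling needs an id at or above the minimum valid NAPI id. */
bool model_can_busy_poll(const struct xsk_model *xs)
{
        return xs->napi_id >= MODEL_MIN_NAPI_ID;
}
```

In the diffstat, the drivers passing rx_ring->q_vector->napi.napi_id correspond to the "can busy-poll" case in this model, and those passing 0 to the other.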

> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index e8131dadc22c..6ad59f0068f6 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -416,7 +416,7 @@ static int ena_xdp_register_rxq_info(struct ena_ring *rx_ring)
>  {
>         int rc;
>
> -       rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid);
> +       rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid, 0);
>
>         if (rc) {
>                 netif_err(rx_ring->adapter, ifup, rx_ring->netdev,
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 7975f59735d6..725d929eddb1 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -2884,7 +2884,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
>                 if (rc)
>                         return rc;
>
> -               rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
> +               rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
>                 if (rc < 0)
>                         return rc;
>
> diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> index 7a141ce32e86..f782e6af45e9 100644
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> @@ -770,7 +770,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
>         rq->caching = 1;
>
>         /* Driver have no proper error path for failed XDP RX-queue info reg */
> -       WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);
> +       WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx, 0) < 0);
>
>         /* Send a mailbox msg to PF to config RQ */
>         mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> index cf9400a9886d..40953980e846 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> @@ -3334,7 +3334,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv,
>                 return 0;
>
>         err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
> -                              fq->flowid);
> +                              fq->flowid, 0);
>         if (err) {
>                 dev_err(dev, "xdp_rxq_info_reg failed\n");
>                 return err;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index c21548c71bb1..9f73cd7aee09 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1447,7 +1447,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>         /* XDP RX-queue info only needed for RX rings exposed to XDP */
>         if (rx_ring->vsi->type == I40E_VSI_MAIN) {
>                 err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> -                                      rx_ring->queue_index);
> +                                      rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
>                 if (err < 0)
>                         return err;
>         }
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index fe4320e2d1f2..3124a3bf519a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -306,7 +306,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>                 if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
>                         /* coverity[check_return] */
>                         xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
> -                                        ring->q_index);
> +                                        ring->q_index, ring->q_vector->napi.napi_id);
>
>                 ring->xsk_pool = ice_xsk_pool(ring);
>                 if (ring->xsk_pool) {
> @@ -333,7 +333,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>                                 /* coverity[check_return] */
>                                 xdp_rxq_info_reg(&ring->xdp_rxq,
>                                                  ring->netdev,
> -                                                ring->q_index);
> +                                                ring->q_index, ring->q_vector->napi.napi_id);
>
>                         err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
>                                                          MEM_TYPE_PAGE_SHARED,
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index eae75260fe20..77d5eae6b4c2 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -483,7 +483,7 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
>         if (rx_ring->vsi->type == ICE_VSI_PF &&
>             !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
>                 if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> -                                    rx_ring->q_index))
> +                                    rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
>                         goto err;
>         return 0;
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 5fc2c381da55..6a4ef4934fcf 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -4352,7 +4352,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
>
>         /* XDP RX-queue info */
>         if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> -                            rx_ring->queue_index) < 0)
> +                            rx_ring->queue_index, 0) < 0)
>                 goto err;
>
>         return 0;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 45ae33e15303..50e6b8b6ba7b 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -6577,7 +6577,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
>
>         /* XDP RX-queue info */
>         if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
> -                            rx_ring->queue_index) < 0)
> +                            rx_ring->queue_index, rx_ring->q_vector->napi.napi_id) < 0)
>                 goto err;
>
>         rx_ring->xdp_prog = adapter->xdp_prog;
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 82fce27f682b..4061cd7db5dd 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3493,7 +3493,7 @@ int ixgbevf_setup_rx_resources(struct ixgbevf_adapter *adapter,
>
>         /* XDP RX-queue info */
>         if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
> -                            rx_ring->queue_index) < 0)
> +                            rx_ring->queue_index, 0) < 0)
>                 goto err;
>
>         rx_ring->xdp_prog = adapter->xdp_prog;
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 183530ed4d1d..ba6dcb19bb1d 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3227,7 +3227,7 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
>                 return err;
>         }
>
> -       err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
> +       err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
>         if (err < 0)
>                 goto err_free_pp;
>
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index 3069e192d773..5504cbc24970 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -2614,11 +2614,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
>         mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);
>
>         if (priv->percpu_pools) {
> -               err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id);
> +               err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0);
>                 if (err < 0)
>                         goto err_free_dma;
>
> -               err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id);
> +               err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0);
>                 if (err < 0)
>                         goto err_unregister_rxq_short;
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index b0f79a5151cf..40775cb8fb2a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -283,7 +283,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
>         ring->log_stride = ffs(ring->stride) - 1;
>         ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
>
> -       if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
> +       if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
>                 goto err_ring;
>
>         tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 527c5f12c5af..427fc376fe1a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -434,7 +434,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>         rq_xdp_ix = rq->ix;
>         if (xsk)
>                 rq_xdp_ix += params->num_channels * MLX5E_RQ_GROUP_XSK;
> -       err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix);
> +       err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix, 0);
>         if (err < 0)
>                 goto err_rq_xdp_prog;
>
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> index b150da43adb2..b4acf2f41e84 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> @@ -2533,7 +2533,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
>
>         if (dp->netdev) {
>                 err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
> -                                      rx_ring->idx);
> +                                      rx_ring->idx, rx_ring->r_vec->napi.napi_id);
>                 if (err < 0)
>                         return err;
>         }
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> index 05e3a3b60269..9cf960a6d007 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> @@ -1762,7 +1762,7 @@ static void qede_init_fp(struct qede_dev *edev)
>
>                         /* Driver have no error path from here */
>                         WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
> -                                                fp->rxq->rxq_id) < 0);
> +                                                fp->rxq->rxq_id, 0) < 0);
>
>                         if (xdp_rxq_info_reg_mem_model(&fp->rxq->xdp_rxq,
>                                                        MEM_TYPE_PAGE_ORDER0,
> diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
> index 19cf7cac1e6e..68fc7d317693 100644
> --- a/drivers/net/ethernet/sfc/rx_common.c
> +++ b/drivers/net/ethernet/sfc/rx_common.c
> @@ -262,7 +262,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
>
>         /* Initialise XDP queue information */
>         rc = xdp_rxq_info_reg(&rx_queue->xdp_rxq_info, efx->net_dev,
> -                             rx_queue->core_index);
> +                             rx_queue->core_index, 0);
>
>         if (rc) {
>                 netif_err(efx, rx_err, efx->net_dev,
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 1503cc9ec6e2..27d3c9d9210e 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -1304,7 +1304,7 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
>                 goto err_out;
>         }
>
> -       err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
> +       err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0, priv->napi.napi_id);
>         if (err)
>                 goto err_out;
>
> diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
> index 31c5e36ff706..6dd73bd0f458 100644
> --- a/drivers/net/ethernet/ti/cpsw_priv.c
> +++ b/drivers/net/ethernet/ti/cpsw_priv.c
> @@ -1186,7 +1186,7 @@ static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
>         pool = cpsw->page_pool[ch];
>         rxq = &priv->xdp_rxq[ch];
>
> -       ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
> +       ret = xdp_rxq_info_reg(rxq, priv->ndev, ch, 0);
>         if (ret)
>                 return ret;
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 0c3de94b5178..fa8341f8359a 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -1499,7 +1499,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
>                 u64_stats_init(&nvchan->tx_stats.syncp);
>                 u64_stats_init(&nvchan->rx_stats.syncp);
>
> -               ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
> +               ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i, 0);
>
>                 if (ret) {
>                         netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret);
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 3d45d56172cb..8867d39db6ac 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -780,7 +780,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>         } else {
>                 /* Setup XDP RX-queue info, for new tfile getting attached */
>                 err = xdp_rxq_info_reg(&tfile->xdp_rxq,
> -                                      tun->dev, tfile->queue_index);
> +                                      tun->dev, tfile->queue_index, 0);
>                 if (err < 0)
>                         goto out;
>                 err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 8c737668008a..9bd37c7151f8 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -884,7 +884,6 @@ static int veth_napi_add(struct net_device *dev)
>         for (i = 0; i < dev->real_num_rx_queues; i++) {
>                 struct veth_rq *rq = &priv->rq[i];
>
> -               netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
>                 napi_enable(&rq->xdp_napi);
>         }
>
> @@ -926,7 +925,8 @@ static int veth_enable_xdp(struct net_device *dev)
>                 for (i = 0; i < dev->real_num_rx_queues; i++) {
>                         struct veth_rq *rq = &priv->rq[i];
>
> -                       err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
> +                       netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
> +                       err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, rq->xdp_napi.napi_id);
>                         if (err < 0)
>                                 goto err_rxq_reg;
>
> @@ -952,8 +952,12 @@ static int veth_enable_xdp(struct net_device *dev)
>  err_reg_mem:
>         xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
>  err_rxq_reg:
> -       for (i--; i >= 0; i--)
> -               xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
> +       for (i--; i >= 0; i--) {
> +               struct veth_rq *rq = &priv->rq[i];
> +
> +               xdp_rxq_info_unreg(&rq->xdp_rxq);
> +               netif_napi_del(&rq->xdp_napi);
> +       }
>
>         return err;
>  }
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 21b71148c532..052975ea0af4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1485,7 +1485,7 @@ static int virtnet_open(struct net_device *dev)
>                         if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
>                                 schedule_delayed_work(&vi->refill, 0);
>
> -               err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
> +               err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
>                 if (err < 0)
>                         return err;
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 920cac4385bf..b01848ef4649 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -2014,7 +2014,7 @@ static int xennet_create_page_pool(struct netfront_queue *queue)
>         }
>
>         err = xdp_rxq_info_reg(&queue->xdp_rxq, queue->info->netdev,
> -                              queue->id);
> +                              queue->id, 0);
>         if (err) {
>                 netdev_err(queue->info->netdev, "xdp_rxq_info_reg failed\n");
>                 goto err_free_pp;
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 2f8f51807b83..45b3e04b99d3 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -135,14 +135,25 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
>         sk_rx_queue_set(sk, skb);
>  }
>
> -/* variant used for unconnected sockets */
> -static inline void sk_mark_napi_id_once(struct sock *sk,
> -                                       const struct sk_buff *skb)
> +static inline void __sk_mark_napi_id_once_xdp(struct sock *sk, unsigned int napi_id)
>  {
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>         if (!READ_ONCE(sk->sk_napi_id))
> -               WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
> +               WRITE_ONCE(sk->sk_napi_id, napi_id);
>  #endif
>  }
>
> +/* variant used for unconnected sockets */
> +static inline void sk_mark_napi_id_once(struct sock *sk,
> +                                       const struct sk_buff *skb)
> +{
> +       __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
> +}
> +
> +static inline void sk_mark_napi_id_once_xdp(struct sock *sk,
> +                                           const struct xdp_buff *xdp)
> +{
> +       __sk_mark_napi_id_once_xdp(sk, xdp->rxq->napi_id);
> +}
> +
>  #endif /* _LINUX_NET_BUSY_POLL_H */
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 7d48b2ae217a..700ad5db7f5d 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -59,6 +59,7 @@ struct xdp_rxq_info {
>         u32 queue_index;
>         u32 reg_state;
>         struct xdp_mem_info mem;
> +       unsigned int napi_id;
>  } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>
>  struct xdp_txq_info {
> @@ -226,7 +227,7 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
>  }
>
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> -                    struct net_device *dev, u32 queue_index);
> +                    struct net_device *dev, u32 queue_index, unsigned int napi_id);
>  void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
>  void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
>  bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7a1e5936c67f..3b6b0e175fe7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9810,7 +9810,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
>                 rx[i].dev = dev;
>
>                 /* XDP RX-queue setup */
> -               err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
> +               err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
>                 if (err < 0)
>                         goto err_rxq_info;
>         }
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 3d330ebda893..17ffd33c6b18 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -158,7 +158,7 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
>
>  /* Returns 0 on success, negative on failure */
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> -                    struct net_device *dev, u32 queue_index)
> +                    struct net_device *dev, u32 queue_index, unsigned int napi_id)
>  {
>         if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
>                 WARN(1, "Driver promised not to register this");
> @@ -179,6 +179,7 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
>         xdp_rxq_info_init(xdp_rxq);
>         xdp_rxq->dev = dev;
>         xdp_rxq->queue_index = queue_index;
> +       xdp_rxq->napi_id = napi_id;
>
>         xdp_rxq->reg_state = REG_STATE_REGISTERED;
>         return 0;
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ecc4579e41ee..d4cb1c5c1abf 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -233,6 +233,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
>         if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
>                 return -EINVAL;
>
> +       sk_mark_napi_id_once_xdp(&xs->sk, xdp);
>         len = xdp->data_end - xdp->data;
>
>         return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
> --
> 2.27.0
>
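
Note on the mechanism in the patch quoted above: the busy_poll.h and
xsk.c hunks make an XDP socket copy the napi_id stored in the Rx queue
info, but only while the socket has no id recorded yet. What follows is
a minimal userspace sketch of that mark-once behaviour, not kernel code.
The model_* names are hypothetical stand-ins for struct xdp_rxq_info,
struct sock, xdp_rxq_info_reg() and sk_mark_napi_id_once_xdp(). The
kernel helper uses READ_ONCE()/WRITE_ONCE() where this sketch uses plain
loads and stores.

```c
/* Hypothetical userspace model of the napi_id propagation in the patch. */

/* Stand-in for struct xdp_rxq_info: only the fields relevant here. */
struct model_rxq {
	unsigned int queue_index;
	unsigned int napi_id;	/* 0 means "no NAPI id supplied by the driver" */
};

/* Stand-in for struct sock: only the recorded NAPI id. */
struct model_sock {
	unsigned int napi_id;	/* 0 means "not marked yet" */
};

/* Models the extended xdp_rxq_info_reg(): the driver passes its napi_id. */
static void model_rxq_reg(struct model_rxq *rxq, unsigned int queue_index,
			  unsigned int napi_id)
{
	rxq->queue_index = queue_index;
	rxq->napi_id = napi_id;
}

/* Models sk_mark_napi_id_once_xdp(): record the id only if none is set. */
static void model_mark_napi_id_once(struct model_sock *sk,
				    const struct model_rxq *rxq)
{
	if (!sk->napi_id)
		sk->napi_id = rxq->napi_id;
}
```

In this model, a queue registered with napi_id 0 (as several drivers in
the patch do) leaves the socket unmarked, so a later call with a
non-zero id can still set it. Once a non-zero id is recorded it is not
overwritten.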


* Re: [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
  2020-11-25 14:47   ` Magnus Karlsson
@ 2020-11-25 21:14   ` Michael S. Tsirkin
  2021-09-29 18:33   ` kernel test robot
  2021-11-05 20:17   ` kernel test robot
  3 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2020-11-25 21:14 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi,
	intel-wired-lan, netanel, akiyano, michael.chan, sgoutham,
	ioana.ciornei, ruxandra.radulescu, thomas.petazzoni, mcroce,
	saeedm, tariqt, aelior, ecree, ilias.apalodimas,
	grygorii.strashko, sthemmin, kda

On Thu, Nov 19, 2020 at 09:30:20AM +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Add napi_id to the xdp_rxq_info structure, and make sure the XDP
> socket picks up the napi_id in the Rx path. The napi_id is used to find
> the corresponding NAPI structure for socket busy polling.
> 
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Acked-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

For virtio:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  |  2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
>  .../ethernet/cavium/thunder/nicvf_queues.c    |  2 +-
>  .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  2 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 +-
>  drivers/net/ethernet/intel/ice/ice_base.c     |  4 ++--
>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
>  drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 +-
>  drivers/net/ethernet/marvell/mvneta.c         |  2 +-
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 ++--
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
>  .../ethernet/netronome/nfp/nfp_net_common.c   |  2 +-
>  drivers/net/ethernet/qlogic/qede/qede_main.c  |  2 +-
>  drivers/net/ethernet/sfc/rx_common.c          |  2 +-
>  drivers/net/ethernet/socionext/netsec.c       |  2 +-
>  drivers/net/ethernet/ti/cpsw_priv.c           |  2 +-
>  drivers/net/hyperv/netvsc.c                   |  2 +-
>  drivers/net/tun.c                             |  2 +-
>  drivers/net/veth.c                            | 12 ++++++++----
>  drivers/net/virtio_net.c                      |  2 +-
>  drivers/net/xen-netfront.c                    |  2 +-
>  include/net/busy_poll.h                       | 19 +++++++++++++++----
>  include/net/xdp.h                             |  3 ++-
>  net/core/dev.c                                |  2 +-
>  net/core/xdp.c                                |  3 ++-
>  net/xdp/xsk.c                                 |  1 +
>  29 files changed, 54 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index e8131dadc22c..6ad59f0068f6 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -416,7 +416,7 @@ static int ena_xdp_register_rxq_info(struct ena_ring *rx_ring)
>  {
>  	int rc;
>  
> -	rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid);
> +	rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid, 0);
>  
>  	if (rc) {
>  		netif_err(rx_ring->adapter, ifup, rx_ring->netdev,
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 7975f59735d6..725d929eddb1 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -2884,7 +2884,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  		if (rc)
>  			return rc;
>  
> -		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
> +		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
>  		if (rc < 0)
>  			return rc;
>  
> diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> index 7a141ce32e86..f782e6af45e9 100644
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> @@ -770,7 +770,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
>  	rq->caching = 1;
>  
>  	/* Driver have no proper error path for failed XDP RX-queue info reg */
> -	WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);
> +	WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx, 0) < 0);
>  
>  	/* Send a mailbox msg to PF to config RQ */
>  	mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> index cf9400a9886d..40953980e846 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> @@ -3334,7 +3334,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv,
>  		return 0;
>  
>  	err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
> -			       fq->flowid);
> +			       fq->flowid, 0);
>  	if (err) {
>  		dev_err(dev, "xdp_rxq_info_reg failed\n");
>  		return err;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index c21548c71bb1..9f73cd7aee09 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1447,7 +1447,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>  	/* XDP RX-queue info only needed for RX rings exposed to XDP */
>  	if (rx_ring->vsi->type == I40E_VSI_MAIN) {
>  		err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> -				       rx_ring->queue_index);
> +				       rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
>  		if (err < 0)
>  			return err;
>  	}
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index fe4320e2d1f2..3124a3bf519a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -306,7 +306,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>  		if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
>  			/* coverity[check_return] */
>  			xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
> -					 ring->q_index);
> +					 ring->q_index, ring->q_vector->napi.napi_id);
>  
>  		ring->xsk_pool = ice_xsk_pool(ring);
>  		if (ring->xsk_pool) {
> @@ -333,7 +333,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
>  				/* coverity[check_return] */
>  				xdp_rxq_info_reg(&ring->xdp_rxq,
>  						 ring->netdev,
> -						 ring->q_index);
> +						 ring->q_index, ring->q_vector->napi.napi_id);
>  
>  			err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
>  							 MEM_TYPE_PAGE_SHARED,
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index eae75260fe20..77d5eae6b4c2 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -483,7 +483,7 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
>  	if (rx_ring->vsi->type == ICE_VSI_PF &&
>  	    !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
>  		if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> -				     rx_ring->q_index))
> +				     rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
>  			goto err;
>  	return 0;
>  
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 5fc2c381da55..6a4ef4934fcf 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -4352,7 +4352,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
>  
>  	/* XDP RX-queue info */
>  	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> -			     rx_ring->queue_index) < 0)
> +			     rx_ring->queue_index, 0) < 0)
>  		goto err;
>  
>  	return 0;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 45ae33e15303..50e6b8b6ba7b 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -6577,7 +6577,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
>  
>  	/* XDP RX-queue info */
>  	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
> -			     rx_ring->queue_index) < 0)
> +			     rx_ring->queue_index, rx_ring->q_vector->napi.napi_id) < 0)
>  		goto err;
>  
>  	rx_ring->xdp_prog = adapter->xdp_prog;
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 82fce27f682b..4061cd7db5dd 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3493,7 +3493,7 @@ int ixgbevf_setup_rx_resources(struct ixgbevf_adapter *adapter,
>  
>  	/* XDP RX-queue info */
>  	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
> -			     rx_ring->queue_index) < 0)
> +			     rx_ring->queue_index, 0) < 0)
>  		goto err;
>  
>  	rx_ring->xdp_prog = adapter->xdp_prog;
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 183530ed4d1d..ba6dcb19bb1d 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3227,7 +3227,7 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
>  		return err;
>  	}
>  
> -	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
> +	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
>  	if (err < 0)
>  		goto err_free_pp;
>  
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index 3069e192d773..5504cbc24970 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -2614,11 +2614,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
>  	mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);
>  
>  	if (priv->percpu_pools) {
> -		err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id);
> +		err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0);
>  		if (err < 0)
>  			goto err_free_dma;
>  
> -		err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id);
> +		err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0);
>  		if (err < 0)
>  			goto err_unregister_rxq_short;
>  
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index b0f79a5151cf..40775cb8fb2a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -283,7 +283,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
>  	ring->log_stride = ffs(ring->stride) - 1;
>  	ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
>  
> -	if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
> +	if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
>  		goto err_ring;
>  
>  	tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 527c5f12c5af..427fc376fe1a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -434,7 +434,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>  	rq_xdp_ix = rq->ix;
>  	if (xsk)
>  		rq_xdp_ix += params->num_channels * MLX5E_RQ_GROUP_XSK;
> -	err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix);
> +	err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix, 0);
>  	if (err < 0)
>  		goto err_rq_xdp_prog;
>  
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> index b150da43adb2..b4acf2f41e84 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> @@ -2533,7 +2533,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
>  
>  	if (dp->netdev) {
>  		err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
> -				       rx_ring->idx);
> +				       rx_ring->idx, rx_ring->r_vec->napi.napi_id);
>  		if (err < 0)
>  			return err;
>  	}
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> index 05e3a3b60269..9cf960a6d007 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> @@ -1762,7 +1762,7 @@ static void qede_init_fp(struct qede_dev *edev)
>  
>  			/* Driver have no error path from here */
>  			WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
> -						 fp->rxq->rxq_id) < 0);
> +						 fp->rxq->rxq_id, 0) < 0);
>  
>  			if (xdp_rxq_info_reg_mem_model(&fp->rxq->xdp_rxq,
>  						       MEM_TYPE_PAGE_ORDER0,
> diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
> index 19cf7cac1e6e..68fc7d317693 100644
> --- a/drivers/net/ethernet/sfc/rx_common.c
> +++ b/drivers/net/ethernet/sfc/rx_common.c
> @@ -262,7 +262,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
>  
>  	/* Initialise XDP queue information */
>  	rc = xdp_rxq_info_reg(&rx_queue->xdp_rxq_info, efx->net_dev,
> -			      rx_queue->core_index);
> +			      rx_queue->core_index, 0);
>  
>  	if (rc) {
>  		netif_err(efx, rx_err, efx->net_dev,
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 1503cc9ec6e2..27d3c9d9210e 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -1304,7 +1304,7 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
>  		goto err_out;
>  	}
>  
> -	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
> +	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0, priv->napi.napi_id);
>  	if (err)
>  		goto err_out;
>  
> diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
> index 31c5e36ff706..6dd73bd0f458 100644
> --- a/drivers/net/ethernet/ti/cpsw_priv.c
> +++ b/drivers/net/ethernet/ti/cpsw_priv.c
> @@ -1186,7 +1186,7 @@ static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
>  	pool = cpsw->page_pool[ch];
>  	rxq = &priv->xdp_rxq[ch];
>  
> -	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
> +	ret = xdp_rxq_info_reg(rxq, priv->ndev, ch, 0);
>  	if (ret)
>  		return ret;
>  
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 0c3de94b5178..fa8341f8359a 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -1499,7 +1499,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
>  		u64_stats_init(&nvchan->tx_stats.syncp);
>  		u64_stats_init(&nvchan->rx_stats.syncp);
>  
> -		ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
> +		ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i, 0);
>  
>  		if (ret) {
>  			netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret);
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 3d45d56172cb..8867d39db6ac 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -780,7 +780,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>  	} else {
>  		/* Setup XDP RX-queue info, for new tfile getting attached */
>  		err = xdp_rxq_info_reg(&tfile->xdp_rxq,
> -				       tun->dev, tfile->queue_index);
> +				       tun->dev, tfile->queue_index, 0);
>  		if (err < 0)
>  			goto out;
>  		err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 8c737668008a..9bd37c7151f8 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -884,7 +884,6 @@ static int veth_napi_add(struct net_device *dev)
>  	for (i = 0; i < dev->real_num_rx_queues; i++) {
>  		struct veth_rq *rq = &priv->rq[i];
>  
> -		netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
>  		napi_enable(&rq->xdp_napi);
>  	}
>  
> @@ -926,7 +925,8 @@ static int veth_enable_xdp(struct net_device *dev)
>  		for (i = 0; i < dev->real_num_rx_queues; i++) {
>  			struct veth_rq *rq = &priv->rq[i];
>  
> -			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
> +			netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
> +			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, rq->xdp_napi.napi_id);
>  			if (err < 0)
>  				goto err_rxq_reg;
>  
> @@ -952,8 +952,12 @@ static int veth_enable_xdp(struct net_device *dev)
>  err_reg_mem:
>  	xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
>  err_rxq_reg:
> -	for (i--; i >= 0; i--)
> -		xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
> +	for (i--; i >= 0; i--) {
> +		struct veth_rq *rq = &priv->rq[i];
> +
> +		xdp_rxq_info_unreg(&rq->xdp_rxq);
> +		netif_napi_del(&rq->xdp_napi);
> +	}
>  
>  	return err;
>  }
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 21b71148c532..052975ea0af4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1485,7 +1485,7 @@ static int virtnet_open(struct net_device *dev)
>  			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  
> -		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
> +		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
>  		if (err < 0)
>  			return err;
>  
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 920cac4385bf..b01848ef4649 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -2014,7 +2014,7 @@ static int xennet_create_page_pool(struct netfront_queue *queue)
>  	}
>  
>  	err = xdp_rxq_info_reg(&queue->xdp_rxq, queue->info->netdev,
> -			       queue->id);
> +			       queue->id, 0);
>  	if (err) {
>  		netdev_err(queue->info->netdev, "xdp_rxq_info_reg failed\n");
>  		goto err_free_pp;
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 2f8f51807b83..45b3e04b99d3 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -135,14 +135,25 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
>  	sk_rx_queue_set(sk, skb);
>  }
>  
> -/* variant used for unconnected sockets */
> -static inline void sk_mark_napi_id_once(struct sock *sk,
> -					const struct sk_buff *skb)
> +static inline void __sk_mark_napi_id_once_xdp(struct sock *sk, unsigned int napi_id)
>  {
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	if (!READ_ONCE(sk->sk_napi_id))
> -		WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
> +		WRITE_ONCE(sk->sk_napi_id, napi_id);
>  #endif
>  }
>  
> +/* variant used for unconnected sockets */
> +static inline void sk_mark_napi_id_once(struct sock *sk,
> +					const struct sk_buff *skb)
> +{
> +	__sk_mark_napi_id_once_xdp(sk, skb->napi_id);
> +}
> +
> +static inline void sk_mark_napi_id_once_xdp(struct sock *sk,
> +					    const struct xdp_buff *xdp)
> +{
> +	__sk_mark_napi_id_once_xdp(sk, xdp->rxq->napi_id);
> +}
> +
>  #endif /* _LINUX_NET_BUSY_POLL_H */
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 7d48b2ae217a..700ad5db7f5d 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -59,6 +59,7 @@ struct xdp_rxq_info {
>  	u32 queue_index;
>  	u32 reg_state;
>  	struct xdp_mem_info mem;
> +	unsigned int napi_id;
>  } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>  
>  struct xdp_txq_info {
> @@ -226,7 +227,7 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
>  }
>  
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> -		     struct net_device *dev, u32 queue_index);
> +		     struct net_device *dev, u32 queue_index, unsigned int napi_id);
>  void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
>  void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
>  bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7a1e5936c67f..3b6b0e175fe7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9810,7 +9810,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
>  		rx[i].dev = dev;
>  
>  		/* XDP RX-queue setup */
> -		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
> +		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
>  		if (err < 0)
>  			goto err_rxq_info;
>  	}
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 3d330ebda893..17ffd33c6b18 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -158,7 +158,7 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
>  
>  /* Returns 0 on success, negative on failure */
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> -		     struct net_device *dev, u32 queue_index)
> +		     struct net_device *dev, u32 queue_index, unsigned int napi_id)
>  {
>  	if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
>  		WARN(1, "Driver promised not to register this");
> @@ -179,6 +179,7 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
>  	xdp_rxq_info_init(xdp_rxq);
>  	xdp_rxq->dev = dev;
>  	xdp_rxq->queue_index = queue_index;
> +	xdp_rxq->napi_id = napi_id;
>  
>  	xdp_rxq->reg_state = REG_STATE_REGISTERED;
>  	return 0;
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ecc4579e41ee..d4cb1c5c1abf 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -233,6 +233,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
>  	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
>  		return -EINVAL;
>  
> +	sk_mark_napi_id_once_xdp(&xs->sk, xdp);
>  	len = xdp->data_end - xdp->data;
>  
>  	return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
> -- 
> 2.27.0
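
For readers following the data flow in the patch above, here is a minimal userspace sketch of how the napi_id is meant to travel from rxq registration to the socket. It is not kernel code: the structures are cut down to the fields involved, the registration function drops the net_device argument, and the READ_ONCE()/WRITE_ONCE() accessors are replaced by plain loads and stores. All of that is an assumption made for illustration.

```c
/* Userspace model only: simplified stand-ins for the kernel structures. */

struct xdp_rxq_info {
	unsigned int queue_index;
	unsigned int napi_id;	/* the field the patch adds */
};

struct xdp_buff {
	struct xdp_rxq_info *rxq;
};

struct sock {
	unsigned int sk_napi_id;
};

/*
 * Models the extended registration call. The caller hands over the NAPI id
 * of the queue, or 0 when it has none to offer (as several drivers in the
 * patch do). This simplified version always succeeds.
 */
static int xdp_rxq_info_reg(struct xdp_rxq_info *rxq,
			    unsigned int queue_index, unsigned int napi_id)
{
	rxq->queue_index = queue_index;
	rxq->napi_id = napi_id;
	return 0;
}

/* Record the NAPI id on the socket only if none has been recorded yet. */
static void sk_mark_napi_id_once_xdp(struct sock *sk,
				     const struct xdp_buff *xdp)
{
	if (!sk->sk_napi_id)
		sk->sk_napi_id = xdp->rxq->napi_id;
}
```

In this model, the first buffer received on a queue registered with a non-zero id marks the socket, and later buffers leave the recorded id untouched. A queue registered with 0 leaves the socket unmarked.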



* Re: [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
  2020-11-25 14:47   ` Magnus Karlsson
  2020-11-25 21:14   ` Michael S. Tsirkin
@ 2021-09-29 18:33   ` kernel test robot
  2021-09-30  6:04     ` Magnus Karlsson
  2021-11-05 20:17   ` kernel test robot
  3 siblings, 1 reply; 33+ messages in thread
From: kernel test robot @ 2021-09-29 18:33 UTC (permalink / raw)
  To: Björn Töpel, netdev, bpf
  Cc: kbuild-all, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang

[-- Attachment #1: Type: text/plain, Size: 1803 bytes --]

Hi Björn,

I love your patch! Yet something to improve:

[auto build test ERROR on 4e99d115d865d45e17e83478d757b58d8fa66d3c]

url:    https://github.com/0day-ci/linux/commits/Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
base:   4e99d115d865d45e17e83478d757b58d8fa66d3c
config: um-kunit_defconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/f481c00164924dd5d782a92cc67897cc7f804502
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
        git checkout f481c00164924dd5d782a92cc67897cc7f804502
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=um SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
   In file included from fs/select.c:32:
   include/net/busy_poll.h: In function 'sk_mark_napi_id_once':
>> include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id'
     150 |  __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
         |                                    ^~


vim +150 include/net/busy_poll.h

   145	
   146	/* variant used for unconnected sockets */
   147	static inline void sk_mark_napi_id_once(struct sock *sk,
   148						const struct sk_buff *skb)
   149	{
 > 150		__sk_mark_napi_id_once_xdp(sk, skb->napi_id);
   151	}
   152	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 5142 bytes --]


* Re: [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2021-09-29 18:33   ` kernel test robot
@ 2021-09-30  6:04     ` Magnus Karlsson
  2021-10-02  2:07       ` [kbuild-all] " Philip Li
  0 siblings, 1 reply; 33+ messages in thread
From: Magnus Karlsson @ 2021-09-30  6:04 UTC (permalink / raw)
  To: kernel test robot
  Cc: Björn Töpel, Network Development, bpf, kbuild-all,
	Björn Töpel, Karlsson, Magnus, Alexei Starovoitov,
	Daniel Borkmann, Fijalkowski, Maciej, Samudrala, Sridhar,
	Brandeburg, Jesse, Zhang, Qi Z

On Wed, Sep 29, 2021 at 8:37 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Björn,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on 4e99d115d865d45e17e83478d757b58d8fa66d3c]
>
> url:    https://github.com/0day-ci/linux/commits/Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
> base:   4e99d115d865d45e17e83478d757b58d8fa66d3c
> config: um-kunit_defconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build):
>         # https://github.com/0day-ci/linux/commit/f481c00164924dd5d782a92cc67897cc7f804502
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
>         git checkout f481c00164924dd5d782a92cc67897cc7f804502
>         # save the attached .config to linux build tree
>         mkdir build_dir
>         make W=1 O=build_dir ARCH=um SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
>    In file included from fs/select.c:32:
>    include/net/busy_poll.h: In function 'sk_mark_napi_id_once':
> >> include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id'
>      150 |  __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
>          |                                    ^~
>
>
> vim +150 include/net/busy_poll.h
>
>    145
>    146  /* variant used for unconnected sockets */
>    147  static inline void sk_mark_napi_id_once(struct sock *sk,
>    148                                          const struct sk_buff *skb)
>    149  {
>  > 150          __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
>    151  }
>    152

It seems that the robot tested an old commit and that this was already
fixed by Daniel 10 months ago. Slow mail delivery, a robot glitch, or
am I missing something?

commit ba0581749fec389e55c9d761f2716f8fcbefced5
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Dec 1 15:22:59 2020 +0100

    net, xdp, xsk: fix __sk_mark_napi_id_once napi_id error

    Stephen reported the following build error for !CONFIG_NET_RX_BUSY_POLL
    built kernels:

      In file included from fs/select.c:32:
      include/net/busy_poll.h: In function 'sk_mark_napi_id_once':
      include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id'
        150 |  __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
            |                                    ^~

    Fix it by wrapping a CONFIG_NET_RX_BUSY_POLL around the helpers.

    Fixes: b02e5a0ebb17 ("xsk: Propagate napi_id to XDP socket Rx path")
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Björn Töpel <bjorn.topel@intel.com>
    Link: https://lore.kernel.org/linux-next/20201201190746.7d3357fb@canb.auug.org.au
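
To illustrate what "wrapping a CONFIG_NET_RX_BUSY_POLL around the helpers" amounts to, here is a minimal userspace sketch. It is not a copy of the commit: the structure layouts are invented stand-ins, and READ_ONCE()/WRITE_ONCE() are reduced to plain accesses. The point is only that the napi_id members exist solely under the config symbol, so every access to them has to sit inside the same #ifdef.

```c
/* Userspace mock-up, not kernel code. Remove this define to model a
 * !CONFIG_NET_RX_BUSY_POLL build; the helpers below still compile. */
#define CONFIG_NET_RX_BUSY_POLL 1

/* Plain stand-ins; the kernel macros add volatile access semantics. */
#define READ_ONCE(x)		(x)
#define WRITE_ONCE(x, v)	((x) = (v))

struct sk_buff {
#ifdef CONFIG_NET_RX_BUSY_POLL
	unsigned int napi_id;	/* only present with busy-poll support */
#endif
	unsigned int len;
};

struct sock {
#ifdef CONFIG_NET_RX_BUSY_POLL
	unsigned int sk_napi_id;
#endif
	int sk_err;
};

static inline void __sk_mark_napi_id_once(struct sock *sk, unsigned int napi_id)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
	if (!READ_ONCE(sk->sk_napi_id))
		WRITE_ONCE(sk->sk_napi_id, napi_id);
#else
	(void)sk;
	(void)napi_id;
#endif
}

/* variant used for unconnected sockets */
static inline void sk_mark_napi_id_once(struct sock *sk,
					const struct sk_buff *skb)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
	/* skb->napi_id is only touched when the member exists */
	__sk_mark_napi_id_once(sk, skb->napi_id);
#else
	(void)sk;
	(void)skb;
#endif
}
```

Without the guard inside sk_mark_napi_id_once(), the skb->napi_id access is compiled unconditionally, which is the error the robot reported for configs that lack the member.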


> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [kbuild-all] Re: [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2021-09-30  6:04     ` Magnus Karlsson
@ 2021-10-02  2:07       ` Philip Li
  0 siblings, 0 replies; 33+ messages in thread
From: Philip Li @ 2021-10-02  2:07 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: kernel test robot, Björn Töpel, Network Development,
	bpf, kbuild-all, Björn Töpel, Karlsson, Magnus,
	Alexei Starovoitov, Daniel Borkmann, Fijalkowski, Maciej,
	Samudrala, Sridhar, Brandeburg, Jesse, Zhang, Qi Z

On Thu, Sep 30, 2021 at 08:04:06AM +0200, Magnus Karlsson wrote:
> On Wed, Sep 29, 2021 at 8:37 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Björn,
> >
> > I love your patch! Yet something to improve:
> >
> > [auto build test ERROR on 4e99d115d865d45e17e83478d757b58d8fa66d3c]
> >
> > url:    https://github.com/0day-ci/linux/commits/Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
> > base:   4e99d115d865d45e17e83478d757b58d8fa66d3c
> > config: um-kunit_defconfig (attached as .config)
> > compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> > reproduce (this is a W=1 build):
> >         # https://github.com/0day-ci/linux/commit/f481c00164924dd5d782a92cc67897cc7f804502
> >         git remote add linux-review https://github.com/0day-ci/linux
> >         git fetch --no-tags linux-review Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
> >         git checkout f481c00164924dd5d782a92cc67897cc7f804502
> >         # save the attached .config to linux build tree
> >         mkdir build_dir
> >         make W=1 O=build_dir ARCH=um SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> >
> > All errors (new ones prefixed by >>):
> >
> >    cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
> >    In file included from fs/select.c:32:
> >    include/net/busy_poll.h: In function 'sk_mark_napi_id_once':
> > >> include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id'
> >      150 |  __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
> >          |                                    ^~
> >
> >
> > vim +150 include/net/busy_poll.h
> >
> >    145
> >    146  /* variant used for unconnected sockets */
> >    147  static inline void sk_mark_napi_id_once(struct sock *sk,
> >    148                                          const struct sk_buff *skb)
> >    149  {
> >  > 150          __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
> >    151  }
> >    152
> 
> It seems that the robot tested an old commit and that this was already
> fixed by Daniel 10 months ago. Slow mail delivery, a robot glitch, or
> am I missing something?
Sorry for the noise. We had a storage crash, and it seems that old data
ended up back in the test queue after recovery. We will clean up to keep
old commits like this one from being tested again.

> 
> commit ba0581749fec389e55c9d761f2716f8fcbefced5
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date:   Tue Dec 1 15:22:59 2020 +0100
> 
>     net, xdp, xsk: fix __sk_mark_napi_id_once napi_id error
> 
>     Stephen reported the following build error for !CONFIG_NET_RX_BUSY_POLL
>     built kernels:
> 
>       In file included from fs/select.c:32:
>       include/net/busy_poll.h: In function 'sk_mark_napi_id_once':
>       include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id'
>         150 |  __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
>             |                                    ^~
> 
>     Fix it by wrapping a CONFIG_NET_RX_BUSY_POLL around the helpers.
> 
>     Fixes: b02e5a0ebb17 ("xsk: Propagate napi_id to XDP socket Rx path")
>     Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>     Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
>     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>     Cc: Björn Töpel <bjorn.topel@intel.com>
>     Link: https://lore.kernel.org/linux-next/20201201190746.7d3357fb@canb.auug.org.au
> 
> 
> > ---
> > 0-DAY CI Kernel Test Service, Intel Corporation
> > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> _______________________________________________
> kbuild-all mailing list -- kbuild-all@lists.01.org
> To unsubscribe send an email to kbuild-all-leave@lists.01.org


* Re: [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path
  2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
                     ` (2 preceding siblings ...)
  2021-09-29 18:33   ` kernel test robot
@ 2021-11-05 20:17   ` kernel test robot
  3 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2021-11-05 20:17 UTC (permalink / raw)
  To: Björn Töpel, netdev, bpf
  Cc: llvm, kbuild-all, Björn Töpel, magnus.karlsson, ast,
	daniel, maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang

[-- Attachment #1: Type: text/plain, Size: 3310 bytes --]

Hi Björn,

I love your patch! Yet something to improve:

[auto build test ERROR on 4e99d115d865d45e17e83478d757b58d8fa66d3c]

url:    https://github.com/0day-ci/linux/commits/Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
base:   4e99d115d865d45e17e83478d757b58d8fa66d3c
config: riscv-randconfig-r042-20210929 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project dc6e8dfdfe7efecfda318d43a06fae18b40eb498)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f481c00164924dd5d782a92cc67897cc7f804502
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bj-rn-T-pel/Introduce-preferred-busy-polling/20210929-234934
        git checkout f481c00164924dd5d782a92cc67897cc7f804502
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from fs/select.c:32:
   In file included from include/net/busy_poll.h:15:
   In file included from include/linux/netdevice.h:37:
   In file included from include/linux/ethtool.h:18:
   In file included from include/uapi/linux/ethtool.h:19:
   In file included from include/linux/if_ether.h:19:
   include/linux/skbuff.h:4622:26: error: implicit declaration of function 'skb_ext_add' [-Werror,-Wimplicit-function-declaration]
                   u64 *kcov_handle_ptr = skb_ext_add(skb, SKB_EXT_KCOV_HANDLE);
                                          ^
   include/linux/skbuff.h:4622:43: error: use of undeclared identifier 'SKB_EXT_KCOV_HANDLE'
                   u64 *kcov_handle_ptr = skb_ext_add(skb, SKB_EXT_KCOV_HANDLE);
                                                           ^
   include/linux/skbuff.h:4631:21: error: implicit declaration of function 'skb_ext_find' [-Werror,-Wimplicit-function-declaration]
           u64 *kcov_handle = skb_ext_find(skb, SKB_EXT_KCOV_HANDLE);
                              ^
   include/linux/skbuff.h:4631:39: error: use of undeclared identifier 'SKB_EXT_KCOV_HANDLE'
           u64 *kcov_handle = skb_ext_find(skb, SKB_EXT_KCOV_HANDLE);
                                                ^
   In file included from fs/select.c:32:
>> include/net/busy_poll.h:150:38: error: no member named 'napi_id' in 'struct sk_buff'
           __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
                                          ~~~  ^
   5 errors generated.


vim +150 include/net/busy_poll.h

   145	
   146	/* variant used for unconnected sockets */
   147	static inline void sk_mark_napi_id_once(struct sock *sk,
   148						const struct sk_buff *skb)
   149	{
 > 150		__sk_mark_napi_id_once_xdp(sk, skb->napi_id);
   151	}
   152	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34628 bytes --]


end of thread, other threads:[~2021-11-05 20:18 UTC | newest]

Thread overview: 33+ messages
2020-11-19  8:30 [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
2020-11-19  8:30 ` [PATCH bpf-next v3 01/10] net: introduce " Björn Töpel
2020-11-24  0:04   ` Jakub Kicinski
2020-11-24  7:58     ` Björn Töpel
2020-11-24  0:11   ` Jakub Kicinski
2020-11-24  8:47     ` Björn Töpel
2020-11-24 16:21   ` Jakub Kicinski
2020-11-19  8:30 ` [PATCH bpf-next v3 02/10] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
2020-11-24 16:21   ` Jakub Kicinski
2020-11-19  8:30 ` [PATCH bpf-next v3 03/10] xsk: add support for recvmsg() Björn Töpel
2020-11-25  6:55   ` Magnus Karlsson
2020-11-19  8:30 ` [PATCH bpf-next v3 04/10] xsk: check need wakeup flag in sendmsg() Björn Töpel
2020-11-25  7:16   ` Magnus Karlsson
2020-11-19  8:30 ` [PATCH bpf-next v3 05/10] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
2020-11-25  7:58   ` Magnus Karlsson
2020-11-19  8:30 ` [PATCH bpf-next v3 06/10] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
2020-11-25 14:47   ` Magnus Karlsson
2020-11-25 21:14   ` Michael S. Tsirkin
2021-09-29 18:33   ` kernel test robot
2021-09-30  6:04     ` Magnus Karlsson
2021-10-02  2:07       ` [kbuild-all] " Philip Li
2021-11-05 20:17   ` kernel test robot
2020-11-19  8:30 ` [PATCH bpf-next v3 07/10] samples/bpf: use recvfrom() in xdpsock/rxdrop Björn Töpel
2020-11-25  7:59   ` Magnus Karlsson
2020-11-19  8:30 ` [PATCH bpf-next v3 08/10] samples/bpf: use recvfrom() in xdpsock/l2fwd Björn Töpel
2020-11-25  8:00   ` Magnus Karlsson
2020-11-19  8:30 ` [PATCH bpf-next v3 09/10] samples/bpf: add busy-poll support to xdpsock Björn Töpel
2020-11-25  8:19   ` Magnus Karlsson
2020-11-19  8:30 ` [PATCH bpf-next v3 10/10] samples/bpf: add option to set the busy-poll budget Björn Töpel
2020-11-25  8:23   ` Magnus Karlsson
2020-11-23 13:31 ` [PATCH bpf-next v3 00/10] Introduce preferred busy-polling Björn Töpel
2020-11-23 23:54   ` Jakub Kicinski
2020-11-24  0:14 ` Jakub Kicinski
