* [PATCH bpf-next 1/9] net: introduce preferred busy-polling
From: Björn Töpel @ 2020-11-12 11:40 UTC
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
opportunistic. That means that busy-polling only polls the NAPI
context if it is not already scheduled. If, after busy-polling, the
budget is exceeded, the busy-polling logic will schedule the NAPI onto
the regular softirq handling.
One implication of the behavior above is that a busy/heavily loaded
NAPI context will never enter/allow for busy-polling. Some applications
prefer that most NAPI processing be done by busy-polling.
This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
feature"), and allow a user to defer the re-enabling of interrupts and
instead schedule the NAPI context from a watchdog timer. When a user
enables SO_PREFER_BUSY_POLL, again with the other knobs enabled,
and the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.
If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will time out, and
regular softirq handling will resume.
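
The handoff described above can be sketched as a toy, single-threaded
state model. This is only an illustration, not the kernel code: the
TOY_* names and both functions are hypothetical, there are no atomics,
and it assumes napi_defer_hard_irqs and gro_flush_timeout are both set,
so that a yielding softirq does not immediately reschedule itself.

```c
#include <stdbool.h>

/* Toy stand-ins for the NAPIF_STATE_* bits; the values are arbitrary. */
enum {
	TOY_SCHED            = 1u << 0,
	TOY_IN_BUSY_POLL     = 1u << 1,
	TOY_PREFER_BUSY_POLL = 1u << 2,
};

/* Busy-poll side: take ownership if the NAPI is idle. If someone else
 * owns it, leave a hint for the owner when "prefer" is set.
 * Returns the new state.
 */
unsigned int toy_busy_poll_enter(unsigned int state, bool prefer)
{
	if (state & (TOY_SCHED | TOY_IN_BUSY_POLL))
		return prefer ? (state | TOY_PREFER_BUSY_POLL) : state;
	return state | TOY_SCHED | TOY_IN_BUSY_POLL;
}

/* Softirq side, after one poll round. The NAPI stays scheduled only
 * when the whole weight was consumed and nobody asked to busy-poll.
 * Returns the new state.
 */
unsigned int toy_softirq_after_poll(unsigned int state, int work, int weight)
{
	if (work < weight || (state & TOY_PREFER_BUSY_POLL))
		return state & ~(TOY_SCHED | TOY_PREFER_BUSY_POLL);
	return state;
}
```

In this model, a busy-poller that finds TOY_SCHED set only records its
preference; the next full-weight softirq round then drops ownership
instead of polling again, which gives the busy-poller a chance to take
over.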
In summary: heavy-traffic applications that prefer busy-polling over
softirq processing should use this option.
Example usage:
$ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
$ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
Note that the timeout should be larger than the userspace processing
window, otherwise the watchdog will time out and fall back to regular
softirq processing.
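
The sizing rule above is simple arithmetic. gro_flush_timeout is in
nanoseconds (the dev.c hunk passes it to ns_to_ktime()), so the example
value of 200000 corresponds to 200 usec. A small sketch of the check;
the function name and the notion of a measured "window" in
microseconds are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the watchdog timeout (nanoseconds) is strictly larger than
 * the time userspace spends between busy-poll system calls
 * (microseconds).
 */
bool timeout_covers_window(uint64_t gro_flush_timeout_ns, uint64_t window_us)
{
	return gro_flush_timeout_ns > window_us * 1000;
}
```

With the example setting, a 150 usec processing window is covered,
while a 250 usec window would let the watchdog fire.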
Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
fs/eventpoll.c | 2 +-
include/linux/netdevice.h | 35 +++++++-----
include/net/busy_poll.h | 5 +-
include/net/sock.h | 3 ++
include/uapi/asm-generic/socket.h | 2 +
net/core/dev.c | 78 +++++++++++++++++++++------
net/core/sock.c | 9 ++++
11 files changed, 110 insertions(+), 32 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index de6c4df61082..538359642554 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -124,6 +124,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_PREFER_BUSY_POLL 69
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index d0a9ed2ca2d6..e406e73b5e6e 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -135,6 +135,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_PREFER_BUSY_POLL 69
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 10173c32195e..1bc46200889d 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -116,6 +116,8 @@
#define SO_DETACH_REUSEPORT_BPF 0x4042
+#define SO_PREFER_BUSY_POLL 0x4043
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 8029b681fc7c..99688cf673a4 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -117,6 +117,8 @@
#define SO_DETACH_REUSEPORT_BPF 0x0047
+#define SO_PREFER_BUSY_POLL 0x0048
+
#if !defined(__KERNEL__)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..e11fab3a0b9e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -397,7 +397,7 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock)
unsigned int napi_id = READ_ONCE(ep->napi_id);
if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
- napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep);
+ napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false);
}
static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964b494b0e8d..84dca3efdd05 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -350,23 +350,25 @@ struct napi_struct {
};
enum {
- NAPI_STATE_SCHED, /* Poll is scheduled */
- NAPI_STATE_MISSED, /* reschedule a napi */
- NAPI_STATE_DISABLE, /* Disable pending */
- NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */
- NAPI_STATE_LISTED, /* NAPI added to system lists */
- NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
- NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+ NAPI_STATE_SCHED, /* Poll is scheduled */
+ NAPI_STATE_MISSED, /* reschedule a napi */
+ NAPI_STATE_DISABLE, /* Disable pending */
+ NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */
+ NAPI_STATE_LISTED, /* NAPI added to system lists */
+ NAPI_STATE_NO_BUSY_POLL, /* Do not add in napi_hash, no busy polling */
+ NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */
+ NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/
};
enum {
- NAPIF_STATE_SCHED = BIT(NAPI_STATE_SCHED),
- NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED),
- NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE),
- NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC),
- NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED),
- NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
- NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+ NAPIF_STATE_SCHED = BIT(NAPI_STATE_SCHED),
+ NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED),
+ NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE),
+ NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC),
+ NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED),
+ NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
+ NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+ NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
};
enum gro_result {
@@ -437,6 +439,11 @@ static inline bool napi_disable_pending(struct napi_struct *n)
return test_bit(NAPI_STATE_DISABLE, &n->state);
}
+static inline bool napi_prefer_busy_poll(struct napi_struct *n)
+{
+ return test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
+}
+
bool napi_schedule_prep(struct napi_struct *n);
/**
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index b001fa91c14e..0292b8353d7e 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -43,7 +43,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg);
+ void *loop_end_arg, bool prefer_busy_poll);
#else /* CONFIG_NET_RX_BUSY_POLL */
static inline unsigned long net_busy_loop_on(void)
@@ -105,7 +105,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
if (napi_id >= MIN_NAPI_ID)
- napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
+ napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
+ READ_ONCE(sk->sk_prefer_busy_poll));
#endif
}
diff --git a/include/net/sock.h b/include/net/sock.h
index a5c6ae78df77..716960a15e83 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -479,6 +479,9 @@ struct sock {
u32 sk_ack_backlog;
u32 sk_max_ack_backlog;
kuid_t sk_uid;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ u8 sk_prefer_busy_poll;
+#endif
struct pid *sk_peer_pid;
const struct cred *sk_peer_cred;
long sk_rcvtimeo;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 77f7c1638eb1..7dd02408b7ce 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -119,6 +119,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_PREFER_BUSY_POLL 69
+
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/dev.c b/net/core/dev.c
index 9499a414d67e..49015b059549 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6454,7 +6454,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
- new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED);
+ new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+ NAPIF_STATE_PREFER_BUSY_POLL);
/* If STATE_MISSED was set, leave STATE_SCHED set,
* because we will call napi->poll() one more time.
@@ -6493,8 +6494,29 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
#define BUSY_POLL_BUDGET 8
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
+static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
{
+ if (!skip_schedule) {
+ gro_normal_list(napi);
+ __napi_schedule(napi);
+ return;
+ }
+
+ if (napi->gro_bitmask) {
+ /* flush too old packets
+ * If HZ < 1000, flush all packets.
+ */
+ napi_gro_flush(napi, HZ >= 1000);
+ }
+
+ gro_normal_list(napi);
+ clear_bit(NAPI_STATE_SCHED, &napi->state);
+}
+
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll)
+{
+ bool skip_schedule = false;
+ unsigned long timeout;
int rc;
/* Busy polling means there is a high chance device driver hard irq
@@ -6511,6 +6533,15 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
local_bh_disable();
+ if (prefer_busy_poll) {
+ napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
+ timeout = READ_ONCE(napi->dev->gro_flush_timeout);
+ if (napi->defer_hard_irqs_count && timeout) {
+ hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
+ skip_schedule = true;
+ }
+ }
+
/* All we really want here is to re-enable device interrupts.
* Ideally, a new ndo_busy_poll_stop() could avoid another round.
*/
@@ -6521,19 +6552,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
*/
trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
netpoll_poll_unlock(have_poll_lock);
- if (rc == BUSY_POLL_BUDGET) {
- /* As the whole budget was spent, we still own the napi so can
- * safely handle the rx_list.
- */
- gro_normal_list(napi);
- __napi_schedule(napi);
- }
+ if (rc == BUSY_POLL_BUDGET)
+ __busy_poll_stop(napi, skip_schedule);
local_bh_enable();
}
void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg)
+ void *loop_end_arg, bool prefer_busy_poll)
{
unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
int (*napi_poll)(struct napi_struct *napi, int budget);
@@ -6561,12 +6587,18 @@ void napi_busy_loop(unsigned int napi_id,
* we avoid dirtying napi->state as much as we can.
*/
if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
- NAPIF_STATE_IN_BUSY_POLL))
+ NAPIF_STATE_IN_BUSY_POLL)) {
+ if (prefer_busy_poll)
+ set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
goto count;
+ }
if (cmpxchg(&napi->state, val,
val | NAPIF_STATE_IN_BUSY_POLL |
- NAPIF_STATE_SCHED) != val)
+ NAPIF_STATE_SCHED) != val) {
+ if (prefer_busy_poll)
+ set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
goto count;
+ }
have_poll_lock = netpoll_poll_lock(napi);
napi_poll = napi->poll;
}
@@ -6584,7 +6616,7 @@ void napi_busy_loop(unsigned int napi_id,
if (unlikely(need_resched())) {
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock);
+ busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
preempt_enable();
rcu_read_unlock();
cond_resched();
@@ -6595,7 +6627,7 @@ void napi_busy_loop(unsigned int napi_id,
cpu_relax();
}
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock);
+ busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
preempt_enable();
out:
rcu_read_unlock();
@@ -6646,8 +6678,10 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
* NAPI_STATE_MISSED, since we do not react to a device IRQ.
*/
if (!napi_disable_pending(napi) &&
- !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
+ !test_and_set_bit(NAPI_STATE_SCHED, &napi->state)) {
+ clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
__napi_schedule_irqoff(napi);
+ }
return HRTIMER_NORESTART;
}
@@ -6705,6 +6739,7 @@ void napi_disable(struct napi_struct *n)
hrtimer_cancel(&n->timer);
+ clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
clear_bit(NAPI_STATE_DISABLE, &n->state);
}
EXPORT_SYMBOL(napi_disable);
@@ -6767,6 +6802,19 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
if (likely(work < weight))
goto out_unlock;
+ /* The NAPI context has more processing work, but busy-polling
+ * is preferred. Exit early.
+ */
+ if (napi_prefer_busy_poll(n)) {
+ if (napi_complete_done(n, work)) {
+ /* If timeout is not set, we need to make sure
+ * that the NAPI is re-scheduled.
+ */
+ napi_schedule(n);
+ }
+ goto out_unlock;
+ }
+
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..248f6a763661 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1159,6 +1159,12 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
sk->sk_ll_usec = val;
}
break;
+ case SO_PREFER_BUSY_POLL:
+ if (valbool && !capable(CAP_NET_ADMIN))
+ ret = -EPERM;
+ else
+ sk->sk_prefer_busy_poll = valbool;
+ break;
#endif
case SO_MAX_PACING_RATE:
@@ -1523,6 +1529,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
case SO_BUSY_POLL:
v.val = sk->sk_ll_usec;
break;
+ case SO_PREFER_BUSY_POLL:
+ v.val = sk->sk_prefer_busy_poll;
+ break;
#endif
case SO_MAX_PACING_RATE:
--
2.27.0
* Re: [PATCH bpf-next 1/9] net: introduce preferred busy-polling
From: Eric Dumazet @ 2020-11-12 14:38 UTC
To: Björn Töpel
Cc: netdev, bpf, Björn Töpel, magnus.karlsson,
Alexei Starovoitov, Daniel Borkmann, maciej.fijalkowski,
Samudrala, Sridhar, Jesse Brandeburg, qi.z.zhang, Jakub Kicinski,
Jonathan Lemon, maximmi
On Thu, Nov 12, 2020 at 12:41 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
> option or system-wide using the /proc/sys/net/core/busy_read knob, is
> an opportunistic. That means that if the NAPI context is not
> scheduled, it will poll it. If, after busy-polling, the budget is
> exceeded the busy-polling logic will schedule the NAPI onto the
> regular softirq handling.
>
> One implication of the behavior above is that a busy/heavy loaded NAPI
> context will never enter/allow for busy-polling. Some applications
> prefer that most NAPI processing would be done by busy-polling.
>
> This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
> in concert with the napi_defer_hard_irqs and gro_flush_timeout
> knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
> introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
> feature"), and allows for a user to defer interrupts to be enabled and
> instead schedule the NAPI context from a watchdog timer. When a user
> enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
> and the NAPI context is being processed by a softirq, the softirq NAPI
> processing will exit early to allow the busy-polling to be performed.
>
> If the application stops performing busy-polling via a system call,
> the watchdog timer defined by gro_flush_timeout will timeout, and
> regular softirq handling will resume.
>
> In summary; Heavy traffic applications that prefer busy-polling over
> softirq processing should use this option.
>
> Example usage:
>
> $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
> $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
>
> Note that the timeout should be larger than the userspace processing
> window, otherwise the watchdog will timeout and fall back to regular
> softirq processing.
>
> Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
...
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 727ea1cc633c..248f6a763661 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1159,6 +1159,12 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
> sk->sk_ll_usec = val;
> }
> break;
> + case SO_PREFER_BUSY_POLL:
> + if (valbool && !capable(CAP_NET_ADMIN))
> + ret = -EPERM;
> + else
> + sk->sk_prefer_busy_poll = valbool;
WRITE_ONCE(sk->sk_prefer_busy_poll, valbool);
This keeps KCSAN happy, since readers access this field without holding the socket lock.
> + break;
> #endif
>
>
* Re: [PATCH bpf-next 1/9] net: introduce preferred busy-polling
From: Björn Töpel @ 2020-11-12 14:43 UTC
To: Eric Dumazet, Björn Töpel
Cc: netdev, bpf, magnus.karlsson, Alexei Starovoitov,
Daniel Borkmann, maciej.fijalkowski, Samudrala, Sridhar,
Jesse Brandeburg, qi.z.zhang, Jakub Kicinski, Jonathan Lemon,
maximmi
On 2020-11-12 15:38, Eric Dumazet wrote:
> On Thu, Nov 12, 2020 at 12:41 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
>> option or system-wide using the /proc/sys/net/core/busy_read knob, is
>> an opportunistic. That means that if the NAPI context is not
>> scheduled, it will poll it. If, after busy-polling, the budget is
>> exceeded the busy-polling logic will schedule the NAPI onto the
>> regular softirq handling.
>>
>> One implication of the behavior above is that a busy/heavy loaded NAPI
>> context will never enter/allow for busy-polling. Some applications
>> prefer that most NAPI processing would be done by busy-polling.
>>
>> This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
>> in concert with the napi_defer_hard_irqs and gro_flush_timeout
>> knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
>> introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
>> feature"), and allows for a user to defer interrupts to be enabled and
>> instead schedule the NAPI context from a watchdog timer. When a user
>> enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
>> and the NAPI context is being processed by a softirq, the softirq NAPI
>> processing will exit early to allow the busy-polling to be performed.
>>
>> If the application stops performing busy-polling via a system call,
>> the watchdog timer defined by gro_flush_timeout will timeout, and
>> regular softirq handling will resume.
>>
>> In summary; Heavy traffic applications that prefer busy-polling over
>> softirq processing should use this option.
>>
>> Example usage:
>>
>> $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
>> $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
>>
>> Note that the timeout should be larger than the userspace processing
>> window, otherwise the watchdog will timeout and fall back to regular
>> softirq processing.
>>
>> Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> ...
>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 727ea1cc633c..248f6a763661 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1159,6 +1159,12 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>> sk->sk_ll_usec = val;
>> }
>> break;
>> + case SO_PREFER_BUSY_POLL:
>> + if (valbool && !capable(CAP_NET_ADMIN))
>> + ret = -EPERM;
>> + else
>> + sk->sk_prefer_busy_poll = valbool;
>
> WRITE_ONCE(sk->sk_prefer_busy_poll, valbool);
>
> So that KCSAN is happy while readers read this field while socket is not locked.
>
Thanks Eric, I'll fix that!
Also, in patch 5, READ_ONCE is missing. I'll address that as well.
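
The annotation pattern discussed above can be illustrated in userspace.
This is only a sketch: TOY_READ_ONCE/TOY_WRITE_ONCE are simplified
stand-ins built on volatile casts (the kernel macros are more
elaborate), __typeof__ is a GCC/Clang extension, and struct toy_sock
and the helpers are hypothetical.

```c
#include <stdbool.h>

/* Simplified stand-ins: a volatile access makes the compiler perform
 * the load or store exactly as written.
 */
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

struct toy_sock {
	unsigned char prefer_busy_poll;
};

/* Writer side, as in the setsockopt() path. */
void toy_set_prefer(struct toy_sock *sk, bool on)
{
	TOY_WRITE_ONCE(sk->prefer_busy_poll, on);
}

/* Reader side, as in a busy-loop entry that runs without the lock. */
bool toy_get_prefer(const struct toy_sock *sk)
{
	return TOY_READ_ONCE(sk->prefer_busy_poll);
}

/* Write a value and read it back through the annotated accessors. */
bool toy_roundtrip(bool on)
{
	struct toy_sock sk = { 0 };

	toy_set_prefer(&sk, on);
	return toy_get_prefer(&sk);
}
```

Pairing the annotated write with annotated reads marks the lockless
access as intentional, which is what lets KCSAN stay quiet about it.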
* [PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option
From: Björn Töpel @ 2020-11-12 11:40 UTC
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
This option lets a user set a per-socket NAPI budget for
busy-polling. If the option is not set, the default of 8 is used.
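
A minimal userspace sketch of setting the budget, assuming a Linux
system. The helper name set_busy_poll_budget() is hypothetical, and the
fallback value 70 is the asm-generic number added by this patch, used
only when the installed headers do not define SO_BUSY_POLL_BUDGET.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallback for older userspace headers; 70 is the asm-generic value
 * from this patch (parisc and sparc use different numbers).
 */
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Hypothetical helper: returns 0 on success, -errno on failure. */
int set_busy_poll_budget(int fd, int budget)
{
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
		       &budget, sizeof(budget)) < 0)
		return -errno;
	return 0;
}
```

Per the sock.c hunk in this patch, raising the budget above the
socket's current value requires CAP_NET_ADMIN, so an unprivileged
set_busy_poll_budget(fd, 64) on a fresh socket should be expected to
return -EPERM.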
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
arch/alpha/include/uapi/asm/socket.h | 1 +
arch/mips/include/uapi/asm/socket.h | 1 +
arch/parisc/include/uapi/asm/socket.h | 1 +
arch/sparc/include/uapi/asm/socket.h | 1 +
fs/eventpoll.c | 3 ++-
include/net/busy_poll.h | 7 +++++--
include/net/sock.h | 1 +
include/uapi/asm-generic/socket.h | 1 +
net/core/dev.c | 21 ++++++++++-----------
net/core/sock.c | 10 ++++++++++
10 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 538359642554..57420356ce4c 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -125,6 +125,7 @@
#define SO_DETACH_REUSEPORT_BPF 68
#define SO_PREFER_BUSY_POLL 69
+#define SO_BUSY_POLL_BUDGET 70
#if !defined(__KERNEL__)
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index e406e73b5e6e..2d949969313b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -136,6 +136,7 @@
#define SO_DETACH_REUSEPORT_BPF 68
#define SO_PREFER_BUSY_POLL 69
+#define SO_BUSY_POLL_BUDGET 70
#if !defined(__KERNEL__)
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 1bc46200889d..f60904329bbc 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -117,6 +117,7 @@
#define SO_DETACH_REUSEPORT_BPF 0x4042
#define SO_PREFER_BUSY_POLL 0x4043
+#define SO_BUSY_POLL_BUDGET 0x4044
#if !defined(__KERNEL__)
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 99688cf673a4..848a22fbac20 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -118,6 +118,7 @@
#define SO_DETACH_REUSEPORT_BPF 0x0047
#define SO_PREFER_BUSY_POLL 0x0048
+#define SO_BUSY_POLL_BUDGET 0x0049
#if !defined(__KERNEL__)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e11fab3a0b9e..73c346e503d7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -397,7 +397,8 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock)
unsigned int napi_id = READ_ONCE(ep->napi_id);
if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
- napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false);
+ napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
+ BUSY_POLL_BUDGET);
}
static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 0292b8353d7e..b4f653cc15a7 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -23,6 +23,8 @@
*/
#define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))
+#define BUSY_POLL_BUDGET 8
+
#ifdef CONFIG_NET_RX_BUSY_POLL
struct napi_struct;
@@ -43,7 +45,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg, bool prefer_busy_poll);
+ void *loop_end_arg, bool prefer_busy_poll, u16 budget);
#else /* CONFIG_NET_RX_BUSY_POLL */
static inline unsigned long net_busy_loop_on(void)
@@ -106,7 +108,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
if (napi_id >= MIN_NAPI_ID)
napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
- READ_ONCE(sk->sk_prefer_busy_poll));
+ READ_ONCE(sk->sk_prefer_busy_poll),
+ sk->sk_busy_poll_budget ?: BUSY_POLL_BUDGET);
#endif
}
diff --git a/include/net/sock.h b/include/net/sock.h
index 716960a15e83..1ddfb4a2dac2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -481,6 +481,7 @@ struct sock {
kuid_t sk_uid;
#ifdef CONFIG_NET_RX_BUSY_POLL
u8 sk_prefer_busy_poll;
+ u16 sk_busy_poll_budget;
#endif
struct pid *sk_peer_pid;
const struct cred *sk_peer_cred;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 7dd02408b7ce..4dcd13d097a9 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -120,6 +120,7 @@
#define SO_DETACH_REUSEPORT_BPF 68
#define SO_PREFER_BUSY_POLL 69
+#define SO_BUSY_POLL_BUDGET 70
#if !defined(__KERNEL__)
diff --git a/net/core/dev.c b/net/core/dev.c
index 49015b059549..33c67004f2ad 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6492,8 +6492,6 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
#if defined(CONFIG_NET_RX_BUSY_POLL)
-#define BUSY_POLL_BUDGET 8
-
static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
{
if (!skip_schedule) {
@@ -6513,7 +6511,8 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
clear_bit(NAPI_STATE_SCHED, &napi->state);
}
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll)
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll,
+ u16 budget)
{
bool skip_schedule = false;
unsigned long timeout;
@@ -6545,21 +6544,21 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
/* All we really want here is to re-enable device interrupts.
* Ideally, a new ndo_busy_poll_stop() could avoid another round.
*/
- rc = napi->poll(napi, BUSY_POLL_BUDGET);
+ rc = napi->poll(napi, budget);
/* We can't gro_normal_list() here, because napi->poll() might have
* rearmed the napi (napi_complete_done()) in which case it could
* already be running on another CPU.
*/
- trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
+ trace_napi_poll(napi, rc, budget);
netpoll_poll_unlock(have_poll_lock);
- if (rc == BUSY_POLL_BUDGET)
+ if (rc == budget)
__busy_poll_stop(napi, skip_schedule);
local_bh_enable();
}
void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg, bool prefer_busy_poll)
+ void *loop_end_arg, bool prefer_busy_poll, u16 budget)
{
unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
int (*napi_poll)(struct napi_struct *napi, int budget);
@@ -6602,8 +6601,8 @@ void napi_busy_loop(unsigned int napi_id,
have_poll_lock = netpoll_poll_lock(napi);
napi_poll = napi->poll;
}
- work = napi_poll(napi, BUSY_POLL_BUDGET);
- trace_napi_poll(napi, work, BUSY_POLL_BUDGET);
+ work = napi_poll(napi, budget);
+ trace_napi_poll(napi, work, budget);
gro_normal_list(napi);
count:
if (work > 0)
@@ -6616,7 +6615,7 @@ void napi_busy_loop(unsigned int napi_id,
if (unlikely(need_resched())) {
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
+ busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
preempt_enable();
rcu_read_unlock();
cond_resched();
@@ -6627,7 +6626,7 @@ void napi_busy_loop(unsigned int napi_id,
cpu_relax();
}
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock, prefer_busy_poll);
+ busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
preempt_enable();
out:
rcu_read_unlock();
diff --git a/net/core/sock.c b/net/core/sock.c
index 248f6a763661..e08d5a6ae9d4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1165,6 +1165,16 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
else
sk->sk_prefer_busy_poll = valbool;
break;
+ case SO_BUSY_POLL_BUDGET:
+ if (val > sk->sk_busy_poll_budget && !capable(CAP_NET_ADMIN)) {
+ ret = -EPERM;
+ } else {
+ if (val < 0)
+ ret = -EINVAL;
+ else
+ sk->sk_busy_poll_budget = val;
+ }
+ break;
#endif
case SO_MAX_PACING_RATE:
--
2.27.0
* Re: [PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option
From: Eric Dumazet @ 2020-11-12 14:36 UTC
To: Björn Töpel
Cc: netdev, bpf, Björn Töpel, magnus.karlsson,
Alexei Starovoitov, Daniel Borkmann, maciej.fijalkowski,
Samudrala, Sridhar, Jesse Brandeburg, qi.z.zhang, Jakub Kicinski,
Jonathan Lemon, maximmi
On Thu, Nov 12, 2020 at 12:41 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This option lets a user set a per socket NAPI budget for
> busy-polling. If the options is not set, it will use the default of 8.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>
...
> #else /* CONFIG_NET_RX_BUSY_POLL */
> static inline unsigned long net_busy_loop_on(void)
> @@ -106,7 +108,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
>
> if (napi_id >= MIN_NAPI_ID)
> napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
> - READ_ONCE(sk->sk_prefer_busy_poll));
> + READ_ONCE(sk->sk_prefer_busy_poll),
> + sk->sk_busy_poll_budget ?: BUSY_POLL_BUDGET);
Please use:
READ_ONCE(sk->sk_busy_poll_budget) ?: BUSY_POLL_BUDGET
Because sk_busy_loop() is usually called without socket lock being held.
This will prevent yet another KCSAN report.
> #endif
> }
>
...
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1165,6 +1165,16 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
> else
> sk->sk_prefer_busy_poll = valbool;
> break;
> + case SO_BUSY_POLL_BUDGET:
> + if (val > sk->sk_busy_poll_budget && !capable(CAP_NET_ADMIN)) {
> + ret = -EPERM;
> + } else {
> + if (val < 0)
if (val < 0 || val > (u16)~0)
> + ret = -EINVAL;
> + else
> + sk->sk_busy_poll_budget = val;
WRITE_ONCE(sk->sk_busy_poll_budget, val);
> + }
> + break;
> #endif
>
> case SO_MAX_PACING_RATE:
> --
> 2.27.0
>
* Re: [PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option
From: Björn Töpel @ 2020-11-12 14:45 UTC
To: Eric Dumazet, Björn Töpel
Cc: netdev, bpf, magnus.karlsson, Alexei Starovoitov,
Daniel Borkmann, maciej.fijalkowski, Samudrala, Sridhar,
Jesse Brandeburg, qi.z.zhang, Jakub Kicinski, Jonathan Lemon,
maximmi
On 2020-11-12 15:36, Eric Dumazet wrote:
> On Thu, Nov 12, 2020 at 12:41 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> This option lets a user set a per socket NAPI budget for
>> busy-polling. If the options is not set, it will use the default of 8.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>
>
> ...
>
>> #else /* CONFIG_NET_RX_BUSY_POLL */
>> static inline unsigned long net_busy_loop_on(void)
>> @@ -106,7 +108,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
>>
>> if (napi_id >= MIN_NAPI_ID)
>> napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
>> - READ_ONCE(sk->sk_prefer_busy_poll));
>> + READ_ONCE(sk->sk_prefer_busy_poll),
>> + sk->sk_busy_poll_budget ?: BUSY_POLL_BUDGET);
>
> Please use :
>
> READ_ONCE(sk->sk_busy_poll_budget) ?: BUSY_POLL_BUDGET
>
> Because sk_busy_loop() is usually called without socket lock being held.
>
> This will prevent yet another KCSAN report.
>
>> #endif
>> }
>>
>
> ...
>
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1165,6 +1165,16 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>> else
>> sk->sk_prefer_busy_poll = valbool;
>> break;
>> + case SO_BUSY_POLL_BUDGET:
>> + if (val > sk->sk_busy_poll_budget && !capable(CAP_NET_ADMIN)) {
>> + ret = -EPERM;
>> + } else {
>> + if (val < 0)
>
> if (val < 0 || val > (u16)~0)
>
>> + ret = -EINVAL;
>> + else
>> + sk->sk_busy_poll_budget = val;
>
>
> WRITE_ONCE(sk->sk_busy_poll_budget, val);
>
Thanks for the review! I'll address it all.
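The three review comments above fit together: the reader uses a marked lockless load with a fallback to the default, and the writer bounds the value to what the field can hold before doing a marked store. What follows is only a user-space sketch of that logic, not the kernel code. It assumes the field is a u16 (as the suggested `(u16)~0` bound implies), uses C11 relaxed atomics as a stand-in for READ_ONCE()/WRITE_ONCE(), and the names `budget_sock`, `set_busy_poll_budget`, `effective_budget` and the `privileged` flag (standing in for capable(CAP_NET_ADMIN)) are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define BUSY_POLL_BUDGET 8 /* default named in the commit message */

/* Hypothetical stand-in for the relevant field of struct sock. */
struct budget_sock {
	_Atomic uint16_t busy_poll_budget; /* assumed u16; 0 means "not set" */
};

/* Writer side, modelled on the reviewed SO_BUSY_POLL_BUDGET case. */
static int set_busy_poll_budget(struct budget_sock *sk, int val, int privileged)
{
	uint16_t cur = atomic_load_explicit(&sk->busy_poll_budget,
					    memory_order_relaxed);

	/* Raising the budget needs privilege (capable(CAP_NET_ADMIN)). */
	if (val > cur && !privileged)
		return -EPERM;
	/* Reject values the 16-bit field cannot represent. */
	if (val < 0 || val > UINT16_MAX)
		return -EINVAL;
	/* Marked store: readers may run without the socket lock. */
	atomic_store_explicit(&sk->busy_poll_budget, (uint16_t)val,
			      memory_order_relaxed);
	return 0;
}

/* Reader side: marked load, falling back to the default when unset. */
static unsigned int effective_budget(struct budget_sock *sk)
{
	uint16_t b = atomic_load_explicit(&sk->busy_poll_budget,
					  memory_order_relaxed);

	return b ? b : BUSY_POLL_BUDGET;
}
```

Without the upper bound, a value such as 65536 would silently truncate to 0 when stored in a 16-bit field and the socket would fall back to the default.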
* [PATCH bpf-next 3/9] xsk: add support for recvmsg()
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 1/9] net: introduce " Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 2/9] net: add SO_BUSY_POLL_BUDGET socket option Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
` (5 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
Add support for non-blocking recvmsg() to XDP sockets. Previously,
only sendmsg() was supported by XDP sockets. Now, for symmetry and for
the upcoming busy-polling support, recvmsg() is added.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b71a32eeae65..17d51d1a5752 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -474,6 +474,26 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
return __xsk_sendmsg(sk);
}
+static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
+{
+ bool need_wait = !(flags & MSG_DONTWAIT);
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+
+ if (unlikely(!xsk_is_bound(xs)))
+ return -ENXIO;
+ if (unlikely(!(xs->dev->flags & IFF_UP)))
+ return -ENETDOWN;
+ if (unlikely(!xs->rx))
+ return -ENOBUFS;
+ if (unlikely(need_wait))
+ return -EOPNOTSUPP;
+
+ if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
+ return xsk_wakeup(xs, XDP_WAKEUP_RX);
+ return 0;
+}
+
static __poll_t xsk_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait)
{
@@ -1134,7 +1154,7 @@ static const struct proto_ops xsk_proto_ops = {
.setsockopt = xsk_setsockopt,
.getsockopt = xsk_getsockopt,
.sendmsg = xsk_sendmsg,
- .recvmsg = sock_no_recvmsg,
+ .recvmsg = xsk_recvmsg,
.mmap = xsk_mmap,
.sendpage = sock_no_sendpage,
};
--
2.27.0
* [PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg()
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
` (2 preceding siblings ...)
2020-11-12 11:40 ` [PATCH bpf-next 3/9] xsk: add support for recvmsg() Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
Add a check for the need wakeup flag in sendmsg(), so that calling
sendmsg() when no wakeup is needed does not trigger one.
To simplify the need wakeup check in the syscall, unconditionally
enable the need wakeup flag for Tx. This has a side effect for poll():
if poll() is called on a socket that has not enabled need wakeup, a Tx
wakeup is performed unconditionally.
The wakeup matrix for AF_XDP now looks like:
need wakeup | poll()       | sendmsg()   | recvmsg()
------------+--------------+-------------+------------
disabled    | wake Tx      | wake Tx     | nop
enabled     | check flag;  | check flag; | check flag;
            |   wake Tx/Rx |   wake Tx   |   wake Rx
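The matrix can also be read as a single rule: each syscall masks the cached flag with the directions it cares about, and with need wakeup disabled the flag is assumed to stay at its initial "Tx" value. The following is a small user-space model of that reading, not kernel code. The names `xsk_call`, `xsk_wakeups`, `WAKE_RX` and `WAKE_TX` are hypothetical, and the zero-copy condition that recvmsg() also checks is left out.

```c
#include <stdbool.h>

/* Illustrative bits; the kernel's XDP_WAKEUP_RX/TX play this role. */
#define WAKE_RX (1u << 0)
#define WAKE_TX (1u << 1)

enum xsk_call { CALL_POLL, CALL_SENDMSG, CALL_RECVMSG };

/* Returns the set of directions that would be woken for a given call. */
static unsigned int xsk_wakeups(enum xsk_call call, bool uses_need_wakeup,
				unsigned int cached_need_wakeup)
{
	unsigned int interest = 0;

	/* Assumption: without need wakeup the cached flag is never
	 * updated, so it keeps the initial "Tx needs a wakeup" value.
	 */
	if (!uses_need_wakeup)
		cached_need_wakeup = WAKE_TX;

	switch (call) {
	case CALL_POLL:
		interest = WAKE_RX | WAKE_TX;
		break;
	case CALL_SENDMSG:
		interest = WAKE_TX;
		break;
	case CALL_RECVMSG:
		interest = WAKE_RX;
		break;
	}
	return cached_need_wakeup & interest;
}
```

Seen this way, the "disabled" row needs no special case in the syscalls: it falls out of initializing the flag to Tx for every pool.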
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 6 +++++-
net/xdp/xsk_buff_pool.c | 13 ++++++-------
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 17d51d1a5752..2e5b9f27c7a3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -465,13 +465,17 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
struct sock *sk = sock->sk;
struct xdp_sock *xs = xdp_sk(sk);
+ struct xsk_buff_pool *pool;
if (unlikely(!xsk_is_bound(xs)))
return -ENXIO;
if (unlikely(need_wait))
return -EOPNOTSUPP;
- return __xsk_sendmsg(sk);
+ pool = xs->pool;
+ if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
+ return __xsk_sendmsg(sk);
+ return 0;
}
static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 64c9e55d4d4e..a4acb5e9576f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -144,14 +144,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
if (err)
return err;
- if (flags & XDP_USE_NEED_WAKEUP) {
+ if (flags & XDP_USE_NEED_WAKEUP)
pool->uses_need_wakeup = true;
- /* Tx needs to be explicitly woken up the first time.
- * Also for supporting drivers that do not implement this
- * feature. They will always have to call sendto().
- */
- pool->cached_need_wakeup = XDP_WAKEUP_TX;
- }
+ /* Tx needs to be explicitly woken up the first time. Also
+ * for supporting drivers that do not implement this
+ * feature. They will always have to call sendto() or poll().
+ */
+ pool->cached_need_wakeup = XDP_WAKEUP_TX;
dev_hold(netdev);
--
2.27.0
* [PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg()
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
` (3 preceding siblings ...)
2020-11-12 11:40 ` [PATCH bpf-next 4/9] xsk: check need wakeup flag in sendmsg() Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
Wire up XDP socket busy-poll support for recvmsg() and sendmsg(). If
the XDP socket prefers busy-polling, make sure that no wakeup/IPI is
performed.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2e5b9f27c7a3..00663390a4a8 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -23,6 +23,7 @@
#include <linux/netdevice.h>
#include <linux/rculist.h>
#include <net/xdp_sock_drv.h>
+#include <net/busy_poll.h>
#include <net/xdp.h>
#include "xsk_queue.h"
@@ -460,6 +461,16 @@ static int __xsk_sendmsg(struct sock *sk)
return xs->zc ? xsk_zc_xmit(xs) : xsk_generic_xmit(sk);
}
+static bool xsk_no_wakeup(struct sock *sk)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ /* Prefer busy-polling, skip the wakeup. */
+ return sk->sk_prefer_busy_poll && sk->sk_ll_usec && sk->sk_napi_id >= MIN_NAPI_ID;
+#else
+ return false;
+#endif
+}
+
static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
{
bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
@@ -472,6 +483,12 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
if (unlikely(need_wait))
return -EOPNOTSUPP;
+ if (sk_can_busy_loop(sk))
+ sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
+ if (xsk_no_wakeup(sk))
+ return 0;
+
pool = xs->pool;
if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
return __xsk_sendmsg(sk);
@@ -493,6 +510,12 @@ static int xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int fl
if (unlikely(need_wait))
return -EOPNOTSUPP;
+ if (sk_can_busy_loop(sk))
+ sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+
+ if (xsk_no_wakeup(sk))
+ return 0;
+
if (xs->pool->cached_need_wakeup & XDP_WAKEUP_RX && xs->zc)
return xsk_wakeup(xs, XDP_WAKEUP_RX);
return 0;
--
2.27.0
* [PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
` (4 preceding siblings ...)
2020-11-12 11:40 ` [PATCH bpf-next 5/9] xsk: add busy-poll support for {recv,send}msg() Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
2020-11-15 4:48 ` Ilias Apalodimas
2020-11-12 11:40 ` [PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
` (2 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi,
intel-wired-lan, netanel, akiyano, michael.chan, sgoutham,
ioana.ciornei, ruxandra.radulescu, thomas.petazzoni, mcroce,
saeedm, tariqt, aelior, ecree, ilias.apalodimas,
grygorii.strashko, sthemmin, mst, kda
From: Björn Töpel <bjorn.topel@intel.com>
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
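The marking is "first non-zero writer wins": the socket records the NAPI id of the first frame that carries one and keeps it afterwards. The snippet below is a user-space model of that rule only, with C11 relaxed atomics standing in for READ_ONCE()/WRITE_ONCE(). The names `napi_sock` and `mark_napi_id_once` are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the sk_napi_id field of struct sock. */
struct napi_sock {
	_Atomic unsigned int napi_id; /* 0 means "not marked yet" */
};

/* Record the NAPI id only if the socket has none so far. */
static void mark_napi_id_once(struct napi_sock *sk, unsigned int napi_id)
{
	if (!atomic_load_explicit(&sk->napi_id, memory_order_relaxed))
		atomic_store_explicit(&sk->napi_id, napi_id,
				      memory_order_relaxed);
}
```

One consequence for the driver changes in this patch: a queue registered with a napi_id of 0 never marks the socket, so the socket's id stays below MIN_NAPI_ID and busy-polling is not attempted for it.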
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
.../ethernet/cavium/thunder/nicvf_queues.c | 2 +-
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/ice/ice_base.c | 4 ++--
drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
.../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
.../ethernet/netronome/nfp/nfp_net_common.c | 2 +-
drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
drivers/net/ethernet/sfc/rx_common.c | 2 +-
drivers/net/ethernet/socionext/netsec.c | 2 +-
drivers/net/ethernet/ti/cpsw_priv.c | 2 +-
drivers/net/hyperv/netvsc.c | 2 +-
drivers/net/tun.c | 2 +-
drivers/net/veth.c | 2 +-
drivers/net/virtio_net.c | 2 +-
drivers/net/xen-netfront.c | 2 +-
include/net/busy_poll.h | 19 +++++++++++++++----
include/net/xdp.h | 3 ++-
net/core/dev.c | 2 +-
net/core/xdp.c | 3 ++-
net/xdp/xsk.c | 1 +
29 files changed, 47 insertions(+), 33 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index e8131dadc22c..6ad59f0068f6 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -416,7 +416,7 @@ static int ena_xdp_register_rxq_info(struct ena_ring *rx_ring)
{
int rc;
- rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid);
+ rc = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, rx_ring->qid, 0);
if (rc) {
netif_err(rx_ring->adapter, ifup, rx_ring->netdev,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index fa147865e33f..5df13387ab74 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2894,7 +2894,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
if (rc)
return rc;
- rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
+ rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
if (rc < 0)
return rc;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index 7a141ce32e86..f782e6af45e9 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -770,7 +770,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
rq->caching = 1;
/* Driver have no proper error path for failed XDP RX-queue info reg */
- WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);
+ WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx, 0) < 0);
/* Send a mailbox msg to PF to config RQ */
mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index cf9400a9886d..40953980e846 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -3334,7 +3334,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv,
return 0;
err = xdp_rxq_info_reg(&fq->channel->xdp_rxq, priv->net_dev,
- fq->flowid);
+ fq->flowid, 0);
if (err) {
dev_err(dev, "xdp_rxq_info_reg failed\n");
return err;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index d43ce13a93c9..a3d5bdaca2f5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1436,7 +1436,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
/* XDP RX-queue info only needed for RX rings exposed to XDP */
if (rx_ring->vsi->type == I40E_VSI_MAIN) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
- rx_ring->queue_index);
+ rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
if (err < 0)
return err;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index fe4320e2d1f2..3124a3bf519a 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -306,7 +306,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
/* coverity[check_return] */
xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
- ring->q_index);
+ ring->q_index, ring->q_vector->napi.napi_id);
ring->xsk_pool = ice_xsk_pool(ring);
if (ring->xsk_pool) {
@@ -333,7 +333,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
/* coverity[check_return] */
xdp_rxq_info_reg(&ring->xdp_rxq,
ring->netdev,
- ring->q_index);
+ ring->q_index, ring->q_vector->napi.napi_id);
err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
MEM_TYPE_PAGE_SHARED,
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index eae75260fe20..77d5eae6b4c2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -483,7 +483,7 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
if (rx_ring->vsi->type == ICE_VSI_PF &&
!xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
- rx_ring->q_index))
+ rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
goto err;
return 0;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 5fc2c381da55..6a4ef4934fcf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4352,7 +4352,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
- rx_ring->queue_index) < 0)
+ rx_ring->queue_index, 0) < 0)
goto err;
return 0;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 45ae33e15303..50e6b8b6ba7b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6577,7 +6577,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
- rx_ring->queue_index) < 0)
+ rx_ring->queue_index, rx_ring->q_vector->napi.napi_id) < 0)
goto err;
rx_ring->xdp_prog = adapter->xdp_prog;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 82fce27f682b..4061cd7db5dd 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3493,7 +3493,7 @@ int ixgbevf_setup_rx_resources(struct ixgbevf_adapter *adapter,
/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
- rx_ring->queue_index) < 0)
+ rx_ring->queue_index, 0) < 0)
goto err;
rx_ring->xdp_prog = adapter->xdp_prog;
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 54b0bf574c05..7d0098f4ef9d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3219,7 +3219,7 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
return err;
}
- err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
+ err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
if (err < 0)
goto err_free_pp;
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index f6616c8933ca..ff8729b6c414 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2606,11 +2606,11 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);
if (priv->percpu_pools) {
- err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id);
+ err = xdp_rxq_info_reg(&rxq->xdp_rxq_short, port->dev, rxq->id, 0);
if (err < 0)
goto err_free_dma;
- err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id);
+ err = xdp_rxq_info_reg(&rxq->xdp_rxq_long, port->dev, rxq->id, 0);
if (err < 0)
goto err_unregister_rxq_short;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 502d1b97855c..f561979e5731 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -283,7 +283,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->log_stride = ffs(ring->stride) - 1;
ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
- if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
+ if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
goto err_ring;
tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b3f02aac7f26..3e7ecd7f0290 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -434,7 +434,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
rq_xdp_ix = rq->ix;
if (xsk)
rq_xdp_ix += params->num_channels * MLX5E_RQ_GROUP_XSK;
- err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix);
+ err = xdp_rxq_info_reg(&rq->xdp_rxq, rq->netdev, rq_xdp_ix, 0);
if (err < 0)
goto err_rq_xdp_prog;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b150da43adb2..b4acf2f41e84 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2533,7 +2533,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
if (dp->netdev) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
- rx_ring->idx);
+ rx_ring->idx, rx_ring->r_vec->napi.napi_id);
if (err < 0)
return err;
}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 05e3a3b60269..9cf960a6d007 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1762,7 +1762,7 @@ static void qede_init_fp(struct qede_dev *edev)
/* Driver have no error path from here */
WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
- fp->rxq->rxq_id) < 0);
+ fp->rxq->rxq_id, 0) < 0);
if (xdp_rxq_info_reg_mem_model(&fp->rxq->xdp_rxq,
MEM_TYPE_PAGE_ORDER0,
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 19cf7cac1e6e..68fc7d317693 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -262,7 +262,7 @@ void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
/* Initialise XDP queue information */
rc = xdp_rxq_info_reg(&rx_queue->xdp_rxq_info, efx->net_dev,
- rx_queue->core_index);
+ rx_queue->core_index, 0);
if (rc) {
netif_err(efx, rx_err, efx->net_dev,
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 1503cc9ec6e2..80ab24658e87 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1304,7 +1304,7 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
goto err_out;
}
- err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
+ err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0, 0);
if (err)
goto err_out;
diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
index 51cc29f39038..d8f287c88d77 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.c
+++ b/drivers/net/ethernet/ti/cpsw_priv.c
@@ -1189,7 +1189,7 @@ static int cpsw_ndev_create_xdp_rxq(struct cpsw_priv *priv, int ch)
pool = cpsw->page_pool[ch];
rxq = &priv->xdp_rxq[ch];
- ret = xdp_rxq_info_reg(rxq, priv->ndev, ch);
+ ret = xdp_rxq_info_reg(rxq, priv->ndev, ch, 0);
if (ret)
return ret;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0c3de94b5178..fa8341f8359a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1499,7 +1499,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
u64_stats_init(&nvchan->tx_stats.syncp);
u64_stats_init(&nvchan->rx_stats.syncp);
- ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
+ ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i, 0);
if (ret) {
netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index be69d272052f..f2541d645707 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -791,7 +791,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
} else {
/* Setup XDP RX-queue info, for new tfile getting attached */
err = xdp_rxq_info_reg(&tfile->xdp_rxq,
- tun->dev, tfile->queue_index);
+ tun->dev, tfile->queue_index, 0);
if (err < 0)
goto out;
err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c737668008a..04d20e9d8431 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -926,7 +926,7 @@ static int veth_enable_xdp(struct net_device *dev)
for (i = 0; i < dev->real_num_rx_queues; i++) {
struct veth_rq *rq = &priv->rq[i];
- err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
+ err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i, 0);
if (err < 0)
goto err_rxq_reg;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 21b71148c532..d71fe41595b7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1485,7 +1485,7 @@ static int virtnet_open(struct net_device *dev)
if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
- err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
+ err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, 0);
if (err < 0)
return err;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3e9895bec15f..28714a48f5d0 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -2014,7 +2014,7 @@ static int xennet_create_page_pool(struct netfront_queue *queue)
}
err = xdp_rxq_info_reg(&queue->xdp_rxq, queue->info->netdev,
- queue->id);
+ queue->id, 0);
if (err) {
netdev_err(queue->info->netdev, "xdp_rxq_info_reg failed\n");
goto err_free_pp;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index b4f653cc15a7..2dd8c2c90c0c 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -135,14 +135,25 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
sk_rx_queue_set(sk, skb);
}
-/* variant used for unconnected sockets */
-static inline void sk_mark_napi_id_once(struct sock *sk,
- const struct sk_buff *skb)
+static inline void __sk_mark_napi_id_once_xdp(struct sock *sk, unsigned int napi_id)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
if (!READ_ONCE(sk->sk_napi_id))
- WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
+ WRITE_ONCE(sk->sk_napi_id, napi_id);
#endif
}
+/* variant used for unconnected sockets */
+static inline void sk_mark_napi_id_once(struct sock *sk,
+ const struct sk_buff *skb)
+{
+ __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
+}
+
+static inline void sk_mark_napi_id_once_xdp(struct sock *sk,
+ const struct xdp_buff *xdp)
+{
+ __sk_mark_napi_id_once_xdp(sk, xdp->rxq->napi_id);
+}
+
#endif /* _LINUX_NET_BUSY_POLL_H */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3814fb631d52..4d4255a94773 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -59,6 +59,7 @@ struct xdp_rxq_info {
u32 queue_index;
u32 reg_state;
struct xdp_mem_info mem;
+ unsigned int napi_id;
} ____cacheline_aligned; /* perf critical, avoid false-sharing */
struct xdp_txq_info {
@@ -211,7 +212,7 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
}
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
- struct net_device *dev, u32 queue_index);
+ struct net_device *dev, u32 queue_index, unsigned int napi_id);
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
diff --git a/net/core/dev.c b/net/core/dev.c
index 33c67004f2ad..6635c92935bc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9806,7 +9806,7 @@ static int netif_alloc_rx_queues(struct net_device *dev)
rx[i].dev = dev;
/* XDP RX-queue setup */
- err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
+ err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
if (err < 0)
goto err_rxq_info;
}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 48aba933a5a8..7cca7cb5b65f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -158,7 +158,7 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
/* Returns 0 on success, negative on failure */
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
- struct net_device *dev, u32 queue_index)
+ struct net_device *dev, u32 queue_index, unsigned int napi_id)
{
if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
WARN(1, "Driver promised not to register this");
@@ -179,6 +179,7 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
xdp_rxq_info_init(xdp_rxq);
xdp_rxq->dev = dev;
xdp_rxq->queue_index = queue_index;
+ xdp_rxq->napi_id = napi_id;
xdp_rxq->reg_state = REG_STATE_REGISTERED;
return 0;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 00663390a4a8..5c90e5993ddb 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -233,6 +233,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
return -EINVAL;
+ sk_mark_napi_id_once_xdp(&xs->sk, xdp);
len = xdp->data_end - xdp->data;
return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
--
2.27.0
* Re: [PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path
2020-11-12 11:40 ` [PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
@ 2020-11-15 4:48 ` Ilias Apalodimas
0 siblings, 0 replies; 15+ messages in thread
From: Ilias Apalodimas @ 2020-11-15 4:48 UTC (permalink / raw)
To: Björn Töpel
Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi,
intel-wired-lan, netanel, akiyano, michael.chan, sgoutham,
ioana.ciornei, ruxandra.radulescu, thomas.petazzoni, mcroce,
saeedm, tariqt, aelior, ecree, grygorii.strashko, sthemmin, mst,
kda
On Thu, Nov 12, 2020 at 12:40:38PM +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add napi_id to the xdp_rxq_info structure, and make sure the XDP
> socket picks up the napi_id in the Rx path. The napi_id is used to find
> the corresponding NAPI structure for socket busy polling.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> .../ethernet/cavium/thunder/nicvf_queues.c | 2 +-
> .../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +-
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
> drivers/net/ethernet/intel/ice/ice_base.c | 4 ++--
> drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
> drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
> .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
> drivers/net/ethernet/marvell/mvneta.c | 2 +-
> .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
> .../ethernet/netronome/nfp/nfp_net_common.c | 2 +-
> drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
> drivers/net/ethernet/sfc/rx_common.c | 2 +-
> drivers/net/ethernet/socionext/netsec.c | 2 +-
> drivers/net/ethernet/ti/cpsw_priv.c | 2 +-
> drivers/net/hyperv/netvsc.c | 2 +-
> drivers/net/tun.c | 2 +-
> drivers/net/veth.c | 2 +-
> drivers/net/virtio_net.c | 2 +-
> drivers/net/xen-netfront.c | 2 +-
> include/net/busy_poll.h | 19 +++++++++++++++----
> include/net/xdp.h | 3 ++-
> net/core/dev.c | 2 +-
> net/core/xdp.c | 3 ++-
> net/xdp/xsk.c | 1 +
> 29 files changed, 47 insertions(+), 33 deletions(-)
>
For the socionext driver
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* [PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
` (5 preceding siblings ...)
2020-11-12 11:40 ` [PATCH bpf-next 6/9] xsk: propagate napi_id to XDP socket Rx path Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
8 siblings, 0 replies; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
Start using recvfrom() in the rxdrop scenario.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
samples/bpf/xdpsock_user.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 1149e94ca32f..96d0b6482ac4 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1172,7 +1172,7 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
}
}
-static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
+static void rx_drop(struct xsk_socket_info *xsk)
{
unsigned int rcvd, i;
u32 idx_rx = 0, idx_fq = 0;
@@ -1182,7 +1182,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
if (!rcvd) {
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
- ret = poll(fds, num_socks, opt_timeout);
+ recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
return;
}
@@ -1193,7 +1193,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
exit_with_error(-ret);
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.fill_fail_polls++;
- ret = poll(fds, num_socks, opt_timeout);
+ recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
}
@@ -1235,7 +1235,7 @@ static void rx_drop_all(void)
}
for (i = 0; i < num_socks; i++)
- rx_drop(xsks[i], fds);
+ rx_drop(xsks[i]);
if (benchmark_done)
break;
--
2.27.0
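Note on the recvfrom() call in this patch: with a NULL buffer, a zero length and MSG_DONTWAIT it transfers no data and never blocks, so it serves only as a cheap syscall that gives the kernel a chance to run Rx processing for the socket. The patch ignores the return value. Below is a minimal sketch of a wrapper that reports unexpected errors instead. The name rx_wakeup() is hypothetical, and the set of errno values treated as benign is an assumption modelled on typical Tx-kick handling, not something this patch defines.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of the patch: issue a zero-length,
 * non-blocking recvfrom() purely as a wakeup syscall.
 *
 * Returns 0 if the call succeeded or failed with an errno that is
 * assumed to be harmless for a wakeup, and -errno otherwise.
 */
static int rx_wakeup(int fd)
{
	ssize_t ret;

	ret = recvfrom(fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);
	if (ret >= 0)
		return 0;

	/* Assumption: "nothing to do right now" style errors are fine. */
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EBUSY ||
	    errno == ENOBUFS || errno == ENETDOWN)
		return 0;

	return -errno;
}
```

In rx_drop() this would be called as rx_wakeup(xsk_socket__fd(xsk->xsk)) at the two places where the patch calls recvfrom() directly.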
* [PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
` (6 preceding siblings ...)
2020-11-12 11:40 ` [PATCH bpf-next 7/9] samples/bpf: use recvfrom() in xdpsock Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
2020-11-12 11:40 ` [PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget Björn Töpel
8 siblings, 0 replies; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
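Add a new option, 'B', to xdpsock for busy-polling. When set,
SO_PREFER_BUSY_POLL is enabled and SO_BUSY_POLL is set to 20 us on
each socket. The batch size ('b' option) is used as the busy-poll
budget; that part is added in the following patch.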
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
samples/bpf/xdpsock_user.c | 40 +++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 96d0b6482ac4..8ecacbae7682 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -95,6 +95,7 @@ static int opt_timeout = 1000;
static bool opt_need_wakeup = true;
static u32 opt_num_xsks = 1;
static u32 prog_id;
+static bool opt_busy_poll;
struct xsk_ring_stats {
unsigned long rx_npkts;
@@ -911,6 +912,7 @@ static struct option long_options[] = {
{"quiet", no_argument, 0, 'Q'},
{"app-stats", no_argument, 0, 'a'},
{"irq-string", no_argument, 0, 'I'},
+ {"busy-poll", no_argument, 0, 'B'},
{0, 0, 0, 0}
};
@@ -949,6 +951,7 @@ static void usage(const char *prog)
" -Q, --quiet Do not display any stats.\n"
" -a, --app-stats Display application (syscall) statistics.\n"
" -I, --irq-string Display driver interrupt statistics for interface associated with irq-string.\n"
+ " -B, --busy-poll Busy poll.\n"
"\n";
fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE,
opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE,
@@ -964,7 +967,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
- c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:",
+ c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:B",
long_options, &option_index);
if (c == -1)
break;
@@ -1062,7 +1065,9 @@ static void parse_command_line(int argc, char **argv)
fprintf(stderr, "ERROR: Failed to get irqs for %s\n", opt_irq_str);
usage(basename(argv[0]));
}
-
+ break;
+ case 'B':
+ opt_busy_poll = true;
break;
default:
usage(basename(argv[0]));
@@ -1132,7 +1137,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
- if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&umem->fq)) {
xsk->app_stats.fill_fail_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
@@ -1180,7 +1185,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
if (!rcvd) {
- if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
@@ -1191,7 +1196,7 @@ static void rx_drop(struct xsk_socket_info *xsk)
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
- if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.fill_fail_polls++;
recvfrom(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, NULL);
}
@@ -1342,7 +1347,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
if (!rcvd) {
- if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
@@ -1354,7 +1359,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
if (ret < 0)
exit_with_error(-ret);
complete_tx_l2fwd(xsk, fds);
- if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
+ if (opt_busy_poll || xsk_ring_prod__needs_wakeup(&xsk->tx)) {
xsk->app_stats.tx_wakeup_sendtos++;
kick_tx(xsk);
}
@@ -1461,6 +1466,24 @@ static void enter_xsks_into_map(struct bpf_object *obj)
}
}
+static void apply_setsockopt(struct xsk_socket_info *xsk)
+{
+ int sock_opt;
+
+ if (!opt_busy_poll)
+ return;
+
+ sock_opt = 1;
+ if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_PREFER_BUSY_POLL,
+ (void *)&sock_opt, sizeof(sock_opt)) < 0)
+ exit_with_error(errno);
+
+ sock_opt = 20;
+ if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
+ (void *)&sock_opt, sizeof(sock_opt)) < 0)
+ exit_with_error(errno);
+}
+
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -1502,6 +1525,9 @@ int main(int argc, char **argv)
for (i = 0; i < opt_num_xsks; i++)
xsks[num_socks++] = xsk_configure_socket(umem, rx, tx);
+ for (i = 0; i < opt_num_xsks; i++)
+ apply_setsockopt(xsks[i]);
+
if (opt_bench == BENCH_TXONLY) {
gen_eth_hdr_data();
--
2.27.0
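For readers who want to reuse apply_setsockopt() outside the sample, here is a minimal sketch that returns an error code instead of exiting. The name enable_busy_poll() is hypothetical. The fallback values 46 and 69 are the asm-generic numbers as far as I know and are an assumption; some architectures use different values, so prefer the definitions from the installed kernel headers when they exist.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for older userspace headers (assumed asm-generic values). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif

/*
 * Hypothetical helper: enable preferred busy-polling on a socket and
 * set the busy-poll time to 'usecs' microseconds.
 *
 * Returns 0 on success and a negative errno value on failure.
 */
static int enable_busy_poll(int fd, int usecs)
{
	int on = 1;

	if (usecs < 0)
		return -EINVAL;

	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
		       &on, sizeof(on)) < 0)
		return -errno;

	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &usecs, sizeof(usecs)) < 0)
		return -errno;

	return 0;
}
```

With the value used in this patch the call would be enable_busy_poll(xsk_socket__fd(xsk->xsk), 20). On a kernel that lacks the option, setsockopt() is expected to fail with ENOPROTOOPT, and it may fail with EPERM for unprivileged callers, so a caller that can fall back to poll() might want to handle those two cases rather than exit.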
* [PATCH bpf-next 9/9] samples/bpf: add option to set the busy-poll budget
2020-11-12 11:40 [PATCH bpf-next 0/9] Introduce preferred busy-polling Björn Töpel
` (7 preceding siblings ...)
2020-11-12 11:40 ` [PATCH bpf-next 8/9] samples/bpf: add busy-poll support to xdpsock Björn Töpel
@ 2020-11-12 11:40 ` Björn Töpel
8 siblings, 0 replies; 15+ messages in thread
From: Björn Töpel @ 2020-11-12 11:40 UTC (permalink / raw)
To: netdev, bpf
Cc: Björn Töpel, magnus.karlsson, ast, daniel,
maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
qi.z.zhang, kuba, edumazet, jonathan.lemon, maximmi
From: Björn Töpel <bjorn.topel@intel.com>
Add support for the SO_BUSY_POLL_BUDGET setsockopt: the batch size
('b' option) is used as the busy-poll budget.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
samples/bpf/xdpsock_user.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 8ecacbae7682..3f87b931c177 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1482,6 +1482,11 @@ static void apply_setsockopt(struct xsk_socket_info *xsk)
if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL,
(void *)&sock_opt, sizeof(sock_opt)) < 0)
exit_with_error(errno);
+
+ sock_opt = opt_batch_size;
+ if (setsockopt(xsk_socket__fd(xsk->xsk), SOL_SOCKET, SO_BUSY_POLL_BUDGET,
+ (void *)&sock_opt, sizeof(sock_opt)) < 0)
+ exit_with_error(errno);
}
int main(int argc, char **argv)
--
2.27.0
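Since the budget comes straight from a command-line value, a range check before the setsockopt() call can give a clearer error than a failing syscall. A minimal sketch follows. The name set_busy_poll_budget() is hypothetical, the fallback value 70 is the asm-generic number as far as I know, and the upper bound of 65535 is an assumption about what the kernel accepts; none of these come from this patch.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallback for older userspace headers (assumed asm-generic value). */
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Assumed upper bound for the budget; adjust if the kernel differs. */
#define BUSY_POLL_BUDGET_MAX 65535

/*
 * Hypothetical helper: set the per-socket busy-poll budget, i.e. the
 * maximum number of packets handled per busy-poll invocation.
 *
 * Returns 0 on success and a negative errno value on failure. The
 * range check runs before the syscall, so an out-of-range budget is
 * rejected with -EINVAL regardless of the socket.
 */
static int set_busy_poll_budget(int fd, int budget)
{
	if (budget < 0 || budget > BUSY_POLL_BUDGET_MAX)
		return -EINVAL;

	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
		       &budget, sizeof(budget)) < 0)
		return -errno;

	return 0;
}
```

In the sample this would correspond to set_busy_poll_budget(xsk_socket__fd(xsk->xsk), opt_batch_size), keeping the budget and the batch size passed to xsk_ring_cons__peek() in step.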