* [PATCH bpf] xsk: mark napi_id on sendmsg()
@ 2022-06-29 10:57 Maciej Fijalkowski
2022-06-29 12:45 ` Björn Töpel
0 siblings, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2022-06-29 10:57 UTC (permalink / raw)
To: bpf, ast, daniel, andrii
Cc: netdev, magnus.karlsson, bjorn, Maciej Fijalkowski
When an application runs in zero-copy busy-poll mode and does not receive a
single packet but only sends them, it is currently impossible to get
into napi_busy_loop(), as napi_id is marked only on the Rx side, in
xsk_rcv_check(). There, napi_id is taken from the xdp_rxq_info
carried by the xdp_buff. From the Tx perspective, we do not have access to it.
What we do have handy is the xsk pool.
The xsk pool works on a pool of internal xdp_buff wrappers called
xdp_buff_xsk. AF_XDP ZC enabled drivers call xp_set_rxq_info(), so each
xdp_buff_xsk has a valid pointer to the xdp_rxq_info of the underlying queue.
Therefore, on the Tx side, napi_id can be pulled from
xs->pool->heads[0].xdp.rxq->napi_id.
Do this only for sockets working in ZC mode, as otherwise the rxq pointers
would not be initialized.
Fixes: a0731952d9cd ("xsk: Add busy-poll support for {recv,send}msg()")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
net/xdp/xsk.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 19ac872a6624..eafd512d38b1 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -637,8 +637,11 @@ static int __xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len
if (unlikely(need_wait))
return -EOPNOTSUPP;
- if (sk_can_busy_loop(sk))
+ if (sk_can_busy_loop(sk)) {
+ if (xs->zc)
+ __sk_mark_napi_id_once(sk, xs->pool->heads[0].xdp.rxq->napi_id);
sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+ }
if (xs->zc && xsk_no_wakeup(sk))
return 0;
--
2.27.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 10:57 [PATCH bpf] xsk: mark napi_id on sendmsg() Maciej Fijalkowski
@ 2022-06-29 12:45 ` Björn Töpel
2022-06-29 12:53 ` Maciej Fijalkowski
2022-06-29 13:39 ` Björn Töpel
0 siblings, 2 replies; 10+ messages in thread
From: Björn Töpel @ 2022-06-29 12:45 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Netdev, Karlsson, Magnus
On Wed, 29 Jun 2022 at 12:58, Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> When application runs in zero copy busy poll mode and does not receive a
> single packet but only sends them, it is currently impossible to get
> into napi_busy_loop() as napi_id is only marked on Rx side in
> xsk_rcv_check(). In there, napi_id is being taken from xdp_rxq_info
> carried by xdp_buff. From Tx perspective, we do not have access to it.
> What we have handy is the xsk pool.
The fact that the napi_id is not set unless set from the ingress side
is actually "by design". It's CONFIG_NET_RX_BUSY_POLL after all. I
followed the semantics of the regular busy-polling sockets. So, I
wouldn't say it's a fix! The busy-polling in sendmsg is really just
about "driving the RX busy-polling from another socket syscall".
That being said, I definitely see that this is useful for AF_XDP
sockets, but keep in mind that it sort of changes the behavior from
regular sockets. And we'll get different behavior for
copy-mode/zero-copy mode.
TL;DR, I think it's a good addition. One small nit below:
> + __sk_mark_napi_id_once(sk, xs->pool->heads[0].xdp.rxq->napi_id);
Please hide this hideous pointer chasing in something neater:
xsk_pool_get_napi_id() or something.
Björn
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 12:45 ` Björn Töpel
@ 2022-06-29 12:53 ` Maciej Fijalkowski
2022-06-29 13:18 ` Magnus Karlsson
2022-06-29 16:16 ` Jakub Kicinski
2022-06-29 13:39 ` Björn Töpel
1 sibling, 2 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2022-06-29 12:53 UTC (permalink / raw)
To: Björn Töpel
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Netdev, Karlsson, Magnus
On Wed, Jun 29, 2022 at 02:45:11PM +0200, Björn Töpel wrote:
> On Wed, 29 Jun 2022 at 12:58, Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > When application runs in zero copy busy poll mode and does not receive a
> > single packet but only sends them, it is currently impossible to get
> > into napi_busy_loop() as napi_id is only marked on Rx side in
> > xsk_rcv_check(). In there, napi_id is being taken from xdp_rxq_info
> > carried by xdp_buff. From Tx perspective, we do not have access to it.
> > What we have handy is the xsk pool.
>
> The fact that the napi_id is not set unless set from the ingress side
> is actually "by design". It's CONFIG_NET_RX_BUSY_POLL after all. I
> followed the semantics of the regular busy-polling sockets. So, I
> wouldn't say it's a fix! The busy-polling in sendmsg is really just
> about "driving the RX busy-polling from another socket syscall".
I just felt that busy polling for txonly apps was broken, hence the
'fixing' flavour. I can send it just as an improvement to bpf-next.
>
> That being said, I definitely see that this is useful for AF_XDP
> sockets, but keep in mind that it sort of changes the behavior from
> regular sockets. And we'll get different behavior for
> copy-mode/zero-copy mode.
>
> TL;DR, I think it's a good addition. One small nit below:
>
> > + __sk_mark_napi_id_once(sk, xs->pool->heads[0].xdp.rxq->napi_id);
>
> Please hide this hideous pointer chasing in something neater:
> xsk_pool_get_napi_id() or something.
Would it make sense to introduce napi_id to xsk_buff_pool then?
xp_set_rxq_info() could set it. We are sure that napi_id is the
same for the whole pool (each xdp_buff_xsk's rxq info).
>
>
> Björn
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 12:53 ` Maciej Fijalkowski
@ 2022-06-29 13:18 ` Magnus Karlsson
2022-06-29 16:16 ` Jakub Kicinski
1 sibling, 0 replies; 10+ messages in thread
From: Magnus Karlsson @ 2022-06-29 13:18 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Björn Töpel, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Netdev, Karlsson, Magnus
On Wed, Jun 29, 2022 at 2:58 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Jun 29, 2022 at 02:45:11PM +0200, Björn Töpel wrote:
> > On Wed, 29 Jun 2022 at 12:58, Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > When application runs in zero copy busy poll mode and does not receive a
> > > single packet but only sends them, it is currently impossible to get
> > > into napi_busy_loop() as napi_id is only marked on Rx side in
> > > xsk_rcv_check(). In there, napi_id is being taken from xdp_rxq_info
> > > carried by xdp_buff. From Tx perspective, we do not have access to it.
> > > What we have handy is the xsk pool.
> >
> > The fact that the napi_id is not set unless set from the ingress side
> > is actually "by design". It's CONFIG_NET_RX_BUSY_POLL after all. I
> > followed the semantics of the regular busy-polling sockets. So, I
> > wouldn't say it's a fix! The busy-polling in sendmsg is really just
> > about "driving the RX busy-polling from another socket syscall".
>
> I just felt that busy polling for txonly apps was broken, hence the
> 'fixing' flavour. I can send it just as improvement to bpf-next.
>
> >
> > That being said, I definitely see that this is useful for AF_XDP
> > sockets, but keep in mind that it sort of changes the behavior from
> > regular sockets. And we'll get different behavior for
> > copy-mode/zero-copy mode.
> >
> > TL;DR, I think it's a good addition. One small nit below:
> >
> > > + __sk_mark_napi_id_once(sk, xs->pool->heads[0].xdp.rxq->napi_id);
> >
> > Please hide this hideous pointer chasing in something neater:
> > xsk_pool_get_napi_id() or something.
>
> Would it make sense to introduce napi_id to xsk_buff_pool then?
Only if it has a positive performance impact. Let us not carry copies
of state otherwise. So please measure first and see if it makes any
difference. If not, I prefer the pointer chasing hidden behind an API
like Björn suggests.
> xp_set_rxq_info() could be setting it. We are sure that napi_id is the
> same for whole pool (each xdp_buff_xsk's rxq info).
>
> >
> >
> > Björn
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 12:45 ` Björn Töpel
2022-06-29 12:53 ` Maciej Fijalkowski
@ 2022-06-29 13:39 ` Björn Töpel
2022-06-29 14:28 ` Maciej Fijalkowski
1 sibling, 1 reply; 10+ messages in thread
From: Björn Töpel @ 2022-06-29 13:39 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Netdev, Karlsson, Magnus
On Wed, 29 Jun 2022 at 14:45, Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> TL;DR, I think it's a good addition. One small nit below:
>
I forgot one thing. Setting napi_id should be gated by
"CONFIG_NET_RX_BUSY_POLL" ifdefs.
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 13:39 ` Björn Töpel
@ 2022-06-29 14:28 ` Maciej Fijalkowski
0 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2022-06-29 14:28 UTC (permalink / raw)
To: Björn Töpel
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Netdev, Karlsson, Magnus
On Wed, Jun 29, 2022 at 03:39:57PM +0200, Björn Töpel wrote:
> On Wed, 29 Jun 2022 at 14:45, Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
> > TL;DR, I think it's a good addition. One small nit below:
> >
>
> I forgot one thing. Setting napi_id should be gated by
> "CONFIG_NET_RX_BUSY_POLL" ifdefs.
The napi_id marking functions are already wrapped in this ifdef internally.
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 12:53 ` Maciej Fijalkowski
2022-06-29 13:18 ` Magnus Karlsson
@ 2022-06-29 16:16 ` Jakub Kicinski
2022-06-29 16:17 ` Jakub Kicinski
1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2022-06-29 16:16 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Björn Töpel, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Netdev, Karlsson, Magnus
On Wed, 29 Jun 2022 14:53:34 +0200 Maciej Fijalkowski wrote:
> > > + __sk_mark_napi_id_once(sk, xs->pool->heads[0].xdp.rxq->napi_id);
> >
> > Please hide this hideous pointer chasing in something neater:
> > xsk_pool_get_napi_id() or something.
>
> Would it make sense to introduce napi_id to xsk_buff_pool then?
> xp_set_rxq_info() could be setting it. We are sure that napi_id is the
> same for whole pool (each xdp_buff_xsk's rxq info).
Would it be possible to move the marking to when the queue is getting
bound instead of the recv/send paths?
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 16:16 ` Jakub Kicinski
@ 2022-06-29 16:17 ` Jakub Kicinski
2022-06-30 11:53 ` Maciej Fijalkowski
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2022-06-29 16:17 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Björn Töpel, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Netdev, Karlsson, Magnus
On Wed, 29 Jun 2022 09:16:29 -0700 Jakub Kicinski wrote:
> > Would it make sense to introduce napi_id to xsk_buff_pool then?
> > xp_set_rxq_info() could be setting it. We are sure that napi_id is the
> > same for whole pool (each xdp_buff_xsk's rxq info).
>
> Would it be possible to move the marking to when the queue is getting
> bound instead of the recv/send paths?
I mean when socket is getting bound.
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-29 16:17 ` Jakub Kicinski
@ 2022-06-30 11:53 ` Maciej Fijalkowski
2022-06-30 15:52 ` Jakub Kicinski
0 siblings, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2022-06-30 11:53 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Björn Töpel, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Netdev, Karlsson, Magnus
On Wed, Jun 29, 2022 at 05:17:07PM +0100, Jakub Kicinski wrote:
> On Wed, 29 Jun 2022 09:16:29 -0700 Jakub Kicinski wrote:
> > > Would it make sense to introduce napi_id to xsk_buff_pool then?
> > > xp_set_rxq_info() could be setting it. We are sure that napi_id is the
> > > same for whole pool (each xdp_buff_xsk's rxq info).
> >
> > Would it be possible to move the marking to when the queue is getting
> > bound instead of the recv/send paths?
>
> I mean when socket is getting bound.
So Björn said that it was a design choice to follow the standard
sockets' approach. I'm including a dirty diff for discussion which
allows me to get the napi_id at bind() time. This works for ice, though,
because during the XDP prog/XSK pool load this driver only disables the
NAPI, so we are sure that the napi_id stays the same. That might not be
the case for other AF_XDP ZC enabled drivers; they might delete the NAPI,
and this approach wouldn't work... or am I missing something?
I'd prefer the diff below, though, as it simplifies the data path, but I
can't say if it's safe to do so. We would have to be sure about drivers
keeping their NAPI struct. This would also allow us to drop napi_id from
xdp_rxq_info.
Thoughts?
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 49ba8bfdbf04..3d084558628e 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -312,6 +312,7 @@ ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
return err;
set_bit(qid, vsi->af_xdp_zc_qps);
+ xsk_pool_set_napi_id(pool, vsi->rx_rings[qid]->q_vector->napi.napi_id);
return 0;
}
@@ -348,7 +349,6 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
pool_failure = pool_present ? ice_xsk_pool_enable(vsi, pool, qid) :
ice_xsk_pool_disable(vsi, qid);
-
xsk_pool_if_up:
if (if_running) {
ret = ice_qp_ena(vsi, qid);
@@ -358,6 +358,7 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
}
+
failure:
if (pool_failure) {
netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n",
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4aa031849668..1a20320cd556 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -44,6 +44,14 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
xp_set_rxq_info(pool, rxq);
}
+static inline void xsk_pool_set_napi_id(struct xsk_buff_pool *pool,
+ unsigned int napi_id)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ xp_set_napi_id(pool, napi_id);
+#endif
+}
+
static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
unsigned long attrs)
{
@@ -198,6 +206,11 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
{
}
+static inline void xsk_pool_set_napi_id(struct xsk_buff_pool *pool,
+ unsigned int napi_id)
+{
+}
+
static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
unsigned long attrs)
{
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 647722e847b4..60775a8d1bcb 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -70,6 +70,7 @@ struct xsk_buff_pool {
u32 chunk_size;
u32 chunk_shift;
u32 frame_len;
+ unsigned int napi_id;
u8 cached_need_wakeup;
bool uses_need_wakeup;
bool dma_need_sync;
@@ -125,6 +126,7 @@ static inline void xp_init_xskb_dma(struct xdp_buff_xsk *xskb, struct xsk_buff_p
/* AF_XDP ZC drivers, via xdp_sock_buff.h */
void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq);
+void xp_set_napi_id(struct xsk_buff_pool *pool, unsigned int napi_id);
int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
unsigned long attrs, struct page **pages, u32 nr_pages);
void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index eafd512d38b1..18ac3f32a48d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -222,7 +222,6 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
return -EINVAL;
- sk_mark_napi_id_once_xdp(&xs->sk, xdp);
return 0;
}
@@ -637,11 +636,8 @@ static int __xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len
if (unlikely(need_wait))
return -EOPNOTSUPP;
- if (sk_can_busy_loop(sk)) {
- if (xs->zc)
- __sk_mark_napi_id_once(sk, xs->pool->heads[0].xdp.rxq->napi_id);
+ if (sk_can_busy_loop(sk))
sk_busy_loop(sk, 1); /* only support non-blocking sockets */
- }
if (xs->zc && xsk_no_wakeup(sk))
return 0;
@@ -1015,6 +1011,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
xs->dev = dev;
xs->zc = xs->umem->zc;
xs->queue_id = qid;
+ if (xs->zc)
+ xs->sk.sk_napi_id = xs->pool->napi_id;
xp_add_xsk(xs->pool, xs);
out_unlock:
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 87bdd71c7bb6..5ec6443be3fc 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -121,6 +121,14 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
}
EXPORT_SYMBOL(xp_set_rxq_info);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+void xp_set_napi_id(struct xsk_buff_pool *pool, unsigned int napi_id)
+{
+ pool->napi_id = napi_id;
+}
+EXPORT_SYMBOL(xp_set_napi_id);
+#endif
+
static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
{
struct netdev_bpf bpf;
* Re: [PATCH bpf] xsk: mark napi_id on sendmsg()
2022-06-30 11:53 ` Maciej Fijalkowski
@ 2022-06-30 15:52 ` Jakub Kicinski
0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2022-06-30 15:52 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Björn Töpel, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Netdev, Karlsson, Magnus
On Thu, 30 Jun 2022 13:53:11 +0200 Maciej Fijalkowski wrote:
> > > Would it be possible to move the marking to when the queue is getting
> > > bound instead of the recv/send paths?
> >
> > I mean when socket is getting bound.
>
> So Bjorn said that it was the design choice to follow the standard
> sockets' approach. I'm including a dirty diff for a discussion which
> allows me to get napi_id at bind() time. But, this works for ice as this
> driver during the XDP prog/XSK pool load only disables the NAPI, so we are
> sure that napi_id stays the same. That might not be the case for other
> AF_XDP ZC enabled drivers though, they might delete the NAPI and this
> approach wouldn't work...or am I missing something?
Possible, but IDK if we're changing anything substantially in that
regard. The existing code already uses sk_mark_napi_id_once() so
if the NAPI ID changes during the lifetime of the socket the user
will be out of luck already. But no strong feelings.
> I'd prefer the diff below though as it simplifies the data path, but I
> can't say if it's safe to do so. We would have to be sure about drivers
> keeping their NAPI struct. This would also allow us to drop napi_id from
> xdp_rxq_info.