* [PATCH bpf-next 0/2] Add xdp_update_skb_shared_info utility routine
@ 2021-07-09 11:10 Lorenzo Bianconi
  2021-07-09 11:10 ` [PATCH bpf-next 1/2] net: skbuff: add xdp_frags_tsize field to skb_shared_info Lorenzo Bianconi
  2021-07-09 11:10 ` [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
  0 siblings, 2 replies; 8+ messages in thread
From: Lorenzo Bianconi @ 2021-07-09 11:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, lorenzo.bianconi, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce xdp_update_skb_shared_info routine to update frags array metadata
from a given xdp_buff/xdp_frame.
Add xdp_frags_tsize field to skb_shared_info.

Lorenzo Bianconi (2):
  net: skbuff: add xdp_frags_tsize field to skb_shared_info
  net: xdp: add xdp_update_skb_shared_info utility routine

 drivers/net/ethernet/marvell/mvneta.c | 29 +++++++++++++++------------
 include/linux/skbuff.h                |  2 ++
 include/net/xdp.h                     |  3 +++
 net/core/xdp.c                        | 27 +++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 13 deletions(-)

-- 
2.31.1



* [PATCH bpf-next 1/2] net: skbuff: add xdp_frags_tsize field to skb_shared_info
  2021-07-09 11:10 [PATCH bpf-next 0/2] Add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
@ 2021-07-09 11:10 ` Lorenzo Bianconi
  2021-07-09 11:10 ` [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
  1 sibling, 0 replies; 8+ messages in thread
From: Lorenzo Bianconi @ 2021-07-09 11:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, lorenzo.bianconi, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

The xdp_frags_tsize field will be used to store the paged frame
truesize for xdp_buff/xdp_frame. In order not to increase the
skb_shared_info size, we use a hole left by skb_shared_info
alignment.
The gso_type field will be used to store the paged frame size.
This is a preliminary patch to properly support xdp multi-buff.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/linux/skbuff.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f19190820e63..0e345c5ccb4f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -522,6 +522,7 @@ struct skb_shared_info {
 	unsigned short	gso_segs;
 	struct sk_buff	*frag_list;
 	struct skb_shared_hwtstamps hwtstamps;
+	/* used for xdp_{buff,frame} paged size */
 	unsigned int	gso_type;
 	u32		tskey;
 
@@ -529,6 +530,7 @@ struct skb_shared_info {
 	 * Warning : all fields before dataref are cleared in __alloc_skb()
 	 */
 	atomic_t	dataref;
+	unsigned int	xdp_frags_tsize;
 
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
-- 
2.31.1



* [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-07-09 11:10 [PATCH bpf-next 0/2] Add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
  2021-07-09 11:10 ` [PATCH bpf-next 1/2] net: skbuff: add xdp_frags_tsize field to skb_shared_info Lorenzo Bianconi
@ 2021-07-09 11:10 ` Lorenzo Bianconi
  2021-07-12 18:46   ` John Fastabend
  1 sibling, 1 reply; 8+ messages in thread
From: Lorenzo Bianconi @ 2021-07-09 11:10 UTC (permalink / raw)
  To: bpf
  Cc: netdev, lorenzo.bianconi, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce xdp_update_skb_shared_info routine to update frags array
metadata from a given xdp_buff/xdp_frame. We do not need to reset
frags array since it is already initialized by the driver.
Rely on xdp_update_skb_shared_info in mvneta driver.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 29 +++++++++++++++------------
 include/net/xdp.h                     |  3 +++
 net/core/xdp.c                        | 27 +++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 361bc4fbe20b..abf2e50880e0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2294,18 +2294,29 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 	rx_desc->buf_phys_addr = 0;
 
 	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
-		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
+		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags];
 
 		skb_frag_off_set(frag, pp->rx_offset_correction);
 		skb_frag_size_set(frag, data_len);
 		__skb_frag_set_page(frag, page);
+		/* We don't need to reset pp_recycle here. It's already set, so
+		 * just mark fragments for recycling.
+		 */
+		page_pool_store_mem_info(page, rxq->page_pool);
+
+		/* first fragment */
+		if (!xdp_sinfo->nr_frags)
+			xdp_sinfo->gso_type = *size;
+		xdp_sinfo->nr_frags++;
 
 		/* last fragment */
 		if (len == *size) {
 			struct skb_shared_info *sinfo;
 
 			sinfo = xdp_get_shared_info_from_buff(xdp);
+			sinfo->xdp_frags_tsize = xdp_sinfo->nr_frags * PAGE_SIZE;
 			sinfo->nr_frags = xdp_sinfo->nr_frags;
+			sinfo->gso_type = xdp_sinfo->gso_type;
 			memcpy(sinfo->frags, xdp_sinfo->frags,
 			       sinfo->nr_frags * sizeof(skb_frag_t));
 		}
@@ -2320,7 +2331,7 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i, num_frags = sinfo->nr_frags;
+	int num_frags = sinfo->nr_frags, size = sinfo->gso_type;
 	struct sk_buff *skb;
 
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
@@ -2333,17 +2344,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	skb_put(skb, xdp->data_end - xdp->data);
 	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
 
-	for (i = 0; i < num_frags; i++) {
-		skb_frag_t *frag = &sinfo->frags[i];
-
-		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-				skb_frag_page(frag), skb_frag_off(frag),
-				skb_frag_size(frag), PAGE_SIZE);
-		/* We don't need to reset pp_recycle here. It's already set, so
-		 * just mark fragments for recycling.
-		 */
-		page_pool_store_mem_info(skb_frag_page(frag), pool);
-	}
+	if (num_frags)
+		xdp_update_skb_shared_info(skb, num_frags, size,
+					   sinfo->xdp_frags_tsize, sinfo);
 
 	return skb;
 }
diff --git a/include/net/xdp.h b/include/net/xdp.h
index ad5b02dcb6f4..08d151ea8400 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -164,6 +164,9 @@ void xdp_warn(const char *msg, const char *func, const int line);
 #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
 
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
+void xdp_update_skb_shared_info(struct sk_buff *skb, int nr_frags,
+				int size, int truesize,
+				struct skb_shared_info *sinfo);
 struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index cc92ccb38432..3f44c69e1f56 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -527,6 +527,33 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
 }
 EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
 
+void xdp_update_skb_shared_info(struct sk_buff *skb, int nr_frags,
+				int size, int truesize,
+				struct skb_shared_info *sinfo)
+{
+	int i;
+
+	skb_shinfo(skb)->nr_frags = nr_frags;
+
+	skb->len += size;
+	skb->data_len += size;
+	skb->truesize += truesize;
+
+	if (skb->pfmemalloc)
+		return;
+
+	for (i = 0; i < nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		page = compound_head(page);
+		if (page_is_pfmemalloc(page)) {
+			skb->pfmemalloc = true;
+			break;
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(xdp_update_skb_shared_info);
+
 struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)
-- 
2.31.1



* RE: [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-07-09 11:10 ` [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
@ 2021-07-12 18:46   ` John Fastabend
  2021-07-12 20:22     ` Lorenzo Bianconi
  0 siblings, 1 reply; 8+ messages in thread
From: John Fastabend @ 2021-07-12 18:46 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf
  Cc: netdev, lorenzo.bianconi, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

Lorenzo Bianconi wrote:
> Introduce xdp_update_skb_shared_info routine to update frags array
> metadata from a given xdp_buff/xdp_frame. We do not need to reset
> frags array since it is already initialized by the driver.
> Rely on xdp_update_skb_shared_info in mvneta driver.

Some more context here would really help. I had to jump into the mvneta
driver to see what is happening.

So as I read this we have a loop processing the descriptor in
mvneta_rx_swbm()

 mvneta_rx_swbm()
   while (rx_proc < budget && rx_proc < rx_todo) {
     if (rx_status & MVNETA_RXD_FIRST_DESC) ...
     else {
       mvneta_swbm_add_rx_fragment()
     }
     ..
     if (!(rx_status & MVNETA_RXD_LAST_DESC))
         continue;
     ..
     if (xdp_prog)
       mvneta_run_xdp(...)
   }

roughly looking like the above. First question: do you ever hit
!MVNETA_RXD_LAST_DESC today? I assume this is avoided by hardware
setup when XDP is enabled, otherwise _run_xdp() would be
broken, correct? Next question: given the last-descriptor bit
logic, what is the condition to hit the code added in this patch?
Wouldn't we need more than one descriptor, and then we would
skip the xdp_run... sorry, you lost me, and it's probably easier
to let you give the flow vs spending an hour trying to
track it down.

But, in theory, as you handle a hardware descriptor you can build
up a set of pages, using them to create a single skb rather than a
skb per descriptor. But don't we know whether pfmemalloc should be
set while we are building the frag list? Can't we just set it there
instead of this for loop in xdp_update_skb_shared_info():

> +	for (i = 0; i < nr_frags; i++) {
> +		struct page *page = skb_frag_page(&sinfo->frags[i]);
> +
> +		page = compound_head(page);
> +		if (page_is_pfmemalloc(page)) {
> +			skb->pfmemalloc = true;
> +			break;
> +		}
> +	}
> +}

...

> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 361bc4fbe20b..abf2e50880e0 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2294,18 +2294,29 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
>  	rx_desc->buf_phys_addr = 0;
>  
>  	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
> -		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
> +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags];
>  
>  		skb_frag_off_set(frag, pp->rx_offset_correction);
>  		skb_frag_size_set(frag, data_len);
>  		__skb_frag_set_page(frag, page);
> +		/* We don't need to reset pp_recycle here. It's already set, so
> +		 * just mark fragments for recycling.
> +		 */
> +		page_pool_store_mem_info(page, rxq->page_pool);
> +
> +		/* first fragment */
> +		if (!xdp_sinfo->nr_frags)
> +			xdp_sinfo->gso_type = *size;

Would be nice to also change 'int size' -> 'unsigned int size' so the
types matched. Presumably you really can't have a negative size.

Also how about giving gso_type a better name. xdp_sinfo->size maybe?


> +		xdp_sinfo->nr_frags++;
>  
>  		/* last fragment */
>  		if (len == *size) {
>  			struct skb_shared_info *sinfo;
>  
>  			sinfo = xdp_get_shared_info_from_buff(xdp);
> +			sinfo->xdp_frags_tsize = xdp_sinfo->nr_frags * PAGE_SIZE;
>  			sinfo->nr_frags = xdp_sinfo->nr_frags;
> +			sinfo->gso_type = xdp_sinfo->gso_type;
>  			memcpy(sinfo->frags, xdp_sinfo->frags,
>  			       sinfo->nr_frags * sizeof(skb_frag_t));
>  		}

Thanks,
John


* Re: [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-07-12 18:46   ` John Fastabend
@ 2021-07-12 20:22     ` Lorenzo Bianconi
  2021-07-14 23:44       ` John Fastabend
  0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Bianconi @ 2021-07-12 20:22 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

> Lorenzo Bianconi wrote:
> > Introduce xdp_update_skb_shared_info routine to update frags array
> > metadata from a given xdp_buff/xdp_frame. We do not need to reset
> > frags array since it is already initialized by the driver.
> > Rely on xdp_update_skb_shared_info in mvneta driver.
> 
> Some more context here would really help. I had to jump into the mvneta
> driver to see what is happening.

Hi John,

ack, you are right. I will add more context next time. Sorry for the noise.

> 
> So as I read this we have a loop processing the descriptor in
> mvneta_rx_swbm()
> 
>  mvneta_rx_swbm()
>    while (rx_proc < budget && rx_proc < rx_todo) {
>      if (rx_status & MVNETA_RXD_FIRST_DESC) ...
>      else {
>        mvneta_swbm_add_rx_fragment()
>      }
>      ..
>      if (!(rx_status & MVNETA_RXD_LAST_DESC))
>          continue;
>      ..
>      if (xdp_prog)
>        mvneta_run_xdp(...)
>    }
> 
> roughly looking like the above. First question: do you ever hit
> !MVNETA_RXD_LAST_DESC today? I assume this is avoided by hardware
> setup when XDP is enabled, otherwise _run_xdp() would be
> broken, correct? Next question: given the last-descriptor bit
> logic, what is the condition to hit the code added in this patch?
> Wouldn't we need more than one descriptor, and then we would
> skip the xdp_run... sorry, you lost me, and it's probably easier
> to let you give the flow vs spending an hour trying to
> track it down.

I will point it out in the new commit log, but this is a preliminary patch for
xdp multi-buff support. In the current codebase xdp_update_skb_shared_info()
is run just when the NIC is not running in XDP mode (please note
mvneta_swbm_add_rx_fragment() is run even if xdp_prog is NULL).
When we add xdp multi-buff support, xdp_update_skb_shared_info() will run even
in XDP mode since we will remove the MTU constraint.

In the current codebase the following condition can occur in non-XDP mode if
the packet is split on 3 or more descriptors (e.g. MTU 9000):

if (!(rx_status & MVNETA_RXD_LAST_DESC))
   continue;

> 
> But, in theory, as you handle a hardware descriptor you can build
> up a set of pages, using them to create a single skb rather than a
> skb per descriptor. But don't we know whether pfmemalloc should be
> set while we are building the frag list? Can't we just set it there
> instead of this for loop in xdp_update_skb_shared_info():

I added pfmemalloc code in xdp_update_skb_shared_info() in order to reuse it
for the xdp_redirect use-case (e.g. whenever we redirect an xdp multi-buff
into a veth or a cpumap). I have a pending patch where I am using
xdp_update_skb_shared_info() in __xdp_build_skb_from_frame().
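
Something along these lines, just a rough sketch of the pending change
(it assumes an xdp_get_shared_info_from_frame() helper mirroring the
existing _from_buff one; the final patch may look different):

struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
					   struct sk_buff *skb,
					   struct net_device *dev)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
	...
	/* reuse the helper instead of open-coding the frags loop */
	if (sinfo->nr_frags)
		xdp_update_skb_shared_info(skb, sinfo->nr_frags,
					   sinfo->gso_type,
					   sinfo->xdp_frags_tsize, sinfo);
	...
}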

> 
> > +	for (i = 0; i < nr_frags; i++) {
> > +		struct page *page = skb_frag_page(&sinfo->frags[i]);
> > +
> > +		page = compound_head(page);
> > +		if (page_is_pfmemalloc(page)) {
> > +			skb->pfmemalloc = true;
> > +			break;
> > +		}
> > +	}
> > +}
> 
> ...
> 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index 361bc4fbe20b..abf2e50880e0 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -2294,18 +2294,29 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
> >  	rx_desc->buf_phys_addr = 0;
> >  
> >  	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
> > -		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
> > +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags];
> >  
> >  		skb_frag_off_set(frag, pp->rx_offset_correction);
> >  		skb_frag_size_set(frag, data_len);
> >  		__skb_frag_set_page(frag, page);
> > +		/* We don't need to reset pp_recycle here. It's already set, so
> > +		 * just mark fragments for recycling.
> > +		 */
> > +		page_pool_store_mem_info(page, rxq->page_pool);
> > +
> > +		/* first fragment */
> > +		if (!xdp_sinfo->nr_frags)
> > +			xdp_sinfo->gso_type = *size;
> 
> Would be nice to also change 'int size' -> 'unsigned int size' so the
> types matched. Presumably you really can't have a negative size.
> 

ack

> Also how about giving gso_type a better name. xdp_sinfo->size maybe?

I did it in this way in order to avoid adding a union in skb_shared_info.
What about adding an inline helper to set/get it? e.g.

static inline u32 xdp_get_data_len(struct skb_shared_info *sinfo)
{
    return sinfo->gso_type;
}

static inline void xdp_set_data_len(struct skb_shared_info *sinfo, u32 datalen)
{
    sinfo->gso_type = datalen;
}

Regards,
Lorenzo

> 
> 
> > +		xdp_sinfo->nr_frags++;
> >  
> >  		/* last fragment */
> >  		if (len == *size) {
> >  			struct skb_shared_info *sinfo;
> >  
> >  			sinfo = xdp_get_shared_info_from_buff(xdp);
> > +			sinfo->xdp_frags_tsize = xdp_sinfo->nr_frags * PAGE_SIZE;
> >  			sinfo->nr_frags = xdp_sinfo->nr_frags;
> > +			sinfo->gso_type = xdp_sinfo->gso_type;
> >  			memcpy(sinfo->frags, xdp_sinfo->frags,
> >  			       sinfo->nr_frags * sizeof(skb_frag_t));
> >  		}
> 
> Thanks,
> John
> 


* Re: [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-07-12 20:22     ` Lorenzo Bianconi
@ 2021-07-14 23:44       ` John Fastabend
  2021-07-15 23:27         ` Lorenzo Bianconi
  0 siblings, 1 reply; 8+ messages in thread
From: John Fastabend @ 2021-07-14 23:44 UTC (permalink / raw)
  To: Lorenzo Bianconi, John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

Lorenzo Bianconi wrote:
> > Lorenzo Bianconi wrote:
> > > Introduce xdp_update_skb_shared_info routine to update frags array
> > > metadata from a given xdp_buff/xdp_frame. We do not need to reset
> > > frags array since it is already initialized by the driver.
> > > Rely on xdp_update_skb_shared_info in mvneta driver.
> > 
> > Some more context here would really help. I had to jump into the mvneta
> > driver to see what is happening.
> 
> Hi John,
> 
> ack, you are right. I will add more context next time. Sorry for the noise.
> 
> > 
> > So as I read this we have a loop processing the descriptor in
> > mvneta_rx_swbm()
> > 
> >  mvneta_rx_swbm()
> >    while (rx_proc < budget && rx_proc < rx_todo) {
> >      if (rx_status & MVNETA_RXD_FIRST_DESC) ...
> >      else {
> >        mvneta_swbm_add_rx_fragment()
> >      }
> >      ..
> >      if (!(rx_status & MVNETA_RXD_LAST_DESC))
> >          continue;
> >      ..
> >      if (xdp_prog)
> >        mvneta_run_xdp(...)
> >    }
> > 
> > roughly looking like the above. First question: do you ever hit
> > !MVNETA_RXD_LAST_DESC today? I assume this is avoided by hardware
> > setup when XDP is enabled, otherwise _run_xdp() would be
> > broken, correct? Next question: given the last-descriptor bit
> > logic, what is the condition to hit the code added in this patch?
> > Wouldn't we need more than one descriptor, and then we would
> > skip the xdp_run... sorry, you lost me, and it's probably easier
> > to let you give the flow vs spending an hour trying to
> > track it down.
> 
> I will point it out in the new commit log, but this is a preliminary patch for
> xdp multi-buff support. In the current codebase xdp_update_skb_shared_info()
> is run just when the NIC is not running in XDP mode (please note
> mvneta_swbm_add_rx_fragment() is run even if xdp_prog is NULL).
> When we add xdp multi-buff support, xdp_update_skb_shared_info() will run even
> in XDP mode since we will remove the MTU constraint.
> 
> In the current codebase the following condition can occur in non-XDP mode if
> the packet is split on 3 or more descriptors (e.g. MTU 9000):
> 
> if (!(rx_status & MVNETA_RXD_LAST_DESC))
>    continue;

But, as-is there is no caller of xdp_update_skb_shared_info(), so
I think we should move these two patches into the series with
the multi-buff support.

> 
> > 
> > But, in theory, as you handle a hardware descriptor you can build
> > up a set of pages, using them to create a single skb rather than a
> > skb per descriptor. But don't we know whether pfmemalloc should be
> > set while we are building the frag list? Can't we just set it there
> > instead of this for loop in xdp_update_skb_shared_info():
> 
> I added pfmemalloc code in xdp_update_skb_shared_info() in order to reuse it
> for the xdp_redirect use-case (e.g. whenever we redirect an xdp multi-buff
> into a veth or a cpumap). I have a pending patch where I am using
> xdp_update_skb_shared_info() in __xdp_build_skb_from_frame().

OK, but it adds an extra for loop and the related overhead. Can
we avoid this overhead and just set it from where we first
know we have a compound page? Or carry some bit through and
do a simpler check:

 if (pfmemalloc_needed) skb->pfmemalloc = true;

I guess in the case here it's building the skb, so performance is maybe
not as critical, but if it gets used in the redirect case then we
shouldn't be doing unnecessary for loops.

> 
> > 
> > > +	for (i = 0; i < nr_frags; i++) {
> > > +		struct page *page = skb_frag_page(&sinfo->frags[i]);
> > > +
> > > +		page = compound_head(page);
> > > +		if (page_is_pfmemalloc(page)) {
> > > +			skb->pfmemalloc = true;
> > > +			break;
> > > +		}
> > > +	}
> > > +}
> > 
> > ...
> > 
> > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > > index 361bc4fbe20b..abf2e50880e0 100644
> > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > @@ -2294,18 +2294,29 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
> > >  	rx_desc->buf_phys_addr = 0;
> > >  
> > >  	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
> > > -		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
> > > +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags];
> > >  
> > >  		skb_frag_off_set(frag, pp->rx_offset_correction);
> > >  		skb_frag_size_set(frag, data_len);
> > >  		__skb_frag_set_page(frag, page);
> > > +		/* We don't need to reset pp_recycle here. It's already set, so
> > > +		 * just mark fragments for recycling.
> > > +		 */
> > > +		page_pool_store_mem_info(page, rxq->page_pool);
> > > +
> > > +		/* first fragment */
> > > +		if (!xdp_sinfo->nr_frags)
> > > +			xdp_sinfo->gso_type = *size;
> > 
> > Would be nice to also change 'int size' -> 'unsigned int size' so the
> > types matched. Presumably you really can't have a negative size.
> > 
> 
> ack
> 
> > Also how about giving gso_type a better name. xdp_sinfo->size maybe?
> 
> I did it in this way in order to avoid adding a union in skb_shared_info.
> What about adding an inline helper to set/get it? e.g.

What was wrong with the union?
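
e.g. something like the below, just a sketch (the xdp_frags_size name is
made up):

	union {
		unsigned int	gso_type;
		/* paged data length for xdp_{buff,frame} */
		unsigned int	xdp_frags_size;
	};

That keeps the xdp path readable without growing skb_shared_info.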

> 
> static inline u32 xdp_get_data_len(struct skb_shared_info *sinfo)
> {
>     return sinfo->gso_type;
> }
> 
> static inline void xdp_set_data_len(struct skb_shared_info *sinfo, u32 datalen)
> {
>     sinfo->gso_type = datalen;
> }
> 
> Regards,
> Lorenzo


* Re: [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-07-14 23:44       ` John Fastabend
@ 2021-07-15 23:27         ` Lorenzo Bianconi
  2021-07-16  3:05           ` John Fastabend
  0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Bianconi @ 2021-07-15 23:27 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

> Lorenzo Bianconi wrote:
> > > Lorenzo Bianconi wrote:
> > > > Introduce xdp_update_skb_shared_info routine to update frags array
> > > > metadata from a given xdp_buff/xdp_frame. We do not need to reset
> > > > frags array since it is already initialized by the driver.
> > > > Rely on xdp_update_skb_shared_info in mvneta driver.
> > > 
> > > Some more context here would really help. I had to jump into the mvneta
> > > driver to see what is happening.
> > 
> > Hi John,
> > 
> > ack, you are right. I will add more context next time. Sorry for the noise.
> > 
> > > 
> > > So as I read this we have a loop processing the descriptor in
> > > mvneta_rx_swbm()
> > > 
> > >  mvneta_rx_swbm()
> > >    while (rx_proc < budget && rx_proc < rx_todo) {
> > >      if (rx_status & MVNETA_RXD_FIRST_DESC) ...
> > >      else {
> > >        mvneta_swbm_add_rx_fragment()
> > >      }
> > >      ..
> > >      if (!(rx_status & MVNETA_RXD_LAST_DESC))
> > >          continue;
> > >      ..
> > >      if (xdp_prog)
> > >        mvneta_run_xdp(...)
> > >    }
> > > 
> > > roughly looking like the above. First question: do you ever hit
> > > !MVNETA_RXD_LAST_DESC today? I assume this is avoided by hardware
> > > setup when XDP is enabled, otherwise _run_xdp() would be
> > > broken, correct? Next question: given the last-descriptor bit
> > > logic, what is the condition to hit the code added in this patch?
> > > Wouldn't we need more than one descriptor, and then we would
> > > skip the xdp_run... sorry, you lost me, and it's probably easier
> > > to let you give the flow vs spending an hour trying to
> > > track it down.
> > 
> > I will point it out in the new commit log, but this is a preliminary patch for
> > xdp multi-buff support. In the current codebase xdp_update_skb_shared_info()
> > is run just when the NIC is not running in XDP mode (please note
> > mvneta_swbm_add_rx_fragment() is run even if xdp_prog is NULL).
> > When we add xdp multi-buff support, xdp_update_skb_shared_info() will run even
> > in XDP mode since we will remove the MTU constraint.
> > 
> > In the current codebase the following condition can occur in non-XDP mode if
> > the packet is split on 3 or more descriptors (e.g. MTU 9000):
> > 
> > if (!(rx_status & MVNETA_RXD_LAST_DESC))
> >    continue;
> 
> But, as-is there is no caller of xdp_update_skb_shared_info(), so
> I think we should move these two patches into the series with
> the multi-buff support.

mvneta is currently using it when building the skb in mvneta_swbm_build_skb()
in non-XDP mode, but I am fine merging this series into the
multi-buff one.

> 
> > 
> > > 
> > > But, in theory, as you handle a hardware descriptor you can build
> > > up a set of pages, using them to create a single skb rather than a
> > > skb per descriptor. But don't we know whether pfmemalloc should be
> > > set while we are building the frag list? Can't we just set it there
> > > instead of this for loop in xdp_update_skb_shared_info():
> > 
> > I added pfmemalloc code in xdp_update_skb_shared_info() in order to reuse it
> > for the xdp_redirect use-case (e.g. whenever we redirect an xdp multi-buff
> > into a veth or a cpumap). I have a pending patch where I am using
> > xdp_update_skb_shared_info() in __xdp_build_skb_from_frame().
> 
> OK, but it adds an extra for loop and the related overhead. Can
> we avoid this overhead and just set it from where we first
> know we have a compound page? Or carry some bit through and
> do a simpler check:
> 
>  if (pfmemalloc_needed) skb->pfmemalloc = true;
> 
> I guess in the case here it's building the skb, so performance is maybe
> not as critical, but if it gets used in the redirect case then we
> shouldn't be doing unnecessary for loops.

Doing so, every driver will need to take care of it when building the xdp_buff.
Does it work to do it this way, since multi-buff is probably not critical for
performance?
In order to support xdp_redirect we need to save this info in
xdp_buff/xdp_frame, maybe in the flags field added in the xdp multi-buff series.
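
e.g. something like the below, just a sketch (the actual flag name and
bit will depend on what the multi-buff series adds):

#define XDP_FLAGS_FRAG_PFMEMALLOC	BIT(1)

static inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
{
	xdp->flags |= XDP_FLAGS_FRAG_PFMEMALLOC;
}

static inline bool xdp_buff_is_frag_pfmemalloc(struct xdp_buff *xdp)
{
	return !!(xdp->flags & XDP_FLAGS_FRAG_PFMEMALLOC);
}

The driver would set the bit when it first sees a pfmemalloc page while
filling the frags, and the skb build path would just copy it instead of
walking the frags array.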

> 
> > 
> > > 
> > > > +	for (i = 0; i < nr_frags; i++) {
> > > > +		struct page *page = skb_frag_page(&sinfo->frags[i]);
> > > > +
> > > > +		page = compound_head(page);
> > > > +		if (page_is_pfmemalloc(page)) {
> > > > +			skb->pfmemalloc = true;
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +}
> > > 
> > > ...
> > > 
> > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > > > index 361bc4fbe20b..abf2e50880e0 100644
> > > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > > @@ -2294,18 +2294,29 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
> > > >  	rx_desc->buf_phys_addr = 0;
> > > >  
> > > >  	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
> > > > -		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
> > > > +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags];
> > > >  
> > > >  		skb_frag_off_set(frag, pp->rx_offset_correction);
> > > >  		skb_frag_size_set(frag, data_len);
> > > >  		__skb_frag_set_page(frag, page);
> > > > +		/* We don't need to reset pp_recycle here. It's already set, so
> > > > +		 * just mark fragments for recycling.
> > > > +		 */
> > > > +		page_pool_store_mem_info(page, rxq->page_pool);
> > > > +
> > > > +		/* first fragment */
> > > > +		if (!xdp_sinfo->nr_frags)
> > > > +			xdp_sinfo->gso_type = *size;
> > > 
> > > Would be nice to also change 'int size' -> 'unsigned int size' so the
> > > types matched. Presumably you really can't have a negative size.
> > > 
> > 
> > ack
> > 
> > > Also how about giving gso_type a better name. xdp_sinfo->size maybe?
> > 
> > I did it in this way in order to avoid adding a union in skb_shared_info.
> > What about adding an inline helper to set/get it? e.g.
> 
> What was wrong with the union?

Alex asked to use the gso_* fields already there (the union was in the previous
version I sent).

Regards,
Lorenzo

> 
> > 
> > static inline u32 xdp_get_data_len(struct skb_shared_info *sinfo)
> > {
> >     return sinfo->gso_type;
> > }
> > 
> > static inline void xdp_set_data_len(struct skb_shared_info *sinfo, u32 datalen)
> > {
> >     sinfo->gso_type = datalen;
> > }
> > 
> > Regards,
> > Lorenzo
> 


* Re: [PATCH bpf-next 2/2] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-07-15 23:27         ` Lorenzo Bianconi
@ 2021-07-16  3:05           ` John Fastabend
  0 siblings, 0 replies; 8+ messages in thread
From: John Fastabend @ 2021-07-16  3:05 UTC (permalink / raw)
  To: Lorenzo Bianconi, John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	alexander.duyck, brouer, echaudro, magnus.karlsson,
	tirthendu.sarkar, toke

Lorenzo Bianconi wrote:
> > Lorenzo Bianconi wrote:
> > > > Lorenzo Bianconi wrote:
> > > > > Introduce xdp_update_skb_shared_info routine to update frags array
> > > > > metadata from a given xdp_buff/xdp_frame. We do not need to reset
> > > > > frags array since it is already initialized by the driver.
> > > > > Rely on xdp_update_skb_shared_info in mvneta driver.
> > > > 
> > > > Some more context here would really help. I had to jump into the mvneta
> > > > driver to see what is happening.
> > > 
> > > Hi John,
> > > 
> > > ack, you are right. I will add more context next time. Sorry for the noise.
> > > 
> > > > 
> > > > So as I read this we have a loop processing the descriptor in
> > > > mvneta_rx_swbm()
> > > > 
> > > >  mvneta_rx_swbm()
> > > >    while (rx_proc < budget && rx_proc < rx_todo) {
> > > >      if (rx_status & MVNETA_RXD_FIRST_DESC) ...
> > > >      else {
> > > >        mvneta_swbm_add_rx_fragment()
> > > >      }
> > > >      ..
> > > >      if (!(rx_status & MVNETA_RXD_LAST_DESC))
> > > >          continue;
> > > >      ..
> > > >      if (xdp_prog)
> > > >        mvneta_run_xdp(...)
> > > >    }
> > > > 
> > > > roughly looking like the above. First question: do you ever hit
> > > > !MVNETA_RXD_LAST_DESC today? I assume this is avoided by hardware
> > > > setup when XDP is enabled, otherwise _run_xdp() would be
> > > > broken, correct? Next question: given the last-descriptor bit
> > > > logic, what is the condition to hit the code added in this patch?
> > > > Wouldn't we need more than one descriptor, and then we would
> > > > skip the xdp_run... sorry, you lost me, and it's probably easier
> > > > to let you give the flow vs spending an hour trying to
> > > > track it down.
> > > 
> > > I will point it out in the new commit log, but this is a preliminary patch for
> > > xdp multi-buff support. In the current codebase xdp_update_skb_shared_info()
> > > is run just when the NIC is not running in XDP mode (please note
> > > mvneta_swbm_add_rx_fragment() is run even if xdp_prog is NULL).
> > > When we add xdp multi-buff support, xdp_update_skb_shared_info() will run even
> > > in XDP mode since we will remove the MTU constraint.
> > > 
> > > In the current codebase the following condition can occur in non-XDP mode if
> > > the packet is split on 3 or more descriptors (e.g. MTU 9000):
> > > 
> > > if (!(rx_status & MVNETA_RXD_LAST_DESC))
> > >    continue;
> > 
> > But, as-is there is no caller of xdp_update_skb_shared_info(), so
> > I think we should move these two patches into the series with
> > the multi-buff support.
> 
> mvneta is currently using it when building the skb in mvneta_swbm_build_skb()
> in non-XDP mode, but I am fine merging this series into the
> multi-buff one.

My preference is to add it where it will be used, so in the multi-buff
series.

> 
> > 
> > > 
> > > > 
> > > > But, in theory, as you handle a hardware descriptor you can build
> > > > up a set of pages, using them to create a single skb rather than a
> > > > skb per descriptor. But don't we know whether pfmemalloc should be
> > > > set while we are building the frag list? Can't we just set it there
> > > > instead of this for loop in xdp_update_skb_shared_info():
> > > 
> > > I added pfmemalloc code in xdp_update_skb_shared_info() in order to reuse it
> > > for the xdp_redirect use-case (e.g. whenever we redirect an xdp multi-buff
> > > into a veth or a cpumap). I have a pending patch where I am using
> > > xdp_update_skb_shared_info() in __xdp_build_skb_from_frame().
> > 
> > OK, but it adds an extra for loop and the related overhead. Can
> > we avoid this overhead and just set it from where we first
> > know we have a compound page? Or carry some bit through and
> > do a simpler check:
> > 
> >  if (pfmemalloc_needed) skb->pfmemalloc = true;
> > 
> > I guess in the case here it's building the skb, so performance is maybe
> > not as critical, but if it gets used in the redirect case then we
> > shouldn't be doing unnecessary for loops.
> 
> Doing so, every driver will need to take care of it when building the xdp_buff.
> Does it work to do it this way, since multi-buff is probably not critical for
> performance?

OK, but I think we need to improve performance in some of the 100Gbps
drivers. Work is in progress, so I want to call out anything that has
the potential to slow things down again. I agree this might be OK and
only matters for the nr_frags case.

> In order to support xdp_redirect we need to save this info in
> xdp_buff/xdp_frame, maybe in the flags field added in the xdp multi-buff series.

Yeah I think that would work better if possible.

> 
> > 
> > > 
> > > > 
> > > > > +	for (i = 0; i < nr_frags; i++) {
> > > > > +		struct page *page = skb_frag_page(&sinfo->frags[i]);
> > > > > +
> > > > > +		page = compound_head(page);
> > > > > +		if (page_is_pfmemalloc(page)) {
> > > > > +			skb->pfmemalloc = true;
> > > > > +			break;
> > > > > +		}
> > > > > +	}
> > > > > +}
> > > > 
> > > > ...
> > > > 
> > > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > > > > index 361bc4fbe20b..abf2e50880e0 100644
> > > > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > > > @@ -2294,18 +2294,29 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
> > > > >  	rx_desc->buf_phys_addr = 0;
> > > > >  
> > > > >  	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
> > > > > -		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
> > > > > +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags];
> > > > >  
> > > > >  		skb_frag_off_set(frag, pp->rx_offset_correction);
> > > > >  		skb_frag_size_set(frag, data_len);
> > > > >  		__skb_frag_set_page(frag, page);
> > > > > +		/* We don't need to reset pp_recycle here. It's already set, so
> > > > > +		 * just mark fragments for recycling.
> > > > > +		 */
> > > > > +		page_pool_store_mem_info(page, rxq->page_pool);
> > > > > +
> > > > > +		/* first fragment */
> > > > > +		if (!xdp_sinfo->nr_frags)
> > > > > +			xdp_sinfo->gso_type = *size;
> > > > 
> > > > Would be nice to also change 'int size' -> 'unsigned int size' so the
> > > > types matched. Presumably you really can't have a negative size.
> > > > 
> > > 
> > > ack
> > > 
> > > > Also how about giving gso_type a better name. xdp_sinfo->size maybe?
> > > 
> > > I did it in this way in order to avoid adding a union in skb_shared_info.
> > > What about adding an inline helper to set/get it? e.g.
> > 
> > What was wrong with the union?
> 
> Alex asked to use the gso_* fields already there (the union was in the previous
> version I sent).

@Alex, I think you were just saying to union the gso_size field, not the
tskey field. Anyway, it's a fairly small nit on my side; I don't care
much either way.

> 
> Regards,
> Lorenzo

