All of lore.kernel.org
 help / color / mirror / Atom feed
* [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
@ 2018-06-14  0:53 Saeed Mahameed
  2018-06-14  2:30 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-14  0:53 UTC (permalink / raw)
  To: Tariq Toukan, Martin KaFai Lau; +Cc: netdev, Saeed Mahameed, Eric Dumazet

When a new rx packet arrives, the rx path will decide whether to reuse
the same page or not according to the current rx frag page offset and
frag size, i.e:
release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;

Martin debugged this and he showed that this can cause mlx4 XDP to reuse
buffers when it shouldn't.

Using frag_info->frag_size is wrong since frag size is always the port
mtu and the frag stride might be larger, especially in XDP case where
frag_stride == PAGE_SIZE.

In XDP there is an assumption to have a page per packet and reuse can
break such assumption and might cause packet data corruptions, since in 
XDP frag_offset will always reset to the beginning of the page when
refilling the rx buffer.

Fix this by using the stride size rather than frag size in "release"
condition evaluation.

For non XDP setup this will yield the same behavior since frag_stride
already equals to ALIGN(frag_size, SMP_CACHE_BYTES) and on XDP setup the
"release" condition will always be true as it is supposed to be since
frag_stride == PAGE_SIZE.

Fixes: 34db548bfb95 ("mlx4: add page recycling in receive path")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
CC: Eric Dumazet <edumazet@google.com>

---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5c613c6663da..f63dde0288b7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -504,7 +504,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 			u32 sz_align = ALIGN(frag_size, SMP_CACHE_BYTES);
 
 			frags->page_offset += sz_align;
-			release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
+			release = frags->page_offset + frag_info->frag_stride > PAGE_SIZE;
 		}
 		if (release) {
 			dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14  0:53 [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition Saeed Mahameed
@ 2018-06-14  2:30 ` Eric Dumazet
  2018-06-14 18:56   ` Saeed Mahameed
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2018-06-14  2:30 UTC (permalink / raw)
  To: Saeed Mahameed, Tariq Toukan, Martin KaFai Lau; +Cc: netdev, Eric Dumazet



On 06/13/2018 05:53 PM, Saeed Mahameed wrote:
> When a new rx packet arrives, the rx path will decide whether to reuse
> the same page or not according to the current rx frag page offset and
> frag size, i.e:
> release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> 
> Martin debugged this and he showed that this can cause mlx4 XDP to reuse
> buffers when it shouldn't.
> 
> Using frag_info->frag_size is wrong since frag size is always the port
> mtu and the frag stride might be larger, especially in XDP case where
> frag_stride == PAGE_SIZE.

Hmmm... why frag_size is not PAGE_SIZE for XDP then ?

This patch is a bit strange, since we do :

u32 sz_align = ALIGN(frag_size, SMP_CACHE_BYTES);

frags->page_offset += sz_align;
release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;

So the @release condition is really to have enough space from frags->page_offset
to hold up to frag_info->frag_size bytes.

(NIC wont push more than frag_info->frag_size bytes, regardless of how big is out frag_stride )

Your patch would prevent a page being reused if say the amount of available bytes is 1536,
but frag_stride = 2048


I would suggest a patch like the following instead.

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5edd0cf2999cbde37b3404aafd700584696af940..83d6b17b102bc9a22bfd8e68863d079f38670501 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1082,7 +1082,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
         * This only works when num_frags == 1.
         */
        if (priv->tx_ring_num[TX_XDP]) {
-               priv->frag_info[0].frag_size = eff_mtu;
+               priv->frag_info[0].frag_size = PAGE_SIZE;
                /* This will gain efficient xdp frame recycling at the
                 * expense of more costly truesize accounting
                 */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14  2:30 ` Eric Dumazet
@ 2018-06-14 18:56   ` Saeed Mahameed
  2018-06-14 19:12     ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-14 18:56 UTC (permalink / raw)
  To: eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet

On Wed, 2018-06-13 at 19:30 -0700, Eric Dumazet wrote:
> 
> On 06/13/2018 05:53 PM, Saeed Mahameed wrote:
> > When a new rx packet arrives, the rx path will decide whether to
> > reuse
> > the same page or not according to the current rx frag page offset
> > and
> > frag size, i.e:
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> > 
> > Martin debugged this and he showed that this can cause mlx4 XDP to
> > reuse
> > buffers when it shouldn't.
> > 
> > Using frag_info->frag_size is wrong since frag size is always the
> > port
> > mtu and the frag stride might be larger, especially in XDP case
> > where
> > frag_stride == PAGE_SIZE.
> 
> Hmmm... why frag_size is not PAGE_SIZE for XDP then ?
> 

I thought about this, but i see some problems with this that i would
like to avoid.

1. definition of frag_size v.s. frag_stride:
frag_size should be equal to the effective mtu, since the descriptor
frag size depends solely on it. and we don't want to end up receiving
packets larger than the stack mtu (think of a vf with smaller mtu than
the wire).
e.g:
rx_desc->data[i].byte_count = 
       cpu_to_be32(priv->frag_info[i].frag_size);

frag_stride defines how much the frag_size can safely grow inside of a
page or how much padding we need for this frag.

so yes frag_size = PAGE_SIZE can work but it also kind of weird and
might cause incorrectness in terms of RX packets sizes.


> This patch is a bit strange, since we do :
> 
> u32 sz_align = ALIGN(frag_size, SMP_CACHE_BYTES);
> 
> frags->page_offset += sz_align;
> release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> 
> So the @release condition is really to have enough space from frags-
> >page_offset
> to hold up to frag_info->frag_size bytes.
> 
> (NIC wont push more than frag_info->frag_size bytes, regardless of
> how big is out frag_stride )
> 

yes, but if frag_size > mtu, we might have a problem.

> Your patch would prevent a page being reused if say the amount of
> available bytes is 1536,
> but frag_stride = 2048
> 

Interestingly for this exact frag_stride we don't have an issue :)
since it goes through a different condition branch
(the page flipping thing):

if (frag_info->frag_stride == PAGE_SIZE / 2) {
    frags->page_offset ^= PAGE_SIZE / 2;
	release = page_count(page) != 1 ||
		  page_is_pfmemalloc(page) ||
		  page_to_nid(page) != numa_mem_id();

but yes there is a small price to pay for when 1536 < mtu < 2k

but for other cases there is no change in behavior.

overall i am not against your suggestion i am just laying off the
problems with the two suggestions.

> 
> I would suggest a patch like the following instead.
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index
> 5edd0cf2999cbde37b3404aafd700584696af940..83d6b17b102bc9a22bfd8e68863
> d079f38670501 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -1082,7 +1082,7 @@ void mlx4_en_calc_rx_buf(struct net_device
> *dev)
>          * This only works when num_frags == 1.
>          */
>         if (priv->tx_ring_num[TX_XDP]) {
> -               priv->frag_info[0].frag_size = eff_mtu;
> +               priv->frag_info[0].frag_size = PAGE_SIZE;
>                 /* This will gain efficient xdp frame recycling at
> the
>                  * expense of more costly truesize accounting
>                  */
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14 18:56   ` Saeed Mahameed
@ 2018-06-14 19:12     ` Eric Dumazet
  2018-06-14 20:47       ` Saeed Mahameed
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2018-06-14 19:12 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet



On 06/14/2018 11:56 AM, Saeed Mahameed wrote:

> Interestingly for this exact frag_stride we don't have an issue :)
> since it goes through a different condition branch
> (the page flipping thing):
> 
> if (frag_info->frag_stride == PAGE_SIZE / 2) {
>     frags->page_offset ^= PAGE_SIZE / 2;
> 	release = page_count(page) != 1 ||
> 		  page_is_pfmemalloc(page) ||
> 		  page_to_nid(page) != numa_mem_id();
> 

I guess you forgot to test on PowerPC where PAGE_SIZE=65536 ?

On PowerPC, the first branch is never taken.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14 19:12     ` Eric Dumazet
@ 2018-06-14 20:47       ` Saeed Mahameed
  2018-06-14 20:53         ` Saeed Mahameed
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-14 20:47 UTC (permalink / raw)
  To: eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet

On Thu, 2018-06-14 at 12:12 -0700, Eric Dumazet wrote:
> 
> On 06/14/2018 11:56 AM, Saeed Mahameed wrote:
> 
> > Interestingly for this exact frag_stride we don't have an issue :)
> > since it goes through a different condition branch
> > (the page flipping thing):
> > 
> > if (frag_info->frag_stride == PAGE_SIZE / 2) {
> >     frags->page_offset ^= PAGE_SIZE / 2;
> > 	release = page_count(page) != 1 ||
> > 		  page_is_pfmemalloc(page) ||
> > 		  page_to_nid(page) != numa_mem_id();
> > 
> 
> I guess you forgot to test on PowerPC where PAGE_SIZE=65536 ?
> 
> On PowerPC, the first branch is never taken.
> 

This code is already in the driver, i guess as optimization for when a
page can be reused only twice (PAGE_SIZE=4K, strides_size = 2k).
frag_stride is never more than 2k.

for power PC we should always take the other branch which will reuse as
much 2k strides as possible in a page regardless of PAGE_SIZE.

so are we good with my change ?



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14 20:47       ` Saeed Mahameed
@ 2018-06-14 20:53         ` Saeed Mahameed
  2018-06-14 21:04           ` Saeed Mahameed
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-14 20:53 UTC (permalink / raw)
  To: eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet

On Thu, 2018-06-14 at 13:47 -0700, saeedm@mellanox.com wrote:
> On Thu, 2018-06-14 at 12:12 -0700, Eric Dumazet wrote:
> > 
> > On 06/14/2018 11:56 AM, Saeed Mahameed wrote:
> > 
> > > Interestingly for this exact frag_stride we don't have an issue
> > > :)
> > > since it goes through a different condition branch
> > > (the page flipping thing):
> > > 
> > > if (frag_info->frag_stride == PAGE_SIZE / 2) {
> > >     frags->page_offset ^= PAGE_SIZE / 2;
> > > 	release = page_count(page) != 1 ||
> > > 		  page_is_pfmemalloc(page) ||
> > > 		  page_to_nid(page) != numa_mem_id();
> > > 
> > 
> > I guess you forgot to test on PowerPC where PAGE_SIZE=65536 ?
> > 
> > On PowerPC, the first branch is never taken.
> > 

Oh, sorry i see where you are going with this, you mean for xdp in the
other branch we need to take care of PAGE_SIZE > 4k .. 

You are totally right, good catch! I will have to figure this out :) ..

> 
> This code is already in the driver, i guess as optimization for when
> a
> page can be reused only twice (PAGE_SIZE=4K, strides_size = 2k).
> frag_stride is never more than 2k.
> 
> for power PC we should always take the other branch which will reuse
> as
> much 2k strides as possible in a page regardless of PAGE_SIZE.
> 
> so are we good with my change ?
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14 20:53         ` Saeed Mahameed
@ 2018-06-14 21:04           ` Saeed Mahameed
  2018-06-14 23:49             ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-14 21:04 UTC (permalink / raw)
  To: eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet

On Thu, 2018-06-14 at 20:53 +0000, Saeed Mahameed wrote:
> On Thu, 2018-06-14 at 13:47 -0700, saeedm@mellanox.com wrote:
> > On Thu, 2018-06-14 at 12:12 -0700, Eric Dumazet wrote:
> > > 
> > > On 06/14/2018 11:56 AM, Saeed Mahameed wrote:
> > > 
> > > > Interestingly for this exact frag_stride we don't have an issue
> > > > :)
> > > > since it goes through a different condition branch
> > > > (the page flipping thing):
> > > > 
> > > > if (frag_info->frag_stride == PAGE_SIZE / 2) {
> > > >     frags->page_offset ^= PAGE_SIZE / 2;
> > > > 	release = page_count(page) != 1 ||
> > > > 		  page_is_pfmemalloc(page) ||
> > > > 		  page_to_nid(page) != numa_mem_id();
> > > > 
> > > 
> > > I guess you forgot to test on PowerPC where PAGE_SIZE=65536 ?
> > > 
> > > On PowerPC, the first branch is never taken.
> > > 
> 
> Oh, sorry i see where you are going with this, you mean for xdp in
> the
> other branch we need to take care of PAGE_SIZE > 4k .. 
> 

I was looking at the code without my fix :)

with my fix:
release = frags->page_offset + frag_info->frag_stride > PAGE_SIZE;

for XDP: frag_info->frag_stride is PAGE_SIZE, so release will always be
true regardless of PAGE_SIZE.

So i guess i didn't quite understand your PowerPC concern.. can you
elaborate ?

> You are totally right, good catch! I will have to figure this out :)
> ..
> 
> > 
> > This code is already in the driver, i guess as optimization for
> > when
> > a
> > page can be reused only twice (PAGE_SIZE=4K, strides_size = 2k).
> > frag_stride is never more than 2k.
> > 
> > for power PC we should always take the other branch which will
> > reuse
> > as
> > much 2k strides as possible in a page regardless of PAGE_SIZE.
> > 
> > so are we good with my change ?
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14 21:04           ` Saeed Mahameed
@ 2018-06-14 23:49             ` Eric Dumazet
  2018-06-19 18:05               ` Saeed Mahameed
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2018-06-14 23:49 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet



On 06/14/2018 02:04 PM, Saeed Mahameed wrote:

> I was looking at the code without my fix :)
> 
> with my fix:
> release = frags->page_offset + frag_info->frag_stride > PAGE_SIZE;
> 
> for XDP: frag_info->frag_stride is PAGE_SIZE, so release will always be
> true regardless of PAGE_SIZE.
> 
> So i guess i didn't quite understand your PowerPC concern.. can you
> elaborate ?
> 

So your maths with PAGE_SIZE=65536 and MTU 9000

frag_stride is about 9344

So if the last chunk of the page has 9100 bytes, we wont be able to use it, while really we should be able to use it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-14 23:49             ` Eric Dumazet
@ 2018-06-19 18:05               ` Saeed Mahameed
  2018-06-20  0:25                 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-19 18:05 UTC (permalink / raw)
  To: eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet

On Thu, 2018-06-14 at 16:49 -0700, Eric Dumazet wrote:
> 
> On 06/14/2018 02:04 PM, Saeed Mahameed wrote:
> 
> > I was looking at the code without my fix :)
> > 
> > with my fix:
> > release = frags->page_offset + frag_info->frag_stride > PAGE_SIZE;
> > 
> > for XDP: frag_info->frag_stride is PAGE_SIZE, so release will
> > always be
> > true regardless of PAGE_SIZE.
> > 
> > So i guess i didn't quite understand your PowerPC concern.. can you
> > elaborate ?
> > 
> 
> So your maths with PAGE_SIZE=65536 and MTU 9000
> 
> frag_stride is about 9344
> 
> So if the last chunk of the page has 9100 bytes, we wont be able to
> use it, while really we should be able to use it.
> 
> 

this is only true for XDP setup, for non XDP max stride_size can only
be around ~3k and only for mtu > ~6k

For XDP setup you suggested:
-               priv->frag_info[0].frag_size = eff_mtu;
+               priv->frag_info[0].frag_size = PAGE_SIZE;

currently the condition is:

release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;

so my solution and yours have the same problem you described above.

the problem is not with the initial values or with stride/farg size
math, it just that in XDP we shouldn't reuse at ALL. I agree with you
that we need to optimize and maybe for PAGE_SIZE > 8k we need to allow
XDP setup to reuses. but for now there is a data corruption to handle.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-19 18:05               ` Saeed Mahameed
@ 2018-06-20  0:25                 ` Eric Dumazet
  2018-06-20 23:41                   ` Saeed Mahameed
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2018-06-20  0:25 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet



On 06/19/2018 11:05 AM, Saeed Mahameed wrote:

> this is only true for XDP setup, for non XDP max stride_size can only
> be around ~3k and only for mtu > ~6k
> 
> For XDP setup you suggested:
> -               priv->frag_info[0].frag_size = eff_mtu;
> +               priv->frag_info[0].frag_size = PAGE_SIZE;
> 
> currently the condition is:
> 
> release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> 
> so my solution and yours have the same problem you described above.
> 
> the problem is not with the initial values or with stride/farg size
> math, it just that in XDP we shouldn't reuse at ALL. I agree with you
> that we need to optimize and maybe for PAGE_SIZE > 8k we need to allow
> XDP setup to reuses. but for now there is a data corruption to handle.


Sure, we all agree there is a bug to fix.

The way you are fixing it is kind of illogical.

The NIC can use a frag if its _size_ is big enough to receive the frame.

The _stride_  is an abstraction created by the driver to report an estimation of the _truesize_,
or memory consumption, so that linux can better track overall memory usage.

For example, if MTU=1500, the size of the fragment is 1536 bytes, but since we can put only
2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048 bytes.

Declaring that a final blob of a page, being 1600 bytes, not able to receive a frame because
_stride_ is 2048 is illogical and waste resources.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-20  0:25                 ` Eric Dumazet
@ 2018-06-20 23:41                   ` Saeed Mahameed
  2018-06-21  0:28                     ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Saeed Mahameed @ 2018-06-20 23:41 UTC (permalink / raw)
  To: eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet

On Tue, 2018-06-19 at 17:25 -0700, Eric Dumazet wrote:
> 
> On 06/19/2018 11:05 AM, Saeed Mahameed wrote:
> 
> > this is only true for XDP setup, for non XDP max stride_size can
> > only
> > be around ~3k and only for mtu > ~6k
> > 
> > For XDP setup you suggested:
> > -               priv->frag_info[0].frag_size = eff_mtu;
> > +               priv->frag_info[0].frag_size = PAGE_SIZE;
> > 
> > currently the condition is:
> > 
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> > 
> > so my solution and yours have the same problem you described above.
> > 
> > the problem is not with the initial values or with stride/farg size
> > math, it just that in XDP we shouldn't reuse at ALL. I agree with
> > you
> > that we need to optimize and maybe for PAGE_SIZE > 8k we need to
> > allow
> > XDP setup to reuses. but for now there is a data corruption to
> > handle.
> 
> 
> Sure, we all agree there is a bug to fix.
> 
> The way you are fixing it is kind of illogical.
> 
> The NIC can use a frag if its _size_ is big enough to receive the
> frame.
> 
> The _stride_  is an abstraction created by the driver to report an
> estimation of the _truesize_,
> or memory consumption, so that linux can better track overall memory
> usage.
> 
> For example, if MTU=1500, the size of the fragment is 1536 bytes, but
> since we can put only
> 2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048
> bytes.
> 
> Declaring that a final blob of a page, being 1600 bytes, not able to
> receive a frame because
> _stride_ is 2048 is illogical and waste resources.
> 
> 

I see, I wanted to use _stride_ as grantee for how much a page frag can
grow, for example in mlx5 we need the whole stride to build_skb  around
the frag, since we always need the trailer, but it is different in here
and we can avoid resource waste.

so how a bout this: (As suggested by Martin).
currently as mlx4_en_complete_rx_desc assumes that priv->rx_headroom
is always 0 in non-XDP setup, hence:

frags->page_offset += sz_align;

where it really should be:
frags->page_offset += sz_align + priv->rx_headroom;

we can use it as a hint to not reuse as below:
what do you think ?


diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
 {
        const struct mlx4_en_frag_info *frag_info = priv->frag_info;
        unsigned int truesize = 0;
+       bool release = true;
        int nr, frag_size;
        struct page *page;
        dma_addr_t dma;
-       bool release;
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c

        /* Collect used fragments while replacing them in the HW
descriptors */
        for (nr = 0;; frags++) {
@@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
                        release = page_count(page) != 1 ||
                                  page_is_pfmemalloc(page) ||
                                  page_to_nid(page) != numa_mem_id();
-               } else {
+               } elseif(!priv->rx_headroom) {
                        u32 sz_align = ALIGN(frag_size,
SMP_CACHE_BYTES);

                        frags->page_offset += sz_align;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
  2018-06-20 23:41                   ` Saeed Mahameed
@ 2018-06-21  0:28                     ` Eric Dumazet
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2018-06-21  0:28 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet, kafai, Tariq Toukan; +Cc: netdev, edumazet



On 06/20/2018 04:41 PM, Saeed Mahameed wrote:
> 
> I see, I wanted to use _stride_ as grantee for how much a page frag can
> grow, for example in mlx5 we need the whole stride to build_skb  around
> the frag, since we always need the trailer, but it is different in here
> and we can avoid resource waste.
> 
> so how a bout this: (As suggested by Martin).
> currently as mlx4_en_complete_rx_desc assumes that priv->rx_headroom
> is always 0 in non-XDP setup, hence:
> 
> frags->page_offset += sz_align;
> 
> where it really should be:
> frags->page_offset += sz_align + priv->rx_headroom;
> 
> we can use it as a hint to not reuse as below:
> what do you think ?
> 
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9f54ccbddea7..f14c7a574cc8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
>  {
>         const struct mlx4_en_frag_info *frag_info = priv->frag_info;
>         unsigned int truesize = 0;
> +       bool release = true;
>         int nr, frag_size;
>         struct page *page;
>         dma_addr_t dma;
> -       bool release;
> index 9f54ccbddea7..f14c7a574cc8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> 
>         /* Collect used fragments while replacing them in the HW
> descriptors */
>         for (nr = 0;; frags++) {
> @@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
>                         release = page_count(page) != 1 ||
>                                   page_is_pfmemalloc(page) ||
>                                   page_to_nid(page) != numa_mem_id();
> -               } else {
> +               } elseif(!priv->rx_headroom) {
>                         u32 sz_align = ALIGN(frag_size,
> SMP_CACHE_BYTES);
> 
>                         frags->page_offset += sz_align;
> 

I guess that would work, please double check priv->rx_headroom wont need another cache line,
thanks !

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-06-21  0:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-14  0:53 [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition Saeed Mahameed
2018-06-14  2:30 ` Eric Dumazet
2018-06-14 18:56   ` Saeed Mahameed
2018-06-14 19:12     ` Eric Dumazet
2018-06-14 20:47       ` Saeed Mahameed
2018-06-14 20:53         ` Saeed Mahameed
2018-06-14 21:04           ` Saeed Mahameed
2018-06-14 23:49             ` Eric Dumazet
2018-06-19 18:05               ` Saeed Mahameed
2018-06-20  0:25                 ` Eric Dumazet
2018-06-20 23:41                   ` Saeed Mahameed
2018-06-21  0:28                     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.