linux-mm.kvack.org archive mirror
* Re: [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx
       [not found] <1484712850.13165.86.camel@edumazet-glaptop3.roam.corp.google.com>
@ 2017-01-18  9:31 ` Konstantin Khlebnikov
  2017-01-18 14:23   ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-18  9:31 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Tariq Toukan, linux-mm

On 18.01.2017 07:14, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX
> ring with __GFP_NOMEMALLOC") added code that appears to be not needed at
> that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.
>
> As using memory reserves is a must in some situations (swap over NFS or
> iSCSI), this patch adds this flag.

AFAIK __GFP_MEMALLOC is meant for TX, not for RX: for allocations that the memory reclaimer needs in order to free pages.

Allocating RX buffers with __GFP_MEMALLOC is a straight path to depleting all reserves with a flood from the network.

>
> Note that this driver does not reuse pages (yet) so we do not have to
> add anything else.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index eac527e25ec902c2a586e9952272b9e8e599e2c8..e362f99334d03c0df4d88320977670015870dd9c 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -706,7 +706,8 @@ static bool mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
>  	do {
>  		if (mlx4_en_prepare_rx_desc(priv, ring,
>  					    ring->prod & ring->size_mask,
> -					    GFP_ATOMIC | __GFP_COLD))
> +					    GFP_ATOMIC | __GFP_COLD |
> +					    __GFP_MEMALLOC))
>  			break;
>  		ring->prod++;
>  	} while (--missing);
>
>


-- 
Konstantin


* Re: [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx
  2017-01-18  9:31 ` [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx Konstantin Khlebnikov
@ 2017-01-18 14:23   ` Eric Dumazet
  2017-01-18 15:11     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2017-01-18 14:23 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: David Miller, netdev, Tariq Toukan, linux-mm

On Wed, 2017-01-18 at 12:31 +0300, Konstantin Khlebnikov wrote:
> On 18.01.2017 07:14, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX
> > ring with __GFP_NOMEMALLOC") added code that appears to be not needed at
> > that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.
> >
> > As using memory reserves is a must in some situations (swap over NFS or
> > iSCSI), this patch adds this flag.
> 
> AFAIK __GFP_MEMALLOC is meant for TX, not for RX: for allocations
> that the memory reclaimer needs in order to free pages.
>
> Allocating RX buffers with __GFP_MEMALLOC is a straight path to
> depleting all reserves with a flood from the network.

You are mistaken.

How do you think a TCP flow can make progress sending data if no ACK
packet can go back in RX?

Take a look at sk_filter_trim_cap(), where RX packets received on a
socket which does not have SOCK_MEMALLOC are dropped.

        /*
         * If the skb was allocated from pfmemalloc reserves, only
         * allow SOCK_MEMALLOC sockets to use it as this socket is
         * helping free memory
         */
        if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
                return -ENOMEM;
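
For illustration, a minimal sketch (the wrapper name is made up, not code
from this patch) of how a swap-over-network user flags its transport socket
so it gets that contract, assuming sk_set_memalloc():

#include <net/sock.h>

/* Sketch: opt a storage/transport socket into the memory reserves.
 * sk_set_memalloc() sets SOCK_MEMALLOC and adds __GFP_MEMALLOC to
 * sk->sk_allocation, so pfmemalloc skbs pass the check above instead
 * of being dropped.
 */
static void example_mark_swap_socket(struct sock *sk)
{
        sk_set_memalloc(sk);    /* undone with sk_clear_memalloc() on teardown */
}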

Also take a look at __dev_alloc_pages()

static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
                                             unsigned int order)
{
        /* This piece of code contains several assumptions.
         * 1.  This is for device Rx, therefor a cold page is preferred.
         * 2.  The expectation is the user wants a compound page.
         * 3.  If requesting a order 0 page it will not be compound
         *     due to the check to see if order has a value in prep_new_page
         * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
         *     code in gfp_to_alloc_flags that should be enforcing this.
         */
        gfp_mask |= __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC;

        return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
}


So __GFP_MEMALLOC in RX is absolutely supported.

But drivers have to opt in, either by using __dev_alloc_pages() or
dev_alloc_pages(), or by explicitly ORing in __GFP_MEMALLOC when calling
alloc_page[s]().
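
As a minimal sketch of those two opt-in styles (the function names below
are illustrative, not mlx4 code):

#include <linux/gfp.h>
#include <linux/skbuff.h>

/* Style 1: dev_alloc_pages() opts in for you; __dev_alloc_pages()
 * ORs in __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC as shown above.
 */
static struct page *rx_refill_page(unsigned int order)
{
        return dev_alloc_pages(order);
}

/* Style 2: open-coded allocation with the flag added by hand, which is
 * what the mlx4 patch at the top of the thread does.
 */
static struct page *rx_refill_page_explicit(unsigned int order)
{
        return alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_MEMALLOC, order);
}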




* Re: [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx
  2017-01-18 14:23   ` Eric Dumazet
@ 2017-01-18 15:11     ` Konstantin Khlebnikov
  2017-01-18 15:24       ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-18 15:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Tariq Toukan, linux-mm

On 18.01.2017 17:23, Eric Dumazet wrote:
> On Wed, 2017-01-18 at 12:31 +0300, Konstantin Khlebnikov wrote:
>> On 18.01.2017 07:14, Eric Dumazet wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>>
>>> Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX
>>> ring with __GFP_NOMEMALLOC") added code that appears to be not needed at
>>> that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.
>>>
>>> As using memory reserves is a must in some situations (swap over NFS or
>>> iSCSI), this patch adds this flag.
>>
>> AFAIK __GFP_MEMALLOC is meant for TX, not for RX: for allocations
>> that the memory reclaimer needs in order to free pages.
>>
>> Allocating RX buffers with __GFP_MEMALLOC is a straight path to
>> depleting all reserves with a flood from the network.
>
> You are mistaken.
>
> How do you think a TCP flow can make progress sending data if no ACK
> packet can go back in RX?

Well, OK. I was mistaken.

>
> Take a look at sk_filter_trim_cap(), where RX packets received on a
> socket which does not have SOCK_MEMALLOC are dropped.
>
>         /*
>          * If the skb was allocated from pfmemalloc reserves, only
>          * allow SOCK_MEMALLOC sockets to use it as this socket is
>          * helping free memory
>          */
>         if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
>                 return -ENOMEM;

I suppose this happens in BH context right after receiving a packet?

Potentially any ACK could free memory in the TCP send queue,
so using reserves here makes sense.
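
A heavily simplified sketch of that path (a toy version of what
tcp_clean_rtx_queue() does; the name toy_clean_rtx_queue is made up):

#include <net/tcp.h>

/* Toy sketch: once snd_una advances, every fully-acked skb is unlinked
 * from the send queue and its truesize is released, so an ACK carried
 * by a pfmemalloc RX buffer can directly free sender memory.
 */
static void toy_clean_rtx_queue(struct sock *sk, u32 snd_una)
{
        struct sk_buff *skb;

        while ((skb = tcp_write_queue_head(sk)) != NULL) {
                if (after(TCP_SKB_CB(skb)->end_seq, snd_una))
                        break;          /* not fully acknowledged yet */
                tcp_unlink_write_queue(skb, sk);
                sk_wmem_free_skb(sk, skb);
        }
}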

>
> Also take a look at __dev_alloc_pages()
>
> static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
>                                              unsigned int order)
> {
>         /* This piece of code contains several assumptions.
>          * 1.  This is for device Rx, therefor a cold page is preferred.
>          * 2.  The expectation is the user wants a compound page.
>          * 3.  If requesting a order 0 page it will not be compound
>          *     due to the check to see if order has a value in prep_new_page
>          * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
>          *     code in gfp_to_alloc_flags that should be enforcing this.
>          */
>         gfp_mask |= __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC;
>
>         return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
> }
>
>
> So __GFP_MEMALLOC in RX is absolutely supported.
>
> But drivers have to opt in, either by using __dev_alloc_pages() or
> dev_alloc_pages(), or by explicitly ORing in __GFP_MEMALLOC when calling
> alloc_page[s]().
>
>
>


-- 
Konstantin


* Re: [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx
  2017-01-18 15:11     ` Konstantin Khlebnikov
@ 2017-01-18 15:24       ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2017-01-18 15:24 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: David Miller, netdev, Tariq Toukan, linux-mm

On Wed, 2017-01-18 at 18:11 +0300, Konstantin Khlebnikov wrote:
> On 18.01.2017 17:23, Eric Dumazet wrote:

> >
> > Take a look at sk_filter_trim_cap(), where RX packets received on a
> > socket which does not have SOCK_MEMALLOC are dropped.
> >
> >         /*
> >          * If the skb was allocated from pfmemalloc reserves, only
> >          * allow SOCK_MEMALLOC sockets to use it as this socket is
> >          * helping free memory
> >          */
> >         if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
> >                 return -ENOMEM;
> 
> I suppose this happens in BH context right after receiving a packet?
> 
> Potentially any ACK could free memory in the TCP send queue,
> so using reserves here makes sense.

Yes, but only sockets with SOCK_MEMALLOC have this contract with the mm
layer.

For 'other' sockets, one possible trick: if only the page fragment
attached to the skb carried the pfmemalloc bit, and not the sk_buff
itself, we could attempt a skb_condense() operation [1], but it is not
really easy to properly recompute skb->pfmemalloc.

Pure TCP ACK packets can usually be trimmed by skb_condense().
Since they have no payload, we have a guarantee they won't sit in a queue
and hold memory.
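
A sketch of the idea only, hand-waving the hard recompute step (the helper
name maybe_rescue_pfmemalloc_skb is made up):

#include <linux/skbuff.h>
#include <net/sock.h>

/* Sketch: try to condense a small pfmemalloc skb so it stops pinning a
 * reserved page fragment, instead of dropping it at sk_filter_trim_cap().
 * Clearing skb->pfmemalloc is only valid if nothing left in the skb came
 * from the reserves, which is exactly the part that is hard to recompute
 * and is hand-waved here.
 */
static int maybe_rescue_pfmemalloc_skb(struct sock *sk, struct sk_buff *skb)
{
        if (!skb_pfmemalloc(skb) || sock_flag(sk, SOCK_MEMALLOC))
                return 0;               /* nothing to rescue */

        skb_condense(skb);              /* may pull a tiny payload into the
                                         * linear area and drop the frag */
        if (skb->data_len)
                return -ENOMEM;         /* still references a page frag */

        skb->pfmemalloc = 0;            /* unsafe in general, see above */
        return 0;
}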


