netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/4] net/skbuff: don't waste memory reserves
@ 2019-04-18 18:05 Andrey Ryabinin
  2019-04-18 18:05 ` [PATCH 2/4] net/skbuff: warn if kmalloc_reserve() fails to allocate memory Andrey Ryabinin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrey Ryabinin @ 2019-04-18 18:05 UTC (permalink / raw)
  To: David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev, Andrey Ryabinin

In some workloads we have noticed packets being dropped by
sk_filter_trim_cap() because the 'skb' was allocated from pfmemalloc
reserves:

	/*
	 * If the skb was allocated from pfmemalloc reserves, only
	 * allow SOCK_MEMALLOC sockets to use it as this socket is
	 * helping free memory
	 */
	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_PFMEMALLOCDROP);
		return -ENOMEM;
	}

Memalloc sockets are used for stuff like swap over NBD or NFS
and only memalloc sockets can process memalloc skbs. Since we
don't have any memalloc sockets in our setups we shouldn't have
memalloc skbs either. It simply doesn't make any sense to waste
memory reserves on skb which will be dropped anyway.

It appears that __dev_alloc_pages() unconditionally uses
__GFP_MEMALLOC, so unless caller added __GFP_NOMEMALLOC, the
__dev_alloc_pages() may dive into memory reserves.
Later build_skb() or __skb_fill_page_desc() sets skb->pfmemalloc = 1
so this skb always dropped by sk_filter_trim_cap().

Instead of wasting memory reserves we simply shouldn't use them in the
case of absence memalloc sockets in the system. Do this by adding
the  __GFP_MEMALLOC only when such socket is present in the system.

Fixes: 0614002bb5f7 ("netvm: propagate page->pfmemalloc from skb_alloc_page to skb")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 include/linux/skbuff.h | 17 ++++++++++++++++-
 include/net/sock.h     | 15 ---------------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a06275a618f0..676e54f84de4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2784,6 +2784,19 @@ void napi_consume_skb(struct sk_buff *skb, int budget);
 void __kfree_skb_flush(void);
 void __kfree_skb_defer(struct sk_buff *skb);
 
+#ifdef CONFIG_NET
+DECLARE_STATIC_KEY_FALSE(memalloc_socks_key);
+static inline int sk_memalloc_socks(void)
+{
+	return static_branch_unlikely(&memalloc_socks_key);
+}
+#else
+static inline int sk_memalloc_socks(void)
+{
+	return 0;
+}
+#endif
+
 /**
  * __dev_alloc_pages - allocate page for network Rx
  * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
@@ -2804,7 +2817,9 @@ static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
 	 * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
 	 *     code in gfp_to_alloc_flags that should be enforcing this.
 	 */
-	gfp_mask |= __GFP_COMP | __GFP_MEMALLOC;
+	gfp_mask |=  __GFP_COMP;
+	if (sk_memalloc_socks())
+		gfp_mask |= __GFP_MEMALLOC;
 
 	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
 }
diff --git a/include/net/sock.h b/include/net/sock.h
index bdd77bbce7d8..5b2138d47bd8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -838,21 +838,6 @@ static inline bool sock_flag(const struct sock *sk, enum sock_flags flag)
 	return test_bit(flag, &sk->sk_flags);
 }
 
-#ifdef CONFIG_NET
-DECLARE_STATIC_KEY_FALSE(memalloc_socks_key);
-static inline int sk_memalloc_socks(void)
-{
-	return static_branch_unlikely(&memalloc_socks_key);
-}
-#else
-
-static inline int sk_memalloc_socks(void)
-{
-	return 0;
-}
-
-#endif
-
 static inline gfp_t sk_gfp_mask(const struct sock *sk, gfp_t gfp_mask)
 {
 	return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] net/skbuff: warn if kmalloc_reserve() fails to allocate memory.
  2019-04-18 18:05 [PATCH 1/4] net/skbuff: don't waste memory reserves Andrey Ryabinin
@ 2019-04-18 18:05 ` Andrey Ryabinin
  2019-04-18 18:05 ` [PATCH 3/4] net/skbuff: remove unused skb_propagate_pfmemalloc() Andrey Ryabinin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Andrey Ryabinin @ 2019-04-18 18:05 UTC (permalink / raw)
  To: David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev, Andrey Ryabinin

Commit c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
removed memory potential allocation failure messages in the cases when
pfmemalloc reserves are not allowed to be used. Inability to allocate
skb usually indicates some problem, e.g. these ones:

	commit 5d4c9bfbabdb ("tcp: fix potential huge kmalloc() calls in TCP_REPAIR")
	commit 19125c1a4fd8 ("net: use skb_clone to avoid alloc_pages failure.")

It makes sense to bring the warning back to make problems more obvious,
easier to spot. Also neighboring to kmalloc_reserve() allocations often
doesn't add __GFP_NOWARN. For example __alloc_skb():

	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
	...
	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);

It seems unreasonable to allow kmem_cache_alloc_node() to fail loudly but
forbid the same for the kmalloc_reserve().
So, don't add __GFP_NOWARN unless we can use fallback allocation with
__GFP_MEMALLOC.

Also remove __GFP_NOMEMALLOC when usage of memory reserves isn't allowed.
Many allocations on receive path use plain GFP_ATOMIC without adding
__GFP_NOMEMALLOC (again, see __alloc_skb()). Such allocations have greater chances
to succeed because plain GFP_ATOMIC can use 1/4 of memory reserves, while
GFP_ATOMIC|__GFP_NOMEMALLOC can't. So it's seems more reasonable to add
__GFP_NOMEMALLOC only if we have fallback to memory reserves.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 net/core/skbuff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a083e188374f..97557554e90e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -133,16 +133,19 @@ static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
 			       unsigned long ip, bool *pfmemalloc)
 {
 	void *obj;
+	gfp_t gfp_mask = flags;
 	bool ret_pfmemalloc = false;
+	bool pfmemalloc_allowed = gfp_pfmemalloc_allowed(flags);
 
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
 	 * to the reserves, fail.
 	 */
-	obj = kmalloc_node_track_caller(size,
-					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
-					node);
-	if (obj || !(gfp_pfmemalloc_allowed(flags)))
+	if (pfmemalloc_allowed)
+		gfp_mask |= __GFP_NOMEMALLOC | __GFP_NOWARN;
+
+	obj = kmalloc_node_track_caller(size, gfp_mask, node);
+	if (obj || !pfmemalloc_allowed)
 		goto out;
 
 	/* Try again but now we are using pfmemalloc reserves */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] net/skbuff: remove unused skb_propagate_pfmemalloc()
  2019-04-18 18:05 [PATCH 1/4] net/skbuff: don't waste memory reserves Andrey Ryabinin
  2019-04-18 18:05 ` [PATCH 2/4] net/skbuff: warn if kmalloc_reserve() fails to allocate memory Andrey Ryabinin
@ 2019-04-18 18:05 ` Andrey Ryabinin
  2019-04-18 18:05 ` [PATCH 4/4] net/skbuff: kmalloc_reserve(): remove unused argument Andrey Ryabinin
  2019-04-18 18:55 ` [PATCH 1/4] net/skbuff: don't waste memory reserves Eric Dumazet
  3 siblings, 0 replies; 11+ messages in thread
From: Andrey Ryabinin @ 2019-04-18 18:05 UTC (permalink / raw)
  To: David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev, Andrey Ryabinin

skb_propagate_pfmemalloc() never has been used, remove it.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 include/linux/skbuff.h | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 676e54f84de4..bda9d0865a25 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2847,18 +2847,6 @@ static inline struct page *dev_alloc_page(void)
 	return dev_alloc_pages(0);
 }
 
-/**
- *	skb_propagate_pfmemalloc - Propagate pfmemalloc if skb is allocated after RX page
- *	@page: The page that was allocated from skb_alloc_page
- *	@skb: The skb that may need pfmemalloc set
- */
-static inline void skb_propagate_pfmemalloc(struct page *page,
-					     struct sk_buff *skb)
-{
-	if (page_is_pfmemalloc(page))
-		skb->pfmemalloc = true;
-}
-
 /**
  * skb_frag_page - retrieve the page referred to by a paged fragment
  * @frag: the paged fragment
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] net/skbuff: kmalloc_reserve(): remove unused argument
  2019-04-18 18:05 [PATCH 1/4] net/skbuff: don't waste memory reserves Andrey Ryabinin
  2019-04-18 18:05 ` [PATCH 2/4] net/skbuff: warn if kmalloc_reserve() fails to allocate memory Andrey Ryabinin
  2019-04-18 18:05 ` [PATCH 3/4] net/skbuff: remove unused skb_propagate_pfmemalloc() Andrey Ryabinin
@ 2019-04-18 18:05 ` Andrey Ryabinin
  2019-04-18 18:55 ` [PATCH 1/4] net/skbuff: don't waste memory reserves Eric Dumazet
  3 siblings, 0 replies; 11+ messages in thread
From: Andrey Ryabinin @ 2019-04-18 18:05 UTC (permalink / raw)
  To: David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev, Andrey Ryabinin

Argument 'ip' always has been unused. Remove it.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 net/core/skbuff.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 97557554e90e..5cd067b44371 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -126,11 +126,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
  * may be used. Otherwise, the packet data may be discarded until enough
  * memory is free
  */
-#define kmalloc_reserve(size, gfp, node, pfmemalloc) \
-	 __kmalloc_reserve(size, gfp, node, _RET_IP_, pfmemalloc)
-
-static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
-			       unsigned long ip, bool *pfmemalloc)
+static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+			     bool *pfmemalloc)
 {
 	void *obj;
 	gfp_t gfp_mask = flags;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-18 18:05 [PATCH 1/4] net/skbuff: don't waste memory reserves Andrey Ryabinin
                   ` (2 preceding siblings ...)
  2019-04-18 18:05 ` [PATCH 4/4] net/skbuff: kmalloc_reserve(): remove unused argument Andrey Ryabinin
@ 2019-04-18 18:55 ` Eric Dumazet
  2019-04-18 18:56   ` David Miller
  2019-04-19 13:17   ` Andrey Ryabinin
  3 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-04-18 18:55 UTC (permalink / raw)
  To: Andrey Ryabinin, David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev



On 04/18/2019 11:05 AM, Andrey Ryabinin wrote:
> In some workloads we have noticed packets being dropped by
> sk_filter_trim_cap() because the 'skb' was allocated from pfmemalloc
> reserves:
> 
> 	/*
> 	 * If the skb was allocated from pfmemalloc reserves, only
> 	 * allow SOCK_MEMALLOC sockets to use it as this socket is
> 	 * helping free memory
> 	 */
> 	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) {
> 		NET_INC_STATS(sock_net(sk), LINUX_MIB_PFMEMALLOCDROP);
> 		return -ENOMEM;
> 	}
> 
> Memalloc sockets are used for stuff like swap over NBD or NFS
> and only memalloc sockets can process memalloc skbs. Since we
> don't have any memalloc sockets in our setups we shouldn't have
> memalloc skbs either. It simply doesn't make any sense to waste
> memory reserves on skb which will be dropped anyway.
> 
> It appears that __dev_alloc_pages() unconditionally uses
> __GFP_MEMALLOC, so unless caller added __GFP_NOMEMALLOC, the
> __dev_alloc_pages() may dive into memory reserves.
> Later build_skb() or __skb_fill_page_desc() sets skb->pfmemalloc = 1
> so this skb always dropped by sk_filter_trim_cap().
> 
> Instead of wasting memory reserves we simply shouldn't use them in the
> case of absence memalloc sockets in the system. Do this by adding
> the  __GFP_MEMALLOC only when such socket is present in the system.
> 
> Fixes: 0614002bb5f7 ("netvm: propagate page->pfmemalloc from skb_alloc_page to skb")
> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
> ---
>  include/linux/skbuff.h | 17 ++++++++++++++++-
>  include/net/sock.h     | 15 ---------------
>  2 files changed, 16 insertions(+), 16 deletions(-)
>

Hi Andrey

Are you targeting net or net-next tree ?

AFAIK, drivers allocate skbs way before a frame is actually received,
(at RX ring buffer initialization or refill)

So sk_memalloc_socks() might be false at that time, but true later.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-18 18:55 ` [PATCH 1/4] net/skbuff: don't waste memory reserves Eric Dumazet
@ 2019-04-18 18:56   ` David Miller
  2019-04-19 13:17   ` Andrey Ryabinin
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2019-04-18 18:56 UTC (permalink / raw)
  To: eric.dumazet
  Cc: aryabinin, edumazet, mgorman, willemb, fw, linux-kernel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 18 Apr 2019 11:55:16 -0700

> AFAIK, drivers allocate skbs way before a frame is actually received,
> (at RX ring buffer initialization or refill)
> 
> So sk_memalloc_socks() might be false at that time, but true later.

This is exactly what I was going to say too.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-18 18:55 ` [PATCH 1/4] net/skbuff: don't waste memory reserves Eric Dumazet
  2019-04-18 18:56   ` David Miller
@ 2019-04-19 13:17   ` Andrey Ryabinin
  2019-04-19 13:36     ` Eric Dumazet
  2019-04-19 13:41     ` Eric Dumazet
  1 sibling, 2 replies; 11+ messages in thread
From: Andrey Ryabinin @ 2019-04-19 13:17 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev



On 4/18/19 9:55 PM, Eric Dumazet wrote:
> 
> 
> On 04/18/2019 11:05 AM, Andrey Ryabinin wrote:
>> In some workloads we have noticed packets being dropped by
>> sk_filter_trim_cap() because the 'skb' was allocated from pfmemalloc
>> reserves:
>>
>> 	/*
>> 	 * If the skb was allocated from pfmemalloc reserves, only
>> 	 * allow SOCK_MEMALLOC sockets to use it as this socket is
>> 	 * helping free memory
>> 	 */
>> 	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) {
>> 		NET_INC_STATS(sock_net(sk), LINUX_MIB_PFMEMALLOCDROP);
>> 		return -ENOMEM;
>> 	}
>>
>> Memalloc sockets are used for stuff like swap over NBD or NFS
>> and only memalloc sockets can process memalloc skbs. Since we
>> don't have any memalloc sockets in our setups we shouldn't have
>> memalloc skbs either. It simply doesn't make any sense to waste
>> memory reserves on skb which will be dropped anyway.
>>
>> It appears that __dev_alloc_pages() unconditionally uses
>> __GFP_MEMALLOC, so unless caller added __GFP_NOMEMALLOC, the
>> __dev_alloc_pages() may dive into memory reserves.
>> Later build_skb() or __skb_fill_page_desc() sets skb->pfmemalloc = 1
>> so this skb always dropped by sk_filter_trim_cap().
>>
>> Instead of wasting memory reserves we simply shouldn't use them in the
>> case of absence memalloc sockets in the system. Do this by adding
>> the  __GFP_MEMALLOC only when such socket is present in the system.
>>
>> Fixes: 0614002bb5f7 ("netvm: propagate page->pfmemalloc from skb_alloc_page to skb")
>> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
>> ---
>>  include/linux/skbuff.h | 17 ++++++++++++++++-
>>  include/net/sock.h     | 15 ---------------
>>  2 files changed, 16 insertions(+), 16 deletions(-)
>>
> 
> Hi Andrey
> 
> Are you targeting net or net-next tree ?
> 

I think it's up to Dave to decide where the patches should go. They apply cleanly on both trees.
The last two patches just minor cleanups so they are definitely net-next material.

> AFAIK, drivers allocate skbs way before a frame is actually received,
> (at RX ring buffer initialization or refill)
> 
> So sk_memalloc_socks() might be false at that time, but true later.
> 

I don't see why that would be a problem. If refill failed because we didn't have
access to reserves, then there going to be an another refill attempt, right?
And the next refill attempt will be with access to the reserves if memalloc socket was created.
We can't predict the future, so until the memalloc socket appeared we must assume that those
RX ring buffers won't be used to reclaim memory (and that is actually true in 99% of cases).





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-19 13:17   ` Andrey Ryabinin
@ 2019-04-19 13:36     ` Eric Dumazet
  2019-04-19 13:41     ` Eric Dumazet
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-04-19 13:36 UTC (permalink / raw)
  To: Andrey Ryabinin, David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev



On 04/19/2019 06:17 AM, Andrey Ryabinin wrote:
> 
> 
> On 4/18/19 9:55 PM, Eric Dumazet wrote:

>> Are you targeting net or net-next tree ?
>>
> 
> I think it's up to Dave to decide where the patches should go. They apply cleanly on both trees.
> The last two patches just minor cleanups so they are definitely net-next material.
> 

Not at all, please carefully read 

Documentation/networking/netdev-FAQ.rst

Q: How do I indicate which tree (net vs. net-next) my patch should be in?

You need to split your patches in two series, not assume that David will manually do
the manual work for you.

Thank you

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-19 13:17   ` Andrey Ryabinin
  2019-04-19 13:36     ` Eric Dumazet
@ 2019-04-19 13:41     ` Eric Dumazet
  2019-04-19 16:25       ` Andrey Ryabinin
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2019-04-19 13:41 UTC (permalink / raw)
  To: Andrey Ryabinin, David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev



On 04/19/2019 06:17 AM, Andrey Ryabinin wrote:
> 

> I don't see why that would be a problem. If refill failed because we didn't have
> access to reserves, then there going to be an another refill attempt, right?
> And the next refill attempt will be with access to the reserves if memalloc socket was created.
> We can't predict the future, so until the memalloc socket appeared we must assume that those
> RX ring buffers won't be used to reclaim memory (and that is actually true in 99% of cases).
> 

I just said that the alloc might be attempted "in the past"

Yes, we can not predict the future, this is why we need to access the reserve _now_ and not
at the time the packet is received.

The 'being true in 99% of cases' argument is not really convincing.

You want the NIC to be ready to receive packets even before sk_memalloc_socks() becomes true.

If a NIC driver has a bug, please fix it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-19 13:41     ` Eric Dumazet
@ 2019-04-19 16:25       ` Andrey Ryabinin
  2019-04-19 16:27         ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Andrey Ryabinin @ 2019-04-19 16:25 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Eric Dumazet, Mel Gorman, Willem de Bruijn, Florian Westphal,
	linux-kernel, netdev

On 4/19/19 4:41 PM, Eric Dumazet wrote:
> On 04/19/2019 06:17 AM, Andrey Ryabinin wrote:
>>
> 
>> I don't see why that would be a problem. If refill failed because we didn't have
>> access to reserves, then there going to be an another refill attempt, right?
>> And the next refill attempt will be with access to the reserves if memalloc socket was created.
>> We can't predict the future, so until the memalloc socket appeared we must assume that those
>> RX ring buffers won't be used to reclaim memory (and that is actually true in 99% of cases).
>>
> 
> I just said that the alloc might be attempted "in the past"
> 
> Yes, we can not predict the future, this is why we need to access the reserve _now_ and not
> at the time the packet is received.
> 
> The 'being true in 99% of cases' argument is not really convincing.
> 
> You want the NIC to be ready to receive packets even before sk_memalloc_socks() becomes true.

The fact that we allocate pages before the socket created I completely understand.

But why that failed allocation is such a problem?

1. sk_memalloc_socks() false
2. NIC driver tries to allocate pages and fails
3. swapon /nfs/swap_file /* memalloc_socket created */
4. We may be loosing some packets because of 2. But there will be retransmit, I suppose NIC driver will
try to allocate pages again, and this time with access to reserves, so eventually we all good.

So what is the problem? Potential lost of some packets for some period of time after swapon?


> If a NIC driver has a bug, please fix it.
> 

I don't follow, what kind of bug are you talking about?
The problem I'm trying to fix is entirely in __dev_alloc_pages():
1. sk_memalloc_socks() false all the time.
2. NIC takes pages from memory reserves
3. The fact that we occupying memory reserves increases the chances that the next page will be also taken from reserves as well,
which means more dropped packets than we would have without using the reserves.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] net/skbuff: don't waste memory reserves
  2019-04-19 16:25       ` Andrey Ryabinin
@ 2019-04-19 16:27         ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-04-19 16:27 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Eric Dumazet, David S. Miller, Mel Gorman, Willem de Bruijn,
	Florian Westphal, LKML, netdev

On Fri, Apr 19, 2019 at 9:24 AM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
> But why that failed allocation is such a problem?
>
> 1. sk_memalloc_socks() false
> 2. NIC driver tries to allocate pages and fails

The NIC then is unable to receive any frames.

We need to be able to populate the RX ring buffer, before NIC can
actually be started.

Basically you are saying : We need to allocate memory only _after_
frame has been received by the NIC.
I am saying : We need to allocate memory so that the NIC can put a
future frame in it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-04-19 20:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-18 18:05 [PATCH 1/4] net/skbuff: don't waste memory reserves Andrey Ryabinin
2019-04-18 18:05 ` [PATCH 2/4] net/skbuff: warn if kmalloc_reserve() fails to allocate memory Andrey Ryabinin
2019-04-18 18:05 ` [PATCH 3/4] net/skbuff: remove unused skb_propagate_pfmemalloc() Andrey Ryabinin
2019-04-18 18:05 ` [PATCH 4/4] net/skbuff: kmalloc_reserve(): remove unused argument Andrey Ryabinin
2019-04-18 18:55 ` [PATCH 1/4] net/skbuff: don't waste memory reserves Eric Dumazet
2019-04-18 18:56   ` David Miller
2019-04-19 13:17   ` Andrey Ryabinin
2019-04-19 13:36     ` Eric Dumazet
2019-04-19 13:41     ` Eric Dumazet
2019-04-19 16:25       ` Andrey Ryabinin
2019-04-19 16:27         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).