All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: linux-mm@kvack.org, netdev@vger.kernel.org
Cc: willy@infradead.org, aruna.ramakrishna@oracle.com,
	bert.barbe@oracle.com, rama.nichanamatlu@oracle.com,
	venkat.x.venkatsubra@oracle.com, manjunath.b.patil@oracle.com,
	joe.jin@oracle.com, srinivas.eeda@oracle.com,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, davem@davemloft.net,
	edumazet@google.com, vbabka@suse.cz, dongli.zhang@oracle.com
Subject: [PATCH v3 1/1] page_frag: Recover from memory pressure
Date: Sun, 15 Nov 2020 12:10:29 -0800	[thread overview]
Message-ID: <20201115201029.11903-1-dongli.zhang@oracle.com> (raw)

The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
This ends up to page_frag_alloc() to allocate skb->data from
page_frag_cache->va.

During the memory pressure, page_frag_cache->va may be allocated as
pfmemalloc page. As a result, the skb->pfmemalloc is always true as
skb->data is from page_frag_cache->va. The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
under memory pressure.

However, once kernel is not under memory pressure any longer (suppose large
amount of memory pages are just reclaimed), the page_frag_alloc() may still
re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
result, the skb->pfmemalloc is always true unless page_frag_cache->va is
re-allocated, even if the kernel is not under memory pressure any longer.

Here is how kernel runs into issue.

1. The kernel is under memory pressure and allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
the pfmemalloc page is allocated for page_frag_cache->va.

2: All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true. The skb will always be dropped by sock without
SOCK_MEMALLOC. This is an expected behaviour.

3. Suppose a large amount of pages are reclaimed and kernel is not under
memory pressure any longer. We expect skb->pfmemalloc drop will not happen.

4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
skb->pfmemalloc is always true even kernel is not under memory pressure any
longer.

Fix this by freeing and re-allocating the page instead of recycling it.

References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bert Barbe <bert.barbe@oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: SRINIVAS <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
Changed since v1:
  - change author from Matthew to Dongli
  - Add references to all prior discussions
  - Add more details to commit message
Changed since v2:
  - add unlikely (suggested by Eric Dumazet)

 mm/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 23f5066bd4a5..91129ce75ed4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5103,6 +5103,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
 			goto refill;
 
+		if (unlikely(nc->pfmemalloc)) {
+			free_the_page(page, compound_order(page));
+			goto refill;
+		}
+
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 		/* if size can vary use size else just use PAGE_SIZE */
 		size = nc->size;
-- 
2.17.1


             reply	other threads:[~2020-11-15 20:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-15 20:10 Dongli Zhang [this message]
2020-11-16  8:37 ` [PATCH v3 1/1] page_frag: Recover from memory pressure Eric Dumazet
2020-11-16  8:37   ` Eric Dumazet
2020-11-18 19:46 ` Jakub Kicinski
2020-11-18 21:13   ` Andrew Morton
2020-11-18 23:23     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201115201029.11903-1-dongli.zhang@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aruna.ramakrishna@oracle.com \
    --cc=bert.barbe@oracle.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=joe.jin@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=manjunath.b.patil@oracle.com \
    --cc=netdev@vger.kernel.org \
    --cc=rama.nichanamatlu@oracle.com \
    --cc=srinivas.eeda@oracle.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=venkat.x.venkatsubra@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.