linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Jiri Slaby <jslaby@suse.cz>,
	Valdis.Kletnieks@vt.edu, Jiri Slaby <jirislaby@gmail.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>
Subject: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Date: Mon, 12 Nov 2012 11:37:31 +0000	[thread overview]
Message-ID: <20121112113731.GS8218@suse.de> (raw)
In-Reply-To: <509F6C2A.9060502@redhat.com>

With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following

	Hmm,  so it's just took longer to hit the problem and observe
	kswapd0 spinning on my CPU again - it's not as endless like before -
	but still it easily eats minutes - it helps to	turn off  Firefox
	or TB  (memory hungry apps) so kswapd0 stops soon - and restart
	those apps again.  (And I still have like >1GB of cached memory)

	kswapd0         R  running task        0    30      2 0x00000000
	 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
	 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
	 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
	Call Trace:
	 [<ffffffff81555bf2>] preempt_schedule+0x42/0x60
	 [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60
	 [<ffffffff81192971>] put_super+0x31/0x40
	 [<ffffffff81192a42>] drop_super+0x22/0x30
	 [<ffffffff81193b89>] prune_super+0x149/0x1b0
	 [<ffffffff81141e2a>] shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
are a storm of THP requests that are simply rejected, it will still
be the the case that kswapd is awake for a prolonged period of time
as pgdat->kswapd_max_order is updated each time. This is noticed by
the main kswapd() loop and it will not call kswapd_try_to_sleep().
Instead it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.

The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this is
not a bug we should release with, a safer path is to revert "mm: remove
__GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
balance_pgdat() logic in general.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 drivers/mtd/mtdcore.c           |    6 ++++--
 include/linux/gfp.h             |    5 ++++-
 include/trace/events/gfpflags.h |    1 +
 mm/page_alloc.c                 |    7 ++++---
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 374c46d..ec794a7 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -1077,7 +1077,8 @@ EXPORT_SYMBOL_GPL(mtd_writev);
  * until the request succeeds or until the allocation size falls below
  * the system page size. This attempts to make sure it does not adversely
  * impact system performance, so when allocating more than one page, we
- * ask the memory allocator to avoid re-trying.
+ * ask the memory allocator to avoid re-trying, swapping, writing back
+ * or performing I/O.
  *
  * Note, this function also makes sure that the allocated buffer is aligned to
  * the MTD device's min. I/O unit, i.e. the "mtd->writesize" value.
@@ -1091,7 +1092,8 @@ EXPORT_SYMBOL_GPL(mtd_writev);
  */
 void *mtd_kmalloc_up_to(const struct mtd_info *mtd, size_t *size)
 {
-	gfp_t flags = __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY;
+	gfp_t flags = __GFP_NOWARN | __GFP_WAIT |
+		       __GFP_NORETRY | __GFP_NO_KSWAPD;
 	size_t min_alloc = max_t(size_t, mtd->writesize, PAGE_SIZE);
 	void *kbuf;
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 02c1c971..d0a7967 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -31,6 +31,7 @@ struct vm_area_struct;
 #define ___GFP_THISNODE		0x40000u
 #define ___GFP_RECLAIMABLE	0x80000u
 #define ___GFP_NOTRACK		0x200000u
+#define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
 
@@ -85,6 +86,7 @@ struct vm_area_struct;
 #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)  /* Don't track with kmemcheck */
 
+#define __GFP_NO_KSWAPD	((__force gfp_t)___GFP_NO_KSWAPD)
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)	/* Allocator intends to dirty page */
 
@@ -114,7 +116,8 @@ struct vm_area_struct;
 				 __GFP_MOVABLE)
 #define GFP_IOFS	(__GFP_IO | __GFP_FS)
 #define GFP_TRANSHUGE	(GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN)
+			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
+			 __GFP_NO_KSWAPD)
 
 #ifdef CONFIG_NUMA
 #define GFP_THISNODE	(__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h
index 9391706..d6fd8e5 100644
--- a/include/trace/events/gfpflags.h
+++ b/include/trace/events/gfpflags.h
@@ -36,6 +36,7 @@
 	{(unsigned long)__GFP_RECLAIMABLE,	"GFP_RECLAIMABLE"},	\
 	{(unsigned long)__GFP_MOVABLE,		"GFP_MOVABLE"},		\
 	{(unsigned long)__GFP_NOTRACK,		"GFP_NOTRACK"},		\
+	{(unsigned long)__GFP_NO_KSWAPD,	"GFP_NO_KSWAPD"},	\
 	{(unsigned long)__GFP_OTHER_NODE,	"GFP_OTHER_NODE"}	\
 	) : "GFP_NOWAIT"
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..7228260 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2416,8 +2416,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 restart:
-	wake_all_kswapd(order, zonelist, high_zoneidx,
-					zone_idx(preferred_zone));
+	if (!(gfp_mask & __GFP_NO_KSWAPD))
+		wake_all_kswapd(order, zonelist, high_zoneidx,
+						zone_idx(preferred_zone));
 
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
@@ -2494,7 +2495,7 @@ rebalance:
 	 * system then fail the allocation instead of entering direct reclaim.
 	 */
 	if ((deferred_compaction || contended_compaction) &&
-	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+						(gfp_mask & __GFP_NO_KSWAPD))
 		goto nopage;
 
 	/* Try direct reclaim and then allocating */

  reply	other threads:[~2012-11-12 11:37 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-11  8:52 kswapd0: wxcessive CPU usage Jiri Slaby
2012-10-11 13:44 ` Valdis.Kletnieks
2012-10-11 15:34   ` Jiri Slaby
2012-10-11 17:56     ` Valdis.Kletnieks
2012-10-11 17:59       ` Jiri Slaby
2012-10-11 18:19         ` Valdis.Kletnieks
2012-10-11 22:08           ` kswapd0: excessive " Jiri Slaby
2012-10-12 12:37             ` Jiri Slaby
2012-10-12 13:57               ` Mel Gorman
2012-10-15  9:54                 ` Jiri Slaby
2012-10-15 11:09                   ` Mel Gorman
2012-10-29 10:52                     ` Thorsten Leemhuis
2012-10-30 19:18                       ` Mel Gorman
2012-10-31 11:25                         ` Thorsten Leemhuis
2012-10-31 15:04                           ` Mel Gorman
2012-11-04 16:36                         ` Rik van Riel
2012-11-02 10:44                     ` Zdenek Kabelac
2012-11-02 10:53                       ` Jiri Slaby
2012-11-02 19:45                         ` Jiri Slaby
2012-11-04 11:26                           ` Zdenek Kabelac
2012-11-05 14:24                           ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman
2012-11-06 10:15                             ` Johannes Hirte
2012-11-09  8:36                               ` Mel Gorman
2012-11-14 21:43                                 ` Johannes Hirte
2012-11-09  9:12                             ` Mel Gorman
2012-11-09  4:22                           ` kswapd0: excessive CPU usage Seth Jennings
2012-11-09  8:07                             ` Zdenek Kabelac
2012-11-09  9:06                               ` Mel Gorman
2012-11-11  9:13                                 ` Zdenek Kabelac
2012-11-12 11:37                                   ` Mel Gorman [this message]
2012-11-16 19:14                                     ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Josh Boyer
2012-11-16 19:51                                       ` Andrew Morton
2012-11-20  1:43                                         ` Valdis.Kletnieks
2012-11-16 20:06                                       ` Mel Gorman
2012-11-20 15:38                                         ` Josh Boyer
2012-11-20 16:13                                           ` Bruno Wolff III
2012-11-20 17:43                                           ` Thorsten Leemhuis
2012-11-23 15:20                                             ` Thorsten Leemhuis
2012-11-27 11:12                                               ` Mel Gorman
2012-11-21 15:08                                           ` Mel Gorman
2012-11-20  9:18                                     ` Glauber Costa
2012-11-20 20:18                                       ` Andrew Morton
2012-11-21  8:30                                         ` Glauber Costa
2012-11-12 12:19                                   ` kswapd0: excessive CPU usage Mel Gorman
2012-11-12 13:13                                     ` Zdenek Kabelac
2012-11-12 13:31                                       ` Mel Gorman
2012-11-12 14:50                                         ` Zdenek Kabelac
2012-11-18 19:00                                         ` Zdenek Kabelac
2012-11-18 19:07                                           ` Jiri Slaby
2012-11-09  8:40                             ` Mel Gorman
2012-10-11 22:14 ` kswapd0: wxcessive " Andrew Morton
2012-10-11 22:26   ` Jiri Slaby

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121112113731.GS8218@suse.de \
    --to=mgorman@suse.de \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=jirislaby@gmail.com \
    --cc=jslaby@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rcj@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=sjenning@linux.vnet.ibm.com \
    --cc=zkabelac@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).