[PATCH stable-only] mm: fix classzone_idx underflow in shrink

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones()
@ 2017-07-04  8:45 Vlastimil Babka
  2017-07-04  8:55 ` Greg KH
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Vlastimil Babka @ 2017-07-04  8:45 UTC (permalink / raw)
  To: stable
  Cc: Johannes Weiner, Minchan Kim, Michal Hocko, linux-mm, LKML, Mel Gorman

Hi,

I realize this is against the standard stable policy, but I see no other
way, because the mainline accidental fix is part of 34+ patch reclaim
rework, that would be absurd to try to backport into stable. The fix is
a one-liner though.

The bug affects at least 4.4.y, and likely also older stable trees that
backported commit 7bf52fb891b6, which itself was a fix for 3.19 commit
6b4f7799c6a5. You could revert the 7bf52fb891b6 backport, but then 32bit
with highmem might suffer from OOM or thrashing.

More details in the changelog itself.

----8<----

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones()
  2017-07-04  8:45 [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones() Vlastimil Babka
@ 2017-07-04  8:55 ` Greg KH
  2017-07-04  9:01 ` Michal Hocko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2017-07-04  8:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: stable, Johannes Weiner, Minchan Kim, Michal Hocko, linux-mm,
	LKML, Mel Gorman

On Tue, Jul 04, 2017 at 10:45:43AM +0200, Vlastimil Babka wrote:
> Hi,
> 
> I realize this is against the standard stable policy, but I see no other
> way, because the mainline accidental fix is part of 34+ patch reclaim
> rework, that would be absurd to try to backport into stable. The fix is
> a one-liner though.
> 
> The bug affects at least 4.4.y, and likely also older stable trees that
> backported commit 7bf52fb891b6, which itself was a fix for 3.19 commit
> 6b4f7799c6a5. You could revert the 7bf52fb891b6 backport, but then 32bit
> with highmem might suffer from OOM or thrashing.

I need a bunch of acks from developers in this area before I can take
this patch :)

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones()
  2017-07-04  8:45 [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones() Vlastimil Babka
  2017-07-04  8:55 ` Greg KH
@ 2017-07-04  9:01 ` Michal Hocko
  2017-07-04 10:24 ` Mel Gorman
  2017-07-07  9:10 ` Patch "mm: fix classzone_idx underflow in shrink_zones()" has been added to the 4.4-stable tree gregkh
  3 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2017-07-04  9:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: stable, Johannes Weiner, Minchan Kim, linux-mm, LKML, Mel Gorman

On Tue 04-07-17 10:45:43, Vlastimil Babka wrote:
> Hi,
> 
> I realize this is against the standard stable policy, but I see no other
> way, because the mainline accidental fix is part of 34+ patch reclaim
> rework, that would be absurd to try to backport into stable. The fix is
> a one-liner though.
> 
> The bug affects at least 4.4.y, and likely also older stable trees that
> backported commit 7bf52fb891b6, which itself was a fix for 3.19 commit
> 6b4f7799c6a5. You could revert the 7bf52fb891b6 backport, but then 32bit
> with highmem might suffer from OOM or thrashing.
> 
> More details in the changelog itself.
> 
> ----8<----
> >From a1a1e459276298ac98827520e07923aa1219dbe1 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 22 Jun 2017 16:23:13 +0200
> Subject: [PATCH] mm: fix classzone_idx underflow in shrink_zones()
> 
> We've got reported a BUG in do_try_to_free_pages():
> 
> BUG: unable to handle kernel paging request at ffff8ffffff28990
> IP: [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
> PGD 0
> Oops: 0000 [#1] SMP
> megaraid_sas sg scsi_mod efivarfs autofs4
> Supported: No, Unsupported modules are loaded
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> task: ffff88ffd0d4c540 ti: ffff88ffd0e48000 task.ti: ffff88ffd0e48000
> RIP: 0010:[<ffffffff8119abe0>]  [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
> RSP: 0018:ffff88ffd0e4ba60  EFLAGS: 00010206
> RAX: 000006fffffff900 RBX: 00000000ffffffff RCX: ffff88fffff29000
> RDX: 000000ffffffff00 RSI: 0000000000000003 RDI: 00000000024200c8
> RBP: 0000000001320122 R08: 0000000000000000 R09: ffff88ffd0e4bbac
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88ffd0e4bae0
> R13: 0000000000000e00 R14: ffff88fffff2a500 R15: ffff88fffff2b300
> FS:  0000000000000000(0000) GS:ffff88ffe6440000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff8ffffff28990 CR3: 0000000001c0a000 CR4: 00000000003406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
>  00000002db570a80 024200c80000001e ffff88fffff2b300 0000000000000000
>  ffff88fffffd5700 ffff88ffd0d4c540 ffff88ffd0d4c540 ffffffff0000000c
>  0000000000000000 0000000000000040 00000000024200c8 ffff88ffd0e4bae0
> Call Trace:
>  [<ffffffff8119afea>] try_to_free_pages+0xba/0x170
>  [<ffffffff8118cf2f>] __alloc_pages_nodemask+0x53f/0xb20
>  [<ffffffff811d39ff>] alloc_pages_current+0x7f/0x100
>  [<ffffffff811e2232>] migrate_pages+0x202/0x710
>  [<ffffffff815dadaa>] __offline_pages.constprop.23+0x4ba/0x790
>  [<ffffffff81463263>] memory_subsys_offline+0x43/0x70
>  [<ffffffff8144cbed>] device_offline+0x7d/0xa0
>  [<ffffffff81392fa2>] acpi_bus_offline+0xa5/0xef
>  [<ffffffff81394a77>] acpi_device_hotplug+0x21b/0x41f
>  [<ffffffff8138dab7>] acpi_hotplug_work_fn+0x1a/0x23
>  [<ffffffff81093cee>] process_one_work+0x14e/0x410
>  [<ffffffff81094546>] worker_thread+0x116/0x490
>  [<ffffffff810999ed>] kthread+0xbd/0xe0
>  [<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70
> 
> This translates to the loop in shrink_zone():
> 
> classzone_idx = requested_highidx;
> while (!populated_zone(zone->zone_pgdat->node_zones +
> 					classzone_idx))
> 	classzone_idx--;
> 
> where no zone is populated, so classzone_idx becomes -1 (in RBX).
> 
> Added debugging output reveals that we enter the function with
> sc->gfp_mask == GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE
> requested_highidx = gfp_zone(sc->gfp_mask) == 2 (ZONE_NORMAL)
> 
> Inside the for loop, however:
> gfp_zone(sc->gfp_mask) == 3 (ZONE_MOVABLE)
> 
> This means we have gone through this branch:
> 
> if (buffer_heads_over_limit)
>     sc->gfp_mask |= __GFP_HIGHMEM;
> 
> This changes the gfp_zone() result, but requested_highidx remains unchanged.
> On nodes where the only populated zone is movable, the inner while loop will
> check only lower zones, which are not populated, and underflow classzone_idx.
> 
> To sum up, the bug occurs in configurations with ZONE_MOVABLE (such as when
> booted with the movable_node parameter) and only in situations when
> buffer_heads_over_limit is true, and there's an allocation with __GFP_MOVABLE
> and without __GFP_HIGHMEM performing direct reclaim.
> 
> This patch makes sure that classzone_idx starts with the correct zone.
> 
> Mainline has been affected in versions 4.6 and 4.7, but the culprit commit has
> been also included in stable trees.
> In mainline, this has been fixed accidentally as part of 34-patch series (plus
> follow-up fixes) "Move LRU page reclaim from zones to nodes", which makes the
> mainline commit unsuitable for stable backport, unfortunately.
> 
> Fixes: 7bf52fb891b6 ("mm: vmscan: reclaim highmem zone if buffer_heads is over limit")
> Obsoleted-by: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
> Debugged-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: <stable@vger.kernel.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> 
> ---
>  mm/vmscan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2539,7 +2539,7 @@ static bool shrink_zones(struct zonelist
>  		if (!populated_zone(zone))
>  			continue;
>  
> -		classzone_idx = requested_highidx;
> +		classzone_idx = gfp_zone(sc->gfp_mask);
>  		while (!populated_zone(zone->zone_pgdat->node_zones +
>  							classzone_idx))
>  			classzone_idx--;

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones()
  2017-07-04  8:45 [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones() Vlastimil Babka
  2017-07-04  8:55 ` Greg KH
  2017-07-04  9:01 ` Michal Hocko
@ 2017-07-04 10:24 ` Mel Gorman
  2017-07-07  9:10 ` Patch "mm: fix classzone_idx underflow in shrink_zones()" has been added to the 4.4-stable tree gregkh
  3 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2017-07-04 10:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: stable, Johannes Weiner, Minchan Kim, Michal Hocko, linux-mm, LKML

On Tue, Jul 04, 2017 at 10:45:43AM +0200, Vlastimil Babka wrote:
> Hi,
> 
> I realize this is against the standard stable policy, but I see no other
> way, because the mainline accidental fix is part of 34+ patch reclaim
> rework, that would be absurd to try to backport into stable. The fix is
> a one-liner though.
> 

That 34+ rework would be excessive to say the least as it's quite a
fundamental set of changes to catch a corner case.

> ----8<----
> From a1a1e459276298ac98827520e07923aa1219dbe1 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 22 Jun 2017 16:23:13 +0200
> Subject: [PATCH] mm: fix classzone_idx underflow in shrink_zones()
> 
> We've got reported a BUG in do_try_to_free_pages():
> 
> BUG: unable to handle kernel paging request at ffff8ffffff28990
> IP: [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
> PGD 0
> Oops: 0000 [#1] SMP
> megaraid_sas sg scsi_mod efivarfs autofs4
> Supported: No, Unsupported modules are loaded
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> task: ffff88ffd0d4c540 ti: ffff88ffd0e48000 task.ti: ffff88ffd0e48000
> RIP: 0010:[<ffffffff8119abe0>]  [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
> RSP: 0018:ffff88ffd0e4ba60  EFLAGS: 00010206
> RAX: 000006fffffff900 RBX: 00000000ffffffff RCX: ffff88fffff29000
> RDX: 000000ffffffff00 RSI: 0000000000000003 RDI: 00000000024200c8
> RBP: 0000000001320122 R08: 0000000000000000 R09: ffff88ffd0e4bbac
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88ffd0e4bae0
> R13: 0000000000000e00 R14: ffff88fffff2a500 R15: ffff88fffff2b300
> FS:  0000000000000000(0000) GS:ffff88ffe6440000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff8ffffff28990 CR3: 0000000001c0a000 CR4: 00000000003406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
>  00000002db570a80 024200c80000001e ffff88fffff2b300 0000000000000000
>  ffff88fffffd5700 ffff88ffd0d4c540 ffff88ffd0d4c540 ffffffff0000000c
>  0000000000000000 0000000000000040 00000000024200c8 ffff88ffd0e4bae0
> Call Trace:
>  [<ffffffff8119afea>] try_to_free_pages+0xba/0x170
>  [<ffffffff8118cf2f>] __alloc_pages_nodemask+0x53f/0xb20
>  [<ffffffff811d39ff>] alloc_pages_current+0x7f/0x100
>  [<ffffffff811e2232>] migrate_pages+0x202/0x710
>  [<ffffffff815dadaa>] __offline_pages.constprop.23+0x4ba/0x790
>  [<ffffffff81463263>] memory_subsys_offline+0x43/0x70
>  [<ffffffff8144cbed>] device_offline+0x7d/0xa0
>  [<ffffffff81392fa2>] acpi_bus_offline+0xa5/0xef
>  [<ffffffff81394a77>] acpi_device_hotplug+0x21b/0x41f
>  [<ffffffff8138dab7>] acpi_hotplug_work_fn+0x1a/0x23
>  [<ffffffff81093cee>] process_one_work+0x14e/0x410
>  [<ffffffff81094546>] worker_thread+0x116/0x490
>  [<ffffffff810999ed>] kthread+0xbd/0xe0
>  [<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70
> 
> This translates to the loop in shrink_zone():
> 
> classzone_idx = requested_highidx;
> while (!populated_zone(zone->zone_pgdat->node_zones +
> 					classzone_idx))
> 	classzone_idx--;
> 
> where no zone is populated, so classzone_idx becomes -1 (in RBX).
> 

Well.... one zone must have been populated or we wouldn't be here in the
first place but you clear that up later.

> Added debugging output reveals that we enter the function with
> sc->gfp_mask == GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE
> requested_highidx = gfp_zone(sc->gfp_mask) == 2 (ZONE_NORMAL)
> 
> Inside the for loop, however:
> gfp_zone(sc->gfp_mask) == 3 (ZONE_MOVABLE)
> 
> This means we have gone through this branch:
> 
> if (buffer_heads_over_limit)
>     sc->gfp_mask |= __GFP_HIGHMEM;
> 
> This changes the gfp_zone() result, but requested_highidx remains unchanged.
> On nodes where the only populated zone is movable, the inner while loop will
> check only lower zones, which are not populated, and underflow classzone_idx.
> 

And this here is the key. It's a lowmem request when the only populated
zone is ZONE_MOVABLE without the __GFP_HIGHMEM flag being in the
original request.

It's somewhat of a corner case that on systems without highmem that the
movable zone is not used for lowmem requests that are __GFP_MOVABLE. However,
taking the approach of "fixing" that has it's own consequences for the
stability of memory hot-remove in general if it turns out that any of those
lowmem __GFP_MOVABLE requests are in fact not movable (can happen if it
turns out the page is permanently pinned if it's holdiing superblock data
pinned by the mount). In the general case, this is ok but for full memory
node removal, a "fix" in that direction will regress memory node hot-remove
and potentially stop it ever working once a filesystem was mounted.

There is a slight corner-case that a lowmem request may consider
the node to be prematurely balanced by this change in the event that
buffer_heads_over_limit is true. This may result in a second, potentially
redundant, direct reclaim request but the effect will be transient.
I expect that to be extremely rare and only apply in the case where the
ratio of ZONE_MOVABLE to ZONE_NORMAL is abnormally high (potentially due
to excessive use of movable_node).

Given the consequences of alternative fixes (backporting a massive
series that fundamentally alters reclaim and introducing a change that
potentiully breaks memory node removable), I consider this the safest
fix for this particular problem so;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Patch "mm: fix classzone_idx underflow in shrink_zones()" has been added to the 4.4-stable tree
  2017-07-04  8:45 [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones() Vlastimil Babka
                   ` (2 preceding siblings ...)
  2017-07-04 10:24 ` Mel Gorman
@ 2017-07-07  9:10 ` gregkh
  3 siblings, 0 replies; 5+ messages in thread
From: gregkh @ 2017-07-07  9:10 UTC (permalink / raw)
  To: vbabka, gregkh, hannes, linux-kernel, linux-mm, mgorman, mhocko,
	mhocko, mhocko, minchan, stable
  Cc: stable-commits


This is a note to let you know that I've just added the patch titled

    mm: fix classzone_idx underflow in shrink_zones()

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-fix-classzone_idx-underflow-in-shrink_zones.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-07-07  9:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-04  8:45 [PATCH stable-only] mm: fix classzone_idx underflow in shrink_zones() Vlastimil Babka
2017-07-04  8:55 ` Greg KH
2017-07-04  9:01 ` Michal Hocko
2017-07-04 10:24 ` Mel Gorman
2017-07-07  9:10 ` Patch "mm: fix classzone_idx underflow in shrink_zones()" has been added to the 4.4-stable tree gregkh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).