linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Qian Cai <cai@lca.pw>,
	Michal Hocko <mhocko@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	linux-kernel@vger.kernel.org, Mike Rapoport <rppt@linux.ibm.com>,
	Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Date: Tue, 24 Nov 2020 15:56:22 -0500	[thread overview]
Message-ID: <X71zdnZd61r429aO@redhat.com> (raw)
In-Reply-To: <20201124133205.GK3306@suse.de>

Hello,

On Tue, Nov 24, 2020 at 01:32:05PM +0000, Mel Gorman wrote:
> I would hope that is not the case because they are not meant to overlap.
> However, if the beginning of the pageblock was not the start of a zone
> then the pages would be valid but the pfn would still be outside the
> zone boundary. If it was reserved, the struct page is valid but not
> suitable for set_pfnblock_flags_mask. However, it is a concern in
> general because the potential is there that pages are isolated from the
> wrong zone.

I guess we have more than one issue to correct in that function
because the same BUG_ON reproduced again even with the tentative patch
I posted earlier.

So my guess is that the problematic reserved page isn't pointed by the
min_pfn, but it must have been pointed by the "highest" variable
calculated below?

			if (pfn >= highest)
				highest = pageblock_start_pfn(pfn);

When I looked at where "highest" comes from, it lacks
pageblock_pfn_to_page check (which was added around v5.7 to min_pfn).

Is that the real bug, which may be fixed by something like this? (untested)

==
From 262671e88723b3074251189004ceae39dcd1689d Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Sat, 21 Nov 2020 12:55:58 -0500
Subject: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set
 pageblock_skip on reserved pages

A corollary issue was fixed in
e577c8b64d58fe307ea4d5149d31615df2d90861. A second issue remained in
v5.7:

	https://lkml.kernel.org/r/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw

==
page:ffffea0000aa0000 refcount:1 mapcount:0 mapping:000000002243743b index:0x0
flags: 0x1fffe000001000(reserved)
==

73a6e474cb376921a311786652782155eac2fdf0 was applied to supposedly fix
the second issue, but it still reproduces with v5.9 on two different
systems:

==
page:0000000062b3e92f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x39800
flags: 0x1000(reserved)
raw: 0000000000001000 fffff5b880e60008 fffff5b880e60008 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
==
page:000000004d32675d refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7a200
flags: 0x1fff000000001000(reserved)
raw: 1fff000000001000 ffffe6c5c1e88008 ffffe6c5c1e88008 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
==

The page is "reserved" in all cases. In the last two crashes with the
pfn:

pfn 0x39800 -> 0x39800000 min_pfn hit non-RAM:

39639000-39814fff : Unknown E820 type

pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM:

7a17b000-7a216fff : Unknown E820 type

The pageblock_pfn_to_page check was missing when rewinding pfn to the
start of the pageblock to calculate the "highest" value.

So the "highest" pfn could point to a non valid pfn or not within the
zone, checking the pageblock_pfn_to_page fixes it.

Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/compaction.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 653862aba266..76f061af8f22 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1418,8 +1418,14 @@ fast_isolate_freepages(struct compact_control *cc)
 			nr_scanned++;
 			pfn = page_to_pfn(freepage);
 
-			if (pfn >= highest)
-				highest = pageblock_start_pfn(pfn);
+			if (pfn >= highest) {
+				unsigned long start_pfn, end_pfn;
+				start_pfn = pageblock_start_pfn(pfn);
+				end_pfn = pageblock_end_pfn(start_pfn);
+				if (pageblock_pfn_to_page(start_pfn, end_pfn,
+							  cc->zone))
+					highest = pfn;
+			}
 
 			if (pfn >= low_pfn) {
 				cc->fast_search_fail = 0;
==


Can't we also try to scan in between start_pfn and "pfn" to see if
there's one pfn that passes the pageblock_pfn_to_page test or isn't it
worth it for the fast isolate variant?

> > Then compact_finished() detects that in
> > compact_zone(), but only after migrate_pages() and thus
> > fast_isolate_freepages() is called.
> > 
> > That would mean distance can be negative, or rather a large unsigned number
> > and low_pfn and min_pfn end up away from the zone?
> > 
> > Or maybe the above doesn't happen, but cc->free_pfn gets so close to start
> > of the zone, that the calculations above make min_pfn go below start_pfn?
> > 
> 
> I think the last part is more likely, going below start_pfn

Would it help if I dump the whole status of the zone and pages around
those addresses in the two systems that are reproducing this in v5.9
as extra check?

I was going to do that right now, to validate all zone->zone_start_pfn
and zone_end_pfn() were correct around that non-RAM reserved page
physical address.

> > In any case I would rather make sure we stay within the expected zone
> > boundaries, than play tricks with PageReserved. Mel?
> > 
> 
> It would be preferable because this time it's PageReserved that happened
> to trip up an assumption in set_pfnblock_flags_mask but if it was a real
> zone and real page then compaction is migrating cross-zone which would
> be surprising.
> 
> Maybe this untested patch?

"highest" is not influenced by either low_pfn or min_pfn so it may
very well be needed, but for another case, I don't think this can help
this specific VM_BUG_ON_PAGE if it's caused by "highest" pfn after all?

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 13cb7a961b31..ef1b5dacc289 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1330,6 +1330,10 @@ fast_isolate_freepages(struct compact_control *cc)
>  	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
>  	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
>  
> +	/* Ensure the PFN is within the zone */
> +	low_pfn = max(cc->zone->zone_start_pfn, low_pfn);
> +	min_pfn = max(cc->zone->zone_start_pfn, min_pfn);
> +
>  	if (WARN_ON_ONCE(min_pfn > low_pfn))
>  		low_pfn = min_pfn;
>  
> 
> 
> -- 
> Mel Gorman
> SUSE Labs
> 



  reply	other threads:[~2020-11-24 20:56 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-23 21:25 compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)) Qian Cai
2020-04-24  3:43 ` Baoquan He
2020-04-24 13:45   ` Qian Cai
2020-05-05 12:43     ` Baoquan He
2020-05-05 13:20       ` Qian Cai
2020-05-11  1:21         ` Baoquan He
2020-04-26 14:41 ` Mike Rapoport
2020-04-27 13:45   ` Qian Cai
2020-11-21 19:45 ` [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask Andrea Arcangeli
2020-11-21 19:45   ` [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Andrea Arcangeli
2020-11-21 19:53     ` Andrea Arcangeli
2020-11-23 11:26       ` David Hildenbrand
2020-11-23 13:01     ` Vlastimil Babka
2020-11-24 13:32       ` Mel Gorman
2020-11-24 20:56         ` Andrea Arcangeli [this message]
2020-11-25 10:30           ` Mel Gorman
2020-11-25 17:59             ` Andrea Arcangeli
2020-11-26 10:47               ` Mel Gorman
2020-12-06  2:26                 ` Andrea Arcangeli
2020-12-06 23:47                   ` Mel Gorman
2020-11-25  5:34       ` Andrea Arcangeli
2020-11-25  6:45         ` David Hildenbrand
2020-11-25  8:51           ` Mike Rapoport
2020-11-25 10:39           ` Mel Gorman
2020-11-25 11:04             ` David Hildenbrand
2020-11-25 11:41               ` David Hildenbrand
2020-11-25 18:47                 ` Andrea Arcangeli
2020-11-25 13:33               ` Mel Gorman
2020-11-25 13:41                 ` David Hildenbrand
2020-11-25 18:28           ` Andrea Arcangeli
2020-11-25 19:27             ` David Hildenbrand
2020-11-25 20:41               ` Andrea Arcangeli
2020-11-25 21:13                 ` David Hildenbrand
2020-11-25 21:04               ` Mike Rapoport
2020-11-25 21:38                 ` Andrea Arcangeli
2020-11-26  9:36                   ` Mike Rapoport
2020-11-26 10:05                     ` David Hildenbrand
2020-11-26 17:46                       ` Mike Rapoport
2020-11-29 12:32                         ` Mike Rapoport
2020-12-02  0:44                           ` Andrea Arcangeli
2020-12-02 17:39                             ` Mike Rapoport
2020-12-03  6:23                               ` Andrea Arcangeli
2020-12-03 10:51                                 ` Mike Rapoport
2020-12-03 17:31                                   ` Andrea Arcangeli
2020-12-06  8:09                                     ` Mike Rapoport
2020-11-26 18:15                       ` Andrea Arcangeli
2020-11-26 18:29                     ` Andrea Arcangeli
2020-11-26 19:44                       ` Mike Rapoport
2020-11-26 20:30                         ` Andrea Arcangeli
2020-11-26 21:03                           ` Mike Rapoport
2020-11-26 19:21                     ` Andrea Arcangeli
2020-11-25 12:08         ` Vlastimil Babka
2020-11-25 13:32           ` David Hildenbrand
2020-11-25 14:13             ` Mike Rapoport
2020-11-25 14:42               ` David Hildenbrand
2020-11-26 10:51                 ` Mel Gorman
2020-11-25 19:14               ` Andrea Arcangeli
2020-11-25 19:01           ` Andrea Arcangeli
2020-11-25 19:33             ` David Hildenbrand
2020-11-26  3:40         ` Andrea Arcangeli
2020-11-26 10:43           ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X71zdnZd61r429aO@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).