From: David Hildenbrand <david@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Qian Cai <cai@lca.pw>,
	Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, Mike Rapoport <rppt@linux.ibm.com>,
	Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Date: Wed, 25 Nov 2020 07:45:30 +0100
Message-ID: <35F8AADA-6CAA-4BD6-A4CF-6F29B3F402A4@redhat.com>
In-Reply-To: <X73s8fxDKPRD6wET@redhat.com>


> On 25.11.2020 at 06:34, Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
> Hello,
> 
>> On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote:
>>> On 11/21/20 8:45 PM, Andrea Arcangeli wrote:
>>> A corollary issue was fixed in
>>> 39639000-39814fff : Unknown E820 type
>>> 
>>> pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM:
>>> 
>>> 7a17b000-7a216fff : Unknown E820 type
>> 
>> It would be nice to also provide /proc/zoneinfo and show how exactly 
>> "zone_spans_pfn" was violated. I assume we end up below the zone's 
>> start_pfn, but is that true?
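>> 
>> For reference, zone_spans_pfn() is essentially this range check
>> (from include/linux/mmzone.h):
>> 
>>     static inline bool zone_spans_pfn(const struct zone *zone,
>>                                       unsigned long pfn)
>>     {
>>         return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
>>     }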
> 
> Agreed, I was about to grab that info along with all the page structs
> around pfn 0x7a200 and phys address 0x7a216fff.
> 
> # grep -A1 E820 /proc/iomem
> 7a17b000-7a216fff : Unknown E820 type
> 7a217000-7bffffff : System RAM
> 
> DMA      zone_start_pfn 1            zone_end_pfn() 4096         contiguous 1     
> DMA32    zone_start_pfn 4096         zone_end_pfn() 1048576      contiguous 0     
> Normal   zone_start_pfn 1048576      zone_end_pfn() 4715392      contiguous 1     
> Movable  zone_start_pfn 0            zone_end_pfn() 0            contiguous 0     
> 
> 500222 0x7a1fe000 0x1fff000000001000 reserved True
> 500223 0x7a1ff000 0x1fff000000001000 reserved True
> 
> # I suspect "highest pfn" was somewhere in the RAM range
> # 0x7a217000-0x7a400000 and pageblock_start_pfn(pfn)
> # made highest point to pfn 0x7a200 physaddr 0x7a200000
> # below, which is indeed reserved since it's non-RAM
> # (first number is the pfn: hex(500224) == 0x7a200)
> 
> pfn    physaddr   page->flags
> 500224 0x7a200000 0x1fff000000001000 reserved True
> 500225 0x7a201000 0x1fff000000001000 reserved True
> *snip*
> 500245 0x7a215000 0x1fff000000001000 reserved True
> 500246 0x7a216000 0x1fff000000001000 reserved True
> 500247 0x7a217000 0x3fff000000000000 reserved False
> 500248 0x7a218000 0x3fff000000000000 reserved False
> 
> All non-reserved RAM pages start at 0x7a217000, as expected.
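> 
> (A dump of this shape can be produced with a throwaway debug loop along
> these lines; sketch only, the exact formatting may differ:)
> 
>     unsigned long pfn;
> 
>     for (pfn = 0x7a1fe; pfn <= 0x7a218; pfn++) {
>         struct page *page = pfn_to_page(pfn);
> 
>         pr_info("%lu 0x%llx 0x%lx reserved %d\n", pfn,
>                 (u64)PFN_PHYS(pfn), page->flags,
>                 !!PageReserved(page));
>     }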
> 
> For the non-RAM pfn 0x7a200, page_zonenum(pfn_to_page(0x7a200)) points
> to ZONE_DMA, so page_zone(page) in the check below was the DMA zone,
> even though pfn 0x7a200 falls in the DMA32 range.
> 
>    VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 
> So the patch I sent earlier should prevent the above BUG_ON by not
> setting highest to 0x7a200 when pfn is in the phys RAM range
> 0x7a217000-0x7a400000, because pageblock_pfn_to_page will notice that
> the zone is the wrong one.
> 
>    if (page_zone(start_page) != zone)
>        return NULL;
> 
> However, the real bug seems to be that reserved pages get a zero zone
> id in their page->flags when they should have the real zone id/nid. The
> patch I sent earlier to validate highest would then only be needed to
> deal with pfn_valid.
> 
> Something must have changed more recently than v5.1 that caused the
> zoneid of reserved pages to be wrong; a possible candidate for the real
> culprit would be this change below:
> 
> +               __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
> 

Before that change, the memmap of memory holes was only zeroed out. So the zone/nid was 0; however, the pages were not reserved and had a refcount of zero, resulting in other issues.
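
For context, that line sits in the memmap init for memory holes, which after that change does roughly this (abridged):

    /* mm/page_alloc.c: memmap init for pfns not backed by RAM (abridged) */
    __init_single_page(pfn_to_page(pfn), pfn, 0 /* zone */, 0 /* nid */);
    __SetPageReserved(pfn_to_page(pfn));

So the zero zone/nid plus PG_reserved is exactly what those pages end up with now.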

Most pfn walkers shouldn't mess with reserved pages and should simply skip them. That would be the right fix here.
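
A minimal sketch of what I mean, for a generic pfn walker (exact placement depends on the walker):

    page = pfn_to_online_page(pfn);
    if (!page || PageReserved(page))
        continue;   /* memory hole / firmware range: don't touch it */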

> Even if it may not be it, in light of how the reserved page zoneid/nid
> initialization went wrong, the above line looks too flaky to stay.
> 
> It'd be preferable if pfn_valid failed and pfn_to_section_nr(pfn)
> returned an invalid section for the intermediate step. Even better,
> memset 0xff over the whole page struct until the second stage comes
> around.
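> 
> I.e., hypothetically, with spfn/epfn being the pfn bounds of the hole:
> 
>     /* poison the hole's memmap until second-stage init comes around */
>     memset(pfn_to_page(spfn), 0xff,
>            (epfn - spfn) * sizeof(struct page));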

I recently discussed two options with Baoquan (a rough sketch of the first follows below):
1. Using a new pagetype to mark memory holes
2. Using a special nid/zid to mark memory holes
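
For (1), a completely hypothetical sketch, modeled on existing page types such as PG_offline (name and bit value made up):

    /* include/linux/page-flags.h, hypothetical addition */
    #define PG_hole        0x00001000
    PAGE_TYPE_OPS(Hole, hole)

pfn walkers could then test PageHole(page) explicitly instead of inferring a hole from PG_reserved.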

Setting the memmap to 0xff would be even more dangerous - page_zone() might simply BUG_ON.
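
The zone is decoded straight out of page->flags, so all-ones flags decode to a bogus zone index (this is page_zonenum() from include/linux/mm.h):

    static inline enum zone_type page_zonenum(const struct page *page)
    {
        return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
    }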

> 
> Whenever pfn_valid is true, it's better that the zoneid/nid is correct
> at all times; otherwise, if the second stage fails, we end up in a bug
> with weird side effects.

Memory holes with a valid memmap might not have a zone/nid. For now, skipping reserved pages should be good enough, no?

> 
> Maybe it's not the above that left a zero zoneid, though; I haven't
> tried to bisect it yet to see how page->flags looked on an older kernel
> that didn't seem to reproduce this crash, I'm just guessing.
> 
> Thanks,
> Andrea


Thread overview: 61+ messages
2020-04-23 21:25 compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)) Qian Cai
2020-04-24  3:43 ` Baoquan He
2020-04-24 13:45   ` Qian Cai
2020-05-05 12:43     ` Baoquan He
2020-05-05 13:20       ` Qian Cai
2020-05-11  1:21         ` Baoquan He
2020-04-26 14:41 ` Mike Rapoport
2020-04-27 13:45   ` Qian Cai
2020-11-21 19:45 ` [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask Andrea Arcangeli
2020-11-21 19:45   ` [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Andrea Arcangeli
2020-11-21 19:53     ` Andrea Arcangeli
2020-11-23 11:26       ` David Hildenbrand
2020-11-23 13:01     ` Vlastimil Babka
2020-11-24 13:32       ` Mel Gorman
2020-11-24 20:56         ` Andrea Arcangeli
2020-11-25 10:30           ` Mel Gorman
2020-11-25 17:59             ` Andrea Arcangeli
2020-11-26 10:47               ` Mel Gorman
2020-12-06  2:26                 ` Andrea Arcangeli
2020-12-06 23:47                   ` Mel Gorman
2020-11-25  5:34       ` Andrea Arcangeli
2020-11-25  6:45         ` David Hildenbrand [this message]
2020-11-25  8:51           ` Mike Rapoport
2020-11-25 10:39           ` Mel Gorman
2020-11-25 11:04             ` David Hildenbrand
2020-11-25 11:41               ` David Hildenbrand
2020-11-25 18:47                 ` Andrea Arcangeli
2020-11-25 13:33               ` Mel Gorman
2020-11-25 13:41                 ` David Hildenbrand
2020-11-25 18:28           ` Andrea Arcangeli
2020-11-25 19:27             ` David Hildenbrand
2020-11-25 20:41               ` Andrea Arcangeli
2020-11-25 21:13                 ` David Hildenbrand
2020-11-25 21:04               ` Mike Rapoport
2020-11-25 21:38                 ` Andrea Arcangeli
2020-11-26  9:36                   ` Mike Rapoport
2020-11-26 10:05                     ` David Hildenbrand
2020-11-26 17:46                       ` Mike Rapoport
2020-11-29 12:32                         ` Mike Rapoport
2020-12-02  0:44                           ` Andrea Arcangeli
2020-12-02 17:39                             ` Mike Rapoport
2020-12-03  6:23                               ` Andrea Arcangeli
2020-12-03 10:51                                 ` Mike Rapoport
2020-12-03 17:31                                   ` Andrea Arcangeli
2020-12-06  8:09                                     ` Mike Rapoport
2020-11-26 18:15                       ` Andrea Arcangeli
2020-11-26 18:29                     ` Andrea Arcangeli
2020-11-26 19:44                       ` Mike Rapoport
2020-11-26 20:30                         ` Andrea Arcangeli
2020-11-26 21:03                           ` Mike Rapoport
2020-11-26 19:21                     ` Andrea Arcangeli
2020-11-25 12:08         ` Vlastimil Babka
2020-11-25 13:32           ` David Hildenbrand
2020-11-25 14:13             ` Mike Rapoport
2020-11-25 14:42               ` David Hildenbrand
2020-11-26 10:51                 ` Mel Gorman
2020-11-25 19:14               ` Andrea Arcangeli
2020-11-25 19:01           ` Andrea Arcangeli
2020-11-25 19:33             ` David Hildenbrand
2020-11-26  3:40         ` Andrea Arcangeli
2020-11-26 10:43           ` Mike Rapoport
