linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: robert shteynfeld <robert.shteynfeld@gmail.com>,
	Mikhail Zaslonko <zaslonko@linux.ibm.com>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	Steven Sistare <steven.sistare@oracle.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Bob Picco <bob.picco@oracle.com>
Subject: Re: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684
Date: Mon, 28 Jan 2019 12:43:58 +0100	[thread overview]
Message-ID: <20190128114358.GH18811@dhcp22.suse.cz> (raw)
In-Reply-To: <20190125181549.GE20411@dhcp22.suse.cz>

On Fri 25-01-19 19:15:49, Michal Hocko wrote:
> On Fri 25-01-19 18:33:15, Michal Hocko wrote:
> > On Fri 25-01-19 17:39:38, Michal Hocko wrote:
> > > On Fri 25-01-19 11:16:30, robert shteynfeld wrote:
> > > > Attached is the dmesg from patched kernel.
> > > 
> > > Your Node1 physical memory range precedes Node0 which is quite unusual
> > > but it shouldn't be a huge problem on its own. But memory ranges are
> > > not aligned to the memory section
> > > 
> > > [    0.286954] Early memory node ranges
> > > [    0.286955]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
> > > [    0.286955]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
> > > [    0.286956]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
> > > [    0.286956]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]
> > > 
> > > As you can see the last pfn for the node1 is inside the section and
> > > Node0 starts right after. This is quite unusual as well. If for no other
> > > reasons then the memmap of those struct pages will be remote for one or
> > > the other. Actually I am not even sure we can handle that properly
> > > because we do expect 1:1 mapping between sections and nodes.
> > > 
> > > Now it also makes some sense why 2830bf6f05fb ("mm, memory_hotplug:
> > > initialize struct pages for the full memory section") made any
> > > difference. We simply write over a potentially initialized struct page
> > > and blow up on that. I strongly suspect that the commit just uncovered
> > > a pre-existing problem. Let me think what we can do about that.
> > 
> > Appart from force aligning node's start the only other option is to
> > revert 2830bf6f05fb and handling the underlying issue in the hotplug
> > code.
> 
> We cannot really align because we have things like ZONE_DMA starting at
> 0x1000 and who knows what else. So let's go with the revert. Hutplug
> simply needs a larger surgery to get rid of the PAGES_PER_SECTION
> inherent assumptions.
> 
> Linus, could you take the revert please?

or should I post the patch as a reply to make your life easier?

> From 817b18d3db36a6900ca9043af8c1416c56358be3 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 25 Jan 2019 19:08:58 +0100
> Subject: [PATCH] Revert "mm, memory_hotplug: initialize struct pages for the
>  full memory section"
> 
> This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684.
> 
> The underlying assumption that one sparse section belongs into a single
> numa node doesn't hold really. Robert Shteynfeld has reported a boot
> failure. The boot log was not captured but his memory layout is as
> follows:
> [    0.286954] Early memory node ranges
> [    0.286955]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
> [    0.286955]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
> [    0.286956]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
> [    0.286956]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]
> 
> This means that node0 starts in the middle of a memory section which is
> also in node1. memmap_init_zone tries to initialize padding of a section
> even when it is outside of the given pfn range because there are code
> paths (e.g. memory hotplug) which assume that the full worth of memory
> section is always initialized. In this particular case, though, such a
> range is already intialized and most likely already managed by the page
> allocator. Scribbling over those pages corrupts the internal state and
> likely blows up when any of those pages gets used.
> 
> Reported-by: Robert Shteynfeld <robert.shteynfeld@gmail.com>
> Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d295c9bc01a8..35fdde041f5c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5701,18 +5701,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			cond_resched();
>  		}
>  	}
> -#ifdef CONFIG_SPARSEMEM
> -	/*
> -	 * If the zone does not span the rest of the section then
> -	 * we should at least initialize those pages. Otherwise we
> -	 * could blow up on a poisoned page in some paths which depend
> -	 * on full sections being initialized (e.g. memory hotplug).
> -	 */
> -	while (end_pfn % PAGES_PER_SECTION) {
> -		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
> -		end_pfn++;
> -	}
> -#endif
>  }
>  
>  #ifdef CONFIG_ZONE_DEVICE
> -- 
> 2.20.1
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2019-01-28 11:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CADfvbxqC1+XoSsn0sEDqFE16tN1Pq46-QDAaCu=AXJdZeY3rDQ@mail.gmail.com>
2019-01-24 15:48 ` Fwd: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684 robert shteynfeld
2019-01-25  4:48 ` Linus Torvalds
2019-01-25  7:37   ` Michal Hocko
2019-01-25  8:19     ` Michal Hocko
2019-01-25  8:29       ` Michal Hocko
2019-01-25 15:52         ` robert shteynfeld
2019-01-25 15:58           ` Michal Hocko
2019-01-25 16:16             ` robert shteynfeld
2019-01-25 16:39               ` Michal Hocko
2019-01-25 16:51                 ` robert shteynfeld
2019-01-25 17:33                 ` Michal Hocko
2019-01-25 18:15                   ` Michal Hocko
2019-01-28  6:37                     ` Mikhail Gavrilov
2019-01-28  6:43                       ` Michal Hocko
2019-01-28 11:43                     ` Michal Hocko [this message]
2019-01-28 18:42                       ` Linus Torvalds
2019-01-28 19:02                         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190128114358.GH18811@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=bob.picco@oracle.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=gerald.schaefer@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikhail.v.gavrilov@gmail.com \
    --cc=pasha.tatashin@oracle.com \
    --cc=robert.shteynfeld@gmail.com \
    --cc=steven.sistare@oracle.com \
    --cc=torvalds@linux-foundation.org \
    --cc=zaslonko@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).