All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: "steven.sistare@oracle.com" <steven.sistare@oracle.com>,
	"daniel.m.jordan@oracle.com" <daniel.m.jordan@oracle.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"bharata@linux.vnet.ibm.com" <bharata@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
Date: Sat, 7 Apr 2018 10:45:03 -0400	[thread overview]
Message-ID: <CAGM2reYtCb2_czEt1M8KhFz+YxGq=iSGnTskC=FGNM4kXOiS5g@mail.gmail.com> (raw)
In-Reply-To: <20180406124535.k3qyxjfrlo55d5if@xakep.localdomain>

> Let me study your trace, perhaps I will able to figure out the issue
> without reproducing it.

Hi Sasha,

I've been studying this problem more. The issue happens in this stack:

...subsys_init...
 topology_init()
  register_one_node(nid)
   link_mem_sections(nid, pgdat->node_start_pfn, pgdat->node_spanned_pages)
    register_mem_sect_under_node(mem_blk, nid)
     get_nid_for_pfn(pfn)
      pfn_to_nid(pfn)
       page_to_nid(page)
        PF_POISONED_CHECK(page)

We are trying to get nid from struct page which has not been
initialized.  My patches add this extra scrutiny to make sure that we
never get invalid nid from a "struct page" by adding
PF_POISONED_CHECK() to page_to_nid(). So, the bug already exists in
Linux where incorrect nid is read. The question is why does happen?

First, I thought, that perhaps struct page is not yet initialized.
But, the initcalls are done after deferred pages are initialized, and
thus every struct page must be initialized by now. Also, if deferred
pages were enabled, we would take a slightly different path and avoid
this bug by getting nid from memblock instead of struct page:

get_nid_for_pfn(pfn)
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 if (system_state < SYSTEM_RUNNING)
  return early_pfn_to_nid(pfn);
#endif

I also verified in your config that CONFIG_DEFERRED_STRUCT_PAGE_INIT
is not set. So, one way to fix this issue, is to remove this "#ifdef"
(I have not checked for dependancies), but this is simply addressing
symptom, not fixing the actual issue.

Thus, we have a "struct page" backing memory for this pfn, but we have
not initialized it. For some reason memmap_init_zone() decided to skip
it, and I am not sure why. Looking at the code we skip initializing
if:
!early_pfn_valid(pfn)) aka !pfn_valid(pfn) and if !early_pfn_in_nid(pfn, nid).

I suspect, this has something to do with !pfn_valid(pfn). But, without
having a machine on which I could reproduce this problem, I cannot
study it further to determine exactly why pfn is not valid.

Please replace !pfn_valid_within() with !pfn_valid() in
get_nid_for_pfn() and see if problem still happens. If it does not
happen, lets study the memory map, pgdata's start end, and the value
of this pfn.

Thank you,
Pasha

  reply	other threads:[~2018-04-07 14:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-31 21:02 [PATCH v2 0/2] optimize memory hotplug Pavel Tatashin
2018-01-31 21:02 ` Pavel Tatashin
2018-01-31 21:02 ` [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking Pavel Tatashin
2018-01-31 21:02   ` Pavel Tatashin
2018-02-02  4:12   ` kbuild test robot
2018-03-13 23:43   ` Sasha Levin
2018-03-14  0:38     ` Pavel Tatashin
2018-03-14  0:53       ` Sasha Levin
2018-03-14  1:02         ` Pavel Tatashin
2018-03-15 19:04         ` Pavel Tatashin
2018-03-15 20:43           ` Sasha Levin
2018-04-04  2:17             ` Pavel Tatashin
2018-04-05 13:49               ` Pavel Tatashin
2018-04-05 19:22                 ` Sasha Levin
2018-04-06 12:45                   ` Pavel Tatashin
2018-04-07 14:45                     ` Pavel Tatashin [this message]
2018-01-31 21:03 ` [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug Pavel Tatashin
2018-01-31 21:03   ` Pavel Tatashin
2018-02-02  4:48   ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGM2reYtCb2_czEt1M8KhFz+YxGq=iSGnTskC=FGNM4kXOiS5g@mail.gmail.com' \
    --to=pasha.tatashin@oracle.com \
    --cc=Alexander.Levin@microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@linux.vnet.ibm.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=steven.sistare@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.