linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: David Hildenbrand <david@redhat.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>, Qian Cai <cai@lca.pw>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Michal Hocko <mhocko@kernel.org>,
	Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Ira Weiny <ira.weiny@intel.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v1] mm: Fix access of uninitialized memmaps in fs/proc/page.c
Date: Fri, 11 Oct 2019 00:11:25 +0000	[thread overview]
Message-ID: <20191011001124.GA17127@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <f0fcdacc-814b-49d6-78da-beeb1fa6b67a@redhat.com>

On Thu, Oct 10, 2019 at 09:30:01AM +0200, David Hildenbrand wrote:
> On 09.10.19 11:57, Naoya Horiguchi wrote:
> > Hi David,
> > 
> > On Wed, Oct 09, 2019 at 11:12:04AM +0200, David Hildenbrand wrote:
> >> There are various places where we access uninitialized memmaps, namely:
> >> - /proc/kpagecount
> >> - /proc/kpageflags
> >> - /proc/kpagecgroup
> >> - memory_failure() - which reuses stable_page_flags() from fs/proc/page.c
> > 
> > Ah right, memory_failure is another victim of this bug.
> > 
> >>
> >> We have initialized memmaps either when the section is online or when
> >> the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain
> >> garbage and in the worst case trigger kernel BUGs, especially with
> >> CONFIG_PAGE_POISONING.
> >>
> >> For example, not onlining a DIMM during boot and calling /proc/kpagecount
> >> with CONFIG_PAGE_POISONING:
> >> :/# cat /proc/kpagecount > tmp.test
> >> [   95.600592] BUG: unable to handle page fault for address: fffffffffffffffe
> >> [   95.601238] #PF: supervisor read access in kernel mode
> >> [   95.601675] #PF: error_code(0x0000) - not-present page
> >> [   95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
> >> [   95.602596] Oops: 0000 [#1] SMP NOPTI
> >> [   95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
> >> [   95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> >> [   95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0
> >> [   95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
> >> [   95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
> >> [   95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
> >> [   95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
> >> [   95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
> >> [   95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
> >> [   95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
> >> [   95.609924] FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
> >> [   95.610599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [   95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
> >> [   95.611686] Call Trace:
> >> [   95.611906]  proc_reg_read+0x3c/0x60
> >> [   95.612228]  vfs_read+0xc5/0x180
> >> [   95.612505]  ksys_read+0x68/0xe0
> >> [   95.612785]  do_syscall_64+0x5c/0xa0
> >> [   95.613092]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>
> >> Note that there are still two possible races as far as I can see:
> >> - pfn_to_online_page() succeeding but the memory getting offlined and
> >>   removed. get_online_mems() could help once we run into this.
> >> - pfn_zone_device() succeeding but the memmap not being fully
> >>   initialized yet. As the memmap is initialized outside of the memory
> >>   hoptlug lock, get_online_mems() can't help.
> >>
> >> Let's keep the existing interfaces working with ZONE_DEVICE memory. We
> >> can later come back and fix these rare races and eventually speed-up the
> >> ZONE_DEVICE detection.
> > 
> > Actually, Toshiki is writing code to refactor and optimize the pfn walking
> > part, where we find the pfn ranges covered by zone devices by running over
> > xarray pgmap_array and use the range info to reduce pointer dereferences
> > to speed up pfn walk. I hope he will share it soon.
> 
> AFAIKT, Michal is not a friend of special-casing PFN walkers in that
> way. We should have a mechanism to detect if a memmap was initialized
> without having to go via pgmap, special-casing. See my other mail where
> I draft one basic approach.

OK, so considering your v2 approach, we could have another pfn_to_page()
variant like pfn_to_zone_device_page(), where we check that a given pfn
belongs to the memory section backed by zone memory, then another check if
the pfn has initialized memmap or not, and return NULL if memmap not
initialied.  We'll try this approach then, but if you find problems/concerns,
please let me know.

Thanks,
Naoya Horiguchi


  reply	other threads:[~2019-10-11  0:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-09  9:12 [PATCH v1] mm: Fix access of uninitialized memmaps in fs/proc/page.c David Hildenbrand
2019-10-09  9:37 ` Michal Hocko
2019-10-09 10:19   ` David Hildenbrand
2019-10-09 11:24     ` Michal Hocko
2019-10-09 12:58       ` David Hildenbrand
2019-10-09 13:22         ` Michal Hocko
2019-10-09 13:24           ` David Hildenbrand
2019-10-09 13:29             ` Michal Hocko
2019-10-09 13:32               ` David Hildenbrand
2019-10-09  9:57 ` Naoya Horiguchi
2019-10-10  7:30   ` David Hildenbrand
2019-10-11  0:11     ` Naoya Horiguchi [this message]
2019-10-11  0:50       ` Naoya Horiguchi
2019-10-11  9:53         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191011001124.GA17127@hori.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=koct9i@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=logang@deltatee.com \
    --cc=mhocko@kernel.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=sfr@canb.auug.org.au \
    --cc=t-fukasawa@vx.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).