linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robert Richter <robert.richter@cavium.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <rric@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	David Daney <david.daney@cavium.com>,
	Hanjun Guo <hanjun.guo@linaro.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section
Date: Fri, 28 Oct 2016 11:19:05 +0200	[thread overview]
Message-ID: <20161028091905.GM22012@rric.localdomain> (raw)
In-Reply-To: <20161027160136.GD24290@arm.com>

On 27.10.16 17:01:36, Will Deacon wrote:
> Hi Robert,
> 
> On Mon, Oct 17, 2016 at 08:58:01PM +0200, Robert Richter wrote:
> > Mark, Will, any opinion here?
> 
> Having looking at this, I'm inclined to agree with you; pfn_valid() is
> all about whether the underlying mem_map (struct page *) entry exists,
> not about whether the page is mappable or not.
> 
> That said, setting the zone for pages representing NOMAP memory feels
> like a slippery slope to losing information about them being NOMAP in
> the first place and the whole problem getting out-of-hand. Whilst I'm
> happy for pfn_valid() to return true (in the sense that we're within
> bounds of mem_map etc), I'm less happy that we're also saying that the
> struct page contains useful information, such as the zone and the node
> information, which is then subsequently used by the NUMA code.

Let's see it in a different way, pfns and the struct page assigned to
each of it is about *physical* memory. The system knows all the
memory, some is free, some reserved and some marked NOMAP. Regardless
of the mapping of the page the mm code maintains and uses that
information.

There are assumptions on validity and checks in the code that now
cause problems due to partly or non-existing data about nomap pages.
This inconsistency is dangerous since a problem may occur any time
then the page area is accessed first, thus a system may crash randomly
depending on the memory access. Luckily, in my case it triggered
reproducible while loading initrd during boot.

I also think that this is not only NUMA related. E.g. the following
bug report is probably also related:

 https://bugzilla.redhat.com/show_bug.cgi?id=1387793

> On top of that, pfn_valid is used in other places as a coarse "is this
> memory?" check, and will cause things like ioremap to fail whereas it
> wouldn't at the moment.

IMO this is a misuse of pfn_valid() that needs to be fixed with
additional checks, e.g. traversing memblocks.

> It feels to me like NOMAP memory is a new type
> of memory where there *is* a struct page, but it shouldn't be used for
> anything.

IMO, a NOMAP page should just be handled like a reserved page except
that the page is marked reserved. See free_low_memory_core_early().
Thus, NOMAP pages are not in the free pages list or set to reserved.
It is simply not available for mapping at all. Isn't that exactly what
it should be?

I also did not yet understand the benefit of the differentiation
between NOMAP and reserved and the original motivation for its
implementation. I looked through the mail threads but could not find
any hint. The only difference I see now is that it is not listed as a
reserved page, but as long as it is not freed it should behave the
same. I remember the case to handle memory different (coherency,
etc.), but are not sure here. Ard, could you explain this?

> I don't think pfn_valid can describe that, given the way it's
> currently used, and flipping the logic is just likely to move the problem
> elsewhere.
> 
> What options do we have for fixing this in the NUMA code?

Out of my mind:

1) Treat NOMAP pages same as reserved pages (my patch).

2) Change mm code to allow arch specific early_pfn_valid().

3) Fix mm code to only access stuct page (of a zone) if pfn_valid() is
   true.

There can be more alternatives. IMO:

 * We shouldn't touch generic mm code.

 * We should maintain a valid struct page for all pages in a sections.

 * We should only traverse memblock where really necessary (arm64
   only).

 * I don't think this problem is numa specific.

-Robert

  reply	other threads:[~2016-10-28  9:35 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-06  9:52 [PATCH] arm64: mm: Fix memmap to be initialized for the entire section Robert Richter
2016-10-06 10:00 ` Ard Biesheuvel
2016-10-06 16:11   ` Robert Richter
2016-10-17 18:58     ` Robert Richter
2016-10-27 16:01       ` Will Deacon
2016-10-28  9:19         ` Robert Richter [this message]
2016-11-07 21:05           ` Will Deacon
2016-11-09 19:51             ` Robert Richter
2016-11-17 14:25               ` Will Deacon
2016-11-17 15:18                 ` Robert Richter
2016-11-20 17:07                   ` Ard Biesheuvel
2016-11-23 21:15                     ` Robert Richter
2016-11-23 21:25                       ` Ard Biesheuvel
2016-11-24 13:42                         ` Robert Richter
2016-11-24 13:44                           ` Ard Biesheuvel
2016-11-24 13:51                             ` Robert Richter
2016-11-24 13:58                               ` Ard Biesheuvel
2016-11-24 14:11                                 ` Robert Richter
2016-11-24 14:23                                   ` Ard Biesheuvel
2016-11-24 15:09                                     ` Robert Richter
2016-11-24 19:26                                       ` Robert Richter
2016-11-24 19:42                                         ` Ard Biesheuvel
2016-11-25 11:29                                           ` Robert Richter
2016-11-25 12:28                                             ` Ard Biesheuvel
2016-11-25 17:01                                               ` Ard Biesheuvel
2016-10-18 10:18     ` Mark Rutland
2016-10-18 15:02       ` Robert Richter
2016-10-10 15:33 ` David Daney
2016-11-01 16:55 ` Robert Richter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161028091905.GM22012@rric.localdomain \
    --to=robert.richter@cavium.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=catalin.marinas@arm.com \
    --cc=david.daney@cavium.com \
    --cc=hanjun.guo@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=rric@kernel.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).