linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, platform-driver-x86@vger.kernel.org,
	linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 5/6] mm, memory_hotplug: Provide argument for the pgprot_t in arch_add_memory()
Date: Tue, 10 Dec 2019 10:56:15 +0100	[thread overview]
Message-ID: <20191210095615.GB10404@dhcp22.suse.cz> (raw)
In-Reply-To: <e0f9c1e1-4968-f48f-ec09-853fc8fc779b@deltatee.com>

On Mon 09-12-19 14:24:22, Logan Gunthorpe wrote:
> 
> 
> On 2019-12-09 1:41 p.m., Michal Hocko wrote:
> > On Mon 09-12-19 13:24:19, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-12-09 12:23 p.m., David Hildenbrand wrote:
> >>> On 09.12.19 20:13, Logan Gunthorpe wrote:
> >>>> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >>>> struct page mappings for IO memory. At present, these mappings are created
> >>>> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >>>> x86, an mtrr register will typically override this and force the cache
> >>>> type to be UC-. In the case firmware doesn't set this register it is
> >>>> effectively WB and will typically result in a machine check exception
> >>>> when it's accessed.
> >>>>
> >>>> Other arches are not currently likely to function correctly seeing they
> >>>> don't have any MTRR registers to fall back on.
> >>>>
> >>>> To solve this, add an argument to arch_add_memory() to explicitly
> >>>> set the pgprot value to a specific value.
> >>>>
> >>>> Of the arches that support MEMORY_HOTPLUG: x86_64, s390 and arm64 is a
> >>>> simple change to pass the pgprot_t down to their respective functions
> >>>> which set up the page tables. For x86_32, set the page tables explicitly
> >>>> using _set_memory_prot() (seeing they are already mapped). For sh, reject
> >>>> anything but PAGE_KERNEL settings -- this should be fine, for now, seeing
> >>>> sh doesn't support ZONE_DEVICE anyway.
> >>>>
> >>>> Cc: Dan Williams <dan.j.williams@intel.com>
> >>>> Cc: David Hildenbrand <david@redhat.com>
> >>>> Cc: Michal Hocko <mhocko@suse.com>
> >>>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >>>> ---
> >>>>  arch/arm64/mm/mmu.c            | 4 ++--
> >>>>  arch/ia64/mm/init.c            | 5 ++++-
> >>>>  arch/powerpc/mm/mem.c          | 4 ++--
> >>>>  arch/s390/mm/init.c            | 4 ++--
> >>>>  arch/sh/mm/init.c              | 5 ++++-
> >>>>  arch/x86/mm/init_32.c          | 7 ++++++-
> >>>>  arch/x86/mm/init_64.c          | 4 ++--
> >>>>  include/linux/memory_hotplug.h | 2 +-
> >>>>  mm/memory_hotplug.c            | 2 +-
> >>>>  mm/memremap.c                  | 2 +-
> >>>>  10 files changed, 25 insertions(+), 14 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >>>> index 60c929f3683b..48b65272df15 100644
> >>>> --- a/arch/arm64/mm/mmu.c
> >>>> +++ b/arch/arm64/mm/mmu.c
> >>>> @@ -1050,7 +1050,7 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
> >>>>  }
> >>>>  
> >>>>  #ifdef CONFIG_MEMORY_HOTPLUG
> >>>> -int arch_add_memory(int nid, u64 start, u64 size,
> >>>> +int arch_add_memory(int nid, u64 start, u64 size, pgprot_t prot,
> >>>>  			struct mhp_restrictions *restrictions)
> >>>
> >>> Can we fiddle that into "struct mhp_restrictions" instead?
> >>
> >> Yes, if that's what people want, it's pretty trivial to do. I chose not
> >> to do it that way because it doesn't get passed down to add_pages() and
> >> it's not really a "restriction". If I don't hear any objections, I will
> >> do that for v2.
> > 
> > I do agree that restriction is not the best fit. But I consider prot
> > argument to complicate the API to all users even though it is not really
> > clear whether we are going to have many users really benefiting from it.
> > Look at the vmalloc API and try to find how many users of __vmalloc do
> > not use PAGE_KERNEL.
> > 
> > So I can see two options. One of them is to add arch_add_memory_prot
> > that would allow to have give and extra prot argument or simply call
> > an arch independent API to change the protection after arch_add_memory.
> > The later sounds like much less code. The memory shouldn't be in use by
> > anybody at that stage yet AFAIU. Maybe there even is an API like that.
> 
> Yes, well, we tried something like this by calling set_memory_wc()
> inside memremap_pages(); but on large bars (tens of GB) it was too slow
> (taking several seconds to complete) and on some hosts actually hit CPU
> watchdog errors.

Which looks like something to fix independently.

> So at the very least we'd have to add some cpu_relax() calls to that
> path. And it's also the case that set_memory_wc() is x86 only right now.
> So we'd have to create a new general interface to walk and fixup page
> tables for all arches.
> 
> But, in my opinion, setting up all those page tables twice is too large
> of an overhead and it's better to just add them correctly the first
> time. The changes I propose to do this aren't really a lot of code and
> probably less than creating a new interface for all arches.

OK, fair enough. Then I would suggest going with arch_add_memory_prot
then unless there is a wider disagreement witht that.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2019-12-10  9:56 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-09 19:13 [PATCH 0/6] Allow setting caching mode in arch_add_memory() for P2PDMA Logan Gunthorpe
2019-12-09 19:13 ` [PATCH 1/6] x86/mm: Thread pgprot_t through init_memory_mapping() Logan Gunthorpe
2019-12-09 19:13 ` [PATCH 2/6] x86/mm: Introduce _set_memory_prot() Logan Gunthorpe
2019-12-09 19:13 ` [PATCH 3/6] powerpc/mm: Thread pgprot_t through create_section_mapping() Logan Gunthorpe
2019-12-09 19:13 ` [PATCH 4/6] s390/mm: Thread pgprot_t through vmem_add_mapping() Logan Gunthorpe
2019-12-09 19:13 ` [PATCH 5/6] mm, memory_hotplug: Provide argument for the pgprot_t in arch_add_memory() Logan Gunthorpe
2019-12-09 19:23   ` David Hildenbrand
2019-12-09 20:24     ` Logan Gunthorpe
2019-12-09 20:41       ` Michal Hocko
2019-12-09 21:00         ` Dan Williams
2019-12-09 21:27           ` Logan Gunthorpe
2019-12-09 21:24         ` Logan Gunthorpe
2019-12-10  9:56           ` Michal Hocko [this message]
2019-12-09 20:43       ` Dan Williams
2019-12-09 20:52         ` David Hildenbrand
2019-12-10 10:04         ` Michal Hocko
2019-12-10 10:09           ` David Hildenbrand
2019-12-10 10:34             ` Michal Hocko
2019-12-10 11:25               ` David Hildenbrand
2019-12-10 23:52                 ` Logan Gunthorpe
2019-12-11  8:37                   ` Michal Hocko
2019-12-09 19:13 ` [PATCH 6/6] mm/memremap: Set caching mode for PCI P2PDMA memory to WC Logan Gunthorpe
2019-12-09 20:43 ` [PATCH 0/6] Allow setting caching mode in arch_add_memory() for P2PDMA Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191210095615.GB10404@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=logang@deltatee.com \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).