All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Michal Hocko <mhocko@suse.com>,
	linux-ia64@vger.kernel.org, Linux-sh <linux-sh@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	platform-driver-x86@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>, Will Deacon <will@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-s390 <linux-s390@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michal Hocko <mhocko@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Eric Badger <ebadger@gigaio.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params
Date: Mon, 02 Mar 2020 20:26:45 +0000	[thread overview]
Message-ID: <CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com> (raw)
In-Reply-To: <8b13f6aa-77b7-a47d-1a49-b8e2f800ac9d@deltatee.com>

On Mon, Mar 2, 2020 at 10:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-02-29 3:44 p.m., Dan Williams wrote:
> > On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >> struct page mappings for IO memory. At present, these mappings are created
> >> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >> x86, an mtrr register will typically override this and force the cache
> >> type to be UC-. In the case firmware doesn't set this register it is
> >> effectively WB and will typically result in a machine check exception
> >> when it's accessed.
> >>
> >> Other arches are not currently likely to function correctly seeing they
> >> don't have any MTRR registers to fall back on.
> >>
> >> To solve this, provide a way to specify the pgprot value explicitly to
> >> arch_add_memory().
> >>
> >> Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple
> >> change to pass the pgprot_t down to their respective functions which set
> >> up the page tables. For x86_32, set the page tables explicitly using
> >> _set_memory_prot() (seeing they are already mapped). For ia64, s390 and
> >> sh, reject anything but PAGE_KERNEL settings -- this should be fine,
> >> for now, seeing these architectures don't support ZONE_DEVICE.
> >>
> >> A check in __add_pages() is also added to ensure the pgprot parameter was
> >> set for all arches.
> >>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> ---
> >>  arch/arm64/mm/mmu.c            | 3 ++-
> >>  arch/ia64/mm/init.c            | 3 +++
> >>  arch/powerpc/mm/mem.c          | 3 ++-
> >>  arch/s390/mm/init.c            | 3 +++
> >>  arch/sh/mm/init.c              | 3 +++
> >>  arch/x86/mm/init_32.c          | 5 +++++
> >>  arch/x86/mm/init_64.c          | 2 +-
> >>  include/linux/memory_hotplug.h | 2 ++
> >>  mm/memory_hotplug.c            | 5 ++++-
> >>  mm/memremap.c                  | 6 +++---
> >>  10 files changed, 28 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index ee37bca8aba8..ea3fa844a8a2 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >>
> >>         __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> -                            size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
> >> +                            size, params->pgprot, __pgd_pgtable_alloc,
> >> +                            flags);
> >>
> >>         memblock_clear_nomap(start, size);
> >>
> >> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> >> index 97bbc23ea1e3..d637b4ea3147 100644
> >> --- a/arch/ia64/mm/init.c
> >> +++ b/arch/ia64/mm/init.c
> >> @@ -676,6 +676,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (ret)
> >>                 printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> >> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> >> index 19b1da5d7eca..832412bc7fad 100644
> >> --- a/arch/powerpc/mm/mem.c
> >> +++ b/arch/powerpc/mm/mem.c
> >> @@ -138,7 +138,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> >>         resize_hpt_for_hotplug(memblock_phys_mem_size());
> >>
> >>         start = (unsigned long)__va(start);
> >> -       rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
> >> +       rc = create_section_mapping(start, start + size, nid,
> >> +                                   params->pgprot);
> >>         if (rc) {
> >>                 pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
> >>                         start, start + size, rc);
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index e9e4a7abd0cc..87b2d024e75a 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -277,6 +277,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         if (WARN_ON_ONCE(params->altmap))
> >>                 return -EINVAL;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         rc = vmem_add_mapping(start, size);
> >>         if (rc)
> >>                 return rc;
> >> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> >> index e5114c053364..b9de2d4fa57e 100644
> >> --- a/arch/sh/mm/init.c
> >> +++ b/arch/sh/mm/init.c
> >> @@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)
> >> +               return -EINVAL;
> >> +
> >>         /* We only have ZONE_NORMAL, so this is easy.. */
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (unlikely(ret))
> >> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> >> index e25a4218e6ff..96d8e4fb1cc8 100644
> >> --- a/arch/x86/mm/init_32.c
> >> +++ b/arch/x86/mm/init_32.c
> >> @@ -858,6 +858,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  {
> >>         unsigned long start_pfn = start >> PAGE_SHIFT;
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       ret = _set_memory_prot(start, nr_pages, params->pgprot);
> >
> > Perhaps a comment since it's not immediately obvious where the
> > PAGE_KERNEL prot was established, and perhaps add a conditional to
> > skip this call in the param->pgprot = PAGE_KERNEL case?
>
> Yes I can add the skip in the PAGE_KERNEL case. Though I'm not sure what
> you are asking for with regards to the comment. Just that pgprot is set
> by the caller usually to PAGE_KERNEL?

No, I'm reacting to this comment in the changelog "For x86_32, set the
page tables explicitly using _set_memory_prot() (seeing they are
already mapped)". You've done some investigation that
x86_32::arch_add_memory() expects the page tables to be already
established. I think that's worth capturing inline in the code for
other people doing cross-arch arch_add_memory() changes.

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-ia64@vger.kernel.org,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Linux-sh <linux-sh@vger.kernel.org>,
	platform-driver-x86@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Eric Badger <ebadger@gigaio.com>, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params
Date: Mon, 2 Mar 2020 12:26:45 -0800	[thread overview]
Message-ID: <CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com> (raw)
In-Reply-To: <8b13f6aa-77b7-a47d-1a49-b8e2f800ac9d@deltatee.com>

On Mon, Mar 2, 2020 at 10:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-02-29 3:44 p.m., Dan Williams wrote:
> > On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >> struct page mappings for IO memory. At present, these mappings are created
> >> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >> x86, an mtrr register will typically override this and force the cache
> >> type to be UC-. In the case firmware doesn't set this register it is
> >> effectively WB and will typically result in a machine check exception
> >> when it's accessed.
> >>
> >> Other arches are not currently likely to function correctly seeing they
> >> don't have any MTRR registers to fall back on.
> >>
> >> To solve this, provide a way to specify the pgprot value explicitly to
> >> arch_add_memory().
> >>
> >> Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple
> >> change to pass the pgprot_t down to their respective functions which set
> >> up the page tables. For x86_32, set the page tables explicitly using
> >> _set_memory_prot() (seeing they are already mapped). For ia64, s390 and
> >> sh, reject anything but PAGE_KERNEL settings -- this should be fine,
> >> for now, seeing these architectures don't support ZONE_DEVICE.
> >>
> >> A check in __add_pages() is also added to ensure the pgprot parameter was
> >> set for all arches.
> >>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> ---
> >>  arch/arm64/mm/mmu.c            | 3 ++-
> >>  arch/ia64/mm/init.c            | 3 +++
> >>  arch/powerpc/mm/mem.c          | 3 ++-
> >>  arch/s390/mm/init.c            | 3 +++
> >>  arch/sh/mm/init.c              | 3 +++
> >>  arch/x86/mm/init_32.c          | 5 +++++
> >>  arch/x86/mm/init_64.c          | 2 +-
> >>  include/linux/memory_hotplug.h | 2 ++
> >>  mm/memory_hotplug.c            | 5 ++++-
> >>  mm/memremap.c                  | 6 +++---
> >>  10 files changed, 28 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index ee37bca8aba8..ea3fa844a8a2 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >>
> >>         __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> -                            size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
> >> +                            size, params->pgprot, __pgd_pgtable_alloc,
> >> +                            flags);
> >>
> >>         memblock_clear_nomap(start, size);
> >>
> >> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> >> index 97bbc23ea1e3..d637b4ea3147 100644
> >> --- a/arch/ia64/mm/init.c
> >> +++ b/arch/ia64/mm/init.c
> >> @@ -676,6 +676,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (ret)
> >>                 printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> >> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> >> index 19b1da5d7eca..832412bc7fad 100644
> >> --- a/arch/powerpc/mm/mem.c
> >> +++ b/arch/powerpc/mm/mem.c
> >> @@ -138,7 +138,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> >>         resize_hpt_for_hotplug(memblock_phys_mem_size());
> >>
> >>         start = (unsigned long)__va(start);
> >> -       rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
> >> +       rc = create_section_mapping(start, start + size, nid,
> >> +                                   params->pgprot);
> >>         if (rc) {
> >>                 pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
> >>                         start, start + size, rc);
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index e9e4a7abd0cc..87b2d024e75a 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -277,6 +277,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         if (WARN_ON_ONCE(params->altmap))
> >>                 return -EINVAL;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         rc = vmem_add_mapping(start, size);
> >>         if (rc)
> >>                 return rc;
> >> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> >> index e5114c053364..b9de2d4fa57e 100644
> >> --- a/arch/sh/mm/init.c
> >> +++ b/arch/sh/mm/init.c
> >> @@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)
> >> +               return -EINVAL;
> >> +
> >>         /* We only have ZONE_NORMAL, so this is easy.. */
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (unlikely(ret))
> >> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> >> index e25a4218e6ff..96d8e4fb1cc8 100644
> >> --- a/arch/x86/mm/init_32.c
> >> +++ b/arch/x86/mm/init_32.c
> >> @@ -858,6 +858,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  {
> >>         unsigned long start_pfn = start >> PAGE_SHIFT;
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       ret = _set_memory_prot(start, nr_pages, params->pgprot);
> >
> > Perhaps a comment since it's not immediately obvious where the
> > PAGE_KERNEL prot was established, and perhaps add a conditional to
> > skip this call in the param->pgprot == PAGE_KERNEL case?
>
> Yes I can add the skip in the PAGE_KERNEL case. Though I'm not sure what
> you are asking for with regards to the comment. Just that pgprot is set
> by the caller usually to PAGE_KERNEL?

No, I'm reacting to this comment in the changelog "For x86_32, set the
page tables explicitly using _set_memory_prot() (seeing they are
already mapped)". You've done some investigation that
x86_32::arch_add_memory() expects the page tables to be already
established. I think that's worth capturing inline in the code for
other people doing cross-arch arch_add_memory() changes.

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-ia64@vger.kernel.org,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Linux-sh <linux-sh@vger.kernel.org>,
	platform-driver-x86@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy
Subject: Re: [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params
Date: Mon, 2 Mar 2020 12:26:45 -0800	[thread overview]
Message-ID: <CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com> (raw)
In-Reply-To: <8b13f6aa-77b7-a47d-1a49-b8e2f800ac9d@deltatee.com>

On Mon, Mar 2, 2020 at 10:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-02-29 3:44 p.m., Dan Williams wrote:
> > On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >> struct page mappings for IO memory. At present, these mappings are created
> >> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >> x86, an mtrr register will typically override this and force the cache
> >> type to be UC-. In the case firmware doesn't set this register it is
> >> effectively WB and will typically result in a machine check exception
> >> when it's accessed.
> >>
> >> Other arches are not currently likely to function correctly seeing they
> >> don't have any MTRR registers to fall back on.
> >>
> >> To solve this, provide a way to specify the pgprot value explicitly to
> >> arch_add_memory().
> >>
> >> Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple
> >> change to pass the pgprot_t down to their respective functions which set
> >> up the page tables. For x86_32, set the page tables explicitly using
> >> _set_memory_prot() (seeing they are already mapped). For ia64, s390 and
> >> sh, reject anything but PAGE_KERNEL settings -- this should be fine,
> >> for now, seeing these architectures don't support ZONE_DEVICE.
> >>
> >> A check in __add_pages() is also added to ensure the pgprot parameter was
> >> set for all arches.
> >>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> ---
> >>  arch/arm64/mm/mmu.c            | 3 ++-
> >>  arch/ia64/mm/init.c            | 3 +++
> >>  arch/powerpc/mm/mem.c          | 3 ++-
> >>  arch/s390/mm/init.c            | 3 +++
> >>  arch/sh/mm/init.c              | 3 +++
> >>  arch/x86/mm/init_32.c          | 5 +++++
> >>  arch/x86/mm/init_64.c          | 2 +-
> >>  include/linux/memory_hotplug.h | 2 ++
> >>  mm/memory_hotplug.c            | 5 ++++-
> >>  mm/memremap.c                  | 6 +++---
> >>  10 files changed, 28 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index ee37bca8aba8..ea3fa844a8a2 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >>
> >>         __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> -                            size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
> >> +                            size, params->pgprot, __pgd_pgtable_alloc,
> >> +                            flags);
> >>
> >>         memblock_clear_nomap(start, size);
> >>
> >> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> >> index 97bbc23ea1e3..d637b4ea3147 100644
> >> --- a/arch/ia64/mm/init.c
> >> +++ b/arch/ia64/mm/init.c
> >> @@ -676,6 +676,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (ret)
> >>                 printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> >> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> >> index 19b1da5d7eca..832412bc7fad 100644
> >> --- a/arch/powerpc/mm/mem.c
> >> +++ b/arch/powerpc/mm/mem.c
> >> @@ -138,7 +138,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> >>         resize_hpt_for_hotplug(memblock_phys_mem_size());
> >>
> >>         start = (unsigned long)__va(start);
> >> -       rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
> >> +       rc = create_section_mapping(start, start + size, nid,
> >> +                                   params->pgprot);
> >>         if (rc) {
> >>                 pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
> >>                         start, start + size, rc);
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index e9e4a7abd0cc..87b2d024e75a 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -277,6 +277,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         if (WARN_ON_ONCE(params->altmap))
> >>                 return -EINVAL;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         rc = vmem_add_mapping(start, size);
> >>         if (rc)
> >>                 return rc;
> >> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> >> index e5114c053364..b9de2d4fa57e 100644
> >> --- a/arch/sh/mm/init.c
> >> +++ b/arch/sh/mm/init.c
> >> @@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)
> >> +               return -EINVAL;
> >> +
> >>         /* We only have ZONE_NORMAL, so this is easy.. */
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (unlikely(ret))
> >> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> >> index e25a4218e6ff..96d8e4fb1cc8 100644
> >> --- a/arch/x86/mm/init_32.c
> >> +++ b/arch/x86/mm/init_32.c
> >> @@ -858,6 +858,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  {
> >>         unsigned long start_pfn = start >> PAGE_SHIFT;
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       ret = _set_memory_prot(start, nr_pages, params->pgprot);
> >
> > Perhaps a comment since it's not immediately obvious where the
> > PAGE_KERNEL prot was established, and perhaps add a conditional to
> > skip this call in the param->pgprot == PAGE_KERNEL case?
>
> Yes I can add the skip in the PAGE_KERNEL case. Though I'm not sure what
> you are asking for with regards to the comment. Just that pgprot is set
> by the caller usually to PAGE_KERNEL?

No, I'm reacting to this comment in the changelog "For x86_32, set the
page tables explicitly using _set_memory_prot() (seeing they are
already mapped)". You've done some investigation that
x86_32::arch_add_memory() expects the page tables to be already
established. I think that's worth capturing inline in the code for
other people doing cross-arch arch_add_memory() changes.

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	 Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-ia64@vger.kernel.org,
	 linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	 Linux-sh <linux-sh@vger.kernel.org>,
	platform-driver-x86@vger.kernel.org,
	 Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
	 David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Christoph Hellwig <hch@lst.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Eric Badger <ebadger@gigaio.com>,  Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params
Date: Mon, 2 Mar 2020 12:26:45 -0800	[thread overview]
Message-ID: <CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com> (raw)
In-Reply-To: <8b13f6aa-77b7-a47d-1a49-b8e2f800ac9d@deltatee.com>

On Mon, Mar 2, 2020 at 10:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-02-29 3:44 p.m., Dan Williams wrote:
> > On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >> struct page mappings for IO memory. At present, these mappings are created
> >> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >> x86, an mtrr register will typically override this and force the cache
> >> type to be UC-. In the case firmware doesn't set this register it is
> >> effectively WB and will typically result in a machine check exception
> >> when it's accessed.
> >>
> >> Other arches are not currently likely to function correctly seeing they
> >> don't have any MTRR registers to fall back on.
> >>
> >> To solve this, provide a way to specify the pgprot value explicitly to
> >> arch_add_memory().
> >>
> >> Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple
> >> change to pass the pgprot_t down to their respective functions which set
> >> up the page tables. For x86_32, set the page tables explicitly using
> >> _set_memory_prot() (seeing they are already mapped). For ia64, s390 and
> >> sh, reject anything but PAGE_KERNEL settings -- this should be fine,
> >> for now, seeing these architectures don't support ZONE_DEVICE.
> >>
> >> A check in __add_pages() is also added to ensure the pgprot parameter was
> >> set for all arches.
> >>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> ---
> >>  arch/arm64/mm/mmu.c            | 3 ++-
> >>  arch/ia64/mm/init.c            | 3 +++
> >>  arch/powerpc/mm/mem.c          | 3 ++-
> >>  arch/s390/mm/init.c            | 3 +++
> >>  arch/sh/mm/init.c              | 3 +++
> >>  arch/x86/mm/init_32.c          | 5 +++++
> >>  arch/x86/mm/init_64.c          | 2 +-
> >>  include/linux/memory_hotplug.h | 2 ++
> >>  mm/memory_hotplug.c            | 5 ++++-
> >>  mm/memremap.c                  | 6 +++---
> >>  10 files changed, 28 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index ee37bca8aba8..ea3fa844a8a2 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >>
> >>         __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> -                            size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
> >> +                            size, params->pgprot, __pgd_pgtable_alloc,
> >> +                            flags);
> >>
> >>         memblock_clear_nomap(start, size);
> >>
> >> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> >> index 97bbc23ea1e3..d637b4ea3147 100644
> >> --- a/arch/ia64/mm/init.c
> >> +++ b/arch/ia64/mm/init.c
> >> @@ -676,6 +676,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (ret)
> >>                 printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> >> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> >> index 19b1da5d7eca..832412bc7fad 100644
> >> --- a/arch/powerpc/mm/mem.c
> >> +++ b/arch/powerpc/mm/mem.c
> >> @@ -138,7 +138,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> >>         resize_hpt_for_hotplug(memblock_phys_mem_size());
> >>
> >>         start = (unsigned long)__va(start);
> >> -       rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
> >> +       rc = create_section_mapping(start, start + size, nid,
> >> +                                   params->pgprot);
> >>         if (rc) {
> >>                 pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
> >>                         start, start + size, rc);
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index e9e4a7abd0cc..87b2d024e75a 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -277,6 +277,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         if (WARN_ON_ONCE(params->altmap))
> >>                 return -EINVAL;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         rc = vmem_add_mapping(start, size);
> >>         if (rc)
> >>                 return rc;
> >> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> >> index e5114c053364..b9de2d4fa57e 100644
> >> --- a/arch/sh/mm/init.c
> >> +++ b/arch/sh/mm/init.c
> >> @@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)
> >> +               return -EINVAL;
> >> +
> >>         /* We only have ZONE_NORMAL, so this is easy.. */
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (unlikely(ret))
> >> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> >> index e25a4218e6ff..96d8e4fb1cc8 100644
> >> --- a/arch/x86/mm/init_32.c
> >> +++ b/arch/x86/mm/init_32.c
> >> @@ -858,6 +858,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  {
> >>         unsigned long start_pfn = start >> PAGE_SHIFT;
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       ret = _set_memory_prot(start, nr_pages, params->pgprot);
> >
> > Perhaps a comment since it's not immediately obvious where the
> > PAGE_KERNEL prot was established, and perhaps add a conditional to
> > skip this call in the param->pgprot == PAGE_KERNEL case?
>
> Yes I can add the skip in the PAGE_KERNEL case. Though I'm not sure what
> you are asking for with regards to the comment. Just that pgprot is set
> by the caller usually to PAGE_KERNEL?

No, I'm reacting to this comment in the changelog "For x86_32, set the
page tables explicitly using _set_memory_prot() (seeing they are
already mapped)". You've done some investigation that
x86_32::arch_add_memory() expects the page tables to be already
established. I think that's worth capturing inline in the code for
other people doing cross-arch arch_add_memory() changes.


WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Michal Hocko <mhocko@suse.com>,
	linux-ia64@vger.kernel.org, Linux-sh <linux-sh@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	platform-driver-x86@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>, Will Deacon <will@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-s390 <linux-s390@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michal Hocko <mhocko@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Eric Badger <ebadger@gigaio.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params
Date: Mon, 2 Mar 2020 12:26:45 -0800	[thread overview]
Message-ID: <CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com> (raw)
In-Reply-To: <8b13f6aa-77b7-a47d-1a49-b8e2f800ac9d@deltatee.com>

On Mon, Mar 2, 2020 at 10:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-02-29 3:44 p.m., Dan Williams wrote:
> > On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >> struct page mappings for IO memory. At present, these mappings are created
> >> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >> x86, an mtrr register will typically override this and force the cache
> >> type to be UC-. In the case firmware doesn't set this register it is
> >> effectively WB and will typically result in a machine check exception
> >> when it's accessed.
> >>
> >> Other arches are not currently likely to function correctly seeing they
> >> don't have any MTRR registers to fall back on.
> >>
> >> To solve this, provide a way to specify the pgprot value explicitly to
> >> arch_add_memory().
> >>
> >> Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple
> >> change to pass the pgprot_t down to their respective functions which set
> >> up the page tables. For x86_32, set the page tables explicitly using
> >> _set_memory_prot() (seeing they are already mapped). For ia64, s390 and
> >> sh, reject anything but PAGE_KERNEL settings -- this should be fine,
> >> for now, seeing these architectures don't support ZONE_DEVICE.
> >>
> >> A check in __add_pages() is also added to ensure the pgprot parameter was
> >> set for all arches.
> >>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> ---
> >>  arch/arm64/mm/mmu.c            | 3 ++-
> >>  arch/ia64/mm/init.c            | 3 +++
> >>  arch/powerpc/mm/mem.c          | 3 ++-
> >>  arch/s390/mm/init.c            | 3 +++
> >>  arch/sh/mm/init.c              | 3 +++
> >>  arch/x86/mm/init_32.c          | 5 +++++
> >>  arch/x86/mm/init_64.c          | 2 +-
> >>  include/linux/memory_hotplug.h | 2 ++
> >>  mm/memory_hotplug.c            | 5 ++++-
> >>  mm/memremap.c                  | 6 +++---
> >>  10 files changed, 28 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index ee37bca8aba8..ea3fa844a8a2 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >>
> >>         __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> -                            size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
> >> +                            size, params->pgprot, __pgd_pgtable_alloc,
> >> +                            flags);
> >>
> >>         memblock_clear_nomap(start, size);
> >>
> >> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> >> index 97bbc23ea1e3..d637b4ea3147 100644
> >> --- a/arch/ia64/mm/init.c
> >> +++ b/arch/ia64/mm/init.c
> >> @@ -676,6 +676,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (ret)
> >>                 printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> >> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> >> index 19b1da5d7eca..832412bc7fad 100644
> >> --- a/arch/powerpc/mm/mem.c
> >> +++ b/arch/powerpc/mm/mem.c
> >> @@ -138,7 +138,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> >>         resize_hpt_for_hotplug(memblock_phys_mem_size());
> >>
> >>         start = (unsigned long)__va(start);
> >> -       rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
> >> +       rc = create_section_mapping(start, start + size, nid,
> >> +                                   params->pgprot);
> >>         if (rc) {
> >>                 pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
> >>                         start, start + size, rc);
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index e9e4a7abd0cc..87b2d024e75a 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -277,6 +277,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         if (WARN_ON_ONCE(params->altmap))
> >>                 return -EINVAL;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         rc = vmem_add_mapping(start, size);
> >>         if (rc)
> >>                 return rc;
> >> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> >> index e5114c053364..b9de2d4fa57e 100644
> >> --- a/arch/sh/mm/init.c
> >> +++ b/arch/sh/mm/init.c
> >> @@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)
> >> +               return -EINVAL;
> >> +
> >>         /* We only have ZONE_NORMAL, so this is easy.. */
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (unlikely(ret))
> >> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> >> index e25a4218e6ff..96d8e4fb1cc8 100644
> >> --- a/arch/x86/mm/init_32.c
> >> +++ b/arch/x86/mm/init_32.c
> >> @@ -858,6 +858,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  {
> >>         unsigned long start_pfn = start >> PAGE_SHIFT;
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       ret = _set_memory_prot(start, nr_pages, params->pgprot);
> >
> > Perhaps a comment since it's not immediately obvious where the
> > PAGE_KERNEL prot was established, and perhaps add a conditional to
> > skip this call in the param->pgprot == PAGE_KERNEL case?
>
> Yes I can add the skip in the PAGE_KERNEL case. Though I'm not sure what
> you are asking for with regards to the comment. Just that pgprot is set
> by the caller usually to PAGE_KERNEL?

No, I'm reacting to this comment in the changelog "For x86_32, set the
page tables explicitly using _set_memory_prot() (seeing they are
already mapped)". You've done some investigation that
x86_32::arch_add_memory() expects the page tables to be already
established. I think that's worth capturing inline in the code for
other people doing cross-arch arch_add_memory() changes.

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Michal Hocko <mhocko@suse.com>,
	linux-ia64@vger.kernel.org, Linux-sh <linux-sh@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	platform-driver-x86@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>, Will Deacon <will@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-s390 <linux-s390@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michal Hocko <mhocko@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Eric Badger <ebadger@gigaio.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params
Date: Mon, 2 Mar 2020 12:26:45 -0800	[thread overview]
Message-ID: <CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com> (raw)
In-Reply-To: <8b13f6aa-77b7-a47d-1a49-b8e2f800ac9d@deltatee.com>

On Mon, Mar 2, 2020 at 10:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-02-29 3:44 p.m., Dan Williams wrote:
> > On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe <logang@deltatee.com> wrote:
> >>
> >> devm_memremap_pages() is currently used by the PCI P2PDMA code to create
> >> struct page mappings for IO memory. At present, these mappings are created
> >> with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on
> >> x86, an mtrr register will typically override this and force the cache
> >> type to be UC-. In the case firmware doesn't set this register it is
> >> effectively WB and will typically result in a machine check exception
> >> when it's accessed.
> >>
> >> Other arches are not currently likely to function correctly seeing they
> >> don't have any MTRR registers to fall back on.
> >>
> >> To solve this, provide a way to specify the pgprot value explicitly to
> >> arch_add_memory().
> >>
> >> Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple
> >> change to pass the pgprot_t down to their respective functions which set
> >> up the page tables. For x86_32, set the page tables explicitly using
> >> _set_memory_prot() (seeing they are already mapped). For ia64, s390 and
> >> sh, reject anything but PAGE_KERNEL settings -- this should be fine,
> >> for now, seeing these architectures don't support ZONE_DEVICE.
> >>
> >> A check in __add_pages() is also added to ensure the pgprot parameter was
> >> set for all arches.
> >>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Michal Hocko <mhocko@suse.com>
> >> ---
> >>  arch/arm64/mm/mmu.c            | 3 ++-
> >>  arch/ia64/mm/init.c            | 3 +++
> >>  arch/powerpc/mm/mem.c          | 3 ++-
> >>  arch/s390/mm/init.c            | 3 +++
> >>  arch/sh/mm/init.c              | 3 +++
> >>  arch/x86/mm/init_32.c          | 5 +++++
> >>  arch/x86/mm/init_64.c          | 2 +-
> >>  include/linux/memory_hotplug.h | 2 ++
> >>  mm/memory_hotplug.c            | 5 ++++-
> >>  mm/memremap.c                  | 6 +++---
> >>  10 files changed, 28 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index ee37bca8aba8..ea3fa844a8a2 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >>
> >>         __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> -                            size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
> >> +                            size, params->pgprot, __pgd_pgtable_alloc,
> >> +                            flags);
> >>
> >>         memblock_clear_nomap(start, size);
> >>
> >> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> >> index 97bbc23ea1e3..d637b4ea3147 100644
> >> --- a/arch/ia64/mm/init.c
> >> +++ b/arch/ia64/mm/init.c
> >> @@ -676,6 +676,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (ret)
> >>                 printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> >> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> >> index 19b1da5d7eca..832412bc7fad 100644
> >> --- a/arch/powerpc/mm/mem.c
> >> +++ b/arch/powerpc/mm/mem.c
> >> @@ -138,7 +138,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> >>         resize_hpt_for_hotplug(memblock_phys_mem_size());
> >>
> >>         start = (unsigned long)__va(start);
> >> -       rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
> >> +       rc = create_section_mapping(start, start + size, nid,
> >> +                                   params->pgprot);
> >>         if (rc) {
> >>                 pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
> >>                         start, start + size, rc);
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index e9e4a7abd0cc..87b2d024e75a 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -277,6 +277,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         if (WARN_ON_ONCE(params->altmap))
> >>                 return -EINVAL;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
> >> +               return -EINVAL;
> >> +
> >>         rc = vmem_add_mapping(start, size);
> >>         if (rc)
> >>                 return rc;
> >> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> >> index e5114c053364..b9de2d4fa57e 100644
> >> --- a/arch/sh/mm/init.c
> >> +++ b/arch/sh/mm/init.c
> >> @@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >>         int ret;
> >>
> >> +       if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot)
> >> +               return -EINVAL;
> >> +
> >>         /* We only have ZONE_NORMAL, so this is easy.. */
> >>         ret = __add_pages(nid, start_pfn, nr_pages, params);
> >>         if (unlikely(ret))
> >> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> >> index e25a4218e6ff..96d8e4fb1cc8 100644
> >> --- a/arch/x86/mm/init_32.c
> >> +++ b/arch/x86/mm/init_32.c
> >> @@ -858,6 +858,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  {
> >>         unsigned long start_pfn = start >> PAGE_SHIFT;
> >>         unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       ret = _set_memory_prot(start, nr_pages, params->pgprot);
> >
> > Perhaps a comment since it's not immediately obvious where the
> > PAGE_KERNEL prot was established, and perhaps add a conditional to
> > skip this call in the param->pgprot == PAGE_KERNEL case?
>
> Yes I can add the skip in the PAGE_KERNEL case. Though I'm not sure what
> you are asking for with regards to the comment. Just that pgprot is set
> by the caller usually to PAGE_KERNEL?

No, I'm reacting to this comment in the changelog "For x86_32, set the
page tables explicitly using _set_memory_prot() (seeing they are
already mapped)". You've done some investigation that
x86_32::arch_add_memory() expects the page tables to be already
established. I think that's worth capturing inline in the code for
other people doing cross-arch arch_add_memory() changes.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-03-02 20:26 UTC|newest]

Thread overview: 148+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-21 18:24 [PATCH v3 0/7] Allow setting caching mode in arch_add_memory() for P2PDMA Logan Gunthorpe
2020-02-21 18:24 ` Logan Gunthorpe
2020-02-21 18:24 ` Logan Gunthorpe
2020-02-21 18:24 ` Logan Gunthorpe
2020-02-21 18:24 ` [PATCH v3 1/7] mm/memory_hotplug: Drop the flags field from struct mhp_restrictions Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-28 21:31   ` Dan Williams
2020-02-28 21:31     ` Dan Williams
2020-02-28 21:31     ` Dan Williams
2020-02-28 21:31     ` Dan Williams
2020-02-28 21:31     ` Dan Williams
2020-02-28 21:31     ` Dan Williams
2020-03-03  9:50   ` Michal Hocko
2020-03-03  9:50     ` Michal Hocko
2020-03-03  9:50     ` Michal Hocko
2020-03-03  9:50     ` Michal Hocko
2020-02-21 18:24 ` [PATCH v3 2/7] mm/memory_hotplug: Rename mhp_restrictions to mhp_params Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-24  9:11   ` David Hildenbrand
2020-02-24  9:11     ` David Hildenbrand
2020-02-24  9:11     ` David Hildenbrand
2020-02-24  9:11     ` David Hildenbrand
2020-02-29 20:44   ` Dan Williams
2020-02-29 20:44     ` Dan Williams
2020-02-29 20:44     ` Dan Williams
2020-02-29 20:44     ` Dan Williams
2020-02-29 20:44     ` Dan Williams
2020-02-29 20:44     ` Dan Williams
2020-03-03  9:50   ` Michal Hocko
2020-03-03  9:50     ` Michal Hocko
2020-03-03  9:50     ` Michal Hocko
2020-03-03  9:50     ` Michal Hocko
2020-02-21 18:24 ` [PATCH v3 3/7] x86/mm: Thread pgprot_t through init_memory_mapping() Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-21 18:24   ` Logan Gunthorpe
2020-02-29 22:37   ` Dan Williams
2020-02-29 22:37     ` Dan Williams
2020-02-29 22:37     ` Dan Williams
2020-02-29 22:37     ` Dan Williams
2020-02-29 22:37     ` Dan Williams
2020-02-29 22:37     ` Dan Williams
2020-03-03  9:52   ` Michal Hocko
2020-03-03  9:52     ` Michal Hocko
2020-03-03  9:52     ` Michal Hocko
2020-03-03  9:52     ` Michal Hocko
2020-03-03  9:52     ` Michal Hocko
2020-02-21 18:25 ` [PATCH v3 4/7] x86/mm: Introduce _set_memory_prot() Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-29 22:33   ` Dan Williams
2020-02-29 22:33     ` Dan Williams
2020-02-29 22:33     ` Dan Williams
2020-02-29 22:33     ` Dan Williams
2020-02-29 22:33     ` Dan Williams
2020-02-29 22:33     ` Dan Williams
2020-03-02 18:46     ` Logan Gunthorpe
2020-03-02 18:46       ` Logan Gunthorpe
2020-03-02 18:46       ` Logan Gunthorpe
2020-03-02 18:46       ` Logan Gunthorpe
2020-03-02 18:46       ` Logan Gunthorpe
2020-02-21 18:25 ` [PATCH v3 5/7] powerpc/mm: Thread pgprot_t through create_section_mapping() Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25 ` [PATCH v3 6/7] mm/memory_hotplug: Add pgprot_t to mhp_params Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-24  9:26   ` David Hildenbrand
2020-02-24  9:26     ` David Hildenbrand
2020-02-24  9:26     ` David Hildenbrand
2020-02-24  9:26     ` David Hildenbrand
2020-02-29 22:44   ` Dan Williams
2020-02-29 22:44     ` Dan Williams
2020-02-29 22:44     ` Dan Williams
2020-02-29 22:44     ` Dan Williams
2020-02-29 22:44     ` Dan Williams
2020-02-29 22:44     ` Dan Williams
2020-03-02 18:55     ` Logan Gunthorpe
2020-03-02 18:55       ` Logan Gunthorpe
2020-03-02 18:55       ` Logan Gunthorpe
2020-03-02 18:55       ` Logan Gunthorpe
2020-03-02 18:55       ` Logan Gunthorpe
2020-03-02 20:26       ` Dan Williams [this message]
2020-03-02 20:26         ` Dan Williams
2020-03-02 20:26         ` Dan Williams
2020-03-02 20:26         ` Dan Williams
2020-03-02 20:26         ` Dan Williams
2020-03-02 20:26         ` Dan Williams
2020-02-21 18:25 ` [PATCH v3 7/7] mm/memremap: Set caching mode for PCI P2PDMA memory to WC Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-21 18:25   ` Logan Gunthorpe
2020-02-29 22:47   ` Dan Williams
2020-02-29 22:47     ` Dan Williams
2020-02-29 22:47     ` Dan Williams
2020-02-29 22:47     ` Dan Williams
2020-02-29 22:47     ` Dan Williams
2020-02-29 22:47     ` Dan Williams
2020-03-02 21:20     ` Logan Gunthorpe
2020-03-02 21:20       ` Logan Gunthorpe
2020-03-02 21:20       ` Logan Gunthorpe
2020-03-02 21:20       ` Logan Gunthorpe
2020-03-02 21:20       ` Logan Gunthorpe
2020-02-27 17:17 ` [PATCH v3 0/7] Allow setting caching mode in arch_add_memory() for P2PDMA Jason Gunthorpe
2020-02-27 17:17   ` Jason Gunthorpe
2020-02-27 17:17   ` Jason Gunthorpe
2020-02-27 17:17   ` Jason Gunthorpe
2020-02-27 17:17   ` Jason Gunthorpe
2020-02-27 17:21   ` Logan Gunthorpe
2020-02-27 17:21     ` Logan Gunthorpe
2020-02-27 17:21     ` Logan Gunthorpe
2020-02-27 17:21     ` Logan Gunthorpe
2020-02-27 17:21     ` Logan Gunthorpe
2020-02-27 17:43     ` Jason Gunthorpe
2020-02-27 17:43       ` Jason Gunthorpe
2020-02-27 17:43       ` Jason Gunthorpe
2020-02-27 17:43       ` Jason Gunthorpe
2020-02-27 17:43       ` Jason Gunthorpe
2020-02-27 17:54       ` Logan Gunthorpe
2020-02-27 17:54         ` Logan Gunthorpe
2020-02-27 17:54         ` Logan Gunthorpe
2020-02-27 17:54         ` Logan Gunthorpe
2020-02-27 17:54         ` Logan Gunthorpe
2020-02-27 17:55       ` Dan Williams
2020-02-27 17:55         ` Dan Williams
2020-02-27 17:55         ` Dan Williams
2020-02-27 17:55         ` Dan Williams
2020-02-27 17:55         ` Dan Williams
2020-02-27 17:55         ` Dan Williams
2020-02-27 18:03         ` Jason Gunthorpe
2020-02-27 18:03           ` Jason Gunthorpe
2020-02-27 18:03           ` Jason Gunthorpe
2020-02-27 18:03           ` Jason Gunthorpe
2020-02-27 18:03           ` Jason Gunthorpe
2020-02-27 18:08           ` Dan Williams
2020-02-27 18:08             ` Dan Williams
2020-02-27 18:08             ` Dan Williams
2020-02-27 18:08             ` Dan Williams
2020-02-27 18:08             ` Dan Williams
2020-02-27 18:08             ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4g3c0rhW0eofG6FFMVNVPiw5fxP7LUpFJ2OYdLCAabZ1Q@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=ebadger@gigaio.com \
    --cc=hch@lst.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=logang@deltatee.com \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.