linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [RFC PATCH 1/4] libnvdimm/namespace: Make namespace size validation arch dependent
Date: Mon, 28 Oct 2019 22:30:01 -0700	[thread overview]
Message-ID: <CAPcyv4gMnSe26QfSBABx0zj3XuFqy=K1XaGnmE3h3sP3Y76nRw@mail.gmail.com> (raw)
In-Reply-To: <6f85f4af-788d-aaef-db64-ab8d3faf6b1b@linux.ibm.com>

On Mon, Oct 28, 2019 at 9:35 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 10/29/19 4:38 AM, Dan Williams wrote:
> > On Mon, Oct 28, 2019 at 2:48 AM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> The page size used to map the namespace is arch dependent. For example
> >> architectures like ppc64 use 16MB page size for direct-mapping. If the namespace
> >> size is not aligned to the mapping page size, we can observe kernel crash
> >> during namespace init and destroy.
> >>
> >> This is due to kernel doing partial map/unmap of the resource range
> >>
> >> BUG: Unable to handle kernel data access at 0xc001000406000000
> >> Faulting instruction address: 0xc000000000090790
> >> NIP [c000000000090790] arch_add_memory+0xc0/0x130
> >> LR [c000000000090744] arch_add_memory+0x74/0x130
> >> Call Trace:
> >>   arch_add_memory+0x74/0x130 (unreliable)
> >>   memremap_pages+0x74c/0xa30
> >>   devm_memremap_pages+0x3c/0xa0
> >>   pmem_attach_disk+0x188/0x770
> >>   nvdimm_bus_probe+0xd8/0x470
> >>   really_probe+0x148/0x570
> >>   driver_probe_device+0x19c/0x1d0
> >>   device_driver_attach+0xcc/0x100
> >>   bind_store+0x134/0x1c0
> >>   drv_attr_store+0x44/0x60
> >>   sysfs_kf_write+0x74/0xc0
> >>   kernfs_fop_write+0x1b4/0x290
> >>   __vfs_write+0x3c/0x70
> >>   vfs_write+0xd0/0x260
> >>   ksys_write+0xdc/0x130
> >>   system_call+0x5c/0x68
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >> ---
> >>   arch/arm64/mm/flush.c     | 11 +++++++++++
> >>   arch/powerpc/lib/pmem.c   | 21 +++++++++++++++++++--
> >>   arch/x86/mm/pageattr.c    | 12 ++++++++++++
> >>   include/linux/libnvdimm.h |  1 +
> >>   4 files changed, 43 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> >> index ac485163a4a7..90c54c600023 100644
> >> --- a/arch/arm64/mm/flush.c
> >> +++ b/arch/arm64/mm/flush.c
> >> @@ -91,4 +91,15 @@ void arch_invalidate_pmem(void *addr, size_t size)
> >>          __inval_dcache_area(addr, size);
> >>   }
> >>   EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >> +
> >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned long size)
> >> +{
> >> +       u32 remainder;
> >> +
> >> +       div_u64_rem(size, PAGE_SIZE * ndr_mappings, &remainder);
> >> +       if (remainder)
> >> +               return PAGE_SIZE * ndr_mappings;
> >> +       return 0;
> >> +}
> >> +EXPORT_SYMBOL_GPL(arch_validate_namespace_size);
> >>   #endif
> >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> >> index 377712e85605..2e661a08dae5 100644
> >> --- a/arch/powerpc/lib/pmem.c
> >> +++ b/arch/powerpc/lib/pmem.c
> >> @@ -17,14 +17,31 @@ void arch_wb_cache_pmem(void *addr, size_t size)
> >>          unsigned long start = (unsigned long) addr;
> >>          flush_dcache_range(start, start + size);
> >>   }
> >> -EXPORT_SYMBOL(arch_wb_cache_pmem);
> >> +EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
> >>
> >>   void arch_invalidate_pmem(void *addr, size_t size)
> >>   {
> >>          unsigned long start = (unsigned long) addr;
> >>          flush_dcache_range(start, start + size);
> >>   }
> >> -EXPORT_SYMBOL(arch_invalidate_pmem);
> >> +EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >> +
> >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned long size)
> >> +{
> >> +       u32 remainder;
> >> +       unsigned long linear_map_size;
> >> +
> >> +       if (radix_enabled())
> >> +               linear_map_size = PAGE_SIZE;
> >> +       else
> >> +               linear_map_size = (1UL << mmu_psize_defs[mmu_linear_psize].shift);
> >
> > This seems more a "supported_alignments" problem, and less a namespace
> > size or PAGE_SIZE problem, because if the starting address is
> > misaligned this size validation can still succeed when it shouldn't.
> >
>
>
> Isn't supported_alignments an indication of how user want the namespace
> to be mapped to applications?  Ie, with the above restrictions we can
> still do both 64K and 16M mapping of the namespace to userspace.

True, for the pfn device and the device-dax mapping size, but I'm
suggesting adding another instance of alignment control at the raw
namespace level. That would need to be disconnected from the
device-dax page mapping granularity.

>
> Also for supported alignment the huge page mapping is further  dependent
> on the THP feature.
>
> The restrictions here are mostly w.r.t the direct-mapping page size used
> by some architecture.

Right, that's a base requirement for the namespace that can be
independent of the user page mapping size for device-dax.

  reply	other threads:[~2019-10-29  5:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-28  9:48 [RFC PATCH 1/4] libnvdimm/namespace: Make namespace size validation arch dependent Aneesh Kumar K.V
2019-10-28  9:48 ` [RFC PATCH 2/4] libnvdimm/namespace: Disable the region if the namespace size is not aligned correctly Aneesh Kumar K.V
2019-10-28  9:48 ` [RFC PATCH 3/4] libnvdimm/namespace: Use direct-mapping page size to validate namespace size Aneesh Kumar K.V
2019-10-28  9:48 ` [RFC PATCH 4/4] libnvdimm/namespace: Add debug check while initializing namespace resource size Aneesh Kumar K.V
2019-10-28 21:21 ` [RFC PATCH 1/4] libnvdimm/namespace: Make namespace size validation arch dependent Ira Weiny
2019-10-29  4:34   ` Aneesh Kumar K.V
2019-10-28 23:08 ` Dan Williams
2019-10-29  4:34   ` Aneesh Kumar K.V
2019-10-29  5:30     ` Dan Williams [this message]
2019-10-31  5:35       ` Aneesh Kumar K.V
2019-10-31  6:30         ` Dan Williams
2019-10-31  8:37           ` Aneesh Kumar K.V
2019-10-31 15:53             ` Dan Williams
2019-11-06 10:44           ` Aneesh Kumar K.V
2019-11-16 12:15             ` Aneesh Kumar K.V
2019-11-16 18:50               ` Dan Williams
2019-11-18  9:51                 ` Aneesh Kumar K.V
2019-11-19 17:58                   ` Dan Williams
2019-11-20  3:19                     ` Aneesh Kumar K.V
2019-11-20  3:46                       ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4gMnSe26QfSBABx0zj3XuFqy=K1XaGnmE3h3sP3Y76nRw@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).