Linux-NVDIMM Archive on lore.kernel.org
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Jeff Moyer <jmoyer@redhat.com>, Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v2 2/4] libnvdimm/namespace: Enforce memremap_compat_align()
Date: Fri, 14 Feb 2020 22:25:28 +0530
Message-ID: <0843d8bf-c9e4-37c9-d9c2-ba4407daae21@linux.ibm.com>
In-Reply-To: <x49h7ztdsp5.fsf@segfault.boston.devel.redhat.com>

On 2/14/20 10:14 PM, Jeff Moyer wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> 
>> On Thu, Feb 13, 2020 at 1:55 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>>>
>>> Dan Williams <dan.j.williams@intel.com> writes:
>>>
>>>> The pmem driver on PowerPC crashes with the following signature when
>>>> instantiating misaligned namespaces that map their capacity via
>>>> memremap_pages().
>>>>
>>>>      BUG: Unable to handle kernel data access at 0xc001000406000000
>>>>      Faulting instruction address: 0xc000000000090790
>>>>      NIP [c000000000090790] arch_add_memory+0xc0/0x130
>>>>      LR [c000000000090744] arch_add_memory+0x74/0x130
>>>>      Call Trace:
>>>>       arch_add_memory+0x74/0x130 (unreliable)
>>>>       memremap_pages+0x74c/0xa30
>>>>       devm_memremap_pages+0x3c/0xa0
>>>>       pmem_attach_disk+0x188/0x770
>>>>       nvdimm_bus_probe+0xd8/0x470
>>>>
>>>> With the assumption that only memremap_pages() has alignment
>>>> constraints, enforce memremap_compat_align() for
>>>> pmem_should_map_pages(), nd_pfn, or nd_dax cases.
>>>>
>>>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> Cc: Jeff Moyer <jmoyer@redhat.com>
>>>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.stgit@dwillia2-desk3.amr.corp.intel.com
>>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>>> ---
>>>>   drivers/nvdimm/namespace_devs.c |   10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
>>>> index 032dc61725ff..aff1f32fdb4f 100644
>>>> --- a/drivers/nvdimm/namespace_devs.c
>>>> +++ b/drivers/nvdimm/namespace_devs.c
>>>> @@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
>>>>                return ERR_PTR(-ENODEV);
>>>>        }
>>>>
>>>> +     if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) {
>>>> +             struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
>>>> +             resource_size_t start = nsio->res.start;
>>>> +
>>>> +             if (!IS_ALIGNED(start | size, memremap_compat_align())) {
>>>> +                     dev_dbg(&ndns->dev, "misaligned, unable to map\n");
>>>> +                     return ERR_PTR(-EOPNOTSUPP);
>>>> +             }
>>>> +     }
>>>> +
>>>>        if (is_namespace_pmem(&ndns->dev)) {
>>>>                struct nd_namespace_pmem *nspm;
>>>>
>>>
>>> Actually, I take back my ack.  :) This prevents a previously working
>>> namespace from being successfully probed/setup.
>>
>> Do you have a test case handy? I can see a potential gap with a
>> namespace that used internal padding to fix up the alignment.
> 
> # ndctl list -v -n namespace0.0
> [
>    {
>      "dev":"namespace0.0",
>      "mode":"fsdax",
>      "map":"dev",
>      "size":52846133248,
>      "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e",
>      "raw_uuid":"aff43777-015b-493f-bbf9-7c7b0fe33519",
>      "sector_size":512,
>      "align":4096,
>      "blockdev":"pmem0",
>      "numa_node":0
>    }
> ]
> 
> # cat /sys/bus/nd/devices/region0/mappings
> 6
> 
> # grep namespace0.0 /proc/iomem
>    1860000000-24e0003fff : namespace0.0
> 
>> The goal of this check is to catch cases that are just going to fail
>> devm_memremap_pages(), and the expectation is that it could not have
>> worked before unless it was ported from another platform, or someone
>> flipped the page-size switch on PowerPC.
> 
> On x86, creation and probing of the namespace worked fine before this
> patch.  What *doesn't* work is creating another fsdax namespace after
> this one.  sector mode namespaces can still be created, though:
> 
> [
>    {
>      "dev":"namespace0.1",
>      "mode":"sector",
>      "size":53270768640,
>      "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7",
>      "sector_size":512,
>      "blockdev":"pmem0.1s"
>    },
> 
> # grep namespace0.1 /proc/iomem
>    24e0004000-3160007fff : namespace0.1
> 
>>> I thought we were only going to enforce the alignment for a newly
>>> created namespace?  This should only check whether the alignment
>>> works for the current platform.
>>
>> The model is a new default 16MB alignment is enforced at creation
>> time, but if you need to support previously created namespaces then
>> you can manually trim that alignment requirement to no less than
>> memremap_compat_align() because that's the point at which
>> devm_memremap_pages() will start failing or crashing.
> 
> The problem is that older kernels did not enforce alignment to
> SUBSECTION_SIZE.  We shouldn't prevent those namespaces from being
> accessed.  The probe itself will not cause the WARN_ON to trigger.
> Creating new namespaces at misaligned addresses could, but you've
> altered the free space allocation such that we won't hit that anymore.
> 
> If I drop this patch, the probe will still work, and allocating new
> namespaces will also work:
> 
> # ndctl list
> [
>    {
>      "dev":"namespace0.1",
>      "mode":"sector",
>      "size":53270768640,
>      "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7",
>      "sector_size":512,
>      "blockdev":"pmem0.1s"
>    },
>    {
>      "dev":"namespace0.0",
>      "mode":"fsdax",
>      "map":"dev",
>      "size":52846133248,
>      "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e",
>      "sector_size":512,
>      "align":4096,
>      "blockdev":"pmem0"
>    }
> ]
>   ndctl create-namespace -m fsdax -s 36g -r 0
> {
>    "dev":"namespace0.2",
>    "mode":"fsdax",
>    "map":"dev",
>    "size":"35.44 GiB (38.05 GB)",
>    "uuid":"7893264c-c7ef-4cbe-95e1-ccf2aff041fb",
>    "sector_size":512,
>    "align":2097152,
>    "blockdev":"pmem0.2"
> }
> 
> proc/iomem:
> 
> 1860000000-d55fffffff : Persistent Memory
>    1860000000-24e0003fff : namespace0.0
>    24e0004000-3160007fff : namespace0.1
>    3162000000-3a61ffffff : namespace0.2
> 
> So, maybe the right thing is to make memremap_compat_align return
> PAGE_SIZE for x86 instead of SUBSECTION_SIZE?
> 


I did that as part of
https://lore.kernel.org/linux-nvdimm/20200120140749.69549-2-aneesh.kumar@linux.ibm.com
and applied the subsection restrictions only when creating a new namespace:

https://lore.kernel.org/linux-nvdimm/20200120140749.69549-4-aneesh.kumar@linux.ibm.com
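
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of the IS_ALIGNED(start | size, ...) test from the quoted patch,
written so it can be applied to the /proc/iomem ranges Jeff posted. The helper
names is_aligned() and range_is_mappable() are hypothetical, and the values
4096 (PAGE_SIZE) and 2 MiB (SUBSECTION_SIZE) are assumptions for x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's IS_ALIGNED(); align must be a power of two. */
static bool is_aligned(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value & (align - 1)) == 0;
}

/*
 * Mirrors the patch's IS_ALIGNED(start | size, align) test for a
 * /proc/iomem style range [start, end], where end is inclusive.
 * OR-ing start and size checks both against the alignment at once.
 */
static bool range_is_mappable(uint64_t start, uint64_t end, uint64_t align)
{
	uint64_t size = end - start + 1;

	return is_aligned(start | size, align);
}
```

Under those assumptions, namespace0.0 (0x1860000000-0x24e0003fff) has a size
of 0xc80004000. It passes with a 4096-byte alignment and fails with 2 MiB
because of the trailing 0x4000 bytes, while namespace0.2
(0x3162000000-0x3a61ffffff) passes with either value.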


But I do agree that, in order to create a compatible namespace, we need
to enforce the maximum possible align value across all supported
architectures.
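
As a rough illustration of "max possible align value", the following
userspace sketch picks the largest alignment from a caller-supplied list and
rounds a candidate start address up to it. The helpers max_compat_align() and
round_up_to() are hypothetical, not kernel or ndctl APIs, and any per-arch
values passed in (for example 4096, 2 MiB, or the 16MB default mentioned
earlier in the thread) are assumptions of the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the largest alignment any listed architecture requires. */
static uint64_t max_compat_align(const uint64_t *aligns, size_t count)
{
	uint64_t max = 0;

	for (size_t i = 0; i < count; i++) {
		if (aligns[i] > max)
			max = aligns[i];
	}
	return max;
}

/* Round addr up to the next multiple of align; align must be a power of two. */
static uint64_t round_up_to(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

Because every value involved is a power of two, a start address aligned to
the largest one is also aligned to all of the smaller ones, which is why
taking the maximum is sufficient here.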


On POWER we should still be able to enforce the SUBSECTION_SIZE
restrictions. We documented that for distributions such as SUSE:
https://www.suse.com/support/kb/doc/?id=7024300



-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

Thread overview: 18+ messages
2020-02-13  0:48 [PATCH v2 0/4] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
2020-02-13  0:48 ` [PATCH v2 1/4] mm/memremap_pages: Introduce memremap_compat_align() Dan Williams
2020-02-13 16:57   ` Jeff Moyer
2020-02-13 18:26     ` Dan Williams
2020-02-14  3:26       ` Aneesh Kumar K.V
2020-02-14 20:59       ` Jeff Moyer
2020-02-14 23:05         ` Dan Williams
2020-02-13  0:48 ` [PATCH v2 2/4] libnvdimm/namespace: Enforce memremap_compat_align() Dan Williams
2020-02-13 19:16   ` Jeff Moyer
2020-02-13 21:55   ` Jeff Moyer
2020-02-13 22:43     ` Dan Williams
2020-02-14 16:44       ` Jeff Moyer
2020-02-14 16:55         ` Aneesh Kumar K.V [this message]
2020-02-13  0:48 ` [PATCH v2 3/4] libnvdimm/region: Introduce NDD_LABELING Dan Williams
2020-02-13 19:12   ` Jeff Moyer
2020-02-13  0:48 ` [PATCH v2 4/4] libnvdimm/region: Introduce an 'align' attribute Dan Williams
2020-02-14 20:19   ` Jeff Moyer
2020-02-14 21:03 ` [PATCH v2 0/4] libnvdimm: Cross-arch compatible namespace alignment Jeff Moyer
