* [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE @ 2021-07-28 8:22 Jia He 2021-07-28 8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He 0 siblings, 1 reply; 7+ messages in thread From: Jia He @ 2021-07-28 8:22 UTC (permalink / raw) To: Dan Williams, Vishal Verma, Dave Jiang; +Cc: nvdimm, linux-kernel, nd, Jia He Background ========== I once sent out the preparatory patches [1] but I dropped the last patch of using fallback nid with memory_add_physaddr_to_nid() due to some dependency. After phys_addr_to_target_node() and memory_add_physaddr_to_nid() are stable now, it's time to fix the original bug on arm64 now. Compared with the last version [2], this version rebases the patch to latest v5.14-rc3 (s/kmem_start/range.start) Test ==== Tested on ThunderX2 host/qemu "-M virt" guest with a nvdimm device. The memblocks from the dax pmem device can be either hot-added or hot-removed on arm64 guest. Also passed the compilation test on x86. [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2228771.html [2] https://lkml.org/lkml/2020/7/8/1546 Jia He (1): device-dax: use fallback nid when numa_node is invalid drivers/dax/kmem.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] device-dax: use fallback nid when numa_node is invalid 2021-07-28 8:22 [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE Jia He @ 2021-07-28 8:22 ` Jia He 2021-07-28 20:17 ` David Hildenbrand 0 siblings, 1 reply; 7+ messages in thread From: Jia He @ 2021-07-28 8:22 UTC (permalink / raw) To: Dan Williams, Vishal Verma, Dave Jiang; +Cc: nvdimm, linux-kernel, nd, Jia He Previously, numa_off was set unconditionally in dummy_numa_init() even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1) after acpi_map_pxm_to_node() because it regards numa_off as turning off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa. Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table isn't present: $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1 kmem: probe of dax0.0 failed with error -22 This fixes it by using fallback memory_add_physaddr_to_nid() as nid. Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> --- drivers/dax/kmem.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index ac231cc36359..749674909e51 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) struct dax_kmem_data *data; int rc = -ENOMEM; int i, mapped = 0; - int numa_node; - - /* - * Ensure good NUMA information for the persistent memory. - * Without this check, there is a risk that slow memory - * could be mixed in a node with faster memory, causing - * unavoidable performance issues. - */ - numa_node = dev_dax->target_node; - if (numa_node < 0) { - dev_warn(dev, "rejecting DAX region with invalid node: %d\n", - numa_node); - return -EINVAL; - } + int numa_node = dev_dax->target_node, new_node; data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL); if (!data) @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) */ res->flags = IORESOURCE_SYSTEM_RAM; + /* + * Ensure good NUMA information for the persistent memory. + * Without this check, there is a risk but not fatal that slow + * memory could be mixed in a node with faster memory, causing + * unavoidable performance issues. Furthermore, fallback node + * id can be used when numa_node is invalid. + */ + if (numa_node < 0) { + new_node = memory_add_physaddr_to_nid(range.start); + dev_info(dev, "changing nid from %d to %d for DAX region %pR\n", + numa_node, new_node, res); + numa_node = new_node; + } + /* * Ensure that future kexec'd kernels will not treat * this as RAM automatically. @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) int i, success = 0; struct device *dev = &dev_dax->dev; struct dax_kmem_data *data = dev_get_drvdata(dev); + int numa_node = dev_dax->target_node; /* * We have one shot for removing memory, if some memory blocks were not @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) if (rc) continue; - rc = remove_memory(dev_dax->target_node, range.start, - range_len(&range)); + if (numa_node < 0) + numa_node = memory_add_physaddr_to_nid(range.start); + + rc = remove_memory(numa_node, range.start, range_len(&range)); if (rc == 0) { release_resource(data->res[i]); kfree(data->res[i]); -- 2.17.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] device-dax: use fallback nid when numa_node is invalid 2021-07-28 8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He @ 2021-07-28 20:17 ` David Hildenbrand 2021-07-29 0:20 ` Justin He 0 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2021-07-28 20:17 UTC (permalink / raw) To: Jia He, Dan Williams, Vishal Verma, Dave Jiang; +Cc: nvdimm, linux-kernel, nd On 28.07.21 10:22, Jia He wrote: > Previously, numa_off was set unconditionally in dummy_numa_init() > even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1) > after acpi_map_pxm_to_node() because it regards numa_off as turning > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on > arm64 with fake numa. > > Without this patch, pmem can't be probed as a RAM device on arm64 if > SRAT table isn't present: > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1 > kmem: probe of dax0.0 failed with error -22 > > This fixes it by using fallback memory_add_physaddr_to_nid() as nid. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > drivers/dax/kmem.c | 36 ++++++++++++++++++++---------------- > 1 file changed, 20 insertions(+), 16 deletions(-) > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index ac231cc36359..749674909e51 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) > struct dax_kmem_data *data; > int rc = -ENOMEM; > int i, mapped = 0; > - int numa_node; > - > - /* > - * Ensure good NUMA information for the persistent memory. > - * Without this check, there is a risk that slow memory > - * could be mixed in a node with faster memory, causing > - * unavoidable performance issues. > - */ > - numa_node = dev_dax->target_node; > - if (numa_node < 0) { > - dev_warn(dev, "rejecting DAX region with invalid node: %d\n", > - numa_node); > - return -EINVAL; > - } > + int numa_node = dev_dax->target_node, new_node; > > data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL); > if (!data) > @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) > */ > res->flags = IORESOURCE_SYSTEM_RAM; > > + /* > + * Ensure good NUMA information for the persistent memory. > + * Without this check, there is a risk but not fatal that slow > + * memory could be mixed in a node with faster memory, causing > + * unavoidable performance issues. Furthermore, fallback node > + * id can be used when numa_node is invalid. > + */ > + if (numa_node < 0) { > + new_node = memory_add_physaddr_to_nid(range.start); > + dev_info(dev, "changing nid from %d to %d for DAX region %pR\n", > + numa_node, new_node, res); > + numa_node = new_node; > + } > + > /* > * Ensure that future kexec'd kernels will not treat > * this as RAM automatically. > @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) > int i, success = 0; > struct device *dev = &dev_dax->dev; > struct dax_kmem_data *data = dev_get_drvdata(dev); > + int numa_node = dev_dax->target_node; > > /* > * We have one shot for removing memory, if some memory blocks were not > @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) > if (rc) > continue; > > - rc = remove_memory(dev_dax->target_node, range.start, > - range_len(&range)); > + if (numa_node < 0) > + numa_node = memory_add_physaddr_to_nid(range.start); > + > + rc = remove_memory(numa_node, range.start, range_len(&range)); > if (rc == 0) { > release_resource(data->res[i]); > kfree(data->res[i]); > Note that this patch conflicts with: https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com But nothing fundamental. Determining a single NID is similar to how I'm handling it for ACPI: https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] device-dax: use fallback nid when numa_node is invalid 2021-07-28 20:17 ` David Hildenbrand @ 2021-07-29 0:20 ` Justin He 2021-07-29 7:59 ` David Hildenbrand 0 siblings, 1 reply; 7+ messages in thread From: Justin He @ 2021-07-29 0:20 UTC (permalink / raw) To: David Hildenbrand, Dan Williams, Vishal Verma, Dave Jiang Cc: nvdimm, linux-kernel, nd Hi David > -----Original Message----- > From: David Hildenbrand <david@redhat.com> > Sent: Thursday, July 29, 2021 4:17 AM > To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>; > Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com> > Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com> > Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is invalid > > On 28.07.21 10:22, Jia He wrote: > > Previously, numa_off was set unconditionally in dummy_numa_init() > > even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1) > > after acpi_map_pxm_to_node() because it regards numa_off as turning > > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on > > arm64 with fake numa. > > > > Without this patch, pmem can't be probed as a RAM device on arm64 if > > SRAT table isn't present: > > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g > -a 64K > > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with > invalid node: -1 > > kmem: probe of dax0.0 failed with error -22 > > > > This fixes it by using fallback memory_add_physaddr_to_nid() as nid. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Jia He <justin.he@arm.com> > > --- > > drivers/dax/kmem.c | 36 ++++++++++++++++++++---------------- > > 1 file changed, 20 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > > index ac231cc36359..749674909e51 100644 > > --- a/drivers/dax/kmem.c > > +++ b/drivers/dax/kmem.c > > @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) > > struct dax_kmem_data *data; > > int rc = -ENOMEM; > > int i, mapped = 0; > > - int numa_node; > > - > > - /* > > - * Ensure good NUMA information for the persistent memory. > > - * Without this check, there is a risk that slow memory > > - * could be mixed in a node with faster memory, causing > > - * unavoidable performance issues. > > - */ > > - numa_node = dev_dax->target_node; > > - if (numa_node < 0) { > > - dev_warn(dev, "rejecting DAX region with invalid node: %d\n", > > - numa_node); > > - return -EINVAL; > > - } > > + int numa_node = dev_dax->target_node, new_node; > > > > data = kzalloc(struct_size(data, res, dev_dax->nr_range), > GFP_KERNEL); > > if (!data) > > @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) > > */ > > res->flags = IORESOURCE_SYSTEM_RAM; > > > > + /* > > + * Ensure good NUMA information for the persistent memory. > > + * Without this check, there is a risk but not fatal that slow > > + * memory could be mixed in a node with faster memory, causing > > + * unavoidable performance issues. Furthermore, fallback node > > + * id can be used when numa_node is invalid. > > + */ > > + if (numa_node < 0) { > > + new_node = memory_add_physaddr_to_nid(range.start); > > + dev_info(dev, "changing nid from %d to %d for DAX > region %pR\n", > > + numa_node, new_node, res); > > + numa_node = new_node; > > + } > > + > > /* > > * Ensure that future kexec'd kernels will not treat > > * this as RAM automatically. > > @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax > *dev_dax) > > int i, success = 0; > > struct device *dev = &dev_dax->dev; > > struct dax_kmem_data *data = dev_get_drvdata(dev); > > + int numa_node = dev_dax->target_node; > > > > /* > > * We have one shot for removing memory, if some memory blocks were > not > > @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax > *dev_dax) > > if (rc) > > continue; > > > > - rc = remove_memory(dev_dax->target_node, range.start, > > - range_len(&range)); > > + if (numa_node < 0) > > + numa_node = memory_add_physaddr_to_nid(range.start); > > + > > + rc = remove_memory(numa_node, range.start, range_len(&range)); > > if (rc == 0) { > > release_resource(data->res[i]); > > kfree(data->res[i]); > > > > Note that this patch conflicts with: > > https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com > > But nothing fundamental. Determining a single NID is similar to how I'm > handling it for ACPI: > > https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com > Okay, got it. Thanks for the reminder. Seems my patch is not useful after your patch. -- Cheers, Justin (Jia He) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] device-dax: use fallback nid when numa_node is invalid 2021-07-29 0:20 ` Justin He @ 2021-07-29 7:59 ` David Hildenbrand 2021-07-29 14:44 ` Justin He 0 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2021-07-29 7:59 UTC (permalink / raw) To: Justin He, Dan Williams, Vishal Verma, Dave Jiang Cc: nvdimm, linux-kernel, nd Hi Justin, >>> >> >> Note that this patch conflicts with: >> >> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com >> >> But nothing fundamental. Determining a single NID is similar to how I'm >> handling it for ACPI: >> >> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com >> > > Okay, got it. Thanks for the reminder. > Seems my patch is not useful after your patch. > I think your patch still makes sense. With https://lore.kernel.org/linux-acpi/20210723125210.29987-7-david@redhat.com/ We'd have to detect the node id in the first loop instead. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] device-dax: use fallback nid when numa_node is invalid 2021-07-29 7:59 ` David Hildenbrand @ 2021-07-29 14:44 ` Justin He 2021-08-02 15:47 ` David Hildenbrand 0 siblings, 1 reply; 7+ messages in thread From: Justin He @ 2021-07-29 14:44 UTC (permalink / raw) To: David Hildenbrand, Dan Williams, Vishal Verma, Dave Jiang Cc: nvdimm, linux-kernel, nd Hi David > -----Original Message----- > From: David Hildenbrand <david@redhat.com> > Sent: Thursday, July 29, 2021 3:59 PM > To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>; > Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com> > Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com> > Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is > invalid > > Hi Justin, > > >>> > >> > >> Note that this patch conflicts with: > >> > >> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com > >> > >> But nothing fundamental. Determining a single NID is similar to how I'm > >> handling it for ACPI: > >> > >> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com > >> > > > > Okay, got it. Thanks for the reminder. > > Seems my patch is not useful after your patch. > > > > I think your patch still makes sense. With > > https://lore.kernel.org/linux-acpi/20210723125210.29987-7- > david@redhat.com/ > > We'd have to detect the node id in the first loop instead. Ok, I got your point. I will do that in v2. Btw, sorry for commenting there about your patch 06 since I didn't subscribe lkml via this mailbox. + for (i = 0; i < dev_dax->nr_range; i++) { + struct range range; + + rc = dax_kmem_range(dev_dax, i, &range); + if (rc) { + dev_info(dev, "mapping%d: %#llx-%#llx too small after alignment\n", + i, range.start, range.end); + continue; + } + total_len += range_len(&range); + } You add an additional loop to get the total_len. I wonder is it independent on 2nd loop? If yes, why not merge the 2 loops into one? Sorry if this question is too simple, I don't know too much about the background of your patch. -- Cheers, Justin (Jia He) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] device-dax: use fallback nid when numa_node is invalid 2021-07-29 14:44 ` Justin He @ 2021-08-02 15:47 ` David Hildenbrand 0 siblings, 0 replies; 7+ messages in thread From: David Hildenbrand @ 2021-08-02 15:47 UTC (permalink / raw) To: Justin He, Dan Williams, Vishal Verma, Dave Jiang Cc: nvdimm, linux-kernel, nd On 29.07.21 16:44, Justin He wrote: > Hi David > >> -----Original Message----- >> From: David Hildenbrand <david@redhat.com> >> Sent: Thursday, July 29, 2021 3:59 PM >> To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>; >> Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com> >> Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com> >> Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is >> invalid >> >> Hi Justin, >> >>>>> >>>> >>>> Note that this patch conflicts with: >>>> >>>> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com >>>> >>>> But nothing fundamental. Determining a single NID is similar to how I'm >>>> handling it for ACPI: >>>> >>>> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com >>>> >>> >>> Okay, got it. Thanks for the reminder. >>> Seems my patch is not useful after your patch. >>> >> >> I think your patch still makes sense. With >> >> https://lore.kernel.org/linux-acpi/20210723125210.29987-7- >> david@redhat.com/ >> >> We'd have to detect the node id in the first loop instead. > > Ok, I got your point. I will do that in v2. > > Btw, sorry for commenting there about your patch 06 since I didn't > subscribe lkml via this mailbox. Sure, you really should subscribe :) > > + for (i = 0; i < dev_dax->nr_range; i++) { > + struct range range; > + > + rc = dax_kmem_range(dev_dax, i, &range); > + if (rc) { > + dev_info(dev, "mapping%d: %#llx-%#llx too small after alignment\n", > + i, range.start, range.end); > + continue; > + } > + total_len += range_len(&range); > + } > You add an additional loop to get the total_len. > I wonder is it independent on 2nd loop? > If yes, why not merge the 2 loops into one? > Sorry if this question is too simple, I don't know too much > about the background of your patch. We need total_len to register the memory group. We need the memory group to add memory. Therefore, we need a second loop to calculate total_len upfront. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-08-02 15:47 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-28 8:22 [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE Jia He 2021-07-28 8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He 2021-07-28 20:17 ` David Hildenbrand 2021-07-29 0:20 ` Justin He 2021-07-29 7:59 ` David Hildenbrand 2021-07-29 14:44 ` Justin He 2021-08-02 15:47 ` David Hildenbrand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).