All of lore.kernel.org
 help / color / mirror / Atom feed
From: Justin He <Justin.He@arm.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	nd <nd@arm.com>
Subject: RE: [PATCH v2] device-dax: use fallback nid when numa node is invalid
Date: Tue, 14 Sep 2021 02:06:14 +0000	[thread overview]
Message-ID: <AM6PR08MB4376FC35158104629C603197F7DA9@AM6PR08MB4376.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <CAPcyv4ie_ZzEwrrKJEVrDP19UWAgSiW3GU9f99EqX0e6BPQDPA@mail.gmail.com>

Hi Dan,

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Friday, September 10, 2021 11:42 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang
> <dave.jiang@intel.com>; David Hildenbrand <david@redhat.com>; Linux NVDIMM
> <nvdimm@lists.linux.dev>; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>
> Subject: Re: [PATCH v2] device-dax: use fallback nid when numa node is
> invalid
> 
> On Fri, Sep 10, 2021 at 5:46 AM Jia He <justin.he@arm.com> wrote:
> >
> > Previously, numa_off was set unconditionally in dummy_numa_init()
> > even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > arm64 with fake numa case.
> >
> > Without this patch, pmem can't be probed as RAM devices on arm64 if
> > SRAT table isn't present:
> >   $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g
> -a 64K
> >   kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> >   kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> > v2: - rebase it based on David's "memory group" patch.
> >     - drop the changes in dev_dax_kmem_remove() since nid had been
> >       removed in remove_memory().
> >  drivers/dax/kmem.c | 31 +++++++++++++++++--------------
> >  1 file changed, 17 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index a37622060fff..e4836eb7539e 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >         unsigned long total_len = 0;
> >         struct dax_kmem_data *data;
> >         int i, rc, mapped = 0;
> > -       int numa_node;
> > -
> > -       /*
> > -        * Ensure good NUMA information for the persistent memory.
> > -        * Without this check, there is a risk that slow memory
> > -        * could be mixed in a node with faster memory, causing
> > -        * unavoidable performance issues.
> > -        */
> > -       numa_node = dev_dax->target_node;
> > -       if (numa_node < 0) {
> > -               dev_warn(dev, "rejecting DAX region with invalid
> node: %d\n",
> > -                               numa_node);
> > -               return -EINVAL;
> > -       }
> > +       int numa_node = dev_dax->target_node;
> >
> >         for (i = 0; i < dev_dax->nr_range; i++) {
> >                 struct range range;
> > @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >                                         i, range.start, range.end);
> >                         continue;
> >                 }
> > +
> > +               /*
> > +                * Ensure good NUMA information for the persistent
> memory.
> > +                * Without this check, there is a risk but not fatal
> that slow
> > +                * memory could be mixed in a node with faster memory,
> causing
> > +                * unavoidable performance issues. Warn this and use
> fallback
> > +                * node id.
> > +                */
> > +               if (numa_node < 0) {
> > +                       int new_node =
> memory_add_physaddr_to_nid(range.start);
> > +
> > +                       dev_info(dev, "changing nid from %d to %d for
> DAX region [%#llx-%#llx]\n",
> > +                                numa_node, new_node, range.start,
> range.end);
> > +                       numa_node = new_node;
> > +               }
> > +
> >                 total_len += range_len(&range);
> 
> This fallback change belongs where the parent region for the namespace
> adopts its target_node, because it's not clear
> memory_add_physaddr_to_nid() is the right fallback in all situations.
> Here is where this setting is happening currently:
> 
> drivers/acpi/nfit/core.c:3004:          ndr_desc->target_node =
> pxm_to_node(spa->proximity_domain);
On my local arm64 guest('virt' machine type), the target_node is
set to -1 at this line.
That is:
The condition "spa->flags & ACPI_NFIT_PROXIMITY_VALID" is hit.

> drivers/acpi/nfit/core.c:3007:          ndr_desc->target_node =
> NUMA_NO_NODE;
> drivers/nvdimm/e820.c:29:       ndr_desc.target_node = nid;
> drivers/nvdimm/of_pmem.c:58:            ndr_desc.target_node =
> ndr_desc.numa_node;
> drivers/nvdimm/region_devs.c:1127:      nd_region->target_node =
> ndr_desc->target_node;


Sorry,Dan. I thought I missed your previous mail:

=========================================
Looks like it is the NFIT driver, thanks.

If you're getting NUMA_NO_NODE in dax_kmem from the NFIT driver in
means your ACPI NFIT table is failing to populate correct numa
information. You could try the following to fix it up, but I think the
real problem is that your platform BIOS needs to add the proper numa
data.

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index fb775b967c52..d3a0cec635b1 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3005,15 +3005,8 @@ static int acpi_nfit_register_region(struct
acpi_nfit_desc *acpi_desc,
        ndr_desc->res = &res;
        ndr_desc->provider_data = nfit_spa;
        ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
-       if (spa->flags & ACPI_NFIT_PROXIMITY_VALID) {
-               ndr_desc->numa_node = acpi_map_pxm_to_online_node(
-                                               spa->proximity_domain);
-               ndr_desc->target_node = acpi_map_pxm_to_node(
-                               spa->proximity_domain);
-       } else {
-               ndr_desc->numa_node = NUMA_NO_NODE;
-               ndr_desc->target_node = NUMA_NO_NODE;
-       }
+       ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
+       ndr_desc->target_node = phys_to_target_node(spa->address);

        /*
         * Persistence domain bits are hierarchical, if
===================================================

Do you still suggest fixing like this?


--
Cheers,
Justin (Jia He)




  reply	other threads:[~2021-09-14  2:06 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10 12:46 [PATCH v2] device-dax: use fallback nid when numa node is invalid Jia He
2021-09-10 15:42 ` Dan Williams
2021-09-10 15:42   ` Dan Williams
2021-09-14  2:06   ` Justin He [this message]
2021-09-15  5:15     ` Dan Williams
2021-09-15  5:15       ` Dan Williams
2021-09-15  6:50       ` Justin He
2021-09-15 16:22         ` Dan Williams
2021-09-15 16:22           ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM6PR08MB4376FC35158104629C603197F7DA9@AM6PR08MB4376.eurprd08.prod.outlook.com \
    --to=justin.he@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nd@arm.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.