linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Brice Goglin <brice.goglin@gmail.com>
Cc: <linux-mm@kvack.org>, <linux-acpi@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <x86@kernel.org>,
	Keith Busch <kbusch@kernel.org>, <jglisse@redhat.com>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>, <linuxarm@huawei.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Tao Xu <tao3.xu@intel.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Hanjun Guo <guohanjun@huawei.com>,
	Sudeep Holla <sudeep.holla@arm.com>
Subject: Re: [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains
Date: Fri, 3 Jan 2020 10:09:20 +0000	[thread overview]
Message-ID: <20200103100920.00006a18@Huawei.com> (raw)
In-Reply-To: <b428d231-4879-4462-ac42-900b5d094eee@gmail.com>

On Thu, 2 Jan 2020 22:37:04 +0100
Brice Goglin <brice.goglin@gmail.com> wrote:

> Le 02/01/2020 à 16:27, Jonathan Cameron a écrit :
> >  
> >> However the HMAT table gets ignored because find_mem_target() fails in
> >> hmat_parse_proximity_domain(). The target should have been allocated in
> >> alloc_memory_target() which is called in srat_parse_mem_affinity(), but
> >> it seems to me that this function isn't called for GI nodes. Or should
> >> SRAT also contain a normal Memory node with same PM as the GI?
> >>  
> > Hi Brice,
> >
> > Yes you should see a node with 0kB memory.  Same as you get for a processor
> > only node I believe.
> >
> > srat_parse_mem_affinity shouldn't call alloc_memory_target for the GI nodes
> > as they don't have any memory.   The hmat table should only refer to
> > GI domains as initiators.  Just to check, do you have them listed as
> > a target node?  Or perhaps in some hmat proximity entry as memory_PD?
> >  
> 
> Thanks, I finally got things to work. I am on x86. It's a dual-socket
> machine with SubNUMA clusters (2 nodes per socket) and NVDIMMs (one
> dax-kmem node per socket). Before adding a GI, initiators look like this:
> 
> node0 -> node0 and node4
> 
> node1 -> node1 and node5
> 
> node2 -> node2 and node4
> 
> node3 -> node3 and node5
> 
> I added a GI with faster access to node0, node2, node4 (first socket).
> 
> The GI node becomes an access0 initiator for node4, and node0 and node2
> remain access1 initiators.
> 
> The GI node doesn't become access0 initiator for node0 and node2, likely
> because of this test :
> 
>         /*
>          * If the Address Range Structure provides a local processor pxm, link
>          * only that one. Otherwise, find the best performance attributes and
>          * register all initiators that match.
>          */
>         if (target->processor_pxm != PXM_INVAL) {
> 
> I guess I should split node0-3 into separate CPU nodes and memory nodes
> in SRAT?

It sounds like it's working as expected.  There are a few assumptions made about
'sensible' hmat configurations.

1) If the memory and processor are in the same domain, that should mean the
access characteristics within that domain are the best in the system.
It is possible to have a setup with very low latency access
from a particular processor but also low bandwidth.  Another domain may have
high bandwidth but long latency.   Such systems may occur, but they are probably
going to not be for 'normal memory the OS can just use'.

2) If we have a relevant "Memory Proximity Domain Attributes Structure"
Note this was renamed in acpi 6.3 from "Address Range Structure" as
it no longer has any address ranges.
(which are entirely optional btw) that indicates that the memory controller
for a given memory lies in the proximity domain of the Initiator specified.
If that happens we ignore cases where hmat says somewhere else is nearer
via bandwidth and latency.

For case 1) I'm not sure we actually enforce it.
I think you've hit case 2).  

Removing the address range structures should work, or as you say you can
move that memory into separate memory nodes.  It will be a bit of a strange
setup though.  Assuming node4 is an NVDIMM then that would be closer to a
potential real system.  With a suitable coherent bus (CCIX is most familiar
to me and can do this) You might have

 _______       ________    _______
|       |     |        |   |       |
| Node0 |     | Node4  |---| Node6 |
| CPU   |-----| Mem +  |---| GI    |
| Mem   |     | CCHome |---|       |
|_______|     |________|   |_______|
   |                          |
   |__________________________|

CCHome Cache Coherency directory location to avoid the need for more
esoteric cache coherency short cut methods etc.

Idea being the GI node is some big fat DB accelerator or similar doing
offloaded queries.  It has a fat pipe to the NVDIMMs.  

Lets ignore that, to actually justify the use of a GI only node,
you need some more elements as this situation could be represented
by fusing node4 and node6 and having asymmetric HMAT between Node0
and the fused Node4.

So in conclusion, with your setup, only the NVDIMM nodes look like the
sort of memory that might be in a node nearer to a GI than the host.
> 
> Brice

Thanks again for looking at this!

Jonathan
> 
> 
> 
> 



  reply	other threads:[~2020-01-03 10:09 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-16 15:38 [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 1/7] ACPI: Support Generic Initiator only domains Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 2/7] arm64: " Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 3/7] x86: Support Generic Initiator only proximity domains Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 4/7] ACPI: Let ACPI know we support Generic Initiator Affinity Structures Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 5/7] ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3 Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 6/7] node: Add access1 class to represent CPU to memory characteristics Jonathan Cameron
2019-12-16 15:38 ` [PATCH V6 7/7] docs: mm: numaperf.rst Add brief description for access class 1 Jonathan Cameron
2019-12-18 11:34   ` Brice Goglin
2019-12-18 14:37     ` Jonathan Cameron
2019-12-18 11:32 ` [PATCH V6 0/7] ACPI: Support Generic Initiator proximity domains Brice Goglin
2019-12-18 14:50   ` Jonathan Cameron
2019-12-20 21:40     ` Brice Goglin
2020-01-02 15:27       ` Jonathan Cameron
2020-01-02 21:37         ` Brice Goglin
2020-01-03 10:09           ` Jonathan Cameron [this message]
2020-01-03 12:18             ` Brice Goglin
2020-01-03 13:08               ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200103100920.00006a18@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=brice.goglin@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=guohanjun@huawei.com \
    --cc=jglisse@redhat.com \
    --cc=kbusch@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=rjw@rjwysocki.net \
    --cc=sudeep.holla@arm.com \
    --cc=tao3.xu@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).