From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Elliott, Robert (Persistent Memory)"
To: Ross Zwisler, Matthew Wilcox
Cc: Michal Hocko, "Box, David E", Dave Hansen, "Zheng, Lv",
	"linux-nvdimm@lists.01.org", "Rafael J. Wysocki",
	"Anaczkowski, Lukasz", "Moore, Robert", "linux-acpi@vger.kernel.org",
	"Odzioba, Lukasz", "Schmauss, Erik", Len Brown, John Hubbard,
	"linuxppc-dev@lists.ozlabs.org", Jerome Glisse, "devel@acpica.org",
	"Kogut, Jaroslaw", "linux-mm@kvack.org", "Koss, Marcin",
	"linux-api@vger.kernel.org", Brice Goglin, "Nachimuthu, Murugasamy",
	"Rafael J. Wysocki", "linux-kernel@vger.kernel.org", "Koziej, Artur",
	"Lahtinen, Joonas", Andrew Morton, Tim Chen
Subject: RE: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Date: Thu, 21 Dec 2017 01:41:15 +0000
References: <20171214021019.13579-1-ross.zwisler@linux.intel.com>
	<20171214130032.GK16951@dhcp22.suse.cz>
	<20171218203547.GA2366@linux.intel.com>
	<20171220181937.GB12236@bombadil.infradead.org>
	<20171220211350.GA2688@linux.intel.com>
In-Reply-To: <20171220211350.GA2688@linux.intel.com>

> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
> Ross Zwisler
...
> On Wed, Dec 20, 2017 at 10:19:37AM -0800, Matthew Wilcox wrote:
...
> > initiator is a CPU?  I'd have expected you to expose a memory controller
> > abstraction rather than re-use storage terminology.
>
> Yea, I agree that at first blush it seems weird.  It turns out that
> looking at it in sort of a storage initiator/target way is beneficial,
> though, because it allows us to cut down on the number of data values
> we need to represent.
>
> For example the SLIT, which doesn't differentiate between initiator and
> target proximity domains (and thus nodes), always represents a system
> with N proximity domains using an NxN distance table.  This makes sense
> if every node contains both CPUs and memory.
>
> With the introduction of the HMAT, though, we can have memory-only
> initiator nodes and we can explicitly associate them with their local
> CPU.  This is necessary so that we can separate memory with different
> performance characteristics (HBM vs normal memory vs persistent memory,
> for example) that are all attached to the same CPU.
>
> So, say we now have a system with 4 CPUs, and each of those CPUs has 3
> different types of memory attached to it.  We now have 16 total proximity
> domains: 4 CPU and 12 memory.

The CPU cores that make up a node can have performance restrictions of
their own; for example, they might max out at 10 GB/s even though the
memory controller supports 120 GB/s, meaning you need to use 12 cores
on the node to fully exercise the memory.  It would be helpful to report
this, so software can decide how many cores to use for
bandwidth-intensive work.

> If we represent this with the SLIT we end up with a 16 x 16 distance table
> (256 entries), most of which don't matter because they are memory-to-
> memory distances which don't make sense.
>
> In the HMAT, though, we separate out the initiators and the targets and
> put them into separate lists.  (See 5.2.27.4 System Locality Latency and
> Bandwidth Information Structure in ACPI 6.2 for details.)  So, this same
> config in the HMAT only has 4*12=48 performance values of each type, all
> of which convey meaningful information.
>
> The HMAT indeed even uses the storage "initiator" and "target"
> terminology. :)
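For concreteness, here is a tiny illustrative C sketch of that
entry-count arithmetic (my own example, not from the patch set; it just
restates the 4-CPU / 12-memory configuration described above):

#include <stdio.h>

int main(void)
{
	/* Example configuration: 4 CPU proximity domains, each with 3
	 * kinds of memory (e.g. HBM, normal, persistent) attached as
	 * separate memory-only proximity domains.
	 */
	int initiators = 4;
	int targets = 4 * 3;                  /* 12 memory-only domains */
	int domains = initiators + targets;   /* 16 proximity domains   */

	/* SLIT: one NxN matrix covering every pair of proximity domains */
	printf("SLIT distance entries:      %d\n", domains * domains);

	/* HMAT SLLBI: one initiator x target matrix per data type
	 * (read/write latency, read/write bandwidth, ...)
	 */
	printf("HMAT entries per data type: %d\n", initiators * targets);

	return 0;
}

which prints 256 and 48, matching the counts above.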

Centralized DMA engines (e.g., as used by the "DMA based blk-mq pmem
driver") have performance differences too.  A CPU might include CPU
cores that reach 10 GB/s, DMA engines that reach 60 GB/s, and memory
controllers that reach 120 GB/s.  I guess these would be represented
as extra initiators on the node?

---
Robert Elliott, HPE Persistent Memory