From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
References: <20171214021019.13579-1-ross.zwisler@linux.intel.com>
 <20171214130032.GK16951@dhcp22.suse.cz>
 <20171218203547.GA2366@linux.intel.com>
 <20171220181937.GB12236@bombadil.infradead.org>
 <2da89d31-27a3-34ab-2dbb-92403c8215ec@intel.com>
 <20171220211649.GA32200@bombadil.infradead.org>
 <20171220212408.GA8308@linux.intel.com>
 <20171220224105.GA27258@linux.intel.com>
 <39cbe02a-d309-443d-54c9-678a0799342d@gmail.com>
From: Brice Goglin
Message-ID: <71317994-af66-a1b2-4c7a-86a03253cf62@gmail.com>
Date: Wed, 27 Dec 2017 10:10:34 +0100
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: owner-linux-mm@kvack.org
To: Dan Williams
Cc: Ross Zwisler, Matthew Wilcox, Dave Hansen, Michal Hocko,
 "linux-kernel@vger.kernel.org", "Anaczkowski, Lukasz", "Box, David E",
 "Kogut, Jaroslaw", "Koss, Marcin", "Koziej, Artur", "Lahtinen, Joonas",
 "Moore, Robert", "Nachimuthu, Murugasamy", "Odzioba, Lukasz",
 "Rafael J. Wysocki", "Rafael J. Wysocki", "Schmauss, Erik",
 "Verma, Vishal L", "Zheng, Lv", Andrew Morton, Balbir Singh,
 Jerome Glisse, John Hubbard, Len Brown, Tim Chen, devel@acpica.org,
 Linux ACPI, Linux MM, "linux-nvdimm@lists.01.org", Linux API, linuxppc-dev
List-ID:

Le 22/12/2017 à 23:53, Dan Williams a écrit :
> On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin wrote:
>> Le 20/12/2017 à 23:41, Ross Zwisler a écrit :
> [..]
>> Hello
>>
>> I can confirm that HPC runtimes are going to use these patches (at least
>> all runtimes that use hwloc for topology discovery, but that's the vast
>> majority of HPC anyway).
>>
>> We really didn't like KNL exposing a hacky SLIT table [1]. We had to
>> explicitly detect that specific crazy table to find out which NUMA nodes
>> were local to which cores, and to find out which NUMA nodes were
>> HBM/MCDRAM or DDR. And then we had to hide the SLIT values from the
>> application because the reported latencies didn't match reality. Quite
>> annoying.
>>
>> With Ross' patches, we can easily get what we need:
>> * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
>> can only report a single local node per CPU (doesn't work for KNL and
>> upcoming architectures with HBM+DDR+...)
>> * which NUMA nodes are slow/fast (for both bandwidth and latency)
>> And we can still look at SLIT under /sys/devices/system/node if really
>> needed.
>>
>> And of course having this in sysfs is much better than parsing ACPI
>> tables that are only accessible to root :)
> On this point, it's not clear to me that we should allow these sysfs
> entries to be world readable. Given /proc/iomem now hides physical
> address information from non-root we at least need to be careful not
> to undo that with new sysfs HMAT attributes. Once you need to be root
> for this info, is parsing binary HMAT vs sysfs a blocker for the HPC
> use case?

I don't think it would be a blocker.

> Perhaps we can enlist /proc/iomem or a similar enumeration interface
> to tell userspace the NUMA node and whether the kernel thinks it has
> better or worse performance characteristics relative to base
> system-RAM, i.e. new IORES_DESC_* values. I'm worried that if we start
> publishing absolute numbers in sysfs userspace will default to looking
> for specific magic numbers in sysfs vs asking the kernel for memory
> that has performance characteristics relative to base "System RAM".
> In other words the absolute performance information that the HMAT
> publishes is useful to the kernel, but it's not clear that userspace
> needs that vs a relative indicator for making NUMA node preference
> decisions.

Some HPC users will benchmark the machine to discover actual
performance numbers anyway.
However, most users won't do this. They will want to know the relative
performance of different nodes. If you normalize HMAT values by dividing
them by the system-RAM values, that's likely OK (a small sketch of that
kind of normalization follows below). If you just say "that node is
faster than system RAM", it's not precise enough.

Brice
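To make the normalization point concrete, here is a minimal userspace
sketch. It does not use any sysfs layout from these patches; the node
names and the bandwidth/latency figures are invented for illustration
only, standing in for whatever absolute values the HMAT would report.

/*
 * Sketch only: normalize per-node performance against base system RAM.
 * The struct layout, node names and numbers below are hypothetical;
 * a real tool would fill them from whatever interface the kernel exposes.
 */
#include <stdio.h>

struct node_perf {
	const char *name;
	double read_bw;   /* absolute read bandwidth (e.g. MB/s) */
	double read_lat;  /* absolute read latency (e.g. ns) */
};

int main(void)
{
	struct node_perf sysram = { "DDR (system RAM)", 90000.0, 80.0 };
	struct node_perf nodes[] = {
		{ "HBM/MCDRAM node",      400000.0, 110.0 },
		{ "slow/persistent node",  10000.0, 300.0 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(nodes) / sizeof(nodes[0]); i++) {
		/* Ratios relative to system RAM: "4.44x the bandwidth" is far
		 * more useful to a runtime than a bare "faster than RAM". */
		printf("%s: %.2fx bandwidth, %.2fx latency vs system RAM\n",
		       nodes[i].name,
		       nodes[i].read_bw / sysram.read_bw,
		       nodes[i].read_lat / sysram.read_lat);
	}
	return 0;
}

Whether the kernel publishes such ratios directly or the raw values that
userspace divides itself, the point is that runtimes need more resolution
than a single better/worse-than-system-RAM flag.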
Wysocki" , "Anaczkowski, Lukasz" , "Moore, Robert" , Matthew Wilcox , Linux ACPI , "Odzioba, Lukasz" , "Schmauss, Erik" , Len Brown , John Hubbard , linuxppc-dev , Jerome Glisse , devel-E0kO6a4B6psdnm+yROfE0A@public.gmane.org, "Kogut, Jaroslaw" , Linux MM , "Koss, Marcin" , Linux API , "Nachimuthu, Murugasamy" List-Id: linux-acpi@vger.kernel.org TGUgMjIvMTIvMjAxNyDDoCAyMzo1MywgRGFuIFdpbGxpYW1zIGEgw6ljcml0wqA6Cj4gT24gVGh1 LCBEZWMgMjEsIDIwMTcgYXQgMTI6MzEgUE0sIEJyaWNlIEdvZ2xpbiA8YnJpY2UuZ29nbGluQGdt YWlsLmNvbT4gd3JvdGU6Cj4+IExlIDIwLzEyLzIwMTcgw6AgMjM6NDEsIFJvc3MgWndpc2xlciBh IMOpY3JpdCA6Cj4gWy4uXQo+PiBIZWxsbwo+Pgo+PiBJIGNhbiBjb25maXJtIHRoYXQgSFBDIHJ1 bnRpbWVzIGFyZSBnb2luZyB0byB1c2UgdGhlc2UgcGF0Y2hlcyAoYXQgbGVhc3QKPj4gYWxsIHJ1 bnRpbWVzIHRoYXQgdXNlIGh3bG9jIGZvciB0b3BvbG9neSBkaXNjb3ZlcnksIGJ1dCB0aGF0J3Mg dGhlIHZhc3QKPj4gbWFqb3JpdHkgb2YgSFBDIGFueXdheSkuCj4+Cj4+IFdlIHJlYWxseSBkaWRu J3QgbGlrZSBLTkwgZXhwb3NpbmcgYSBoYWNreSBTTElUIHRhYmxlIFsxXS4gV2UgaGFkIHRvCj4+ IGV4cGxpY2l0bHkgZGV0ZWN0IHRoYXQgc3BlY2lmaWMgY3JhenkgdGFibGUgdG8gZmluZCBvdXQg d2hpY2ggTlVNQSBub2Rlcwo+PiB3ZXJlIGxvY2FsIHRvIHdoaWNoIGNvcmVzLCBhbmQgdG8gZmlu ZCBvdXQgd2hpY2ggTlVNQSBub2RlcyB3ZXJlCj4+IEhCTS9NQ0RSQU0gb3IgRERSLiBBbmQgdGhl biB3ZSBoYWQgdG8gaGlkZSB0aGUgU0xJVCB2YWx1ZXMgdG8gdGhlCj4+IGFwcGxpY2F0aW9uIGJl Y2F1c2UgdGhlIHJlcG9ydGVkIGxhdGVuY2llcyBkaWRuJ3QgbWF0Y2ggcmVhbGl0eS4gUXVpdGUK Pj4gYW5ub3lpbmcuCj4+Cj4+IFdpdGggUm9zcycgcGF0Y2hlcywgd2UgY2FuIGVhc2lseSBnZXQg d2hhdCB3ZSBuZWVkOgo+PiAqIHdoaWNoIE5VTUEgbm9kZXMgYXJlIGxvY2FsIHRvIHdoaWNoIENQ VXM/IC9zeXMvZGV2aWNlcy9zeXN0ZW0vbm9kZS8KPj4gY2FuIG9ubHkgcmVwb3J0IGEgc2luZ2xl IGxvY2FsIG5vZGUgcGVyIENQVSAoZG9lc24ndCB3b3JrIGZvciBLTkwgYW5kCj4+IHVwY29taW5n IGFyY2hpdGVjdHVyZXMgd2l0aCBIQk0rRERSKy4uLikKPj4gKiB3aGljaCBOVU1BIG5vZGVzIGFy ZSBzbG93L2Zhc3QgKGZvciBib3RoIGJhbmR3aWR0aCBhbmQgbGF0ZW5jeSkKPj4gQW5kIHdlIGNh biBzdGlsbCBsb29rIGF0IFNMSVQgdW5kZXIgL3N5cy9kZXZpY2VzL3N5c3RlbS9ub2RlIGlmIHJl YWxseQo+PiBuZWVkZWQuCj4+Cj4+IEFuZCBvZiBjb3Vyc2UgaGF2aW5nIHRoaXMgaW4gc3lzZnMg aXMgbXVjaCBiZXR0ZXIgdGhhbiBwYXJzaW5nIEFDUEkKPj4gdGFibGVzIHRoYXQgYXJlIG9ubHkg YWNjZXNzaWJsZSB0byByb290IDopCj4gT24gdGhpcyBwb2ludCwgaXQncyBub3QgY2xlYXIgdG8g bWUgdGhhdCB3ZSBzaG91bGQgYWxsb3cgdGhlc2Ugc3lzZnMKPiBlbnRyaWVzIHRvIGJlIHdvcmxk IHJlYWRhYmxlLiBHaXZlbiAvcHJvYy9pb21lbSBub3cgaGlkZXMgcGh5c2ljYWwKPiBhZGRyZXNz IGluZm9ybWF0aW9uIGZyb20gbm9uLXJvb3Qgd2UgYXQgbGVhc3QgbmVlZCB0byBiZSBjYXJlZnVs IG5vdAo+IHRvIHVuZG8gdGhhdCB3aXRoIG5ldyBzeXNmcyBITUFUIGF0dHJpYnV0ZXMuIE9uY2Ug eW91IG5lZWQgdG8gYmUgcm9vdAo+IGZvciB0aGlzIGluZm8sIGlzIHBhcnNpbmcgYmluYXJ5IEhN QVQgdnMgc3lzZnMgYSBibG9ja2VyIGZvciB0aGUgSFBDCj4gdXNlIGNhc2U/CgpJIGRvbid0IHRo aW5rIGl0IHdvdWxkIGJlIGEgYmxvY2tlci4KCj4gUGVyaGFwcyB3ZSBjYW4gZW5saXN0IC9wcm9j L2lvbWVtIG9yIGEgc2ltaWxhciBlbnVtZXJhdGlvbiBpbnRlcmZhY2UKPiB0byB0ZWxsIHVzZXJz cGFjZSB0aGUgTlVNQSBub2RlIGFuZCB3aGV0aGVyIHRoZSBrZXJuZWwgdGhpbmtzIGl0IGhhcwo+ IGJldHRlciBvciB3b3JzZSBwZXJmb3JtYW5jZSBjaGFyYWN0ZXJpc3RpY3MgcmVsYXRpdmUgdG8g YmFzZQo+IHN5c3RlbS1SQU0sIGkuZS4gbmV3IElPUkVTX0RFU0NfKiB2YWx1ZXMuIEknbSB3b3Jy aWVkIHRoYXQgaWYgd2Ugc3RhcnQKPiBwdWJsaXNoaW5nIGFic29sdXRlIG51bWJlcnMgaW4gc3lz ZnMgdXNlcnNwYWNlIHdpbGwgZGVmYXVsdCB0byBsb29raW5nCj4gZm9yIHNwZWNpZmljIG1hZ2lj IG51bWJlcnMgaW4gc3lzZnMgdnMgYXNraW5nIHRoZSBrZXJuZWwgZm9yIG1lbW9yeQo+IHRoYXQg aGFzIHBlcmZvcm1hbmNlIGNoYXJhY3RlcmlzdGljcyByZWxhdGl2ZSB0byBiYXNlICJTeXN0ZW0g UkFNIi4gSW4KPiBvdGhlciB3b3JkcyB0aGUgYWJzb2x1dGUgcGVyZm9ybWFuY2UgaW5mb3JtYXRp b24gdGhhdCB0aGUgSE1BVAo+IHB1Ymxpc2hlcyBpcyB1c2VmdWwgdG8gdGhlIGtlcm5lbCwgYnV0 IGl0J3Mgbm90IGNsZWFyIHRoYXQgdXNlcnNwYWNlCj4gbmVlZHMgdGhhdCB2cyBhIHJlbGF0aXZl 
IGluZGljYXRvciBmb3IgbWFraW5nIE5VTUEgbm9kZSBwcmVmZXJlbmNlCj4gZGVjaXNpb25zLgoK U29tZSBIUEMgdXNlcnMgd2lsbCBiZW5jaG1hcmsgdGhlIG1hY2hpbmUgdG8gZGlzY292ZXJ5IGFj dHVhbApwZXJmb3JtYW5jZSBudW1iZXJzIGFueXdheS4KSG93ZXZlciwgbW9zdCB1c2VycyB3b24n dCBkbyB0aGlzLiBUaGV5IHdpbGwgd2FudCB0byBrbm93IHJlbGF0aXZlCnBlcmZvcm1hbmNlIG9m IGRpZmZlcmVudCBub2Rlcy4gSWYgeW91IG5vcm1hbGl6ZSBITUFUIHZhbHVlcyBieSBkaXZpZGlu Zwp0aGVtIHdpdGggc3lzdGVtLVJBTSB2YWx1ZXMsIHRoYXQncyBsaWtlbHkgT0suIElmIHlvdSBq dXN0IHNheSAidGhhdApub2RlIGlzIGZhc3RlciB0aGFuIHN5c3RlbSBSQU0iLCBpdCdzIG5vdCBw cmVjaXNlIGVub3VnaC4KCkJyaWNlCgpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fXwpMaW51eC1udmRpbW0gbWFpbGluZyBsaXN0CkxpbnV4LW52ZGltbUBsaXN0 cy4wMS5vcmcKaHR0cHM6Ly9saXN0cy4wMS5vcmcvbWFpbG1hbi9saXN0aW5mby9saW51eC1udmRp bW0K From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751715AbdL0JKk (ORCPT ); Wed, 27 Dec 2017 04:10:40 -0500 Received: from mail2-relais-roc.national.inria.fr ([192.134.164.83]:10871 "EHLO mail2-relais-roc.national.inria.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751004AbdL0JKh (ORCPT ); Wed, 27 Dec 2017 04:10:37 -0500 X-IronPort-AV: E=Sophos;i="5.45,464,1508796000"; d="scan'208";a="306974074" Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT To: Dan Williams Cc: Ross Zwisler , Matthew Wilcox , Dave Hansen , Michal Hocko , "linux-kernel@vger.kernel.org" , "Anaczkowski, Lukasz" , "Box, David E" , "Kogut, Jaroslaw" , "Koss, Marcin" , "Koziej, Artur" , "Lahtinen, Joonas" , "Moore, Robert" , "Nachimuthu, Murugasamy" , "Odzioba, Lukasz" , "Rafael J. Wysocki" , "Rafael J. Wysocki" , "Schmauss, Erik" , "Verma, Vishal L" , "Zheng, Lv" , Andrew Morton , Balbir Singh , Jerome Glisse , John Hubbard , Len Brown , Tim Chen , devel@acpica.org, Linux ACPI , Linux MM , "linux-nvdimm@lists.01.org" , Linux API , linuxppc-dev References: <20171214021019.13579-1-ross.zwisler@linux.intel.com> <20171214130032.GK16951@dhcp22.suse.cz> <20171218203547.GA2366@linux.intel.com> <20171220181937.GB12236@bombadil.infradead.org> <2da89d31-27a3-34ab-2dbb-92403c8215ec@intel.com> <20171220211649.GA32200@bombadil.infradead.org> <20171220212408.GA8308@linux.intel.com> <20171220224105.GA27258@linux.intel.com> <39cbe02a-d309-443d-54c9-678a0799342d@gmail.com> From: Brice Goglin Message-ID: <71317994-af66-a1b2-4c7a-86a03253cf62@gmail.com> Date: Wed, 27 Dec 2017 10:10:34 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Le 22/12/2017 à 23:53, Dan Williams a écrit : > On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin wrote: >> Le 20/12/2017 à 23:41, Ross Zwisler a écrit : > [..] >> Hello >> >> I can confirm that HPC runtimes are going to use these patches (at least >> all runtimes that use hwloc for topology discovery, but that's the vast >> majority of HPC anyway). >> >> We really didn't like KNL exposing a hacky SLIT table [1]. We had to >> explicitly detect that specific crazy table to find out which NUMA nodes >> were local to which cores, and to find out which NUMA nodes were >> HBM/MCDRAM or DDR. And then we had to hide the SLIT values to the >> application because the reported latencies didn't match reality. Quite >> annoying. 
>> >> With Ross' patches, we can easily get what we need: >> * which NUMA nodes are local to which CPUs? /sys/devices/system/node/ >> can only report a single local node per CPU (doesn't work for KNL and >> upcoming architectures with HBM+DDR+...) >> * which NUMA nodes are slow/fast (for both bandwidth and latency) >> And we can still look at SLIT under /sys/devices/system/node if really >> needed. >> >> And of course having this in sysfs is much better than parsing ACPI >> tables that are only accessible to root :) > On this point, it's not clear to me that we should allow these sysfs > entries to be world readable. Given /proc/iomem now hides physical > address information from non-root we at least need to be careful not > to undo that with new sysfs HMAT attributes. Once you need to be root > for this info, is parsing binary HMAT vs sysfs a blocker for the HPC > use case? I don't think it would be a blocker. > Perhaps we can enlist /proc/iomem or a similar enumeration interface > to tell userspace the NUMA node and whether the kernel thinks it has > better or worse performance characteristics relative to base > system-RAM, i.e. new IORES_DESC_* values. I'm worried that if we start > publishing absolute numbers in sysfs userspace will default to looking > for specific magic numbers in sysfs vs asking the kernel for memory > that has performance characteristics relative to base "System RAM". In > other words the absolute performance information that the HMAT > publishes is useful to the kernel, but it's not clear that userspace > needs that vs a relative indicator for making NUMA node preference > decisions. Some HPC users will benchmark the machine to discovery actual performance numbers anyway. However, most users won't do this. They will want to know relative performance of different nodes. If you normalize HMAT values by dividing them with system-RAM values, that's likely OK. If you just say "that node is faster than system RAM", it's not precise enough. Brice From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id D467C6B0033 for ; Wed, 27 Dec 2017 04:10:37 -0500 (EST) Received: by mail-wr0-f199.google.com with SMTP id q4so5446495wre.14 for ; Wed, 27 Dec 2017 01:10:37 -0800 (PST) Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr. 
[192.134.164.83]) by mx.google.com with ESMTPS id h4si7342118wrh.59.2017.12.27.01.10.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Dec 2017 01:10:36 -0800 (PST) Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT References: <20171214021019.13579-1-ross.zwisler@linux.intel.com> <20171214130032.GK16951@dhcp22.suse.cz> <20171218203547.GA2366@linux.intel.com> <20171220181937.GB12236@bombadil.infradead.org> <2da89d31-27a3-34ab-2dbb-92403c8215ec@intel.com> <20171220211649.GA32200@bombadil.infradead.org> <20171220212408.GA8308@linux.intel.com> <20171220224105.GA27258@linux.intel.com> <39cbe02a-d309-443d-54c9-678a0799342d@gmail.com> From: Brice Goglin Message-ID: <71317994-af66-a1b2-4c7a-86a03253cf62@gmail.com> Date: Wed, 27 Dec 2017 10:10:34 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: owner-linux-mm@kvack.org List-ID: To: Dan Williams Cc: Ross Zwisler , Matthew Wilcox , Dave Hansen , Michal Hocko , "linux-kernel@vger.kernel.org" , "Anaczkowski, Lukasz" , "Box, David E" , "Kogut, Jaroslaw" , "Koss, Marcin" , "Koziej, Artur" , "Lahtinen, Joonas" , "Moore, Robert" , "Nachimuthu, Murugasamy" , "Odzioba, Lukasz" , "Rafael J. Wysocki" , "Rafael J. Wysocki" , "Schmauss, Erik" , "Verma, Vishal L" , "Zheng, Lv" , Andrew Morton , Balbir Singh , Jerome Glisse , John Hubbard , Len Brown , Tim Chen , devel@acpica.org, Linux ACPI , Linux MM , "linux-nvdimm@lists.01.org" , Linux API , linuxppc-dev Le 22/12/2017 A 23:53, Dan Williams a A(C)critA : > On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin wrote: >> Le 20/12/2017 A 23:41, Ross Zwisler a A(C)crit : > [..] >> Hello >> >> I can confirm that HPC runtimes are going to use these patches (at least >> all runtimes that use hwloc for topology discovery, but that's the vast >> majority of HPC anyway). >> >> We really didn't like KNL exposing a hacky SLIT table [1]. We had to >> explicitly detect that specific crazy table to find out which NUMA nodes >> were local to which cores, and to find out which NUMA nodes were >> HBM/MCDRAM or DDR. And then we had to hide the SLIT values to the >> application because the reported latencies didn't match reality. Quite >> annoying. >> >> With Ross' patches, we can easily get what we need: >> * which NUMA nodes are local to which CPUs? /sys/devices/system/node/ >> can only report a single local node per CPU (doesn't work for KNL and >> upcoming architectures with HBM+DDR+...) >> * which NUMA nodes are slow/fast (for both bandwidth and latency) >> And we can still look at SLIT under /sys/devices/system/node if really >> needed. >> >> And of course having this in sysfs is much better than parsing ACPI >> tables that are only accessible to root :) > On this point, it's not clear to me that we should allow these sysfs > entries to be world readable. Given /proc/iomem now hides physical > address information from non-root we at least need to be careful not > to undo that with new sysfs HMAT attributes. Once you need to be root > for this info, is parsing binary HMAT vs sysfs a blocker for the HPC > use case? I don't think it would be a blocker. > Perhaps we can enlist /proc/iomem or a similar enumeration interface > to tell userspace the NUMA node and whether the kernel thinks it has > better or worse performance characteristics relative to base > system-RAM, i.e. new IORES_DESC_* values. 
I'm worried that if we start > publishing absolute numbers in sysfs userspace will default to looking > for specific magic numbers in sysfs vs asking the kernel for memory > that has performance characteristics relative to base "System RAM". In > other words the absolute performance information that the HMAT > publishes is useful to the kernel, but it's not clear that userspace > needs that vs a relative indicator for making NUMA node preference > decisions. Some HPC users will benchmark the machine to discovery actual performance numbers anyway. However, most users won't do this. They will want to know relative performance of different nodes. If you normalize HMAT values by dividing them with system-RAM values, that's likely OK. If you just say "that node is faster than system RAM", it's not precise enough. Brice -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org