From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail-ot0-x242.google.com (mail-ot0-x242.google.com [IPv6:2607:f8b0:4003:c0f::242])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by ml01.01.org (Postfix) with ESMTPS id AFE42222F4E37;
    Fri, 22 Dec 2017 15:52:52 -0800 (PST)
Received: by mail-ot0-x242.google.com with SMTP id p31so19749451ota.4;
    Fri, 22 Dec 2017 15:57:42 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20171222232231.GA26715@linux.intel.com>
References: <20171214130032.GK16951@dhcp22.suse.cz>
    <20171218203547.GA2366@linux.intel.com>
    <20171220181937.GB12236@bombadil.infradead.org>
    <2da89d31-27a3-34ab-2dbb-92403c8215ec@intel.com>
    <20171220211649.GA32200@bombadil.infradead.org>
    <20171220212408.GA8308@linux.intel.com>
    <20171220224105.GA27258@linux.intel.com>
    <39cbe02a-d309-443d-54c9-678a0799342d@gmail.com>
    <20171222232231.GA26715@linux.intel.com>
From: Dan Williams
Date: Fri, 22 Dec 2017 15:57:41 -0800
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
To: Ross Zwisler
Cc: Brice Goglin, Matthew Wilcox, Dave Hansen, Michal Hocko,
    "linux-kernel@vger.kernel.org", "Anaczkowski, Lukasz", "Box, David E",
    "Kogut, Jaroslaw", "Koss, Marcin", "Koziej, Artur", "Lahtinen, Joonas",
    "Moore, Robert", "Nachimuthu, Murugasamy", "Odzioba, Lukasz",
    "Rafael J. Wysocki", "Schmauss, Erik", "Verma, Vishal L", "Zheng, Lv",
    Andrew Morton, Balbir Singh, Jerome Glisse, John Hubbard, Len Brown,
    Tim Chen, devel@acpica.org, Linux ACPI, Linux MM,
    "linux-nvdimm@lists.01.org", Linux API, linuxppc-dev
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"

On Fri, Dec 22, 2017 at 3:22 PM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Fri, Dec 22, 2017 at 02:53:42PM -0800, Dan Williams wrote:
>> On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin <brice.goglin@gmail.com> wrote:
>> > On 20/12/2017 at 23:41, Ross Zwisler wrote:
>> [..]
>> > Hello
>> >
>> > I can confirm that HPC runtimes are going to use these patches (at least
>> > all runtimes that use hwloc for topology discovery, but that's the vast
>> > majority of HPC anyway).
>> >
>> > We really didn't like KNL exposing a hacky SLIT table [1]. We had to
>> > explicitly detect that specific crazy table to find out which NUMA nodes
>> > were local to which cores, and to find out which NUMA nodes were
>> > HBM/MCDRAM or DDR. And then we had to hide the SLIT values from the
>> > application because the reported latencies didn't match reality. Quite
>> > annoying.
>> >
>> > With Ross' patches, we can easily get what we need:
>> > * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
>> > can only report a single local node per CPU (doesn't work for KNL and
>> > upcoming architectures with HBM+DDR+...)
>> > * which NUMA nodes are slow/fast (for both bandwidth and latency)
>> > And we can still look at SLIT under /sys/devices/system/node if really
>> > needed.
>> >
>> > And of course having this in sysfs is much better than parsing ACPI
>> > tables that are only accessible to root :)
>>
>> On this point, it's not clear to me that we should allow these sysfs
>> entries to be world readable. Given /proc/iomem now hides physical
>> address information from non-root we at least need to be careful not
>> to undo that with new sysfs HMAT attributes.
>
> This enabling does not expose any physical addresses to userspace.  It only
> provides performance numbers from the HMAT and associates them with existing
> NUMA nodes.  Are you worried that exposing performance numbers to non-root
> users via sysfs poses a security risk?

It's an information disclosure, and it's not clear that we need to make
it to non-root processes.

I'm more worried about userspace growing dependencies on the absolute
numbers when those numbers can change from platform to platform.
Differentiated memory on one platform may be the common memory pool on
another.

To me this has parallels with storage device hinting, where
specifications like T10 have a complex enumeration of all the
performance hints that can be passed to the device, but the Linux
enabling effort aims for a sanitized set of relative hints that make
sense. It's more flexible if userspace specifies a relative intent
rather than an absolute performance target. Putting all the HMAT
information into sysfs gives userspace more information than it could
possibly do anything reasonable with, at least outside of specialized
apps that are hand tuned for a given hardware platform.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
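
To make the "relative intent versus absolute target" point in the reply concrete, here is a minimal sketch. It is not taken from the patch set: the function name `rank_nodes` and all latency figures are made up for illustration. It ranks nodes by latency within one platform, so the same value (300 ns here) can be the slow tier on one machine and the fast tier on another.

```python
def rank_nodes(latency_ns):
    """Map absolute per-node latencies to relative tiers (0 = fastest).

    latency_ns is a hypothetical {node_id: latency} mapping; nodes with
    equal latency share a tier.
    """
    tiers = sorted(set(latency_ns.values()))  # distinct values, fastest first
    return {node: tiers.index(lat) for node, lat in latency_ns.items()}


# Invented numbers for two different platforms.
platform_a = {0: 80, 1: 300}   # 300 ns is the slow, differentiated memory
platform_b = {0: 300, 1: 900}  # 300 ns is the common (fastest) pool

print(rank_nodes(platform_a))  # {0: 0, 1: 1}
print(rank_nodes(platform_b))  # {0: 0, 1: 1}
```

An application that hard-codes a threshold such as "below 100 ns is fast" would find no fast memory on the second platform, while one that asks for "the fastest tier available" behaves the same on both.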
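
On the first bullet in the quoted message (which NUMA nodes are local to which CPUs), the sketch below shows how a per-node `cpulist` string of the kind found under /sys/devices/system/node/nodeN/ can be inverted into a CPU-to-node map. The sample strings are invented, and treating an empty `cpulist` as a CPU-less memory node is an assumption of this example. It is only meant to illustrate why a memory-only node never shows up as local to any CPU in this view.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:  # empty string: assumed to mean a node with no CPUs
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def local_node_per_cpu(node_cpulists):
    """Invert {node: cpulist string} into {cpu: node}."""
    return {cpu: node
            for node, text in node_cpulists.items()
            for cpu in parse_cpulist(text)}


# Made-up contents for a two-node machine where node 1 is CPU-less memory.
sample = {0: "0-3\n", 1: "\n"}
print(local_node_per_cpu(sample))  # {0: 0, 1: 0, 2: 0, 3: 0}
```

Node 1 never appears in the result, which is the gap the quoted message describes for machines that pair a CPU-less fast-memory node with ordinary DDR.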