From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 22 Dec 2017 16:22:31 -0700
From: Ross Zwisler
To: Dan Williams
Cc: Brice Goglin, Ross Zwisler, Matthew Wilcox, Dave Hansen, Michal Hocko,
 "linux-kernel@vger.kernel.org", "Anaczkowski, Lukasz", "Box, David E",
 "Kogut, Jaroslaw", "Koss, Marcin", "Koziej, Artur", "Lahtinen, Joonas",
 "Moore, Robert", "Nachimuthu, Murugasamy", "Odzioba, Lukasz",
 "Rafael J. Wysocki", "Rafael J. Wysocki", "Schmauss, Erik",
 "Verma, Vishal L", "Zheng, Lv", Andrew Morton, Balbir Singh,
 Jerome Glisse, John Hubbard, Len Brown, Tim Chen, devel@acpica.org,
 Linux ACPI, Linux MM, "linux-nvdimm@lists.01.org", Linux API,
 linuxppc-dev
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Message-ID: <20171222232231.GA26715@linux.intel.com>
In-Reply-To: CAPcyv4j9shdJFrvADa=qW4L-jPJJ4S_TJc_c=aRoW3EmSCCChQ@mail.gmail.com
References: <20171214130032.GK16951@dhcp22.suse.cz>
 <20171218203547.GA2366@linux.intel.com>
 <20171220181937.GB12236@bombadil.infradead.org>
 <2da89d31-27a3-34ab-2dbb-92403c8215ec@intel.com>
 <20171220211649.GA32200@bombadil.infradead.org>
 <20171220212408.GA8308@linux.intel.com>
 <20171220224105.GA27258@linux.intel.com>
 <39cbe02a-d309-443d-54c9-678a0799342d@gmail.com>
User-Agent: Mutt/1.9.1 (2017-09-22)

On Fri, Dec 22, 2017 at 02:53:42PM -0800, Dan Williams wrote:
> On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin wrote:
> > Le 20/12/2017 à 23:41, Ross Zwisler a écrit :
> [..]
> > Hello
> >
> > I can confirm that HPC runtimes are going to use these patches (at least
> > all runtimes that use hwloc for topology discovery, but that's the vast
> > majority of HPC anyway).
> >
> > We really didn't like KNL exposing a hacky SLIT table [1]. We had to
> > explicitly detect that specific crazy table to find out which NUMA nodes
> > were local to which cores, and to find out which NUMA nodes were
> > HBM/MCDRAM or DDR. And then we had to hide the SLIT values to the
> > application because the reported latencies didn't match reality. Quite
> > annoying.
> >
> > With Ross' patches, we can easily get what we need:
> > * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
> > can only report a single local node per CPU (doesn't work for KNL and
> > upcoming architectures with HBM+DDR+...)
> > * which NUMA nodes are slow/fast (for both bandwidth and latency)
> > And we can still look at SLIT under /sys/devices/system/node if really
> > needed.
> >
> > And of course having this in sysfs is much better than parsing ACPI
> > tables that are only accessible to root :)
>
> On this point, it's not clear to me that we should allow these sysfs
> entries to be world readable. Given /proc/iomem now hides physical
> address information from non-root we at least need to be careful not
> to undo that with new sysfs HMAT attributes.

This enabling does not expose any physical addresses to userspace. It only
provides performance numbers from the HMAT and associates them with existing
NUMA nodes. Are you worried that exposing performance numbers to non-root
users via sysfs poses a security risk?

> Once you need to be root for this info, is parsing binary HMAT vs sysfs a
> blocker for the HPC use case?
>
> Perhaps we can enlist /proc/iomem or a similar enumeration interface
> to tell userspace the NUMA node and whether the kernel thinks it has
> better or worse performance characteristics relative to base
> system-RAM, i.e. new IORES_DESC_* values. I'm worried that if we start
> publishing absolute numbers in sysfs userspace will default to looking
> for specific magic numbers in sysfs vs asking the kernel for memory
> that has performance characteristics relative to base "System RAM". In
> other words the absolute performance information that the HMAT
> publishes is useful to the kernel, but it's not clear that userspace
> needs that vs a relative indicator for making NUMA node preference
> decisions.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
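
To make the two positions in this thread a little more concrete, here is a
minimal Python sketch, not taken from the patch series. The first two
functions read what /sys/devices/system/node/ already exposes today (the
per-node `cpulist` and the SLIT-derived `distance` files). The third,
`relative_to_base`, is a hypothetical helper illustrating the "relative
indicator" idea: it reduces absolute per-node numbers to better/same/worse
against a base node. The function names and the latency values in the usage
note are assumptions for illustration only; no HMAT sysfs path is assumed,
since the attribute layout is not given in this message.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # Memory-only nodes (e.g. HBM/MCDRAM) have an empty cpulist.
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_nodes(root="/sys/devices/system/node"):
    """Return {node_id: {"cpus": [...], "distance": [...]}} from existing sysfs files."""
    nodes = {}
    for node_dir in sorted(Path(root).glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        nodes[node_id] = {
            "cpus": parse_cpulist((node_dir / "cpulist").read_text()),
            "distance": [int(x) for x in (node_dir / "distance").read_text().split()],
        }
    return nodes


def relative_to_base(latency_ns, base_node):
    """Hypothetical: classify each node's latency relative to the base node."""
    base = latency_ns[base_node]
    result = {}
    for node, value in latency_ns.items():
        if value < base:
            result[node] = "better"
        elif value > base:
            result[node] = "worse"
        else:
            result[node] = "same"
    return result
```

With made-up latencies, `relative_to_base({0: 100, 1: 60, 2: 300}, base_node=0)`
classifies node 1 as "better" and node 2 as "worse" than node 0. A runtime
that only ever sees this relative view has no absolute numbers to hard-code,
which is the concern raised above; the trade-off is that it also cannot
compare bandwidth and latency independently unless each gets its own
indicator.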