From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
From: Anshuman Khandual
To: Ross Zwisler, linux-kernel@vger.kernel.org
Cc: "Anaczkowski, Lukasz", "Box, David E", "Kogut, Jaroslaw", "Koss, Marcin",
 "Koziej, Artur", "Lahtinen, Joonas", "Moore, Robert",
 "Nachimuthu, Murugasamy", "Odzioba, Lukasz", "Rafael J. Wysocki",
 "Schmauss, Erik", "Verma, Vishal L", "Zheng, Lv", Andrew Morton,
 Balbir Singh, Brice Goglin, Dan Williams, Dave Hansen, Jerome Glisse,
 John Hubbard, Len Brown, Tim Chen, devel@acpica.org,
 linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
Date: Fri, 22 Dec 2017 08:39:41 +0530
Message-Id: <2d6420f7-0a95-adfe-7390-a2aea4385ab2@linux.vnet.ibm.com>
In-Reply-To: <20171214021019.13579-1-ross.zwisler@linux.intel.com>
References: <20171214021019.13579-1-ross.zwisler@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 12/14/2017 07:40 AM, Ross Zwisler wrote:
> ==== Quick Summary ====
>
> Platforms exist today which have multiple types of memory attached to a
> single CPU. These disparate memory ranges have some characteristics in
> common, such as CPU cache coherence, but they can have wide ranges of
> performance both in terms of latency and bandwidth.

Right.

>
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.

Right.

>
> With the current Linux code NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node. This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.

Right, but addressing that properly might require fundamental changes to the
NUMA representation. Plugging these memory ranges in as separate NUMA nodes,
identifying them through sysfs, and allocating from them with mbind() looks
like a short-term solution. Still, if we decide to go in this direction, a
sysfs interface (or something similar) is required to enumerate the memory
properties.
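To make that short-term option concrete, here is a minimal, hypothetical
sketch of binding a buffer to one such memory-only node with mbind(); the
node number (2) is made up for illustration and would really come from
whatever node id the firmware assigns to that memory range:

/* Hypothetical illustration of the mbind() short-term approach.
 * Build with: gcc -o bind_hbm bind_hbm.c -lnuma
 */
#include <numaif.h>      /* mbind(), MPOL_BIND */
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

#define HBM_NODE 2       /* assumed node id for the faster memory range */

int main(void)
{
	size_t len = 64UL << 20;        /* 64 MiB working buffer */
	unsigned long nodemask = 1UL << HBM_NODE;
	void *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Restrict the range to the chosen node before first touch, so the
	 * pages are actually allocated from that node's memory. */
	if (mbind(buf, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0)) {
		perror("mbind");
		return 1;
	}

	memset(buf, 0, len);            /* first touch allocates on HBM_NODE */
	return 0;
}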
> We solve this issue by providing userspace with performance information on
> individual memory ranges. This performance information is exposed via
> sysfs:
>
> # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
> mem_tgt2/firmware_id:1
> mem_tgt2/is_cached:0
> mem_tgt2/local_init/read_bw_MBps:40960
> mem_tgt2/local_init/read_lat_nsec:50
> mem_tgt2/local_init/write_bw_MBps:40960
> mem_tgt2/local_init/write_lat_nsec:50

I might have missed the discussion in earlier versions: why do we have this
kind of "source --> target" model? Will we enumerate the properties for
every possible "source --> target" pair on the system? Right now it shows
only bandwidth and latency properties; can it accommodate other properties
as well in the future? (A rough consumer sketch follows at the end of this
mail.)

>
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.

I presented a proposal for a NUMA redesign at the Plumbers Conference this
year, where various memory devices with different kinds of memory attributes
can be represented in the kernel and used explicitly from user space. Here
is the link to the proposal in case you are interested. The proposal is very
intrusive, and I do not yet have an RFC for it to discuss here.

https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf

The problem is that designing the sysfs interface for memory attribute
detection from user space, without first thinking about redesigning NUMA for
heterogeneous memory, may not be a good idea. Will look into this further.
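For what it's worth, here is a rough sketch of how a consumer could walk the
attributes shown in the example output above and pick the lowest-latency
target. The /sys/devices/system/hmat base path and the mem_tgt*/local_init
layout are assumptions taken from that example, not verified against the
patches:

/* Sketch: pick the mem_tgt with the lowest local read latency.
 * The sysfs base path and attribute names below are assumptions lifted
 * from the cover letter's example output. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define HMAT_BASE "/sys/devices/system/hmat"   /* assumed location */

static long read_attr(const char *tgt, const char *attr)
{
	char path[PATH_MAX];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), HMAT_BASE "/%s/local_init/%s", tgt, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	DIR *d = opendir(HMAT_BASE);
	struct dirent *de;
	char best[64] = "";
	long best_lat = LONG_MAX;

	if (!d) {
		perror(HMAT_BASE);
		return 1;
	}
	while ((de = readdir(d)) != NULL) {
		long lat;

		if (strncmp(de->d_name, "mem_tgt", 7))
			continue;
		lat = read_attr(de->d_name, "read_lat_nsec");
		if (lat >= 0 && lat < best_lat) {
			best_lat = lat;
			snprintf(best, sizeof(best), "%s", de->d_name);
		}
	}
	closedir(d);

	if (best[0])
		printf("fastest target: %s (read latency %ld ns)\n",
		       best, best_lat);
	return 0;
}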