From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
References: <20171214021019.13579-1-ross.zwisler@linux.intel.com> <2d6420f7-0a95-adfe-7390-a2aea4385ab2@linux.vnet.ibm.com> <20171222223154.GC25711@linux.intel.com>
From: "Liubo(OS Lab)"
Message-ID: <794c531c-eb96-b6ef-97a1-64c81779dc72@huawei.com>
Date: Mon, 25 Dec 2017 10:05:01 +0800
MIME-Version: 1.0
In-Reply-To: <20171222223154.GC25711@linux.intel.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
To: Ross Zwisler, Anshuman Khandual
Cc: linux-kernel@vger.kernel.org, "Anaczkowski, Lukasz", "Box, David E", "Kogut, Jaroslaw", "Koss, Marcin", "Koziej, Artur", "Lahtinen, Joonas", "Moore, Robert", "Nachimuthu, Murugasamy", "Odzioba, Lukasz", "Rafael J. Wysocki", "Rafael J. Wysocki", "Schmauss, Erik", "Verma, Vishal L", "Zheng, Lv", Andrew Morton, Balbir Singh, Brice Goglin, Dan Williams, Dave Hansen, Jerome Glisse, John Hubbard, Len Brown, Tim Chen, linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
List-ID:

On 2017/12/23 6:31, Ross Zwisler wrote:
> On Fri, Dec 22, 2017 at 08:39:41AM +0530, Anshuman Khandual wrote:
>> On 12/14/2017 07:40 AM, Ross Zwisler wrote:
> <>
>>> We solve this issue by providing userspace with performance information on
>>> individual memory ranges. This performance information is exposed via
>>> sysfs:
>>>
>>>   # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
>>>   mem_tgt2/firmware_id:1
>>>   mem_tgt2/is_cached:0
>>>   mem_tgt2/local_init/read_bw_MBps:40960
>>>   mem_tgt2/local_init/read_lat_nsec:50
>>>   mem_tgt2/local_init/write_bw_MBps:40960
>>>   mem_tgt2/local_init/write_lat_nsec:50
> <>
>> We will enlist properties for all possible "source --> target" on the system?
>
> Nope, just 'local' initiator/target pairs. I talk about the reasoning for
> this in the cover letter for patch 3:
>
> https://lists.01.org/pipermail/linux-nvdimm/2017-December/013574.html
>
>> Right now it shows only bandwidth and latency properties, can it accommodate
>> other properties as well in future ?
>
> We also have an 'is_cached' attribute for the memory targets if they are
> involved in a caching hierarchy, but right now those are all the things we
> expose. We can potentially expose whatever we want that is present in the
> HMAT, but those seemed like a good start.
>
> I noticed that in your presentation you had some other examples of attributes
> you cared about:
>
>  * reliability
>  * power consumption
>  * density
>
> The HMAT doesn't provide this sort of information at present, but we
> could/would add them to sysfs if the HMAT ever grew support for them.
>
>>> This allows applications to easily find the memory that they want to use.
>>> We expect that the existing NUMA APIs will be enhanced to use this new
>>> information so that applications can continue to use them to select their
>>> desired memory.
>>
>> I had presented a proposal for NUMA redesign in the Plumbers Conference this
>> year where various memory devices with different kind of memory attributes
>> can be represented in the kernel and be used explicitly from the user space.
>> Here is the link to the proposal if you feel interested. The proposal is
>> very intrusive and also I dont have a RFC for it yet for discussion here.
>>
>> https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf
>>
>> Problem is, designing the sysfs interface for memory attribute detection
>> from user space without first thinking about redesigning the NUMA for
>> heterogeneous memory may not be a good idea. Will look into this further.
>
> I took another look at your presentation, and overall I think that if/when a
> NUMA redesign like this takes place ACPI systems with HMAT tables will be able
> to participate. But I think we are probably a ways away from that, and like I

I'm afraid not: cache-coherent buses such as CCIX and OpenCAPI are coming out
soon, not to mention System-on-Chip designs that already link DDR, HBM, CPUs
and accelerators over an internal bus.

> said in my previous mail ACPI systems with memory-only NUMA nodes are going to
> exist and need to be supported with the current NUMA scheme. Hence I don't

And it is not only memory-only nodes: accelerators can also act as a master,
like a CPU.

> think that this patch series conflicts with your proposal.

I don't see a conflict either, but perhaps we should think about a longer-term
solution that covers more situations/platforms.
Anshuman's proposal is a really good starting point for us.

Cheers,
Bob Liu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
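
For readers who want to experiment with the attributes quoted in this thread, here is a minimal sketch of how userspace might consume them. It assumes the "path:value" lines printed by the `grep .` command in the quoted cover letter and the attribute names shown there (read_bw_MBps, read_lat_nsec and so on). The function names parse_grep_output and pick_fastest_read are hypothetical helpers, not part of the patch series, and the sysfs layout is only what v3 of the series proposes, so it may not match what any given kernel exposes.

```python
from typing import Dict, Optional

# Sample copied from the `grep .` output quoted in the thread.
SAMPLE = """\
mem_tgt2/firmware_id:1
mem_tgt2/is_cached:0
mem_tgt2/local_init/read_bw_MBps:40960
mem_tgt2/local_init/read_lat_nsec:50
mem_tgt2/local_init/write_bw_MBps:40960
mem_tgt2/local_init/write_lat_nsec:50
"""


def parse_grep_output(text: str) -> Dict[str, Dict[str, int]]:
    """Turn "target/attr:value" lines into {target: {attr: value}}.

    Hypothetical helper: assumes every value is a decimal integer,
    which holds for the sample in the thread.
    """
    targets: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # The value follows the last colon; the path precedes it.
        path, _, value = line.rpartition(":")
        # The first path component names the memory target.
        target, _, attr = path.partition("/")
        targets.setdefault(target, {})[attr] = int(value)
    return targets


def pick_fastest_read(targets: Dict[str, Dict[str, int]]) -> Optional[str]:
    """Return the target with the highest local read bandwidth, if any."""
    key = "local_init/read_bw_MBps"
    candidates = {name: attrs[key] for name, attrs in targets.items() if key in attrs}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    parsed = parse_grep_output(SAMPLE)
    print(pick_fastest_read(parsed), parsed["mem_tgt2"]["local_init/read_lat_nsec"])
```

On a system that actually carried the proposed interface, the same parser could be fed the output of the grep command instead of SAMPLE. Only local initiator/target pairs would appear, as Ross notes in the quoted text.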