From: Bob Liu <liubo95@huawei.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Jerome Glisse <jglisse@redhat.com>
Cc: <akpm@linux-foundation.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
David Nellans <dnellans@nvidia.com>,
Balbir Singh <bsingharora@gmail.com>,
majiuyue <majiuyue@huawei.com>,
"xieyisheng (A)" <xieyisheng1@huawei.com>
Subject: Re: [HMM-v25 19/19] mm/hmm: add new helper to hotplug CDM memory region v3
Date: Wed, 6 Sep 2017 09:25:36 +0800
Message-ID: <0bc5047d-d27c-65b6-acab-921263e715c8@huawei.com>
In-Reply-To: <20170905185414.GB24073@linux.intel.com>
On 2017/9/6 2:54, Ross Zwisler wrote:
> On Mon, Sep 04, 2017 at 10:38:27PM -0400, Jerome Glisse wrote:
>> On Tue, Sep 05, 2017 at 09:13:24AM +0800, Bob Liu wrote:
>>> On 2017/9/4 23:51, Jerome Glisse wrote:
>>>> On Mon, Sep 04, 2017 at 11:09:14AM +0800, Bob Liu wrote:
>>>>> On 2017/8/17 8:05, Jérôme Glisse wrote:
>>>>>> Unlike unaddressable memory, coherent device memory has a real
>>>>>> resource associated with it on the system (as CPU can address
>>>>>> it). Add a new helper to hotplug such memory within the HMM
>>>>>> framework.
>>>>>>
>>>>>
>>>>> I have a new question: coherent device (e.g. CCIX) memory is likely to be
>>>>> reported to the OS through ACPI and recognized as a NUMA memory node.
>>>>> How, then, can that memory be captured and managed by the HMM framework?
>>>>>
>>>>
>>>> The only platform that has such memory today is powerpc, and it is not
>>>> reported as regular memory by the firmware, which is why they need this helper.
>>>>
>>>> I don't think anyone has defined anything yet for x86 and ACPI. As this is
>>>
>>> Not yet, but the ACPI spec now has the Heterogeneous Memory Attribute
>>> Table (HMAT), defined in ACPI 6.2.
>>> The HMAT can cover CPU-addressable memory types (though not
>>> non-cache-coherent on-device memory).
>>>
>>> Ross from Intel has already done some work on this, see:
>>> https://lwn.net/Articles/724562/
>>>
>>> arm64 also supports ACPI, and there will likely be more devices of this kind
>>> once CCIX is out (which should be very soon if it is on schedule).
>>
>> HMAT is not for the same thing. AFAIK HMAT is for deep memory "hierarchies", i.e.
>> when you have several kinds of memory, each with different characteristics:
>> - HBM: very fast (latency) and high bandwidth, non-persistent, somewhat
>> small (i.e. a few gigabytes)
>> - Persistent memory: slower (both latency and bandwidth), big (terabytes)
>> - DDR (good old memory): characteristics are between HBM and persistent memory
>>
>> So AFAICT this has nothing to do with what HMM is for, i.e. device memory. Note
>> that device memory can itself have a memory hierarchy (HBM, GDDR and
>> maybe even persistent memory).
>>
>>>> memory on a PCIE-like interface, I don't expect it to be reported as a NUMA
>>>> memory node but as an I/O range like any regular PCIE resource. The device
>>>> driver would then figure out, through capability flags, whether the link
>>>> between the device and CPU is CCIX capable; if so, it can use this helper to
>>>> hotplug it as device memory.
>>>>
>>>
>>> From my point of view, cache coherent device memory will become popular soon
>>> and will be reported through ACPI/UEFI. Extending NUMA policy still sounds
>>> more reasonable to me.
>>
>> Cache coherent devices will be reported through the standard mechanisms defined
>> by the bus standard they are using. To my knowledge all these standards are
>> either built on top of PCIE or are similar to PCIE.
>>
>> It is true that on many platforms PCIE resources are managed/initialized by the
>> BIOS (UEFI), but that is platform specific. In some cases we reprogram what the
>> BIOS picked.
>>
>> So like I was saying, I don't expect the BIOS/UEFI to report device memory as
>> regular memory. It will be reported as a regular PCIE resource and then the
>> device driver will be able to determine through some flags whether the link
>> between the CPU(s) and the device is cache coherent or not. At that point the
>> device driver can register it with the HMM helper.
>>
>>
>> The whole NUMA discussion has happened several times in the past; I suggest
>> looking through the mm list archive for it. But it was ruled out for several
>> reasons. Off the top of my head:
>> - people hate CPU-less nodes, and device memory is inherently CPU-less
>
> With the introduction of the HMAT in ACPI 6.2 one of the things that was added
> was the ability to have an ACPI proximity domain that isn't associated with a
> CPU. This can be seen in the changes in the text of the "Proximity Domain"
> field in table 5-73 which describes the "Memory Affinity Structure". One of
> the major features of the HMAT was the separation of "Initiator" proximity
> domains (CPUs, devices that initiate memory transfers), and "target" proximity
> domains (memory regions, be they attached to a CPU or some other device).
>
> ACPI proximity domains map directly to Linux NUMA nodes, so I think we're
> already in a place where we have to support CPU-less NUMA nodes.
>
>> - device drivers want total control over memory and thus to be isolated from
>> mm mechanisms, and doing all those special cases was not welcome
>
> I agree that the kernel doesn't have enough information to be able to
> accurately handle all the use cases for the various types of heterogeneous
> memory. The goal of my HMAT enabling is to allow that memory to be reserved
> from kernel use via the "Reservation Hint" in the HMAT's Memory Subsystem
> Address Range Structure, then provide userspace with enough information to be
> able to distinguish between the various types of memory in the system so it
> can allocate & utilize it appropriately.
>
Does this mean a userspace memory management library is required to handle all of
alloc/free/defragment, etc.?
If so, how would the virtual <-> physical address mapping be handled from userspace?
--
Regards,
Bob Liu
>> - existing NUMA migration mechanisms are ill-suited for this memory, as
>> access by the device to the memory is unknown to core mm and there
>> is no easy way to report it or track it (this kind of depends on the
>> platform and hardware)
>>
>> I am likely missing other big points.
>>
>> Cheers,
>> Jérôme