linux-kernel.vger.kernel.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Bob Liu <liubo95@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
	David Nellans <dnellans@nvidia.com>,
	Balbir Singh <bsingharora@gmail.com>,
	majiuyue <majiuyue@huawei.com>,
	"xieyisheng (A)" <xieyisheng1@huawei.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [HMM-v25 19/19] mm/hmm: add new helper to hotplug CDM memory region v3
Date: Tue, 5 Sep 2017 09:18:55 -0700
Message-ID: <CAPcyv4iAspsatNmv=z-jAsTycwPrkh8XsWENyBOL9-1WuhGQWw@mail.gmail.com>
In-Reply-To: <20170905135017.GA19397@redhat.com>

On Tue, Sep 5, 2017 at 6:50 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Tue, Sep 05, 2017 at 11:50:57AM +0800, Bob Liu wrote:
>> On 2017/9/5 10:38, Jerome Glisse wrote:
>> > On Tue, Sep 05, 2017 at 09:13:24AM +0800, Bob Liu wrote:
>> >> On 2017/9/4 23:51, Jerome Glisse wrote:
>> >>> On Mon, Sep 04, 2017 at 11:09:14AM +0800, Bob Liu wrote:
>> >>>> On 2017/8/17 8:05, Jérôme Glisse wrote:
>> >>>>> Unlike unaddressable memory, coherent device memory has a real
>> >>>>> resource associated with it on the system (as CPU can address
>> >>>>> it). Add a new helper to hotplug such memory within the HMM
>> >>>>> framework.
>> >>>>>
>> >>>>
>> >>>> Got a new question: coherent device (e.g. CCIX) memory is likely to be reported
>> >>>> to the OS through ACPI and recognized as a NUMA memory node.
>> >>>> Then how can that memory be captured and managed by the HMM framework?
>> >>>>
>> >>>
>> >>> The only platform that has such memory today is powerpc, and it is not reported
>> >>> as regular memory by the firmware, which is why they need this helper.
>> >>>
>> >>> I don't think anyone has defined anything yet for x86 and acpi. As this is
>> >>
>> >> Not yet, but the ACPI spec now has the Heterogeneous Memory Attribute
>> >> Table (HMAT) defined in ACPI 6.2.
>> >> The HMAT can cover CPU-addressable memory types (though not non-cache
>> >> coherent on-device memory).
>> >>
>> >> Ross from Intel has already done some work on this, see:
>> >> https://lwn.net/Articles/724562/
>> >>
>> >> arm64 supports ACPI too, and there will likely be more devices of this kind
>> >> when CCIX is out (should be very soon if on schedule).
>> >
>> > HMAT is not for the same thing. AFAIK HMAT is for deep memory "hierarchies", ie
>> > when you have several kinds of memory each with different characteristics:
>> >   - HBM: very fast (latency) and high bandwidth, non persistent, somewhat
>> >     small (ie a few gigabytes)
>> >   - Persistent memory: slower (both latency and bandwidth), big (terabytes)
>> >   - DDR (good old memory): characteristics are between HBM and persistent
>> >     memory
>> >
>>
>> Okay, then how does the kernel handle the situation of several kinds of memory, each
>> with different characteristics?
>> Does anyone have a suggestion?  I thought HMM could do this.
>> NUMA policy/node distance is good but perhaps requires some extension, e.g. a HBM node
>> can't be swapped and can't accept DDR fallback allocations.
>
> I don't think there is any consensus on this. I put forward the idea that NUMA
> needs to be extended, as with a deep hierarchy it is not only the distance between
> two nodes that matters but also other factors like persistence, bandwidth, latency ...
>
>
>> > So AFAICT this has nothing to do with what HMM is for, ie device memory. Note
>> > that device memory can have a memory hierarchy of its own (HBM, GDDR and
>> > maybe even persistent memory).
>> >
>>
>> This looks like a subset of HMAT when the CPU can address device memory directly in a cache-coherent way.
>
> It is not, it is much more complex than that. The Linux kernel has no idea what is
> going on on a device and thus does not have any useful information to make proper
> decisions regarding device memory. Here device means a real device, ie something with
> processing capability, not something like HBM or persistent memory, even if the
> latter is associated with a struct device inside the Linux kernel.
>
>>
>>
>> >>> memory on a PCIE-like interface, i don't expect it to be reported as a NUMA
>> >>> memory node but as an io range like any regular PCIE resource. The device driver
>> >>> would then figure out through capability flags whether the link between the
>> >>> device and the CPU is CCIX capable; if so, it can use this helper to hotplug it
>> >>> as device memory.
>> >>>
>> >>
>> >> From my point of view, cache coherent device memory will become popular soon and
>> >> be reported through ACPI/UEFI. Extending NUMA policy still sounds more reasonable
>> >> to me.
>> >
>> > Cache coherent devices will be reported through the standard mechanisms defined by
>> > the bus standard they are using. To my knowledge all these standards are either
>> > on top of PCIE or are similar to PCIE.
>> >
>> > It is true that on many platforms PCIE resources are managed/initialized by the
>> > BIOS (UEFI), but that is platform specific. In some cases we reprogram what the
>> > BIOS picked.
>> >
>> > So like i was saying i don't expect the BIOS/UEFI to report device memory as
>>
>> But it's happening.
>> In my understanding, that's why HMAT was introduced.
>> For reporting device memory as regular memory(with different characteristics).
>
> That is not my understanding, but only Intel can confirm. HMAT was introduced
> for things like HBM or persistent memory, which i do not consider device
> memory. Sure, persistent memory is assigned a device struct because it is easier
> for integration with the block system, i assume. But that does not make it a
> device in my view. For me a device is a piece of hardware that has some
> processing capabilities (network adapter, sound card, GPU, ...)
>
> But we can argue about semantics and what a device is. For all intents and purposes,
> a device in the HMM context is some piece of hardware with processing capabilities and
> local device memory.

I would say that device memory at its base-level is a memory range
whose availability is dependent on a device-driver. HMM layers some
additional functionality on top, but ZONE_DEVICE should only be seen
as the device-driver controlled lifetime and not conflated with the
incremental HMM functionality.

HMAT simply allows you to associate a memory range with a numa-node /
proximity-domain number that represents a set of performance / feature
characteristics.
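
To make the association concrete, here is a rough userspace sketch of that
idea: a memory range points at a proximity-domain number, and the domain
number indexes a set of characteristics. Every name here (struct pd_attrs,
struct mem_range, attrs_for_addr) is made up for illustration; this is
neither the ACPI table layout nor any kernel interface, and the attribute
fields are only examples of what a domain might describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: characteristics that one proximity domain stands for. */
struct pd_attrs {
	uint32_t read_latency_ns;
	uint32_t bandwidth_mbps;
	int persistent;
};

/* Hypothetical: a physical address range tagged with a domain number. */
struct mem_range {
	uint64_t base;
	uint64_t size;
	uint32_t proximity_domain;
};

/*
 * Return the attributes of the domain whose range covers addr, or NULL
 * if no range covers it or the range names an unknown domain.
 */
static const struct pd_attrs *
attrs_for_addr(const struct mem_range *ranges, size_t nr_ranges,
	       const struct pd_attrs *domains, size_t nr_domains,
	       uint64_t addr)
{
	assert(ranges != NULL && domains != NULL);

	for (size_t i = 0; i < nr_ranges; i++) {
		const struct mem_range *r = &ranges[i];

		/* addr - base cannot underflow once addr >= base holds. */
		if (addr >= r->base && addr - r->base < r->size) {
			if (r->proximity_domain >= nr_domains)
				return NULL;
			return &domains[r->proximity_domain];
		}
	}
	return NULL;
}
```

Note that nothing in this lookup says who controls the lifetime of the
range; that is the separate, driver-side question that ZONE_DEVICE answers.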
