From: Dan Williams <dan.j.williams@intel.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Bob Liu <liubo95@huawei.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
David Nellans <dnellans@nvidia.com>,
Balbir Singh <bsingharora@gmail.com>,
Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5
Date: Thu, 20 Jul 2017 20:48:20 -0700 [thread overview]
Message-ID: <CAPcyv4jJraGPW214xJ+wU3G=88UUP45YiA6hV5_NvNZSNB4qGA@mail.gmail.com> (raw)
In-Reply-To: <20170721014106.GB25991@redhat.com>
On Thu, Jul 20, 2017 at 6:41 PM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Fri, Jul 21, 2017 at 09:15:29AM +0800, Bob Liu wrote:
>> On 2017/7/20 23:03, Jerome Glisse wrote:
>> > On Wed, Jul 19, 2017 at 05:09:04PM +0800, Bob Liu wrote:
>> >> On 2017/7/19 10:25, Jerome Glisse wrote:
>> >>> On Wed, Jul 19, 2017 at 09:46:10AM +0800, Bob Liu wrote:
>> >>>> On 2017/7/18 23:38, Jerome Glisse wrote:
>> >>>>> On Tue, Jul 18, 2017 at 11:26:51AM +0800, Bob Liu wrote:
>> >>>>>> On 2017/7/14 5:15, Jérôme Glisse wrote:
>
> [...]
>
>> >> Then it's more like replace the numa node solution(CDM) with ZONE_DEVICE
>> >> (type MEMORY_DEVICE_PUBLIC). But the problem is the same, e.g how to make
>> >> sure the device memory say HBM won't be occupied by normal CPU allocation.
>> >> Things will be more complex if there are multi GPU connected by nvlink
>> >> (also cache coherent) in a system, each GPU has their own HBM.
>> >>
>> >> How to decide allocate physical memory from local HBM/DDR or remote HBM/
>> >> DDR?
>> >>
>> >> If using numa(CDM) approach there are NUMA mempolicy and autonuma mechanism
>> >> at least.
>> >
>> > NUMA is not as easy as you think. First like i said we want the device
>> > memory to be isolated from most existing mm mechanism. Because memory
>> > is unreliable and also because device might need to be able to evict
>> > memory to make contiguous physical memory allocation for graphics.
>> >
>>
>> Right, but we need isolation any way.
>> For hmm-cdm, the isolation is not adding device memory to lru list, and many
>> if (is_device_public_page(page)) ...
>>
>> But how to evict device memory?
>
> What you mean by evict ? Device driver can evict whenever they see the need
> to do so. CPU page fault will evict too. Process exit or munmap() will free
> the device memory.
>
> Are you refering to evict in the sense of memory reclaim under pressure ?
>
> So the way it flows for memory pressure is that if device driver want to
> make room it can evict stuff to system memory and if there is not enough
> system memory than thing get reclaim as usual before device driver can
> make progress on device memory reclaim.
>
>
>> > Second device driver are not integrated that closely within mm and the
>> > scheduler kernel code to allow to efficiently plug in device access
>> > notification to page (ie to update struct page so that numa worker
>> > thread can migrate memory base on accurate informations).
>> >
>> > Third it can be hard to decide who win between CPU and device access
>> > when it comes to updating thing like last CPU id.
>> >
>> > Fourth there is no such thing like device id ie equivalent of CPU id.
>> > If we were to add something the CPU id field in flags of struct page
>> > would not be big enough so this can have repercusion on struct page
>> > size. This is not an easy sell.
>> >
>> > They are other issues i can't think of right now. I think for now it
>>
>> My opinion is most of the issues are the same no matter use CDM or HMM-CDM.
>> I just care about a more complete solution no matter CDM,HMM-CDM or other ways.
>> HMM or HMM-CDM depends on device driver, but haven't see a public/full driver to
>> demonstrate the whole solution works fine.
>
> I am working with NVidia close source driver team to make sure that it works
> well for them. I am also working on nouveau open source driver for same NVidia
> hardware thought it will be of less use as what is missing there is a solid
> open source userspace to leverage this. Nonetheless open source driver are in
> the work.
Can you point to the nouveau patches? I still find these HMM patches
un-reviewable without an upstream consumer.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-07-21 3:48 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-13 21:15 [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5 Jérôme Glisse
2017-07-13 21:15 ` [PATCH 1/6] mm/zone-device: rename DEVICE_PUBLIC to DEVICE_HOST Jérôme Glisse
2017-07-17 9:09 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 2/6] mm/device-public-memory: device memory cache coherent with CPU v4 Jérôme Glisse
2017-07-13 23:01 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 3/6] mm/hmm: add new helper to hotplug CDM memory region v3 Jérôme Glisse
2017-07-13 21:15 ` [PATCH 4/6] mm/memcontrol: allow to uncharge page without using page->lru field Jérôme Glisse
2017-07-17 9:10 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 5/6] mm/memcontrol: support MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_PUBLIC v3 Jérôme Glisse
2017-07-17 9:15 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 6/6] mm/hmm: documents how device memory is accounted in rss and memcg Jérôme Glisse
2017-07-14 13:26 ` Michal Hocko
2017-07-18 3:26 ` [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5 Bob Liu
2017-07-18 15:38 ` Jerome Glisse
2017-07-19 1:46 ` Bob Liu
2017-07-19 2:25 ` Jerome Glisse
2017-07-19 9:09 ` Bob Liu
2017-07-20 15:03 ` Jerome Glisse
2017-07-21 1:15 ` Bob Liu
2017-07-21 1:41 ` Jerome Glisse
2017-07-21 2:10 ` Bob Liu
2017-07-21 12:01 ` Bob Liu
2017-07-21 15:21 ` Jerome Glisse
2017-07-21 3:48 ` Dan Williams [this message]
2017-07-21 15:22 ` Jerome Glisse
2017-09-05 19:36 ` Jerome Glisse
2017-09-09 23:22 ` Bob Liu
2017-09-11 23:36 ` Jerome Glisse
2017-09-12 1:02 ` Bob Liu
2017-09-12 16:17 ` Jerome Glisse
2017-09-26 9:56 ` Bob Liu
2017-09-26 16:16 ` Jerome Glisse
2017-09-30 2:57 ` Bob Liu
2017-09-30 22:49 ` Jerome Glisse
2017-10-11 13:15 ` Bob Liu
2017-10-12 15:37 ` Jerome Glisse
2017-11-16 2:10 ` chet l
2017-11-16 2:44 ` Jerome Glisse
2017-11-16 3:23 ` chetan L
2017-11-16 3:29 ` chetan L
2017-11-16 21:29 ` Jerome Glisse
2017-11-16 22:41 ` chetan L
2017-11-16 23:11 ` Jerome Glisse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPcyv4jJraGPW214xJ+wU3G=88UUP45YiA6hV5_NvNZSNB4qGA@mail.gmail.com' \
--to=dan.j.williams@intel.com \
--cc=bsingharora@gmail.com \
--cc=dnellans@nvidia.com \
--cc=jglisse@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liubo95@huawei.com \
--cc=mhocko@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).