From: Jerome Glisse <j.glisse@gmail.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mhocko@suse.com, js1304@gmail.com, vbabka@suse.cz,
	mgorman@suse.de, minchan@kernel.org, akpm@linux-foundation.org,
	aneesh.kumar@linux.vnet.ibm.com
Subject: Re: [RFC 0/8] Define coherent device memory node
Date: Tue, 25 Oct 2016 11:21:17 -0400
Message-ID: <20161025152052.GA6131@gmail.com>
In-Reply-To: <24fce2e8-e2e9-a665-f2a0-b7902a337c5d@gmail.com>

On Tue, Oct 25, 2016 at 11:07:39PM +1100, Balbir Singh wrote:
> On 25/10/16 04:09, Jerome Glisse wrote:
> > On Mon, Oct 24, 2016 at 10:01:49AM +0530, Anshuman Khandual wrote:
> > 
> >> [...]
> > 
> >> 	Core kernel memory features like reclamation, evictions etc. might
> >> need to be restricted or modified on the coherent device memory node as
> >> they can be performance limiting. The RFC does not propose anything on this
> >> yet but it can be looked into later on. For now it just disables Auto NUMA
> >> for any VMA which has coherent device memory.
> >>
> >> 	Seamless integration of coherent device memory with system memory
> >> will enable various other features, some of which can be listed as follows.
> >>
> >> 	a. Seamless migrations between system RAM and the coherent memory
> >> 	b. Will have asynchronous and high throughput migrations
> >> 	c. Be able to allocate huge order pages from these memory regions
> >> 	d. Restrict allocations to a large extent to the tasks using the
> >> 	   device for workload acceleration
> >>
> >> 	Before concluding, will look into the reasons why the existing
> >> solutions don't work. There are two basic requirements which have to be
> >> satisfied before the coherent device memory can be integrated with core
> >> kernel seamlessly.
> >>
> >> 	a. PFN must have struct page
> >> 	b. Struct page must be able to be inside standard LRU lists
> >>
> >> 	The above two basic requirements discard the existing method of
> >> device memory representation approaches like these which then requires the
> >> need of creating a new framework.
> > 
> > I do not believe the LRU list is a hard requirement. Yes, when faulting in
> > a page inside the page cache, it is assumed it needs to be added to the LRU
> > list, but I think this can easily be worked around.
> > 
> > In HMM I am using ZONE_DEVICE, and because the memory is not accessible from
> > the CPU (not everyone is blessed with a decent system bus like CAPI, CCIX,
> > Gen-Z, ...), in my case a file-backed page must always be spawned first from
> > a regular page; once it is read from disk, I can migrate it to a GPU page.
> > 
> 
> I've not seen the HMM patchset, but read from disk will go to ZONE_DEVICE?
> Then get migrated?

Because in my case device memory is not accessible by anything except the device
(not entirely true, but for the sake of the design it is), any page read from disk
is first read into a regular page (from regular system memory). Only once it is
uptodate and in the page cache can it be migrated to a ZONE_DEVICE page.

So a read from disk uses an intermediary page. Write-back is much the same: I plan
on using a bounce page by leveraging the existing bio bounce infrastructure.
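
To make the ordering concrete, here is a minimal user-space sketch of the
flow described above. It is only a toy model, not kernel or HMM code: every
name in it (model_page, read_from_disk, migrate_to_device, write_back, the
LOC_* values and MODEL_PAGE_SIZE) is hypothetical and invented for
illustration, and plain memcpy() stands in for both disk I/O and the device
copy. It shows just the two rules: a page reaches device memory only after
it is uptodate in a regular page, and write-back goes through a bounce page
in system memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical model: where the page contents currently live. */
enum page_loc { LOC_NONE, LOC_SYSTEM, LOC_DEVICE };

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct model_page {
	enum page_loc loc;
	bool uptodate;
	bool in_page_cache;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Read path: the disk read always lands in a regular system-memory page. */
void read_from_disk(struct model_page *sys, const unsigned char *disk)
{
	assert(sys && disk);
	memcpy(sys->data, disk, MODEL_PAGE_SIZE);
	sys->loc = LOC_SYSTEM;
	sys->uptodate = true;
	sys->in_page_cache = true;
}

/*
 * Migration is refused unless the source is a regular page that is
 * uptodate and in the (modelled) page cache.
 */
bool migrate_to_device(struct model_page *sys, struct model_page *dev)
{
	if (sys->loc != LOC_SYSTEM || !sys->uptodate || !sys->in_page_cache)
		return false;

	memcpy(dev->data, sys->data, MODEL_PAGE_SIZE);
	dev->loc = LOC_DEVICE;
	dev->uptodate = true;
	dev->in_page_cache = true;

	/* The system page no longer backs this page cache entry. */
	sys->loc = LOC_NONE;
	sys->uptodate = false;
	sys->in_page_cache = false;
	return true;
}

/*
 * Write-back path: device contents are first copied into a bounce page
 * in system memory, and only the bounce page is written to disk.
 */
bool write_back(const struct model_page *dev, struct model_page *bounce,
		unsigned char *disk)
{
	if (dev->loc != LOC_DEVICE || !dev->uptodate)
		return false;

	memcpy(bounce->data, dev->data, MODEL_PAGE_SIZE);
	bounce->loc = LOC_SYSTEM;
	bounce->uptodate = true;

	memcpy(disk, bounce->data, MODEL_PAGE_SIZE);
	return true;
}
```

In this model, calling migrate_to_device() on a page that has not yet been
filled by read_from_disk() fails, which mirrors the constraint that nothing
is read from disk directly into a ZONE_DEVICE page.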

Cheers,
Jérôme
