From: Jerome Glisse <j.glisse@gmail.com>
To: Christoph Lameter <cl@linux.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	joro@8bytes.org, Mel Gorman <mgorman@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>,
	Cameron Buschardt <cabuschardt@nvidia.com>,
	Arvind Gopalakrishnan <arvindg@nvidia.com>,
	Shachar Raindel <raindel@mellanox.com>,
	Liran Liss <liranl@mellanox.com>,
	Roland Dreier <roland@purestorage.com>,
	Ben Sander <ben.sander@amd.com>,
	Greg Stoner <Greg.Stoner@amd.com>,
	John Bridgman <John.Bridgman@amd.com>,
	Michael Mantor <Michael.Mantor@amd.com>,
	Paul Blinzer <Paul.Blinzer@amd.com>,
	Laurent Morichetti <Laurent.Morichetti@amd.com>,
	Alexander Deucher <Alexander.Deucher@amd.com>,
	Oded Gabbay <Oded.Gabbay@amd.com>,
	linux-fsdevel@vger.kernel.org, Linda Wang <lwang@redhat.com>,
	Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>, Jeff Law <law@redhat.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>
Subject: Re: HMM (heterogeneous memory management) v6
Date: Wed, 12 Nov 2014 15:09:14 -0500
Message-ID: <20141112200911.GA7720@gmail.com>
In-Reply-To: <alpine.DEB.2.11.1411111259560.6657@gentwo.org>

On Tue, Nov 11, 2014 at 01:00:56PM -0600, Christoph Lameter wrote:
> On Mon, 10 Nov 2014, j.glisse@gmail.com wrote:
> 
> > In a nutshell, HMM is a subsystem that provides an easy-to-use API to
> > mirror a process address space on a device with minimal hardware
> > requirements (mainly device page faults and read-only page mappings).
> > It does not rely on the ATS and PASID PCIe extensions; it intends to
> > supersede them by allowing system memory to be moved to device memory
> > transparently to core kernel mm code (i.e. a CPU page fault on a page
> > residing in device memory will trigger migration back to system
> > memory).
> 
> Could we define a new NUMA node that maps memory from the GPU and
> then simply use the existing NUMA features to move a process over there?

Sorry for the late reply; I am traveling and working on an updated patchset
that changes the device page table design to something simpler and easier to
grasp.

GPU processes will never run on the CPU, nor will they have a kernel task
struct associated with them. From the core kernel's point of view they do
not exist. I hope that at some point down the line the hardware will allow
better integration with the core kernel, but it is not there yet.

So the NUMA idea was considered early on, but it was discarded as not really
appropriate. You can have several CPU threads working with several GPU
threads at the same time, and they can access either disjoint memory or some
shared memory. The usual case will be a few kilobytes of shared memory used
for synchronization between the CPU and GPU threads.
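For instance (a purely illustrative user-space sketch; none of these names
come from the patchset), the point of mirroring is that a single virtual
address is valid on both sides, so a few bytes of shared state are enough to
synchronize a CPU thread with a GPU job:

    /* Hypothetical example: one structure, one virtual address, visible
     * from both a CPU thread and a GPU kernel under HMM mirroring. */
    struct job_sync {
        volatile int done;      /* set by the GPU when the job finishes */
        char payload[1 << 20];  /* bulk data, lives in device memory
                                 * while the GPU works on it */
    };

    /* CPU side: poll the flag.  If the page holding 'done' was migrated
     * to the device, the first read faults and the page is migrated
     * back to system memory transparently. */
    static void wait_for_gpu(struct job_sync *s)
    {
        while (!s->done)
            ;   /* a real program would use an atomic load or a
                 * proper memory barrier here */
    }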

But when a GPU job is launched, we want most of the memory it will use to be
migrated to device memory. The issue is that the device memory is not
accessible from the CPU (PCIe BARs are too small), so there is no way to
keep that memory mapped for the CPU. We need to mark the memory as
inaccessible to the CPU and then migrate it to GPU memory.
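The operation described in the previous paragraph would look roughly like
the sketch below. Every name here is invented for exposition; a special
non-present page table entry is simply one plausible way to make any CPU
access fault, which is what the migration requires:

    /* Hypothetical helpers, named for exposition only. */
    void dev_copy_page_to_device(struct mm_struct *mm, unsigned long addr);
    void hmm_set_device_entry(struct mm_struct *mm, unsigned long addr);
    void hmm_flush_cpu_tlb(struct mm_struct *mm, unsigned long start,
                           unsigned long end);

    static void hmm_migrate_range_to_device(struct mm_struct *mm,
                                            unsigned long start,
                                            unsigned long end)
    {
        unsigned long addr;

        for (addr = start; addr < end; addr += PAGE_SIZE) {
            /* copy the system page to device memory */
            dev_copy_page_to_device(mm, addr);
            /* replace the CPU pte with a special non-present entry
             * recording "this page lives on the device"; any CPU
             * touch now faults */
            hmm_set_device_entry(mm, addr);
        }
        /* no stale CPU mapping may survive the migration */
        hmm_flush_cpu_tlb(mm, start, end);
    }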

Now, when there is a CPU page fault on some migrated memory, we need to
migrate that memory back to system memory. That is why I need to tie HMM
into core MM code: on this kind of fault, the core kernel must know to call
into HMM, which performs the housekeeping and starts the migration back to
system memory.
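In other words, the hook into core MM would have roughly this control flow
(again, all names are invented; this sketches the mechanism described above,
not the actual patches):

    /* Hypothetical helpers, for exposition only. */
    void hmm_invalidate_device_mapping(struct mm_struct *mm,
                                       unsigned long addr);
    void hmm_copy_from_device(struct page *page, struct mm_struct *mm,
                              unsigned long addr);
    void hmm_restore_cpu_pte(struct mm_struct *mm,
                             struct vm_area_struct *vma,
                             unsigned long addr, struct page *page);

    /* Called when the core fault path finds one of the special
     * "on device" entries instead of a present pte. */
    static int hmm_handle_cpu_fault(struct mm_struct *mm,
                                    struct vm_area_struct *vma,
                                    unsigned long addr)
    {
        struct page *page = alloc_page(GFP_HIGHUSER);

        if (!page)
            return VM_FAULT_OOM;

        /* stop the device from using its copy, then copy it back */
        hmm_invalidate_device_mapping(mm, addr);
        hmm_copy_from_device(page, mm, addr);

        /* restore a normal present pte; the faulting instruction is
         * then simply retried */
        hmm_restore_cpu_pte(mm, vma, addr, page);
        return 0;
    }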


So technically there is no task migration, only memory migration.


Is there something I am missing inside NUMA, or some NUMA work in progress
that changes NUMA sufficiently that it might somehow address the use case I
am describing above?


Cheers,
Jérôme
