linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <j.glisse@gmail.com>
To: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"Mel Gorman" <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Johannes Weiner" <jweiner@redhat.com>,
	"Larry Woodman" <lwoodman@redhat.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Brendan Conoboy" <blc@redhat.com>,
	"Joe Donohue" <jdonohue@redhat.com>,
	"Duncan Poole" <dpoole@nvidia.com>,
	"Sherry Cheung" <SCheung@nvidia.com>,
	"Subhash Gutti" <sgutti@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Lucien Dunning" <ldunning@nvidia.com>,
	"Cameron Buschardt" <cabuschardt@nvidia.com>,
	"Arvind Gopalakrishnan" <arvindg@nvidia.com>,
	"Haggai Eran" <haggaie@mellanox.com>,
	"Shachar Raindel" <raindel@mellanox.com>,
	"Liran Liss" <liranl@mellanox.com>,
	"Roland Dreier" <roland@purestorage.com>,
	"Ben Sander" <ben.sander@amd.com>,
	"Greg Stoner" <Greg.Stoner@amd.com>,
	"John Bridgman" <John.Bridgman@amd.com>,
	"Michael Mantor" <Michael.Mantor@amd.com>,
	"Paul Blinzer" <Paul.Blinzer@amd.com>,
	"Laurent Morichetti" <Laurent.Morichetti@amd.com>,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Oded Gabbay" <Oded.Gabbay@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Jatin Kumar" <jakumar@nvidia.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.
Date: Tue, 9 Jun 2015 11:56:03 -0400	[thread overview]
Message-ID: <20150609155601.GA3101@gmail.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1506081841490.1802@mdh-linux64-2.nvidia.com>

On Mon, Jun 08, 2015 at 06:54:29PM -0700, Mark Hairgrove wrote:
> On Mon, 8 Jun 2015, Jerome Glisse wrote:
> > On Mon, Jun 08, 2015 at 12:40:18PM -0700, Mark Hairgrove wrote:
> > > On Thu, 21 May 2015, j.glisse@gmail.com wrote:
> > > > From: Jerome Glisse <jglisse@redhat.com>
> > > >
> > > > This patch only introduce core HMM functions for registering a new
> > > > mirror and stopping a mirror as well as HMM device registering and
> > > > unregistering.
> > > >
> > > > [...]
> > > >
> > > > +/* struct hmm_device_operations - HMM device operation callback
> > > > + */
> > > > +struct hmm_device_ops {
> > > > +	/* release() - mirror must stop using the address space.
> > > > +	 *
> > > > +	 * @mirror: The mirror that link process address space with the device.
> > > > +	 *
> > > > +	 * When this is call, device driver must kill all device thread using
> > > > +	 * this mirror. Also, this callback is the last thing call by HMM and
> > > > +	 * HMM will not access the mirror struct after this call (ie no more
> > > > +	 * dereference of it so it is safe for the device driver to free it).
> > > > +	 * It is call either from :
> > > > +	 *   - mm dying (all process using this mm exiting).
> > > > +	 *   - hmm_mirror_unregister() (if no other thread holds a reference)
> > > > +	 *   - outcome of some device error reported by any of the device
> > > > +	 *     callback against that mirror.
> > > > +	 */
> > > > +	void (*release)(struct hmm_mirror *mirror);
> > > > +};
> > >
> > > The comment that ->release is called when the mm dies doesn't match the
> > > implementation. ->release is only called when the mirror is destroyed, and
> > > that can only happen after the mirror has been unregistered. This may not
> > > happen until after the mm dies.
> > >
> > > Is the intent for the driver to get the callback when the mm goes down?
> > > That seems beneficial so the driver can kill whatever's happening on the
> > > device. Otherwise the device may continue operating in a dead address
> > > space until the driver's file gets closed and it unregisters the mirror.
> >
> > This was the intent before merging free & release. I guess i need to
> > reinstate the free versus release callback. Sadly the lifetime for HMM
> > is more complex than mmu_notifier as we intend the mirror struct to
> > be embedded into a driver private struct.
> 
> Can you clarify how that's different from mmu_notifiers? Those are also
> embedded into a driver-owned struct.

For HMM you want to be able to kill a mirror from HMM, you might have kernel
thread call inside HMM with a mirror (outside any device file lifetime) ...
The mirror is not only use at register & unregister, there is a lot more thing
you can call using the HMM mirror struct.

So the HMM mirror lifetime as a result is more complex, it can not simply be
free from the mmu_notifier_release callback or randomly. It needs to be
refcounted. The mmu_notifier code assume that the mmu_notifier struct is
embedded inside a struct that has a lifetime properly synchronize with the
mm. For HMM mirror this is not something that sounds like a good idea as there
is too many way to get it wrong.

So idea of HMM mirror is that it can out last the mm lifetime but the HMM
struct can not. So you have hmm_mirror <~> hmm <-> mm and the mirror can be
"unlink" and have different lifetime from the hmm that itself has same life
time as mm.

> Is the goal to allow calling hmm_mirror_unregister from within the "mm is
> dying" HMM callback? I don't know whether that's really necessary as long
> as there's some association between the driver files and the mirrors.

No this is not a goal and i actualy forbid that.

> 
> > > If so, I think there's a race here in the case of mm teardown happening
> > > concurrently with hmm_mirror_unregister.
> > >
> > > [...]
> > >
> > > Do you agree that this sequence can happen, or am I missing something
> > > which prevents it?
> >
> > Can't happen because child have mm->hmm = NULL ie only one hmm per mm
> > and hmm is tie to only one mm. It is the responsability of the device
> > driver to make sure same apply to private reference to the hmm mirror
> > struct ie hmm_mirror should never be tie to a private file struct.
> 
> It's useful for the driver to have some association between files and
> mirrors. If the file is closed prior to process exit we would like to
> unregister the mirror, otherwise it will persist until process teardown.
> The association doesn't have to be 1:1 but having the files ref count the
> mirror or something would be useful.

This is allowed, i might have put strong word here, but you can associate
with a file struct. What you can not do is use the mirror from a different
process ie one with a different mm struct as mirror is linked to a single
mm. So on fork there is no callback to update the private file struct, when
the device file is duplicated (well just refcount inc) against a different
process. This is something you need to be carefull in your driver. Inside
the dummy driver i abuse that to actually test proper behavior of HMM but
it should not be use as an example.

> 
> But even if we assume no association at all between files and mirrors, are
> you sure that prevents the race? The driver may choose to unregister the
> hmm_device at any point once its files are closed. In the case of module
> unload the device unregister can't be prevented. If mm teardown hasn't
> happened yet mirrors may still be active and registered on that
> hmm_device. The driver thus has to first call hmm_mirror_unregister on all
> active mirrors, then call hmm_device_unregister. mm teardown of those
> mirrors may trigger at any point in this sequence, so we're right back to
> that race.

So when device driver unload the first thing it needs to do is kill all of
its context ie all of its HMM mirror (unregister them) by doing so it will
make sure that there can be no more call to any of its functions.

The race with mm teardown does not exist as what matter for mm teardown is
the fact that the mirror is on the struct hmm mirrors list or not. Either
the device driver is first to remove the mirror from the list or it is the
mm teardown but this is lock protected so only one thread can do it.

The issue you pointed is really about decoupling the lifetime of the mirror
context (ie hardware thread that use the mirror) and the lifetime of the
structure that embedded the hmm_mirror struct. The device driver will care
about the second while everything else will only really care about the
first. The second tells you when you know for sure that there will be no
more callback to your device driver code. The first only tells you that
there should be no more activity associated with that mirror but some thread
might still hold a reference on the underlying struct.


Hope this clarify design and motivation behind the hmm_mirror vs hmm struct
lifetime.


Cheers,
Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-06-09 15:56 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 19:31 HMM (Heterogeneous Memory Management) v8 j.glisse
2015-05-21 19:31 ` [PATCH 01/36] mmu_notifier: add event information to address invalidation v7 j.glisse
2015-05-30  3:43   ` John Hubbard
2015-06-01 19:03     ` Jerome Glisse
2015-06-01 23:10       ` John Hubbard
2015-06-03 16:07         ` Jerome Glisse
2015-06-03 23:02           ` John Hubbard
2015-05-21 19:31 ` [PATCH 02/36] mmu_notifier: keep track of active invalidation ranges v3 j.glisse
2015-05-27  5:09   ` Aneesh Kumar K.V
2015-05-27 14:32     ` Jerome Glisse
2015-06-02  9:32   ` John Hubbard
2015-06-03 17:15     ` Jerome Glisse
2015-06-05  3:29       ` John Hubbard
2015-05-21 19:31 ` [PATCH 03/36] mmu_notifier: pass page pointer to mmu_notifier_invalidate_page() j.glisse
2015-05-27  5:17   ` Aneesh Kumar K.V
2015-05-27 14:33     ` Jerome Glisse
2015-06-03  4:25   ` John Hubbard
2015-05-21 19:31 ` [PATCH 04/36] mmu_notifier: allow range invalidation to exclude a specific mmu_notifier j.glisse
2015-05-21 19:31 ` [PATCH 05/36] HMM: introduce heterogeneous memory management v3 j.glisse
2015-05-27  5:50   ` Aneesh Kumar K.V
2015-05-27 14:38     ` Jerome Glisse
2015-06-08 19:40   ` Mark Hairgrove
2015-06-08 21:17     ` Jerome Glisse
2015-06-09  1:54       ` Mark Hairgrove
2015-06-09 15:56         ` Jerome Glisse [this message]
2015-06-10  3:33           ` Mark Hairgrove
2015-06-10 15:42             ` Jerome Glisse
2015-06-11  1:15               ` Mark Hairgrove
2015-06-11 14:23                 ` Jerome Glisse
2015-06-11 22:26                   ` Mark Hairgrove
2015-06-15 14:32                     ` Jerome Glisse
2015-05-21 19:31 ` [PATCH 06/36] HMM: add HMM page table v2 j.glisse
2015-06-19  2:06   ` Mark Hairgrove
2015-06-19 18:07     ` Jerome Glisse
2015-06-20  2:34       ` Mark Hairgrove
2015-06-25 22:57   ` Mark Hairgrove
2015-06-26 16:30     ` Jerome Glisse
2015-06-27  1:34       ` Mark Hairgrove
2015-06-29 14:43         ` Jerome Glisse
2015-07-01  2:51           ` Mark Hairgrove
2015-07-01 15:07             ` Jerome Glisse
2015-05-21 19:31 ` [PATCH 07/36] HMM: add per mirror page table v3 j.glisse
2015-06-25 23:05   ` Mark Hairgrove
2015-06-26 16:43     ` Jerome Glisse
2015-06-27  3:02       ` Mark Hairgrove
2015-06-29 14:50         ` Jerome Glisse
2015-05-21 19:31 ` [PATCH 08/36] HMM: add device page fault support v3 j.glisse
2015-05-21 19:31 ` [PATCH 09/36] HMM: add mm page table iterator helpers j.glisse
2015-05-21 19:31 ` [PATCH 10/36] HMM: use CPU page table during invalidation j.glisse
2015-05-21 19:31 ` [PATCH 11/36] HMM: add discard range helper (to clear and free resources for a range) j.glisse
2015-05-21 19:31 ` [PATCH 12/36] HMM: add dirty range helper (to toggle dirty bit inside mirror page table) j.glisse
2015-05-21 19:31 ` [PATCH 13/36] HMM: DMA map memory on behalf of device driver j.glisse
2015-05-21 19:31 ` [PATCH 14/36] fork: pass the dst vma to copy_page_range() and its sub-functions j.glisse
2015-05-21 19:31 ` [PATCH 15/36] memcg: export get_mem_cgroup_from_mm() j.glisse
2015-05-21 19:31 ` [PATCH 16/36] HMM: add special swap filetype for memory migrated to HMM device memory j.glisse
2015-06-24  7:49   ` Haggai Eran
2015-05-21 19:31 ` [PATCH 17/36] HMM: add new HMM page table flag (valid device memory) j.glisse
2015-05-21 19:31 ` [PATCH 18/36] HMM: add new HMM page table flag (select flag) j.glisse
2015-05-21 19:31 ` [PATCH 19/36] HMM: handle HMM device page table entry on mirror page table fault and update j.glisse
2015-05-21 20:22 ` [PATCH 20/36] HMM: mm add helper to update page table when migrating memory back jglisse
2015-05-21 20:22   ` [PATCH 21/36] HMM: mm add helper to update page table when migrating memory jglisse
2015-05-21 20:22   ` [PATCH 22/36] HMM: add new callback for copying memory from and to device memory jglisse
2015-05-21 20:22   ` [PATCH 23/36] HMM: allow to get pointer to spinlock protecting a directory jglisse
2015-05-21 20:23   ` [PATCH 24/36] HMM: split DMA mapping function in two jglisse
2015-05-21 20:23   ` [PATCH 25/36] HMM: add helpers for migration back to system memory jglisse
2015-05-21 20:23   ` [PATCH 26/36] HMM: fork copy migrated memory into system memory for child process jglisse
2015-05-21 20:23   ` [PATCH 27/36] HMM: CPU page fault on migrated memory jglisse
2015-05-21 20:23   ` [PATCH 28/36] HMM: add mirror fault support for system to device memory migration jglisse
2015-05-21 20:23   ` [PATCH 29/36] IB/mlx5: add a new paramter to __mlx_ib_populated_pas for ODP with HMM jglisse
2015-05-21 20:23   ` [PATCH 30/36] IB/mlx5: add a new paramter to mlx5_ib_update_mtt() " jglisse
2015-05-21 20:23   ` [PATCH 31/36] IB/odp: export rbt_ib_umem_for_each_in_range() jglisse
2015-05-21 20:23   ` [PATCH 32/36] IB/odp/hmm: add new kernel option to use HMM for ODP jglisse
2015-05-21 20:23   ` [PATCH 33/36] IB/odp/hmm: add core infiniband structure and helper for ODP with HMM jglisse
2015-06-24 13:59     ` Haggai Eran
2015-05-21 20:23   ` [PATCH 34/36] IB/mlx5/hmm: add mlx5 HMM device initialization and callback jglisse
2015-05-21 20:23   ` [PATCH 35/36] IB/mlx5/hmm: add page fault support for ODP on HMM jglisse
2015-05-21 20:23   ` [PATCH 36/36] IB/mlx5/hmm: enable ODP using HMM jglisse
2015-05-30  3:01 ` HMM (Heterogeneous Memory Management) v8 John Hubbard
2015-05-31  6:56 ` Haggai Eran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150609155601.GA3101@gmail.com \
    --to=j.glisse@gmail.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Greg.Stoner@amd.com \
    --cc=John.Bridgman@amd.com \
    --cc=Laurent.Morichetti@amd.com \
    --cc=Michael.Mantor@amd.com \
    --cc=Oded.Gabbay@amd.com \
    --cc=Paul.Blinzer@amd.com \
    --cc=SCheung@nvidia.com \
    --cc=aarcange@redhat.com \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arvindg@nvidia.com \
    --cc=ben.sander@amd.com \
    --cc=blc@redhat.com \
    --cc=cabuschardt@nvidia.com \
    --cc=dpoole@nvidia.com \
    --cc=haggaie@mellanox.com \
    --cc=hpa@zytor.com \
    --cc=jakumar@nvidia.com \
    --cc=jdonohue@redhat.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=jweiner@redhat.com \
    --cc=ldunning@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=liranl@mellanox.com \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mhairgrove@nvidia.com \
    --cc=peterz@infradead.org \
    --cc=raindel@mellanox.com \
    --cc=riel@redhat.com \
    --cc=roland@purestorage.com \
    --cc=sgutti@nvidia.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).