From: Jerome Glisse <j.glisse@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Jeff Law <law@redhat.com>, Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>,
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Thu, 8 May 2014 21:26:03 -0400
Message-ID: <20140509012601.GA2906@gmail.com>
In-Reply-To: <1399446892.4161.34.camel@pasglop>

On Wed, May 07, 2014 at 05:14:52PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 12:18 -0400, Jerome Glisse wrote:
> > 
> > I do understand that i was pointing out that if i move to, tlb which i
> > am fine with, i will still need to sleep there. That's all i wanted to
> > stress, i did not wanted force using mmu_notifier, i am fine with them
> > becoming atomic as long as i have a place where i can intercept cpu
> > page table update and propagate them to device mmu.
> 
> Your MMU notifier can maintain a map of "dirty" PTEs and you do the
> actual synchronization in the subsequent flush_tlb_* , you need to add
> hooks there but it's much less painful than in the notifiers.
> 
> *However* Linus, even then we can't sleep. We do things like
> ptep_clear_flush() that need the PTL and have the synchronous flush
> semantics.
> 
> Sure, today we wait, possibly for a long time, with IPIs, but we do not
> sleep. Jerome would have to operate within a similar context. No sleep
> for you :)
> 
> Cheers,
> Ben.
> 
> 
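
As a rough illustration of the scheme Ben describes above (record changed PTEs cheaply in the notifier, do the real device synchronization at the next flush_tlb_*), here is a toy userspace model. It is Python, not kernel code, and every name in it (DeferredMirror, notify_pte_change, flush_tlb) is hypothetical:

```python
class DeferredMirror:
    """Toy model of a device page table that mirrors a CPU page table lazily."""

    def __init__(self):
        self.cpu_pt = {}    # virtual address -> physical page (CPU view)
        self.dev_pt = {}    # virtual address -> physical page (device view)
        self.dirty = set()  # addresses changed since the last flush

    def notify_pte_change(self, vaddr, new_ppn):
        """Cheap step: update the CPU PTE and only record the address."""
        self.cpu_pt[vaddr] = new_ppn
        self.dirty.add(vaddr)

    def flush_tlb(self):
        """Deferred step: push every recorded change to the device mirror.

        Returns the number of entries that were synchronized.
        """
        synced = 0
        for vaddr in sorted(self.dirty):
            self.dev_pt[vaddr] = self.cpu_pt[vaddr]
            synced += 1
        self.dirty.clear()
        return synced


mirror = DeferredMirror()
mirror.notify_pte_change(0x1000, 7)
mirror.notify_pte_change(0x2000, 9)
stale_view = dict(mirror.dev_pt)   # device has seen nothing yet
synced = mirror.flush_tlb()        # device catches up here
```

The model only shows the batching. As Ben notes, the real flush path runs under the PTL with synchronous semantics, so the device update done there could not sleep either.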

So Linus, Benjamin is right: there were a couple of cases I did not think about.
For instance with a COW page, one thread might trigger copy-on-write, allocate
a new page and update the page table, and a thread on another CPU might start
using the new page before we even get a chance to update the GPU page table.
The GPU could thus be working on outdated data.
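
To make the COW race concrete, here is a toy userspace model of it. It is a sketch in Python rather than kernel code, and all names (ToyMM, cow_break, dev_read, the notify flag) are hypothetical. The notify flag stands in for an mmu_notifier-style callback that invalidates the device mapping before the CPU page table entry changes:

```python
class ToyMM:
    """Toy model: one CPU page table mirrored by one device (GPU) page table."""

    def __init__(self):
        self.memory = {}    # physical page number -> page contents
        self.cpu_pt = {}    # virtual address -> physical page (CPU view)
        self.dev_pt = {}    # virtual address -> physical page (device view)
        self.next_page = 0

    def alloc_page(self, data):
        ppn = self.next_page
        self.next_page += 1
        self.memory[ppn] = data
        return ppn

    def map_page(self, vaddr, data):
        ppn = self.alloc_page(data)
        self.cpu_pt[vaddr] = ppn
        self.dev_pt[vaddr] = ppn  # the device mirrors the CPU mapping

    def cow_break(self, vaddr, notify):
        """Copy-on-write: copy the page and repoint the CPU PTE at the copy."""
        new_ppn = self.alloc_page(self.memory[self.cpu_pt[vaddr]])
        if notify:
            # Hypothetical notifier hook: drop the device mapping first,
            # so the device has to fault and pick up the new page.
            self.dev_pt.pop(vaddr, None)
        self.cpu_pt[vaddr] = new_ppn

    def cpu_write(self, vaddr, data):
        self.memory[self.cpu_pt[vaddr]] = data

    def dev_read(self, vaddr):
        if vaddr not in self.dev_pt:
            # Device fault: re-mirror the current CPU mapping.
            self.dev_pt[vaddr] = self.cpu_pt[vaddr]
        return self.memory[self.dev_pt[vaddr]]


def run(notify):
    mm = ToyMM()
    mm.map_page(0x1000, "old")
    mm.cow_break(0x1000, notify)  # CPU now points at the private copy
    mm.cpu_write(0x1000, "new")   # another CPU thread uses the new page
    return mm.dev_read(0x1000)    # what the device sees afterwards
```

Without the invalidation, run(False) returns "old" because the device still maps the original page. With it, run(True) returns "new" because the device refaults onto the copy.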

The same kind of race exists on fork, when we write-protect a page, or when
we split a huge page.

I thought that I only needed to special-case page reclamation and migration,
and forbid things like KSM, but I was wrong.

So with that in mind, are you OK if I pursue the mmu_notifier approach, taking
into account the rwsem+optspin results, which would keep the many-fork
workload fast while still allowing mmu_notifier callbacks to sleep?

Otherwise I have no other choice than to add something like mmu_notifier
in the places where there can be a race (huge page split, COW, ...), which
sounds like a bad idea to me when mmu_notifier is perfect for the job.

Cheers,
Jérôme Glisse



