From: Jerome Glisse <j.glisse@gmail.com>
To: Haggai Eran <haggaie@mellanox.com>
Cc: "Mark Hairgrove" <mhairgrove@nvidia.com>,
    "Dave Airlie" <airlied@redhat.com>,
    "Arvind Gopalakrishnan" <arvindg@nvidia.com>,
    "joro@8bytes.org" <joro@8bytes.org>,
    "Greg Stoner" <Greg.Stoner@amd.com>,
    "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
    "Cameron Buschardt" <cabuschardt@nvidia.com>,
    "Rik van Riel" <riel@redhat.com>,
    "Paul Blinzer" <Paul.Blinzer@amd.com>,
    "Lucien Dunning" <ldunning@nvidia.com>,
    "Johannes Weiner" <jweiner@redhat.com>,
    "Michael Mantor" <Michael.Mantor@amd.com>,
    "Laurent Morichetti" <Laurent.Morichetti@amd.com>,
    "Larry Woodman" <lwoodman@redhat.com>,
    "John Hubbard" <jhubbard@nvidia.com>,
    "Brendan Conoboy" <blc@redhat.com>,
    "John Bridgman" <John.Bridgman@amd.com>,
    "Subhash Gutti" <sgutti@nvidia.com>,
    "Roland Dreier" <roland@purestorage.com>,
    "Duncan Poole" <dpoole@nvidia.com>,
    "linux-mm@kvack.org" <linux-mm@kvack.org>,
    "Alexander Deucher" <Alexander.Deucher@amd.com>,
    "Linus Torvalds" <torvalds@linux-foundation.org>,
    "Andrea Arcangeli" <aarcange@redhat.com>,
    "Oded Gabbay" <Oded.Gabbay@amd.com>,
    "Sherry Cheung" <SCheung@nvidia.com>,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
    "Shachar Raindel" <raindel@mellanox.com>,
    "Liran Liss" <liranl@mellanox.com>,
    "Jérôme Glisse" <jglisse@redhat.com>,
    "Ben Sander" <ben.sander@amd.com>,
    "Joe Donohue" <jdonohue@redhat.com>,
    "Mel Gorman" <mgorman@suse.de>,
    "H. Peter Anvin" <hpa@zytor.com>,
    "Peter Zijlstra" <peterz@infradead.org>
Subject: Re: [PATCH 2/7] mmu_notifier: keep track of active invalidation ranges v2
Date: Mon, 5 Jan 2015 13:49:15 -0500
Message-ID: <20150105184914.GA8012@gmail.com>
In-Reply-To: <AMSPR05MB48272339639F199CA876C4FC1500@AMSPR05MB482.eurprd05.prod.outlook.com>

On Sun, Dec 28, 2014 at 08:46:42AM +0000, Haggai Eran wrote:
>
> On Dec 26, 2014 9:20 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> >
> > On Thu, Dec 25, 2014 at 10:29:44AM +0200, Haggai Eran wrote:
> > > On 22/12/2014 18:48, j.glisse@gmail.com wrote:
> > > >  static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > > > -						       unsigned long start,
> > > > -						       unsigned long end,
> > > > -						       enum mmu_event event)
> > > > +						       struct mmu_notifier_range *range)
> > > >  {
> > > > +	/*
> > > > +	 * Initialize list no matter what in case a mmu_notifier register after
> > > > +	 * a range_start but before matching range_end.
> > > > +	 */
> > > > +	INIT_LIST_HEAD(&range->list);
> > >
> > > I don't see how can an mmu_notifier register after a range_start but
> > > before a matching range_end. The mmu_notifier registration locks all mm
> > > locks, and that should prevent any invalidation from running, right?
> >
> > File invalidation (like truncation) can lead to this case.
>
> I thought that the fact that mm_take_all_locks locked the i_mmap_mutex of
> every file would prevent this from happening, because the notifier is added
> when the mutex is locked, and the truncate operation also locks it. Am I
> missing something?

No, you are right again. For some reason I was convinced that mmu_notifier
registration only took the mmap semaphore in write mode, while it in fact
also calls mm_take_all_locks(). So yes, this protects registration from all
concurrent invalidation.

>
> >
> > >
> > > >  	if (mm_has_notifiers(mm))
> > > > -		__mmu_notifier_invalidate_range_start(mm, start, end, event);
> > > > +		__mmu_notifier_invalidate_range_start(mm, range);
> > > >  }
> > >
> > > ...
> > >
> > > >  void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > > > -					   unsigned long start,
> > > > -					   unsigned long end,
> > > > -					   enum mmu_event event)
> > > > +					   struct mmu_notifier_range *range)
> > > >
> > > >  {
> > > >  	struct mmu_notifier *mn;
> > > > @@ -185,21 +183,36 @@ void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > > >  	id = srcu_read_lock(&srcu);
> > > >  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> > > >  		if (mn->ops->invalidate_range_start)
> > > > -			mn->ops->invalidate_range_start(mn, mm, start,
> > > > -							end, event);
> > > > +			mn->ops->invalidate_range_start(mn, mm, range);
> > > >  	}
> > > >  	srcu_read_unlock(&srcu, id);
> > > > +
> > > > +	/*
> > > > +	 * This must happen after the callback so that subsystem can block on
> > > > +	 * new invalidation range to synchronize itself.
> > > > +	 */
> > > > +	spin_lock(&mm->mmu_notifier_mm->lock);
> > > > +	list_add_tail(&range->list, &mm->mmu_notifier_mm->ranges);
> > > > +	mm->mmu_notifier_mm->nranges++;
> > > > +	spin_unlock(&mm->mmu_notifier_mm->lock);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
> > >
> > > Don't you have a race here because you add the range struct after the
> > > callback?
> > >
> > > -------------------------------------------------------------------------
> > > Thread A                    | Thread B
> > > -------------------------------------------------------------------------
> > > call mmu notifier callback  |
> > >   clear SPTE                |
> > >                             | device page fault
> > >                             |   mmu_notifier_range_is_valid returns true
> > >                             |   install new SPTE
> > > add event struct to list    |
> > > mm clears/modifies the PTE  |
> > > -------------------------------------------------------------------------
> > >
> > > So we are left with different entries in the host page table and the
> > > secondary page table.
> > >
> > > I would think you'd want the event struct to be added to the list before
> > > the callback is run.
> > >
> >
> > Yes, you are right, but the comment I left triggers a memory that I did
> > that on purpose at one point, probably with a different sync mechanism
> > inside hmm. I will try to meditate a bit and see if I can recall why I
> > did it that way with respect to the previous design.
> >
> > In any case I will respin with that order modified. Can I add your
> > Reviewed-by after doing so?
>
> Sure, go ahead.
>
> Regards,
> Haggai
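
For illustration, the race in the table above can be reproduced with a small
single-threaded userspace sketch. Everything in it is a hypothetical stand-in
(struct range, struct notifier_mm, callback(), publish() and the
publish_first flag are not the kernel structures or API), and it omits the
spinlock and SRCU protection the real code needs. The callback models Thread B
by recording what a range validity check would report at the moment the SPTE
has just been cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct range {
	unsigned long start, end;	/* covers [start, end) */
	struct range *next;
};

struct notifier_mm {
	struct range *ranges;		/* active invalidation ranges */
	bool fault_allowed;		/* what a concurrent fault would observe */
};

/* True if [start, end) overlaps no active invalidation range. */
static bool range_is_valid(const struct notifier_mm *mm,
			   unsigned long start, unsigned long end)
{
	const struct range *r;

	for (r = mm->ranges; r; r = r->next)
		if (start < r->end && r->start < end)
			return false;
	return true;
}

/*
 * Stand-in for the driver callback: it would clear the SPTE here. We model
 * a device page fault arriving right afterwards by recording what
 * range_is_valid() would tell that fault.
 */
static void callback(struct notifier_mm *mm, const struct range *r)
{
	mm->fault_allowed = range_is_valid(mm, r->start, r->end);
}

/* Add the range to the list of active invalidations (no locking here). */
static void publish(struct notifier_mm *mm, struct range *r)
{
	r->next = mm->ranges;
	mm->ranges = r;
}

/* publish_first == true is the ordering suggested in the review. */
static void invalidate_range_start(struct notifier_mm *mm, struct range *r,
				   bool publish_first)
{
	assert(r->start < r->end);

	if (publish_first)
		publish(mm, r);
	callback(mm, r);
	if (!publish_first)
		publish(mm, r);
}

/* Returns whether the simulated fault could install a new SPTE. */
bool fault_allowed_during_callback(bool publish_first)
{
	struct notifier_mm mm = { .ranges = NULL, .fault_allowed = false };
	struct range r = { .start = 0x1000, .end = 0x2000, .next = NULL };

	invalidate_range_start(&mm, &r, publish_first);
	return mm.fault_allowed;
}
```

With the ordering from the patch (publish_first false) the simulated fault sees
the range as valid and could install a stale SPTE. With the range added to the
list before the callback runs, the same check fails and the fault has to wait
or retry.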