linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-mm@kvack.org, "Jérôme Glisse" <jglisse@redhat.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
Date: Wed, 25 Mar 2020 09:01:17 +0100
Message-ID: <20200325080117.GY19542@dhcp22.suse.cz>
In-Reply-To: <20200324194137.GQ13183@mellanox.com>

On Tue 24-03-20 16:41:37, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2020 at 09:50:06AM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> > > Many users of the mmu_notifier invalidate_range callbacks maintain
> > > locking/counters/etc on a paired basis and have long expected that
> > > invalidate_range_start/end() are always paired.
> > > 
> > > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > > kvm->mmu_notifier_count which was incremented during start().
> > > 
> > > The recent change to add non-blocking notifiers breaks this assumption
> > > when multiple notifiers are present in the list. When -EAGAIN is returned
> > > from an invalidate_range_start() then no invalidate_range_end() callbacks
> > > are called, even if the subscription's start had previously been called.
> > > 
> > > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > > subset of the linked list representing the notifiers already called to
> > > generate an invalidate_range_end() pairing.
> > > 
> > > One case works correctly: when only one subscription requires
> > > invalidate_range_end() and it is the last entry in the hlist. In this
> > > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > > to unwind.
> > > 
> > > Keep the notifier hlist sorted so that notifiers that require
> > > invalidate_range_end() are always last, and if two are added then disable
> > > non-blocking invalidation for the mm.
> > > 
> > > A warning is printed for this case. If in the future we determine that
> > > it never happens, we can simply fail during registration when there are
> > > unsupported combinations of notifiers.
> > > 
> > > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> > >  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 50 insertions(+), 3 deletions(-)
> > > 
> > > v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> > > v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> > > * Abandon attempting to fix it by calling invalidate_range_end() during an
> > >   EAGAIN start
> > > * Just trivially ban multiple subscriptions
> > > v3:
> > > * Be more sophisticated, ban only multiple subscriptions if the result is
> > >   a failure. Allows multiple subscriptions without invalidate_range_end
> > > * Include a printk when this condition is hit (Michal)
> > > 
> > > At this point the rework Christoph requested during the first posting
> > > is completed and there are now only 3 drivers using
> > > invalidate_range_end():
> > > 
> > > drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> > > drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> > > virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> > > 
> > > While I think it is unlikely that any of these drivers will be used in
> > > combination with each other, display a printk in the hope of checking.
> > > 
> > > Someday I expect to just fail the registration on this condition.
> > > 
> > > I think this also addresses Michal's concern about a 'big hammer' as
> > > it probably won't ever trigger now.
> > 
> > I'm going to put this in linux-next to see if there are any reports of
> > the pr_warn failing.
> > 
> > Michal, are you happy with this solution now?
> 
> It's been a month in linux-next now, with no complaints. If there are
> no comments I will go ahead to send it in the hmm PR.
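For readers following the quoted description, the pairing problem and the
"end users last" ordering can be sketched in a small userspace model. This
is only an illustration under stated assumptions, not the code in
mm/mmu_notifier.c: struct sub, range_start_nonblock(), unpaired(),
sort_end_users_last() and run_example() are all hypothetical names, and a
plain array stands in for the RCU-protected hlist.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct sub {
	bool has_end;     /* implements an end() callback */
	bool would_block; /* start() fails with -EAGAIN in non-blocking mode */
	int count;        /* paired counter, like kvm->mmu_notifier_count */
};

/* Walk the list in order and stop at the first -EAGAIN; no end() is called. */
int range_start_nonblock(struct sub *subs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (subs[i].would_block)
			return -EAGAIN;
		if (subs[i].has_end)
			subs[i].count++;
	}
	return 0;
}

/* Number of subscriptions left with an elevated count and no end() to undo it. */
int unpaired(const struct sub *subs, size_t n)
{
	int bad = 0;

	for (size_t i = 0; i < n; i++)
		bad += subs[i].count != 0;
	return bad;
}

/* Stable insertion sort: subscriptions that have end() move to the tail. */
void sort_end_users_last(struct sub *subs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct sub key = subs[i];
		size_t j = i;

		while (j > 0 && subs[j - 1].has_end && !key.has_end) {
			subs[j] = subs[j - 1];
			j--;
		}
		subs[j] = key;
	}
}

/* One end() user registered ahead of a subscription that cannot block. */
int run_example(bool sorted)
{
	struct sub subs[] = {
		{ true,  false, 0 }, /* KVM-like: counts in start(), undoes in end() */
		{ false, true,  0 }, /* no end(); start() returns -EAGAIN */
	};
	size_t n = sizeof(subs) / sizeof(subs[0]);

	if (sorted)
		sort_end_users_last(subs, n);
	(void)range_start_nonblock(subs, n);
	return unpaired(subs, n);
}
```

In this model run_example(false) leaves one counter elevated, while
run_example(true) leaves none because the failing start() runs before the
end() user is reached. The model does not cover the second half of the
quoted rule, where two end() users cause non-blocking invalidation to be
disabled for the mm.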

I will not block this, but it still looks like the wrong approach. A more
robust solution would be to allow calling invalidate_range_end() even for
a failing invalidate_range_start().
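
To make that alternative concrete, here is a minimal userspace sketch,
assuming every subscription's end() can tolerate being called when its
start() failed or was never reached. All names (struct sub, sub_start(),
sub_end(), range_start_nonblock()) are hypothetical and are not kernel
APIs. The single "started" flag only shows the calling convention; it would
not cope with concurrent or nested invalidations.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel APIs. */
struct sub {
	bool would_block; /* start() returns -EAGAIN in non-blocking mode */
	bool started;     /* set by a successful start(), cleared by end() */
	int count;        /* paired counter kept by the subscription */
};

int sub_start(struct sub *s)
{
	if (s->would_block)
		return -EAGAIN;
	s->started = true;
	s->count++;
	return 0;
}

/* Safe to call even if start() failed or was never reached. */
void sub_end(struct sub *s)
{
	if (!s->started)
		return;
	s->started = false;
	s->count--;
}

/* On failure, call end() on every subscription instead of tracking a subset. */
int range_start_nonblock(struct sub *subs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = sub_start(&subs[i]);

		if (ret) {
			for (size_t j = 0; j < n; j++)
				sub_end(&subs[j]);
			return ret;
		}
	}
	return 0;
}
```

With this convention the caller never needs to know which start() calls
already ran, so list ordering does not matter; the cost is that each
subscription has to track for itself whether there is anything to undo.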
-- 
Michal Hocko
SUSE Labs


Thread overview: 10+ messages
2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
2020-02-11 21:28 ` Ralph Campbell
2020-02-11 23:42   ` Jason Gunthorpe
2020-02-28 13:50 ` Jason Gunthorpe
2020-03-24 19:41   ` Jason Gunthorpe
2020-03-25  8:01     ` Michal Hocko [this message]
2020-03-25 12:14       ` Jason Gunthorpe
2020-03-25 13:06         ` Michal Hocko
2020-03-26 13:06 ` Qian Cai
2020-03-26 14:56   ` Jason Gunthorpe
