* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
@ 2020-02-11 21:28 ` Ralph Campbell
2020-02-11 23:42 ` Jason Gunthorpe
2020-02-28 13:50 ` Jason Gunthorpe
2020-03-26 13:06 ` Qian Cai
2 siblings, 1 reply; 10+ messages in thread
From: Ralph Campbell @ 2020-02-11 21:28 UTC (permalink / raw)
To: Jason Gunthorpe, linux-mm
Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig
On 2/11/20 12:52 PM, Jason Gunthorpe wrote:
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
>
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
>
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
>
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
>
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
>
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
>
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.
>
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
> mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
>
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
> EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
> a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
>
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
>
> drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
>
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
>
> Someday I expect to just fail the registration on this condition.
>
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
>
> Regards,
> Jason
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
> struct mmu_notifier_subscriptions {
> /* all mmu notifiers registered in this mm are queued in this list */
> struct hlist_head list;
> - bool has_itree;
> + u8 has_itree;
> + u8 no_blocking;
> /* to serialize the list modifications and hlist_unhashed */
> spinlock_t lock;
> unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
> int ret = 0;
> int id;
>
> + if (unlikely(subscriptions->no_blocking &&
> + !mmu_notifier_range_blockable(range)))
> + return -EAGAIN;
> +
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
> const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> srcu_read_unlock(&srcu, id);
> }
>
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * non-blocking invalidation is problematic as a requirement to block results in
> + * the invalidation being aborted, however due to the use of RCU we have no
> + * reliable way to ensure that every sueessful invalidate_range_start() results
s/sueessful/successful
> + * in a call to invalidate_range_end().
> + *
> + * Thus to support blocking only the last subscription in the list can have
> + * invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> + struct mmu_notifier *subscription)
We have mn_hlist_xxx in a number of places in mmu_notifier.c.
Seems like this should be named mn_hlist_add_subscription().
> +{
> + struct mmu_notifier *last = NULL;
> + struct mmu_notifier *itr;
> +
> + hlist_for_each_entry(itr, &subscriptions->list, hlist)
> + last = itr;
> +
> + if (last && last->ops->invalidate_range_end &&
> + subscription->ops->invalidate_range_end) {
> + subscriptions->no_blocking = true;
> + pr_warn_once(
> + "%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
line length?
> + current->comm, current->pid,
> + last->ops->invalidate_range_end,
> + subscription->ops->invalidate_range_end);
> + }
> + if (!last || !last->ops->invalidate_range_end)
> + subscriptions->no_blocking = false;
> +
> + if (last && subscription->ops->invalidate_range_end)
> + hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> + else
> + hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
> /*
> * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
> * write mode. A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
> subscription->users = 1;
>
> spin_lock(&mm->notifier_subscriptions->lock);
> - hlist_add_head_rcu(&subscription->hlist,
> - &mm->notifier_subscriptions->list);
> + mn_hist_add_subscription(mm->notifier_subscriptions,
> + subscription);
> spin_unlock(&mm->notifier_subscriptions->lock);
> } else
> mm->notifier_subscriptions->has_itree = true;
>
Other than some nits, looks good to me so you can add:
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-02-11 21:28 ` Ralph Campbell
@ 2020-02-11 23:42 ` Jason Gunthorpe
0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-11 23:42 UTC (permalink / raw)
To: Ralph Campbell
Cc: linux-mm, Michal Hocko, Jérôme Glisse, Christoph Hellwig
On Tue, Feb 11, 2020 at 01:28:42PM -0800, Ralph Campbell wrote:
> > +/*
> > + * Add a hlist subscription to the list. The list is kept sorted by the
> > + * existence of ops->invalidate_range_end. If there is more than one
> > + * invalidate_range_end in the list then this process can no longer support
> > + * non-blocking invalidation.
> > + *
> > + * non-blocking invalidation is problematic as a requirement to block results in
> > + * the invalidation being aborted, however due to the use of RCU we have no
> > + * reliable way to ensure that every sueessful invalidate_range_start() results
>
> s/sueessful/successful
woops, yes, two spellos, thanks
> > +{
> > + struct mmu_notifier *last = NULL;
> > + struct mmu_notifier *itr;
> > +
> > + hlist_for_each_entry(itr, &subscriptions->list, hlist)
> > + last = itr;
> > +
> > + if (last && last->ops->invalidate_range_end &&
> > + subscription->ops->invalidate_range_end) {
> > + subscriptions->no_blocking = true;
> > + pr_warn_once(
> > + "%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
>
> line length?
Style guide is to keep strings across the 80 cols for grepability
> > /*
> > * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
> > * write mode. A NULL mn signals the notifier is being registered for itree
> > @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
> > subscription->users = 1;
> > spin_lock(&mm->notifier_subscriptions->lock);
> > - hlist_add_head_rcu(&subscription->hlist,
> > - &mm->notifier_subscriptions->list);
> > + mn_hist_add_subscription(mm->notifier_subscriptions,
> > + subscription);
> > spin_unlock(&mm->notifier_subscriptions->lock);
> > } else
> > mm->notifier_subscriptions->has_itree = true;
> >
>
> Other than some nits, looks good to me so you can add:
> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Great, thanks!
Jason
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
2020-02-11 21:28 ` Ralph Campbell
@ 2020-02-28 13:50 ` Jason Gunthorpe
2020-03-24 19:41 ` Jason Gunthorpe
2020-03-26 13:06 ` Qian Cai
2 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-28 13:50 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig
On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
>
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
>
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
>
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
>
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
>
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
>
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.
>
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
>
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
> EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
> a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
>
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
>
> drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
>
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
>
> Someday I expect to just fail the registration on this condition.
>
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
I'm going to put this in linux-next to see if there are any reports of
the pr_warn firing.
Michal, are you happy with this solution now?
Thanks,
Jason
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-02-28 13:50 ` Jason Gunthorpe
@ 2020-03-24 19:41 ` Jason Gunthorpe
2020-03-25 8:01 ` Michal Hocko
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-03-24 19:41 UTC (permalink / raw)
To: linux-mm; +Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig
On Fri, Feb 28, 2020 at 09:50:06AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> > Many users of the mmu_notifier invalidate_range callbacks maintain
> > locking/counters/etc on a paired basis and have long expected that
> > invalidate_range_start/end() are always paired.
> >
> > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > kvm->mmu_notifier_count which was incremented during start().
> >
> > The recent change to add non-blocking notifiers breaks this assumption
> > when multiple notifiers are present in the list. When EAGAIN is returned
> > from an invalidate_range_start() then no invalidate_range_ends() are
> > called, even if the subscription's start had previously been called.
> >
> > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > subset of the linked list representing the notifiers already called to
> > generate an invalidate_range_end() pairing.
> >
> > One case works correctly, if only one subscription requires
> > invalidate_range_end() and it is the last entry in the hlist. In this
> > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > to unwind.
> >
> > Keep the notifier hlist sorted so that notifiers that require
> > invalidate_range_end() are always last, and if two are added then disable
> > non-blocking invalidation for the mm.
> >
> > A warning is printed for this case, if in future we determine this never
> > happens then we can simply fail during registration when there are
> > unsupported combinations of notifiers.
> >
> > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> > mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 50 insertions(+), 3 deletions(-)
> >
> > v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> > v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> > * Abandon attempting to fix it by calling invalidate_range_end() during an
> > EAGAIN start
> > * Just trivially ban multiple subscriptions
> > v3:
> > * Be more sophisticated, ban only multiple subscriptions if the result is
> > a failure. Allows multiple subscriptions without invalidate_range_end
> > * Include a printk when this condition is hit (Michal)
> >
> > At this point the rework Christoph requested during the first posting
> > is completed and there are now only 3 drivers using
> > invalidate_range_end():
> >
> > drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> > drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> > virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
> >
> > While I think it is unlikely that any of these drivers will be used in
> > combination with each other, display a printk in hopes to check.
> >
> > Someday I expect to just fail the registration on this condition.
> >
> > I think this also addresses Michal's concern about a 'big hammer' as
> > it probably won't ever trigger now.
>
> I'm going to put this in linux-next to see if there are any reports of
> the pr_warn firing.
>
> Michal, are you happy with this solution now?
It's been a month in linux-next now, with no complaints. If there are
no comments I will go ahead to send it in the hmm PR.
Thanks,
Jason
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-03-24 19:41 ` Jason Gunthorpe
@ 2020-03-25 8:01 ` Michal Hocko
2020-03-25 12:14 ` Jason Gunthorpe
0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-03-25 8:01 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-mm, Jérôme Glisse, Christoph Hellwig
On Tue 24-03-20 16:41:37, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2020 at 09:50:06AM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> > > Many users of the mmu_notifier invalidate_range callbacks maintain
> > > locking/counters/etc on a paired basis and have long expected that
> > > invalidate_range_start/end() are always paired.
> > >
> > > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > > kvm->mmu_notifier_count which was incremented during start().
> > >
> > > The recent change to add non-blocking notifiers breaks this assumption
> > > when multiple notifiers are present in the list. When EAGAIN is returned
> > > from an invalidate_range_start() then no invalidate_range_ends() are
> > > called, even if the subscription's start had previously been called.
> > >
> > > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > > subset of the linked list representing the notifiers already called to
> > > generate an invalidate_range_end() pairing.
> > >
> > > One case works correctly, if only one subscription requires
> > > invalidate_range_end() and it is the last entry in the hlist. In this
> > > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > > to unwind.
> > >
> > > Keep the notifier hlist sorted so that notifiers that require
> > > invalidate_range_end() are always last, and if two are added then disable
> > > non-blocking invalidation for the mm.
> > >
> > > A warning is printed for this case, if in future we determine this never
> > > happens then we can simply fail during registration when there are
> > > unsupported combinations of notifiers.
> > >
> > > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> > > mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> > > 1 file changed, 50 insertions(+), 3 deletions(-)
> > >
> > > v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> > > v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> > > * Abandon attempting to fix it by calling invalidate_range_end() during an
> > > EAGAIN start
> > > * Just trivially ban multiple subscriptions
> > > v3:
> > > * Be more sophisticated, ban only multiple subscriptions if the result is
> > > a failure. Allows multiple subscriptions without invalidate_range_end
> > > * Include a printk when this condition is hit (Michal)
> > >
> > > At this point the rework Christoph requested during the first posting
> > > is completed and there are now only 3 drivers using
> > > invalidate_range_end():
> > >
> > > drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> > > drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> > > virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
> > >
> > > While I think it is unlikely that any of these drivers will be used in
> > > combination with each other, display a printk in hopes to check.
> > >
> > > Someday I expect to just fail the registration on this condition.
> > >
> > > I think this also addresses Michal's concern about a 'big hammer' as
> > > it probably won't ever trigger now.
> >
> > I'm going to put this in linux-next to see if there are any reports of
> > the pr_warn failing.
> >
> > Michal, are you happy with this solution now?
>
> It's been a month in linux-next now, with no complaints. If there are
> no comments I will go ahead to send it in the hmm PR.
I will not block this but it still looks like a wrong approach. A more
robust solution would be to allow calling invalidate_range_end even for
the failing invalidate_start.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-03-25 8:01 ` Michal Hocko
@ 2020-03-25 12:14 ` Jason Gunthorpe
2020-03-25 13:06 ` Michal Hocko
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-03-25 12:14 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, Jérôme Glisse, Christoph Hellwig
On Wed, Mar 25, 2020 at 09:01:17AM +0100, Michal Hocko wrote:
> > > I'm going to put this in linux-next to see if there are any reports of
> > > the pr_warn firing.
> > >
> > > Michal, are you happy with this solution now?
> >
> > It's been a month in linux-next now, with no complaints. If there are
> > no comments I will go ahead to send it in the hmm PR.
>
> I will not block this but it still looks like a wrong approach. A more
> robust solution would be to allow calling invalidate_range_end even for
> the failing invalidate_start.
That requires reliably walking a rcu list backwards under rcu. I don't
have a good feeling about that algorithm. Do you know of a solution?
Since we don't actually have any users that care about this any more,
I felt this testable solution was better.
Jason
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-03-25 12:14 ` Jason Gunthorpe
@ 2020-03-25 13:06 ` Michal Hocko
0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2020-03-25 13:06 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-mm, Jérôme Glisse, Christoph Hellwig
On Wed 25-03-20 09:14:07, Jason Gunthorpe wrote:
> On Wed, Mar 25, 2020 at 09:01:17AM +0100, Michal Hocko wrote:
> > > > I'm going to put this in linux-next to see if there are any reports of
> > > > the pr_warn firing.
> > > >
> > > > Michal, are you happy with this solution now?
> > >
> > > It's been a month in linux-next now, with no complaints. If there are
> > > no comments I will go ahead to send it in the hmm PR.
> >
> > I will not block this but it still looks like a wrong approach. A more
> > robust solution would be to allow calling invalidate_range_end even for
> > the failing invalidate_start.
>
> That requires reliably walking a rcu list backwards under rcu. I don't
> have a good feeling about that algorithm. Do you know of a solution?
No, not really. I would be pushing for it if I had. Maybe we want a
different data structure than the RCU list. But...
> Since we don't actually have any users that care about this any more,
> I felt this testable solution was better.
... you are right that this is probably the best short term workaround.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
2020-02-11 21:28 ` Ralph Campbell
2020-02-28 13:50 ` Jason Gunthorpe
@ 2020-03-26 13:06 ` Qian Cai
2020-03-26 14:56 ` Jason Gunthorpe
2 siblings, 1 reply; 10+ messages in thread
From: Qian Cai @ 2020-03-26 13:06 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Linux Memory Management List, Michal Hocko,
Jérôme Glisse, Christoph Hellwig
> On Feb 11, 2020, at 3:52 PM, Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
>
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
>
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
>
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
>
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
>
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
>
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.
This will generate a warning when running a simple qemu-kvm on arm64,
qemu-kvm (37712) created two mmu_notifier's with invalidate_range_end(): kvm_mmu_notifier_invalidate_range_end and kvm_mmu_notifier_invalidate_range_end, non-blocking notifiers disabled
>
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
> mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
>
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
> EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
> a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
>
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
>
> drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
> virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
>
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
>
> Someday I expect to just fail the registration on this condition.
>
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
>
> Regards,
> Jason
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
> struct mmu_notifier_subscriptions {
> /* all mmu notifiers registered in this mm are queued in this list */
> struct hlist_head list;
> - bool has_itree;
> + u8 has_itree;
> + u8 no_blocking;
> /* to serialize the list modifications and hlist_unhashed */
> spinlock_t lock;
> unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
> int ret = 0;
> int id;
>
> + if (unlikely(subscriptions->no_blocking &&
> + !mmu_notifier_range_blockable(range)))
> + return -EAGAIN;
> +
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
> const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> srcu_read_unlock(&srcu, id);
> }
>
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * non-blocking invalidation is problematic as a requirement to block results in
> + * the invalidation being aborted, however due to the use of RCU we have no
> + * reliable way to ensure that every sueessful invalidate_range_start() results
> + * in a call to invalidate_range_end().
> + *
> + * Thus to support blocking only the last subscription in the list can have
> + * invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> + struct mmu_notifier *subscription)
> +{
> + struct mmu_notifier *last = NULL;
> + struct mmu_notifier *itr;
> +
> + hlist_for_each_entry(itr, &subscriptions->list, hlist)
> + last = itr;
> +
> + if (last && last->ops->invalidate_range_end &&
> + subscription->ops->invalidate_range_end) {
> + subscriptions->no_blocking = true;
> + pr_warn_once(
> + "%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
> + current->comm, current->pid,
> + last->ops->invalidate_range_end,
> + subscription->ops->invalidate_range_end);
> + }
> + if (!last || !last->ops->invalidate_range_end)
> + subscriptions->no_blocking = false;
> +
> + if (last && subscription->ops->invalidate_range_end)
> + hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> + else
> + hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
> /*
> * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
> * write mode. A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
> subscription->users = 1;
>
> spin_lock(&mm->notifier_subscriptions->lock);
> - hlist_add_head_rcu(&subscription->hlist,
> - &mm->notifier_subscriptions->list);
> + mn_hist_add_subscription(mm->notifier_subscriptions,
> + subscription);
> spin_unlock(&mm->notifier_subscriptions->lock);
> } else
> mm->notifier_subscriptions->has_itree = true;
> --
> 2.25.0
>
* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
2020-03-26 13:06 ` Qian Cai
@ 2020-03-26 14:56 ` Jason Gunthorpe
0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-03-26 14:56 UTC (permalink / raw)
To: Qian Cai
Cc: Linux Memory Management List, Michal Hocko,
Jérôme Glisse, Christoph Hellwig
On Thu, Mar 26, 2020 at 09:06:12AM -0400, Qian Cai wrote:
>
>
> > On Feb 11, 2020, at 3:52 PM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > Many users of the mmu_notifier invalidate_range callbacks maintain
> > locking/counters/etc on a paired basis and have long expected that
> > invalidate_range_start/end() are always paired.
> >
> > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > kvm->mmu_notifier_count which was incremented during start().
> >
> > The recent change to add non-blocking notifiers breaks this assumption
> > when multiple notifiers are present in the list. When EAGAIN is returned
> > from an invalidate_range_start() then no invalidate_range_ends() are
> > called, even if the subscription's start had previously been called.
> >
> > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > subset of the linked list representing the notifiers already called to
> > generate an invalidate_range_end() pairing.
> >
> > One case works correctly, if only one subscription requires
> > invalidate_range_end() and it is the last entry in the hlist. In this
> > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > to unwind.
> >
> > Keep the notifier hlist sorted so that notifiers that require
> > invalidate_range_end() are always last, and if two are added then disable
> > non-blocking invalidation for the mm.
> >
> > A warning is printed for this case, if in future we determine this never
> > happens then we can simply fail during registration when there are
> > unsupported combinations of notifiers.
>
> This will generate a warning when running a simple qemu-kvm on arm64,
>
> qemu-kvm (37712) created two mmu_notifier's with
> invalidate_range_end(): kvm_mmu_notifier_invalidate_range_end and
> kvm_mmu_notifier_invalidate_range_end, non-blocking notifiers
> disabled
Thank you Qian, this is very valuable information
It seems this solution will not work if kvm is registering multiple
notifiers.
I will take the patch out of linux-next
Jason