Linux-mm Archive on lore.kernel.org
* [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
@ 2020-02-11 20:52 Jason Gunthorpe
  2020-02-11 21:28 ` Ralph Campbell
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-11 20:52 UTC (permalink / raw)
  To: linux-mm; +Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig

Many users of the mmu_notifier invalidate_range callbacks maintain
locking/counters/etc on a paired basis and have long expected that
invalidate_range_start/end() are always paired.

For instance kvm_mmu_notifier_invalidate_range_end() undoes
kvm->mmu_notifier_count which was incremented during start().

The recent change to add non-blocking notifiers breaks this assumption
when multiple notifiers are present in the list. When EAGAIN is returned
from an invalidate_range_start() then no invalidate_range_ends() are
called, even if the subscription's start had previously been called.

Unfortunately, due to the RCU list traversal we can't reliably generate a
subset of the linked list representing the notifiers already called to
generate an invalidate_range_end() pairing.

One case works correctly: if only one subscription requires
invalidate_range_end() and it is the last entry in the hlist, then when
invalidate_range_start() returns -EAGAIN there will be nothing to unwind.

Keep the notifier hlist sorted so that notifiers that require
invalidate_range_end() are always last, and if two are added then disable
non-blocking invalidation for the mm.

A warning is printed for this case; if in future we determine this never
happens, we can simply fail during registration when there are
unsupported combinations of notifiers.

Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
* Abandon attempting to fix it by calling invalidate_range_end() during an
  EAGAIN start
* Just trivially ban multiple subscriptions
v3:
* Be more sophisticated, ban only multiple subscriptions if the result is
  a failure. Allows multiple subscriptions without invalidate_range_end
* Include a printk when this condition is hit (Michal)

At this point the rework Christoph requested during the first posting
is completed and there are now only 3 drivers using
invalidate_range_end():

drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,

While I think it is unlikely that any of these drivers will be used in
combination with each other, display a printk so we can check.

Someday I expect to just fail the registration on this condition.

I think this also addresses Michal's concern about a 'big hammer' as
it probably won't ever trigger now.

Regards,
Jason

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ef3973a5d34a94..f3aba7a970f576 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
 struct mmu_notifier_subscriptions {
 	/* all mmu notifiers registered in this mm are queued in this list */
 	struct hlist_head list;
-	bool has_itree;
+	u8 has_itree;
+	u8 no_blocking;
 	/* to serialize the list modifications and hlist_unhashed */
 	spinlock_t lock;
 	unsigned long invalidate_seq;
@@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
 	int ret = 0;
 	int id;
 
+	if (unlikely(subscriptions->no_blocking &&
+		     !mmu_notifier_range_blockable(range)))
+		return -EAGAIN;
+
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
 		const struct mmu_notifier_ops *ops = subscription->ops;
@@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
 	srcu_read_unlock(&srcu, id);
 }
 
+/*
+ * Add a hlist subscription to the list. The list is kept sorted by the
+ * existence of ops->invalidate_range_end. If there is more than one
+ * invalidate_range_end in the list then this process can no longer support
+ * non-blocking invalidation.
+ *
+ * non-blocking invalidation is problematic as a requirement to block results in
+ * the invalidation being aborted, however due to the use of RCU we have no
+ * reliable way to ensure that every sueessful invalidate_range_start() results
+ * in a call to invalidate_range_end().
+ *
+ * Thus to support blocking only the last subscription in the list can have
+ * invalidate_range_end() set.
+ */
+static void
+mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
+			 struct mmu_notifier *subscription)
+{
+	struct mmu_notifier *last = NULL;
+	struct mmu_notifier *itr;
+
+	hlist_for_each_entry(itr, &subscriptions->list, hlist)
+		last = itr;
+
+	if (last && last->ops->invalidate_range_end &&
+	    subscription->ops->invalidate_range_end) {
+		subscriptions->no_blocking = true;
+		pr_warn_once(
+			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
+			current->comm, current->pid,
+			last->ops->invalidate_range_end,
+			subscription->ops->invalidate_range_end);
+	}
+	if (!last || !last->ops->invalidate_range_end)
+		subscriptions->no_blocking = false;
+
+	if (last && subscription->ops->invalidate_range_end)
+		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
+	else
+		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
+}
+
 /*
  * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
  * write mode. A NULL mn signals the notifier is being registered for itree
@@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
 		subscription->users = 1;
 
 		spin_lock(&mm->notifier_subscriptions->lock);
-		hlist_add_head_rcu(&subscription->hlist,
-				   &mm->notifier_subscriptions->list);
+		mn_hist_add_subscription(mm->notifier_subscriptions,
+					 subscription);
 		spin_unlock(&mm->notifier_subscriptions->lock);
 	} else
 		mm->notifier_subscriptions->has_itree = true;
-- 
2.25.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
@ 2020-02-11 21:28 ` Ralph Campbell
  2020-02-11 23:42   ` Jason Gunthorpe
  2020-02-28 13:50 ` Jason Gunthorpe
  2020-03-26 13:06 ` Qian Cai
  2 siblings, 1 reply; 10+ messages in thread
From: Ralph Campbell @ 2020-02-11 21:28 UTC (permalink / raw)
  To: Jason Gunthorpe, linux-mm
  Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig


On 2/11/20 12:52 PM, Jason Gunthorpe wrote:
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
> 
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
> 
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
> 
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
> 
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
> 
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
> 
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.
> 
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>   mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 50 insertions(+), 3 deletions(-)
> 
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
>    EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
>    a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
> 
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
> 
> drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> 
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
> 
> Someday I expect to just fail the registration on this condition.
> 
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
> 
> Regards,
> Jason
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
>   struct mmu_notifier_subscriptions {
>   	/* all mmu notifiers registered in this mm are queued in this list */
>   	struct hlist_head list;
> -	bool has_itree;
> +	u8 has_itree;
> +	u8 no_blocking;
>   	/* to serialize the list modifications and hlist_unhashed */
>   	spinlock_t lock;
>   	unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
>   	int ret = 0;
>   	int id;
>   
> +	if (unlikely(subscriptions->no_blocking &&
> +		     !mmu_notifier_range_blockable(range)))
> +		return -EAGAIN;
> +
>   	id = srcu_read_lock(&srcu);
>   	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
>   		const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
>   	srcu_read_unlock(&srcu, id);
>   }
>   
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * non-blocking invalidation is problematic as a requirement to block results in
> + * the invalidation being aborted, however due to the use of RCU we have no
> + * reliable way to ensure that every sueessful invalidate_range_start() results

s/sueessful/successful

> + * in a call to invalidate_range_end().
> + *
> + * Thus to support blocking only the last subscription in the list can have
> + * invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> +			 struct mmu_notifier *subscription)

We have mn_hlist_xxx in a number of places in mmu_notifier.c.
Seems like this should be named mn_hlist_add_subscription().

> +{
> +	struct mmu_notifier *last = NULL;
> +	struct mmu_notifier *itr;
> +
> +	hlist_for_each_entry(itr, &subscriptions->list, hlist)
> +		last = itr;
> +
> +	if (last && last->ops->invalidate_range_end &&
> +	    subscription->ops->invalidate_range_end) {
> +		subscriptions->no_blocking = true;
> +		pr_warn_once(
> +			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",

line length?

> +			current->comm, current->pid,
> +			last->ops->invalidate_range_end,
> +			subscription->ops->invalidate_range_end);
> +	}
> +	if (!last || !last->ops->invalidate_range_end)
> +		subscriptions->no_blocking = false;
> +
> +	if (last && subscription->ops->invalidate_range_end)
> +		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> +	else
> +		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
>   /*
>    * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
>    * write mode. A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
>   		subscription->users = 1;
>   
>   		spin_lock(&mm->notifier_subscriptions->lock);
> -		hlist_add_head_rcu(&subscription->hlist,
> -				   &mm->notifier_subscriptions->list);
> +		mn_hist_add_subscription(mm->notifier_subscriptions,
> +					 subscription);
>   		spin_unlock(&mm->notifier_subscriptions->lock);
>   	} else
>   		mm->notifier_subscriptions->has_itree = true;
> 

Other than some nits, looks good to me so you can add:
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-02-11 21:28 ` Ralph Campbell
@ 2020-02-11 23:42   ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-11 23:42 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-mm, Michal Hocko, Jérôme Glisse, Christoph Hellwig

On Tue, Feb 11, 2020 at 01:28:42PM -0800, Ralph Campbell wrote:
> > +/*
> > + * Add a hlist subscription to the list. The list is kept sorted by the
> > + * existence of ops->invalidate_range_end. If there is more than one
> > + * invalidate_range_end in the list then this process can no longer support
> > + * non-blocking invalidation.
> > + *
> > + * non-blocking invalidation is problematic as a requirement to block results in
> > + * the invalidation being aborted, however due to the use of RCU we have no
> > + * reliable way to ensure that every sueessful invalidate_range_start() results
> 
> s/sueessful/successful

woops, yes, two spellos, thanks

> > +{
> > +	struct mmu_notifier *last = NULL;
> > +	struct mmu_notifier *itr;
> > +
> > +	hlist_for_each_entry(itr, &subscriptions->list, hlist)
> > +		last = itr;
> > +
> > +	if (last && last->ops->invalidate_range_end &&
> > +	    subscription->ops->invalidate_range_end) {
> > +		subscriptions->no_blocking = true;
> > +		pr_warn_once(
> > +			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
> 
> line length?

Style guide is to keep strings on one line, even past 80 cols, for grepability

> >   /*
> >    * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
> >    * write mode. A NULL mn signals the notifier is being registered for itree
> > @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
> >   		subscription->users = 1;
> >   		spin_lock(&mm->notifier_subscriptions->lock);
> > -		hlist_add_head_rcu(&subscription->hlist,
> > -				   &mm->notifier_subscriptions->list);
> > +		mn_hist_add_subscription(mm->notifier_subscriptions,
> > +					 subscription);
> >   		spin_unlock(&mm->notifier_subscriptions->lock);
> >   	} else
> >   		mm->notifier_subscriptions->has_itree = true;
> > 
> 
> Other than some nits, looks good to me so you can add:
> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

Great, thanks!

Jason



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
  2020-02-11 21:28 ` Ralph Campbell
@ 2020-02-28 13:50 ` Jason Gunthorpe
  2020-03-24 19:41   ` Jason Gunthorpe
  2020-03-26 13:06 ` Qian Cai
  2 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-28 13:50 UTC (permalink / raw)
  To: linux-mm; +Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig

On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
> 
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
> 
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
> 
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
> 
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
> 
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
> 
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.
> 
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
> 
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
>   EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
>   a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
> 
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
> 
> drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> 
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
> 
> Someday I expect to just fail the registration on this condition.
> 
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.

I'm going to put this in linux-next to see if there are any reports of
the pr_warn failing.

Michal, are you happy with this solution now?

Thanks,
Jason



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-02-28 13:50 ` Jason Gunthorpe
@ 2020-03-24 19:41   ` Jason Gunthorpe
  2020-03-25  8:01     ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-03-24 19:41 UTC (permalink / raw)
  To: linux-mm; +Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig

On Fri, Feb 28, 2020 at 09:50:06AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> > Many users of the mmu_notifier invalidate_range callbacks maintain
> > locking/counters/etc on a paired basis and have long expected that
> > invalidate_range_start/end() are always paired.
> > 
> > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > kvm->mmu_notifier_count which was incremented during start().
> > 
> > The recent change to add non-blocking notifiers breaks this assumption
> > when multiple notifiers are present in the list. When EAGAIN is returned
> > from an invalidate_range_start() then no invalidate_range_ends() are
> > called, even if the subscription's start had previously been called.
> > 
> > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > subset of the linked list representing the notifiers already called to
> > generate an invalidate_range_end() pairing.
> > 
> > One case works correctly, if only one subscription requires
> > invalidate_range_end() and it is the last entry in the hlist. In this
> > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > to unwind.
> > 
> > Keep the notifier hlist sorted so that notifiers that require
> > invalidate_range_end() are always last, and if two are added then disable
> > non-blocking invalidation for the mm.
> > 
> > A warning is printed for this case, if in future we determine this never
> > happens then we can simply fail during registration when there are
> > unsupported combinations of notifiers.
> > 
> > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> >  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 50 insertions(+), 3 deletions(-)
> > 
> > v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> > v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> > * Abandon attempting to fix it by calling invalidate_range_end() during an
> >   EAGAIN start
> > * Just trivially ban multiple subscriptions
> > v3:
> > * Be more sophisticated, ban only multiple subscriptions if the result is
> >   a failure. Allows multiple subscriptions without invalidate_range_end
> > * Include a printk when this condition is hit (Michal)
> > 
> > At this point the rework Christoph requested during the first posting
> > is completed and there are now only 3 drivers using
> > invalidate_range_end():
> > 
> > drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> > drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> > virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> > 
> > While I think it is unlikely that any of these drivers will be used in
> > combination with each other, display a printk in hopes to check.
> > 
> > Someday I expect to just fail the registration on this condition.
> > 
> > I think this also addresses Michal's concern about a 'big hammer' as
> > it probably won't ever trigger now.
> 
> I'm going to put this in linux-next to see if there are any reports of
> the pr_warn failing.
> 
> Michal, are you happy with this solution now?

It's been a month in linux-next now, with no complaints. If there are
no comments I will go ahead to send it in the hmm PR.

Thanks,
Jason



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-03-24 19:41   ` Jason Gunthorpe
@ 2020-03-25  8:01     ` Michal Hocko
  2020-03-25 12:14       ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-03-25  8:01 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, Jérôme Glisse, Christoph Hellwig

On Tue 24-03-20 16:41:37, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2020 at 09:50:06AM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 11, 2020 at 04:52:52PM -0400, Jason Gunthorpe wrote:
> > > Many users of the mmu_notifier invalidate_range callbacks maintain
> > > locking/counters/etc on a paired basis and have long expected that
> > > invalidate_range_start/end() are always paired.
> > > 
> > > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > > kvm->mmu_notifier_count which was incremented during start().
> > > 
> > > The recent change to add non-blocking notifiers breaks this assumption
> > > when multiple notifiers are present in the list. When EAGAIN is returned
> > > from an invalidate_range_start() then no invalidate_range_ends() are
> > > called, even if the subscription's start had previously been called.
> > > 
> > > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > > subset of the linked list representing the notifiers already called to
> > > generate an invalidate_range_end() pairing.
> > > 
> > > One case works correctly, if only one subscription requires
> > > invalidate_range_end() and it is the last entry in the hlist. In this
> > > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > > to unwind.
> > > 
> > > Keep the notifier hlist sorted so that notifiers that require
> > > invalidate_range_end() are always last, and if two are added then disable
> > > non-blocking invalidation for the mm.
> > > 
> > > A warning is printed for this case, if in future we determine this never
> > > happens then we can simply fail during registration when there are
> > > unsupported combinations of notifiers.
> > > 
> > > Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> > >  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 50 insertions(+), 3 deletions(-)
> > > 
> > > v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> > > v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> > > * Abandon attempting to fix it by calling invalidate_range_end() during an
> > >   EAGAIN start
> > > * Just trivially ban multiple subscriptions
> > > v3:
> > > * Be more sophisticated, ban only multiple subscriptions if the result is
> > >   a failure. Allows multiple subscriptions without invalidate_range_end
> > > * Include a printk when this condition is hit (Michal)
> > > 
> > > At this point the rework Christoph requested during the first posting
> > > is completed and there are now only 3 drivers using
> > > invalidate_range_end():
> > > 
> > > drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> > > drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> > > virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> > > 
> > > While I think it is unlikely that any of these drivers will be used in
> > > combination with each other, display a printk in hopes to check.
> > > 
> > > Someday I expect to just fail the registration on this condition.
> > > 
> > > I think this also addresses Michal's concern about a 'big hammer' as
> > > it probably won't ever trigger now.
> > 
> > I'm going to put this in linux-next to see if there are any reports of
> > the pr_warn failing.
> > 
> > Michal, are you happy with this solution now?
> 
> It's been a month in linux-next now, with no complaints. If there are
> no comments I will go ahead to send it in the hmm PR.

I will not block this but it still looks like a wrong approach. A more
robust solution would be to allow calling invalidate_range_end even for
the failing invalidate_start.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-03-25  8:01     ` Michal Hocko
@ 2020-03-25 12:14       ` Jason Gunthorpe
  2020-03-25 13:06         ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-03-25 12:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Jérôme Glisse, Christoph Hellwig

On Wed, Mar 25, 2020 at 09:01:17AM +0100, Michal Hocko wrote:
> > > I'm going to put this in linux-next to see if there are any reports of
> > > the pr_warn failing.
> > > 
> > > Michal, are you happy with this solution now?
> > 
> > It's been a month in linux-next now, with no complaints. If there are
> > no comments I will go ahead to send it in the hmm PR.
> 
> I will not block this but it still looks like a wrong approach. A more
> robust solution would be to allow calling invalidate_range_end even for
> the failing invalidate_start.

That requires reliably walking an RCU list backwards while under RCU. I
don't have a good feeling about that algorithm. Do you know of a solution?

Since we don't actually have any users that care about this any more,
I felt this testable solution was better.

Jason



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-03-25 12:14       ` Jason Gunthorpe
@ 2020-03-25 13:06         ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2020-03-25 13:06 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, Jérôme Glisse, Christoph Hellwig

On Wed 25-03-20 09:14:07, Jason Gunthorpe wrote:
> On Wed, Mar 25, 2020 at 09:01:17AM +0100, Michal Hocko wrote:
> > > > I'm going to put this in linux-next to see if there are any reports of
> > > > the pr_warn failing.
> > > > 
> > > > Michal, are you happy with this solution now?
> > > 
> > > It's been a month in linux-next now, with no complaints. If there are
> > > no comments I will go ahead to send it in the hmm PR.
> > 
> > I will not block this but it still looks like a wrong approach. A more
> > robust solution would be to allow calling invalidate_range_end even for
> > the failing invalidate_start.
> 
> That requires reliably walking a rcu list backwards under rcu. I don't
> have a good feeling about that algorithm. Do you know of a solution?

No, not really. I would be pushing for it if I had. Maybe we want a
different data structure than the RCU list. But...

> Since we don't actually have any users that care about this any more,
> I felt this testable solution was better.

... you are right that this is probably the best short term workaround.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
  2020-02-11 21:28 ` Ralph Campbell
  2020-02-28 13:50 ` Jason Gunthorpe
@ 2020-03-26 13:06 ` Qian Cai
  2020-03-26 14:56   ` Jason Gunthorpe
  2 siblings, 1 reply; 10+ messages in thread
From: Qian Cai @ 2020-03-26 13:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Linux Memory Management List, Michal Hocko,
	Jérôme Glisse, Christoph Hellwig



> On Feb 11, 2020, at 3:52 PM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> 
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
> 
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
> 
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
> 
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
> 
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
> 
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
> 
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.

This will generate a warning when running a simple qemu-kvm on arm64,

qemu-kvm (37712) created two mmu_notifier's with invalidate_range_end(): kvm_mmu_notifier_invalidate_range_end and kvm_mmu_notifier_invalidate_range_end, non-blocking notifiers disabled

> 
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
> mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
> 
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
>  EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
>  a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
> 
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
> 
> drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> 
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
> 
> Someday I expect to just fail the registration on this condition.
> 
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
> 
> Regards,
> Jason
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
> struct mmu_notifier_subscriptions {
> 	/* all mmu notifiers registered in this mm are queued in this list */
> 	struct hlist_head list;
> -	bool has_itree;
> +	u8 has_itree;
> +	u8 no_blocking;
> 	/* to serialize the list modifications and hlist_unhashed */
> 	spinlock_t lock;
> 	unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
> 	int ret = 0;
> 	int id;
> 
> +	if (unlikely(subscriptions->no_blocking &&
> +		     !mmu_notifier_range_blockable(range)))
> +		return -EAGAIN;
> +
> 	id = srcu_read_lock(&srcu);
> 	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
> 		const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> 	srcu_read_unlock(&srcu, id);
> }
> 
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * Non-blocking invalidation is problematic: a requirement to block results
> + * in the invalidation being aborted, but due to the use of RCU we have no
> + * reliable way to ensure that every successful invalidate_range_start()
> + * results in a call to invalidate_range_end().
> + *
> + * Thus to support blocking only the last subscription in the list can have
> + * invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> +			 struct mmu_notifier *subscription)
> +{
> +	struct mmu_notifier *last = NULL;
> +	struct mmu_notifier *itr;
> +
> +	hlist_for_each_entry(itr, &subscriptions->list, hlist)
> +		last = itr;
> +
> +	if (last && last->ops->invalidate_range_end &&
> +	    subscription->ops->invalidate_range_end) {
> +		subscriptions->no_blocking = true;
> +		pr_warn_once(
> +			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
> +			current->comm, current->pid,
> +			last->ops->invalidate_range_end,
> +			subscription->ops->invalidate_range_end);
> +	}
> +	if (!last || !last->ops->invalidate_range_end)
> +		subscriptions->no_blocking = false;
> +
> +	if (last && subscription->ops->invalidate_range_end)
> +		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> +	else
> +		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
> /*
>  * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
>  * write mode. A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
> 		subscription->users = 1;
> 
> 		spin_lock(&mm->notifier_subscriptions->lock);
> -		hlist_add_head_rcu(&subscription->hlist,
> -				   &mm->notifier_subscriptions->list);
> +		mn_hist_add_subscription(mm->notifier_subscriptions,
> +					 subscription);
> 		spin_unlock(&mm->notifier_subscriptions->lock);
> 	} else
> 		mm->notifier_subscriptions->has_itree = true;
> -- 
> 2.25.0
> 



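[Editorial note: the sorted-insert rule in the quoted mn_hist_add_subscription() can be checked in a userspace sketch. This is illustrative only: a plain singly-linked list replaces the kernel's RCU hlist, and mock_ops/mock_notifier/add_subscription are made-up names, not the kernel API. It exercises the invariant the patch states — end-capable notifiers stay at the tail, and a second one flips no_blocking on.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_ops {
	void (*invalidate_range_end)(void);
};

struct mock_notifier {
	const struct mock_ops *ops;
	struct mock_notifier *next;
};

struct mock_subscriptions {
	struct mock_notifier *head;
	bool no_blocking;
};

static void dummy_end(void) { }

static const struct mock_ops ops_with_end = { dummy_end };
static const struct mock_ops ops_no_end = { NULL };

/*
 * Insert rule from the patch: notifiers providing
 * invalidate_range_end() go behind the current tail, everything else
 * goes at the head. A second end-capable notifier sets no_blocking.
 */
static void add_subscription(struct mock_subscriptions *subs,
			     struct mock_notifier *sub)
{
	struct mock_notifier *last = NULL, *itr;

	for (itr = subs->head; itr; itr = itr->next)
		last = itr;

	if (last && last->ops->invalidate_range_end &&
	    sub->ops->invalidate_range_end)
		subs->no_blocking = true;
	if (!last || !last->ops->invalidate_range_end)
		subs->no_blocking = false;

	if (last && sub->ops->invalidate_range_end) {
		last->next = sub;	/* add behind the current tail */
		sub->next = NULL;
	} else {
		sub->next = subs->head;	/* add at head */
		subs->head = sub;
	}
}
```

This also makes Qian's report below easy to reproduce in miniature: registering two end-capable notifiers, as kvm apparently does, necessarily trips the no_blocking path.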

* Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
  2020-03-26 13:06 ` Qian Cai
@ 2020-03-26 14:56   ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-03-26 14:56 UTC (permalink / raw)
  To: Qian Cai
  Cc: Linux Memory Management List, Michal Hocko,
	Jérôme Glisse, Christoph Hellwig

On Thu, Mar 26, 2020 at 09:06:12AM -0400, Qian Cai wrote:
> 
> 
> > On Feb 11, 2020, at 3:52 PM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> > 
> > Many users of the mmu_notifier invalidate_range callbacks maintain
> > locking/counters/etc on a paired basis and have long expected that
> > invalidate_range_start/end() are always paired.
> > 
> > For instance kvm_mmu_notifier_invalidate_range_end() undoes
> > kvm->mmu_notifier_count which was incremented during start().
> > 
> > The recent change to add non-blocking notifiers breaks this assumption
> > when multiple notifiers are present in the list. When EAGAIN is returned
> > from an invalidate_range_start() then no invalidate_range_ends() are
> > called, even if the subscription's start had previously been called.
> > 
> > Unfortunately, due to the RCU list traversal we can't reliably generate a
> > subset of the linked list representing the notifiers already called to
> > generate an invalidate_range_end() pairing.
> > 
> > One case works correctly, if only one subscription requires
> > invalidate_range_end() and it is the last entry in the hlist. In this
> > case, when invalidate_range_start() returns -EAGAIN there will be nothing
> > to unwind.
> > 
> > Keep the notifier hlist sorted so that notifiers that require
> > invalidate_range_end() are always last, and if two are added then disable
> > non-blocking invalidation for the mm.
> > 
> > A warning is printed for this case, if in future we determine this never
> > happens then we can simply fail during registration when there are
> > unsupported combinations of notifiers.
> 
> This will generate a warning when running a simple qemu-kvm on arm64,
> 
> qemu-kvm (37712) created two mmu_notifier's with
> invalidate_range_end(): kvm_mmu_notifier_invalidate_range_end and
> kvm_mmu_notifier_invalidate_range_end, non-blocking notifiers
> disabled

Thank you Qian, this is very valuable information.

It seems this solution will not work if kvm is registering multiple
notifiers.

I will take the patch out of linux-next

Jason



Thread overview: 10+ messages
2020-02-11 20:52 [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end Jason Gunthorpe
2020-02-11 21:28 ` Ralph Campbell
2020-02-11 23:42   ` Jason Gunthorpe
2020-02-28 13:50 ` Jason Gunthorpe
2020-03-24 19:41   ` Jason Gunthorpe
2020-03-25  8:01     ` Michal Hocko
2020-03-25 12:14       ` Jason Gunthorpe
2020-03-25 13:06         ` Michal Hocko
2020-03-26 13:06 ` Qian Cai
2020-03-26 14:56   ` Jason Gunthorpe
