linux-mm.kvack.org archive mirror
* [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
@ 2020-02-11 20:52 Jason Gunthorpe
  2020-02-11 21:28 ` Ralph Campbell
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-02-11 20:52 UTC (permalink / raw)
  To: linux-mm; +Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig

Many users of the mmu_notifier invalidate_range callbacks maintain
locking/counters/etc on a paired basis and have long expected that
invalidate_range_start/end() are always paired.

For instance kvm_mmu_notifier_invalidate_range_end() undoes
kvm->mmu_notifier_count which was incremented during start().

The recent change to add non-blocking notifiers breaks this assumption
when multiple notifiers are present in the list. When -EAGAIN is
returned from an invalidate_range_start() no invalidate_range_end()
calls are made at all, even for subscriptions whose start() had already
been called.

Unfortunately, due to the RCU list traversal we can't reliably
reconstruct the subset of notifiers whose start() was already called in
order to deliver the pairing invalidate_range_end().

One case does work correctly: if only one subscription requires
invalidate_range_end() and it is the last entry in the hlist, then when
invalidate_range_start() returns -EAGAIN there is nothing to unwind.

Keep the notifier hlist sorted so that notifiers that require
invalidate_range_end() are always last, and if two such notifiers are
added then disable non-blocking invalidation for the mm.

A warning is printed for this case; if in future we determine it never
happens in practice then we can simply fail during registration when
there are unsupported combinations of notifiers.

Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
* Abandon attempting to fix it by calling invalidate_range_end() during an
  EAGAIN start
* Just trivially ban multiple subscriptions
v3:
* Be more sophisticated: ban multiple subscriptions only when the
  combination would actually fail, so multiple subscriptions without
  invalidate_range_end() remain allowed
* Include a printk when this condition is hit (Michal)

At this point the rework Christoph requested during the first posting
is completed and there are now only 3 drivers using
invalidate_range_end():

drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,

While I think it is unlikely that any of these drivers will be used in
combination with each other, a printk is displayed in the hope of
catching it if they ever are.

Someday I expect to just fail the registration on this condition.

I think this also addresses Michal's concern about a 'big hammer' as
it probably won't ever trigger now.

Regards,
Jason

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ef3973a5d34a94..f3aba7a970f576 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
 struct mmu_notifier_subscriptions {
 	/* all mmu notifiers registered in this mm are queued in this list */
 	struct hlist_head list;
-	bool has_itree;
+	u8 has_itree;
+	u8 no_blocking;
 	/* to serialize the list modifications and hlist_unhashed */
 	spinlock_t lock;
 	unsigned long invalidate_seq;
@@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
 	int ret = 0;
 	int id;
 
+	if (unlikely(subscriptions->no_blocking &&
+		     !mmu_notifier_range_blockable(range)))
+		return -EAGAIN;
+
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
 		const struct mmu_notifier_ops *ops = subscription->ops;
@@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
 	srcu_read_unlock(&srcu, id);
 }
 
+/*
+ * Add a hlist subscription to the list. The list is kept sorted by the
+ * existence of ops->invalidate_range_end. If there is more than one
+ * invalidate_range_end in the list then this process can no longer support
+ * non-blocking invalidation.
+ *
+ * Non-blocking invalidation is problematic because a requirement to block
+ * aborts the whole invalidation; due to the use of RCU we have no reliable
+ * way to ensure that every successful invalidate_range_start() results in a
+ * call to invalidate_range_end().
+ *
+ * Thus, to keep supporting non-blocking invalidation, only the last
+ * subscription in the list can have invalidate_range_end() set.
+ */
+static void
+mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
+			 struct mmu_notifier *subscription)
+{
+	struct mmu_notifier *last = NULL;
+	struct mmu_notifier *itr;
+
+	hlist_for_each_entry(itr, &subscriptions->list, hlist)
+		last = itr;
+
+	if (last && last->ops->invalidate_range_end &&
+	    subscription->ops->invalidate_range_end) {
+		subscriptions->no_blocking = true;
+		pr_warn_once(
+			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
+			current->comm, current->pid,
+			last->ops->invalidate_range_end,
+			subscription->ops->invalidate_range_end);
+	}
+	if (!last || !last->ops->invalidate_range_end)
+		subscriptions->no_blocking = false;
+
+	if (last && subscription->ops->invalidate_range_end)
+		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
+	else
+		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
+}
+
 /*
  * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
  * write mode. A NULL mn signals the notifier is being registered for itree
@@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
 		subscription->users = 1;
 
 		spin_lock(&mm->notifier_subscriptions->lock);
-		hlist_add_head_rcu(&subscription->hlist,
-				   &mm->notifier_subscriptions->list);
+		mn_hist_add_subscription(mm->notifier_subscriptions,
+					 subscription);
 		spin_unlock(&mm->notifier_subscriptions->lock);
 	} else
 		mm->notifier_subscriptions->has_itree = true;
-- 
2.25.0

