Linux-mm Archive on lore.kernel.org
* [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking
@ 2019-08-07 19:16 Jason Gunthorpe
  2019-08-08  7:00 ` Christoph Hellwig
  2019-08-08  8:18 ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 19:16 UTC (permalink / raw)
  To: linux-mm, Michal Hocko
  Cc: Jérôme Glisse, Andrea Arcangeli, Christoph Hellwig

Many users of the mmu_notifier invalidate_range callbacks maintain
locking/counters/etc on a paired basis and have long expected that
invalidate_range start/end are always paired.

The recent change to add non-blocking notifiers breaks this assumption
when multiple notifiers are present in the list as an EAGAIN return from a
later notifier causes all earlier notifiers to get their
invalidate_range_end() skipped.

During the development of non-blocking each user was audited to be sure
they can skip their invalidate_range_end() if their start returns -EAGAIN,
so the only place that has a problem is when there are multiple
subscriptions.

Due to the RCU locking we can't reliably generate a subset of the linked
list representing the notifiers already called, and generate an
invalidate_range_end() pairing.

Rather than design an elaborate fix, for now, just block non-blocking
requests early on if there are multiple subscriptions.

Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 include/linux/mmu_notifier.h |  1 +
 mm/mmu_notifier.c            | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

HCH suggested making the locking common so we don't need to have an
invalidate_range_end, but that is a longer journey.

Here is a simpler stop-gap for this bug. What do you think, Michal?
I don't have a good way to test this flow..

This lightly clashes with the other mmu notifier series I just sent,
so it should go to either -rc or hmm.git

Thanks,
Jason

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b6c004bd9f6ad9..170fa2c65d659c 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -53,6 +53,7 @@ struct mmu_notifier_mm {
 	struct hlist_head list;
 	/* to serialize the list modifications and hlist_unhashed */
 	spinlock_t lock;
+	bool multiple_subscriptions;
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index b5670620aea0fc..4e56f75c560242 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -171,6 +171,19 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 	int ret = 0;
 	int id;
 
+	/*
+	 * If there is more than one notifier subscribed to this mm then we
+	 * cannot support the EAGAIN return. invalidate_range_start/end() must
+	 * always be paired unless start returns -EAGAIN. When we return
+	 * -EAGAIN from here the caller will skip all invalidate_range_end()
+	 * calls. However, if there is more than one notifier then some
+	 * notifiers may have had a successful invalidate_range_start() -
+	 * causing imbalance when the end is skipped.
+	 */
+	if (!mmu_notifier_range_blockable(range) &&
+	    range->mm->mmu_notifier_mm->multiple_subscriptions)
+		return -EAGAIN;
+
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
@@ -274,6 +287,8 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
 	 * thanks to mm_take_all_locks().
 	 */
 	spin_lock(&mm->mmu_notifier_mm->lock);
+	mm->mmu_notifier_mm->multiple_subscriptions =
+		!hlist_empty(&mm->mmu_notifier_mm->list);
 	hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 
-- 
2.22.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking
  2019-08-07 19:16 [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking Jason Gunthorpe
@ 2019-08-08  7:00 ` Christoph Hellwig
  2019-08-08  8:18 ` Michal Hocko
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2019-08-08  7:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, Michal Hocko, Jérôme Glisse,
	Andrea Arcangeli, Christoph Hellwig

This looks like a pretty big hammer, but probably about the best we can
do until locking moves to common code..



* Re: [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking
  2019-08-07 19:16 [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking Jason Gunthorpe
  2019-08-08  7:00 ` Christoph Hellwig
@ 2019-08-08  8:18 ` Michal Hocko
  2019-08-08 12:04   ` Jason Gunthorpe
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2019-08-08  8:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, Jérôme Glisse, Andrea Arcangeli, Christoph Hellwig

On Wed 07-08-19 19:16:32, Jason Gunthorpe wrote:
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range start/end are always paired.
> 
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list as an EAGAIN return from a
> later notifier causes all earlier notifiers to get their
> invalidate_range_end() skipped.
> 
> During the development of non-blocking each user was audited to be sure
> they can skip their invalidate_range_end() if their start returns -EAGAIN,
> so the only place that has a problem is when there are multiple
> subscriptions.
> 
> Due to the RCU locking we can't reliably generate a subset of the linked
> list representing the notifiers already called, and generate an
> invalidate_range_end() pairing.
> 
> Rather than design an elaborate fix, for now, just block non-blocking
> requests early on if there are multiple subscriptions.

Which means that the oom path cannot really release any memory for
ranges covered by these notifiers which is really unfortunate because
that might cover a lot of memory. Especially when the particular range
might not be tracked at all, right?

So I cannot really say I am happy with this much. People will simply
start complaining that the OOM killer has killed more victims because
the first one hasn't really released its memory during the tear down.

If a different fix is indeed too elaborate then make sure to let users
known that there is a restriction in place and dump something useful
into the kernel log.

> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  include/linux/mmu_notifier.h |  1 +
>  mm/mmu_notifier.c            | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> HCH suggested making the locking common so we don't need to have an
> invalidate_range_end, but that is a longer journey.
> 
> Here is a simpler stop-gap for this bug. What do you think, Michal?
> I don't have a good way to test this flow..

Testing is quite simple if you have any blocking notifiers at hand.
Simply trigger an OOM condition and mark a process which uses a
notifier as a prime oom candidate by
echo 1000 > /proc/<pid>/oom_score_adj

and see how it behaves.
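A rough sketch of that recipe (assumes Linux procfs; raising a child's oom_score_adj needs no privileges, but actually provoking the OOM -- e.g. a memory hog in a small memcg -- is left out):

```shell
# Stand-in victim; in the real test use a process that registered an
# mmu notifier (a KVM guest, an RDMA ODP user, etc).
sleep 300 &
pid=$!

echo 1000 > /proc/$pid/oom_score_adj   # max bias toward this victim
cat /proc/$pid/oom_score_adj           # prints 1000

# Now trigger an OOM condition (not done here), then check dmesg for
# the oom_reaper / notifier behaviour.
kill $pid
```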

> This lightly clashes with the other mmu notifier series I just sent,
> so it should go to either -rc or hmm.git
> 
> Thanks,
> Jason
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index b6c004bd9f6ad9..170fa2c65d659c 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -53,6 +53,7 @@ struct mmu_notifier_mm {
>  	struct hlist_head list;
>  	/* to serialize the list modifications and hlist_unhashed */
>  	spinlock_t lock;
> +	bool multiple_subscriptions;
>  };
>  
>  #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index b5670620aea0fc..4e56f75c560242 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -171,6 +171,19 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
>  	int ret = 0;
>  	int id;
>  
> +	/*
> +	 * If there is more than one notifier subscribed to this mm then we
> +	 * cannot support the EAGAIN return. invalidate_range_start/end() must
> +	 * always be paired unless start returns -EAGAIN. When we return
> +	 * -EAGAIN from here the caller will skip all invalidate_range_end()
> +	 * calls. However, if there is more than one notifier then some
> +	 * notifiers may have had a successful invalidate_range_start() -
> +	 * causing imbalance when the end is skipped.
> +	 */
> +	if (!mmu_notifier_range_blockable(range) &&
> +	    range->mm->mmu_notifier_mm->multiple_subscriptions)
> +		return -EAGAIN;
> +
>  	id = srcu_read_lock(&srcu);
>  	hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
>  		if (mn->ops->invalidate_range_start) {
> @@ -274,6 +287,8 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
>  	 * thanks to mm_take_all_locks().
>  	 */
>  	spin_lock(&mm->mmu_notifier_mm->lock);
> +	mm->mmu_notifier_mm->multiple_subscriptions =
> +		!hlist_empty(&mm->mmu_notifier_mm->list);
>  	hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
>  	spin_unlock(&mm->mmu_notifier_mm->lock);
>  
> -- 
> 2.22.0
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking
  2019-08-08  8:18 ` Michal Hocko
@ 2019-08-08 12:04   ` Jason Gunthorpe
  2019-08-08 12:13     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2019-08-08 12:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Jérôme Glisse, Andrea Arcangeli, Christoph Hellwig

On Thu, Aug 08, 2019 at 10:18:27AM +0200, Michal Hocko wrote:
> On Wed 07-08-19 19:16:32, Jason Gunthorpe wrote:
> > Many users of the mmu_notifier invalidate_range callbacks maintain
> > locking/counters/etc on a paired basis and have long expected that
> > invalidate_range start/end are always paired.
> > 
> > The recent change to add non-blocking notifiers breaks this assumption
> > when multiple notifiers are present in the list as an EAGAIN return from a
> > later notifier causes all earlier notifiers to get their
> > invalidate_range_end() skipped.
> > 
> > During the development of non-blocking each user was audited to be sure
> > they can skip their invalidate_range_end() if their start returns -EAGAIN,
> > so the only place that has a problem is when there are multiple
> > subscriptions.
> > 
> > Due to the RCU locking we can't reliably generate a subset of the linked
> > list representing the notifiers already called, and generate an
> > invalidate_range_end() pairing.
> > 
> > Rather than design an elaborate fix, for now, just block non-blocking
> > requests early on if there are multiple subscriptions.
> 
> Which means that the oom path cannot really release any memory for
> ranges covered by these notifiers which is really unfortunate because
> that might cover a lot of memory. Especially when the particular range
> might not be tracked at all, right?

Yes, it is a very big hammer to avoid a bug where the locking schemes
get corrupted and the impacted drivers deadlock.

If you really don't like it then we have to push ahead on either an
rcu-safe undo algorithm or some locking thing. I've been looking at
the locking thing, so we can wait a bit more and see. 

At least it doesn't seem urgent right now as nobody is reporting
hitting this bug, but we are moving toward cases where a process will
have 4 notifiers (amdgpu kfd, hmm, amd iommu, RDMA ODP), so the
chance is higher.

> If a different fix is indeed too elaborate then make sure to let users
> known that there is a restriction in place and dump something useful
> into the kernel log.

The 'simple' alternative I see is to use an rcu-safe undo algorithm,
such as sorting the hlist. This is not so much code, but it is tricky
stuff.

Jason



* Re: [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking
  2019-08-08 12:04   ` Jason Gunthorpe
@ 2019-08-08 12:13     ` Michal Hocko
  2019-10-24 14:15       ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2019-08-08 12:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, Jérôme Glisse, Andrea Arcangeli, Christoph Hellwig

On Thu 08-08-19 12:04:07, Jason Gunthorpe wrote:
> On Thu, Aug 08, 2019 at 10:18:27AM +0200, Michal Hocko wrote:
> > On Wed 07-08-19 19:16:32, Jason Gunthorpe wrote:
> > > Many users of the mmu_notifier invalidate_range callbacks maintain
> > > locking/counters/etc on a paired basis and have long expected that
> > > invalidate_range start/end are always paired.
> > > 
> > > The recent change to add non-blocking notifiers breaks this assumption
> > > when multiple notifiers are present in the list as an EAGAIN return from a
> > > later notifier causes all earlier notifiers to get their
> > > invalidate_range_end() skipped.
> > > 
> > > During the development of non-blocking each user was audited to be sure
> > > they can skip their invalidate_range_end() if their start returns -EAGAIN,
> > > so the only place that has a problem is when there are multiple
> > > subscriptions.
> > > 
> > > Due to the RCU locking we can't reliably generate a subset of the linked
> > > list representing the notifiers already called, and generate an
> > > invalidate_range_end() pairing.
> > > 
> > > Rather than design an elaborate fix, for now, just block non-blocking
> > > requests early on if there are multiple subscriptions.
> > 
> > Which means that the oom path cannot really release any memory for
> > ranges covered by these notifiers which is really unfortunate because
> > that might cover a lot of memory. Especially when the particular range
> > might not be tracked at all, right?
> 
> Yes, it is a very big hammer to avoid a bug where the locking schemes
> get corrupted and the impacted drivers deadlock.
> 
> If you really don't like it then we have to push ahead on either an
> rcu-safe undo algorithm or some locking thing. I've been looking at
> the locking thing, so we can wait a bit more and see. 

Well, I do not like it but I understand that an overreaction for OOM is
much less of a pain than a deadlock or similar misbehavior. So go ahead
with this as a stop gap with Cc: stable, but please let's not stop there
and let's come up with something less hammer-like.

That being said, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
with a printk_once to explain what is going on and a TODO note that this
is just a stop gap.
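A sketch of what that could look like in the hunk above (pr_warn_once and its message text are illustrative, not something agreed in the thread):

```c
	/*
	 * TODO: this is a stop-gap. Replace it with an rcu-safe undo
	 * scheme or common locking so that non-blocking invalidation
	 * can proceed with multiple subscriptions.
	 */
	if (!mmu_notifier_range_blockable(range) &&
	    range->mm->mmu_notifier_mm->multiple_subscriptions) {
		pr_warn_once("mmu_notifier: refusing non-blockable invalidation, multiple subscriptions present\n");
		return -EAGAIN;
	}
```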

Thanks!
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/mmn: prevent unpaired invalidate_start and invalidate_end with non-blocking
  2019-08-08 12:13     ` Michal Hocko
@ 2019-10-24 14:15       ` Jason Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2019-10-24 14:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Jérôme Glisse, Andrea Arcangeli, Christoph Hellwig

On Thu, Aug 08, 2019 at 02:13:09PM +0200, Michal Hocko wrote:
> On Thu 08-08-19 12:04:07, Jason Gunthorpe wrote:
> > On Thu, Aug 08, 2019 at 10:18:27AM +0200, Michal Hocko wrote:
> > > On Wed 07-08-19 19:16:32, Jason Gunthorpe wrote:
> > > > Many users of the mmu_notifier invalidate_range callbacks maintain
> > > > locking/counters/etc on a paired basis and have long expected that
> > > > invalidate_range start/end are always paired.
> > > > 
> > > > The recent change to add non-blocking notifiers breaks this assumption
> > > > when multiple notifiers are present in the list as an EAGAIN return from a
> > > > later notifier causes all earlier notifiers to get their
> > > > invalidate_range_end() skipped.
> > > > 
> > > > During the development of non-blocking each user was audited to be sure
> > > > they can skip their invalidate_range_end() if their start returns -EAGAIN,
> > > > so the only place that has a problem is when there are multiple
> > > > subscriptions.
> > > > 
> > > > Due to the RCU locking we can't reliably generate a subset of the linked
> > > > list representing the notifiers already called, and generate an
> > > > invalidate_range_end() pairing.
> > > > 
> > > > Rather than design an elaborate fix, for now, just block non-blocking
> > > > requests early on if there are multiple subscriptions.
> > > 
> > > Which means that the oom path cannot really release any memory for
> > > ranges covered by these notifiers which is really unfortunate because
> > > that might cover a lot of memory. Especially when the particular range
> > > might not be tracked at all, right?
> > 
> > Yes, it is a very big hammer to avoid a bug where the locking schemes
> > get corrupted and the impacted drivers deadlock.
> > 
> > If you really don't like it then we have to push ahead on either an
> > rcu-safe undo algorithm or some locking thing. I've been looking at
> > the locking thing, so we can wait a bit more and see. 
> 
> Well, I do not like it but I understand that an overreaction for OOM is
> much less of a pain than a deadlock or similar misbehavior. So go ahead
> with this as a stop gap with Cc: stable, but please let's not stop there
> and let's come up with something less hammer-like.
> 
> That being said, feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
> with a printk_once to explain what is going on and a TODO note that this
> is just a stop gap.

I didn't resend this pending the outcome of the mmu notifier rework.

With this patch:

https://patchwork.kernel.org/patch/11191423/

Users of the new mmu_range_notifiers can safely share and handle
!blocking failures. They also reliably limit their OOM impact to a
specific VA range without taking blocking locks, as desired.

I intend to resend this patch, with the warning added, on the thinking
that all the cases involving shared notifiers are likely to have been
moved to the mmu_range scheme.

Does this seem reasonable? Would you look through the above?

Thanks,
Jason



