linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
@ 2020-06-24  8:02 Chris Wilson
  2020-06-24  8:02 ` [PATCH 2/2] drm/i915/gem: Use mmu_notifier_range_mayfail() to avoid waiting inside reclaim Chris Wilson
  2020-06-24 12:10 ` [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Jason Gunthorpe
  0 siblings, 2 replies; 13+ messages in thread
From: Chris Wilson @ 2020-06-24  8:02 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, intel-gfx, Chris Wilson, Andrew Morton, Jason Gunthorpe

When direct reclaim enters the shrinker and tries to reclaim pages, it
has to opportunistically unmap them [try_to_unmap_one]. For direct
reclaim, the calling context is unknown and may include attempts to
unmap one page of a dma object while attempting to allocate more pages
for that object. Pass the information along that we are inside an
opportunistic unmap that can allow that page to remain referenced and
mapped, and let the callback opt in to avoiding a recursive wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
CC: Jason Gunthorpe <jgg@ziepe.ca>
---
 include/linux/mmu_notifier.h | 15 ++++++++++++++-
 mm/mmu_notifier.c            |  3 +++
 mm/rmap.c                    |  5 +++--
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index fc68f3570e19..ee1ad008951c 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -48,7 +48,8 @@ enum mmu_notifier_event {
 	MMU_NOTIFY_RELEASE,
 };
 
-#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+#define MMU_NOTIFIER_RANGE_BLOCKABLE	BIT(0)
+#define MMU_NOTIFIER_RANGE_MAYFAIL	BIT(1)
 
 struct mmu_notifier_ops {
 	/*
@@ -169,6 +170,12 @@ struct mmu_notifier_ops {
 	 * a non-blocking behavior then the same applies to
 	 * invalidate_range_end.
 	 *
+	 * If mayfail is set then the callback may return -EAGAIN while still
+	 * holding its page references. This flag is set inside direct
+	 * reclaim paths that are opportunistically trying to unmap pages
+	 * from unknown contexts. The callback must be prepared to handle
+	 * the matching invalidate_range_end even after failing the
+	 * invalidate_range_start.
 	 */
 	int (*invalidate_range_start)(struct mmu_notifier *subscription,
 				      const struct mmu_notifier_range *range);
@@ -397,6 +404,12 @@ mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
 	return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
 }
 
+static inline bool
+mmu_notifier_range_mayfail(const struct mmu_notifier_range *range)
+{
+	return (range->flags & MMU_NOTIFIER_RANGE_MAYFAIL);
+}
+
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
 	if (mm_has_notifiers(mm))
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 352bb9f3ecc0..95b89cee7af4 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -493,6 +493,9 @@ static int mn_hlist_invalidate_range_start(
 			_ret = ops->invalidate_range_start(subscription, range);
 			if (!mmu_notifier_range_blockable(range))
 				non_block_end();
+			if (_ret == -EAGAIN &&
+			    mmu_notifier_range_mayfail(range))
+				_ret = 0;
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					ops->invalidate_range_start, _ret,
diff --git a/mm/rmap.c b/mm/rmap.c
index 5fe2dedce1fc..912b737a3353 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1406,8 +1406,9 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
-				address,
+	mmu_notifier_range_init(&range,
+				MMU_NOTIFY_CLEAR, MMU_NOTIFIER_RANGE_MAYFAIL,
+				vma, vma->vm_mm, address,
 				min(vma->vm_end, address + page_size(page)));
 	if (PageHuge(page)) {
 		/*
-- 
2.20.1



* [PATCH 2/2] drm/i915/gem: Use mmu_notifier_range_mayfail() to avoid waiting inside reclaim
  2020-06-24  8:02 [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Chris Wilson
@ 2020-06-24  8:02 ` Chris Wilson
  2020-06-24 12:10 ` [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Jason Gunthorpe
  1 sibling, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2020-06-24  8:02 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, intel-gfx, Chris Wilson

The direct reclaim path may trigger the mmu_notifier callback as part
of try_to_unmap_one. As this is purely an opportunistic attempt to
reclaim pages, and will be called from any allocation context under
unknown conditions (including attempts to allocate pages for the
userptr object itself and subsequently trying to reclaim parts of the
partially acquired object), we have to be careful never to wait on
anything being held by the calling context. Since that is unknown, we
have to avoid waiting from inside direct reclaim.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 9c53eb883400..72cfb91230ea 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -103,6 +103,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 	it = interval_tree_iter_first(&mn->objects, range->start, end);
 	while (it) {
 		struct drm_i915_gem_object *obj;
+		unsigned int flags;
 
 		if (!mmu_notifier_range_blockable(range)) {
 			ret = -EAGAIN;
@@ -126,9 +127,12 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 		}
 		spin_unlock(&mn->lock);
 
-		ret = i915_gem_object_unbind(obj,
-					     I915_GEM_OBJECT_UNBIND_ACTIVE |
-					     I915_GEM_OBJECT_UNBIND_BARRIER);
+		flags = (I915_GEM_OBJECT_UNBIND_ACTIVE |
+			 I915_GEM_OBJECT_UNBIND_BARRIER);
+		if (mmu_notifier_range_mayfail(range))
+			flags = 0;
+
+		ret = i915_gem_object_unbind(obj, flags);
 		if (ret == 0)
 			ret = __i915_gem_object_put_pages(obj);
 		i915_gem_object_put(obj);
-- 
2.20.1



* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24  8:02 [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Chris Wilson
  2020-06-24  8:02 ` [PATCH 2/2] drm/i915/gem: Use mmu_notifier_range_mayfail() to avoid waiting inside reclaim Chris Wilson
@ 2020-06-24 12:10 ` Jason Gunthorpe
  2020-06-24 12:21   ` Chris Wilson
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-06-24 12:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> When direct reclaim enters the shrinker and tries to reclaim pages, it
> has to opportunistically unmap them [try_to_unmap_one]. For direct
> reclaim, the calling context is unknown and may include attempts to
> unmap one page of a dma object while attempting to allocate more pages
> for that object. Pass the information along that we are inside an
> opportunistic unmap that can allow that page to remain referenced and
> mapped, and let the callback opt in to avoiding a recursive wait.

i915 should already not be holding locks shared with the notifiers
across allocations that can trigger reclaim. This is already required
to use notifiers correctly anyhow - why do we need something in the
notifiers?

I really don't like this patch; the purpose of notifiers is only to
*track changes*, not to influence the policy of the callers.

Jason


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 12:10 ` [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Jason Gunthorpe
@ 2020-06-24 12:21   ` Chris Wilson
  2020-06-24 12:39     ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2020-06-24 12:21 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > reclaim, the calling context is unknown and may include attempts to
> > unmap one page of a dma object while attempting to allocate more pages
> > for that object. Pass the information along that we are inside an
> > opportunistic unmap that can allow that page to remain referenced and
> > mapped, and let the callback opt in to avoiding a recursive wait.
> 
> i915 should already not be holding locks shared with the notifiers
> across allocations that can trigger reclaim. This is already required
> to use notifiers correctly anyhow - why do we need something in the
> notifiers?

for (n = 0; n < num_pages; n++)
	pin_user_page()

may call try_to_unmap_page from the lru shrinker for [0, n-1].

We're in the middle of allocating the object, how are we best to untangle
that?

Or during allocation of something that is using the pages pinned by that
object, how are we best to not to shrink that object as there is a
mutual dependency?
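
As a sketch of that recursion (the helper below is hypothetical; the
names and details are illustrative, not the actual i915 code):

	static int userptr_get_pages(struct drm_i915_gem_object *obj,
				     struct page **pages,
				     unsigned long num_pages)
	{
		unsigned long n;

		for (n = 0; n < num_pages; n++) {
			/*
			 * pin_user_pages() may allocate, the allocation
			 * may enter direct reclaim, and direct reclaim
			 * may call try_to_unmap_one() on pages[0..n-1]
			 * of this same object, firing our own mmu
			 * notifier while we hold those pins.
			 */
			long ret = pin_user_pages(obj->userptr.ptr +
						  n * PAGE_SIZE,
						  1, FOLL_WRITE,
						  &pages[n], NULL);
			if (ret != 1)
				return ret < 0 ? ret : -EFAULT;
		}

		return 0;
	}
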
-Chris


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 12:21   ` Chris Wilson
@ 2020-06-24 12:39     ` Jason Gunthorpe
  2020-06-24 14:12       ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-06-24 12:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > reclaim, the calling context is unknown and may include attempts to
> > > unmap one page of a dma object while attempting to allocate more pages
> > > for that object. Pass the information along that we are inside an
> > > opportunistic unmap that can allow that page to remain referenced and
> > > mapped, and let the callback opt in to avoiding a recursive wait.
> > 
> > i915 should already not be holding locks shared with the notifiers
> > across allocations that can trigger reclaim. This is already required
> > to use notifiers correctly anyhow - why do we need something in the
> > notifiers?
> 
> for (n = 0; n < num_pages; n++)
> 	pin_user_page()
> 
> may call try_to_unmap_page from the lru shrinker for [0, n-1].

Yes, of course you can't hold any locks that intersect with notifiers
across pin_user_page()/get_user_page()

It has always been that way.

I consolidated all this tricky locking into interval notifiers, maybe
updating i915 to use them will give it a solution. I looked at it
once, it was straightforward enough until it got to all the #ifdefery

> We're in the middle of allocating the object, how are we best to untangle
> that?

I don't know anything about i915, but this is clearly i915 not using
notifiers properly, it needs proper fixing, not hacking up notifiers.

Jason


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 12:39     ` Jason Gunthorpe
@ 2020-06-24 14:12       ` Chris Wilson
  2020-06-24 14:16         ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2020-06-24 14:12 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > reclaim, the calling context is unknown and may include attempts to
> > > > unmap one page of a dma object while attempting to allocate more pages
> > > > for that object. Pass the information along that we are inside an
> > > > opportunistic unmap that can allow that page to remain referenced and
> > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > 
> > > i915 should already not be holding locks shared with the notifiers
> > > across allocations that can trigger reclaim. This is already required
> > > to use notifiers correctly anyhow - why do we need something in the
> > > notifiers?
> > 
> > for (n = 0; n < num_pages; n++)
> >       pin_user_page()
> > 
> > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> 
> Yes, of course you can't hold any locks that intersect with notifiers
> across pin_user_page()/get_user_page()

What lock though? It's just the page refcount, shrinker asks us to drop
it [via mmu], we reply we would like to keep using that page as freeing
it for the current allocation is "robbing Peter to pay Paul".
-Chris


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 14:12       ` Chris Wilson
@ 2020-06-24 14:16         ` Jason Gunthorpe
  2020-06-24 14:21           ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-06-24 14:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > for that object. Pass the information along that we are inside an
> > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > 
> > > > i915 should already not be holding locks shared with the notifiers
> > > > across allocations that can trigger reclaim. This is already required
> > > > to use notifiers correctly anyhow - why do we need something in the
> > > > notifiers?
> > > 
> > > for (n = 0; n < num_pages; n++)
> > >       pin_user_page()
> > > 
> > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > 
> > Yes, of course you can't hold any locks that intersect with notifiers
> > across pin_user_page()/get_user_page()
> 
> What lock though? It's just the page refcount, shrinker asks us to drop
> it [via mmu], we reply we would like to keep using that page as freeing
> it for the current allocation is "robbing Peter to pay Paul".

Maybe I'm unclear what this series is actually trying to fix? 

You said "avoiding a recursive wait" which sounds like some locking
deadlock to me.

Still, again, notifiers are for tracking, not for influencing MM policy.

Jason


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 14:16         ` Jason Gunthorpe
@ 2020-06-24 14:21           ` Chris Wilson
  2020-06-24 14:25             ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2020-06-24 14:21 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > for that object. Pass the information along that we are inside an
> > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > 
> > > > > i915 should already not be holding locks shared with the notifiers
> > > > > across allocations that can trigger reclaim. This is already required
> > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > notifiers?
> > > > 
> > > > for (n = 0; n < num_pages; n++)
> > > >       pin_user_page()
> > > > 
> > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > 
> > > Yes, of course you can't hold any locks that intersect with notifiers
> > > across pin_user_page()/get_user_page()
> > 
> > What lock though? It's just the page refcount, shrinker asks us to drop
> > it [via mmu], we reply we would like to keep using that page as freeing
> > it for the current allocation is "robbing Peter to pay Paul".
> 
> Maybe I'm unclear what this series is actually trying to fix? 
> 
> You said "avoiding a recursive wait" which sounds like some locking
> deadlock to me.

It's the shrinker being called while we are allocating for/on behalf of
the object. As we are actively using the object, we don't want to free
it -- the partial object allocation being the clearest example: if the object
consists of 2 pages, trying to free page 0 in order to allocate page 1
has to fail (and the shrinker should find another candidate to reclaim,
or fail the allocation).
-Chris


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 14:21           ` Chris Wilson
@ 2020-06-24 14:25             ` Jason Gunthorpe
  2020-06-24 14:37               ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-06-24 14:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > 
> > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > notifiers?
> > > > > 
> > > > > for (n = 0; n < num_pages; n++)
> > > > >       pin_user_page()
> > > > > 
> > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > 
> > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > across pin_user_page()/get_user_page()
> > > 
> > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > it [via mmu], we reply we would like to keep using that page as freeing
> > > it for the current allocation is "robbing Peter to pay Paul".
> > 
> > Maybe I'm unclear what this series is actually trying to fix? 
> > 
> > You said "avoiding a recursive wait" which sounds like some locking
> > deadlock to me.
> 
> It's the shrinker being called while we are allocating for/on behalf of
> the object. As we are actively using the object, we don't want to free
> it -- the partial object allocation being the clearest example: if the object
> consists of 2 pages, trying to free page 0 in order to allocate page 1
> has to fail (and the shrinker should find another candidate to reclaim,
> or fail the allocation).

mmu notifiers are not for influencing policy of the mm.

Jason


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 14:25             ` Jason Gunthorpe
@ 2020-06-24 14:37               ` Chris Wilson
  2020-06-24 16:50                 ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2020-06-24 14:37 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > > 
> > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > > notifiers?
> > > > > > 
> > > > > > for (n = 0; n < num_pages; n++)
> > > > > >       pin_user_page()
> > > > > > 
> > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > 
> > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > across pin_user_page()/get_user_page()
> > > > 
> > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > it for the current allocation is "robbing Peter to pay Paul".
> > > 
> > > Maybe I'm unclear what this series is actually trying to fix? 
> > > 
> > > You said "avoiding a recursive wait" which sounds like some locking
> > > deadlock to me.
> > 
> > It's the shrinker being called while we are allocating for/on behalf of
> > the object. As we are actively using the object, we don't want to free
> > it -- the partial object allocation being the clearest example: if the object
> > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > has to fail (and the shrinker should find another candidate to reclaim,
> > or fail the allocation).
> 
> mmu notifiers are not for influencing policy of the mm.

Its policy is "this may fail" regardless of the mmu notifier at this
point. That is not changed.

Your suggestion is that we move the pages to the unevictable mapping so
that the shrinker LRU is never invoked on pages we have grabbed with
pin_user_page. Does that work with the rest of the mmu notifiers?
-Chris


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 14:37               ` Chris Wilson
@ 2020-06-24 16:50                 ` Jason Gunthorpe
  2020-06-24 17:58                   ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-06-24 16:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

On Wed, Jun 24, 2020 at 03:37:32PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> > On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > > > 
> > > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > > > notifiers?
> > > > > > > 
> > > > > > > for (n = 0; n < num_pages; n++)
> > > > > > >       pin_user_page()
> > > > > > > 
> > > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > > 
> > > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > > across pin_user_page()/get_user_page()
> > > > > 
> > > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > > it for the current allocation is "robbing Peter to pay Paul".
> > > > 
> > > > Maybe I'm unclear what this series is actually trying to fix? 
> > > > 
> > > > You said "avoiding a recursive wait" which sounds like some locking
> > > > deadlock to me.
> > > 
> > > It's the shrinker being called while we are allocating for/on behalf of
> > > the object. As we are actively using the object, we don't want to free
> > > it -- the partial object allocation being the clearest example: if the object
> > > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > > has to fail (and the shrinker should find another candidate to reclaim,
> > > or fail the allocation).
> > 
> > mmu notifiers are not for influencing policy of the mm.
> 
> Its policy is "this may fail" regardless of the mmu notifier at this
> point. That is not changed.

MMU notifiers are for tracking updates; they are not allowed to fail.
The one slightly weird case of non-blocking is the only exception.
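
(For reference, that existing exception takes roughly this shape in a
callback; it mirrors the i915 code in patch 2 and is illustrative
rather than a quote of any particular driver:)

	static int my_invalidate_range_start(struct mmu_notifier *mn,
					     const struct mmu_notifier_range *range)
	{
		/*
		 * The one sanctioned failure mode: in a non-blockable
		 * context (e.g. the OOM reaper) a callback that would
		 * have to sleep may return -EAGAIN instead.
		 */
		if (!mmu_notifier_range_blockable(range) &&
		    my_driver_would_block(mn, range))
			return -EAGAIN;

		my_driver_invalidate(mn, range->start, range->end);
		return 0;
	}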

> Your suggestion is that we move the pages to the unevictable mapping so
> that the shrinker LRU is never invoked on pages we have grabbed with
> pin_user_page. Does that work with the rest of the mmu notifiers?

That is beyond what I'm familiar with - but generally - if you want to
influence decisions the MM is making then it needs to be at the
front of the process and not inside notifiers. 

So what you describe seems broadly appropriate to me.

I'm still a little unclear on what you are trying to fix - pinned
pages are definitely not freed, do you have some case where pages
which are pinned are being cleaned out from the MM despite being
pinned? Sounds a bit strange, maybe that is worth addressing directly?

Jason


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 16:50                 ` Jason Gunthorpe
@ 2020-06-24 17:58                   ` Chris Wilson
  2020-06-24 18:48                     ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2020-06-24 17:58 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

Quoting Jason Gunthorpe (2020-06-24 17:50:57)
> On Wed, Jun 24, 2020 at 03:37:32PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> > > On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > > > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > > > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > > > > 
> > > > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > > > > notifiers?
> > > > > > > > 
> > > > > > > > for (n = 0; n < num_pages; n++)
> > > > > > > >       pin_user_page()
> > > > > > > > 
> > > > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > > > 
> > > > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > > > across pin_user_page()/get_user_page()
> > > > > > 
> > > > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > > > it for the current allocation is "robbing Peter to pay Paul".
> > > > > 
> > > > > Maybe I'm unclear what this series is actually trying to fix? 
> > > > > 
> > > > > You said "avoiding a recursive wait" which sounds like some locking
> > > > > deadlock to me.
> > > > 
> > > > It's the shrinker being called while we are allocating for/on behalf of
> > > > the object. As we are actively using the object, we don't want to free
> > > > it -- the partial object allocation being the clearest example: if the object
> > > > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > > > has to fail (and the shrinker should find another candidate to reclaim,
> > > > or fail the allocation).
> > > 
> > > mmu notifiers are not for influencing policy of the mm.
> > 
> > It's policy is "this may fail" regardless of the mmu notifier at this
> > point. That is not changed.
> 
> MMU notifiers are for tracking updates; they are not allowed to fail.
> The one slightly weird case of non-blocking is the only exception.
> 
> > Your suggestion is that we move the pages to the unevictable mapping so
> > that the shrinker LRU is never invoked on pages we have grabbed with
> > pin_user_page. Does that work with the rest of the mmu notifiers?
> 
> That is beyond what I'm familiar with - but generally - if you want to
> influence decisions the MM is making then it needs to be at the
> front of the process and not inside notifiers. 
> 
> So what you describe seems broadly appropriate to me.

Sadly, it's an mlock_vma_page problem all over again.
 
> I'm still a little unclear on what you are trying to fix - pinned
> pages are definitely not freed, do you have some case where pages
> which are pinned are being cleaned out from the MM despite being
> pinned? Sounds a bit strange, maybe that is worth adressing directly?

It suffices to say that pin_user_pages does not prevent try_to_unmap_one
from trying to revoke the page. But we could perhaps slip a
page_maybe_dma_pinned() in around there and see what happens.
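
Something along these lines near the top of try_to_unmap_one() (a
sketch only, untested; note that page_maybe_dma_pinned() can report
false positives for pages with many references, so it is a heuristic):

	static bool try_to_unmap_one(struct page *page,
				     struct vm_area_struct *vma,
				     unsigned long address, void *arg)
	{
		/*
		 * Sketch of the suggestion above: pages pinned via
		 * pin_user_pages() carry an elevated refcount that
		 * page_maybe_dma_pinned() detects, so reclaim could
		 * skip them and pick another candidate instead.
		 */
		if (page_maybe_dma_pinned(page))
			return false; /* leave it referenced and mapped */

		/* ... existing unmap logic continues unchanged ... */
	}
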
-Chris


* Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL
  2020-06-24 17:58                   ` Chris Wilson
@ 2020-06-24 18:48                     ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2020-06-24 18:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-mm, linux-kernel, intel-gfx, Andrew Morton

On Wed, Jun 24, 2020 at 06:58:49PM +0100, Chris Wilson wrote:
> > I'm still a little unclear on what you are trying to fix - pinned
> > pages are definitely not freed, do you have some case where pages
> > which are pinned are being cleaned out from the MM despite being
> > pinned? Sounds a bit strange, maybe that is worth adressing directly?
> 
> It suffices to say that pin_user_pages does not prevent try_to_unmap_one
> from trying to revoke the page.

This doesn't sound right. Maybe there are some oddball cases, but in
common cases like pageout we absolutely cannot page out or discard
pages under DMA. This would be a significant bug.

What is the actual problem here?

Jason


end of thread (newest: 2020-06-24 18:48 UTC)

Thread overview: 13+ messages
2020-06-24  8:02 [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Chris Wilson
2020-06-24  8:02 ` [PATCH 2/2] drm/i915/gem: Use mmu_notifier_range_mayfail() to avoid waiting inside reclaim Chris Wilson
2020-06-24 12:10 ` [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL Jason Gunthorpe
2020-06-24 12:21   ` Chris Wilson
2020-06-24 12:39     ` Jason Gunthorpe
2020-06-24 14:12       ` Chris Wilson
2020-06-24 14:16         ` Jason Gunthorpe
2020-06-24 14:21           ` Chris Wilson
2020-06-24 14:25             ` Jason Gunthorpe
2020-06-24 14:37               ` Chris Wilson
2020-06-24 16:50                 ` Jason Gunthorpe
2020-06-24 17:58                   ` Chris Wilson
2020-06-24 18:48                     ` Jason Gunthorpe
