[PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping
@ 2017-06-01 13:33 Chris Wilson
  2017-06-01 13:33 ` [PATCH 2/3] drm/i915: Encourage our shrinker more when our shmemfs allocations fails Chris Wilson
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-01 13:33 UTC (permalink / raw)
  To: intel-gfx

In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
stopped direct reclaim and kswapd from triggering GPU/client stalls
whilst running (by restricting the objects they could reap to be idle).

However with abusive GPU usage, it becomes quite easy to starve kswapd
of memory and prevent it from making forward progress towards obtaining
enough free memory (thus driving the system closer to swap exhaustion).
Relax the previous restriction to allow kswapd (but not direct reclaim)
to stall the device whilst reaping purgeable pages.

v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 0fd2b58ce475..58f27369183c 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -332,6 +332,15 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 					 sc->nr_to_scan - freed,
 					 I915_SHRINK_BOUND |
 					 I915_SHRINK_UNBOUND);
+	if (freed < sc->nr_to_scan && current_is_kswapd()) {
+		intel_runtime_pm_get(dev_priv);
+		freed += i915_gem_shrink(dev_priv,
+					 sc->nr_to_scan - freed,
+					 I915_SHRINK_ACTIVE |
+					 I915_SHRINK_BOUND |
+					 I915_SHRINK_UNBOUND);
+		intel_runtime_pm_put(dev_priv);
+	}
 
 	shrinker_unlock(dev_priv, unlock);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/3] drm/i915: Encourage our shrinker more when our shmemfs allocations fails
  2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
@ 2017-06-01 13:33 ` Chris Wilson
  2017-06-01 13:33 ` [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator Chris Wilson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-01 13:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%
To counter-act that we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 52 ++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b676fd1f075..129515d8482a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2337,8 +2337,8 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	struct page *page;
 	unsigned long last_pfn = 0;	/* suppress gcc warning */
 	unsigned int max_segment;
+	gfp_t noreclaim;
 	int ret;
-	gfp_t gfp;
 
 	/* Assert that the object is not currently in any GPU domain. As it
 	 * wasn't in the GTT, there shouldn't be any way it could have been in
@@ -2367,22 +2367,33 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	 * Fail silently without starting the shrinker
 	 */
 	mapping = obj->base.filp->f_mapping;
-	gfp = mapping_gfp_constraint(mapping, ~(__GFP_IO | __GFP_RECLAIM));
-	gfp |= __GFP_NORETRY | __GFP_NOWARN;
+	noreclaim = mapping_gfp_constraint(mapping,
+					   ~(__GFP_IO | __GFP_RECLAIM));
+	noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
+
 	sg = st->sgl;
 	st->nents = 0;
 	for (i = 0; i < page_count; i++) {
-		page = shmem_read_mapping_page_gfp(mapping, i, gfp);
-		if (unlikely(IS_ERR(page))) {
-			i915_gem_shrink(dev_priv,
-					page_count,
-					I915_SHRINK_BOUND |
-					I915_SHRINK_UNBOUND |
-					I915_SHRINK_PURGEABLE);
+		const unsigned int shrink[] = {
+			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | I915_SHRINK_PURGEABLE,
+			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | I915_SHRINK_PURGEABLE | I915_SHRINK_ACTIVE,
+			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | I915_SHRINK_ACTIVE,
+			0,
+		}, *s = shrink;
+		gfp_t gfp = noreclaim;
+
+		do {
 			page = shmem_read_mapping_page_gfp(mapping, i, gfp);
-		}
-		if (unlikely(IS_ERR(page))) {
-			gfp_t reclaim;
+			if (likely(!IS_ERR(page)))
+				break;
+
+			if (!*s) {
+				ret = PTR_ERR(page);
+				goto err_sg;
+			}
+
+			i915_gem_shrink(dev_priv, 2 * page_count, *s++);
+			cond_resched();
 
 			/* We've tried hard to allocate the memory by reaping
 			 * our own buffer, now let the real VM do its job and
@@ -2392,15 +2403,13 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			 * defer the oom here by reporting the ENOMEM back
 			 * to userspace.
 			 */
-			reclaim = mapping_gfp_mask(mapping);
-			reclaim |= __GFP_NORETRY; /* reclaim, but no oom */
-
-			page = shmem_read_mapping_page_gfp(mapping, i, reclaim);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err_sg;
+			if (!*s) {
+				/* reclaim and warn, but no oom */
+				gfp = mapping_gfp_mask(mapping);
+				gfp |= __GFP_NORETRY;
 			}
-		}
+		} while (1);
+
 		if (!i ||
 		    sg->length >= max_segment ||
 		    page_to_pfn(page) != last_pfn + 1) {
@@ -4285,6 +4294,7 @@ i915_gem_object_create(struct drm_i915_private *dev_priv, u64 size)
 
 	mapping = obj->base.filp->f_mapping;
 	mapping_set_gfp_mask(mapping, mask);
+	GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
 
 	i915_gem_object_init(obj, &i915_gem_object_ops);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
  2017-06-01 13:33 ` [PATCH 2/3] drm/i915: Encourage our shrinker more when our shmemfs allocations fails Chris Wilson
@ 2017-06-01 13:33 ` Chris Wilson
  2017-06-02 10:28   ` Joonas Lahtinen
  2017-06-05 10:35   ` [PATCH v2] " Chris Wilson
  2017-06-01 14:23 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping Patchwork
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-01 13:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim via kswapd (through inconsistency within
throttle_direct_reclaim() and even then the race between multiple
allocators makes the two step of reclaim then allocate fragile), and as
our buffers are always dirty (with very few exceptions), we required
kswapd to perform pageout on them. The only effective means of waiting
on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
leaves us with the dilemma of invoking the oomkiller instead of
propagating the allocation failure back to userspace where it can be
handled more gracefully (one hopes). We cheat and note that __GFP_THISNODE
has the side-effect of preventing oom and has no consequence for our final
attempt at allocation.

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 129515d8482a..53c51787d2ed 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2406,7 +2406,21 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			if (!*s) {
 				/* reclaim and warn, but no oom */
 				gfp = mapping_gfp_mask(mapping);
-				gfp |= __GFP_NORETRY;
+
+				/* Our bo are always dirty and so we require
+				 * kswapd to reclaim our pages (direct reclaim
+				 * performs no swapping on its own). However,
+				 * direct reclaim is meant to wait for kswapd
+				 * when under pressure, this is broken. As a
+				 * result __GFP_RECLAIM is unreliable and fails
+				 * to actually reclaim dirty pages -- unless
+				 * you try over and over again with
+				 * !__GFP_NORETRY. However, we still want to
+				 * fail this allocation rather than trigger
+				 * the out-of-memory killer and for this we
+				 * subvert __GFP_THISNODE for that side effect.
+				 */
+				gfp |= __GFP_THISNODE;
 			}
 		} while (1);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping
  2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
  2017-06-01 13:33 ` [PATCH 2/3] drm/i915: Encourage our shrinker more when our shmemfs allocations fails Chris Wilson
  2017-06-01 13:33 ` [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator Chris Wilson
@ 2017-06-01 14:23 ` Patchwork
  2017-06-02 10:24 ` [PATCH 1/3] " Joonas Lahtinen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2017-06-01 14:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping
URL   : https://patchwork.freedesktop.org/series/25164/
State : success

== Summary ==

Series 25164v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/25164/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> FAIL       (fi-snb-2600) fdo#100007
Test kms_busy:
        Subgroup basic-flip-default-b:
                pass       -> DMESG-WARN (fi-skl-6700hq) fdo#101144 +2
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                pass       -> FAIL       (fi-skl-6700hq) fdo#101154 +7

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#101144 https://bugs.freedesktop.org/show_bug.cgi?id=101144
fdo#101154 https://bugs.freedesktop.org/show_bug.cgi?id=101154

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:444s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:431s
fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:580s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:513s
fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:487s
fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:488s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:432s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:420s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:416s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:495s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:472s
fi-kbl-7500u     total:278  pass:255  dwarn:5   dfail:0   fail:0   skip:18  time:464s
fi-kbl-7560u     total:278  pass:263  dwarn:5   dfail:0   fail:0   skip:10  time:575s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:456s
fi-skl-6700hq    total:278  pass:229  dwarn:1   dfail:0   fail:26  skip:22  time:406s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:464s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:507s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:437s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:544s
fi-snb-2600      total:278  pass:248  dwarn:0   dfail:0   fail:1   skip:29  time:404s

2c9abf8ec88cf4b3d3735976bb0ff7a4991946b2 drm-tip: 2017y-06m-01d-13h-29m-05s UTC integration manifest
507e749 drm/i915: Remove __GFP_NORETRY from our buffer allocator
ec73274 drm/i915: Encourage our shrinker more when our shmemfs allocations fails
7247e82 drm/i915: Allow kswapd to pause the device whilst reaping

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4856/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping
  2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
                   ` (2 preceding siblings ...)
  2017-06-01 14:23 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping Patchwork
@ 2017-06-02 10:24 ` Joonas Lahtinen
  2017-06-02 12:02 ` Mika Kuoppala
  2017-06-05 11:10 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping (rev2) Patchwork
  5 siblings, 0 replies; 23+ messages in thread
From: Joonas Lahtinen @ 2017-06-02 10:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2017-06-01 at 14:33 +0100, Chris Wilson wrote:
> In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
> stopped direct reclaim and kswapd from triggering GPU/client stalls
> whilst running (by restricting the objects they could reap to be idle).
> 
> However with abusive GPU usage, it becomes quite easy to starve kswapd
> of memory and prevent it from making forward progress towards obtaining
> enough free memory (thus driving the system closer to swap exhaustion).
> Relax the previous restriction to allow kswapd (but not direct reclaim)
> to stall the device whilst reaping purgeable pages.
> 
> v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

These are the kind of patches one wishes we had good testing coverage.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-01 13:33 ` [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator Chris Wilson
@ 2017-06-02 10:28   ` Joonas Lahtinen
  2017-06-05 10:35   ` [PATCH v2] " Chris Wilson
  1 sibling, 0 replies; 23+ messages in thread
From: Joonas Lahtinen @ 2017-06-02 10:28 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Ursulin, Tvrtko, Kuoppala, Mika; +Cc: Daniel Vetter

On to, 2017-06-01 at 14:33 +0100, Chris Wilson wrote:
> I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> struggles with handling reclaim via kswapd (through inconsistency within
> throttle_direct_reclaim() and even then the race between multiple
> allocators makes the two step of reclaim then allocate fragile), and as
> our buffers are always dirty (with very few exceptions), we required
> kswapd to perform pageout on them. The only effective means of waiting
> on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> leaves us with the dilemma of invoking the oomkiller instead of
> propagating the allocation failure back to userspace where it can be
> handled more gracefully (one hopes). We cheat and note that __GFP_THISNODE
> has the side-effect of preventing oom and has no consequence for our final
> attempt at allocation.
> 
> Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> Testcase: igt/gem_tiled_swapping
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

<SNIP>

> @@ -2406,7 +2406,21 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>  			if (!*s) {
>  				/* reclaim and warn, but no oom */
>  				gfp = mapping_gfp_mask(mapping);
> -				gfp |= __GFP_NORETRY;
> +
> +				/* Our bo are always dirty and so we require
> +				 * kswapd to reclaim our pages (direct reclaim
> +				 * performs no swapping on its own). However,
> +				 * direct reclaim is meant to wait for kswapd
> +				 * when under pressure, this is broken. As a
> +				 * result __GFP_RECLAIM is unreliable and fails
> +				 * to actually reclaim dirty pages -- unless
> +				 * you try over and over again with
> +				 * !__GFP_NORETRY. However, we still want to
> +				 * fail this allocation rather than trigger
> +				 * the out-of-memory killer and for this we
> +				 * subvert __GFP_THISNODE for that side effect.

"side effect" sounds bad immediately, Daniel is on vacation so adding
Tvrtko and Mika here too.

Was this discussed with core MM?

Regards, Joonas

> +				 */
> +				gfp |= __GFP_THISNODE;
>  			}
>  		} while (1);
>  
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping
  2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
                   ` (3 preceding siblings ...)
  2017-06-02 10:24 ` [PATCH 1/3] " Joonas Lahtinen
@ 2017-06-02 12:02 ` Mika Kuoppala
  2017-06-02 12:20   ` Chris Wilson
  2017-06-05 11:10 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping (rev2) Patchwork
  5 siblings, 1 reply; 23+ messages in thread
From: Mika Kuoppala @ 2017-06-02 12:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
> stopped direct reclaim and kswapd from triggering GPU/client stalls
> whilst running (by restricting the objects they could reap to be idle).
>
> However with abusive GPU usage, it becomes quite easy to starve kswapd
> of memory and prevent it from making forward progress towards obtaining
> enough free memory (thus driving the system closer to swap exhaustion).
> Relax the previous restriction to allow kswapd (but not direct reclaim)
> to stall the device whilst reaping purgeable pages.
>
> v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 0fd2b58ce475..58f27369183c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -332,6 +332,15 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>  					 sc->nr_to_scan - freed,
>  					 I915_SHRINK_BOUND |
>  					 I915_SHRINK_UNBOUND);
> +	if (freed < sc->nr_to_scan && current_is_kswapd()) {
> +		intel_runtime_pm_get(dev_priv);

We take extra ref to force device wake and thus force bound objects out?

> +		freed += i915_gem_shrink(dev_priv,
> +					 sc->nr_to_scan - freed,
> +					 I915_SHRINK_ACTIVE |
> +					 I915_SHRINK_BOUND |
> +					 I915_SHRINK_UNBOUND);

Looking at the shrink code, I am pondering how the stall will happen?

There are other callpaths that force gpu idle before kicking out
objects, but for this callpath it seems that we kick out
objects that might be still currently accessed by the gpu.

-Mika


> +		intel_runtime_pm_put(dev_priv);
> +	}
>  
>  	shrinker_unlock(dev_priv, unlock);
>  
> -- 
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping
  2017-06-02 12:02 ` Mika Kuoppala
@ 2017-06-02 12:20   ` Chris Wilson
  2017-06-02 12:38     ` Mika Kuoppala
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2017-06-02 12:20 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-06-02 13:02:57)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
> > stopped direct reclaim and kswapd from triggering GPU/client stalls
> > whilst running (by restricting the objects they could reap to be idle).
> >
> > However with abusive GPU usage, it becomes quite easy to starve kswapd
> > of memory and prevent it from making forward progress towards obtaining
> > enough free memory (thus driving the system closer to swap exhaustion).
> > Relax the previous restriction to allow kswapd (but not direct reclaim)
> > to stall the device whilst reaping purgeable pages.
> >
> > v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index 0fd2b58ce475..58f27369183c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -332,6 +332,15 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> >                                        sc->nr_to_scan - freed,
> >                                        I915_SHRINK_BOUND |
> >                                        I915_SHRINK_UNBOUND);
> > +     if (freed < sc->nr_to_scan && current_is_kswapd()) {
> > +             intel_runtime_pm_get(dev_priv);
> 
> We take extra ref to force device wake and thus force bound objects out?

Yes. The shrinker skips the unbind phase if it can't acquire the device
wakeref, so we ensure we enter the shrinker with it held.
 
> > +             freed += i915_gem_shrink(dev_priv,
> > +                                      sc->nr_to_scan - freed,
> > +                                      I915_SHRINK_ACTIVE |
> > +                                      I915_SHRINK_BOUND |
> > +                                      I915_SHRINK_UNBOUND);
> 
> Looking at the shrink code, I am pondering how the stall will happen?
> 
> There are other callpaths that force gpu idle before kicking out
> objects, but for this callpath it seems that we kick out
> objects that might be still currently accessed by the gpu.

By unbinding an active object, we stall.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping
  2017-06-02 12:20   ` Chris Wilson
@ 2017-06-02 12:38     ` Mika Kuoppala
  2017-06-02 13:58       ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Mika Kuoppala @ 2017-06-02 12:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2017-06-02 13:02:57)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
>> > stopped direct reclaim and kswapd from triggering GPU/client stalls
>> > whilst running (by restricting the objects they could reap to be idle).
>> >
>> > However with abusive GPU usage, it becomes quite easy to starve kswapd
>> > of memory and prevent it from making forward progress towards obtaining
>> > enough free memory (thus driving the system closer to swap exhaustion).
>> > Relax the previous restriction to allow kswapd (but not direct reclaim)
>> > to stall the device whilst reaping purgeable pages.
>> >
>> > v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 9 +++++++++
>> >  1 file changed, 9 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
>> > index 0fd2b58ce475..58f27369183c 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
>> > @@ -332,6 +332,15 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>> >                                        sc->nr_to_scan - freed,
>> >                                        I915_SHRINK_BOUND |
>> >                                        I915_SHRINK_UNBOUND);
>> > +     if (freed < sc->nr_to_scan && current_is_kswapd()) {
>> > +             intel_runtime_pm_get(dev_priv);
>> 
>> We take extra ref to force device wake and thus force bound objects out?
>
> Yes. The shrinker skips the unbind phase if it can't acquire the device
> wakeref, so we ensure we enter the shrinker with it held.
>  
>> > +             freed += i915_gem_shrink(dev_priv,
>> > +                                      sc->nr_to_scan - freed,
>> > +                                      I915_SHRINK_ACTIVE |
>> > +                                      I915_SHRINK_BOUND |
>> > +                                      I915_SHRINK_UNBOUND);
>> 
>> Looking at the shrink code, I am pondering how the stall will happen?
>> 
>> There are other callpaths that force gpu idle before kicking out
>> objects, but for this callpath it seems that we kick out
>> objects that might be still currently accessed by the gpu.
>
> By unbinding an active object, we stall.

I see that now.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping
  2017-06-02 12:38     ` Mika Kuoppala
@ 2017-06-02 13:58       ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-02 13:58 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-06-02 13:38:34)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2017-06-02 13:02:57)
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
> >> > stopped direct reclaim and kswapd from triggering GPU/client stalls
> >> > whilst running (by restricting the objects they could reap to be idle).
> >> >
> >> > However with abusive GPU usage, it becomes quite easy to starve kswapd
> >> > of memory and prevent it from making forward progress towards obtaining
> >> > enough free memory (thus driving the system closer to swap exhaustion).
> >> > Relax the previous restriction to allow kswapd (but not direct reclaim)
> >> > to stall the device whilst reaping purgeable pages.
> >> >
> >> > v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.
> >> >
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 9 +++++++++
> >> >  1 file changed, 9 insertions(+)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >> > index 0fd2b58ce475..58f27369183c 100644
> >> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >> > @@ -332,6 +332,15 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> >> >                                        sc->nr_to_scan - freed,
> >> >                                        I915_SHRINK_BOUND |
> >> >                                        I915_SHRINK_UNBOUND);
> >> > +     if (freed < sc->nr_to_scan && current_is_kswapd()) {
> >> > +             intel_runtime_pm_get(dev_priv);
> >> 
> >> We take extra ref to force device wake and thus force bound objects out?
> >
> > Yes. The shrinker skips the unbind phase if it can't acquire the device
> > wakeref, so we ensure we enter the shrinker with it held.
> >  
> >> > +             freed += i915_gem_shrink(dev_priv,
> >> > +                                      sc->nr_to_scan - freed,
> >> > +                                      I915_SHRINK_ACTIVE |
> >> > +                                      I915_SHRINK_BOUND |
> >> > +                                      I915_SHRINK_UNBOUND);
> >> 
> >> Looking at the shrink code, I am pondering how the stall will happen?
> >> 
> >> There are other callpaths that force gpu idle before kicking out
> >> objects, but for this callpath it seems that we kick out
> >> objects that might be still currently accessed by the gpu.
> >
> > By unbinding an active object, we stall.
> 
> I see that now.
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

Pushed this as the wakeref alone should make kswapd much more effective.
Perhaps I should have included a fixes, but alas I did not.

I would like to get some traction on the NORETRY patch.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-01 13:33 ` [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator Chris Wilson
  2017-06-02 10:28   ` Joonas Lahtinen
@ 2017-06-05 10:35   ` Chris Wilson
  2017-06-05 10:47     ` Daniel Stone
                       ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-05 10:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Michal Hocko

I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim via kswapd (through inconsistency within
throttle_direct_reclaim() and even then the race between multiple
allocators makes the two step of reclaim then allocate fragile), and as
our buffers are always dirty (with very few exceptions), we required
kswapd to perform pageout on them. The only effective means of waiting
on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
leaves us with the dilemma of invoking the oomkiller instead of
propagating the allocation failure back to userspace where it can be
handled more gracefully (one hopes).  In the future we may have
__GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
and the oomkiller would have been invoked. Until then, let the oomkiller
wreck havoc.

v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7286f5dd3e64..845df6067e90 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			if (!*s) {
 				/* reclaim and warn, but no oom */
 				gfp = mapping_gfp_mask(mapping);
-				gfp |= __GFP_NORETRY;
+
+				/* Our bo are always dirty and so we require
+				 * kswapd to reclaim our pages (direct reclaim
+				 * performs no swapping on its own). However,
+				 * direct reclaim is meant to wait for kswapd
+				 * when under pressure, this is broken. As a
+				 * result __GFP_RECLAIM is unreliable and fails
+				 * to actually reclaim dirty pages -- unless
+				 * you try over and over again with
+				 * !__GFP_NORETRY. However, we still want to
+				 * fail this allocation rather than trigger
+				 * the out-of-memory killer and for this we
+				 * want the future __GFP_MAYFAIL.
+				 */
 			}
 		} while (1);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 10:35   ` [PATCH v2] " Chris Wilson
@ 2017-06-05 10:47     ` Daniel Stone
  2017-06-05 11:51       ` Chris Wilson
  2017-06-05 12:26     ` Michal Hocko
  2017-06-08 12:26     ` Michal Hocko
  2 siblings, 1 reply; 23+ messages in thread
From: Daniel Stone @ 2017-06-05 10:47 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Michal Hocko

Hi,

On 5 June 2017 at 11:35, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> struggles with handling reclaim via kswapd (through inconsistency within
> throttle_direct_reclaim() and even then the race between multiple
> allocators makes the two step of reclaim then allocate fragile), and as
> our buffers are always dirty (with very few exceptions), we required
> kswapd to perform pageout on them. The only effective means of waiting
> on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> leaves us with the dilemma of invoking the oomkiller instead of
> propagating the allocation failure back to userspace where it can be
> handled more gracefully (one hopes).

The i965 GL(ES) driver may dash your hopes somewhat:

         ret = execbuffer(dri_screen->fd, batch, hw_ctx,
                          4 * USED_BATCH(*batch),
                          in_fence_fd, out_fence_fd, flags);

   if (ret != 0) {
      fprintf(stderr, "intel_do_flush_locked failed: %s\n", strerror(-ret));
      exit(1);
   }

Before removing NORETRY, occasionally I'd get lucky and Chrome would
fail, but usually it'd be Mutter and my entire session would
disappear. I'm also not sure what a good strategy as a compositor
would be: just keep on trying updates until you get lucky? Sit doing
nothing for a while and hope redraws succeed 'later' ... ? Similarly
to Linus, I was in a position where reclaim should've been extremely
effective - into the gigabytes - at the time, so pushing reclaim
harder and taking a small time penalty seems far better than a hard
failure.

Cheers,
Daniel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping (rev2)
  2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
                   ` (4 preceding siblings ...)
  2017-06-02 12:02 ` Mika Kuoppala
@ 2017-06-05 11:10 ` Patchwork
  5 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2017-06-05 11:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping (rev2)
URL   : https://patchwork.freedesktop.org/series/25164/
State : success

== Summary ==

Series 25164v2 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/25164/revisions/2/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> FAIL       (fi-snb-2600) fdo#100007
Test kms_busy:
        Subgroup basic-flip-default-b:
                dmesg-warn -> PASS       (fi-skl-6700hq) fdo#101144 +2
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                fail       -> PASS       (fi-skl-6700hq) fdo#101154 +7

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#101144 https://bugs.freedesktop.org/show_bug.cgi?id=101144
fdo#101154 https://bugs.freedesktop.org/show_bug.cgi?id=101154

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time:445s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time:433s
fi-bsw-n3050     total:278  pass:242  dwarn:0   dfail:0   fail:0   skip:36  time:576s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:510s
fi-byt-j1900     total:278  pass:254  dwarn:0   dfail:0   fail:0   skip:24  time:487s
fi-byt-n2820     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:486s
fi-glk-2a        total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time:589s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:430s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time:411s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time:421s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:498s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:468s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time:463s
fi-kbl-7560u     total:278  pass:267  dwarn:1   dfail:0   fail:0   skip:10  time:569s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:460s
fi-skl-6700hq    total:278  pass:239  dwarn:0   dfail:1   fail:17  skip:21  time:430s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time:464s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time:506s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time:436s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time:537s
fi-snb-2600      total:278  pass:248  dwarn:0   dfail:0   fail:1   skip:29  time:404s

d90972f79041a05fe10d62ee29e2d1c345dda772 drm-tip: 2017y-06m-05d-10h-01m-30s UTC integration manifest
e049049 drm/i915: Remove __GFP_NORETRY from our buffer allocator
60cb8a4 drm/i915: Encourage our shrinker more when our shmemfs allocations fails

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4875/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 10:47     ` Daniel Stone
@ 2017-06-05 11:51       ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-05 11:51 UTC (permalink / raw)
  To: Daniel Stone; +Cc: Daniel Vetter, intel-gfx, Michal Hocko

Quoting Daniel Stone (2017-06-05 11:47:44)
> Hi,
> 
> On 5 June 2017 at 11:35, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > struggles with handling reclaim via kswapd (through inconsistency within
> > throttle_direct_reclaim() and even then the race between multiple
> > allocators makes the two step of reclaim then allocate fragile), and as
> > our buffers are always dirty (with very few exceptions), we required
> > kswapd to perform pageout on them. The only effective means of waiting
> > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > leaves us with the dilemma of invoking the oomkiller instead of
> > propagating the allocation failure back to userspace where it can be
> > handled more gracefully (one hopes).
> 
> The i965 GL(ES) driver may dash your hopes somewhat:
> 
>          ret = execbuffer(dri_screen->fd, batch, hw_ctx,
>                           4 * USED_BATCH(*batch),
>                           in_fence_fd, out_fence_fd, flags);
> 
>    if (ret != 0) {
>       fprintf(stderr, "intel_do_flush_locked failed: %s\n", strerror(-ret));
>       exit(1);
>    }

And their response has been that mesa is not a system critical library,
so don't use it in such roles. We have also floated patches for several
years now for that error to percolate back. Otoh, all the other memory
handling paths have more or less been fixed to report back that failure.
You'll note from Linus's message it wasn't this that failed, but one of
those other allocation paths and the error handling associated with them
in the client.

> Before removing NORETRY, occasionally I'd get lucky and Chrome would
> fail, but usually it'd be Mutter and my entire session would
> disappear. I'm also not sure what a good strategy as a compositor
> would be: just keep on trying updates until you get lucky? Sit doing
> nothing for a while and hope redraws succeed 'later' ... ?

In terms of mutter and chrome, the first question is why have they
filled all of memory with buffers. I strongly suspect there are leaks all
around, but if there are not something as complex as chrome can easily
identify buffers it has kept merely for convenience that can be rebuilt
on demand. (But they should already be using GL APPLE purgeable memory
for those.) There is a wider problem than this, having seen a growth in
the number of order-6 allocations failures over the past few kernels (or
rather distribution updates) for simple cursor updates (where the first
question to be asked is why is the cursor being reallocated?)

And yes, they should be robust in handling error conditions and throw
away rendering if it failed. A compositor may even preallocate resources
so that it can throw a message onto a screen if a failure occurs, a
lowlevel driver can do that and with Vulkan that too should be possible
(at least down to a few mallocs internal to the driver, but using an
externally controlled allocator).

> Similarly
> to Linus, I was in a position where reclaim should've been extremely
> effective - into the gigabytes - at the time, so pushing reclaim
> harder and taking a small time penalty seems far better than a hard
> failure.

The point here was that we did reclaim and fail (the failure outlined in
the changelog was that reclaim is not waiting for kswapd due a false
positive from allow_direct_reclaim()); after having first purged
everything unwanted from the set of i915 buffers. It wasn't that we asked
for no reclaim, it's just that we don't want to have the oomkiller kill
something at random. And there was no middle ground in the set of gfp
flags.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 10:35   ` [PATCH v2] " Chris Wilson
  2017-06-05 10:47     ` Daniel Stone
@ 2017-06-05 12:26     ` Michal Hocko
  2017-06-05 12:49       ` Chris Wilson
  2017-06-08 12:26     ` Michal Hocko
  2 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-06-05 12:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> struggles with handling reclaim via kswapd (through inconsistency within
> throttle_direct_reclaim() and even then the race between multiple
> allocators makes the two step of reclaim then allocate fragile), and as
> our buffers are always dirty (with very few exceptions), we required
> kswapd to perform pageout on them. The only effective means of waiting
> on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> leaves us with the dilemma of invoking the oomkiller instead of
> propagating the allocation failure back to userspace where it can be
> handled more gracefully (one hopes).  In the future we may have
> __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> and the oomkiller would have been invoked. Until then, let the oomkiller
> wreck havoc.
> 
> v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> 
> Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> Testcase: igt/gem_tiled_swapping
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7286f5dd3e64..845df6067e90 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>  			if (!*s) {
>  				/* reclaim and warn, but no oom */
>  				gfp = mapping_gfp_mask(mapping);
> -				gfp |= __GFP_NORETRY;
> +
> +				/* Our bo are always dirty and so we require
> +				 * kswapd to reclaim our pages (direct reclaim
> +				 * performs no swapping on its own). However,

Not sure whether this is exactly what you mean. The only pageout the
direct reclaim is allowed to the swap partition (so anonymous and
shmem). So the above is not 100% correct.

> +				 * direct reclaim is meant to wait for kswapd
> +				 * when under pressure, this is broken. As a
> +				 * result __GFP_RECLAIM is unreliable and fails
> +				 * to actually reclaim dirty pages -- unless
> +				 * you try over and over again with
> +				 * !__GFP_NORETRY.

Yes, I would just mention that a heavy parallel allocations might result
in __GFP_NORETRY failures quite easilly. I believe this is a bigger
problem than your remark about dirty buffers (well assuming your buffers
are not filling up the whole memory).

> 				   However, we still want to
> +				 * fail this allocation rather than trigger
> +				 * the out-of-memory killer and for this we
> +				 * want the future __GFP_MAYFAIL.
> +				 */
>  			}
>  		} while (1);
>  
> -- 
> 2.11.0
> 

-- 
Michal Hocko
SUSE Labs
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 12:26     ` Michal Hocko
@ 2017-06-05 12:49       ` Chris Wilson
  2017-06-05 13:08         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2017-06-05 12:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Daniel Vetter, intel-gfx

Quoting Michal Hocko (2017-06-05 13:26:30)
> On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > struggles with handling reclaim via kswapd (through inconsistency within
> > throttle_direct_reclaim() and even then the race between multiple
> > allocators makes the two step of reclaim then allocate fragile), and as
> > our buffers are always dirty (with very few exceptions), we required
> > kswapd to perform pageout on them. The only effective means of waiting
> > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > leaves us with the dilemma of invoking the oomkiller instead of
> > propagating the allocation failure back to userspace where it can be
> > handled more gracefully (one hopes).  In the future we may have
> > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > and the oomkiller would have been invoked. Until then, let the oomkiller
> > wreck havoc.
> > 
> > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > 
> > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > Testcase: igt/gem_tiled_swapping
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Michal Hocko <mhocko@suse.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 7286f5dd3e64..845df6067e90 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> >                       if (!*s) {
> >                               /* reclaim and warn, but no oom */
> >                               gfp = mapping_gfp_mask(mapping);
> > -                             gfp |= __GFP_NORETRY;
> > +
> > +                             /* Our bo are always dirty and so we require
> > +                              * kswapd to reclaim our pages (direct reclaim
> > +                              * performs no swapping on its own). However,
> 
> Not sure whether this is exactly what you mean. The only pageout the
> direct reclaim is allowed to the swap partition (so anonymous and
> shmem). So the above is not 100% correct.

Hmm, I didn't see anything that allows direct reclaim to perform
writeback into swap. The issue for us (i915) is that our buffers are
almost exclusively dirty, so even after we unpin them, in order to make
room they need to be paged out. Afaict, throttle_direct_reclaim() is
supposed to be the point at which direct reclaim waits for writeback via
kswapd and doesn't invoke writeback directly. throttle_direct_reclaim()
never waited as allow_direct_reclaim() kept reporting true even after a
direct reclaim failure. Without __GFP_NORETRY we were busy spinning on
kswapd making progress (and so avoiding the fail).

> 
> > +                              * direct reclaim is meant to wait for kswapd
> > +                              * when under pressure, this is broken. As a
> > +                              * result __GFP_RECLAIM is unreliable and fails
> > +                              * to actually reclaim dirty pages -- unless
> > +                              * you try over and over again with
> > +                              * !__GFP_NORETRY.
> 
> Yes, I would just mention that a heavy parallel allocations might result
> in __GFP_NORETRY failures quite easilly. I believe this is a bigger
> problem than your remark about dirty buffers (well assuming your buffers
> are not filling up the whole memory).

Yup, parallel allocations hit a problem where one thread was consuming
the successful reclaim from another and so we were failing faster. Our
tests tend to focus on i915 exclusively (mlocking the rest), so looking
at what happens when i915 fills all of memory is what we focus on.

I keep meaning to get around to adding extra pressure (find / or find
-xdev / -exec cat {};) to those tests.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 12:49       ` Chris Wilson
@ 2017-06-05 13:08         ` Michal Hocko
  2017-06-05 13:39           ` Chris Wilson
  2017-06-05 15:04           ` Chris Wilson
  0 siblings, 2 replies; 23+ messages in thread
From: Michal Hocko @ 2017-06-05 13:08 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon 05-06-17 13:49:38, Chris Wilson wrote:
> Quoting Michal Hocko (2017-06-05 13:26:30)
> > On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > > struggles with handling reclaim via kswapd (through inconsistency within
> > > throttle_direct_reclaim() and even then the race between multiple
> > > allocators makes the two step of reclaim then allocate fragile), and as
> > > our buffers are always dirty (with very few exceptions), we required
> > > kswapd to perform pageout on them. The only effective means of waiting
> > > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > > leaves us with the dilemma of invoking the oomkiller instead of
> > > propagating the allocation failure back to userspace where it can be
> > > handled more gracefully (one hopes).  In the future we may have
> > > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > > and the oomkiller would have been invoked. Until then, let the oomkiller
> > > wreck havoc.
> > > 
> > > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > > 
> > > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > > Testcase: igt/gem_tiled_swapping
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 7286f5dd3e64..845df6067e90 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> > >                       if (!*s) {
> > >                               /* reclaim and warn, but no oom */
> > >                               gfp = mapping_gfp_mask(mapping);
> > > -                             gfp |= __GFP_NORETRY;
> > > +
> > > +                             /* Our bo are always dirty and so we require
> > > +                              * kswapd to reclaim our pages (direct reclaim
> > > +                              * performs no swapping on its own). However,
> > 
> > Not sure whether this is exactly what you mean. The only pageout the
> > direct reclaim is allowed to the swap partition (so anonymous and
> > shmem). So the above is not 100% correct.
> 
> Hmm, I didn't see anything that allows direct reclaim to perform
> writeback into swap.

shrink_page_list
  add_to_swap
    add_to_swap_cache (gains mapping see page_mapping)
  pageout
    mapping->a_ops->writepage = shmem_writepage
      swap_writepage

note that the regular writeback is not allowed by 
shrink_page_list:
			if (page_is_file_cache(page) &&
			    (!current_is_kswapd() || !PageReclaim(page) ||
			     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
				/*
				 * Immediately reclaim when written back.
				 * Similar in principal to deactivate_page()
				 * except we already have the page isolated
				 * and know it's dirty
				 */
				inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
				SetPageReclaim(page);

				goto activate_locked;
			}
called right before pageout()

> The issue for us (i915) is that our buffers are
> almost exclusively dirty, so even after we unpin them, in order to make
> room they need to be paged out. Afaict, throttle_direct_reclaim() is
> supposed to be the point at which direct reclaim waits for writeback via
> kswapd and doesn't invoke writeback directly.

Well, throttle_direct_reclaim is mostly about preventing over reclaim
than anything else. It doesn't check the amount of dirty data and such.

> throttle_direct_reclaim()
> never waited as allow_direct_reclaim() kept reporting true even after a
> direct reclaim failure. Without __GFP_NORETRY we were busy spinning on
> kswapd making progress (and so avoiding the fail).
-- 
Michal Hocko
SUSE Labs
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 13:08         ` Michal Hocko
@ 2017-06-05 13:39           ` Chris Wilson
  2017-06-06  9:04             ` Michal Hocko
  2017-06-05 15:04           ` Chris Wilson
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2017-06-05 13:39 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Daniel Vetter, intel-gfx

Quoting Michal Hocko (2017-06-05 14:08:10)
> On Mon 05-06-17 13:49:38, Chris Wilson wrote:
> > Quoting Michal Hocko (2017-06-05 13:26:30)
> > > On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > > > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > > > struggles with handling reclaim via kswapd (through inconsistency within
> > > > throttle_direct_reclaim() and even then the race between multiple
> > > > allocators makes the two step of reclaim then allocate fragile), and as
> > > > our buffers are always dirty (with very few exceptions), we required
> > > > kswapd to perform pageout on them. The only effective means of waiting
> > > > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > > > leaves us with the dilemma of invoking the oomkiller instead of
> > > > propagating the allocation failure back to userspace where it can be
> > > > handled more gracefully (one hopes).  In the future we may have
> > > > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > > > and the oomkiller would have been invoked. Until then, let the oomkiller
> > > > wreck havoc.
> > > > 
> > > > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > > > 
> > > > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > > > Testcase: igt/gem_tiled_swapping
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index 7286f5dd3e64..845df6067e90 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> > > >                       if (!*s) {
> > > >                               /* reclaim and warn, but no oom */
> > > >                               gfp = mapping_gfp_mask(mapping);
> > > > -                             gfp |= __GFP_NORETRY;
> > > > +
> > > > +                             /* Our bo are always dirty and so we require
> > > > +                              * kswapd to reclaim our pages (direct reclaim
> > > > +                              * performs no swapping on its own). However,
> > > 
> > > Not sure whether this is exactly what you mean. The only pageout the
> > > direct reclaim is allowed to the swap partition (so anonymous and
> > > shmem). So the above is not 100% correct.
> > 
> > Hmm, I didn't see anything that allows direct reclaim to perform
> > writeback into swap.

Hmm, maybe I was looking too hard at reclaim_clean_pages_from_list(), or
at least I was looking for ways I could start pageout earlier.
 
> shrink_page_list
>   add_to_swap
>     add_to_swap_cache (gains mapping see page_mapping)
>   pageout
>     mapping->a_ops->writepage = shmem_writepage
>       swap_writepage

Right, if we have sc->write_page it should be hitting here the first
time it sees a dirty page on a reclaim list. Hmm, I interpreted the
"only if nonblocking" to mean that we wouldn't swapout for direct
reclaim. Perhaps a better question then is why doesn't shmemfs make
better progress for writeback under direct reclaim?

> note that the regular writeback is not allowed by 
> shrink_page_list:
>                         if (page_is_file_cache(page) &&
>                             (!current_is_kswapd() || !PageReclaim(page) ||
>                              !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
>                                 /*
>                                  * Immediately reclaim when written back.
>                                  * Similar in principal to deactivate_page()
>                                  * except we already have the page isolated
>                                  * and know it's dirty
>                                  */
>                                 inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
>                                 SetPageReclaim(page);
> 
>                                 goto activate_locked;
>                         }
> called right before pageout()

At one point I thought that was just there to make my life more
miserable. But it's purpose is clear, no need to write a clean copy of a
file to swap when it can be read back in from the file later.

> > The issue for us (i915) is that our buffers are
> > almost exclusively dirty, so even after we unpin them, in order to make
> > room they need to be paged out. Afaict, throttle_direct_reclaim() is
> > supposed to be the point at which direct reclaim waits for writeback via
> > kswapd and doesn't invoke writeback directly.
> 
> Well, throttle_direct_reclaim is mostly about preventing over reclaim
> than anything else. It doesn't check the amount of dirty data and such.

It's the only mechanism I could see there that did try and wait on
kswapd. __GFP_KSWAPD_RECLAIM just means to kick kswapd and not actually
wait for progress from kswapd. I felt misled by that flag.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 13:08         ` Michal Hocko
  2017-06-05 13:39           ` Chris Wilson
@ 2017-06-05 15:04           ` Chris Wilson
  2017-06-06  9:08             ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2017-06-05 15:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Daniel Vetter, intel-gfx

Quoting Michal Hocko (2017-06-05 14:08:10)
> On Mon 05-06-17 13:49:38, Chris Wilson wrote:
> > Quoting Michal Hocko (2017-06-05 13:26:30)
> > > On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > > > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > > > struggles with handling reclaim via kswapd (through inconsistency within
> > > > throttle_direct_reclaim() and even then the race between multiple
> > > > allocators makes the two step of reclaim then allocate fragile), and as
> > > > our buffers are always dirty (with very few exceptions), we required
> > > > kswapd to perform pageout on them. The only effective means of waiting
> > > > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > > > leaves us with the dilemma of invoking the oomkiller instead of
> > > > propagating the allocation failure back to userspace where it can be
> > > > handled more gracefully (one hopes).  In the future we may have
> > > > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > > > and the oomkiller would have been invoked. Until then, let the oomkiller
> > > > wreck havoc.
> > > > 
> > > > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > > > 
> > > > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > > > Testcase: igt/gem_tiled_swapping
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index 7286f5dd3e64..845df6067e90 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> > > >                       if (!*s) {
> > > >                               /* reclaim and warn, but no oom */
> > > >                               gfp = mapping_gfp_mask(mapping);
> > > > -                             gfp |= __GFP_NORETRY;
> > > > +
> > > > +                             /* Our bo are always dirty and so we require
> > > > +                              * kswapd to reclaim our pages (direct reclaim
> > > > +                              * performs no swapping on its own). However,
> > > 
> > > Not sure whether this is exactly what you mean. The only pageout the
> > > direct reclaim is allowed to the swap partition (so anonymous and
> > > shmem). So the above is not 100% correct.
> > 
> > Hmm, I didn't see anything that allows direct reclaim to perform
> > writeback into swap.
> 
> shrink_page_list
>   add_to_swap
>     add_to_swap_cache (gains mapping see page_mapping)
>   pageout
>     mapping->a_ops->writepage = shmem_writepage
>       swap_writepage

Fwiw, sticking a few counters into writepage, i915 objects are only ever
called for shmem_writepage from kswapd.

Ideas? I'll continue on with a few more printks to try and work out why
I never see direct reclaim doing pageout.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 13:39           ` Chris Wilson
@ 2017-06-06  9:04             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-06-06  9:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon 05-06-17 14:39:17, Chris Wilson wrote:
> Quoting Michal Hocko (2017-06-05 14:08:10)
> > On Mon 05-06-17 13:49:38, Chris Wilson wrote:
> > > Quoting Michal Hocko (2017-06-05 13:26:30)
> > > > On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > > > > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > > > > struggles with handling reclaim via kswapd (through inconsistency within
> > > > > throttle_direct_reclaim() and even then the race between multiple
> > > > > allocators makes the two step of reclaim then allocate fragile), and as
> > > > > our buffers are always dirty (with very few exceptions), we required
> > > > > kswapd to perform pageout on them. The only effective means of waiting
> > > > > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > > > > leaves us with the dilemma of invoking the oomkiller instead of
> > > > > propagating the allocation failure back to userspace where it can be
> > > > > handled more gracefully (one hopes).  In the future we may have
> > > > > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > > > > and the oomkiller would have been invoked. Until then, let the oomkiller
> > > > > wreck havoc.
> > > > > 
> > > > > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > > > > 
> > > > > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > > > > Testcase: igt/gem_tiled_swapping
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> > > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > index 7286f5dd3e64..845df6067e90 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> > > > >                       if (!*s) {
> > > > >                               /* reclaim and warn, but no oom */
> > > > >                               gfp = mapping_gfp_mask(mapping);
> > > > > -                             gfp |= __GFP_NORETRY;
> > > > > +
> > > > > +                             /* Our bo are always dirty and so we require
> > > > > +                              * kswapd to reclaim our pages (direct reclaim
> > > > > +                              * performs no swapping on its own). However,
> > > > 
> > > > Not sure whether this is exactly what you mean. The only pageout the
> > > > direct reclaim is allowed to the swap partition (so anonymous and
> > > > shmem). So the above is not 100% correct.
> > > 
> > > Hmm, I didn't see anything that allows direct reclaim to perform
> > > writeback into swap.
> 
> Hmm, maybe I was looking too hard at reclaim_clean_pages_from_list(), or
> at least I was looking for ways I could start pageout earlier.
>  
> > shrink_page_list
> >   add_to_swap
> >     add_to_swap_cache (gains mapping see page_mapping)
> >   pageout
> >     mapping->a_ops->writepage = shmem_writepage
> >       swap_writepage
> 
> Right, if we have sc->write_page it should be hitting here the first
> time it sees a dirty page on a reclaim list. Hmm, I interpreted the
> "only if nonblocking" to mean that we wouldn't swapout for direct
> reclaim. Perhaps a better question then is why doesn't shmemfs make
> better progress for writeback under direct reclaim?

I assume that is because our reclaim is highly pagecache biased and we
try to not swap as much as possible (see get_scan_count). So it is quite
possible that we even do not age anonymous LRU lists because you have
sufficient amount of the page cache to reclaim.

> > note that the regular writeback is not allowed by 
> > shrink_page_list:
> >                         if (page_is_file_cache(page) &&
> >                             (!current_is_kswapd() || !PageReclaim(page) ||
> >                              !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
> >                                 /*
> >                                  * Immediately reclaim when written back.
> >                                  * Similar in principal to deactivate_page()
> >                                  * except we already have the page isolated
> >                                  * and know it's dirty
> >                                  */
> >                                 inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
> >                                 SetPageReclaim(page);
> > 
> >                                 goto activate_locked;
> >                         }
> > called right before pageout()
> 
> At one point I thought that was just there to make my life more
> miserable. But it's purpose is clear, no need to write a clean copy of a
> file to swap when it can be read back in from the file later.

Well the primary reason for this code to exist is the terrible IO
patterns when submitting scattered data and also kernel stack exhaustion
because the page allocator is called from quite a deep call chains and
any more complex storage stack is likely to consume a lot on top. Swap
is special and much simpler in many ways.

> > > The issue for us (i915) is that our buffers are
> > > almost exclusively dirty, so even after we unpin them, in order to make
> > > room they need to be paged out. Afaict, throttle_direct_reclaim() is
> > > supposed to be the point at which direct reclaim waits for writeback via
> > > kswapd and doesn't invoke writeback directly.
> > 
> > Well, throttle_direct_reclaim is mostly about preventing over reclaim
> > than anything else. It doesn't check the amount of dirty data and such.
> 
> It's the only mechanism I could see there that did try and wait on
> kswapd. __GFP_KSWAPD_RECLAIM just means to kick kswapd and not actually
> wait for progress from kswapd. I felt misled by that flag.

Well, we do not really wait for kswapd for __GFP_NORETRY. We just try to
reclaim and consume what we have done which is normally ok. If there is
a heavy and continuous memory pressure then we are out of luck but,
well, caller told us to not retry very hard so this should be an
acceptable behavior. Standard GFP_KERNEL allocations keep retrying the
reclaim which will allow kswapd/flushers to do their job.

-- 
Michal Hocko
SUSE Labs
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 15:04           ` Chris Wilson
@ 2017-06-06  9:08             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-06-06  9:08 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon 05-06-17 16:04:55, Chris Wilson wrote:
> Quoting Michal Hocko (2017-06-05 14:08:10)
> > On Mon 05-06-17 13:49:38, Chris Wilson wrote:
> > > Quoting Michal Hocko (2017-06-05 13:26:30)
> > > > On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > > > > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > > > > struggles with handling reclaim via kswapd (through inconsistency within
> > > > > throttle_direct_reclaim() and even then the race between multiple
> > > > > allocators makes the two step of reclaim then allocate fragile), and as
> > > > > our buffers are always dirty (with very few exceptions), we required
> > > > > kswapd to perform pageout on them. The only effective means of waiting
> > > > > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > > > > leaves us with the dilemma of invoking the oomkiller instead of
> > > > > propagating the allocation failure back to userspace where it can be
> > > > > handled more gracefully (one hopes).  In the future we may have
> > > > > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > > > > and the oomkiller would have been invoked. Until then, let the oomkiller
> > > > > wreck havoc.
> > > > > 
> > > > > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > > > > 
> > > > > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > > > > Testcase: igt/gem_tiled_swapping
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> > > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > index 7286f5dd3e64..845df6067e90 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> > > > >                       if (!*s) {
> > > > >                               /* reclaim and warn, but no oom */
> > > > >                               gfp = mapping_gfp_mask(mapping);
> > > > > -                             gfp |= __GFP_NORETRY;
> > > > > +
> > > > > +                             /* Our bo are always dirty and so we require
> > > > > +                              * kswapd to reclaim our pages (direct reclaim
> > > > > +                              * performs no swapping on its own). However,
> > > > 
> > > > Not sure whether this is exactly what you mean. The only pageout the
> > > > direct reclaim is allowed to the swap partition (so anonymous and
> > > > shmem). So the above is not 100% correct.
> > > 
> > > Hmm, I didn't see anything that allows direct reclaim to perform
> > > writeback into swap.
> > 
> > shrink_page_list
> >   add_to_swap
> >     add_to_swap_cache (gains mapping see page_mapping)
> >   pageout
> >     mapping->a_ops->writepage = shmem_writepage
> >       swap_writepage
> 
> Fwiw, sticking a few counters into writepage, i915 objects are only ever
> called for shmem_writepage from kswapd.
> 
> Ideas? I'll continue on with a few more printks to try and work out why
> I never see direct reclaim doing pageout.

We have some tracepoints for vmscan, I would try looking at them. You
should be able to see from which LRUs pages have been isolated and
reclaimed that way.

-- 
Michal Hocko
SUSE Labs
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-05 10:35   ` [PATCH v2] " Chris Wilson
  2017-06-05 10:47     ` Daniel Stone
  2017-06-05 12:26     ` Michal Hocko
@ 2017-06-08 12:26     ` Michal Hocko
  2017-06-08 16:04       ` Chris Wilson
  2 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-06-08 12:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

Do you plan to merge this patch? I would like to post my
__GFP_RETRY_MAYFAIL series sometime soon and would like to cover this
one as well.

On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> struggles with handling reclaim via kswapd (through inconsistency within
> throttle_direct_reclaim() and even then the race between multiple
> allocators makes the two step of reclaim then allocate fragile), and as
> our buffers are always dirty (with very few exceptions), we required
> kswapd to perform pageout on them. The only effective means of waiting
> on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> leaves us with the dilemma of invoking the oomkiller instead of
> propagating the allocation failure back to userspace where it can be
> handled more gracefully (one hopes).  In the future we may have
> __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> and the oomkiller would have been invoked. Until then, let the oomkiller
> wreck havoc.
> 
> v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> 
> Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> Testcase: igt/gem_tiled_swapping
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7286f5dd3e64..845df6067e90 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>  			if (!*s) {
>  				/* reclaim and warn, but no oom */
>  				gfp = mapping_gfp_mask(mapping);
> -				gfp |= __GFP_NORETRY;
> +
> +				/* Our bo are always dirty and so we require
> +				 * kswapd to reclaim our pages (direct reclaim
> +				 * performs no swapping on its own). However,
> +				 * direct reclaim is meant to wait for kswapd
> +				 * when under pressure, this is broken. As a
> +				 * result __GFP_RECLAIM is unreliable and fails
> +				 * to actually reclaim dirty pages -- unless
> +				 * you try over and over again with
> +				 * !__GFP_NORETRY. However, we still want to
> +				 * fail this allocation rather than trigger
> +				 * the out-of-memory killer and for this we
> +				 * want the future __GFP_MAYFAIL.
> +				 */
>  			}
>  		} while (1);
>  
> -- 
> 2.11.0
> 

-- 
Michal Hocko
SUSE Labs
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator
  2017-06-08 12:26     ` Michal Hocko
@ 2017-06-08 16:04       ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2017-06-08 16:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Daniel Vetter, intel-gfx

Quoting Michal Hocko (2017-06-08 13:26:22)
> Do you plan to merge this patch? I would like to post my
> __GFP_RETRY_MAYFAIL series sometime soon and would like to cover this
> one as well.

Yes, just a strict policy on getting some sucker to review it first.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-06-08 16:05 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-01 13:33 [PATCH 1/3] drm/i915: Allow kswapd to pause the device whilst reaping Chris Wilson
2017-06-01 13:33 ` [PATCH 2/3] drm/i915: Encourage our shrinker more when our shmemfs allocations fails Chris Wilson
2017-06-01 13:33 ` [PATCH 3/3] drm/i915: Remove __GFP_NORETRY from our buffer allocator Chris Wilson
2017-06-02 10:28   ` Joonas Lahtinen
2017-06-05 10:35   ` [PATCH v2] " Chris Wilson
2017-06-05 10:47     ` Daniel Stone
2017-06-05 11:51       ` Chris Wilson
2017-06-05 12:26     ` Michal Hocko
2017-06-05 12:49       ` Chris Wilson
2017-06-05 13:08         ` Michal Hocko
2017-06-05 13:39           ` Chris Wilson
2017-06-06  9:04             ` Michal Hocko
2017-06-05 15:04           ` Chris Wilson
2017-06-06  9:08             ` Michal Hocko
2017-06-08 12:26     ` Michal Hocko
2017-06-08 16:04       ` Chris Wilson
2017-06-01 14:23 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping Patchwork
2017-06-02 10:24 ` [PATCH 1/3] " Joonas Lahtinen
2017-06-02 12:02 ` Mika Kuoppala
2017-06-02 12:20   ` Chris Wilson
2017-06-02 12:38     ` Mika Kuoppala
2017-06-02 13:58       ` Chris Wilson
2017-06-05 11:10 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Allow kswapd to pause the device whilst reaping (rev2) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.