linux-mm.kvack.org archive mirror
* [PATCH] mm: Reward slab shrinkers that reclaim more than they were asked
@ 2017-08-12 11:34 Chris Wilson
  2017-08-15 22:30 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2017-08-12 11:34 UTC (permalink / raw)
  To: linux-mm
  Cc: intel-gfx, Chris Wilson, Andrew Morton, Michal Hocko,
	Johannes Weiner, Hillf Danton, Minchan Kim, Vlastimil Babka,
	Mel Gorman, Shaohua Li

Some shrinkers may only be able to free a bunch of objects at a time, and
so free more than the requested nr_to_scan in one pass. Account for the
extra freed objects against the total number of objects we intend to
free, otherwise we may end up penalising the slab far more than intended.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: linux-mm@kvack.org
---
 mm/vmscan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a1af041930a6..8bf6f41f94fb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -398,6 +398,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 			break;
 		freed += ret;
 
+		nr_to_scan = max(nr_to_scan, ret);
 		count_vm_events(SLABS_SCANNED, nr_to_scan);
 		total_scan -= nr_to_scan;
 		scanned += nr_to_scan;
-- 
2.13.3
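To see the penalty the changelog describes, the accounting loop can be simulated in userspace. The sketch below is illustrative, not kernel code: it models a hypothetical shrinker that can only free whole chunks of objects, and counts how many scan_objects() calls it takes to consume the total_scan budget with and without crediting the extra freed objects (the proposed max(nr_to_scan, ret)).

```c
#include <assert.h>

/*
 * Toy model of the do_shrink_slab() accounting loop.  The shrinker can
 * only free whole chunks of `chunk` objects, so it may free more than
 * nr_to_scan per call.  `credit_extra` applies the proposed
 * nr_to_scan = max(nr_to_scan, ret) before the budget is decremented.
 * All names and numbers here are illustrative.
 */
static unsigned int shrinker_calls(unsigned long total_scan,
				   unsigned long batch_size,
				   unsigned long chunk,
				   int credit_extra)
{
	unsigned int calls = 0;

	while (total_scan) {
		unsigned long nr_to_scan =
			total_scan < batch_size ? total_scan : batch_size;
		/* The shrinker rounds up and frees a whole chunk. */
		unsigned long ret = ((nr_to_scan + chunk - 1) / chunk) * chunk;

		calls++;
		if (credit_extra && ret > nr_to_scan)
			nr_to_scan = ret;
		total_scan -= nr_to_scan < total_scan ? nr_to_scan : total_scan;
	}
	return calls;
}
```

With a budget of 128 objects, a batch size of 32 and a shrinker that frees in chunks of 128, the unpatched accounting calls back into the shrinker four times (freeing 512 objects in total), while crediting the over-free stops after one call.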

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* Re: [PATCH] mm: Reward slab shrinkers that reclaim more than they were asked
  2017-08-12 11:34 [PATCH] mm: Reward slab shrinkers that reclaim more than they were asked Chris Wilson
@ 2017-08-15 22:30 ` Andrew Morton
  2017-08-15 22:53   ` Chris Wilson
  2017-08-22 13:53   ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Chris Wilson
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2017-08-15 22:30 UTC (permalink / raw)
  To: Chris Wilson
  Cc: linux-mm, intel-gfx, Michal Hocko, Johannes Weiner, Hillf Danton,
	Minchan Kim, Vlastimil Babka, Mel Gorman, Shaohua Li

On Sat, 12 Aug 2017 12:34:37 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:

> Some shrinkers may only be able to free a bunch of objects at a time, and
> so free more than the requested nr_to_scan in one pass. Account for the
> extra freed objects against the total number of objects we intend to
> free, otherwise we may end up penalising the slab far more than intended.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -398,6 +398,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  			break;
>  		freed += ret;
>  
> +		nr_to_scan = max(nr_to_scan, ret);
>  		count_vm_events(SLABS_SCANNED, nr_to_scan);
>  		total_scan -= nr_to_scan;
>  		scanned += nr_to_scan;

Well...  kinda.  But what happens if the shrinker scanned more objects
than requested but failed to free many of them?  Or if the shrinker
scanned less than requested?

We really want to return nr_scanned from the shrinker invocation. 
Could we add a field to shrink_control for this?

--- a/mm/vmscan.c~a
+++ a/mm/vmscan.c
@@ -393,14 +393,15 @@ static unsigned long do_shrink_slab(stru
 		unsigned long nr_to_scan = min(batch_size, total_scan);
 
 		shrinkctl->nr_to_scan = nr_to_scan;
+		shrinkctl->nr_scanned = nr_to_scan;
 		ret = shrinker->scan_objects(shrinker, shrinkctl);
 		if (ret == SHRINK_STOP)
 			break;
 		freed += ret;
 
-		count_vm_events(SLABS_SCANNED, nr_to_scan);
-		total_scan -= nr_to_scan;
-		scanned += nr_to_scan;
+		count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
+		total_scan -= shrinkctl->nr_scanned;
+		scanned += shrinkctl->nr_scanned;
 
 		cond_resched();
 	}
_



* Re: [PATCH] mm: Reward slab shrinkers that reclaim more than they were asked
  2017-08-15 22:30 ` Andrew Morton
@ 2017-08-15 22:53   ` Chris Wilson
  2017-08-22 13:53   ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Chris Wilson
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-08-15 22:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, intel-gfx, Michal Hocko, Johannes Weiner, Hillf Danton,
	Minchan Kim, Vlastimil Babka, Mel Gorman, Shaohua Li

Quoting Andrew Morton (2017-08-15 23:30:10)
> On Sat, 12 Aug 2017 12:34:37 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > Some shrinkers may only be able to free a bunch of objects at a time, and
> > so free more than the requested nr_to_scan in one pass. Account for the
> > extra freed objects against the total number of objects we intend to
> > free, otherwise we may end up penalising the slab far more than intended.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -398,6 +398,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >                       break;
> >               freed += ret;
> >  
> > +             nr_to_scan = max(nr_to_scan, ret);
> >               count_vm_events(SLABS_SCANNED, nr_to_scan);
> >               total_scan -= nr_to_scan;
> >               scanned += nr_to_scan;
> 
> Well...  kinda.  But what happens if the shrinker scanned more objects
> than requested but failed to free many of them?  Of if the shrinker
> scanned less than requested?
> 
> We really want to return nr_scanned from the shrinker invocation. 
> Could we add a field to shrink_control for this?

Yes, that will work better overall.
-Chris



* [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab()
  2017-08-15 22:30 ` Andrew Morton
  2017-08-15 22:53   ` Chris Wilson
@ 2017-08-22 13:53   ` Chris Wilson
  2017-08-22 13:53     ` [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned Chris Wilson
  2017-08-24  5:11     ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Minchan Kim
  1 sibling, 2 replies; 11+ messages in thread
From: Chris Wilson @ 2017-08-22 13:53 UTC (permalink / raw)
  To: linux-mm
  Cc: intel-gfx, Chris Wilson, Andrew Morton, Michal Hocko,
	Johannes Weiner, Hillf Danton, Minchan Kim, Vlastimil Babka,
	Mel Gorman, Shaohua Li

Some shrinkers may only be able to free a bunch of objects at a time,
and so free more than the requested nr_to_scan in one pass, while other
shrinkers may find themselves unable to scan as many objects as they
counted, and so underreport. Account for the actual number of
freed/scanned objects against the total number of objects we intend to
scan, otherwise we may end up penalising the slab far more than
intended. Similarly, we want to add any underperforming scan to the
deferred pass so that we try harder and harder in future passes.

v2: Andrew's shrinkctl->nr_scanned

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: linux-mm@kvack.org
---
 include/linux/shrinker.h | 7 +++++++
 mm/vmscan.c              | 7 ++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 4fcacd915d45..51d189615bda 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -18,6 +18,13 @@ struct shrink_control {
 	 */
 	unsigned long nr_to_scan;
 
+	/*
+	 * How many objects did scan_objects process?
+	 * This defaults to nr_to_scan before every call, but the callee
+	 * should track its actual progress.
+	 */
+	unsigned long nr_scanned;
+
 	/* current node being shrunk (for NUMA aware shrinkers) */
 	int nid;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a1af041930a6..339b8fc95fc9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -393,14 +393,15 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		unsigned long nr_to_scan = min(batch_size, total_scan);
 
 		shrinkctl->nr_to_scan = nr_to_scan;
+		shrinkctl->nr_scanned = nr_to_scan;
 		ret = shrinker->scan_objects(shrinker, shrinkctl);
 		if (ret == SHRINK_STOP)
 			break;
 		freed += ret;
 
-		count_vm_events(SLABS_SCANNED, nr_to_scan);
-		total_scan -= nr_to_scan;
-		scanned += nr_to_scan;
+		count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
+		total_scan -= shrinkctl->nr_scanned;
+		scanned += shrinkctl->nr_scanned;
 
 		cond_resched();
 	}
-- 
2.14.1
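The contract the new field establishes can be sketched with a userspace mock. The struct and field names below mirror the patch, but the toy shrinker and the drain() caller are hypothetical, intended only to show the preset-then-overwrite protocol:

```c
#include <assert.h>

/* Userspace mock of the contract this patch adds.  The struct and
 * field names mirror the patch; the toy shrinker and drain() caller
 * are hypothetical. */
struct shrink_control {
	unsigned long nr_to_scan;
	unsigned long nr_scanned;	/* preset by caller, updated by callee */
};

/* A toy scan_objects(): it can only walk `avail` objects per call,
 * possibly fewer than nr_to_scan, and reports its real progress. */
static unsigned long toy_scan_objects(struct shrink_control *sc,
				      unsigned long avail)
{
	unsigned long scanned = sc->nr_to_scan < avail ? sc->nr_to_scan : avail;

	sc->nr_scanned = scanned;	/* overwrite the caller's default */
	return scanned;			/* pretend everything scanned was freed */
}

/* Caller side, mirroring do_shrink_slab(): nr_scanned is preset to
 * nr_to_scan so shrinkers that ignore the field keep the old
 * behaviour, and the budget is decremented by actual progress. */
static unsigned long drain(unsigned long total_scan, unsigned long batch,
			   unsigned long avail)
{
	struct shrink_control sc = { 0, 0 };
	unsigned long scanned = 0;

	while (total_scan) {
		sc.nr_to_scan = total_scan < batch ? total_scan : batch;
		sc.nr_scanned = sc.nr_to_scan;		/* the default */
		toy_scan_objects(&sc, avail);
		if (sc.nr_scanned == 0)
			break;				/* no progress: stop */
		scanned += sc.nr_scanned;
		total_scan -= sc.nr_scanned < total_scan ?
			      sc.nr_scanned : total_scan;
	}
	return scanned;
}
```

Note that presetting nr_scanned to nr_to_scan is what keeps unconverted shrinkers behaving exactly as before the patch.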



* [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned
  2017-08-22 13:53   ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Chris Wilson
@ 2017-08-22 13:53     ` Chris Wilson
  2017-08-22 22:45       ` Andrew Morton
  2017-08-24  5:11     ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Minchan Kim
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2017-08-22 13:53 UTC (permalink / raw)
  To: linux-mm
  Cc: intel-gfx, Chris Wilson, Joonas Lahtinen, Andrew Morton,
	Michal Hocko, Johannes Weiner, Hillf Danton, Minchan Kim,
	Vlastimil Babka, Mel Gorman, Shaohua Li

shrink_slab() allows us to report back the number of objects we
successfully scanned (out of the target shrinkctl->nr_to_scan). As we
report the number of pages owned by each GEM object as a separate item
to the shrinker, we cannot precisely control the number of shrinker
objects we scan on each pass; and indeed may free more than requested.
If we fail to tell the shrinker about the number of objects we process,
it will continue to hold a grudge against us as any objects left
unscanned are added to the next reclaim -- and so we will keep on
"unfairly" shrinking our own slab in comparison to other slabs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: linux-mm@kvack.org
---
 drivers/gpu/drm/i915/i915_debugfs.c      |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h          |  1 +
 drivers/gpu/drm/i915/i915_gem.c          |  4 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c      |  2 +-
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 ++++++++++++++++++------
 5 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6bad53f89738..ed979cc6fb5d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4338,10 +4338,10 @@ i915_drop_caches_set(void *data, u64 val)
 
 	lockdep_set_current_reclaim_state(GFP_KERNEL);
 	if (val & DROP_BOUND)
-		i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);
+		i915_gem_shrink(dev_priv, LONG_MAX, NULL, I915_SHRINK_BOUND);
 
 	if (val & DROP_UNBOUND)
-		i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_UNBOUND);
+		i915_gem_shrink(dev_priv, LONG_MAX, NULL, I915_SHRINK_UNBOUND);
 
 	if (val & DROP_SHRINK_ALL)
 		i915_gem_shrink_all(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b78605a9f1b5..c3299eaac1af 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3752,6 +3752,7 @@ i915_gem_object_create_internal(struct drm_i915_private *dev_priv,
 /* i915_gem_shrinker.c */
 unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
 			      unsigned long target,
+			      unsigned long *nr_scanned,
 			      unsigned flags);
 #define I915_SHRINK_PURGEABLE 0x1
 #define I915_SHRINK_UNBOUND 0x2
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2714898ff01..c06091718bb4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2339,7 +2339,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 				goto err_sg;
 			}
 
-			i915_gem_shrink(dev_priv, 2 * page_count, *s++);
+			i915_gem_shrink(dev_priv, 2 * page_count, NULL, *s++);
 			cond_resched();
 
 			/* We've tried hard to allocate the memory by reaping
@@ -5037,7 +5037,7 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
 	 * the objects as well, see i915_gem_freeze()
 	 */
 
-	i915_gem_shrink(dev_priv, -1UL, I915_SHRINK_UNBOUND);
+	i915_gem_shrink(dev_priv, -1UL, NULL, I915_SHRINK_UNBOUND);
 	i915_gem_drain_freed_objects(dev_priv);
 
 	spin_lock(&dev_priv->mm.obj_lock);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b6d5f1c6ef5e..8394fc2a21eb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2062,7 +2062,7 @@ int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
 		 */
 		GEM_BUG_ON(obj->mm.pages == pages);
 	} while (i915_gem_shrink(to_i915(obj->base.dev),
-				 obj->base.size >> PAGE_SHIFT,
+				 obj->base.size >> PAGE_SHIFT, NULL,
 				 I915_SHRINK_BOUND |
 				 I915_SHRINK_UNBOUND |
 				 I915_SHRINK_ACTIVE));
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ee4df98f009d..c178a1c9ae47 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -198,6 +198,7 @@ static void __start_writeback(struct drm_i915_gem_object *obj)
  * i915_gem_shrink - Shrink buffer object caches
  * @dev_priv: i915 device
  * @target: amount of memory to make available, in pages
+ * @nr_scanned: optional output for number of pages scanned (incremental)
  * @flags: control flags for selecting cache types
  *
  * This function is the main interface to the shrinker. It will try to release
@@ -220,7 +221,9 @@ static void __start_writeback(struct drm_i915_gem_object *obj)
  */
 unsigned long
 i915_gem_shrink(struct drm_i915_private *dev_priv,
-		unsigned long target, unsigned flags)
+		unsigned long target,
+		unsigned long *nr_scanned,
+		unsigned flags)
 {
 	const struct {
 		struct list_head *list;
@@ -231,6 +234,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 		{ NULL, 0 },
 	}, *phase;
 	unsigned long count = 0;
+	unsigned long scanned = 0;
 	bool unlock;
 
 	if (!shrinker_lock(dev_priv, &unlock))
@@ -318,6 +322,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 				}
 				mutex_unlock(&obj->mm.lock);
 			}
+			scanned += obj->base.size >> PAGE_SHIFT;
 
 			spin_lock(&dev_priv->mm.obj_lock);
 		}
@@ -332,6 +337,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 
 	shrinker_unlock(dev_priv, unlock);
 
+	if (nr_scanned)
+		*nr_scanned += scanned;
 	return count;
 }
 
@@ -354,7 +361,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 	unsigned long freed;
 
 	intel_runtime_pm_get(dev_priv);
-	freed = i915_gem_shrink(dev_priv, -1UL,
+	freed = i915_gem_shrink(dev_priv, -1UL, NULL,
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND |
 				I915_SHRINK_ACTIVE);
@@ -411,23 +418,28 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	unsigned long freed;
 	bool unlock;
 
+	sc->nr_scanned = 0;
+
 	if (!shrinker_lock(dev_priv, &unlock))
 		return SHRINK_STOP;
 
 	freed = i915_gem_shrink(dev_priv,
 				sc->nr_to_scan,
+				&sc->nr_scanned,
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND |
 				I915_SHRINK_PURGEABLE);
 	if (freed < sc->nr_to_scan)
 		freed += i915_gem_shrink(dev_priv,
-					 sc->nr_to_scan - freed,
+					 sc->nr_to_scan - sc->nr_scanned,
+					 &sc->nr_scanned,
 					 I915_SHRINK_BOUND |
 					 I915_SHRINK_UNBOUND);
 	if (freed < sc->nr_to_scan && current_is_kswapd()) {
 		intel_runtime_pm_get(dev_priv);
 		freed += i915_gem_shrink(dev_priv,
-					 sc->nr_to_scan - freed,
+					 sc->nr_to_scan - sc->nr_scanned,
+					 &sc->nr_scanned,
 					 I915_SHRINK_ACTIVE |
 					 I915_SHRINK_BOUND |
 					 I915_SHRINK_UNBOUND);
@@ -436,7 +448,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 
 	shrinker_unlock(dev_priv, unlock);
 
-	return freed;
+	return sc->nr_scanned ? freed : SHRINK_STOP;
 }
 
 static bool
@@ -525,7 +537,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 		goto out;
 
 	intel_runtime_pm_get(dev_priv);
-	freed_pages += i915_gem_shrink(dev_priv, -1UL,
+	freed_pages += i915_gem_shrink(dev_priv, -1UL, NULL,
 				       I915_SHRINK_BOUND |
 				       I915_SHRINK_UNBOUND |
 				       I915_SHRINK_ACTIVE |
-- 
2.14.1
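The reason sc->nr_scanned can exceed sc->nr_to_scan here is that i915 accounts every page of a scanned GEM object at once. The output-parameter pattern the patch threads through i915_gem_shrink() can be sketched as follows; everything in this toy is illustrative and it is not the i915 implementation:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the output-parameter pattern patch 2/2 threads through
 * i915_gem_shrink(): each pass *adds* its progress to *nr_scanned, so
 * chained passes share one running total.  `obj_pages` stands in for
 * the per-object page count; all names are illustrative.
 */
static unsigned long shrink_pass(unsigned long target,
				 unsigned long *nr_scanned,
				 unsigned long obj_pages)
{
	unsigned long freed = 0, scanned = 0;

	if (obj_pages == 0)
		return 0;
	while (scanned < target) {
		/* Each object scanned accounts for all of its pages,
		 * which is why we can overshoot the target. */
		scanned += obj_pages;
		freed += obj_pages;
	}
	if (nr_scanned)
		*nr_scanned += scanned;	/* incremental, per the kerneldoc */
	return freed;
}

/* Two chained passes, as i915_gem_shrinker_scan() does. */
static unsigned long total_scanned_two_passes(unsigned long target,
					      unsigned long obj_pages)
{
	unsigned long nr_scanned = 0;

	shrink_pass(target, &nr_scanned, obj_pages);
	shrink_pass(target, &nr_scanned, obj_pages);
	return nr_scanned;
}
```

Passing NULL, as the non-shrinker callers in the patch do, simply skips the accounting.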



* Re: [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned
  2017-08-22 13:53     ` [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned Chris Wilson
@ 2017-08-22 22:45       ` Andrew Morton
  2017-08-23 14:20         ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2017-08-22 22:45 UTC (permalink / raw)
  To: Chris Wilson
  Cc: linux-mm, intel-gfx, Joonas Lahtinen, Michal Hocko,
	Johannes Weiner, Hillf Danton, Minchan Kim, Vlastimil Babka,
	Mel Gorman, Shaohua Li

On Tue, 22 Aug 2017 14:53:25 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:

> shrink_slab() allows us to report back the number of objects we
> successfully scanned (out of the target shrinkctl->nr_to_scan). As we
> report the number of pages owned by each GEM object as a separate item
> to the shrinker, we cannot precisely control the number of shrinker
> objects we scan on each pass; and indeed may free more than requested.
> If we fail to tell the shrinker about the number of objects we process,
> it will continue to hold a grudge against us as any objects left
> unscanned are added to the next reclaim -- and so we will keep on
> "unfairly" shrinking our own slab in comparison to other slabs.

It's unclear which tree this is against but I think I got it all fixed
up.  Please check the changes to i915_gem_shrink().

From: Chris Wilson <chris@chris-wilson.co.uk>
Subject: drm/i915: wire up shrinkctl->nr_scanned

shrink_slab() allows us to report back the number of objects we
successfully scanned (out of the target shrinkctl->nr_to_scan).  As report
the number of pages owned by each GEM object as a separate item to the
shrinker, we cannot precisely control the number of shrinker objects we
scan on each pass; and indeed may free more than requested.  If we fail to
tell the shrinker about the number of objects we process, it will continue
to hold a grudge against us as any objects left unscanned are added to the
next reclaim -- and so we will keep on "unfairly" shrinking our own slab
in comparison to other slabs.

Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/gpu/drm/i915/i915_debugfs.c      |    4 +--
 drivers/gpu/drm/i915/i915_drv.h          |    1 
 drivers/gpu/drm/i915/i915_gem.c          |    4 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c      |    2 -
 drivers/gpu/drm/i915/i915_gem_shrinker.c |   24 +++++++++++++++------
 5 files changed, 24 insertions(+), 11 deletions(-)

diff -puN drivers/gpu/drm/i915/i915_debugfs.c~drm-i915-wire-up-shrinkctl-nr_scanned drivers/gpu/drm/i915/i915_debugfs.c
--- a/drivers/gpu/drm/i915/i915_debugfs.c~drm-i915-wire-up-shrinkctl-nr_scanned
+++ a/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4333,10 +4333,10 @@ i915_drop_caches_set(void *data, u64 val
 
 	lockdep_set_current_reclaim_state(GFP_KERNEL);
 	if (val & DROP_BOUND)
-		i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);
+		i915_gem_shrink(dev_priv, LONG_MAX, NULL, I915_SHRINK_BOUND);
 
 	if (val & DROP_UNBOUND)
-		i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_UNBOUND);
+		i915_gem_shrink(dev_priv, LONG_MAX, NULL, I915_SHRINK_UNBOUND);
 
 	if (val & DROP_SHRINK_ALL)
 		i915_gem_shrink_all(dev_priv);
diff -puN drivers/gpu/drm/i915/i915_drv.h~drm-i915-wire-up-shrinkctl-nr_scanned drivers/gpu/drm/i915/i915_drv.h
--- a/drivers/gpu/drm/i915/i915_drv.h~drm-i915-wire-up-shrinkctl-nr_scanned
+++ a/drivers/gpu/drm/i915/i915_drv.h
@@ -3628,6 +3628,7 @@ i915_gem_object_create_internal(struct d
 /* i915_gem_shrinker.c */
 unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
 			      unsigned long target,
+			      unsigned long *nr_scanned,
 			      unsigned flags);
 #define I915_SHRINK_PURGEABLE 0x1
 #define I915_SHRINK_UNBOUND 0x2
diff -puN drivers/gpu/drm/i915/i915_gem.c~drm-i915-wire-up-shrinkctl-nr_scanned drivers/gpu/drm/i915/i915_gem.c
--- a/drivers/gpu/drm/i915/i915_gem.c~drm-i915-wire-up-shrinkctl-nr_scanned
+++ a/drivers/gpu/drm/i915/i915_gem.c
@@ -2408,7 +2408,7 @@ rebuild_st:
 				goto err_sg;
 			}
 
-			i915_gem_shrink(dev_priv, 2 * page_count, *s++);
+			i915_gem_shrink(dev_priv, 2 * page_count, NULL, *s++);
 			cond_resched();
 
 			/* We've tried hard to allocate the memory by reaping
@@ -5012,7 +5012,7 @@ int i915_gem_freeze_late(struct drm_i915
 	 * the objects as well, see i915_gem_freeze()
 	 */
 
-	i915_gem_shrink(dev_priv, -1UL, I915_SHRINK_UNBOUND);
+	i915_gem_shrink(dev_priv, -1UL, NULL, I915_SHRINK_UNBOUND);
 	i915_gem_drain_freed_objects(dev_priv);
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
diff -puN drivers/gpu/drm/i915/i915_gem_gtt.c~drm-i915-wire-up-shrinkctl-nr_scanned drivers/gpu/drm/i915/i915_gem_gtt.c
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c~drm-i915-wire-up-shrinkctl-nr_scanned
+++ a/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2061,7 +2061,7 @@ int i915_gem_gtt_prepare_pages(struct dr
 		 */
 		GEM_BUG_ON(obj->mm.pages == pages);
 	} while (i915_gem_shrink(to_i915(obj->base.dev),
-				 obj->base.size >> PAGE_SHIFT,
+				 obj->base.size >> PAGE_SHIFT, NULL,
 				 I915_SHRINK_BOUND |
 				 I915_SHRINK_UNBOUND |
 				 I915_SHRINK_ACTIVE));
diff -puN drivers/gpu/drm/i915/i915_gem_shrinker.c~drm-i915-wire-up-shrinkctl-nr_scanned drivers/gpu/drm/i915/i915_gem_shrinker.c
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c~drm-i915-wire-up-shrinkctl-nr_scanned
+++ a/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -136,6 +136,7 @@ static bool unsafe_drop_pages(struct drm
  * i915_gem_shrink - Shrink buffer object caches
  * @dev_priv: i915 device
  * @target: amount of memory to make available, in pages
+ * @nr_scanned: optional output for number of pages scanned (incremental)
  * @flags: control flags for selecting cache types
  *
  * This function is the main interface to the shrinker. It will try to release
@@ -158,7 +159,9 @@ static bool unsafe_drop_pages(struct drm
  */
 unsigned long
 i915_gem_shrink(struct drm_i915_private *dev_priv,
-		unsigned long target, unsigned flags)
+		unsigned long target,
+		unsigned long *nr_scanned,
+		unsigned flags)
 {
 	const struct {
 		struct list_head *list;
@@ -169,6 +172,7 @@ i915_gem_shrink(struct drm_i915_private
 		{ NULL, 0 },
 	}, *phase;
 	unsigned long count = 0;
+	unsigned long scanned = 0;
 	bool unlock;
 
 	if (!shrinker_lock(dev_priv, &unlock))
@@ -249,6 +253,7 @@ i915_gem_shrink(struct drm_i915_private
 					count += obj->base.size >> PAGE_SHIFT;
 				}
 				mutex_unlock(&obj->mm.lock);
+				scanned += obj->base.size >> PAGE_SHIFT;
 			}
 		}
 		list_splice_tail(&still_in_list, phase->list);
@@ -261,6 +266,8 @@ i915_gem_shrink(struct drm_i915_private
 
 	shrinker_unlock(dev_priv, unlock);
 
+	if (nr_scanned)
+		*nr_scanned += scanned;
 	return count;
 }
 
@@ -283,7 +290,7 @@ unsigned long i915_gem_shrink_all(struct
 	unsigned long freed;
 
 	intel_runtime_pm_get(dev_priv);
-	freed = i915_gem_shrink(dev_priv, -1UL,
+	freed = i915_gem_shrink(dev_priv, -1UL, NULL,
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND |
 				I915_SHRINK_ACTIVE);
@@ -329,23 +336,28 @@ i915_gem_shrinker_scan(struct shrinker *
 	unsigned long freed;
 	bool unlock;
 
+	sc->nr_scanned = 0;
+
 	if (!shrinker_lock(dev_priv, &unlock))
 		return SHRINK_STOP;
 
 	freed = i915_gem_shrink(dev_priv,
 				sc->nr_to_scan,
+				&sc->nr_scanned,
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND |
 				I915_SHRINK_PURGEABLE);
 	if (freed < sc->nr_to_scan)
 		freed += i915_gem_shrink(dev_priv,
-					 sc->nr_to_scan - freed,
+					 sc->nr_to_scan - sc->nr_scanned,
+					 &sc->nr_scanned,
 					 I915_SHRINK_BOUND |
 					 I915_SHRINK_UNBOUND);
 	if (freed < sc->nr_to_scan && current_is_kswapd()) {
 		intel_runtime_pm_get(dev_priv);
 		freed += i915_gem_shrink(dev_priv,
-					 sc->nr_to_scan - freed,
+					 sc->nr_to_scan - sc->nr_scanned,
+					 &sc->nr_scanned,
 					 I915_SHRINK_ACTIVE |
 					 I915_SHRINK_BOUND |
 					 I915_SHRINK_UNBOUND);
@@ -354,7 +366,7 @@ i915_gem_shrinker_scan(struct shrinker *
 
 	shrinker_unlock(dev_priv, unlock);
 
-	return freed;
+	return sc->nr_scanned ? freed : SHRINK_STOP;
 }
 
 static bool
@@ -453,7 +465,7 @@ i915_gem_shrinker_vmap(struct notifier_b
 		goto out;
 
 	intel_runtime_pm_get(dev_priv);
-	freed_pages += i915_gem_shrink(dev_priv, -1UL,
+	freed_pages += i915_gem_shrink(dev_priv, -1UL, NULL,
 				       I915_SHRINK_BOUND |
 				       I915_SHRINK_UNBOUND |
 				       I915_SHRINK_ACTIVE |
_



* Re: [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned
  2017-08-22 22:45       ` Andrew Morton
@ 2017-08-23 14:20         ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2017-08-23 14:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, intel-gfx, Joonas Lahtinen, Michal Hocko,
	Johannes Weiner, Hillf Danton, Minchan Kim, Vlastimil Babka,
	Mel Gorman, Shaohua Li

Quoting Andrew Morton (2017-08-22 23:45:50)
> On Tue, 22 Aug 2017 14:53:25 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > shrink_slab() allows us to report back the number of objects we
> > successfully scanned (out of the target shrinkctl->nr_to_scan). As we
> > report the number of pages owned by each GEM object as a separate item
> > to the shrinker, we cannot precisely control the number of shrinker
> > objects we scan on each pass; and indeed may free more than requested.
> > If we fail to tell the shrinker about the number of objects we process,
> > it will continue to hold a grudge against us as any objects left
> > unscanned are added to the next reclaim -- and so we will keep on
> > "unfairly" shrinking our own slab in comparison to other slabs.
> 
> It's unclear which tree this is against but I think I got it all fixed
> up.  Please check the changes to i915_gem_shrink().

My apologies, I wrote it against drm-tip so that it could run against our
CI. The changes look fine, thank you.
-Chris



* Re: [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab()
  2017-08-22 13:53   ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Chris Wilson
  2017-08-22 13:53     ` [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned Chris Wilson
@ 2017-08-24  5:11     ` Minchan Kim
  2017-08-24  8:00       ` Vlastimil Babka
  1 sibling, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2017-08-24  5:11 UTC (permalink / raw)
  To: Chris Wilson
  Cc: linux-mm, intel-gfx, Andrew Morton, Michal Hocko,
	Johannes Weiner, Hillf Danton, Vlastimil Babka, Mel Gorman,
	Shaohua Li

Hello Chris,

On Tue, Aug 22, 2017 at 02:53:24PM +0100, Chris Wilson wrote:
> Some shrinkers may only be able to free a bunch of objects at a time, and
> so free more than the requested nr_to_scan in one pass. Whilst other
> shrinkers may find themselves even unable to scan as many objects as
> they counted, and so underreport. Account for the extra freed/scanned
> objects against the total number of objects we intend to scan, otherwise
> we may end up penalising the slab far more than intended. Similarly,
> we want to add the underperforming scan to the deferred pass so that we
> try harder and harder in future passes.
> 
> v2: Andrew's shrinkctl->nr_scanned
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Shaohua Li <shli@fb.com>
> Cc: linux-mm@kvack.org
> ---
>  include/linux/shrinker.h | 7 +++++++
>  mm/vmscan.c              | 7 ++++---
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 4fcacd915d45..51d189615bda 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -18,6 +18,13 @@ struct shrink_control {
>  	 */
>  	unsigned long nr_to_scan;
>  
> +	/*
> +	 * How many objects did scan_objects process?
> +	 * This defaults to nr_to_scan before every call, but the callee
> +	 * should track its actual progress.

So, if a shrinker scans more objects than requested, it should add the
extra to nr_scanned?

And in the opposite case, if a shrinker scans fewer objects than
requested, it should reduce nr_scanned to the number actually scanned?

Tracking that progress is a burden for shrinker users. And if a
shrinker gets it wrong, the VM can get into big trouble, like an
infinite loop.

IMHO, we need a concrete reason to do it, but I fail to see one at this
moment.

Could we just account any objects freed beyond the requested number
against total_scan, like you did in the first version[1]?
[1] lkml.kernel.org/r/<20170812113437.7397-1-chris@chris-wilson.co.uk>

> +	 */
> +	unsigned long nr_scanned;
> +
>  	/* current node being shrunk (for NUMA aware shrinkers) */
>  	int nid;
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a1af041930a6..339b8fc95fc9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -393,14 +393,15 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  		unsigned long nr_to_scan = min(batch_size, total_scan);
>  
>  		shrinkctl->nr_to_scan = nr_to_scan;
> +		shrinkctl->nr_scanned = nr_to_scan;
>  		ret = shrinker->scan_objects(shrinker, shrinkctl);
>  		if (ret == SHRINK_STOP)
>  			break;
>  		freed += ret;
>  
> -		count_vm_events(SLABS_SCANNED, nr_to_scan);
> -		total_scan -= nr_to_scan;
> -		scanned += nr_to_scan;
> +		count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
> +		total_scan -= shrinkctl->nr_scanned;
> +		scanned += shrinkctl->nr_scanned;

If we really want to go this way, at the least we need some defensive
code to prevent an infinite loop when a shrinker doesn't have any
objects left.
However, I really want to go with your first version.

Andrew?
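
To make the accounting under discussion concrete, here is a small
userspace C model of the do_shrink_slab() batch loop with the v2 change
applied. This is a sketch, not kernel code: the struct names mirror
mm/vmscan.c only loosely, the "chunky" shrinker (which frees objects in
fixed chunks of 8 and so can scan more than asked) is hypothetical, and
the model caps the subtraction so it cannot underflow.

```c
#include <assert.h>

#define SHRINK_STOP (~0UL)

/* Hypothetical model of shrink_control: nr_to_scan is the request,
 * nr_scanned is the callee-reported actual progress (the v2 field). */
struct shrinkctl_model {
	unsigned long nr_to_scan;
	unsigned long nr_scanned;
};

static unsigned long pool = 100;	/* freeable objects in the cache */

/* A shrinker that can only free in chunks of 8, so it may process
 * more than nr_to_scan in one pass and reports the real count. */
static unsigned long chunky_scan(struct shrinkctl_model *sc)
{
	const unsigned long chunk = 8;
	unsigned long want = (sc->nr_to_scan + chunk - 1) / chunk * chunk;

	if (pool == 0)
		return SHRINK_STOP;
	if (want > pool)
		want = pool;
	pool -= want;
	sc->nr_scanned = want;	/* report what was really scanned */
	return want;		/* all scanned objects were freed */
}

static unsigned long model_shrink_slab(unsigned long total_scan,
				       unsigned long batch_size)
{
	struct shrinkctl_model sc;
	unsigned long freed = 0;

	while (total_scan >= batch_size) {
		unsigned long nr_to_scan =
			batch_size < total_scan ? batch_size : total_scan;
		unsigned long ret;

		sc.nr_to_scan = nr_to_scan;
		sc.nr_scanned = nr_to_scan;	/* default if callee is silent */
		ret = chunky_scan(&sc);
		if (ret == SHRINK_STOP)
			break;
		freed += ret;

		/* v2 accounting: consume the reported progress (capped
		 * here so this model cannot underflow on overshoot) */
		total_scan -= sc.nr_scanned < total_scan ? sc.nr_scanned
							 : total_scan;
	}
	return freed;
}
```

With total_scan = 20 and batch_size = 5, the chunky shrinker is asked
for 5 twice but scans 8 each time, and the loop credits all 16 scanned
objects instead of only the 10 it asked for.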

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab()
  2017-08-24  5:11     ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Minchan Kim
@ 2017-08-24  8:00       ` Vlastimil Babka
  2017-08-25 21:41         ` Andrew Morton
  2017-08-28  8:09         ` Minchan Kim
  0 siblings, 2 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-08-24  8:00 UTC (permalink / raw)
  To: Minchan Kim, Chris Wilson
  Cc: linux-mm, intel-gfx, Andrew Morton, Michal Hocko,
	Johannes Weiner, Hillf Danton, Mel Gorman, Shaohua Li

On 08/24/2017 07:11 AM, Minchan Kim wrote:
> Hello Chris,
> 
> On Tue, Aug 22, 2017 at 02:53:24PM +0100, Chris Wilson wrote:
>> Some shrinkers may only be able to free a bunch of objects at a time, and
>> so free more than the requested nr_to_scan in one pass.

Can such shrinkers reflect that in their shrinker->batch value? Or is it
unpredictable for each scan?

>> Whilst other
>> shrinkers may find themselves even unable to scan as many objects as
>> they counted, and so underreport. Account for the extra freed/scanned
>> objects against the total number of objects we intend to scan, otherwise
>> we may end up penalising the slab far more than intended. Similarly,
>> we want to add the underperforming scan to the deferred pass so that we
>> try harder and harder in future passes.
>>
>> v2: Andrew's shrinkctl->nr_scanned
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Shaohua Li <shli@fb.com>
>> Cc: linux-mm@kvack.org
>> ---
>>  include/linux/shrinker.h | 7 +++++++
>>  mm/vmscan.c              | 7 ++++---
>>  2 files changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>> index 4fcacd915d45..51d189615bda 100644
>> --- a/include/linux/shrinker.h
>> +++ b/include/linux/shrinker.h
>> @@ -18,6 +18,13 @@ struct shrink_control {
>>  	 */
>>  	unsigned long nr_to_scan;
>>  
>> +	/*
>> +	 * How many objects did scan_objects process?
>> +	 * This defaults to nr_to_scan before every call, but the callee
>> +	 * should track its actual progress.
> 
> So, if the shrinker scans more objects than requested, it should add
> them up into nr_scanned?

That sounds fair.

> In the opposite case, if the shrinker scans fewer objects than
> requested, should it reduce nr_scanned to the number actually scanned?

Unsure. If they can't scan more, the following attempt in the next
iteration should fail and thus result in SHRINK_STOP?

> Tracking the progress is a burden for the shrinker users.

You mean shrinker authors, not users? AFAICS this nr_scanned is opt-in,
if they don't want to touch it, the default remains nr_to_scan.

> Even if a shrinker has a mistake, the VM will have big trouble like
> an infinite loop.

We could fake 0 as 1 or something, at least.

> IMHO, we need a concrete reason to do it, but I fail to see one at this moment.
> 
> Could we just add up freed objects beyond the requested number to
> total_scan, like you did in the first version[1]?

That's a slightly different metric, but maybe it doesn't matter.
Different shrinkers are essentially apples and oranges anyway, so
improving the arithmetic can only help to some extent, IMHO.

> [1] lkml.kernel.org/r/<20170812113437.7397-1-chris@chris-wilson.co.uk>
> 
>> +	 */
>> +	unsigned long nr_scanned;
>> +
>>  	/* current node being shrunk (for NUMA aware shrinkers) */
>>  	int nid;
>>  
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a1af041930a6..339b8fc95fc9 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -393,14 +393,15 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>>  		unsigned long nr_to_scan = min(batch_size, total_scan);
>>  
>>  		shrinkctl->nr_to_scan = nr_to_scan;
>> +		shrinkctl->nr_scanned = nr_to_scan;
>>  		ret = shrinker->scan_objects(shrinker, shrinkctl);
>>  		if (ret == SHRINK_STOP)
>>  			break;
>>  		freed += ret;
>>  
>> -		count_vm_events(SLABS_SCANNED, nr_to_scan);
>> -		total_scan -= nr_to_scan;
>> -		scanned += nr_to_scan;
>> +		count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
>> +		total_scan -= shrinkctl->nr_scanned;
>> +		scanned += shrinkctl->nr_scanned;
> 
> If we really want to go this way, at the least we need some defensive
> code to prevent an infinite loop when a shrinker doesn't have any
> objects left.
> However, I really want to go with your first version.
> 
> Andrew?
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab()
  2017-08-24  8:00       ` Vlastimil Babka
@ 2017-08-25 21:41         ` Andrew Morton
  2017-08-28  8:09         ` Minchan Kim
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2017-08-25 21:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Minchan Kim, Chris Wilson, linux-mm, intel-gfx, Michal Hocko,
	Johannes Weiner, Hillf Danton, Mel Gorman, Shaohua Li

On Thu, 24 Aug 2017 10:00:49 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> > Even if a
> > shrinker has a mistake, VM will have big trouble like infinite loop.
> 
> We could fake 0 as 1 or something, at least.

If the shrinker returns sc->nr_scanned==0 then that's a buggy shrinker
- it should return SHRINK_STOP in that case.  Only a single shrinker
(i915) presently uses sc->nr_scanned and that one gets it right.  I
think it's OK - there's a limit to how far we should go defending
against buggy kernel code, surely.
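
The convention described here can be sketched as follows. This is an
illustrative model of a well-behaved shrinker, not the actual i915
code; the cache and all names are made up for the example. The key
point is that a shrinker which cannot make progress returns
SHRINK_STOP rather than reporting sc->nr_scanned == 0.

```c
#include <assert.h>

#define SHRINK_STOP (~0UL)

/* Hypothetical mirror of the shrink_control fields used here. */
struct sc_model {
	unsigned long nr_to_scan;
	unsigned long nr_scanned;
};

static unsigned long cache_objects = 10;	/* freeable objects */

/* Scan up to nr_to_scan objects; report actual progress via
 * nr_scanned, and never report 0: stop instead. */
static unsigned long demo_scan_objects(struct sc_model *sc)
{
	unsigned long n = sc->nr_to_scan < cache_objects
			? sc->nr_to_scan : cache_objects;

	if (n == 0)
		return SHRINK_STOP;	/* no progress possible: stop */
	cache_objects -= n;
	sc->nr_scanned = n;	/* actual progress, may be < nr_to_scan */
	return n;		/* objects freed */
}
```

Asked for 32 with only 10 freeable objects, this shrinker reports
nr_scanned = 10 on the first call and SHRINK_STOP on the second,
never a zero scan count.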




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab()
  2017-08-24  8:00       ` Vlastimil Babka
  2017-08-25 21:41         ` Andrew Morton
@ 2017-08-28  8:09         ` Minchan Kim
  1 sibling, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2017-08-28  8:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Chris Wilson, linux-mm, intel-gfx, Andrew Morton, Michal Hocko,
	Johannes Weiner, Hillf Danton, Mel Gorman, Shaohua Li

Hi Vlastimil,

On Thu, Aug 24, 2017 at 10:00:49AM +0200, Vlastimil Babka wrote:
> On 08/24/2017 07:11 AM, Minchan Kim wrote:
> > Hello Chris,
> > 
> > On Tue, Aug 22, 2017 at 02:53:24PM +0100, Chris Wilson wrote:
> >> Some shrinkers may only be able to free a bunch of objects at a time, and
> >> so free more than the requested nr_to_scan in one pass.
> 
> Can such shrinkers reflect that in their shrinker->batch value? Or is it
> unpredictable for each scan?
> 
> >> Whilst other
> >> shrinkers may find themselves even unable to scan as many objects as
> >> they counted, and so underreport. Account for the extra freed/scanned
> >> objects against the total number of objects we intend to scan, otherwise
> >> we may end up penalising the slab far more than intended. Similarly,
> >> we want to add the underperforming scan to the deferred pass so that we
> >> try harder and harder in future passes.
> >>
> >> v2: Andrew's shrinkctl->nr_scanned
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> >> Cc: Minchan Kim <minchan@kernel.org>
> >> Cc: Vlastimil Babka <vbabka@suse.cz>
> >> Cc: Mel Gorman <mgorman@techsingularity.net>
> >> Cc: Shaohua Li <shli@fb.com>
> >> Cc: linux-mm@kvack.org
> >> ---
> >>  include/linux/shrinker.h | 7 +++++++
> >>  mm/vmscan.c              | 7 ++++---
> >>  2 files changed, 11 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> >> index 4fcacd915d45..51d189615bda 100644
> >> --- a/include/linux/shrinker.h
> >> +++ b/include/linux/shrinker.h
> >> @@ -18,6 +18,13 @@ struct shrink_control {
> >>  	 */
> >>  	unsigned long nr_to_scan;
> >>  
> >> +	/*
> >> +	 * How many objects did scan_objects process?
> >> +	 * This defaults to nr_to_scan before every call, but the callee
> >> +	 * should track its actual progress.
> > 
> > So, if the shrinker scans more objects than requested, it should add
> > them up into nr_scanned?
> 
> That sounds fair.
> 
> > In the opposite case, if the shrinker scans fewer objects than
> > requested, should it reduce nr_scanned to the number actually scanned?
> 
> Unsure. If they can't scan more, the following attempt in the next
> iteration should fail and thus result in SHRINK_STOP?

What should I do if, for some reason, I don't scan anything on this
iteration but don't want to stop via SHRINK_STOP, because I expect to
scan those objects on the next iteration? Return 1 on the shrinker
side? That doesn't make sense. nr_scanned represents the real scan
count, so a shrinker that scans nothing but wants to continue should
be able to return 0, and the VM should take care of preventing an
infinite loop, because the shrinker's expectation can be wrong and
could live-lock the system.

> 
> > Tracking the progress is a burden for the shrinker users.
> 
> You mean shrinker authors, not users? AFAICS this nr_scanned is opt-in,
> if they don't want to touch it, the default remains nr_to_scan.

I meant shrinker authors, who are the users of the VM shrinker
interface. :-D

Anyway, my point is that shrinkers are already racy. IOW, the number
of objects in a shrinker can change between count_objects and
scan_objects, and I'm not sure such fine-grained object tracking based
on a stale value will help much in every case.

That means it could be a broken interface with no guarantee of helping
the system as expected.

However, Chris's v1 is low-hanging fruit we can take without pain,
which is why I wanted to merge v1 rather than v2.

> 
> > Even if a shrinker has a mistake, the VM will have big trouble like
> > an infinite loop.
> 
> We could fake 0 as 1 or something, at least.

Yes, I think we need it if we want to go this way.
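
A minimal userspace model of that "fake 0 as 1" defense, clamping a
zero nr_scanned report to 1 so the loop always makes forward progress,
might look like this (the function name, the fixed zero-reporting
shrinker, and the iteration cap are illustrative assumptions, not
proposed kernel code):

```c
#include <assert.h>

/* Model the do_shrink_slab() loop against a buggy shrinker that
 * reports nr_scanned == 0 every pass without returning SHRINK_STOP.
 * Without the clamp, total_scan is never consumed and the loop spins
 * forever (bounded here by a safety limit); with the clamp, it drains. */
static unsigned long iterations_to_drain(unsigned long total_scan,
					 unsigned long batch_size,
					 int clamp_zero)
{
	unsigned long iters = 0;
	const unsigned long limit = 1000000;	/* demo safety net */

	while (total_scan >= batch_size && iters < limit) {
		unsigned long nr_scanned = 0;	/* buggy: no progress */

		if (clamp_zero && nr_scanned == 0)
			nr_scanned = 1;		/* the proposed defense */
		total_scan -= nr_scanned;
		iters++;
	}
	return iters;
}
```

With the clamp, draining total_scan = 128 at batch_size = 32 takes a
bounded 97 iterations; without it, the model only stops because of the
demo's safety limit.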


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-08-28  8:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-12 11:34 [PATCH] mm: Reward slab shrinkers that reclaim more than they were asked Chris Wilson
2017-08-15 22:30 ` Andrew Morton
2017-08-15 22:53   ` Chris Wilson
2017-08-22 13:53   ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Chris Wilson
2017-08-22 13:53     ` [PATCH 2/2] drm/i915: Wire up shrinkctl->nr_scanned Chris Wilson
2017-08-22 22:45       ` Andrew Morton
2017-08-23 14:20         ` Chris Wilson
2017-08-24  5:11     ` [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab() Minchan Kim
2017-08-24  8:00       ` Vlastimil Babka
2017-08-25 21:41         ` Andrew Morton
2017-08-28  8:09         ` Minchan Kim
