All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
@ 2017-11-21 12:02 Tetsuo Handa
  2017-11-21 21:40 ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-21 12:02 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: Tetsuo Handa

There are users not checking for register_shrinker() failure.
Continuing with ignoring failure can lead to later oops at
unregister_shrinker().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/linux/shrinker.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 388ff29..a389491 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -75,6 +75,6 @@ struct shrinker {
 #define SHRINKER_NUMA_AWARE	(1 << 0)
 #define SHRINKER_MEMCG_AWARE	(1 << 1)
 
-extern int register_shrinker(struct shrinker *);
+extern __must_check int register_shrinker(struct shrinker *);
 extern void unregister_shrinker(struct shrinker *);
 #endif
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-21 12:02 [PATCH] mm,vmscan: Mark register_shrinker() as __must_check Tetsuo Handa
@ 2017-11-21 21:40 ` Andrew Morton
  2017-11-21 22:09   ` Tetsuo Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2017-11-21 21:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-mm, Dave Chinner, Al Viro, Jan Kara, Paolo Bonzini,
	David Airlie, Alex Deucher, Shaohua Li, Mike Snitzer

On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> There are users not checking for register_shrinker() failure.
> Continuing with ignoring failure can lead to later oops at
> unregister_shrinker().
> 
> ...
>
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -75,6 +75,6 @@ struct shrinker {
>  #define SHRINKER_NUMA_AWARE	(1 << 0)
>  #define SHRINKER_MEMCG_AWARE	(1 << 1)
>  
> -extern int register_shrinker(struct shrinker *);
> +extern __must_check int register_shrinker(struct shrinker *);
>  extern void unregister_shrinker(struct shrinker *);
>  #endif

hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
failing.

Affected code seems to be fs/xfs, fs/super.c, fs/quota,
arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
staging stuff.

I'm not sure this is worth bothering about?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-21 21:40 ` Andrew Morton
@ 2017-11-21 22:09   ` Tetsuo Handa
  2017-11-22 10:53     ` Tetsuo Handa
  2017-11-22 12:33     ` Michal Hocko
  0 siblings, 2 replies; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-21 22:09 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, david, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

Andrew Morton wrote:
> On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
> > There are users not checking for register_shrinker() failure.
> > Continuing with ignoring failure can lead to later oops at
> > unregister_shrinker().
> > 
> > ...
> >
> > --- a/include/linux/shrinker.h
> > +++ b/include/linux/shrinker.h
> > @@ -75,6 +75,6 @@ struct shrinker {
> >  #define SHRINKER_NUMA_AWARE	(1 << 0)
> >  #define SHRINKER_MEMCG_AWARE	(1 << 1)
> >  
> > -extern int register_shrinker(struct shrinker *);
> > +extern __must_check int register_shrinker(struct shrinker *);
> >  extern void unregister_shrinker(struct shrinker *);
> >  #endif
> 
> hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
> failing.

It failed by fault injection and resulted in a report at
http://lkml.kernel.org/r/001a113f996099503a055e793dd3@google.com .

> 
> Affected code seems to be fs/xfs, fs/super.c, fs/quota,
> arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
> staging stuff.
> 
> I'm not sure this is worth bothering about?
> 

Continuing with failed register_shrinker() is almost always wrong.
Though I don't know whether mm/zsmalloc.c case can make sense.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-21 22:09   ` Tetsuo Handa
@ 2017-11-22 10:53     ` Tetsuo Handa
  2017-11-22 12:45       ` Michal Hocko
  2017-11-22 20:39       ` Dave Chinner
  2017-11-22 12:33     ` Michal Hocko
  1 sibling, 2 replies; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-22 10:53 UTC (permalink / raw)
  To: akpm, glauber, mhocko
  Cc: linux-mm, david, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

Tetsuo Handa wrote:
> Andrew Morton wrote:
> > On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > 
> > > There are users not checking for register_shrinker() failure.
> > > Continuing with ignoring failure can lead to later oops at
> > > unregister_shrinker().
> > > 
> > > ...
> > >
> > > --- a/include/linux/shrinker.h
> > > +++ b/include/linux/shrinker.h
> > > @@ -75,6 +75,6 @@ struct shrinker {
> > >  #define SHRINKER_NUMA_AWARE	(1 << 0)
> > >  #define SHRINKER_MEMCG_AWARE	(1 << 1)
> > >  
> > > -extern int register_shrinker(struct shrinker *);
> > > +extern __must_check int register_shrinker(struct shrinker *);
> > >  extern void unregister_shrinker(struct shrinker *);
> > >  #endif
> > 
> > hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
> > failing.
> 
> It failed by fault injection and resulted in a report at
> http://lkml.kernel.org/r/001a113f996099503a055e793dd3@google.com .

Since kzalloc() can become > 32KB allocation if CONFIG_NODES_SHIFT > 12
(which might not be impossible in near future), register_shrinker() can
potentially become a costly allocation which might fail without invoking
the OOM killer. It is a good opportunity to think whether we should allow
register_shrinker() to fail.

> 
> > 
> > Affected code seems to be fs/xfs, fs/super.c, fs/quota,
> > arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
> > staging stuff.
> > 
> > I'm not sure this is worth bothering about?
> > 
> 
> Continuing with failed register_shrinker() is almost always wrong.
> Though I don't know whether mm/zsmalloc.c case can make sense.
> 

Thinking from the fact that register_shrinker() had been "void" until Linux 3.11
and we did not take appropriate precautions when changing to "int" in Linux 3.12,
we need to consider making register_shrinker() "void" again.

If we could agree with opening up the use of __GFP_NOFAIL for allocating a few
non-contiguous pages on large systems, we can make register_shrinker() "void"
again. (Draft patch is shown below. I choose array of kmalloc(PAGE_SIZE)
rather than kvmalloc() in order to use __GFP_NOFAIL.)


 include/linux/shrinker.h |    4 +++-
 mm/vmscan.c              |   31 ++++++++++++++++++++++---------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 388ff29..362a871 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -34,6 +34,8 @@ struct shrink_control {
 };
 
 #define SHRINK_STOP (~0UL)
+#define SHRINKER_SLOTS_PER_PAGE (PAGE_SIZE / sizeof(atomic_long_t))
+#define SHRINKER_SLOT_PAGES DIV_ROUND_UP(MAX_NUMNODES, SHRINKER_SLOTS_PER_PAGE)
 /*
  * A callback you can register to apply pressure to ageable caches.
  *
@@ -67,7 +69,7 @@ struct shrinker {
 	/* These are for internal use */
 	struct list_head list;
 	/* objs pending delete, per node */
-	atomic_long_t *nr_deferred;
+	atomic_long_t *nr_deferred[SHRINKER_SLOT_PAGES];
 };
 #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1c1bc95..da1f633 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -276,14 +276,19 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
  */
 int register_shrinker(struct shrinker *shrinker)
 {
-	size_t size = sizeof(*shrinker->nr_deferred);
+	int i;
+	size_t size = sizeof(atomic_long_t);
 
 	if (shrinker->flags & SHRINKER_NUMA_AWARE)
 		size *= nr_node_ids;
 
-	shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
-	if (!shrinker->nr_deferred)
-		return -ENOMEM;
+	for (i = 0; i < SHRINKER_SLOT_PAGES; i++) {
+		const size_t s = size >= PAGE_SIZE ? PAGE_SIZE : size;
+
+		size -= s;
+		shrinker->nr_deferred[i] = kzalloc(s,
+						   GFP_KERNEL | __GFP_NOFAIL);
+	}
 
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
@@ -297,10 +302,12 @@ int register_shrinker(struct shrinker *shrinker)
  */
 void unregister_shrinker(struct shrinker *shrinker)
 {
+	int i;
 	down_write(&shrinker_rwsem);
 	list_del(&shrinker->list);
 	up_write(&shrinker_rwsem);
-	kfree(shrinker->nr_deferred);
+	for (i = 0; i < SHRINKER_SLOT_PAGES; i++)
+		kfree(shrinker->nr_deferred[i]);
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
@@ -321,17 +328,24 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	long batch_size = shrinker->batch ? shrinker->batch
 					  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
+	atomic_long_t *nr_deferred;
 
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
 	if (freeable == 0)
 		return 0;
 
+	if (SHRINKER_SLOT_PAGES > 1)
+		nr_deferred = &shrinker->nr_deferred
+			[nid / SHRINKER_SLOTS_PER_PAGE]
+			[nid % SHRINKER_SLOTS_PER_PAGE];
+	else
+		nr_deferred = &shrinker->nr_deferred[0][nid];
 	/*
 	 * copy the current shrinker scan count into a local variable
 	 * and zero it so that other concurrent shrinker invocations
 	 * don't also do this scanning work.
 	 */
-	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
+	nr = atomic_long_xchg(nr_deferred, 0);
 
 	total_scan = nr;
 	delta = (4 * nr_scanned) / shrinker->seeks;
@@ -417,10 +431,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 * scan, there is no need to do an update.
 	 */
 	if (next_deferred > 0)
-		new_nr = atomic_long_add_return(next_deferred,
-						&shrinker->nr_deferred[nid]);
+		new_nr = atomic_long_add_return(next_deferred, nr_deferred);
 	else
-		new_nr = atomic_long_read(&shrinker->nr_deferred[nid]);
+		new_nr = atomic_long_read(nr_deferred);
 
 	trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan);
 	return freed;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-21 22:09   ` Tetsuo Handa
  2017-11-22 10:53     ` Tetsuo Handa
@ 2017-11-22 12:33     ` Michal Hocko
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2017-11-22 12:33 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, linux-mm, david, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

On Wed 22-11-17 07:09:33, Tetsuo Handa wrote:
> Andrew Morton wrote:
[...]
> > I'm not sure this is worth bothering about?
> > 
> 
> Continuing with failed register_shrinker() is almost always wrong.
> Though I don't know whether mm/zsmalloc.c case can make sense.

Well, strictly speaking ignoring an error from _any_ function is almost
always wrong. So I am not sure why __must_check is actually an
improvement.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-22 10:53     ` Tetsuo Handa
@ 2017-11-22 12:45       ` Michal Hocko
  2017-11-22 13:06         ` Tetsuo Handa
  2017-11-22 20:39       ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2017-11-22 12:45 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, glauber, linux-mm, david, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

On Wed 22-11-17 19:53:59, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Andrew Morton wrote:
> > > On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > > 
> > > > There are users not checking for register_shrinker() failure.
> > > > Continuing with ignoring failure can lead to later oops at
> > > > unregister_shrinker().
> > > > 
> > > > ...
> > > >
> > > > --- a/include/linux/shrinker.h
> > > > +++ b/include/linux/shrinker.h
> > > > @@ -75,6 +75,6 @@ struct shrinker {
> > > >  #define SHRINKER_NUMA_AWARE	(1 << 0)
> > > >  #define SHRINKER_MEMCG_AWARE	(1 << 1)
> > > >  
> > > > -extern int register_shrinker(struct shrinker *);
> > > > +extern __must_check int register_shrinker(struct shrinker *);
> > > >  extern void unregister_shrinker(struct shrinker *);
> > > >  #endif
> > > 
> > > hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
> > > failing.
> > 
> > It failed by fault injection and resulted in a report at
> > http://lkml.kernel.org/r/001a113f996099503a055e793dd3@google.com .
> 
> Since kzalloc() can become > 32KB allocation if CONFIG_NODES_SHIFT > 12
> (which might not be impossible in near future), register_shrinker() can
> potentially become a costly allocation which might fail without invoking
> the OOM killer. It is a good opportunity to think whether we should allow
> register_shrinker() to fail.

Is it really that hard to fix callers to handle the error?

> > > Affected code seems to be fs/xfs, fs/super.c, fs/quota,
> > > arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
> > > staging stuff.
> > > 
> > > I'm not sure this is worth bothering about?
> > > 
> > 
> > Continuing with failed register_shrinker() is almost always wrong.
> > Though I don't know whether mm/zsmalloc.c case can make sense.
> > 
> 
> Thinking from the fact that register_shrinker() had been "void" until Linux 3.11
> and we did not take appropriate precautions when changing to "int" in Linux 3.12,
> we need to consider making register_shrinker() "void" again.
> 
> If we could agree with opening up the use of __GFP_NOFAIL for allocating a few
> non-contiguous pages on large systems, we can make register_shrinker() "void"
> again. (Draft patch is shown below. I choose array of kmalloc(PAGE_SIZE)
> rather than kvmalloc() in order to use __GFP_NOFAIL.)

I am not sure we want to overcomplicate the code too much. Most
architectures do not have that many numa nodes to care. If we really
need to care maybe we should rethink and get rid of the per numa
deferred count altogether.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-22 12:45       ` Michal Hocko
@ 2017-11-22 13:06         ` Tetsuo Handa
  2017-11-22 14:31           ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-22 13:06 UTC (permalink / raw)
  To: mhocko
  Cc: akpm, glauber, linux-mm, david, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

Michal Hocko wrote:
> On Wed 22-11-17 19:53:59, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > > Andrew Morton wrote:
> > > > On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > > > 
> > > > > There are users not checking for register_shrinker() failure.
> > > > > Continuing with ignoring failure can lead to later oops at
> > > > > unregister_shrinker().
> > > > > 
> > > > > ...
> > > > >
> > > > > --- a/include/linux/shrinker.h
> > > > > +++ b/include/linux/shrinker.h
> > > > > @@ -75,6 +75,6 @@ struct shrinker {
> > > > >  #define SHRINKER_NUMA_AWARE	(1 << 0)
> > > > >  #define SHRINKER_MEMCG_AWARE	(1 << 1)
> > > > >  
> > > > > -extern int register_shrinker(struct shrinker *);
> > > > > +extern __must_check int register_shrinker(struct shrinker *);
> > > > >  extern void unregister_shrinker(struct shrinker *);
> > > > >  #endif
> > > > 
> > > > hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
> > > > failing.
> > > 
> > > It failed by fault injection and resulted in a report at
> > > http://lkml.kernel.org/r/001a113f996099503a055e793dd3@google.com .
> > 
> > Since kzalloc() can become > 32KB allocation if CONFIG_NODES_SHIFT > 12
> > (which might not be impossible in near future), register_shrinker() can
> > potentially become a costly allocation which might fail without invoking
> > the OOM killer. It is a good opportunity to think whether we should allow
> > register_shrinker() to fail.
> 
> Is it really that hard to fix callers to handle the error?

At least adding __must_check will encourage callers to check for an error.
But

> 
> > > > Affected code seems to be fs/xfs, fs/super.c, fs/quota,
> > > > arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
> > > > staging stuff.
> > > > 
> > > > I'm not sure this is worth bothering about?
> > > > 
> > > 
> > > Continuing with failed register_shrinker() is almost always wrong.
> > > Though I don't know whether mm/zsmalloc.c case can make sense.
> > > 
> > 
> > Thinking from the fact that register_shrinker() had been "void" until Linux 3.11
> > and we did not take appropriate precautions when changing to "int" in Linux 3.12,
> > we need to consider making register_shrinker() "void" again.
> > 
> > If we could agree with opening up the use of __GFP_NOFAIL for allocating a few
> > non-contiguous pages on large systems, we can make register_shrinker() "void"
> > again. (Draft patch is shown below. I choose array of kmalloc(PAGE_SIZE)
> > rather than kvmalloc() in order to use __GFP_NOFAIL.)
> 
> I am not sure we want to overcomplicate the code too much. Most
> architectures do not have that many numa nodes to care. If we really
> need to care maybe we should rethink and get rid of the per numa
> deferred count altogether.

the amount of changes needed for checking for an error will exceed the amount of
changes needed for making register_shrinker() not to return an error.
Do we want to overcomplicate register_shrinker() callers?

I think that making register_shrinker() not to fail is less error-prone than
updating register_shrinker() callers to check for failure. Thus, I appreciate
if we could agree with __GFP_NOFAIL for register_shrinker().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-22 13:06         ` Tetsuo Handa
@ 2017-11-22 14:31           ` Paolo Bonzini
  2017-11-22 14:36             ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2017-11-22 14:31 UTC (permalink / raw)
  To: Tetsuo Handa, mhocko
  Cc: akpm, glauber, linux-mm, david, viro, jack, airlied,
	alexander.deucher, shli, snitzer

On 22/11/2017 14:06, Tetsuo Handa wrote:
>> I am not sure we want to overcomplicate the code too much. Most
>> architectures do not have that many numa nodes to care. If we really
>> need to care maybe we should rethink and get rid of the per numa
>> deferred count altogether.
> the amount of changes needed for checking for an error will exceed the amount of
> changes needed for making register_shrinker() not to return an error.
> Do we want to overcomplicate register_shrinker() callers?

For KVM it's not a big deal, fixing kvm_mmu_module_init to check the
return value is trivial.

Paolo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-22 14:31           ` Paolo Bonzini
@ 2017-11-22 14:36             ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2017-11-22 14:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tetsuo Handa, akpm, glauber, linux-mm, david, viro, jack,
	airlied, alexander.deucher, shli, snitzer

On Wed 22-11-17 15:31:14, Paolo Bonzini wrote:
> On 22/11/2017 14:06, Tetsuo Handa wrote:
> >> I am not sure we want to overcomplicate the code too much. Most
> >> architectures do not have that many numa nodes to care. If we really
> >> need to care maybe we should rethink and get rid of the per numa
> >> deferred count altogether.
> > the amount of changes needed for checking for an error will exceed the amount of
> > changes needed for making register_shrinker() not to return an error.
> > Do we want to overcomplicate register_shrinker() callers?
> 
> For KVM it's not a big deal, fixing kvm_mmu_module_init to check the
> return value is trivial.

I suspect others will be in a similar situation. I've tried to do so for
sget_userns [1] and it didn't look terrible either.

[1] http://lkml.kernel.org/r/20171121140500.bgkpwcdk2dxesao4@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-22 10:53     ` Tetsuo Handa
  2017-11-22 12:45       ` Michal Hocko
@ 2017-11-22 20:39       ` Dave Chinner
  2017-11-23  6:34         ` Tetsuo Handa
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2017-11-22 20:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, glauber, mhocko, linux-mm, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

On Wed, Nov 22, 2017 at 07:53:59PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Andrew Morton wrote:
> > > On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > > 
> > > > There are users not checking for register_shrinker() failure.
> > > > Continuing with ignoring failure can lead to later oops at
> > > > unregister_shrinker().
> > > > 
> > > > ...
> > > >
> > > > --- a/include/linux/shrinker.h
> > > > +++ b/include/linux/shrinker.h
> > > > @@ -75,6 +75,6 @@ struct shrinker {
> > > >  #define SHRINKER_NUMA_AWARE	(1 << 0)
> > > >  #define SHRINKER_MEMCG_AWARE	(1 << 1)
> > > >  
> > > > -extern int register_shrinker(struct shrinker *);
> > > > +extern __must_check int register_shrinker(struct shrinker *);
> > > >  extern void unregister_shrinker(struct shrinker *);
> > > >  #endif
> > > 
> > > hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
> > > failing.
> > 
> > It failed by fault injection and resulted in a report at
> > http://lkml.kernel.org/r/001a113f996099503a055e793dd3@google.com .
> 
> Since kzalloc() can become > 32KB allocation if CONFIG_NODES_SHIFT > 12
> (which might not be impossible in near future), register_shrinker() can
> potentially become a costly allocation which might fail without invoking
> the OOM killer. It is a good opportunity to think whether we should allow
> register_shrinker() to fail.

Just fix the numa aware shrinkers, as they are the only ones that
will have this problem. There are only 6 of them, and only the 3
that existed at the time that register_shrinker() was changed to
return an error fail to check for an error. i.e. the superblock
shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.

Seems pretty straight forward to me....

> > > Affected code seems to be fs/xfs, fs/super.c, fs/quota,
> > > arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
> > > staging stuff.
> > > 
> > > I'm not sure this is worth bothering about?
> > > 
> > 
> > Continuing with failed register_shrinker() is almost always wrong.
> > Though I don't know whether mm/zsmalloc.c case can make sense.
> > 
> 
> Thinking from the fact that register_shrinker() had been "void" until Linux 3.11
> and we did not take appropriate precautions when changing to "int" in Linux 3.12,
> we need to consider making register_shrinker() "void" again.
> 
> If we could agree with opening up the use of __GFP_NOFAIL for allocating a few
> non-contiguous pages on large systems, we can make register_shrinker() "void"
> again. (Draft patch is shown below. I choose array of kmalloc(PAGE_SIZE)
> rather than kvmalloc() in order to use __GFP_NOFAIL.)

That's insane. NACK.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-22 20:39       ` Dave Chinner
@ 2017-11-23  6:34         ` Tetsuo Handa
  2017-11-23  8:02           ` Michal Hocko
  2017-11-23  9:21           ` Paolo Bonzini
  0 siblings, 2 replies; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-23  6:34 UTC (permalink / raw)
  To: david, mhocko
  Cc: akpm, glauber, linux-mm, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

Dave Chinner wrote:
> On Wed, Nov 22, 2017 at 07:53:59PM +0900, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > > Andrew Morton wrote:
> > > > On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > > > 
> > > > > There are users not checking for register_shrinker() failure.
> > > > > Continuing with ignoring failure can lead to later oops at
> > > > > unregister_shrinker().
> > > > > 
> > > > > ...
> > > > >
> > > > > --- a/include/linux/shrinker.h
> > > > > +++ b/include/linux/shrinker.h
> > > > > @@ -75,6 +75,6 @@ struct shrinker {
> > > > >  #define SHRINKER_NUMA_AWARE	(1 << 0)
> > > > >  #define SHRINKER_MEMCG_AWARE	(1 << 1)
> > > > >  
> > > > > -extern int register_shrinker(struct shrinker *);
> > > > > +extern __must_check int register_shrinker(struct shrinker *);
> > > > >  extern void unregister_shrinker(struct shrinker *);
> > > > >  #endif
> > > > 
> > > > hm, well, OK, it's a small kmalloc(GFP_KERNEL).  That won't be
> > > > failing.
> > > 
> > > It failed by fault injection and resulted in a report at
> > > http://lkml.kernel.org/r/001a113f996099503a055e793dd3@google.com .
> > 
> > Since kzalloc() can become > 32KB allocation if CONFIG_NODES_SHIFT > 12
> > (which might not be impossible in near future), register_shrinker() can
> > potentially become a costly allocation which might fail without invoking
> > the OOM killer. It is a good opportunity to think whether we should allow
> > register_shrinker() to fail.
> 
> Just fix the numa aware shrinkers, as they are the only ones that
> will have this problem. There are only 6 of them, and only the 3
> that existed at the time that register_shrinker() was changed to
> return an error fail to check for an error. i.e. the superblock
> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.

You are assuming the "too small to fail" memory-allocation rule
by ignoring that this problem is caused by fault injection.

> 
> Seems pretty straight forward to me....
> 
> > > > Affected code seems to be fs/xfs, fs/super.c, fs/quota,
> > > > arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of
> > > > staging stuff.
> > > > 
> > > > I'm not sure this is worth bothering about?
> > > > 
> > > 
> > > Continuing with failed register_shrinker() is almost always wrong.
> > > Though I don't know whether mm/zsmalloc.c case can make sense.
> > > 
> > 
> > Thinking from the fact that register_shrinker() had been "void" until Linux 3.11
> > and we did not take appropriate precautions when changing to "int" in Linux 3.12,
> > we need to consider making register_shrinker() "void" again.
> > 
> > If we could agree with opening up the use of __GFP_NOFAIL for allocating a few
> > non-contiguous pages on large systems, we can make register_shrinker() "void"
> > again. (Draft patch is shown below. I choose array of kmalloc(PAGE_SIZE)
> > rather than kvmalloc() in order to use __GFP_NOFAIL.)
> 
> That's insane. NACK.

That does not solve the problem. We have fault injection which allows precisely
N'th memory allocation request. As long as we fix only numa aware shrinkers,
fault injection will still allow numa unaware shrinkers to crash.

We need to make sure that all shrinkers are ready to handle allocation request,
or make register_shrinker() never fail, or (a different approach shown below)
let register_shrinker() fallback to numa unaware if memory allocation request
failed (because Michal is assuming that most architectures do not have that
many numa nodes to care which means that kmalloc() unlikely fails).

 include/linux/shrinker.h |  4 +++-
 mm/vmscan.c              | 28 ++++++++++++++--------------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index a389491..9caf67b 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -66,6 +66,8 @@ struct shrinker {
 
 	/* These are for internal use */
 	struct list_head list;
+	/* objs pending delete, global */
+	atomic_long_t nr_deferred_global;
 	/* objs pending delete, per node */
 	atomic_long_t *nr_deferred;
 };
@@ -75,6 +77,6 @@ struct shrinker {
 #define SHRINKER_NUMA_AWARE	(1 << 0)
 #define SHRINKER_MEMCG_AWARE	(1 << 1)
 
-extern __must_check int register_shrinker(struct shrinker *);
+extern int register_shrinker(struct shrinker *); /* Always returns 0. */
 extern void unregister_shrinker(struct shrinker *);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a5a72b..7a54229 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -276,15 +276,13 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
  */
 int register_shrinker(struct shrinker *shrinker)
 {
-	size_t size = sizeof(*shrinker->nr_deferred);
-
-	if (shrinker->flags & SHRINKER_NUMA_AWARE)
-		size *= nr_node_ids;
-
-	shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
-	if (!shrinker->nr_deferred)
-		return -ENOMEM;
-
+	atomic_long_set(&shrinker->nr_deferred_global, 0);
+	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		shrinker->nr_deferred = kzalloc(sizeof(atomic_long_t) *
+						nr_node_ids, GFP_KERNEL);
+		if (!shrinker->nr_deferred)
+			shrinker->flags &= ~SHRINKER_NUMA_AWARE;
+	}
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
 	up_write(&shrinker_rwsem);
@@ -300,7 +298,8 @@ void unregister_shrinker(struct shrinker *shrinker)
 	down_write(&shrinker_rwsem);
 	list_del(&shrinker->list);
 	up_write(&shrinker_rwsem);
-	kfree(shrinker->nr_deferred);
+	if (shrinker->flags & SHRINKER_NUMA_AWARE)
+		kfree(shrinker->nr_deferred);
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
@@ -319,6 +318,8 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	long batch_size = shrinker->batch ? shrinker->batch
 					  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
+	atomic_long_t *nr_deferred = (shrinker->flags & SHRINKER_NUMA_AWARE) ?
+		&shrinker->nr_deferred[nid] : &shrinker->nr_deferred_global;
 
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
 	if (freeable == 0)
@@ -329,7 +330,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 * and zero it so that other concurrent shrinker invocations
 	 * don't also do this scanning work.
 	 */
-	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
+	nr = atomic_long_xchg(nr_deferred, 0);
 
 	total_scan = nr;
 	delta = freeable >> priority;
@@ -414,10 +415,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 * scan, there is no need to do an update.
 	 */
 	if (next_deferred > 0)
-		new_nr = atomic_long_add_return(next_deferred,
-						&shrinker->nr_deferred[nid]);
+		new_nr = atomic_long_add_return(next_deferred, nr_deferred);
 	else
-		new_nr = atomic_long_read(&shrinker->nr_deferred[nid]);
+		new_nr = atomic_long_read(nr_deferred);
 
 	trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan);
 	return freed;
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23  6:34         ` Tetsuo Handa
@ 2017-11-23  8:02           ` Michal Hocko
  2017-11-23  9:21           ` Paolo Bonzini
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2017-11-23  8:02 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: david, akpm, glauber, linux-mm, viro, jack, pbonzini, airlied,
	alexander.deucher, shli, snitzer

On Thu 23-11-17 15:34:13, Tetsuo Handa wrote:
> Dave Chinner wrote:
[...]
> > Just fix the numa aware shrinkers, as they are the only ones that
> > will have this problem. There are only 6 of them, and only the 3
> > that existed at the time that register_shrinker() was changed to
> > return an error fail to check for an error. i.e. the superblock
> > shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.

Absolutely agreed! I haven't checked other shrinkers but those should be
quite easy to fix as well.

> You are assuming the "too small to fail" memory-allocation rule
> by ignoring that this problem is caused by fault injection.

Which is a non-argument because _nobody_ sane runs fault injection on
production systems.

[...]

> We need to make sure that all shrinkers are ready to handle allocation request,
> or make register_shrinker() never fail, or (a different approach shown below)
> let register_shrinker() fallback to numa unaware if memory allocation request
> failed (because Michal is assuming that most architectures do not have that
> many numa nodes to care which means that kmalloc() unlikely fails).

This is just insane.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23  6:34         ` Tetsuo Handa
  2017-11-23  8:02           ` Michal Hocko
@ 2017-11-23  9:21           ` Paolo Bonzini
  2017-11-23  9:56             ` Tetsuo Handa
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2017-11-23  9:21 UTC (permalink / raw)
  To: Tetsuo Handa, david, mhocko
  Cc: akpm, glauber, linux-mm, viro, jack, airlied, alexander.deucher,
	shli, snitzer

On 23/11/2017 07:34, Tetsuo Handa wrote:
>> Just fix the numa aware shrinkers, as they are the only ones that
>> will have this problem. There are only 6 of them, and only the 3
>> that existed at the time that register_shrinker() was changed to
>> return an error fail to check for an error. i.e. the superblock
>> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.
>
> You are assuming the "too small to fail" memory-allocation rule
> by ignoring that this problem is caused by fault injection.

Fault injection should also obey the too small to fail rule, at least by
default.

Paolo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23  9:21           ` Paolo Bonzini
@ 2017-11-23  9:56             ` Tetsuo Handa
  2017-11-23 10:02               ` Michal Hocko
  2017-11-23 10:43               ` Paolo Bonzini
  0 siblings, 2 replies; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-23  9:56 UTC (permalink / raw)
  To: pbonzini
  Cc: david, mhocko, akpm, glauber, linux-mm, viro, jack, airlied,
	alexander.deucher, shli, snitzer

Paolo Bonzini wrote:
> On 23/11/2017 07:34, Tetsuo Handa wrote:
> >> Just fix the numa aware shrinkers, as they are the only ones that
> >> will have this problem. There are only 6 of them, and only the 3
> >> that existed at the time that register_shrinker() was changed to
> >> return an error fail to check for an error. i.e. the superblock
> >> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.
> >
> > You are assuming the "too small to fail" memory-allocation rule
> > by ignoring that this problem is caused by fault injection.
> 
> Fault injection should also obey the too small to fail rule, at least by
> default.
> 

Pardon? Most allocation requests in the kernel are <= 32KB.
Such change makes fault injection useless. ;-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23  9:56             ` Tetsuo Handa
@ 2017-11-23 10:02               ` Michal Hocko
  2017-11-23 10:38                 ` Tetsuo Handa
  2017-11-23 10:43               ` Paolo Bonzini
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2017-11-23 10:02 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: pbonzini, david, akpm, glauber, linux-mm, viro, jack, airlied,
	alexander.deucher, shli, snitzer

On Thu 23-11-17 18:56:53, Tetsuo Handa wrote:
> Paolo Bonzini wrote:
> > On 23/11/2017 07:34, Tetsuo Handa wrote:
> > >> Just fix the numa aware shrinkers, as they are the only ones that
> > >> will have this problem. There are only 6 of them, and only the 3
> > >> that existed at the time that register_shrinker() was changed to
> > >> return an error fail to check for an error. i.e. the superblock
> > >> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.
> > >
> > > You are assuming the "too small to fail" memory-allocation rule
> > > by ignoring that this problem is caused by fault injection.
> > 
> > Fault injection should also obey the too small to fail rule, at least by
> > default.
> > 
> 
> Pardon? Most allocation requests in the kernel are <= 32KB.
> Such change makes fault injection useless. ;-)

Agreed! All we need is to fix the shrinker registration callers. It is
that simple. The rest is just a distraction.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23 10:02               ` Michal Hocko
@ 2017-11-23 10:38                 ` Tetsuo Handa
  0 siblings, 0 replies; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-23 10:38 UTC (permalink / raw)
  To: mhocko
  Cc: pbonzini, david, akpm, glauber, linux-mm, viro, jack, airlied,
	alexander.deucher, shli, snitzer, sfr

Michal Hocko wrote:
> On Thu 23-11-17 18:56:53, Tetsuo Handa wrote:
> > Paolo Bonzini wrote:
> > > On 23/11/2017 07:34, Tetsuo Handa wrote:
> > > >> Just fix the numa aware shrinkers, as they are the only ones that
> > > >> will have this problem. There are only 6 of them, and only the 3
> > > >> that existed at the time that register_shrinker() was changed to
> > > >> return an error fail to check for an error. i.e. the superblock
> > > >> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.
> > > >
> > > > You are assuming the "too small to fail" memory-allocation rule
> > > > by ignoring that this problem is caused by fault injection.
> > > 
> > > Fault injection should also obey the too small to fail rule, at least by
> > > default.
> > > 
> > 
> > Pardon? Most allocation requests in the kernel are <= 32KB.
> > Such change makes fault injection useless. ;-)
> 
> Agreed! All we need is to fix the shrinker registration callers. It is
> that simple. The rest is just a distraction.
> 

Which coverage (all register_shrinker() callers or only SHRINKER_NUMA_AWARE
callers) are you talking about? If the former, keeping __must_check is OK.
If the latter, it will not avoid future oops reports with fault injection.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23  9:56             ` Tetsuo Handa
  2017-11-23 10:02               ` Michal Hocko
@ 2017-11-23 10:43               ` Paolo Bonzini
  2017-11-23 11:03                 ` Tetsuo Handa
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2017-11-23 10:43 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: david, mhocko, akpm, glauber, linux-mm, viro, jack, airlied,
	alexander.deucher, shli, snitzer

On 23/11/2017 10:56, Tetsuo Handa wrote:
> Paolo Bonzini wrote:
>> On 23/11/2017 07:34, Tetsuo Handa wrote:
>>>> Just fix the numa aware shrinkers, as they are the only ones that
>>>> will have this problem. There are only 6 of them, and only the 3
>>>> that existed at the time that register_shrinker() was changed to
>>>> return an error fail to check for an error. i.e. the superblock
>>>> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.
>>>
>>> You are assuming the "too small to fail" memory-allocation rule
>>> by ignoring that this problem is caused by fault injection.
>>
>> Fault injection should also obey the too small to fail rule, at least by
>> default.
> 
> Pardon? Most allocation requests in the kernel are <= 32KB.
> Such change makes fault injection useless. ;-)

But if these calls are "too small to fail", you are injecting a fault on
something that cannot fail anyway.  Unless you're aiming at removing
"too small to fail", then I understand.

Paolo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm,vmscan: Mark register_shrinker() as __must_check
  2017-11-23 10:43               ` Paolo Bonzini
@ 2017-11-23 11:03                 ` Tetsuo Handa
  0 siblings, 0 replies; 18+ messages in thread
From: Tetsuo Handa @ 2017-11-23 11:03 UTC (permalink / raw)
  To: pbonzini
  Cc: david, mhocko, akpm, glauber, linux-mm, viro, jack, airlied,
	alexander.deucher, shli, snitzer

Paolo Bonzini wrote:
> On 23/11/2017 10:56, Tetsuo Handa wrote:
> > Paolo Bonzini wrote:
> >> On 23/11/2017 07:34, Tetsuo Handa wrote:
> >>>> Just fix the numa aware shrinkers, as they are the only ones that
> >>>> will have this problem. There are only 6 of them, and only the 3
> >>>> that existed at the time that register_shrinker() was changed to
> >>>> return an error fail to check for an error. i.e. the superblock
> >>>> shrinker, the XFS dquot shrinker and the XFS buffer cache shrinker.
> >>>
> >>> You are assuming the "too small to fail" memory-allocation rule
> >>> by ignoring that this problem is caused by fault injection.
> >>
> >> Fault injection should also obey the too small to fail rule, at least by
> >> default.
> > 
> > Pardon? Most allocation requests in the kernel are <= 32KB.
> > Such change makes fault injection useless. ;-)
> 
> But if these calls are "too small to fail", you are injecting a fault on
> something that cannot fail anyway.  Unless you're aiming at removing
> "too small to fail", then I understand.
> 

The "too small to fail" does not mean that small allocations cannot fail.
It unlikely fails because it will retry forever unless killed as an OOM victim,
and currently we allow it to try memory allocation from memory reserves when
killed as an OOM victim. But such assumption is fragile. We unexpectedly
forget to allow it to try memory allocation from memory reserves (e.g. commit 
c288983dddf71421 ("mm/page_alloc.c: make sure OOM victim can try allocations
with no watermarks once")). There is no guarantee that small allocations
will not fail in future.

To me, using __GFP_NOFAIL for register_shrinker() sounds the simplest fix.
Adding error handling code to all register_shrinker() callers is not worth
bothering. In most runtime environments, kmalloc() in register_shrinker()
will be order 0.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-11-23 11:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-21 12:02 [PATCH] mm,vmscan: Mark register_shrinker() as __must_check Tetsuo Handa
2017-11-21 21:40 ` Andrew Morton
2017-11-21 22:09   ` Tetsuo Handa
2017-11-22 10:53     ` Tetsuo Handa
2017-11-22 12:45       ` Michal Hocko
2017-11-22 13:06         ` Tetsuo Handa
2017-11-22 14:31           ` Paolo Bonzini
2017-11-22 14:36             ` Michal Hocko
2017-11-22 20:39       ` Dave Chinner
2017-11-23  6:34         ` Tetsuo Handa
2017-11-23  8:02           ` Michal Hocko
2017-11-23  9:21           ` Paolo Bonzini
2017-11-23  9:56             ` Tetsuo Handa
2017-11-23 10:02               ` Michal Hocko
2017-11-23 10:38                 ` Tetsuo Handa
2017-11-23 10:43               ` Paolo Bonzini
2017-11-23 11:03                 ` Tetsuo Handa
2017-11-22 12:33     ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.