All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] shrinker: fix a bug when callback returns -1
@ 2011-08-29 19:36 Mikulas Patocka
  2011-08-30  2:13 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Mikulas Patocka @ 2011-08-29 19:36 UTC (permalink / raw)
  To: Dave Chinner, Al Viro; +Cc: Christoph Hellwig, dm-devel, linux-kernel

Hi

This patch fixes a lockup when shrinker callback returns -1.

BTW. shouldn't the value returned by callback be long instead of int? On 
64-bit architectures, there may be more than 2^32 entries allocated.

Mikulas

---

shrinker: fix a bug when callback returns -1

Shrinker callback can return -1 if it is at a risk of deadlock.
However, this is not tested at some places.

If do_shrinker_shrink returns -1 here
"max_pass = do_shrinker_shrink(shrinker, shrink, 0)",
it is converted to an unsigned long integer. This may result in excessive
total_scan value and a lockup due to code looping too much in
"while (total_scan >= batch_size)" cycle.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 mm/vmscan.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-3.1-rc3-fast/mm/vmscan.c
===================================================================
--- linux-3.1-rc3-fast.orig/mm/vmscan.c	2011-08-29 20:34:27.000000000 +0200
+++ linux-3.1-rc3-fast/mm/vmscan.c	2011-08-29 20:37:38.000000000 +0200
@@ -250,6 +250,7 @@ unsigned long shrink_slab(struct shrink_
 		unsigned long long delta;
 		unsigned long total_scan;
 		unsigned long max_pass;
+		int sr;
 		int shrink_ret = 0;
 		long nr;
 		long new_nr;
@@ -266,7 +267,10 @@ unsigned long shrink_slab(struct shrink_
 		} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
 
 		total_scan = nr;
-		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
+		sr = do_shrinker_shrink(shrinker, shrink, 0);
+		if (sr == -1)
+			continue;
+		max_pass = sr;
 		delta = (4 * nr_pages_scanned) / shrinker->seeks;
 		delta *= max_pass;
 		do_div(delta, lru_pages + 1);
@@ -309,6 +313,8 @@ unsigned long shrink_slab(struct shrink_
 			int nr_before;
 
 			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
+			if (nr_before == -1)
+				break;
 			shrink_ret = do_shrinker_shrink(shrinker, shrink,
 							batch_size);
 			if (shrink_ret == -1)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] shrinker: fix a bug when callback returns -1
  2011-08-29 19:36 [PATCH] shrinker: fix a bug when callback returns -1 Mikulas Patocka
@ 2011-08-30  2:13 ` Dave Chinner
  2011-08-30 19:52   ` Mikulas Patocka
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2011-08-30  2:13 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, dm-devel, linux-kernel

On Mon, Aug 29, 2011 at 03:36:48PM -0400, Mikulas Patocka wrote:
> Hi
> 
> This patch fixes a lockup when shrinker callback returns -1.

What lockup is that? I haven't seen any bug reports, and this code
has been like this for several years, so I'm kind of wondering why
this is suddenly an issue....

> BTW. shouldn't the value returned by callback be long instead of int? On 
> 64-bit architectures, there may be more than 2^32 entries allocated.

The API hasn't changed since the early 2.5 series, so that wasn't a
consideration when it was originally written. As it is, I make this
exact change in the shrinker API update patchset I proposed
recently for exactly the reasons you suggest:

http://thread.gmane.org/gmane.linux.kernel.mm/67326


> Mikulas
> 
> ---
> 
> shrinker: fix a bug when callback returns -1
> 
> Shrinker callback can return -1 if it is at a risk of deadlock.
> However, this is not tested at some places.
> 
> If do_shrinker_shrink returns -1 here
> "max_pass = do_shrinker_shrink(shrinker, shrink, 0)",
> it is converted to an unsigned long integer. This may result in excessive
> total_scan value and a lockup due to code looping too much in
> "while (total_scan >= batch_size)" cycle.

> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  mm/vmscan.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> Index: linux-3.1-rc3-fast/mm/vmscan.c
> ===================================================================
> --- linux-3.1-rc3-fast.orig/mm/vmscan.c	2011-08-29 20:34:27.000000000 +0200
> +++ linux-3.1-rc3-fast/mm/vmscan.c	2011-08-29 20:37:38.000000000 +0200
> @@ -250,6 +250,7 @@ unsigned long shrink_slab(struct shrink_
>  		unsigned long long delta;
>  		unsigned long total_scan;
>  		unsigned long max_pass;
> +		int sr;
>  		int shrink_ret = 0;
>  		long nr;
>  		long new_nr;
> @@ -266,7 +267,10 @@ unsigned long shrink_slab(struct shrink_
>  		} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
>  
>  		total_scan = nr;
> -		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> +		sr = do_shrinker_shrink(shrinker, shrink, 0);
> +		if (sr == -1)
> +			continue;

IIRC from my recent shrinker audit, none of the existing shrinkers
return return -1 when nr_to_scan == 0, so this check has never been
necessary.

> +		max_pass = sr;
>  		delta = (4 * nr_pages_scanned) / shrinker->seeks;
>  		delta *= max_pass;
>  		do_div(delta, lru_pages + 1);
> @@ -309,6 +313,8 @@ unsigned long shrink_slab(struct shrink_
>  			int nr_before;
>  
>  			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> +			if (nr_before == -1)
> +				break;

Same here.

>  			shrink_ret = do_shrinker_shrink(shrinker, shrink,
>  							batch_size);
>  			if (shrink_ret == -1)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] shrinker: fix a bug when callback returns -1
  2011-08-30  2:13 ` Dave Chinner
@ 2011-08-30 19:52   ` Mikulas Patocka
  2011-08-30 22:37     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Mikulas Patocka @ 2011-08-30 19:52 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, dm-devel, linux-kernel

On Tue, 30 Aug 2011, Dave Chinner wrote:

> On Mon, Aug 29, 2011 at 03:36:48PM -0400, Mikulas Patocka wrote:
> > Hi
> > 
> > This patch fixes a lockup when shrinker callback returns -1.
> 
> What lockup is that? I haven't seen any bug reports, and this code
> has been like this for several years, so I'm kind of wondering why
> this is suddenly an issue....

I got the lockups when modifying my own dm-bufio code to use the shrinker. 
The reason for lockups was that the variable total_scan contained 
extremely high values.

The only possible way how such extreme values could be stored in 
total_scan was this:

max_pass = do_shrinker_shrink(shrinker, shrink, 0);
delta = (4 * nr_pages_scanned) / shrinker->seeks;
delta *= max_pass;
do_div(delta, lru_pages + 1);
total_scan += delta;

--- you don't test if do_shinker_shrink retuned -1 here. The variables are 
unsigned long, so you end up adding extreme value (approximately 
2^64/(lru_pages+1) to total_scan.

Note that some existing shrinkers contain workaround for this (something 
like "return nr_to_scan ? -1 : 0", while some can still return -1 when 
nr_to_scan is 0 and trigger this bug (prune_super).

> > BTW. shouldn't the value returned by callback be long instead of int? On 
> > 64-bit architectures, there may be more than 2^32 entries allocated.
> 
> The API hasn't changed since the early 2.5 series, so that wasn't a
> consideration when it was originally written. As it is, I make this
> exact change in the shrinker API update patchset I proposed
> recently for exactly the reasons you suggest:
> 
> http://thread.gmane.org/gmane.linux.kernel.mm/67326
> 
> 
> > Mikulas
> > 
> > ---
> > 
> > shrinker: fix a bug when callback returns -1
> > 
> > Shrinker callback can return -1 if it is at a risk of deadlock.
> > However, this is not tested at some places.
> > 
> > If do_shrinker_shrink returns -1 here
> > "max_pass = do_shrinker_shrink(shrinker, shrink, 0)",
> > it is converted to an unsigned long integer. This may result in excessive
> > total_scan value and a lockup due to code looping too much in
> > "while (total_scan >= batch_size)" cycle.
> 
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > ---
> >  mm/vmscan.c |    8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > Index: linux-3.1-rc3-fast/mm/vmscan.c
> > ===================================================================
> > --- linux-3.1-rc3-fast.orig/mm/vmscan.c	2011-08-29 20:34:27.000000000 +0200
> > +++ linux-3.1-rc3-fast/mm/vmscan.c	2011-08-29 20:37:38.000000000 +0200
> > @@ -250,6 +250,7 @@ unsigned long shrink_slab(struct shrink_
> >  		unsigned long long delta;
> >  		unsigned long total_scan;
> >  		unsigned long max_pass;
> > +		int sr;
> >  		int shrink_ret = 0;
> >  		long nr;
> >  		long new_nr;
> > @@ -266,7 +267,10 @@ unsigned long shrink_slab(struct shrink_
> >  		} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
> >  
> >  		total_scan = nr;
> > -		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> > +		sr = do_shrinker_shrink(shrinker, shrink, 0);
> > +		if (sr == -1)
> > +			continue;
> 
> IIRC from my recent shrinker audit, none of the existing shrinkers
> return return -1 when nr_to_scan == 0, so this check has never been
> necessary.
> 
> > +		max_pass = sr;
> >  		delta = (4 * nr_pages_scanned) / shrinker->seeks;
> >  		delta *= max_pass;
> >  		do_div(delta, lru_pages + 1);
> > @@ -309,6 +313,8 @@ unsigned long shrink_slab(struct shrink_
> >  			int nr_before;
> >  
> >  			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> > +			if (nr_before == -1)
> > +				break;
> 
> Same here.
> 
> >  			shrink_ret = do_shrinker_shrink(shrinker, shrink,
> >  							batch_size);
> >  			if (shrink_ret == -1)
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] shrinker: fix a bug when callback returns -1
  2011-08-30 19:52   ` Mikulas Patocka
@ 2011-08-30 22:37     ` Dave Chinner
  2011-08-31 13:31       ` Mikulas Patocka
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2011-08-30 22:37 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, dm-devel, linux-kernel

On Tue, Aug 30, 2011 at 03:52:02PM -0400, Mikulas Patocka wrote:
> On Tue, 30 Aug 2011, Dave Chinner wrote:
> 
> > On Mon, Aug 29, 2011 at 03:36:48PM -0400, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > This patch fixes a lockup when shrinker callback returns -1.
> > 
> > What lockup is that? I haven't seen any bug reports, and this code
> > has been like this for several years, so I'm kind of wondering why
> > this is suddenly an issue....
> 
> I got the lockups when modifying my own dm-bufio code to use the shrinker. 
> The reason for lockups was that the variable total_scan contained 
> extremely high values.

Your new shrinker was returning -1 when nr_to_scan == 0? That's not
correct - you should be returning the count of objects (regardless
of the specified gfp_mask) or 0 if you can't get one for whatever
reason....

> The only possible way how such extreme values could be stored in 
> total_scan was this:
> 
> max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> delta = (4 * nr_pages_scanned) / shrinker->seeks;
> delta *= max_pass;
> do_div(delta, lru_pages + 1);
> total_scan += delta;
> 
> --- you don't test if do_shinker_shrink retuned -1 here. The variables are 
> unsigned long, so you end up adding extreme value (approximately 
> 2^64/(lru_pages+1) to total_scan.

That's not the only way to get large values in total scan. If
you do any amount of GFP_NOFS/GFP_NOIO allocation and the shrinker
aborts when it sees this, the shrinker->nr total will aggregate
until it becomes large. total_scan contains that aggregation because
it starts from the current value of shrinker->nr.

> Note that some existing shrinkers contain workaround for this (something 
> like "return nr_to_scan ? -1 : 0",

That's not a workaround - that is exactly how the current API
expects them to operate. That is, when counting objects, you return
the count of objects. If you can't get the count, you return 0.

Did I mention I was rewriting the API to make it more sane, obvious
and simple to implement correctly?

> while some can still return -1 when 
> nr_to_scan is 0 and trigger this bug (prune_super).

prune_super() will only return -1 if grab_super_passive() fails,
which indicates that something serious is happening on the
superblock (like unmount, remount or freeze) in which case the
caches are about to or already undergoing significant change anyway.
It could be seen as a bug, but it's really a "don't care" case - it
doesn't matter what the calculated value is because it doesn't
matter what the shrinker does after such a failure - the next call
is going to fail to grab the superblock, too.

And FWIW, that wart also goes away with the shrinker API rework.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] shrinker: fix a bug when callback returns -1
  2011-08-30 22:37     ` Dave Chinner
@ 2011-08-31 13:31       ` Mikulas Patocka
  0 siblings, 0 replies; 5+ messages in thread
From: Mikulas Patocka @ 2011-08-31 13:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, dm-devel, linux-kernel



On Wed, 31 Aug 2011, Dave Chinner wrote:

> On Tue, Aug 30, 2011 at 03:52:02PM -0400, Mikulas Patocka wrote:
> > On Tue, 30 Aug 2011, Dave Chinner wrote:
> > 
> > > On Mon, Aug 29, 2011 at 03:36:48PM -0400, Mikulas Patocka wrote:
> > > > Hi
> > > > 
> > > > This patch fixes a lockup when shrinker callback returns -1.
> > > 
> > > What lockup is that? I haven't seen any bug reports, and this code
> > > has been like this for several years, so I'm kind of wondering why
> > > this is suddenly an issue....
> > 
> > I got the lockups when modifying my own dm-bufio code to use the shrinker. 
> > The reason for lockups was that the variable total_scan contained 
> > extremely high values.
> 
> Your new shrinker was returning -1 when nr_to_scan == 0? That's not
> correct - you should be returning the count of objects (regardless
> of the specified gfp_mask) or 0 if you can't get one for whatever
> reason....
> 
> > The only possible way how such extreme values could be stored in 
> > total_scan was this:
> > 
> > max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> > delta = (4 * nr_pages_scanned) / shrinker->seeks;
> > delta *= max_pass;
> > do_div(delta, lru_pages + 1);
> > total_scan += delta;
> > 
> > --- you don't test if do_shinker_shrink retuned -1 here. The variables are 
> > unsigned long, so you end up adding extreme value (approximately 
> > 2^64/(lru_pages+1) to total_scan.
> 
> That's not the only way to get large values in total scan. If
> you do any amount of GFP_NOFS/GFP_NOIO allocation and the shrinker
> aborts when it sees this, the shrinker->nr total will aggregate
> until it becomes large. total_scan contains that aggregation because
> it starts from the current value of shrinker->nr.
> 
> > Note that some existing shrinkers contain workaround for this (something 
> > like "return nr_to_scan ? -1 : 0",
> 
> That's not a workaround - that is exactly how the current API
> expects them to operate. That is, when counting objects, you return
> the count of objects. If you can't get the count, you return 0.

So apply this patch, that mentions this requirement in the specification 
and fixes the bug in fs/super.c.

> Did I mention I was rewriting the API to make it more sane, obvious
> and simple to implement correctly?
> 
> > while some can still return -1 when 
> > nr_to_scan is 0 and trigger this bug (prune_super).
> 
> prune_super() will only return -1 if grab_super_passive() fails,
> which indicates that something serious is happening on the
> superblock (like unmount, remount or freeze) in which case the
> caches are about to or already undergoing significant change anyway.
>
> It could be seen as a bug, but it's really a "don't care" case - it
> doesn't matter what the calculated value is because it doesn't
> matter what the shrinker does after such a failure - the next call
> is going to fail to grab the superblock, too.

It is a bug, because the shrinker will loop for 
2^64/(lru_pages+1)/batch_size times if that "-1" is returned at certain 
place --- in "max_pass = do_shrinker_shrink(shrinker, shrink, 0)".

Mikulas

> And FWIW, that wart also goes away with the shrinker API rework.
> 
> Cheers,
> 
> Dave.

---

Fix shrinker callback bug in fs/super.c

The callback must not return -1 when nr_to_scan is zero. Fix the bug in
fs/super.c and add this requirement to the callback specification.

CC: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 fs/super.c               |    2 +-
 include/linux/shrinker.h |    1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-3.1-rc3-fast/fs/super.c
===================================================================
--- linux-3.1-rc3-fast.orig/fs/super.c	2011-08-31 15:21:01.000000000 +0200
+++ linux-3.1-rc3-fast/fs/super.c	2011-08-31 15:21:21.000000000 +0200
@@ -61,7 +61,7 @@ static int prune_super(struct shrinker *
 		return -1;
 
 	if (!grab_super_passive(sb))
-		return -1;
+		return !sc->nr_to_scan ? 0 : -1;
 
 	if (sb->s_op && sb->s_op->nr_cached_objects)
 		fs_objects = sb->s_op->nr_cached_objects(sb);
Index: linux-3.1-rc3-fast/include/linux/shrinker.h
===================================================================
--- linux-3.1-rc3-fast.orig/include/linux/shrinker.h	2011-08-31 15:21:28.000000000 +0200
+++ linux-3.1-rc3-fast/include/linux/shrinker.h	2011-08-31 15:21:58.000000000 +0200
@@ -20,6 +20,7 @@ struct shrink_control {
  * 'nr_to_scan' entries and attempt to free them up.  It should return
  * the number of objects which remain in the cache.  If it returns -1, it means
  * it cannot do any scanning at this time (eg. there is a risk of deadlock).
+ * The callback must not return -1 if nr_to_scan is zero.
  *
  * The 'gfpmask' refers to the allocation we are currently trying to
  * fulfil.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-08-31 13:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-29 19:36 [PATCH] shrinker: fix a bug when callback returns -1 Mikulas Patocka
2011-08-30  2:13 ` Dave Chinner
2011-08-30 19:52   ` Mikulas Patocka
2011-08-30 22:37     ` Dave Chinner
2011-08-31 13:31       ` Mikulas Patocka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.