zsmalloc: fix zs_can_compact() integer overflow

Message ID 1462779333-7092-1-git-send-email-sergey.senozhatsky@gmail.com
State New, archived

Commit Message

Sergey Senozhatsky May 9, 2016, 7:35 a.m. UTC
zs_can_compact() has two race conditions in its core calculation:

unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
				zs_stat_get(class, OBJ_USED);

1) classes are not locked, so the numbers of allocated and used
   objects can change by the concurrent ops happening on other CPUs
2) shrinker invokes it from preemptible context

Depending on the circumstances, OBJ_ALLOCATED can become less
than OBJ_USED, which can result in either very high or negative
`total_scan' value calculated in do_shrink_slab().
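
As a minimal illustration of the wrap-around (a user-space sketch with
made-up stat values, not actual kernel code):

/* illustrative sketch only; the snapshot values below are hypothetical */
#include <stdio.h>

int main(void)
{
	unsigned long obj_allocated = 100;	/* racy snapshot of OBJ_ALLOCATED */
	unsigned long obj_used = 102;		/* OBJ_USED grew in the meantime */

	/* unsigned subtraction wraps around instead of going negative */
	unsigned long obj_wasted = obj_allocated - obj_used;

	printf("obj_wasted == %lu\n", obj_wasted);	/* 18446744073709551614 on 64-bit */
	return 0;
}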

Take a pool stat snapshot and use it instead of racy zs_stat_get()
calls.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>        [4.3+]
---
 mm/zsmalloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Minchan Kim May 9, 2016, 8:07 a.m. UTC | #1
Hello Sergey,

On Mon, May 09, 2016 at 04:35:33PM +0900, Sergey Senozhatsky wrote:
> zs_can_compact() has two race conditions in its core calculation:
> 
> unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> 				zs_stat_get(class, OBJ_USED);
> 
> 1) classes are not locked, so the numbers of allocated and used
>    objects can change by the concurrent ops happening on other CPUs
> 2) shrinker invokes it from preemptible context
> 
> Depending on the circumstances, OBJ_ALLOCATED can become less
> than OBJ_USED, which can result in either very high or negative
> `total_scan' value calculated in do_shrink_slab().

So, do you see pr_err("shrink_slab: %pF negative objects xxxx)
in vmscan.c and skip shrinking?

It would be better to explain what happens without this patch and the
end-user effect, for the -stable submission.

At least, I seem to have seen the above pr_err, if I remember correctly,
but at that time I thought it was a bug I had introduced during development.
I haven't been able to reproduce the symptom since then. :)

Good catch!
Comment is below.

> 
> Take a pool stat snapshot and use it instead of racy zs_stat_get()
> calls.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: <stable@vger.kernel.org>        [4.3+]
> ---
>  mm/zsmalloc.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 107ec06..1bc2a98 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -2262,10 +2262,13 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)

It seems this patch is based on my old page migration work?
That work hasn't gone to mainline yet, but your patch, which fixes the bug,
is supposed to go to -stable. So, I hope this patch goes in first.

Thanks.

>  static unsigned long zs_can_compact(struct size_class *class)
>  {
>  	unsigned long obj_wasted;
> +	unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
> +	unsigned long obj_used = zs_stat_get(class, OBJ_USED);
>  
> -	obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> -		zs_stat_get(class, OBJ_USED);
> +	if (obj_allocated <= obj_used)
> +		return 0;
>  
> +	obj_wasted = obj_allocated - obj_used;
>  	obj_wasted /= get_maxobj_per_zspage(class->size,
>  			class->pages_per_zspage);
>  
> -- 
> 2.8.2
>
Sergey Senozhatsky May 9, 2016, 8:41 a.m. UTC | #2
Hello,

On (05/09/16 17:07), Minchan Kim wrote:
[..]
> > Depending on the circumstances, OBJ_ALLOCATED can become less
> > than OBJ_USED, which can result in either very high or negative
> > `total_scan' value calculated in do_shrink_slab().
> 
> So, do you see pr_err("shrink_slab: %pF negative objects xxxx)
> in vmscan.c and skip shrinking?

yes

 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62


> It would be better to explain what happens without this patch and the
> end-user effect, for the -stable submission.

it seems that not every overflowed value returned from zs_can_compact()
is getting detected in do_shrink_slab():

	freeable = shrinker->count_objects(shrinker, shrinkctl);
	if (freeable == 0)
		return 0;

	/*
	 * copy the current shrinker scan count into a local variable
	 * and zero it so that other concurrent shrinker invocations
	 * don't also do this scanning work.
	 */
	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

	total_scan = nr;
	delta = (4 * nr_scanned) / shrinker->seeks;
	delta *= freeable;
	do_div(delta, nr_eligible + 1);
	total_scan += delta;
	if (total_scan < 0) {
		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
		       shrinker->scan_objects, total_scan);
		total_scan = freeable;
	}

this calculation can hide the shrinker->count_objects() error. I added
some debugging code (on x86_64), and the output was:

[   59.041959] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.041963] vmscan: >> but total_scan > 0: 92679974445502
[   59.041964] vmscan: >> resulting total_scan: 92679974445502

[   59.192734] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.192737] vmscan: >> but total_scan > 0: 5830197242006811
[   59.192738] vmscan: >> resulting total_scan: 5830197242006811

[   59.259805] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.259809] vmscan: >> but total_scan > 0: 23649671889371219
[   59.259810] vmscan: >> resulting total_scan: 23649671889371219

[   76.279767] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   76.279770] vmscan: >> but total_scan > 0: 895907920044174
[   76.279771] vmscan: >> resulting total_scan: 895907920044174

[   84.807837] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   84.807841] vmscan: >> but total_scan > 0: 22634041808232578
[   84.807842] vmscan: >> resulting total_scan: 22634041808232578

so we can end up with insanely huge total_scan values.
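
to make that concrete, here is a rough user-space model of the arithmetic
quoted above; nr, nr_scanned, nr_eligible and seeks are made-up numbers, only
`freeable' carries the wrapped-around counter:

#include <stdio.h>

int main(void)
{
	unsigned long freeable = (unsigned long)-1;	/* wrapped count_objects() */
	unsigned long nr_scanned = 128, nr_eligible = 1024, seeks = 2;
	unsigned long delta;
	long nr = 0, total_scan;			/* no deferred work yet */

	total_scan = nr;
	delta = (4 * nr_scanned) / seeks;
	delta *= freeable;				/* wraps again, stays "positive" */
	delta /= nr_eligible + 1;
	total_scan += delta;

	/* huge but positive, so the "negative objects" check does not trigger */
	printf("total_scan == %ld\n", total_scan);
	return 0;
}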

[..]
> > @@ -2262,10 +2262,13 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)
> 
> It seems this patch is based on my old page migration work?
> That work hasn't gone to mainline yet, but your patch, which fixes the bug,
> is supposed to go to -stable. So, I hope this patch goes in first.

oops... my fat fingers! good catch, thanks! I have two versions: for -next and
-mmots (with your LRU rework applied, indeed). somehow I managed to cd to the
wrong dir. sorry, will resend.

	-ss

Patch

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 107ec06..1bc2a98 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2262,10 +2262,13 @@  static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)
 static unsigned long zs_can_compact(struct size_class *class)
 {
 	unsigned long obj_wasted;
+	unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
+	unsigned long obj_used = zs_stat_get(class, OBJ_USED);
 
-	obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
-		zs_stat_get(class, OBJ_USED);
+	if (obj_allocated <= obj_used)
+		return 0;
 
+	obj_wasted = obj_allocated - obj_used;
 	obj_wasted /= get_maxobj_per_zspage(class->size,
 			class->pages_per_zspage);
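
For completeness, a simplified user-space rendition of the fixed calculation;
the divisor is a stand-in for get_maxobj_per_zspage() and the rest of the
real function is omitted:

#include <stdio.h>

/* hypothetical objects-per-zspage for this class */
static unsigned long max_objs = 8;

static unsigned long can_compact(unsigned long obj_allocated, unsigned long obj_used)
{
	unsigned long obj_wasted;

	/* the new guard: a racy snapshot can no longer wrap around */
	if (obj_allocated <= obj_used)
		return 0;

	obj_wasted = obj_allocated - obj_used;
	return obj_wasted / max_objs;
}

int main(void)
{
	printf("%lu\n", can_compact(100, 102));	/* 0, not a huge number */
	printf("%lu\n", can_compact(120, 100));	/* 2 */
	return 0;
}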