All of lore.kernel.org
* [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
@ 2016-01-18  1:15 ` Junil Lee
  0 siblings, 0 replies; 30+ messages in thread
From: Junil Lee @ 2016-01-18  1:15 UTC (permalink / raw)
  To: minchan, ngupta
  Cc: sergey.senozhatsky.work, akpm, linux-mm, linux-kernel, Junil Lee

To prevent an unlock in the wrong situation, tag the new obj so that
the lock is held in migrate_zspage() until the proper unlock path.

The two functions race via the tag, which sets 1 on the last bit of
obj; however, the handle is unlocked prematurely when the new obj is
written to it before unpin_tag(), the proper unlock path, is called.

The problem is summarized by the call flow below:

		CPU0								CPU1
migrate_zspage
find_alloced_obj()
	trypin_tag() -- obj |= HANDLE_PIN_BIT
obj_malloc() -- new obj is not set			zs_free
record_obj() -- unlock and break sync		pin_tag() -- get lock
unpin_tag()

Without this patch, the kernel crashes as below:
	Unable to handle kernel NULL pointer dereference at virtual address 00000000
	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
	PC is at get_zspage_mapping+0x0/0x24
	LR is at obj_free.isra.22+0x64/0x128
	Call trace:
		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
		[<ffffffc0001a4918>] zs_free+0x88/0x114
		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
		[<ffffffc000196638>] swap_entry_free+0x278/0x294
		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
		[<ffffffc000184350>] unmap_vmas+0x44/0x60
		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
		[<ffffffc00009e408>] mmput+0x58/0xe0
		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
		[<ffffffc000087e44>] do_signal+0x98/0x4b8
		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c

For testing, the obj value was printed after pin_tag() in zs_free().
Sometimes obj is an even number, which means synchronization was broken.

With the patch applied, the crash no longer occurs and obj is always an
odd number in the same situation.

Signed-off-by: Junil Lee <junil0814.lee@lge.com>
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7414ce..0acfa20 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		free_obj = obj_malloc(d_page, class, handle);
 		zs_object_copy(free_obj, used_obj, class);
 		index++;
+		/* This also effectively unpins the handle */
 		record_obj(handle, free_obj);
-		unpin_tag(handle);
 		obj_free(pool, class, used_obj);
 	}
 
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  1:15 ` Junil Lee
@ 2016-01-18  4:14   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18  4:14 UTC (permalink / raw)
  To: Junil Lee
  Cc: minchan, ngupta, sergey.senozhatsky.work, akpm, linux-mm,
	linux-kernel, sergey.senozhatsky, Vlastimil Babka

Cc Vlastimil,


Hello,

On (01/18/16 10:15), Junil Lee wrote:
> To prevent unlock at the not correct situation, tagging the new obj to
> assure lock in migrate_zspage() before right unlock path.
> 
> Two functions are in race condition by tag which set 1 on last bit of
> obj, however unlock succrently when update new obj to handle before call
> unpin_tag() which is right unlock path.
> 
> summarize this problem by call flow as below:
> 
> 		CPU0								CPU1
> migrate_zspage
> find_alloced_obj()
> 	trypin_tag() -- obj |= HANDLE_PIN_BIT
> obj_malloc() -- new obj is not set			zs_free
> record_obj() -- unlock and break sync		pin_tag() -- get lock
> unpin_tag()

Junil, can something like this be a bit simpler problem description?

---

record_obj() in migrate_zspage() does not preserve handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and
implicitly (accidentally) un-pins the handle, while migrate_zspage()
still performs an explicit unpin_tag() on that handle.
This additional explicit unpin_tag() introduces a race condition
with zs_free(), which can pin that handle by this time, so the handle
becomes un-pinned. Schematically, it goes like this:

CPU0							CPU1
migrate_zspage
  find_alloced_obj
    trypin_tag
      set HANDLE_PIN_BIT				zs_free()
							  pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT			   set HANDLE_PIN_BIT
  unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT

The race condition may result in a NULL pointer dereference:
	Unable to handle kernel NULL pointer dereference at virtual address 00000000
	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
	PC is at get_zspage_mapping+0x0/0x24
	LR is at obj_free.isra.22+0x64/0x128
	Call trace:
		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
		[<ffffffc0001a4918>] zs_free+0x88/0x114
		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
		[<ffffffc000196638>] swap_entry_free+0x278/0x294
		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
		[<ffffffc000184350>] unmap_vmas+0x44/0x60
		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
		[<ffffffc00009e408>] mmput+0x58/0xe0
		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
		[<ffffffc000087e44>] do_signal+0x98/0x4b8
		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c

Fix the race by removing explicit unpin_tag() from migrate_zspage().

---


> and for test, print obj value after pin_tag() in zs_free().
> Sometimes obj is even number means break synchronization.
> 
> After patched, crash is not occurred and obj is only odd number in same
> situation.
> 
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>

I believe Vlastimil deserves a credit here (at least Suggested-by)
Suggested-by: Vlastimil Babka <vbabka@suse.cz>


now, can the compiler re-order

	record_obj(handle, free_obj);
	obj_free(pool, class, used_obj);

?

	-ss

> ---
>  mm/zsmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414ce..0acfa20 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		free_obj = obj_malloc(d_page, class, handle);
>  		zs_object_copy(free_obj, used_obj, class);
>  		index++;
> +		/* This also effectively unpins the handle */
>  		record_obj(handle, free_obj);
> -		unpin_tag(handle);
>  		obj_free(pool, class, used_obj);
>  	}
>  
> -- 
> 2.6.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  4:14   ` Sergey Senozhatsky
@ 2016-01-18  4:17     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18  4:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Junil Lee, minchan, ngupta, akpm, linux-mm, linux-kernel,
	sergey.senozhatsky, Vlastimil Babka

On (01/18/16 13:14), Sergey Senozhatsky wrote:
> Cc Vlastimil,
> 
> Hello,
> 
> On (01/18/16 10:15), Junil Lee wrote:
> > To prevent unlock at the not correct situation, tagging the new obj to
> > assure lock in migrate_zspage() before right unlock path.
> > 
> > Two functions are in race condition by tag which set 1 on last bit of
> > obj, however unlock succrently when update new obj to handle before call
> > unpin_tag() which is right unlock path.
> > 
> > summarize this problem by call flow as below:
> > 
> > 		CPU0								CPU1
> > migrate_zspage
> > find_alloced_obj()
> > 	trypin_tag() -- obj |= HANDLE_PIN_BIT
> > obj_malloc() -- new obj is not set			zs_free
> > record_obj() -- unlock and break sync		pin_tag() -- get lock
> > unpin_tag()
> 
> Junil, can something like this be a bit simpler problem description?
> 
> ---
> 
> record_obj() in migrate_zspage() does not preserve handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and
> implicitly (accidentally) un-pins the handle, while migrate_zspage()
> still performs an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition
> with zs_free(), which can pin that handle by this time, so the handle
> becomes un-pinned. Schematically, it goes like this:
> 
> CPU0							CPU1
> migrate_zspage
>   find_alloced_obj
>     trypin_tag
>       set HANDLE_PIN_BIT				zs_free()
> 							  pin_tag()
>   obj_malloc() -- new object, no tag
>   record_obj() -- remove HANDLE_PIN_BIT			   set HANDLE_PIN_BIT
>   unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
> 
> The race condition may result in a NULL pointer dereference:
> 	Unable to handle kernel NULL pointer dereference at virtual address 00000000
> 	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> 	PC is at get_zspage_mapping+0x0/0x24
> 	LR is at obj_free.isra.22+0x64/0x128
> 	Call trace:
> 		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> 		[<ffffffc0001a4918>] zs_free+0x88/0x114
> 		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> 		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> 		[<ffffffc000196638>] swap_entry_free+0x278/0x294
> 		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> 		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> 		[<ffffffc000184350>] unmap_vmas+0x44/0x60
> 		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
> 		[<ffffffc00009e408>] mmput+0x58/0xe0
> 		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
> 		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> 		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
> 		[<ffffffc000087e44>] do_signal+0x98/0x4b8
> 		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
> 
> Fix the race by removing explicit unpin_tag() from migrate_zspage().
> 
> ---
> 
> 
> > and for test, print obj value after pin_tag() in zs_free().
> > Sometimes obj is even number means break synchronization.
> > 
> > After patched, crash is not occurred and obj is only odd number in same
> > situation.
> > 
> > Signed-off-by: Junil Lee <junil0814.lee@lge.com>
> 
> I believe Vlastimil deserves a credit here (at least Suggested-by)
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> 
> 
> now, can the compiler re-order
> 
> 	record_obj(handle, free_obj);
> 	obj_free(pool, class, used_obj);

oh, disregard the last "re-ordering" commentary, sorry.

	-ss

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18 14:09                 ` Minchan Kim
@ 2016-01-18 14:10                   ` Vlastimil Babka
  -1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-01-18 14:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel

On 01/18/2016 03:09 PM, Minchan Kim wrote:
>
> True.
> Let's go with this. I hope it's the last.
> Thanks, guys.
>
>  From 389bbcbad9aba7d86a575b8c6ea3b8985cc801ea Mon Sep 17 00:00:00 2001
> From: Junil Lee <junil0814.lee@lge.com>
> Date: Mon, 18 Jan 2016 23:01:29 +0900
> Subject: [PATCH v5] zsmalloc: fix migrate_zspage-zs_free race condition
>
> record_obj() in migrate_zspage() does not preserve handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
> (accidentally) un-pins the handle, while migrate_zspage() still performs
> an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition with
> zs_free(), which can pin that handle by this time, so the handle becomes
> un-pinned.
>
> Schematically, it goes like this:
>
> CPU0					CPU1
> migrate_zspage
>    find_alloced_obj
>      trypin_tag
>        set HANDLE_PIN_BIT			zs_free()
> 						  pin_tag()
>    obj_malloc() -- new object, no tag
>    record_obj() -- remove HANDLE_PIN_BIT	    set HANDLE_PIN_BIT
>    unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
>
> The race condition may result in a NULL pointer dereference:
> 	Unable to handle kernel NULL pointer dereference at virtual address 00000000
> 	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> 	PC is at get_zspage_mapping+0x0/0x24
> 	LR is at obj_free.isra.22+0x64/0x128
> 	Call trace:
> 		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> 		[<ffffffc0001a4918>] zs_free+0x88/0x114
> 		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> 		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> 		[<ffffffc000196638>] swap_entry_free+0x278/0x294
> 		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> 		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> 		[<ffffffc000184350>] unmap_vmas+0x44/0x60
> 		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
> 		[<ffffffc00009e408>] mmput+0x58/0xe0
> 		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
> 		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> 		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
> 		[<ffffffc000087e44>] do_signal+0x98/0x4b8
> 		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
>
> This patch keeps the lock bit in the migration path and updates the
> value atomically.
>
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> Cc: <stable@vger.kernel.org> [4.1+]
> ---
>   mm/zsmalloc.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414cec220b..2d7c4c11fc63 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -309,7 +309,12 @@ static void free_handle(struct zs_pool *pool, unsigned long handle)
>
>   static void record_obj(unsigned long handle, unsigned long obj)
>   {
> -	*(unsigned long *)handle = obj;
> +	/*
> +	 * lsb of @obj represents handle lock while other bits
> +	 * represent object value the handle is pointing so
> +	 * updating shouldn't do store tearing.
> +	 */
> +	WRITE_ONCE(*(unsigned long *)handle, obj);
>   }
>
>   /* zpool driver */
> @@ -1635,6 +1640,13 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>   		free_obj = obj_malloc(d_page, class, handle);
>   		zs_object_copy(free_obj, used_obj, class);
>   		index++;
> +		/*
> +		 * record_obj() updates the handle's value to free_obj, and it would
> +		 * invalidate the lock bit (i.e., HANDLE_PIN_BIT) of the handle, which
> +		 * breaks synchronization using pin_tag() (e.g., zs_free()), so
> +		 * let's keep the lock bit.
> +		 */
> +		free_obj |= BIT(HANDLE_PIN_BIT);
>   		record_obj(handle, free_obj);
>   		unpin_tag(handle);
>   		obj_free(pool, class, used_obj);
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18 12:18               ` Vlastimil Babka
@ 2016-01-18 14:09                 ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18 14:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel

On Mon, Jan 18, 2016 at 01:18:31PM +0100, Vlastimil Babka wrote:
> On 01/18/2016 09:20 AM, Minchan Kim wrote:
> >On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
> >>On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> >>>On (01/18/16 16:11), Minchan Kim wrote:
> >>>[..]
> >>>>>so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> >>>>>barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> >>>>>so I don't think WRITE_ONCE() will help the compiler, or am I missing
> >>>>>something?
> >>>>
> >>>>We need two things
> >>>>2. memory barrier.
> >>>>
> >>>>As a compiler barrier, WRITE_ONCE works to prevent store tearing here
> >>>>by the compiler.
> >>>>However, if we omit unpin_tag here, we lose the memory barrier (e.g.,
> >>>>smp_mb) so another CPU could see stale data caused by CPU memory
> >>>>reordering.
> >>>
> >>>oh... good find! lost release semantic of unpin_tag()...
> >>
> >>Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> >>with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> >>release semantic, which would be a bit more effective, but I guess migration is
> >>not that critical path to be worth introducing it.
> >
> >WRITE_ONCE in record_obj would add more memory operations in obj_malloc
> 
> A simple WRITE_ONCE would just add a compiler barrier. What you
> suggests below does indeed add more operations, which are actually
> needed just in the migration. What I suggested is the v2 approach of
> adding the PIN bit before calling record_obj, *and* simply doing a
> WRITE_ONCE in record_obj() to make sure the PIN bit is indeed
> applied *before* writing to the handle, and not as two separate
> writes.
> 
> >but I don't feel it's too heavy in this phase so,
> 
> I'm afraid it's dangerous for the usage of record_obj() in
> zs_malloc() where the handle is freshly allocated by alloc_handle().
> Are we sure the bit is not set?
> 
> The code in alloc_handle() is:
>         return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
>                 pool->flags & ~__GFP_HIGHMEM);
> 
> There's no explicit __GFP_ZERO, so the handles are not guaranteed to
> be allocated empty? And expecting all zpool users to include
> __GFP_ZERO in flags would be too subtle and error prone.

True.
Let's go with this. I hope it's the last.
Thanks, guys.

>From 389bbcbad9aba7d86a575b8c6ea3b8985cc801ea Mon Sep 17 00:00:00 2001
From: Junil Lee <junil0814.lee@lge.com>
Date: Mon, 18 Jan 2016 23:01:29 +0900
Subject: [PATCH v5] zsmalloc: fix migrate_zspage-zs_free race condition

record_obj() in migrate_zspage() does not preserve the handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on that handle.
This additional explicit unpin_tag() introduces a race with zs_free(),
which may have pinned the handle by that time, so the handle ends up
un-pinned while zs_free() believes it still holds the pin.

Schematically, it goes like this:

CPU0					CPU1
migrate_zspage
  find_alloced_obj
    trypin_tag
      set HANDLE_PIN_BIT			zs_free()
						  pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT	    set HANDLE_PIN_BIT
  unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT

The race condition may result in a NULL pointer dereference:
	Unable to handle kernel NULL pointer dereference at virtual address 00000000
	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
	PC is at get_zspage_mapping+0x0/0x24
	LR is at obj_free.isra.22+0x64/0x128
	Call trace:
		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
		[<ffffffc0001a4918>] zs_free+0x88/0x114
		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
		[<ffffffc000196638>] swap_entry_free+0x278/0x294
		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
		[<ffffffc000184350>] unmap_vmas+0x44/0x60
		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
		[<ffffffc00009e408>] mmput+0x58/0xe0
		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
		[<ffffffc000087e44>] do_signal+0x98/0x4b8
		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c

This patch keeps the lock bit set in the migration path and updates the
handle value atomically.

Signed-off-by: Junil Lee <junil0814.lee@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org> [4.1+]
---
 mm/zsmalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7414cec220b..2d7c4c11fc63 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -309,7 +309,12 @@ static void free_handle(struct zs_pool *pool, unsigned long handle)
 
 static void record_obj(unsigned long handle, unsigned long obj)
 {
-	*(unsigned long *)handle = obj;
+	/*
+	 * The lsb of @obj represents the handle lock while the other
+	 * bits represent the object value the handle points to, so
+	 * this update must not be subject to store tearing.
+	 */
+	WRITE_ONCE(*(unsigned long *)handle, obj);
 }
 
 /* zpool driver */
@@ -1635,6 +1640,13 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		free_obj = obj_malloc(d_page, class, handle);
 		zs_object_copy(free_obj, used_obj, class);
 		index++;
+		/*
+		 * record_obj updates handle's value to free_obj and it will
+		 * invalidate the lock bit (i.e., HANDLE_PIN_BIT) of handle,
+		 * which breaks synchronization using pin_tag() (e.g.,
+		 * zs_free) so let's keep the lock bit.
+		 */
+		free_obj |= BIT(HANDLE_PIN_BIT);
 		record_obj(handle, free_obj);
 		unpin_tag(handle);
 		obj_free(pool, class, used_obj);
-- 
1.9.1
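Outside the kernel tree, the bit-level effect of this fix can be sketched with a small single-threaded userspace model. The toy_* names and PIN_MASK are illustrative stand-ins for trypin_tag()/unpin_tag() and HANDLE_PIN_BIT, not kernel APIs; the point is only how the buggy and fixed record_obj() stores treat the pin bit:

```c
#include <assert.h>

#define PIN_MASK 1UL	/* LSB models HANDLE_PIN_BIT */

/* Toy model of a zsmalloc handle: one word whose low bit is the
 * pin/lock bit and whose high bits are the object value. */
static int toy_pin(unsigned long *handle)
{
	if (*handle & PIN_MASK)
		return 0;		/* already pinned */
	*handle |= PIN_MASK;
	return 1;
}

static void toy_unpin(unsigned long *handle)
{
	*handle &= ~PIN_MASK;
}

/* Pre-fix behaviour: the plain store overwrites the whole word,
 * silently clearing the pin bit the migration path still holds. */
static void record_obj_buggy(unsigned long *handle, unsigned long obj)
{
	*handle = obj;
}

/* v5 behaviour: the caller ORs the pin bit into obj first, so the
 * single store publishes the new object with the lock still held. */
static void record_obj_fixed(unsigned long *handle, unsigned long obj)
{
	*handle = obj | PIN_MASK;
}
```

With record_obj_buggy(), a concurrent zs_free() that spins in pin_tag() would observe the bit clear and "acquire" a lock that migration believes it still owns; record_obj_fixed() keeps the bit set until the explicit unpin.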

^ permalink raw reply related	[flat|nested] 30+ messages in thread


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  8:20             ` Minchan Kim
@ 2016-01-18 12:18               ` Vlastimil Babka
  -1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-01-18 12:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel

On 01/18/2016 09:20 AM, Minchan Kim wrote:
> On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
>> On 18.1.2016 8:39, Sergey Senozhatsky wrote:
>>> On (01/18/16 16:11), Minchan Kim wrote:
>>> [..]
>>>>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
>>>>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
>>>>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
>>>>> something?
>>>>
>>>> We need two things
>>>> 2. memory barrier.
>>>>
>>>> As compiler barrier, WRITE_ONCE works to prevent store tearing here
>>>> by compiler.
>>>> However, if we omit unpin_tag here, we lose the memory barrier (e.g., smp_mb)
>>>> so another CPU could see stale data caused by CPU memory reordering.
>>>
>>> oh... good find! lost release semantic of unpin_tag()...
>>
>> Ah, release semantic, good point indeed. OK then we need the v2 approach again,
>> with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
>> release semantic, which would be a bit more effective, but I guess migration is
>> not that critical path to be worth introducing it.
>
> WRITE_ONCE in record_obj would add more memory operations in obj_malloc

A simple WRITE_ONCE would just add a compiler barrier. What you suggest 
below does indeed add more operations, which are actually needed just in 
the migration. What I suggested is the v2 approach of adding the PIN bit 
before calling record_obj, *and* simply doing a WRITE_ONCE in 
record_obj() to make sure the PIN bit is indeed applied *before* writing 
to the handle, and not as two separate writes.

> but I don't feel it's too heavy in this phase so,

I'm afraid it's dangerous for the usage of record_obj() in zs_malloc() 
where the handle is freshly allocated by alloc_handle(). Are we sure the 
bit is not set?

The code in alloc_handle() is:
         return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
                 pool->flags & ~__GFP_HIGHMEM);

There's no explicit __GFP_ZERO, so the handles are not guaranteed to be 
allocated empty? And expecting all zpool users to include __GFP_ZERO in 
flags would be too subtle and error prone.

> How about this? Junil, Could you resend patch if others agree this?
> Thanks.
>
> +/*
> + * record_obj updates handle's value to free_obj and it shouldn't
> + * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
> + * it breaks synchronization using pin_tag(e,g, zs_free) so let's
> + * keep the lock bit.
> + */
>   static void record_obj(unsigned long handle, unsigned long obj)
>   {
> -	*(unsigned long *)handle = obj;
> +	int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
> +	unsigned long val = obj | locked;
> +
> +	/*
> +	 * WRITE_ONCE could prevent store tearing like below
> +	 * *(unsigned long *)handle = free_obj
> +	 * *(unsigned long *)handle |= locked;
> +	 */
> +	WRITE_ONCE(*(unsigned long *)handle, val);
>   }
>
>
>
>>
>> Thanks,
>> Vlastimil
>>
>>>
>>> 	-ss
>>>
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  8:20             ` Minchan Kim
@ 2016-01-18 11:08               ` Sergey Senozhatsky
  -1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 11:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Vlastimil Babka, Sergey Senozhatsky, Junil Lee, ngupta, akpm,
	linux-mm, linux-kernel

On (01/18/16 17:20), Minchan Kim wrote:
[..]
> > > oh... good find! lost release semantic of unpin_tag()...
> > 
> > Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> > with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> > release semantic, which would be a bit more effective, but I guess migration is
> > not that critical path to be worth introducing it.
> 
> WRITE_ONCE in record_obj would add more memory operations in obj_malloc
> but I don't feel it's too heavy in this phase so,
> 
> How about this? Junil, Could you resend patch if others agree this?
> Thanks.
> 
> +/*
> + * record_obj updates handle's value to free_obj and it shouldn't
> + * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
> + * it breaks synchronization using pin_tag(e,g, zs_free) so let's
> + * keep the lock bit.
> + */
>  static void record_obj(unsigned long handle, unsigned long obj)
>  {
> -	*(unsigned long *)handle = obj;
> +	int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
> +	unsigned long val = obj | locked;
> +
> +	/*
> +	 * WRITE_ONCE could prevent store tearing like below
> +	 * *(unsigned long *)handle = free_obj
> +	 * *(unsigned long *)handle |= locked;
> +	 */
> +	WRITE_ONCE(*(unsigned long *)handle, val);
>  }

given that memory barriers are also compiler barriers, wouldn't

	record_obj()
	{
		barrier
		*(unsigned long *)handle) = new
	}

suffice?

	-ss

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  7:54           ` Vlastimil Babka
@ 2016-01-18  8:20             ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18  8:20 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel

On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
> On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> > On (01/18/16 16:11), Minchan Kim wrote:
> > [..]
> >>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> >>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> >>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
> >>> something?
> >>
> >> We need two things
> >> 2. memory barrier.
> >>
> >> As compiler barrier, WRITE_ONCE works to prevent store tearing here
> >> by compiler.
> >> However, if we omit unpin_tag here, we lose memory barrier(e,g, smp_mb)
> >> so another CPU could see stale data caused CPU memory reordering.
> > 
> > oh... good find! lost release semantic of unpin_tag()...
> 
> Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> release semantic, which would be a bit more effective, but I guess migration is
> not that critical path to be worth introducing it.

WRITE_ONCE in record_obj would add more memory operations in obj_malloc
but I don't feel it's too heavy in this phase so,

How about this? Junil, Could you resend patch if others agree this?
Thanks.

+/*
+ * record_obj updates handle's value to free_obj and it shouldn't
+ * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
+ * it breaks synchronization using pin_tag(e,g, zs_free) so let's
+ * keep the lock bit.
+ */
 static void record_obj(unsigned long handle, unsigned long obj)
 {
-	*(unsigned long *)handle = obj;
+	int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
+	unsigned long val = obj | locked;
+
+	/*
+	 * WRITE_ONCE could prevent store tearing like below
+	 * *(unsigned long *)handle = free_obj
+	 * *(unsigned long *)handle |= locked;
+	 */
+	WRITE_ONCE(*(unsigned long *)handle, val);
 }



> 
> Thanks,
> Vlastimil
> 
> > 
> > 	-ss
> > 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread
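The record_obj() variant proposed in the message above reads the current pin bit back out of the handle word before storing. A single-threaded userspace sketch of that read-modify-write shape (PIN_MASK models HANDLE_PIN_BIT; names are illustrative, not the kernel code) shows the extra load it costs compared with having the migration caller OR the bit in:

```c
#include <assert.h>

#define PIN_MASK 1UL	/* LSB models HANDLE_PIN_BIT */

/* Preserve whatever pin bit is currently set in the handle word,
 * then publish the new object value in one combined store. The
 * extra load of *handle is the cost discussed in the thread. */
static void record_obj_preserve(unsigned long *handle, unsigned long obj)
{
	unsigned long locked = *handle & PIN_MASK;

	*handle = obj | locked;
}
```

This variant is what makes the fresh-handle case matter: if kmem_cache_alloc() hands back a word with a stale LSB set, the read-back would keep a pin that was never taken.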


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  7:39         ` Sergey Senozhatsky
@ 2016-01-18  7:54           ` Vlastimil Babka
  -1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-01-18  7:54 UTC (permalink / raw)
  To: Sergey Senozhatsky, Minchan Kim
  Cc: Junil Lee, ngupta, akpm, linux-mm, linux-kernel

On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> On (01/18/16 16:11), Minchan Kim wrote:
> [..]
>>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
>>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
>>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
>>> something?
>>
>> We need two things
>> 2. memory barrier.
>>
>> As compiler barrier, WRITE_ONCE works to prevent store tearing here
>> by compiler.
>> However, if we omit unpin_tag here, we lose the memory barrier (e.g., smp_mb)
>> so another CPU could see stale data caused by CPU memory reordering.
> 
> oh... good find! lost release semantic of unpin_tag()...

Ah, release semantic, good point indeed. OK then we need the v2 approach again,
with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
release semantic, which would be a bit more effective, but I guess migration is
not that critical path to be worth introducing it.

Thanks,
Vlastimil

> 
> 	-ss
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  7:11       ` Minchan Kim
@ 2016-01-18  7:39         ` Sergey Senozhatsky
  -1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18  7:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm,
	linux-kernel, vbabka

On (01/18/16 16:11), Minchan Kim wrote:
[..]
> > so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> > barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> > so I don't think WRITE_ONCE() will help the compiler, or am I missing
> > something?
> 
> We need two things

thanks.

> 1. compiler barrier

um... probably gcc can reorder that sequence to something like this

	*handle = obj_malloc()   /* unpin the object */
	zs_object_copy(*handle, used_obj, class) /* now use it*/

ok.


> 2. memory barrier.
> 
> As compiler barrier, WRITE_ONCE works to prevent store tearing here
> by compiler.
> However, if we omit unpin_tag here, we lose the memory barrier (e.g., smp_mb)
> so another CPU could see stale data caused by CPU memory reordering.

oh... good find! lost release semantic of unpin_tag()...

	-ss

^ permalink raw reply	[flat|nested] 30+ messages in thread


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  6:54     ` Sergey Senozhatsky
@ 2016-01-18  7:11       ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18  7:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Junil Lee, ngupta, akpm, linux-mm, linux-kernel, vbabka

On Mon, Jan 18, 2016 at 03:54:34PM +0900, Sergey Senozhatsky wrote:
> On (01/18/16 15:36), Minchan Kim wrote:
> [..]
> > > --- a/mm/zsmalloc.c
> > > +++ b/mm/zsmalloc.c
> > > @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > >  		free_obj = obj_malloc(d_page, class, handle);
> > >  		zs_object_copy(free_obj, used_obj, class);
> > >  		index++;
> > > +		/* This also effectively unpins the handle */
> > 
> > As a reply to Vlastimil, I replied that I guess it doesn't work.
> > We shouldn't omit unpin_tag and we should add WRITE_ONCE in
> > record_obj.
> > 
> > As well, it's worth double-checking with locking guys.
> > I will send updated version.
> 
> but would WRITE_ONCE() tell the compiler that there is a dependency?
> __write_once_size() does not even issue a barrier for sizes <= 8 (our
> case).
> 
> include/linux/compiler.h
> 
> static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> {
> 	switch (size) {
> 	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> 	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> 	case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
> 	case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
> 	default:
> 		barrier();
> 		__builtin_memcpy((void *)p, (const void *)res, size);
> 		barrier();
> 	}
> }
> 
> #define WRITE_ONCE(x, val) \
> ({							\
> 	union { typeof(x) __val; char __c[1]; } __u =	\
> 		{ .__val = (__force typeof(x)) (val) }; \
> 	__write_once_size(&(x), __u.__c, sizeof(x));	\
> 	__u.__val;					\
> })
> 
> 
> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> so I don't think WRITE_ONCE() will help the compiler, or am I missing
> something?

We need two things

1. compiler barrier
2. memory barrier.

As compiler barrier, WRITE_ONCE works to prevent store tearing here
by compiler.
However, if we omit unpin_tag here, we lose the memory barrier (e.g., smp_mb)
so another CPU could see stale data caused by CPU memory reordering.


> 
> .... add a barrier() to record_obj()?
> 
> 	-ss

^ permalink raw reply	[flat|nested] 30+ messages in thread
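For the word-sized case quoted above, __write_once_size() reduces to a single volatile store: a compiler barrier for that one access (no tearing, no duplication, no elision), but no CPU memory barrier. A minimal userspace approximation of that idea (my_write_once is an illustrative stand-in, not the kernel macro):

```c
#include <assert.h>

/* Userspace stand-in for the kernel's WRITE_ONCE() on a word-sized
 * location: one volatile store, which the compiler must emit as a
 * single access. It orders nothing else and issues no smp_mb(). */
#define my_write_once(x, val) \
	(*(volatile unsigned long *)&(x) = (unsigned long)(val))

static unsigned long handle_word;

/* Model of record_obj(): one full-word store, so the pin bit (LSB)
 * travels together with the object value rather than as a second,
 * tearable write. */
static void record_obj_model(unsigned long obj)
{
	my_write_once(handle_word, obj);
}
```

This is exactly why WRITE_ONCE() alone cannot replace the release semantics of unpin_tag(), as the thread concludes: it constrains the compiler for that store, not the CPU's ordering against other CPUs.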


* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  6:36   ` Minchan Kim
@ 2016-01-18  6:54     ` Sergey Senozhatsky
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18  6:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Junil Lee, ngupta, sergey.senozhatsky.work, akpm, linux-mm,
	linux-kernel, vbabka

On (01/18/16 15:36), Minchan Kim wrote:
[..]
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> >  		free_obj = obj_malloc(d_page, class, handle);
> >  		zs_object_copy(free_obj, used_obj, class);
> >  		index++;
> > +		/* This also effectively unpins the handle */
> 
> As a reply to Vlastimil, I replied that I guess it doesn't work.
> We shouldn't omit unpin_tag() and we should add WRITE_ONCE() in
> record_obj().
> 
> As well, it's worth double-checking with the locking guys.
> I will send an updated version.

but would WRITE_ONCE() tell the compiler that there is a dependency?
__write_once_size() does not even issue a barrier for sizes <= 8 (our
case).

include/linux/compiler.h

static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
	switch (size) {
	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
	case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
	case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
	default:
		barrier();
		__builtin_memcpy((void *)p, (const void *)res, size);
		barrier();
	}
}

#define WRITE_ONCE(x, val) \
({							\
	union { typeof(x) __val; char __c[1]; } __u =	\
		{ .__val = (__force typeof(x)) (val) }; \
	__write_once_size(&(x), __u.__c, sizeof(x));	\
	__u.__val;					\
})


so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
so I don't think WRITE_ONCE() will help the compiler, or am I missing
something?

.... add a barrier() to record_obj()?

	-ss

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  5:39 ` Junil Lee
@ 2016-01-18  6:36   ` Minchan Kim
  0 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18  6:36 UTC (permalink / raw)
  To: Junil Lee
  Cc: ngupta, sergey.senozhatsky.work, akpm, linux-mm, linux-kernel, vbabka

Hello Junil,

On Mon, Jan 18, 2016 at 02:39:56PM +0900, Junil Lee wrote:
> record_obj() in migrate_zspage() does not preserve the handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
> (accidentally) un-pins the handle, while migrate_zspage() still performs
> an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition with
> zs_free(), which may have pinned that handle by this time, so the handle
> ends up un-pinned while zs_free() believes it holds the lock.
> 
> Schematically, it goes like this:
> 
> CPU0					CPU1
> migrate_zspage
>   find_alloced_obj
>     trypin_tag
>       set HANDLE_PIN_BIT			zs_free()
> 						  pin_tag()
>   obj_malloc() -- new object, no tag
>   record_obj() -- remove HANDLE_PIN_BIT	    set HANDLE_PIN_BIT
>   unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
> 
> The race condition may result in a NULL pointer dereference:
> 	Unable to handle kernel NULL pointer dereference at virtual address 00000000
> 	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> 	PC is at get_zspage_mapping+0x0/0x24
> 	LR is at obj_free.isra.22+0x64/0x128
> 	Call trace:
> 		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> 		[<ffffffc0001a4918>] zs_free+0x88/0x114
> 		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> 		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> 		[<ffffffc000196638>] swap_entry_free+0x278/0x294
> 		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> 		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> 		[<ffffffc000184350>] unmap_vmas+0x44/0x60
> 		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
> 		[<ffffffc00009e408>] mmput+0x58/0xe0
> 		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
> 		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> 		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
> 		[<ffffffc000087e44>] do_signal+0x98/0x4b8
> 		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
> 
> Fix the race by removing explicit unpin_tag() from migrate_zspage().
> 
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
> ---
>  mm/zsmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414ce..0acfa20 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		free_obj = obj_malloc(d_page, class, handle);
>  		zs_object_copy(free_obj, used_obj, class);
>  		index++;
> +		/* This also effectively unpins the handle */

As a reply to Vlastimil, I replied that I guess it doesn't work.
We shouldn't omit unpin_tag() and we should add WRITE_ONCE() in
record_obj().

As well, it's worth double-checking with the locking guys.
I will send an updated version.

Thanks.


>  		record_obj(handle, free_obj);
> -		unpin_tag(handle);
>  		obj_free(pool, class, used_obj);
>  	}
>  
> -- 
> 2.6.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
  2016-01-18  5:39 ` Junil Lee
@ 2016-01-18  6:13   ` Sergey Senozhatsky
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18  6:13 UTC (permalink / raw)
  To: Junil Lee
  Cc: minchan, ngupta, sergey.senozhatsky.work, akpm, linux-mm,
	linux-kernel, vbabka

On (01/18/16 14:39), Junil Lee wrote:
> record_obj() in migrate_zspage() does not preserve the handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
> (accidentally) un-pins the handle, while migrate_zspage() still performs
> an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition with
> zs_free(), which may have pinned that handle by this time, so the handle
> ends up un-pinned while zs_free() believes it holds the lock.
> 
> Schematically, it goes like this:
> 
> CPU0					CPU1
> migrate_zspage
>   find_alloced_obj
>     trypin_tag
>       set HANDLE_PIN_BIT			zs_free()
> 						  pin_tag()
>   obj_malloc() -- new object, no tag
>   record_obj() -- remove HANDLE_PIN_BIT	    set HANDLE_PIN_BIT
>   unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
> 
> The race condition may result in a NULL pointer dereference:
> 	Unable to handle kernel NULL pointer dereference at virtual address 00000000
> 	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> 	PC is at get_zspage_mapping+0x0/0x24
> 	LR is at obj_free.isra.22+0x64/0x128
> 	Call trace:
> 		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> 		[<ffffffc0001a4918>] zs_free+0x88/0x114
> 		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> 		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> 		[<ffffffc000196638>] swap_entry_free+0x278/0x294
> 		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> 		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> 		[<ffffffc000184350>] unmap_vmas+0x44/0x60
> 		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
> 		[<ffffffc00009e408>] mmput+0x58/0xe0
> 		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
> 		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> 		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
> 		[<ffffffc000087e44>] do_signal+0x98/0x4b8
> 		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
> 
> Fix the race by removing explicit unpin_tag() from migrate_zspage().
> 
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>

 Suggested-by: Vlastimil Babka <vbabka@suse.cz>
 Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
 Cc: <stable@vger.kernel.org> [4.1+]


migrate_zspage() was introduced in 4.1
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/mm/zsmalloc.c?id=refs/tags/v4.1

thank you.

	-ss

> ---
>  mm/zsmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414ce..0acfa20 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		free_obj = obj_malloc(d_page, class, handle);
>  		zs_object_copy(free_obj, used_obj, class);
>  		index++;
> +		/* This also effectively unpins the handle */
>  		record_obj(handle, free_obj);
> -		unpin_tag(handle);
>  		obj_free(pool, class, used_obj);
>  	}
>  
> -- 
> 2.6.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
@ 2016-01-18  5:39 ` Junil Lee
  0 siblings, 0 replies; 30+ messages in thread
From: Junil Lee @ 2016-01-18  5:39 UTC (permalink / raw)
  To: minchan, ngupta
  Cc: sergey.senozhatsky.work, akpm, linux-mm, linux-kernel, vbabka, Junil Lee

record_obj() in migrate_zspage() does not preserve the handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on that handle.
This additional explicit unpin_tag() introduces a race condition with
zs_free(), which may have pinned that handle by this time, so the handle
ends up un-pinned while zs_free() believes it holds the lock.

Schematically, it goes like this:

CPU0					CPU1
migrate_zspage
  find_alloced_obj
    trypin_tag
      set HANDLE_PIN_BIT			zs_free()
						  pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT	    set HANDLE_PIN_BIT
  unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT

The race condition may result in a NULL pointer dereference:
	Unable to handle kernel NULL pointer dereference at virtual address 00000000
	CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
	PC is at get_zspage_mapping+0x0/0x24
	LR is at obj_free.isra.22+0x64/0x128
	Call trace:
		[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
		[<ffffffc0001a4918>] zs_free+0x88/0x114
		[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
		[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
		[<ffffffc000196638>] swap_entry_free+0x278/0x294
		[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
		[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
		[<ffffffc000184350>] unmap_vmas+0x44/0x60
		[<ffffffc00018a53c>] exit_mmap+0x50/0x110
		[<ffffffc00009e408>] mmput+0x58/0xe0
		[<ffffffc0000a2854>] do_exit+0x320/0x8dc
		[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
		[<ffffffc0000ae1bc>] get_signal+0x538/0x580
		[<ffffffc000087e44>] do_signal+0x98/0x4b8
		[<ffffffc00008843c>] do_notify_resume+0x14/0x5c

Fix the race by removing explicit unpin_tag() from migrate_zspage().

Signed-off-by: Junil Lee <junil0814.lee@lge.com>
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7414ce..0acfa20 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		free_obj = obj_malloc(d_page, class, handle);
 		zs_object_copy(free_obj, used_obj, class);
 		index++;
+		/* This also effectively unpins the handle */
 		record_obj(handle, free_obj);
-		unpin_tag(handle);
 		obj_free(pool, class, used_obj);
 	}
 
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-01-18 14:11 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-18  1:15 [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition Junil Lee
2016-01-18  4:14 ` Sergey Senozhatsky
2016-01-18  4:17   ` Sergey Senozhatsky
2016-01-18  5:39 Junil Lee
2016-01-18  6:13 ` Sergey Senozhatsky
2016-01-18  6:36 ` Minchan Kim
2016-01-18  6:54   ` Sergey Senozhatsky
2016-01-18  7:11     ` Minchan Kim
2016-01-18  7:39       ` Sergey Senozhatsky
2016-01-18  7:54         ` Vlastimil Babka
2016-01-18  8:20           ` Minchan Kim
2016-01-18 11:08             ` Sergey Senozhatsky
2016-01-18 12:18             ` Vlastimil Babka
2016-01-18 14:09               ` Minchan Kim
2016-01-18 14:10                 ` Vlastimil Babka
