* [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
@ 2016-01-18 1:15 ` Junil Lee
0 siblings, 0 replies; 30+ messages in thread
From: Junil Lee @ 2016-01-18 1:15 UTC (permalink / raw)
To: minchan, ngupta
Cc: sergey.senozhatsky.work, akpm, linux-mm, linux-kernel, Junil Lee
To prevent an unlock in the wrong situation, tag the new obj so that the
lock taken in migrate_zspage() is held until the proper unlock path.
The two functions race via the tag, which sets 1 on the last bit of obj;
however, the handle is unlocked prematurely when the new obj is written to
it before unpin_tag(), the proper unlock path, is called.
The problem is summarized by the call flow below:
CPU0                                         CPU1
migrate_zspage
  find_alloced_obj()
    trypin_tag() -- obj |= HANDLE_PIN_BIT
  obj_malloc() -- new obj is not set         zs_free
  record_obj() -- unlock and break sync        pin_tag() -- get lock
  unpin_tag()
Before this patch, the code crashed as below:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
PC is at get_zspage_mapping+0x0/0x24
LR is at obj_free.isra.22+0x64/0x128
Call trace:
[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
[<ffffffc0001a4918>] zs_free+0x88/0x114
[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
[<ffffffc000196638>] swap_entry_free+0x278/0x294
[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
[<ffffffc000184350>] unmap_vmas+0x44/0x60
[<ffffffc00018a53c>] exit_mmap+0x50/0x110
[<ffffffc00009e408>] mmput+0x58/0xe0
[<ffffffc0000a2854>] do_exit+0x320/0x8dc
[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
[<ffffffc0000ae1bc>] get_signal+0x538/0x580
[<ffffffc000087e44>] do_signal+0x98/0x4b8
[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
For testing, the obj value was printed after pin_tag() in zs_free().
Sometimes obj was an even number, which means synchronization was broken.
After the patch, the crash no longer occurs and obj is always an odd
number in the same situation.
Signed-off-by: Junil Lee <junil0814.lee@lge.com>
---
mm/zsmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7414ce..0acfa20 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
free_obj = obj_malloc(d_page, class, handle);
zs_object_copy(free_obj, used_obj, class);
index++;
+ /* This also effectively unpins the handle */
record_obj(handle, free_obj);
- unpin_tag(handle);
obj_free(pool, class, used_obj);
}
--
2.6.2
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 1:15 ` Junil Lee
@ 2016-01-18 4:14 ` Sergey Senozhatsky
-1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 4:14 UTC (permalink / raw)
To: Junil Lee
Cc: minchan, ngupta, sergey.senozhatsky.work, akpm, linux-mm,
linux-kernel, sergey.senozhatsky, Vlastimil Babka
Cc Vlastimil,
Hello,
On (01/18/16 10:15), Junil Lee wrote:
> To prevent unlock at the not correct situation, tagging the new obj to
> assure lock in migrate_zspage() before right unlock path.
>
> Two functions are in race condition by tag which set 1 on last bit of
> obj, however unlock succrently when update new obj to handle before call
> unpin_tag() which is right unlock path.
>
> summarize this problem by call flow as below:
>
> CPU0                                         CPU1
> migrate_zspage
>   find_alloced_obj()
>     trypin_tag() -- obj |= HANDLE_PIN_BIT
>   obj_malloc() -- new obj is not set         zs_free
>   record_obj() -- unlock and break sync        pin_tag() -- get lock
>   unpin_tag()
Junil, can something like this be a bit simpler problem description?
---
record_obj() in migrate_zspage() does not preserve handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and
implicitly (accidentally) un-pins the handle, while migrate_zspage()
still performs an explicit unpin_tag() on that handle.
This additional explicit unpin_tag() introduces a race condition
with zs_free(), which can pin that handle by this time, so the handle
becomes un-pinned. Schematically, it goes like this:
CPU0                                         CPU1
migrate_zspage
  find_alloced_obj
    trypin_tag
      set HANDLE_PIN_BIT                     zs_free()
                                               pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT        set HANDLE_PIN_BIT
  unpin_tag() -- remove zs_free's HANDLE_PIN_BIT
The race condition may result in a NULL pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
PC is at get_zspage_mapping+0x0/0x24
LR is at obj_free.isra.22+0x64/0x128
Call trace:
[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
[<ffffffc0001a4918>] zs_free+0x88/0x114
[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
[<ffffffc000196638>] swap_entry_free+0x278/0x294
[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
[<ffffffc000184350>] unmap_vmas+0x44/0x60
[<ffffffc00018a53c>] exit_mmap+0x50/0x110
[<ffffffc00009e408>] mmput+0x58/0xe0
[<ffffffc0000a2854>] do_exit+0x320/0x8dc
[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
[<ffffffc0000ae1bc>] get_signal+0x538/0x580
[<ffffffc000087e44>] do_signal+0x98/0x4b8
[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
Fix the race by removing explicit unpin_tag() from migrate_zspage().
---
> and for test, print obj value after pin_tag() in zs_free().
> Sometimes obj is even number means break synchronization.
>
> After patched, crash is not occurred and obj is only odd number in same
> situation.
>
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
I believe Vlastimil deserves credit here (at least Suggested-by)
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
now, can the compiler re-order
record_obj(handle, free_obj);
obj_free(pool, class, used_obj);
?
-ss
> ---
> mm/zsmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414ce..0acfa20 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> free_obj = obj_malloc(d_page, class, handle);
> zs_object_copy(free_obj, used_obj, class);
> index++;
> + /* This also effectively unpins the handle */
> record_obj(handle, free_obj);
> - unpin_tag(handle);
> obj_free(pool, class, used_obj);
> }
>
> --
> 2.6.2
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 4:14 ` Sergey Senozhatsky
@ 2016-01-18 4:17 ` Sergey Senozhatsky
-1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 4:17 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Junil Lee, minchan, ngupta, akpm, linux-mm, linux-kernel,
sergey.senozhatsky, Vlastimil Babka
On (01/18/16 13:14), Sergey Senozhatsky wrote:
> Cc Vlastimil,
>
> Hello,
>
> On (01/18/16 10:15), Junil Lee wrote:
> > To prevent unlock at the not correct situation, tagging the new obj to
> > assure lock in migrate_zspage() before right unlock path.
> >
> > Two functions are in race condition by tag which set 1 on last bit of
> > obj, however unlock succrently when update new obj to handle before call
> > unpin_tag() which is right unlock path.
> >
> > summarize this problem by call flow as below:
> >
> > CPU0                                         CPU1
> > migrate_zspage
> >   find_alloced_obj()
> >     trypin_tag() -- obj |= HANDLE_PIN_BIT
> >   obj_malloc() -- new obj is not set         zs_free
> >   record_obj() -- unlock and break sync        pin_tag() -- get lock
> >   unpin_tag()
>
> Junil, can something like this be a bit simpler problem description?
>
> ---
>
> record_obj() in migrate_zspage() does not preserve handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and
> implicitly (accidentally) un-pins the handle, while migrate_zspage()
> still performs an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition
> with zs_free(), which can pin that handle by this time, so the handle
> becomes un-pinned. Schematically, it goes like this:
>
> CPU0                                         CPU1
> migrate_zspage
>   find_alloced_obj
>     trypin_tag
>       set HANDLE_PIN_BIT                     zs_free()
>                                                pin_tag()
>   obj_malloc() -- new object, no tag
>   record_obj() -- remove HANDLE_PIN_BIT        set HANDLE_PIN_BIT
>   unpin_tag() -- remove zs_free's HANDLE_PIN_BIT
>
> The race condition may result in a NULL pointer dereference:
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> PC is at get_zspage_mapping+0x0/0x24
> LR is at obj_free.isra.22+0x64/0x128
> Call trace:
> [<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> [<ffffffc0001a4918>] zs_free+0x88/0x114
> [<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> [<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> [<ffffffc000196638>] swap_entry_free+0x278/0x294
> [<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> [<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> [<ffffffc000184350>] unmap_vmas+0x44/0x60
> [<ffffffc00018a53c>] exit_mmap+0x50/0x110
> [<ffffffc00009e408>] mmput+0x58/0xe0
> [<ffffffc0000a2854>] do_exit+0x320/0x8dc
> [<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> [<ffffffc0000ae1bc>] get_signal+0x538/0x580
> [<ffffffc000087e44>] do_signal+0x98/0x4b8
> [<ffffffc00008843c>] do_notify_resume+0x14/0x5c
>
> Fix the race by removing explicit unpin_tag() from migrate_zspage().
>
> ---
>
>
> > and for test, print obj value after pin_tag() in zs_free().
> > Sometimes obj is even number means break synchronization.
> >
> > After patched, crash is not occurred and obj is only odd number in same
> > situation.
> >
> > Signed-off-by: Junil Lee <junil0814.lee@lge.com>
>
> I believe Vlastimil deserves a credit here (at least Suggested-by)
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>
>
> now, can the compiler re-order
>
> record_obj(handle, free_obj);
> obj_free(pool, class, used_obj);
oh, disregard the last "re-ordering" commentary, sorry.
-ss
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 14:09 ` Minchan Kim
@ 2016-01-18 14:10 ` Vlastimil Babka
-1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-01-18 14:10 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel
On 01/18/2016 03:09 PM, Minchan Kim wrote:
>
> True.
> Let's go with this. I hope it's the last.
> Thanks, guys.
>
> From 389bbcbad9aba7d86a575b8c6ea3b8985cc801ea Mon Sep 17 00:00:00 2001
> From: Junil Lee <junil0814.lee@lge.com>
> Date: Mon, 18 Jan 2016 23:01:29 +0900
> Subject: [PATCH v5] zsmalloc: fix migrate_zspage-zs_free race condition
>
> record_obj() in migrate_zspage() does not preserve handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
> (accidentally) un-pins the handle, while migrate_zspage() still performs
> an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition with
> zs_free(), which can pin that handle by this time, so the handle becomes
> un-pinned.
>
> Schematically, it goes like this:
>
> CPU0                                         CPU1
> migrate_zspage
>   find_alloced_obj
>     trypin_tag
>       set HANDLE_PIN_BIT                     zs_free()
>                                                pin_tag()
>   obj_malloc() -- new object, no tag
>   record_obj() -- remove HANDLE_PIN_BIT        set HANDLE_PIN_BIT
>   unpin_tag() -- remove zs_free's HANDLE_PIN_BIT
>
> The race condition may result in a NULL pointer dereference:
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> PC is at get_zspage_mapping+0x0/0x24
> LR is at obj_free.isra.22+0x64/0x128
> Call trace:
> [<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> [<ffffffc0001a4918>] zs_free+0x88/0x114
> [<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> [<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> [<ffffffc000196638>] swap_entry_free+0x278/0x294
> [<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> [<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> [<ffffffc000184350>] unmap_vmas+0x44/0x60
> [<ffffffc00018a53c>] exit_mmap+0x50/0x110
> [<ffffffc00009e408>] mmput+0x58/0xe0
> [<ffffffc0000a2854>] do_exit+0x320/0x8dc
> [<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> [<ffffffc0000ae1bc>] get_signal+0x538/0x580
> [<ffffffc000087e44>] do_signal+0x98/0x4b8
> [<ffffffc00008843c>] do_notify_resume+0x14/0x5c
>
> This patch keeps the lock bit in the migration path and updates the
> value atomically.
>
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: <stable@vger.kernel.org> [4.1+]
> ---
> mm/zsmalloc.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414cec220b..2d7c4c11fc63 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -309,7 +309,12 @@ static void free_handle(struct zs_pool *pool, unsigned long handle)
>
> static void record_obj(unsigned long handle, unsigned long obj)
> {
> - *(unsigned long *)handle = obj;
> + /*
> + * lsb of @obj represents handle lock while other bits
> + * represent object value the handle is pointing to, so
> + * updating shouldn't do store tearing.
> + */
> + WRITE_ONCE(*(unsigned long *)handle, obj);
> }
>
> /* zpool driver */
> @@ -1635,6 +1640,13 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> free_obj = obj_malloc(d_page, class, handle);
> zs_object_copy(free_obj, used_obj, class);
> index++;
> + /*
> + * record_obj updates handle's value to free_obj and it will
> + * invalidate the lock bit (i.e., HANDLE_PIN_BIT) of the handle, which
> + * breaks synchronization using pin_tag() (e.g., in zs_free()), so
> + * let's keep the lock bit.
> + */
> + free_obj |= BIT(HANDLE_PIN_BIT);
> record_obj(handle, free_obj);
> unpin_tag(handle);
> obj_free(pool, class, used_obj);
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 12:18 ` Vlastimil Babka
@ 2016-01-18 14:09 ` Minchan Kim
-1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18 14:09 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel
On Mon, Jan 18, 2016 at 01:18:31PM +0100, Vlastimil Babka wrote:
> On 01/18/2016 09:20 AM, Minchan Kim wrote:
> >On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
> >>On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> >>>On (01/18/16 16:11), Minchan Kim wrote:
> >>>[..]
> >>>>>so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> >>>>>barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> >>>>>so I don't think WRITE_ONCE() will help the compiler, or am I missing
> >>>>>something?
> >>>>
> >>>>We need two things
> >>>>2. memory barrier.
> >>>>
> >>>>As a compiler barrier, WRITE_ONCE works to prevent store tearing here
> >>>>by the compiler.
> >>>>However, if we omit unpin_tag here, we lose the memory barrier (e.g.,
> >>>>smp_mb), so another CPU could see stale data caused by CPU memory
> >>>>reordering.
> >>>
> >>>oh... good find! lost release semantic of unpin_tag()...
> >>
> >>Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> >>with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> >>release semantic, which would be a bit more effective, but I guess migration is
> >>not that critical path to be worth introducing it.
> >
> >WRITE_ONCE in record_obj would add more memory operations in obj_malloc
>
> A simple WRITE_ONCE would just add a compiler barrier. What you
> suggests below does indeed add more operations, which are actually
> needed just in the migration. What I suggested is the v2 approach of
> adding the PIN bit before calling record_obj, *and* simply doing a
> WRITE_ONCE in record_obj() to make sure the PIN bit is indeed
> applied *before* writing to the handle, and not as two separate
> writes.
>
> >but I don't feel it's too heavy in this phase so,
>
> I'm afraid it's dangerous for the usage of record_obj() in
> zs_malloc() where the handle is freshly allocated by alloc_handle().
> Are we sure the bit is not set?
>
> The code in alloc_handle() is:
> return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
> pool->flags & ~__GFP_HIGHMEM);
>
> There's no explicit __GFP_ZERO, so the handles are not guaranteed to
> be allocated empty? And expecting all zpool users to include
> __GFP_ZERO in flags would be too subtle and error prone.
True.
Let's go with this. I hope it's the last.
Thanks, guys.
>From 389bbcbad9aba7d86a575b8c6ea3b8985cc801ea Mon Sep 17 00:00:00 2001
From: Junil Lee <junil0814.lee@lge.com>
Date: Mon, 18 Jan 2016 23:01:29 +0900
Subject: [PATCH v5] zsmalloc: fix migrate_zspage-zs_free race condition
record_obj() in migrate_zspage() does not preserve handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on that handle.
This additional explicit unpin_tag() introduces a race condition with
zs_free(), which can pin that handle by this time, so the handle becomes
un-pinned.
Schematically, it goes like this:
CPU0                                    CPU1
migrate_zspage
  find_alloced_obj
    trypin_tag
      set HANDLE_PIN_BIT                zs_free()
                                          pin_tag() -- waits on HANDLE_PIN_BIT
  obj_malloc() -- new object, no tag
  record_obj() -- removes HANDLE_PIN_BIT
                                          pin_tag() -- sets HANDLE_PIN_BIT
  unpin_tag() -- removes zs_free's HANDLE_PIN_BIT
The race condition may result in a NULL pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
PC is at get_zspage_mapping+0x0/0x24
LR is at obj_free.isra.22+0x64/0x128
Call trace:
[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
[<ffffffc0001a4918>] zs_free+0x88/0x114
[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
[<ffffffc000196638>] swap_entry_free+0x278/0x294
[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
[<ffffffc000184350>] unmap_vmas+0x44/0x60
[<ffffffc00018a53c>] exit_mmap+0x50/0x110
[<ffffffc00009e408>] mmput+0x58/0xe0
[<ffffffc0000a2854>] do_exit+0x320/0x8dc
[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
[<ffffffc0000ae1bc>] get_signal+0x538/0x580
[<ffffffc000087e44>] do_signal+0x98/0x4b8
[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
This patch keeps the lock bit set across record_obj() in the migration
path and updates the handle's value atomically, in a single non-torn store.
Signed-off-by: Junil Lee <junil0814.lee@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org> [4.1+]
---
mm/zsmalloc.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7414cec220b..2d7c4c11fc63 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -309,7 +309,12 @@ static void free_handle(struct zs_pool *pool, unsigned long handle)
static void record_obj(unsigned long handle, unsigned long obj)
{
- *(unsigned long *)handle = obj;
+ /*
+ * The lsb of @obj represents the handle lock while the other bits
+ * represent the object value the handle points to, so the
+ * update must not do store tearing.
+ */
+ WRITE_ONCE(*(unsigned long *)handle, obj);
}
/* zpool driver */
@@ -1635,6 +1640,13 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
free_obj = obj_malloc(d_page, class, handle);
zs_object_copy(free_obj, used_obj, class);
index++;
+ /*
+ * record_obj() updates the handle's value to free_obj, which would
+ * invalidate the lock bit (i.e. HANDLE_PIN_BIT) of the handle and
+ * break synchronization via pin_tag() (e.g. in zs_free()), so
+ * let's keep the lock bit set.
+ */
+ free_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, free_obj);
unpin_tag(handle);
obj_free(pool, class, used_obj);
--
1.9.1
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 8:20 ` Minchan Kim
@ 2016-01-18 12:18 ` Vlastimil Babka
-1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-01-18 12:18 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel
On 01/18/2016 09:20 AM, Minchan Kim wrote:
> On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
>> On 18.1.2016 8:39, Sergey Senozhatsky wrote:
>>> On (01/18/16 16:11), Minchan Kim wrote:
>>> [..]
>>>>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
>>>>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
>>>>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
>>>>> something?
>>>>
>>>> We need two things
>>>> 2. memory barrier.
>>>>
>>>> As compiler barrier, WRITE_ONCE works to prevent store tearing here
>>>> by compiler.
>>>> However, if we omit unpin_tag here, we lose memory barrier(e,g, smp_mb)
>>>> so another CPU could see stale data caused CPU memory reordering.
>>>
>>> oh... good find! lost release semantic of unpin_tag()...
>>
>> Ah, release semantic, good point indeed. OK then we need the v2 approach again,
>> with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
>> release semantic, which would be a bit more effective, but I guess migration is
>> not that critical path to be worth introducing it.
>
> WRITE_ONCE in record_obj would add more memory operations in obj_malloc
A simple WRITE_ONCE would just add a compiler barrier. What you suggest
below does indeed add more operations, which are actually needed just in
the migration. What I suggested is the v2 approach of adding the PIN bit
before calling record_obj, *and* simply doing a WRITE_ONCE in
record_obj() to make sure the PIN bit is indeed applied *before* writing
to the handle, and not as two separate writes.
> but I don't feel it's too heavy in this phase so,
I'm afraid it's dangerous for the usage of record_obj() in zs_malloc()
where the handle is freshly allocated by alloc_handle(). Are we sure the
bit is not set?
The code in alloc_handle() is:
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
pool->flags & ~__GFP_HIGHMEM);
There's no explicit __GFP_ZERO, so the handles are not guaranteed to be
allocated empty? And expecting all zpool users to include __GFP_ZERO in
flags would be too subtle and error prone.
> How about this? Junil, Could you resend patch if others agree this?
> Thanks.
>
> +/*
> + * record_obj updates handle's value to free_obj and it shouldn't
> + * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
> + * it breaks synchronization using pin_tag(e,g, zs_free) so let's
> + * keep the lock bit.
> + */
> static void record_obj(unsigned long handle, unsigned long obj)
> {
> - *(unsigned long *)handle = obj;
> + int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
> + unsigned long val = obj | locked;
> +
> + /*
> + * WRITE_ONCE could prevent store tearing like below
> + * *(unsigned long *)handle = free_obj
> + * *(unsigned long *)handle |= locked;
> + */
> + WRITE_ONCE(*(unsigned long *)handle, val);
> }
>
>
>
>>
>> Thanks,
>> Vlastimil
>>
>>>
>>> -ss
>>>
>>
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 8:20 ` Minchan Kim
@ 2016-01-18 11:08 ` Sergey Senozhatsky
-1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 11:08 UTC (permalink / raw)
To: Minchan Kim
Cc: Vlastimil Babka, Sergey Senozhatsky, Junil Lee, ngupta, akpm,
linux-mm, linux-kernel
On (01/18/16 17:20), Minchan Kim wrote:
[..]
> > > oh... good find! lost release semantic of unpin_tag()...
> >
> > Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> > with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> > release semantic, which would be a bit more effective, but I guess migration is
> > not that critical path to be worth introducing it.
>
> WRITE_ONCE in record_obj would add more memory operations in obj_malloc
> but I don't feel it's too heavy in this phase so,
>
> How about this? Junil, Could you resend patch if others agree this?
> Thanks.
>
> +/*
> + * record_obj updates handle's value to free_obj and it shouldn't
> + * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
> + * it breaks synchronization using pin_tag(e,g, zs_free) so let's
> + * keep the lock bit.
> + */
> static void record_obj(unsigned long handle, unsigned long obj)
> {
> - *(unsigned long *)handle = obj;
> + int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
> + unsigned long val = obj | locked;
> +
> + /*
> + * WRITE_ONCE could prevent store tearing like below
> + * *(unsigned long *)handle = free_obj
> + * *(unsigned long *)handle |= locked;
> + */
> + WRITE_ONCE(*(unsigned long *)handle, val);
> }
given that memory barriers are also compiler barriers, wouldn't
record_obj()
{
barrier
*(unsigned long *)handle) = new
}
suffice?
-ss
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 7:54 ` Vlastimil Babka
@ 2016-01-18 8:20 ` Minchan Kim
-1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18 8:20 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm, linux-kernel
On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
> On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> > On (01/18/16 16:11), Minchan Kim wrote:
> > [..]
> >>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> >>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> >>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
> >>> something?
> >>
> >> We need two things
> >> 2. memory barrier.
> >>
> >> As compiler barrier, WRITE_ONCE works to prevent store tearing here
> >> by compiler.
> >> However, if we omit unpin_tag here, we lose memory barrier(e,g, smp_mb)
> >> so another CPU could see stale data caused CPU memory reordering.
> >
> > oh... good find! lost release semantic of unpin_tag()...
>
> Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> release semantic, which would be a bit more effective, but I guess migration is
> not that critical path to be worth introducing it.
WRITE_ONCE in record_obj would add more memory operations in obj_malloc
but I don't feel it's too heavy in this phase so,
How about this? Junil, could you resend the patch if others agree with it?
Thanks.
+/*
+ * record_obj updates handle's value to free_obj and it shouldn't
+ * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
+ * it breaks synchronization using pin_tag(e,g, zs_free) so let's
+ * keep the lock bit.
+ */
static void record_obj(unsigned long handle, unsigned long obj)
{
- *(unsigned long *)handle = obj;
+ int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
+ unsigned long val = obj | locked;
+
+ /*
+ * WRITE_ONCE could prevent store tearing like below
+ * *(unsigned long *)handle = free_obj
+ * *(unsigned long *)handle |= locked;
+ */
+ WRITE_ONCE(*(unsigned long *)handle, val);
}
>
> Thanks,
> Vlastimil
>
> >
> > -ss
> >
>
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 7:39 ` Sergey Senozhatsky
@ 2016-01-18 7:54 ` Vlastimil Babka
-1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-01-18 7:54 UTC (permalink / raw)
To: Sergey Senozhatsky, Minchan Kim
Cc: Junil Lee, ngupta, akpm, linux-mm, linux-kernel
On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> On (01/18/16 16:11), Minchan Kim wrote:
> [..]
>>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
>>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
>>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
>>> something?
>>
>> We need two things
>> 2. memory barrier.
>>
>> As compiler barrier, WRITE_ONCE works to prevent store tearing here
>> by compiler.
>> However, if we omit unpin_tag here, we lose memory barrier(e,g, smp_mb)
>> so another CPU could see stale data caused CPU memory reordering.
>
> oh... good find! lost release semantic of unpin_tag()...
Ah, release semantic, good point indeed. OK then we need the v2 approach again,
with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
release semantic, which would be a bit more effective, but I guess migration is
not that critical path to be worth introducing it.
Thanks,
Vlastimil
>
> -ss
>
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 7:11 ` Minchan Kim
@ 2016-01-18 7:39 ` Sergey Senozhatsky
-1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 7:39 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Junil Lee, ngupta, akpm, linux-mm,
linux-kernel, vbabka
On (01/18/16 16:11), Minchan Kim wrote:
[..]
> > so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> > barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> > so I don't think WRITE_ONCE() will help the compiler, or am I missing
> > something?
>
> We need two things
thanks.
> 1. compiler barrier
um... probably gcc can reorder that sequence to something like this
*handle = obj_malloc() /* unpin the object */
zs_object_copy(*handle, used_obj, class) /* now use it*/
ok.
> 2. memory barrier.
>
> As compiler barrier, WRITE_ONCE works to prevent store tearing here
> by compiler.
> However, if we omit unpin_tag here, we lose memory barrier(e,g, smp_mb)
> so another CPU could see stale data caused CPU memory reordering.
oh... good find! lost release semantic of unpin_tag()...
-ss
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 6:54 ` Sergey Senozhatsky
@ 2016-01-18 7:11 ` Minchan Kim
-1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18 7:11 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Junil Lee, ngupta, akpm, linux-mm, linux-kernel, vbabka
On Mon, Jan 18, 2016 at 03:54:34PM +0900, Sergey Senozhatsky wrote:
> On (01/18/16 15:36), Minchan Kim wrote:
> [..]
> > > --- a/mm/zsmalloc.c
> > > +++ b/mm/zsmalloc.c
> > > @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > > free_obj = obj_malloc(d_page, class, handle);
> > > zs_object_copy(free_obj, used_obj, class);
> > > index++;
> > > + /* This also effectively unpins the handle */
> >
> > As a reply to Vlastimil, I replied that I guess it doesn't work.
> > We shouldn't omit unpin_tag and we should add WRITE_ONCE in
> > record_obj.
> >
> > As well, it's worth to double check with locking guys.
> > I will send updated version.
>
> but would WRITE_ONCE() tell the compiler that there is a dependency?
> __write_once_size() does not even issue a barrier for sizes <= 8 (our
> case).
>
> include/linux/compiler.h
>
> static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> {
> switch (size) {
> case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
> case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
> default:
> barrier();
> __builtin_memcpy((void *)p, (const void *)res, size);
> barrier();
> }
> }
>
> #define WRITE_ONCE(x, val) \
> ({ \
> union { typeof(x) __val; char __c[1]; } __u = \
> { .__val = (__force typeof(x)) (val) }; \
> __write_once_size(&(x), __u.__c, sizeof(x)); \
> __u.__val; \
> })
>
>
> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> so I don't think WRITE_ONCE() will help the compiler, or am I missing
> something?
We need two things
1. compiler barrier
2. memory barrier.
As a compiler barrier, WRITE_ONCE works to prevent store tearing here
by the compiler.
However, if we omit unpin_tag here, we lose the memory barrier (e.g. smp_mb),
so another CPU could see stale data caused by CPU memory reordering.
>
> .... add a barrier() to record_obj()?
>
> -ss
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 6:36 ` Minchan Kim
@ 2016-01-18 6:54 ` Sergey Senozhatsky
-1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 6:54 UTC (permalink / raw)
To: Minchan Kim
Cc: Junil Lee, ngupta, sergey.senozhatsky.work, akpm, linux-mm,
linux-kernel, vbabka
On (01/18/16 15:36), Minchan Kim wrote:
[..]
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > free_obj = obj_malloc(d_page, class, handle);
> > zs_object_copy(free_obj, used_obj, class);
> > index++;
> > + /* This also effectively unpins the handle */
>
> As I replied to Vlastimil, I guess it doesn't work.
> We shouldn't omit unpin_tag() and we should add WRITE_ONCE() in
> record_obj().
>
> Also, it's worth double-checking with the locking folks.
> I will send an updated version.
but would WRITE_ONCE() tell the compiler that there is a dependency?
__write_once_size() does not even issue a barrier for sizes <= 8 (our
case).
include/linux/compiler.h
static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
default:
barrier();
__builtin_memcpy((void *)p, (const void *)res, size);
barrier();
}
}
#define WRITE_ONCE(x, val) \
({ \
union { typeof(x) __val; char __c[1]; } __u = \
{ .__val = (__force typeof(x)) (val) }; \
__write_once_size(&(x), __u.__c, sizeof(x)); \
__u.__val; \
})
so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
so I don't think WRITE_ONCE() will help the compiler, or am I missing
something?
.... add a barrier() to record_obj()?
-ss
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 5:39 ` Junil Lee
@ 2016-01-18 6:36 ` Minchan Kim
-1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2016-01-18 6:36 UTC (permalink / raw)
To: Junil Lee
Cc: ngupta, sergey.senozhatsky.work, akpm, linux-mm, linux-kernel, vbabka
Hello Junil,
On Mon, Jan 18, 2016 at 02:39:56PM +0900, Junil Lee wrote:
> record_obj() in migrate_zspage() does not preserve handle's
> HANDLE_PIN_BIT, set by find_aloced_obj()->trypin_tag(), and implicitly
> (accidentally) un-pins the handle, while migrate_zspage() still performs
> an explicit unpin_tag() on the that handle.
> This additional explicit unpin_tag() introduces a race condition with
> zs_free(), which can pin that handle by this time, so the handle becomes
> un-pinned.
>
> Schematically, it goes like this:
>
> CPU0 CPU1
> migrate_zspage
> find_alloced_obj
> trypin_tag
> set HANDLE_PIN_BIT zs_free()
> pin_tag()
> obj_malloc() -- new object, no tag
> record_obj() -- remove HANDLE_PIN_BIT set HANDLE_PIN_BIT
> unpin_tag() -- remove zs_free's HANDLE_PIN_BIT
>
> The race condition may result in a NULL pointer dereference:
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> PC is at get_zspage_mapping+0x0/0x24
> LR is at obj_free.isra.22+0x64/0x128
> Call trace:
> [<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> [<ffffffc0001a4918>] zs_free+0x88/0x114
> [<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> [<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> [<ffffffc000196638>] swap_entry_free+0x278/0x294
> [<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> [<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> [<ffffffc000184350>] unmap_vmas+0x44/0x60
> [<ffffffc00018a53c>] exit_mmap+0x50/0x110
> [<ffffffc00009e408>] mmput+0x58/0xe0
> [<ffffffc0000a2854>] do_exit+0x320/0x8dc
> [<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> [<ffffffc0000ae1bc>] get_signal+0x538/0x580
> [<ffffffc000087e44>] do_signal+0x98/0x4b8
> [<ffffffc00008843c>] do_notify_resume+0x14/0x5c
>
> Fix the race by removing explicit unpin_tag() from migrate_zspage().
>
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
> ---
> mm/zsmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414ce..0acfa20 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> free_obj = obj_malloc(d_page, class, handle);
> zs_object_copy(free_obj, used_obj, class);
> index++;
> + /* This also effectively unpins the handle */
As I replied to Vlastimil, I guess it doesn't work.
We shouldn't omit unpin_tag() and we should add WRITE_ONCE() in
record_obj().
Also, it's worth double-checking with the locking folks.
I will send an updated version.
Thanks.
> record_obj(handle, free_obj);
> - unpin_tag(handle);
> obj_free(pool, class, used_obj);
> }
>
> --
> 2.6.2
>
* Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
2016-01-18 5:39 ` Junil Lee
@ 2016-01-18 6:13 ` Sergey Senozhatsky
-1 siblings, 0 replies; 30+ messages in thread
From: Sergey Senozhatsky @ 2016-01-18 6:13 UTC (permalink / raw)
To: Junil Lee
Cc: minchan, ngupta, sergey.senozhatsky.work, akpm, linux-mm,
linux-kernel, vbabka
On (01/18/16 14:39), Junil Lee wrote:
> record_obj() in migrate_zspage() does not preserve the handle's
> HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
> (accidentally) un-pins the handle, while migrate_zspage() still performs
> an explicit unpin_tag() on that handle.
> This additional explicit unpin_tag() introduces a race condition with
> zs_free(), which can have pinned that handle by this time; the extra
> unpin_tag() then un-pins it out from under zs_free().
>
> Schematically, it goes like this:
>
> CPU0 CPU1
> migrate_zspage
> find_alloced_obj
> trypin_tag
> set HANDLE_PIN_BIT zs_free()
> pin_tag()
> obj_malloc() -- new object, no tag
> record_obj() -- remove HANDLE_PIN_BIT set HANDLE_PIN_BIT
> unpin_tag() -- remove zs_free's HANDLE_PIN_BIT
>
> The race condition may result in a NULL pointer dereference:
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
> PC is at get_zspage_mapping+0x0/0x24
> LR is at obj_free.isra.22+0x64/0x128
> Call trace:
> [<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
> [<ffffffc0001a4918>] zs_free+0x88/0x114
> [<ffffffc00053ae54>] zram_free_page+0x64/0xcc
> [<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
> [<ffffffc000196638>] swap_entry_free+0x278/0x294
> [<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
> [<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
> [<ffffffc000184350>] unmap_vmas+0x44/0x60
> [<ffffffc00018a53c>] exit_mmap+0x50/0x110
> [<ffffffc00009e408>] mmput+0x58/0xe0
> [<ffffffc0000a2854>] do_exit+0x320/0x8dc
> [<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
> [<ffffffc0000ae1bc>] get_signal+0x538/0x580
> [<ffffffc000087e44>] do_signal+0x98/0x4b8
> [<ffffffc00008843c>] do_notify_resume+0x14/0x5c
>
> Fix the race by removing explicit unpin_tag() from migrate_zspage().
>
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.1+]
migrate_zspage() was introduced in 4.1
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/mm/zsmalloc.c?id=refs/tags/v4.1
thank you.
-ss
> ---
> mm/zsmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7414ce..0acfa20 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
> free_obj = obj_malloc(d_page, class, handle);
> zs_object_copy(free_obj, used_obj, class);
> index++;
> + /* This also effectively unpins the handle */
> record_obj(handle, free_obj);
> - unpin_tag(handle);
> obj_free(pool, class, used_obj);
> }
>
> --
> 2.6.2
>
* [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
@ 2016-01-18 5:39 ` Junil Lee
0 siblings, 0 replies; 30+ messages in thread
From: Junil Lee @ 2016-01-18 5:39 UTC (permalink / raw)
To: minchan, ngupta
Cc: sergey.senozhatsky.work, akpm, linux-mm, linux-kernel, vbabka, Junil Lee
record_obj() in migrate_zspage() does not preserve the handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on that handle.
This additional explicit unpin_tag() introduces a race condition with
zs_free(), which can have pinned that handle by this time; the extra
unpin_tag() then un-pins it out from under zs_free().
Schematically, it goes like this:
CPU0 CPU1
migrate_zspage
find_alloced_obj
trypin_tag
set HANDLE_PIN_BIT zs_free()
pin_tag()
obj_malloc() -- new object, no tag
record_obj() -- remove HANDLE_PIN_BIT set HANDLE_PIN_BIT
unpin_tag() -- remove zs_free's HANDLE_PIN_BIT
The race condition may result in a NULL pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
PC is at get_zspage_mapping+0x0/0x24
LR is at obj_free.isra.22+0x64/0x128
Call trace:
[<ffffffc0001a3aa8>] get_zspage_mapping+0x0/0x24
[<ffffffc0001a4918>] zs_free+0x88/0x114
[<ffffffc00053ae54>] zram_free_page+0x64/0xcc
[<ffffffc00053af4c>] zram_slot_free_notify+0x90/0x108
[<ffffffc000196638>] swap_entry_free+0x278/0x294
[<ffffffc000199008>] free_swap_and_cache+0x38/0x11c
[<ffffffc0001837ac>] unmap_single_vma+0x480/0x5c8
[<ffffffc000184350>] unmap_vmas+0x44/0x60
[<ffffffc00018a53c>] exit_mmap+0x50/0x110
[<ffffffc00009e408>] mmput+0x58/0xe0
[<ffffffc0000a2854>] do_exit+0x320/0x8dc
[<ffffffc0000a3cb4>] do_group_exit+0x44/0xa8
[<ffffffc0000ae1bc>] get_signal+0x538/0x580
[<ffffffc000087e44>] do_signal+0x98/0x4b8
[<ffffffc00008843c>] do_notify_resume+0x14/0x5c
Fix the race by removing explicit unpin_tag() from migrate_zspage().
Signed-off-by: Junil Lee <junil0814.lee@lge.com>
---
mm/zsmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7414ce..0acfa20 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1635,8 +1635,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
free_obj = obj_malloc(d_page, class, handle);
zs_object_copy(free_obj, used_obj, class);
index++;
+ /* This also effectively unpins the handle */
record_obj(handle, free_obj);
- unpin_tag(handle);
obj_free(pool, class, used_obj);
}
--
2.6.2
end of thread, other threads:[~2016-01-18 14:11 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-18 1:15 [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition Junil Lee
2016-01-18 4:14 ` Sergey Senozhatsky
2016-01-18 4:17 ` Sergey Senozhatsky
2016-01-18 5:39 Junil Lee
2016-01-18 6:13 ` Sergey Senozhatsky
2016-01-18 6:36 ` Minchan Kim
2016-01-18 6:54 ` Sergey Senozhatsky
2016-01-18 7:11 ` Minchan Kim
2016-01-18 7:39 ` Sergey Senozhatsky
2016-01-18 7:54 ` Vlastimil Babka
2016-01-18 8:20 ` Minchan Kim
2016-01-18 11:08 ` Sergey Senozhatsky
2016-01-18 12:18 ` Vlastimil Babka
2016-01-18 14:09 ` Minchan Kim
2016-01-18 14:10 ` Vlastimil Babka