linux-kernel.vger.kernel.org archive mirror
* [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount
@ 2020-03-27  3:10 Qian Cai
  2020-03-27  9:37 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2020-03-27  3:10 UTC (permalink / raw)
  To: peterz, mingo; +Cc: will, dbueso, juri.lelli, longman, linux-kernel, Qian Cai

There are some memory leaks due to a missing put_task_struct().

Fixes: 7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem")
Signed-off-by: Qian Cai <cai@lca.pw>
---
 kernel/locking/percpu-rwsem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index a008a1ba21a7..6f487e5d923f 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -123,8 +123,10 @@ static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
 	struct percpu_rw_semaphore *sem = key;
 
 	/* concurrent against percpu_down_write(), can get stolen */
-	if (!__percpu_rwsem_trylock(sem, reader))
+	if (!__percpu_rwsem_trylock(sem, reader)) {
+		put_task_struct(p);
 		return 1;
+	}
 
 	list_del_init(&wq_entry->entry);
 	smp_store_release(&wq_entry->private, NULL);
-- 
2.21.0 (Apple Git-122.2)



* Re: [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount
  2020-03-27  3:10 [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount Qian Cai
@ 2020-03-27  9:37 ` Peter Zijlstra
  2020-03-27 10:19   ` Qian Cai
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-03-27  9:37 UTC (permalink / raw)
  To: Qian Cai; +Cc: mingo, will, dbueso, juri.lelli, longman, linux-kernel

On Thu, Mar 26, 2020 at 11:10:57PM -0400, Qian Cai wrote:
> There are some memory leaks due to a missing put_task_struct().

This is an absolutely inadequate changelog. It does not explain what
the actual race is or why this patch is correct.

> Fixes: 7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem")
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  kernel/locking/percpu-rwsem.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index a008a1ba21a7..6f487e5d923f 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -123,8 +123,10 @@ static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
>  	struct percpu_rw_semaphore *sem = key;
>  
>  	/* concurrent against percpu_down_write(), can get stolen */
> -	if (!__percpu_rwsem_trylock(sem, reader))
> +	if (!__percpu_rwsem_trylock(sem, reader)) {
> +		put_task_struct(p);
>  		return 1;
> +	}


If the trylock fails, someone else got the lock and we remain on the
waitqueue. It seems like a very bad idea to put the task while it
remains on the waitqueue, no?

>  
>  	list_del_init(&wq_entry->entry);
>  	smp_store_release(&wq_entry->private, NULL);
> -- 
> 2.21.0 (Apple Git-122.2)
> 


* Re: [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount
  2020-03-27  9:37 ` Peter Zijlstra
@ 2020-03-27 10:19   ` Qian Cai
  2020-03-27 20:47     ` Memory leaks due to "locking/percpu-rwsem: Remove the embedded rwsem" Qian Cai
  2020-03-30 11:18     ` [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Qian Cai @ 2020-03-27 10:19 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, will, dbueso, juri.lelli, longman, linux-kernel



> On Mar 27, 2020, at 5:37 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> If the trylock fails, someone else got the lock and we remain on the
> waitqueue. It seems like a very bad idea to put the task while it
> remains on the waitqueue, no?

Interesting, I thought this was more straightforward to see, but I may
be wrong as always. At the beginning of percpu_rwsem_wake_function()
it calls get_task_struct(), but if the trylock failed, it will remain
in the waitqueue. However, it will run percpu_rwsem_wake_function()
again with get_task_struct() to increase the refcount. Can you
enlighten me where it will call put_task_struct() in waitqueue or
elsewhere to balance the refcount in this case?
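
The imbalance described above can be illustrated with a minimal
userspace sketch. It is a model written for illustration, not kernel
code: struct task_model, get_task(), put_task(), wake_function_model()
and leaked_refs() are hypothetical stand-ins, and the assumption that
the success path drops its reference after the wakeup is inferred from
the discussion rather than visible in the patch context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct: only the refcount is modelled. */
struct task_model {
	int usage;
};

static struct task_model *get_task(struct task_model *t)
{
	t->usage++;
	return t;
}

static void put_task(struct task_model *t)
{
	assert(t->usage > 0);
	t->usage--;
}

/*
 * Models the wake function before the fix: the reference is taken on
 * entry but only dropped on the path where the trylock succeeds.
 * Returns true if the waiter would have been woken.
 */
static bool wake_function_model(struct task_model *waiter, bool trylock_ok)
{
	struct task_model *p = get_task(waiter);

	if (!trylock_ok)
		return false;	/* waiter stays queued, reference is kept */

	/* dequeue + wake_up_process(p) would happen here (assumed) */
	put_task(p);
	return true;
}

/* Number of references left over after N failed attempts and one success. */
int leaked_refs(int failed_attempts)
{
	struct task_model t = { .usage = 1 };	/* the task's own reference */
	int i;

	for (i = 0; i < failed_attempts; i++)
		wake_function_model(&t, false);
	wake_function_model(&t, true);

	return t.usage - 1;
}
```

In this model every failed trylock leaves one extra reference behind,
so leaked_refs(3) returns 3 and the task could never be freed.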


* Memory leaks due to "locking/percpu-rwsem: Remove the embedded rwsem"
  2020-03-27 10:19   ` Qian Cai
@ 2020-03-27 20:47     ` Qian Cai
  2020-03-30 11:18     ` [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Qian Cai @ 2020-03-27 20:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, dbueso, juri.lelli, longman, linux-kernel



> On Mar 27, 2020, at 6:19 AM, Qian Cai <cai@lca.pw> wrote:
> 
> 
> 
>> On Mar 27, 2020, at 5:37 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> If the trylock fails, someone else got the lock and we remain on the
>> waitqueue. It seems like a very bad idea to put the task while it
>> remains on the waitqueue, no?
> 
> Interesting, I thought this was more straightforward to see, but I may
> be wrong as always. At the beginning of percpu_rwsem_wake_function()
> it calls get_task_struct(), but if the trylock failed, it will remain
> in the waitqueue. However, it will run percpu_rwsem_wake_function()
> again with get_task_struct() to increase the refcount. Can you
> enlighten me where it will call put_task_struct() in waitqueue or
> elsewhere to balance the refcount in this case?

I am pretty confident that the linux-next commit

7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem")

introduced memory leaks. Here is a debugging patch:

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index a008a1ba21a7..857602ef54f1 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -123,8 +123,10 @@ static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
 	struct percpu_rw_semaphore *sem = key;
 
 	/* concurrent against percpu_down_write(), can get stolen */
-	if (!__percpu_rwsem_trylock(sem, reader))
+	if (!__percpu_rwsem_trylock(sem, reader)) {
+		printk("KK __percpu_rwsem_trylock\n");
 		return 1;
+	}
 
 	list_del_init(&wq_entry->entry);
 	smp_store_release(&wq_entry->private, NULL);

Once those printk()s trigger, task_struct leaks show up:

unreferenced object 0xc000200df1422280 (size 8192):
  comm "read_all", pid 12975, jiffies 4297309144 (age 5351.480s)
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000f5c5fa2d>] copy_process+0x26c/0x1920
    [<0000000099229290>] _do_fork+0xac/0xb20
    [<00000000d40a7825>] __do_sys_clone+0x98/0xe0
    [<00000000c7cd06a4>] ppc_clone+0x8/0xc
unreferenced object 0xc00020047ef8eb80 (size 120):
  comm "read_all", pid 12975, jiffies 4297309144 (age 5351.480s)
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000004def8a44>] prepare_creds+0x38/0x110
    [<0000000037a68116>] copy_creds+0xbc/0x1d0
    [<0000000016b7471c>] copy_process+0x454/0x1920
    [<0000000099229290>] _do_fork+0xac/0xb20
    [<00000000d40a7825>] __do_sys_clone+0x98/0xe0
    [<00000000c7cd06a4>] ppc_clone+0x8/0xc
unreferenced object 0xc000200d96f80800 (size 1384):
  comm "read_all", pid 12975, jiffies 4297309144 (age 5351.480s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    10 08 f8 96 0d 20 00 c0 10 08 f8 96 0d 20 00 c0  ..... ....... ..
  backtrace:
    [<000000008894d13b>] copy_process+0xa40/0x1920
    [<0000000099229290>] _do_fork+0xac/0xb20
    [<00000000d40a7825>] __do_sys_clone+0x98/0xe0
    [<00000000c7cd06a4>] ppc_clone+0x8/0xc
unreferenced object 0xc000001e91ba4000 (size 16384):
  comm "read_all", pid 12982, jiffies 4297309462 (age 5348.300s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000009689397b>] kzalloc.constprop.48+0x1c/0x30
    [<000000001753eb18>] task_numa_fault+0xac8/0x1260
    [<0000000047bb80b1>] __handle_mm_fault+0x12cc/0x1b00
    [<00000000c0a4c8ba>] handle_mm_fault+0x298/0x450
    [<000000003465b20d>] __do_page_fault+0x2b8/0xf90
    [<000000005037fec9>] handle_page_fault+0x10/0x30
unreferenced object 0xc0002015fe4aaa80 (size 8192):
  comm "read_all", pid 13157, jiffies 4297353979 (age 4903.130s)
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000f5c5fa2d>] copy_process+0x26c/0x1920
    [<0000000099229290>] _do_fork+0xac/0xb20
    [<00000000d40a7825>] __do_sys_clone+0x98/0xe0
    [<00000000c7cd06a4>] ppc_clone+0x8/0xc
unreferenced object 0xc00020047ef8f080 (size 120):
  comm "read_all", pid 13157, jiffies 4297353979 (age 4903.130s)
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000004def8a44>] prepare_creds+0x38/0x110
    [<0000000037a68116>] copy_creds+0xbc/0x1d0
    [<0000000016b7471c>] copy_process+0x454/0x1920
    [<0000000099229290>] _do_fork+0xac/0xb20
    [<00000000d40a7825>] __do_sys_clone+0x98/0xe0
    [<00000000c7cd06a4>] ppc_clone+0x8/0xc
unreferenced object 0xc0002012a9388f00 (size 1384):
  comm "read_all", pid 13157, jiffies 4297353979 (age 4903.130s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    10 8f 38 a9 12 20 00 c0 10 8f 38 a9 12 20 00 c0  ..8.. ....8.. ..
  backtrace:
    [<000000008894d13b>] copy_process+0xa40/0x1920
    [<0000000099229290>] _do_fork+0xac/0xb20
    [<00000000d40a7825>] __do_sys_clone+0x98/0xe0
    [<00000000c7cd06a4>] ppc_clone+0x8/0xc
unreferenced object 0xc000001c86704000 (size 16384):
  comm "read_all", pid 13164, jiffies 4297354081 (age 4902.110s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000009689397b>] kzalloc.constprop.48+0x1c/0x30
    [<000000001753eb18>] task_numa_fault+0xac8/0x1260
    [<0000000047bb80b1>] __handle_mm_fault+0x12cc/0x1b00
    [<00000000c0a4c8ba>] handle_mm_fault+0x298/0x450
    [<000000003465b20d>] __do_page_fault+0x2b8/0xf90
    [<000000005037fec9>] handle_page_fault+0x10/0x30


* Re: [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount
  2020-03-27 10:19   ` Qian Cai
  2020-03-27 20:47     ` Memory leaks due to "locking/percpu-rwsem: Remove the embedded rwsem" Qian Cai
@ 2020-03-30 11:18     ` Peter Zijlstra
  2020-03-30 13:18       ` Qian Cai
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-03-30 11:18 UTC (permalink / raw)
  To: Qian Cai; +Cc: mingo, will, dbueso, juri.lelli, longman, linux-kernel

On Fri, Mar 27, 2020 at 06:19:37AM -0400, Qian Cai wrote:
> 
> 
> > On Mar 27, 2020, at 5:37 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > If the trylock fails, someone else got the lock and we remain on the
> > waitqueue. It seems like a very bad idea to put the task while it
> > remains on the waitqueue, no?
> 
> Interesting, I thought this was more straightforward to see,

It is indeed as straightforward as you explain; but when doing 10
things at once, and having just dug through some low-level arch assembly
code for the previous email, even obvious things might sometimes need
a little explaining :/

So please, always try and err on the side of a little verbose when
writing Changelogs, esp. when concerning locking / concurrency, you
really can't be clear enough.

> but I may
> be wrong as always. At the beginning of percpu_rwsem_wake_function()
> it calls get_task_struct(), but if the trylock failed, it will remain
> in the waitqueue. However, it will run percpu_rwsem_wake_function()
> again with get_task_struct() to increase the refcount. Can you
> enlighten me where it will call put_task_struct() in waitqueue or
> elsewhere to balance the refcount in this case?

See, had that explanation been part of the Changelog, my brain would've
probably been able to kick itself in gear and actually spot the problem.

Yes, you're right.

That said, I wonder if we can just move the get_task_struct() call like
below; after all the race we're guarding against is percpu_rwsem_wait()
observing !private, terminating the wait and doing a quick exit() while
percpu_rwsem_wake_function() then does wake_up_process(p) as a
use-after-free.

Hmm?

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index a008a1ba21a7..8bbafe3e5203 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -118,14 +118,15 @@ static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
 				      unsigned int mode, int wake_flags,
 				      void *key)
 {
-	struct task_struct *p = get_task_struct(wq_entry->private);
 	bool reader = wq_entry->flags & WQ_FLAG_CUSTOM;
 	struct percpu_rw_semaphore *sem = key;
+	struct task_struct *p;
 
 	/* concurrent against percpu_down_write(), can get stolen */
 	if (!__percpu_rwsem_trylock(sem, reader))
 		return 1;
 
+	p = get_task_struct(wq_entry->private);
 	list_del_init(&wq_entry->entry);
 	smp_store_release(&wq_entry->private, NULL);
 

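The reordering in the diff above can be checked against a small
userspace model. This is a sketch under stated assumptions, not kernel
code: struct task_model, struct waiter_model, get_task(), put_task(),
wake_function_fixed() and usage_after_wakeups() are hypothetical names,
and the put after the wakeup is assumed from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields needed for the model. */
struct task_model {
	int usage;
};

struct waiter_model {
	struct task_model *private;	/* models wq_entry->private */
};

static struct task_model *get_task(struct task_model *t)
{
	t->usage++;
	return t;
}

static void put_task(struct task_model *t)
{
	assert(t->usage > 0);
	t->usage--;
}

/*
 * Models the proposed ordering: the reference is taken only once the
 * trylock has succeeded, so the early return has nothing to drop.
 */
static bool wake_function_fixed(struct waiter_model *w, bool trylock_ok)
{
	struct task_model *p;

	if (!trylock_ok)
		return false;	/* no reference taken on this path */

	p = get_task(w->private);
	w->private = NULL;	/* from here on the waiter may stop waiting */

	/* wake_up_process(p) would run here, covered by the reference */
	put_task(p);
	return true;
}

/* Refcount seen after N failed attempts followed by one success. */
int usage_after_wakeups(int failed_attempts)
{
	struct task_model t = { .usage = 1 };	/* the task's own reference */
	struct waiter_model w = { .private = &t };
	bool woken;
	int i;

	for (i = 0; i < failed_attempts; i++)
		wake_function_fixed(&w, false);
	woken = wake_function_fixed(&w, true);
	assert(woken && w.private == NULL);

	return t.usage;
}
```

In this model the count returns to 1 no matter how many trylock
attempts fail, while the reference is still held across the point
where the waiter could observe !private and exit.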

* Re: [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount
  2020-03-30 11:18     ` [PATCH -next] locking/percpu-rwsem: fix a task_struct refcount Peter Zijlstra
@ 2020-03-30 13:18       ` Qian Cai
  0 siblings, 0 replies; 6+ messages in thread
From: Qian Cai @ 2020-03-30 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, dbueso, juri.lelli, Waiman Long, linux-kernel



> On Mar 30, 2020, at 7:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Fri, Mar 27, 2020 at 06:19:37AM -0400, Qian Cai wrote:
>> 
>> 
>>> On Mar 27, 2020, at 5:37 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> If the trylock fails, someone else got the lock and we remain on the
>>> waitqueue. It seems like a very bad idea to put the task while it
>>> remains on the waitqueue, no?
>> 
>> Interesting, I thought this was more straightforward to see,
> 
> It is indeed as straightforward as you explain; but when doing 10
> things at once, and having just dug through some low-level arch assembly
> code for the previous email, even obvious things might sometimes need
> a little explaining :/
> 
> So please, always try and err on the side of a little verbose when
> writing Changelogs, esp. when concerning locking / concurrency, you
> really can't be clear enough.
> 
>> but I may
>> be wrong as always. At the beginning of percpu_rwsem_wake_function()
>> it calls get_task_struct(), but if the trylock failed, it will remain
>> in the waitqueue. However, it will run percpu_rwsem_wake_function()
>> again with get_task_struct() to increase the refcount. Can you
>> enlighten me where it will call put_task_struct() in waitqueue or
>> elsewhere to balance the refcount in this case?
> 
> See, had that explanation been part of the Changelog, my brain would've
> probably been able to kick itself in gear and actually spot the problem.
> 
> Yes, you're right.
> 
> That said, I wonder if we can just move the get_task_struct() call like
> below; after all the race we're guarding against is percpu_rwsem_wait()
> observing !private, terminating the wait and doing a quick exit() while
> percpu_rwsem_wake_function() then does wake_up_process(p) as a
> use-after-free.

Looks good to me. If no one has any objections, I’ll dust off the commit log
and send out a v2.

> 
> Hmm?
> 
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index a008a1ba21a7..8bbafe3e5203 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -118,14 +118,15 @@ static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
> 				      unsigned int mode, int wake_flags,
> 				      void *key)
> {
> -	struct task_struct *p = get_task_struct(wq_entry->private);
> 	bool reader = wq_entry->flags & WQ_FLAG_CUSTOM;
> 	struct percpu_rw_semaphore *sem = key;
> +	struct task_struct *p;
> 
> 	/* concurrent against percpu_down_write(), can get stolen */
> 	if (!__percpu_rwsem_trylock(sem, reader))
> 		return 1;
> 
> +	p = get_task_struct(wq_entry->private);
> 	list_del_init(&wq_entry->entry);
> 	smp_store_release(&wq_entry->private, NULL);
> 


