linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] racy access to p->mm in membarrier_global_expedited()
@ 2019-01-25 17:26 Jann Horn
  2019-01-28 14:15 ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Jann Horn @ 2019-01-25 17:26 UTC (permalink / raw)
  To: Mathieu Desnoyers, Paul E. McKenney
  Cc: kernel list, Thomas Gleixner, Peter Zijlstra (Intel)

membarrier_global_expedited() runs the following code (introduced in
commit c5f58bd58f43), protected only by an RCU read-side critical
section and the cpu_hotplug_lock:

        p = task_rcu_dereference(&cpu_rq(cpu)->curr);
        if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
                           MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
                if (!fallback)
                        __cpumask_set_cpu(cpu, tmpmask);
                else
                        smp_call_function_single(cpu, ipi_mb, NULL, 1);
        }

p->mm is not protected by either lock. This means that in theory, the
following races could occur:

1. If the compiler emitted two separate reads of ->mm, the second read
of p->mm could return a NULL pointer and crash.
2. If the mm is deallocated directly before the atomic_read() occurs,
the atomic_read() could access a freed pointer (I think?).

Neither of these are particularly likely - looking at the assembly of
a normal build, the first race doesn't exist because the compiler
optimizes the second read away, and the second race isn't going to
cause anything particularly interesting. Still, this should probably
be fixed...

As far as I can tell, you'll have to either take the task_lock()
around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
RCU to the lifetime of mm_struct. I'm not entirely sure what the
better fix is... probably task_lock() makes more sense?


To test the bug, I patched an extra delay into the code:

====================
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 3cd8a3a795d2..69cc52039576 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -14,6 +14,7 @@
  * GNU General Public License for more details.
  */
 #include "sched.h"
+#include <linux/delay.h>

 /*
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
@@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)

                rcu_read_lock();
                p = task_rcu_dereference(&cpu_rq(cpu)->curr);
-               if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
+               if (p && p->mm && (mdelay(100), 1) &&
(atomic_read(&p->mm->membarrier_state) &
                                   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
                        if (!fallback)
                                __cpumask_set_cpu(cpu, tmpmask);
====================

On a kernel with that patch applied, I ran this test code:

====================
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <linux/membarrier.h>
#include <err.h>

int main(void) {
  while (1) {
    printf("executing global expedited barrier...\n");
    int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
    if (res) err(1, "barrier");
  }
}
====================

That resulted in this splat:

[  212.697681] ==================================================================
[  212.700582] BUG: KASAN: null-ptr-deref in
membarrier_global_expedited+0x15f/0x220
[  212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177

[  212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
[  212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[  212.712263] Call Trace:
[  212.713177]  dump_stack+0x71/0xab
[  212.714375]  ? membarrier_global_expedited+0x15f/0x220
[  212.716236]  ? membarrier_global_expedited+0x15f/0x220
[  212.718099]  kasan_report+0x176/0x192
[  212.719445]  ? finish_task_switch+0x340/0x3d0
[  212.721057]  ? membarrier_global_expedited+0x15f/0x220
[  212.722921]  membarrier_global_expedited+0x15f/0x220
[  212.724696]  ? ipi_mb+0x10/0x10
[  212.725816]  ? vfs_write+0x120/0x230
[  212.727113]  ? __ia32_sys_read+0x50/0x50
[  212.728596]  __x64_sys_membarrier+0x85/0xf0
[  212.730056]  do_syscall_64+0x73/0x160
[  212.731428]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  212.733236] RIP: 0033:0x7fbe8747e229
[  212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
01 48
[  212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
0000000000000144
[  212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
[  212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
[  212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
[  212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
[  212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
[  212.756305] ==================================================================

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [BUG] racy access to p->mm in membarrier_global_expedited()
  2019-01-25 17:26 [BUG] racy access to p->mm in membarrier_global_expedited() Jann Horn
@ 2019-01-28 14:15 ` Paul E. McKenney
  2019-01-28 16:10   ` Mathieu Desnoyers
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2019-01-28 14:15 UTC (permalink / raw)
  To: Jann Horn
  Cc: Mathieu Desnoyers, kernel list, Thomas Gleixner, Peter Zijlstra (Intel)

On Fri, Jan 25, 2019 at 06:26:47PM +0100, Jann Horn wrote:
> membarrier_global_expedited() runs the following code (introduced in
> commit c5f58bd58f43), protected only by an RCU read-side critical
> section and the cpu_hotplug_lock:
> 
>         p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>         if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
>                            MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
>                 if (!fallback)
>                         __cpumask_set_cpu(cpu, tmpmask);
>                 else
>                         smp_call_function_single(cpu, ipi_mb, NULL, 1);
>         }
> 
> p->mm is not protected by either lock. This means that in theory, the
> following races could occur:
> 
> 1. If the compiler emitted two separate reads of ->mm, the second read
> of p->mm could return a NULL pointer and crash.
> 2. If the mm is deallocated directly before the atomic_read() occurs,
> the atomic_read() could access a freed pointer (I think?).
> 
> Neither of these are particularly likely - looking at the assembly of
> a normal build, the first race doesn't exist because the compiler
> optimizes the second read away, and the second race isn't going to
> cause anything particularly interesting. Still, this should probably
> be fixed...
> 
> As far as I can tell, you'll have to either take the task_lock()
> around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
> RCU to the lifetime of mm_struct. I'm not entirely sure what the
> better fix is... probably task_lock() makes more sense?

Ouch!!!

Acquiring task_lock() would work, but would be a global lock.
This could be addressed to some extent by batching concurrent
membarrier_global_expedited() invocations, so that one call to
membarrier_global_expedited() does the job for the set of concurrent
calls.  The usual approach would use a counter, a pair of wait queues,
and a kthread.

I must defer to the mm guys on adding RCU to the lifetime of mm_struct.

Another approach would be to put the MEMBARRIER_STATE_GLOBAL_EXPEDITED
in the task structure.  Yet another approach would be to acquire the
runqueue lock, thus preventing the task from switching away -- except
that it might be in the middle of exit(), so never mind.

Other approaches?

							Thanx, Paul

> To test the bug, I patched an extra delay into the code:
> 
> ====================
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 3cd8a3a795d2..69cc52039576 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -14,6 +14,7 @@
>   * GNU General Public License for more details.
>   */
>  #include "sched.h"
> +#include <linux/delay.h>
> 
>  /*
>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> @@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)
> 
>                 rcu_read_lock();
>                 p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> -               if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
> +               if (p && p->mm && (mdelay(100), 1) &&
> (atomic_read(&p->mm->membarrier_state) &
>                                    MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
>                         if (!fallback)
>                                 __cpumask_set_cpu(cpu, tmpmask);
> ====================
> 
> On a kernel with that patch applied, I ran this test code:
> 
> ====================
> #define _GNU_SOURCE
> #include <unistd.h>
> #include <sys/syscall.h>
> #include <stdio.h>
> #include <linux/membarrier.h>
> #include <err.h>
> 
> int main(void) {
>   while (1) {
>     printf("executing global expedited barrier...\n");
>     int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
>     if (res) err(1, "barrier");
>   }
> }
> ====================
> 
> That resulted in this splat:
> 
> [  212.697681] ==================================================================
> [  212.700582] BUG: KASAN: null-ptr-deref in
> membarrier_global_expedited+0x15f/0x220
> [  212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177
> 
> [  212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
> [  212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1 04/01/2014
> [  212.712263] Call Trace:
> [  212.713177]  dump_stack+0x71/0xab
> [  212.714375]  ? membarrier_global_expedited+0x15f/0x220
> [  212.716236]  ? membarrier_global_expedited+0x15f/0x220
> [  212.718099]  kasan_report+0x176/0x192
> [  212.719445]  ? finish_task_switch+0x340/0x3d0
> [  212.721057]  ? membarrier_global_expedited+0x15f/0x220
> [  212.722921]  membarrier_global_expedited+0x15f/0x220
> [  212.724696]  ? ipi_mb+0x10/0x10
> [  212.725816]  ? vfs_write+0x120/0x230
> [  212.727113]  ? __ia32_sys_read+0x50/0x50
> [  212.728596]  __x64_sys_membarrier+0x85/0xf0
> [  212.730056]  do_syscall_64+0x73/0x160
> [  212.731428]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  212.733236] RIP: 0033:0x7fbe8747e229
> [  212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
> 01 48
> [  212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000144
> [  212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
> [  212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
> [  212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
> [  212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
> [  212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
> [  212.756305] ==================================================================
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] racy access to p->mm in membarrier_global_expedited()
  2019-01-28 14:15 ` Paul E. McKenney
@ 2019-01-28 16:10   ` Mathieu Desnoyers
  2019-01-28 16:27     ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2019-01-28 16:10 UTC (permalink / raw)
  To: paulmck; +Cc: Jann Horn, linux-kernel, Thomas Gleixner, Peter Zijlstra

----- On Jan 28, 2019, at 9:15 AM, paulmck paulmck@linux.ibm.com wrote:

> On Fri, Jan 25, 2019 at 06:26:47PM +0100, Jann Horn wrote:
>> membarrier_global_expedited() runs the following code (introduced in
>> commit c5f58bd58f43), protected only by an RCU read-side critical
>> section and the cpu_hotplug_lock:
>> 
>>         p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>>         if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
>>                            MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
>>                 if (!fallback)
>>                         __cpumask_set_cpu(cpu, tmpmask);
>>                 else
>>                         smp_call_function_single(cpu, ipi_mb, NULL, 1);
>>         }
>> 
>> p->mm is not protected by either lock. This means that in theory, the
>> following races could occur:
>> 
>> 1. If the compiler emitted two separate reads of ->mm, the second read
>> of p->mm could return a NULL pointer and crash.
>> 2. If the mm is deallocated directly before the atomic_read() occurs,
>> the atomic_read() could access a freed pointer (I think?).
>> 
>> Neither of these are particularly likely - looking at the assembly of
>> a normal build, the first race doesn't exist because the compiler
>> optimizes the second read away, and the second race isn't going to
>> cause anything particularly interesting. Still, this should probably
>> be fixed...
>> 
>> As far as I can tell, you'll have to either take the task_lock()
>> around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
>> RCU to the lifetime of mm_struct. I'm not entirely sure what the
>> better fix is... probably task_lock() makes more sense?
> 
> Ouch!!!
> 
> Acquiring task_lock() would work, but would be a global lock.
> This could be addressed to some extent by batching concurrent
> membarrier_global_expedited() invocations, so that one call to
> membarrier_global_expedited() does the job for the set of concurrent
> calls.  The usual approach would use a counter, a pair of wait queues,
> and a kthread.

We could start by grabbing the task_lock() as an initial fix, and
then address any performance-related complains with your approach
if need be.

> 
> I must defer to the mm guys on adding RCU to the lifetime of mm_struct.

Likewise.

> Another approach would be to put the MEMBARRIER_STATE_GLOBAL_EXPEDITED
> in the task structure.

Then the tricky part becomes how to make sure the per-task-struct
state is consistent across all tasks pointing to the same mm_struct
(including processes created with clone CLONE_VM flag).

> Yet another approach would be to acquire the
> runqueue lock, thus preventing the task from switching away -- except
> that it might be in the middle of exit(), so never mind.

And I suspect that grabbing the runqueue lock may cause more contention
that grabbing the task_lock().

I'll send a patch implementing the task_lock() approach as RFC.

Thanks,

Mathieu

> 
> Other approaches?
> 
>							Thanx, Paul
> 
>> To test the bug, I patched an extra delay into the code:
>> 
>> ====================
>> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> index 3cd8a3a795d2..69cc52039576 100644
>> --- a/kernel/sched/membarrier.c
>> +++ b/kernel/sched/membarrier.c
>> @@ -14,6 +14,7 @@
>>   * GNU General Public License for more details.
>>   */
>>  #include "sched.h"
>> +#include <linux/delay.h>
>> 
>>  /*
>>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>> @@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)
>> 
>>                 rcu_read_lock();
>>                 p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> -               if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
>> +               if (p && p->mm && (mdelay(100), 1) &&
>> (atomic_read(&p->mm->membarrier_state) &
>>                                    MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
>>                         if (!fallback)
>>                                 __cpumask_set_cpu(cpu, tmpmask);
>> ====================
>> 
>> On a kernel with that patch applied, I ran this test code:
>> 
>> ====================
>> #define _GNU_SOURCE
>> #include <unistd.h>
>> #include <sys/syscall.h>
>> #include <stdio.h>
>> #include <linux/membarrier.h>
>> #include <err.h>
>> 
>> int main(void) {
>>   while (1) {
>>     printf("executing global expedited barrier...\n");
>>     int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
>>     if (res) err(1, "barrier");
>>   }
>> }
>> ====================
>> 
>> That resulted in this splat:
>> 
>> [  212.697681]
>> ==================================================================
>> [  212.700582] BUG: KASAN: null-ptr-deref in
>> membarrier_global_expedited+0x15f/0x220
>> [  212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177
>> 
>> [  212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
>> [  212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.10.2-1 04/01/2014
>> [  212.712263] Call Trace:
>> [  212.713177]  dump_stack+0x71/0xab
>> [  212.714375]  ? membarrier_global_expedited+0x15f/0x220
>> [  212.716236]  ? membarrier_global_expedited+0x15f/0x220
>> [  212.718099]  kasan_report+0x176/0x192
>> [  212.719445]  ? finish_task_switch+0x340/0x3d0
>> [  212.721057]  ? membarrier_global_expedited+0x15f/0x220
>> [  212.722921]  membarrier_global_expedited+0x15f/0x220
>> [  212.724696]  ? ipi_mb+0x10/0x10
>> [  212.725816]  ? vfs_write+0x120/0x230
>> [  212.727113]  ? __ia32_sys_read+0x50/0x50
>> [  212.728596]  __x64_sys_membarrier+0x85/0xf0
>> [  212.730056]  do_syscall_64+0x73/0x160
>> [  212.731428]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  212.733236] RIP: 0033:0x7fbe8747e229
>> [  212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
>> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
>> 01 48
>> [  212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000144
>> [  212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
>> [  212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
>> [  212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
>> [  212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
>> [  212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
>> [  212.756305]
>> ==================================================================

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] racy access to p->mm in membarrier_global_expedited()
  2019-01-28 16:10   ` Mathieu Desnoyers
@ 2019-01-28 16:27     ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2019-01-28 16:27 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jann Horn, linux-kernel, Thomas Gleixner, Peter Zijlstra

On Mon, Jan 28, 2019 at 11:10:00AM -0500, Mathieu Desnoyers wrote:
> ----- On Jan 28, 2019, at 9:15 AM, paulmck paulmck@linux.ibm.com wrote:
> 
> > On Fri, Jan 25, 2019 at 06:26:47PM +0100, Jann Horn wrote:
> >> membarrier_global_expedited() runs the following code (introduced in
> >> commit c5f58bd58f43), protected only by an RCU read-side critical
> >> section and the cpu_hotplug_lock:
> >> 
> >>         p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> >>         if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
> >>                            MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
> >>                 if (!fallback)
> >>                         __cpumask_set_cpu(cpu, tmpmask);
> >>                 else
> >>                         smp_call_function_single(cpu, ipi_mb, NULL, 1);
> >>         }
> >> 
> >> p->mm is not protected by either lock. This means that in theory, the
> >> following races could occur:
> >> 
> >> 1. If the compiler emitted two separate reads of ->mm, the second read
> >> of p->mm could return a NULL pointer and crash.
> >> 2. If the mm is deallocated directly before the atomic_read() occurs,
> >> the atomic_read() could access a freed pointer (I think?).
> >> 
> >> Neither of these are particularly likely - looking at the assembly of
> >> a normal build, the first race doesn't exist because the compiler
> >> optimizes the second read away, and the second race isn't going to
> >> cause anything particularly interesting. Still, this should probably
> >> be fixed...
> >> 
> >> As far as I can tell, you'll have to either take the task_lock()
> >> around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
> >> RCU to the lifetime of mm_struct. I'm not entirely sure what the
> >> better fix is... probably task_lock() makes more sense?
> > 
> > Ouch!!!
> > 
> > Acquiring task_lock() would work, but would be a global lock.
> > This could be addressed to some extent by batching concurrent
> > membarrier_global_expedited() invocations, so that one call to
> > membarrier_global_expedited() does the job for the set of concurrent
> > calls.  The usual approach would use a counter, a pair of wait queues,
> > and a kthread.
> 
> We could start by grabbing the task_lock() as an initial fix, and
> then address any performance-related complains with your approach
> if need be.

Makes sense to me, always good to start simply.

> > I must defer to the mm guys on adding RCU to the lifetime of mm_struct.
> 
> Likewise.
> 
> > Another approach would be to put the MEMBARRIER_STATE_GLOBAL_EXPEDITED
> > in the task structure.
> 
> Then the tricky part becomes how to make sure the per-task-struct
> state is consistent across all tasks pointing to the same mm_struct
> (including processes created with clone CLONE_VM flag).

If a multi-threaded process can change its mm_struct, agreed.  I was
under the impression that such a change can only happen while the task
is single-threaded, but I wouldn't trust my impression all that much.

> > Yet another approach would be to acquire the
> > runqueue lock, thus preventing the task from switching away -- except
> > that it might be in the middle of exit(), so never mind.
> 
> And I suspect that grabbing the runqueue lock may cause more contention
> that grabbing the task_lock().

Quite possibly.

> I'll send a patch implementing the task_lock() approach as RFC.

Sounds good to me!

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > Other approaches?
> > 
> >							Thanx, Paul
> > 
> >> To test the bug, I patched an extra delay into the code:
> >> 
> >> ====================
> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> >> index 3cd8a3a795d2..69cc52039576 100644
> >> --- a/kernel/sched/membarrier.c
> >> +++ b/kernel/sched/membarrier.c
> >> @@ -14,6 +14,7 @@
> >>   * GNU General Public License for more details.
> >>   */
> >>  #include "sched.h"
> >> +#include <linux/delay.h>
> >> 
> >>  /*
> >>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> >> @@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)
> >> 
> >>                 rcu_read_lock();
> >>                 p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> >> -               if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
> >> +               if (p && p->mm && (mdelay(100), 1) &&
> >> (atomic_read(&p->mm->membarrier_state) &
> >>                                    MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
> >>                         if (!fallback)
> >>                                 __cpumask_set_cpu(cpu, tmpmask);
> >> ====================
> >> 
> >> On a kernel with that patch applied, I ran this test code:
> >> 
> >> ====================
> >> #define _GNU_SOURCE
> >> #include <unistd.h>
> >> #include <sys/syscall.h>
> >> #include <stdio.h>
> >> #include <linux/membarrier.h>
> >> #include <err.h>
> >> 
> >> int main(void) {
> >>   while (1) {
> >>     printf("executing global expedited barrier...\n");
> >>     int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
> >>     if (res) err(1, "barrier");
> >>   }
> >> }
> >> ====================
> >> 
> >> That resulted in this splat:
> >> 
> >> [  212.697681]
> >> ==================================================================
> >> [  212.700582] BUG: KASAN: null-ptr-deref in
> >> membarrier_global_expedited+0x15f/0x220
> >> [  212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177
> >> 
> >> [  212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
> >> [  212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.10.2-1 04/01/2014
> >> [  212.712263] Call Trace:
> >> [  212.713177]  dump_stack+0x71/0xab
> >> [  212.714375]  ? membarrier_global_expedited+0x15f/0x220
> >> [  212.716236]  ? membarrier_global_expedited+0x15f/0x220
> >> [  212.718099]  kasan_report+0x176/0x192
> >> [  212.719445]  ? finish_task_switch+0x340/0x3d0
> >> [  212.721057]  ? membarrier_global_expedited+0x15f/0x220
> >> [  212.722921]  membarrier_global_expedited+0x15f/0x220
> >> [  212.724696]  ? ipi_mb+0x10/0x10
> >> [  212.725816]  ? vfs_write+0x120/0x230
> >> [  212.727113]  ? __ia32_sys_read+0x50/0x50
> >> [  212.728596]  __x64_sys_membarrier+0x85/0xf0
> >> [  212.730056]  do_syscall_64+0x73/0x160
> >> [  212.731428]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [  212.733236] RIP: 0033:0x7fbe8747e229
> >> [  212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> >> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> >> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
> >> 01 48
> >> [  212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
> >> 0000000000000144
> >> [  212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
> >> [  212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
> >> [  212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
> >> [  212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
> >> [  212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
> >> [  212.756305]
> >> ==================================================================
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-01-28 16:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-25 17:26 [BUG] racy access to p->mm in membarrier_global_expedited() Jann Horn
2019-01-28 14:15 ` Paul E. McKenney
2019-01-28 16:10   ` Mathieu Desnoyers
2019-01-28 16:27     ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).