All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mm] mm/mmap: relax mmap_assert_locked() for find_vma()
@ 2021-09-07  6:26 Yonghong Song
  2021-09-08  3:09 ` Liam Howlett
  0 siblings, 1 reply; 3+ messages in thread
From: Yonghong Song @ 2021-09-07  6:26 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Luigi Rizzo

Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:

  [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
  [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
  [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
  [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
  [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
  [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
  [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
  [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
  [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
  [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
  [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
  [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
  [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
  [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1411.304512] Call Trace:
  [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
  [ 1411.304528]  __bpf_get_stack+0x18f/0x230
  [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
  [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
  [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
  [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
  [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
  [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
  [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
  [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
  [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
  [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
  [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
  [ 1411.312291]  do_syscall_64+0x15/0x80
  [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
  [ 1411.315626] Call Trace:
  [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260

To reproduce, first build `test_progs` binary:
  make -C tools/testing/selftests/bpf -j60
and then run the binary at tools/testing/selftests/bpf directory:
  ./test_progs -t get_stack_raw_tp

The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function
asserts that mm->mmap_lock needs to be held. But this is not the case for
bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which
uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
in bpf_get_stack[id]() use case, the above warning is emitted during test run.

This patch fixed the issue by replacing mmap_assert_locked() with rwsem_is_locked(&mm->mmap_lock)
in find_vma().

Cc: Luigi Rizzo <lrizzo@google.com>
Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Alternatively we could add a bool argument to find_vma to indicate whether
mm->mmap_lock has been held or not. But I would like to report and try the
simpler solution first.

diff --git a/mm/mmap.c b/mm/mmap.c
index 88dcc5c25225..8c78b85475b1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2275,7 +2275,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 	struct rb_node *rb_node;
 	struct vm_area_struct *vma;
 
-	mmap_assert_locked(mm);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 	/* Check the cache first. */
 	vma = vmacache_find(mm, addr);
 	if (likely(vma))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH mm] mm/mmap: relax mmap_assert_locked() for find_vma()
  2021-09-07  6:26 [PATCH mm] mm/mmap: relax mmap_assert_locked() for find_vma() Yonghong Song
@ 2021-09-08  3:09 ` Liam Howlett
  2021-09-08  4:39   ` Yonghong Song
  0 siblings, 1 reply; 3+ messages in thread
From: Liam Howlett @ 2021-09-08  3:09 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, linux-mm, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Luigi Rizzo

* Yonghong Song <yhs@fb.com> [210907 02:27]:
> Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:
> 
>   [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
>   [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
>   [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
>   [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>   [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
>   [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
>   [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
>   [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
>   [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>   [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
>   [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
>   [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
>   [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
>   [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
>   [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   [ 1411.304512] Call Trace:
>   [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
>   [ 1411.304528]  __bpf_get_stack+0x18f/0x230
>   [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
>   [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
>   [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>   [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
>   [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
>   [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
>   [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
>   [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
>   [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
>   [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
>   [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
>   [ 1411.312291]  do_syscall_64+0x15/0x80
>   [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>   [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
>   [ 1411.315626] Call Trace:
>   [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
> 
> To reproduce, first build `test_progs` binary:
>   make -C tools/testing/selftests/bpf -j60
> and then run the binary at tools/testing/selftests/bpf directory:
>   ./test_progs -t get_stack_raw_tp
> 
> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function
> asserts that mm->mmap_lock needs to be held. But this is not the case for
> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which
> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
> in bpf_get_stack[id]() use case, the above warning is emitted during test run.
> 
> This patch fixed the issue by replacing mmap_assert_locked() with rwsem_is_locked(&mm->mmap_lock)
> in find_vma().

I'm not a fan of removing the lockdep check.  I'd rather see find_vma()
do the lockdep and call another internal function which does exactly
what you have below.  Then you could call the one and everyone else
could keep the code they have.

More importantly, since this is the only user of the
mmap_read_trylock_non_owner(), is there a build bot that could be used
to catch errors introduced in this
path?

Thanks,
Liam

> 
> Cc: Luigi Rizzo <lrizzo@google.com>
> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  mm/mmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Alternatively we could add a bool argument to find_vma to indicate whether
> mm->mmap_lock has been held or not. But I would like to report and try the
> simpler solution first.
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 88dcc5c25225..8c78b85475b1 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2275,7 +2275,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>  	struct rb_node *rb_node;
>  	struct vm_area_struct *vma;
>  
> -	mmap_assert_locked(mm);
> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>  	/* Check the cache first. */
>  	vma = vmacache_find(mm, addr);
>  	if (likely(vma))
> -- 
> 2.30.2
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH mm] mm/mmap: relax mmap_assert_locked() for find_vma()
  2021-09-08  3:09 ` Liam Howlett
@ 2021-09-08  4:39   ` Yonghong Song
  0 siblings, 0 replies; 3+ messages in thread
From: Yonghong Song @ 2021-09-08  4:39 UTC (permalink / raw)
  To: Liam Howlett
  Cc: bpf, linux-mm, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Luigi Rizzo



On 9/7/21 8:09 PM, Liam Howlett wrote:
> * Yonghong Song <yhs@fb.com> [210907 02:27]:
>> Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:
>>
>>    [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
>>    [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
>>    [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
>>    [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>    [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
>>    [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
>>    [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
>>    [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
>>    [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>>    [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
>>    [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
>>    [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
>>    [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
>>    [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>    [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
>>    [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>    [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>    [ 1411.304512] Call Trace:
>>    [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
>>    [ 1411.304528]  __bpf_get_stack+0x18f/0x230
>>    [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
>>    [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
>>    [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>>    [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
>>    [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
>>    [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
>>    [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
>>    [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
>>    [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
>>    [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>    [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
>>    [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
>>    [ 1411.312291]  do_syscall_64+0x15/0x80
>>    [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>    [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>    [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>    [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
>>    [ 1411.315626] Call Trace:
>>    [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
>>
>> To reproduce, first build `test_progs` binary:
>>    make -C tools/testing/selftests/bpf -j60
>> and then run the binary at tools/testing/selftests/bpf directory:
>>    ./test_progs -t get_stack_raw_tp
>>
>> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
>> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function
>> asserts that mm->mmap_lock needs to be held. But this is not the case for
>> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which
>> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
>> in bpf_get_stack[id]() use case, the above warning is emitted during test run.
>>
>> This patch fixed the issue by replacing mmap_assert_locked() with rwsem_is_locked(&mm->mmap_lock)
>> in find_vma().
> 
> I'm not a fan of removing the lockdep check.  I'd rather see find_vma()
> do the lockdep and call another internal function which does exactly
> what you have below.  Then you could call the one and everyone else
> could keep the code they have.

Okay, will do that in the next revision.

> 
> More importantly, since this is the only user of the
> mmap_read_trylock_non_owner(), is there a build bot that could be used
> to catch errors introduced in this
> path?

We do have libbpf ci to test selftest against current *bpf_next* and 
some past old kernels. In this case, I think it is linux-next first 
introduced issue, and then the change went to net-next and then 
bpf-next. So to catch this earlier, we need to do libbpf ci on linux-next
and/or net-next. We will discuss how we could incorporate this.

> 
> Thanks,
> Liam
> 
>>
>> Cc: Luigi Rizzo <lrizzo@google.com>
>> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   mm/mmap.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Alternatively we could add a bool argument to find_vma to indicate whether
>> mm->mmap_lock has been held or not. But I would like to report and try the
>> simpler solution first.
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 88dcc5c25225..8c78b85475b1 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2275,7 +2275,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>>   	struct rb_node *rb_node;
>>   	struct vm_area_struct *vma;
>>   
>> -	mmap_assert_locked(mm);
>> +	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>>   	/* Check the cache first. */
>>   	vma = vmacache_find(mm, addr);
>>   	if (likely(vma))
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-09-08  4:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-07  6:26 [PATCH mm] mm/mmap: relax mmap_assert_locked() for find_vma() Yonghong Song
2021-09-08  3:09 ` Liam Howlett
2021-09-08  4:39   ` Yonghong Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.