linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mm lock issue while booting Linux on 5.8-rc1 for RISC-V
@ 2020-06-15  7:28 Atish Patra
  2020-06-15 21:57 ` Stafford Horne
  0 siblings, 1 reply; 18+ messages in thread
From: Atish Patra @ 2020-06-15  7:28 UTC (permalink / raw)
  To: linux-riscv, linux-kernel@vger.kernel.org List
  Cc: Palmer Dabbelt, Bjorn Topel

Hi,
I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
I added additional dump_stack and observed that it's happening in bpf free path.
It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
compiled away without that.
------------------------------------------------------------------------
forked to background, child pid 113
[   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
5.8.0-rc1-dirty #732
[   10.331739] Workqueue: events bpf_prog_free_deferred
[   10.334133] Call Trace:
[   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
[   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
[   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
[   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
[   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
[   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
[   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
[   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
[   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
[   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
[   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
[   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
[   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
[   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
[   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
[   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
[   10.354405] get_unmapped_area 0000000000000000
[   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
[   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
pgtables_bytes 8192 map_count 0
[   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
[   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
[   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
start_data 0 end_data ffffffe00106dfe4
[   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
[   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
[   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
[   10.354405] ioctx_table 0000000000000000
[   10.354405] exe_file 0000000000000000
[   10.354405] tlb_flush_pending 0
[   10.354405] def_flags: 0x0()
[   10.369325] ------------[ cut here ]------------
[   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
[   10.375235] Kernel BUG [#1]
[   10.377198] Modules linked in:
[   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
[   10.380179] Workqueue: events bpf_prog_free_deferred
[   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
[   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
ffffffe0010836e8
[   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
ffffffe3eaea7c90
[   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
0000000000000020
[   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
ffffffe001012758
[   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
0000000000000006
[   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
ffffffe00106e100
[   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
6db6db6db6db6db7
[   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
ffffffffffffffff
[   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
000000000001a7a8
[   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
[   10.393096] status: 0000000000000120 badaddr: 0000000000000000
cause: 0000000000000003
[   10.397755] ---[ end trace 861659596ac28841 ]---
---------------------------------------------------------------------------------------------------

I haven't had the chance to bisect to figure out which commit caused
the issue. Just wanted
to check if it is a known issue already.

Regards,
Atish

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-15  7:28 mm lock issue while booting Linux on 5.8-rc1 for RISC-V Atish Patra
@ 2020-06-15 21:57 ` Stafford Horne
  2020-06-16  4:51   ` Stafford Horne
  0 siblings, 1 reply; 18+ messages in thread
From: Stafford Horne @ 2020-06-15 21:57 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-riscv, linux-kernel@vger.kernel.org List, Palmer Dabbelt,
	Bjorn Topel

On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> Hi,
> I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> I added additional dump_stack and observed that it's happening in bpf free path.
> It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> compiled away without that.
> ------------------------------------------------------------------------
> forked to background, child pid 113
> [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> 5.8.0-rc1-dirty #732
> [   10.331739] Workqueue: events bpf_prog_free_deferred
> [   10.334133] Call Trace:
> [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> [   10.354405] get_unmapped_area 0000000000000000
> [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> pgtables_bytes 8192 map_count 0
> [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> start_data 0 end_data ffffffe00106dfe4
> [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> [   10.354405] ioctx_table 0000000000000000
> [   10.354405] exe_file 0000000000000000
> [   10.354405] tlb_flush_pending 0
> [   10.354405] def_flags: 0x0()
> [   10.369325] ------------[ cut here ]------------
> [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> [   10.375235] Kernel BUG [#1]
> [   10.377198] Modules linked in:
> [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> [   10.380179] Workqueue: events bpf_prog_free_deferred
> [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> ffffffe0010836e8
> [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> ffffffe3eaea7c90
> [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> 0000000000000020
> [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> ffffffe001012758
> [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> 0000000000000006
> [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> ffffffe00106e100
> [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> 6db6db6db6db6db7
> [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> ffffffffffffffff
> [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> 000000000001a7a8
> [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> cause: 0000000000000003
> [   10.397755] ---[ end trace 861659596ac28841 ]---
> ---------------------------------------------------------------------------------------------------
> 
> I haven't had the chance to bisect to figure out which commit caused
> the issue. Just wanted
> to check if it is a known issue already.

Hi Atish,

Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
you have any updates please post back.  I will try to look into this today or
tomorrow.

-Stafford

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-15 21:57 ` Stafford Horne
@ 2020-06-16  4:51   ` Stafford Horne
  2020-06-16  8:40     ` Palmer Dabbelt
  0 siblings, 1 reply; 18+ messages in thread
From: Stafford Horne @ 2020-06-16  4:51 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-riscv, linux-kernel@vger.kernel.org List, Palmer Dabbelt,
	Bjorn Topel, Michel Lespinasse

On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > Hi,
> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > I added additional dump_stack and observed that it's happening in bpf free path.
> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > compiled away without that.
> > ------------------------------------------------------------------------
> > forked to background, child pid 113
> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > 5.8.0-rc1-dirty #732
> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > [   10.334133] Call Trace:
> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > [   10.354405] get_unmapped_area 0000000000000000
> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > pgtables_bytes 8192 map_count 0
> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > start_data 0 end_data ffffffe00106dfe4
> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > [   10.354405] ioctx_table 0000000000000000
> > [   10.354405] exe_file 0000000000000000
> > [   10.354405] tlb_flush_pending 0
> > [   10.354405] def_flags: 0x0()
> > [   10.369325] ------------[ cut here ]------------
> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > [   10.375235] Kernel BUG [#1]
> > [   10.377198] Modules linked in:
> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > ffffffe0010836e8
> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > ffffffe3eaea7c90
> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > 0000000000000020
> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > ffffffe001012758
> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > 0000000000000006
> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > ffffffe00106e100
> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > 6db6db6db6db6db7
> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > ffffffffffffffff
> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > 000000000001a7a8
> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > cause: 0000000000000003
> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > ---------------------------------------------------------------------------------------------------
> > 
> > I haven't had the chance to bisect to figure out which commit caused
> > the issue. Just wanted
> > to check if it is a known issue already.
> 
> Hi Atish,
> 
> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> you have any updates please post back.  I will try to look into this today or
> tomorrow.

I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:

  mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()

This should have just changed the existing lockdep api's but something has
changed.  I haven't had time to look at it yet.

Ccing: Michel Lespinasse <walken@google.com>

-Stafford

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16  4:51   ` Stafford Horne
@ 2020-06-16  8:40     ` Palmer Dabbelt
  2020-06-16 10:44       ` Michel Lespinasse
  0 siblings, 1 reply; 18+ messages in thread
From: Palmer Dabbelt @ 2020-06-16  8:40 UTC (permalink / raw)
  To: shorne; +Cc: atishp, linux-riscv, linux-kernel, Bjorn Topel, walken

On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
>> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
>> > Hi,
>> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
>> > I added additional dump_stack and observed that it's happening in bpf free path.
>> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
>> > compiled away without that.
>> > ------------------------------------------------------------------------
>> > forked to background, child pid 113
>> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
>> > 5.8.0-rc1-dirty #732
>> > [   10.331739] Workqueue: events bpf_prog_free_deferred
>> > [   10.334133] Call Trace:
>> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
>> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
>> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
>> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
>> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
>> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
>> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
>> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
>> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
>> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
>> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
>> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
>> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
>> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
>> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
>> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
>> > [   10.354405] get_unmapped_area 0000000000000000
>> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
>> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
>> > pgtables_bytes 8192 map_count 0
>> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
>> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
>> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
>> > start_data 0 end_data ffffffe00106dfe4
>> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
>> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
>> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
>> > [   10.354405] ioctx_table 0000000000000000
>> > [   10.354405] exe_file 0000000000000000
>> > [   10.354405] tlb_flush_pending 0
>> > [   10.354405] def_flags: 0x0()
>> > [   10.369325] ------------[ cut here ]------------
>> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
>> > [   10.375235] Kernel BUG [#1]
>> > [   10.377198] Modules linked in:
>> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
>> > [   10.380179] Workqueue: events bpf_prog_free_deferred
>> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
>> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
>> > ffffffe0010836e8
>> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
>> > ffffffe3eaea7c90
>> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
>> > 0000000000000020
>> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
>> > ffffffe001012758
>> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
>> > 0000000000000006
>> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
>> > ffffffe00106e100
>> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
>> > 6db6db6db6db6db7
>> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
>> > ffffffffffffffff
>> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
>> > 000000000001a7a8
>> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
>> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
>> > cause: 0000000000000003
>> > [   10.397755] ---[ end trace 861659596ac28841 ]---
>> > ---------------------------------------------------------------------------------------------------
>> >
>> > I haven't had the chance to bisect to figure out which commit caused
>> > the issue. Just wanted
>> > to check if it is a known issue already.
>>
>> Hi Atish,
>>
>> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
>> you have any updates please post back.  I will try to look into this today or
>> tomorrow.
>
> I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
>
>   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
>
> This should have just changed the existing lockdep api's but something has
> changed.  I haven't had time to look at it yet.
>
> Ccing: Michel Lespinasse <walken@google.com>

This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
running a simple buildroot-based userspace, so I doubt anything is triggering
BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
don't cross compile and need LLVM) -- does anyone have a userspace I can use to
trigger the bug?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16  8:40     ` Palmer Dabbelt
@ 2020-06-16 10:44       ` Michel Lespinasse
  2020-06-16 17:54         ` Atish Patra
  2020-06-16 19:19         ` Stafford Horne
  0 siblings, 2 replies; 18+ messages in thread
From: Michel Lespinasse @ 2020-06-16 10:44 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: shorne, atishp, linux-riscv, LKML, Bjorn Topel

I am also unable to reproduce the issue so far.

I wanted to point to a few things in case this helps:
- Commit 42fc541404f2 was bisected as the cause. This commit changes
walk_page_range_novma() to use mmap_assert_locked() instead of
lockdep_assert_held()
- mmap_assert_locked() checks lockdep_assert_held(), but also checks
that the rwsem itself is locked.

Now how could lockdep think the lock is held, but the lock itself is
not marked as locked ???

I'm not sure if it helps at all, but a few commits earlier,
0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
exclusively by stackmap, and does the opposite: it acquires the mmap
lock without telling lockdep about it. I can't see any smoking gun
linking this to our bug, but I thought it may be worth mentioning as
it involves the same suspects (stackmap and the difference between
owning the lock vs lockdep thinking we own the lock).

I'm sorry, that's only how far I was able to go on this bug - I'm not
sure how to investigate it further as I can not reproduce the issue...

On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> >> > Hi,
> >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> >> > I added additional dump_stack and observed that it's happening in bpf free path.
> >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> >> > compiled away without that.
> >> > ------------------------------------------------------------------------
> >> > forked to background, child pid 113
> >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> >> > 5.8.0-rc1-dirty #732
> >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> >> > [   10.334133] Call Trace:
> >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> >> > [   10.354405] get_unmapped_area 0000000000000000
> >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> >> > pgtables_bytes 8192 map_count 0
> >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> >> > start_data 0 end_data ffffffe00106dfe4
> >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> >> > [   10.354405] ioctx_table 0000000000000000
> >> > [   10.354405] exe_file 0000000000000000
> >> > [   10.354405] tlb_flush_pending 0
> >> > [   10.354405] def_flags: 0x0()
> >> > [   10.369325] ------------[ cut here ]------------
> >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> >> > [   10.375235] Kernel BUG [#1]
> >> > [   10.377198] Modules linked in:
> >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> >> > ffffffe0010836e8
> >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> >> > ffffffe3eaea7c90
> >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> >> > 0000000000000020
> >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> >> > ffffffe001012758
> >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> >> > 0000000000000006
> >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> >> > ffffffe00106e100
> >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> >> > 6db6db6db6db6db7
> >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> >> > ffffffffffffffff
> >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> >> > 000000000001a7a8
> >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> >> > cause: 0000000000000003
> >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> >> > ---------------------------------------------------------------------------------------------------
> >> >
> >> > I haven't had the chance to bisect to figure out which commit caused
> >> > the issue. Just wanted
> >> > to check if it is a known issue already.
> >>
> >> Hi Atish,
> >>
> >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> >> you have any updates please post back.  I will try to look into this today or
> >> tomorrow.
> >
> > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> >
> >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> >
> > This should have just changed the existing lockdep api's but something has
> > changed.  I haven't had time to look at it yet.
> >
> > Ccing: Michel Lespinasse <walken@google.com>
>
> This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> running a simple buildroot-based userspace, so I doubt anything is triggering
> BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> trigger the bug?



-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 10:44       ` Michel Lespinasse
@ 2020-06-16 17:54         ` Atish Patra
  2020-06-16 18:07           ` Palmer Dabbelt
  2020-06-16 19:19         ` Stafford Horne
  1 sibling, 1 reply; 18+ messages in thread
From: Atish Patra @ 2020-06-16 17:54 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: Palmer Dabbelt, shorne, linux-riscv, LKML, Bjorn Topel

On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@google.com> wrote:
>
> I am also unable to reproduce the issue so far.
>
> I wanted to point to a few things in case this helps:
> - Commit 42fc541404f2 was bisected as the cause. This commit changes
> walk_page_range_novma() to use mmap_assert_locked() instead of
> lockdep_assert_held()
> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> that the rwsem itself is locked.
>
> Now how could lockdep think the lock is held, but the lock itself is
> not marked as locked ???
>
> I'm not sure if it helps at all, but a few commits earlier,
> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> exclusively by stackmap, and does the opposite: it acquires the mmap
> lock without telling lockdep about it. I can't see any smoking gun
> linking this to our bug, but I thought it may be worth mentioning as
> it involves the same suspects (stackmap and the difference between
> owning the lock vs lockdep thinking we own the lock).
>
> I'm sorry, that's only how far I was able to go on this bug - I'm not
> sure how to investigate it further as I can not reproduce the issue...
>
> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >
> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > >> > Hi,
> > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > >> > compiled away without that.
> > >> > ------------------------------------------------------------------------
> > >> > forked to background, child pid 113
> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > >> > 5.8.0-rc1-dirty #732
> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > >> > [   10.334133] Call Trace:
> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > >> > [   10.354405] get_unmapped_area 0000000000000000
> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > >> > pgtables_bytes 8192 map_count 0
> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > >> > start_data 0 end_data ffffffe00106dfe4
> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > >> > [   10.354405] ioctx_table 0000000000000000
> > >> > [   10.354405] exe_file 0000000000000000
> > >> > [   10.354405] tlb_flush_pending 0
> > >> > [   10.354405] def_flags: 0x0()
> > >> > [   10.369325] ------------[ cut here ]------------
> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > >> > [   10.375235] Kernel BUG [#1]
> > >> > [   10.377198] Modules linked in:
> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > >> > ffffffe0010836e8
> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > >> > ffffffe3eaea7c90
> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > >> > 0000000000000020
> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > >> > ffffffe001012758
> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > >> > 0000000000000006
> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > >> > ffffffe00106e100
> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > >> > 6db6db6db6db6db7
> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > >> > ffffffffffffffff
> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > >> > 000000000001a7a8
> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > >> > cause: 0000000000000003
> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > >> > ---------------------------------------------------------------------------------------------------
> > >> >
> > >> > I haven't had the chance to bisect to figure out which commit caused
> > >> > the issue. Just wanted
> > >> > to check if it is a known issue already.
> > >>
> > >> Hi Atish,
> > >>
> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> > >> you have any updates please post back.  I will try to look into this today or
> > >> tomorrow.
> > >
> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> > >
> > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> > >
> > > This should have just changed the existing lockdep api's but something has
> > > changed.  I haven't had time to look at it yet.
> > >
> > > Ccing: Michel Lespinasse <walken@google.com>
> >
> > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> > running a simple buildroot-based userspace, so I doubt anything is triggering
> > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> > trigger the bug?
>
I am also using buildroot based userspace but it's a bit bulky because
of my config.
You can access it from here:
https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls

A defconfig build & boot in Qemu with above userspace is sufficient to
trigger the bug.

FYI: I noticed the kernel bug message every time during ssh-key
generation. Not sure if that is related.

>
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 17:54         ` Atish Patra
@ 2020-06-16 18:07           ` Palmer Dabbelt
  2020-06-16 18:08             ` Atish Patra
  2020-06-16 19:18             ` Alex Ghiti
  0 siblings, 2 replies; 18+ messages in thread
From: Palmer Dabbelt @ 2020-06-16 18:07 UTC (permalink / raw)
  To: atishp; +Cc: walken, shorne, linux-riscv, linux-kernel, Bjorn Topel

On Tue, 16 Jun 2020 10:54:51 PDT (-0700), atishp@atishpatra.org wrote:
> On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@google.com> wrote:
>>
>> I am also unable to reproduce the issue so far.
>>
>> I wanted to point to a few things in case this helps:
>> - Commit 42fc541404f2 was bisected as the cause. This commit changes
>> walk_page_range_novma() to use mmap_assert_locked() instead of
>> lockdep_assert_held()
>> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
>> that the rwsem itself is locked.
>>
>> Now how could lockdep think the lock is held, but the lock itself is
>> not marked as locked ???
>>
>> I'm not sure if it helps at all, but a few commits earlier,
>> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
>> exclusively by stackmap, and does the opposite: it acquires the mmap
>> lock without telling lockdep about it. I can't see any smoking gun
>> linking this to our bug, but I thought it may be worth mentioning as
>> it involves the same suspects (stackmap and the difference between
>> owning the lock vs lockdep thinking we own the lock).
>>
>> I'm sorry, that's only how far I was able to go on this bug - I'm not
>> sure how to investigate it further as I can not reproduce the issue...
>>
>> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> >
>> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
>> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
>> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
>> > >> > Hi,
>> > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
>> > >> > I added additional dump_stack and observed that it's happening in bpf free path.
>> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
>> > >> > compiled away without that.
>> > >> > ------------------------------------------------------------------------
>> > >> > forked to background, child pid 113
>> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
>> > >> > 5.8.0-rc1-dirty #732
>> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
>> > >> > [   10.334133] Call Trace:
>> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
>> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
>> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
>> > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
>> > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
>> > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
>> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
>> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
>> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
>> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
>> > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
>> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
>> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
>> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
>> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
>> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
>> > >> > [   10.354405] get_unmapped_area 0000000000000000
>> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
>> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
>> > >> > pgtables_bytes 8192 map_count 0
>> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
>> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
>> > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
>> > >> > start_data 0 end_data ffffffe00106dfe4
>> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
>> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
>> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
>> > >> > [   10.354405] ioctx_table 0000000000000000
>> > >> > [   10.354405] exe_file 0000000000000000
>> > >> > [   10.354405] tlb_flush_pending 0
>> > >> > [   10.354405] def_flags: 0x0()
>> > >> > [   10.369325] ------------[ cut here ]------------
>> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
>> > >> > [   10.375235] Kernel BUG [#1]
>> > >> > [   10.377198] Modules linked in:
>> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
>> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
>> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
>> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
>> > >> > ffffffe0010836e8
>> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
>> > >> > ffffffe3eaea7c90
>> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
>> > >> > 0000000000000020
>> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
>> > >> > ffffffe001012758
>> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
>> > >> > 0000000000000006
>> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
>> > >> > ffffffe00106e100
>> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
>> > >> > 6db6db6db6db6db7
>> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
>> > >> > ffffffffffffffff
>> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
>> > >> > 000000000001a7a8
>> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
>> > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
>> > >> > cause: 0000000000000003
>> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
>> > >> > ---------------------------------------------------------------------------------------------------
>> > >> >
>> > >> > I haven't had the chance to bisect to figure out which commit caused
>> > >> > the issue. Just wanted
>> > >> > to check if it is a known issue already.
>> > >>
>> > >> Hi Atish,
>> > >>
>> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
>> > >> you have any updates please post back.  I will try to look into this today or
>> > >> tomorrow.
>> > >
>> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
>> > >
>> > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
>> > >
>> > > This should have just changed the existing lockdep api's but something has
>> > > changed.  I haven't had time to look at it yet.
>> > >
>> > > Ccing: Michel Lespinasse <walken@google.com>
>> >
>> > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
>> > running a simple buildroot-based userspace, so I doubt anything is triggering
>> > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
>> > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
>> > trigger the bug?
>>
> I am also using buildroot based userspace but it's a bit bulky because
> of my config.
> You can access it from here:
> https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls

I'm up to ~900M trying to get enough of a userspace to run the BPF selftests,
so this would be way better.  Unfortunately I'm not actually getting the
crash with your rootfs.

>
> A defconfig build & boot in Qemu with above userspace is sufficient to
> trigger the bug.
>
> FYI: I noticed the kernel bug message every time during ssh-key
> generation. Not sure if that is related.
>
>>
>>
>> --
>> Michel "Walken" Lespinasse
>> A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 18:07           ` Palmer Dabbelt
@ 2020-06-16 18:08             ` Atish Patra
  2020-06-16 18:15               ` Palmer Dabbelt
  2020-06-16 19:18             ` Alex Ghiti
  1 sibling, 1 reply; 18+ messages in thread
From: Atish Patra @ 2020-06-16 18:08 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Michel Lespinasse, shorne, linux-riscv,
	linux-kernel@vger.kernel.org List, Bjorn Topel

On Tue, Jun 16, 2020 at 11:07 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 16 Jun 2020 10:54:51 PDT (-0700), atishp@atishpatra.org wrote:
> > On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@google.com> wrote:
> >>
> >> I am also unable to reproduce the issue so far.
> >>
> >> I wanted to point to a few things in case this helps:
> >> - Commit 42fc541404f2 was bisected as the cause. This commit changes
> >> walk_page_range_novma() to use mmap_assert_locked() instead of
> >> lockdep_assert_held()
> >> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> >> that the rwsem itself is locked.
> >>
> >> Now how could lockdep think the lock is held, but the lock itself is
> >> not marked as locked ???
> >>
> >> I'm not sure if it helps at all, but a few commits earlier,
> >> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> >> exclusively by stackmap, and does the opposite: it acquires the mmap
> >> lock without telling lockdep about it. I can't see any smoking gun
> >> linking this to our bug, but I thought it may be worth mentioning as
> >> it involves the same suspects (stackmap and the difference between
> >> owning the lock vs lockdep thinking we own the lock).
> >>
> >> I'm sorry, that's only how far I was able to go on this bug - I'm not
> >> sure how to investigate it further as I can not reproduce the issue...
> >>
> >> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >> >
> >> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> >> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> >> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> >> > >> > Hi,
> >> > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> >> > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> >> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> >> > >> > compiled away without that.
> >> > >> > ------------------------------------------------------------------------
> >> > >> > forked to background, child pid 113
> >> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> >> > >> > 5.8.0-rc1-dirty #732
> >> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> >> > >> > [   10.334133] Call Trace:
> >> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> >> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> >> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> >> > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> >> > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> >> > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> >> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> >> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> >> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> >> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> >> > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> >> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> >> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> >> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> >> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> >> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> >> > >> > [   10.354405] get_unmapped_area 0000000000000000
> >> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> >> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> >> > >> > pgtables_bytes 8192 map_count 0
> >> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> >> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> >> > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> >> > >> > start_data 0 end_data ffffffe00106dfe4
> >> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> >> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> >> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> >> > >> > [   10.354405] ioctx_table 0000000000000000
> >> > >> > [   10.354405] exe_file 0000000000000000
> >> > >> > [   10.354405] tlb_flush_pending 0
> >> > >> > [   10.354405] def_flags: 0x0()
> >> > >> > [   10.369325] ------------[ cut here ]------------
> >> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> >> > >> > [   10.375235] Kernel BUG [#1]
> >> > >> > [   10.377198] Modules linked in:
> >> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> >> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> >> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> >> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> >> > >> > ffffffe0010836e8
> >> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> >> > >> > ffffffe3eaea7c90
> >> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> >> > >> > 0000000000000020
> >> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> >> > >> > ffffffe001012758
> >> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> >> > >> > 0000000000000006
> >> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> >> > >> > ffffffe00106e100
> >> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> >> > >> > 6db6db6db6db6db7
> >> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> >> > >> > ffffffffffffffff
> >> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> >> > >> > 000000000001a7a8
> >> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> >> > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> >> > >> > cause: 0000000000000003
> >> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> >> > >> > ---------------------------------------------------------------------------------------------------
> >> > >> >
> >> > >> > I haven't had the chance to bisect to figure out which commit caused
> >> > >> > the issue. Just wanted
> >> > >> > to check if it is a known issue already.
> >> > >>
> >> > >> Hi Atish,
> >> > >>
> >> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> >> > >> you have any updates please post back.  I will try to look into this today or
> >> > >> tomorrow.
> >> > >
> >> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> >> > >
> >> > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> >> > >
> >> > > This should have just changed the existing lockdep api's but something has
> >> > > changed.  I haven't had time to look at it yet.
> >> > >
> >> > > Ccing: Michel Lespinasse <walken@google.com>
> >> >
> >> > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> >> > running a simple buildroot-based userspace, so I doubt anything is triggering
> >> > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> >> > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> >> > trigger the bug?
> >>
> > I am also using buildroot based userspace but it's a bit bulky because
> > of my config.
> > You can access it from here:
> > https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls
>
> I'm up to ~900M trying to get enough of a userspace to run the BPF selftests,
> so this would be way better.  Unfortunately I'm not actually getting the
> crash with your rootfs.
>

Strange. I see it every time with this rootfs and CONFIG_DEBUG_VM enabled.
Do you have CONFIG_DEBUG_VM enabled ?

> >
> > A defconfig build & boot in Qemu with above userspace is sufficient to
> > trigger the bug.
> >
> > FYI: I noticed the kernel bug message every time during ssh-key
> > generation. Not sure if that is related.
> >
> >>
> >>
> >> --
> >> Michel "Walken" Lespinasse
> >> A program is never fully debugged until the last user dies.



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 18:08             ` Atish Patra
@ 2020-06-16 18:15               ` Palmer Dabbelt
  2020-06-16 18:21                 ` Atish Patra
  0 siblings, 1 reply; 18+ messages in thread
From: Palmer Dabbelt @ 2020-06-16 18:15 UTC (permalink / raw)
  To: atishp; +Cc: walken, shorne, linux-riscv, linux-kernel, Bjorn Topel

On Tue, 16 Jun 2020 11:08:55 PDT (-0700), atishp@atishpatra.org wrote:
> On Tue, Jun 16, 2020 at 11:07 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>> On Tue, 16 Jun 2020 10:54:51 PDT (-0700), atishp@atishpatra.org wrote:
>> > On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@google.com> wrote:
>> >>
>> >> I am also unable to reproduce the issue so far.
>> >>
>> >> I wanted to point to a few things in case this helps:
>> >> - Commit 42fc541404f2 was bisected as the cause. This commit changes
>> >> walk_page_range_novma() to use mmap_assert_locked() instead of
>> >> lockdep_assert_held()
>> >> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
>> >> that the rwsem itself is locked.
>> >>
>> >> Now how could lockdep think the lock is held, but the lock itself is
>> >> not marked as locked ???
>> >>
>> >> I'm not sure if it helps at all, but a few commits earlier,
>> >> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
>> >> exclusively by stackmap, and does the opposite: it acquires the mmap
>> >> lock without telling lockdep about it. I can't see any smoking gun
>> >> linking this to our bug, but I thought it may be worth mentioning as
>> >> it involves the same suspects (stackmap and the difference between
>> >> owning the lock vs lockdep thinking we own the lock).
>> >>
>> >> I'm sorry, that's only how far I was able to go on this bug - I'm not
>> >> sure how to investigate it further as I can not reproduce the issue...
>> >>
>> >> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> >> >
>> >> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
>> >> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
>> >> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
>> >> > >> > Hi,
>> >> > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
>> >> > >> > I added additional dump_stack and observed that it's happening in bpf free path.
>> >> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
>> >> > >> > compiled away without that.
>> >> > >> > ------------------------------------------------------------------------
>> >> > >> > forked to background, child pid 113
>> >> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
>> >> > >> > 5.8.0-rc1-dirty #732
>> >> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
>> >> > >> > [   10.334133] Call Trace:
>> >> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
>> >> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
>> >> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
>> >> > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
>> >> > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
>> >> > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
>> >> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
>> >> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
>> >> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
>> >> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
>> >> > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
>> >> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
>> >> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
>> >> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
>> >> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
>> >> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
>> >> > >> > [   10.354405] get_unmapped_area 0000000000000000
>> >> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
>> >> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
>> >> > >> > pgtables_bytes 8192 map_count 0
>> >> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
>> >> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
>> >> > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
>> >> > >> > start_data 0 end_data ffffffe00106dfe4
>> >> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
>> >> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
>> >> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
>> >> > >> > [   10.354405] ioctx_table 0000000000000000
>> >> > >> > [   10.354405] exe_file 0000000000000000
>> >> > >> > [   10.354405] tlb_flush_pending 0
>> >> > >> > [   10.354405] def_flags: 0x0()
>> >> > >> > [   10.369325] ------------[ cut here ]------------
>> >> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
>> >> > >> > [   10.375235] Kernel BUG [#1]
>> >> > >> > [   10.377198] Modules linked in:
>> >> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
>> >> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
>> >> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
>> >> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
>> >> > >> > ffffffe0010836e8
>> >> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
>> >> > >> > ffffffe3eaea7c90
>> >> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
>> >> > >> > 0000000000000020
>> >> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
>> >> > >> > ffffffe001012758
>> >> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
>> >> > >> > 0000000000000006
>> >> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
>> >> > >> > ffffffe00106e100
>> >> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
>> >> > >> > 6db6db6db6db6db7
>> >> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
>> >> > >> > ffffffffffffffff
>> >> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
>> >> > >> > 000000000001a7a8
>> >> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
>> >> > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
>> >> > >> > cause: 0000000000000003
>> >> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
>> >> > >> > ---------------------------------------------------------------------------------------------------
>> >> > >> >
>> >> > >> > I haven't had the chance to bisect to figure out which commit caused
>> >> > >> > the issue. Just wanted
>> >> > >> > to check if it is a known issue already.
>> >> > >>
>> >> > >> Hi Atish,
>> >> > >>
>> >> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
>> >> > >> you have any updates please post back.  I will try to look into this today or
>> >> > >> tomorrow.
>> >> > >
>> >> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
>> >> > >
>> >> > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
>> >> > >
>> >> > > This should have just changed the existing lockdep api's but something has
>> >> > > changed.  I haven't had time to look at it yet.
>> >> > >
>> >> > > Ccing: Michel Lespinasse <walken@google.com>
>> >> >
>> >> > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
>> >> > running a simple buildroot-based userspace, so I doubt anything is triggering
>> >> > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
>> >> > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
>> >> > trigger the bug?
>> >>
>> > I am also using buildroot based userspace but it's a bit bulky because
>> > of my config.
>> > You can access it from here:
>> > https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls
>>
>> I'm up to ~900M trying to get enough of a userspace to run the BPF selftests,
>> so this would be way better.  Unfortunately I'm not actually getting the
>> crash with your rootfs.
>>
>
> Strange. I see it every time with this rootfs and CONFIG_DEBUG_VM enabled.
> Do you have CONFIG_DEBUG_VM enabled ?

Ya, that's the first thing I checked :).  I'm just building defconfig and it's
in there:

# zcat /proc/config.gz | grep DEBUG_VM
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
# CONFIG_DEBUG_VM_RB is not set
CONFIG_DEBUG_VM_PGFLAGS=y

>> >
>> > A defconfig build & boot in Qemu with above userspace is sufficient to
>> > trigger the bug.
>> >
>> > FYI: I noticed the kernel bug message every time during ssh-key
>> > generation. Not sure if that is related.
>> >
>> >>
>> >>
>> >> --
>> >> Michel "Walken" Lespinasse
>> >> A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 18:15               ` Palmer Dabbelt
@ 2020-06-16 18:21                 ` Atish Patra
  0 siblings, 0 replies; 18+ messages in thread
From: Atish Patra @ 2020-06-16 18:21 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Michel Lespinasse, shorne, linux-riscv,
	linux-kernel@vger.kernel.org List, Bjorn Topel

On Tue, Jun 16, 2020 at 11:15 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 16 Jun 2020 11:08:55 PDT (-0700), atishp@atishpatra.org wrote:
> > On Tue, Jun 16, 2020 at 11:07 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >>
> >> On Tue, 16 Jun 2020 10:54:51 PDT (-0700), atishp@atishpatra.org wrote:
> >> > On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@google.com> wrote:
> >> >>
> >> >> I am also unable to reproduce the issue so far.
> >> >>
> >> >> I wanted to point to a few things in case this helps:
> >> >> - Commit 42fc541404f2 was bisected as the cause. This commit changes
> >> >> walk_page_range_novma() to use mmap_assert_locked() instead of
> >> >> lockdep_assert_held()
> >> >> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> >> >> that the rwsem itself is locked.
> >> >>
> >> >> Now how could lockdep think the lock is held, but the lock itself is
> >> >> not marked as locked ???
> >> >>
> >> >> I'm not sure if it helps at all, but a few commits earlier,
> >> >> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> >> >> exclusively by stackmap, and does the opposite: it acquires the mmap
> >> >> lock without telling lockdep about it. I can't see any smoking gun
> >> >> linking this to our bug, but I thought it may be worth mentioning as
> >> >> it involves the same suspects (stackmap and the difference between
> >> >> owning the lock vs lockdep thinking we own the lock).
> >> >>
> >> >> I'm sorry, that's only how far I was able to go on this bug - I'm not
> >> >> sure how to investigate it further as I can not reproduce the issue...
> >> >>
> >> >> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >> >> >
> >> >> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> >> >> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> >> >> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> >> >> > >> > Hi,
> >> >> > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> >> >> > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> >> >> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> >> >> > >> > compiled away without that.
> >> >> > >> > ------------------------------------------------------------------------
> >> >> > >> > forked to background, child pid 113
> >> >> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> >> >> > >> > 5.8.0-rc1-dirty #732
> >> >> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> >> >> > >> > [   10.334133] Call Trace:
> >> >> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> >> >> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> >> >> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> >> >> > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> >> >> > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> >> >> > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> >> >> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> >> >> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> >> >> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> >> >> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> >> >> > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> >> >> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> >> >> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> >> >> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> >> >> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> >> >> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> >> >> > >> > [   10.354405] get_unmapped_area 0000000000000000
> >> >> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> >> >> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> >> >> > >> > pgtables_bytes 8192 map_count 0
> >> >> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> >> >> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> >> >> > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> >> >> > >> > start_data 0 end_data ffffffe00106dfe4
> >> >> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> >> >> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> >> >> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> >> >> > >> > [   10.354405] ioctx_table 0000000000000000
> >> >> > >> > [   10.354405] exe_file 0000000000000000
> >> >> > >> > [   10.354405] tlb_flush_pending 0
> >> >> > >> > [   10.354405] def_flags: 0x0()
> >> >> > >> > [   10.369325] ------------[ cut here ]------------
> >> >> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> >> >> > >> > [   10.375235] Kernel BUG [#1]
> >> >> > >> > [   10.377198] Modules linked in:
> >> >> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> >> >> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> >> >> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> >> >> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> >> >> > >> > ffffffe0010836e8
> >> >> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> >> >> > >> > ffffffe3eaea7c90
> >> >> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> >> >> > >> > 0000000000000020
> >> >> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> >> >> > >> > ffffffe001012758
> >> >> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> >> >> > >> > 0000000000000006
> >> >> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> >> >> > >> > ffffffe00106e100
> >> >> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> >> >> > >> > 6db6db6db6db6db7
> >> >> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> >> >> > >> > ffffffffffffffff
> >> >> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> >> >> > >> > 000000000001a7a8
> >> >> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> >> >> > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> >> >> > >> > cause: 0000000000000003
> >> >> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> >> >> > >> > ---------------------------------------------------------------------------------------------------
> >> >> > >> >
> >> >> > >> > I haven't had the chance to bisect to figure out which commit caused
> >> >> > >> > the issue. Just wanted
> >> >> > >> > to check if it is a known issue already.
> >> >> > >>
> >> >> > >> Hi Atish,
> >> >> > >>
> >> >> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> >> >> > >> you have any updates please post back.  I will try to look into this today or
> >> >> > >> tomorrow.
> >> >> > >
> >> >> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> >> >> > >
> >> >> > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> >> >> > >
> >> >> > > This should have just changed the existing lockdep api's but something has
> >> >> > > changed.  I haven't had time to look at it yet.
> >> >> > >
> >> >> > > Ccing: Michel Lespinasse <walken@google.com>
> >> >> >
> >> >> > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> >> >> > running a simple buildroot-based userspace, so I doubt anything is triggering
> >> >> > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> >> >> > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> >> >> > trigger the bug?
> >> >>
> >> > I am also using buildroot based userspace but it's a bit bulky because
> >> > of my config.
> >> > You can access it from here:
> >> > https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls
> >>
> >> I'm up to ~900M trying to get enough of a userspace to run the BPF selftests,
> >> so this would be way better.  Unfortunately I'm not actually getting the
> >> crash with your rootfs.
> >>
> >
> > Strange. I see it every time with this rootfs and CONFIG_DEBUG_VM enabled.
> > Do you have CONFIG_DEBUG_VM enabled ?
>
> Ya, that's the first thing I checked :).  I'm just building defconfig and it's
> in there:
>
> # zcat /proc/config.gz | grep DEBUG_VM
> CONFIG_DEBUG_VM=y
> # CONFIG_DEBUG_VM_VMACACHE is not set
> # CONFIG_DEBUG_VM_RB is not set
> CONFIG_DEBUG_VM_PGFLAGS=y
>

Hmm. Can you try with this qemu command line ?

/home/atish/workspace/qemu/riscv64-softmmu/qemu-system-riscv64 -M virt
-smp 4 -m 2G -display none -serial mon:stdio \
   -kernel /home/atish/workspace/linux/arch/riscv/boot/Image -initrd
/home/atish/workspace/rootfs_images/test_rootfs.cpio -object
rng-random,filename=/dev/urandom,id=rng0 -device
virtio-rng-device,rng=rng0 \
   -device virtio-net-device,netdev=usernet -netdev
user,id=usernet,hostfwd=tcp::10000-:22 -append ' rw console=ttyS0
earlycon=uart8250,mmio,0x10000000'

> >> >
> >> > A defconfig build & boot in Qemu with above userspace is sufficient to
> >> > trigger the bug.
> >> >
> >> > FYI: I noticed the kernel bug message every time during ssh-key
> >> > generation. Not sure if that is related.
> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Michel "Walken" Lespinasse
> >> >> A program is never fully debugged until the last user dies.



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 18:07           ` Palmer Dabbelt
  2020-06-16 18:08             ` Atish Patra
@ 2020-06-16 19:18             ` Alex Ghiti
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Ghiti @ 2020-06-16 19:18 UTC (permalink / raw)
  To: Palmer Dabbelt, atishp
  Cc: Bjorn Topel, shorne, walken, linux-riscv, linux-kernel

Le 6/16/20 à 2:07 PM, Palmer Dabbelt a écrit :
> On Tue, 16 Jun 2020 10:54:51 PDT (-0700), atishp@atishpatra.org wrote:
>> On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@google.com> 
>> wrote:
>>>
>>> I am also unable to reproduce the issue so far.
>>>
>>> I wanted to point to a few things in case this helps:
>>> - Commit 42fc541404f2 was bisected as the cause. This commit changes
>>> walk_page_range_novma() to use mmap_assert_locked() instead of
>>> lockdep_assert_held()
>>> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
>>> that the rwsem itself is locked.
>>>
>>> Now how could lockdep think the lock is held, but the lock itself is
>>> not marked as locked ???
>>>
>>> I'm not sure if it helps at all, but a few commits earlier,
>>> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
>>> exclusively by stackmap, and does the opposite: it acquires the mmap
>>> lock without telling lockdep about it. I can't see any smoking gun
>>> linking this to our bug, but I thought it may be worth mentioning as
>>> it involves the same suspects (stackmap and the difference between
>>> owning the lock vs lockdep thinking we own the lock).
>>>
>>> I'm sorry, that's only how far I was able to go on this bug - I'm not
>>> sure how to investigate it further as I can not reproduce the issue...
>>>
>>> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> 
>>> wrote:
>>> >
>>> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
>>> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
>>> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
>>> > >> > Hi,
>>> > >> > I encountered the following issue while booting 5.8-rc1 on 
>>> Qemu for RV64.
>>> > >> > I added additional dump_stack and observed that it's 
>>> happening in bpf free path.
>>> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
>>> > >> > compiled away without that.
>>> > >> > 
>>> ------------------------------------------------------------------------ 
>>>
>>> > >> > forked to background, child pid 113
>>> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
>>> > >> > 5.8.0-rc1-dirty #732
>>> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
>>> > >> > [   10.334133] Call Trace:
>>> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
>>> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
>>> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
>>> > >> > [   10.341451] [<ffffffe0002db4ce>] 
>>> mmap_assert_locked.part.13+0x14/0x1c
>>> > >> > [   10.342131] [<ffffffe0002db330>] 
>>> walk_page_range_novma+0x0/0x4e
>>> > >> > [   10.342973] [<ffffffe000204f94>] 
>>> set_direct_map_invalid_noflush+0x66/0x6e
>>> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
>>> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
>>> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
>>> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
>>> > >> > [   10.346529] [<ffffffe0002801a2>] 
>>> bpf_prog_free_deferred+0x74/0x8a
>>> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
>>> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
>>> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
>>> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
>>> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 
>>> seqnum 0 task_size 0
>>> > >> > [   10.354405] get_unmapped_area 0000000000000000
>>> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
>>> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
>>> > >> > pgtables_bytes 8192 map_count 0
>>> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
>>> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
>>> > >> > [   10.354405] start_code ffffffe000200000 end_code 
>>> ffffffe00084acc2
>>> > >> > start_data 0 end_data ffffffe00106dfe4
>>> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
>>> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
>>> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 
>>> 0000000000000000
>>> > >> > [   10.354405] ioctx_table 0000000000000000
>>> > >> > [   10.354405] exe_file 0000000000000000
>>> > >> > [   10.354405] tlb_flush_pending 0
>>> > >> > [   10.354405] def_flags: 0x0()
>>> > >> > [   10.369325] ------------[ cut here ]------------
>>> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
>>> > >> > [   10.375235] Kernel BUG [#1]
>>> > >> > [   10.377198] Modules linked in:
>>> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 
>>> 5.8.0-rc1-dirty #732
>>> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
>>> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp 
>>> : ffffffe3eaea7c70
>>> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
>>> > >> > ffffffe0010836e8
>>> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
>>> > >> > ffffffe3eaea7c90
>>> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
>>> > >> > 0000000000000020
>>> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
>>> > >> > ffffffe001012758
>>> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
>>> > >> > 0000000000000006
>>> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
>>> > >> > ffffffe00106e100
>>> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
>>> > >> > 6db6db6db6db6db7
>>> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
>>> > >> > ffffffffffffffff
>>> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
>>> > >> > 000000000001a7a8
>>> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
>>> > >> > [   10.393096] status: 0000000000000120 badaddr: 
>>> 0000000000000000
>>> > >> > cause: 0000000000000003
>>> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
>>> > >> > 
>>> ---------------------------------------------------------------------------------------------------
>>> > >> >
>>> > >> > I haven't had the chance to bisect to figure out which commit 
>>> caused
>>> > >> > the issue. Just wanted
>>> > >> > to check if it is a known issue already.
>>> > >>
>>> > >> Hi Atish,
>>> > >>
>>> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on 
>>> OpenRISC.  If
>>> > >> you have any updates please post back.  I will try to look into 
>>> this today or
>>> > >> tomorrow.
>>> > >
>>> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
>>> > >
>>> > >   mmap locking API: add mmap_assert_locked() and 
>>> mmap_assert_write_locked()
>>> > >
>>> > > This should have just changed the existing lockdep api's but 
>>> something has
>>> > > changed.  I haven't had time to look at it yet.
>>> > >
>>> > > Ccing: Michel Lespinasse <walken@google.com>
>>> >
>>> > This isn't manifesting on boot for me, on either rc1 or that 
>>> commit.  I'm
>>> > running a simple buildroot-based userspace, so I doubt anything is 
>>> triggering
>>> > BPF.  I don't run the BPF selftests, as they're a bit of a pain 
>>> (IIRC they
>>> > don't cross compile and need LLVM) -- does anyone have a userspace 
>>> I can use to
>>> > trigger the bug?
>>>
>> I am also using buildroot based userspace but it's a bit bulky because
>> of my config.
>> You can access it from here:
>> https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls
>
> I'm up to ~900M trying to get enough of a userspace to run the BPF 
> selftests,
> so this would be way better.  Unfortunately I'm not actually getting the
> crash with your rootfs.
>

I have what looks like the same bug by simply using the TEST_BPF config, 
with a very small rootfs
generated by buildroot if that can help.


>>
>> A defconfig build & boot in Qemu with above userspace is sufficient to
>> trigger the bug.
>>
>> FYI: I noticed the kernel bug message every time during ssh-key
>> generation. Not sure if that is related.
>>
>>>
>>>
>>> -- 
>>> Michel "Walken" Lespinasse
>>> A program is never fully debugged until the last user dies.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 10:44       ` Michel Lespinasse
  2020-06-16 17:54         ` Atish Patra
@ 2020-06-16 19:19         ` Stafford Horne
  2020-06-16 19:41           ` Atish Patra
  1 sibling, 1 reply; 18+ messages in thread
From: Stafford Horne @ 2020-06-16 19:19 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: Palmer Dabbelt, atishp, linux-riscv, LKML, Bjorn Topel

On Tue, Jun 16, 2020 at 03:44:49AM -0700, Michel Lespinasse wrote:
> I am also unable to reproduce the issue so far.
> 
> I wanted to point to a few things in case this helps:
> - Commit 42fc541404f2 was bisected as the cause. This commit changes
> walk_page_range_novma() to use mmap_assert_locked() instead of
> lockdep_assert_held()
> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> that the rwsem itself is locked.
> 
> Now how could lockdep think the lock is held, but the lock itself is
> not marked as locked ???
> 
> I'm not sure if it helps at all, but a few commits earlier,
> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> exclusively by stackmap, and does the opposite: it acquires the mmap
> lock without telling lockdep about it. I can't see any smoking gun
> linking this to our bug, but I thought it may be worth mentioning as
> it involves the same suspects (stackmap and the difference between
> owning the lock vs lockdep thinking we own the lock).
> 
> I'm sorry, that's only how far I was able to go on this bug - I'm not
> sure how to investigate it further as I can not reproduce the issue...

Hello Michel,

Thanks for your notes, it may be related.

On my OpenRISC build with DEBUG_LOCKDEP and DEBUG_VM I am seeing the below
trace.  Without DEBUG_VM enabled the system does seem to work normally.

Some notes on the trace:
 - The mmap_assert_locked failure is not in the same place as RISC-V, but
   similarly during walk_page_range not walk_page_range_novma.
 - This build does not use BPF.
 - I do see a LOCKDEP warning early on, which may be the cause.

I will still do some more investination, you input is very helpful.

-Stafford

exec: /home/shorne/work/openrisc/qemu/build/or1k-softmmu/qemu-system-or1k -cpu or1200 -M or1k-sim -kernel /home/shorne/work/linux/vmlinux -net nic -net tap,ifname=tap0,script=no,downscript=no -serial mon:stdio -nographic -gdb tcp::10001 -m 512
[    0.000000] Compiled-in FDT at (ptrval)
[    0.000000] Linux version 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty (shorne@lianli.shorne-pla.net) (or1k-elf-gcc (GCC) 9.0.1 20190409 (experimental), GNU ld (GNU Binutils) 2.32.51.20190425) #140 Wed Jun 17 04:01:43 JST 2020
[    0.000000] CPU: OpenRISC-13 (revision 8) @20 MHz
[    0.000000] -- dcache disabled
[    0.000000] -- icache disabled
[    0.000000] -- dmmu:  128 entries, 1 way(s)
[    0.000000] -- immu:  128 entries, 1 way(s)
[    0.000000] -- additional features:
[    0.000000] -- power management
[    0.000000] -- PIC
[    0.000000] -- timer
[    0.000000] setup_memory: Memory: 0x0-0x2000000
[    0.000000] Setting up paging and PTEs.
[    0.000000] map_ram: Memory: 0x0-0x2000000
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000001ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000001ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000001ffffff]
[    0.000000] On node 0 totalpages: 4096
[    0.000000]   Normal zone: 16 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 4096 pages, LIFO batch:0
[    0.000000] itlb_miss_handler (ptrval)
[    0.000000] dtlb_miss_handler (ptrval)
[    0.000000] OpenRISC Linux -- http://openrisc.io
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists, mobility grouping off.  Total pages: 4080
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec
[    0.000000] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
[    0.000000] Call trace:
[    0.000000] [<(ptrval)>] dump_stack+0x34/0x48
[    0.000000] [<(ptrval)>] __warn+0x104/0x158
[    0.000000] [<(ptrval)>] ? __mutex_unlock_slowpath+0x328/0x3ec
[    0.000000] [<(ptrval)>] warn_slowpath_fmt+0x7c/0x94
[    0.000000] [<(ptrval)>] __mutex_unlock_slowpath+0x328/0x3ec
[    0.000000] [<(ptrval)>] ? vprintk_emit+0x1e8/0x3d8
[    0.000000] [<(ptrval)>] ? page_alloc_cpu_dead+0x0/0x30
[    0.000000] [<(ptrval)>] mutex_unlock+0x18/0x28
[    0.000000] [<(ptrval)>] __cpuhp_setup_state_cpuslocked+0x200/0x2f4
[    0.000000] [<(ptrval)>] ? vprintk_func+0x94/0x14c
[    0.000000] [<(ptrval)>] ? start_kernel+0x0/0x684
[    0.000000] [<(ptrval)>] __cpuhp_setup_state+0x14/0x24
[    0.000000] [<(ptrval)>] page_alloc_init+0x34/0x68
[    0.000000] [<(ptrval)>] ? start_kernel+0x1a0/0x684
[    0.000000] [<(ptrval)>] ? early_init_dt_scan_nodes+0x60/0x70
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0): [<00000000>] 0x0
[    0.000000] hardirqs last disabled at (0): [<00000000>] 0x0
[    0.000000] softirqs last  enabled at (0): [<00000000>] 0x0
[    0.000000] softirqs last disabled at (0): [<00000000>] 0x0
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x98 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel command line: earlycon
[    0.000000] earlycon: ns16550a0 at MMIO 0x90000000 (options '115200')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] Dentry cache hash table entries: 4096 (order: 1, 16384 bytes, linear)
[    0.000000] Inode-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 11384K/32768K available (4756K kernel code, 380K rwdata, 1160K rodata, 4072K init, 10784K bss, 21384K reserved, 0K cma-reserved)
[    0.000000] mem_init_done ...........................................
[    0.000000] Running RCU self tests
[    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
[    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 95563022313 ns
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8192
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
[    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
[    0.000000] ... CHAINHASH_SIZE:          32768
[    0.000000]  memory used by lock dependency info: 3773 kB
[    0.000000]  memory used for stack traces: 2112 kB
[    0.000000]  per task-struct memory footprint: 1344 bytes
[    0.010000] 40.00 BogoMIPS (lpj=200000)
[    0.010000] pid_max: default: 32768 minimum: 301
[    0.010000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.010000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.150000] devtmpfs: initialized
[    0.210000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.210000] futex hash table entries: 256 (order: 0, 12288 bytes, linear)
[    0.240000] NET: Registered protocol family 16
[    0.450000] clocksource: Switched to clocksource openrisc_timer
[    2.820000] NET: Registered protocol family 2
[    2.840000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 11264 bytes, linear)
[    2.850000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    2.850000] TCP bind hash table entries: 2048 (order: 3, 81920 bytes, linear)
[    2.850000] TCP: Hash tables configured (established 2048 bind 2048)
[    2.860000] UDP hash table entries: 256 (order: 1, 24576 bytes, linear)
[    2.860000] UDP-Lite hash table entries: 256 (order: 1, 24576 bytes, linear)
[    2.880000] NET: Registered protocol family 1
[    2.900000] RPC: Registered named UNIX socket transport module.
[    2.900000] RPC: Registered udp transport module.
[    2.900000] RPC: Registered tcp transport module.
[    2.900000] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    4.120000] workingset: timestamp_bits=30 max_order=11 bucket_order=0
[    4.150000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    4.210000] printk: console [ttyS0] disabled
[    4.220000] 90000000.serial: ttyS0 at MMIO 0x90000000 (irq = 2, base_baud = 1250000) is a 16550A
[    4.230000] printk: console [ttyS0] enabled
[    4.230000] printk: console [ttyS0] enabled
[    4.230000] printk: bootconsole [ns16550a0] disabled
[    4.230000] printk: bootconsole [ns16550a0] disabled
[    4.280000] libphy: Fixed MDIO Bus: probed
[    4.310000] mm c05fb530 mmap 00000000 seqnum 0 task_size 0
[    4.310000] get_unmapped_area 00000000
[    4.310000] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
[    4.310000] pgd c05d2000 mm_users 2 mm_count 2 pgtables_bytes 0 map_count 0
[    4.310000] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
[    4.310000] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
[    4.310000] start_code c0000000 end_code c04a5208 start_data 0 end_data c06263d8
[    4.310000] start_brk 0 brk c14aa360 start_stack 0
[    4.310000] arg_start 0 arg_end 0 env_start 0 env_end 0
[    4.310000] binfmt 00000000 flags 0 core_state 00000000
[    4.310000] exe_file 00000000
[    4.310000] tlb_flush_pending 0
[    4.310000] def_flags: 0x0()
[    4.310000] BUG: failure at include/linux/mmap_lock.h:81/mmap_assert_locked()!
[    4.310000] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
[    4.310000] Call trace:
[    4.320000] [<(ptrval)>] dump_stack+0x34/0x48
[    4.320000] [<(ptrval)>] walk_page_range+0x230/0x244
[    4.320000] [<(ptrval)>] arch_dma_set_uncached+0x38/0x60
[    4.320000] [<(ptrval)>] dma_direct_alloc_pages+0x104/0x178
[    4.320000] [<(ptrval)>] dma_direct_alloc+0x14/0x24
[    4.320000] [<(ptrval)>] dma_alloc_attrs+0x9c/0x13c
[    4.320000] [<(ptrval)>] dmam_alloc_attrs+0x80/0xf4
[    4.320000] [<(ptrval)>] ethoc_probe+0x1b0/0x884
[    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
[    4.320000] [<(ptrval)>] platform_drv_probe+0x3c/0xa8
[    4.320000] [<(ptrval)>] really_probe+0x238/0x42c
[    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
[    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
[    4.320000] [<(ptrval)>] driver_probe_device+0x50/0xa8
[    4.320000] [<(ptrval)>] ? __device_driver_lock+0x44/0x84
[    4.320000] [<(ptrval)>] device_driver_attach+0x88/0x90
[    4.320000] [<(ptrval)>] __driver_attach+0x84/0x140
[    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
[    4.330000] [<(ptrval)>] bus_for_each_dev+0x8c/0xd4
[    4.330000] [<(ptrval)>] driver_attach+0x28/0x38
[    4.330000] [<(ptrval)>] bus_add_driver+0x1dc/0x270
[    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
[    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
[    4.330000] [<(ptrval)>] driver_register+0xbc/0x1a8
[    4.330000] [<(ptrval)>] ? phy_drivers_register+0x68/0xdc
[    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
[    4.330000] [<(ptrval)>] __platform_driver_register+0x50/0x60
[    4.330000] [<(ptrval)>] ethoc_driver_init+0x20/0x30
[    4.330000] [<(ptrval)>] do_one_initcall+0x80/0x2e0
[    4.330000] [<(ptrval)>] ? parse_args+0x270/0x4f0
[    4.330000] [<(ptrval)>] kernel_init_freeable+0x1c8/0x268
[    4.330000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
[    4.330000] [<(ptrval)>] ? kernel_init+0x0/0x144
[    4.330000] [<(ptrval)>] kernel_init+0x18/0x144
[    4.340000] [<(ptrval)>] ? schedule_tail+0x54/0x94
[    4.340000] [<(ptrval)>] ret_from_fork+0x1c/0x150
[    4.340000] Kernel panic - not syncing: BUG!
[    4.340000] ---[ end Kernel panic - not syncing: BUG! ]---


> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >
> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > >> > Hi,
> > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > >> > compiled away without that.
> > >> > ------------------------------------------------------------------------
> > >> > forked to background, child pid 113
> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > >> > 5.8.0-rc1-dirty #732
> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > >> > [   10.334133] Call Trace:
> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > >> > [   10.354405] get_unmapped_area 0000000000000000
> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > >> > pgtables_bytes 8192 map_count 0
> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > >> > start_data 0 end_data ffffffe00106dfe4
> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > >> > [   10.354405] ioctx_table 0000000000000000
> > >> > [   10.354405] exe_file 0000000000000000
> > >> > [   10.354405] tlb_flush_pending 0
> > >> > [   10.354405] def_flags: 0x0()
> > >> > [   10.369325] ------------[ cut here ]------------
> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > >> > [   10.375235] Kernel BUG [#1]
> > >> > [   10.377198] Modules linked in:
> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > >> > ffffffe0010836e8
> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > >> > ffffffe3eaea7c90
> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > >> > 0000000000000020
> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > >> > ffffffe001012758
> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > >> > 0000000000000006
> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > >> > ffffffe00106e100
> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > >> > 6db6db6db6db6db7
> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > >> > ffffffffffffffff
> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > >> > 000000000001a7a8
> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > >> > cause: 0000000000000003
> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > >> > ---------------------------------------------------------------------------------------------------
> > >> >
> > >> > I haven't had the chance to bisect to figure out which commit caused
> > >> > the issue. Just wanted
> > >> > to check if it is a known issue already.
> > >>
> > >> Hi Atish,
> > >>
> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> > >> you have any updates please post back.  I will try to look into this today or
> > >> tomorrow.
> > >
> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> > >
> > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> > >
> > > This should have just changed the existing lockdep api's but something has
> > > changed.  I haven't had time to look at it yet.
> > >
> > > Ccing: Michel Lespinasse <walken@google.com>
> >
> > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> > running a simple buildroot-based userspace, so I doubt anything is triggering
> > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> > trigger the bug?
> 
> 
> 
> -- 
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 19:19         ` Stafford Horne
@ 2020-06-16 19:41           ` Atish Patra
  2020-06-16 20:47             ` Michel Lespinasse
  0 siblings, 1 reply; 18+ messages in thread
From: Atish Patra @ 2020-06-16 19:41 UTC (permalink / raw)
  To: Stafford Horne
  Cc: Michel Lespinasse, Palmer Dabbelt, linux-riscv, LKML, Bjorn Topel

On Tue, Jun 16, 2020 at 12:19 PM Stafford Horne <shorne@gmail.com> wrote:
>
> On Tue, Jun 16, 2020 at 03:44:49AM -0700, Michel Lespinasse wrote:
> > I am also unable to reproduce the issue so far.
> >
> > I wanted to point to a few things in case this helps:
> > - Commit 42fc541404f2 was bisected as the cause. This commit changes
> > walk_page_range_novma() to use mmap_assert_locked() instead of
> > lockdep_assert_held()
> > - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> > that the rwsem itself is locked.
> >
> > Now how could lockdep think the lock is held, but the lock itself is
> > not marked as locked ???
> >
> > I'm not sure if it helps at all, but a few commits earlier,
> > 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> > exclusively by stackmap, and does the opposite: it acquires the mmap
> > lock without telling lockdep about it. I can't see any smoking gun
> > linking this to our bug, but I thought it may be worth mentioning as
> > it involves the same suspects (stackmap and the difference between
> > owning the lock vs lockdep thinking we own the lock).
> >
> > I'm sorry, that's only how far I was able to go on this bug - I'm not
> > sure how to investigate it further as I can not reproduce the issue...
>
> Hello Michel,
>
> Thanks for your notes, it may be related.
>
> On my OpenRISC build with DEBUG_LOCKDEP and DEBUG_VM I am seeing the below
> trace.  Without DEBUG_VM enabled the system does seem to work normally.
>
> Some notes on the trace:
>  - The mmap_assert_locked failure is not in the same place as RISC-V, but
>    similarly during walk_page_range not walk_page_range_novma.
>  - This build does not use BPF.
>  - I do see a LOCKDEP warning early on, which may be the cause.
>

I don't have LOCKDEP enabled in RISC-V. I guess we are seeing the same
bug caused in different paths.
Here is the full boot log for RISC-V if it helps.

https://paste.centos.org/view/bb5ee1e7

> I will still do some more investination, you input is very helpful.
>
> -Stafford
>
> exec: /home/shorne/work/openrisc/qemu/build/or1k-softmmu/qemu-system-or1k -cpu or1200 -M or1k-sim -kernel /home/shorne/work/linux/vmlinux -net nic -net tap,ifname=tap0,script=no,downscript=no -serial mon:stdio -nographic -gdb tcp::10001 -m 512
> [    0.000000] Compiled-in FDT at (ptrval)
> [    0.000000] Linux version 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty (shorne@lianli.shorne-pla.net) (or1k-elf-gcc (GCC) 9.0.1 20190409 (experimental), GNU ld (GNU Binutils) 2.32.51.20190425) #140 Wed Jun 17 04:01:43 JST 2020
> [    0.000000] CPU: OpenRISC-13 (revision 8) @20 MHz
> [    0.000000] -- dcache disabled
> [    0.000000] -- icache disabled
> [    0.000000] -- dmmu:  128 entries, 1 way(s)
> [    0.000000] -- immu:  128 entries, 1 way(s)
> [    0.000000] -- additional features:
> [    0.000000] -- power management
> [    0.000000] -- PIC
> [    0.000000] -- timer
> [    0.000000] setup_memory: Memory: 0x0-0x2000000
> [    0.000000] Setting up paging and PTEs.
> [    0.000000] map_ram: Memory: 0x0-0x2000000
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x0000000001ffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x0000000001ffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000001ffffff]
> [    0.000000] On node 0 totalpages: 4096
> [    0.000000]   Normal zone: 16 pages used for memmap
> [    0.000000]   Normal zone: 0 pages reserved
> [    0.000000]   Normal zone: 4096 pages, LIFO batch:0
> [    0.000000] itlb_miss_handler (ptrval)
> [    0.000000] dtlb_miss_handler (ptrval)
> [    0.000000] OpenRISC Linux -- http://openrisc.io
> [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> [    0.000000] pcpu-alloc: [0] 0
> [    0.000000] Built 1 zonelists, mobility grouping off.  Total pages: 4080
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec
> [    0.000000] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> [    0.000000] Call trace:
> [    0.000000] [<(ptrval)>] dump_stack+0x34/0x48
> [    0.000000] [<(ptrval)>] __warn+0x104/0x158
> [    0.000000] [<(ptrval)>] ? __mutex_unlock_slowpath+0x328/0x3ec
> [    0.000000] [<(ptrval)>] warn_slowpath_fmt+0x7c/0x94
> [    0.000000] [<(ptrval)>] __mutex_unlock_slowpath+0x328/0x3ec
> [    0.000000] [<(ptrval)>] ? vprintk_emit+0x1e8/0x3d8
> [    0.000000] [<(ptrval)>] ? page_alloc_cpu_dead+0x0/0x30
> [    0.000000] [<(ptrval)>] mutex_unlock+0x18/0x28
> [    0.000000] [<(ptrval)>] __cpuhp_setup_state_cpuslocked+0x200/0x2f4
> [    0.000000] [<(ptrval)>] ? vprintk_func+0x94/0x14c
> [    0.000000] [<(ptrval)>] ? start_kernel+0x0/0x684
> [    0.000000] [<(ptrval)>] __cpuhp_setup_state+0x14/0x24
> [    0.000000] [<(ptrval)>] page_alloc_init+0x34/0x68
> [    0.000000] [<(ptrval)>] ? start_kernel+0x1a0/0x684
> [    0.000000] [<(ptrval)>] ? early_init_dt_scan_nodes+0x60/0x70
> [    0.000000] irq event stamp: 0
> [    0.000000] hardirqs last  enabled at (0): [<00000000>] 0x0
> [    0.000000] hardirqs last disabled at (0): [<00000000>] 0x0
> [    0.000000] softirqs last  enabled at (0): [<00000000>] 0x0
> [    0.000000] softirqs last disabled at (0): [<00000000>] 0x0
> [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x98 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel command line: earlycon
> [    0.000000] earlycon: ns16550a0 at MMIO 0x90000000 (options '115200')
> [    0.000000] printk: bootconsole [ns16550a0] enabled
> [    0.000000] Dentry cache hash table entries: 4096 (order: 1, 16384 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> [    0.000000] Sorting __ex_table...
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 11384K/32768K available (4756K kernel code, 380K rwdata, 1160K rodata, 4072K init, 10784K bss, 21384K reserved, 0K cma-reserved)
> [    0.000000] mem_init_done ...........................................
> [    0.000000] Running RCU self tests
> [    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
> [    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 95563022313 ns
> [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
> [    0.000000] ... MAX_LOCK_DEPTH:          48
> [    0.000000] ... MAX_LOCKDEP_KEYS:        8192
> [    0.000000] ... CLASSHASH_SIZE:          4096
> [    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
> [    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
> [    0.000000] ... CHAINHASH_SIZE:          32768
> [    0.000000]  memory used by lock dependency info: 3773 kB
> [    0.000000]  memory used for stack traces: 2112 kB
> [    0.000000]  per task-struct memory footprint: 1344 bytes
> [    0.010000] 40.00 BogoMIPS (lpj=200000)
> [    0.010000] pid_max: default: 32768 minimum: 301
> [    0.010000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> [    0.010000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> [    0.150000] devtmpfs: initialized
> [    0.210000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> [    0.210000] futex hash table entries: 256 (order: 0, 12288 bytes, linear)
> [    0.240000] NET: Registered protocol family 16
> [    0.450000] clocksource: Switched to clocksource openrisc_timer
> [    2.820000] NET: Registered protocol family 2
> [    2.840000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 11264 bytes, linear)
> [    2.850000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
> [    2.850000] TCP bind hash table entries: 2048 (order: 3, 81920 bytes, linear)
> [    2.850000] TCP: Hash tables configured (established 2048 bind 2048)
> [    2.860000] UDP hash table entries: 256 (order: 1, 24576 bytes, linear)
> [    2.860000] UDP-Lite hash table entries: 256 (order: 1, 24576 bytes, linear)
> [    2.880000] NET: Registered protocol family 1
> [    2.900000] RPC: Registered named UNIX socket transport module.
> [    2.900000] RPC: Registered udp transport module.
> [    2.900000] RPC: Registered tcp transport module.
> [    2.900000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [    4.120000] workingset: timestamp_bits=30 max_order=11 bucket_order=0
> [    4.150000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [    4.210000] printk: console [ttyS0] disabled
> [    4.220000] 90000000.serial: ttyS0 at MMIO 0x90000000 (irq = 2, base_baud = 1250000) is a 16550A
> [    4.230000] printk: console [ttyS0] enabled
> [    4.230000] printk: console [ttyS0] enabled
> [    4.230000] printk: bootconsole [ns16550a0] disabled
> [    4.230000] printk: bootconsole [ns16550a0] disabled
> [    4.280000] libphy: Fixed MDIO Bus: probed
> [    4.310000] mm c05fb530 mmap 00000000 seqnum 0 task_size 0
> [    4.310000] get_unmapped_area 00000000
> [    4.310000] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> [    4.310000] pgd c05d2000 mm_users 2 mm_count 2 pgtables_bytes 0 map_count 0
> [    4.310000] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> [    4.310000] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> [    4.310000] start_code c0000000 end_code c04a5208 start_data 0 end_data c06263d8
> [    4.310000] start_brk 0 brk c14aa360 start_stack 0
> [    4.310000] arg_start 0 arg_end 0 env_start 0 env_end 0
> [    4.310000] binfmt 00000000 flags 0 core_state 00000000
> [    4.310000] exe_file 00000000
> [    4.310000] tlb_flush_pending 0
> [    4.310000] def_flags: 0x0()
> [    4.310000] BUG: failure at include/linux/mmap_lock.h:81/mmap_assert_locked()!
> [    4.310000] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> [    4.310000] Call trace:
> [    4.320000] [<(ptrval)>] dump_stack+0x34/0x48
> [    4.320000] [<(ptrval)>] walk_page_range+0x230/0x244
> [    4.320000] [<(ptrval)>] arch_dma_set_uncached+0x38/0x60
> [    4.320000] [<(ptrval)>] dma_direct_alloc_pages+0x104/0x178
> [    4.320000] [<(ptrval)>] dma_direct_alloc+0x14/0x24
> [    4.320000] [<(ptrval)>] dma_alloc_attrs+0x9c/0x13c
> [    4.320000] [<(ptrval)>] dmam_alloc_attrs+0x80/0xf4
> [    4.320000] [<(ptrval)>] ethoc_probe+0x1b0/0x884
> [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> [    4.320000] [<(ptrval)>] platform_drv_probe+0x3c/0xa8
> [    4.320000] [<(ptrval)>] really_probe+0x238/0x42c
> [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> [    4.320000] [<(ptrval)>] driver_probe_device+0x50/0xa8
> [    4.320000] [<(ptrval)>] ? __device_driver_lock+0x44/0x84
> [    4.320000] [<(ptrval)>] device_driver_attach+0x88/0x90
> [    4.320000] [<(ptrval)>] __driver_attach+0x84/0x140
> [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> [    4.330000] [<(ptrval)>] bus_for_each_dev+0x8c/0xd4
> [    4.330000] [<(ptrval)>] driver_attach+0x28/0x38
> [    4.330000] [<(ptrval)>] bus_add_driver+0x1dc/0x270
> [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> [    4.330000] [<(ptrval)>] driver_register+0xbc/0x1a8
> [    4.330000] [<(ptrval)>] ? phy_drivers_register+0x68/0xdc
> [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> [    4.330000] [<(ptrval)>] __platform_driver_register+0x50/0x60
> [    4.330000] [<(ptrval)>] ethoc_driver_init+0x20/0x30
> [    4.330000] [<(ptrval)>] do_one_initcall+0x80/0x2e0
> [    4.330000] [<(ptrval)>] ? parse_args+0x270/0x4f0
> [    4.330000] [<(ptrval)>] kernel_init_freeable+0x1c8/0x268
> [    4.330000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> [    4.330000] [<(ptrval)>] ? kernel_init+0x0/0x144
> [    4.330000] [<(ptrval)>] kernel_init+0x18/0x144
> [    4.340000] [<(ptrval)>] ? schedule_tail+0x54/0x94
> [    4.340000] [<(ptrval)>] ret_from_fork+0x1c/0x150
> [    4.340000] Kernel panic - not syncing: BUG!
> [    4.340000] ---[ end Kernel panic - not syncing: BUG! ]---
>
>
> > On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > >
> > > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> > > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > > >> > Hi,
> > > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> > > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > > >> > compiled away without that.
> > > >> > ------------------------------------------------------------------------
> > > >> > forked to background, child pid 113
> > > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > > >> > 5.8.0-rc1-dirty #732
> > > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > > >> > [   10.334133] Call Trace:
> > > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > > >> > [   10.354405] get_unmapped_area 0000000000000000
> > > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > > >> > pgtables_bytes 8192 map_count 0
> > > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > > >> > start_data 0 end_data ffffffe00106dfe4
> > > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > > >> > [   10.354405] ioctx_table 0000000000000000
> > > >> > [   10.354405] exe_file 0000000000000000
> > > >> > [   10.354405] tlb_flush_pending 0
> > > >> > [   10.354405] def_flags: 0x0()
> > > >> > [   10.369325] ------------[ cut here ]------------
> > > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > > >> > [   10.375235] Kernel BUG [#1]
> > > >> > [   10.377198] Modules linked in:
> > > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > > >> > ffffffe0010836e8
> > > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > > >> > ffffffe3eaea7c90
> > > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > > >> > 0000000000000020
> > > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > > >> > ffffffe001012758
> > > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > > >> > 0000000000000006
> > > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > > >> > ffffffe00106e100
> > > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > > >> > 6db6db6db6db6db7
> > > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > > >> > ffffffffffffffff
> > > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > > >> > 000000000001a7a8
> > > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > > >> > cause: 0000000000000003
> > > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > > >> > ---------------------------------------------------------------------------------------------------
> > > >> >
> > > >> > I haven't had the chance to bisect to figure out which commit caused
> > > >> > the issue. Just wanted
> > > >> > to check if it is a known issue already.
> > > >>
> > > >> Hi Atish,
> > > >>
> > > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> > > >> you have any updates please post back.  I will try to look into this today or
> > > >> tomorrow.
> > > >
> > > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> > > >
> > > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> > > >
> > > > This should have just changed the existing lockdep api's but something has
> > > > changed.  I haven't had time to look at it yet.
> > > >
> > > > Ccing: Michel Lespinasse <walken@google.com>
> > >
> > > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> > > running a simple buildroot-based userspace, so I doubt anything is triggering
> > > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> > > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> > > trigger the bug?
> >
> >
> >
> > --
> > Michel "Walken" Lespinasse
> > A program is never fully debugged until the last user dies.



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 19:41           ` Atish Patra
@ 2020-06-16 20:47             ` Michel Lespinasse
  2020-06-17  5:35               ` Stafford Horne
  0 siblings, 1 reply; 18+ messages in thread
From: Michel Lespinasse @ 2020-06-16 20:47 UTC (permalink / raw)
  To: Atish Patra
  Cc: Stafford Horne, Palmer Dabbelt, linux-riscv, LKML, Bjorn Topel

This makes me wonder actually - maybe there is a latent bug that got
exposed after my change added the rwsem_is_locked assertion to the
lockdep_assert_held one. If that is the case, it may be helpful to
bisect when that issue first appeared, by testing before my patchset
with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
walk_page_range() / walk_page_range_novma() / walk_page_vma() ...

On Tue, Jun 16, 2020 at 12:41 PM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Tue, Jun 16, 2020 at 12:19 PM Stafford Horne <shorne@gmail.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 03:44:49AM -0700, Michel Lespinasse wrote:
> > > I am also unable to reproduce the issue so far.
> > >
> > > I wanted to point to a few things in case this helps:
> > > - Commit 42fc541404f2 was bisected as the cause. This commit changes
> > > walk_page_range_novma() to use mmap_assert_locked() instead of
> > > lockdep_assert_held()
> > > - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> > > that the rwsem itself is locked.
> > >
> > > Now how could lockdep think the lock is held, but the lock itself is
> > > not marked as locked ???
> > >
> > > I'm not sure if it helps at all, but a few commits earlier,
> > > 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> > > exclusively by stackmap, and does the opposite: it acquires the mmap
> > > lock without telling lockdep about it. I can't see any smoking gun
> > > linking this to our bug, but I thought it may be worth mentioning as
> > > it involves the same suspects (stackmap and the difference between
> > > owning the lock vs lockdep thinking we own the lock).
> > >
> > > I'm sorry, that's only how far I was able to go on this bug - I'm not
> > > sure how to investigate it further as I can not reproduce the issue...
> >
> > Hello Michel,
> >
> > Thanks for your notes, it may be related.
> >
> > On my OpenRISC build with DEBUG_LOCKDEP and DEBUG_VM I am seeing the below
> > trace.  Without DEBUG_VM enabled the system does seem to work normally.
> >
> > Some notes on the trace:
> >  - The mmap_assert_locked failure is not in the same place as RISC-V, but
> >    similarly during walk_page_range not walk_page_range_novma.
> >  - This build does not use BPF.
> >  - I do see a LOCKDEP warning early on, which may be the cause.
> >
>
> I don't have LOCKDEP enabled in RISC-V. I guess we are seeing the same
> bug caused in different paths.
> Here is the full boot log for RISC-V if it helps.
>
> https://paste.centos.org/view/bb5ee1e7
>
> > I will still do some more investination, you input is very helpful.
> >
> > -Stafford
> >
> > exec: /home/shorne/work/openrisc/qemu/build/or1k-softmmu/qemu-system-or1k -cpu or1200 -M or1k-sim -kernel /home/shorne/work/linux/vmlinux -net nic -net tap,ifname=tap0,script=no,downscript=no -serial mon:stdio -nographic -gdb tcp::10001 -m 512
> > [    0.000000] Compiled-in FDT at (ptrval)
> > [    0.000000] Linux version 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty (shorne@lianli.shorne-pla.net) (or1k-elf-gcc (GCC) 9.0.1 20190409 (experimental), GNU ld (GNU Binutils) 2.32.51.20190425) #140 Wed Jun 17 04:01:43 JST 2020
> > [    0.000000] CPU: OpenRISC-13 (revision 8) @20 MHz
> > [    0.000000] -- dcache disabled
> > [    0.000000] -- icache disabled
> > [    0.000000] -- dmmu:  128 entries, 1 way(s)
> > [    0.000000] -- immu:  128 entries, 1 way(s)
> > [    0.000000] -- additional features:
> > [    0.000000] -- power management
> > [    0.000000] -- PIC
> > [    0.000000] -- timer
> > [    0.000000] setup_memory: Memory: 0x0-0x2000000
> > [    0.000000] Setting up paging and PTEs.
> > [    0.000000] map_ram: Memory: 0x0-0x2000000
> > [    0.000000] Zone ranges:
> > [    0.000000]   Normal   [mem 0x0000000000000000-0x0000000001ffffff]
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000000000000-0x0000000001ffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000001ffffff]
> > [    0.000000] On node 0 totalpages: 4096
> > [    0.000000]   Normal zone: 16 pages used for memmap
> > [    0.000000]   Normal zone: 0 pages reserved
> > [    0.000000]   Normal zone: 4096 pages, LIFO batch:0
> > [    0.000000] itlb_miss_handler (ptrval)
> > [    0.000000] dtlb_miss_handler (ptrval)
> > [    0.000000] OpenRISC Linux -- http://openrisc.io
> > [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> > [    0.000000] pcpu-alloc: [0] 0
> > [    0.000000] Built 1 zonelists, mobility grouping off.  Total pages: 4080
> > [    0.000000] ------------[ cut here ]------------
> > [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec
> > [    0.000000] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> > [    0.000000] Call trace:
> > [    0.000000] [<(ptrval)>] dump_stack+0x34/0x48
> > [    0.000000] [<(ptrval)>] __warn+0x104/0x158
> > [    0.000000] [<(ptrval)>] ? __mutex_unlock_slowpath+0x328/0x3ec
> > [    0.000000] [<(ptrval)>] warn_slowpath_fmt+0x7c/0x94
> > [    0.000000] [<(ptrval)>] __mutex_unlock_slowpath+0x328/0x3ec
> > [    0.000000] [<(ptrval)>] ? vprintk_emit+0x1e8/0x3d8
> > [    0.000000] [<(ptrval)>] ? page_alloc_cpu_dead+0x0/0x30
> > [    0.000000] [<(ptrval)>] mutex_unlock+0x18/0x28
> > [    0.000000] [<(ptrval)>] __cpuhp_setup_state_cpuslocked+0x200/0x2f4
> > [    0.000000] [<(ptrval)>] ? vprintk_func+0x94/0x14c
> > [    0.000000] [<(ptrval)>] ? start_kernel+0x0/0x684
> > [    0.000000] [<(ptrval)>] __cpuhp_setup_state+0x14/0x24
> > [    0.000000] [<(ptrval)>] page_alloc_init+0x34/0x68
> > [    0.000000] [<(ptrval)>] ? start_kernel+0x1a0/0x684
> > [    0.000000] [<(ptrval)>] ? early_init_dt_scan_nodes+0x60/0x70
> > [    0.000000] irq event stamp: 0
> > [    0.000000] hardirqs last  enabled at (0): [<00000000>] 0x0
> > [    0.000000] hardirqs last disabled at (0): [<00000000>] 0x0
> > [    0.000000] softirqs last  enabled at (0): [<00000000>] 0x0
> > [    0.000000] softirqs last disabled at (0): [<00000000>] 0x0
> > [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x98 with crng_init=0
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] Kernel command line: earlycon
> > [    0.000000] earlycon: ns16550a0 at MMIO 0x90000000 (options '115200')
> > [    0.000000] printk: bootconsole [ns16550a0] enabled
> > [    0.000000] Dentry cache hash table entries: 4096 (order: 1, 16384 bytes, linear)
> > [    0.000000] Inode-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > [    0.000000] Sorting __ex_table...
> > [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > [    0.000000] Memory: 11384K/32768K available (4756K kernel code, 380K rwdata, 1160K rodata, 4072K init, 10784K bss, 21384K reserved, 0K cma-reserved)
> > [    0.000000] mem_init_done ...........................................
> > [    0.000000] Running RCU self tests
> > [    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
> > [    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 95563022313 ns
> > [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> > [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
> > [    0.000000] ... MAX_LOCK_DEPTH:          48
> > [    0.000000] ... MAX_LOCKDEP_KEYS:        8192
> > [    0.000000] ... CLASSHASH_SIZE:          4096
> > [    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
> > [    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
> > [    0.000000] ... CHAINHASH_SIZE:          32768
> > [    0.000000]  memory used by lock dependency info: 3773 kB
> > [    0.000000]  memory used for stack traces: 2112 kB
> > [    0.000000]  per task-struct memory footprint: 1344 bytes
> > [    0.010000] 40.00 BogoMIPS (lpj=200000)
> > [    0.010000] pid_max: default: 32768 minimum: 301
> > [    0.010000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > [    0.010000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > [    0.150000] devtmpfs: initialized
> > [    0.210000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> > [    0.210000] futex hash table entries: 256 (order: 0, 12288 bytes, linear)
> > [    0.240000] NET: Registered protocol family 16
> > [    0.450000] clocksource: Switched to clocksource openrisc_timer
> > [    2.820000] NET: Registered protocol family 2
> > [    2.840000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 11264 bytes, linear)
> > [    2.850000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > [    2.850000] TCP bind hash table entries: 2048 (order: 3, 81920 bytes, linear)
> > [    2.850000] TCP: Hash tables configured (established 2048 bind 2048)
> > [    2.860000] UDP hash table entries: 256 (order: 1, 24576 bytes, linear)
> > [    2.860000] UDP-Lite hash table entries: 256 (order: 1, 24576 bytes, linear)
> > [    2.880000] NET: Registered protocol family 1
> > [    2.900000] RPC: Registered named UNIX socket transport module.
> > [    2.900000] RPC: Registered udp transport module.
> > [    2.900000] RPC: Registered tcp transport module.
> > [    2.900000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> > [    4.120000] workingset: timestamp_bits=30 max_order=11 bucket_order=0
> > [    4.150000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > [    4.210000] printk: console [ttyS0] disabled
> > [    4.220000] 90000000.serial: ttyS0 at MMIO 0x90000000 (irq = 2, base_baud = 1250000) is a 16550A
> > [    4.230000] printk: console [ttyS0] enabled
> > [    4.230000] printk: console [ttyS0] enabled
> > [    4.230000] printk: bootconsole [ns16550a0] disabled
> > [    4.230000] printk: bootconsole [ns16550a0] disabled
> > [    4.280000] libphy: Fixed MDIO Bus: probed
> > [    4.310000] mm c05fb530 mmap 00000000 seqnum 0 task_size 0
> > [    4.310000] get_unmapped_area 00000000
> > [    4.310000] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > [    4.310000] pgd c05d2000 mm_users 2 mm_count 2 pgtables_bytes 0 map_count 0
> > [    4.310000] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > [    4.310000] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > [    4.310000] start_code c0000000 end_code c04a5208 start_data 0 end_data c06263d8
> > [    4.310000] start_brk 0 brk c14aa360 start_stack 0
> > [    4.310000] arg_start 0 arg_end 0 env_start 0 env_end 0
> > [    4.310000] binfmt 00000000 flags 0 core_state 00000000
> > [    4.310000] exe_file 00000000
> > [    4.310000] tlb_flush_pending 0
> > [    4.310000] def_flags: 0x0()
> > [    4.310000] BUG: failure at include/linux/mmap_lock.h:81/mmap_assert_locked()!
> > [    4.310000] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> > [    4.310000] Call trace:
> > [    4.320000] [<(ptrval)>] dump_stack+0x34/0x48
> > [    4.320000] [<(ptrval)>] walk_page_range+0x230/0x244
> > [    4.320000] [<(ptrval)>] arch_dma_set_uncached+0x38/0x60
> > [    4.320000] [<(ptrval)>] dma_direct_alloc_pages+0x104/0x178
> > [    4.320000] [<(ptrval)>] dma_direct_alloc+0x14/0x24
> > [    4.320000] [<(ptrval)>] dma_alloc_attrs+0x9c/0x13c
> > [    4.320000] [<(ptrval)>] dmam_alloc_attrs+0x80/0xf4
> > [    4.320000] [<(ptrval)>] ethoc_probe+0x1b0/0x884
> > [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > [    4.320000] [<(ptrval)>] platform_drv_probe+0x3c/0xa8
> > [    4.320000] [<(ptrval)>] really_probe+0x238/0x42c
> > [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> > [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > [    4.320000] [<(ptrval)>] driver_probe_device+0x50/0xa8
> > [    4.320000] [<(ptrval)>] ? __device_driver_lock+0x44/0x84
> > [    4.320000] [<(ptrval)>] device_driver_attach+0x88/0x90
> > [    4.320000] [<(ptrval)>] __driver_attach+0x84/0x140
> > [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> > [    4.330000] [<(ptrval)>] bus_for_each_dev+0x8c/0xd4
> > [    4.330000] [<(ptrval)>] driver_attach+0x28/0x38
> > [    4.330000] [<(ptrval)>] bus_add_driver+0x1dc/0x270
> > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > [    4.330000] [<(ptrval)>] driver_register+0xbc/0x1a8
> > [    4.330000] [<(ptrval)>] ? phy_drivers_register+0x68/0xdc
> > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > [    4.330000] [<(ptrval)>] __platform_driver_register+0x50/0x60
> > [    4.330000] [<(ptrval)>] ethoc_driver_init+0x20/0x30
> > [    4.330000] [<(ptrval)>] do_one_initcall+0x80/0x2e0
> > [    4.330000] [<(ptrval)>] ? parse_args+0x270/0x4f0
> > [    4.330000] [<(ptrval)>] kernel_init_freeable+0x1c8/0x268
> > [    4.330000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > [    4.330000] [<(ptrval)>] ? kernel_init+0x0/0x144
> > [    4.330000] [<(ptrval)>] kernel_init+0x18/0x144
> > [    4.340000] [<(ptrval)>] ? schedule_tail+0x54/0x94
> > [    4.340000] [<(ptrval)>] ret_from_fork+0x1c/0x150
> > [    4.340000] Kernel panic - not syncing: BUG!
> > [    4.340000] ---[ end Kernel panic - not syncing: BUG! ]---
> >
> >
> > > On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > > >
> > > > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > > > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> > > > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > > > >> > Hi,
> > > > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > > > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> > > > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > > > >> > compiled away without that.
> > > > >> > ------------------------------------------------------------------------
> > > > >> > forked to background, child pid 113
> > > > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > > > >> > 5.8.0-rc1-dirty #732
> > > > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > > > >> > [   10.334133] Call Trace:
> > > > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > > > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > > > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > > > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > > > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > > > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > > > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > > > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > > > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > > > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > > > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > > > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > > > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > > > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > > > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > > > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > > > >> > [   10.354405] get_unmapped_area 0000000000000000
> > > > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > > > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > > > >> > pgtables_bytes 8192 map_count 0
> > > > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > > > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > > > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > > > >> > start_data 0 end_data ffffffe00106dfe4
> > > > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > > > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > > > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > > > >> > [   10.354405] ioctx_table 0000000000000000
> > > > >> > [   10.354405] exe_file 0000000000000000
> > > > >> > [   10.354405] tlb_flush_pending 0
> > > > >> > [   10.354405] def_flags: 0x0()
> > > > >> > [   10.369325] ------------[ cut here ]------------
> > > > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > > > >> > [   10.375235] Kernel BUG [#1]
> > > > >> > [   10.377198] Modules linked in:
> > > > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > > > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > > > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > > > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > > > >> > ffffffe0010836e8
> > > > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > > > >> > ffffffe3eaea7c90
> > > > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > > > >> > 0000000000000020
> > > > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > > > >> > ffffffe001012758
> > > > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > > > >> > 0000000000000006
> > > > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > > > >> > ffffffe00106e100
> > > > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > > > >> > 6db6db6db6db6db7
> > > > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > > > >> > ffffffffffffffff
> > > > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > > > >> > 000000000001a7a8
> > > > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > > > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > > > >> > cause: 0000000000000003
> > > > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > > > >> > ---------------------------------------------------------------------------------------------------
> > > > >> >
> > > > >> > I haven't had the chance to bisect to figure out which commit caused
> > > > >> > the issue. Just wanted
> > > > >> > to check if it is a known issue already.
> > > > >>
> > > > >> Hi Atish,
> > > > >>
> > > > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> > > > >> you have any updates please post back.  I will try to look into this today or
> > > > >> tomorrow.
> > > > >
> > > > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> > > > >
> > > > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> > > > >
> > > > > This should have just changed the existing lockdep api's but something has
> > > > > changed.  I haven't had time to look at it yet.
> > > > >
> > > > > Ccing: Michel Lespinasse <walken@google.com>
> > > >
> > > > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> > > > running a simple buildroot-based userspace, so I doubt anything is triggering
> > > > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> > > > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> > > > trigger the bug?
> > >
> > >
> > >
> > > --
> > > Michel "Walken" Lespinasse
> > > A program is never fully debugged until the last user dies.
>
>
>
> --
> Regards,
> Atish



-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-16 20:47             ` Michel Lespinasse
@ 2020-06-17  5:35               ` Stafford Horne
  2020-06-17  6:07                 ` Stafford Horne
  0 siblings, 1 reply; 18+ messages in thread
From: Stafford Horne @ 2020-06-17  5:35 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Atish Patra, Palmer Dabbelt, linux-riscv, LKML, Bjorn Topel

On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> This makes me wonder actually - maybe there is a latent bug that got
> exposed after my change added the rwsem_is_locked assertion to the
> lockdep_assert_held one. If that is the case, it may be helpful to
> bisect when that issue first appeared, by testing before my patchset
> with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> walk_page_range() / walk_page_range_novma() / walk_page_vma() ...

Hello,

I tried to bisect it, but I think this issue goes much further back.

Just with the below patch booting fails all the way back to v5.7.

What does this mean by they way, why would mmap_assert_locked() want to assert
that the rwsem_is_locked() is not true?

Why is it not: VM_BUG_ON(rwsem_is_locked(&walk.mm->mmap_sem));

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 928df1638c30..2c093507dbfe 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -396,6 +396,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long
start,
                return -EINVAL;
 
        lockdep_assert_held(&walk.mm->mmap_sem);
+       VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_sem));
 
        vma = find_vma(walk.mm, start);
        do {


-Stafford

> On Tue, Jun 16, 2020 at 12:41 PM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > On Tue, Jun 16, 2020 at 12:19 PM Stafford Horne <shorne@gmail.com> wrote:
> > >
> > > On Tue, Jun 16, 2020 at 03:44:49AM -0700, Michel Lespinasse wrote:
> > > > I am also unable to reproduce the issue so far.
> > > >
> > > > I wanted to point to a few things in case this helps:
> > > > - Commit 42fc541404f2 was bisected as the cause. This commit changes
> > > > walk_page_range_novma() to use mmap_assert_locked() instead of
> > > > lockdep_assert_held()
> > > > - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> > > > that the rwsem itself is locked.
> > > >
> > > > Now how could lockdep think the lock is held, but the lock itself is
> > > > not marked as locked ???
> > > >
> > > > I'm not sure if it helps at all, but a few commits earlier,
> > > > 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> > > > exclusively by stackmap, and does the opposite: it acquires the mmap
> > > > lock without telling lockdep about it. I can't see any smoking gun
> > > > linking this to our bug, but I thought it may be worth mentioning as
> > > > it involves the same suspects (stackmap and the difference between
> > > > owning the lock vs lockdep thinking we own the lock).
> > > >
> > > > I'm sorry, that's only how far I was able to go on this bug - I'm not
> > > > sure how to investigate it further as I can not reproduce the issue...
> > >
> > > Hello Michel,
> > >
> > > Thanks for your notes, it may be related.
> > >
> > > On my OpenRISC build with DEBUG_LOCKDEP and DEBUG_VM I am seeing the below
> > > trace.  Without DEBUG_VM enabled the system does seem to work normally.
> > >
> > > Some notes on the trace:
> > >  - The mmap_assert_locked failure is not in the same place as RISC-V, but
> > >    similarly during walk_page_range not walk_page_range_novma.
> > >  - This build does not use BPF.
> > >  - I do see a LOCKDEP warning early on, which may be the cause.
> > >
> >
> > I don't have LOCKDEP enabled in RISC-V. I guess we are seeing the same
> > bug caused in different paths.
> > Here is the full boot log for RISC-V if it helps.
> >
> > https://paste.centos.org/view/bb5ee1e7
> >
> > > I will still do some more investination, you input is very helpful.
> > >
> > > -Stafford
> > >
> > > exec: /home/shorne/work/openrisc/qemu/build/or1k-softmmu/qemu-system-or1k -cpu or1200 -M or1k-sim -kernel /home/shorne/work/linux/vmlinux -net nic -net tap,ifname=tap0,script=no,downscript=no -serial mon:stdio -nographic -gdb tcp::10001 -m 512
> > > [    0.000000] Compiled-in FDT at (ptrval)
> > > [    0.000000] Linux version 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty (shorne@lianli.shorne-pla.net) (or1k-elf-gcc (GCC) 9.0.1 20190409 (experimental), GNU ld (GNU Binutils) 2.32.51.20190425) #140 Wed Jun 17 04:01:43 JST 2020
> > > [    0.000000] CPU: OpenRISC-13 (revision 8) @20 MHz
> > > [    0.000000] -- dcache disabled
> > > [    0.000000] -- icache disabled
> > > [    0.000000] -- dmmu:  128 entries, 1 way(s)
> > > [    0.000000] -- immu:  128 entries, 1 way(s)
> > > [    0.000000] -- additional features:
> > > [    0.000000] -- power management
> > > [    0.000000] -- PIC
> > > [    0.000000] -- timer
> > > [    0.000000] setup_memory: Memory: 0x0-0x2000000
> > > [    0.000000] Setting up paging and PTEs.
> > > [    0.000000] map_ram: Memory: 0x0-0x2000000
> > > [    0.000000] Zone ranges:
> > > [    0.000000]   Normal   [mem 0x0000000000000000-0x0000000001ffffff]
> > > [    0.000000] Movable zone start for each node
> > > [    0.000000] Early memory node ranges
> > > [    0.000000]   node   0: [mem 0x0000000000000000-0x0000000001ffffff]
> > > [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000001ffffff]
> > > [    0.000000] On node 0 totalpages: 4096
> > > [    0.000000]   Normal zone: 16 pages used for memmap
> > > [    0.000000]   Normal zone: 0 pages reserved
> > > [    0.000000]   Normal zone: 4096 pages, LIFO batch:0
> > > [    0.000000] itlb_miss_handler (ptrval)
> > > [    0.000000] dtlb_miss_handler (ptrval)
> > > [    0.000000] OpenRISC Linux -- http://openrisc.io
> > > [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> > > [    0.000000] pcpu-alloc: [0] 0
> > > [    0.000000] Built 1 zonelists, mobility grouping off.  Total pages: 4080
> > > [    0.000000] ------------[ cut here ]------------
> > > [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec
> > > [    0.000000] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> > > [    0.000000] Modules linked in:
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> > > [    0.000000] Call trace:
> > > [    0.000000] [<(ptrval)>] dump_stack+0x34/0x48
> > > [    0.000000] [<(ptrval)>] __warn+0x104/0x158
> > > [    0.000000] [<(ptrval)>] ? __mutex_unlock_slowpath+0x328/0x3ec
> > > [    0.000000] [<(ptrval)>] warn_slowpath_fmt+0x7c/0x94
> > > [    0.000000] [<(ptrval)>] __mutex_unlock_slowpath+0x328/0x3ec
> > > [    0.000000] [<(ptrval)>] ? vprintk_emit+0x1e8/0x3d8
> > > [    0.000000] [<(ptrval)>] ? page_alloc_cpu_dead+0x0/0x30
> > > [    0.000000] [<(ptrval)>] mutex_unlock+0x18/0x28
> > > [    0.000000] [<(ptrval)>] __cpuhp_setup_state_cpuslocked+0x200/0x2f4
> > > [    0.000000] [<(ptrval)>] ? vprintk_func+0x94/0x14c
> > > [    0.000000] [<(ptrval)>] ? start_kernel+0x0/0x684
> > > [    0.000000] [<(ptrval)>] __cpuhp_setup_state+0x14/0x24
> > > [    0.000000] [<(ptrval)>] page_alloc_init+0x34/0x68
> > > [    0.000000] [<(ptrval)>] ? start_kernel+0x1a0/0x684
> > > [    0.000000] [<(ptrval)>] ? early_init_dt_scan_nodes+0x60/0x70
> > > [    0.000000] irq event stamp: 0
> > > [    0.000000] hardirqs last  enabled at (0): [<00000000>] 0x0
> > > [    0.000000] hardirqs last disabled at (0): [<00000000>] 0x0
> > > [    0.000000] softirqs last  enabled at (0): [<00000000>] 0x0
> > > [    0.000000] softirqs last disabled at (0): [<00000000>] 0x0
> > > [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x98 with crng_init=0
> > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > [    0.000000] Kernel command line: earlycon
> > > [    0.000000] earlycon: ns16550a0 at MMIO 0x90000000 (options '115200')
> > > [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > [    0.000000] Dentry cache hash table entries: 4096 (order: 1, 16384 bytes, linear)
> > > [    0.000000] Inode-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > [    0.000000] Sorting __ex_table...
> > > [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > > [    0.000000] Memory: 11384K/32768K available (4756K kernel code, 380K rwdata, 1160K rodata, 4072K init, 10784K bss, 21384K reserved, 0K cma-reserved)
> > > [    0.000000] mem_init_done ...........................................
> > > [    0.000000] Running RCU self tests
> > > [    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
> > > [    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 95563022313 ns
> > > [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> > > [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
> > > [    0.000000] ... MAX_LOCK_DEPTH:          48
> > > [    0.000000] ... MAX_LOCKDEP_KEYS:        8192
> > > [    0.000000] ... CLASSHASH_SIZE:          4096
> > > [    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
> > > [    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
> > > [    0.000000] ... CHAINHASH_SIZE:          32768
> > > [    0.000000]  memory used by lock dependency info: 3773 kB
> > > [    0.000000]  memory used for stack traces: 2112 kB
> > > [    0.000000]  per task-struct memory footprint: 1344 bytes
> > > [    0.010000] 40.00 BogoMIPS (lpj=200000)
> > > [    0.010000] pid_max: default: 32768 minimum: 301
> > > [    0.010000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > [    0.010000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > [    0.150000] devtmpfs: initialized
> > > [    0.210000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> > > [    0.210000] futex hash table entries: 256 (order: 0, 12288 bytes, linear)
> > > [    0.240000] NET: Registered protocol family 16
> > > [    0.450000] clocksource: Switched to clocksource openrisc_timer
> > > [    2.820000] NET: Registered protocol family 2
> > > [    2.840000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 11264 bytes, linear)
> > > [    2.850000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > [    2.850000] TCP bind hash table entries: 2048 (order: 3, 81920 bytes, linear)
> > > [    2.850000] TCP: Hash tables configured (established 2048 bind 2048)
> > > [    2.860000] UDP hash table entries: 256 (order: 1, 24576 bytes, linear)
> > > [    2.860000] UDP-Lite hash table entries: 256 (order: 1, 24576 bytes, linear)
> > > [    2.880000] NET: Registered protocol family 1
> > > [    2.900000] RPC: Registered named UNIX socket transport module.
> > > [    2.900000] RPC: Registered udp transport module.
> > > [    2.900000] RPC: Registered tcp transport module.
> > > [    2.900000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> > > [    4.120000] workingset: timestamp_bits=30 max_order=11 bucket_order=0
> > > [    4.150000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > > [    4.210000] printk: console [ttyS0] disabled
> > > [    4.220000] 90000000.serial: ttyS0 at MMIO 0x90000000 (irq = 2, base_baud = 1250000) is a 16550A
> > > [    4.230000] printk: console [ttyS0] enabled
> > > [    4.230000] printk: console [ttyS0] enabled
> > > [    4.230000] printk: bootconsole [ns16550a0] disabled
> > > [    4.230000] printk: bootconsole [ns16550a0] disabled
> > > [    4.280000] libphy: Fixed MDIO Bus: probed
> > > [    4.310000] mm c05fb530 mmap 00000000 seqnum 0 task_size 0
> > > [    4.310000] get_unmapped_area 00000000
> > > [    4.310000] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > > [    4.310000] pgd c05d2000 mm_users 2 mm_count 2 pgtables_bytes 0 map_count 0
> > > [    4.310000] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > > [    4.310000] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > > [    4.310000] start_code c0000000 end_code c04a5208 start_data 0 end_data c06263d8
> > > [    4.310000] start_brk 0 brk c14aa360 start_stack 0
> > > [    4.310000] arg_start 0 arg_end 0 env_start 0 env_end 0
> > > [    4.310000] binfmt 00000000 flags 0 core_state 00000000
> > > [    4.310000] exe_file 00000000
> > > [    4.310000] tlb_flush_pending 0
> > > [    4.310000] def_flags: 0x0()
> > > [    4.310000] BUG: failure at include/linux/mmap_lock.h:81/mmap_assert_locked()!
> > > [    4.310000] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> > > [    4.310000] Call trace:
> > > [    4.320000] [<(ptrval)>] dump_stack+0x34/0x48
> > > [    4.320000] [<(ptrval)>] walk_page_range+0x230/0x244
> > > [    4.320000] [<(ptrval)>] arch_dma_set_uncached+0x38/0x60
> > > [    4.320000] [<(ptrval)>] dma_direct_alloc_pages+0x104/0x178
> > > [    4.320000] [<(ptrval)>] dma_direct_alloc+0x14/0x24
> > > [    4.320000] [<(ptrval)>] dma_alloc_attrs+0x9c/0x13c
> > > [    4.320000] [<(ptrval)>] dmam_alloc_attrs+0x80/0xf4
> > > [    4.320000] [<(ptrval)>] ethoc_probe+0x1b0/0x884
> > > [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > > [    4.320000] [<(ptrval)>] platform_drv_probe+0x3c/0xa8
> > > [    4.320000] [<(ptrval)>] really_probe+0x238/0x42c
> > > [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> > > [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > > [    4.320000] [<(ptrval)>] driver_probe_device+0x50/0xa8
> > > [    4.320000] [<(ptrval)>] ? __device_driver_lock+0x44/0x84
> > > [    4.320000] [<(ptrval)>] device_driver_attach+0x88/0x90
> > > [    4.320000] [<(ptrval)>] __driver_attach+0x84/0x140
> > > [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> > > [    4.330000] [<(ptrval)>] bus_for_each_dev+0x8c/0xd4
> > > [    4.330000] [<(ptrval)>] driver_attach+0x28/0x38
> > > [    4.330000] [<(ptrval)>] bus_add_driver+0x1dc/0x270
> > > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > > [    4.330000] [<(ptrval)>] driver_register+0xbc/0x1a8
> > > [    4.330000] [<(ptrval)>] ? phy_drivers_register+0x68/0xdc
> > > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > > [    4.330000] [<(ptrval)>] __platform_driver_register+0x50/0x60
> > > [    4.330000] [<(ptrval)>] ethoc_driver_init+0x20/0x30
> > > [    4.330000] [<(ptrval)>] do_one_initcall+0x80/0x2e0
> > > [    4.330000] [<(ptrval)>] ? parse_args+0x270/0x4f0
> > > [    4.330000] [<(ptrval)>] kernel_init_freeable+0x1c8/0x268
> > > [    4.330000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > > [    4.330000] [<(ptrval)>] ? kernel_init+0x0/0x144
> > > [    4.330000] [<(ptrval)>] kernel_init+0x18/0x144
> > > [    4.340000] [<(ptrval)>] ? schedule_tail+0x54/0x94
> > > [    4.340000] [<(ptrval)>] ret_from_fork+0x1c/0x150
> > > [    4.340000] Kernel panic - not syncing: BUG!
> > > [    4.340000] ---[ end Kernel panic - not syncing: BUG! ]---
> > >
> > >
> > > > On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > > > >
> > > > > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > > > > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> > > > > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > > > > >> > Hi,
> > > > > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > > > > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> > > > > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > > > > >> > compiled away without that.
> > > > > >> > ------------------------------------------------------------------------
> > > > > >> > forked to background, child pid 113
> > > > > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > > > > >> > 5.8.0-rc1-dirty #732
> > > > > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > > > > >> > [   10.334133] Call Trace:
> > > > > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > > > > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > > > > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > > > > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > > > > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > > > > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > > > > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > > > > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > > > > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > > > > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > > > > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > > > > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > > > > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > > > > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > > > > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > > > > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > > > > >> > [   10.354405] get_unmapped_area 0000000000000000
> > > > > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > > > > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > > > > >> > pgtables_bytes 8192 map_count 0
> > > > > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > > > > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > > > > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > > > > >> > start_data 0 end_data ffffffe00106dfe4
> > > > > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > > > > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > > > > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > > > > >> > [   10.354405] ioctx_table 0000000000000000
> > > > > >> > [   10.354405] exe_file 0000000000000000
> > > > > >> > [   10.354405] tlb_flush_pending 0
> > > > > >> > [   10.354405] def_flags: 0x0()
> > > > > >> > [   10.369325] ------------[ cut here ]------------
> > > > > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > > > > >> > [   10.375235] Kernel BUG [#1]
> > > > > >> > [   10.377198] Modules linked in:
> > > > > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > > > > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > > > > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > > > > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > > > > >> > ffffffe0010836e8
> > > > > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > > > > >> > ffffffe3eaea7c90
> > > > > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > > > > >> > 0000000000000020
> > > > > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > > > > >> > ffffffe001012758
> > > > > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > > > > >> > 0000000000000006
> > > > > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > > > > >> > ffffffe00106e100
> > > > > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > > > > >> > 6db6db6db6db6db7
> > > > > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > > > > >> > ffffffffffffffff
> > > > > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > > > > >> > 000000000001a7a8
> > > > > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > > > > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > > > > >> > cause: 0000000000000003
> > > > > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > > > > >> > ---------------------------------------------------------------------------------------------------
> > > > > >> >
> > > > > >> > I haven't had the chance to bisect to figure out which commit caused
> > > > > >> > the issue. Just wanted
> > > > > >> > to check if it is a known issue already.
> > > > > >>
> > > > > >> Hi Atish,
> > > > > >>
> > > > > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> > > > > >> you have any updates please post back.  I will try to look into this today or
> > > > > >> tomorrow.
> > > > > >
> > > > > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> > > > > >
> > > > > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> > > > > >
> > > > > > This should have just changed the existing lockdep api's but something has
> > > > > > changed.  I haven't had time to look at it yet.
> > > > > >
> > > > > > Ccing: Michel Lespinasse <walken@google.com>
> > > > >
> > > > > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> > > > > running a simple buildroot-based userspace, so I doubt anything is triggering
> > > > > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> > > > > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> > > > > trigger the bug?
> > > >
> > > >
> > > >
> > > > --
> > > > Michel "Walken" Lespinasse
> > > > A program is never fully debugged until the last user dies.
> >
> >
> >
> > --
> > Regards,
> > Atish
> 
> 
> 
> -- 
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-17  5:35               ` Stafford Horne
@ 2020-06-17  6:07                 ` Stafford Horne
  2020-06-17  6:29                   ` Michel Lespinasse
  0 siblings, 1 reply; 18+ messages in thread
From: Stafford Horne @ 2020-06-17  6:07 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Atish Patra, Palmer Dabbelt, linux-riscv, LKML, Bjorn Topel

On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote:
> On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> > This makes me wonder actually - maybe there is a latent bug that got
> > exposed after my change added the rwsem_is_locked assertion to the
> > lockdep_assert_held one. If that is the case, it may be helpful to
> > bisect when that issue first appeared, by testing before my patchset
> > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> > walk_page_range() / walk_page_range_novma() / walk_page_vma() ...
> 
> Hello,
> 
> I tried to bisect it, but I think this issue goes much further back.
> 
> Just with the below patch booting fails all the way back to v5.7.
> 
> What does this mean by they way, why would mmap_assert_locked() want to assert
> that the rwsem_is_locked() is not true?
> 
> Why is it not: VM_BUG_ON(rwsem_is_locked(&walk.mm->mmap_sem));

Either way,

The openrisc code that was walking the page ranges was not locking mm. I have
added the  below patch to v5.8-rc1 and it seems to work fine.  I will send a
better patch in a bit.

iff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index c152a68811dd..bd5f05dd9174 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
         * We need to iterate through the pages, clearing the dcache for
         * them and setting the cache-inhibit bit.
         */
+       mmap_read_lock(&init_mm);
        error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
                        NULL);
+       mmap_read_unlock(&init_mm);
        if (error)
                return ERR_PTR(error);
        return cpu_addr;
@@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
 {
        unsigned long va = (unsigned long)cpu_addr;
 
+       mmap_read_lock(&init_mm);
        /* walk_page_range shouldn't be able to fail here */
        WARN_ON(walk_page_range(&init_mm, va, va + size,
                        &clear_nocache_walk_ops, NULL));
+       mmap_read_unlock(&init_mm);
 }
 
 void arch_sync_dma_for_device(phys_addr_t addr, size_t size,


> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 928df1638c30..2c093507dbfe 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -396,6 +396,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long
> start,
>                 return -EINVAL;
>  
>         lockdep_assert_held(&walk.mm->mmap_sem);
> +       VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_sem));
>  
>         vma = find_vma(walk.mm, start);
>         do {
> 
> 
> -Stafford
> 
> > On Tue, Jun 16, 2020 at 12:41 PM Atish Patra <atishp@atishpatra.org> wrote:
> > >
> > > On Tue, Jun 16, 2020 at 12:19 PM Stafford Horne <shorne@gmail.com> wrote:
> > > >
> > > > On Tue, Jun 16, 2020 at 03:44:49AM -0700, Michel Lespinasse wrote:
> > > > > I am also unable to reproduce the issue so far.
> > > > >
> > > > > I wanted to point to a few things in case this helps:
> > > > > - Commit 42fc541404f2 was bisected as the cause. This commit changes
> > > > > walk_page_range_novma() to use mmap_assert_locked() instead of
> > > > > lockdep_assert_held()
> > > > > - mmap_assert_locked() checks lockdep_assert_held(), but also checks
> > > > > that the rwsem itself is locked.
> > > > >
> > > > > Now how could lockdep think the lock is held, but the lock itself is
> > > > > not marked as locked ???
> > > > >
> > > > > I'm not sure if it helps at all, but a few commits earlier,
> > > > > 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
> > > > > exclusively by stackmap, and does the opposite: it acquires the mmap
> > > > > lock without telling lockdep about it. I can't see any smoking gun
> > > > > linking this to our bug, but I thought it may be worth mentioning as
> > > > > it involves the same suspects (stackmap and the difference between
> > > > > owning the lock vs lockdep thinking we own the lock).
> > > > >
> > > > > I'm sorry, that's only how far I was able to go on this bug - I'm not
> > > > > sure how to investigate it further as I can not reproduce the issue...
> > > >
> > > > Hello Michel,
> > > >
> > > > Thanks for your notes, it may be related.
> > > >
> > > > On my OpenRISC build with DEBUG_LOCKDEP and DEBUG_VM I am seeing the below
> > > > trace.  Without DEBUG_VM enabled the system does seem to work normally.
> > > >
> > > > Some notes on the trace:
> > > >  - The mmap_assert_locked failure is not in the same place as RISC-V, but
> > > >    similarly during walk_page_range not walk_page_range_novma.
> > > >  - This build does not use BPF.
> > > >  - I do see a LOCKDEP warning early on, which may be the cause.
> > > >
> > >
> > > I don't have LOCKDEP enabled in RISC-V. I guess we are seeing the same
> > > bug caused in different paths.
> > > Here is the full boot log for RISC-V if it helps.
> > >
> > > https://paste.centos.org/view/bb5ee1e7
> > >
> > > > I will still do some more investination, you input is very helpful.
> > > >
> > > > -Stafford
> > > >
> > > > exec: /home/shorne/work/openrisc/qemu/build/or1k-softmmu/qemu-system-or1k -cpu or1200 -M or1k-sim -kernel /home/shorne/work/linux/vmlinux -net nic -net tap,ifname=tap0,script=no,downscript=no -serial mon:stdio -nographic -gdb tcp::10001 -m 512
> > > > [    0.000000] Compiled-in FDT at (ptrval)
> > > > [    0.000000] Linux version 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty (shorne@lianli.shorne-pla.net) (or1k-elf-gcc (GCC) 9.0.1 20190409 (experimental), GNU ld (GNU Binutils) 2.32.51.20190425) #140 Wed Jun 17 04:01:43 JST 2020
> > > > [    0.000000] CPU: OpenRISC-13 (revision 8) @20 MHz
> > > > [    0.000000] -- dcache disabled
> > > > [    0.000000] -- icache disabled
> > > > [    0.000000] -- dmmu:  128 entries, 1 way(s)
> > > > [    0.000000] -- immu:  128 entries, 1 way(s)
> > > > [    0.000000] -- additional features:
> > > > [    0.000000] -- power management
> > > > [    0.000000] -- PIC
> > > > [    0.000000] -- timer
> > > > [    0.000000] setup_memory: Memory: 0x0-0x2000000
> > > > [    0.000000] Setting up paging and PTEs.
> > > > [    0.000000] map_ram: Memory: 0x0-0x2000000
> > > > [    0.000000] Zone ranges:
> > > > [    0.000000]   Normal   [mem 0x0000000000000000-0x0000000001ffffff]
> > > > [    0.000000] Movable zone start for each node
> > > > [    0.000000] Early memory node ranges
> > > > [    0.000000]   node   0: [mem 0x0000000000000000-0x0000000001ffffff]
> > > > [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000001ffffff]
> > > > [    0.000000] On node 0 totalpages: 4096
> > > > [    0.000000]   Normal zone: 16 pages used for memmap
> > > > [    0.000000]   Normal zone: 0 pages reserved
> > > > [    0.000000]   Normal zone: 4096 pages, LIFO batch:0
> > > > [    0.000000] itlb_miss_handler (ptrval)
> > > > [    0.000000] dtlb_miss_handler (ptrval)
> > > > [    0.000000] OpenRISC Linux -- http://openrisc.io
> > > > [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> > > > [    0.000000] pcpu-alloc: [0] 0
> > > > [    0.000000] Built 1 zonelists, mobility grouping off.  Total pages: 4080
> > > > [    0.000000] ------------[ cut here ]------------
> > > > [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1242 __mutex_unlock_slowpath+0x328/0x3ec
> > > > [    0.000000] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> > > > [    0.000000] Modules linked in:
> > > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> > > > [    0.000000] Call trace:
> > > > [    0.000000] [<(ptrval)>] dump_stack+0x34/0x48
> > > > [    0.000000] [<(ptrval)>] __warn+0x104/0x158
> > > > [    0.000000] [<(ptrval)>] ? __mutex_unlock_slowpath+0x328/0x3ec
> > > > [    0.000000] [<(ptrval)>] warn_slowpath_fmt+0x7c/0x94
> > > > [    0.000000] [<(ptrval)>] __mutex_unlock_slowpath+0x328/0x3ec
> > > > [    0.000000] [<(ptrval)>] ? vprintk_emit+0x1e8/0x3d8
> > > > [    0.000000] [<(ptrval)>] ? page_alloc_cpu_dead+0x0/0x30
> > > > [    0.000000] [<(ptrval)>] mutex_unlock+0x18/0x28
> > > > [    0.000000] [<(ptrval)>] __cpuhp_setup_state_cpuslocked+0x200/0x2f4
> > > > [    0.000000] [<(ptrval)>] ? vprintk_func+0x94/0x14c
> > > > [    0.000000] [<(ptrval)>] ? start_kernel+0x0/0x684
> > > > [    0.000000] [<(ptrval)>] __cpuhp_setup_state+0x14/0x24
> > > > [    0.000000] [<(ptrval)>] page_alloc_init+0x34/0x68
> > > > [    0.000000] [<(ptrval)>] ? start_kernel+0x1a0/0x684
> > > > [    0.000000] [<(ptrval)>] ? early_init_dt_scan_nodes+0x60/0x70
> > > > [    0.000000] irq event stamp: 0
> > > > [    0.000000] hardirqs last  enabled at (0): [<00000000>] 0x0
> > > > [    0.000000] hardirqs last disabled at (0): [<00000000>] 0x0
> > > > [    0.000000] softirqs last  enabled at (0): [<00000000>] 0x0
> > > > [    0.000000] softirqs last disabled at (0): [<00000000>] 0x0
> > > > [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x98 with crng_init=0
> > > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > > [    0.000000] Kernel command line: earlycon
> > > > [    0.000000] earlycon: ns16550a0 at MMIO 0x90000000 (options '115200')
> > > > [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > > [    0.000000] Dentry cache hash table entries: 4096 (order: 1, 16384 bytes, linear)
> > > > [    0.000000] Inode-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > > [    0.000000] Sorting __ex_table...
> > > > [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > > > [    0.000000] Memory: 11384K/32768K available (4756K kernel code, 380K rwdata, 1160K rodata, 4072K init, 10784K bss, 21384K reserved, 0K cma-reserved)
> > > > [    0.000000] mem_init_done ...........................................
> > > > [    0.000000] Running RCU self tests
> > > > [    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
> > > > [    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 95563022313 ns
> > > > [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> > > > [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
> > > > [    0.000000] ... MAX_LOCK_DEPTH:          48
> > > > [    0.000000] ... MAX_LOCKDEP_KEYS:        8192
> > > > [    0.000000] ... CLASSHASH_SIZE:          4096
> > > > [    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
> > > > [    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
> > > > [    0.000000] ... CHAINHASH_SIZE:          32768
> > > > [    0.000000]  memory used by lock dependency info: 3773 kB
> > > > [    0.000000]  memory used for stack traces: 2112 kB
> > > > [    0.000000]  per task-struct memory footprint: 1344 bytes
> > > > [    0.010000] 40.00 BogoMIPS (lpj=200000)
> > > > [    0.010000] pid_max: default: 32768 minimum: 301
> > > > [    0.010000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > > [    0.010000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > > [    0.150000] devtmpfs: initialized
> > > > [    0.210000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> > > > [    0.210000] futex hash table entries: 256 (order: 0, 12288 bytes, linear)
> > > > [    0.240000] NET: Registered protocol family 16
> > > > [    0.450000] clocksource: Switched to clocksource openrisc_timer
> > > > [    2.820000] NET: Registered protocol family 2
> > > > [    2.840000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 11264 bytes, linear)
> > > > [    2.850000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
> > > > [    2.850000] TCP bind hash table entries: 2048 (order: 3, 81920 bytes, linear)
> > > > [    2.850000] TCP: Hash tables configured (established 2048 bind 2048)
> > > > [    2.860000] UDP hash table entries: 256 (order: 1, 24576 bytes, linear)
> > > > [    2.860000] UDP-Lite hash table entries: 256 (order: 1, 24576 bytes, linear)
> > > > [    2.880000] NET: Registered protocol family 1
> > > > [    2.900000] RPC: Registered named UNIX socket transport module.
> > > > [    2.900000] RPC: Registered udp transport module.
> > > > [    2.900000] RPC: Registered tcp transport module.
> > > > [    2.900000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> > > > [    4.120000] workingset: timestamp_bits=30 max_order=11 bucket_order=0
> > > > [    4.150000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > > > [    4.210000] printk: console [ttyS0] disabled
> > > > [    4.220000] 90000000.serial: ttyS0 at MMIO 0x90000000 (irq = 2, base_baud = 1250000) is a 16550A
> > > > [    4.230000] printk: console [ttyS0] enabled
> > > > [    4.230000] printk: console [ttyS0] enabled
> > > > [    4.230000] printk: bootconsole [ns16550a0] disabled
> > > > [    4.230000] printk: bootconsole [ns16550a0] disabled
> > > > [    4.280000] libphy: Fixed MDIO Bus: probed
> > > > [    4.310000] mm c05fb530 mmap 00000000 seqnum 0 task_size 0
> > > > [    4.310000] get_unmapped_area 00000000
> > > > [    4.310000] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > > > [    4.310000] pgd c05d2000 mm_users 2 mm_count 2 pgtables_bytes 0 map_count 0
> > > > [    4.310000] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > > > [    4.310000] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > > > [    4.310000] start_code c0000000 end_code c04a5208 start_data 0 end_data c06263d8
> > > > [    4.310000] start_brk 0 brk c14aa360 start_stack 0
> > > > [    4.310000] arg_start 0 arg_end 0 env_start 0 env_end 0
> > > > [    4.310000] binfmt 00000000 flags 0 core_state 00000000
> > > > [    4.310000] exe_file 00000000
> > > > [    4.310000] tlb_flush_pending 0
> > > > [    4.310000] def_flags: 0x0()
> > > > [    4.310000] BUG: failure at include/linux/mmap_lock.h:81/mmap_assert_locked()!
> > > > [    4.310000] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.8.0-rc1-simple-smp-00002-g2335724b6a99-dirty #140
> > > > [    4.310000] Call trace:
> > > > [    4.320000] [<(ptrval)>] dump_stack+0x34/0x48
> > > > [    4.320000] [<(ptrval)>] walk_page_range+0x230/0x244
> > > > [    4.320000] [<(ptrval)>] arch_dma_set_uncached+0x38/0x60
> > > > [    4.320000] [<(ptrval)>] dma_direct_alloc_pages+0x104/0x178
> > > > [    4.320000] [<(ptrval)>] dma_direct_alloc+0x14/0x24
> > > > [    4.320000] [<(ptrval)>] dma_alloc_attrs+0x9c/0x13c
> > > > [    4.320000] [<(ptrval)>] dmam_alloc_attrs+0x80/0xf4
> > > > [    4.320000] [<(ptrval)>] ethoc_probe+0x1b0/0x884
> > > > [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > > > [    4.320000] [<(ptrval)>] platform_drv_probe+0x3c/0xa8
> > > > [    4.320000] [<(ptrval)>] really_probe+0x238/0x42c
> > > > [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> > > > [    4.320000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > > > [    4.320000] [<(ptrval)>] driver_probe_device+0x50/0xa8
> > > > [    4.320000] [<(ptrval)>] ? __device_driver_lock+0x44/0x84
> > > > [    4.320000] [<(ptrval)>] device_driver_attach+0x88/0x90
> > > > [    4.320000] [<(ptrval)>] __driver_attach+0x84/0x140
> > > > [    4.320000] [<(ptrval)>] ? __driver_attach+0x0/0x140
> > > > [    4.330000] [<(ptrval)>] bus_for_each_dev+0x8c/0xd4
> > > > [    4.330000] [<(ptrval)>] driver_attach+0x28/0x38
> > > > [    4.330000] [<(ptrval)>] bus_add_driver+0x1dc/0x270
> > > > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > > > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > > > [    4.330000] [<(ptrval)>] driver_register+0xbc/0x1a8
> > > > [    4.330000] [<(ptrval)>] ? phy_drivers_register+0x68/0xdc
> > > > [    4.330000] [<(ptrval)>] ? ethoc_driver_init+0x0/0x30
> > > > [    4.330000] [<(ptrval)>] __platform_driver_register+0x50/0x60
> > > > [    4.330000] [<(ptrval)>] ethoc_driver_init+0x20/0x30
> > > > [    4.330000] [<(ptrval)>] do_one_initcall+0x80/0x2e0
> > > > [    4.330000] [<(ptrval)>] ? parse_args+0x270/0x4f0
> > > > [    4.330000] [<(ptrval)>] kernel_init_freeable+0x1c8/0x268
> > > > [    4.330000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x24
> > > > [    4.330000] [<(ptrval)>] ? kernel_init+0x0/0x144
> > > > [    4.330000] [<(ptrval)>] kernel_init+0x18/0x144
> > > > [    4.340000] [<(ptrval)>] ? schedule_tail+0x54/0x94
> > > > [    4.340000] [<(ptrval)>] ret_from_fork+0x1c/0x150
> > > > [    4.340000] Kernel panic - not syncing: BUG!
> > > > [    4.340000] ---[ end Kernel panic - not syncing: BUG! ]---
> > > >
> > > >
> > > > > On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > > > > >
> > > > > > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@gmail.com wrote:
> > > > > > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> > > > > > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> > > > > > >> > Hi,
> > > > > > >> > I encountered the following issue while booting 5.8-rc1 on Qemu for RV64.
> > > > > > >> > I added additional dump_stack and observed that it's happening in bpf free path.
> > > > > > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> > > > > > >> > compiled away without that.
> > > > > > >> > ------------------------------------------------------------------------
> > > > > > >> > forked to background, child pid 113
> > > > > > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> > > > > > >> > 5.8.0-rc1-dirty #732
> > > > > > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> > > > > > >> > [   10.334133] Call Trace:
> > > > > > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
> > > > > > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
> > > > > > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
> > > > > > >> > [   10.341451] [<ffffffe0002db4ce>] mmap_assert_locked.part.13+0x14/0x1c
> > > > > > >> > [   10.342131] [<ffffffe0002db330>] walk_page_range_novma+0x0/0x4e
> > > > > > >> > [   10.342973] [<ffffffe000204f94>] set_direct_map_invalid_noflush+0x66/0x6e
> > > > > > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
> > > > > > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
> > > > > > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
> > > > > > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
> > > > > > >> > [   10.346529] [<ffffffe0002801a2>] bpf_prog_free_deferred+0x74/0x8a
> > > > > > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
> > > > > > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
> > > > > > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
> > > > > > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
> > > > > > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 seqnum 0 task_size 0
> > > > > > >> > [   10.354405] get_unmapped_area 0000000000000000
> > > > > > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> > > > > > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
> > > > > > >> > pgtables_bytes 8192 map_count 0
> > > > > > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> > > > > > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> > > > > > >> > [   10.354405] start_code ffffffe000200000 end_code ffffffe00084acc2
> > > > > > >> > start_data 0 end_data ffffffe00106dfe4
> > > > > > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
> > > > > > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> > > > > > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 0000000000000000
> > > > > > >> > [   10.354405] ioctx_table 0000000000000000
> > > > > > >> > [   10.354405] exe_file 0000000000000000
> > > > > > >> > [   10.354405] tlb_flush_pending 0
> > > > > > >> > [   10.354405] def_flags: 0x0()
> > > > > > >> > [   10.369325] ------------[ cut here ]------------
> > > > > > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> > > > > > >> > [   10.375235] Kernel BUG [#1]
> > > > > > >> > [   10.377198] Modules linked in:
> > > > > > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 5.8.0-rc1-dirty #732
> > > > > > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> > > > > > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp : ffffffe3eaea7c70
> > > > > > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
> > > > > > >> > ffffffe0010836e8
> > > > > > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
> > > > > > >> > ffffffe3eaea7c90
> > > > > > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
> > > > > > >> > 0000000000000020
> > > > > > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
> > > > > > >> > ffffffe001012758
> > > > > > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
> > > > > > >> > 0000000000000006
> > > > > > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
> > > > > > >> > ffffffe00106e100
> > > > > > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
> > > > > > >> > 6db6db6db6db6db7
> > > > > > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
> > > > > > >> > ffffffffffffffff
> > > > > > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
> > > > > > >> > 000000000001a7a8
> > > > > > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
> > > > > > >> > [   10.393096] status: 0000000000000120 badaddr: 0000000000000000
> > > > > > >> > cause: 0000000000000003
> > > > > > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
> > > > > > >> > ---------------------------------------------------------------------------------------------------
> > > > > > >> >
> > > > > > >> > I haven't had the chance to bisect to figure out which commit caused
> > > > > > >> > the issue. Just wanted
> > > > > > >> > to check if it is a known issue already.
> > > > > > >>
> > > > > > >> Hi Atish,
> > > > > > >>
> > > > > > >> Note, I am getting the same (just now) when booting v5.8-rc1 on OpenRISC.  If
> > > > > > >> you have any updates please post back.  I will try to look into this today or
> > > > > > >> tomorrow.
> > > > > > >
> > > > > > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
> > > > > > >
> > > > > > >   mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
> > > > > > >
> > > > > > > This should have just changed the existing lockdep api's but something has
> > > > > > > changed.  I haven't had time to look at it yet.
> > > > > > >
> > > > > > > Ccing: Michel Lespinasse <walken@google.com>
> > > > > >
> > > > > > This isn't manifesting on boot for me, on either rc1 or that commit.  I'm
> > > > > > running a simple buildroot-based userspace, so I doubt anything is triggering
> > > > > > BPF.  I don't run the BPF selftests, as they're a bit of a pain (IIRC they
> > > > > > don't cross compile and need LLVM) -- does anyone have a userspace I can use to
> > > > > > trigger the bug?
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Michel "Walken" Lespinasse
> > > > > A program is never fully debugged until the last user dies.
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Atish
> > 
> > 
> > 
> > -- 
> > Michel "Walken" Lespinasse
> > A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-17  6:07                 ` Stafford Horne
@ 2020-06-17  6:29                   ` Michel Lespinasse
  2020-06-17  7:34                     ` Atish Patra
  0 siblings, 1 reply; 18+ messages in thread
From: Michel Lespinasse @ 2020-06-17  6:29 UTC (permalink / raw)
  To: Stafford Horne
  Cc: Atish Patra, Palmer Dabbelt, linux-riscv, LKML, Bjorn Topel

On Tue, Jun 16, 2020 at 11:07 PM Stafford Horne <shorne@gmail.com> wrote:
> On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote:
> > On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> > > This makes me wonder actually - maybe there is a latent bug that got
> > > exposed after my change added the rwsem_is_locked assertion to the
> > > lockdep_assert_held one. If that is the case, it may be helpful to
> > > bisect when that issue first appeared, by testing before my patchset
> > > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> > > walk_page_range() / walk_page_range_novma() / walk_page_vma() ...
> >
> > Hello,
> >
> > I tried to bisect it, but I think this issue goes much further back.
> >
> > Just with the below patch booting fails all the way back to v5.7.
> >
> > What does this mean by they way, why would mmap_assert_locked() want to assert
> > that the rwsem_is_locked() is not true?

It's the opposite - VM_BUG_ON(cond) triggers if cond is true, so in
other words it asserts that cond is false. Yeah, I agree it is kinda
confusing. But in our case, it asserts that the rwsem is locked, which
is what we want.

> The openrisc code that was walking the page ranges was not locking mm. I have
> added the  below patch to v5.8-rc1 and it seems to work fine.  I will send a
> better patch in a bit.
>
> iff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index c152a68811dd..bd5f05dd9174 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
>          * We need to iterate through the pages, clearing the dcache for
>          * them and setting the cache-inhibit bit.
>          */
> +       mmap_read_lock(&init_mm);
>         error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
>                         NULL);
> +       mmap_read_unlock(&init_mm);
>         if (error)
>                 return ERR_PTR(error);
>         return cpu_addr;
> @@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  {
>         unsigned long va = (unsigned long)cpu_addr;
>
> +       mmap_read_lock(&init_mm);
>         /* walk_page_range shouldn't be able to fail here */
>         WARN_ON(walk_page_range(&init_mm, va, va + size,
>                         &clear_nocache_walk_ops, NULL));
> +       mmap_read_unlock(&init_mm);
>  }

Thanks a lot for getting to the bottom of this. I think this is the proper fix.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
  2020-06-17  6:29                   ` Michel Lespinasse
@ 2020-06-17  7:34                     ` Atish Patra
  0 siblings, 0 replies; 18+ messages in thread
From: Atish Patra @ 2020-06-17  7:34 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Stafford Horne, Palmer Dabbelt, linux-riscv, LKML, Bjorn Topel

On Tue, Jun 16, 2020 at 11:30 PM Michel Lespinasse <walken@google.com> wrote:
>
> On Tue, Jun 16, 2020 at 11:07 PM Stafford Horne <shorne@gmail.com> wrote:
> > On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote:
> > > On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> > > > This makes me wonder actually - maybe there is a latent bug that got
> > > > exposed after my change added the rwsem_is_locked assertion to the
> > > > lockdep_assert_held one. If that is the case, it may be helpful to
> > > > bisect when that issue first appeared, by testing before my patchset
> > > > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> > > > walk_page_range() / walk_page_range_novma() / walk_page_vma() ...
> > >
> > > Hello,
> > >
> > > I tried to bisect it, but I think this issue goes much further back.
> > >
> > > Just with the below patch booting fails all the way back to v5.7.
> > >
> > > What does this mean by they way, why would mmap_assert_locked() want to assert
> > > that the rwsem_is_locked() is not true?
>
> It's the opposite - VM_BUG_ON(cond) triggers if cond is true, so in
> other words it asserts that cond is false. Yeah, I agree it is kinda
> confusing. But in our case, it asserts that the rwsem is locked, which
> is what we want.
>
> > The openrisc code that was walking the page ranges was not locking mm. I have
> > added the  below patch to v5.8-rc1 and it seems to work fine.  I will send a
> > better patch in a bit.
> >
> > iff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> > index c152a68811dd..bd5f05dd9174 100644
> > --- a/arch/openrisc/kernel/dma.c
> > +++ b/arch/openrisc/kernel/dma.c
> > @@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
> >          * We need to iterate through the pages, clearing the dcache for
> >          * them and setting the cache-inhibit bit.
> >          */
> > +       mmap_read_lock(&init_mm);
> >         error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
> >                         NULL);
> > +       mmap_read_unlock(&init_mm);
> >         if (error)
> >                 return ERR_PTR(error);
> >         return cpu_addr;
> > @@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
> >  {
> >         unsigned long va = (unsigned long)cpu_addr;
> >
> > +       mmap_read_lock(&init_mm);
> >         /* walk_page_range shouldn't be able to fail here */
> >         WARN_ON(walk_page_range(&init_mm, va, va + size,
> >                         &clear_nocache_walk_ops, NULL));
> > +       mmap_read_unlock(&init_mm);
> >  }
>
> Thanks a lot for getting to the bottom of this. I think this is the proper fix.

A similar patch works for RISC-V as well. Thanks for debugging it.
To sum it up, mm should be locked before walk_page_range and
walk_page_range_novma.

Here is a diff that works for RISC-V. I will send the patch soon.

diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index ec2c70f84994..289a9a5ea5b5 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -151,6 +151,7 @@ int set_memory_nx(unsigned long addr, int numpages)

 int set_direct_map_invalid_noflush(struct page *page)
 {
+       int ret;
        unsigned long start = (unsigned long)page_address(page);
        unsigned long end = start + PAGE_SIZE;
        struct pageattr_masks masks = {
@@ -158,11 +159,16 @@ int set_direct_map_invalid_noflush(struct page *page)
                .clear_mask = __pgprot(_PAGE_PRESENT)
        };

-       return walk_page_range(&init_mm, start, end, &pageattr_ops, &masks);
+       mmap_read_lock(&init_mm);
+       ret = walk_page_range(&init_mm, start, end, &pageattr_ops, &masks);
+       mmap_read_unlock(&init_mm);
+
+       return ret;
 }

 int set_direct_map_default_noflush(struct page *page)
 {
+       int ret;
        unsigned long start = (unsigned long)page_address(page);
        unsigned long end = start + PAGE_SIZE;
        struct pageattr_masks masks = {
@@ -170,7 +176,11 @@ int set_direct_map_default_noflush(struct page *page)
                .clear_mask = __pgprot(0)
        };

-       return walk_page_range(&init_mm, start, end, &pageattr_ops, &masks);
+       mmap_read_lock(&init_mm);
+       ret = walk_page_range(&init_mm, start, end, &pageattr_ops, &masks);
+       mmap_read_unlock(&init_mm);
+
+       return ret;
 }



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-06-17  7:34 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-15  7:28 mm lock issue while booting Linux on 5.8-rc1 for RISC-V Atish Patra
2020-06-15 21:57 ` Stafford Horne
2020-06-16  4:51   ` Stafford Horne
2020-06-16  8:40     ` Palmer Dabbelt
2020-06-16 10:44       ` Michel Lespinasse
2020-06-16 17:54         ` Atish Patra
2020-06-16 18:07           ` Palmer Dabbelt
2020-06-16 18:08             ` Atish Patra
2020-06-16 18:15               ` Palmer Dabbelt
2020-06-16 18:21                 ` Atish Patra
2020-06-16 19:18             ` Alex Ghiti
2020-06-16 19:19         ` Stafford Horne
2020-06-16 19:41           ` Atish Patra
2020-06-16 20:47             ` Michel Lespinasse
2020-06-17  5:35               ` Stafford Horne
2020-06-17  6:07                 ` Stafford Horne
2020-06-17  6:29                   ` Michel Lespinasse
2020-06-17  7:34                     ` Atish Patra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).