linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* linux-next: runtime failure of next-20190603
@ 2019-06-03  6:22 Stephen Rothwell
  2019-06-03 21:07 ` Dave Hansen
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Rothwell @ 2019-06-03  6:22 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Linux Next Mailing List, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]

Hi all,

My qemu powerpc_pseries_le_defconfig boots failed today with the following
output at shutdown time:

# halt
# Stopping network...[   32.109495] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ifdown:97]
[   32.109794] Modules linked in:
[   32.110367] CPU: 0 PID: 97 Comm: ifdown Not tainted 5.2.0-rc3 #2
[   32.110577] NIP:  c000000000bbeb38 LR: c000000000bbf0a0 CTR: 0000000000000000
[   32.110732] REGS: c00000007c50f4d0 TRAP: 0901   Not tainted  (5.2.0-rc3)
[   32.110813] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24422422  XER: 20000000
[   32.111162] CFAR: c000000000bbf09c IRQMASK: 0 
[   32.111162] GPR00: c0000000002fa540 c00000007c50f760 c00000000119c100 c00000007c50f7b8 
[   32.111162] GPR04: ffffffffffffffff 000000000000000f c00000007c50f8f0 0000000000000003 
[   32.111162] GPR08: c00000007d468450 0000000000000003 c00c000000006834 c00000007ffe7000 
[   32.111162] GPR12: c000000001268a90 c000000001380000 
[   32.112430] NIP [c000000000bbeb38] xas_load+0x8/0xd0
[   32.112506] LR [c000000000bbf0a0] xas_find+0x1b0/0x260
[   32.112737] Call Trace:
[   32.112859] [c00000007c50f760] [c000000000bbf0a0] xas_find+0x1b0/0x260 (unreliable)
[   32.113082] [c00000007c50f790] [c0000000002fa540] find_get_entries+0x2e0/0x350
[   32.113204] [c00000007c50f840] [c00000000031fe08] shmem_undo_range+0x2d8/0xa90
[   32.113292] [c00000007c50f9f0] [c0000000003205ec] shmem_truncate_range+0x2c/0x60
[   32.113381] [c00000007c50fa20] [c000000000320ca4] shmem_setattr+0x3a4/0x3e0
[   32.113467] [c00000007c50fa80] [c00000000040e600] notify_change+0x2d0/0x5c0
[   32.113596] [c00000007c50fae0] [c0000000003d8f7c] do_truncate+0x9c/0x140
[   32.113672] [c00000007c50fb70] [c0000000003f66d8] path_openat+0x848/0x1660
[   32.113747] [c00000007c50fc70] [c0000000003f900c] do_filp_open+0x8c/0x120
[   32.113816] [c00000007c50fda0] [c0000000003da9e8] do_sys_open+0x248/0x350
[   32.113956] [c00000007c50fe20] [c00000000000bce4] system_call+0x5c/0x70
[   32.114123] Instruction dump:
[   32.114508] 4e800020 60000000 3900c005 7fa94040 409dff60 7d291674 2f890000 419eff54 
[   32.114651] 38600000 4e800020 3c4c005e 3842d5d0 <7c0802a6> fbe1fff8 7c7f1b78 f8010010 

Reverting commit

  fa858b6eec3f ("XArray: Add xas_replace")

made the failure go away.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: linux-next: runtime failure of next-20190603
  2019-06-03  6:22 linux-next: runtime failure of next-20190603 Stephen Rothwell
@ 2019-06-03 21:07 ` Dave Hansen
  2019-06-03 22:59   ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Hansen @ 2019-06-03 21:07 UTC (permalink / raw)
  To: Stephen Rothwell, Matthew Wilcox
  Cc: Linux Next Mailing List, Linux Kernel Mailing List

On 6/2/19 11:22 PM, Stephen Rothwell wrote:
> My qemu powerpc_pseries_le_defconfig boots failed today with the
> following output at shutdown time:
...
> [   32.112430] NIP [c000000000bbeb38] xas_load+0x8/0xd0
...
> Reverting commit
> 
> fa858b6eec3f ("XArray: Add xas_replace")
> 
> made the failure go away.

I'm seeing a similar softlockup:

> [124384.345395] watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
> [TaskSchedulerFo:22804] [124384.345405] Modules linked in: bridge
> stp llc ctr ccm hid_logitech_hidpp hid_logitech_dj xt_MASQUERADE
> rfcomm hid_generic usbhid hid bnep iptable_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables bpfilter arc4
> iwlmvm intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal mac80211
> wmi_bmof coretemp snd_hda_codec_realtek snd_hda_codec_generic
> ghash_clmulni_intel snd_hda_intel snd_hda_codec aesni_intel
> snd_hwdep aes_x86_64 snd_hda_core glue_helper crypto_simd
> thinkpad_acpi cryptd snd_pcm nvram iwlwifi btusb ledtrig_audio
> btrtl snd_seq_midi btbcm snd_seq_midi_event btintel snd_rawmidi
> bluetooth snd_seq snd_timer snd_seq_device ecdh_generic cfg80211
> ecc snd joydev soundcore kvm_intel mac_hid wmi kvm irqbypass
> squashfs zstd_decompress lz4_decompress netconsole rtsx_pci_sdmmc
> rtsx_pci [124384.345426] CPU: 1 PID: 22804 Comm: TaskSchedulerFo
> Not tainted 5.2.0-rc2 #14 [124384.345427] Hardware name: LENOVO
> 20F5S7V800/20F5S7V800, BIOS R02ET70W (1.43 ) 01/28/2019 
> [124384.345431] RIP: 0010:xas_load+0x2c/0x80 [124384.345432] Code:
> 89 fb e8 67 ff ff ff eb 5b 48 3d 00 10 00 00 76 5f 0f b6 48 fe 48
> 8d 70 fe 38 4b 10 77 52 48 8b 53 08 48 d3 ea 83 e2 3f 89 d0 <48> 8d
> 44 c6 20 48 8b 40 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 
> [124384.345433] RSP: 0018:ffffc900095f3a70 EFLAGS: 00000206
> ORIG_RAX: ffffffffffffff13 [124384.345434] RAX: 0000000000000022
> RBX: ffffc900095f3a80 RCX: 0000000000000006 [124384.345435] RDX:
> 0000000000000022 RSI: ffff888085eb4490 RDI: ffffc900095f3a80 
> [124384.345435] RBP: 00000000001dc8b0 R08: 0000000000000001 R09:
> ffff8884216fab80 [124384.345436] R10: ffff8884216fa000 R11:
> ffff8884216fa000 R12: 0000000000000000 [124384.345437] R13:
> ffff88840a006bd8 R14: 00000000001dc8b0 R15: ffff88810cc12580 
> [124384.345437] FS:  00007f1aa5b77700(0000)
> GS:ffff888411880000(0000) knlGS:0000000000000000 [124384.345438]
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [124384.345439]
> CR2: 00007f1ae3ffd9d0 CR3: 00000003e925c001 CR4: 00000000003626e0 
> [124384.345439] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000 [124384.345440] DR3: 0000000000000000 DR6:
> 00000000fffe0ff0 DR7: 0000000000000400 [124384.345440] Call Trace: 
> [124384.345464]  find_get_entry+0x74/0x1a0 [124384.345466]
> pagecache_get_page+0x27/0x250 [124384.345467]
> __try_to_reclaim_swap.isra.38+0x47/0xe0 [124384.345469]
> free_swap_and_cache+0x6e/0x70 [124384.345470]
> unmap_page_range+0x444/0xa50 [124384.345472]  unmap_vmas+0x81/0xe0 
> [124384.345474]  exit_mmap+0xab/0x1a0 [124384.345477]
> mmput+0x5d/0x130 [124384.345478]  do_exit+0x2af/0xbf0 
> [124384.345480]  do_group_exit+0x3d/0xb0 [124384.345481]
> get_signal+0x12d/0x8b0 [124384.345483]  do_signal+0x36/0x6a0 
> [124384.345485]  ? __might_fault+0x2b/0x30 [124384.345486]  ?
> _copy_from_user+0x5b/0x90 [124384.345488]  ?
> exit_to_usermode_loop+0x4a/0xb0 [124384.345489]
> exit_to_usermode_loop+0x62/0xb0 [124384.345507]
> do_syscall_64+0xfc/0x120 [124384.345508]
> entry_SYSCALL_64_after_hwframe+0x49/0xbe

I saw this over a period of time and caught a bunch of different
softlokup messages.  They almost all appear under xas_load, though:

 RIP: 0010:xas_load+0x13/0x80
 RIP: 0010:xas_load+0x17/0x80
 RIP: 0010:xas_load+0x1b/0x80
 RIP: 0010:xas_load+0x27/0x80
 RIP: 0010:xas_load+0x2c/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x45/0x80
 RIP: 0010:xas_load+0x66/0x80
 RIP: 0010:xas_load+0xb/0x80
 RIP: 0010:xas_start+0x45/0x90

So it seems like it's actively spinning in a fairly big loop since
it's hitting a bunch of different places.

I only hit this once, though.  It's not easily reproducible for me.  I
haven't tried the above revert.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: linux-next: runtime failure of next-20190603
  2019-06-03 21:07 ` Dave Hansen
@ 2019-06-03 22:59   ` Matthew Wilcox
  2019-06-26 18:01     ` xarray soft lockups on 5.2-rc's Dave Hansen
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2019-06-03 22:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Stephen Rothwell, Linux Next Mailing List, Linux Kernel Mailing List

On Mon, Jun 03, 2019 at 02:07:26PM -0700, Dave Hansen wrote:
> On 6/2/19 11:22 PM, Stephen Rothwell wrote:
> > My qemu powerpc_pseries_le_defconfig boots failed today with the
> > following output at shutdown time:
> ...
> > [   32.112430] NIP [c000000000bbeb38] xas_load+0x8/0xd0
> ...
> > Reverting commit
> > 
> > fa858b6eec3f ("XArray: Add xas_replace")
> > 
> > made the failure go away.
> 
> I'm seeing a similar softlockup:

I've taken that commit out of my xarray tree.  It's only a performance
optimisation, so I don't mind dropping it until it's debugged.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* xarray soft lockups on 5.2-rc's
  2019-06-03 22:59   ` Matthew Wilcox
@ 2019-06-26 18:01     ` Dave Hansen
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Hansen @ 2019-06-26 18:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Stephen Rothwell, Linux Next Mailing List, Linux Kernel Mailing List

On 6/3/19 3:59 PM, Matthew Wilcox wrote:
>> I'm seeing a similar softlockup:
> I've taken that commit out of my xarray tree.  It's only a performance
> optimisation, so I don't mind dropping it until it's debugged.

I'm also seeing these on mainline pretty regularly (~once a week).  The
latest was on -rc4, but the xarray code doesn't look like it's been
updated in mainline since 5.1.

I only seem to see these for the swap cache path, never the page cache
path, which makes me suspicious (typical dump is below).  The stack is
always the same path, but the exact instruction pointer address seems to
bounce around between find_get_entry() and deeper into the xarray code.

find_get_entry() seems to have some recent changes, and this patch of
backtraces would be consistent with it getting stuck in a "goto repeat"
loop in there.

     12 RIP: 0010:xas_load+0x35/0x80
      9 RIP: 0010:xas_load+0x27/0x80
      7 RIP: 0010:xas_load+0x13/0x80
      4 RIP: 0010:xas_load+0x66/0x80
      4 RIP: 0010:xas_load+0x45/0x80
      2 RIP: 0010:xas_start+0x0/0x90
      2 RIP: 0010:xas_load+0x3f/0x80
      2 RIP: 0010:xas_load+0x31/0x80
      2 RIP: 0010:xas_load+0x20/0x80
      2 RIP: 0010:xas_load+0x0/0x80
      2 RIP: 0010:find_get_entry+0x87/0x1a0
      2 RIP: 0010:find_get_entry+0x74/0x1a0
      1 RIP: 0010:xas_start+0x57/0x90
      1 RIP: 0010:xas_start+0x42/0x90
      1 RIP: 0010:xas_load+0xb/0x80
      1 RIP: 0010:xas_load+0x72/0x80
      1 RIP: 0010:xas_load+0x5e/0x80
      1 RIP: 0010:xas_load+0x43/0x80
      1 RIP: 0010:xas_load+0x39/0x80
      1 RIP: 0010:xas_load+0x24/0x80
      1 RIP: 0010:find_get_entry+0x63/0x1a0
      1 RIP: 0010:dev_watchdog+0x273/0x280


> CPU: 2 PID: 1025 Comm: TaskSchedulerFo Not tainted 5.2.0-rc4 #15
> Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET70W (1.43 ) 01/28/2019
> RIP: 0010:xas_start+0x0/0x90
> Code: 48 85 d2 75 e3 48 8b 17 b8 00 00 80 00 89 f1 d3 e0 8b 7a 18 85 c7 75 07 09 f8 89 42 18 f3 c3 f3 c3 f3 c3 0f 1f 80 00 00 00 00 <48> 8b 47 18 48 89 c2 83 e2 03 74 19 48 83 fa 02 75 26 48 3d 05 c0
> RSP: 0000:ffffc9000996fcf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000000 RBX: ffffc9000996fd08 RCX: 0000000000000000 
> RDX: 0000000000000021 RSI: ffff888168184490 RDI: ffffc9000996fd08
> RBP: 000000000027db21 R08: 000fffffffe00000 R09: ffff8884216fa5c0 
> R10: ffff8884216fa000 R11: 0000000000000822 R12: 0000000000000000
> R13: ffff888409fc9018 R14: 000000000027db21 R15: 0000000000000001 
> FS:  00007f4f44d01700(0000) GS:ffff888411900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4f4cb21048 CR3: 000000029751a003 CR4: 00000000003626e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  xas_load+0x9/0x80
>  find_get_entry+0x74/0x1a0
>  pagecache_get_page+0x27/0x250
>  lookup_swap_cache+0x40/0x130
>  do_swap_page+0x84/0x860
>  __handle_mm_fault+0x865/0xfb0
>  handle_mm_fault+0x103/0x210
>  __do_page_fault+0x2f5/0x530
>  ? page_fault+0x8/0x30
>  page_fault+0x1e/0x30
> RIP: 0033:0x7f4ff44cb836
> Code: 39 f7 0f 84 48 fd ff ff 44 89 ca 83 4b 04 04 e9 f3 fe ff ff 48 8b 53 68 49 8d 04 2f 4c 39 62 18 75 30 48 81 3c 24 ff 03 00 00 <4c> 89 60 18 48 89 50 10 48 89 43 68 48 89 42 18 0f 86 55 fe ff ff
> RSP: 002b:00007f4f44cffa50 EFLAGS: 00010212 
> RAX: 00007f4f4cb21030 RBX: 00007f4f4c000020 RCX: 00007f4f4c75c5f0
> RDX: 00007f4f4c000078 RSI: 00000000000030f1 RDI: 00000000007117f0 
> RBP: 00000000001eda40 R08: 00007f4f4c350370 R09: 0000000004422604
> R10: 0000000000184cb5 R11: 0000000000000000 R12: 00007f4f4c000078
> R13: 00007f4f4c75c5f0 R14: 00000000008ff230 R15: 00007f4f4c9335f0
> watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [TaskSchedulerFo:1025]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-06-26 18:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-03  6:22 linux-next: runtime failure of next-20190603 Stephen Rothwell
2019-06-03 21:07 ` Dave Hansen
2019-06-03 22:59   ` Matthew Wilcox
2019-06-26 18:01     ` xarray soft lockups on 5.2-rc's Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).