LKML Archive on lore.kernel.org
 help / Atom feed
* Kernel 4.17.4 lockup
@ 2018-07-08 21:36 H.J. Lu
  2018-07-09 14:54 ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-08 21:36 UTC (permalink / raw)
  To: H. Peter Anvin, Matthew Wilcox, LKML

On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of them don't
have any kernel messages.  One has

Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: general protection
fault: 0000 [#1] SMP PTI
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Modules linked in:
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink ebtable_filter
ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek
kvm_intel snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec
snd_hda_core snd_hwdep irqbypass crct10dif_pclmul crc32_pclmul snd_seq
mei_wdt ghash_clmulni_intel snd_seq_device intel_cstate ppdev
intel_uncore iTCO_wdt gpio_ich iTCO_vendor_support snd_pcm
intel_rapl_perf snd_timer snd mei_me parport_pc joydev i2c_i801 mei
soundcore shpchp lpc_ich parport nfsd auth_rpcgss nfs_acl lockd grace
sunrpc i915 i2c_algo_bit drm_kms_helper r8169 drm crc32c_intel mii
video
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CPU: 7 PID: 7093 Comm:
cc1 Not tainted 4.17.4-200.0.fc28.x86_64 #1
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Hardware name: Gigabyte
Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP:
0010:free_pages_and_swap_cache+0x29/0xb0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP:
0018:ffffb2cd83ffbd58 EFLAGS: 00010202
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: 0017fffe00040068
RBX: ffff93d4abb5ec80 RCX: 0000000000000000
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0017fffe00040068
RSI: 00000000000001fe RDI: ffff93d51e3dd2a0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 00000000000001fe
R08: fffff0809df82d20 R09: ffff93d51e5d5000
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: ffff93d51e5d5e20
R11: ffff93d51e5d5d00 R12: ffff93d4abb5e010
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: fffbf0809e304bc0
R14: ffff93d4abb5f000 R15: ffff93d4cbcee8f0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: FS:
0000000000000000(0000) GS:ffff93d51e3c0000(0000)
knlGS:0000000000000000
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CR2: 00007ffb255e753c
CR3: 00000005e820a002 CR4: 00000000001606e0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Call Trace:
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_flush_mmu_free+0x31/0x50
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  arch_tlb_finish_mmu+0x42/0x70
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_finish_mmu+0x1f/0x30
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  exit_mmap+0xca/0x190
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  mmput+0x5f/0x130
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_exit+0x280/0xae0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  ? __do_page_fault+0x263/0x4e0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_group_exit+0x3a/0xa0
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  __x64_sys_exit_group+0x14/0x20
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_syscall_64+0x65/0x160
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0033:0x7ffb2542b3c6
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP:
002b:00007ffd9e7e33b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: ffffffffffffffda
RBX: 00007ffb2551c740 RCX: 00007ffb2542b3c6
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0000000000000000
RSI: 000000000000003c RDI: 0000000000000000
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 0000000000000000
R08: 00000000000000e7 R09: fffffffffffffe70
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: 00007ffd9e7e3250
R11: 0000000000000246 R12: 00007ffb2551c740
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: 0000000000000037
R14: 00007ffb25525708 R15: 0000000000000000
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Code: 40 00 0f 1f 44 00
00 41 56 41 55 41 54 49 89 fc 55 89 f5 53 e8 29 99 fb ff 85 ed 7e 6b
8d 45 ff 4c 89 e3 4d 8d 74 c4 08 4c 8b 2b <49> 8b 55 20 48 8d 42 ff 83
e2 01 49 0f 44 c5 48 8b 48 20 48 8d
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP:
free_pages_and_swap_cache+0x29/0xb0 RSP: ffffb2cd83ffbd58
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: ---[ end trace
5960277fd8a3c0b5 ]---
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Fixing recursive fault
but reboot is needed!

Kernel 4.16.x is OK.  Is this a known issue?

-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-08 21:36 Kernel 4.17.4 lockup H.J. Lu
@ 2018-07-09 14:54 ` Dave Hansen
  2018-07-09 23:02   ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-09 14:54 UTC (permalink / raw)
  To: H.J. Lu, H. Peter Anvin, Matthew Wilcox, LKML

On 07/09/2018 06:19 AM, Lu, Hongjiu wrote:
> On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of
them don't  have any kernel messages.  One has

Hi H.J.,

It'd be really handy if you could pastebin things like this, or attach a
text file with the oops. Your email wrapped the heck out of the oops and
I had to go and unwrap it to read it.

A full disassembly of free_pages_and_swap_cache() from the actual
vmlinux to account for differences between toolchains would be helpful.
It'll probably help me figure out what the loop counter was for
instance.  It makes it a bit easier to read random oopses if you boot
with 'nokaslr' because the pointer types (kernel text, linear map,
vmemmap, etc...) stick out much more easily. Would you be able to boot
with it in the future?

We've had a bit of churn in that code, but nothing between 4.16 and 4.17
that really sticks out to me in the x86 code.

The general protection fault is a bit of an oddball. If I disassembled
right, it's trying to dereference %R13+20. That doesn't even cross a
page boundary, so it's a bit hard to fathom where the #GP would come from.

(mostly) unwrapped oops below.

>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: general protection
>> fault: 0000 [#1] SMP PTI
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Modules linked in:
>> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink ebtable_filter
>> ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal
>> intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek
>> kvm_intel snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec
>> snd_hda_core snd_hwdep irqbypass crct10dif_pclmul crc32_pclmul snd_seq
>> mei_wdt ghash_clmulni_intel snd_seq_device intel_cstate ppdev
>> intel_uncore iTCO_wdt gpio_ich iTCO_vendor_support snd_pcm
>> intel_rapl_perf snd_timer snd mei_me parport_pc joydev i2c_i801 mei
>> soundcore shpchp lpc_ich parport nfsd auth_rpcgss nfs_acl lockd grace
>> sunrpc i915 i2c_algo_bit drm_kms_helper r8169 drm crc32c_intel mii
>> video
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CPU: 7 PID: 7093 Comm:
>> cc1 Not tainted 4.17.4-200.0.fc28.x86_64 #1
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Hardware name: Gigabyte
>> Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0010:free_pages_and_swap_cache+0x29/0xb0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 0018:ffffb2cd83ffbd58 EFLAGS: 00010202
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: 0017fffe00040068 RBX: ffff93d4abb5ec80 RCX: 0000000000000000
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0017fffe00040068 RSI: 00000000000001fe RDI: ffff93d51e3dd2a0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 00000000000001fe R08: fffff0809df82d20 R09: ffff93d51e5d5000
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: ffff93d51e5d5e20 R11: ffff93d51e5d5d00 R12: ffff93d4abb5e010
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: fffbf0809e304bc0 R14: ffff93d4abb5f000 R15: ffff93d4cbcee8f0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: FS: 0000000000000000(0000) GS:ffff93d51e3c0000(0000) knlGS:0000000000000000
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CR2: 00007ffb255e753c CR3: 00000005e820a002 CR4: 00000000001606e0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Call Trace: 
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_flush_mmu_free+0x31/0x50
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  arch_tlb_finish_mmu+0x42/0x70
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_finish_mmu+0x1f/0x30
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  exit_mmap+0xca/0x190
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  mmput+0x5f/0x130
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_exit+0x280/0xae0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  ? __do_page_fault+0x263/0x4e0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_group_exit+0x3a/0xa0
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  __x64_sys_exit_group+0x14/0x20
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_syscall_64+0x65/0x160
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0033:0x7ffb2542b3c6
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 002b:00007ffd9e7e33b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: ffffffffffffffda RBX: 00007ffb2551c740 RCX: 00007ffb2542b3c6
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: 00007ffd9e7e3250 R11: 0000000000000246 R12: 00007ffb2551c740
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: 0000000000000037 R14: 00007ffb25525708 R15: 0000000000000000
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Code: 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 fc 55 89 f5 53 e8 29 99 fb ff 85 ed 7e 6b 8d 45 ff 4c 89 e3 4d 8d 74 c4 08 4c 8b 2b <49> 8b 55 20 48 8d 42 ff 83 e2 01 49 0f 44 c5 48 8b 48 20 48 8d
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: free_pages_and_swap_cache+0x29/0xb0 RSP: ffffb2cd83ffbd58
>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: ---[ end trace 5960277fd8a3c0b5 ]---

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-09 14:54 ` Dave Hansen
@ 2018-07-09 23:02   ` H.J. Lu
  2018-07-10  0:44     ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-09 23:02 UTC (permalink / raw)
  To: Dave Hansen; +Cc: H. Peter Anvin, Matthew Wilcox, LKML

On Mon, Jul 9, 2018 at 7:54 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/09/2018 06:19 AM, Lu, Hongjiu wrote:
>> On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of
> them don't  have any kernel messages.  One has
>
> Hi H.J.,
>
> It'd be really handy if you could pastebin things like this, or attach a
> text file with the oops. Your email wrapped the heck out of the oops and
> I had to go and unwrap it to read it.
>
> A full disassembly of free_pages_and_swap_cache() from the actual
> vmlinux to account for differences between toolchains would be helpful.
> It'll probably help me figure out what the loop counter was for
> instance.  It makes it a bit easier to read random oopses if you boot
> with 'nokaslr' because the pointer types (kernel text, linear map,
> vmemmap, etc...) stick out much more easily. Would you be able to boot
> with it in the future?

I will do that if it happens again.

> We've had a bit of churn in that code, but nothing between 4.16 and 4.17
> that really sticks out to me in the x86 code.
>
> The general protection fault is a bit of an oddball. If I disassembled
> right, it's trying to dereference %R13+20. That doesn't even cross a

This is correct.

> page boundary, so it's a bit hard to fathom where the #GP would come from.

(gdb) disass free_pages_and_swap_cache
Dump of assembler code for function free_pages_and_swap_cache:
   0xffffffff8124c0d0 <+0>: callq  0xffffffff81a017a0 <__fentry__>
   0xffffffff8124c0d5 <+5>: push   %r14
   0xffffffff8124c0d7 <+7>: push   %r13
   0xffffffff8124c0d9 <+9>: push   %r12
   0xffffffff8124c0db <+11>: mov    %rdi,%r12
   0xffffffff8124c0de <+14>: push   %rbp
   0xffffffff8124c0df <+15>: mov    %esi,%ebp
   0xffffffff8124c0e1 <+17>: push   %rbx
   0xffffffff8124c0e2 <+18>: callq  0xffffffff81205a10 <lru_add_drain>
   0xffffffff8124c0e7 <+23>: test   %ebp,%ebp
   0xffffffff8124c0e9 <+25>: jle    0xffffffff8124c156
<free_pages_and_swap_cache+134>
   0xffffffff8124c0eb <+27>: lea    -0x1(%rbp),%eax
   0xffffffff8124c0ee <+30>: mov    %r12,%rbx
   0xffffffff8124c0f1 <+33>: lea    0x8(%r12,%rax,8),%r14
   0xffffffff8124c0f6 <+38>: mov    (%rbx),%r13
   0xffffffff8124c0f9 <+41>: mov    0x20(%r13),%rdx
<<<<<<<<<<<<<<<<<<<< GPF here.
   0xffffffff8124c0fd <+45>: lea    -0x1(%rdx),%rax
   0xffffffff8124c101 <+49>: and    $0x1,%edx
   0xffffffff8124c104 <+52>: cmove  %r13,%rax
   0xffffffff8124c108 <+56>: mov    0x20(%rax),%rcx
   0xffffffff8124c10c <+60>: lea    -0x1(%rcx),%rdx
   0xffffffff8124c110 <+64>: and    $0x1,%ecx
   0xffffffff8124c113 <+67>: cmove  %rax,%rdx
   0xffffffff8124c117 <+71>: mov    (%rdx),%rdx
   0xffffffff8124c11a <+74>: test   $0x40000,%edx
   0xffffffff8124c120 <+80>: je     0xffffffff8124c14d
<free_pages_and_swap_cache+125>
   0xffffffff8124c122 <+82>: mov    (%rax),%rax
   0xffffffff8124c125 <+85>: test   $0x2,%ah
   0xffffffff8124c128 <+88>: je     0xffffffff8124c14d
<free_pages_and_swap_cache+125>
   0xffffffff8124c12a <+90>: mov    %r13,%rdi
   0xffffffff8124c12d <+93>: callq  0xffffffff81218260 <page_mapped>
   0xffffffff8124c132 <+98>: test   %al,%al
   0xffffffff8124c134 <+100>: jne    0xffffffff8124c14d
<free_pages_and_swap_cache+125>
   0xffffffff8124c136 <+102>: mov    0x20(%r13),%rdx
   0xffffffff8124c13a <+106>: lea    -0x1(%rdx),%rax
   0xffffffff8124c13e <+110>: and    $0x1,%edx
   0xffffffff8124c141 <+113>: cmove  %r13,%rax
   0xffffffff8124c145 <+117>: lock btsq $0x0,(%rax)
   0xffffffff8124c14b <+123>: jae    0xffffffff8124c168
<free_pages_and_swap_cache+152>
   0xffffffff8124c14d <+125>: add    $0x8,%rbx
   0xffffffff8124c151 <+129>: cmp    %rbx,%r14
   0xffffffff8124c154 <+132>: jne    0xffffffff8124c0f6
<free_pages_and_swap_cache+38>
   0xffffffff8124c156 <+134>: pop    %rbx
   0xffffffff8124c157 <+135>: mov    %ebp,%esi
   0xffffffff8124c159 <+137>: mov    %r12,%rdi
   0xffffffff8124c15c <+140>: pop    %rbp
   0xffffffff8124c15d <+141>: pop    %r12
   0xffffffff8124c15f <+143>: pop    %r13
   0xffffffff8124c161 <+145>: pop    %r14
   0xffffffff8124c163 <+147>: jmpq   0xffffffff81204600 <release_pages>
   0xffffffff8124c168 <+152>: mov    %r13,%rdi
   0xffffffff8124c16b <+155>: callq  0xffffffff81250aa0 <try_to_free_swap>
   0xffffffff8124c170 <+160>: mov    %r13,%rdi
   0xffffffff8124c173 <+163>: callq  0xffffffff811ed140 <unlock_page>
   0xffffffff8124c178 <+168>: jmp    0xffffffff8124c14d
<free_pages_and_swap_cache+125>
End of assembler dump.
(gdb)

> (mostly) unwrapped oops below.
>
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: general protection
>>> fault: 0000 [#1] SMP PTI
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Modules linked in:
>>> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink ebtable_filter
>>> ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal
>>> intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek
>>> kvm_intel snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec
>>> snd_hda_core snd_hwdep irqbypass crct10dif_pclmul crc32_pclmul snd_seq
>>> mei_wdt ghash_clmulni_intel snd_seq_device intel_cstate ppdev
>>> intel_uncore iTCO_wdt gpio_ich iTCO_vendor_support snd_pcm
>>> intel_rapl_perf snd_timer snd mei_me parport_pc joydev i2c_i801 mei
>>> soundcore shpchp lpc_ich parport nfsd auth_rpcgss nfs_acl lockd grace
>>> sunrpc i915 i2c_algo_bit drm_kms_helper r8169 drm crc32c_intel mii
>>> video
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CPU: 7 PID: 7093 Comm:
>>> cc1 Not tainted 4.17.4-200.0.fc28.x86_64 #1
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Hardware name: Gigabyte
>>> Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0010:free_pages_and_swap_cache+0x29/0xb0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 0018:ffffb2cd83ffbd58 EFLAGS: 00010202
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: 0017fffe00040068 RBX: ffff93d4abb5ec80 RCX: 0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0017fffe00040068 RSI: 00000000000001fe RDI: ffff93d51e3dd2a0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 00000000000001fe R08: fffff0809df82d20 R09: ffff93d51e5d5000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: ffff93d51e5d5e20 R11: ffff93d51e5d5d00 R12: ffff93d4abb5e010
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: fffbf0809e304bc0 R14: ffff93d4abb5f000 R15: ffff93d4cbcee8f0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: FS: 0000000000000000(0000) GS:ffff93d51e3c0000(0000) knlGS:0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CR2: 00007ffb255e753c CR3: 00000005e820a002 CR4: 00000000001606e0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Call Trace:
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_flush_mmu_free+0x31/0x50
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  arch_tlb_finish_mmu+0x42/0x70
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_finish_mmu+0x1f/0x30
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  exit_mmap+0xca/0x190
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  mmput+0x5f/0x130
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_exit+0x280/0xae0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  ? __do_page_fault+0x263/0x4e0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_group_exit+0x3a/0xa0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  __x64_sys_exit_group+0x14/0x20
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_syscall_64+0x65/0x160
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0033:0x7ffb2542b3c6
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 002b:00007ffd9e7e33b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: ffffffffffffffda RBX: 00007ffb2551c740 RCX: 00007ffb2542b3c6
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: 00007ffd9e7e3250 R11: 0000000000000246 R12: 00007ffb2551c740
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: 0000000000000037 R14: 00007ffb25525708 R15: 0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Code: 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 fc 55 89 f5 53 e8 29 99 fb ff 85 ed 7e 6b 8d 45 ff 4c 89 e3 4d 8d 74 c4 08 4c 8b 2b <49> 8b 55 20 48 8d 42 ff 83 e2 01 49 0f 44 c5 48 8b 48 20 48 8d
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: free_pages_and_swap_cache+0x29/0xb0 RSP: ffffb2cd83ffbd58
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: ---[ end trace 5960277fd8a3c0b5 ]---



-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-09 23:02   ` H.J. Lu
@ 2018-07-10  0:44     ` Dave Hansen
  2018-07-10  2:14       ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-10  0:44 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, Matthew Wilcox, LKML, Andy Lutomirski,
	Mel Gorman, Andrew Morton, Rik van Riel, Minchan Kim

...  cc'ing a few folks who I know have been looking at this code
lately.  The full oops is below if any of you want to take a look.

OK, well, annotating the disassembly a bit:

> (gdb) disass free_pages_and_swap_cache
> Dump of assembler code for function free_pages_and_swap_cache:
>    0xffffffff8124c0d0 <+0>: callq  0xffffffff81a017a0 <__fentry__>
>    0xffffffff8124c0d5 <+5>: push   %r14
>    0xffffffff8124c0d7 <+7>: push   %r13
>    0xffffffff8124c0d9 <+9>: push   %r12
>    0xffffffff8124c0db <+11>: mov    %rdi,%r12		// %r12 = pages
>    0xffffffff8124c0de <+14>: push   %rbp
>    0xffffffff8124c0df <+15>: mov    %esi,%ebp		// %ebp = nr
>    0xffffffff8124c0e1 <+17>: push   %rbx
>    0xffffffff8124c0e2 <+18>: callq  0xffffffff81205a10 <lru_add_drain>
>    0xffffffff8124c0e7 <+23>: test   %ebp,%ebp		// test nr==0
>    0xffffffff8124c0e9 <+25>: jle    0xffffffff8124c156 <free_pages_and_swap_cache+134>
>    0xffffffff8124c0eb <+27>: lea    -0x1(%rbp),%eax
>    0xffffffff8124c0ee <+30>: mov    %r12,%rbx		// %rbx = pages
>    0xffffffff8124c0f1 <+33>: lea    0x8(%r12,%rax,8),%r14 // load &pages[nr] into %r14?
>    0xffffffff8124c0f6 <+38>: mov    (%rbx),%r13 	// %r13 = pages[i]
>    0xffffffff8124c0f9 <+41>: mov    0x20(%r13),%rdx 	//<<<<<<<<<<<<<<<<<<<< GPF here.
%r13 is 64-byte aligned, so looks like a halfway reasonable 'struct page *'.

%R14 looks OK (0xffff93d4abb5f000) because it points to the end of a
dynamically-allocated (not on-stack) mmu_gather_batch page.  %RBX is
pointing 50 pages up from the start of the previous page.  That makes it
the 48th page in pages[] after a pointer and two integers in the
beginning of the structure.  That 48 is important because it's way
larger than the on-stack size of 8.

It's hard to make much sense of %R13 (pages[48] / 0xfffbf0809e304bc0)
because the vmemmap addresses get randomized.  But, I _think_ that's too
high of an address for a 4-level paging vmemmap[] entry.  Does anybody
else know offhand?

I'd really want to see this reproduced without KASLR to make the oops
easier to read.  It would also be handy to try your workload with all
the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
and see if it still triggers.

Some relevant functions and structures below for reference.

void free_pages_and_swap_cache(struct page **pages, int nr)
{
        for (i = 0; i < nr; i++)
                free_swap_cache(pages[i]);
}


static void tlb_flush_mmu_free(struct mmu_gather *tlb)
{
        for (batch = &tlb->local; batch && batch->nr;
	     batch = batch->next) {
                free_pages_and_swap_cache(batch->pages, batch->nr);
}

zap_pte_range()
{
	if (force_flush)
		tlb_flush_mmu_free(tlb);
}

... all the way up to the on-stack-allocated mmu_gather:

void zap_page_range(struct vm_area_struct *vma, unsigned long start,
                unsigned long size)
{
        struct mmu_gather tlb;


#define MMU_GATHER_BUNDLE       8

struct mmu_gather {
...
        struct mmu_gather_batch local;
        struct page             *__pages[MMU_GATHER_BUNDLE];
}

struct mmu_gather_batch {
        struct mmu_gather_batch *next;
        unsigned int            nr;
        unsigned int            max;
        struct page             *pages[0];
};

#define MAX_GATHER_BATCH        \
        ((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))



> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: general protection
> fault: 0000 [#1] SMP PTI
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Modules linked in:
> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink ebtable_filter
> ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal
> intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek
> kvm_intel snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec
> snd_hda_core snd_hwdep irqbypass crct10dif_pclmul crc32_pclmul snd_seq
> mei_wdt ghash_clmulni_intel snd_seq_device intel_cstate ppdev
> intel_uncore iTCO_wdt gpio_ich iTCO_vendor_support snd_pcm
> intel_rapl_perf snd_timer snd mei_me parport_pc joydev i2c_i801 mei
> soundcore shpchp lpc_ich parport nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc i915 i2c_algo_bit drm_kms_helper r8169 drm crc32c_intel mii
> video
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CPU: 7 PID: 7093 Comm:
> cc1 Not tainted 4.17.4-200.0.fc28.x86_64 #1
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Hardware name: Gigabyte
> Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0010:free_pages_and_swap_cache+0x29/0xb0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 0018:ffffb2cd83ffbd58 EFLAGS: 00010202
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: 0017fffe00040068 RBX: ffff93d4abb5ec80 RCX: 0000000000000000
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0017fffe00040068 RSI: 00000000000001fe RDI: ffff93d51e3dd2a0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 00000000000001fe R08: fffff0809df82d20 R09: ffff93d51e5d5000
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: ffff93d51e5d5e20 R11: ffff93d51e5d5d00 R12: ffff93d4abb5e010
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: fffbf0809e304bc0 R14: ffff93d4abb5f000 R15: ffff93d4cbcee8f0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: FS: 0000000000000000(0000) GS:ffff93d51e3c0000(0000) knlGS:0000000000000000
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CR2: 00007ffb255e753c CR3: 00000005e820a002 CR4: 00000000001606e0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Call Trace:
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_flush_mmu_free+0x31/0x50
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  arch_tlb_finish_mmu+0x42/0x70
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  tlb_finish_mmu+0x1f/0x30
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  exit_mmap+0xca/0x190
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  mmput+0x5f/0x130
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_exit+0x280/0xae0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  ? __do_page_fault+0x263/0x4e0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_group_exit+0x3a/0xa0
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  __x64_sys_exit_group+0x14/0x20
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel:  do_syscall_64+0x65/0x160
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0033:0x7ffb2542b3c6
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 002b:00007ffd9e7e33b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: ffffffffffffffda RBX: 00007ffb2551c740 RCX: 00007ffb2542b3c6
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: 00007ffd9e7e3250 R11: 0000000000000246 R12: 00007ffb2551c740
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: 0000000000000037 R14: 00007ffb25525708 R15: 0000000000000000
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Code: 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 fc 55 89 f5 53 e8 29 99 fb ff 85 ed 7e 6b 8d 45 ff 4c 89 e3 4d 8d 74 c4 08 4c 8b 2b <49> 8b 55 20 48 8d 42 ff 83 e2 01 49 0f 44 c5 48 8b 48 20 48 8d
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: free_pages_and_swap_cache+0x29/0xb0 RSP: ffffb2cd83ffbd58
> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: ---[ end trace 5960277fd8a3c0b5 ]---

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-10  0:44     ` Dave Hansen
@ 2018-07-10  2:14       ` H.J. Lu
  2018-07-10  3:47         ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-10  2:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On Mon, Jul 9, 2018 at 5:44 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> ...  cc'ing a few folks who I know have been looking at this code
> lately.  The full oops is below if any of you want to take a look.
>
> OK, well, annotating the disassembly a bit:
>
>> (gdb) disass free_pages_and_swap_cache
>> Dump of assembler code for function free_pages_and_swap_cache:
>>    0xffffffff8124c0d0 <+0>: callq  0xffffffff81a017a0 <__fentry__>
>>    0xffffffff8124c0d5 <+5>: push   %r14
>>    0xffffffff8124c0d7 <+7>: push   %r13
>>    0xffffffff8124c0d9 <+9>: push   %r12
>>    0xffffffff8124c0db <+11>: mov    %rdi,%r12         // %r12 = pages
>>    0xffffffff8124c0de <+14>: push   %rbp
>>    0xffffffff8124c0df <+15>: mov    %esi,%ebp         // %ebp = nr
>>    0xffffffff8124c0e1 <+17>: push   %rbx
>>    0xffffffff8124c0e2 <+18>: callq  0xffffffff81205a10 <lru_add_drain>
>>    0xffffffff8124c0e7 <+23>: test   %ebp,%ebp         // test nr==0
>>    0xffffffff8124c0e9 <+25>: jle    0xffffffff8124c156 <free_pages_and_swap_cache+134>
>>    0xffffffff8124c0eb <+27>: lea    -0x1(%rbp),%eax
>>    0xffffffff8124c0ee <+30>: mov    %r12,%rbx         // %rbx = pages
>>    0xffffffff8124c0f1 <+33>: lea    0x8(%r12,%rax,8),%r14 // load &pages[nr] into %r14?
>>    0xffffffff8124c0f6 <+38>: mov    (%rbx),%r13       // %r13 = pages[i]
>>    0xffffffff8124c0f9 <+41>: mov    0x20(%r13),%rdx   //<<<<<<<<<<<<<<<<<<<< GPF here.
> %r13 is 64-byte aligned, so looks like a halfway reasonable 'struct page *'.
>
> %R14 looks OK (0xffff93d4abb5f000) because it points to the end of a
> dynamically-allocated (not on-stack) mmu_gather_batch page.  %RBX is
> pointing 50 pages up from the start of the previous page.  That makes it
> the 48th page in pages[] after a pointer and two integers in the
> beginning of the structure.  That 48 is important because it's way
> larger than the on-stack size of 8.
>
> It's hard to make much sense of %R13 (pages[48] / 0xfffbf0809e304bc0)
> because the vmemmap addresses get randomized.  But, I _think_ that's too
> high of an address for a 4-level paging vmemmap[] entry.  Does anybody
> else know offhand?
>
> I'd really want to see this reproduced without KASLR to make the oops
> easier to read.  It would also be handy to try your workload with all
> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
> and see if it still triggers.

How can I turn them on at boot time?

> Some relevant functions and structures below for reference.
>
> void free_pages_and_swap_cache(struct page **pages, int nr)
> {
>         for (i = 0; i < nr; i++)
>                 free_swap_cache(pages[i]);
> }
>
>
> static void tlb_flush_mmu_free(struct mmu_gather *tlb)
> {
>         for (batch = &tlb->local; batch && batch->nr;
>              batch = batch->next) {
>                 free_pages_and_swap_cache(batch->pages, batch->nr);
> }
>
> zap_pte_range()
> {
>         if (force_flush)
>                 tlb_flush_mmu_free(tlb);
> }
>
> ... all the way up to the on-stack-allocated mmu_gather:
>
> void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>                 unsigned long size)
> {
>         struct mmu_gather tlb;
>
>
> #define MMU_GATHER_BUNDLE       8
>
> struct mmu_gather {
> ...
>         struct mmu_gather_batch local;
>         struct page             *__pages[MMU_GATHER_BUNDLE];
> }
>
> struct mmu_gather_batch {
>         struct mmu_gather_batch *next;
>         unsigned int            nr;
>         unsigned int            max;
>         struct page             *pages[0];
> };
>
> #define MAX_GATHER_BATCH        \
>         ((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-10  2:14       ` H.J. Lu
@ 2018-07-10  3:47         ` Dave Hansen
       [not found]           ` <CAMe9rOrHowBX06nihdRRmEqhV8v7cs+PwVY7JYQFpUFOnHC71A@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-10  3:47 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On 07/09/2018 07:14 PM, H.J. Lu wrote:
>> I'd really want to see this reproduced without KASLR to make the oops
>> easier to read.  It would also be handy to try your workload with all
>> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
>> and see if it still triggers.
> How can I turn them on at boot time?

The only thing you can add at boot time is slab debugging, and it's
probably the most useless of the three that I listed since you're not
actually seeing any slab corruption.

The rest are compile-time options.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
       [not found]           ` <CAMe9rOrHowBX06nihdRRmEqhV8v7cs+PwVY7JYQFpUFOnHC71A@mail.gmail.com>
@ 2018-07-11 15:13             ` Dave Hansen
  2018-07-11 15:40               ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-11 15:13 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On 07/11/2018 07:56 AM, H.J. Lu wrote:
> On Mon, Jul 9, 2018 at 8:47 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>> On 07/09/2018 07:14 PM, H.J. Lu wrote:
>>>> I'd really want to see this reproduced without KASLR to make the oops
>>>> easier to read.  It would also be handy to try your workload with all
>>>> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
>>>> and see if it still triggers.
>>> How can I turn them on at boot time?
>> The only thing you can add at boot time is slab debugging, and it's
>> probably the most useless of the three that I listed since you're not
>> actually seeing any slab corruption.
>>
>> The rest are compile-time options.
> I enabled KASAN, slab debugging, DEBUG_PAGE_ALLOC and disabled
> KASLR.  Machine locked up.  Here is the last kernel message before locking
> up.

KASAN looks to have caught it, although it scrolled off the screen.  I
can certainly imagine the oops you saw earlier being caused by stack
corruption.

Sounds like we need to reproduce this in an environment that can
actually capture a real oops.  Can you share more about your workload?
I'll see if I can get it to reproduce in a VM.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 15:13             ` Dave Hansen
@ 2018-07-11 15:40               ` H.J. Lu
  2018-07-11 16:24                 ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-11 15:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On Wed, Jul 11, 2018 at 8:13 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/11/2018 07:56 AM, H.J. Lu wrote:
>> On Mon, Jul 9, 2018 at 8:47 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>>> On 07/09/2018 07:14 PM, H.J. Lu wrote:
>>>>> I'd really want to see this reproduced without KASLR to make the oops
>>>>> easier to read.  It would also be handy to try your workload with all
>>>>> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
>>>>> and see if it still triggers.
>>>> How can I turn them on at boot time?
>>> The only thing you can add at boot time is slab debugging, and it's
>>> probably the most useless of the three that I listed since you're not
>>> actually seeing any slab corruption.
>>>
>>> The rest are compile-time options.
>> I enabled KASAN, slab debugging, DEBUG_PAGE_ALLOC and disabled
>> KASLR.  Machine locked up.  Here is the last kernel message before locking
>> up.
>
> KASAN looks to have caught it, although it scrolled off the screen.  I
> can certainly imagine the oops you saw earlier being caused by stack
> corruption.
>
> Sounds like we need to reproduce this in an environment that can
> actually capture a real oops.  Can you share more about your workload?
> I'll see if I can get it to reproduce in a VM.

This is a quad-core machine with HT and 6 GB RAM.  The workload is
x32 GCC build and test with "make -j8".  The bug is triggered during GCC
test after a couple hours.  I have a script to set up my workload:

https://github.com/hjl-tools/gcc-regression

with a cron job

# It takes about 3 hour to bootstrap x86-64 GCC and 3 hour to run tests,
TIMEOUT=480
# Run it every hour,
30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
--with-pic > /dev/null 2>&1


-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 15:40               ` H.J. Lu
@ 2018-07-11 16:24                 ` Dave Hansen
  2018-07-11 16:29                   ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-11 16:24 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On 07/11/2018 08:40 AM, H.J. Lu wrote:
> This is a quad-core machine with HT and 6 GB RAM.  The workload is
> x32 GCC build and test with "make -j8".  The bug is triggered during GCC
> test after a couple hours.  I have a script to set up my workload:
> 
> https://github.com/hjl-tools/gcc-regression
> 
> with a cron job
> 
> # It takes about 3 hour to bootstrap x86-64 GCC and 3 hour to run tests,
> TIMEOUT=480
> # Run it every hour,
> 30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
> --with-pic > /dev/null 2>&1

Oh, fun, one of those.

How long does it take to reproduce?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 16:24                 ` Dave Hansen
@ 2018-07-11 16:29                   ` H.J. Lu
  2018-07-11 16:49                     ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-11 16:29 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On Wed, Jul 11, 2018 at 9:24 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/11/2018 08:40 AM, H.J. Lu wrote:
>> This is a quad-core machine with HT and 6 GB RAM.  The workload is
>> x32 GCC build and test with "make -j8".  The bug is triggered during GCC
>> test after a couple hours.  I have a script to set up my workload:
>>
>> https://github.com/hjl-tools/gcc-regression
>>
>> with a cron job
>>
>> # It takes about 3 hour to bootstrap x86-64 GCC and 3 hour to run tests,
>> TIMEOUT=480
>> # Run it every hour,
>> 30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
>> --with-pic > /dev/null 2>&1
>
> Oh, fun, one of those.
>
> How long does it take to reproduce?

About 5 hours.

-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 16:29                   ` H.J. Lu
@ 2018-07-11 16:49                     ` Dave Hansen
  2018-07-11 16:53                       ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-11 16:49 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On 07/11/2018 09:29 AM, H.J. Lu wrote:
>>> # It takes about 3 hour to bootstrap x86-64 GCC and 3 hour to run tests,
>>> TIMEOUT=480
>>> # Run it every hour,
>>> 30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
>>> --with-pic > /dev/null 2>&1
>> Oh, fun, one of those.
>>
>> How long does it take to reproduce?
> About 5 hours.

It would be much appreciated if you can get a console of some kind on
that system and try to get a full KASAN oops out of it.  Serial, usb
debug, or maybe netconsole would probably work.

I'll also try to get it running on a big system and see if it triggers
faster there.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 16:49                     ` Dave Hansen
@ 2018-07-11 16:53                       ` H.J. Lu
       [not found]                         ` <CAMe9rOpV89jWhvAZtqOJnc0eXzRYiLF5pLWRcdb7-kKLigj4rQ@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-11 16:53 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On Wed, Jul 11, 2018 at 9:49 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/11/2018 09:29 AM, H.J. Lu wrote:
>>>> # It takes about 3 hour to bootstrap x86-64 GCC and 3 hour to run tests,
>>>> TIMEOUT=480
>>>> # Run it every hour,
>>>> 30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
>>>> --with-pic > /dev/null 2>&1
>>> Oh, fun, one of those.
>>>
>>> How long does it take to reproduce?
>> About 5 hours.
>
> It would be much appreciated if you can get a console of some kind on
> that system and try to get a full KASAN oops out of it.  Serial, usb
> debug, or maybe netconsole would probably work.

I don't have serial ports. I will try netconsole.

> I'll also try to get it running on a big system and see if it triggers
> faster there.

I have seen it on machines with various amounts of cores and RAMs.
It triggers the fastest on 8 cores with 6GB RAM reliably.

-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
       [not found]                         ` <CAMe9rOpV89jWhvAZtqOJnc0eXzRYiLF5pLWRcdb7-kKLigj4rQ@mail.gmail.com>
@ 2018-07-11 17:36                           ` Andy Lutomirski
  2018-07-11 17:43                           ` Dave Hansen
  2018-07-11 17:50                           ` Dave Hansen
  2 siblings, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2018-07-11 17:36 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Dave Hansen, H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman,
	Andrew Morton, Rik van Riel, Minchan Kim

On Wed, Jul 11, 2018 at 10:29 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 11, 2018 at 9:53 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 11, 2018 at 9:49 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>> On 07/11/2018 09:29 AM, H.J. Lu wrote:
>>>>>> # It takes about 3 hour to bootstrap x86-64 GCC and 3 hour to run tests,
>>>>>> TIMEOUT=480
>>>>>> # Run it every hour,
>>>>>> 30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
>>>>>> --with-pic > /dev/null 2>&1
>>>>> Oh, fun, one of those.
>>>>>
>>>>> How long does it take to reproduce?
>>>> About 5 hours.
>>>
>>> It would be much appreciated if you can get a console of some kind on
>>> that system and try to get a full KASAN oops out of it.  Serial, usb
>>> debug, or maybe netconsole would probably work.
>>
>> I don't have serial ports. I will try netconsole.
>>
>>> I'll also try to get it running on a big system and see if it triggers
>>> faster there.
>>
>> I have seen it on machines with various amounts of cores and RAMs.
>> It triggers the fastest on 8 cores with 6GB RAM reliably.
>
> Here is the first kernel message.

I find myself wondering if this is somehow the same issue as below:

https://syzkaller.appspot.com/bug?id=2c33dff2a2138d709b43671603dc01d65b28a689

We could plausibly have a problem with the entry code, but I don't see
anything relevant in the changelog.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
       [not found]                         ` <CAMe9rOpV89jWhvAZtqOJnc0eXzRYiLF5pLWRcdb7-kKLigj4rQ@mail.gmail.com>
  2018-07-11 17:36                           ` Andy Lutomirski
@ 2018-07-11 17:43                           ` Dave Hansen
  2018-07-11 17:50                             ` H.J. Lu
  2018-07-11 17:50                           ` Dave Hansen
  2 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-11 17:43 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On 07/11/2018 10:29 AM, H.J. Lu wrote:
>> I have seen it on machines with various amounts of cores and RAMs.
>> It triggers the fastest on 8 cores with 6GB RAM reliably.
> Here is the first kernel message.

Does it trigger better with more RAM or less?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
       [not found]                         ` <CAMe9rOpV89jWhvAZtqOJnc0eXzRYiLF5pLWRcdb7-kKLigj4rQ@mail.gmail.com>
  2018-07-11 17:36                           ` Andy Lutomirski
  2018-07-11 17:43                           ` Dave Hansen
@ 2018-07-11 17:50                           ` Dave Hansen
  2018-07-11 18:31                             ` Dave Jones
  2 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-11 17:50 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On 07/11/2018 10:29 AM, H.J. Lu wrote:
>> I have seen it on machines with various amounts of cores and RAMs.
>> It triggers the fastest on 8 cores with 6GB RAM reliably.
> Here is the first kernel message.

This looks like random corruption again.  It's probably a bogus 'struct
page' that fails the move_freepages() pfn_valid() checks.  I'm too lazy
to go reproduce the likely stack trace (not sure why it didn't show up
on your screen), but this could just be another symptom of the same
issue that caused the TLB batching oops.

My money is on this being some kind of odd stack corruption, maybe
interrupt-induced, but that's a total guess at this point.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 17:43                           ` Dave Hansen
@ 2018-07-11 17:50                             ` H.J. Lu
  0 siblings, 0 replies; 24+ messages in thread
From: H.J. Lu @ 2018-07-11 17:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman, Andrew Morton,
	Rik van Riel, Minchan Kim

On Wed, Jul 11, 2018 at 10:43 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/11/2018 10:29 AM, H.J. Lu wrote:
>>> I have seen it on machines with various amounts of cores and RAMs.
>>> It triggers the fastest on 8 cores with 6GB RAM reliably.
>> Here is the first kernel message.
>
> Does it trigger better with more RAM or less?

It triggers much more with 6GB RAM than with 32GB and 64GB.

-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 17:50                           ` Dave Hansen
@ 2018-07-11 18:31                             ` Dave Jones
  2018-07-11 23:07                               ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Jones @ 2018-07-11 18:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H.J. Lu, H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman,
	Andrew Morton, Rik van Riel, Minchan Kim

On Wed, Jul 11, 2018 at 10:50:22AM -0700, Dave Hansen wrote:
 > On 07/11/2018 10:29 AM, H.J. Lu wrote:
 > >> I have seen it on machines with various amounts of cores and RAMs.
 > >> It triggers the fastest on 8 cores with 6GB RAM reliably.
 > > Here is the first kernel message.
 > 
 > This looks like random corruption again.  It's probably a bogus 'struct
 > page' that fails the move_freepages() pfn_valid() checks.  I'm too lazy
 > to go reproduce the likely stack trace (not sure why it didn't show up
 > on your screen), but this could just be another symptom of the same
 > issue that caused the TLB batching oops.
 > 
 > My money is on this being some kind of odd stack corruption, maybe
 > interrupt-induced, but that's a total guess at this point.

So, maybe related.. I reported this to linux-mm a few days ago:

When I ran an rsync on my machine I use for backups, it eventually
hits this trace..

kernel BUG at mm/page_alloc.c:2016!
invalid opcode: 0000 [#1] SMP RIP: move_freepages_block+0x120/0x2d0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.0-rc4-backup+ #1
Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
RIP: 0010:move_freepages_block+0x120/0x2d0
Code: 05 48 01 c8 74 3b f6 00 02 74 36 48 8b 03 48 c1 e8 3e 48 8d 0c 40 48 8b 86 c0 7f 00 00 48 c1 e8 3e 48 8d 04 40 48 39 c8 74 17 <0f> 0b 45 31 f6 48 83 c4 28 44 89 f0 5b 5d 41
5c 41 5d 41 5e 41 5f
RSP: 0018:ffff88043fac3af8 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffffea0002e20000 RCX: 0000000000000003
RDX: 0000000000000000 RSI: ffffea0002e20000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff88043fac3b5c R09: ffffffff9295e110
R10: ffff88043fdf4000 R11: ffffea0002e20008 R12: ffffea0002e20000
R13: ffffffff9295dd40 R14: 0000000000000008 R15: ffffea0002e27fc0
FS:  0000000000000000(0000) GS:ffff88043fac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2a75f71fe8 CR3: 00000001e380f006 CR4: 00000000001606e0
Call Trace:
 <IRQ>
 ? lock_acquire+0xe6/0x1dc
 steal_suitable_fallback+0x152/0x1a0
 get_page_from_freelist+0x1029/0x1650
 ? free_debug_processing+0x271/0x410
 __alloc_pages_nodemask+0x111/0x310
 page_frag_alloc+0x74/0x120
 __netdev_alloc_skb+0x95/0x110
 e1000_alloc_rx_buffers+0x225/0x2b0
 e1000_clean_rx_irq+0x2ee/0x450
 e1000e_poll+0x7c/0x2e0
 net_rx_action+0x273/0x4d0
 __do_softirq+0xc6/0x4d6
 irq_exit+0xbb/0xc0
 do_IRQ+0x60/0x110
 common_interrupt+0xf/0xf
 </IRQ>
RIP: 0010:cpuidle_enter_state+0xb5/0x390
Code: 89 04 24 0f 1f 44 00 00 31 ff e8 86 26 64 ff 80 7c 24 0f 00 0f 85 fb 01 00 00 e8 66 02 66 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <48> 8b 0c 24 4c 29 f9 48 89 c8 48 c1 f9 3f 48
f7 ea b8 ff ff ff 7f
RSP: 0018:ffffc900000abe70 EFLAGS: 00000202
 ORIG_RAX: ffffffffffffffdc
RAX: ffff880107fe8040 RBX: 0000000000000003 RCX: 0000000000000001
RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff880107fe8040
RBP: ffff88043fae8c20 R08: 0000000000000001 R09: 0000000000000018
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff928fb7d8
R13: 0000000000000003 R14: 0000000000000003 R15: 0000015e55aecf23
 do_idle+0x128/0x230
 cpu_startup_entry+0x6f/0x80
 start_secondary+0x192/0x1f0
 secondary_startup_64+0xa5/0xb0
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4

Everything then locks up & rebooots.

It's fairly reproduceable, though every time I run it my rsync gets further, and eventually I suspect it
won't create enough load to reproduce.

2006 #ifndef CONFIG_HOLES_IN_ZONE
2007         /*
2008          * page_zone is not safe to call in this context when
2009          * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
2010          * anyway as we check zone boundaries in move_freepages_block().
2011          * Remove at a later date when no bug reports exist related to
2012          * grouping pages by mobility
2013          */
2014         VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
2015                   pfn_valid(page_to_pfn(end_page)) &&
2016                   page_zone(start_page) != page_zone(end_page));
2017 #endif
2018


I could trigger it fairly quickly last week, but it seemed dependant on just how much
rsync is actually transferring. (There are millions of files, and only a few thousand had changed)

When there's nothing changed, the rsync was running to completion every time.

	Dave


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 18:31                             ` Dave Jones
@ 2018-07-11 23:07                               ` Andy Lutomirski
  2018-07-11 23:14                                 ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2018-07-11 23:07 UTC (permalink / raw)
  To: Dave Jones
  Cc: Dave Hansen, H.J. Lu, H. Peter Anvin, LKML, Andy Lutomirski,
	Mel Gorman, Andrew Morton, Rik van Riel, Minchan Kim




> On Jul 11, 2018, at 11:31 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> 
>> On Wed, Jul 11, 2018 at 10:50:22AM -0700, Dave Hansen wrote:
>> On 07/11/2018 10:29 AM, H.J. Lu wrote:
>>>> I have seen it on machines with various amounts of cores and RAMs.
>>>> It triggers the fastest on 8 cores with 6GB RAM reliably.
>>> Here is the first kernel message.
>> 
>> This looks like random corruption again.  It's probably a bogus 'struct
>> page' that fails the move_freepages() pfn_valid() checks.  I'm too lazy
>> to go reproduce the likely stack trace (not sure why it didn't show up
>> on your screen), but this could just be another symptom of the same
>> issue that caused the TLB batching oops.
>> 
>> My money is on this being some kind of odd stack corruption, maybe
>> interrupt-induced, but that's a total guess at this point.
> 
> So, maybe related.. I reported this to linux-mm a few days ago:
> 
> When I ran an rsync on my machine I use for backups, it eventually
> hits this trace..
> 
> kernel BUG at mm/page_alloc.c:2016!
> invalid opcode: 0000 [#1] SMP RIP: move_freepages_block+0x120/0x2d0
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.0-rc4-backup+ #1
> Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
> RIP: 0010:move_freepages_block+0x120/0x2d0
> Code: 05 48 01 c8 74 3b f6 00 02 74 36 48 8b 03 48 c1 e8 3e 48 8d 0c 40 48 8b 86 c0 7f 00 00 48 c1 e8 3e 48 8d 04 40 48 39 c8 74 17 <0f> 0b 45 31 f6 48 83 c4 28 44 89 f0 5b 5d 41
> 5c 41 5d 41 5e 41 5f
> RSP: 0018:ffff88043fac3af8 EFLAGS: 00010093
> RAX: 0000000000000000 RBX: ffffea0002e20000 RCX: 0000000000000003
> RDX: 0000000000000000 RSI: ffffea0002e20000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: ffff88043fac3b5c R09: ffffffff9295e110
> R10: ffff88043fdf4000 R11: ffffea0002e20008 R12: ffffea0002e20000
> R13: ffffffff9295dd40 R14: 0000000000000008 R15: ffffea0002e27fc0
> FS:  0000000000000000(0000) GS:ffff88043fac0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2a75f71fe8 CR3: 00000001e380f006 CR4: 00000000001606e0
> Call Trace:
> <IRQ>
> ? lock_acquire+0xe6/0x1dc
> steal_suitable_fallback+0x152/0x1a0
> get_page_from_freelist+0x1029/0x1650
> ? free_debug_processing+0x271/0x410
> __alloc_pages_nodemask+0x111/0x310
> page_frag_alloc+0x74/0x120
> __netdev_alloc_skb+0x95/0x110
> e1000_alloc_rx_buffers+0x225/0x2b0
> e1000_clean_rx_irq+0x2ee/0x450
> e1000e_poll+0x7c/0x2e0
> net_rx_action+0x273/0x4d0
> __do_softirq+0xc6/0x4d6
> irq_exit+0xbb/0xc0
> do_IRQ+0x60/0x110
> common_interrupt+0xf/0xf
> </IRQ>
> RIP: 0010:cpuidle_enter_state+0xb5/0x390
> Code: 89 04 24 0f 1f 44 00 00 31 ff e8 86 26 64 ff 80 7c 24 0f 00 0f 85 fb 01 00 00 e8 66 02 66 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <48> 8b 0c 24 4c 29 f9 48 89 c8 48 c1 f9 3f 48
> f7 ea b8 ff ff ff 7f
> RSP: 0018:ffffc900000abe70 EFLAGS: 00000202
> ORIG_RAX: ffffffffffffffdc
> RAX: ffff880107fe8040 RBX: 0000000000000003 RCX: 0000000000000001
> RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff880107fe8040
> RBP: ffff88043fae8c20 R08: 0000000000000001 R09: 0000000000000018
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff928fb7d8
> R13: 0000000000000003 R14: 0000000000000003 R15: 0000015e55aecf23
> do_idle+0x128/0x230
> cpu_startup_entry+0x6f/0x80
> start_secondary+0x192/0x1f0
> secondary_startup_64+0xa5/0xb0
> NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
> 
> Everything then locks up & rebooots.
> 
> It's fairly reproduceable, though every time I run it my rsync gets further, and eventually I suspect it
> won't create enough load to reproduce.
> 
> 2006 #ifndef CONFIG_HOLES_IN_ZONE
> 2007         /*
> 2008          * page_zone is not safe to call in this context when
> 2009          * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
> 2010          * anyway as we check zone boundaries in move_freepages_block().
> 2011          * Remove at a later date when no bug reports exist related to
> 2012          * grouping pages by mobility
> 2013          */
> 2014         VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
> 2015                   pfn_valid(page_to_pfn(end_page)) &&
> 2016                   page_zone(start_page) != page_zone(end_page));
> 2017 #endif
> 2018
> 
> 
> I could trigger it fairly quickly last week, but it seemed dependant on just how much
> rsync is actually transferring. (There are millions of files, and only a few thousand had changed)
> 
> When there's nothing changed, the rsync was running to completion every time.
> 
> 

Could the cause be an overflow of the IRQ stack?  I’ve been meaning to put guard pages on all the special stacks for a while. Let me see if I can do that in the next couple days.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 23:07                               ` Andy Lutomirski
@ 2018-07-11 23:14                                 ` Dave Hansen
  2018-07-12 14:44                                   ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2018-07-11 23:14 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Jones
  Cc: H.J. Lu, H. Peter Anvin, LKML, Andy Lutomirski, Mel Gorman,
	Andrew Morton, Rik van Riel, Minchan Kim

On 07/11/2018 04:07 PM, Andy Lutomirski wrote:
> Could the cause be an overflow of the IRQ stack?  I’ve been meaning
> to put guard pages on all the special stacks for a while. Let me see
> if I can do that in the next couple days.

But what would that overflow into?  Wouldn't it most likely be another
interrupt stack since they're all allocated together?

This looks more like thread stack corruption.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-11 23:14                                 ` Dave Hansen
@ 2018-07-12 14:44                                   ` H.J. Lu
  2018-07-12 23:28                                     ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-12 14:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Dave Jones, H. Peter Anvin, LKML,
	Andy Lutomirski, Mel Gorman, Andrew Morton, Rik van Riel,
	Minchan Kim

On Wed, Jul 11, 2018 at 4:14 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/11/2018 04:07 PM, Andy Lutomirski wrote:
>> Could the cause be an overflow of the IRQ stack?  I’ve been meaning
>> to put guard pages on all the special stacks for a while. Let me see
>> if I can do that in the next couple days.
>
> But what would that overflow into?  Wouldn't it most likely be another
> interrupt stack since they're all allocated together?
>
> This looks more like thread stack corruption.

I tried netconsole and got this:

[29369.552998] ------------[ cut here ]------------
[29369.560996] kernel BUG at mm/page_alloc.c:2019!
[29369.568980] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[29369.576892] Modules linked in: netconsole xt_CHECKSUM
ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute
bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter
ebtables ip6table_filter ip6_tables sunrpc vfat fat intel_powerclamp
coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore
snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt
iTCO_vendor_support gpio_ich snd_hda_intel joydev snd_hda_codec
snd_hda_core snd_hwdep mxm_wmi snd_seq snd_seq_device
[29369.627745]  snd_pcm pcspkr snd_timer snd i2c_i801 soundcore
lpc_ich i5500_temp i7core_edac shpchp wmi acpi_cpufreq ata_generic
pata_acpi radeon crc32c_intel i2c_algo_bit drm_kms_helper
firewire_ohci firewire_core ttm crc_itu_t drm e1000e pata_marvell
[29369.645472] CPU: 1 PID: 3896 Comm: expect Tainted: G          I
  4.17.5+ #7
[29369.654333] Hardware name:  /DX58SO, BIOS
SOX5810J.86A.5600.2013.0729.2250 07/29/2013
[29369.663320] RIP: 0010:move_freepages_block+0x246/0x4b0
[29369.672238] RSP: 0018:ffff8800b61f7178 EFLAGS: 00010002
[29369.681064] RAX: ffff8801af3d7000 RBX: ffffea00033c8000 RCX: 0000000000000000
[29369.690011] RDX: dffffc0000000000 RSI: ffffea00033cc000 RDI: ffffffff831d8ec0
[29369.698992] RBP: ffff8801af3d7680 R08: ffff8800b61f73c8 R09: ffffed0035e7af78
[29369.708025] R10: ffffed0035e7af78 R11: ffff8801af3d7bc3 R12: ffff8800b61f7228

before machine looked up.

-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-12 14:44                                   ` H.J. Lu
@ 2018-07-12 23:28                                     ` H.J. Lu
  2018-07-14  2:08                                       ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-12 23:28 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Dave Jones, H. Peter Anvin, LKML,
	Andy Lutomirski, Mel Gorman, Andrew Morton, Rik van Riel,
	Minchan Kim

On Thu, Jul 12, 2018 at 7:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 11, 2018 at 4:14 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>> On 07/11/2018 04:07 PM, Andy Lutomirski wrote:
>>> Could the cause be an overflow of the IRQ stack?  I’ve been meaning
>>> to put guard pages on all the special stacks for a while. Let me see
>>> if I can do that in the next couple days.
>>
>> But what would that overflow into?  Wouldn't it most likely be another
>> interrupt stack since they're all allocated together?
>>
>> This looks more like thread stack corruption.
>
> I tried netconsole and got this:
>
> [29369.552998] ------------[ cut here ]------------
> [29369.560996] kernel BUG at mm/page_alloc.c:2019!
> [29369.568980] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [29369.576892] Modules linked in: netconsole xt_CHECKSUM
> ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
> nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute
> bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
> nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter
> ebtables ip6table_filter ip6_tables sunrpc vfat fat intel_powerclamp
> coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore
> snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt
> iTCO_vendor_support gpio_ich snd_hda_intel joydev snd_hda_codec
> snd_hda_core snd_hwdep mxm_wmi snd_seq snd_seq_device
> [29369.627745]  snd_pcm pcspkr snd_timer snd i2c_i801 soundcore
> lpc_ich i5500_temp i7core_edac shpchp wmi acpi_cpufreq ata_generic
> pata_acpi radeon crc32c_intel i2c_algo_bit drm_kms_helper
> firewire_ohci firewire_core ttm crc_itu_t drm e1000e pata_marvell
> [29369.645472] CPU: 1 PID: 3896 Comm: expect Tainted: G          I
>   4.17.5+ #7
> [29369.654333] Hardware name:  /DX58SO, BIOS
> SOX5810J.86A.5600.2013.0729.2250 07/29/2013
> [29369.663320] RIP: 0010:move_freepages_block+0x246/0x4b0
> [29369.672238] RSP: 0018:ffff8800b61f7178 EFLAGS: 00010002
> [29369.681064] RAX: ffff8801af3d7000 RBX: ffffea00033c8000 RCX: 0000000000000000
> [29369.690011] RDX: dffffc0000000000 RSI: ffffea00033cc000 RDI: ffffffff831d8ec0
> [29369.698992] RBP: ffff8801af3d7680 R08: ffff8800b61f73c8 R09: ffffed0035e7af78
> [29369.708025] R10: ffffed0035e7af78 R11: ffff8801af3d7bc3 R12: ffff8800b61f7228
>
> before machine looked up.

4.17.6:

[24707.152017] kernel BUG at mm/page_alloc.c:2019!
[24707.157304] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[24707.162573] Modules linked in: netconsole xt_CHECKSUM
ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute
bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter
ebtables ip6table_filter ip6_tables sunrpc vfat fat intel_powerclamp
coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore
snd_hda_codec_realtek iTCO_wdt mxm_wmi gpio_ich iTCO_vendor_support
snd_hda_codec_generic snd_hda_intel snd_hda_codec joydev snd_hda_core
snd_hwdep snd_seq snd_seq_device
[24707.196565]  snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich
soundcore i7core_edac i5500_temp shpchp wmi acpi_cpufreq ata_generic
pata_acpi radeon i2c_algo_bit drm_kms_helper crc32c_intel
firewire_ohci ttm firewire_core crc_itu_t drm pata_marvell e1000e
[24707.208944] CPU: 3 PID: 23661 Comm: cc1plus Tainted: G          I
    4.17.6+ #11
[24707.215277] Hardware name:  /DX58SO, BIOS
SOX5810J.86A.5600.2013.0729.2250 07/29/2013
[24707.221715] RIP: 0010:move_freepages_block+0x246/0x4b0
[24707.228145] RSP: 0018:ffff880170387818 EFLAGS: 00010002
[24707.234614] RAX: ffff8801af3d7000 RBX: ffffea00033c8000 RCX: 0000000000000000
[24707.241158] RDX: dffffc0000000000 RSI: ffffea00033cf000 RDI: ffffffff831d9380
[24707.247735] RBP: ffff8801af3d7680 R08: ffff880170387a68 R09: ffffed0035e7af78
[24707.254233] R10: ffffed0035e7af78 R11: ffff8801af3d7bc3 R12: ffff8801703878c8
[24707.260650] R13: ffff8801af3d7680 R14: 0000000000000800 R15: 0000000000000000
[24707.267015] FS:  00007f18c4310c80(0000) GS:ffff880176ec0000(0000)
knlGS:0000000000000000
[24707.273381] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24707.279656] CR2: 00007f18c42a0000 CR3: 000000016b982000 CR4: 00000000000006e0
[24707.285919] Call Trace:
[24707.292076]  ? unwind_next_frame+0x793/0x850
[24707.298199]  steal_suitable_fallback+0x1f2/0x2a0
[24707.304344]  ? ftrace_ops_trampoline+0x4c/0x80
[24707.310425]  ? move_freepages_block+0x4b0/0x4b0
[24707.316423]  ? is_bpf_text_address+0xa/0x20
[24707.322325]  ? kernel_text_address+0x100/0x110
[24707.328105]  ? find_suitable_fallback+0xd0/0x100
[24707.333828]  get_page_from_freelist+0x189c/0x2540
[24707.339471]  ? __isolate_free_page+0x2c0/0x2c0
[24707.345011]  ? unlazy_walk+0xb8/0x160
[24707.350539]  __alloc_pages_nodemask+0x1b0/0x3c0
[24707.355912]  ? __alloc_pages_slowpath+0x1240/0x1240
[24707.361193]  ? path_mountpoint+0x8d0/0x8d0
[24707.366500]  ? deactivate_slab.isra.61+0x200/0x500
[24707.371712]  ? policy_node+0x56/0x60
[24707.376838]  new_slab+0x288/0x790
[24707.381853]  ? __kasan_slab_free+0x143/0x180
[24707.386795]  ___slab_alloc+0x3b4/0x550
[24707.391606]  ? getname_flags+0x6c/0x2a0
[24707.396435]  ? getname_flags+0x6c/0x2a0
[24707.401241]  __slab_alloc+0x1c/0x30
[24707.406059]  kmem_cache_alloc+0x223/0x260
[24707.410913]  getname_flags+0x6c/0x2a0
[24707.415772]  ? __ia32_sys_fstat+0x30/0x30
[24707.420655]  user_path_at_empty+0x1d/0x40
[24707.425551]  vfs_statx+0xb9/0x140
[24707.430489]  ? vfs_statx_fd+0x80/0x80
[24707.435307]  __do_sys_newlstat+0x77/0xd0
[24707.440144]  ? __ia32_sys_newstat+0x30/0x30
[24707.445005]  ? do_sys_open+0x16b/0x290
[24707.449881]  do_syscall_64+0x7d/0x1c0
[24707.454778]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[24707.459716] RIP: 0033:0x7f18c43ff049
[24707.464670] RSP: 002b:00007fff75c7aaa8 EFLAGS: 00000246 ORIG_RAX:
0000000000000006
[24707.469625] RAX: ffffffffffffffda RBX: 00007fff75c7bbd0 RCX: 00007f18c43ff049


-- 
H.J.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-12 23:28                                     ` H.J. Lu
@ 2018-07-14  2:08                                       ` Andy Lutomirski
  2018-07-16 16:05                                         ` H.J. Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2018-07-14  2:08 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Dave Hansen, Dave Jones, H. Peter Anvin, LKML, Andy Lutomirski,
	Mel Gorman, Andrew Morton, Rik van Riel, Minchan Kim

I'm not at all convinced that this is the problem, but the series here
will give a better diagnostic if the issue really is an IRQ stack
overflow:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/guard_pages

(link currently broken.  should work soon.)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-14  2:08                                       ` Andy Lutomirski
@ 2018-07-16 16:05                                         ` H.J. Lu
  2018-07-20 21:35                                           ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2018-07-16 16:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Dave Jones, H. Peter Anvin, LKML, Andy Lutomirski,
	Mel Gorman, Andrew Morton, Rik van Riel, Minchan Kim

[-- Attachment #1: Type: text/plain, Size: 383 bytes --]

On Fri, Jul 13, 2018 at 7:08 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I'm not at all convinced that this is the problem, but the series here
> will give a better diagnostic if the issue really is an IRQ stack
> overflow:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/guard_pages
>

This kernel won't boot with this configure file.

-- 
H.J.

[-- Attachment #2: config.debug.xz --]
[-- Type: application/x-xz, Size: 42236 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Kernel 4.17.4 lockup
  2018-07-16 16:05                                         ` H.J. Lu
@ 2018-07-20 21:35                                           ` Andy Lutomirski
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2018-07-20 21:35 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Dave Hansen, Dave Jones, H. Peter Anvin, LKML, Andy Lutomirski,
	Mel Gorman, Andrew Morton, Rik van Riel, Minchan Kim


> On Jul 16, 2018, at 6:05 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
>> On Fri, Jul 13, 2018 at 7:08 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> I'm not at all convinced that this is the problem, but the series here
>> will give a better diagnostic if the issue really is an IRQ stack
>> overflow:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/guard_pages
>> 
> 
> This kernel won't boot with this configure file.

Bah humbug. You’ll need to turn KASAN off (and maybe even compile it out entirely) if you want to test this. I need to bug the KASAN folks to unbreak their interaction with stacks in vmap space.  It’s been broken literally forever, but I keep incorrectly imagining that it’s been fixed already.

> 
> -- 
> H.J.
> <config.debug.xz>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, back to index

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-08 21:36 Kernel 4.17.4 lockup H.J. Lu
2018-07-09 14:54 ` Dave Hansen
2018-07-09 23:02   ` H.J. Lu
2018-07-10  0:44     ` Dave Hansen
2018-07-10  2:14       ` H.J. Lu
2018-07-10  3:47         ` Dave Hansen
     [not found]           ` <CAMe9rOrHowBX06nihdRRmEqhV8v7cs+PwVY7JYQFpUFOnHC71A@mail.gmail.com>
2018-07-11 15:13             ` Dave Hansen
2018-07-11 15:40               ` H.J. Lu
2018-07-11 16:24                 ` Dave Hansen
2018-07-11 16:29                   ` H.J. Lu
2018-07-11 16:49                     ` Dave Hansen
2018-07-11 16:53                       ` H.J. Lu
     [not found]                         ` <CAMe9rOpV89jWhvAZtqOJnc0eXzRYiLF5pLWRcdb7-kKLigj4rQ@mail.gmail.com>
2018-07-11 17:36                           ` Andy Lutomirski
2018-07-11 17:43                           ` Dave Hansen
2018-07-11 17:50                             ` H.J. Lu
2018-07-11 17:50                           ` Dave Hansen
2018-07-11 18:31                             ` Dave Jones
2018-07-11 23:07                               ` Andy Lutomirski
2018-07-11 23:14                                 ` Dave Hansen
2018-07-12 14:44                                   ` H.J. Lu
2018-07-12 23:28                                     ` H.J. Lu
2018-07-14  2:08                                       ` Andy Lutomirski
2018-07-16 16:05                                         ` H.J. Lu
2018-07-20 21:35                                           ` Andy Lutomirski

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox