On Mon, Oct 30, 2017 at 12:29:47PM -0700, Linus Torvalds wrote: >On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu wrote: >> >> Here are 3 dmesgs related to wireless and 1 from ethernet. > >Fengguang, these would be lovelier still _if_ you have DEBUG_INFO >enabled on the kernel, and your script were to find things like >"symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them. > >So > >> [ 235.425464] BUG: unable to handle kernel paging request at 0000000000010007 >> [ 235.425470] IP: run_timer_softirq+0x13a/0x470 > >would also then have > > run_timer_softirq at timer.c:XYZ > >which would make it easier to see exactly _what_ it is that faults. As >it is, I think there's a fair number of inlining that makes it hard to >see the cause, but that faddrtoline would make very obvious. > >Finding that "symbol+xyz/abc" pattern should be fairly easy to >automate, and should fit the 0day model fairly well. No? > >In this case, the trapping instruction ends up decoding to > > 0: 4c 8d 6c c5 90 lea -0x70(%rbp,%rax,8),%r13 > 5: 49 8b 45 00 mov 0x0(%r13),%rax > 9: 48 85 c0 test %rax,%rax > c: 74 de je 0xffffffffffffffec > e: 4d 8b 7d 00 mov 0x0(%r13),%r15 > 12: 4d 89 7c 24 08 mov %r15,0x8(%r12) > 17: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 1c: 49 8b 07 mov (%r15),%rax > 1f: 49 8b 57 08 mov 0x8(%r15),%rdx > 23: 48 85 c0 test %rax,%rax > 26: 48 89 02 mov %rax,(%rdx) > 29: 74 04 je 0x2f > 2b:* 48 89 50 08 mov %rdx,0x8(%rax) <-- trapping instruction > 2f: 41 f6 47 2a 20 testb $0x20,0x2a(%r15) > 34: 49 c7 47 08 00 00 00 movq $0x0,0x8(%r15) > >and %rax has the value 0xffff, so yes, it will trap at 0x10007. > >It's not trivial to see just *wjhat* access it is. > >I *think* that "testb $32" is checking for TIMER_IRQSAFE in >expire_timers(), and that the oops is due to the list operations in >detach_timer() (inlined). > >Which doesn't really help: it looks like the timer lists are corrupt. >With some luck, some register state could have the timer function >pointer in it, and we'd get a hint of *which* timer this is, but that >doesn't look to be the case here either. > >I'm not seeing anything to really help debug this here. Yes it's accessing the list. Here is the faddr2line output. [ 190.983445] PASS fils_sk_hlp_no_rapid_commit 0.549105 2017-11-09 12:41:50.114825 [ 190.983447] [ 190.986376] passed all 1 test case(s) [ 190.986377] [ 191.006705] ignored_by_lkp fils_sk_hlp_oom [ 191.006708] [ 191.007635] general protection fault: 0000 [#1] SMP [ 191.007636] Modules linked in: cmac ccm arc4 mac80211_hwsim mac80211 cfg80211 rfkill rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi sb_edac crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc ppdev aesni_intel crypto_simd glue_helper cryptd snd_pcm snd_timer snd soundcore pcspkr serio_raw ata_piix i2c_piix4 libata floppy parport_pc parport ip_tables [ 191.007656] CPU: 1 PID: 186 Comm: sed Not tainted 4.14.0-rc8-00013-g87df261 #1 [ 191.007657] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 191.007659] task: ffff88007fa58000 task.stack: ffffc900008b0000 [ 191.007663] RIP: 0010:run_timer_softirq+0x13a/0x470: run_timer_softirq at include/linux/list.h:651 [ 191.007664] RSP: 0018:ffff88013fd03ec8 EFLAGS: 00010086 [ 191.007665] RAX: dead000000000200 RBX: 0000000000000001 RCX: ffff8800710bbb80 [ 191.007665] RDX: ffff88013fd03ed0 RSI: ffff88013fd122e8 RDI: ffff88013fd03ed8 [ 191.007666] RBP: ffff88013fd03f40 R08: ffff88013fd127f8 R09: ffff88013fd03ed8 [ 191.007667] R10: ffff88013fd12330 R11: 000000000000008f R12: ffff88013fd122c0 [ 191.007668] R13: ffff88013fd03ed0 R14: dead000000000200 R15: ffff8800710bbb80 [ 191.007669] FS: 00007fabab3b9800(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 [ 191.007670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.007674] CR2: 00007fc7ea4895d0 CR3: 000000007fa57002 CR4: 00000000001606e0 [ 191.007678] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 191.007679] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 191.007680] Call Trace: [ 191.007681] [ 191.007685] ? kvm_clock_get_cycles+0x25/0x30: kvm_clock_get_cycles at arch/x86/include/asm/preempt.h:81 [ 191.007687] ? ktime_get+0x40/0xa0: ktime_get at kernel/time/timekeeping_internal.h:19 [ 191.007690] __do_softirq+0x104/0x2ab [ 191.007693] irq_exit+0xf1/0x100: irq_exit at kernel/softirq.c:364 [ 191.007695] smp_apic_timer_interrupt+0x69/0x140: smp_apic_timer_interrupt at arch/x86/include/asm/irq_regs.h:27 [ 191.007696] apic_timer_interrupt+0x9d/0xb0 [ 191.007697] [ 191.007701] RIP: 0010:console_unlock+0x1f0/0x4f0: console_unlock at arch/x86/include/asm/paravirt.h:778 [ 191.007702] RSP: 0018:ffffc900008b3d18 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff10 [ 191.007703] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff [ 191.007703] RDX: ffffc900008b3cd0 RSI: 0000000000000004 RDI: 0000000000000247 [ 191.007704] RBP: ffffc900008b3d50 R08: 0000000000000000 R09: 0000000000000000 [ 191.007705] R10: ffff8800000be400 R11: 0000000082354601 R12: 000000000000002e [ 191.007705] R13: ffffffff820755a0 R14: 0000000000000000 R15: ffffffff823546a0 [ 191.007708] vprintk_emit+0x302/0x3b0: vprintk_emit at kernel/printk/printk.c:1760 [ 191.007710] printk_emit+0x33/0x3b: printk_emit at kernel/printk/printk.c:1782 [ 191.007713] ? __kmalloc+0x195/0x210: __kmalloc at mm/slab.h:421 [ 191.007715] devkmsg_write+0xf6/0x150: devkmsg_write at kernel/printk/printk.c:815 [ 191.007718] __vfs_write+0xd3/0x150: __vfs_write at fs/read_write.c:470 [ 191.007719] vfs_write+0xc7/0x1b0: vfs_write at fs/read_write.c:545 [ 191.007721] SyS_write+0x46/0xa0: SyS_write at fs/read_write.c:590 [ 191.007722] entry_SYSCALL_64_fastpath+0x1a/0xa5: entry_SYSCALL_64_fastpath at arch/x86/entry/entry_64.S:206 [ 191.007723] RIP: 0033:0x7fabaaab6720 [ 191.007724] RSP: 002b:00007fff07d55008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 191.007725] RAX: ffffffffffffffda RBX: 0000000000000106 RCX: 00007fabaaab6720 [ 191.007725] RDX: 0000000000000001 RSI: 0000000000611598 RDI: 0000000000000001 [ 191.007726] RBP: 000000000000022c R08: 00007fabaad74760 R09: 00007fabaad73600 [ 191.007727] R10: 000000000000001e R11: 0000000000000246 R12: 00000000021ada10 [ 191.007727] R13: 0000000000fd4940 R14: 00007fff07d54f30 R15: 000000000000022c [ 191.007729] Code: 4c 8d 6c c5 90 49 8b 45 00 48 85 c0 74 de 4d 8b 7d 00 4d 89 7c 24 08 0f 1f 44 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 4c 89 e7 4d [ 191.007748] RIP: run_timer_softirq+0x13a/0x470: run_timer_softirq at include/linux/list.h:651 RSP: ffff88013fd03ec8 [ 191.007751] ---[ end trace bb3719ef16688f4f ]--- [ 191.007752] Kernel panic - not syncing: Fatal exception in interrupt [ 191.007773] Kernel Offset: disabled Regards, Fengguang