linux-kernel.vger.kernel.org archive mirror
* run_timer_softirq gpf. tracing?
@ 2017-03-21 18:17 Dave Jones
  2017-03-21 19:25 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Jones @ 2017-03-21 18:17 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Steven Rostedt, tglx

I just hit this while fuzzing..

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc2-think+ #1 
task: ffff88017f0ed440 task.stack: ffffc90000094000
RIP: 0010:run_timer_softirq+0x15f/0x700
RSP: 0018:ffff880507c03ec8 EFLAGS: 00010086
RAX: dead000000000200 RBX: ffff880507dd0d00 RCX: 0000000000000002
RDX: ffff880507c03ed0 RSI: 00000000ffffffff RDI: ffffffff8204b3a0
RBP: ffff880507c03f48 R08: ffff880507dd12d0 R09: ffff880507c03ed8
R10: ffff880507dd0db0 R11: 0000000000000000 R12: ffffffff8215cc38
R13: ffff880507c03ed0 R14: ffffffff82005188 R15: ffff8804b55491a8
FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000000005011000 CR4: 00000000001406e0
Call Trace:
 <IRQ>
 ? clockevents_program_event+0x47/0x120
 __do_softirq+0xbf/0x5b1
 irq_exit+0xb5/0xc0
 smp_apic_timer_interrupt+0x3d/0x50
 apic_timer_interrupt+0x97/0xa0
RIP: 0010:cpuidle_enter_state+0x12e/0x400
RSP: 0018:ffffc90000097e40 EFLAGS: 00000202
 ORIG_RAX: ffffffffffffff10
RAX: ffff88017f0ed440 RBX: ffffe8ffffa03cc8 RCX: 0000000000000001
RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff88017f0ed440
RBP: ffffc90000097e80 R08: 00000000ffffffff R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: ffffffff820b9338 R14: 0000000000000005 R15: ffffffff820b9320
 </IRQ>
 cpuidle_enter+0x17/0x20
 call_cpuidle+0x23/0x40
 do_idle+0xfb/0x200
 cpu_startup_entry+0x71/0x80
 start_secondary+0x16a/0x210
 start_cpu+0x14/0x14
Code: 8b 05 ce 1b ef 7e 83 f8 03 0f 87 4e 01 00 00 89 c0 49 0f a3 04 24 0f 82 0a 01 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 48 89 df 48 


RAX looks like list poison, and CR2 = 4, which is likely the ->next of a list,
with a NULL pointer.

Here's the disassembly of timer.o
https://codemonkey.org.uk/junk/timer.o.txt

Sure enough, run_timer_softirq+0x15f (1ccf) is..


	entry->next = LIST_POISON2;
    1cc8:	48 b8 00 02 00 00 00 	movabs $0xdead000000000200,%rax
    1ccf:	00 ad de 


This looks to be something to do with the tracing definition of timer_cancel.
I don't fully grok how the tracing macros work; is there something obvious
here before I dig deeper?

Right before that list op, we do this..

	if (timer->flags & TIMER_IRQSAFE) {

Which looks like a line from expire_timers. I'm not following how we get there
from that tracing definition.

No tracing was active when this happened.

	Dave


* Re: run_timer_softirq gpf. tracing?
  2017-03-21 18:17 run_timer_softirq gpf. tracing? Dave Jones
@ 2017-03-21 19:25 ` Thomas Gleixner
  2017-03-21 19:45   ` Dave Jones
  2017-03-21 20:14   ` run_timer_softirq gpf. [smc] Dave Jones
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Gleixner @ 2017-03-21 19:25 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, Steven Rostedt

On Tue, 21 Mar 2017, Dave Jones wrote:
> I just hit this while fuzzing..
> 
> general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc2-think+ #1 
> task: ffff88017f0ed440 task.stack: ffffc90000094000
> RIP: 0010:run_timer_softirq+0x15f/0x700
> RSP: 0018:ffff880507c03ec8 EFLAGS: 00010086
> RAX: dead000000000200 RBX: ffff880507dd0d00 RCX: 0000000000000002
> RDX: ffff880507c03ed0 RSI: 00000000ffffffff RDI: ffffffff8204b3a0
> RBP: ffff880507c03f48 R08: ffff880507dd12d0 R09: ffff880507c03ed8
> R10: ffff880507dd0db0 R11: 0000000000000000 R12: ffffffff8215cc38
> R13: ffff880507c03ed0 R14: ffffffff82005188 R15: ffff8804b55491a8
> FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000004 CR3: 0000000005011000 CR4: 00000000001406e0
> Call Trace:
>  <IRQ>
>  ? clockevents_program_event+0x47/0x120
>  __do_softirq+0xbf/0x5b1
>  irq_exit+0xb5/0xc0
>  smp_apic_timer_interrupt+0x3d/0x50
>  apic_timer_interrupt+0x97/0xa0
> RIP: 0010:cpuidle_enter_state+0x12e/0x400
> RSP: 0018:ffffc90000097e40 EFLAGS: 00000202
>  ORIG_RAX: ffffffffffffff10
> RAX: ffff88017f0ed440 RBX: ffffe8ffffa03cc8 RCX: 0000000000000001
> RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff88017f0ed440
> RBP: ffffc90000097e80 R08: 00000000ffffffff R09: 0000000000000008
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
> R13: ffffffff820b9338 R14: 0000000000000005 R15: ffffffff820b9320
>  </IRQ>
>  cpuidle_enter+0x17/0x20
>  call_cpuidle+0x23/0x40
>  do_idle+0xfb/0x200
>  cpu_startup_entry+0x71/0x80
>  start_secondary+0x16a/0x210
>  start_cpu+0x14/0x14
> Code: 8b 05 ce 1b ef 7e 83 f8 03 0f 87 4e 01 00 00 89 c0 49 0f a3 04 24 0f 82 0a 01 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 48 89 df 48 
> 
> 
> RAX looks like list poison, and CR2 = 4, which is likely the ->next of a list,
> with a NULL pointer.

Certainly not on 64 bit; that would be 8. And CR2 is irrelevant here
because that's a #GP, not a #PF.

> Here's the disassembly of timer.o
> https://codemonkey.org.uk/junk/timer.o.txt
> 
> Sure enough, run_timer_softirq+0x15f (1ccf) is..
> 
> 
> 	entry->next = LIST_POISON2;
>     1cc8:	48 b8 00 02 00 00 00 	movabs $0xdead000000000200,%rax

The code above says: <48> 89 50 08 which is

      mov %rdx, 0x8(%rax).

and that is:

   next->pprev = pprev

inside of detach_timer().

So this is actually inside

__hlist_del(struct hlist_node *n)
{
   struct hlist_node *next = n->next;
   struct hlist_node *pprev = n->pprev;

   WRITE_ONCE(*pprev, next);
   if (next)
      next->pprev = pprev;
}

The timer which expires has timer->entry.next == POISON2 !
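As an illustration, here is a minimal userspace sketch of that sequence (not the kernel code; the struct and the x86-64 poison value mirror include/linux/types.h and include/linux/poison.h, simplified):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified hlist node, as in include/linux/types.h */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* x86-64 value of LIST_POISON2, from include/linux/poison.h */
#define LIST_POISON2 ((struct hlist_node *)0xdead000000000200)

static void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;	/* the faulting store when next == LIST_POISON2 */
}

/* detach_timer() unlinks the entry and poisons ->next */
static void detach(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON2;
}
```

After one detach(), n->next is LIST_POISON2. If the same node is ever detached again, __hlist_del() executes next->pprev = pprev with next == 0xdead000000000200, i.e. exactly the non-canonical write at <48> 89 50 08 in the oops.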

This has nothing to do with tracing, it's a classic list corruption.  The
bad news is that there is no trace of the culprit because that happens when
some other timer expires after some random amount of time.

If that is reproducible, then please enable debugobjects. That should
pinpoint the culprit.
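(For reference, "enable debugobjects" for timers and work items means roughly the following config; the symbols are the real ones from lib/Kconfig.debug:)

```
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
```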

Thanks,

	tglx


* Re: run_timer_softirq gpf. tracing?
  2017-03-21 19:25 ` Thomas Gleixner
@ 2017-03-21 19:45   ` Dave Jones
  2017-03-21 20:14   ` run_timer_softirq gpf. [smc] Dave Jones
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Jones @ 2017-03-21 19:45 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel, Steven Rostedt

On Tue, Mar 21, 2017 at 08:25:39PM +0100, Thomas Gleixner wrote:

 > > RAX looks like list poison, and CR2 = 4, which is likely the ->next of a list,
 > > with a NULL pointer.
 > 
 > Certainly not on 64 bit; that would be 8. And CR2 is irrelevant here
 > because that's a #GP, not a #PF.

doh!

 > The timer which expires has timer->entry.next == POISON2 !
 > 
 > This has nothing to do with tracing, it's a classic list corruption.  The
 > bad news is that there is no trace of the culprit because that happens when
 > some other timer expires after some random amount of time.

ah! Thanks for putting me back on the right path.

 > If that is reproducible, then please enable debugobjects. That should
 > pinpoint the culprit.

hit it twice today so far, so hopefully it'll reproduce.

thanks.

	Dave


* Re: run_timer_softirq gpf. [smc]
  2017-03-21 19:25 ` Thomas Gleixner
  2017-03-21 19:45   ` Dave Jones
@ 2017-03-21 20:14   ` Dave Jones
  2017-03-21 21:45     ` Thomas Gleixner
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Jones @ 2017-03-21 20:14 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel, Steven Rostedt, Ursula Braun, netdev

On Tue, Mar 21, 2017 at 08:25:39PM +0100, Thomas Gleixner wrote:
 
 > > I just hit this while fuzzing..
 > > 
 > > general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 > > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc2-think+ #1 
 > > task: ffff88017f0ed440 task.stack: ffffc90000094000
 > > RIP: 0010:run_timer_softirq+0x15f/0x700
 > > RSP: 0018:ffff880507c03ec8 EFLAGS: 00010086
 > > RAX: dead000000000200 RBX: ffff880507dd0d00 RCX: 0000000000000002
 > > RDX: ffff880507c03ed0 RSI: 00000000ffffffff RDI: ffffffff8204b3a0
 > > RBP: ffff880507c03f48 R08: ffff880507dd12d0 R09: ffff880507c03ed8
 > > R10: ffff880507dd0db0 R11: 0000000000000000 R12: ffffffff8215cc38
 > > R13: ffff880507c03ed0 R14: ffffffff82005188 R15: ffff8804b55491a8
 > > FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
 > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 > > CR2: 0000000000000004 CR3: 0000000005011000 CR4: 00000000001406e0
 > > Call Trace:
 > >  <IRQ>
 > >  ? clockevents_program_event+0x47/0x120
 > >  __do_softirq+0xbf/0x5b1
 > >  irq_exit+0xb5/0xc0
 > >  smp_apic_timer_interrupt+0x3d/0x50
 > >  apic_timer_interrupt+0x97/0xa0
 > > RIP: 0010:cpuidle_enter_state+0x12e/0x400
 > > RSP: 0018:ffffc90000097e40 EFLAGS: 00000202
 > >  ORIG_RAX: ffffffffffffff10
 > > RAX: ffff88017f0ed440 RBX: ffffe8ffffa03cc8 RCX: 0000000000000001
 > > RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff88017f0ed440
 > > RBP: ffffc90000097e80 R08: 00000000ffffffff R09: 0000000000000008
 > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
 > > R13: ffffffff820b9338 R14: 0000000000000005 R15: ffffffff820b9320
 > >  </IRQ>
 > >  cpuidle_enter+0x17/0x20
 > >  call_cpuidle+0x23/0x40
 > >  do_idle+0xfb/0x200
 > >  cpu_startup_entry+0x71/0x80
 > >  start_secondary+0x16a/0x210
 > >  start_cpu+0x14/0x14
 > > Code: 8b 05 ce 1b ef 7e 83 f8 03 0f 87 4e 01 00 00 89 c0 49 0f a3 04 24 0f 82 0a 01 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 48 89 df 48 
 > 
 > The timer which expires has timer->entry.next == POISON2 !
 > 
 > it's a classic list corruption.  The
 > bad news is that there is no trace of the culprit because that happens when
 > some other timer expires after some random amount of time.
 > 
 > If that is reproducible, then please enable debugobjects. That should
 > pinpoint the culprit.

It's net/smc. This recently had a similar bug with workqueues
(https://marc.info/?l=linux-kernel&m=148821582909541), fixed by
637fdbae60d6cb9f6e963c1079d7e0445c86ff7d, so it's probably unsurprising
that there are similar issues.


WARNING: CPU: 0 PID: 2430 at lib/debugobjects.c:289 debug_print_object+0x87/0xb0
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
CPU: 0 PID: 2430 Comm: trinity-c4 Not tainted 4.11.0-rc3-think+ #3 
Call Trace:
 dump_stack+0x68/0x93
 __warn+0xcb/0xf0
 warn_slowpath_fmt+0x5f/0x80
 ? debug_check_no_obj_freed+0xd9/0x260
 debug_print_object+0x87/0xb0
 ? work_on_cpu+0xd0/0xd0
 debug_check_no_obj_freed+0x219/0x260
 ? __sk_destruct+0x10d/0x1c0
 kmem_cache_free+0x9f/0x370
 __sk_destruct+0x10d/0x1c0
 sk_destruct+0x20/0x30
 __sk_free+0x43/0xa0
 sk_free+0x18/0x20
 smc_release+0x100/0x1a0 [smc]
 sock_release+0x1f/0x80
 sock_close+0x12/0x20
 __fput+0xf3/0x200
 ____fput+0xe/0x10
 task_work_run+0x85/0xb0
 do_exit+0x331/0xd70
 __secure_computing+0x9c/0xa0
 syscall_trace_enter+0xd1/0x3d0
 do_syscall_64+0x15f/0x1d0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f535f4b19e7
RSP: 002b:00007fff1a0f40e8 EFLAGS: 00000246
 ORIG_RAX: 0000000000000008
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f535f4b19e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000000 R08: 00007f535fb8b000 R09: 00c17c2740a303e4
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1a0f40f5
R13: 00007f535fb60048 R14: 00007f535fb83ad8 R15: 00007f535fb60000
---[ end trace ee67155de15508db ]---

======================================================
[ INFO: possible circular locking dependency detected ]
4.11.0-rc3-think+ #3 Not tainted
-------------------------------------------------------
trinity-c4/2430 is trying to acquire lock:
 ((console_sem).lock){-.-...}, at: [<ffffffff810e6be4>] down_trylock+0x14/0x40

but task is already holding lock:
 (&obj_hash[i].lock){-.-.-.}, at: [<ffffffff81515689>] debug_check_no_obj_freed+0xd9/0x260

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&obj_hash[i].lock){-.-.-.}:
       lock_acquire+0x102/0x260
       _raw_spin_lock_irqsave+0x4c/0x90
       __debug_object_init+0x79/0x460
       debug_object_init+0x16/0x20
       hrtimer_init+0x25/0x1d0
       init_dl_task_timer+0x20/0x30
       __sched_fork.isra.91+0x9c/0x140
       init_idle+0x51/0x240
       sched_init+0x4cd/0x547
       start_kernel+0x246/0x45d
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x178/0x18b
       verify_cpu+0x0/0xf1

-> #2 (&rq->lock){-.-.-.}:
       lock_acquire+0x102/0x260
       _raw_spin_lock+0x41/0x80
       task_fork_fair+0x3a/0x100
       sched_fork+0x12b/0x2e0
       copy_process.part.39+0x65b/0x2080
       _do_fork+0xf6/0x770
       kernel_thread+0x29/0x30
       rest_init+0x22/0x140
       start_kernel+0x43c/0x45d
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x178/0x18b
       verify_cpu+0x0/0xf1

-> #1 (&p->pi_lock){-.-.-.}:
       lock_acquire+0x102/0x260
       _raw_spin_lock_irqsave+0x4c/0x90
       try_to_wake_up+0x40/0x6e0
       wake_up_process+0x15/0x20
       __up.isra.0+0x4c/0x50
       up+0x46/0x50
       __up_console_sem+0x37/0x60
       console_unlock+0x275/0x5e0
       do_con_write.part.22+0x761/0xa20
       con_write+0x5c/0x60
       n_tty_write+0x1bf/0x490
       tty_write+0x1b3/0x320
       redirected_tty_write+0x92/0xb0
       __do_readv_writev+0x2e5/0x380
       do_readv_writev+0x7c/0xc0
       vfs_writev+0x3f/0x50
       do_writev+0x64/0x100
       SyS_writev+0x10/0x20
       do_syscall_64+0x66/0x1d0
       return_from_SYSCALL_64+0x0/0x7a

-> #0 ((console_sem).lock){-.-...}:
       __lock_acquire+0x1134/0x12b0
       lock_acquire+0x102/0x260
       _raw_spin_lock_irqsave+0x4c/0x90
       down_trylock+0x14/0x40
       __down_trylock_console_sem+0x2f/0xa0
       console_trylock+0x16/0x90
       vprintk_emit+0x2ac/0x390
       vprintk_default+0x29/0x50
       vprintk_func+0x25/0x80
       printk+0x52/0x6e
       __warn+0x39/0xf0
       warn_slowpath_fmt+0x5f/0x80
       debug_print_object+0x87/0xb0
       debug_check_no_obj_freed+0x219/0x260
       kmem_cache_free+0x9f/0x370
       __sk_destruct+0x10d/0x1c0
       sk_destruct+0x20/0x30
       __sk_free+0x43/0xa0
       sk_free+0x18/0x20
       smc_release+0x100/0x1a0 [smc]
       sock_release+0x1f/0x80
       sock_close+0x12/0x20
       __fput+0xf3/0x200
       ____fput+0xe/0x10
       task_work_run+0x85/0xb0
       do_exit+0x331/0xd70
       __secure_computing+0x9c/0xa0
       syscall_trace_enter+0xd1/0x3d0
       do_syscall_64+0x15f/0x1d0
       return_from_SYSCALL_64+0x0/0x7a

other info that might help us debug this:

Chain exists of:
  (console_sem).lock --> &rq->lock --> &obj_hash[i].lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&obj_hash[i].lock);
                               lock(&rq->lock);
                               lock(&obj_hash[i].lock);
  lock((console_sem).lock);

 *** DEADLOCK ***

1 lock held by trinity-c4/2430:
 #0:  (&obj_hash[i].lock){-.-.-.}, at: [<ffffffff81515689>] debug_check_no_obj_freed+0xd9/0x260

stack backtrace:
CPU: 0 PID: 2430 Comm: trinity-c4 Not tainted 4.11.0-rc3-think+ #3 
Call Trace:
 dump_stack+0x68/0x93
 print_circular_bug+0x1be/0x210
 __lock_acquire+0x1134/0x12b0
 lock_acquire+0x102/0x260
 ? down_trylock+0x14/0x40
 _raw_spin_lock_irqsave+0x4c/0x90
 ? down_trylock+0x14/0x40
 ? vprintk_emit+0x2ac/0x390
 down_trylock+0x14/0x40
 __down_trylock_console_sem+0x2f/0xa0
 console_trylock+0x16/0x90
 ? trace_hardirqs_off+0xd/0x10
 vprintk_emit+0x2ac/0x390
 ? debug_print_object+0x87/0xb0
 vprintk_default+0x29/0x50
 vprintk_func+0x25/0x80
 ? unwind_next_frame+0x93/0x240
 printk+0x52/0x6e
 ? debug_print_object+0x87/0xb0
 __warn+0x39/0xf0
 warn_slowpath_fmt+0x5f/0x80
 ? debug_check_no_obj_freed+0xd9/0x260
 debug_print_object+0x87/0xb0
 ? work_on_cpu+0xd0/0xd0
 debug_check_no_obj_freed+0x219/0x260
 ? __sk_destruct+0x10d/0x1c0
 kmem_cache_free+0x9f/0x370
 __sk_destruct+0x10d/0x1c0
 sk_destruct+0x20/0x30
 __sk_free+0x43/0xa0
 sk_free+0x18/0x20
 smc_release+0x100/0x1a0 [smc]
 sock_release+0x1f/0x80
 sock_close+0x12/0x20
 __fput+0xf3/0x200
 ____fput+0xe/0x10
 task_work_run+0x85/0xb0
 do_exit+0x331/0xd70
 __secure_computing+0x9c/0xa0
 syscall_trace_enter+0xd1/0x3d0
 do_syscall_64+0x15f/0x1d0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f535f4b19e7
RSP: 002b:00007fff1a0f40e8 EFLAGS: 00000246
 ORIG_RAX: 0000000000000008
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f535f4b19e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000000 R08: 00007f535fb8b000 R09: 00c17c2740a303e4
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1a0f40f5
R13: 00007f535fb60048 R14: 00007f535fb83ad8 R15: 00007f535fb60000


* Re: run_timer_softirq gpf. [smc]
  2017-03-21 20:14   ` run_timer_softirq gpf. [smc] Dave Jones
@ 2017-03-21 21:45     ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2017-03-21 21:45 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, Steven Rostedt, Ursula Braun, netdev

On Tue, 21 Mar 2017, Dave Jones wrote:
> On Tue, Mar 21, 2017 at 08:25:39PM +0100, Thomas Gleixner wrote:
>  
>  > > I just hit this while fuzzing..
>  > > 
>  > > general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>  > > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc2-think+ #1 
>  > > task: ffff88017f0ed440 task.stack: ffffc90000094000
>  > > RIP: 0010:run_timer_softirq+0x15f/0x700
>  > > RSP: 0018:ffff880507c03ec8 EFLAGS: 00010086
>  > > RAX: dead000000000200 RBX: ffff880507dd0d00 RCX: 0000000000000002
>  > > RDX: ffff880507c03ed0 RSI: 00000000ffffffff RDI: ffffffff8204b3a0
>  > > RBP: ffff880507c03f48 R08: ffff880507dd12d0 R09: ffff880507c03ed8
>  > > R10: ffff880507dd0db0 R11: 0000000000000000 R12: ffffffff8215cc38
>  > > R13: ffff880507c03ed0 R14: ffffffff82005188 R15: ffff8804b55491a8
>  > > FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
>  > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  > > CR2: 0000000000000004 CR3: 0000000005011000 CR4: 00000000001406e0
>  > > Call Trace:
>  > >  <IRQ>
>  > >  ? clockevents_program_event+0x47/0x120
>  > >  __do_softirq+0xbf/0x5b1
>  > >  irq_exit+0xb5/0xc0
>  > >  smp_apic_timer_interrupt+0x3d/0x50
>  > >  apic_timer_interrupt+0x97/0xa0
>  > > RIP: 0010:cpuidle_enter_state+0x12e/0x400
>  > > RSP: 0018:ffffc90000097e40 EFLAGS: 00000202
>  > >  ORIG_RAX: ffffffffffffff10
>  > > RAX: ffff88017f0ed440 RBX: ffffe8ffffa03cc8 RCX: 0000000000000001
>  > > RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff88017f0ed440
>  > > RBP: ffffc90000097e80 R08: 00000000ffffffff R09: 0000000000000008
>  > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
>  > > R13: ffffffff820b9338 R14: 0000000000000005 R15: ffffffff820b9320
>  > >  </IRQ>
>  > >  cpuidle_enter+0x17/0x20
>  > >  call_cpuidle+0x23/0x40
>  > >  do_idle+0xfb/0x200
>  > >  cpu_startup_entry+0x71/0x80
>  > >  start_secondary+0x16a/0x210
>  > >  start_cpu+0x14/0x14
>  > > Code: 8b 05 ce 1b ef 7e 83 f8 03 0f 87 4e 01 00 00 89 c0 49 0f a3 04 24 0f 82 0a 01 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 48 89 df 48 
>  > 
>  > The timer which expires has timer->entry.next == POISON2 !
>  > 
>  > it's a classic list corruption.  The
>  > bad news is that there is no trace of the culprit because that happens when
>  > some other timer expires after some random amount of time.
>  > 
>  > If that is reproducible, then please enable debugobjects. That should
>  > pinpoint the culprit.
> 
> It's net/smc.  This recently had a similar bug with workqueues. 
> (https://marc.info/?l=linux-kernel&m=148821582909541) fixed by
> 637fdbae60d6cb9f6e963c1079d7e0445c86ff7d

Fixed? It's not fixed by that commit. The workqueue code merely got a new
WARN_ON_ONCE(). But the underlying problem is still unfixed in net/smc.

> so it's probably unsurprising that there are similar issues.

That one is related to workqueues:

> WARNING: CPU: 0 PID: 2430 at lib/debugobjects.c:289 debug_print_object+0x87/0xb0
> ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20

delayed_work_timer_fn() is what queues the work once the timer expires.

> CPU: 0 PID: 2430 Comm: trinity-c4 Not tainted 4.11.0-rc3-think+ #3 
> Call Trace:
>  dump_stack+0x68/0x93
>  __warn+0xcb/0xf0
>  warn_slowpath_fmt+0x5f/0x80
>  ? debug_check_no_obj_freed+0xd9/0x260
>  debug_print_object+0x87/0xb0
>  ? work_on_cpu+0xd0/0xd0
>  debug_check_no_obj_freed+0x219/0x260
>  ? __sk_destruct+0x10d/0x1c0
>  kmem_cache_free+0x9f/0x370
>  __sk_destruct+0x10d/0x1c0
>  sk_destruct+0x20/0x30
>  __sk_free+0x43/0xa0
>  sk_free+0x18/0x20

smc_release() does this at the end of the function:

        if (smc->use_fallback) {
                schedule_delayed_work(&smc->sock_put_work, TCP_TIMEWAIT_LEN);
        } else if (sk->sk_state == SMC_CLOSED) {
                smc_conn_free(&smc->conn);
                schedule_delayed_work(&smc->sock_put_work,
                                      SMC_CLOSE_SOCK_PUT_DELAY);
        }
        sk->sk_prot->unhash(sk);
        release_sock(sk);

        sock_put(sk);

sock_put(sk)
{
        if (atomic_dec_and_test(&sk->sk_refcnt))
                sk_free(sk);
}

That means either smc_release() queued the delayed work or it was already
queued.

But in neither case does it hold an extra refcount on sk; otherwise sock_put()
would not end up in sk_free().
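The usual rule is that whoever queues deferred work referencing the socket takes a reference first, and the work function drops it. A toy userspace sketch of that invariant (this is an illustration, not the net/smc code; obj_hold()/obj_put() stand in for the real sock_hold()/sock_put()):

```c
#include <assert.h>

/* Toy refcounted object standing in for struct sock */
struct obj {
	int refcnt;	/* models sk->sk_refcnt */
	int freed;	/* set when the final reference is dropped */
};

static void obj_hold(struct obj *o)	/* models sock_hold() */
{
	o->refcnt++;
}

static void obj_put(struct obj *o)	/* models sock_put() */
{
	if (--o->refcnt == 0)
		o->freed = 1;		/* models sk_free() */
}

/* Whoever queues deferred work against the object pins it first... */
static void queue_deferred(struct obj *o)
{
	obj_hold(o);
	/* schedule_delayed_work(...) would go here */
}

/* ...and the deferred handler drops that reference when it runs */
static void deferred_fn(struct obj *o)
{
	/* do the deferred work, then: */
	obj_put(o);
}
```

With that extra reference, the release path's final put cannot free the object while the timer/work is still pending, which is exactly the invariant being violated here.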

Thanks,

	tglx


end of thread, other threads:[~2017-03-21 21:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-21 18:17 run_timer_softirq gpf. tracing? Dave Jones
2017-03-21 19:25 ` Thomas Gleixner
2017-03-21 19:45   ` Dave Jones
2017-03-21 20:14   ` run_timer_softirq gpf. [smc] Dave Jones
2017-03-21 21:45     ` Thomas Gleixner
