[PATCH 0/5] [GIT PULL] ftrace: Fix bug with function tracing and lockdep

* [PATCH 0/5] [GIT PULL] ftrace: Fix bug with function tracing and lockdep
@ 2012-05-31  1:28 Steven Rostedt
  2012-05-31  1:28 ` [PATCH 1/5] ftrace: Synchronize variable setting with breakpoints Steven Rostedt
                   ` (5 more replies)
  0 siblings, 6 replies; 47+ messages in thread
From: Steven Rostedt @ 2012-05-31  1:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Frederic Weisbecker,
	Masami Hiramatsu, H. Peter Anvin, Dave Jones, Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 5623 bytes --]

Ingo,

Dave Jones reported a bug from ftrace. I saw the same bug but
thought it was a compile problem because it would trigger everytime
I booted, but after rebuilding the kernel, it stopped triggering.
I mistook this as a build bug (something not cleaned properly).

What also confused me was that Dave also experienced the bug going
away after doing a make clean.

Seems that was just shear luck!

After rebuilding and booting I was able to get it to trigger again.
I found that it was something going wrong with the updating 
of the functions via the breakpoints (the removal of stop machine for
function tracing). I added a mdelay(1) where I usually sync the
updates (add breakpoint, wait 1ms, modify code, wait 1ms, remove breakpoint)
and this happened to trigger the bug 90% of the time.

Finally figured it out. It was a combination of lockdep and the int3
handler:

.macro paranoidzeroentry_ist sym do_sym ist
ENTRY(\sym)
	INTR_FRAME
	PARAVIRT_ADJUST_EXCEPTION_FRAME
	pushq_cfi $-1		/* ORIG_RAX: no syscall to restart */
	subq $ORIG_RAX-R15, %rsp
	CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
	call save_paranoid
	TRACE_IRQS_OFF
	movq %rsp,%rdi		/* pt_regs pointer */
	xorl %esi,%esi		/* no error code */
	subq $EXCEPTION_STKSZ, INIT_TSS_IST(\ist)
	call \do_sym
	addq $EXCEPTION_STKSZ, INIT_TSS_IST(\ist)
	jmp paranoid_exit	/* %ebx: no swapgs flag */
	CFI_ENDPROC
END(\sym)
.endm

The above macro is called by the do_int3 code. When a int3 is hit, the
stack jumps to the debug stack (like what happens on NMI). The
 subq $EXCEPTION_STKSZ, INIT_TSS_IST(\ist)
is a little trick incase the \do_sym happens to hit a breakpoint too.
And for the most part, that works.

But that TRACE_IRQS_OFF up there (and several more called by the
paranoid_exit), is not protected by that trick. Thus, if the tracing
code hits a breakpoint we reset the debug stack and start clobbering all
over the previous code's stack.

Usually this is not an issue, unless lockdep sees the TRACE_IRQS_OFF
or TRACE_IRQS_ON while holding locks that have been taken for the first
time (with interrupts disabled). This explains why if the bug did not
trigger on the first try of starting the function tracer, it usually
would not trigger. It requires a new lock to be taken. The longer you
keep the box running, the less likely it will acquire a lock for the first
time.

Once lockdep sees a new lock changing its interrupts status, it starts
recording it (and calling lots of code). Unfortunately, some of this
code is traced by the function tracer.

I first started making a lot of this code 'notrace' but found that
it just wasn't reliable enough. The list of functions to mark just
kept growing, and I would only know if I got them all if I happen
to run all the paths.

Then I tried to play tricks by doing a switch of the stack. I tried
to modify sync_regs() but it would partially work, and it seemed
that NMIs would cause it to crash as well. I spent too much time on
it and finally figured to go with the easiest approach.

As this is only needed when LOCKDEP (and just in case, I did it when
TRACE_IRQFLAGS was enabled), I figured breakpoint performance wasn't
an issue (LOCKDEP and TRACE_IRQSFLAGS has a much bigger overhead).

I decided to reuse the code from the NMI to switch the IDT before calling
the TRACE_IRQS_ON/OFF and switch it back afterward. It would be nice
to do it just once on entering do_int3, and once leaving, but because
the TRACE_IRQS_ON/OFF and TRACE_IRQS_IRETQ are called in several branches
of the paranoid_exit, I decided to keep it simple and modify it every
time.

In the future, we can work to make this rely on modifying_ftrace_code,
but there's races to deal with, and this bug needs to be fixed now.

Also, while debugging this, I found a few more little bugs.

The first patch fixes a race with setting the modifying_ftrace_code
and the executing of the int3 handlers.

The second fixes a bug where we don't do the breakpoint update of
the function trace callback (the one called in the mcount trampoline).
This could cause a GPF when tracing is running and the function
tracer changes.

The third patch is a minor bug fix in the NMI handling of the idt switch.
If a NMI preempts a int3 handler, it switches the idt, properly, and
sets a variable to tell the handler to put it back when it exits. The
problem is that the variable never got reset, and it constantly loaded
the old IDT back on every NMI from then on. This is more of a performance
and correctness fix.

The last two patches fix the lockdep and breakpoint issue.

Comments are welcomed, but if there's no problem then ...

Please pull the latest tip/perf/urgent tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/urgent

Head SHA1: 1f9f5e93a9ab9cdf2504a25c24f3f8715c4ba785

Steven Rostedt (5):
      ftrace: Synchronize variable setting with breakpoints
      ftrace: Use breakpoint method to update ftrace caller
      x86: Reset the debug_stack update counter
      x86: Allow nesting of the debug stack IDT setting
      ftrace/x86: Do not change stacks in DEBUG when calling lockdep

----
 arch/x86/include/asm/ftrace.h |    2 +-
 arch/x86/kernel/cpu/common.c  |    8 ++++-
 arch/x86/kernel/entry_64.S    |   44 ++++++++++++++++++++++++--
 arch/x86/kernel/ftrace.c      |   69 ++++++++++++++++++++++++++++++++++++-----
 arch/x86/kernel/nmi.c         |    4 ++-
 arch/x86/kernel/traps.c       |    3 +-
 6 files changed, 116 insertions(+), 14 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread