From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Rusty Russell <rusty@rustcorp.com.au>, Pekka Paalanen <pq@iki.fi>,
miltonm@bga.com, David Miller <davem@davemloft.net>,
linuxppc-dev@ozlabs.org, Steven Rostedt <srostedt@redhat.com>,
Paul Mundt <lethal@linux-sh.org>,
Paul Mackerras <paulus@samba.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 1/1] ftrace,ppc64: do not use nops for modules
Date: Mon, 17 Nov 2008 20:32:13 -0500 [thread overview]
Message-ID: <20081118013402.002336279@goodmis.org> (raw)
In-Reply-To: 20081118013212.470074567@goodmis.org
Impact: fix PPC64 race condition
Milton Miller pointed out a nasty race with the current code of PPC64
dynamic ftrace. PPC64 keeps a table of content reference in the register
r2. If this gets corrupted, the kernel can crash. The calls to mcount are
replaced with nops. This is not the problem. The problem arises when we
start a trace and we can modify this code after preempting a function
being traced. Lets look at the assembly:
Origin code:
bl <mcount-trampoline>
ld r2,40(r1)
The mcount-trampoline includes this code:
std r2,40(r1)
ld r11,32(r12) <- 12 holds information on the jump.
ld r2,40(r12)
mtctr r11
bctr r11
We see that the trampoline stores the original TOC in the stack, replaces
the TOC with the target jump and then jumps to the target. The link
register will jump directly back to the code that called the trampoline.
But if we replace the ld r2,40(r1) as a nop and a function has been
preempted while tracing, on return, we never update the r2 back to
the module's TOC.
Milton Miller suggested instead of replacing the code with nops, to
replace the bl <mcount-trampoline> with a "b 8". Hence we would end
up with this code:
b 1f
ld r2,40(r1)
1:
Any new callers would not change their r2 TOC register, and any callers
that have been preempted will still return to their original TOC.
But a branch has slightly more overhead than a plain nop. Since the
first change is done before the module ever runs, there are no
race conditions. For the first change, I convert the code to nops,
only if they point to mcount. After that I convert the code to the branch.
The code only points to mcount on start up, and will point to
ftrace_caller when a trace is started.
Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
arch/powerpc/kernel/ftrace.c | 44 +++++++++++++++++++++++++++++++++++------
1 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index cc901b1..50a3246 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -158,6 +158,9 @@ static unsigned int branch_offset(unsigned long offset)
}
#ifdef CONFIG_PPC64
+/* static variable to not confuse recordmcount.pl script */
+static const unsigned long mcount_addr = MCOUNT_ADDR;
+
static int
__ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
@@ -168,7 +171,7 @@ __ftrace_make_nop(struct module *mod,
unsigned long *ptr = (unsigned long *)&jmp;
unsigned long ip = rec->ip;
unsigned long tramp;
- int offset;
+ int offset, size;
/* read where this goes */
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
@@ -248,10 +251,34 @@ __ftrace_make_nop(struct module *mod,
return -EINVAL;
}
- op[0] = PPC_NOP_INSTR;
- op[1] = PPC_NOP_INSTR;
+ /*
+ * Milton Miller pointed out that we can not blindly do nops.
+ * If a task was preempted when calling a trace function,
+ * the nops will remove the way to restore the TOC in r2
+ * and the r2 TOC will get corrupted.
+ *
+ * But, we only need to do that on shutdown of the tracer,
+ * if the pointer is still to mcount, then this is being called
+ * from initialization code, and we do not need to worry about
+ * races. NOPs are a tiny bit faster than a branch, so use that first.
+ * Why punish those that never start a trace.
+ */
+ if (addr == (unsigned long)mcount_addr) {
+ op[0] = PPC_NOP_INSTR;
+ op[1] = PPC_NOP_INSTR;
+ size = MCOUNT_INSN_SIZE * 2;
+ } else {
+ /*
+ * Replace with:
+ * bl <tramp> <<<<< replace by "b 1f"
+ * ld r2,40(r1)
+ * 1:
+ */
+ op[0] = 0x48000008; /* b +8 */
+ size = MCOUNT_INSN_SIZE;
+ }
- if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE * 2))
+ if (probe_kernel_write((void *)ip, replaced, size))
return -EPERM;
return 0;
@@ -381,9 +408,12 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE * 2))
return -EFAULT;
- /* It should be pointing to two nops */
- if ((op[0] != PPC_NOP_INSTR) ||
- (op[1] != PPC_NOP_INSTR)) {
+ /*
+ * It should be pointing to two nops or
+ * b +8; ld r2,40(r1)
+ */
+ if (((op[0] != 0x48000008) || (op[1] != 0xe8410028)) &&
+ ((op[0] != PPC_NOP_INSTR) || (op[1] != PPC_NOP_INSTR))) {
printk(KERN_ERR "Expected NOPs but have %x %x\n", op[0], op[1]);
return -EINVAL;
}
--
1.5.6.5
--
prev parent reply other threads:[~2008-11-18 1:34 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-18 1:32 [PATCH 0/1] ftrace, ppc64: fix race condition pointed out by Milton Miller Steven Rostedt
2008-11-18 1:32 ` Steven Rostedt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081118013402.002336279@goodmis.org \
--to=rostedt@goodmis.org \
--cc=akpm@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=fweisbec@gmail.com \
--cc=lethal@linux-sh.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=miltonm@bga.com \
--cc=mingo@elte.hu \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=pq@iki.fi \
--cc=rusty@rustcorp.com.au \
--cc=srostedt@redhat.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).