linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: He Zhe <zhe.he@windriver.com>
To: Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Joel Fernandes <joel@joelfernandes.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC][PATCH] tracing/x86: Save CR2 before tracing irqsoff on error_entry
Date: Wed, 17 Apr 2019 09:52:43 +0800	[thread overview]
Message-ID: <bb376e6b-3ba4-8cd1-9fa4-8cfd421f8413@windriver.com> (raw)
In-Reply-To: <20190320221534.165ab87b@oasis.local.home>

From the last replies in the thread, it seems some work is going on. May I ask
when we can possibly roughly have a fix or workaround?

Thanks,
Zhe


On 3/21/19 10:15 AM, Steven Rostedt wrote:
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>
> He Zhe reported a crash by enabling trace events and selecting
> "userstacktrace" which will read the stack of userspace for every trace
> event recorded. Zhe narrowed it down to:
>
>  c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
>
> With the problem config, I was able to also reproduce the error. I
> narrowed it down to just having to do the following:
>
>  # cd /sys/kernel/tracing
>  # echo 1 > options/userstacktrace
>  # echo 1 > events/preemptirq/irq_disable/enable
>
> And sure enough, I triggered a crash. Well, it was systemd crashing
> with a bad memory access??
>
>  systemd-journal[537]: segfault at ed8cb8 ip 00007f7fffc9fef5 sp 00007ffc4062cb10 error 7
>
> And it would crash similarly each time I tried it, but always at a
> different place. After spending the day on this, I finally figured it
> out. The bug is happening in entry_64.S right after error_entry.
> There's two TRACE_IRQS_OFF in that code path, which if I comment out,
> the bug goes away. Then it dawned on me that the crash always happens
> when systemd does a normal page fault. We had this bug before, and it
> was with the exception trace points.
>
> The issue is that a tracepoint can fault (reading vmalloc or whatever).
> And doing a userspace stack trace most definitely will fault. But if we
> are coming from a legitimate page fault, the address of that fault (in
> the CR2 register) will be lost if we fault before we get to the page
> fault handler. That's exactly what is happening.
>
> To solve this, a TRACE_IRQS_OFF_CR2 (and ON for consistency) was added
> that saves the CR2 register. A new trace_hardirqs_off_thunk_cr2 is
> created that stores the cr2 register, calls the
> trace_hardirqs_off_caller, then on return restores the cr2 register if
> it changed, before returning.
>
> On my tests this fixes the issue. I just want to know if this is a
> legitimate fix or if someone can come up with a better fix?
>
> Note: this also saves the exception context just like the
> do_page_fault() function does.
>
> Note2: This only gets enabled when lockdep or irq tracing is enabled,
> which is not recommended for production environments.
>
> Link: http://lkml.kernel.org/r/897cf5cf-fc24-8a64-cb28-847f2d2e63d2@windriver.com
>
> Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 7bc105f47d21..7edffec328e2 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -292,6 +292,32 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
>  
>  	syscall_return_slowpath(regs);
>  }
> +
> +extern void trace_hardirqs_on_caller(unsigned long caller_addr);
> +__visible void trace_hardirqs_on_caller_cr2(unsigned long caller_addr)
> +{
> +	unsigned long address = read_cr2(); /* Get the faulting address */
> +	enum ctx_state prev_state;
> +
> +	prev_state = exception_enter();
> +	trace_hardirqs_on_caller(caller_addr);
> +	if (address != read_cr2())
> +		write_cr2(address);
> +	exception_exit(prev_state);
> +}
> +
> +extern void trace_hardirqs_off_caller(unsigned long caller_addr);
> +__visible void trace_hardirqs_off_caller_cr2(unsigned long caller_addr)
> +{
> +	unsigned long address = read_cr2(); /* Get the faulting address */
> +	enum ctx_state prev_state;
> +
> +	prev_state = exception_enter();
> +	trace_hardirqs_off_caller(caller_addr);
> +	if (address != read_cr2())
> +		write_cr2(address);
> +	exception_exit(prev_state);
> +}
>  #endif
>  
>  #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 1f0efdb7b629..73ddf24fed3e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1248,12 +1248,12 @@ ENTRY(error_entry)
>  	 * we fix gsbase, and we should do it before enter_from_user_mode
>  	 * (which can take locks).
>  	 */
> -	TRACE_IRQS_OFF
> +	TRACE_IRQS_OFF_CR2
>  	CALL_enter_from_user_mode
>  	ret
>  
>  .Lerror_entry_done:
> -	TRACE_IRQS_OFF
> +	TRACE_IRQS_OFF_CR2
>  	ret
>  
>  	/*
> diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
> index be36bf4e0957..1300b53b98cb 100644
> --- a/arch/x86/entry/thunk_64.S
> +++ b/arch/x86/entry/thunk_64.S
> @@ -41,6 +41,8 @@
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
>  	THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
> +	THUNK trace_hardirqs_on_thunk_cr2,trace_hardirqs_on_caller_cr2,1
> +	THUNK trace_hardirqs_off_thunk_cr2,trace_hardirqs_off_caller_cr2,1
>  #endif
>  
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index 058e40fed167..dd511742ca2e 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -172,9 +172,13 @@ static inline int arch_irqs_disabled(void)
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  #  define TRACE_IRQS_ON		call trace_hardirqs_on_thunk;
>  #  define TRACE_IRQS_OFF	call trace_hardirqs_off_thunk;
> +#  define TRACE_IRQS_ON_CR2	call trace_hardirqs_on_thunk_cr2;
> +#  define TRACE_IRQS_OFF_CR2	call trace_hardirqs_off_thunk_cr2;
>  #else
>  #  define TRACE_IRQS_ON
>  #  define TRACE_IRQS_OFF
> +#  define TRACE_IRQS_ON_CR2
> +#  define TRACE_IRQS_OFF_CR2
>  #endif
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  #  ifdef CONFIG_X86_64
>


      parent reply	other threads:[~2019-04-17  1:54 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-21  2:15 [RFC][PATCH] tracing/x86: Save CR2 before tracing irqsoff on error_entry Steven Rostedt
2019-03-21  8:33 ` Peter Zijlstra
2019-03-21  9:02   ` Peter Zijlstra
2019-03-21 10:45     ` Peter Zijlstra
2019-03-21 13:32       ` Steven Rostedt
2019-03-21 13:55         ` Steven Rostedt
2019-03-21 17:23           ` Linus Torvalds
2019-03-21 17:22         ` Peter Zijlstra
2019-03-21 18:05           ` Andy Lutomirski
2019-03-21 18:10             ` Steven Rostedt
2019-03-21 18:27               ` Andy Lutomirski
2019-03-21 20:50                 ` Peter Zijlstra
2019-03-22  2:52                   ` Andy Lutomirski
2019-03-21 18:28               ` Peter Zijlstra
2019-03-21 18:55                 ` Steven Rostedt
2019-03-21 19:31                   ` Peter Zijlstra
2019-03-21 19:50                     ` Steven Rostedt
2019-03-21 20:03                       ` Peter Zijlstra
2019-03-21 20:11                         ` Steven Rostedt
2019-03-21 18:18             ` Linus Torvalds
2019-03-21 18:20               ` Andy Lutomirski
2019-03-21 18:25                 ` Linus Torvalds
2019-03-21 18:37                   ` Peter Zijlstra
2019-03-21 18:39                     ` Andy Lutomirski
2019-03-21 20:00                       ` Andrew Cooper
2019-03-21 20:35                         ` Steven Rostedt
2019-03-21 18:38                   ` Andy Lutomirski
2019-03-21 18:42                     ` Peter Zijlstra
2019-03-21 18:22               ` hpa
2019-03-22  5:54               ` Juergen Gross
2019-03-21 18:27             ` Peter Zijlstra
2019-03-21 18:28               ` Andy Lutomirski
2019-03-21 18:33                 ` Peter Zijlstra
2019-03-21 13:04   ` Steven Rostedt
2019-04-17  1:52 ` He Zhe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb376e6b-3ba4-8cd1-9fa4-8cfd421f8413@windriver.com \
    --to=zhe.he@windriver.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).