From: Joerg Roedel <jroedel@suse.de>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Borislav Petkov <bp@alien8.de>,
Andrew Morton <akpm@linux-foundation.org>,
Shile Zhang <shile.zhang@linux.alibaba.com>,
Andy Lutomirski <luto@amacapital.net>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Tzvetomir Stoyanov <tz.stoyanov@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Date: Thu, 30 Apr 2020 16:11:21 +0200 [thread overview]
Message-ID: <20200430141120.GA8135@suse.de> (raw)
In-Reply-To: <20200429100731.201312a9@gandalf.local.home>
Hi,
On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote:
> Talking with Mathieu about this on IRC, he pointed out that my code does
> have a vzalloc() that is called:
>
> in trace_pid_write()
>
> pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
>
> This is done when -P1,2 is on the trace-cmd command line.
Okay, tracked it down, some instrumentation in the page-fault and
double-fault handler gave me the stack-traces. Here is what happens:
As already pointed out, it all happens because of page-faults on the
vzalloc'ed pid bitmap. It starts with this stack-trace:
RIP: 0010:trace_event_ignore_this_pid+0x23/0x30
Code: e9 c2 4b 6b 00 cc cc 48 8b 57 28 48 8b 8a b8 00 00 00 48 8b 82 c0 00 00 00 48 85 c0 74 11 48 8b 42 28 65 48 03 05 5d 9c e6 7e <0f> b6 40 7c c3 48 85 c9 75 ea f3 c3 90 48 8b 4f 70 48 83 02 01 48
RSP: 0018:ffffc90000673bd8 EFLAGS: 00010082
RAX: ffffe8ffffd8c870 RBX: 0000000000000203 RCX: ffff88810734ca90
RDX: ffff888451578000 RSI: ffffffff820f3d2a RDI: ffff888453594d68
RBP: ffff888453594d68 R08: 000000000001845e R09: ffffffff81114ba0
R10: 0000000000000000 R11: 000000000000000e R12: 4000000000000000
R13: ffffffff820f3d2a R14: 000000000001845e R15: 4000000000000002
? trace_event_raw_event_rcu_fqs+0xa0/0xa0
trace_event_raw_event_rcu_dyntick+0x89/0xa0
? trace_event_raw_event_rcu_dyntick+0x89/0xa0
? trace_event_raw_event_rcu_dyntick+0x89/0xa0
? insn_get_prefixes.part.2+0x174/0x2d0
rcu_irq_enter+0xf0/0x1d0
rcu_irq_enter_irqson+0x21/0x50
switch_mm_irqs_off+0x43c/0x570
? do_one_initcall+0x51/0x210
__text_poke+0x1a9/0x470
text_poke_bp_batch+0x73/0x180
text_poke_flush+0x43/0x50
arch_jump_label_transform_apply+0x16/0x30
__static_key_slow_dec_cpuslocked+0x42/0x50
static_key_slow_dec+0x1f/0x50
tracepoint_probe_unregister+0x1e2/0x220
trace_event_reg+0x6a/0x80
__ftrace_event_enable_disable+0x1ca/0x240
__ftrace_set_clr_event_nolock+0xe1/0x140
__ftrace_set_clr_event+0x3d/0x60
system_enable_write+0x76/0xa0
vfs_write+0xad/0x1a0
? rcu_irq_exit+0xb8/0x170
ksys_write+0x48/0xb0
do_syscall_64+0x60/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7f875b0ecdb0
Code: Bad RIP value.
RSP: 002b:00007ffc746be918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f875b0ecdb0
RDX: 0000000000000001 RSI: 00007ffc746be93f RDI: 0000000000000004
RBP: 00007ffc746be9a0 R08: 636172742f6c6500 R09: 0000000002242ee0
R10: 6f662f7365636e61 R11: 0000000000000246 R12: 000000000040a4b0
R13: 00007ffc746bebd0 R14: 0000000000000000 R15: 0000000000000000
I havn't figured out how rcu_irq_enter() calls down into
trace_event_raw_event_rcu_dyntick() and further into
trace_event_ignore_this_pid(), but the stacktrace shows it does.
So trace_event_ignore_this_pid() faults on the vzalloc()'ed memory, calling
into the page-fault handler. What happens there is:
RIP: 0010:trace_event_ignore_this_pid+0x23/0x30
Code: e9 c2 4b 6b 00 cc cc 48 8b 57 28 48 8b 8a b8 00 00 00 48 8b 82 c0 00 00 00 48 85 c0 74 11 48 8b 42 28 65 48 03 05 5d 9c e6 7e <0f> b6 40 7c c3 48 85 c9 75 ea f3 c3 90 48 8b 4f 70 48 83 02 01 48
RSP: 0018:ffffc90000673a50 EFLAGS: 00010082
RAX: ffffe8ffffd8c870 RBX: 0000000000000203 RCX: ffff88810734ca90
RDX: ffff888451578000 RSI: ffffe8ffffd8c8ec RDI: ffff88844fd98478
RBP: ffff88844fd98478 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffe8ffffd8c8ec
R13: 0000000000000000 R14: ffffc90000673b28 R15: 0000000000000000
trace_event_raw_event_x86_exceptions+0x87/0xa0
? trace_event_buffer_lock_reserve+0x6e/0x110
do_page_fault+0x45e/0x630
? trace_hardirqs_off_thunk+0x1a/0x37
page_fault+0x43/0x50
The page-fault handler calls a tracing function which again ends up in
trace_event_ignore_this_pid(), where it faults again. From here on the CPU is in
a page-fault loop, which continues until the stack overflows (with
CONFIG_VMAP_STACK).
Then there is no mapped stack anymore, so the page-fault results in a
double-fault, which uses an IST stack. The double-fault handler does
ist_enter(), which calls into rcu_nmi_enter(), which also has trace-events down
its call-path. I have no stack-trace for this, but what likely happens
now is that it page-faults again while on the IST stack and the
page-fault loops until the #DF IST stack overflows. Then the next #DF
happens and the stack pointer is reset to the top of the #DF IST stack,
starting the loop over again. This loops forever, causing the hang.
Regards,
Joerg
next prev parent reply other threads:[~2020-04-30 14:11 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-29 9:48 [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-04-29 10:59 ` Joerg Roedel
2020-04-29 12:28 ` Steven Rostedt
2020-04-29 14:07 ` Steven Rostedt
2020-04-29 14:10 ` Joerg Roedel
2020-04-29 14:32 ` Steven Rostedt
2020-04-29 15:44 ` Peter Zijlstra
2020-04-29 16:17 ` Joerg Roedel
2020-04-29 16:20 ` Joerg Roedel
2020-04-29 16:52 ` Steven Rostedt
2020-04-29 17:29 ` Mathieu Desnoyers
2020-04-29 18:51 ` Peter Zijlstra
2020-04-30 14:11 ` Joerg Roedel [this message]
2020-04-30 14:50 ` Joerg Roedel
2020-04-30 15:20 ` Mathieu Desnoyers
2020-04-30 16:16 ` Steven Rostedt
2020-04-30 16:18 ` Mathieu Desnoyers
2020-04-30 16:30 ` Steven Rostedt
2020-04-30 16:35 ` Mathieu Desnoyers
2020-04-30 15:23 ` Mathieu Desnoyers
2020-04-30 16:12 ` Steven Rostedt
2020-04-30 16:11 ` Steven Rostedt
2020-04-30 16:16 ` Mathieu Desnoyers
2020-04-30 16:25 ` Steven Rostedt
2020-04-30 19:14 ` Joerg Roedel
2020-05-01 1:13 ` Steven Rostedt
2020-05-01 2:26 ` Mathieu Desnoyers
2020-05-01 2:39 ` Steven Rostedt
2020-05-01 10:16 ` Joerg Roedel
2020-05-01 13:35 ` Mathieu Desnoyers
2020-05-04 15:12 ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Joerg Roedel
2020-05-04 15:28 ` Mathieu Desnoyers
2020-05-04 15:31 ` Joerg Roedel
2020-05-04 15:38 ` Mathieu Desnoyers
2020-05-04 15:51 ` Joerg Roedel
2020-05-04 17:04 ` Steven Rostedt
2020-05-04 17:40 ` Steven Rostedt
2020-05-04 18:38 ` Joerg Roedel
2020-05-04 19:10 ` Steven Rostedt
2020-05-05 12:31 ` [PATCH] tracing: Call vmalloc_sync_mappings() after alloc_percpu() Joerg Roedel
2020-05-06 15:17 ` Steven Rostedt
2020-05-08 14:42 ` Joerg Roedel
2020-05-04 20:25 ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Peter Zijlstra
2020-05-04 20:43 ` Steven Rostedt
2020-05-01 4:20 ` [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-05-01 13:22 ` Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200430141120.GA8135@suse.de \
--to=jroedel@suse.de \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=rostedt@goodmis.org \
--cc=shile.zhang@linux.alibaba.com \
--cc=tglx@linutronix.de \
--cc=tz.stoyanov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.