All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Joerg Roedel <jroedel@suse.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shile Zhang <shile.zhang@linux.alibaba.com>,
	Andy Lutomirski <luto@amacapital.net>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Date: Thu, 30 Apr 2020 12:25:58 -0400	[thread overview]
Message-ID: <20200430122558.406c9755@gandalf.local.home> (raw)
In-Reply-To: <505666080.77869.1588263380070.JavaMail.zimbra@efficios.com>

On Thu, 30 Apr 2020 12:16:20 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Apr 30, 2020, at 12:11 PM, rostedt rostedt@goodmis.org wrote:
> 
> > On Thu, 30 Apr 2020 16:11:21 +0200
> > Joerg Roedel <jroedel@suse.de> wrote:
> >   
> >> Hi,
> >> 
> >> On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote:  
> >> > Talking with Mathieu about this on IRC, he pointed out that my code does
> >> > have a vzalloc() that is called:
> >> > 
> >> > in trace_pid_write()
> >> > 
> >> > 	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
> >> > 
> >> > This is done when -P1,2 is on the trace-cmd command line.  
> >> 
> >> Okay, tracked it down, some instrumentation in the page-fault and
> >> double-fault handler gave me the stack-traces. Here is what happens:
> >> 
> >> As already pointed out, it all happens because of page-faults on the
> >> vzalloc'ed pid bitmap. It starts with this stack-trace:
> >> 
> >>  RIP: 0010:trace_event_ignore_this_pid+0x23/0x30  
> > 
> > Interesting. Because that function is this:
> > 
> > bool trace_event_ignore_this_pid(struct trace_event_file *trace_file)
> > {
> >	struct trace_array *tr = trace_file->tr;
> >	struct trace_array_cpu *data;
> >	struct trace_pid_list *no_pid_list;
> >	struct trace_pid_list *pid_list;
> > 
> >	pid_list = rcu_dereference_raw(tr->filtered_pids);
> >	no_pid_list = rcu_dereference_raw(tr->filtered_no_pids);
> > 
> >	if (!pid_list && !no_pid_list)
> >		return false;
> > 
> >	data = this_cpu_ptr(tr->array_buffer.data);
> > 
> >	return data->ignore_pid;
> > }
> > 
> > Where it only sees if the pid masks exist. That is, it looks to see if
> > there's pointers to them, it doesn't actually touch the vmalloc'd area.
> > This check is to handle a race between allocating and deallocating the
> > buffers and setting the ignore_pid bit. The reading of these arrays is done
> > at sched_switch time, which sets or clears the ignore_pid field.
> > 
> > That said, since this only happens on buffer instances (it does not trigger
> > on the top level instance, which uses the same code for the pid masks)
> > 
> > Could this possibly be for the tr->array_buffer.data, which is allocated
> > with:
> > 
> > allocate_trace_buffer() {
> >	[..]
> >	buf->data = alloc_percpu(struct trace_array_cpu);
> > 
> > That is, the bug isn't the vmalloc being a problem, but perhaps the per_cpu
> > allocation. This would explain why this crashes with the buffer instance
> > and not with the top level instance. If it was related to the pid masks,
> > then it would trigger for either (because they act the same in allocating
> > at time of use). But when an instance is made, the tr->array_buffer.data is
> > created. Which for the top level happens at boot up and the pages would
> > have been synced long ago. But for a newly created instance, this happens
> > just before its used. This could possibly explain why it's not a problem
> > when doing it manually by hand, because the time between creating the
> > instance, and the time to start and stop the tracing, is long enough for
> > something to sync them page tables.
> > 
> > tl;dr; It's not an issue with the vmalloc, it's an issue with per_cpu
> > allocations!  
> 
> Did I mention that alloc_percpu uses:
> 
> static void *pcpu_mem_zalloc(size_t size, gfp_t gfp)
> {
>         if (WARN_ON_ONCE(!slab_is_available()))
>                 return NULL;
> 
>         if (size <= PAGE_SIZE)
>                 return kzalloc(size, gfp);
>         else
>                 return __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL);
> }
> 
> So yeah, it's vmalloc'd memory when size > PAGE_SIZE.
> 

I certainly hope that struct trace_array_cpu is not bigger than PAGE_SIZE!

-- Steve

  reply	other threads:[~2020-04-30 16:26 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29  9:48 [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-04-29 10:59 ` Joerg Roedel
2020-04-29 12:28   ` Steven Rostedt
2020-04-29 14:07     ` Steven Rostedt
2020-04-29 14:10       ` Joerg Roedel
2020-04-29 14:32         ` Steven Rostedt
2020-04-29 15:44           ` Peter Zijlstra
2020-04-29 16:17       ` Joerg Roedel
2020-04-29 16:20         ` Joerg Roedel
2020-04-29 16:52           ` Steven Rostedt
2020-04-29 17:29             ` Mathieu Desnoyers
2020-04-29 18:51               ` Peter Zijlstra
2020-04-30 14:11       ` Joerg Roedel
2020-04-30 14:50         ` Joerg Roedel
2020-04-30 15:20           ` Mathieu Desnoyers
2020-04-30 16:16             ` Steven Rostedt
2020-04-30 16:18               ` Mathieu Desnoyers
2020-04-30 16:30                 ` Steven Rostedt
2020-04-30 16:35                   ` Mathieu Desnoyers
2020-04-30 15:23         ` Mathieu Desnoyers
2020-04-30 16:12           ` Steven Rostedt
2020-04-30 16:11         ` Steven Rostedt
2020-04-30 16:16           ` Mathieu Desnoyers
2020-04-30 16:25             ` Steven Rostedt [this message]
2020-04-30 19:14           ` Joerg Roedel
2020-05-01  1:13             ` Steven Rostedt
2020-05-01  2:26               ` Mathieu Desnoyers
2020-05-01  2:39                 ` Steven Rostedt
2020-05-01 10:16                   ` Joerg Roedel
2020-05-01 13:35                   ` Mathieu Desnoyers
2020-05-04 15:12                   ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Joerg Roedel
2020-05-04 15:28                     ` Mathieu Desnoyers
2020-05-04 15:31                       ` Joerg Roedel
2020-05-04 15:38                         ` Mathieu Desnoyers
2020-05-04 15:51                           ` Joerg Roedel
2020-05-04 17:04                           ` Steven Rostedt
2020-05-04 17:40                     ` Steven Rostedt
2020-05-04 18:38                       ` Joerg Roedel
2020-05-04 19:10                         ` Steven Rostedt
2020-05-05 12:31                           ` [PATCH] tracing: Call vmalloc_sync_mappings() after alloc_percpu() Joerg Roedel
2020-05-06 15:17                             ` Steven Rostedt
2020-05-08 14:42                               ` Joerg Roedel
2020-05-04 20:25                     ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Peter Zijlstra
2020-05-04 20:43                       ` Steven Rostedt
2020-05-01  4:20                 ` [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-05-01 13:22                   ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200430122558.406c9755@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=jroedel@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=shile.zhang@linux.alibaba.com \
    --cc=tglx@linutronix.de \
    --cc=tz.stoyanov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.