* [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
@ 2014-07-16  8:58 Petr Mladek
  2014-07-16 16:43 ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2014-07-16  8:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Frederic Weisbecker, Paul E. McKenney, Jiri Kosina,
	linux-kernel, Petr Mladek

The trace/ring_buffer code allows swapping the entire ring buffer. Everything
has to be done locklessly. I think that I have found a race while trying to
understand the code. The problematic situation is the following:

CPU 1 (write/reserve event)		CPU 2 (swap the cpu buffer)
-------------------------------------------------------------------------
ring_buffer_write()
  if (cpu_buffer->record_disabled)
  ^^^ test fails and write continues

					ring_buffer_swap_cpu()

					  inc(&cpu_buffer_a->record_disabled);
					  inc(&cpu_buffer_b->record_disabled);

					  if (cpu_buffer_a->committing)
					  ^^^ test fails and swap continues
					  if (cpu_buffer_b->committing)
					  ^^^ test fails and swap continues

  rb_reserve_next_event()
    rb_start_commit()
      local_inc(&cpu_buffer->committing);
      local_inc(&cpu_buffer->commits);

    if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
    ^^^ test fails and reserve_next continues

					  buffer_a->buffers[cpu] = cpu_buffer_b;
					  buffer_b->buffers[cpu] = cpu_buffer_a;
					  cpu_buffer_b->buffer = buffer_a;
					  cpu_buffer_a->buffer = buffer_b;

  Pheeee, the reservation continues in the swapped-out buffer.

This patch solves the problem by doing the swap of a CPU buffer on that same
CPU. This avoids memory barriers and keeps both writing and swapping fast.

Also, it removes the now-obsolete check in rb_reserve_next_event(). The swap
operation is not allowed in NMI context and is done on the same CPU, so it
can never interrupt any write.

Finally, it adds some comments about why the reserve and swap operations are
safe. I hope that I have got it right :-)

Changes:
  + v3: removes the check in rb_reserve_next_event(); idea by Steven Rostedt

  + v2: does the swap on the same CPU instead of using memory barriers;
	idea by Steven Rostedt

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/trace/ring_buffer.c | 100 ++++++++++++++++++++++++++++++---------------
 1 file changed, 66 insertions(+), 34 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7c56c3d06943..2d5a316b0c06 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2527,21 +2527,6 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 
 	rb_start_commit(cpu_buffer);
 
-#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
-	/*
-	 * Due to the ability to swap a cpu buffer from a buffer
-	 * it is possible it was swapped before we committed.
-	 * (committing stops a swap). We check for it here and
-	 * if it happened, we have to fail the write.
-	 */
-	barrier();
-	if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
-		local_dec(&cpu_buffer->committing);
-		local_dec(&cpu_buffer->commits);
-		return NULL;
-	}
-#endif
-
 	length = rb_calculate_event_length(length);
  again:
 	add_timestamp = 0;
@@ -4280,23 +4265,35 @@ int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu)
 EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
 
 #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
+struct ring_buffer_swap_info {
+	struct ring_buffer *buffer_a;	/* One buffer to swap with */
+	struct ring_buffer *buffer_b;	/* The other buffer to swap with */
+	int ret;			/* Return value from the swap op. */
+};
+
 /**
- * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
- * @buffer_a: One buffer to swap with
- * @buffer_b: The other buffer to swap with
+ * ring_buffer_swap_this_cpu - swap the CPU buffer belonging to this CPU
+ *	between two ring buffers.
+ * @arg: arguments and the return value are passed via struct
+ *	ring_buffer_swap_info. The function is called via smp_call_function_single().
  *
- * This function is useful for tracers that want to take a "snapshot"
- * of a CPU buffer and has another back up buffer lying around.
- * it is expected that the tracer handles the cpu buffer not being
- * used at the moment.
+ * The swapping of a CPU buffer is done on the same CPU. It helps to avoid
+ * memory barriers and keep writing fast. We can't use synchronize_sched
+ * here because this function can be called in atomic context.
  */
-int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
-			 struct ring_buffer *buffer_b, int cpu)
+static void ring_buffer_swap_this_cpu(void *arg)
 {
+	struct ring_buffer_swap_info *rb_swap_info = arg;
+	struct ring_buffer *buffer_a;
+	struct ring_buffer *buffer_b;
 	struct ring_buffer_per_cpu *cpu_buffer_a;
 	struct ring_buffer_per_cpu *cpu_buffer_b;
-	int ret = -EINVAL;
+	int cpu = smp_processor_id();
+
+	buffer_a = rb_swap_info->buffer_a;
+	buffer_b = rb_swap_info->buffer_b;
 
+	rb_swap_info->ret = -EINVAL;
 	if (!cpumask_test_cpu(cpu, buffer_a->cpumask) ||
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
@@ -4308,7 +4305,8 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
 		goto out;
 
-	ret = -EAGAIN;
+	/* Also check if the buffers can be manipulated */
+	rb_swap_info->ret = -EAGAIN;
 
 	if (ring_buffer_flags != RB_BUFFERS_ON)
 		goto out;
@@ -4326,15 +4324,19 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 		goto out;
 
 	/*
-	 * We can't do a synchronize_sched here because this
-	 * function can be called in atomic context.
-	 * Normally this will be called from the same CPU as cpu.
-	 * If not it's up to the caller to protect this.
+	 * Recording has to be disabled. Otherwise, ring_buffer_lock_reserve()
+	 * and ring_buffer_unlock_commit() might operate with different
+	 * cpu buffers.
+	 *
+	 * All other operations are safe. Read gets the CPU buffer only once
+	 * and then works with it. Resize does the critical operation on the
+	 * same CPU with disabled interrupts.
 	 */
 	atomic_inc(&cpu_buffer_a->record_disabled);
 	atomic_inc(&cpu_buffer_b->record_disabled);
 
-	ret = -EBUSY;
+	/* Bail out if we interrupted some recording */
+	rb_swap_info->ret = -EBUSY;
 	if (local_read(&cpu_buffer_a->committing))
 		goto out_dec;
 	if (local_read(&cpu_buffer_b->committing))
@@ -4346,13 +4348,43 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	cpu_buffer_b->buffer = buffer_a;
 	cpu_buffer_a->buffer = buffer_b;
 
-	ret = 0;
-
+	rb_swap_info->ret = 0;
 out_dec:
 	atomic_dec(&cpu_buffer_a->record_disabled);
 	atomic_dec(&cpu_buffer_b->record_disabled);
 out:
-	return ret;
+	return;
+}
+
+/**
+ * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
+ * @buffer_a: One buffer to swap with
+ * @buffer_b: The other buffer to swap with
+ *
+ * This function is useful for tracers that want to take a "snapshot"
+ * of a CPU buffer and have another backup buffer lying around.
+ * It is expected that the tracer handles the cpu buffer not being
+ * used at the moment.
+ */
+int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
+			 struct ring_buffer *buffer_b, int cpu)
+{
+	struct ring_buffer_swap_info rb_swap_info = {
+		.buffer_a = buffer_a,
+		.buffer_b = buffer_b,
+	};
+	int ret;
+
+	/*
+	 * Swap the CPU buffer on the same CPU. Recording has to be fast
+	 * and this helps to avoid memory barriers.
+	 */
+	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
+				       (void *)&rb_swap_info, 1);
+	if (ret)
+		return ret;
+
+	return rb_swap_info.ret;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_swap_cpu);
 #endif /* CONFIG_RING_BUFFER_ALLOW_SWAP */
-- 
1.8.4



* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-16  8:58 [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel Petr Mladek
@ 2014-07-16 16:43 ` Steven Rostedt
  2014-07-18 15:34   ` Petr Mládek
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2014-07-16 16:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Ingo Molnar, Frederic Weisbecker, Paul E. McKenney, Jiri Kosina,
	linux-kernel

On Wed, 16 Jul 2014 10:58:04 +0200
Petr Mladek <pmladek@suse.cz> wrote:


> +/**
> + * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
> + * @buffer_a: One buffer to swap with
> + * @buffer_b: The other buffer to swap with
> + *
> + * This function is useful for tracers that want to take a "snapshot"
> + * of a CPU buffer and has another back up buffer lying around.
> + * It is expected that the tracer handles the cpu buffer not being
> + * used at the moment.
> + */
> +int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> +			 struct ring_buffer *buffer_b, int cpu)
> +{
> +	struct ring_buffer_swap_info rb_swap_info = {
> +		.buffer_a = buffer_a,
> +		.buffer_b = buffer_b,
> +	};
> +	int ret;
> +
> +	/*
> +	 * Swap the CPU buffer on the same CPU. Recording has to be fast
> +	 * and and this helps to avoid memory barriers.
> +	 */
> +	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> +				       (void *)&rb_swap_info, 1);
> +	if (ret)
> +		return ret;
> +
> +	return rb_swap_info.ret;

We need to check if the given cpu is the current CPU and, if so, just call
the function directly. Otherwise this can't be done from interrupt-disabled
context.

Enable IRQSOFF_TRACER and FTRACE_STARTUP_TEST to see why.

[    2.098008] Testing tracer irqsoff: 
[    2.351020] ------------[ cut here ]------------
[    2.352000] WARNING: CPU: 3 PID: 0 at /home/rostedt/work/git/linux-trace.git/kernel/smp.c:283 smp_call_function_single+0x66/0xad()
[    2.352000] Modules linked in:
[    2.352000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.16.0-rc3-test+ #93
[    2.352000] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[    2.352000]  0000000000000000 ffffffff810058cb ffffffff81503d20 0000000000000000
[    2.352000]  ffffffff81040ca1 0000000000000000 ffffffff8109712f 0000000000000001
[    2.352000]  0000000000000003 ffffffff810bf9db ffff88007a3c3e30 0000000000000001
[    2.352000] Call Trace:
[    2.352000]  [<ffffffff810058cb>] ? show_stack+0x5/0x3e
[    2.352000]  [<ffffffff81503d20>] ? dump_stack+0x4a/0x75
[    2.352000]  [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
[    2.352000]  [<ffffffff8109712f>] ? smp_call_function_single+0x66/0xad
[    2.352000]  [<ffffffff810bf9db>] ? ring_buffer_swap_cpu+0x45/0x45
[    2.352000]  [<ffffffff8109712f>] ? smp_call_function_single+0x66/0xad
[    2.352000]  [<ffffffff810970ce>] ? smp_call_function_single+0x5/0xad
[    2.352000]  [<ffffffff810bf9ce>] ? ring_buffer_swap_cpu+0x38/0x45
[    2.352000]  [<ffffffff810c6c64>] ? update_max_tr_single+0xb6/0x11f
[    2.352000]  [<ffffffff8100ae3e>] ? default_idle+0x1d/0x30
[    2.352000]  [<ffffffff810cd2ac>] ? stop_critical_timing+0x139/0x20d
[    2.352000]  [<ffffffff8100ae3e>] ? default_idle+0x1d/0x30
[    2.352000]  [<ffffffff81072d35>] ? cpu_startup_entry+0x115/0x220
[    2.352000]  [<ffffffff81027610>] ? start_secondary+0x21c/0x222
[    2.352000] ---[ end trace a712676a9f9a53a0 ]---


-- Steve


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-16 16:43 ` Steven Rostedt
@ 2014-07-18 15:34   ` Petr Mládek
  2014-07-21 14:43     ` Petr Mládek
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mládek @ 2014-07-18 15:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Frederic Weisbecker, Paul E. McKenney, Jiri Kosina,
	linux-kernel

On Wed 2014-07-16 12:43:56, Steven Rostedt wrote:
> On Wed, 16 Jul 2014 10:58:04 +0200
> Petr Mladek <pmladek@suse.cz> wrote:
> 
> 
> > +/**
> > + * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
> > + * @buffer_a: One buffer to swap with
> > + * @buffer_b: The other buffer to swap with
> > + *
> > + * This function is useful for tracers that want to take a "snapshot"
> > + * of a CPU buffer and has another back up buffer lying around.
> > + * It is expected that the tracer handles the cpu buffer not being
> > + * used at the moment.
> > + */
> > +int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> > +			 struct ring_buffer *buffer_b, int cpu)
> > +{
> > +	struct ring_buffer_swap_info rb_swap_info = {
> > +		.buffer_a = buffer_a,
> > +		.buffer_b = buffer_b,
> > +	};
> > +	int ret;
> > +
> > +	/*
> > +	 * Swap the CPU buffer on the same CPU. Recording has to be fast
> > +	 * and and this helps to avoid memory barriers.
> > +	 */
> > +	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > +				       (void *)&rb_swap_info, 1);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return rb_swap_info.ret;
> 
> We need to check if the cpu is on the current CPU and if so, just call
> the function directly. Otherwise this can't be done from interrupt
> disabled context.

I see, my testing was not good enough :-(

So, I tried to use:

	if (cpu == smp_processor_id())
		ring_buffer_swap_this_cpu(&rb_swap_info);
	else
		ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
					       (void *)&rb_swap_info, 1);

It solved the problem with IRQSOFF_TRACER and FTRACE_STARTUP_TEST
enabled, because there the swap was called from the same CPU.

But there is still a problem when the function is called from another
CPU. I managed to trigger it with:

     echo 1 >/sys/kernel/debug/tracing/per_cpu/cpu0/snapshot

It produces:

[ 1594.060650] ------------[ cut here ]------------
[ 1594.060664] WARNING: CPU: 3 PID: 1558 at kernel/smp.c:242 smp_call_function_single+0xa4/0xb0()
[ 1594.060666] Modules linked in:
[ 1594.060673] CPU: 3 PID: 1558 Comm: bash Not tainted 3.16.0-rc1-2-default+ #2404
[ 1594.060676] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
[ 1594.060679]  00000000000000f2 ffff880815b93db8 ffffffff818d34e6 ffff880815b93df8
[ 1594.060685]  ffffffff810cf28c ffff880813658150 0000000000000001 ffff880815b93e48
[ 1594.060691]  ffffffff8118b7e0 0000000000000000 0000000000000002 ffff880815b93e08
[ 1594.060696] Call Trace:
[ 1594.060705]  [<ffffffff818d34e6>] dump_stack+0x6a/0x7c
[ 1594.060713]  [<ffffffff810cf28c>] warn_slowpath_common+0x8c/0xc0
[ 1594.060720]  [<ffffffff8118b7e0>] ? ring_buffer_size+0x40/0x40
[ 1594.060725]  [<ffffffff810cf2da>] warn_slowpath_null+0x1a/0x20
[ 1594.060730]  [<ffffffff81149cc4>] smp_call_function_single+0xa4/0xb0
[ 1594.060735]  [<ffffffff8118c72f>] ring_buffer_swap_cpu+0x5f/0x70
[ 1594.060742]  [<ffffffff811981ea>] update_max_tr_single+0x8a/0x180
[ 1594.060747]  [<ffffffff8119843a>] tracing_snapshot_write+0x15a/0x1a0
[ 1594.060754]  [<ffffffff8123cf95>] vfs_write+0xd5/0x180
[ 1594.060759]  [<ffffffff8123d969>] SyS_write+0x59/0xc0
[ 1594.060766]  [<ffffffff818d8569>] system_call_fastpath+0x16/0x1b
[ 1594.060769] ---[ end trace 662a3aa81711f30e ]---


No clever idea comes to mind right now. Maybe Monday will bring some
fresh thinking.

I thought about using an IPI, but that is what smp_call_function_single()
does, and it warns about possible deadlocks. I am not sure if that is
because it is a generic function or if it is dangerous even in this
particular situation.

Have a nice weekend,
Petr


PS: I am sorry that it took me so much time to respond. I wanted to
have a free mind when looking into it.


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-18 15:34   ` Petr Mládek
@ 2014-07-21 14:43     ` Petr Mládek
  2014-07-21 15:43       ` Paul E. McKenney
  2014-07-21 16:07       ` Steven Rostedt
  0 siblings, 2 replies; 15+ messages in thread
From: Petr Mládek @ 2014-07-21 14:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Frederic Weisbecker, Paul E. McKenney, Jiri Kosina,
	linux-kernel

On Fri 2014-07-18 17:34:43, Petr Mládek wrote:
> On Wed 2014-07-16 12:43:56, Steven Rostedt wrote:
> > On Wed, 16 Jul 2014 10:58:04 +0200
> > Petr Mladek <pmladek@suse.cz> wrote:
> > 
> > 
> > > +/**
> > > + * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
> > > + * @buffer_a: One buffer to swap with
> > > + * @buffer_b: The other buffer to swap with
> > > + *
> > > + * This function is useful for tracers that want to take a "snapshot"
> > > + * of a CPU buffer and has another back up buffer lying around.
> > > + * It is expected that the tracer handles the cpu buffer not being
> > > + * used at the moment.
> > > + */
> > > +int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> > > +			 struct ring_buffer *buffer_b, int cpu)
> > > +{
> > > +	struct ring_buffer_swap_info rb_swap_info = {
> > > +		.buffer_a = buffer_a,
> > > +		.buffer_b = buffer_b,
> > > +	};
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * Swap the CPU buffer on the same CPU. Recording has to be fast
> > > +	 * and and this helps to avoid memory barriers.
> > > +	 */
> > > +	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > > +				       (void *)&rb_swap_info, 1);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	return rb_swap_info.ret;
> > 
> > We need to check if the cpu is on the current CPU and if so, just call
> > the function directly. Otherwise this can't be done from interrupt
> > disabled context.
> 
> I see, my testing was not good enough :-(
> 
> So, I tried to use:
> 
> 	if (cpu == smp_processor_id())
> 		ring_buffer_swap_this_cpu(&rb_swap_info);
> 	else
> 		ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> 					       (void *)&rb_swap_info, 1);
> 
> It solved the problem with enabled IRQSOFF_TRACER and
> FTRACE_STARTUP_TEST because there the swap was called from the same CPU.
> 
> But there is still the problem when the function is called from another
> CPU. I manage to trigger it by:
> 
>      echo 1 >/sys/kernel/debug/tracing/per_cpu/cpu0/snapshot
> 
> It produces:
> 
> [ 1594.060650] ------------[ cut here ]------------
> [ 1594.060664] WARNING: CPU: 3 PID: 1558 at kernel/smp.c:242 smp_call_function_single+0xa4/0xb0()
> [ 1594.060666] Modules linked in:
> [ 1594.060673] CPU: 3 PID: 1558 Comm: bash Not tainted 3.16.0-rc1-2-default+ #2404
> [ 1594.060676] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
> [ 1594.060679]  00000000000000f2 ffff880815b93db8 ffffffff818d34e6 ffff880815b93df8
> [ 1594.060685]  ffffffff810cf28c ffff880813658150 0000000000000001 ffff880815b93e48
> [ 1594.060691]  ffffffff8118b7e0 0000000000000000 0000000000000002 ffff880815b93e08
> [ 1594.060696] Call Trace:
> [ 1594.060705]  [<ffffffff818d34e6>] dump_stack+0x6a/0x7c
> [ 1594.060713]  [<ffffffff810cf28c>] warn_slowpath_common+0x8c/0xc0
> [ 1594.060720]  [<ffffffff8118b7e0>] ? ring_buffer_size+0x40/0x40
> [ 1594.060725]  [<ffffffff810cf2da>] warn_slowpath_null+0x1a/0x20
> [ 1594.060730]  [<ffffffff81149cc4>] smp_call_function_single+0xa4/0xb0
> [ 1594.060735]  [<ffffffff8118c72f>] ring_buffer_swap_cpu+0x5f/0x70
> [ 1594.060742]  [<ffffffff811981ea>] update_max_tr_single+0x8a/0x180
> [ 1594.060747]  [<ffffffff8119843a>] tracing_snapshot_write+0x15a/0x1a0
> [ 1594.060754]  [<ffffffff8123cf95>] vfs_write+0xd5/0x180
> [ 1594.060759]  [<ffffffff8123d969>] SyS_write+0x59/0xc0
> [ 1594.060766]  [<ffffffff818d8569>] system_call_fastpath+0x16/0x1b
> [ 1594.060769] ---[ end trace 662a3aa81711f30e ]---
> 
> 
> No clever idea comes to my mind now. Maybe Monday will bring some
> fresh thinking.
> 
> I think about using IPI but this is what smp_call_function_single()
> does and it warns about possible deadlocks. I am not sure if it is
> because it is a generic function or if it is dangerous even in this
> particular situation.

I have two more ideas but both are ugly :-(


1. I wonder if we really need to call ring_buffer_swap_cpu() with IRQs
   disabled. It is used "only" in update_max_tr_single().

   The disabled IRQs might be needed only inside __update_max_tr(),
   when we do something with the "current" task.

   Otherwise, update_max_tr_single() is already called with IRQs
   disabled from:

       + tracing_snapshot_write() - here the IRQs are disabled only to
		call the function update_max_tr_single().

       + check_critical_timing() - it looks to me like the IRQs could be
		enabled before calling update_max_tr_single()



2. Go back, do the swap on any CPU, and do memory barriers via IPI.

   I wonder if the needed memory barrier in rb_reserve_next_event()
   could be avoided by sending an IPI from ring_buffer_swap_cpu()
   (rough sketch below).

   I mean that rb_reserve_next_event() would keep the current check
   for a swapped ring buffer without barriers, but
   ring_buffer_swap_cpu() would interrupt the affected CPU and
   basically do the barrier there only when needed.

   But I am not sure how this is different from calling
   smp_call_function_single() from ring_buffer_swap_cpu().
   And I am back to the question of why it is dangerous with disabled
   interrupts. I can't find any clue in the git history, and I am
   missing this part of the picture :-(
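
   A rough sketch of that idea (hypothetical; it was never implemented in
   this thread, and the rb_* names are made up). Note that it still waits
   for an IPI to complete, which is exactly the smp_call_function_single()
   concern discussed in the replies:

	#include <linux/smp.h>

	static void rb_barrier_func(void *info)
	{
		smp_mb();	/* order the swap against in-flight writes */
	}

	static void rb_swap_and_ipi(struct ring_buffer *buffer_a,
				    struct ring_buffer *buffer_b, int cpu)
	{
		/* ... swap and publish the cpu_buffer pointers here ... */

		/*
		 * Interrupt the writer CPU so that any write racing with
		 * the swap either completes first or sees the new pointers.
		 */
		smp_call_function_single(cpu, rb_barrier_func, NULL, 1);
	}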


Any pointers or ideas are welcome.


Best Regards,
Petr


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 14:43     ` Petr Mládek
@ 2014-07-21 15:43       ` Paul E. McKenney
  2014-07-21 16:18         ` Petr Mládek
  2014-07-23 16:28         ` Frederic Weisbecker
  2014-07-21 16:07       ` Steven Rostedt
  1 sibling, 2 replies; 15+ messages in thread
From: Paul E. McKenney @ 2014-07-21 15:43 UTC (permalink / raw)
  To: Petr Mládek
  Cc: Steven Rostedt, Ingo Molnar, Frederic Weisbecker, Jiri Kosina,
	linux-kernel

On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> On Fri 2014-07-18 17:34:43, Petr Mládek wrote:
> > On Wed 2014-07-16 12:43:56, Steven Rostedt wrote:
> > > On Wed, 16 Jul 2014 10:58:04 +0200
> > > Petr Mladek <pmladek@suse.cz> wrote:
> > > 
> > > 
> > > > +/**
> > > > + * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
> > > > + * @buffer_a: One buffer to swap with
> > > > + * @buffer_b: The other buffer to swap with
> > > > + *
> > > > + * This function is useful for tracers that want to take a "snapshot"
> > > > + * of a CPU buffer and has another back up buffer lying around.
> > > > + * It is expected that the tracer handles the cpu buffer not being
> > > > + * used at the moment.
> > > > + */
> > > > +int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> > > > +			 struct ring_buffer *buffer_b, int cpu)
> > > > +{
> > > > +	struct ring_buffer_swap_info rb_swap_info = {
> > > > +		.buffer_a = buffer_a,
> > > > +		.buffer_b = buffer_b,
> > > > +	};
> > > > +	int ret;
> > > > +
> > > > +	/*
> > > > +	 * Swap the CPU buffer on the same CPU. Recording has to be fast
> > > > +	 * and and this helps to avoid memory barriers.
> > > > +	 */
> > > > +	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > > > +				       (void *)&rb_swap_info, 1);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	return rb_swap_info.ret;
> > > 
> > > We need to check if the cpu is on the current CPU and if so, just call
> > > the function directly. Otherwise this can't be done from interrupt
> > > disabled context.
> > 
> > I see, my testing was not good enough :-(
> > 
> > So, I tried to use:
> > 
> > 	if (cpu == smp_processor_id())
> > 		ring_buffer_swap_this_cpu(&rb_swap_info);
> > 	else
> > 		ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > 					       (void *)&rb_swap_info, 1);
> > 
> > It solved the problem with enabled IRQSOFF_TRACER and
> > FTRACE_STARTUP_TEST because there the swap was called from the same CPU.
> > 
> > But there is still the problem when the function is called from another
> > CPU. I manage to trigger it by:
> > 
> >      echo 1 >/sys/kernel/debug/tracing/per_cpu/cpu0/snapshot
> > 
> > It produces:
> > 
> > [ 1594.060650] ------------[ cut here ]------------
> > [ 1594.060664] WARNING: CPU: 3 PID: 1558 at kernel/smp.c:242 smp_call_function_single+0xa4/0xb0()
> > [ 1594.060666] Modules linked in:
> > [ 1594.060673] CPU: 3 PID: 1558 Comm: bash Not tainted 3.16.0-rc1-2-default+ #2404
> > [ 1594.060676] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
> > [ 1594.060679]  00000000000000f2 ffff880815b93db8 ffffffff818d34e6 ffff880815b93df8
> > [ 1594.060685]  ffffffff810cf28c ffff880813658150 0000000000000001 ffff880815b93e48
> > [ 1594.060691]  ffffffff8118b7e0 0000000000000000 0000000000000002 ffff880815b93e08
> > [ 1594.060696] Call Trace:
> > [ 1594.060705]  [<ffffffff818d34e6>] dump_stack+0x6a/0x7c
> > [ 1594.060713]  [<ffffffff810cf28c>] warn_slowpath_common+0x8c/0xc0
> > [ 1594.060720]  [<ffffffff8118b7e0>] ? ring_buffer_size+0x40/0x40
> > [ 1594.060725]  [<ffffffff810cf2da>] warn_slowpath_null+0x1a/0x20
> > [ 1594.060730]  [<ffffffff81149cc4>] smp_call_function_single+0xa4/0xb0
> > [ 1594.060735]  [<ffffffff8118c72f>] ring_buffer_swap_cpu+0x5f/0x70
> > [ 1594.060742]  [<ffffffff811981ea>] update_max_tr_single+0x8a/0x180
> > [ 1594.060747]  [<ffffffff8119843a>] tracing_snapshot_write+0x15a/0x1a0
> > [ 1594.060754]  [<ffffffff8123cf95>] vfs_write+0xd5/0x180
> > [ 1594.060759]  [<ffffffff8123d969>] SyS_write+0x59/0xc0
> > [ 1594.060766]  [<ffffffff818d8569>] system_call_fastpath+0x16/0x1b
> > [ 1594.060769] ---[ end trace 662a3aa81711f30e ]---
> > 
> > 
> > No clever idea comes to my mind now. Maybe Monday will bring some
> > fresh thinking.
> > 
> > I think about using IPI but this is what smp_call_function_single()
> > does and it warns about possible deadlocks. I am not sure if it is
> > because it is a generic function or if it is dangerous even in this
> > particular situation.
> 
> I have two more ideas but both are ugly :-(
> 
> 
> 1. I wonder if we really need to call ring_buffer_swap_cpu() with IRQs
>    disabled. It is used "only" in update_max_tr_single().
> 
>    The disabled IRQs might be needed only inside __update_max_tr()
>    when we do something with "current" task.
> 
>    Otherwise, update_max_tr_single() is already called with IRQs
>    disabled from:
> 
>        + tracing_snapshot_write() - here the IRQs are disabled only to
> 		call the function update_max_tr_single()/
> 
>        + check_critical_timing() - it looks to me the IRQs could get
> 		enabled before calling update_max_tr_single()
> 
> 
> 
> 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> 
>    I wonder if the needed memory barrier in rb_reserve_next_event()
>    could be avoided by calling IPI from ring_buffer_swap_cpu().
> 
>    I mean that rb_reserve_next_event() will include the current check
>    for swapped ring buffer without barriers. But
>    ring_buffer_swap_cpu() will interrupt the affected CPU and
>    basically do the barrier there only when needed.
> 
>    But I am not sure how this is different from calling
>    smp_call_function_single() from ring_buffer_swap_cpu().
>    And I am back on the question why it is dangerous with disabled
>    interrupts. I can't find any clue in git history. And I miss this
>    part of the picture :-(

IIRC, the deadlock is in the case where two CPUs attempt to invoke
smp_call_function_single() at each other, but both have interrupts
disabled.  It might be possible to avoid this by telling
smp_call_function_single() not to wait for a response, but this often
just re-introduces the deadlock at a higher level.
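
To make the failure mode concrete, here is a minimal sketch of that
scenario (an illustration only, not code from the thread); imagine both
CPUs run this aimed at each other:

	#include <linux/smp.h>

	static void noop_func(void *info)
	{
	}

	static void call_other_cpu(int target_cpu)
	{
		unsigned long flags;

		local_irq_save(flags);
		/*
		 * wait == 1: spin until target_cpu has run noop_func().
		 * If target_cpu is concurrently doing the same thing
		 * aimed back at us, neither CPU can service the other's
		 * IPI (both have IRQs off), so both spin forever.
		 */
		smp_call_function_single(target_cpu, noop_func, NULL, 1);
		local_irq_restore(flags);
	}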

> Any pointers or ideas are welcome.

Not immediately.  Mark Batty (mark.batty@cl.cam.ac.uk) has come up with
cute ring-buffer tricks in the past, but he would need a clear statement
of the problem.  I would be happy to bring him into the discussion if it
would help.

And yes, my knee-jerk reaction of suggesting RCU runs into the problem
that it is not so good to invoke synchronize_rcu() with interrupts
disabled.  Might be able to use call_rcu(), but if that worked, then
just telling smp_call_function_single() not to wait would probably
be a lot simpler.

							Thanx, Paul



* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 14:43     ` Petr Mládek
  2014-07-21 15:43       ` Paul E. McKenney
@ 2014-07-21 16:07       ` Steven Rostedt
  2014-07-22  9:41         ` Petr Mládek
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2014-07-21 16:07 UTC (permalink / raw)
  To: Petr Mládek
  Cc: Ingo Molnar, Frederic Weisbecker, Paul E. McKenney, Jiri Kosina,
	linux-kernel

On Mon, 21 Jul 2014 16:43:24 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> On Fri 2014-07-18 17:34:43, Petr Mládek wrote:
> > On Wed 2014-07-16 12:43:56, Steven Rostedt wrote:
> > > On Wed, 16 Jul 2014 10:58:04 +0200
> > > Petr Mladek <pmladek@suse.cz> wrote:
> > > 
> > > 
> > > > +/**
> > > > + * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
> > > > + * @buffer_a: One buffer to swap with
> > > > + * @buffer_b: The other buffer to swap with
> > > > + *
> > > > + * This function is useful for tracers that want to take a "snapshot"
> > > > + * of a CPU buffer and has another back up buffer lying around.
> > > > + * It is expected that the tracer handles the cpu buffer not being
> > > > + * used at the moment.
> > > > + */
> > > > +int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> > > > +			 struct ring_buffer *buffer_b, int cpu)
> > > > +{
> > > > +	struct ring_buffer_swap_info rb_swap_info = {
> > > > +		.buffer_a = buffer_a,
> > > > +		.buffer_b = buffer_b,
> > > > +	};
> > > > +	int ret;
> > > > +
> > > > +	/*
> > > > +	 * Swap the CPU buffer on the same CPU. Recording has to be fast
> > > > +	 * and and this helps to avoid memory barriers.
> > > > +	 */
> > > > +	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > > > +				       (void *)&rb_swap_info, 1);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	return rb_swap_info.ret;
> > > 
> > > We need to check if the cpu is on the current CPU and if so, just call
> > > the function directly. Otherwise this can't be done from interrupt
> > > disabled context.
> > 
> > I see, my testing was not good enough :-(
> > 
> > So, I tried to use:
> > 
> > 	if (cpu == smp_processor_id())
> > 		ring_buffer_swap_this_cpu(&rb_swap_info);
> > 	else
> > 		ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > 					       (void *)&rb_swap_info, 1);
> > 
> > It solved the problem with enabled IRQSOFF_TRACER and
> > FTRACE_STARTUP_TEST because there the swap was called from the same CPU.
> > 
> > But there is still the problem when the function is called from another
> > CPU. I manage to trigger it by:
> > 
> >      echo 1 >/sys/kernel/debug/tracing/per_cpu/cpu0/snapshot
> > 
> > It produces:
> > 
> > [ 1594.060650] ------------[ cut here ]------------
> > [ 1594.060664] WARNING: CPU: 3 PID: 1558 at kernel/smp.c:242 smp_call_function_single+0xa4/0xb0()
> > [ 1594.060666] Modules linked in:
> > [ 1594.060673] CPU: 3 PID: 1558 Comm: bash Not tainted 3.16.0-rc1-2-default+ #2404
> > [ 1594.060676] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
> > [ 1594.060679]  00000000000000f2 ffff880815b93db8 ffffffff818d34e6 ffff880815b93df8
> > [ 1594.060685]  ffffffff810cf28c ffff880813658150 0000000000000001 ffff880815b93e48
> > [ 1594.060691]  ffffffff8118b7e0 0000000000000000 0000000000000002 ffff880815b93e08
> > [ 1594.060696] Call Trace:
> > [ 1594.060705]  [<ffffffff818d34e6>] dump_stack+0x6a/0x7c
> > [ 1594.060713]  [<ffffffff810cf28c>] warn_slowpath_common+0x8c/0xc0
> > [ 1594.060720]  [<ffffffff8118b7e0>] ? ring_buffer_size+0x40/0x40
> > [ 1594.060725]  [<ffffffff810cf2da>] warn_slowpath_null+0x1a/0x20
> > [ 1594.060730]  [<ffffffff81149cc4>] smp_call_function_single+0xa4/0xb0
> > [ 1594.060735]  [<ffffffff8118c72f>] ring_buffer_swap_cpu+0x5f/0x70
> > [ 1594.060742]  [<ffffffff811981ea>] update_max_tr_single+0x8a/0x180
> > [ 1594.060747]  [<ffffffff8119843a>] tracing_snapshot_write+0x15a/0x1a0
> > [ 1594.060754]  [<ffffffff8123cf95>] vfs_write+0xd5/0x180
> > [ 1594.060759]  [<ffffffff8123d969>] SyS_write+0x59/0xc0
> > [ 1594.060766]  [<ffffffff818d8569>] system_call_fastpath+0x16/0x1b
> > [ 1594.060769] ---[ end trace 662a3aa81711f30e ]---
> > 
> > 
> > No clever idea comes to my mind now. Maybe Monday will bring some
> > fresh thinking.
> > 
> > I think about using IPI but this is what smp_call_function_single()
> > does and it warns about possible deadlocks. I am not sure if it is
> > because it is a generic function or if it is dangerous even in this
> > particular situation.
> 
> I have two more ideas but both are ugly :-(
> 
> 
> 1. I wonder if we really need to call ring_buffer_swap_cpu() with IRQs
>    disabled. It is used "only" in update_max_tr_single().
> 
>    The disabled IRQs might be needed only inside __update_max_tr()
>    when we do something with "current" task.
> 
>    Otherwise, update_max_tr_single() is already called with IRQs
>    disabled from:
> 
>        + tracing_snapshot_write() - here the IRQs are disabled only to
> 		call the function update_max_tr_single()/
> 
>        + check_critical_timing() - it looks to me the IRQs could get
> 		enabled before calling update_max_tr_single()
> 
> 
> 
> 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> 
>    I wonder if the needed memory barrier in rb_reserve_next_event()
>    could be avoided by calling IPI from ring_buffer_swap_cpu().
> 
>    I mean that rb_reserve_next_event() will include the current check
>    for swapped ring buffer without barriers. But
>    ring_buffer_swap_cpu() will interrupt the affected CPU and
>    basically do the barrier there only when needed.
> 
>    But I am not sure how this is different from calling
>    smp_call_function_single() from ring_buffer_swap_cpu().
>    And I am back on the question why it is dangerous with disabled
>    interrupts. I can't find any clue in git history. And I miss this
>    part of the picture :-(
> 
> 
> Any pointers or ideas are welcome.
> 


We could do:

	if (irqs_disabled()) {
		/* Only allowed to swap current CPU if irqs are disabled */
		if (WARN_ON_ONCE(cpu != smp_processor_id()))
			return;
		ring_buffer_swap_this_cpu();
	} else {
		smp_call_function_single(...);
	}

and then we need to modify tracing_snapshot_write() to do something
else besides calling update_max_tr_single().

We could modify the update_max_tr_*() functions to have both an
irqs-disabled and an irqs-enabled version.
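
For illustration, folding this into the v3 patch above might look like
the sketch below (the -EPERM return value and the exact structure are
guesses, not necessarily what the final patch should do):

	int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
				 struct ring_buffer *buffer_b, int cpu)
	{
		struct ring_buffer_swap_info rb_swap_info = {
			.buffer_a = buffer_a,
			.buffer_b = buffer_b,
		};
		int ret;

		if (irqs_disabled()) {
			/* With IRQs off, only the local buffer may be swapped. */
			if (WARN_ON_ONCE(cpu != smp_processor_id()))
				return -EPERM;
			ring_buffer_swap_this_cpu(&rb_swap_info);
		} else {
			ret = smp_call_function_single(cpu,
						       ring_buffer_swap_this_cpu,
						       &rb_swap_info, 1);
			if (ret)
				return ret;
		}

		return rb_swap_info.ret;
	}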

-- Steve


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 15:43       ` Paul E. McKenney
@ 2014-07-21 16:18         ` Petr Mládek
  2014-07-21 16:30           ` Steven Rostedt
  2014-07-23 16:28         ` Frederic Weisbecker
  1 sibling, 1 reply; 15+ messages in thread
From: Petr Mládek @ 2014-07-21 16:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Ingo Molnar, Frederic Weisbecker, Jiri Kosina,
	linux-kernel

On Mon 2014-07-21 08:43:17, Paul E. McKenney wrote:
> On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> 
> IIRC, deadlock in the case where two CPUs attempt to invoke
> smp_call_function_single() at each other, but both have
> interrupts disabled.  It might be possible to avoid this by telling
> smp_call_function_single() not to wait for a response, but this often
> just re-introduces the deadlock at a higher level.

I thought that IPIs used NMIs and thus could not be blocked if the
called function was reasonable. Note that ring_buffer_swap_cpu() does not
take any lock and can't block anywhere. I am probably too optimistic here.


> > Any pointers or ideas are welcome.
> 
> Not immediately.  Mark Batty (mark.batty@cl.cam.ac.uk) has come up with
> cute ring-buffer tricks in the past, but would need a clear statement of
> the problem.  I would be happy to bring him into the discussion if it
> would help.

In short: we have two operations, writing and swapping. They "block" each
other by setting the variables "committing" and "record_disabled".
It is not a real block; each operation becomes a nop while the other
one is in its critical section.

We want to keep writing fast and avoid memory barriers there. Writing
works with a per-CPU buffer, so it would help if the swap also happened
on the same CPU.

More detailed description of the current race can be found at
https://lkml.org/lkml/2014/7/16/178


> And yes, my knee-jerk reaction of suggesting RCU runs into the problem
> that it is not so good to invoke synchronize_rcu() with interrupts
> disabled.  Might be able to use call_rcu(), but if that worked, then
> just telling smp_call_function_single() not to wait would probably
> be a lot simpler.

I am still not sure if it really has to be called with IRQs disabled.

BTW: I have just got another idea. If we stored a pointer to the used
cpu_buffer in struct ring_buffer_event, it might be possible
to keep the write operation consistent even when the cpu buffers
are switched.


Best Regards,
Petr


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 16:18         ` Petr Mládek
@ 2014-07-21 16:30           ` Steven Rostedt
  2014-07-29  9:02             ` Jiri Kosina
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2014-07-21 16:30 UTC (permalink / raw)
  To: Petr Mládek
  Cc: Paul E. McKenney, Ingo Molnar, Frederic Weisbecker, Jiri Kosina,
	linux-kernel

On Mon, 21 Jul 2014 18:18:28 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> On Mon 2014-07-21 08:43:17, Paul E. McKenney wrote:
> > On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> > 
> > IIRC, deadlock in the case where two CPUs attempt to invoke
> > smp_call_function_single() at each other, but both have
> > interrupts disabled.  It might be possible to avoid this by telling
> > smp_call_function_single() not to wait for a response, but this often
> > just re-introduces the deadlock at a higher level.
> 
> I thought that IPI used NMI and thus could not be blocked if the
> called function was reasonable. Note that ring_buffer_swap_cpu() does not take
> any lock and can't block anywhere. I am probably too optimistic here.
> 

Heh, that would be a crazy system. No, IPI is a normal maskable
interrupt. It does not use NMIs. In fact, IPI is how irq_work is
implemented to do stuff from an NMI outside of NMI context.
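
For reference, a minimal irq_work usage sketch (the my_* names are made
up; only init_irq_work() and irq_work_queue() are the real API):

	#include <linux/irq_work.h>

	static void my_deferred_func(struct irq_work *work)
	{
		/* Runs later in hard-IRQ context, outside the NMI. */
	}

	static struct irq_work my_deferred_work;

	static void my_init(void)
	{
		init_irq_work(&my_deferred_work, my_deferred_func);
	}

	static void my_nmi_handler(void)
	{
		/* NMI-safe: just queues the work and raises a self-IPI. */
		irq_work_queue(&my_deferred_work);
	}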

> 
> > > Any pointers or ideas are welcome.
> > 
> > Not immediately.  Mark Batty (mark.batty@cl.cam.ac.uk) has come up with
> > cute ring-buffer tricks in the past, but would need a clear statement of
> > the problem.  I would be happy to bring him into the discussion if it
> > would help.
> 
> In short. We have two operations: writing and swap. They "block" each
> other by setting the variables "committing" and "record_disabled".
> It is not a real block. The other operation is "nop" when the other
> one is in the critical section.
> 
> We want to keep writing fast and avoid memory barriers there. Writing
> works with per-CPU buffer. It would help when also the swap happens
> on the same CPU.
> 
> More detailed description of the current race can be found at
> https://lkml.org/lkml/2014/7/16/178
> 
> 
> > And yes, my knee-jerk reaction of suggesting RCU runs into the problem
> > that it is not so good to invoke synchronize_rcu() with interrupts
> > disabled.  Might be able to use call_rcu(), but if that worked, then
> > just telling smp_call_function_single() not to wait would probably
> > be a lot simpler.
> 
> I am still not sure if it really has to be called with IRQs disabled.

Yes, it has to be. stop_critical_timings() is the core of the
interrupts-off latency tracer, and that code is heavily dependent on
interrupts being disabled (for the irqsoff part; preemption must be
off for the preemptoff part).


> 
> BTW: I have just got another idea. If we store pointer to the used
> cpu_buffer into struct ring_buffer_event, it might be possible
> to keep the write operation consistent even when the cpu buffers
> are switched.

Can't. The ring_buffer_event is an ABI and is mapped in the ring buffer
itself, which is exported to userspace.

-- Steve


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 16:07       ` Steven Rostedt
@ 2014-07-22  9:41         ` Petr Mládek
  0 siblings, 0 replies; 15+ messages in thread
From: Petr Mládek @ 2014-07-22  9:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Frederic Weisbecker, Paul E. McKenney, Jiri Kosina,
	linux-kernel

On Mon 2014-07-21 12:07:38, Steven Rostedt wrote:
> On Mon, 21 Jul 2014 16:43:24 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> > On Fri 2014-07-18 17:34:43, Petr Mládek wrote:
> > > On Wed 2014-07-16 12:43:56, Steven Rostedt wrote:
> > > > On Wed, 16 Jul 2014 10:58:04 +0200
> > > > Petr Mladek <pmladek@suse.cz> wrote:
> > > > 
> > > > 
> > > > > +/**
> > > > > + * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
> > > > > + * @buffer_a: One buffer to swap with
> > > > > + * @buffer_b: The other buffer to swap with
> > > > > + *
> > > > > + * This function is useful for tracers that want to take a "snapshot"
> > > > > + * of a CPU buffer and has another back up buffer lying around.
> > > > > + * It is expected that the tracer handles the cpu buffer not being
> > > > > + * used at the moment.
> > > > > + */
> > > > > +int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> > > > > +			 struct ring_buffer *buffer_b, int cpu)
> > > > > +{
> > > > > +	struct ring_buffer_swap_info rb_swap_info = {
> > > > > +		.buffer_a = buffer_a,
> > > > > +		.buffer_b = buffer_b,
> > > > > +	};
> > > > > +	int ret;
> > > > > +
> > > > > +	/*
> > > > > +	 * Swap the CPU buffer on the same CPU. Recording has to be fast
> > > > > +	 * and and this helps to avoid memory barriers.
> > > > > +	 */
> > > > > +	ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > > > > +				       (void *)&rb_swap_info, 1);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	return rb_swap_info.ret;
> > > > 
> > > > We need to check if the cpu is on the current CPU and if so, just call
> > > > the function directly. Otherwise this can't be done from interrupt
> > > > disabled context.
> > > 
> > > I see, my testing was not good enough :-(
> > > 
> > > So, I tried to use:
> > > 
> > > 	if (cpu == smp_processor_id())
> > > 		ring_buffer_swap_this_cpu(&rb_swap_info);
> > > 	else
> > > 		ret = smp_call_function_single(cpu, ring_buffer_swap_this_cpu,
> > > 					       (void *)&rb_swap_info, 1);
> > > 
> > > It solved the problem with enabled IRQSOFF_TRACER and
> > > FTRACE_STARTUP_TEST because there the swap was called from the same CPU.
> > > 
> > > But there is still the problem when the function is called from another
> > > CPU. I manage to trigger it by:
> > > 
> > >      echo 1 >/sys/kernel/debug/tracing/per_cpu/cpu0/snapshot
> > > 
> > > It produces:
> > > 
> > > [ 1594.060650] ------------[ cut here ]------------
> > > [ 1594.060664] WARNING: CPU: 3 PID: 1558 at kernel/smp.c:242 smp_call_function_single+0xa4/0xb0()
> > > [ 1594.060666] Modules linked in:
> > > [ 1594.060673] CPU: 3 PID: 1558 Comm: bash Not tainted 3.16.0-rc1-2-default+ #2404
> > > [ 1594.060676] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
> > > [ 1594.060679]  00000000000000f2 ffff880815b93db8 ffffffff818d34e6 ffff880815b93df8
> > > [ 1594.060685]  ffffffff810cf28c ffff880813658150 0000000000000001 ffff880815b93e48
> > > [ 1594.060691]  ffffffff8118b7e0 0000000000000000 0000000000000002 ffff880815b93e08
> > > [ 1594.060696] Call Trace:
> > > [ 1594.060705]  [<ffffffff818d34e6>] dump_stack+0x6a/0x7c
> > > [ 1594.060713]  [<ffffffff810cf28c>] warn_slowpath_common+0x8c/0xc0
> > > [ 1594.060720]  [<ffffffff8118b7e0>] ? ring_buffer_size+0x40/0x40
> > > [ 1594.060725]  [<ffffffff810cf2da>] warn_slowpath_null+0x1a/0x20
> > > [ 1594.060730]  [<ffffffff81149cc4>] smp_call_function_single+0xa4/0xb0
> > > [ 1594.060735]  [<ffffffff8118c72f>] ring_buffer_swap_cpu+0x5f/0x70
> > > [ 1594.060742]  [<ffffffff811981ea>] update_max_tr_single+0x8a/0x180
> > > [ 1594.060747]  [<ffffffff8119843a>] tracing_snapshot_write+0x15a/0x1a0
> > > [ 1594.060754]  [<ffffffff8123cf95>] vfs_write+0xd5/0x180
> > > [ 1594.060759]  [<ffffffff8123d969>] SyS_write+0x59/0xc0
> > > [ 1594.060766]  [<ffffffff818d8569>] system_call_fastpath+0x16/0x1b
> > > [ 1594.060769] ---[ end trace 662a3aa81711f30e ]---
> > > 
> > > 
> > > No clever idea comes to my mind now. Maybe Monday will bring some
> > > fresh thinking.
> > > 
> > > I think about using IPI but this is what smp_call_function_single()
> > > does and it warns about possible deadlocks. I am not sure if it is
> > > because it is a generic function or if it is dangerous even in this
> > > particular situation.
> > 
> > I have two more ideas but both are ugly :-(
> > 
> > 
> > 1. I wonder if we really need to call ring_buffer_swap_cpu() with IRQs
> >    disabled. It is used "only" in update_max_tr_single().
> > 
> >    The disabled IRQs might be needed only inside __update_max_tr()
> >    when we do something with "current" task.
> > 
> >    Otherwise, update_max_tr_single() is already called with IRQs
> >    disabled from:
> > 
> >        + tracing_snapshot_write() - here the IRQs are disabled only to
> > 		call the function update_max_tr_single()/
> > 
> >        + check_critical_timing() - it looks to me the IRQs could get
> > 		enabled before calling update_max_tr_single()
> > 
> > 
> > 
> > 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> > 
> >    I wonder if the needed memory barrier in rb_reserve_next_event()
> >    could be avoided by calling IPI from ring_buffer_swap_cpu().
> > 
> >    I mean that rb_reserve_next_event() will include the current check
> >    for swapped ring buffer without barriers. But
> >    ring_buffer_swap_cpu() will interrupt the affected CPU and
> >    basically do the barrier there only when needed.
> > 
> >    But I am not sure how this is different from calling
> >    smp_call_function_single() from ring_buffer_swap_cpu().
> >    And I am back on the question why it is dangerous with disabled
> >    interrupts. I can't find any clue in git history. And I miss this
> >    part of the picture :-(
> > 
> > 
> > Any pointers or ideas are welcome.
> > 
> 
> 
> We could do:
> 
> 	if (irqs_disabled()) {
> 		/* Only allowed to swap current CPU if irqs are disabled */
> 		if (WARN_ON_ONCE(cpu != smp_processor_id())
> 			return;
> 		ring_buffer_swap_this_cpu();
> 	} else {
> 		smp_call_function_single(...);
> 	}
> 
> and then we need to modify tracing_snapshot_write() to do something
> else besides call update_max_tr_single().
> 
> We could modify the update_max_tr_*() to have both a irqs disabled and
> a irqs enabled version.

I am going to look at this solution.

Thank you guys a lot for the explanations, hints, and patience.


Best Regards,
Petr


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 15:43       ` Paul E. McKenney
  2014-07-21 16:18         ` Petr Mládek
@ 2014-07-23 16:28         ` Frederic Weisbecker
  2014-07-23 16:34           ` Steven Rostedt
  1 sibling, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2014-07-23 16:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Petr Mládek, Steven Rostedt, Ingo Molnar, Jiri Kosina, linux-kernel

On Mon, Jul 21, 2014 at 08:43:17AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> > 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> > 
> >    I wonder if the needed memory barrier in rb_reserve_next_event()
> >    could be avoided by calling IPI from ring_buffer_swap_cpu().
> > 
> >    I mean that rb_reserve_next_event() will include the current check
> >    for swapped ring buffer without barriers. But
> >    ring_buffer_swap_cpu() will interrupt the affected CPU and
> >    basically do the barrier there only when needed.
> > 
> >    But I am not sure how this is different from calling
> >    smp_call_function_single() from ring_buffer_swap_cpu().
> >    And I am back on the question why it is dangerous with disabled
> >    interrupts. I can't find any clue in git history. And I miss this
> >    part of the picture :-(
> 
> IIRC, deadlock in the case where two CPUs attempt to invoke
> smp_call_function_single() at each other, but both have
> interrupts disabled.  It might be possible to avoid this by telling
> smp_call_function_single() not to wait for a response, but this often
> just re-introduces the deadlock at a higher level.

FWIW, this is what smp_call_function_single_async() does. But then the calls
must be synchronized so that no concurrent call happens until the IPI
completes.

Otherwise you also have irq_work_queue_on() (not yet upstream but in tip/timers/nohz
and tip/sched/core).
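
For reference, a sketch of the fire-and-forget variant (the rb_* names
are hypothetical). The call_single_data must stay valid and must not be
reused until the previous IPI has completed -- that is the
synchronization mentioned above:

	#include <linux/smp.h>

	static void rb_swap_func(void *info)
	{
		/* ... do the per-CPU work on the target CPU ... */
	}

	/* Static, not on-stack: it must outlive the IPI. */
	static struct call_single_data rb_swap_csd = {
		.func = rb_swap_func,
	};

	static int queue_swap_on(int cpu, void *info)
	{
		rb_swap_csd.info = info;
		/* Returns immediately; does not wait for rb_swap_func(). */
		return smp_call_function_single_async(cpu, &rb_swap_csd);
	}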


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-23 16:28         ` Frederic Weisbecker
@ 2014-07-23 16:34           ` Steven Rostedt
  2014-07-23 16:47             ` Frederic Weisbecker
  2014-07-23 16:49             ` Petr Mládek
  0 siblings, 2 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-07-23 16:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, Petr Mládek, Ingo Molnar, Jiri Kosina,
	linux-kernel

On Wed, 23 Jul 2014 18:28:48 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Mon, Jul 21, 2014 at 08:43:17AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> > > 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> > > 
> > >    I wonder if the needed memory barrier in rb_reserve_next_event()
> > >    could be avoided by calling IPI from ring_buffer_swap_cpu().
> > > 
> > >    I mean that rb_reserve_next_event() will include the current check
> > >    for swapped ring buffer without barriers. But
> > >    ring_buffer_swap_cpu() will interrupt the affected CPU and
> > >    basically do the barrier there only when needed.
> > > 
> > >    But I am not sure how this is different from calling
> > >    smp_call_function_single() from ring_buffer_swap_cpu().
> > >    And I am back on the question why it is dangerous with disabled
> > >    interrupts. I can't find any clue in git history. And I miss this
> > >    part of the picture :-(
> > 
> > IIRC, deadlock in the case where two CPUs attempt to invoke
> > smp_call_function_single() at each other, but both have
> > interrupts disabled.  It might be possible to avoid this by telling
> > smp_call_function_single() not to wait for a response, but this often
> > just re-introduces the deadlock at a higher level.
> 
> FWIW, this is what smp_call_function_single_async() does. But then the call
> must synchronized such that no concurrent call happen until the IPI completion.
> 
> Otherwise you also have irq_work_queue_on() (not yet upstream but in tip/timers/nohz
> and tip/sched/core).

Well, the code in question must wait for the IPI to finish, so, as
Paul said, we just push the issue to the caller.

-- Steve


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-23 16:34           ` Steven Rostedt
@ 2014-07-23 16:47             ` Frederic Weisbecker
  2014-07-23 16:49             ` Petr Mládek
  1 sibling, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2014-07-23 16:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Petr Mládek, Ingo Molnar, Jiri Kosina,
	linux-kernel

On Wed, Jul 23, 2014 at 12:34:58PM -0400, Steven Rostedt wrote:
> On Wed, 23 Jul 2014 18:28:48 +0200
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Mon, Jul 21, 2014 at 08:43:17AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> > > > 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> > > > 
> > > >    I wonder if the needed memory barrier in rb_reserve_next_event()
> > > >    could be avoided by calling IPI from ring_buffer_swap_cpu().
> > > > 
> > > >    I mean that rb_reserve_next_event() will include the current check
> > > >    for swapped ring buffer without barriers. But
> > > >    ring_buffer_swap_cpu() will interrupt the affected CPU and
> > > >    basically do the barrier there only when needed.
> > > > 
> > > >    But I am not sure how this is different from calling
> > > >    smp_call_function_single() from ring_buffer_swap_cpu().
> > > >    And I am back on the question why it is dangerous with disabled
> > > >    interrupts. I can't find any clue in git history. And I miss this
> > > >    part of the picture :-(
> > > 
> > > IIRC, deadlock in the case where two CPUs attempt to invoke
> > > smp_call_function_single() at each other, but both have
> > > interrupts disabled.  It might be possible to avoid this by telling
> > > smp_call_function_single() not to wait for a response, but this often
> > > just re-introduces the deadlock at a higher level.
> > 
> > FWIW, this is what smp_call_function_single_async() does. But then the call
> > must synchronized such that no concurrent call happen until the IPI completion.
> > 
> > Otherwise you also have irq_work_queue_on() (not yet upstream but in tip/timers/nohz
> > and tip/sched/core).
> 
> Well, the code in question must wait for the IPI to finish, thus as
> Paul said, we just push the issue to the caller.

Ah, right. If we need to wait for IPI completion with irqs disabled, I fear we can't.


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-23 16:34           ` Steven Rostedt
  2014-07-23 16:47             ` Frederic Weisbecker
@ 2014-07-23 16:49             ` Petr Mládek
  2014-07-23 16:55               ` Steven Rostedt
  1 sibling, 1 reply; 15+ messages in thread
From: Petr Mládek @ 2014-07-23 16:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Paul E. McKenney, Ingo Molnar, Jiri Kosina,
	linux-kernel

On Wed 2014-07-23 12:34:58, Steven Rostedt wrote:
> On Wed, 23 Jul 2014 18:28:48 +0200
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Mon, Jul 21, 2014 at 08:43:17AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 21, 2014 at 04:43:24PM +0200, Petr Mládek wrote:
> > > > 2. Go back, do the swap on any CPU, and do memory barriers via IPI.
> > > > 
> > > >    I wonder if the needed memory barrier in rb_reserve_next_event()
> > > >    could be avoided by calling IPI from ring_buffer_swap_cpu().
> > > > 
> > > >    I mean that rb_reserve_next_event() will include the current check
> > > >    for swapped ring buffer without barriers. But
> > > >    ring_buffer_swap_cpu() will interrupt the affected CPU and
> > > >    basically do the barrier there only when needed.
> > > > 
> > > >    But I am not sure how this is different from calling
> > > >    smp_call_function_single() from ring_buffer_swap_cpu().
> > > >    And I am back on the question why it is dangerous with disabled
> > > >    interrupts. I can't find any clue in git history. And I miss this
> > > >    part of the picture :-(
> > > 
> > > IIRC, deadlock in the case where two CPUs attempt to invoke
> > > smp_call_function_single() at each other, but both have
> > > interrupts disabled.  It might be possible to avoid this by telling
> > > smp_call_function_single() not to wait for a response, but this often
> > > just re-introduces the deadlock at a higher level.
> > 
> > FWIW, this is what smp_call_function_single_async() does. But then the call
> > must synchronized such that no concurrent call happen until the IPI completion.
> > 
> > Otherwise you also have irq_work_queue_on() (not yet upstream but in tip/timers/nohz
> > and tip/sched/core).
> 
> Well, the code in question must wait for the IPI to finish, thus as
> Paul said, we just push the issue to the caller.

JFYI, I already have a variant based on https://lkml.org/lkml/2014/7/21/416
It seems to work fine. I just want to double-check a few things before sending.

Best Regards,
Petr

PS: I am a bit distracted now because my wife is about to give birth
to our twins :-)


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-23 16:49             ` Petr Mládek
@ 2014-07-23 16:55               ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-07-23 16:55 UTC (permalink / raw)
  To: Petr Mládek
  Cc: Frederic Weisbecker, Paul E. McKenney, Ingo Molnar, Jiri Kosina,
	linux-kernel

On Wed, 23 Jul 2014 18:49:18 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> JFYI, I already have a variant based on https://lkml.org/lkml/2014/7/21/416
> It seems to work fine. I just want to double check few things before sending.

Looking forward to seeing it.

> 
> Best Regards,
> Petr
> 
> PS: I am a bit distracted now because my wife is about to give birth
> to our twins :-)

What? Doesn't the hospital have WIFI? ;-)

-- Steve


* Re: [PATCH v3] ring-buffer: Race when writing and swapping cpu buffer in parallel
  2014-07-21 16:30           ` Steven Rostedt
@ 2014-07-29  9:02             ` Jiri Kosina
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Kosina @ 2014-07-29  9:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mládek, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, linux-kernel

On Mon, 21 Jul 2014, Steven Rostedt wrote:

> > I thought that IPI used NMI and thus could not be blocked if the
> > called function was reasonable. Note that ring_buffer_swap_cpu() does not take
> > any lock and can't block anywhere. I am probably too optimistic here.
> 
> Heh, that would be a crazy system. No, IPI is a normal maskable
> interrupt. It does not use NMIs. In fact, IPI is how irq_work is
> implemented to do stuff from an NMI outside of NMI context.

Just for the sake of completeness -- on x86, it is possible to send an
NMI IPI by simply doing

	apic->send_IPI_mask(mask, NMI_VECTOR);

but obviously smp_call_function_*() are not using this.

-- 
Jiri Kosina
SUSE Labs
