* [PATCH] trace: Check for waiting readers before calling wakeup()
@ 2011-05-04  0:05 Vaibhav Nagarnaik
  2011-05-04 21:21 ` Vaibhav Nagarnaik
  2011-05-10 20:27 ` [PATCH v2] trace: Schedule a delayed work to call wakeup() Vaibhav Nagarnaik
  0 siblings, 2 replies; 10+ messages in thread
From: Vaibhav Nagarnaik @ 2011-05-04  0:05 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

When syscall tracing is used by concurrent processes, the wakeup() that is
called in the event commit function causes contention on the spin lock
of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
tracepoints, and by running getuid_microbench from autotest in parallel
I found that the contention causes an exponential latency increase in the
tracing path.

The autotest binary getuid_microbench calls getuid() in a tight loop for
the given number of iterations and measures the average time required to
complete a single syscall invocation.

The patch checks whether any readers are waiting on the waitqueue before
its lock is taken to issue a wakeup. This avoids the contention, and the
latency it causes, when there are no readers in the queue.

In the following example, the script enables the sys_enter_getuid and
sys_exit_getuid tracepoints and runs getuid_microbench in parallel
with the given number of processes. The output clearly shows the latency
increase caused by contention.

$ ~/getuid.sh 1
1000000 calls in 0.720974253 s (720.974253 ns/call)

$ ~/getuid.sh 2
1000000 calls in 1.166457554 s (1166.457554 ns/call)
1000000 calls in 1.168933765 s (1168.933765 ns/call)

$ ~/getuid.sh 3
1000000 calls in 1.783827516 s (1783.827516 ns/call)
1000000 calls in 1.795553270 s (1795.553270 ns/call)
1000000 calls in 1.796493376 s (1796.493376 ns/call)

$ ~/getuid.sh 4
1000000 calls in 4.483041796 s (4483.041796 ns/call)
1000000 calls in 4.484165388 s (4484.165388 ns/call)
1000000 calls in 4.484850762 s (4484.850762 ns/call)
1000000 calls in 4.485643576 s (4485.643576 ns/call)

$ ~/getuid.sh 5
1000000 calls in 6.497521653 s (6497.521653 ns/call)
1000000 calls in 6.502000236 s (6502.000236 ns/call)
1000000 calls in 6.501709115 s (6501.709115 ns/call)
1000000 calls in 6.502124100 s (6502.124100 ns/call)
1000000 calls in 6.502936358 s (6502.936358 ns/call)

After the patch, the latencies scale well and are exactly as expected.
$ ~/getuid.sh 1
1000000 calls in 0.626811320 s (626.811320 ns/call)

$ ~/getuid.sh 2
1000000 calls in 0.616566144 s (616.566144 ns/call)
1000000 calls in 0.622712754 s (622.712754 ns/call)

$ ~/getuid.sh 3
1000000 calls in 0.602886676 s (602.886676 ns/call)
1000000 calls in 0.612168691 s (612.168691 ns/call)
1000000 calls in 0.623516009 s (623.516009 ns/call)

$ ~/getuid.sh 4
1000000 calls in 0.614037455 s (614.037455 ns/call)
1000000 calls in 0.615070842 s (615.070842 ns/call)
1000000 calls in 0.615051941 s (615.051941 ns/call)
1000000 calls in 0.621556902 s (621.556902 ns/call)

$ ~/getuid.sh 5
1000000 calls in 0.605297117 s (605.297117 ns/call)
1000000 calls in 0.609545095 s (609.545095 ns/call)
1000000 calls in 0.610305369 s (610.305369 ns/call)
1000000 calls in 0.614056014 s (614.056014 ns/call)
1000000 calls in 0.622048279 s (622.048279 ns/call)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/trace/trace.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 06f4458..e03ee24 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -353,7 +353,7 @@ void trace_wake_up(void)
 {
 	int cpu;
 
-	if (trace_flags & TRACE_ITER_BLOCK)
+	if (trace_flags & TRACE_ITER_BLOCK || !waitqueue_active(&trace_wait))
 		return;
 	/*
 	 * The runqueue_is_locked() can fail, but this is the best we
-- 
1.7.3.1



* Re: [PATCH] trace: Check for waiting readers before calling wakeup()
  2011-05-04  0:05 [PATCH] trace: Check for waiting readers before calling wakeup() Vaibhav Nagarnaik
@ 2011-05-04 21:21 ` Vaibhav Nagarnaik
  2011-05-04 21:26   ` Frederic Weisbecker
  2011-05-10 20:27 ` [PATCH v2] trace: Schedule a delayed work to call wakeup() Vaibhav Nagarnaik
  1 sibling, 1 reply; 10+ messages in thread
From: Vaibhav Nagarnaik @ 2011-05-04 21:21 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

Steven, Frederic

Does this patch look ok?



Vaibhav Nagarnaik



On Tue, May 3, 2011 at 5:05 PM, Vaibhav Nagarnaik <vnagarnaik@google.com> wrote:
> In using syscall tracing by concurrent processes, the wakeup() that is
> called in the event commit function causes contention on the spin lock
> of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid
> tracepoints, and by running getuid_microbench from autotest in parallel
> I found that the contention causes exponential latency increase in the
> tracing path.
>
> The autotest binary getuid_microbench calls getuid() in a tight loop for
> the given number of iterations and measures the average time required to
> complete a single invocation of syscall.
>
> The patch checks if there are any readers waiting in the waitqueue
> before the queue is locked. This reduces any contention and the
> latencies associated with it when there are no readers in the queue.
>
> In the following example, the script enables the sys_enter_getuid and
> sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
> with the given number of processes. The output clearly shows the latency
> increase caused by contentions.
>
> $ ~/getuid.sh 1
> 1000000 calls in 0.720974253 s (720.974253 ns/call)
>
> $ ~/getuid.sh 2
> 1000000 calls in 1.166457554 s (1166.457554 ns/call)
> 1000000 calls in 1.168933765 s (1168.933765 ns/call)
>
> $ ~/getuid.sh 3
> 1000000 calls in 1.783827516 s (1783.827516 ns/call)
> 1000000 calls in 1.795553270 s (1795.553270 ns/call)
> 1000000 calls in 1.796493376 s (1796.493376 ns/call)
>
> $ ~/getuid.sh 4
> 1000000 calls in 4.483041796 s (4483.041796 ns/call)
> 1000000 calls in 4.484165388 s (4484.165388 ns/call)
> 1000000 calls in 4.484850762 s (4484.850762 ns/call)
> 1000000 calls in 4.485643576 s (4485.643576 ns/call)
>
> $ ~/getuid.sh 5
> 1000000 calls in 6.497521653 s (6497.521653 ns/call)
> 1000000 calls in 6.502000236 s (6502.000236 ns/call)
> 1000000 calls in 6.501709115 s (6501.709115 ns/call)
> 1000000 calls in 6.502124100 s (6502.124100 ns/call)
> 1000000 calls in 6.502936358 s (6502.936358 ns/call)
>
> After the patch, the latencies scale well and are exactly as expected.
> $ ~/getuid.sh 1
> 1000000 calls in 0.626811320 s (626.811320 ns/call)
>
> $ ~/getuid.sh 2
> 1000000 calls in 0.616566144 s (616.566144 ns/call)
> 1000000 calls in 0.622712754 s (622.712754 ns/call)
>
> $ ~/getuid.sh 3
> 1000000 calls in 0.602886676 s (602.886676 ns/call)
> 1000000 calls in 0.612168691 s (612.168691 ns/call)
> 1000000 calls in 0.623516009 s (623.516009 ns/call)
>
> $ ~/getuid.sh 4
> 1000000 calls in 0.614037455 s (614.037455 ns/call)
> 1000000 calls in 0.615070842 s (615.070842 ns/call)
> 1000000 calls in 0.615051941 s (615.051941 ns/call)
> 1000000 calls in 0.621556902 s (621.556902 ns/call)
>
> $ ~/getuid.sh 5
> 1000000 calls in 0.605297117 s (605.297117 ns/call)
> 1000000 calls in 0.609545095 s (609.545095 ns/call)
> 1000000 calls in 0.610305369 s (610.305369 ns/call)
> 1000000 calls in 0.614056014 s (614.056014 ns/call)
> 1000000 calls in 0.622048279 s (622.048279 ns/call)
>
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  kernel/trace/trace.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 06f4458..e03ee24 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -353,7 +353,7 @@ void trace_wake_up(void)
>  {
>        int cpu;
>
> -       if (trace_flags & TRACE_ITER_BLOCK)
> +       if (trace_flags & TRACE_ITER_BLOCK || !waitqueue_active(&trace_wait))
>                return;
>        /*
>         * The runqueue_is_locked() can fail, but this is the best we
> --
> 1.7.3.1
>
>


* Re: [PATCH] trace: Check for waiting readers before calling wakeup()
  2011-05-04 21:21 ` Vaibhav Nagarnaik
@ 2011-05-04 21:26   ` Frederic Weisbecker
  2011-05-05  3:53     ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2011-05-04 21:26 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Steven Rostedt, Ingo Molnar, Michael Rubin, David Sharp, linux-kernel

On Wed, May 04, 2011 at 02:21:28PM -0700, Vaibhav Nagarnaik wrote:
> Steven, Frederic
> 
> Does this patch look ok?
> 
> 
> 
> Vaibhav Nagarnaik
> 

No problem for me. I'll let Steve give the last word.


* Re: [PATCH] trace: Check for waiting readers before calling wakeup()
  2011-05-04 21:26   ` Frederic Weisbecker
@ 2011-05-05  3:53     ` Linus Torvalds
  2011-05-05  6:05       ` Vaibhav Nagarnaik
  2011-05-05 12:39       ` Steven Rostedt
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2011-05-05  3:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Vaibhav Nagarnaik, Steven Rostedt, Ingo Molnar, Michael Rubin,
	David Sharp, linux-kernel

On Wed, May 4, 2011 at 2:26 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Wed, May 04, 2011 at 02:21:28PM -0700, Vaibhav Nagarnaik wrote:
>> Steven, Frederic
>>
>> Does this patch look ok?
>>
>> Vaibhav Nagarnaik
>>
>
> No problem for me. I'll let Steve give the last word.

Almost every time somebody does this optimization, there's a race.

The thing is, because waitqueue_active() implies no barriers, if
somebody is adding itself to the waitqueue at just the same time as
somebody else is waking things up, the waiter may lose wakeup events -
there are no memory barriers to guarantee that either the waker will
see the waiter, or the waiter will see the event that it waited for.

I dunno. It's sometimes safe, but it's equally often not safe unless
there is some other locking around the thing. So I'd be a bit nervous.
The tracer people would need to do some serious thinking about it.

                   Linus
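
For illustration, a minimal sketch of the lost-wakeup window described
above. The waker/waiter pair and the buffer_has_data flag are hypothetical
names for this sketch, not the actual tracing code:

static int buffer_has_data;		/* stands in for "an event was committed" */

static void waker(void)			/* e.g. the event commit path */
{
	buffer_has_data = 1;			/* publish the event             */
	if (waitqueue_active(&trace_wait))	/* no barrier: may read an empty */
		wake_up(&trace_wait);		/* queue and skip the wakeup     */
}

static void waiter(void)		/* e.g. a reader about to block */
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&trace_wait, &wait, TASK_INTERRUPTIBLE);
	if (!buffer_has_data)			/* may not yet see the waker's */
		schedule();			/* store, and sleep with no    */
	finish_wait(&trace_wait, &wait);	/* wakeup coming               */
}

Nothing guarantees that at least one of the two sides sees the other's
update, so both checks can miss and the reader sleeps until the next
event happens to arrive.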


* Re: [PATCH] trace: Check for waiting readers before calling wakeup()
  2011-05-05  3:53     ` Linus Torvalds
@ 2011-05-05  6:05       ` Vaibhav Nagarnaik
  2011-05-05 12:39       ` Steven Rostedt
  1 sibling, 0 replies; 10+ messages in thread
From: Vaibhav Nagarnaik @ 2011-05-05  6:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Steven Rostedt, Ingo Molnar, Michael Rubin,
	David Sharp, linux-kernel

On Wed, May 4, 2011 at 8:53 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, May 4, 2011 at 2:26 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> On Wed, May 04, 2011 at 02:21:28PM -0700, Vaibhav Nagarnaik wrote:
>>> Steven, Frederic
>>>
>>> Does this patch look ok?
>>>
>>> Vaibhav Nagarnaik
>>>
>>
>> No problem for me. I'll let Steve give the last word.
>
> Almost every time somebody does this optimization, there's a race.
>
> The thing is, because waitqueue_active() implies no barriers, if there
> is somebody adding itself to the waitqueue just at the same time as
> somebody is waking things up, the waiter may lose wakeup events -
> there is no memory barriers to guarantee that the waker will see
> either the waiter, or the waiter will see the event that it waited
> for.
>
> I dunno. It's sometimes safe, but it's equally often not safe unless
> there is some other locking around the thing. So I'd be a bit nervous.
> The tracer people would need to do some serious thinking about it.
>
>                   Linus
>

Thanks for looking at the patch and weighing in.

The problem I was trying to solve here is the contention that occurs when
trying to wake up a process. The process calling wakeup() has to take a
spinlock. When concurrent processes are being traced, they contend for
the spinlock and cause large tracing latency. Adding safe locking around
this code would defeat the purpose of the patch.

One could argue that if the trace reader misses a wakeup() event due to
this race, it will get one the next time the tracepoint is hit.

Also, it seems the current tracers cannot rely on polling the trace
file. The wakeup() is called only when syscalls get traced. If any
other events are traced, the trace logging function (in
include/linux/syscalls.h) does not call wakeup(). This was done because
of a hard lockup in the kernel when sched events were being traced.

I am new to this area, so Frederic and/or Steven could address your
concerns better.



Vaibhav Nagarnaik


* Re: [PATCH] trace: Check for waiting readers before calling wakeup()
  2011-05-05  3:53     ` Linus Torvalds
  2011-05-05  6:05       ` Vaibhav Nagarnaik
@ 2011-05-05 12:39       ` Steven Rostedt
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2011-05-05 12:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Vaibhav Nagarnaik, Ingo Molnar,
	Michael Rubin, David Sharp, linux-kernel

On Wed, 2011-05-04 at 20:53 -0700, Linus Torvalds wrote:

> I dunno. It's sometimes safe, but it's equally often not safe unless
> there is some other locking around the thing. So I'd be a bit nervous.
> The tracer people would need to do some serious thinking about it.

The problem with tracing is that it can be very intrusive. We want the
data, but we do not want to affect the system either. We are constantly
fighting with Heisenberg on this.

Perhaps we could do what perf currently does and use irq_work() instead.
This should be lighter weight. The wakeup should not happen at the tracing
event itself, as the wakeup is too invasive.

Maybe we could use bitmasks in this case, but I would also hate to use
memory barriers here, although a memory barrier might be much lighter
weight than the spinlocks we are currently fighting with.

If irq_work() doesn't work, perhaps just have something like:

event() {
	write_data();
	smp_mb();
	if (test_and_clear_bit())
		wakeup();
}

reader() {
	set_bit();	
	smp_mb();
	if (data_empty())
		schedule();
	clear_bit();
	smp_mb();
	wake_up_other_readers_if_needed()
}

This is probably better than a spinlock.

-- Steve
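
For reference, the usual way to make a waitqueue_active() check safe is to
pair a full barrier on the waker side with the barrier that
prepare_to_wait()/set_current_state() already provides on the reader side.
A rough sketch only, with hypothetical names (buffer_has_data, reader_wait);
it is not a drop-in replacement for the code in trace.c:

/* Commit path: the smp_mb() orders the data publish against the
 * waitqueue_active() test, pairing with set_current_state() inside
 * prepare_to_wait() on the reader side. */
static void trace_wake_up_checked(void)
{
	buffer_has_data = 1;		/* stands in for the ring-buffer commit */
	smp_mb();
	if (waitqueue_active(&trace_wait))
		wake_up(&trace_wait);
}

/* Reader: gets on the queue and sets its task state (a full barrier)
 * before re-checking the condition, so a concurrent waker either sees it
 * on the queue or the reader sees the new data. */
static void reader_wait(void)
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&trace_wait, &wait, TASK_INTERRUPTIBLE);
	if (!buffer_has_data)
		schedule();
	finish_wait(&trace_wait, &wait);
}

The cost in the commit path then becomes a single smp_mb() instead of an
unconditional spinlock, which is the trade-off being weighed here.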




* [PATCH v2] trace: Schedule a delayed work to call wakeup()
  2011-05-04  0:05 [PATCH] trace: Check for waiting readers before calling wakeup() Vaibhav Nagarnaik
  2011-05-04 21:21 ` Vaibhav Nagarnaik
@ 2011-05-10 20:27 ` Vaibhav Nagarnaik
  2011-05-17 18:01   ` Steven Rostedt
  2011-07-05 12:48   ` [tip:perf/core] tracing: " tip-bot for Vaibhav Nagarnaik
  1 sibling, 2 replies; 10+ messages in thread
From: Vaibhav Nagarnaik @ 2011-05-10 20:27 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

When syscall tracing is used by concurrent processes, the wakeup() that is
called in the event commit function causes contention on the spin lock
of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
tracepoints, and by running getuid_microbench from autotest in parallel
I found that the contention causes an exponential latency increase in the
tracing path.

The autotest binary getuid_microbench calls getuid() in a tight loop for
the given number of iterations and measures the average time required to
complete a single syscall invocation.

The patch schedules a delayed work, 2 ms after an event commit requests a
wakeup, to wake up readers on the trace wait_queue. This removes the delay
caused by contention on the waitqueue spin lock in wakeup() and amortizes
the wakeup() calls over the 2 ms period.

In the following example, the script enables the sys_enter_getuid and
sys_exit_getuid tracepoints and runs getuid_microbench in parallel
with the given number of processes. The output clearly shows the latency
increase caused by contention.

$ ~/getuid.sh 1
1000000 calls in 0.720974253 s (720.974253 ns/call)

$ ~/getuid.sh 2
1000000 calls in 1.166457554 s (1166.457554 ns/call)
1000000 calls in 1.168933765 s (1168.933765 ns/call)

$ ~/getuid.sh 3
1000000 calls in 1.783827516 s (1783.827516 ns/call)
1000000 calls in 1.795553270 s (1795.553270 ns/call)
1000000 calls in 1.796493376 s (1796.493376 ns/call)

$ ~/getuid.sh 4
1000000 calls in 4.483041796 s (4483.041796 ns/call)
1000000 calls in 4.484165388 s (4484.165388 ns/call)
1000000 calls in 4.484850762 s (4484.850762 ns/call)
1000000 calls in 4.485643576 s (4485.643576 ns/call)

$ ~/getuid.sh 5
1000000 calls in 6.497521653 s (6497.521653 ns/call)
1000000 calls in 6.502000236 s (6502.000236 ns/call)
1000000 calls in 6.501709115 s (6501.709115 ns/call)
1000000 calls in 6.502124100 s (6502.124100 ns/call)
1000000 calls in 6.502936358 s (6502.936358 ns/call)

After the patch, the latencies scale better.
1000000 calls in 0.728720455 s (728.720455 ns/call)

1000000 calls in 0.842782857 s (842.782857 ns/call)
1000000 calls in 0.883803135 s (883.803135 ns/call)

1000000 calls in 0.902077764 s (902.077764 ns/call)
1000000 calls in 0.902838202 s (902.838202 ns/call)
1000000 calls in 0.908896885 s (908.896885 ns/call)

1000000 calls in 0.932523515 s (932.523515 ns/call)
1000000 calls in 0.958009672 s (958.009672 ns/call)
1000000 calls in 0.986188020 s (986.188020 ns/call)
1000000 calls in 0.989771102 s (989.771102 ns/call)

1000000 calls in 0.933518391 s (933.518391 ns/call)
1000000 calls in 0.958897947 s (958.897947 ns/call)
1000000 calls in 1.031038897 s (1031.038897 ns/call)
1000000 calls in 1.089516025 s (1089.516025 ns/call)
1000000 calls in 1.141998347 s (1141.998347 ns/call)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 51e4e4d..c72d150 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -343,26 +343,28 @@ unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 static int trace_stop_count;
 static DEFINE_SPINLOCK(tracing_start_lock);
 
+static atomic_t scheduled = {0};
+static void wakeup_work_handler(struct work_struct *work)
+{
+	wake_up(&trace_wait);
+}
+
+static DECLARE_DELAYED_WORK(wakeup_work, wakeup_work_handler);
+
 /**
  * trace_wake_up - wake up tasks waiting for trace input
  *
- * Simply wakes up any task that is blocked on the trace_wait
- * queue. These is used with trace_poll for tasks polling the trace.
+ * Schedules a delayed work to wake up any task that is blocked on the
+ * trace_wait queue. These is used with trace_poll for tasks polling the
+ * trace.
  */
 void trace_wake_up(void)
 {
-	int cpu;
+	const unsigned long delay = msecs_to_jiffies(2); //delay 2 ms
 
 	if (trace_flags & TRACE_ITER_BLOCK)
 		return;
-	/*
-	 * The runqueue_is_locked() can fail, but this is the best we
-	 * have for now:
-	 */
-	cpu = get_cpu();
-	if (!runqueue_is_locked(cpu))
-		wake_up(&trace_wait);
-	put_cpu();
+	schedule_delayed_work(&wakeup_work, delay);
 }
 
 static int __init set_buf_size(char *str)
-- 
1.7.3.1



* Re: [PATCH v2] trace: Schedule a delayed work to call wakeup()
  2011-05-10 20:27 ` [PATCH v2] trace: Schedule a delayed work to call wakeup() Vaibhav Nagarnaik
@ 2011-05-17 18:01   ` Steven Rostedt
  2011-05-17 20:06     ` Vaibhav Nagarnaik
  2011-07-05 12:48   ` [tip:perf/core] tracing: " tip-bot for Vaibhav Nagarnaik
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2011-05-17 18:01 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, 2011-05-10 at 13:27 -0700, Vaibhav Nagarnaik wrote:
> In using syscall tracing by concurrent processes, the wakeup() that is
> called in the event commit function causes contention on the spin lock
> of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid
> tracepoints, and by running getuid_microbench from autotest in parallel
> I found that the contention causes exponential latency increase in the
> tracing path.
> 
> The autotest binary getuid_microbench calls getuid() in a tight loop for
> the given number of iterations and measures the average time required to
> complete a single invocation of syscall.
> 
> The patch schedules a delayed work after 2 ms once an event commit calls
> to wake up the trace wait_queue. This removes the delay caused by
> contention on spin lock in wakeup() and amortizes the wakeup() calls
> scheduled over the 2 ms period.
> 
> In the following example, the script enables the sys_enter_getuid and
> sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
> with the given number of processes. The output clearly shows the latency
> increase caused by contentions.
> 

Have you taken a look at irq_work() instead of delayed work? This would
probably allow us to call the wakeup version for all events.

See kernel/perf_events.c for details.

-- Steve
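
A rough sketch of the irq_work variant being suggested here, assuming the
2011-era irq_work API (init_irq_work()/irq_work_queue()); the names
trace_wakeup_work and trace_wake_up_deferred are hypothetical:

static void trace_wakeup_irq_work(struct irq_work *work)
{
	wake_up(&trace_wait);		/* runs later, in hard irq context */
}

static struct irq_work trace_wakeup_work;	/* init_irq_work() at tracer setup */

/* Commit path: queue the wakeup instead of taking the waitqueue lock here.
 * irq_work_queue() only claims a slot on a lockless per-cpu list and raises
 * a self-interrupt; the wake_up() happens in the handler above. */
static void trace_wake_up_deferred(void)
{
	if (trace_flags & TRACE_ITER_BLOCK)
		return;
	irq_work_queue(&trace_wakeup_work);
}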

> $ ~/getuid.sh 1
> 1000000 calls in 0.720974253 s (720.974253 ns/call)
> 
> $ ~/getuid.sh 2
> 1000000 calls in 1.166457554 s (1166.457554 ns/call)
> 1000000 calls in 1.168933765 s (1168.933765 ns/call)
> 
> $ ~/getuid.sh 3
> 1000000 calls in 1.783827516 s (1783.827516 ns/call)
> 1000000 calls in 1.795553270 s (1795.553270 ns/call)
> 1000000 calls in 1.796493376 s (1796.493376 ns/call)
> 
> $ ~/getuid.sh 4
> 1000000 calls in 4.483041796 s (4483.041796 ns/call)
> 1000000 calls in 4.484165388 s (4484.165388 ns/call)
> 1000000 calls in 4.484850762 s (4484.850762 ns/call)
> 1000000 calls in 4.485643576 s (4485.643576 ns/call)
> 
> $ ~/getuid.sh 5
> 1000000 calls in 6.497521653 s (6497.521653 ns/call)
> 1000000 calls in 6.502000236 s (6502.000236 ns/call)
> 1000000 calls in 6.501709115 s (6501.709115 ns/call)
> 1000000 calls in 6.502124100 s (6502.124100 ns/call)
> 1000000 calls in 6.502936358 s (6502.936358 ns/call)
> 
> After the patch, the latencies scale better.
> 1000000 calls in 0.728720455 s (728.720455 ns/call)
> 
> 1000000 calls in 0.842782857 s (842.782857 ns/call)
> 1000000 calls in 0.883803135 s (883.803135 ns/call)
> 
> 1000000 calls in 0.902077764 s (902.077764 ns/call)
> 1000000 calls in 0.902838202 s (902.838202 ns/call)
> 1000000 calls in 0.908896885 s (908.896885 ns/call)
> 
> 1000000 calls in 0.932523515 s (932.523515 ns/call)
> 1000000 calls in 0.958009672 s (958.009672 ns/call)
> 1000000 calls in 0.986188020 s (986.188020 ns/call)
> 1000000 calls in 0.989771102 s (989.771102 ns/call)
> 
> 1000000 calls in 0.933518391 s (933.518391 ns/call)
> 1000000 calls in 0.958897947 s (958.897947 ns/call)
> 1000000 calls in 1.031038897 s (1031.038897 ns/call)
> 1000000 calls in 1.089516025 s (1089.516025 ns/call)
> 1000000 calls in 1.141998347 s (1141.998347 ns/call)
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
>  kernel/trace/trace.c |   24 +++++++++++++-----------
>  1 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 51e4e4d..c72d150 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -343,26 +343,28 @@ unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
>  static int trace_stop_count;
>  static DEFINE_SPINLOCK(tracing_start_lock);
>  
> +static atomic_t scheduled = {0};
> +static void wakeup_work_handler(struct work_struct *work)
> +{
> +	wake_up(&trace_wait);
> +}
> +
> +static DECLARE_DELAYED_WORK(wakeup_work, wakeup_work_handler);
> +
>  /**
>   * trace_wake_up - wake up tasks waiting for trace input
>   *
> - * Simply wakes up any task that is blocked on the trace_wait
> - * queue. These is used with trace_poll for tasks polling the trace.
> + * Schedules a delayed work to wake up any task that is blocked on the
> + * trace_wait queue. These is used with trace_poll for tasks polling the
> + * trace.
>   */
>  void trace_wake_up(void)
>  {
> -	int cpu;
> +	const unsigned long delay = msecs_to_jiffies(2); //delay 2 ms
>  
>  	if (trace_flags & TRACE_ITER_BLOCK)
>  		return;
> -	/*
> -	 * The runqueue_is_locked() can fail, but this is the best we
> -	 * have for now:
> -	 */
> -	cpu = get_cpu();
> -	if (!runqueue_is_locked(cpu))
> -		wake_up(&trace_wait);
> -	put_cpu();
> +	schedule_delayed_work(&wakeup_work, delay);
>  }
>  
>  static int __init set_buf_size(char *str)




* Re: [PATCH v2] trace: Schedule a delayed work to call wakeup()
  2011-05-17 18:01   ` Steven Rostedt
@ 2011-05-17 20:06     ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 10+ messages in thread
From: Vaibhav Nagarnaik @ 2011-05-17 20:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, May 17, 2011 at 11:01 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-05-10 at 13:27 -0700, Vaibhav Nagarnaik wrote:
>> The patch schedules a delayed work after 2 ms once an event commit calls
>> to wake up the trace wait_queue. This removes the delay caused by
>> contention on spin lock in wakeup() and amortizes the wakeup() calls
>> scheduled over the 2 ms period.
>>
>> In the following example, the script enables the sys_enter_getuid and
>> sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
>> with the given number of processes. The output clearly shows the latency
>> increase caused by contentions.
>>
>
> Have you taken a look at irq_work() instead of delayed work? This would
> probably allow us to call the wakeup version for all events.
>
> See kernel/perf_events.c for details.
>
> -- Steve
>

I looked at irq_work() as used in perf_event.c and changed the ftrace
wakeup logic to use it. I found that the latencies are still bad and don't
scale very well, mostly because the contention moved to irq_work_queue()
and into the irq work handler.

The reason to use delayed work is to amortize the wakeups over different
calls, and it is easier because all the events share a common wakeup
logic.

Also, when you talk about having a wakeup version for events, do you see
different wakeup handlers depending on the event? I can see that perf
has struct irq_work as part of the event structure, but every event has
the same wakeup handler.


Vaibhav Nagarnaik
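
One way to read the amortization in the v2 patch: schedule_delayed_work()
on a delayed work that is already pending is a no-op, so all the commits
within a 2 ms window collapse into a single wakeup, roughly:

	/* many events committed by concurrent tasks within 2 ms ... */
	trace_wake_up();	/* queues wakeup_work to run ~2 ms from now     */
	trace_wake_up();	/* wakeup_work already pending: returns at once */
	trace_wake_up();	/* ditto                                        */
	/* ... a single wake_up(&trace_wait) runs when the delay expires */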


* [tip:perf/core] tracing: Schedule a delayed work to call wakeup()
  2011-05-10 20:27 ` [PATCH v2] trace: Schedule a delayed work to call wakeup() Vaibhav Nagarnaik
  2011-05-17 18:01   ` Steven Rostedt
@ 2011-07-05 12:48   ` tip-bot for Vaibhav Nagarnaik
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Vaibhav Nagarnaik @ 2011-07-05 12:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, vnagarnaik, hpa, mingo, torvalds, dhsharp,
	fweisbec, rostedt, mrubin, tglx

Commit-ID:  e7e2ee89a9dbf48d70a922d5625cd7320a27cbff
Gitweb:     http://git.kernel.org/tip/e7e2ee89a9dbf48d70a922d5625cd7320a27cbff
Author:     Vaibhav Nagarnaik <vnagarnaik@google.com>
AuthorDate: Tue, 10 May 2011 13:27:21 -0700
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Tue, 14 Jun 2011 21:59:41 -0400

tracing: Schedule a delayed work to call wakeup()

When syscall tracing is used by concurrent processes, the wakeup() that is
called in the event commit function causes contention on the spin lock
of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
tracepoints, and by running getuid_microbench from autotest in parallel
I found that the contention causes an exponential latency increase in the
tracing path.

The autotest binary getuid_microbench calls getuid() in a tight loop for
the given number of iterations and measures the average time required to
complete a single syscall invocation.

The patch schedules a delayed work, 2 ms after an event commit requests a
wakeup, to wake up readers on the trace wait_queue. This removes the delay
caused by contention on the waitqueue spin lock in wakeup() and amortizes
the wakeup() calls over the 2 ms period.

In the following example, the script enables the sys_enter_getuid and
sys_exit_getuid tracepoints and runs getuid_microbench in parallel
with the given number of processes. The output clearly shows the latency
increase caused by contention.

$ ~/getuid.sh 1
1000000 calls in 0.720974253 s (720.974253 ns/call)

$ ~/getuid.sh 2
1000000 calls in 1.166457554 s (1166.457554 ns/call)
1000000 calls in 1.168933765 s (1168.933765 ns/call)

$ ~/getuid.sh 3
1000000 calls in 1.783827516 s (1783.827516 ns/call)
1000000 calls in 1.795553270 s (1795.553270 ns/call)
1000000 calls in 1.796493376 s (1796.493376 ns/call)

$ ~/getuid.sh 4
1000000 calls in 4.483041796 s (4483.041796 ns/call)
1000000 calls in 4.484165388 s (4484.165388 ns/call)
1000000 calls in 4.484850762 s (4484.850762 ns/call)
1000000 calls in 4.485643576 s (4485.643576 ns/call)

$ ~/getuid.sh 5
1000000 calls in 6.497521653 s (6497.521653 ns/call)
1000000 calls in 6.502000236 s (6502.000236 ns/call)
1000000 calls in 6.501709115 s (6501.709115 ns/call)
1000000 calls in 6.502124100 s (6502.124100 ns/call)
1000000 calls in 6.502936358 s (6502.936358 ns/call)

After the patch, the latencies scale better.
1000000 calls in 0.728720455 s (728.720455 ns/call)

1000000 calls in 0.842782857 s (842.782857 ns/call)
1000000 calls in 0.883803135 s (883.803135 ns/call)

1000000 calls in 0.902077764 s (902.077764 ns/call)
1000000 calls in 0.902838202 s (902.838202 ns/call)
1000000 calls in 0.908896885 s (908.896885 ns/call)

1000000 calls in 0.932523515 s (932.523515 ns/call)
1000000 calls in 0.958009672 s (958.009672 ns/call)
1000000 calls in 0.986188020 s (986.188020 ns/call)
1000000 calls in 0.989771102 s (989.771102 ns/call)

1000000 calls in 0.933518391 s (933.518391 ns/call)
1000000 calls in 0.958897947 s (958.897947 ns/call)
1000000 calls in 1.031038897 s (1031.038897 ns/call)
1000000 calls in 1.089516025 s (1089.516025 ns/call)
1000000 calls in 1.141998347 s (1141.998347 ns/call)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ee9c921..71777c8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -343,26 +343,27 @@ unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 static int trace_stop_count;
 static DEFINE_SPINLOCK(tracing_start_lock);
 
+static void wakeup_work_handler(struct work_struct *work)
+{
+	wake_up(&trace_wait);
+}
+
+static DECLARE_DELAYED_WORK(wakeup_work, wakeup_work_handler);
+
 /**
  * trace_wake_up - wake up tasks waiting for trace input
  *
- * Simply wakes up any task that is blocked on the trace_wait
- * queue. These is used with trace_poll for tasks polling the trace.
+ * Schedules a delayed work to wake up any task that is blocked on the
+ * trace_wait queue. These is used with trace_poll for tasks polling the
+ * trace.
  */
 void trace_wake_up(void)
 {
-	int cpu;
+	const unsigned long delay = msecs_to_jiffies(2);
 
 	if (trace_flags & TRACE_ITER_BLOCK)
 		return;
-	/*
-	 * The runqueue_is_locked() can fail, but this is the best we
-	 * have for now:
-	 */
-	cpu = get_cpu();
-	if (!runqueue_is_locked(cpu))
-		wake_up(&trace_wait);
-	put_cpu();
+	schedule_delayed_work(&wakeup_work, delay);
 }
 
 static int __init set_buf_size(char *str)

