* [PATCH 0/3] timer: Improve itimers scalability
@ 2015-08-26  3:17 Jason Low
  2015-08-26  3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26  3:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton, Jason Low

When running a database workload on a 16 socket machine, there were
scalability issues related to itimers.

Commit 1018016c706f addressed the issue with the thread_group_cputimer
spinlock taking up a significant portion of total run time.

This patch series addresses the other issue where a lot of time is spent
trying to acquire the sighand lock. It was found in some cases that
200+ threads were simultaneously contending for the same sighand lock,
reducing throughput by more than 30%.

Jason Low (3):
  timer: Optimize fastpath_timer_check()
  timer: Check thread timers only when there are active thread timers
  timer: Reduce unnecessary sighand lock contention

 include/linux/init_task.h      |    1 +
 include/linux/sched.h          |    3 ++
 kernel/time/posix-cpu-timers.c |   44 +++++++++++++++++++++++++++------------
 3 files changed, 34 insertions(+), 14 deletions(-)

-- 
1.7.2.5



* [PATCH 1/3] timer: Optimize fastpath_timer_check()
  2015-08-26  3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
@ 2015-08-26  3:17 ` Jason Low
  2015-08-26 21:57   ` Frederic Weisbecker
  2015-08-31 15:15   ` Davidlohr Bueso
  2015-08-26  3:17 ` [PATCH 2/3] timer: Check thread timers only when there are active thread timers Jason Low
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26  3:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton, Jason Low

In fastpath_timer_check(), the task_cputime() function is always
called to compute the utime and stime values. However, this is not
necessary if there are no per-thread timers to check for. This patch
modifies the code such that we compute the task_cputime values only
when there are per-thread timers set.

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/time/posix-cpu-timers.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 892e3da..02596ff 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1117,16 +1117,15 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
 static inline int fastpath_timer_check(struct task_struct *tsk)
 {
 	struct signal_struct *sig;
-	cputime_t utime, stime;
-
-	task_cputime(tsk, &utime, &stime);
 
 	if (!task_cputime_zero(&tsk->cputime_expires)) {
-		struct task_cputime task_sample = {
-			.utime = utime,
-			.stime = stime,
-			.sum_exec_runtime = tsk->se.sum_exec_runtime
-		};
+		struct task_cputime task_sample;
+		cputime_t utime, stime;
+
+		task_cputime(tsk, &utime, &stime);
+		task_sample.utime = utime;
+		task_sample.stime = stime;
+		task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
 
 		if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
 			return 1;
-- 
1.7.2.5



* [PATCH 2/3] timer: Check thread timers only when there are active thread timers
  2015-08-26  3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
  2015-08-26  3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
@ 2015-08-26  3:17 ` Jason Low
  2015-08-26  3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
  2015-08-26  3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
  3 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26  3:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton, Jason Low

fastpath_timer_check() contains logic to check whether any timers
are set by checking !task_cputime_zero(). Similarly, we can do this
before calling check_thread_timers(). In the case where there
are only process-wide timers, this will skip all the computations for
the per-thread timers.

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/time/posix-cpu-timers.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 02596ff..535bef5 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1168,11 +1168,13 @@ void run_posix_cpu_timers(struct task_struct *tsk)
 	if (!lock_task_sighand(tsk, &flags))
 		return;
 	/*
-	 * Here we take off tsk->signal->cpu_timers[N] and
-	 * tsk->cpu_timers[N] all the timers that are firing, and
-	 * put them on the firing list.
+	 * If there are active per-thread timers, take off
+	 * tsk->signal->cpu_timers[N] and tsk->cpu_timers[N] all the
+	 * timers that are firing, and put them on the firing list.
 	 */
-	check_thread_timers(tsk, &firing);
+	if (!task_cputime_zero(&tsk->cputime_expires))
+		check_thread_timers(tsk, &firing);
+
 	/*
 	 * If there are any active process wide timers (POSIX 1.b, itimers,
 	 * RLIMIT_CPU) cputimer must be running.
-- 
1.7.2.5



* [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26  3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
  2015-08-26  3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
  2015-08-26  3:17 ` [PATCH 2/3] timer: Check thread timers only when there are active thread timers Jason Low
@ 2015-08-26  3:17 ` Jason Low
  2015-08-26 17:53   ` Linus Torvalds
  2015-08-26 22:56   ` Frederic Weisbecker
  2015-08-26  3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
  3 siblings, 2 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26  3:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton, Jason Low

It was found while running a database workload on large systems that
significant time was spent trying to acquire the sighand lock.

The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock because another thread
had already sent the signal and updated the "next expire" time. The
fastpath_timer_check() didn't help much since the "next expire" time
was updated later.
 
This patch addresses this by having the thread_group_cputimer structure
maintain a boolean to signify when a thread in the group is already
checking for process wide timers, and adds extra logic in the fastpath
to check the boolean.

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 include/linux/init_task.h      |    1 +
 include/linux/sched.h          |    3 +++
 kernel/time/posix-cpu-timers.c |   19 +++++++++++++++++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index d0b380e..3350c77 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
 	.cputimer	= { 						\
 		.cputime_atomic	= INIT_CPUTIME_ATOMIC,			\
 		.running	= 0,					\
+		.checking_timer	= 0,					\
 	},								\
 	INIT_PREV_CPUTIME(sig)						\
 	.cred_guard_mutex =						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 119823d..a6c8334 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -619,6 +619,8 @@ struct task_cputime_atomic {
  * @cputime_atomic:	atomic thread group interval timers.
  * @running:		non-zero when there are timers running and
  * 			@cputime receives updates.
+ * @checking_timer:	non-zero when a thread is in the process of
+ *			checking for thread group timers.
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
@@ -626,6 +628,7 @@ struct task_cputime_atomic {
 struct thread_group_cputimer {
 	struct task_cputime_atomic cputime_atomic;
 	int running;
+	int checking_timer;
 };
 
 #include <linux/rwsem.h>
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 535bef5..f3ddf0e 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -962,6 +962,14 @@ static void check_process_timers(struct task_struct *tsk,
 	unsigned long soft;
 
 	/*
+	 * Signify that a thread is checking for process timers.
+	 * The checking_timer field is only modified in this function,
+	 * which is called with the sighand lock held. Thus, we can
+	 * just use WRITE_ONCE() without any further locking.
+	 */
+	WRITE_ONCE(sig->cputimer.checking_timer, 1);
+
+	/*
 	 * Collect the current process totals.
 	 */
 	thread_group_cputimer(tsk, &cputime);
@@ -1015,6 +1023,8 @@ static void check_process_timers(struct task_struct *tsk,
 	sig->cputime_expires.sched_exp = sched_expires;
 	if (task_cputime_zero(&sig->cputime_expires))
 		stop_process_timers(sig);
+
+	WRITE_ONCE(sig->cputimer.checking_timer, 0);
 }
 
 /*
@@ -1132,8 +1142,13 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 	}
 
 	sig = tsk->signal;
-	/* Check if cputimer is running. This is accessed without locking. */
-	if (READ_ONCE(sig->cputimer.running)) {
+	/*
+	 * Check if thread group timers expired if the cputimer is running
+	 * and that no other thread in the group is already checking for
+	 * thread group cputimers.
+	 */
+	if (READ_ONCE(sig->cputimer.running) &&
+	    !READ_ONCE(sig->cputimer.checking_timer)) {
 		struct task_cputime group_sample;
 
 		sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);
-- 
1.7.2.5



* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26  3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
                   ` (2 preceding siblings ...)
  2015-08-26  3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
@ 2015-08-26  3:27 ` Andrew Morton
  2015-08-26 16:33   ` Jason Low
  3 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2015-08-26  3:27 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Linus Torvalds, Davidlohr Bueso, Steven Rostedt, Terry Rudd,
	Rik van Riel, Scott J Norton

On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:

> When running a database workload on a 16 socket machine, there were
> scalability issues related to itimers.
> 
> Commit 1018016c706f addressed the issue with the thread_group_cputimer
> spinlock taking up a significant portion of total run time.
> 
> This patch series addresses the other issue where a lot of time is spent
> trying to acquire the sighand lock. It was found in some cases that
> 200+ threads were simultaneously contending for the same sighand lock,
> reducing throughput by more than 30%.

Does this imply that the patchset increased the throughput of this
workload by 30%?

And is this test case realistic?  If not, what are the benefits on a
real-world workload?



* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26  3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
@ 2015-08-26 16:33   ` Jason Low
  2015-08-26 17:08     ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Low @ 2015-08-26 16:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Linus Torvalds, Davidlohr Bueso, Steven Rostedt, Terry Rudd,
	Rik van Riel, Scott J Norton, jason.low2

Hi Andrew,

On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
> 
> > When running a database workload on a 16 socket machine, there were
> > scalability issues related to itimers.
> > 
> > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > spinlock taking up a significant portion of total run time.
> > 
> > This patch series addresses the other issue where a lot of time is spent
> > trying to acquire the sighand lock. It was found in some cases that
> > 200+ threads were simultaneously contending for the same sighand lock,
> > reducing throughput by more than 30%.
> 
> Does this imply that the patchset increased the throughput of this
> workload by 30%?
> 
> And is this test case realistic?  If not, what are the benefits on a
> real-world workload?

Yes, the test case with the database workload is realistic. We did write
a simple micro-benchmark that just generates the contention in this code
path to quickly test experimental patches, since the database takes
longer to set up and run. However, the performance issues and numbers
mentioned here are for the database workload.

These patches should also be beneficial for other multi-threaded
applications that use process-wide timers, particularly on systems with
a lot of cores.



* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26 16:33   ` Jason Low
@ 2015-08-26 17:08     ` Oleg Nesterov
  2015-08-26 22:07       ` Jason Low
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2015-08-26 17:08 UTC (permalink / raw)
  To: Jason Low
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Linus Torvalds, Davidlohr Bueso, Steven Rostedt, Terry Rudd,
	Rik van Riel, Scott J Norton

On 08/26, Jason Low wrote:
>
> Hi Andrew,
>
> On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> > On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
> >
> > > When running a database workload on a 16 socket machine, there were
> > > scalability issues related to itimers.
> > >
> > > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > > spinlock taking up a significant portion of total run time.
> > >
> > > This patch series addresses the other issue where a lot of time is spent
> > > trying to acquire the sighand lock. It was found in some cases that
> > > 200+ threads were simultaneously contending for the same sighand lock,
> > > reducing throughput by more than 30%.
> >
> > Does this imply that the patchset increased the throughput of this
> > workload by 30%?
> >
> > And is this test case realistic?  If not, what are the benefits on a
> > real-world workload?
>
> Yes, the test case with the database workload is realistic.

Can't resist, sorry... to me the very idea of using process-wide posix-
cpu-timers in a performance critical application doesn't look realistic ;)

However, I think the patches are fine.


Reviewed-by: Oleg Nesterov <oleg@redhat.com>



* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26  3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
@ 2015-08-26 17:53   ` Linus Torvalds
  2015-08-26 22:31     ` Frederic Weisbecker
  2015-08-26 22:56   ` Frederic Weisbecker
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2015-08-26 17:53 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, Linux Kernel Mailing List, Frederic Weisbecker,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton

On Tue, Aug 25, 2015 at 8:17 PM, Jason Low <jason.low2@hp.com> wrote:
>
> This patch addresses this by having the thread_group_cputimer structure
> maintain a boolean to signify when a thread in the group is already
> checking for process wide timers, and adds extra logic in the fastpath
> to check the boolean.

It is not at all obvious why the unlocked read of that variable is
safe, and why there is no race with another thread just about to end
its check_process_timers().

I can well imagine that this is all perfectly safe and fine, but I'd
really like to see that comment about _why_ that's the case, and why a
completely unlocked access without even memory barriers is fine.

                         Linus


* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
  2015-08-26  3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
@ 2015-08-26 21:57   ` Frederic Weisbecker
  2015-08-31 15:15   ` Davidlohr Bueso
  1 sibling, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 21:57 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton

On Tue, Aug 25, 2015 at 08:17:46PM -0700, Jason Low wrote:
> In fastpath_timer_check(), the task_cputime() function is always
> called to compute the utime and stime values. However, this is not
> necessary if there are no per-thread timers to check for. This patch
> modifies the code such that we compute the task_cputime values only
> when there are per-thread timers set.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

Thanks.

> ---
>  kernel/time/posix-cpu-timers.c |   15 +++++++--------
>  1 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index 892e3da..02596ff 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -1117,16 +1117,15 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
>  static inline int fastpath_timer_check(struct task_struct *tsk)
>  {
>  	struct signal_struct *sig;
> -	cputime_t utime, stime;
> -
> -	task_cputime(tsk, &utime, &stime);
>  
>  	if (!task_cputime_zero(&tsk->cputime_expires)) {
> -		struct task_cputime task_sample = {
> -			.utime = utime,
> -			.stime = stime,
> -			.sum_exec_runtime = tsk->se.sum_exec_runtime
> -		};
> +		struct task_cputime task_sample;
> +		cputime_t utime, stime;
> +
> +		task_cputime(tsk, &utime, &stime);
> +		task_sample.utime = utime;
> +		task_sample.stime = stime;
> +		task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
>  
>  		if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
>  			return 1;
> -- 
> 1.7.2.5
> 


* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26 17:08     ` Oleg Nesterov
@ 2015-08-26 22:07       ` Jason Low
  2015-08-26 22:53         ` Hideaki Kimura
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Low @ 2015-08-26 22:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton,
	jason.low2, hideaki.kimura

On Wed, 2015-08-26 at 19:08 +0200, Oleg Nesterov wrote:
> On 08/26, Jason Low wrote:
> >
> > Hi Andrew,
> >
> > On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> > > On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
> > >
> > > > When running a database workload on a 16 socket machine, there were
> > > > scalability issues related to itimers.
> > > >
> > > > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > > > spinlock taking up a significant portion of total run time.
> > > >
> > > > This patch series addresses the other issue where a lot of time is spent
> > > > trying to acquire the sighand lock. It was found in some cases that
> > > > 200+ threads were simultaneously contending for the same sighand lock,
> > > > reducing throughput by more than 30%.
> > >
> > > Does this imply that the patchset increased the throughput of this
> > > workload by 30%?
> > >
> > > And is this test case realistic?  If not, what are the benefits on a
> > > real-world workload?
> >
> > Yes, the test case with the database workload is realistic.
> 
> Can't resist, sorry... to me the very idea of using process-wide posix-
> cpu-timers in a performance critical application doesn't look realistic ;)

I will let Hideaki elaborate more regarding the issue at the application
level.

> However, I think the patches are fine.
> 
> 
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

Thanks for reviewing!



* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26 17:53   ` Linus Torvalds
@ 2015-08-26 22:31     ` Frederic Weisbecker
  2015-08-26 22:57       ` Jason Low
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 22:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Low, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Oleg Nesterov, Paul E. McKenney, Linux Kernel Mailing List,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton

On Wed, Aug 26, 2015 at 10:53:35AM -0700, Linus Torvalds wrote:
> On Tue, Aug 25, 2015 at 8:17 PM, Jason Low <jason.low2@hp.com> wrote:
> >
> > This patch addresses this by having the thread_group_cputimer structure
> > maintain a boolean to signify when a thread in the group is already
> > checking for process wide timers, and adds extra logic in the fastpath
> > to check the boolean.
> 
> It is not at all obvious why the unlocked read of that variable is
> safe, and why there is no race with another thread just about to end
> its check_process_timers().

The risk is when a next timer is going to expire soon after we relaxed
the "checking" variable due to a recent expiration. The thread which
expires the next timer may still see a stale value on the "checking"
state and therefore delay the timer firing until the new value is seen.
So the worst that can happen is that the timer firing gets delayed for
X jiffies (I guess in practice it's only 1 jiffy).

That said, posix cpu timers already suffer such race because
sig->cputimer.running itself is checked outside the sighand lock anyway.

> I can well imagine that this is all perfectly safe and fine, but I'd
> really like to see that comment about _why_ that's the case, and why a
> completely unlocked access without even memory barriers is fine.

Agreed, there should be a comment about that in the code (that is already full
of undocumented subtleties).


* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26 22:07       ` Jason Low
@ 2015-08-26 22:53         ` Hideaki Kimura
  2015-08-26 23:13           ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Hideaki Kimura @ 2015-08-26 22:53 UTC (permalink / raw)
  To: Jason Low, Oleg Nesterov, Andrew Morton
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	linux-kernel, Frederic Weisbecker, Linus Torvalds,
	Steven Rostedt, Rik van Riel, Scott J Norton

Sure, let me elaborate.

Executive summary:
  Yes, enabling a process-wide timer on such a large machine is not
wise, but sometimes users/applications cannot avoid it.


The issue was actually observed not in the database itself but in a common
library it links to: gperftools.

The database itself is optimized for many cores/sockets, so surely it
avoids putting a process-wide timer or other unscalable things. It just
links to libprofiler for an optional feature to profile performance
bottlenecks only when the user turns it on. We of course avoid turning
the feature on except while we debug/tune the database.

However, libprofiler sets the timer even when the client program doesn't
invoke any of its functions: libprofiler does it when the shared library
is loaded. We requested that the developer of libprofiler change this
behavior, but it seems like there is a reason to keep it:
   https://code.google.com/p/gperftools/issues/detail?id=133
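
For illustration only (this sketch is mine, not gperftools code; the
handler and the 10ms interval are made up), this is roughly the kind of
process-wide profiling timer such a library arms at load time. Every
thread in the process then shares the ITIMER_PROF CPU-time accounting,
which is what drives the posix-cpu-timer paths discussed in this series:

#include <signal.h>
#include <string.h>
#include <sys/time.h>

static void prof_handler(int sig)
{
	/* a real profiler would record a sample here */
	(void)sig;
}

/* runs when the shared library is loaded, before main() */
__attribute__((constructor))
static void arm_profiling_timer(void)
{
	struct sigaction sa;
	struct itimerval it;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = prof_handler;
	sigaction(SIGPROF, &sa, NULL);

	it.it_interval.tv_sec = 0;
	it.it_interval.tv_usec = 10000;		/* 100 Hz */
	it.it_value = it.it_interval;

	/* one process-wide CPU-time timer, shared by every thread */
	setitimer(ITIMER_PROF, &it, NULL);
}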

Based on this, I think there are two reasons why we should ameliorate
this issue in the kernel layer.


1. In the particular case, it's hard to prevent or even detect the issue 
in user space.

We (a team of low-level database and kernel experts) in fact spent a huge
amount of time just figuring out what the bottleneck was, because nothing
measurable happens in user space. I pulled out countless hairs.

Also, the user has to de-link the library from the application to 
prevent the itimer installation. Imagine a case where the software is 
proprietary. It won't fly.


2. This is just one example. There could be many other such 
binaries/libraries that do similar things somewhere in a complex 
software stack.

Today we haven't heard of many such cases, but people will start hitting 
it once 100s~1,000s of cores become common.


After applying this patchset, we have observed that the performance hit
almost completely went away, at least for 240 cores. So, it's quite
beneficial in the real world.

-- 
Hideaki Kimura


* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26  3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
  2015-08-26 17:53   ` Linus Torvalds
@ 2015-08-26 22:56   ` Frederic Weisbecker
  2015-08-26 23:32     ` Jason Low
  1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 22:56 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton

On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> It was found while running a database workload on large systems that
> significant time was spent trying to acquire the sighand lock.
> 
> The issue was that whenever an itimer expired, many threads ended up
> simultaneously trying to send the signal. Most of the time, nothing
> happened after acquiring the sighand lock because another thread
> had already sent the signal and updated the "next expire" time. The
> fastpath_timer_check() didn't help much since the "next expire" time
> was updated later.
>  
> This patch addresses this by having the thread_group_cputimer structure
> maintain a boolean to signify when a thread in the group is already
> checking for process wide timers, and adds extra logic in the fastpath
> to check the boolean.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> ---
>  include/linux/init_task.h      |    1 +
>  include/linux/sched.h          |    3 +++
>  kernel/time/posix-cpu-timers.c |   19 +++++++++++++++++--
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index d0b380e..3350c77 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
>  	.cputimer	= { 						\
>  		.cputime_atomic	= INIT_CPUTIME_ATOMIC,			\
>  		.running	= 0,					\
> +		.checking_timer	= 0,					\
>  	},								\
>  	INIT_PREV_CPUTIME(sig)						\
>  	.cred_guard_mutex =						\
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823d..a6c8334 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -619,6 +619,8 @@ struct task_cputime_atomic {
>   * @cputime_atomic:	atomic thread group interval timers.
>   * @running:		non-zero when there are timers running and
>   * 			@cputime receives updates.
> + * @checking_timer:	non-zero when a thread is in the process of
> + *			checking for thread group timers.
>   *
>   * This structure contains the version of task_cputime, above, that is
>   * used for thread group CPU timer calculations.
> @@ -626,6 +628,7 @@ struct task_cputime_atomic {
>  struct thread_group_cputimer {
>  	struct task_cputime_atomic cputime_atomic;
>  	int running;
> +	int checking_timer;

How about a flag in the "running" field instead?

1) Space in signal_struct is not as important as in task_struct but it
   still matters.

2) We already read the "running" field locklessly. Adding a new field like
   checking_timer gets even more complicated. Ideally there should be at
   least a paired memory barrier between both. Let's just simplify that
   with a single field.

Now concerning the solution for your problem, I'm a bit uncomfortable with
lockless magic like this. When the thread sets checking_timer to 1, there is
no guarantee that the other threads in the process will see it "fast" enough
to avoid the slow path checks. Then there is also the risk that the threads
don't see "fast" enough that checking_timer has toggled to 0 and as a result
a timer may expire late. Now the lockless access of "running" already induces
such a race. So if it really solves issues in practice, why not.


* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26 22:31     ` Frederic Weisbecker
@ 2015-08-26 22:57       ` Jason Low
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 22:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Oleg Nesterov, Paul E. McKenney, Linux Kernel Mailing List,
	Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton, jason.low2

On Thu, 2015-08-27 at 00:31 +0200, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2015 at 10:53:35AM -0700, Linus Torvalds wrote:
> > On Tue, Aug 25, 2015 at 8:17 PM, Jason Low <jason.low2@hp.com> wrote:
> > >
> > > This patch addresses this by having the thread_group_cputimer structure
> > > maintain a boolean to signify when a thread in the group is already
> > > checking for process wide timers, and adds extra logic in the fastpath
> > > to check the boolean.
> > 
> > It is not at all obvious why the unlocked read of that variable is
> > safe, and why there is no race with another thread just about to end
> > its check_process_timers().
> 
> The risk is when a next timer is going to expire soon after we relaxed
> the "checking" variable due to a recent expiration. The thread which
> expires the next timer may still see a stale value on the "checking"
> state and therefore delay the timer firing until the new value is seen.
> So the worst that can happen is that the timer firing gets delayed for
> X jiffies (I guess in practice it's only 1 jiffy).
> 
> That said, posix cpu timers already suffer such race because
> sig->cputimer.running itself is checked outside the sighand lock anyway.

Right, in the worst case, the next thread that comes along will handle
it. The current code already expects that level of granularity, so this
shouldn't change anything much. For example, there is almost always a
delay between a timer expiring and when a thread actually calls into
check_process_timers().

> > I can well imagine that this is all perfectly safe and fine, but I'd
> > really like to see that comment about _why_ that's the case, and why a
> > completely unlocked access without even memory barriers is fine.
> 
> Agreed, there should be a comment about that in the code (that is already full
> of undocumented subtleties).

Okay, I will add more comments about this.



* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26 22:53         ` Hideaki Kimura
@ 2015-08-26 23:13           ` Frederic Weisbecker
  2015-08-26 23:45             ` Hideaki Kimura
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 23:13 UTC (permalink / raw)
  To: Hideaki Kimura
  Cc: Jason Low, Oleg Nesterov, Andrew Morton, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, linux-kernel,
	Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton

On Wed, Aug 26, 2015 at 03:53:26PM -0700, Hideaki Kimura wrote:
> Sure, let me elaborate.
> 
> Executive summary:
>  Yes, enabling a process-wide timer in such a large machine is not wise, but
> sometimes users/applications cannot avoid it.
> 
> 
> The issue was observed actually not in a database itself but in a common
> library it links to; gperftools.
> 
> The database itself is optimized for many-cores/sockets, so surely it avoids
> putting a process-wide timer or other unscalable things. It just links to
> libprofiler for an optional feature to profile performance bottleneck only
> when the user turns it on. We of course avoid turning the feature on unless
> while we debug/tune the database.
> 
> However, libprofiler sets the timer even when the client program doesn't
> invoke any of its functions: libprofiler does it when the shared library is
> loaded. We requested the developer of libprofiler to change the behavior,
> but seems like there is a reason to keep that behavior:
>   https://code.google.com/p/gperftools/issues/detail?id=133
> 
> Based on this, I think there are two reasons why we should ameliorate this
> issue in kernel layer.
> 
> 
> 1. In the particular case, it's hard to prevent or even detect the issue in
> user space.
> 
> We (a team of low-level database and kernel experts) in fact spent huge
> amount of time to just figure out what's the bottleneck there because
> nothing measurable happens in user space. I pulled out countless hairs.
> 
> Also, the user has to de-link the library from the application to prevent
> the itimer installation. Imagine a case where the software is proprietary.
> It won't fly.
> 
> 
> 2. This is just one example. There could be many other such
> binaries/libraries that do similar things somewhere in a complex software
> stack.
> 
> Today we haven't heard of many such cases, but people will start hitting it
> once 100s~1,000s of cores become common.
> 
> 
> After applying this patchset, we have observed that the performance hit
> almost completely went away at least for 240 cores. So, it's quite
> beneficial in real world.

I can easily imagine that a lot of code incidentally uses posix cpu timers when
it's not strictly required. But it doesn't look right to fix the kernel
for that, for this simple reason: posix cpu timers, even after your fix,
should introduce noticeable overhead. All threads of a process with a timer
enqueued account the cputime in a shared atomic variable. Add to that the
overhead of enqueuing the timer and firing it. There is a bunch of scalability
issues there.

> 
> -- 
> Hideaki Kimura


* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26 22:56   ` Frederic Weisbecker
@ 2015-08-26 23:32     ` Jason Low
  2015-08-27  4:52       ` Jason Low
  2015-08-27 12:53       ` Frederic Weisbecker
  0 siblings, 2 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 23:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton, jason.low2

On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote:
> On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> > It was found while running a database workload on large systems that
> > significant time was spent trying to acquire the sighand lock.
> > 
> > The issue was that whenever an itimer expired, many threads ended up
> > simultaneously trying to send the signal. Most of the time, nothing
> > happened after acquiring the sighand lock because another thread
> > had already sent the signal and updated the "next expire" time. The
> > fastpath_timer_check() didn't help much since the "next expire" time
> > was updated later.
> >  
> > This patch addresses this by having the thread_group_cputimer structure
> > maintain a boolean to signify when a thread in the group is already
> > checking for process wide timers, and adds extra logic in the fastpath
> > to check the boolean.
> > 
> > Signed-off-by: Jason Low <jason.low2@hp.com>
> > ---
> >  include/linux/init_task.h      |    1 +
> >  include/linux/sched.h          |    3 +++
> >  kernel/time/posix-cpu-timers.c |   19 +++++++++++++++++--
> >  3 files changed, 21 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index d0b380e..3350c77 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> >  	.cputimer	= { 						\
> >  		.cputime_atomic	= INIT_CPUTIME_ATOMIC,			\
> >  		.running	= 0,					\
> > +		.checking_timer	= 0,					\
> >  	},								\
> >  	INIT_PREV_CPUTIME(sig)						\
> >  	.cred_guard_mutex =						\
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 119823d..a6c8334 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> >   * @cputime_atomic:	atomic thread group interval timers.
> >   * @running:		non-zero when there are timers running and
> >   * 			@cputime receives updates.
> > + * @checking_timer:	non-zero when a thread is in the process of
> > + *			checking for thread group timers.
> >   *
> >   * This structure contains the version of task_cputime, above, that is
> >   * used for thread group CPU timer calculations.
> > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> >  struct thread_group_cputimer {
> >  	struct task_cputime_atomic cputime_atomic;
> >  	int running;
> > +	int checking_timer;
> 
> How about a flag in the "running" field instead?
> 
> 1) Space in signal_struct is not as important as in task_struct but it
>    still matters.

George Spelvin suggested that we convert them to booleans which would
make them take up 2 bytes.

> 2) We already read the "running" field locklessly. Adding a new field like
>    checking_timer gets even more complicated. Ideally there should be at
>    least a paired memory barrier between both. Let's just simplify that
>    with a single field.

hmmm, so having 1 "flag" where we access bits for the "running" and
"checking_timer"?

> Now concerning the solution for your problem, I'm a bit uncomfortable with
> lockless magics like this. When the thread sets checking_timer to 1, there is
> no guarantee that the other threads in the process will see it "fast" enough
> to avoid the slow path checks. Then there is also the risk that the threads
> don't see "fast" enough that checking_timers has toggled to 0 and as a result
> a timer may expire late. Now the lockless access of "running" already induces
> such race. So if it really solves issues in practice, why not.

Perhaps, to be safer, we could use something like load_acquire() and
store_release() for accessing both the ->running and ->checking_timer
fields?
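
Just to make that concrete, a rough sketch (illustrative only, not a
patch) of how the kernel's smp_store_release()/smp_load_acquire()
helpers could pair the two accesses:

	/*
	 * Slowpath, end of check_process_timers(): everything written
	 * above (the updated expiry times, etc.) is visible before the
	 * cleared flag is.
	 */
	smp_store_release(&sig->cputimer.checking_timer, 0);

	/*
	 * Fastpath, fastpath_timer_check(): a reader that observes the
	 * cleared flag also observes the updated expiry times.
	 */
	if (READ_ONCE(sig->cputimer.running) &&
	    !smp_load_acquire(&sig->cputimer.checking_timer)) {
		/* sample and check the group timers as in patch 3/3 */
	}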



* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26 23:13           ` Frederic Weisbecker
@ 2015-08-26 23:45             ` Hideaki Kimura
  2015-08-27 13:18               ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Hideaki Kimura @ 2015-08-26 23:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jason Low, Oleg Nesterov, Andrew Morton, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, linux-kernel,
	Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton



On 08/26/2015 04:13 PM, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2015 at 03:53:26PM -0700, Hideaki Kimura wrote:
>> Sure, let me elaborate.
>>
>> Executive summary:
>>   Yes, enabling a process-wide timer in such a large machine is not wise, but
>> sometimes users/applications cannot avoid it.
>>
>>
>> The issue was observed actually not in a database itself but in a common
>> library it links to; gperftools.
>>
>> The database itself is optimized for many-cores/sockets, so surely it avoids
>> putting a process-wide timer or other unscalable things. It just links to
>> libprofiler for an optional feature to profile performance bottleneck only
>> when the user turns it on. We of course avoid turning the feature on unless
>> while we debug/tune the database.
>>
>> However, libprofiler sets the timer even when the client program doesn't
>> invoke any of its functions: libprofiler does it when the shared library is
>> loaded. We requested the developer of libprofiler to change the behavior,
>> but seems like there is a reason to keep that behavior:
>>    https://code.google.com/p/gperftools/issues/detail?id=133
>>
>> Based on this, I think there are two reasons why we should ameliorate this
>> issue in kernel layer.
>>
>>
>> 1. In the particular case, it's hard to prevent or even detect the issue in
>> user space.
>>
>> We (a team of low-level database and kernel experts) in fact spent huge
>> amount of time to just figure out what's the bottleneck there because
>> nothing measurable happens in user space. I pulled out countless hairs.
>>
>> Also, the user has to de-link the library from the application to prevent
>> the itimer installation. Imagine a case where the software is proprietary.
>> It won't fly.
>>
>>
>> 2. This is just one example. There could be many other such
>> binaries/libraries that do similar things somewhere in a complex software
>> stack.
>>
>> Today we haven't heard of many such cases, but people will start hitting it
>> once 100s~1,000s of cores become common.
>>
>>
>> After applying this patchset, we have observed that the performance hit
>> almost completely went away at least for 240 cores. So, it's quite
>> beneficial in real world.
>
> I can easily imagine that many code incidentally use posix cpu timers when
> it's not strictly required. But it doesn't look right to fix the kernel
> for that. For this simple reason: posix cpu timers, even after your fix,
> should introduce noticeable overhead. All threads of a process with a timer
> enqueued account the cputime in a shared atomic variable. Add to that the
> overhead of enqueuing the timer, firing it. There is a bunch of scalability
> issue there.

I totally agree that this is not a perfect solution. If there are 10x 
more cores and sockets, just the atomic fetch_add might be too expensive.

However, it's comparatively/realistically the best thing we can do 
without any drawbacks. We can't magically force all library developers 
to write the most scalable code always.

My point is: this is a safety net, and a very effective one.

-- 
Hideaki Kimura


* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26 23:32     ` Jason Low
@ 2015-08-27  4:52       ` Jason Low
  2015-08-27 12:53       ` Frederic Weisbecker
  1 sibling, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-27  4:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Rik van Riel, Scott J Norton,
	jason.low2

On Wed, 2015-08-26 at 16:32 -0700, Jason Low wrote:

> Perhaps to be safer, we use something like load_acquire() and
> store_release() for accessing both the ->running and ->checking_timer
> fields?

Regarding using barriers, one option could be to pair them between
sig->cputime_expires and the sig->cputimer.checking_timer accesses.

fastpath_timer_check()
{
	...

        if (READ_ONCE(sig->cputimer.running))
                struct task_cputime group_sample;

                sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);

                if (task_cputime_expired(&group_sample, &sig->cputime_expires)) {
			/*
			 * Comments
			 */
                        mb();

                        if (!READ_ONCE(sig->cputimer.checking_timer))
                                return 1;
                }
        }
}

check_process_timers()
{
	...

        WRITE_ONCE(sig->cputimer.checking_timer, 0);

	/*
	 * Comments
	 */
        mb();

        sig->cputime_expires.prof_exp = expires_to_cputime(prof_expires);
        sig->cputime_expires.virt_exp = expires_to_cputime(virt_expires);
        sig->cputime_expires.sched_exp = sched_expires;

	...	
}

By the time the cputime_expires fields get updated at the end of
check_process_timers(), other threads in the fastpath_timer_check()
should observe the value 0 for READ_ONCE(sig->cputimer.checking_timer).

In the case where threads in the fastpath don't observe the
WRITE_ONCE(checking_timer, 1) early enough, that's fine, since it will
just (unnecessarily) go through the slowpath which is what we also do in
the current code.



* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-26 23:32     ` Jason Low
  2015-08-27  4:52       ` Jason Low
@ 2015-08-27 12:53       ` Frederic Weisbecker
  2015-08-27 20:29         ` Jason Low
  1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 12:53 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton

On Wed, Aug 26, 2015 at 04:32:34PM -0700, Jason Low wrote:
> On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> > > It was found while running a database workload on large systems that
> > > significant time was spent trying to acquire the sighand lock.
> > > 
> > > The issue was that whenever an itimer expired, many threads ended up
> > > simultaneously trying to send the signal. Most of the time, nothing
> > > happened after acquiring the sighand lock because another thread
> > > had already sent the signal and updated the "next expire" time. The
> > > fastpath_timer_check() didn't help much since the "next expire" time
> > > was updated later.
> > >  
> > > This patch addresses this by having the thread_group_cputimer structure
> > > maintain a boolean to signify when a thread in the group is already
> > > checking for process wide timers, and adds extra logic in the fastpath
> > > to check the boolean.
> > > 
> > > Signed-off-by: Jason Low <jason.low2@hp.com>
> > > ---
> > >  include/linux/init_task.h      |    1 +
> > >  include/linux/sched.h          |    3 +++
> > >  kernel/time/posix-cpu-timers.c |   19 +++++++++++++++++--
> > >  3 files changed, 21 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > > index d0b380e..3350c77 100644
> > > --- a/include/linux/init_task.h
> > > +++ b/include/linux/init_task.h
> > > @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> > >  	.cputimer	= { 						\
> > >  		.cputime_atomic	= INIT_CPUTIME_ATOMIC,			\
> > >  		.running	= 0,					\
> > > +		.checking_timer	= 0,					\
> > >  	},								\
> > >  	INIT_PREV_CPUTIME(sig)						\
> > >  	.cred_guard_mutex =						\
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 119823d..a6c8334 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> > >   * @cputime_atomic:	atomic thread group interval timers.
> > >   * @running:		non-zero when there are timers running and
> > >   * 			@cputime receives updates.
> > > + * @checking_timer:	non-zero when a thread is in the process of
> > > + *			checking for thread group timers.
> > >   *
> > >   * This structure contains the version of task_cputime, above, that is
> > >   * used for thread group CPU timer calculations.
> > > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> > >  struct thread_group_cputimer {
> > >  	struct task_cputime_atomic cputime_atomic;
> > >  	int running;
> > > +	int checking_timer;
> > 
> > How about a flag in the "running" field instead?
> > 
> > 1) Space in signal_struct is not as important as in task_struct but it
> >    still matters.
> 
> George Spelvin suggested that we convert them to booleans which would
> make them take up 2 bytes.
> 
> > 2) We already read the "running" field locklessly. Adding a new field like
> >    checking_timer gets even more complicated. Ideally there should be at
> >    least a paired memory barrier between both. Let's just simplify that
> >    with a single field.
> 
> hmmm, so having 1 "flag" where we access bits for the "running" and
> "checking_timer"?

Sure, like:

#define CPUTIMER_RUNNING 0x1
#define CPUTIMER_CHECKING 0x2

struct thread_group_cputimer {
    struct task_cputime_atomic cputime_atomic;
    int status;
}

So from cputimer_running() you just need to check:

     if (cputimer->status & CPUTIMER_RUNNING)

And from the run_posix_cpu_timers() fast-path:

     if (cputimer->status == CPUTIMER_RUNNING)

so that it ignores the CPUTIMER_CHECKING case.
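
And the slowpath would just toggle the CHECKING bit around its work,
something like (sketch only, same made-up names as above):

    /* check_process_timers(), on entry (sighand lock held): */
    WRITE_ONCE(sig->cputimer.status, CPUTIMER_RUNNING | CPUTIMER_CHECKING);

    /* ... collect totals, fire expired timers, update expiry times ... */

    /* on exit, assuming timers are still running; stop_process_timers()
       would clear both bits instead: */
    WRITE_ONCE(sig->cputimer.status, CPUTIMER_RUNNING);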

> 
> > Now concerning the solution for your problem, I'm a bit uncomfortable with
> > lockless magics like this. When the thread sets checking_timer to 1, there is
> > no guarantee that the other threads in the process will see it "fast" enough
> > to avoid the slow path checks. Then there is also the risk that the threads
> > don't see "fast" enough that checking_timers has toggled to 0 and as a result
> > a timer may expire late. Now the lockless access of "running" already induces
> > such race. So if it really solves issues in practice, why not.
> 
> Perhaps to be safer, we use something like load_acquire() and
> store_release() for accessing both the ->running and ->checking_timer
> fields?

Well, it depends on what we want to order them against. If it's a single field
we don't need to order them together, at least.


* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-26 23:45             ` Hideaki Kimura
@ 2015-08-27 13:18               ` Frederic Weisbecker
  2015-08-27 14:47                 ` Steven Rostedt
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 13:18 UTC (permalink / raw)
  To: Hideaki Kimura
  Cc: Jason Low, Oleg Nesterov, Andrew Morton, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, linux-kernel,
	Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton

On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> I totally agree that this is not a perfect solution. If there are 10x more
> cores and sockets, just the atomic fetch_add might be too expensive.
> 
> However, it's comparatively/realistically the best thing we can do without
> any drawbacks. We can't magically force all library developers to write the
> most scalable code always.
> 
> My point is: this is a safety net, and a very effective one.

I mean the problem here is that a library uses an unscalable profiling feature,
unconditionally as soon as you load it without even initializing anything. And
this library is used in production.

At first sight, fixing that in the kernel is only a hack that just reduces
the symptoms a bit.

What is the technical issue that prevents fixing that in the library itself?
Posix timers can be attached anytime.



* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-27 13:18               ` Frederic Weisbecker
@ 2015-08-27 14:47                 ` Steven Rostedt
  2015-08-27 15:09                   ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2015-08-27 14:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Hideaki Kimura, Jason Low, Oleg Nesterov, Andrew Morton,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	linux-kernel, Linus Torvalds, Rik van Riel, Scott J Norton

On Thu, 27 Aug 2015 15:18:49 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> > I totally agree that this is not a perfect solution. If there are 10x more
> > cores and sockets, just the atomic fetch_add might be too expensive.
> > 
> > However, it's comparatively/realistically the best thing we can do without
> > any drawbacks. We can't magically force all library developers to write the
> > most scalable code always.
> > 
> > My point is: this is a safety net, and a very effective one.
> 
> I mean the problem here is that a library uses an unscalable profiling feature,
> unconditionally as soon as you load it without even initializing anything. And
> this library is used in production.
> 
> At first sight, fixing that in the kernel is only a hack that just reduces a bit
> the symptoms.
> 
> What is the technical issue that prevents from fixing that in the library itself?
> Posix timers can be attached anytime.

I'm curious as to what the downside of this patch set is. If we can fix a
problem that should be fixed in userspace, but does not harm the kernel
by doing so, is that bad? (an argument for kdbus? ;-)


As Hideaki noted, this could be a problem in other locations as well
that people have yet to find.

-- Steve


* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-27 14:47                 ` Steven Rostedt
@ 2015-08-27 15:09                   ` Thomas Gleixner
  2015-08-27 15:17                     ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2015-08-27 15:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Hideaki Kimura, Jason Low, Oleg Nesterov,
	Andrew Morton, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	linux-kernel, Linus Torvalds, Rik van Riel, Scott J Norton

On Thu, 27 Aug 2015, Steven Rostedt wrote:
> On Thu, 27 Aug 2015 15:18:49 +0200
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> > > I totally agree that this is not a perfect solution. If there are 10x more
> > > cores and sockets, just the atomic fetch_add might be too expensive.
> > > 
> > > However, it's comparatively/realistically the best thing we can do without
> > > any drawbacks. We can't magically force all library developers to write the
> > > most scalable code always.
> > > 
> > > My point is: this is a safety net, and a very effective one.
> > 
> > I mean the problem here is that a library uses an unscalable profiling feature,
> > unconditionally as soon as you load it without even initializing anything. And
> > this library is used in production.
> > 
> > At first sight, fixing that in the kernel is only a hack that just reduces a bit
> > the symptoms.
> > 
> > What is the technical issue that prevents from fixing that in the library itself?
> > Posix timers can be attached anytime.
> 
> I'm curious to what the downside of this patch set is? If we can fix a
> problem that should be fixed in userspace, but does not harm the kernel
> by doing so, is that bad? (an argument for kdbus? ;-)

The patches are not fixing a problem which should be fixed in user
space. They merely avoid lock contention which happens to be prominent
with that particular library. But avoiding lock contention even for 2
threads is a worthwhile exercise if it does not hurt otherwise. And I
can't see anything that hurts with these patches.

Thanks,

	tglx


* Re: [PATCH 0/3] timer: Improve itimers scalability
  2015-08-27 15:09                   ` Thomas Gleixner
@ 2015-08-27 15:17                     ` Frederic Weisbecker
  0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 15:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Hideaki Kimura, Jason Low, Oleg Nesterov,
	Andrew Morton, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	linux-kernel, Linus Torvalds, Rik van Riel, Scott J Norton

On Thu, Aug 27, 2015 at 05:09:21PM +0200, Thomas Gleixner wrote:
> On Thu, 27 Aug 2015, Steven Rostedt wrote:
> > On Thu, 27 Aug 2015 15:18:49 +0200
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> > > > I totally agree that this is not a perfect solution. If there are 10x more
> > > > cores and sockets, just the atomic fetch_add might be too expensive.
> > > > 
> > > > However, it's comparatively/realistically the best thing we can do without
> > > > any drawbacks. We can't magically force all library developers to write the
> > > > most scalable code always.
> > > > 
> > > > My point is: this is a safety net, and a very effective one.
> > > 
> > > I mean the problem here is that a library uses an unscalable profiling feature,
> > > unconditionally as soon as you load it without even initializing anything. And
> > > this library is used in production.
> > > 
> > > At first sight, fixing that in the kernel is only a hack that just reduces the
> > > symptoms a bit.
> > > 
> > > What is the technical issue that prevents this from being fixed in the library itself?
> > > Posix timers can be attached anytime.
> > 
> > I'm curious as to what the downside of this patch set is. If we can fix a
> > problem that should be fixed in userspace, and doing so does not harm the
> > kernel, is that bad? (an argument for kdbus? ;-)
> 
> The patches are not fixing a problem which should be fixed in user
> space. They merely avoid lock contention which happens to be prominent
> with that particular library. But avoiding lock contention even for 2
> threads is a worthwhile exercise if it does not hurt otherwise. And I
> can't see anything that hurts with these patches.

Sure, it shouldn't really hurt anyway, since the presence of elapsing timers
itself is checked locklessly.
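
For reference, a simplified sketch (not the exact upstream code) of the shape
of that lockless check: run_posix_cpu_timers() only takes sighand->siglock
once the lock-free fast path says a timer may have expired.

void run_posix_cpu_timers(struct task_struct *tsk)
{
        unsigned long flags;

        /* Lock-free fast path: only reads the cached expiry fields. */
        if (!fastpath_timer_check(tsk))
                return;

        /* Slow path: take sighand->siglock, walk the timer lists,
         * fire whatever expired, then drop the lock. */
        if (!lock_task_sighand(tsk, &flags))
                return;
        /* ... check_thread_timers() / check_process_timers() ... */
        unlock_task_sighand(tsk, &flags);
}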

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-27 12:53       ` Frederic Weisbecker
@ 2015-08-27 20:29         ` Jason Low
  2015-08-27 21:12           ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Low @ 2015-08-27 20:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton, jason.low2

On Thu, 2015-08-27 at 14:53 +0200, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2015 at 04:32:34PM -0700, Jason Low wrote:
> > On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote:
> > > On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> > > > It was found while running a database workload on large systems that
> > > > significant time was spent trying to acquire the sighand lock.
> > > > 
> > > > The issue was that whenever an itimer expired, many threads ended up
> > > > simultaneously trying to send the signal. Most of the time, nothing
> > > > happened after acquiring the sighand lock because another thread
> > > > had already sent the signal and updated the "next expire" time. The
> > > > fastpath_timer_check() didn't help much since the "next expire" time
> > > > was updated later.
> > > >  
> > > > This patch addresses this by having the thread_group_cputimer structure
> > > > maintain a boolean to signify when a thread in the group is already
> > > > checking for process wide timers, and adds extra logic in the fastpath
> > > > to check the boolean.
> > > > 
> > > > Signed-off-by: Jason Low <jason.low2@hp.com>
> > > > ---
> > > >  include/linux/init_task.h      |    1 +
> > > >  include/linux/sched.h          |    3 +++
> > > >  kernel/time/posix-cpu-timers.c |   19 +++++++++++++++++--
> > > >  3 files changed, 21 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > > > index d0b380e..3350c77 100644
> > > > --- a/include/linux/init_task.h
> > > > +++ b/include/linux/init_task.h
> > > > @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> > > >  	.cputimer	= { 						\
> > > >  		.cputime_atomic	= INIT_CPUTIME_ATOMIC,			\
> > > >  		.running	= 0,					\
> > > > +		.checking_timer	= 0,					\
> > > >  	},								\
> > > >  	INIT_PREV_CPUTIME(sig)						\
> > > >  	.cred_guard_mutex =						\
> > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > index 119823d..a6c8334 100644
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> > > >   * @cputime_atomic:	atomic thread group interval timers.
> > > >   * @running:		non-zero when there are timers running and
> > > >   * 			@cputime receives updates.
> > > > + * @checking_timer:	non-zero when a thread is in the process of
> > > > + *			checking for thread group timers.
> > > >   *
> > > >   * This structure contains the version of task_cputime, above, that is
> > > >   * used for thread group CPU timer calculations.
> > > > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> > > >  struct thread_group_cputimer {
> > > >  	struct task_cputime_atomic cputime_atomic;
> > > >  	int running;
> > > > +	int checking_timer;
> > > 
> > > How about a flag in the "running" field instead?
> > > 
> > > 1) Space in signal_struct is not as important as in task_struct but it
> > >    still matters.
> > 
> > George Spelvin suggested that we convert them to booleans which would
> > make them take up 2 bytes.
> > 
> > > 2) We already read the "running" field locklessly. Adding a new field like
> > >    checking_timer gets even more complicated. Ideally there should be at
> > >    least a paired memory barrier between both. Let's just simplify that
> > >    with a single field.
> > 
> > hmmm, so having 1 "flag" where we access bits for the "running" and
> > "checking_timer"?
> 
> Sure, like:
> 
> #define CPUTIMER_RUNNING 0x1
> #define CPUTIMER_CHECKING 0x2
> 
> struct thread_group_cputimer {
>     struct task_cputime_atomic cputime_atomic;
>     int status;
> }
> 
> So from cputimer_running() you just need to check:
> 
>      if (cputimer->status & CPUTIMER_RUNNING)
> 
> And from run_posix_cpu_timers() fast-path:
> 
>      if (cputimer->status == CPUTIMER_RUNNING)
> 
> so that ignores CPUTIMER_CHECKING case.

Right, having just 1 "status" field can simplify things a bit. The
(cputimer->status == CPUTIMER_RUNNING) check does appear misleading
though, since we're technically not only checking whether the "cputimer
is running".

Maybe something like:

int status = cputimer->status;
if ((status & CPUTIMER_RUNNING) && !(status & CPUTIMER_CHECKING))

makes it more obvious what's going on here.
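
To make the combined-flag idea concrete, here is a rough sketch (the names and
the helper below are illustrative only, not the final patch):

#define CPUTIMER_RUNNING	0x1
#define CPUTIMER_CHECKING	0x2

struct thread_group_cputimer {
        struct task_cputime_atomic cputime_atomic;
        int status;
};

/* Fast path: only worth taking sighand->siglock if group timers are
 * armed and no other thread is already walking the timer lists. */
static inline bool group_timers_need_check(struct signal_struct *sig)
{
        int status = READ_ONCE(sig->cputimer.status);

        return (status & CPUTIMER_RUNNING) && !(status & CPUTIMER_CHECKING);
}

/* Inside check_process_timers(), under sighand->siglock, the thread
 * doing the walk would bracket the scan with the CHECKING bit so that
 * other threads' fast paths skip the lock in the meantime:
 *
 *	sig->cputimer.status |= CPUTIMER_CHECKING;
 *	... walk process-wide timers, update cached expiry times ...
 *	sig->cputimer.status &= ~CPUTIMER_CHECKING;
 */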


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
  2015-08-27 20:29         ` Jason Low
@ 2015-08-27 21:12           ` Frederic Weisbecker
  0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 21:12 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton

On Thu, Aug 27, 2015 at 01:29:50PM -0700, Jason Low wrote:
> On Thu, 2015-08-27 at 14:53 +0200, Frederic Weisbecker wrote:
> > Sure, like:
> > 
> > #define CPUTIMER_RUNNING 0x1
> > #define CPUTIMER_CHECKING 0x2
> > 
> > struct thread_group_cputimer {
> >     struct task_cputime_atomic cputime_atomic;
> >     int status;
> > }
> > 
> > So from cputimer_running() you just need to check:
> > 
> >      if (cputimer->status & CPUTIMER_RUNNING)
> > 
> > And from run_posix_cpu_timers() fast-path:
> > 
> >      if (cputimer->status == CPUTIMER_RUNNING)
> > 
> > so that ignores CPUTIMER_CHECKING case.
> 
> Right, having just 1 "status" field can simplify things a bit. The
> (cputimer->status == CPUTIMER_RUNNING) check does appear misleading
> though, since we're technically not only checking whether the "cputimer
> is running".
> 
> Maybe something like:
> 
> int status = cputimer->status;
> if ((status & CPUTIMER_RUNNING) && !(status & CPUTIMER_CHECKING))
> 
> makes it more obvious what's going on here.

The result is the same, but it may indeed clarify the code.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
  2015-08-26  3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
  2015-08-26 21:57   ` Frederic Weisbecker
@ 2015-08-31 15:15   ` Davidlohr Bueso
  2015-08-31 19:40     ` Jason Low
  1 sibling, 1 reply; 29+ messages in thread
From: Davidlohr Bueso @ 2015-08-31 15:15 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Linus Torvalds, Steven Rostedt, Andrew Morton, Terry Rudd,
	Rik van Riel, Scott J Norton

On Tue, 2015-08-25 at 20:17 -0700, Jason Low wrote:
> In fastpath_timer_check(), the task_cputime() function is always
> called to compute the utime and stime values. However, this is not
> necessary if there are no per-thread timers to check for. This patch
> modifies the code such that we compute the task_cputime values only
> when there are per-thread timers set.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
  2015-08-31 15:15   ` Davidlohr Bueso
@ 2015-08-31 19:40     ` Jason Low
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-31 19:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
	Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
	Scott J Norton, jason.low2

On Mon, 2015-08-31 at 08:15 -0700, Davidlohr Bueso wrote:
> On Tue, 2015-08-25 at 20:17 -0700, Jason Low wrote:
> > In fastpath_timer_check(), the task_cputime() function is always
> > called to compute the utime and stime values. However, this is not
> > necessary if there are no per-thread timers to check for. This patch
> > modifies the code such that we compute the task_cputime values only
> > when there are per-thread timers set.
> > 
> > Signed-off-by: Jason Low <jason.low2@hp.com>
> 
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

Thanks David and Frederic.

I want to mention that this patch has been slightly changed. As
suggested by George, those extra utime, stime local variables are not
needed anymore, and we should be able to directly call:

task_cputime(tsk, &task_sample.utime, &task_sample.stime);

---
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 892e3da..3a8c5b4 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1117,16 +1117,13 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
 static inline int fastpath_timer_check(struct task_struct *tsk)
 {
        struct signal_struct *sig;
-       cputime_t utime, stime;
-
-       task_cputime(tsk, &utime, &stime);

        if (!task_cputime_zero(&tsk->cputime_expires)) {
-               struct task_cputime task_sample = {
-                       .utime = utime,
-                       .stime = stime,
-                       .sum_exec_runtime = tsk->se.sum_exec_runtime
-               };
+               struct task_cputime task_sample;
+
+               task_cputime(tsk, &task_sample.utime, &task_sample.stime);
+
+               task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;

                if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
                        return 1;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
  2015-08-26 16:57 [PATCH 1/3] timer: Optimize fastpath_timer_check() George Spelvin
@ 2015-08-26 17:31 ` Jason Low
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 17:31 UTC (permalink / raw)
  To: George Spelvin; +Cc: akpm, linux-kernel, jason.low2

On Wed, 2015-08-26 at 12:57 -0400, George Spelvin wrote:
> > 	if (!task_cputime_zero(&tsk->cputime_expires)) {
> >+		struct task_cputime task_sample;
> >+		cputime_t utime, stime;
> >+
> >+		task_cputime(tsk, &utime, &stime);
> >+		task_sample.utime = utime;
> >+		task_sample.stime = stime;
> >+		task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
> 
> Er, task_sample.[us]time are already the correct types.
> Why are the local variables necessary?  How about:
> 
>  	if (!task_cputime_zero(&tsk->cputime_expires)) {
> +		struct task_cputime task_sample;
> +
> +		task_cputime(tsk, &task_sample.utime, &task_sample.stime);
> +		task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;

Yes, good point. Now that we're moving the task_cputime() call to after
the task_sample structure is declared, the utime and stime local
variables are not required anymore.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
@ 2015-08-26 16:57 George Spelvin
  2015-08-26 17:31 ` Jason Low
  0 siblings, 1 reply; 29+ messages in thread
From: George Spelvin @ 2015-08-26 16:57 UTC (permalink / raw)
  To: jason.low2; +Cc: akpm, linux, linux-kernel

> 	if (!task_cputime_zero(&tsk->cputime_expires)) {
>+		struct task_cputime task_sample;
>+		cputime_t utime, stime;
>+
>+		task_cputime(tsk, &utime, &stime);
>+		task_sample.utime = utime;
>+		task_sample.stime = stime;
>+		task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;

Er, task_sample.[us]time are already the correct types.
Why are the local variables necessary?  How about:

 	if (!task_cputime_zero(&tsk->cputime_expires)) {
+		struct task_cputime task_sample;
+
+		task_cputime(tsk, &task_sample.utime, &task_sample.stime);
+		task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-08-31 19:44 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-26  3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
2015-08-26  3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
2015-08-26 21:57   ` Frederic Weisbecker
2015-08-31 15:15   ` Davidlohr Bueso
2015-08-31 19:40     ` Jason Low
2015-08-26  3:17 ` [PATCH 2/3] timer: Check thread timers only when there are active thread timers Jason Low
2015-08-26  3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
2015-08-26 17:53   ` Linus Torvalds
2015-08-26 22:31     ` Frederic Weisbecker
2015-08-26 22:57       ` Jason Low
2015-08-26 22:56   ` Frederic Weisbecker
2015-08-26 23:32     ` Jason Low
2015-08-27  4:52       ` Jason Low
2015-08-27 12:53       ` Frederic Weisbecker
2015-08-27 20:29         ` Jason Low
2015-08-27 21:12           ` Frederic Weisbecker
2015-08-26  3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
2015-08-26 16:33   ` Jason Low
2015-08-26 17:08     ` Oleg Nesterov
2015-08-26 22:07       ` Jason Low
2015-08-26 22:53         ` Hideaki Kimura
2015-08-26 23:13           ` Frederic Weisbecker
2015-08-26 23:45             ` Hideaki Kimura
2015-08-27 13:18               ` Frederic Weisbecker
2015-08-27 14:47                 ` Steven Rostedt
2015-08-27 15:09                   ` Thomas Gleixner
2015-08-27 15:17                     ` Frederic Weisbecker
2015-08-26 16:57 [PATCH 1/3] timer: Optimize fastpath_timer_check() George Spelvin
2015-08-26 17:31 ` Jason Low
