* [PATCH 0/3] timer: Improve itimers scalability
@ 2015-08-26 3:17 Jason Low
2015-08-26 3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
` (3 more replies)
0 siblings, 4 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 3:17 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney
Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton, Jason Low
When running a database workload on a 16 socket machine, there were
scalability issues related to itimers.
Commit 1018016c706f addressed the issue with the thread_group_cputimer
spinlock taking up a significant portion of total run time.
This patch series addresses the other issue where a lot of time is spent
trying to acquire the sighand lock. It was found in some cases that
200+ threads were simultaneously contending for the same sighand lock,
reducing throughput by more than 30%.
Jason Low (3):
timer: Optimize fastpath_timer_check()
timer: Check thread timers only when there are active thread timers
timer: Reduce unnecessary sighand lock contention
include/linux/init_task.h | 1 +
include/linux/sched.h | 3 ++
kernel/time/posix-cpu-timers.c | 44 +++++++++++++++++++++++++++------------
3 files changed, 34 insertions(+), 14 deletions(-)
--
1.7.2.5
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] timer: Optimize fastpath_timer_check()
2015-08-26 3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
@ 2015-08-26 3:17 ` Jason Low
2015-08-26 21:57 ` Frederic Weisbecker
2015-08-31 15:15 ` Davidlohr Bueso
2015-08-26 3:17 ` [PATCH 2/3] timer: Check thread timers only when there are active thread timers Jason Low
` (2 subsequent siblings)
3 siblings, 2 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 3:17 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney
Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton, Jason Low
In fastpath_timer_check(), the task_cputime() function is always
called to compute the utime and stime values. However, this is not
necessary if there are no per-thread timers to check for. This patch
modifies the code such that we compute the task_cputime values only
when there are per-thread timers set.
Signed-off-by: Jason Low <jason.low2@hp.com>
---
kernel/time/posix-cpu-timers.c | 15 +++++++--------
1 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 892e3da..02596ff 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1117,16 +1117,15 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
static inline int fastpath_timer_check(struct task_struct *tsk)
{
struct signal_struct *sig;
- cputime_t utime, stime;
-
- task_cputime(tsk, &utime, &stime);
if (!task_cputime_zero(&tsk->cputime_expires)) {
- struct task_cputime task_sample = {
- .utime = utime,
- .stime = stime,
- .sum_exec_runtime = tsk->se.sum_exec_runtime
- };
+ struct task_cputime task_sample;
+ cputime_t utime, stime;
+
+ task_cputime(tsk, &utime, &stime);
+ task_sample.utime = utime;
+ task_sample.stime = stime;
+ task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
return 1;
--
1.7.2.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] timer: Check thread timers only when there are active thread timers
2015-08-26 3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
2015-08-26 3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
@ 2015-08-26 3:17 ` Jason Low
2015-08-26 3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
2015-08-26 3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
3 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 3:17 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney
Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton, Jason Low
fastpath_timer_check() checks whether any per-thread timers are set by
testing !task_cputime_zero(). Similarly, we can make the same check
before calling check_thread_timers(). In the case where there are only
process-wide timers, this skips all the per-thread timer computations.
Signed-off-by: Jason Low <jason.low2@hp.com>
---
kernel/time/posix-cpu-timers.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 02596ff..535bef5 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1168,11 +1168,13 @@ void run_posix_cpu_timers(struct task_struct *tsk)
if (!lock_task_sighand(tsk, &flags))
return;
/*
- * Here we take off tsk->signal->cpu_timers[N] and
- * tsk->cpu_timers[N] all the timers that are firing, and
- * put them on the firing list.
+ * If there are active per-thread timers, take off
+ * tsk->signal->cpu_timers[N] and tsk->cpu_timers[N] all the
+ * timers that are firing, and put them on the firing list.
*/
- check_thread_timers(tsk, &firing);
+ if (!task_cputime_zero(&tsk->cputime_expires))
+ check_thread_timers(tsk, &firing);
+
/*
* If there are any active process wide timers (POSIX 1.b, itimers,
* RLIMIT_CPU) cputimer must be running.
--
1.7.2.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
2015-08-26 3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
2015-08-26 3:17 ` [PATCH 2/3] timer: Check thread timers only when there are active thread timers Jason Low
@ 2015-08-26 3:17 ` Jason Low
2015-08-26 17:53 ` Linus Torvalds
2015-08-26 22:56 ` Frederic Weisbecker
2015-08-26 3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
3 siblings, 2 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 3:17 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney
Cc: linux-kernel, Frederic Weisbecker, Linus Torvalds,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton, Jason Low
It was found while running a database workload on large systems that
significant time was spent trying to acquire the sighand lock.
The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock because another thread
had already sent the signal and updated the "next expire" time. The
fastpath_timer_check() didn't help much since the "next expire" time
was updated later.
This patch addresses this by having the thread_group_cputimer structure
maintain a boolean to signify when a thread in the group is already
checking for process wide timers, and adds extra logic in the fastpath
to check the boolean.
Signed-off-by: Jason Low <jason.low2@hp.com>
---
include/linux/init_task.h | 1 +
include/linux/sched.h | 3 +++
kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index d0b380e..3350c77 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
.cputimer = { \
.cputime_atomic = INIT_CPUTIME_ATOMIC, \
.running = 0, \
+ .checking_timer = 0, \
}, \
INIT_PREV_CPUTIME(sig) \
.cred_guard_mutex = \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 119823d..a6c8334 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -619,6 +619,8 @@ struct task_cputime_atomic {
* @cputime_atomic: atomic thread group interval timers.
* @running: non-zero when there are timers running and
* @cputime receives updates.
+ * @checking_timer: non-zero when a thread is in the process of
+ * checking for thread group timers.
*
* This structure contains the version of task_cputime, above, that is
* used for thread group CPU timer calculations.
@@ -626,6 +628,7 @@ struct task_cputime_atomic {
struct thread_group_cputimer {
struct task_cputime_atomic cputime_atomic;
int running;
+ int checking_timer;
};
#include <linux/rwsem.h>
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 535bef5..f3ddf0e 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -962,6 +962,14 @@ static void check_process_timers(struct task_struct *tsk,
unsigned long soft;
/*
+ * Signify that a thread is checking for process timers.
+ * The checking_timer field is only modified in this function,
+ * which is called with the sighand lock held. Thus, we can
+ * just use WRITE_ONCE() without any further locking.
+ */
+ WRITE_ONCE(sig->cputimer.checking_timer, 1);
+
+ /*
* Collect the current process totals.
*/
thread_group_cputimer(tsk, &cputime);
@@ -1015,6 +1023,8 @@ static void check_process_timers(struct task_struct *tsk,
sig->cputime_expires.sched_exp = sched_expires;
if (task_cputime_zero(&sig->cputime_expires))
stop_process_timers(sig);
+
+ WRITE_ONCE(sig->cputimer.checking_timer, 0);
}
/*
@@ -1132,8 +1142,13 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
}
sig = tsk->signal;
- /* Check if cputimer is running. This is accessed without locking. */
- if (READ_ONCE(sig->cputimer.running)) {
+ /*
+ * Check if thread group timers expired if the cputimer is running
+ * and that no other thread in the group is already checking for
+ * thread group cputimers.
+ */
+ if (READ_ONCE(sig->cputimer.running) &&
+ !READ_ONCE(sig->cputimer.checking_timer)) {
struct task_cputime group_sample;
sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);
--
1.7.2.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
` (2 preceding siblings ...)
2015-08-26 3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
@ 2015-08-26 3:27 ` Andrew Morton
2015-08-26 16:33 ` Jason Low
3 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2015-08-26 3:27 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Frederic Weisbecker,
Linus Torvalds, Davidlohr Bueso, Steven Rostedt, Terry Rudd,
Rik van Riel, Scott J Norton
On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
> When running a database workload on a 16 socket machine, there were
> scalability issues related to itimers.
>
> Commit 1018016c706f addressed the issue with the thread_group_cputimer
> spinlock taking up a significant portion of total run time.
>
> This patch series address the other issue where a lot of time is spent
> trying to acquire the sighand lock. It was found in some cases that
> 200+ threads were simultaneously contending for the same sighand lock,
> reducing throughput by more than 30%.
Does this imply that the patchset increased the throughput of this
workload by 30%?
And is this test case realistic? If not, what are the benefits on a
real-world workload?
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
@ 2015-08-26 16:33 ` Jason Low
2015-08-26 17:08 ` Oleg Nesterov
0 siblings, 1 reply; 29+ messages in thread
From: Jason Low @ 2015-08-26 16:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Frederic Weisbecker,
Linus Torvalds, Davidlohr Bueso, Steven Rostedt, Terry Rudd,
Rik van Riel, Scott J Norton, jason.low2
Hi Andrew,
On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
>
> > When running a database workload on a 16 socket machine, there were
> > scalability issues related to itimers.
> >
> > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > spinlock taking up a significant portion of total run time.
> >
> > This patch series address the other issue where a lot of time is spent
> > trying to acquire the sighand lock. It was found in some cases that
> > 200+ threads were simultaneously contending for the same sighand lock,
> > reducing throughput by more than 30%.
>
> Does this imply that the patchset increased the throughput of this
> workload by 30%?
>
> And is this test case realistic? If not, what are the benefits on a
> real-world workload?
Yes, the test case with the database workload is realistic. We did write
a simple micro-benchmark that just generates the contention in this code
path to quickly test experimental patches, since the database takes
longer to set up and run. However, the performance issues and numbers
mentioned here are for the database workload.
These patches should also be beneficial for other multi-threaded
applications which use process-wide timers, particularly on systems
with a lot of cores.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 16:33 ` Jason Low
@ 2015-08-26 17:08 ` Oleg Nesterov
2015-08-26 22:07 ` Jason Low
0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2015-08-26 17:08 UTC (permalink / raw)
To: Jason Low
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
Paul E. McKenney, linux-kernel, Frederic Weisbecker,
Linus Torvalds, Davidlohr Bueso, Steven Rostedt, Terry Rudd,
Rik van Riel, Scott J Norton
On 08/26, Jason Low wrote:
>
> Hi Andrew,
>
> On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> > On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
> >
> > > When running a database workload on a 16 socket machine, there were
> > > scalability issues related to itimers.
> > >
> > > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > > spinlock taking up a significant portion of total run time.
> > >
> > > This patch series address the other issue where a lot of time is spent
> > > trying to acquire the sighand lock. It was found in some cases that
> > > 200+ threads were simultaneously contending for the same sighand lock,
> > > reducing throughput by more than 30%.
> >
> > Does this imply that the patchset increased the throughput of this
> > workload by 30%?
> >
> > And is this test case realistic? If not, what are the benefits on a
> > real-world workload?
>
> Yes, the test case with the database workload is realistic.
Can't resist, sorry... to me the very idea of using process-wide
posix-cpu-timers in a performance-critical application doesn't look
realistic ;)
However, I think the patches are fine.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
@ 2015-08-26 17:53 ` Linus Torvalds
2015-08-26 22:31 ` Frederic Weisbecker
2015-08-26 22:56 ` Frederic Weisbecker
1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2015-08-26 17:53 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, Linux Kernel Mailing List, Frederic Weisbecker,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton
On Tue, Aug 25, 2015 at 8:17 PM, Jason Low <jason.low2@hp.com> wrote:
>
> This patch addresses this by having the thread_group_cputimer structure
> maintain a boolean to signify when a thread in the group is already
> checking for process wide timers, and adds extra logic in the fastpath
> to check the boolean.
It is not at all obvious why the unlocked read of that variable is
safe, and why there is no race with another thread just about to end
its check_process_timers().
I can well imagine that this is all perfectly safe and fine, but I'd
really like to see that comment about _why_ that's the case, and why a
completely unlocked access without even memory barriers is fine.
Linus
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
2015-08-26 3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
@ 2015-08-26 21:57 ` Frederic Weisbecker
2015-08-31 15:15 ` Davidlohr Bueso
1 sibling, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 21:57 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton
On Tue, Aug 25, 2015 at 08:17:46PM -0700, Jason Low wrote:
> In fastpath_timer_check(), the task_cputime() function is always
> called to compute the utime and stime values. However, this is not
> necessary if there are no per-thread timers to check for. This patch
> modifies the code such that we compute the task_cputime values only
> when there are per-thread timers set.
>
> Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Thanks.
> ---
> kernel/time/posix-cpu-timers.c | 15 +++++++--------
> 1 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index 892e3da..02596ff 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -1117,16 +1117,15 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
> static inline int fastpath_timer_check(struct task_struct *tsk)
> {
> struct signal_struct *sig;
> - cputime_t utime, stime;
> -
> - task_cputime(tsk, &utime, &stime);
>
> if (!task_cputime_zero(&tsk->cputime_expires)) {
> - struct task_cputime task_sample = {
> - .utime = utime,
> - .stime = stime,
> - .sum_exec_runtime = tsk->se.sum_exec_runtime
> - };
> + struct task_cputime task_sample;
> + cputime_t utime, stime;
> +
> + task_cputime(tsk, &utime, &stime);
> + task_sample.utime = utime;
> + task_sample.stime = stime;
> + task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
>
> if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
> return 1;
> --
> 1.7.2.5
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 17:08 ` Oleg Nesterov
@ 2015-08-26 22:07 ` Jason Low
2015-08-26 22:53 ` Hideaki Kimura
0 siblings, 1 reply; 29+ messages in thread
From: Jason Low @ 2015-08-26 22:07 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
Paul E. McKenney, linux-kernel, Frederic Weisbecker,
Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton,
jason.low2, hideaki.kimura
On Wed, 2015-08-26 at 19:08 +0200, Oleg Nesterov wrote:
> On 08/26, Jason Low wrote:
> >
> > Hi Andrew,
> >
> > On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> > > On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low <jason.low2@hp.com> wrote:
> > >
> > > > When running a database workload on a 16 socket machine, there were
> > > > scalability issues related to itimers.
> > > >
> > > > Commit 1018016c706f addressed the issue with the thread_group_cputimer
> > > > spinlock taking up a significant portion of total run time.
> > > >
> > > > This patch series address the other issue where a lot of time is spent
> > > > trying to acquire the sighand lock. It was found in some cases that
> > > > 200+ threads were simultaneously contending for the same sighand lock,
> > > > reducing throughput by more than 30%.
> > >
> > > Does this imply that the patchset increased the throughput of this
> > > workload by 30%?
> > >
> > > And is this test case realistic? If not, what are the benefits on a
> > > real-world workload?
> >
> > Yes, the test case with the database workload is realistic.
>
> Can't resists, sorry... to me the very idea to use the process wide posix-
> cpu-timers on performance critical application doesn't look realistic ;)
I will let Hideaki elaborate more regarding the issue at the application
level.
> However, I thinks the patches are fine.
>
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Thanks for reviewing!
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 17:53 ` Linus Torvalds
@ 2015-08-26 22:31 ` Frederic Weisbecker
2015-08-26 22:57 ` Jason Low
0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 22:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jason Low, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
Oleg Nesterov, Paul E. McKenney, Linux Kernel Mailing List,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton
On Wed, Aug 26, 2015 at 10:53:35AM -0700, Linus Torvalds wrote:
> On Tue, Aug 25, 2015 at 8:17 PM, Jason Low <jason.low2@hp.com> wrote:
> >
> > This patch addresses this by having the thread_group_cputimer structure
> > maintain a boolean to signify when a thread in the group is already
> > checking for process wide timers, and adds extra logic in the fastpath
> > to check the boolean.
>
> It is not at all obvious why the unlocked read of that variable is
> safe, and why there is no race with another thread just about to end
> its check_process_timers().
The risk is when a next timer is going to expire soon after we relaxed
the "checking" variable due to a recent expiration. The thread which
expires the next timer may still see a stale value on the "checking"
state and therefore delay the timer firing until the new value is seen.
So the worst that can happen is that the timer firing gets delayed for
X jiffies (I guess in practice it's only 1 jiffy).
That said, posix cpu timers already suffer such race because
sig->cputimer.running itself is checked outside the sighand lock anyway.
> I can well imagine that this is all perfectly safe and fine, but I'd
> really like to see that comment about _why_ that's the case, and why a
> completely unlocked access without even memory barriers is fine.
Agreed, there should be a comment about that in the code (that is already full
of undocumented subtleties).
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 22:07 ` Jason Low
@ 2015-08-26 22:53 ` Hideaki Kimura
2015-08-26 23:13 ` Frederic Weisbecker
0 siblings, 1 reply; 29+ messages in thread
From: Hideaki Kimura @ 2015-08-26 22:53 UTC (permalink / raw)
To: Jason Low, Oleg Nesterov, Andrew Morton
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
linux-kernel, Frederic Weisbecker, Linus Torvalds,
Steven Rostedt, Rik van Riel, Scott J Norton
Sure, let me elaborate.
Executive summary:
Yes, enabling a process-wide timer in such a large machine is not
wise, but sometimes users/applications cannot avoid it.
The issue was observed actually not in a database itself but in a common
library it links to; gperftools.
The database itself is optimized for many-cores/sockets, so surely it
avoids putting a process-wide timer or other unscalable things. It just
links to libprofiler for an optional feature to profile performance
bottlenecks only when the user turns it on. We of course avoid turning
the feature on except while we debug/tune the database.
However, libprofiler sets the timer even when the client program doesn't
invoke any of its functions: libprofiler does it when the shared library
is loaded. We requested the developer of libprofiler to change the
behavior, but seems like there is a reason to keep that behavior:
https://code.google.com/p/gperftools/issues/detail?id=133
Based on this, I think there are two reasons why we should ameliorate
this issue in kernel layer.
1. In the particular case, it's hard to prevent or even detect the issue
in user space.
We (a team of low-level database and kernel experts) in fact spent huge
amount of time to just figure out what's the bottleneck there because
nothing measurable happens in user space. I pulled out countless hairs.
Also, the user has to de-link the library from the application to
prevent the itimer installation. Imagine a case where the software is
proprietary. It won't fly.
2. This is just one example. There could be many other such
binaries/libraries that do similar things somewhere in a complex
software stack.
Today we haven't heard of many such cases, but people will start hitting
it once 100s~1,000s of cores become common.
After applying this patchset, we have observed that the performance hit
almost completely went away at least for 240 cores. So, it's quite
beneficial in the real world.
--
Hideaki Kimura
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
2015-08-26 17:53 ` Linus Torvalds
@ 2015-08-26 22:56 ` Frederic Weisbecker
2015-08-26 23:32 ` Jason Low
1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 22:56 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton
On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> It was found while running a database workload on large systems that
> significant time was spent trying to acquire the sighand lock.
>
> The issue was that whenever an itimer expired, many threads ended up
> simultaneously trying to send the signal. Most of the time, nothing
> happened after acquiring the sighand lock because another thread
> had already sent the signal and updated the "next expire" time. The
> fastpath_timer_check() didn't help much since the "next expire" time
> was updated later.
>
> This patch addresses this by having the thread_group_cputimer structure
> maintain a boolean to signify when a thread in the group is already
> checking for process wide timers, and adds extra logic in the fastpath
> to check the boolean.
>
> Signed-off-by: Jason Low <jason.low2@hp.com>
> ---
> include/linux/init_task.h | 1 +
> include/linux/sched.h | 3 +++
> kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index d0b380e..3350c77 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> .cputimer = { \
> .cputime_atomic = INIT_CPUTIME_ATOMIC, \
> .running = 0, \
> + .checking_timer = 0, \
> }, \
> INIT_PREV_CPUTIME(sig) \
> .cred_guard_mutex = \
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823d..a6c8334 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> * @cputime_atomic: atomic thread group interval timers.
> * @running: non-zero when there are timers running and
> * @cputime receives updates.
> + * @checking_timer: non-zero when a thread is in the process of
> + * checking for thread group timers.
> *
> * This structure contains the version of task_cputime, above, that is
> * used for thread group CPU timer calculations.
> @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> struct thread_group_cputimer {
> struct task_cputime_atomic cputime_atomic;
> int running;
> + int checking_timer;
How about a flag in the "running" field instead?
1) Space in signal_struct is not as important as in task_struct but it
still matters.
2) We already read the "running" field locklessly. Adding a new field like
checking_timer gets even more complicated. Ideally there should be at
least a paired memory barrier between both. Let's just simplify that
with a single field.
Now concerning the solution for your problem, I'm a bit uncomfortable with
lockless magics like this. When the thread sets checking_timer to 1, there is
no guarantee that the other threads in the process will see it "fast" enough
to avoid the slow path checks. Then there is also the risk that the threads
don't see "fast" enough that checking_timer has toggled to 0 and as a result
a timer may expire late. Now the lockless access of "running" already induces
such race. So if it really solves issues in practice, why not.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 22:31 ` Frederic Weisbecker
@ 2015-08-26 22:57 ` Jason Low
0 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 22:57 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
Oleg Nesterov, Paul E. McKenney, Linux Kernel Mailing List,
Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton, jason.low2
On Thu, 2015-08-27 at 00:31 +0200, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2015 at 10:53:35AM -0700, Linus Torvalds wrote:
> > On Tue, Aug 25, 2015 at 8:17 PM, Jason Low <jason.low2@hp.com> wrote:
> > >
> > > This patch addresses this by having the thread_group_cputimer structure
> > > maintain a boolean to signify when a thread in the group is already
> > > checking for process wide timers, and adds extra logic in the fastpath
> > > to check the boolean.
> >
> > It is not at all obvious why the unlocked read of that variable is
> > safe, and why there is no race with another thread just about to end
> > its check_process_timers().
>
> The risk is when a next timer is going to expire soon after we relaxed
> the "checking" variable due to a recent expiration. The thread which
> expires the next timer may still see a stale value on the "checking"
> state and therefore delay the timer firing until the new value is seen.
> So the worst that can happen is that the timer firing gets delayed for
> X jiffies (I guess in practice it's only 1 jiffy).
>
> That said, posix cpu timers already suffer such race because
> sig->cputimer.running itself is checked outside the sighand lock anyway.
Right, in the worst case, the next thread that comes along will handle
it. The current code already expects that level of granularity, so this
shouldn't change anything much. For example, there is almost always a
delay between a timer expiring and when a thread actually calls into
check_process_timers().
> > I can well imagine that this is all perfectly safe and fine, but I'd
> > really like to see that comment about _why_ that's the case, and why a
> > completely unlocked access without even memory barriers is fine.
>
> Agreed, there should be a comment about that in the code (that is already full
> of undocumented subtleties).
Okay, I will add more comments about this.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 22:53 ` Hideaki Kimura
@ 2015-08-26 23:13 ` Frederic Weisbecker
2015-08-26 23:45 ` Hideaki Kimura
0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-26 23:13 UTC (permalink / raw)
To: Hideaki Kimura
Cc: Jason Low, Oleg Nesterov, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Thomas Gleixner, Paul E. McKenney, linux-kernel,
Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton
On Wed, Aug 26, 2015 at 03:53:26PM -0700, Hideaki Kimura wrote:
> Sure, let me elaborate.
>
> Executive summary:
> Yes, enabling a process-wide timer on such a large machine is not wise, but
> sometimes users/applications cannot avoid it.
>
>
> The issue was actually observed not in the database itself but in a common
> library it links to: gperftools.
>
> The database itself is optimized for many cores/sockets, so it carefully avoids
> installing a process-wide timer or other unscalable things. It merely links to
> libprofiler for an optional feature to profile performance bottlenecks only
> when the user turns it on. We of course avoid turning the feature on except
> while we debug/tune the database.
>
> However, libprofiler sets the timer even when the client program doesn't
> invoke any of its functions: libprofiler does it when the shared library is
> loaded. We requested the developer of libprofiler to change the behavior,
> but it seems there is a reason to keep that behavior:
> https://code.google.com/p/gperftools/issues/detail?id=133
>
> Based on this, I think there are two reasons why we should ameliorate this
> issue in the kernel layer.
>
>
> 1. In this particular case, it's hard to prevent or even detect the issue in
> user space.
>
> We (a team of low-level database and kernel experts) in fact spent a huge
> amount of time just figuring out what the bottleneck was, because
> nothing measurable happens in user space. I pulled out countless hairs.
>
> Also, the user has to de-link the library from the application to prevent
> the itimer installation. Imagine a case where the software is proprietary.
> It won't fly.
>
>
> 2. This is just one example. There could be many other such
> binaries/libraries that do similar things somewhere in a complex software
> stack.
>
> Today we haven't heard of many such cases, but people will start hitting it
> once 100s~1,000s of cores become common.
>
>
> After applying this patchset, we have observed that the performance hit
> almost completely went away at least for 240 cores. So, it's quite
> beneficial in real world.
I can easily imagine that a lot of code incidentally uses posix cpu timers when
it's not strictly required. But it doesn't look right to fix the kernel
for that, for this simple reason: posix cpu timers, even after your fix,
still introduce noticeable overhead. All threads of a process with a timer
enqueued account the elapsed cputime in a shared atomic variable. Add to that
the overhead of enqueuing the timer and firing it. There are a bunch of
scalability issues there.
>
> --
> Hideaki Kimura
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 22:56 ` Frederic Weisbecker
@ 2015-08-26 23:32 ` Jason Low
2015-08-27 4:52 ` Jason Low
2015-08-27 12:53 ` Frederic Weisbecker
0 siblings, 2 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 23:32 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton, jason.low2
On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote:
> On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> > It was found while running a database workload on large systems that
> > significant time was spent trying to acquire the sighand lock.
> >
> > The issue was that whenever an itimer expired, many threads ended up
> > simultaneously trying to send the signal. Most of the time, nothing
> > happened after acquiring the sighand lock because another thread
> > had already sent the signal and updated the "next expire" time. The
> > fastpath_timer_check() didn't help much since the "next expire" time
> > was updated later.
> >
> > This patch addresses this by having the thread_group_cputimer structure
> > maintain a boolean to signify when a thread in the group is already
> > checking for process wide timers, and adds extra logic in the fastpath
> > to check the boolean.
> >
> > Signed-off-by: Jason Low <jason.low2@hp.com>
> > ---
> > include/linux/init_task.h | 1 +
> > include/linux/sched.h | 3 +++
> > kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
> > 3 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index d0b380e..3350c77 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> > .cputimer = { \
> > .cputime_atomic = INIT_CPUTIME_ATOMIC, \
> > .running = 0, \
> > + .checking_timer = 0, \
> > }, \
> > INIT_PREV_CPUTIME(sig) \
> > .cred_guard_mutex = \
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 119823d..a6c8334 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> > * @cputime_atomic: atomic thread group interval timers.
> > * @running: non-zero when there are timers running and
> > * @cputime receives updates.
> > + * @checking_timer: non-zero when a thread is in the process of
> > + * checking for thread group timers.
> > *
> > * This structure contains the version of task_cputime, above, that is
> > * used for thread group CPU timer calculations.
> > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> > struct thread_group_cputimer {
> > struct task_cputime_atomic cputime_atomic;
> > int running;
> > + int checking_timer;
>
> How about a flag in the "running" field instead?
>
> 1) Space in signal_struct is not as important as in task_struct but it
> still matters.
George Spelvin suggested that we convert them to booleans which would
make them take up 2 bytes.
> 2) We already read the "running" field locklessly. Adding a new field like
> checking_timer gets even more complicated. Ideally there should be at
> least a paired memory barrier between both. Let's just simplify that
> with a single field.
hmmm, so having 1 "flag" where we access bits for the "running" and
"checking_timer"?
> Now concerning the solution for your problem, I'm a bit uncomfortable with
> lockless magics like this. When the thread sets checking_timer to 1, there is
> no guarantee that the other threads in the process will see it "fast" enough
> to avoid the slow path checks. Then there is also the risk that the threads
> don't see "fast" enough that checking_timers has toggled to 0 and as a result
> a timer may expire late. Now the lockless access of "running" already induces
> such race. So if it really solves issues in practice, why not.
Perhaps to be safer, we could use something like load_acquire() and
store_release() for accessing both the ->running and ->checking_timer
fields?
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 23:13 ` Frederic Weisbecker
@ 2015-08-26 23:45 ` Hideaki Kimura
2015-08-27 13:18 ` Frederic Weisbecker
0 siblings, 1 reply; 29+ messages in thread
From: Hideaki Kimura @ 2015-08-26 23:45 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jason Low, Oleg Nesterov, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Thomas Gleixner, Paul E. McKenney, linux-kernel,
Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton
On 08/26/2015 04:13 PM, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2015 at 03:53:26PM -0700, Hideaki Kimura wrote:
>> Sure, let me elaborate.
>>
>> Executive summary:
>> Yes, enabling a process-wide timer on such a large machine is not wise, but
>> sometimes users/applications cannot avoid it.
>>
>>
>> The issue was actually observed not in the database itself but in a common
>> library it links to: gperftools.
>>
>> The database itself is optimized for many cores/sockets, so it carefully avoids
>> installing a process-wide timer or other unscalable things. It merely links to
>> libprofiler for an optional feature to profile performance bottlenecks only
>> when the user turns it on. We of course avoid turning the feature on except
>> while we debug/tune the database.
>>
>> However, libprofiler sets the timer even when the client program doesn't
>> invoke any of its functions: libprofiler does it when the shared library is
>> loaded. We requested the developer of libprofiler to change the behavior,
>> but it seems there is a reason to keep that behavior:
>> https://code.google.com/p/gperftools/issues/detail?id=133
>>
>> Based on this, I think there are two reasons why we should ameliorate this
>> issue in the kernel layer.
>>
>>
>> 1. In this particular case, it's hard to prevent or even detect the issue in
>> user space.
>>
>> We (a team of low-level database and kernel experts) in fact spent a huge
>> amount of time just figuring out what the bottleneck was, because
>> nothing measurable happens in user space. I pulled out countless hairs.
>>
>> Also, the user has to de-link the library from the application to prevent
>> the itimer installation. Imagine a case where the software is proprietary.
>> It won't fly.
>>
>>
>> 2. This is just one example. There could be many other such
>> binaries/libraries that do similar things somewhere in a complex software
>> stack.
>>
>> Today we haven't heard of many such cases, but people will start hitting it
>> once 100s~1,000s of cores become common.
>>
>>
>> After applying this patchset, we have observed that the performance hit
>> almost completely went away at least for 240 cores. So, it's quite
>> beneficial in real world.
>
> I can easily imagine that a lot of code incidentally uses posix cpu timers when
> it's not strictly required. But it doesn't look right to fix the kernel
> for that, for this simple reason: posix cpu timers, even after your fix,
> still introduce noticeable overhead. All threads of a process with a timer
> enqueued account the elapsed cputime in a shared atomic variable. Add to that
> the overhead of enqueuing the timer and firing it. There are a bunch of
> scalability issues there.
I totally agree that this is not a perfect solution. If there are 10x
more cores and sockets, just the atomic fetch_add might be too expensive.
However, it's comparatively/realistically the best thing we can do
without any drawbacks. We can't magically force all library developers
to write the most scalable code always.
My point is: this is a safety net, and a very effective one.
--
Hideaki Kimura
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 23:32 ` Jason Low
@ 2015-08-27 4:52 ` Jason Low
2015-08-27 12:53 ` Frederic Weisbecker
1 sibling, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-27 4:52 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Rik van Riel, Scott J Norton,
jason.low2
On Wed, 2015-08-26 at 16:32 -0700, Jason Low wrote:
> Perhaps to be safer, we could use something like load_acquire() and
> store_release() for accessing both the ->running and ->checking_timer
> fields?
Regarding using barriers, one option could be to pair them between
sig->cputime_expires and the sig->cputimer.checking_timer accesses.
fastpath_timer_check()
{
	...
	if (READ_ONCE(sig->cputimer.running)) {
		struct task_cputime group_sample;

		sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);
		if (task_cputime_expired(&group_sample, &sig->cputime_expires)) {
			/*
			 * Comments
			 */
			mb();
			if (!READ_ONCE(sig->cputimer.checking_timer))
				return 1;
		}
	}
}
check_process_timers()
{
	...
	WRITE_ONCE(sig->cputimer.checking_timer, 0);
	/*
	 * Comments
	 */
	mb();
	sig->cputime_expires.prof_exp = expires_to_cputime(prof_expires);
	sig->cputime_expires.virt_exp = expires_to_cputime(virt_expires);
	sig->cputime_expires.sched_exp = sched_expires;
	...
}
By the time the cputime_expires fields get updated at the end of
check_process_timers(), other threads in fastpath_timer_check()
should observe the value 0 for READ_ONCE(sig->cputimer.checking_timer).
In the case where threads in the fastpath don't observe the
WRITE_ONCE(checking_timer, 1) early enough, that's fine: they will just
(unnecessarily) go through the slowpath, which is what the current code
does anyway.
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-26 23:32 ` Jason Low
2015-08-27 4:52 ` Jason Low
@ 2015-08-27 12:53 ` Frederic Weisbecker
2015-08-27 20:29 ` Jason Low
1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 12:53 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton
On Wed, Aug 26, 2015 at 04:32:34PM -0700, Jason Low wrote:
> On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> > > It was found while running a database workload on large systems that
> > > significant time was spent trying to acquire the sighand lock.
> > >
> > > The issue was that whenever an itimer expired, many threads ended up
> > > simultaneously trying to send the signal. Most of the time, nothing
> > > happened after acquiring the sighand lock because another thread
> > > had already sent the signal and updated the "next expire" time. The
> > > fastpath_timer_check() didn't help much since the "next expire" time
> > > was updated later.
> > >
> > > This patch addresses this by having the thread_group_cputimer structure
> > > maintain a boolean to signify when a thread in the group is already
> > > checking for process wide timers, and adds extra logic in the fastpath
> > > to check the boolean.
> > >
> > > Signed-off-by: Jason Low <jason.low2@hp.com>
> > > ---
> > > include/linux/init_task.h | 1 +
> > > include/linux/sched.h | 3 +++
> > > kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
> > > 3 files changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > > index d0b380e..3350c77 100644
> > > --- a/include/linux/init_task.h
> > > +++ b/include/linux/init_task.h
> > > @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> > > .cputimer = { \
> > > .cputime_atomic = INIT_CPUTIME_ATOMIC, \
> > > .running = 0, \
> > > + .checking_timer = 0, \
> > > }, \
> > > INIT_PREV_CPUTIME(sig) \
> > > .cred_guard_mutex = \
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 119823d..a6c8334 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> > > * @cputime_atomic: atomic thread group interval timers.
> > > * @running: non-zero when there are timers running and
> > > * @cputime receives updates.
> > > + * @checking_timer: non-zero when a thread is in the process of
> > > + * checking for thread group timers.
> > > *
> > > * This structure contains the version of task_cputime, above, that is
> > > * used for thread group CPU timer calculations.
> > > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> > > struct thread_group_cputimer {
> > > struct task_cputime_atomic cputime_atomic;
> > > int running;
> > > + int checking_timer;
> >
> > How about a flag in the "running" field instead?
> >
> > 1) Space in signal_struct is not as important as in task_struct but it
> > still matters.
>
> George Spelvin suggested that we convert them to booleans which would
> make them take up 2 bytes.
>
> > 2) We already read the "running" field locklessly. Adding a new field like
> > checking_timer gets even more complicated. Ideally there should be at
> > least a paired memory barrier between both. Let's just simplify that
> > with a single field.
>
> hmmm, so having 1 "flag" where we access bits for the "running" and
> "checking_timer"?
Sure, like:
#define CPUTIMER_RUNNING  0x1
#define CPUTIMER_CHECKING 0x2

struct thread_group_cputimer {
	struct task_cputime_atomic cputime_atomic;
	int status;
};
So from cputimer_running() you just need to check:

	if (cputimer->status & CPUTIMER_RUNNING)

And from run_posix_cpu_timer() fast-path:

	if (cputimer->status == CPUTIMER_RUNNING)

which ignores the CPUTIMER_CHECKING case.
>
> > Now concerning the solution for your problem, I'm a bit uncomfortable with
> > lockless magics like this. When the thread sets checking_timer to 1, there is
> > no guarantee that the other threads in the process will see it "fast" enough
> > to avoid the slow path checks. Then there is also the risk that the threads
> > don't see "fast" enough that checking_timers has toggled to 0 and as a result
> > a timer may expire late. Now the lockless access of "running" already induces
> > such race. So if it really solves issues in practice, why not.
>
> Perhaps to be safer, we could use something like load_acquire() and
> store_release() for accessing both the ->running and ->checking_timer
> fields?
Well, it depends on what we want to order them against. If it's a single field
we don't need to order them together, at least.
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-26 23:45 ` Hideaki Kimura
@ 2015-08-27 13:18 ` Frederic Weisbecker
2015-08-27 14:47 ` Steven Rostedt
0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 13:18 UTC (permalink / raw)
To: Hideaki Kimura
Cc: Jason Low, Oleg Nesterov, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Thomas Gleixner, Paul E. McKenney, linux-kernel,
Linus Torvalds, Steven Rostedt, Rik van Riel, Scott J Norton
On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> I totally agree that this is not a perfect solution. If there are 10x more
> cores and sockets, just the atomic fetch_add might be too expensive.
>
> However, it's comparatively/realistically the best thing we can do without
> any drawbacks. We can't magically force all library developers to write the
> most scalable code always.
>
> My point is: this is a safety net, and a very effective one.
I mean the problem here is that a library uses an unscalable profiling feature,
unconditionally as soon as you load it without even initializing anything. And
this library is used in production.
At first sight, fixing that in the kernel is only a hack that just reduces
the symptoms a bit.
What is the technical issue that prevents fixing that in the library itself?
Posix timers can be attached anytime.
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-27 13:18 ` Frederic Weisbecker
@ 2015-08-27 14:47 ` Steven Rostedt
2015-08-27 15:09 ` Thomas Gleixner
0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2015-08-27 14:47 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Hideaki Kimura, Jason Low, Oleg Nesterov, Andrew Morton,
Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
linux-kernel, Linus Torvalds, Rik van Riel, Scott J Norton
On Thu, 27 Aug 2015 15:18:49 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> > I totally agree that this is not a perfect solution. If there are 10x more
> > cores and sockets, just the atomic fetch_add might be too expensive.
> >
> > However, it's comparatively/realistically the best thing we can do without
> > any drawbacks. We can't magically force all library developers to write the
> > most scalable code always.
> >
> > My point is: this is a safety net, and a very effective one.
>
> I mean the problem here is that a library uses an unscalable profiling feature,
> unconditionally as soon as you load it without even initializing anything. And
> this library is used in production.
>
> At first sight, fixing that in the kernel is only a hack that just reduces
> the symptoms a bit.
>
> What is the technical issue that prevents fixing that in the library itself?
> Posix timers can be attached anytime.
I'm curious what the downside of this patch set is. If we can fix a
problem that should be fixed in userspace, but does not harm the kernel
by doing so, is that bad? (an argument for kdbus? ;-)
As Hideaki noted, this could be a problem in other locations as well
that people have yet to find.
-- Steve
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-27 14:47 ` Steven Rostedt
@ 2015-08-27 15:09 ` Thomas Gleixner
2015-08-27 15:17 ` Frederic Weisbecker
0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2015-08-27 15:09 UTC (permalink / raw)
To: Steven Rostedt
Cc: Frederic Weisbecker, Hideaki Kimura, Jason Low, Oleg Nesterov,
Andrew Morton, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
linux-kernel, Linus Torvalds, Rik van Riel, Scott J Norton
On Thu, 27 Aug 2015, Steven Rostedt wrote:
> On Thu, 27 Aug 2015 15:18:49 +0200
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> > > I totally agree that this is not a perfect solution. If there are 10x more
> > > cores and sockets, just the atomic fetch_add might be too expensive.
> > >
> > > However, it's comparatively/realistically the best thing we can do without
> > > any drawbacks. We can't magically force all library developers to write the
> > > most scalable code always.
> > >
> > > My point is: this is a safety net, and a very effective one.
> >
> > I mean the problem here is that a library uses an unscalable profiling feature,
> > unconditionally as soon as you load it without even initializing anything. And
> > this library is used in production.
> >
> > > At first sight, fixing that in the kernel is only a hack that just reduces
> > > the symptoms a bit.
> > >
> > > What is the technical issue that prevents fixing that in the library itself?
> > Posix timers can be attached anytime.
>
> I'm curious what the downside of this patch set is. If we can fix a
> problem that should be fixed in userspace, but does not harm the kernel
> by doing so, is that bad? (an argument for kdbus? ;-)
The patches are not fixing a problem which should be fixed in user
space. They merely avoid lock contention which happens to be prominent
with that particular library. But avoiding lock contention even for 2
threads is a worthwhile exercise if it does not hurt otherwise. And I
can't see anything that hurts with these patches.
Thanks,
tglx
* Re: [PATCH 0/3] timer: Improve itimers scalability
2015-08-27 15:09 ` Thomas Gleixner
@ 2015-08-27 15:17 ` Frederic Weisbecker
0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 15:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Steven Rostedt, Hideaki Kimura, Jason Low, Oleg Nesterov,
Andrew Morton, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
linux-kernel, Linus Torvalds, Rik van Riel, Scott J Norton
On Thu, Aug 27, 2015 at 05:09:21PM +0200, Thomas Gleixner wrote:
> On Thu, 27 Aug 2015, Steven Rostedt wrote:
> > On Thu, 27 Aug 2015 15:18:49 +0200
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > > On Wed, Aug 26, 2015 at 04:45:44PM -0700, Hideaki Kimura wrote:
> > > > I totally agree that this is not a perfect solution. If there are 10x more
> > > > cores and sockets, just the atomic fetch_add might be too expensive.
> > > >
> > > > However, it's comparatively/realistically the best thing we can do without
> > > > any drawbacks. We can't magically force all library developers to write the
> > > > most scalable code always.
> > > >
> > > > My point is: this is a safety net, and a very effective one.
> > >
> > > I mean the problem here is that a library uses an unscalable profiling feature,
> > > unconditionally as soon as you load it without even initializing anything. And
> > > this library is used in production.
> > >
> > > At first sight, fixing that in the kernel is only a hack that just reduces
> > > the symptoms a bit.
> > >
> > > What is the technical issue that prevents fixing that in the library itself?
> > > Posix timers can be attached anytime.
> >
> > I'm curious what the downside of this patch set is. If we can fix a
> > problem that should be fixed in userspace, but does not harm the kernel
> > by doing so, is that bad? (an argument for kdbus? ;-)
>
> The patches are not fixing a problem which should be fixed in user
> space. They merely avoid lock contention which happens to be prominent
> with that particular library. But avoiding lock contention even for 2
> threads is a worthwhile exercise if it does not hurt otherwise. And I
> can't see anything that hurts with these patches.
Sure, it shouldn't really hurt anyway, since the presence of elapsing timers
itself is checked locklessly.
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-27 12:53 ` Frederic Weisbecker
@ 2015-08-27 20:29 ` Jason Low
2015-08-27 21:12 ` Frederic Weisbecker
0 siblings, 1 reply; 29+ messages in thread
From: Jason Low @ 2015-08-27 20:29 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton, jason.low2
On Thu, 2015-08-27 at 14:53 +0200, Frederic Weisbecker wrote:
> On Wed, Aug 26, 2015 at 04:32:34PM -0700, Jason Low wrote:
> > On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote:
> > > On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> > > > It was found while running a database workload on large systems that
> > > > significant time was spent trying to acquire the sighand lock.
> > > >
> > > > The issue was that whenever an itimer expired, many threads ended up
> > > > simultaneously trying to send the signal. Most of the time, nothing
> > > > happened after acquiring the sighand lock because another thread
> > > > had already sent the signal and updated the "next expire" time. The
> > > > fastpath_timer_check() didn't help much since the "next expire" time
> > > > was updated later.
> > > >
> > > > This patch addresses this by having the thread_group_cputimer structure
> > > > maintain a boolean to signify when a thread in the group is already
> > > > checking for process wide timers, and adds extra logic in the fastpath
> > > > to check the boolean.
> > > >
> > > > Signed-off-by: Jason Low <jason.low2@hp.com>
> > > > ---
> > > > include/linux/init_task.h | 1 +
> > > > include/linux/sched.h | 3 +++
> > > > kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
> > > > 3 files changed, 21 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > > > index d0b380e..3350c77 100644
> > > > --- a/include/linux/init_task.h
> > > > +++ b/include/linux/init_task.h
> > > > @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
> > > > .cputimer = { \
> > > > .cputime_atomic = INIT_CPUTIME_ATOMIC, \
> > > > .running = 0, \
> > > > + .checking_timer = 0, \
> > > > }, \
> > > > INIT_PREV_CPUTIME(sig) \
> > > > .cred_guard_mutex = \
> > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > index 119823d..a6c8334 100644
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -619,6 +619,8 @@ struct task_cputime_atomic {
> > > > * @cputime_atomic: atomic thread group interval timers.
> > > > * @running: non-zero when there are timers running and
> > > > * @cputime receives updates.
> > > > + * @checking_timer: non-zero when a thread is in the process of
> > > > + * checking for thread group timers.
> > > > *
> > > > * This structure contains the version of task_cputime, above, that is
> > > > * used for thread group CPU timer calculations.
> > > > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> > > > struct thread_group_cputimer {
> > > > struct task_cputime_atomic cputime_atomic;
> > > > int running;
> > > > + int checking_timer;
> > >
> > > How about a flag in the "running" field instead?
> > >
> > > 1) Space in signal_struct is not as important as in task_struct but it
> > > still matters.
> >
> > George Spelvin suggested that we convert them to booleans which would
> > make them take up 2 bytes.
> >
> > > 2) We already read the "running" field locklessly. Adding a new field like
> > > checking_timer gets even more complicated. Ideally there should be at
> > > least a paired memory barrier between both. Let's just simplify that
> > > with a single field.
> >
> > hmmm, so having 1 "flag" where we access bits for the "running" and
> > "checking_timer"?
>
> Sure, like:
>
> #define CPUTIMER_RUNNING 0x1
> #define CPUTIMER_CHECKING 0x2
>
> struct thread_group_cputimer {
> struct task_cputime_atomic cputime_atomic;
> int status;
> };
>
> So from cputimer_running() you just need to check:
>
> if (cputimer->status & CPUTIMER_RUNNING)
>
> And from run_posix_cpu_timer() fast-path:
>
> if (cputimer->status == CPUTIMER_RUNNING)
>
> so that ignores CPUTIMER_CHECKING case.
Right, having just 1 "status" field can simplify things a bit. The
(cputimer->status == CPUTIMER_RUNNING) check does appear misleading
though, since we're technically not only checking whether the "cputimer
is running".
Maybe something like:
	int status = cputimer->status;

	if ((status & CPUTIMER_RUNNING) && !(status & CPUTIMER_CHECKING))
makes it more obvious what's going on here.
* Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
2015-08-27 20:29 ` Jason Low
@ 2015-08-27 21:12 ` Frederic Weisbecker
0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2015-08-27 21:12 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Linus Torvalds, Davidlohr Bueso,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton
On Thu, Aug 27, 2015 at 01:29:50PM -0700, Jason Low wrote:
> On Thu, 2015-08-27 at 14:53 +0200, Frederic Weisbecker wrote:
> > Sure, like:
> >
> > #define CPUTIMER_RUNNING 0x1
> > #define CPUTIMER_CHECKING 0x2
> >
> > struct thread_group_cputimer {
> > struct task_cputime_atomic cputime_atomic;
> > int status;
> > };
> >
> > So from cputimer_running() you just need to check:
> >
> > if (cputimer->status & CPUTIMER_RUNNING)
> >
> > And from run_posix_cpu_timer() fast-path:
> >
> > if (cputimer->status == CPUTIMER_RUNNING)
> >
> > so that ignores CPUTIMER_CHECKING case.
>
> Right, having just 1 "status" field can simplify things a bit. The
> (cputimer->status == CPUTIMER_RUNNING) check does appear misleading
> though, since we're technically not only checking whether the "cputimer
> is running".
>
> Maybe something like:
>
> int status = cputimer->status;
> if ((status & CPUTIMER_RUNNING) && !(status & CPUTIMER_CHECKING))
>
> makes it more obvious what's going on here.
The result is the same but it may clarify the code indeed.
Thanks.
* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
2015-08-26 3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
2015-08-26 21:57 ` Frederic Weisbecker
@ 2015-08-31 15:15 ` Davidlohr Bueso
2015-08-31 19:40 ` Jason Low
1 sibling, 1 reply; 29+ messages in thread
From: Davidlohr Bueso @ 2015-08-31 15:15 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Frederic Weisbecker,
Linus Torvalds, Steven Rostedt, Andrew Morton, Terry Rudd,
Rik van Riel, Scott J Norton
On Tue, 2015-08-25 at 20:17 -0700, Jason Low wrote:
> In fastpath_timer_check(), the task_cputime() function is always
> called to compute the utime and stime values. However, this is not
> necessary if there are no per-thread timers to check for. This patch
> modifies the code such that we compute the task_cputime values only
> when there are per-thread timers set.
>
> Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
2015-08-31 15:15 ` Davidlohr Bueso
@ 2015-08-31 19:40 ` Jason Low
0 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-31 19:40 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
Paul E. McKenney, linux-kernel, Frederic Weisbecker,
Steven Rostedt, Andrew Morton, Terry Rudd, Rik van Riel,
Scott J Norton, jason.low2
On Mon, 2015-08-31 at 08:15 -0700, Davidlohr Bueso wrote:
> On Tue, 2015-08-25 at 20:17 -0700, Jason Low wrote:
> > In fastpath_timer_check(), the task_cputime() function is always
> > called to compute the utime and stime values. However, this is not
> > necessary if there are no per-thread timers to check for. This patch
> > modifies the code such that we compute the task_cputime values only
> > when there are per-thread timers set.
> >
> > Signed-off-by: Jason Low <jason.low2@hp.com>
>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Thanks David and Frederic.
I want to mention that this patch has been slightly changed. As
suggested by George, those extra utime, stime local variables are not
needed anymore, and we should be able to directly call:
task_cputime(tsk, &task_sample.utime, &task_sample.stime);
---
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 892e3da..3a8c5b4 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1117,16 +1117,13 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
static inline int fastpath_timer_check(struct task_struct *tsk)
{
struct signal_struct *sig;
- cputime_t utime, stime;
-
- task_cputime(tsk, &utime, &stime);
if (!task_cputime_zero(&tsk->cputime_expires)) {
- struct task_cputime task_sample = {
- .utime = utime,
- .stime = stime,
- .sum_exec_runtime = tsk->se.sum_exec_runtime
- };
+ struct task_cputime task_sample;
+
+ task_cputime(tsk, &task_sample.utime, &task_sample.stime);
+
+ task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
return 1;
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
2015-08-26 16:57 [PATCH 1/3] timer: Optimize fastpath_timer_check() George Spelvin
@ 2015-08-26 17:31 ` Jason Low
0 siblings, 0 replies; 29+ messages in thread
From: Jason Low @ 2015-08-26 17:31 UTC (permalink / raw)
To: George Spelvin; +Cc: akpm, linux-kernel, jason.low2
On Wed, 2015-08-26 at 12:57 -0400, George Spelvin wrote:
> > if (!task_cputime_zero(&tsk->cputime_expires)) {
> >+ struct task_cputime task_sample;
> >+ cputime_t utime, stime;
> >+
> >+ task_cputime(tsk, &utime, &stime);
> >+ task_sample.utime = utime;
> >+ task_sample.stime = stime;
> >+ task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
>
> Er, task_sample.[us]time are already the correct types.
> Why are the local variables necessary? How about:
>
> if (!task_cputime_zero(&tsk->cputime_expires)) {
> + struct task_cputime task_sample;
> +
> + task_cputime(tsk, &task_sample.utime, &task_sample.stime);
> + task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
Yes, good point. Now that we're moving the task_cputime() call to after
the task_sample structure is declared, the utime and stime local
variables are not required anymore.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()
@ 2015-08-26 16:57 George Spelvin
2015-08-26 17:31 ` Jason Low
0 siblings, 1 reply; 29+ messages in thread
From: George Spelvin @ 2015-08-26 16:57 UTC (permalink / raw)
To: jason.low2; +Cc: akpm, linux, linux-kernel
> if (!task_cputime_zero(&tsk->cputime_expires)) {
>+ struct task_cputime task_sample;
>+ cputime_t utime, stime;
>+
>+ task_cputime(tsk, &utime, &stime);
>+ task_sample.utime = utime;
>+ task_sample.stime = stime;
>+ task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
Er, task_sample.[us]time are already the correct types.
Why are the local variables necessary? How about:
if (!task_cputime_zero(&tsk->cputime_expires)) {
+ struct task_cputime task_sample;
+
+ task_cputime(tsk, &task_sample.utime, &task_sample.stime);
+ task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2015-08-31 19:44 UTC | newest]
Thread overview: 29+ messages
2015-08-26 3:17 [PATCH 0/3] timer: Improve itimers scalability Jason Low
2015-08-26 3:17 ` [PATCH 1/3] timer: Optimize fastpath_timer_check() Jason Low
2015-08-26 21:57 ` Frederic Weisbecker
2015-08-31 15:15 ` Davidlohr Bueso
2015-08-31 19:40 ` Jason Low
2015-08-26 3:17 ` [PATCH 2/3] timer: Check thread timers only when there are active thread timers Jason Low
2015-08-26 3:17 ` [PATCH 3/3] timer: Reduce unnecessary sighand lock contention Jason Low
2015-08-26 17:53 ` Linus Torvalds
2015-08-26 22:31 ` Frederic Weisbecker
2015-08-26 22:57 ` Jason Low
2015-08-26 22:56 ` Frederic Weisbecker
2015-08-26 23:32 ` Jason Low
2015-08-27 4:52 ` Jason Low
2015-08-27 12:53 ` Frederic Weisbecker
2015-08-27 20:29 ` Jason Low
2015-08-27 21:12 ` Frederic Weisbecker
2015-08-26 3:27 ` [PATCH 0/3] timer: Improve itimers scalability Andrew Morton
2015-08-26 16:33 ` Jason Low
2015-08-26 17:08 ` Oleg Nesterov
2015-08-26 22:07 ` Jason Low
2015-08-26 22:53 ` Hideaki Kimura
2015-08-26 23:13 ` Frederic Weisbecker
2015-08-26 23:45 ` Hideaki Kimura
2015-08-27 13:18 ` Frederic Weisbecker
2015-08-27 14:47 ` Steven Rostedt
2015-08-27 15:09 ` Thomas Gleixner
2015-08-27 15:17 ` Frederic Weisbecker
2015-08-26 16:57 [PATCH 1/3] timer: Optimize fastpath_timer_check() George Spelvin
2015-08-26 17:31 ` Jason Low