All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fix the itimer regression (BZ 12618)
@ 2009-02-05 11:24 Peter Zijlstra
  2009-02-05 11:24 ` [PATCH 1/2] signal: re-add dead task accumulation stats Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-05 11:24 UTC (permalink / raw)
  To: mingo, tglx, oleg; +Cc: linux-kernel, yanmin_zhang, seto.hidetoshi

This should hopefully address all the itimer borkage.

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] signal: re-add dead task accumulation stats.
  2009-02-05 11:24 [PATCH 0/2] fix the itimer regression (BZ 12618) Peter Zijlstra
@ 2009-02-05 11:24 ` Peter Zijlstra
  2009-02-05 11:24 ` [PATCH 2/2] timers: split process wide cpu clocks/timers Peter Zijlstra
  2009-02-05 12:06 ` [PATCH 0/2] fix the itimer regression (BZ 12618) Ingo Molnar
  2 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-05 11:24 UTC (permalink / raw)
  To: mingo, tglx, oleg
  Cc: linux-kernel, yanmin_zhang, seto.hidetoshi, Peter Zijlstra

[-- Attachment #1: sig-cummulative-times.patch --]
[-- Type: text/plain, Size: 4560 bytes --]

We're going to split the process wide cpu accounting into two parts:

 - clocks; which can take all the time they want since they run
           from user context.

 - timers; which need constant time tracing but can affort the overhead
           because they're default off -- and rare.

The clock readout will go back to a full sum of the thread group, for this
we need to re-add the exit stats that were removed in the initial itimer
rework (f06febc9: timers: fix itimer/many thread hang).

Furthermore, since that full sum can be rather slow for large thread groups
and we have the complete dead task stats, revert the do_notify_parent time
computation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h |   10 +++++++++-
 kernel/exit.c         |    3 +++
 kernel/fork.c         |    3 ++-
 3 files changed, 14 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -567,7 +567,7 @@ struct signal_struct {
 	 * Live threads maintain their own counters and add to these
 	 * in __exit_signal, except for the group leader.
 	 */
-	cputime_t cutime, cstime;
+	cputime_t utime, stime, cutime, cstime;
 	cputime_t gtime;
 	cputime_t cgtime;
 	unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
@@ -576,6 +576,14 @@ struct signal_struct {
 	struct task_io_accounting ioac;
 
 	/*
+	 * Cumulative ns of schedule CPU time fo dead threads in the
+	 * group, not including a zombie group leader, (This only differs
+	 * from jiffies_to_ns(utime + stime) if sched_clock uses something
+	 * other than jiffies.)
+	 */
+	unsigned long long sum_sched_runtime;
+
+	/*
 	 * We don't bother to synchronize most readers of this at all,
 	 * because there is no reader checking a limit that actually needs
 	 * to get both rlim_cur and rlim_max atomically, and either one
Index: linux-2.6/kernel/exit.c
===================================================================
--- linux-2.6.orig/kernel/exit.c
+++ linux-2.6/kernel/exit.c
@@ -118,6 +118,8 @@ static void __exit_signal(struct task_st
 		 * We won't ever get here for the group leader, since it
 		 * will have been the last reference on the signal_struct.
 		 */
+		sig->utime = cputime_add(sig->utime, task_utime(tsk));
+		sig->stime = cputime_add(sig->stime, task_stime(tsk));
 		sig->gtime = cputime_add(sig->gtime, task_gtime(tsk));
 		sig->min_flt += tsk->min_flt;
 		sig->maj_flt += tsk->maj_flt;
@@ -126,6 +128,7 @@ static void __exit_signal(struct task_st
 		sig->inblock += task_io_get_inblock(tsk);
 		sig->oublock += task_io_get_oublock(tsk);
 		task_io_accounting_add(&sig->ioac, &tsk->ioac);
+		sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 		sig = NULL; /* Marker for below. */
 	}
 
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -856,13 +856,14 @@ static int copy_signal(unsigned long clo
 	sig->tty_old_pgrp = NULL;
 	sig->tty = NULL;
 
-	sig->cutime = sig->cstime = cputime_zero;
+	sig->utime = sig->stime = sig->cutime = sig->cstime = cputime_zero;
 	sig->gtime = cputime_zero;
 	sig->cgtime = cputime_zero;
 	sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
 	sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
 	sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
 	task_io_accounting_init(&sig->ioac);
+	sig->sum_sched_runtime = 0;
 	taskstats_tgid_init(sig);
 
 	task_lock(current->group_leader);
Index: linux-2.6/kernel/signal.c
===================================================================
--- linux-2.6.orig/kernel/signal.c
+++ linux-2.6/kernel/signal.c
@@ -1367,7 +1367,6 @@ int do_notify_parent(struct task_struct 
 	struct siginfo info;
 	unsigned long flags;
 	struct sighand_struct *psig;
-	struct task_cputime cputime;
 	int ret = sig;
 
 	BUG_ON(sig == -1);
@@ -1397,9 +1396,10 @@ int do_notify_parent(struct task_struct 
 	info.si_uid = __task_cred(tsk)->uid;
 	rcu_read_unlock();
 
-	thread_group_cputime(tsk, &cputime);
-	info.si_utime = cputime_to_jiffies(cputime.utime);
-	info.si_stime = cputime_to_jiffies(cputime.stime);
+	info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
+				tsk->signal->utime));
+	info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
+				tsk->signal->stime));
 
 	info.si_status = tsk->exit_code & 0x7f;
 	if (tsk->exit_code & 0x80)

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/2] timers: split process wide cpu clocks/timers
  2009-02-05 11:24 [PATCH 0/2] fix the itimer regression (BZ 12618) Peter Zijlstra
  2009-02-05 11:24 ` [PATCH 1/2] signal: re-add dead task accumulation stats Peter Zijlstra
@ 2009-02-05 11:24 ` Peter Zijlstra
  2009-02-05 21:30   ` Oleg Nesterov
  2009-02-05 12:06 ` [PATCH 0/2] fix the itimer regression (BZ 12618) Ingo Molnar
  2 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-05 11:24 UTC (permalink / raw)
  To: mingo, tglx, oleg
  Cc: linux-kernel, yanmin_zhang, seto.hidetoshi, Peter Zijlstra

[-- Attachment #1: thread_group_cputime.patch --]
[-- Type: text/plain, Size: 13101 bytes --]

Change the process wide cpu timers/clocks so that we:

 1) don't mess up the kernel with too many threads,
 2) don't have a per-cpu allocation for each process,
 3) have no impact when not used.

In order to accomplish this we're going to split it into two parts:

 - clocks; which can take all the time they want since they run
           from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)

 - timers; which need constant time sampling but since they're
           explicity used, the user can pay the overhead.

The clock readout will go back to a full sum of the thread group, while the
timers will run of a global 'clock' that only runs when needed, so only
programs that make use of the facility pay the price.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/init_task.h |   11 ++---
 include/linux/sched.h     |   54 +++++++++++++++-----------
 kernel/itimer.c           |    4 -
 kernel/posix-cpu-timers.c |   95 ++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched_stats.h      |   45 ++++++++++++---------
 5 files changed, 155 insertions(+), 54 deletions(-)

Index: linux-2.6/include/linux/init_task.h
===================================================================
--- linux-2.6.orig/include/linux/init_task.h
+++ linux-2.6/include/linux/init_task.h
@@ -48,12 +48,11 @@ extern struct fs_struct init_fs;
 	.posix_timers	 = LIST_HEAD_INIT(sig.posix_timers),		\
 	.cpu_timers	= INIT_CPU_TIMERS(sig.cpu_timers),		\
 	.rlim		= INIT_RLIMITS,					\
-	.cputime	= { .totals = {					\
-		.utime = cputime_zero,					\
-		.stime = cputime_zero,					\
-		.sum_exec_runtime = 0,					\
-		.lock = __SPIN_LOCK_UNLOCKED(sig.cputime.totals.lock),	\
-	}, },								\
+	.cputimer	= { 						\
+		.cputime = INIT_CPUTIME,				\
+		.running = 0,						\
+		.lock = __SPIN_LOCK_UNLOCKED(sig.cputimer.lock),	\
+	},								\
 }
 
 extern struct nsproxy init_nsproxy;
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -451,7 +451,6 @@ struct pacct_struct {
  * @utime:		time spent in user mode, in &cputime_t units
  * @stime:		time spent in kernel mode, in &cputime_t units
  * @sum_exec_runtime:	total time spent on the CPU, in nanoseconds
- * @lock:		lock for fields in this struct
  *
  * This structure groups together three kinds of CPU time that are
  * tracked for threads and thread groups.  Most things considering
@@ -462,23 +461,33 @@ struct task_cputime {
 	cputime_t utime;
 	cputime_t stime;
 	unsigned long long sum_exec_runtime;
-	spinlock_t lock;
 };
 /* Alternate field names when used to cache expirations. */
 #define prof_exp	stime
 #define virt_exp	utime
 #define sched_exp	sum_exec_runtime
 
+#define INIT_CPUTIME	\
+	(struct task_cputime) {					\
+		.utime = cputime_zero,				\
+		.stime = cputime_zero,				\
+		.sum_exec_runtime = 0,				\
+	}
+
 /**
- * struct thread_group_cputime - thread group interval timer counts
- * @totals:		thread group interval timers; substructure for
- *			uniprocessor kernel, per-cpu for SMP kernel.
+ * struct thread_group_cputimer - thread group interval timer counts
+ * @cputime:		thread group interval timers.
+ * @running:		non-zero when there are timers running and
+ * 			@cputime receives updates.
+ * @lock:		lock for fields in this struct.
  *
  * This structure contains the version of task_cputime, above, that is
- * used for thread group CPU clock calculations.
+ * used for thread group CPU timer calculations.
  */
-struct thread_group_cputime {
-	struct task_cputime totals;
+struct thread_group_cputimer {
+	struct task_cputime cputime;
+	int running;
+	spinlock_t lock;
 };
 
 /*
@@ -527,10 +536,10 @@ struct signal_struct {
 	cputime_t it_prof_incr, it_virt_incr;
 
 	/*
-	 * Thread group totals for process CPU clocks.
-	 * See thread_group_cputime(), et al, for details.
+	 * Thread group totals for process CPU timers.
+	 * See thread_group_cputimer(), et al, for details.
 	 */
-	struct thread_group_cputime cputime;
+	struct thread_group_cputimer cputimer;
 
 	/* Earliest-expiration cache. */
 	struct task_cputime cputime_expires;
@@ -2218,27 +2227,26 @@ static inline int spin_needbreak(spinloc
 /*
  * Thread group CPU time accounting.
  */
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
 
 static inline
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct task_cputime *totals = &tsk->signal->cputime.totals;
+	struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
 	unsigned long flags;
 
-	spin_lock_irqsave(&totals->lock, flags);
-	*times = *totals;
-	spin_unlock_irqrestore(&totals->lock, flags);
+	WARN_ON(!cputimer->running);
+
+	spin_lock_irqsave(&cputimer->lock, flags);
+	*times = cputimer->cputime;
+	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
-	sig->cputime.totals = (struct task_cputime){
-		.utime = cputime_zero,
-		.stime = cputime_zero,
-		.sum_exec_runtime = 0,
-	};
-
-	spin_lock_init(&sig->cputime.totals.lock);
+	sig->cputimer.cputime = INIT_CPUTIME;
+	spin_lock_init(&sig->cputimer.lock);
+	sig->cputimer.running = 0;
 }
 
 static inline void thread_group_cputime_free(struct signal_struct *sig)
Index: linux-2.6/kernel/itimer.c
===================================================================
--- linux-2.6.orig/kernel/itimer.c
+++ linux-2.6/kernel/itimer.c
@@ -62,7 +62,7 @@ int do_getitimer(int which, struct itime
 			struct task_cputime cputime;
 			cputime_t utime;
 
-			thread_group_cputime(tsk, &cputime);
+			thread_group_cputimer(tsk, &cputime);
 			utime = cputime.utime;
 			if (cputime_le(cval, utime)) { /* about to fire */
 				cval = jiffies_to_cputime(1);
@@ -82,7 +82,7 @@ int do_getitimer(int which, struct itime
 			struct task_cputime times;
 			cputime_t ptime;
 
-			thread_group_cputime(tsk, &times);
+			thread_group_cputimer(tsk, &times);
 			ptime = cputime_add(times.utime, times.stime);
 			if (cputime_le(cval, ptime)) { /* about to fire */
 				cval = jiffies_to_cputime(1);
Index: linux-2.6/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.orig/kernel/posix-cpu-timers.c
+++ linux-2.6/kernel/posix-cpu-timers.c
@@ -230,6 +230,37 @@ static int cpu_clock_sample(const clocki
 	return 0;
 }
 
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+{
+	struct sighand_struct *sighand;
+	struct signal_struct *sig;
+	struct task_struct *t;
+
+	*times = INIT_CPUTIME;
+
+	rcu_read_lock();
+	sighand = rcu_dereference(tsk->sighand);
+	if (!sighand)
+		goto out;
+
+	sig = tsk->signal;
+
+	t = tsk;
+	do {
+		times->utime = cputime_add(times->utime, t->utime);
+		times->stime = cputime_add(times->stime, t->stime);
+		times->sum_exec_runtime += t->se.sum_exec_runtime;
+
+		t = next_thread(t);
+	} while (t != tsk);
+
+	times->utime = cputime_add(times->utime, sig->utime);
+	times->stime = cputime_add(times->stime, sig->stime);
+	times->sum_exec_runtime += sig->sum_sched_runtime;
+out:
+	rcu_read_unlock();
+}
+
 /*
  * Sample a process (thread group) clock for the given group_leader task.
  * Must be called with tasklist_lock held for reading.
@@ -476,6 +507,29 @@ static void clear_dead_task(struct k_iti
 }
 
 /*
+ * Enable the process wide cpu timer accounting.
+ *
+ * serialized using ->sighand->siglock
+ */
+static void start_process_timers(struct task_struct *tsk)
+{
+	tsk->signal->cputimer.running = 1;
+	barrier();
+}
+
+/*
+ * Release the process wide timer accounting -- timer stops ticking when
+ * nobody cares about it.
+ *
+ * serialized using ->sighand->siglock
+ */
+static void stop_process_timers(struct task_struct *tsk)
+{
+	tsk->signal->cputimer.running = 0;
+	barrier();
+}
+
+/*
  * Insert the timer on the appropriate list before any timers that
  * expire later.  This must be called with the tasklist_lock held
  * for reading, and interrupts disabled.
@@ -495,6 +549,9 @@ static void arm_timer(struct k_itimer *t
 	BUG_ON(!irqs_disabled());
 	spin_lock(&p->sighand->siglock);
 
+	if (!CPUCLOCK_PERTHREAD(timer->it_clock))
+		start_process_timers(p);
+
 	listpos = head;
 	if (CPUCLOCK_WHICH(timer->it_clock) == CPUCLOCK_SCHED) {
 		list_for_each_entry(next, head, entry) {
@@ -987,13 +1044,15 @@ static void check_process_timers(struct 
 	    sig->rlim[RLIMIT_CPU].rlim_cur == RLIM_INFINITY &&
 	    list_empty(&timers[CPUCLOCK_VIRT]) &&
 	    cputime_eq(sig->it_virt_expires, cputime_zero) &&
-	    list_empty(&timers[CPUCLOCK_SCHED]))
+	    list_empty(&timers[CPUCLOCK_SCHED])) {
+		stop_process_timers(tsk);
 		return;
+	}
 
 	/*
 	 * Collect the current process totals.
 	 */
-	thread_group_cputime(tsk, &cputime);
+	thread_group_cputimer(tsk, &cputime);
 	utime = cputime.utime;
 	ptime = cputime_add(utime, cputime.stime);
 	sum_sched_runtime = cputime.sum_exec_runtime;
@@ -1259,7 +1318,7 @@ static inline int fastpath_timer_check(s
 	if (!task_cputime_zero(&sig->cputime_expires)) {
 		struct task_cputime group_sample;
 
-		thread_group_cputime(tsk, &group_sample);
+		thread_group_cputimer(tsk, &group_sample);
 		if (task_cputime_expired(&group_sample, &sig->cputime_expires))
 			return 1;
 	}
@@ -1329,6 +1388,33 @@ void run_posix_cpu_timers(struct task_st
 }
 
 /*
+ * Sample a process (thread group) timer for the given group_leader task.
+ * Must be called with tasklist_lock held for reading.
+ */
+static int cpu_timer_sample_group(const clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	struct task_cputime cputime;
+
+	thread_group_cputimer(p, &cputime);
+	switch (CPUCLOCK_WHICH(which_clock)) {
+	default:
+		return -EINVAL;
+	case CPUCLOCK_PROF:
+		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
+		break;
+	case CPUCLOCK_VIRT:
+		cpu->cpu = cputime.utime;
+		break;
+	case CPUCLOCK_SCHED:
+		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+		break;
+	}
+	return 0;
+}
+
+/*
  * Set one of the process-wide special case CPU timers.
  * The tsk->sighand->siglock must be held by the caller.
  * The *newval argument is relative and we update it to be absolute, *oldval
@@ -1341,7 +1427,8 @@ void set_process_cpu_timer(struct task_s
 	struct list_head *head;
 
 	BUG_ON(clock_idx == CPUCLOCK_SCHED);
-	cpu_clock_sample_group(clock_idx, tsk, &now);
+	start_process_timers(tsk);
+	cpu_timer_sample_group(clock_idx, tsk, &now);
 
 	if (oldval) {
 		if (!cputime_eq(*oldval, cputime_zero)) {
Index: linux-2.6/kernel/sched_stats.h
===================================================================
--- linux-2.6.orig/kernel/sched_stats.h
+++ linux-2.6/kernel/sched_stats.h
@@ -296,19 +296,21 @@ sched_info_switch(struct task_struct *pr
 static inline void account_group_user_time(struct task_struct *tsk,
 					   cputime_t cputime)
 {
-	struct task_cputime *times;
-	struct signal_struct *sig;
+	struct thread_group_cputimer *cputimer;
 
 	/* tsk == current, ensure it is safe to use ->signal */
 	if (unlikely(tsk->exit_state))
 		return;
 
-	sig = tsk->signal;
-	times = &sig->cputime.totals;
+	cputimer = &tsk->signal->cputimer;
 
-	spin_lock(&times->lock);
-	times->utime = cputime_add(times->utime, cputime);
-	spin_unlock(&times->lock);
+	if (!cputimer->running)
+		return;
+
+	spin_lock(&cputimer->lock);
+	cputimer->cputime.utime =
+		cputime_add(cputimer->cputime.utime, cputime);
+	spin_unlock(&cputimer->lock);
 }
 
 /**
@@ -324,19 +326,21 @@ static inline void account_group_user_ti
 static inline void account_group_system_time(struct task_struct *tsk,
 					     cputime_t cputime)
 {
-	struct task_cputime *times;
-	struct signal_struct *sig;
+	struct thread_group_cputimer *cputimer;
 
 	/* tsk == current, ensure it is safe to use ->signal */
 	if (unlikely(tsk->exit_state))
 		return;
 
-	sig = tsk->signal;
-	times = &sig->cputime.totals;
+	cputimer = &tsk->signal->cputimer;
+
+	if (!cputimer->running)
+		return;
 
-	spin_lock(&times->lock);
-	times->stime = cputime_add(times->stime, cputime);
-	spin_unlock(&times->lock);
+	spin_lock(&cputimer->lock);
+	cputimer->cputime.stime =
+		cputime_add(cputimer->cputime.stime, cputime);
+	spin_unlock(&cputimer->lock);
 }
 
 /**
@@ -352,7 +356,7 @@ static inline void account_group_system_
 static inline void account_group_exec_runtime(struct task_struct *tsk,
 					      unsigned long long ns)
 {
-	struct task_cputime *times;
+	struct thread_group_cputimer *cputimer;
 	struct signal_struct *sig;
 
 	sig = tsk->signal;
@@ -361,9 +365,12 @@ static inline void account_group_exec_ru
 	if (unlikely(!sig))
 		return;
 
-	times = &sig->cputime.totals;
+	cputimer = &sig->cputimer;
+
+	if (!cputimer->running)
+		return;
 
-	spin_lock(&times->lock);
-	times->sum_exec_runtime += ns;
-	spin_unlock(&times->lock);
+	spin_lock(&cputimer->lock);
+	cputimer->cputime.sum_exec_runtime += ns;
+	spin_unlock(&cputimer->lock);
 }

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-05 11:24 [PATCH 0/2] fix the itimer regression (BZ 12618) Peter Zijlstra
  2009-02-05 11:24 ` [PATCH 1/2] signal: re-add dead task accumulation stats Peter Zijlstra
  2009-02-05 11:24 ` [PATCH 2/2] timers: split process wide cpu clocks/timers Peter Zijlstra
@ 2009-02-05 12:06 ` Ingo Molnar
  2009-02-06  4:51   ` Zhang, Yanmin
  2009-02-10  2:48   ` Lin Ming
  2 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-02-05 12:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, oleg, linux-kernel, yanmin_zhang, seto.hidetoshi


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> This should hopefully address all the itimer borkage.

Applied to tip:timers/urgent, thanks Peter!

Yanmin: could you check hacbench_pth with latest tip/master, do
these fixes resolve that 3% regression you reported?

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] timers: split process wide cpu clocks/timers
  2009-02-05 11:24 ` [PATCH 2/2] timers: split process wide cpu clocks/timers Peter Zijlstra
@ 2009-02-05 21:30   ` Oleg Nesterov
  2009-02-05 22:20     ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2009-02-05 21:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, yanmin_zhang, seto.hidetoshi, Roland McGrath

On 02/05, Peter Zijlstra wrote:
>
> Change the process wide cpu timers/clocks so that we:
>
>  1) don't mess up the kernel with too many threads,
>  2) don't have a per-cpu allocation for each process,
>  3) have no impact when not used.
>
> In order to accomplish this we're going to split it into two parts:
>
>  - clocks; which can take all the time they want since they run
>            from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)
>
>  - timers; which need constant time sampling but since they're
>            explicity used, the user can pay the overhead.
>
> The clock readout will go back to a full sum of the thread group, while the
> timers will run of a global 'clock' that only runs when needed, so only
> programs that make use of the facility pay the price.

Ah, personally I think this is a very nice idea!

A couple of minor questions, before I try to read this patch more...

>  static inline
> -void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> +void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)

I know, it is silly to complain about the naming, but can't resist.

Now we have both thread_group_cputime() and thread_group_cputimer().
But it is not possible to distinguish them while reading the code.
For example, looks like posix_cpu_timers_exit_group() needs
thread_group_cputimer, not thread_group_cputime, no? But then we can
hit the WARN_ON(!cputimer->running). Afaics.

Hmm... Can't fastpath_timer_check()->thread_group_cputimer() have the
false warning too? Suppose we had the timer, then posix_cpu_timer_del()
removes this timer, but task_cputime_zero(&sig->cputime_expires) still
not true.

> +void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> +{
> +	struct sighand_struct *sighand;
> +	struct signal_struct *sig;
> +	struct task_struct *t;
> +
> +	*times = INIT_CPUTIME;
> +
> +	rcu_read_lock();
> +	sighand = rcu_dereference(tsk->sighand);
> +	if (!sighand)
> +		goto out;
> +
> +	sig = tsk->signal;

I am afraid to be wrong, but it looks like we always call this function
when we know we must have a valid ->sighand/siglock. Perhaps we do not
need any check?

IOW, unless I missed something we should not just return if there is no
->sighand or ->signal, this just hides the problem while we should fix
the caller.

> + * Enable the process wide cpu timer accounting.
> + *
> + * serialized using ->sighand->siglock
> + */
> +static void start_process_timers(struct task_struct *tsk)
> +{
> +	tsk->signal->cputimer.running = 1;
> +	barrier();
> +}
> +
> +/*
> + * Release the process wide timer accounting -- timer stops ticking when
> + * nobody cares about it.
> + *
> + * serialized using ->sighand->siglock
> + */
> +static void stop_process_timers(struct task_struct *tsk)
> +{
> +	tsk->signal->cputimer.running = 0;
> +	barrier();
> +}

Could you explain these barriers?


And, I am a bit lost... set_process_cpu_timer() does cpu_timer_sample_group(),
but posix_cpu_timer_set() does cpu_clock_sample_group() in !CPUCLOCK_PERTHREAD
case, could you confirm this is correct?

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] timers: split process wide cpu clocks/timers
  2009-02-05 21:30   ` Oleg Nesterov
@ 2009-02-05 22:20     ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-05 22:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mingo, tglx, linux-kernel, yanmin_zhang, seto.hidetoshi, Roland McGrath

On Thu, 2009-02-05 at 22:30 +0100, Oleg Nesterov wrote:
> On 02/05, Peter Zijlstra wrote:
> >
> > Change the process wide cpu timers/clocks so that we:
> >
> >  1) don't mess up the kernel with too many threads,
> >  2) don't have a per-cpu allocation for each process,
> >  3) have no impact when not used.
> >
> > In order to accomplish this we're going to split it into two parts:
> >
> >  - clocks; which can take all the time they want since they run
> >            from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)
> >
> >  - timers; which need constant time sampling but since they're
> >            explicity used, the user can pay the overhead.
> >
> > The clock readout will go back to a full sum of the thread group, while the
> > timers will run of a global 'clock' that only runs when needed, so only
> > programs that make use of the facility pay the price.
> 
> Ah, personally I think this is a very nice idea!

Thanks.

> A couple of minor questions, before I try to read this patch more...
> 
> >  static inline
> > -void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> > +void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
> 
> I know, it is silly to complain about the naming, but can't resist.
> 
> Now we have both thread_group_cputime() and thread_group_cputimer().
> But it is not possible to distinguish them while reading the code.

Good point, s/thread_group_cputime/thread_group_sum_time/g ?

> For example, looks like posix_cpu_timers_exit_group() needs
> thread_group_cputimer, not thread_group_cputime, no? But then we can
> hit the WARN_ON(!cputimer->running). Afaics.

Yes, I think you're right. Although does it really matter what we do
here, can anybody still access the remaining time after we clean up the
whole thread group?

> Hmm... Can't fastpath_timer_check()->thread_group_cputimer() have the
> false warning too? Suppose we had the timer, then posix_cpu_timer_del()
> removes this timer, but task_cputime_zero(&sig->cputime_expires) still
> not true.

Another good spotting, this appears to be the only site not under
siglock or tasklist_lock. Yes in that case there can be a false
positive.

I'd better remove that warning.

> > +void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> > +{
> > +	struct sighand_struct *sighand;
> > +	struct signal_struct *sig;
> > +	struct task_struct *t;
> > +
> > +	*times = INIT_CPUTIME;
> > +
> > +	rcu_read_lock();
> > +	sighand = rcu_dereference(tsk->sighand);
> > +	if (!sighand)
> > +		goto out;
> > +
> > +	sig = tsk->signal;
> 
> I am afraid to be wrong, but it looks like we always call this function
> when we know we must have a valid ->sighand/siglock. Perhaps we do not
> need any check?

I think you're right, but paranoia won out.

> IOW, unless I missed something we should not just return if there is no
> ->sighand or ->signal, this just hides the problem while we should fix
> the caller.

Might be a patch for .30-rc1 and then fix up everything that falls over.
Agreed.

> > + * Enable the process wide cpu timer accounting.
> > + *
> > + * serialized using ->sighand->siglock
> > + */
> > +static void start_process_timers(struct task_struct *tsk)
> > +{
> > +	tsk->signal->cputimer.running = 1;
> > +	barrier();
> > +}
> > +
> > +/*
> > + * Release the process wide timer accounting -- timer stops ticking when
> > + * nobody cares about it.
> > + *
> > + * serialized using ->sighand->siglock
> > + */
> > +static void stop_process_timers(struct task_struct *tsk)
> > +{
> > +	tsk->signal->cputimer.running = 0;
> > +	barrier();
> > +}
> 
> Could you explain these barriers?

I was thinking that since account_group_*time() can run concurrently,
its best to issue the write asap, smp barriers seemed overkill since its
not a correctness issue.

> And, I am a bit lost... set_process_cpu_timer() does cpu_timer_sample_group(),
> but posix_cpu_timer_set() does cpu_clock_sample_group() in !CPUCLOCK_PERTHREAD
> case, could you confirm this is correct?

Ah, you're saying I missed an start_process_timers() instance?

Ok, so how about this -- it does not yet address the
posix_cpu_timers_exit_group() issue, but should clarify the rest a bit,
agreed?

---
Subject: timer: cleanup the clock/timer separation

Rename thread_group_cputime() to thread_group_sum_time() to increase the
visual difference to thread_group_cputimer().

Furthermore, to decrease the chance of a missed enable, always enable
the timer when we sample it, we'll always disable it when we find that
there are no active timers in the jiffy tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 fs/binfmt_elf.c           |    2 +-
 fs/binfmt_elf_fdpic.c     |    2 +-
 fs/proc/array.c           |    2 +-
 include/linux/sched.h     |    4 ++--
 kernel/exit.c             |    4 ++--
 kernel/posix-cpu-timers.c |   35 ++++-------------------------------
 kernel/sys.c              |    4 ++--
 7 files changed, 13 insertions(+), 40 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index e3ff2b9..b5e32b1 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1359,7 +1359,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_sum_time(p, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index f3e72c5..71717cf 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1389,7 +1389,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_sum_time(p, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 7e4877d..16c8196 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -411,7 +411,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
 			min_flt += sig->min_flt;
 			maj_flt += sig->maj_flt;
-			thread_group_cputime(task, &cputime);
+			thread_group_sum_time(task, &cputime);
 			utime = cputime.utime;
 			stime = cputime.stime;
 			gtime = cputime_add(gtime, sig->gtime);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2d19bc8..328402a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2227,7 +2227,7 @@ static inline int spin_needbreak(spinlock_t *lock)
 /*
  * Thread group CPU time accounting.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+void thread_group_sum_time(struct task_struct *tsk, struct task_cputime *times);
 
 static inline
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
@@ -2235,7 +2235,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 	struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
 	unsigned long flags;
 
-	WARN_ON(!cputimer->running);
+	cputimer->running = 1;
 
 	spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
diff --git a/kernel/exit.c b/kernel/exit.c
index f52c24e..85d2a19 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1316,11 +1316,11 @@ static int wait_task_zombie(struct task_struct *p, int options,
 		 * as other threads in the parent group can be right
 		 * here reaping other children at the same time.
 		 *
-		 * We use thread_group_cputime() to get times for the thread
+		 * We use thread_group_sum_time() to get times for the thread
 		 * group, which consolidates times for all threads in the
 		 * group including the group leader.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_sum_time(p, &cputime);
 		spin_lock_irq(&p->parent->sighand->siglock);
 		psig = p->parent->signal;
 		sig = p->signal;
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index db107c9..90a32e4 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -230,7 +230,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 	return 0;
 }
 
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_sum_time(struct task_struct *tsk, struct task_cputime *times)
 {
 	struct sighand_struct *sighand;
 	struct signal_struct *sig;
@@ -271,7 +271,7 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_sum_time(p, &cputime);
 	switch (CPUCLOCK_WHICH(which_clock)) {
 	default:
 		return -EINVAL;
@@ -488,7 +488,7 @@ void posix_cpu_timers_exit_group(struct task_struct *tsk)
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(tsk, &cputime);
+	thread_group_sum_time(tsk, &cputime);
 	cleanup_timers(tsk->signal->cpu_timers,
 		       cputime.utime, cputime.stime, cputime.sum_exec_runtime);
 }
@@ -507,29 +507,6 @@ static void clear_dead_task(struct k_itimer *timer, union cpu_time_count now)
 }
 
 /*
- * Enable the process wide cpu timer accounting.
- *
- * serialized using ->sighand->siglock
- */
-static void start_process_timers(struct task_struct *tsk)
-{
-	tsk->signal->cputimer.running = 1;
-	barrier();
-}
-
-/*
- * Release the process wide timer accounting -- timer stops ticking when
- * nobody cares about it.
- *
- * serialized using ->sighand->siglock
- */
-static void stop_process_timers(struct task_struct *tsk)
-{
-	tsk->signal->cputimer.running = 0;
-	barrier();
-}
-
-/*
  * Insert the timer on the appropriate list before any timers that
  * expire later.  This must be called with the tasklist_lock held
  * for reading, and interrupts disabled.
@@ -549,9 +526,6 @@ static void arm_timer(struct k_itimer *timer, union cpu_time_count now)
 	BUG_ON(!irqs_disabled());
 	spin_lock(&p->sighand->siglock);
 
-	if (!CPUCLOCK_PERTHREAD(timer->it_clock))
-		start_process_timers(p);
-
 	listpos = head;
 	if (CPUCLOCK_WHICH(timer->it_clock) == CPUCLOCK_SCHED) {
 		list_for_each_entry(next, head, entry) {
@@ -1045,7 +1019,7 @@ static void check_process_timers(struct task_struct *tsk,
 	    list_empty(&timers[CPUCLOCK_VIRT]) &&
 	    cputime_eq(sig->it_virt_expires, cputime_zero) &&
 	    list_empty(&timers[CPUCLOCK_SCHED])) {
-		stop_process_timers(tsk);
+		tsk->signal->cputimer.running = 0;
 		return;
 	}
 
@@ -1427,7 +1401,6 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx,
 	struct list_head *head;
 
 	BUG_ON(clock_idx == CPUCLOCK_SCHED);
-	start_process_timers(tsk);
 	cpu_timer_sample_group(clock_idx, tsk, &now);
 
 	if (oldval) {
diff --git a/kernel/sys.c b/kernel/sys.c
index 87ca037..6fe340c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -910,7 +910,7 @@ void do_sys_times(struct tms *tms)
 	struct task_cputime cputime;
 	cputime_t cutime, cstime;
 
-	thread_group_cputime(current, &cputime);
+	thread_group_sum_time(current, &cputime);
 	spin_lock_irq(&current->sighand->siglock);
 	cutime = current->signal->cutime;
 	cstime = current->signal->cstime;
@@ -1657,7 +1657,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 				break;
 
 		case RUSAGE_SELF:
-			thread_group_cputime(p, &cputime);
+			thread_group_sum_time(p, &cputime);
 			utime = cputime_add(utime, cputime.utime);
 			stime = cputime_add(stime, cputime.stime);
 			r->ru_nvcsw += p->signal->nvcsw;



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-05 12:06 ` [PATCH 0/2] fix the itimer regression (BZ 12618) Ingo Molnar
@ 2009-02-06  4:51   ` Zhang, Yanmin
  2009-02-06 15:18     ` Ingo Molnar
  2009-02-10  2:48   ` Lin Ming
  1 sibling, 1 reply; 20+ messages in thread
From: Zhang, Yanmin @ 2009-02-06  4:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, tglx, oleg, linux-kernel, seto.hidetoshi, ming.m.lin

On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > This should hopefully address all the itimer borkage.
> 
> Applied to tip:timers/urgent, thanks Peter!
> 
> Yanmin: could you check hacbench_pth with latest tip/master, do
> these fixes resolve that 3% regression you reported?
Lin Ming tested it and hackbench_pth/volanoMark regression all
disappear. But oltp has a regression. We think oltp new regression isn't related
to the patch. Ming is investigating it.

Yanmin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-06  4:51   ` Zhang, Yanmin
@ 2009-02-06 15:18     ` Ingo Molnar
  2009-02-09  6:46       ` Lin Ming
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-02-06 15:18 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Peter Zijlstra, tglx, oleg, linux-kernel, seto.hidetoshi, ming.m.lin


* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > 
> > > This should hopefully address all the itimer borkage.
> > 
> > Applied to tip:timers/urgent, thanks Peter!
> > 
> > Yanmin: could you check hacbench_pth with latest tip/master, do
> > these fixes resolve that 3% regression you reported?
>
> Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> But oltp has a regression. We think oltp new regression isn't related to 
> the patch. Ming is investigating it.

Potential suspects for oltp regression would be:

 3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
 a571bbe: sched: fix buddie group latency
 a9f3e2b: sched: clear buddies more aggressively
 1596e29: sched: symmetric sync vs avg_overlap
 d942fb6: sched: fix sync wakeups

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-06 15:18     ` Ingo Molnar
@ 2009-02-09  6:46       ` Lin Ming
  2009-02-09 21:47         ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Lin Ming @ 2009-02-09  6:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, tglx, oleg, linux-kernel, seto.hidetoshi

On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> 
> > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > 
> > > > This should hopefully address all the itimer borkage.
> > > 
> > > Applied to tip:timers/urgent, thanks Peter!
> > > 
> > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > these fixes resolve that 3% regression you reported?
> >
> > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > But oltp has a regression. We think oltp new regression isn't related to 
> > the patch. Ming is investigating it.
> 
> Potential suspects for oltp regression would be:
> 
>  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
>  a571bbe: sched: fix buddie group latency
>  a9f3e2b: sched: clear buddies more aggressively
>  1596e29: sched: symmetric sync vs avg_overlap
>  d942fb6: sched: fix sync wakeups

I tested the latest tip-master branch.
After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
on the 8cores Stockley machine is mostly fixed.

On another 4*4 cores Tigerton machine, oltp has more than 10% regression
with 2.6.29-rc4 compared with 2.6.29-rc3.

Lin Ming

> 
> 	Ingo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-09  6:46       ` Lin Ming
@ 2009-02-09 21:47         ` Ingo Molnar
  2009-02-10  5:52           ` Mike Galbraith
  2009-02-10 12:47           ` Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-02-09 21:47 UTC (permalink / raw)
  To: Lin Ming, Peter Zijlstra, Mike Galbraith
  Cc: Zhang, Yanmin, Peter Zijlstra, tglx, oleg, linux-kernel, seto.hidetoshi


* Lin Ming <ming.m.lin@intel.com> wrote:

> On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > 
> > > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > 
> > > > > This should hopefully address all the itimer borkage.
> > > > 
> > > > Applied to tip:timers/urgent, thanks Peter!
> > > > 
> > > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > > these fixes resolve that 3% regression you reported?
> > >
> > > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > > But oltp has a regression. We think oltp new regression isn't related to 
> > > the patch. Ming is investigating it.
> > 
> > Potential suspects for oltp regression would be:
> > 
> >  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
> >  a571bbe: sched: fix buddie group latency
> >  a9f3e2b: sched: clear buddies more aggressively
> >  1596e29: sched: symmetric sync vs avg_overlap
> >  d942fb6: sched: fix sync wakeups
> 
> I tested the latest tip-master branch.
> After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
> on the 8cores Stockley machine is mostly fixed.
> 
> On another 4*4 cores Tigerton machine, oltp has more than 10% regression
> with 2.6.29-rc4 compared with 2.6.29-rc3.

ok, that commit needs fixed or reverted. Peter, Mike?

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-05 12:06 ` [PATCH 0/2] fix the itimer regression (BZ 12618) Ingo Molnar
  2009-02-06  4:51   ` Zhang, Yanmin
@ 2009-02-10  2:48   ` Lin Ming
  2009-02-11 12:59     ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Lin Ming @ 2009-02-10  2:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, tglx, oleg, linux-kernel, yanmin_zhang, seto.hidetoshi

On Thu, Feb 5, 2009 at 8:06 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
>> This should hopefully address all the itimer borkage.
>
> Applied to tip:timers/urgent, thanks Peter!
>
> Yanmin: could you check hacbench_pth with latest tip/master, do
> these fixes resolve that 3% regression you reported?

hacbench_pth still has regression with 2.6.29-rc4.
Below 2 patches that fixed the regression are not merged into 2.6.29-rc4.

commit 32bd671d6cbeda60dc73be77fa2b9037d9a9bfa0
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Thu Feb 5 12:24:15 2009 +0100

    signal: re-add dead task accumulation stats.

commit 4cd4c1b40d40447fb5e7ba80746c6d7ba91d7a53
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Thu Feb 5 12:24:16 2009 +0100

    timers: split process wide cpu clocks/timers

Lin Ming

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-09 21:47         ` Ingo Molnar
@ 2009-02-10  5:52           ` Mike Galbraith
  2009-02-10 12:47           ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2009-02-10  5:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lin Ming, Peter Zijlstra, Zhang, Yanmin, tglx, oleg,
	linux-kernel, seto.hidetoshi

On Mon, 2009-02-09 at 22:47 +0100, Ingo Molnar wrote:
> * Lin Ming <ming.m.lin@intel.com> wrote:
> 
> > On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > > 
> > > > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > > 
> > > > > > This should hopefully address all the itimer borkage.
> > > > > 
> > > > > Applied to tip:timers/urgent, thanks Peter!
> > > > > 
> > > > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > > > these fixes resolve that 3% regression you reported?
> > > >
> > > > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > > > But oltp has a regression. We think oltp new regression isn't related to 
> > > > the patch. Ming is investigating it.
> > > 
> > > Potential suspects for oltp regression would be:
> > > 
> > >  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
> > >  a571bbe: sched: fix buddie group latency
> > >  a9f3e2b: sched: clear buddies more aggressively
> > >  1596e29: sched: symmetric sync vs avg_overlap
> > >  d942fb6: sched: fix sync wakeups
> > 
> > I tested the latest tip-master branch.
> > After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
> > on the 8cores Stockley machine is mostly fixed.
> > 
> > On another 4*4 cores Tigerton machine, oltp has more than 10% regression
> > with 2.6.29-rc4 compared with 2.6.29-rc3.
> 
> ok, that commit needs fixed or reverted. Peter, Mike?

I see some ~problems.

Looking at the tasks sitting in my ~idle box right now:

tasks 284, avg_overlap = 0.000000 196

starts make -j30

tasks 401, avg_overlap = 0.000000 285

0.0 (should) means zero wakeups since birth, it does not mean this task
is showing synchronous behavior until it's non-zero.  New tasks start
with zero, so until they grow an avg_overlap, when they wake, at least
half of the decision making data is bogus/non-existent.  With make -j30,
I added 117 tasks, 89 are unknown, 28 known.  This parallel load _tries_
to go affine.  On an nfs mount where runners are also frequent (and
truly synchronous) wakers, it tries really hard.

IOW, I think the affinity logic may become too strong when strengthened
by flipping the hint.  I originally inverted that test to filter out the
case where we _have_ behavioral data indicating that the tasks in
question were definitely not synchronous despite the sync wakeup hint.

Another ~problem is that a task with low avg_overlap can change
behavior to cpu hog, and retain it's stale avg_overlap up to forever.

Maybe we shouldn't use avg_overlap until it's been established.

But..

Flip-side: I have a strong feeling that that _not_ using it would have
negative impact.  Freshly forked task generates red-hot data for a yet
to be awakened partner...

Sigh.  Damned if ya do, damned if ya don't.

	-Mike


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-09 21:47         ` Ingo Molnar
  2009-02-10  5:52           ` Mike Galbraith
@ 2009-02-10 12:47           ` Peter Zijlstra
  2009-02-11  2:09             ` Zhang, Yanmin
  2009-02-11 13:11             ` Ingo Molnar
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-10 12:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lin Ming, Mike Galbraith, Zhang, Yanmin, tglx, oleg,
	linux-kernel, seto.hidetoshi

On Mon, 2009-02-09 at 22:47 +0100, Ingo Molnar wrote:
> * Lin Ming <ming.m.lin@intel.com> wrote:
> 
> > On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > > 
> > > > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > > 
> > > > > > This should hopefully address all the itimer borkage.
> > > > > 
> > > > > Applied to tip:timers/urgent, thanks Peter!
> > > > > 
> > > > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > > > these fixes resolve that 3% regression you reported?
> > > >
> > > > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > > > But oltp has a regression. We think oltp new regression isn't related to 
> > > > the patch. Ming is investigating it.
> > > 
> > > Potential suspects for oltp regression would be:
> > > 
> > >  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
> > >  a571bbe: sched: fix buddie group latency
> > >  a9f3e2b: sched: clear buddies more aggressively
> > >  1596e29: sched: symmetric sync vs avg_overlap
> > >  d942fb6: sched: fix sync wakeups
> > 
> > I tested the latest tip-master branch.
> > After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
> > on the 8cores Stockley machine is mostly fixed.
> > 
> > On another 4*4 cores Tigerton machine, oltp has more than 10% regression
> > with 2.6.29-rc4 compared with 2.6.29-rc3.
> 
> ok, that commit needs fixed or reverted. Peter, Mike?

Yanmin, is that tigerton regression also due to the sync changes?

That is, if you revert both d942fb6 and 1596e29, does it get back to
-rc3 state, or is the tigerton regression due to something else?

This isn't quite clear to me.

Ingo, if that is the case, I'm fine with reverting those changes for
now, and have another look at them later on -- preferably when someone
ships me a 4*4 machine so I can validate :-)




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-10 12:47           ` Peter Zijlstra
@ 2009-02-11  2:09             ` Zhang, Yanmin
  2009-02-12 11:05               ` Ingo Molnar
  2009-02-11 13:11             ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Zhang, Yanmin @ 2009-02-11  2:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Lin Ming, Mike Galbraith, tglx, oleg, linux-kernel,
	seto.hidetoshi

On Tue, 2009-02-10 at 13:47 +0100, Peter Zijlstra wrote:
> On Mon, 2009-02-09 at 22:47 +0100, Ingo Molnar wrote:
> > * Lin Ming <ming.m.lin@intel.com> wrote:
> > 
> > > On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> > > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > > > 
> > > > > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > > > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > > > 
> > > > > > > This should hopefully address all the itimer borkage.
> > > > > > 
> > > > > > Applied to tip:timers/urgent, thanks Peter!
> > > > > > 
> > > > > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > > > > these fixes resolve that 3% regression you reported?
> > > > >
> > > > > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > > > > But oltp has a regression. We think oltp new regression isn't related to 
> > > > > the patch. Ming is investigating it.
> > > > 
> > > > Potential suspects for oltp regression would be:
> > > > 
> > > >  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
> > > >  a571bbe: sched: fix buddie group latency
> > > >  a9f3e2b: sched: clear buddies more aggressively
> > > >  1596e29: sched: symmetric sync vs avg_overlap
> > > >  d942fb6: sched: fix sync wakeups
> > > 
> > > I tested the latest tip-master branch.
> > > After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
> > > on the 8cores Stockley machine is mostly fixed.
> > > 
> > > On another 4*4 cores Tigerton machine, oltp has more than 10% regression
> > > with 2.6.29-rc4 compared with 2.6.29-rc3.
> > 
> > ok, that commit needs fixed or reverted. Peter, Mike?
> 
> Yanmin, is that tigerton regression also due to the sync changes?
Yes.

> 
> That is, if you revert both d942fb6 and 1596e29, does it get back to
> -rc3 state,
Yes.

>  or is the tigerton regression due to something else?
> This isn't quite clear to me.
> 
> Ingo, if that is the case, I'm fine with reverting those changes for
> now, and have another look at them later on -- preferably when someone
> ships me a 4*4 machine so I can validate :-)
2*4 stoakley has the similiar regression. To find potential scalability issues,
I run sysbench+mysql(oltp) with many thread numbers, such like 8,12,16,32,64,128,
then get an average value.

yanmin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-10  2:48   ` Lin Ming
@ 2009-02-11 12:59     ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-02-11 12:59 UTC (permalink / raw)
  To: Lin Ming
  Cc: Peter Zijlstra, tglx, oleg, linux-kernel, yanmin_zhang, seto.hidetoshi


* Lin Ming <lin@minggr.cn> wrote:

> On Thu, Feb 5, 2009 at 8:06 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> >> This should hopefully address all the itimer borkage.
> >
> > Applied to tip:timers/urgent, thanks Peter!
> >
> > Yanmin: could you check hacbench_pth with latest tip/master, do
> > these fixes resolve that 3% regression you reported?
> 
> hacbench_pth still has regression with 2.6.29-rc4.
> Below 2 patches that fixed the regression are not merged into 2.6.29-rc4.
> 
> commit 32bd671d6cbeda60dc73be77fa2b9037d9a9bfa0
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Thu Feb 5 12:24:15 2009 +0100
> 
>     signal: re-add dead task accumulation stats.
> 
> commit 4cd4c1b40d40447fb5e7ba80746c6d7ba91d7a53
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Thu Feb 5 12:24:16 2009 +0100
> 
>     timers: split process wide cpu clocks/timers

they are queued up but not fully baken yet. Maybe in -rc5.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-10 12:47           ` Peter Zijlstra
  2009-02-11  2:09             ` Zhang, Yanmin
@ 2009-02-11 13:11             ` Ingo Molnar
  2009-02-11 13:27               ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-02-11 13:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lin Ming, Mike Galbraith, Zhang, Yanmin, tglx, oleg,
	linux-kernel, seto.hidetoshi


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Mon, 2009-02-09 at 22:47 +0100, Ingo Molnar wrote:
> > * Lin Ming <ming.m.lin@intel.com> wrote:
> > 
> > > On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> > > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > > > 
> > > > > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > > > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > > > 
> > > > > > > This should hopefully address all the itimer borkage.
> > > > > > 
> > > > > > Applied to tip:timers/urgent, thanks Peter!
> > > > > > 
> > > > > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > > > > these fixes resolve that 3% regression you reported?
> > > > >
> > > > > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > > > > But oltp has a regression. We think oltp new regression isn't related to 
> > > > > the patch. Ming is investigating it.
> > > > 
> > > > Potential suspects for oltp regression would be:
> > > > 
> > > >  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
> > > >  a571bbe: sched: fix buddie group latency
> > > >  a9f3e2b: sched: clear buddies more aggressively
> > > >  1596e29: sched: symmetric sync vs avg_overlap
> > > >  d942fb6: sched: fix sync wakeups
> > > 
> > > I tested the latest tip-master branch.
> > > After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
> > > on the 8cores Stockley machine is mostly fixed.
> > > 
> > > On another 4*4 cores Tigerton machine, oltp has more than 10% regression
> > > with 2.6.29-rc4 compared with 2.6.29-rc3.
> > 
> > ok, that commit needs fixed or reverted. Peter, Mike?
> 
> Yanmin, is that tigerton regression also due to the sync changes?
> 
> That is, if you revert both d942fb6 and 1596e29, does it get back to
> -rc3 state, or is the tigerton regression due to something else?
> 
> This isn't quite clear to me.
> 
> Ingo, if that is the case, I'm fine with reverting those changes for
> now, and have another look at them later on -- preferably when someone
> ships me a 4*4 machine so I can validate :-)

Could you please send a changelogged minimal reverter patch with Reported-by
and Bisected-by tags in place, etc? It does not have to be a full revert,
if you think we can do with less that's fine too.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-11 13:11             ` Ingo Molnar
@ 2009-02-11 13:27               ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-11 13:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lin Ming, Mike Galbraith, Zhang, Yanmin, tglx, oleg,
	linux-kernel, seto.hidetoshi

On Wed, 2009-02-11 at 14:11 +0100, Ingo Molnar wrote:

> Could you please send a changelogged minimal reverter patch with Reported-by
> and Bisected-by tags in place, etc? It does not have to be a full revert,
> if you think we can do with less that's fine too.

I don't think its worth playing games with at this point, lets just do a
full revert.

---
Subject: sched: revert recent sync wakeup changes
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Feb 11 14:22:24 CET 2009

Intel reported a hefty 10% regression on a 16-way machine with these patches:

  1596e29: sched: symmetric sync vs avg_overlap
  d942fb6: sched: fix sync wakeups

Revert them.

Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Bisected-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c      |   10 ----------
 kernel/sched_fair.c |   11 +++++++++--
 2 files changed, 9 insertions(+), 12 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2362,16 +2362,6 @@ static int try_to_wake_up(struct task_st
 	if (!sched_feat(SYNC_WAKEUPS))
 		sync = 0;
 
-	if (!sync) {
-		if (current->se.avg_overlap < sysctl_sched_migration_cost &&
-			  p->se.avg_overlap < sysctl_sched_migration_cost)
-			sync = 1;
-	} else {
-		if (current->se.avg_overlap >= sysctl_sched_migration_cost ||
-			  p->se.avg_overlap >= sysctl_sched_migration_cost)
-			sync = 0;
-	}
-
 #ifdef CONFIG_SMP
 	if (sched_feat(LB_WAKEUP_UPDATE)) {
 		struct sched_domain *sd;
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1191,15 +1191,20 @@ wake_affine(struct sched_domain *this_sd
 	    int idx, unsigned long load, unsigned long this_load,
 	    unsigned int imbalance)
 {
+	struct task_struct *curr = this_rq->curr;
+	struct task_group *tg;
 	unsigned long tl = this_load;
 	unsigned long tl_per_task;
-	struct task_group *tg;
 	unsigned long weight;
 	int balanced;
 
 	if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS))
 		return 0;
 
+	if (sync && (curr->se.avg_overlap > sysctl_sched_migration_cost ||
+			p->se.avg_overlap > sysctl_sched_migration_cost))
+		sync = 0;
+
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
 	 * effect of the currently running task from the load
@@ -1473,7 +1478,9 @@ static void check_preempt_wakeup(struct 
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
-	if (sched_feat(WAKEUP_OVERLAP) && sync) {
+	if (sched_feat(WAKEUP_OVERLAP) && (sync ||
+			(se->avg_overlap < sysctl_sched_migration_cost &&
+			 pse->avg_overlap < sysctl_sched_migration_cost))) {
 		resched_task(curr);
 		return;
 	}



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-11  2:09             ` Zhang, Yanmin
@ 2009-02-12 11:05               ` Ingo Molnar
  2009-02-13  9:15                 ` Lin Ming
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-02-12 11:05 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Peter Zijlstra, Lin Ming, Mike Galbraith, tglx, oleg,
	linux-kernel, seto.hidetoshi


* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> On Tue, 2009-02-10 at 13:47 +0100, Peter Zijlstra wrote:
> > On Mon, 2009-02-09 at 22:47 +0100, Ingo Molnar wrote:
> > > * Lin Ming <ming.m.lin@intel.com> wrote:
> > > 
> > > > On Fri, 2009-02-06 at 23:18 +0800, Ingo Molnar wrote:
> > > > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > > > > 
> > > > > > On Thu, 2009-02-05 at 13:06 +0100, Ingo Molnar wrote:
> > > > > > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > > > > > 
> > > > > > > > This should hopefully address all the itimer borkage.
> > > > > > > 
> > > > > > > Applied to tip:timers/urgent, thanks Peter!
> > > > > > > 
> > > > > > > Yanmin: could you check hacbench_pth with latest tip/master, do
> > > > > > > these fixes resolve that 3% regression you reported?
> > > > > >
> > > > > > Lin Ming tested it and hackbench_pth/volanoMark regression all disappear. 
> > > > > > But oltp has a regression. We think oltp new regression isn't related to 
> > > > > > the patch. Ming is investigating it.
> > > > > 
> > > > > Potential suspects for oltp regression would be:
> > > > > 
> > > > >  3d39870: sched_rt: don't use first_cpu on cpumask created with cpumask_and
> > > > >  a571bbe: sched: fix buddie group latency
> > > > >  a9f3e2b: sched: clear buddies more aggressively
> > > > >  1596e29: sched: symmetric sync vs avg_overlap
> > > > >  d942fb6: sched: fix sync wakeups
> > > > 
> > > > I tested the latest tip-master branch.
> > > > After reverting "d942fb6: sched: fix sync wakeups", the oltp regression
> > > > on the 8cores Stockley machine is mostly fixed.
> > > > 
> > > > On another 4*4 cores Tigerton machine, oltp has more than 10% regression
> > > > with 2.6.29-rc4 compared with 2.6.29-rc3.
> > > 
> > > ok, that commit needs fixed or reverted. Peter, Mike?
> > 
> > Yanmin, is that tigerton regression also due to the sync changes?
> Yes.
> 
> > 
> > That is, if you revert both d942fb6 and 1596e29, does it get back to
> > -rc3 state,
> Yes.
> 
> >  or is the tigerton regression due to something else?
> > This isn't quite clear to me.
> > 
> > Ingo, if that is the case, I'm fine with reverting those changes for
> > now, and have another look at them later on -- preferably when someone
> > ships me a 4*4 machine so I can validate :-)
> 2*4 stoakley has the similiar regression. To find potential scalability issues,
> I run sysbench+mysql(oltp) with many thread numbers, such like 8,12,16,32,64,128,
> then get an average value.

FYI, in Linus's latest tree (v2.6.29-rc4-175-gb578f3f or later), all
the scheduler related performance regressions should be addressed.

Could you please double-check that there's no performance regression
remaining?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-12 11:05               ` Ingo Molnar
@ 2009-02-13  9:15                 ` Lin Ming
  2009-02-13 10:06                   ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Lin Ming @ 2009-02-13  9:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Mike Galbraith, tglx, oleg,
	linux-kernel, seto.hidetoshi


> FYI, in Linus's latest tree (v2.6.29-rc4-175-gb578f3f or later), all
> the scheduler related performance regressions should be addressed.
> 
> Could you please double-check that there's no performance regression
> remaining?

I have tested the latest git tree(071a0bc2).
hackbench, volanoMark, oltp, specjbb2005 regressions are all fixed.

Thanks,
Lin Ming


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] fix the itimer regression (BZ 12618)
  2009-02-13  9:15                 ` Lin Ming
@ 2009-02-13 10:06                   ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-02-13 10:06 UTC (permalink / raw)
  To: Lin Ming
  Cc: Zhang, Yanmin, Peter Zijlstra, Mike Galbraith, tglx, oleg,
	linux-kernel, seto.hidetoshi


* Lin Ming <ming.m.lin@intel.com> wrote:

> 
> > FYI, in Linus's latest tree (v2.6.29-rc4-175-gb578f3f or later), all
> > the scheduler related performance regressions should be addressed.
> > 
> > Could you please double-check that there's no performance regression
> > remaining?
> 
> I have tested the latest git tree(071a0bc2).
> hackbench, volanoMark, oltp, specjbb2005 regressions are all fixed.

Great - thanks for testing it!

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-02-13 10:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-05 11:24 [PATCH 0/2] fix the itimer regression (BZ 12618) Peter Zijlstra
2009-02-05 11:24 ` [PATCH 1/2] signal: re-add dead task accumulation stats Peter Zijlstra
2009-02-05 11:24 ` [PATCH 2/2] timers: split process wide cpu clocks/timers Peter Zijlstra
2009-02-05 21:30   ` Oleg Nesterov
2009-02-05 22:20     ` Peter Zijlstra
2009-02-05 12:06 ` [PATCH 0/2] fix the itimer regression (BZ 12618) Ingo Molnar
2009-02-06  4:51   ` Zhang, Yanmin
2009-02-06 15:18     ` Ingo Molnar
2009-02-09  6:46       ` Lin Ming
2009-02-09 21:47         ` Ingo Molnar
2009-02-10  5:52           ` Mike Galbraith
2009-02-10 12:47           ` Peter Zijlstra
2009-02-11  2:09             ` Zhang, Yanmin
2009-02-12 11:05               ` Ingo Molnar
2009-02-13  9:15                 ` Lin Ming
2009-02-13 10:06                   ` Ingo Molnar
2009-02-11 13:11             ` Ingo Molnar
2009-02-11 13:27               ` Peter Zijlstra
2009-02-10  2:48   ` Lin Ming
2009-02-11 12:59     ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.