linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] posix-cpu-timers fixlet
@ 2013-04-30  3:17 kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 01/10] events: Protect access via task_subsys_state_check() kosaki.motohiro
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra

Glibc's posix timer testcase found a lot of bugs in posix timer code. This series, hopefully,
fixes all of them. All patches are independent each other logically.


[PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
[PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice
[PATCH 3/7] posix-cpu-timers: fix wrong timer initialization
[PATCH 4/7] posix-cpu-timers: timer functions must use timer time instead of clock time
[PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime()
[PATCH 6/7] sched: task_sched_runtime introduce micro optimization
[PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 01/10] events: Protect access via task_subsys_state_check()
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 02/10] vm: add no-mmu vm_iomap_memory() stub kosaki.motohiro
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, a.p.zijlstra, paulus, acme

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

The following RCU splat indicates lack of RCU protection:

[  953.267649] ===============================
[  953.267652] [ INFO: suspicious RCU usage. ]
[  953.267657] 3.9.0-0.rc6.git2.4.fc19.ppc64p7 #1 Not tainted
[  953.267661] -------------------------------
[  953.267664] include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!
[  953.267669]
[  953.267669] other info that might help us debug this:
[  953.267669]
[  953.267675]
[  953.267675] rcu_scheduler_active = 1, debug_locks = 0
[  953.267680] 1 lock held by glxgears/1289:
[  953.267683]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<c00000000027f884>] .prepare_bprm_creds+0x34/0xa0
[  953.267700]
[  953.267700] stack backtrace:
[  953.267704] Call Trace:
[  953.267709] [c0000001f0d1b6e0] [c000000000016e30] .show_stack+0x130/0x200 (unreliable)
[  953.267717] [c0000001f0d1b7b0] [c0000000001267f8] .lockdep_rcu_suspicious+0x138/0x180
[  953.267724] [c0000001f0d1b840] [c0000000001d43a4] .perf_event_comm+0x4c4/0x690
[  953.267731] [c0000001f0d1b950] [c00000000027f6e4] .set_task_comm+0x84/0x1f0
[  953.267737] [c0000001f0d1b9f0] [c000000000280414] .setup_new_exec+0x94/0x220
[  953.267744] [c0000001f0d1ba70] [c0000000002f665c] .load_elf_binary+0x58c/0x19b0
...

This commit therefore adds the required RCU read-side critical
section to perf_event_comm().

Reported-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20130419190124.GA8638@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Gustavo Luiz Duarte <gusld@br.ibm.com>
---
 kernel/events/core.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4d3124b..9fcb094 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4596,6 +4596,7 @@ void perf_event_comm(struct task_struct *task)
 	struct perf_event_context *ctx;
 	int ctxn;
 
+	rcu_read_lock();
 	for_each_task_context_nr(ctxn) {
 		ctx = task->perf_event_ctxp[ctxn];
 		if (!ctx)
@@ -4603,6 +4604,7 @@ void perf_event_comm(struct task_struct *task)
 
 		perf_event_enable_on_exec(ctx);
 	}
+	rcu_read_unlock();
 
 	if (!atomic_read(&nr_comm_events))
 		return;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 02/10] vm: add no-mmu vm_iomap_memory() stub
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 01/10] events: Protect access via task_subsys_state_check() kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 03/10] ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock kosaki.motohiro
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, Linus Torvalds

From: Linus Torvalds <torvalds@linux-foundation.org>

I think we could just move the full vm_iomap_memory() function into
util.h or similar, but I didn't get any reply from anybody actually
using nommu even to this trivial patch, so I'm not going to touch it any
more than required.

Here's the fairly minimal stub to make the nommu case at least
potentially work.  It doesn't seem like anybody cares, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/nommu.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index 2f3ea74..e001768 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1838,6 +1838,16 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(remap_pfn_range);
 
+int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len)
+{
+	unsigned long pfn = start >> PAGE_SHIFT;
+	unsigned long vm_len = vma->vm_end - vma->vm_start;
+
+	pfn += vma->vm_pgoff;
+	return io_remap_pfn_range(vma, vma->vm_start, pfn, vm_len, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(vm_iomap_memory);
+
 int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 			unsigned long pgoff)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 03/10] ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 01/10] events: Protect access via task_subsys_state_check() kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 02/10] vm: add no-mmu vm_iomap_memory() stub kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 04/10] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, Kishon Vijay Abraham I, Keerthy,
	Benoît Cousson, Paul Walmsley, Tony Lindgren

From: Kishon Vijay Abraham I <kishon@ti.com>

Commit 92702df3570e ("ARM: OMAP4: PM: fix PM regression introduced by recent
clock cleanup") makes the 'ocp2scp_usb_phy_phy_48m' as optional
functional clock causing regression in MUSB. But this 48MHz clock is a
mandatory clock for usb phy attached to ocp2scp and hence made as the main
clock for ocp2scp.

Cc: Keerthy <j-keerthy@ti.com>
Cc: Benoît Cousson <b-cousson@ti.com>
Cc: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
[paul@pwsan.com: add comment to the hwmod data to try to prevent any
 future mistakes here]
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 arch/arm/mach-omap2/omap_hwmod_44xx_data.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
index 9e05765..eaba9dc 100644
--- a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
@@ -2714,16 +2714,22 @@ static struct omap_ocp2scp_dev ocp2scp_dev_attr[] = {
 	{ }
 };
 
-static struct omap_hwmod_opt_clk ocp2scp_usb_phy_opt_clks[] = {
-	{ .role = "48mhz", .clk = "ocp2scp_usb_phy_phy_48m" },
-};
-
 /* ocp2scp_usb_phy */
 static struct omap_hwmod omap44xx_ocp2scp_usb_phy_hwmod = {
 	.name		= "ocp2scp_usb_phy",
 	.class		= &omap44xx_ocp2scp_hwmod_class,
 	.clkdm_name	= "l3_init_clkdm",
-	.main_clk	= "func_48m_fclk",
+	/*
+	 * ocp2scp_usb_phy_phy_48m is provided by the OMAP4 PRCM IP
+	 * block as an "optional clock," and normally should never be
+	 * specified as the main_clk for an OMAP IP block.  However it
+	 * turns out that this clock is actually the main clock for
+	 * the ocp2scp_usb_phy IP block:
+	 * http://lists.infradead.org/pipermail/linux-arm-kernel/2012-September/119943.html
+	 * So listing ocp2scp_usb_phy_phy_48m as a main_clk here seems
+	 * to be the best workaround.
+	 */
+	.main_clk	= "ocp2scp_usb_phy_phy_48m",
 	.prcm = {
 		.omap4 = {
 			.clkctrl_offs = OMAP4_CM_L3INIT_USBPHYOCP2SCP_CLKCTRL_OFFSET,
@@ -2732,8 +2738,6 @@ static struct omap_hwmod omap44xx_ocp2scp_usb_phy_hwmod = {
 		},
 	},
 	.dev_attr	= ocp2scp_dev_attr,
-	.opt_clks	= ocp2scp_usb_phy_opt_clks,
-	.opt_clks_cnt	= ARRAY_SIZE(ocp2scp_usb_phy_opt_clks),
 };
 
 /*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 04/10] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (2 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 03/10] ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:38   ` KOSAKI Motohiro
  2013-04-30  3:17 ` [PATCH 05/10] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

When tsk->signal->cputimer->running is 1, signal->cputimer and
tsk->sum_sched_runtime increase the same pace. update_curr() increase
both account.

However, there is one exeception. When thread exiting, __exit_signal()
turns over task's sum_shced_runtime to sig->sum_sched_runtime. but it
doesn't stop signal->cputimer accounting.

This inconsistency makes too early POSIX timer wakeup. Fix it.

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/sched/stats.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 2ef90a5..5a0cfc4 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -225,6 +225,13 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
+	/*
+	 * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
+	 * in __exit_signal(), we must not account exec_runtime for consistency.
+	 */
+	if (unlikely(!tsk->sighand))
+		return;
+
 	raw_spin_lock(&cputimer->lock);
 	cputimer->cputime.sum_exec_runtime += ns;
 	raw_spin_unlock(&cputimer->lock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 05/10] posix-cpu-timers: fix acounting delta_exec twice
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (3 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 04/10] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 06/10] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Currently glibc rt/tst-cpuclock2 test(*) sporadically fail. Because
scheduler delta can be accounted twice from thread_group_cputimer()
and account_group_exec_runtime().

Finally, clock_nanosleep() wakes up before an argument. And that is
posix violation. This issue was introduced by commit d670ec1317
(posix-cpu-timers: Cure SMP wobbles).

(*) http://sourceware.org/git/?p=glibc.git;a=blob;f=rt/tst-cpuclock2.c;h=6752721717f959e89c0d692b3f1ee082d507eec2;hb=HEAD

Cc: Olivier Langlois <olivier@trillion01.com>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/binfmt_elf.c           |    2 +-
 fs/binfmt_elf_fdpic.c     |    2 +-
 include/linux/sched.h     |    4 ++--
 kernel/posix-cpu-timers.c |   15 ++++++++++-----
 kernel/sched/core.c       |    6 ++++--
 kernel/sched/cputime.c    |    8 ++++----
 6 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 86af964..fea51e7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1322,7 +1322,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 9c13e02..ab5b508 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1371,7 +1371,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e692a02..7863d4b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2002,7 +2002,7 @@ static inline void disable_sched_clock_irqtime(void) {}
 #endif
 
 extern unsigned long long
-task_sched_runtime(struct task_struct *task);
+task_sched_runtime(struct task_struct *task, bool add_delta);
 
 /* sched_exec is called by processes performing an exec */
 #ifdef CONFIG_SMP
@@ -2625,7 +2625,7 @@ static inline int spin_needbreak(spinlock_t *lock)
 /*
  * Thread group CPU time accounting.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+void thread_group_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times);
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 
 static inline void thread_group_cputime_init(struct signal_struct *sig)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 8fd709c..e56be4c 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -220,7 +220,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 		cpu->cpu = virt_ticks(p);
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p);
+		cpu->sched = task_sched_runtime(p, true);
 		break;
 	}
 	return 0;
@@ -250,8 +250,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * values through the TIMER_ABSTIME flag, therefore we have
 		 * to synchronize the timer to the clock every time we start
 		 * it.
+		 *
+		 * Do not add the current delta, because
+		 * account_group_exec_runtime() will also this delta and we
+		 * wouldn't want to double account time and get ahead of
+		 * ourselves.
 		 */
-		thread_group_cputime(tsk, &sum);
+		thread_group_cputime(tsk, false, &sum);
 		raw_spin_lock_irqsave(&cputimer->lock, flags);
 		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
@@ -275,15 +280,15 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d0465..ad3339f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2664,14 +2664,16 @@ unsigned long long task_delta_exec(struct task_struct *p)
  * In case the task is currently running, return the runtime plus current's
  * pending runtime that have not been accounted yet.
  */
-unsigned long long task_sched_runtime(struct task_struct *p)
+unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
 {
 	unsigned long flags;
 	struct rq *rq;
 	u64 ns = 0;
 
 	rq = task_rq_lock(p, &flags);
-	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
+	ns = p->se.sum_exec_runtime;
+	if (add_delta)
+		ns += do_task_delta_exec(p, rq);
 	task_rq_unlock(rq, p, &flags);
 
 	return ns;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e93cca9..69d3f6c 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -293,7 +293,7 @@ static __always_inline bool steal_account_process_tick(void)
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
  * tasks (sum on group iteration) belonging to @tsk's group.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
 	cputime_t utime, stime;
@@ -313,7 +313,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 		task_cputime(t, &utime, &stime);
 		times->utime += utime;
 		times->stime += stime;
-		times->sum_exec_runtime += task_sched_runtime(t);
+		times->sum_exec_runtime += task_sched_runtime(t, add_delta);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
@@ -459,7 +459,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_cputime(p, true, &cputime);
 
 	*ut = cputime.utime;
 	*st = cputime.stime;
@@ -594,7 +594,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_cputime(p, true, &cputime);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 06/10] posix-cpu-timers: fix wrong timer initialization
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (4 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 05/10] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30 13:34   ` Olivier Langlois
  2013-04-30  3:17 ` [PATCH 07/10] posix-cpu-timers: timer functions must use timer time instead of clock time kosaki.motohiro
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Currently glibc's rt/tst-cputimer1 testcase is spradically fail because
a timer created by timer_create() may fire earlier than it's specified.

posix_cpu_timer_set() uses "val" as current time for three purpose. 1)
initialize sig->cputimer. 2) calculation "old" val. 3) calculations an
expires time.

(1) and (2) should be used only committed time (i.e. without delta_exec)
because run_posix_cpu_timers() don't care of delta_exec and we need
consistency. But (3) need exact current time (aka cpu clock time) beucase
an expires should be "now + timeout" by definition.

This patch makes separete two kind of "now".

Cc: Olivier Langlois <olivier@trillion01.com>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/kernel_stat.h |    5 -----
 kernel/posix-cpu-timers.c   |   14 ++++++++++++--
 kernel/sched/core.c         |   13 -------------
 3 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index ed5f6ed..f5d4fdf 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -117,11 +117,6 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 	return kstat_cpu(cpu).irqs_sum;
 }
 
-/*
- * Lock/unlock the current runqueue - to extract task statistics:
- */
-extern unsigned long long task_delta_exec(struct task_struct *);
-
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
 extern void account_steal_time(cputime_t);
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index e56be4c..98f354e 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -635,7 +635,7 @@ static int cpu_timer_sample_group(const clockid_t which_clock,
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
 	return 0;
@@ -749,7 +749,17 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 	}
 
 	if (new_expires.sched != 0 && !(flags & TIMER_ABSTIME)) {
-		cpu_time_add(timer->it_clock, &new_expires, val);
+		union cpu_time_count now;
+
+		/*
+		 * The expires is "now + timeout" by definition. So,
+		 * we need exact current time.
+		 */
+		if (CPUCLOCK_PERTHREAD(timer->it_clock))
+			now = val;
+		else
+			cpu_clock_sample_group(timer->it_clock, p, &now);
+		cpu_time_add(timer->it_clock, &new_expires, now);
 	}
 
 	/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad3339f..b817e6d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2646,19 +2646,6 @@ static u64 do_task_delta_exec(struct task_struct *p, struct rq *rq)
 	return ns;
 }
 
-unsigned long long task_delta_exec(struct task_struct *p)
-{
-	unsigned long flags;
-	struct rq *rq;
-	u64 ns = 0;
-
-	rq = task_rq_lock(p, &flags);
-	ns = do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, p, &flags);
-
-	return ns;
-}
-
 /*
  * Return accounted runtime for the task.
  * In case the task is currently running, return the runtime plus current's
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 07/10] posix-cpu-timers: timer functions must use timer time instead of clock time
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (5 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 06/10] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 08/10] posix-cpu-timers: check_thread_timers() uses task_sched_runtime() kosaki.motohiro
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

For process timer, we use cpu_clock_sample_group() and cpu_timer_sample_group()
correctly. However for thread timer, we always use cpu_clock_sample().
That's wrong becuase cpu_clock_sample() account uncommited delta_exec too. And
this is inconsistency against run_posix_cpu_tiemrs().

This is not big matter because a following workaround code in
posix_cpu_timer_get() hides the timer miscalculation issue. However it makes
wrong rq lock held and would be better to fix.

  if (cpu_time_before(timer->it_clock, now, timer->it.cpu.expires)) {
    ...
  } else {
    /*
     * The timer should have expired already, but the firing
     * hasn't taken place yet.  Say it's just about to expire.
     */
     itp->it_value.tv_nsec = 1;
     itp->it_value.tv_sec = 0;

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/posix-cpu-timers.c |   37 +++++++++++++++++++++++++++----------
 1 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 98f354e..c69b8d8 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -204,11 +204,10 @@ posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp)
 }
 
 
-/*
- * Sample a per-thread clock for the given task.
- */
-static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
-			    union cpu_time_count *cpu)
+static int do_cpu_clock_timer_sample(const clockid_t which_clock,
+				     struct task_struct *p,
+				     bool add_delta,
+				     union cpu_time_count *cpu)
 {
 	switch (CPUCLOCK_WHICH(which_clock)) {
 	default:
@@ -220,12 +219,30 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 		cpu->cpu = virt_ticks(p);
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p, true);
+		cpu->sched = task_sched_runtime(p, add_delta);
 		break;
 	}
 	return 0;
 }
 
+/*
+ * Sample a per-thread clock for the given task.
+ */
+static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, true, cpu);
+}
+
+/*
+ * Sample a per-thread timer clock for the given task.
+ */
+static int cpu_timer_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, false, cpu);
+}
+
 static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 {
 	if (b->utime > a->utime)
@@ -700,7 +717,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 	 * check if it's already passed.  In short, we need a sample.
 	 */
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
-		cpu_clock_sample(timer->it_clock, p, &val);
+		cpu_timer_sample(timer->it_clock, p, &val);
 	} else {
 		cpu_timer_sample_group(timer->it_clock, p, &val);
 	}
@@ -756,7 +773,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 		 * we need exact current time.
 		 */
 		if (CPUCLOCK_PERTHREAD(timer->it_clock))
-			now = val;
+			cpu_clock_sample(timer->it_clock, p, &now);
 		else
 			cpu_clock_sample_group(timer->it_clock, p, &now);
 		cpu_time_add(timer->it_clock, &new_expires, now);
@@ -844,7 +861,7 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
 	 * Sample the clock to take the difference with the expiry time.
 	 */
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
-		cpu_clock_sample(timer->it_clock, p, &now);
+		cpu_timer_sample(timer->it_clock, p, &now);
 		clear_dead = p->exit_state;
 	} else {
 		read_lock(&tasklist_lock);
@@ -1168,7 +1185,7 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
 	 * Fetch the current sample and update the timer's expiry time.
 	 */
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
-		cpu_clock_sample(timer->it_clock, p, &now);
+		cpu_timer_sample(timer->it_clock, p, &now);
 		bump_cpu_timer(timer, now);
 		if (unlikely(p->exit_state)) {
 			clear_dead_task(timer, now);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 08/10] posix-cpu-timers: check_thread_timers() uses task_sched_runtime()
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (6 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 07/10] posix-cpu-timers: timer functions must use timer time instead of clock time kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-04-30  3:17 ` [PATCH 09/10] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

A type of tsk->se.sum_exec_runtime is u64. Thus reading it is racy when
running 32bit. We should use task_sched_runtime().

Cc: Olivier Langlois <olivier@trillion01.com>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/posix-cpu-timers.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c69b8d8..4043282 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -958,7 +958,8 @@ static void check_thread_timers(struct task_struct *tsk,
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || tsk->se.sum_exec_runtime < t->expires.sched) {
+		unsigned long long runtime = task_sched_runtime(tsk, false);
+		if (!--maxfire || runtime < t->expires.sched) {
 			tsk->cputime_expires.sched_exp = t->expires.sched;
 			break;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 09/10] sched: task_sched_runtime introduce micro optimization
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (7 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 08/10] posix-cpu-timers: check_thread_timers() uses task_sched_runtime() kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-05-01 11:00   ` Peter Zijlstra
  2013-04-30  3:17 ` [PATCH 10/10] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group} kosaki.motohiro
  2013-05-01 11:01 ` [PATCH v3 0/7] posix-cpu-timers fixlet Peter Zijlstra
  10 siblings, 1 reply; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

rq lock in task_sched_runtime() is necessary for two reasons. 1)
accessing se.sum_exec_runtime is inatomic on 32bit and 2)
do_task_delta_exec() require it.

And then, 64bit can avoid holds rq lock when add_delta is false.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/sched/core.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b817e6d..24ba1c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2657,6 +2657,12 @@ unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
 	struct rq *rq;
 	u64 ns = 0;
 
+	/* Micro optimization. */
+#ifdef CONFIG_64BIT
+	if (!add_delta)
+		return p->se.sum_exec_runtime;
+#endif
+
 	rq = task_rq_lock(p, &flags);
 	ns = p->se.sum_exec_runtime;
 	if (add_delta)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/10] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group}
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (8 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 09/10] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
@ 2013-04-30  3:17 ` kosaki.motohiro
  2013-05-01 11:01 ` [PATCH v3 0/7] posix-cpu-timers fixlet Peter Zijlstra
  10 siblings, 0 replies; 19+ messages in thread
From: kosaki.motohiro @ 2013-04-30  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Now we have similar four timer related functions, cpu_clock_sample(),
cpu_clock_sample_group(), cpu_timer_sample() and cpu_timer_sample_group().

For readability, making do_cpu_clock_timer_sample() and thread_cputime()
helper functions and all *_sample functions use them.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/sched.h     |    1 +
 kernel/posix-cpu-timers.c |  132 +++++++++++++++++++-------------------------
 kernel/sched/cputime.c    |   11 ++++
 3 files changed, 69 insertions(+), 75 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7863d4b..73ac8fa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2622,6 +2622,7 @@ static inline int spin_needbreak(spinlock_t *lock)
 #endif
 }
 
+void thread_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times);
 /*
  * Thread group CPU time accounting.
  */
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 4043282..b33290d 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -203,46 +203,6 @@ posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp)
 	return error;
 }
 
-
-static int do_cpu_clock_timer_sample(const clockid_t which_clock,
-				     struct task_struct *p,
-				     bool add_delta,
-				     union cpu_time_count *cpu)
-{
-	switch (CPUCLOCK_WHICH(which_clock)) {
-	default:
-		return -EINVAL;
-	case CPUCLOCK_PROF:
-		cpu->cpu = prof_ticks(p);
-		break;
-	case CPUCLOCK_VIRT:
-		cpu->cpu = virt_ticks(p);
-		break;
-	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p, add_delta);
-		break;
-	}
-	return 0;
-}
-
-/*
- * Sample a per-thread clock for the given task.
- */
-static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
-			    union cpu_time_count *cpu)
-{
-	return do_cpu_clock_timer_sample(which_clock, p, true, cpu);
-}
-
-/*
- * Sample a per-thread timer clock for the given task.
- */
-static int cpu_timer_sample(const clockid_t which_clock, struct task_struct *p,
-			    union cpu_time_count *cpu)
-{
-	return do_cpu_clock_timer_sample(which_clock, p, false, cpu);
-}
-
 static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 {
 	if (b->utime > a->utime)
@@ -284,34 +244,83 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 }
 
 /*
- * Sample a process (thread group) clock for the given group_leader task.
- * Must be called with tasklist_lock held for reading.
+ * Sample time for the given task.
+ * @which_clock:	Clock id.
+ * @p:			Target task. Must be thread group leader when
+ *			thread_group is true.
+ * @thread_group:	True if want to get process time.
+ * @add_delta:		True if want to get clock time,
+ *			otherwise, get timer time.
  */
-static int cpu_clock_sample_group(const clockid_t which_clock,
-				  struct task_struct *p,
-				  union cpu_time_count *cpu)
+static int do_cpu_clock_timer_sample(const clockid_t which_clock,
+				     struct task_struct *p,
+				     bool thread_group,
+				     bool clock_time,
+				     union cpu_time_count *cpu)
 {
 	struct task_cputime cputime;
 
+	if (thread_group) {
+		if (clock_time)
+			thread_group_cputime(p, true, &cputime);
+		else
+			thread_group_cputimer(p, &cputime);
+	} else
+		thread_cputime(p, clock_time, &cputime);
+
 	switch (CPUCLOCK_WHICH(which_clock)) {
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
-		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		thread_group_cputime(p, true, &cputime);
 		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
 	return 0;
 }
 
+/*
+ * Sample a per-thread clock time for the given task.
+ */
+static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, false, true, cpu);
+}
+
+/*
+ * Sample a per-process clock time for the given task.
+ */
+static int cpu_clock_sample_group(const clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, true, true, cpu);
+}
+
+/*
+ * Sample a per-thread timer time for the given task.
+ */
+static int cpu_timer_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, false, false, cpu);
+}
+
+/*
+ * Sample a per-process timer time for the given task.
+ */
+static int cpu_timer_sample_group(const clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, true, false, cpu);
+}
 
 static int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp)
 {
@@ -632,33 +641,6 @@ static void cpu_timer_fire(struct k_itimer *timer)
 }
 
 /*
- * Sample a process (thread group) timer for the given group_leader task.
- * Must be called with tasklist_lock held for reading.
- */
-static int cpu_timer_sample_group(const clockid_t which_clock,
-				  struct task_struct *p,
-				  union cpu_time_count *cpu)
-{
-	struct task_cputime cputime;
-
-	thread_group_cputimer(p, &cputime);
-	switch (CPUCLOCK_WHICH(which_clock)) {
-	default:
-		return -EINVAL;
-	case CPUCLOCK_PROF:
-		cpu->cpu = cputime.utime + cputime.stime;
-		break;
-	case CPUCLOCK_VIRT:
-		cpu->cpu = cputime.utime;
-		break;
-	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime;
-		break;
-	}
-	return 0;
-}
-
-/*
  * Guts of sys_timer_settime for CPU timers.
  * This is called with the timer locked and interrupts disabled.
  * If we return TIMER_RETRY, it's necessary to release the timer's lock
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 69d3f6c..29ef7d7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -289,6 +289,17 @@ static __always_inline bool steal_account_process_tick(void)
 	return false;
 }
 
+void thread_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times)
+{
+	struct signal_struct *sig = tsk->signal;
+	cputime_t utime, stime;
+
+	task_cputime(tsk, &utime, &stime);
+	times->utime = utime;
+	times->stime = stime;
+	times->sum_exec_runtime = task_sched_runtime(tsk, add_delta);
+}
+
 /*
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
  * tasks (sum on group iteration) belonging to @tsk's group.
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 04/10] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-04-30  3:17 ` [PATCH 04/10] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
@ 2013-04-30  3:38   ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2013-04-30  3:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Martin Schwidefsky, Steven Rostedt,
	David Miller, Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro, kosaki.motohiro

Hm. I'm not sure why this path series start [patch 4/10]. maybe I need to review my script again.
anyway, patch 1-3 don't exist. sorry for confusing.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/10] posix-cpu-timers: fix wrong timer initialization
  2013-04-30  3:17 ` [PATCH 06/10] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
@ 2013-04-30 13:34   ` Olivier Langlois
  0 siblings, 0 replies; 19+ messages in thread
From: Olivier Langlois @ 2013-04-30 13:34 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, Martin Schwidefsky, Steven Rostedt, David Miller,
	Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

On Mon, 2013-04-29 at 23:17 -0400, kosaki.motohiro@gmail.com wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> Currently glibc's rt/tst-cputimer1 testcase is spradically fail because
> a timer created by timer_create() may fire earlier than it's specified.
> 
> posix_cpu_timer_set() uses "val" as current time for three purpose. 1)
> initialize sig->cputimer. 2) calculation "old" val. 3) calculations an
> expires time.
> 
> (1) and (2) should be used only committed time (i.e. without delta_exec)
> because run_posix_cpu_timers() don't care of delta_exec and we need
> consistency. But (3) need exact current time (aka cpu clock time) beucase
> an expires should be "now + timeout" by definition.
> 
> This patch makes separete two kind of "now".
> 
Kosaki,

I must admit that this solution is simpler to what I originally
proposed.

good work!
Olivier




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 09/10] sched: task_sched_runtime introduce micro optimization
  2013-04-30  3:17 ` [PATCH 09/10] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
@ 2013-05-01 11:00   ` Peter Zijlstra
  2013-05-01 11:50     ` Paul Turner
  2013-05-01 18:42     ` KOSAKI Motohiro
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2013-05-01 11:00 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, Olivier Langlois, Martin Schwidefsky,
	Steven Rostedt, David Miller, Thomas Gleixner,
	Frederic Weisbecker, Ingo Molnar, KOSAKI Motohiro

On Mon, Apr 29, 2013 at 11:17:17PM -0400, kosaki.motohiro@gmail.com wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> rq lock in task_sched_runtime() is necessary for two reasons. 1)
> accessing se.sum_exec_runtime is inatomic on 32bit and 2)
> do_task_delta_exec() require it.
> 
> And then, 64bit can avoid holds rq lock when add_delta is false.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  kernel/sched/core.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b817e6d..24ba1c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2657,6 +2657,12 @@ unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
>  	struct rq *rq;
>  	u64 ns = 0;
>  
> +	/* Micro optimization. */

Instead of the above; how about something like:

  /* 64-bit doesn't need locks to atomically read a 64bit value */

> +#ifdef CONFIG_64BIT
> +	if (!add_delta)
> +		return p->se.sum_exec_runtime;
> +#endif
> +
>  	rq = task_rq_lock(p, &flags);
>  	ns = p->se.sum_exec_runtime;
>  	if (add_delta)
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] posix-cpu-timers fixlet
  2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
                   ` (9 preceding siblings ...)
  2013-04-30  3:17 ` [PATCH 10/10] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group} kosaki.motohiro
@ 2013-05-01 11:01 ` Peter Zijlstra
  2013-05-01 18:45   ` KOSAKI Motohiro
  10 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2013-05-01 11:01 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, Olivier Langlois, Martin Schwidefsky,
	Steven Rostedt, David Miller, Thomas Gleixner,
	Frederic Weisbecker, Ingo Molnar

On Mon, Apr 29, 2013 at 11:17:08PM -0400, kosaki.motohiro@gmail.com wrote:
> Glibc's posix timer testcase found a lot of bugs in posix timer code. This series, hopefully,
> fixes all of them. All patches are independent each other logically.
> 
> 
> [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
> [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice
> [PATCH 3/7] posix-cpu-timers: fix wrong timer initialization
> [PATCH 4/7] posix-cpu-timers: timer functions must use timer time instead of clock time
> [PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime()
> [PATCH 6/7] sched: task_sched_runtime introduce micro optimization
> [PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group}


Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Thanks for doing this Kosaki-San!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 09/10] sched: task_sched_runtime introduce micro optimization
  2013-05-01 11:00   ` Peter Zijlstra
@ 2013-05-01 11:50     ` Paul Turner
  2013-05-01 18:55       ` KOSAKI Motohiro
  2013-05-01 18:42     ` KOSAKI Motohiro
  1 sibling, 1 reply; 19+ messages in thread
From: Paul Turner @ 2013-05-01 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, LKML, Olivier Langlois, Martin Schwidefsky,
	Steven Rostedt, David Miller, Thomas Gleixner,
	Frederic Weisbecker, Ingo Molnar, KOSAKI Motohiro

On Wed, May 1, 2013 at 4:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Apr 29, 2013 at 11:17:17PM -0400, kosaki.motohiro@gmail.com wrote:
>> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>>
>> rq lock in task_sched_runtime() is necessary for two reasons. 1)
>> accessing se.sum_exec_runtime is inatomic on 32bit and 2)
>> do_task_delta_exec() require it.
>>
>> And then, 64bit can avoid holds rq lock when add_delta is false.
>>
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  kernel/sched/core.c |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index b817e6d..24ba1c6 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2657,6 +2657,12 @@ unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
>>       struct rq *rq;
>>       u64 ns = 0;
>>
>> +     /* Micro optimization. */
>
> Instead of the above; how about something like:
>
>   /* 64-bit doesn't need locks to atomically read a 64bit value */
>
>> +#ifdef CONFIG_64BIT
>> +     if (!add_delta)
>> +             return p->se.sum_exec_runtime;
>> +#endif


Stronger:

+#ifdef CONFIG_64BIT
+       if (!p->on_cpu)
+               return p->se.sum_exec_runtime;
+#endif

[ Or !p->on_cpu || !add_delta ].

We can take the racy read versus p->on_cpu since:
  If we race with it leaving cpu: we take lock, we're correct
  If we race with it entering cpu: unaccounted time ---> 0, this is
indistinguishable from the read occurring a few cycles earlier.

>>
>>       rq = task_rq_lock(p, &flags);
>>       ns = p->se.sum_exec_runtime;
>>       if (add_delta)
>> --
>> 1.7.1
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 09/10] sched: task_sched_runtime introduce micro optimization
  2013-05-01 11:00   ` Peter Zijlstra
  2013-05-01 11:50     ` Paul Turner
@ 2013-05-01 18:42     ` KOSAKI Motohiro
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2013-05-01 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, linux-kernel, Olivier Langlois,
	Martin Schwidefsky, Steven Rostedt, David Miller,
	Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	KOSAKI Motohiro

(5/1/13 7:00 AM), Peter Zijlstra wrote:
> On Mon, Apr 29, 2013 at 11:17:17PM -0400, kosaki.motohiro@gmail.com wrote:
>> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>>
>> rq lock in task_sched_runtime() is necessary for two reasons. 1)
>> accessing se.sum_exec_runtime is inatomic on 32bit and 2)
>> do_task_delta_exec() require it.
>>
>> And then, 64bit can avoid holds rq lock when add_delta is false.
>>
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  kernel/sched/core.c |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index b817e6d..24ba1c6 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2657,6 +2657,12 @@ unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
>>  	struct rq *rq;
>>  	u64 ns = 0;
>>  
>> +	/* Micro optimization. */
> 
> Instead of the above; how about something like:
> 
>   /* 64-bit doesn't need locks to atomically read a 64bit value */

Looks nicer. Indeed.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/7] posix-cpu-timers fixlet
  2013-05-01 11:01 ` [PATCH v3 0/7] posix-cpu-timers fixlet Peter Zijlstra
@ 2013-05-01 18:45   ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2013-05-01 18:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, linux-kernel, Olivier Langlois,
	Martin Schwidefsky, Steven Rostedt, David Miller,
	Thomas Gleixner, Frederic Weisbecker, Ingo Molnar

(5/1/13 7:01 AM), Peter Zijlstra wrote:
> On Mon, Apr 29, 2013 at 11:17:08PM -0400, kosaki.motohiro@gmail.com wrote:
>> Glibc's posix timer testcase found a lot of bugs in posix timer code. This series, hopefully,
>> fixes all of them. All patches are independent each other logically.
>>
>>
>> [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
>> [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice
>> [PATCH 3/7] posix-cpu-timers: fix wrong timer initialization
>> [PATCH 4/7] posix-cpu-timers: timer functions must use timer time instead of clock time
>> [PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime()
>> [PATCH 6/7] sched: task_sched_runtime introduce micro optimization
>> [PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group}
> 
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> Thanks for doing this Kosaki-San!

Thanks.
I've alread found several spell mistake in changelog. I'll resend patches soon.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 09/10] sched: task_sched_runtime introduce micro optimization
  2013-05-01 11:50     ` Paul Turner
@ 2013-05-01 18:55       ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2013-05-01 18:55 UTC (permalink / raw)
  To: Paul Turner
  Cc: Peter Zijlstra, KOSAKI Motohiro, LKML, Olivier Langlois,
	Martin Schwidefsky, Steven Rostedt, David Miller,
	Thomas Gleixner, Frederic Weisbecker, Ingo Molnar,
	KOSAKI Motohiro

> Stronger:
> 
> +#ifdef CONFIG_64BIT
> +       if (!p->on_cpu)
> +               return p->se.sum_exec_runtime;
> +#endif
> 
> [ Or !p->on_cpu || !add_delta ].
> 
> We can take the racy read versus p->on_cpu since:
>   If we race with it leaving cpu: we take lock, we're correct
>   If we race with it entering cpu: unaccounted time ---> 0, this is
> indistinguishable from the read occurring a few cycles earlier.

OK, agreed. That's nice.



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-05-01 18:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-30  3:17 [PATCH v3 0/7] posix-cpu-timers fixlet kosaki.motohiro
2013-04-30  3:17 ` [PATCH 01/10] events: Protect access via task_subsys_state_check() kosaki.motohiro
2013-04-30  3:17 ` [PATCH 02/10] vm: add no-mmu vm_iomap_memory() stub kosaki.motohiro
2013-04-30  3:17 ` [PATCH 03/10] ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock kosaki.motohiro
2013-04-30  3:17 ` [PATCH 04/10] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
2013-04-30  3:38   ` KOSAKI Motohiro
2013-04-30  3:17 ` [PATCH 05/10] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
2013-04-30  3:17 ` [PATCH 06/10] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
2013-04-30 13:34   ` Olivier Langlois
2013-04-30  3:17 ` [PATCH 07/10] posix-cpu-timers: timer functions must use timer time instead of clock time kosaki.motohiro
2013-04-30  3:17 ` [PATCH 08/10] posix-cpu-timers: check_thread_timers() uses task_sched_runtime() kosaki.motohiro
2013-04-30  3:17 ` [PATCH 09/10] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
2013-05-01 11:00   ` Peter Zijlstra
2013-05-01 11:50     ` Paul Turner
2013-05-01 18:55       ` KOSAKI Motohiro
2013-05-01 18:42     ` KOSAKI Motohiro
2013-04-30  3:17 ` [PATCH 10/10] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group} kosaki.motohiro
2013-05-01 11:01 ` [PATCH v3 0/7] posix-cpu-timers fixlet Peter Zijlstra
2013-05-01 18:45   ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).