All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()
@ 2016-04-14 12:29 Sebastian Andrzej Siewior
  2016-04-14 13:07 ` Rik van Riel
  2016-04-14 13:18 ` Frederic Weisbecker
  0 siblings, 2 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-04-14 12:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Sebastian Andrzej Siewior, Peter Zijlstra,
	Frederic Weisbecker, Rik van Riel, Thomas Gleixner

A while ago vtime_init_idle() used to invoke sched_clock_cpu() and had
write_seqlock_irqsave() because sched_clock_cpu() required to be called
with interrupts off.
This requirement is written before the body of the function and was
introduced in 2010 via c676329abb2b ("sched_clock: Add local_clock() API
and improve documentation"). This requirement has been dropped in 2013
via ef08f0fff876 ("sched/clock: Remove local_irq_disable() from the clocks")
but the body of the function still has the comment.

Now, vtime_init_idle() was converted from seqlock to seqcount via
b7ce2277f087 ("sched/cputime: Convert vtime_seqlock to seqcount") and
while doing so the IRQ-off region around sched_clock_cpu() was preserved
while not strictly required (given the first part of this commit).
A little later, sched_clock_cpu() was replaced with jiffies via
ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy
granularity").
Based on this events I assume it is safe to drop the local_irq_safe()
section.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/cputime.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5498d5..816cd4ef5873 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -785,14 +785,10 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 void vtime_init_idle(struct task_struct *t, int cpu)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
 	write_seqcount_begin(&t->vtime_seqcount);
 	t->vtime_snap_whence = VTIME_SYS;
 	t->vtime_snap = jiffies;
 	write_seqcount_end(&t->vtime_seqcount);
-	local_irq_restore(flags);
 }
 
 cputime_t task_gtime(struct task_struct *t)
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()
  2016-04-14 12:29 [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle() Sebastian Andrzej Siewior
@ 2016-04-14 13:07 ` Rik van Riel
  2016-04-14 13:18 ` Frederic Weisbecker
  1 sibling, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2016-04-14 13:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]

On Thu, 2016-04-14 at 14:29 +0200, Sebastian Andrzej Siewior wrote:
> A while ago vtime_init_idle() used to invoke sched_clock_cpu() and
> had
> write_seqlock_irqsave() because sched_clock_cpu() required to be
> called
> with interrupts off.
> This requirement is written before the body of the function and was
> introduced in 2010 via c676329abb2b ("sched_clock: Add local_clock()
> API
> and improve documentation"). This requirement has been dropped in
> 2013
> via ef08f0fff876 ("sched/clock: Remove local_irq_disable() from the
> clocks")
> but the body of the function still has the comment.
> 
> Now, vtime_init_idle() was converted from seqlock to seqcount via
> b7ce2277f087 ("sched/cputime: Convert vtime_seqlock to seqcount") and
> while doing so the IRQ-off region around sched_clock_cpu() was
> preserved
> while not strictly required (given the first part of this commit).
> A little later, sched_clock_cpu() was replaced with jiffies via
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy
> granularity").
> Based on this events I assume it is safe to drop the local_irq_safe()
> section.
> 
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()
  2016-04-14 12:29 [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle() Sebastian Andrzej Siewior
  2016-04-14 13:07 ` Rik van Riel
@ 2016-04-14 13:18 ` Frederic Weisbecker
  2016-04-14 13:41   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 5+ messages in thread
From: Frederic Weisbecker @ 2016-04-14 13:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Peter Zijlstra,
	Rik van Riel, Thomas Gleixner

On Thu, Apr 14, 2016 at 02:29:14PM +0200, Sebastian Andrzej Siewior wrote:
> A while ago vtime_init_idle() used to invoke sched_clock_cpu() and had
> write_seqlock_irqsave() because sched_clock_cpu() required to be called
> with interrupts off.
> This requirement is written before the body of the function and was
> introduced in 2010 via c676329abb2b ("sched_clock: Add local_clock() API
> and improve documentation"). This requirement has been dropped in 2013
> via ef08f0fff876 ("sched/clock: Remove local_irq_disable() from the clocks")
> but the body of the function still has the comment.
> 
> Now, vtime_init_idle() was converted from seqlock to seqcount via
> b7ce2277f087 ("sched/cputime: Convert vtime_seqlock to seqcount") and
> while doing so the IRQ-off region around sched_clock_cpu() was preserved
> while not strictly required (given the first part of this commit).
> A little later, sched_clock_cpu() was replaced with jiffies via
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy
> granularity").
> Based on this events I assume it is safe to drop the local_irq_safe()
> section.

I think I tried this but I got lockdep warnings because the other updaters
happen on IRQ.

In practive we can't be interrupted since the idle task hasn't even started.

Now we can probably get rid of the write_seqcount_stuff here because what
is initialized there is supposed to be visible by the idle task once it runs.

But I'm more worried about readers.

> 
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/sched/cputime.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 75f98c5498d5..816cd4ef5873 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -785,14 +785,10 @@ void arch_vtime_task_switch(struct task_struct *prev)
>  
>  void vtime_init_idle(struct task_struct *t, int cpu)
>  {
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
>  	write_seqcount_begin(&t->vtime_seqcount);
>  	t->vtime_snap_whence = VTIME_SYS;
>  	t->vtime_snap = jiffies;
>  	write_seqcount_end(&t->vtime_seqcount);
> -	local_irq_restore(flags);
>  }
>  
>  cputime_t task_gtime(struct task_struct *t)
> -- 
> 2.8.0.rc3
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()
  2016-04-14 13:18 ` Frederic Weisbecker
@ 2016-04-14 13:41   ` Sebastian Andrzej Siewior
  2016-04-14 14:23     ` Frederic Weisbecker
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-04-14 13:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Peter Zijlstra,
	Rik van Riel, Thomas Gleixner

On 04/14/2016 03:18 PM, Frederic Weisbecker wrote:
> I think I tried this but I got lockdep warnings because the other updaters
> happen on IRQ.

I booted and played a little but lockdep didn't scream here so far.

> In practive we can't be interrupted since the idle task hasn't even started.
> 
> Now we can probably get rid of the write_seqcount_stuff here because what
> is initialized there is supposed to be visible by the idle task once it runs.
> 
> But I'm more worried about readers.

Most idle_thread_get() callers happen at boottime. Then I see two
during CPU-up (cpu-hotplug) at runtime. One from _cpu_up() via
idle_thread_get() and the second via bringup_cpu() and the second in
bringup_cpu().

Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()
  2016-04-14 13:41   ` Sebastian Andrzej Siewior
@ 2016-04-14 14:23     ` Frederic Weisbecker
  0 siblings, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2016-04-14 14:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Peter Zijlstra,
	Rik van Riel, Thomas Gleixner

On Thu, Apr 14, 2016 at 03:41:17PM +0200, Sebastian Andrzej Siewior wrote:
> On 04/14/2016 03:18 PM, Frederic Weisbecker wrote:
> > I think I tried this but I got lockdep warnings because the other updaters
> > happen on IRQ.
> 
> I booted and played a little but lockdep didn't scream here so far.

You need to select CONFIG_NO_HZ_FULL and select some CPU in nohz_full=
in order to trigger it:

[    0.759869] =================================
[    0.764220] [ INFO: inconsistent lock state ]
[    0.768571] 4.6.0-rc2+ #249 Not tainted
[    0.772404] ---------------------------------
[    0.776755] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    0.782751] cpuhp/7/49 [HC1[1]:SC0[0]:HE0:SE1] takes:
[    0.787795]  (&p->vtime_seqcount){?.+...}, at: [<ffffffff810ad0bb>] irq_exit+0xeb/0x110
[    0.795867] {HARDIRQ-ON-W} state was registered at:
[    0.800737]   [<ffffffff810fab45>] __lock_acquire+0x895/0x1d00
[    0.806579]   [<ffffffff810fc8ca>] lock_acquire+0xca/0x230
[    0.812075]   [<ffffffff810df0a0>] vtime_init_idle+0x40/0x80
[    0.817742]   [<ffffffff810daf9e>] init_idle+0x2ee/0x380
[    0.823063]   [<ffffffff810a495e>] fork_idle+0xae/0xc0
[    0.828213]   [<ffffffff81f7a0f4>] idle_threads_init+0x6b/0xa4
[    0.834054]   [<ffffffff81f7e1c5>] smp_init+0xf/0x82
[    0.839028]   [<ffffffff81f5b03c>] kernel_init_freeable+0x9b/0x1ab
[    0.845217]   [<ffffffff8193c6be>] kernel_init+0xe/0x100
[    0.850541]   [<ffffffff8194b4f2>] ret_from_fork+0x22/0x50
[    0.856034] irq event stamp: 28
[    0.859173] hardirqs last  enabled at (27): [<ffffffff8194a9ff>] _raw_spin_unlock_irqrestore+0x5f/0x80
[    0.868482] hardirqs last disabled at (28): [<ffff88021652bcc8>] 0xffff88021652bcc8
[    0.876143] softirqs last  enabled at (0): [<ffffffff810a2f76>] copy_process.part.38+0x336/0x1c40
[    0.885020] softirqs last disabled at (0): [<          (null)>]           (null)
[    0.892420] 
[    0.892420] other info that might help us debug this:
[    0.898940]  Possible unsafe locking scenario:
[    0.898940] 
[    0.904849]        CPU0
[    0.907294]        ----
[    0.909737]   lock(&p->vtime_seqcount);
[    0.913605]   <Interrupt>
[    0.916222]     lock(&p->vtime_seqcount);
[    0.920261] 
[    0.920261]  *** DEADLOCK ***
[    0.920261] 
[    0.926173] 1 lock held by cpuhp/7/49:
[    0.929915]  #0:  (smpboot_threads_lock){+.+.+.}, at: [<ffffffff810cec9d>] smpboot_unpark_threads+0x1d/0x90
[    0.939735] 
[    0.939735] stack backtrace:
[    0.944087] CPU: 7 PID: 49 Comm: cpuhp/7 Not tainted 4.6.0-rc2+ #249
[    0.950433] Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 08/18/2013
[    0.958250]  0000000000000000 ffff88021fbc3d80 ffffffff8141d403 ffff8802165243c0
[    0.965704]  ffffffff827eebd0 ffff88021fbc3dd0 ffffffff811a1743 0000000000000000
[    0.973159]  0000000000000000 0000000000000001 0000000000000000 ffff880216524bc0
[    0.980613] Call Trace:
[    0.983056]  <IRQ>  [<ffffffff8141d403>] dump_stack+0x85/0xc2
[    0.988813]  [<ffffffff811a1743>] print_usage_bug+0x1f3/0x208
[    0.994549]  [<ffffffff810f9c69>] mark_lock+0x649/0x6c0
[    0.999768]  [<ffffffff810f8e50>] ? check_usage_backwards+0x140/0x140
[    1.006199]  [<ffffffff810fad2b>] __lock_acquire+0xa7b/0x1d00
[    1.011938]  [<ffffffff810fc8ca>] lock_acquire+0xca/0x230
[    1.017328]  [<ffffffff810ad0bb>] ? irq_exit+0xeb/0x110
[    1.022548]  [<ffffffff810ded00>] vtime_gen_account_irq_exit+0x40/0xb0
[    1.029072]  [<ffffffff810ad0bb>] ? irq_exit+0xeb/0x110
[    1.034292]  [<ffffffff810ad0bb>] irq_exit+0xeb/0x110
[    1.039335]  [<ffffffff8103b303>] smp_irq_work_interrupt+0x33/0x40
[    1.045506]  [<ffffffff8194cd21>] irq_work_interrupt+0x91/0xa0
[    1.051330]  <EOI>  [<ffffffff8194aa04>] ? _raw_spin_unlock_irqrestore+0x64/0x80
[    1.058734]  [<ffffffff810d69b9>] try_to_wake_up+0x49/0x630
[    1.064299]  [<ffffffff8194a9ea>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[    1.071084]  [<ffffffff810cec80>] ? smpboot_create_threads+0x90/0x90
[    1.077429]  [<ffffffff810d7ea0>] wake_up_state+0x10/0x20
[    1.082819]  [<ffffffff810caf47>] __kthread_unpark+0x67/0x70
[    1.088472]  [<ffffffff810caf70>] kthread_unpark+0x20/0x30
[    1.093958]  [<ffffffff810cecea>] smpboot_unpark_threads+0x6a/0x90
[    1.100128]  [<ffffffff810a68f9>] cpuhp_invoke_callback+0x69/0x2e0
[    1.106302]  [<ffffffff810a6bb3>] cpuhp_up_callbacks+0x43/0xb0
[    1.112133]  [<ffffffff810a6dba>] cpuhp_thread_fun+0xda/0xf0
[    1.117784]  [<ffffffff810ce644>] smpboot_thread_fn+0x1e4/0x300
[    1.123696]  [<ffffffff810ce460>] ? sort_range+0x30/0x30
[    1.128999]  [<ffffffff810cabfa>] kthread+0x10a/0x120
[    1.134044]  [<ffffffff8194b4f2>] ret_from_fork+0x22/0x50
[    1.139434]  [<ffffffff810caaf0>] ? kthread_create_on_node+0x200/0x200


> 
> > In practive we can't be interrupted since the idle task hasn't even started.
> > 
> > Now we can probably get rid of the write_seqcount_stuff here because what
> > is initialized there is supposed to be visible by the idle task once it runs.
> > 
> > But I'm more worried about readers.
> 
> Most idle_thread_get() callers happen at boottime. Then I see two
> during CPU-up (cpu-hotplug) at runtime. One from _cpu_up() via
> idle_thread_get() and the second via bringup_cpu() and the second in
> bringup_cpu()

Right, it's on CPU bootup/hotplug. I'm just worried that, if we erase that lock on
CPU (re-)initialization, the readers (ie: task_cputime()) may read unlucky sequences.

Now probably there is not much opportunity to call task_cputime() on secondary idle
tasks, they don't seem to be in /proc but perhaps they are accessible elsewhere.

I'd rather be paranoid and keep the code that way until we have the final
proof that it's not dangerous.

That said we could skip this initialization based on context tracking/vtime static key,
using vtime_accounting_enabled().

> 
> Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-14 14:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-14 12:29 [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle() Sebastian Andrzej Siewior
2016-04-14 13:07 ` Rik van Riel
2016-04-14 13:18 ` Frederic Weisbecker
2016-04-14 13:41   ` Sebastian Andrzej Siewior
2016-04-14 14:23     ` Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.