* Weird sched_clock behaviour during boot with -rc1
@ 2014-02-04 18:36 Will Deacon
  2014-02-04 20:46 ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2014-02-04 18:36 UTC
  To: linux-arm-kernel

Hi guys,

Booting -rc1 on my TC2 gives the following strange entries in the dmesg:


  Uncompressing Linux... done, booting the kernel.
  [    0.000000] Booting Linux on physical CPU 0x0

  [...]

  [    0.000000]   HighMem zone: 329728 pages, LIFO batch:31
  [    7.789662] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956969942ns
  [    0.000129] PERCPU: Embedded 9 pages/cpu @ee7bd000 s12800 r8192 d15872 u36864

  [...]

  [    0.868297] NR_IRQS:16 nr_irqs:16 16
  [    0.886350] Architected cp15 timer(s) running at 24.00MHz (phys).
  [ 2915.164998] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 2863311519744ns
  [    0.000002] Switching to timer-based delay loop
  [    0.014249] Console: colour dummy device 80x30


so it looks like something whacky goes on during sched_clock registration.
Sure enough, we're doing a pr_info in between updating cd.* and calling
update_sched_clock(), so moving the print sorts things out (diff below).
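
To illustrate, here's roughly what sched_clock() computes in that window
(userspace sketch with made-up numbers, not kernel code):

/*
 * Same arithmetic as cyc_to_ns() in kernel/time/sched_clock.c, fed with
 * the stale epoch_cyc that printk's sched_clock() call sees between the
 * cd.* updates and update_sched_clock(). All constants are invented.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	return (cyc * mult) >> shift;
}

int main(void)
{
	uint64_t mask  = (1ULL << 56) - 1;	  /* CLOCKSOURCE_MASK(56) */
	uint32_t mult  = 2796202667U, shift = 26; /* ~41.67ns per 24MHz tick */
	uint64_t epoch = 0;	/* stale: update_sched_clock() hasn't run yet */
	uint64_t cyc   = 24000000ULL * 120; /* counter ran ~120s since power-up */

	/* sched_clock() returns epoch_ns + this, so the time since power-up
	 * leaks into the printk timestamp: */
	printf("bogus delta: %llu ns\n",
	       (unsigned long long)cyc_to_ns((cyc - epoch) & mask, mult, shift));
	return 0;
}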

What I can't figure out is why this has suddenly started happening with
3.14. Any ideas?

Cheers,

Will

--->8

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb36464281..f3de5d113afc 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -147,9 +147,6 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 
        /* calculate the ns resolution of this counter */
        res = cyc_to_ns(1ULL, cd.mult, cd.shift);
-       pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
-               bits, r, r_unit, res, wrap);
-
        update_sched_clock();
 
        /*
@@ -161,6 +158,9 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
        if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
                enable_sched_clock_irqtime();
 
+       pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
+               bits, r, r_unit, res, wrap);
+
        pr_debug("Registered %pF as sched_clock source\n", read);
 }


* Weird sched_clock behaviour during boot with -rc1
  2014-02-04 18:36 Weird sched_clock behaviour during boot with -rc1 Will Deacon
@ 2014-02-04 20:46 ` John Stultz
  2014-02-04 22:00   ` Stephen Boyd
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2014-02-04 20:46 UTC
  To: linux-arm-kernel

On 02/04/2014 10:36 AM, Will Deacon wrote:
> Hi guys,
>
> Booting -rc1 on my TC2 gives the following strange entries in the dmesg:
>
>
>   Uncompressing Linux... done, booting the kernel.
>   [    0.000000] Booting Linux on physical CPU 0x0
>
>   [...]
>
>   [    0.000000]   HighMem zone: 329728 pages, LIFO batch:31
>   [    7.789662] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956969942ns
>   [    0.000129] PERCPU: Embedded 9 pages/cpu @ee7bd000 s12800 r8192 d15872 u36864
>
>   [...]
>
>   [    0.868297] NR_IRQS:16 nr_irqs:16 16
>   [    0.886350] Architected cp15 timer(s) running at 24.00MHz (phys).
>   [ 2915.164998] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 2863311519744ns
>   [    0.000002] Switching to timer-based delay loop
>   [    0.014249] Console: colour dummy device 80x30
>
>
> so it looks like something whacky goes on during sched_clock registration.
> Sure enough, we're doing a pr_info in between updating cd.* and calling
> update_sched_clock(), so moving the print sorts things out (diff below).

Yea... we have to be particularly careful with sched_clock to avoid
locks since we don't want to deadlock, but in this case
sched_clock_register is a little too relaxed here.

Stephen: Would it make sense to set cd.suspended = true at the top of
the registration? That should block any sched_clock calls from getting
half-updated data, but still allow the update_sched_clock() function to work.
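
Roughly what I mean, modelled in userspace (sketch only, invented values;
the real sched_clock() in 3.14 already has the cd.suspended early-out):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* While "suspended" is set, readers get the frozen epoch_ns instead of
 * half-updated state. Field names mirror struct clock_data. */
static struct clock_data {
	uint64_t epoch_ns, epoch_cyc;
	uint32_t mult, shift;
	bool suspended;
} cd = { .epoch_ns = 860000000ULL, .mult = 2796202667U, .shift = 26 };

static uint64_t sched_clock_model(uint64_t cyc)
{
	if (cd.suspended)	/* same early-out the suspend path already uses */
		return cd.epoch_ns;
	return cd.epoch_ns + (((cyc - cd.epoch_cyc) * cd.mult) >> cd.shift);
}

int main(void)
{
	cd.suspended = true;		/* top of registration */
	cd.epoch_cyc = 0; cd.mult = 1;	/* the half-updated window */
	printf("reader mid-update: %llu ns (frozen, sane)\n",
	       (unsigned long long)sched_clock_model(1ULL << 40));
	cd.suspended = false;		/* end of registration */
	return 0;
}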


> What I can't figure out is why this has suddenly started happening with
> 3.14. Any ideas?

No clue. I'm guessing something like timing changes in the printk paths
that call sched_clock?

I suspect your patch to move the print down will also be a good idea
along with suspending sched_clock while we register new ones.

thanks
-john


* Weird sched_clock behaviour during boot with -rc1
  2014-02-04 20:46 ` John Stultz
@ 2014-02-04 22:00   ` Stephen Boyd
  2014-02-05 21:47     ` Josh Cartwright
  2014-02-07 18:23     ` John Stultz
  0 siblings, 2 replies; 19+ messages in thread
From: Stephen Boyd @ 2014-02-04 22:00 UTC
  To: linux-arm-kernel

On 02/04, John Stultz wrote:
> On 02/04/2014 10:36 AM, Will Deacon wrote:
> > Hi guys,
> >
> > Booting -rc1 on my TC2 gives the following strange entries in the dmesg:
> >
> >
> >   Uncompressing Linux... done, booting the kernel.
> >   [    0.000000] Booting Linux on physical CPU 0x0
> >
> >   [...]
> >
> >   [    0.000000]   HighMem zone: 329728 pages, LIFO batch:31
> >   [    7.789662] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956969942ns
> >   [    0.000129] PERCPU: Embedded 9 pages/cpu @ee7bd000 s12800 r8192 d15872 u36864
> >
> >   [...]
> >
> >   [    0.868297] NR_IRQS:16 nr_irqs:16 16
> >   [    0.886350] Architected cp15 timer(s) running at 24.00MHz (phys).
> >   [ 2915.164998] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 2863311519744ns
> >   [    0.000002] Switching to timer-based delay loop
> >   [    0.014249] Console: colour dummy device 80x30
> >
> >
> > so it looks like something whacky goes on during sched_clock registration.
> > Sure enough, we're doing a pr_info in between updating cd.* and calling
> > update_sched_clock(), so moving the print sorts things out (diff below).
> 
> Yea... we have to be particularly careful with sched_clock to avoid
> locks since we don't want to deadlock, but in this case
> sched_clock_register is a little too relaxed here.
> 
> Stephen: Would it make sense to set cd.suspended = true at the top of
> the registration? That should block any sched_clock calls from getting
> half-updated data, but still allow the update_sched_clock() function to work.
> 

That would work, but why can't we just hold the write seqlock
during the registration? We would need to make a lockless
version of update_sched_clock() but that doesn't look too hard.
It might actually turn out nicer because we call
update_sched_clock() here just to set the epoch_cyc but we have
to reset the epoch_ns back to 0 to start the count off right.

How about this? The only concern is calling read_sched_clock()
inside the seqlock, but I don't think that's a concern and if it
is we can call it outside the lock at the beginning of this
function.
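
To spell out what holding the write seqcount buys us, here's a userspace
model of the protocol (illustrative only, no memory barriers):

#include <stdio.h>

static unsigned int seq;
static unsigned long long epoch_cyc, mult = 1;

static void write_begin(void) { seq++; }	/* odd: update in progress */
static void write_end(void)   { seq++; }	/* even again: consistent */

static unsigned long long read_clock(unsigned long long cyc)
{
	unsigned int start;
	unsigned long long ns;

	do {
		while ((start = seq) & 1)
			;			/* writer active: wait */
		ns = (cyc - epoch_cyc) * mult;
	} while (seq != start);			/* raced a writer: retry */

	return ns;
}

int main(void)
{
	write_begin();
	epoch_cyc = 100;	/* both fields change inside one   */
	mult = 42;		/* write-side critical section     */
	write_end();

	printf("%llu\n", read_clock(110));	/* (110 - 100) * 42 = 420 */
	return 0;
}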

----8<----
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb36464281..ae8f10a02c32 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -124,6 +124,7 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 		return;
 
 	WARN_ON(!irqs_disabled());
+	raw_write_seqcount_begin(&cd.seq);
 	read_sched_clock = read;
 	sched_clock_mask = CLOCKSOURCE_MASK(bits);
 	cd.rate = rate;
@@ -147,15 +148,16 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 
 	/* calculate the ns resolution of this counter */
 	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
-	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
-		bits, r, r_unit, res, wrap);
-
-	update_sched_clock();
 
 	/*
 	 * Ensure that sched_clock() starts off at 0ns
 	 */
 	cd.epoch_ns = 0;
+	cd.epoch_cyc = read_sched_clock();
+	raw_write_seqcount_end(&cd.seq);
+
+	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
+		bits, r, r_unit, res, wrap);
 
 	/* Enable IRQ time accounting if we have a fast enough sched_clock */
 
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* Weird sched_clock behaviour during boot with -rc1
  2014-02-04 22:00   ` Stephen Boyd
@ 2014-02-05 21:47     ` Josh Cartwright
  2014-02-07 18:23     ` John Stultz
  1 sibling, 0 replies; 19+ messages in thread
From: Josh Cartwright @ 2014-02-05 21:47 UTC
  To: linux-arm-kernel

On Tue, Feb 04, 2014 at 02:00:45PM -0800, Stephen Boyd wrote:
> On 02/04, John Stultz wrote:
> > On 02/04/2014 10:36 AM, Will Deacon wrote:
> > > Hi guys,
> > >
> > > Booting -rc1 on my TC2 gives the following strange entries in the dmesg:
> > >
> > >
> > >   Uncompressing Linux... done, booting the kernel.
> > >   [    0.000000] Booting Linux on physical CPU 0x0
> > >
> > >   [...]
> > >
> > >   [    0.000000]   HighMem zone: 329728 pages, LIFO batch:31
> > >   [    7.789662] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956969942ns
> > >   [    0.000129] PERCPU: Embedded 9 pages/cpu @ee7bd000 s12800 r8192 d15872 u36864
> > >
> > >   [...]
> > >
> > >   [    0.868297] NR_IRQS:16 nr_irqs:16 16
> > >   [    0.886350] Architected cp15 timer(s) running at 24.00MHz (phys).
> > >   [ 2915.164998] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 2863311519744ns
> > >   [    0.000002] Switching to timer-based delay loop
> > >   [    0.014249] Console: colour dummy device 80x30
> > >
> > >
> > > so it looks like something whacky goes on during sched_clock registration.
> > > Sure enough, we're doing a pr_info in between updating cd.* and calling
> > > update_sched_clock(), so moving the print sorts things out (diff below).
> > 
> > Yea... we have to be particularly careful with sched_clock to avoid
> > locks since we don't want to deadlock, but in this case
> > sched_clock_register is a little too relaxed here.
> > 
> > Stephen: Would it make sense to set cd.suspended = true at the top of
> > the registration? That should block any sched_clock calls from getting
> > half-updated data, but still allow the update_sched_clock() function to work.
> > 
> 
> That would work, but why can't we just hold the write seqlock
> during the registration? We would need to make a lockeless
> version of update_sched_clock() but that doesn't look too hard.
> It might actually turn out nicer because we call
> update_sched_clock() here just to set the epoch_cyc but we have
> to reset the epoch_ns back to 0 to start the count off right.
> 
> How about this? The only concern is calling read_sched_clock()
> inside the seqlock, but I don't think that's a concern and if it
> is we can call it outside the lock at the beginning of this
> function.

If we go down this route, it looks like there is an opportunity to rearrange
the logic a bit to front-load the clocksource calculations, minimizing
the amount of time the lock is held.

----8<----
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index e5387a0..9bad1f7 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -116,20 +116,40 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
 void __init sched_clock_register(u64 (*read)(void), int bits,
 				 unsigned long rate)
 {
+	u64 res, wrap, new_mask;
+	u32 new_mult, new_shift;
+	ktime_t new_wrap_kt;
 	unsigned long r;
-	u64 res, wrap;
 	char r_unit;
 
 	if (cd.rate > rate)
 		return;
 
 	WARN_ON(!irqs_disabled());
+
+	/* calculate the mult/shift to convert counter ticks to ns. */
+	clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);
+
+	new_mask = CLOCKSOURCE_MASK(bits);
+
+	/* calculate how many ns until we wrap */
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+
+	raw_write_seqcount_begin(&cd.seq);
 	read_sched_clock = read;
-	sched_clock_mask = CLOCKSOURCE_MASK(bits);
+	sched_clock_mask = new_mask;
 	cd.rate = rate;
+	cd.wrap_kt = new_wrap_kt;
+	cd.mult = new_mult;
+	cd.shift = new_shift;
 
-	/* calculate the mult/shift to convert counter ticks to ns. */
-	clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600);
+	/*
+	 * Ensure that sched_clock() starts off at 0ns
+	 */
+	cd.epoch_ns = 0;
+	cd.epoch_cyc = read_sched_clock();
+	raw_write_seqcount_end(&cd.seq);
 
 	r = rate;
 	/*
@@ -145,22 +165,12 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	} else
 		r_unit = ' ';
 
-	/* calculate how many ns until we wrap */
-	wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask);
-	cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
-
 	/* calculate the ns resolution of this counter */
-	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
+	res = cyc_to_ns(1ULL, new_mult, new_shift);
+
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
 		bits, r, r_unit, res, wrap);
 
-	update_sched_clock();
-
-	/*
-	 * Ensure that sched_clock() starts off at 0ns
-	 */
-	cd.epoch_ns = 0;
-
 	/* Enable IRQ time accounting if we have a fast enough sched_clock */
 	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
 		enable_sched_clock_irqtime();
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* Weird sched_clock behaviour during boot with -rc1
  2014-02-04 22:00   ` Stephen Boyd
  2014-02-05 21:47     ` Josh Cartwright
@ 2014-02-07 18:23     ` John Stultz
  2014-02-07 19:37       ` Stephen Boyd
  2014-02-07 20:48       ` [PATCH] sched_clock: Prevent callers from seeing half-updated data Stephen Boyd
  1 sibling, 2 replies; 19+ messages in thread
From: John Stultz @ 2014-02-07 18:23 UTC
  To: linux-arm-kernel

On 02/04/2014 02:00 PM, Stephen Boyd wrote:
> On 02/04, John Stultz wrote:
>> On 02/04/2014 10:36 AM, Will Deacon wrote:
>>> Hi guys,
>>>
>>> Booting -rc1 on my TC2 gives the following strange entries in the dmesg:
>>>
>>>
>>>   Uncompressing Linux... done, booting the kernel.
>>>   [    0.000000] Booting Linux on physical CPU 0x0
>>>
>>>   [...]
>>>
>>>   [    0.000000]   HighMem zone: 329728 pages, LIFO batch:31
>>>   [    7.789662] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956969942ns
>>>   [    0.000129] PERCPU: Embedded 9 pages/cpu @ee7bd000 s12800 r8192 d15872 u36864
>>>
>>>   [...]
>>>
>>>   [    0.868297] NR_IRQS:16 nr_irqs:16 16
>>>   [    0.886350] Architected cp15 timer(s) running at 24.00MHz (phys).
>>>   [ 2915.164998] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 2863311519744ns
>>>   [    0.000002] Switching to timer-based delay loop
>>>   [    0.014249] Console: colour dummy device 80x30
>>>
>>>
>>> so it looks like something whacky goes on during sched_clock registration.
>>> Sure enough, we're doing a pr_info in between updating cd.* and calling
>>> update_sched_clock(), so moving the print sorts things out (diff below).
>> Yea... we have to be particularly careful with sched_clock to avoid
>> locks since we don't want to deadlock, but in this case
>> sched_clock_register is a little too relaxed here.
>>
>> Stephen: Would it make sense to set cd.suspended = true at the top of
>> the registration? That should block any sched_clock calls from getting
>> half-updated data, but still allow the update_sched_clock() function to work.
>>
> That would work, but why can't we just hold the write seqlock
> during the registration? We would need to make a lockless
> version of update_sched_clock() but that doesn't look too hard.
> It might actually turn out nicer because we call
> update_sched_clock() here just to set the epoch_cyc but we have
> to reset the epoch_ns back to 0 to start the count off right.
>
> How about this? The only concern is calling read_sched_clock()
> inside the seqlock, but I don't think that's a concern and if it
> is we can call it outside the lock at the beginning of this
> function.

Hey Stephen,

So whats the story here? Are we waiting on an ack from Will or would you
rather go with Josh's approach?

thanks
-john


* Weird sched_clock behaviour during boot with -rc1
  2014-02-07 18:23     ` John Stultz
@ 2014-02-07 19:37       ` Stephen Boyd
  2014-02-07 20:48       ` [PATCH] sched_clock: Prevent callers from seeing half-updated data Stephen Boyd
  1 sibling, 0 replies; 19+ messages in thread
From: Stephen Boyd @ 2014-02-07 19:37 UTC
  To: linux-arm-kernel

On 02/07, John Stultz wrote:
> On 02/04/2014 02:00 PM, Stephen Boyd wrote:
> >>
> > That would work, but why can't we just hold the write seqlock
> > during the registration? We would need to make a lockless
> > version of update_sched_clock() but that doesn't look too hard.
> > It might actually turn out nicer because we call
> > update_sched_clock() here just to set the epoch_cyc but we have
> > to reset the epoch_ns back to 0 to start the count off right.
> >
> > How about this? The only concern is calling read_sched_clock()
> > inside the seqlock, but I don't think that's a concern and if it
> > is we can call it outside the lock at the beginning of this
> > function.
> 
> So whats the story here? Are we waiting on an ack from Will or would you
> rather go with Josh's approach?

An acked-by/tested-by from Will would be good. I'll cook up a
patch right now to do everything that has been mentioned in this
thread.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-07 18:23     ` John Stultz
  2014-02-07 19:37       ` Stephen Boyd
@ 2014-02-07 20:48       ` Stephen Boyd
  2014-02-07 22:22         ` Stephen Boyd
  2014-02-07 22:28         ` [PATCH v2] " Stephen Boyd
  1 sibling, 2 replies; 19+ messages in thread
From: Stephen Boyd @ 2014-02-07 20:48 UTC
  To: linux-arm-kernel

If two sched_clock sources are registered we may end up in a
situation where a call to sched_clock() may be accessing the
epoch cycle count for the old counter and the cycle count for the
new counter. This can lead to confusing results where
sched_clock() values jump and then are reset to 0 (due to the way
the registration function forces the epoch_ns to be 0). Fix this
by reorganizing the registration function to hold the seqlock for
as short a time as possible while we update the clock_data
structure for a new counter and stop resetting the epoch_ns count
to 0.

Reported-by: Will Deacon <will.deacon@arm.com>
Cc: Josh Cartwright <joshc@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 kernel/time/sched_clock.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb36464281..36ffce3a4b83 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -116,20 +116,38 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
 void __init sched_clock_register(u64 (*read)(void), int bits,
 				 unsigned long rate)
 {
+	u64 res, wrap, new_mask, new_epoch;
+	u32 new_mult, new_shift;
+	ktime_t new_wrap_kt;
 	unsigned long r;
-	u64 res, wrap;
 	char r_unit;
 
 	if (cd.rate > rate)
 		return;
 
 	WARN_ON(!irqs_disabled());
-	read_sched_clock = read;
-	sched_clock_mask = CLOCKSOURCE_MASK(bits);
-	cd.rate = rate;
 
 	/* calculate the mult/shift to convert counter ticks to ns. */
-	clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600);
+	clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);
+
+	new_mask = CLOCKSOURCE_MASK(bits);
+
+	/* calculate how many ns until we wrap */
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+
+	/* update epoch for the new counter */
+	new_epoch = read();
+
+	raw_write_seqcount_begin(&cd.seq);
+	read_sched_clock = read;
+	sched_clock_mask = new_mask;
+	cd.rate = rate;
+	cd.wrap_kt = new_wrap_kt;
+	cd.mult = new_mult;
+	cd.shift = new_shift;
+	cd.epoch_cyc = new_epoch;
+	raw_write_seqcount_end(&cd.seq);
 
 	r = rate;
 	if (r >= 4000000) {
@@ -141,22 +159,12 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	} else
 		r_unit = ' ';
 
-	/* calculate how many ns until we wrap */
-	wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask);
-	cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
-
 	/* calculate the ns resolution of this counter */
-	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
+	res = cyc_to_ns(1ULL, new_mult, new_shift);
+
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
 		bits, r, r_unit, res, wrap);
 
-	update_sched_clock();
-
-	/*
-	 * Ensure that sched_clock() starts off at 0ns
-	 */
-	cd.epoch_ns = 0;
-
 	/* Enable IRQ time accounting if we have a fast enough sched_clock */
 	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
 		enable_sched_clock_irqtime();
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-07 20:48       ` [PATCH] sched_clock: Prevent callers from seeing half-updated data Stephen Boyd
@ 2014-02-07 22:22         ` Stephen Boyd
  2014-02-07 22:28           ` John Stultz
  2014-02-07 22:28         ` [PATCH v2] " Stephen Boyd
  1 sibling, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-02-07 22:22 UTC
  To: linux-arm-kernel

On 02/07, Stephen Boyd wrote:
> If two sched_clock sources are registered we may end up in a
> situation where a call to sched_clock() may be accessing the
> epoch cycle count for the old counter and the cycle count for the
> new counter. This can lead to confusing results where
> sched_clock() values jump and then are reset to 0 (due to the way
> the registration function forces the epoch_ns to be 0). Fix this
> by reorganizing the registration function to hold the seqlock for
> as short a time as possible while we update the clock_data
> structure for a new counter and stop resetting the epoch_ns count
> to 0.

Hmm.. This won't properly accumulate time. We need to put
whatever time has elapsed into epoch_ns when we register the new
counter for this to work. I don't have a board with this
configuration but I'll send a v2 that should fix this. Hopefully
Will can test it.
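
Concretely, the handover needs to do something like this (userspace model
of the arithmetic, invented values):

#include <stdio.h>
#include <stdint.h>

static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	return (cyc * mult) >> shift;
}

int main(void)
{
	/* state of the old 32-bit counter at registration time */
	uint64_t old_mask = (1ULL << 32) - 1;	/* CLOCKSOURCE_MASK(32) */
	uint64_t old_epoch_cyc = 0, old_epoch_ns = 0;
	uint32_t old_mult = 2796202667U, old_shift = 26;	/* ~24MHz */
	uint64_t old_cyc = 21350400;	/* ~0.89s of old-counter ticks */

	/* fold the elapsed time into the new cd.epoch_ns so the clock
	 * keeps counting instead of snapping back to 0 */
	uint64_t ns = old_epoch_ns +
		cyc_to_ns((old_cyc - old_epoch_cyc) & old_mask,
			  old_mult, old_shift);

	printf("new cd.epoch_ns = %llu ns\n", (unsigned long long)ns);
	return 0;
}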

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-07 22:22         ` Stephen Boyd
@ 2014-02-07 22:28           ` John Stultz
  2014-02-11  6:49             ` Stephen Boyd
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2014-02-07 22:28 UTC
  To: linux-arm-kernel

On 02/07/2014 02:22 PM, Stephen Boyd wrote:
> On 02/07, Stephen Boyd wrote:
>> If two sched_clock sources are registered we may end up in a
>> situation where a call to sched_clock() may be accessing the
>> epoch cycle count for the old counter and the cycle count for the
>> new counter. This can lead to confusing results where
>> sched_clock() values jump and then are reset to 0 (due to the way
>> the registration function forces the epoch_ns to be 0). Fix this
>> by reorganizing the registration function to hold the seqlock for
>> as short a time as possible while we update the clock_data
>> structure for a new counter and stop resetting the epoch_ns count
>> to 0.
> Hmm.. This won't properly accumulate time. We need to put
> whatever time has elapsed into epoch_ns when we register the new
> counter for this to work. I don't have a board with this
> configuration but I'll send a v2 that should fix this. Hopefully
> Will can test it.

Also maybe clarify in the commit message that this is a result of not
having the necessary locking in place in the registration code (likely
due to it not really being required in the single clock case), just so
Ingo and others have some more context as to why this is needed now and
wasn't hit before.

thanks
-john


* [PATCH v2] sched_clock: Prevent callers from seeing half-updated data
  2014-02-07 20:48       ` [PATCH] sched_clock: Prevent callers from seeing half-updated data Stephen Boyd
  2014-02-07 22:22         ` Stephen Boyd
@ 2014-02-07 22:28         ` Stephen Boyd
  2014-02-10 11:14           ` Will Deacon
  1 sibling, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-02-07 22:28 UTC
  To: linux-arm-kernel

If two sched_clock sources are registered we may end up in a
situation where a call to sched_clock() may be accessing the
epoch cycle count for the old counter and the cycle count for the
new counter. This can lead to confusing results where
sched_clock() values jump and then are reset to 0 (due to the way
the registration function forces the epoch_ns to be 0). Fix this
by reorganizing the registration function to hold the seqlock for
as short a time as possible while we update the clock_data
structure for a new counter. We also put any accumulated time
into epoch_ns instead of resetting the time to 0 so that the clock
doesn't reset after each successful registration.

Reported-by: Will Deacon <will.deacon@arm.com>
Cc: Josh Cartwright <joshc@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---

Changes since v1:
 * Put elapsed time into epoch_ns 

 kernel/time/sched_clock.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb36464281..4d23dc4d8139 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -116,20 +116,42 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
 void __init sched_clock_register(u64 (*read)(void), int bits,
 				 unsigned long rate)
 {
+	u64 res, wrap, new_mask, new_epoch, cyc, ns;
+	u32 new_mult, new_shift;
+	ktime_t new_wrap_kt;
 	unsigned long r;
-	u64 res, wrap;
 	char r_unit;
 
 	if (cd.rate > rate)
 		return;
 
 	WARN_ON(!irqs_disabled());
-	read_sched_clock = read;
-	sched_clock_mask = CLOCKSOURCE_MASK(bits);
-	cd.rate = rate;
 
 	/* calculate the mult/shift to convert counter ticks to ns. */
-	clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600);
+	clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);
+
+	new_mask = CLOCKSOURCE_MASK(bits);
+
+	/* calculate how many ns until we wrap */
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+
+	/* update epoch for new counter and update epoch_ns from old counter*/
+	new_epoch = read();
+	cyc = read_sched_clock();
+	ns = cd.epoch_ns + cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
+			  cd.mult, cd.shift);
+
+	raw_write_seqcount_begin(&cd.seq);
+	read_sched_clock = read;
+	sched_clock_mask = new_mask;
+	cd.rate = rate;
+	cd.wrap_kt = new_wrap_kt;
+	cd.mult = new_mult;
+	cd.shift = new_shift;
+	cd.epoch_cyc = new_epoch;
+	cd.epoch_ns = ns;
+	raw_write_seqcount_end(&cd.seq);
 
 	r = rate;
 	if (r >= 4000000) {
@@ -141,22 +163,12 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	} else
 		r_unit = ' ';
 
-	/* calculate how many ns until we wrap */
-	wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask);
-	cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
-
 	/* calculate the ns resolution of this counter */
-	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
+	res = cyc_to_ns(1ULL, new_mult, new_shift);
+
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
 		bits, r, r_unit, res, wrap);
 
-	update_sched_clock();
-
-	/*
-	 * Ensure that sched_clock() starts off at 0ns
-	 */
-	cd.epoch_ns = 0;
-
 	/* Enable IRQ time accounting if we have a fast enough sched_clock */
 	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
 		enable_sched_clock_irqtime();
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH v2] sched_clock: Prevent callers from seeing half-updated data
  2014-02-07 22:28         ` [PATCH v2] " Stephen Boyd
@ 2014-02-10 11:14           ` Will Deacon
  2014-02-17 11:19             ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2014-02-10 11:14 UTC
  To: linux-arm-kernel

On Fri, Feb 07, 2014 at 10:28:58PM +0000, Stephen Boyd wrote:
> If two sched_clock sources are registered we may end up in a
> situation where a call to sched_clock() may be accessing the
> epoch cycle count for the old counter and the cycle count for the
> new counter. This can lead to confusing results where
> sched_clock() values jump and then are reset to 0 (due to the way
> the registration function forces the epoch_ns to be 0). Fix this
> by reorganizing the registration function to hold the seqlock for
> as short a time as possible while we update the clock_data
> structure for a new counter. We also put any accumulated time
> into epoch_ns instead of resetting the time to 0 so that the clock
> doesn't reset after each successful registration.
> 
> Reported-by: Will Deacon <will.deacon@arm.com>
> Cc: Josh Cartwright <joshc@codeaurora.org>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
> 
> Changes since v1:
>  * Put elapsed time into epoch_ns 

Well, this certainly fixes the dmesg prints and the system seems ok once
it's booted:

  Tested-by: Will Deacon <will.deacon@arm.com>

Will


* [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-07 22:28           ` John Stultz
@ 2014-02-11  6:49             ` Stephen Boyd
  2014-02-17 18:13               ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-02-11  6:49 UTC
  To: linux-arm-kernel

On 02/07, John Stultz wrote:
> On 02/07/2014 02:22 PM, Stephen Boyd wrote:
> > On 02/07, Stephen Boyd wrote:
> >> If two sched_clock sources are registered we may end up in a
> >> situation where a call to sched_clock() may be accessing the
> >> epoch cycle count for the old counter and the cycle count for the
> >> new counter. This can lead to confusing results where
> >> sched_clock() values jump and then are reset to 0 (due to the way
> >> the registration function forces the epoch_ns to be 0). Fix this
> >> by reorganizing the registration function to hold the seqlock for
> >> as short a time as possible while we update the clock_data
> >> structure for a new counter and stop resetting the epoch_ns count
> >> to 0.
> > Hmm.. This won't properly accumulate time. We need to put
> > whatever time has elapsed into epoch_ns when we register the new
> > counter for this to work. I don't have a board with this
> > configuration but I'll send a v2 that should fix this. Hopefully
> > Will can test it.
> 
> Also maybe clarify in the commit message that this is a result of not
> having the necessary locking in place in the registration code (likely
> due to it not really being required in the single clock case), just so
> Ingo and others have some more context as to why this is needed now and
> wasn't hit before.
> 

Hmph... I already sent v2 before you replied. Is the commit text
good enough? I do mention that this is about two sched_clock
sources being registered.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* [PATCH v2] sched_clock: Prevent callers from seeing half-updated data
  2014-02-10 11:14           ` Will Deacon
@ 2014-02-17 11:19             ` Will Deacon
  2014-02-17 18:04               ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2014-02-17 11:19 UTC
  To: linux-arm-kernel

On Mon, Feb 10, 2014 at 11:14:35AM +0000, Will Deacon wrote:
> On Fri, Feb 07, 2014 at 10:28:58PM +0000, Stephen Boyd wrote:
> > If two sched_clock sources are registered we may end up in a
> > situation where a call to sched_clock() may be accessing the
> > epoch cycle count for the old counter and the cycle count for the
> > new counter. This can lead to confusing results where
> > sched_clock() values jump and then are reset to 0 (due to the way
> > the registration function forces the epoch_ns to be 0). Fix this
> > by reorganizing the registration function to hold the seqlock for
> > as short a time as possible while we update the clock_data
> > structure for a new counter. We also put any accumulated time
> > into epoch_ns instead of resetting the time to 0 so that the clock
> > doesn't reset after each successful registration.
> > 
> > Reported-by: Will Deacon <will.deacon@arm.com>
> > Cc: Josh Cartwright <joshc@codeaurora.org>
> > Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> > ---
> > 
> > Changes since v1:
> >  * Put elapsed time into epoch_ns 
> 
> Well, this certainly fixes the dmesg prints and the system seems ok once
> it's booted:
> 
>   Tested-by: Will Deacon <will.deacon@arm.com>

Any chance of getting this merged please? I still see the weird timing
numbers when booting -rc3.

Cheers,

Will


* [PATCH v2] sched_clock: Prevent callers from seeing half-updated data
  2014-02-17 11:19             ` Will Deacon
@ 2014-02-17 18:04               ` John Stultz
  0 siblings, 0 replies; 19+ messages in thread
From: John Stultz @ 2014-02-17 18:04 UTC
  To: linux-arm-kernel

On 02/17/2014 03:19 AM, Will Deacon wrote:
> On Mon, Feb 10, 2014 at 11:14:35AM +0000, Will Deacon wrote:
>> On Fri, Feb 07, 2014 at 10:28:58PM +0000, Stephen Boyd wrote:
>>> If two sched_clock sources are registered we may end up in a
>>> situation where a call to sched_clock() may be accessing the
>>> epoch cycle count for the old counter and the cycle count for the
>>> new counter. This can lead to confusing results where
>>> sched_clock() values jump and then are reset to 0 (due to the way
>>> the registration function forces the epoch_ns to be 0). Fix this
>>> by reorganizing the registration function to hold the seqlock for
>>> as short a time as possible while we update the clock_data
>>> structure for a new counter. We also put any accumulated time
>>> into epoch_ns instead of resetting the time to 0 so that the clock
>>> doesn't reset after each successful registration.
>>>
>>> Reported-by: Will Deacon <will.deacon@arm.com>
>>> Cc: Josh Cartwright <joshc@codeaurora.org>
>>> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>> ---
>>>
>>> Changes since v1:
>>>  * Put elapsed time into epoch_ns 
>> Well, this certainly fixes the dmesg prints and the system seems ok once
>> it's booted:
>>
>>   Tested-by: Will Deacon <will.deacon@arm.com>
> Any chance of getting this merged please? I still see the weird timing
> numbers when booting -rc3.


My apologies! I somehow missed your tested-by mail.

I'll queue it up and send it in.

thanks
-john


* [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-11  6:49             ` Stephen Boyd
@ 2014-02-17 18:13               ` John Stultz
  0 siblings, 0 replies; 19+ messages in thread
From: John Stultz @ 2014-02-17 18:13 UTC
  To: linux-arm-kernel

On 02/10/2014 10:49 PM, Stephen Boyd wrote:
> On 02/07, John Stultz wrote:
>> On 02/07/2014 02:22 PM, Stephen Boyd wrote:
>>> On 02/07, Stephen Boyd wrote:
>>>> If two sched_clock sources are registered we may end up in a
>>>> situation where a call to sched_clock() may be accessing the
>>>> epoch cycle count for the old counter and the cycle count for the
>>>> new counter. This can lead to confusing results where
>>>> sched_clock() values jump and then are reset to 0 (due to the way
>>>> the registration function forces the epoch_ns to be 0). Fix this
>>>> by reorganizing the registration function to hold the seqlock for
>>>> as short a time as possible while we update the clock_data
>>>> structure for a new counter and stop resetting the epoch_ns count
>>>> to 0.
>>> Hmm.. This won't properly accumulate time. We need to put
>>> whatever time has elapsed into epoch_ns when we register the new
>>> counter for this to work. I don't have a board with this
>>> configuration but I'll send a v2 that should fix this. Hopefully
>>> Will can test it.
>> Also maybe clarify in the commit message that this is a result of not
>> having the necessary locking in place in the registration code (likely
>> due to it not really being required in the single clock case), just so
>> Ingo and others have some more context as to why this is needed now and
>> wasn't hit before.
>>
> Hmph... I already sent v2 before you replied. Is the commit text
> good enough? I do mention that this is about two sched_clock
> sources being registered.

I'll tweak the commit message a bit to make this point clearer.

thanks
-john


* Re: [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-18 20:32     ` John Stultz
@ 2014-02-18 20:53       ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-02-18 20:53 UTC
  To: John Stultz
  Cc: LKML, Stephen Boyd, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Josh Cartwright

On Tue, 18 Feb 2014, John Stultz wrote:
> On Tue, Feb 18, 2014 at 12:20 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 17 Feb 2014, John Stultz wrote:
> >
> >> From: Stephen Boyd <sboyd@codeaurora.org>
> >>
> >> The generic sched_clock registration function was previously
> >> done lockless, due to the fact that it was expected to be called
> >> only once. However, now there are systems that may register
> >> multiple sched_clock sources, for which the lack of locking has
> >> caused problems:
> >>
> >> If two sched_clock sources are registered we may end up in a
> >> situation where a call to sched_clock() may be accessing the
> >> epoch cycle count for the old counter and the cycle count for the
> >> new counter. This can lead to confusing results where
> >> sched_clock() values jump and then are reset to 0 (due to the way
> >> the registration function forces the epoch_ns to be 0).
> >>
> >> Fix this by reorganizing the registration function to hold the
> >> seqlock for as short a time as possible while we update the
> >> clock_data structure for a new counter. We also put any
> >> accumulated time into epoch_ns instead of resetting the time to
> >> 0 so that the clock doesn't reset after each successful
> >> registration.
> >>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >
> > Peter ???
> 
> So this is the generic sched_clock work, which Peter really hasn't had
> much involvement with yet (mostly because it's not yet generic enough
> to work with more than a few arches). But I included him in the CC
> since I think it would be good to have him following along.

Fair enough. I'll pick it up.

Thanks,

	tglx


* Re: [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-18 20:20   ` Thomas Gleixner
@ 2014-02-18 20:32     ` John Stultz
  2014-02-18 20:53       ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2014-02-18 20:32 UTC
  To: Thomas Gleixner
  Cc: LKML, Stephen Boyd, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Josh Cartwright

On Tue, Feb 18, 2014 at 12:20 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 17 Feb 2014, John Stultz wrote:
>
>> From: Stephen Boyd <sboyd@codeaurora.org>
>>
>> The generic sched_clock registration function was previously
>> done lockless, due to the fact that it was expected to be called
>> only once. However, now there are systems that may register
>> multiple sched_clock sources, for which the lack of locking has
> >> caused problems:
>>
>> If two sched_clock sources are registered we may end up in a
>> situation where a call to sched_clock() may be accessing the
>> epoch cycle count for the old counter and the cycle count for the
>> new counter. This can lead to confusing results where
>> sched_clock() values jump and then are reset to 0 (due to the way
>> the registration function forces the epoch_ns to be 0).
>>
>> Fix this by reorganizing the registration function to hold the
>> seqlock for as short a time as possible while we update the
>> clock_data structure for a new counter. We also put any
>> accumulated time into epoch_ns instead of resetting the time to
>> 0 so that the clock doesn't reset after each successful
>> registration.
>>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>
> Peter ???

So this is the generic sched_clock work, which Peter really hasn't had
much involvement with yet (mostly because it's not yet generic enough
to work with more than a few arches). But I included him in the CC
since I think it would be good to have him following along.

thanks
-john


* Re: [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-17 18:45 ` [PATCH] sched_clock: Prevent callers from seeing half-updated data John Stultz
@ 2014-02-18 20:20   ` Thomas Gleixner
  2014-02-18 20:32     ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2014-02-18 20:20 UTC
  To: John Stultz
  Cc: LKML, Stephen Boyd, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Josh Cartwright

On Mon, 17 Feb 2014, John Stultz wrote:

> From: Stephen Boyd <sboyd@codeaurora.org>
> 
> The generic sched_clock registration function was previously
> done lockless, due to the fact that it was expected to be called
> only once. However, now there are systems that may register
> multiple sched_clock sources, for which the lack of locking has
> caused problems:
> 
> If two sched_clock sources are registered we may end up in a
> situation where a call to sched_clock() may be accessing the
> epoch cycle count for the old counter and the cycle count for the
> new counter. This can lead to confusing results where
> sched_clock() values jump and then are reset to 0 (due to the way
> the registration function forces the epoch_ns to be 0).
> 
> Fix this by reorganizing the registration function to hold the
> seqlock for as short a time as possible while we update the
> clock_data structure for a new counter. We also put any
> accumulated time into epoch_ns instead of resetting the time to
> 0 so that the clock doesn't reset after each successful
> registration.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>

Peter ???

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Stephen Boyd <sboyd@codeaurora.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Josh Cartwright <joshc@codeaurora.org>
> Reported-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> [jstultz: Added extra context to the commit message]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/sched_clock.c | 46 +++++++++++++++++++++++++++++-----------------
>  1 file changed, 29 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> index 0abb364..4d23dc4 100644
> --- a/kernel/time/sched_clock.c
> +++ b/kernel/time/sched_clock.c
> @@ -116,20 +116,42 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
>  void __init sched_clock_register(u64 (*read)(void), int bits,
>  				 unsigned long rate)
>  {
> +	u64 res, wrap, new_mask, new_epoch, cyc, ns;
> +	u32 new_mult, new_shift;
> +	ktime_t new_wrap_kt;
>  	unsigned long r;
> -	u64 res, wrap;
>  	char r_unit;
>  
>  	if (cd.rate > rate)
>  		return;
>  
>  	WARN_ON(!irqs_disabled());
> -	read_sched_clock = read;
> -	sched_clock_mask = CLOCKSOURCE_MASK(bits);
> -	cd.rate = rate;
>  
>  	/* calculate the mult/shift to convert counter ticks to ns. */
> -	clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600);
> +	clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);
> +
> +	new_mask = CLOCKSOURCE_MASK(bits);
> +
> +	/* calculate how many ns until we wrap */
> +	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
> +	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
> +
> +	/* update epoch for new counter and update epoch_ns from old counter*/
> +	new_epoch = read();
> +	cyc = read_sched_clock();
> +	ns = cd.epoch_ns + cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
> +			  cd.mult, cd.shift);
> +
> +	raw_write_seqcount_begin(&cd.seq);
> +	read_sched_clock = read;
> +	sched_clock_mask = new_mask;
> +	cd.rate = rate;
> +	cd.wrap_kt = new_wrap_kt;
> +	cd.mult = new_mult;
> +	cd.shift = new_shift;
> +	cd.epoch_cyc = new_epoch;
> +	cd.epoch_ns = ns;
> +	raw_write_seqcount_end(&cd.seq);
>  
>  	r = rate;
>  	if (r >= 4000000) {
> @@ -141,22 +163,12 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
>  	} else
>  		r_unit = ' ';
>  
> -	/* calculate how many ns until we wrap */
> -	wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask);
> -	cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
> -
>  	/* calculate the ns resolution of this counter */
> -	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
> +	res = cyc_to_ns(1ULL, new_mult, new_shift);
> +
>  	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
>  		bits, r, r_unit, res, wrap);
>  
> -	update_sched_clock();
> -
> -	/*
> -	 * Ensure that sched_clock() starts off at 0ns
> -	 */
> -	cd.epoch_ns = 0;
> -
>  	/* Enable IRQ time accounting if we have a fast enough sched_clock */
>  	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
>  		enable_sched_clock_irqtime();
> -- 
> 1.8.3.2
> 
> 


* [PATCH] sched_clock: Prevent callers from seeing half-updated data
  2014-02-17 18:45 [PATCH] Generic sched_clock locking fix for tip/timers/urgent John Stultz
@ 2014-02-17 18:45 ` John Stultz
  2014-02-18 20:20   ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2014-02-17 18:45 UTC
  To: LKML
  Cc: Stephen Boyd, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Will Deacon, Josh Cartwright, John Stultz

From: Stephen Boyd <sboyd@codeaurora.org>

The generic sched_clock registration function was previously
done lockless, due to the fact that it was expected to be called
only once. However, now there are systems that may register
multiple sched_clock sources, for which the lack of locking has
caused problems:

If two sched_clock sources are registered we may end up in a
situation where a call to sched_clock() may be accessing the
epoch cycle count for the old counter and the cycle count for the
new counter. This can lead to confusing results where
sched_clock() values jump and then are reset to 0 (due to the way
the registration function forces the epoch_ns to be 0).

Fix this by reorganizing the registration function to hold the
seqlock for as short a time as possible while we update the
clock_data structure for a new counter. We also put any
accumulated time into epoch_ns instead of resetting the time to
0 so that the clock doesn't reset after each successful
registration.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Josh Cartwright <joshc@codeaurora.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[jstultz: Added extra context to the commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/sched_clock.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb364..4d23dc4 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -116,20 +116,42 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
 void __init sched_clock_register(u64 (*read)(void), int bits,
 				 unsigned long rate)
 {
+	u64 res, wrap, new_mask, new_epoch, cyc, ns;
+	u32 new_mult, new_shift;
+	ktime_t new_wrap_kt;
 	unsigned long r;
-	u64 res, wrap;
 	char r_unit;
 
 	if (cd.rate > rate)
 		return;
 
 	WARN_ON(!irqs_disabled());
-	read_sched_clock = read;
-	sched_clock_mask = CLOCKSOURCE_MASK(bits);
-	cd.rate = rate;
 
 	/* calculate the mult/shift to convert counter ticks to ns. */
-	clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600);
+	clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);
+
+	new_mask = CLOCKSOURCE_MASK(bits);
+
+	/* calculate how many ns until we wrap */
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+
+	/* update epoch for new counter and update epoch_ns from old counter*/
+	new_epoch = read();
+	cyc = read_sched_clock();
+	ns = cd.epoch_ns + cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
+			  cd.mult, cd.shift);
+
+	raw_write_seqcount_begin(&cd.seq);
+	read_sched_clock = read;
+	sched_clock_mask = new_mask;
+	cd.rate = rate;
+	cd.wrap_kt = new_wrap_kt;
+	cd.mult = new_mult;
+	cd.shift = new_shift;
+	cd.epoch_cyc = new_epoch;
+	cd.epoch_ns = ns;
+	raw_write_seqcount_end(&cd.seq);
 
 	r = rate;
 	if (r >= 4000000) {
@@ -141,22 +163,12 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	} else
 		r_unit = ' ';
 
-	/* calculate how many ns until we wrap */
-	wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask);
-	cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
-
 	/* calculate the ns resolution of this counter */
-	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
+	res = cyc_to_ns(1ULL, new_mult, new_shift);
+
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
 		bits, r, r_unit, res, wrap);
 
-	update_sched_clock();
-
-	/*
-	 * Ensure that sched_clock() starts off at 0ns
-	 */
-	cd.epoch_ns = 0;
-
 	/* Enable IRQ time accounting if we have a fast enough sched_clock */
 	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
 		enable_sched_clock_irqtime();
-- 
1.8.3.2



Thread overview: 19+ messages
2014-02-04 18:36 Weird sched_clock behaviour during boot with -rc1 Will Deacon
2014-02-04 20:46 ` John Stultz
2014-02-04 22:00   ` Stephen Boyd
2014-02-05 21:47     ` Josh Cartwright
2014-02-07 18:23     ` John Stultz
2014-02-07 19:37       ` Stephen Boyd
2014-02-07 20:48       ` [PATCH] sched_clock: Prevent callers from seeing half-updated data Stephen Boyd
2014-02-07 22:22         ` Stephen Boyd
2014-02-07 22:28           ` John Stultz
2014-02-11  6:49             ` Stephen Boyd
2014-02-17 18:13               ` John Stultz
2014-02-07 22:28         ` [PATCH v2] " Stephen Boyd
2014-02-10 11:14           ` Will Deacon
2014-02-17 11:19             ` Will Deacon
2014-02-17 18:04               ` John Stultz
2014-02-17 18:45 [PATCH] Generic sched_clock locking fix for tip/timers/urgent John Stultz
2014-02-17 18:45 ` [PATCH] sched_clock: Prevent callers from seeing half-updated data John Stultz
2014-02-18 20:20   ` Thomas Gleixner
2014-02-18 20:32     ` John Stultz
2014-02-18 20:53       ` Thomas Gleixner
