All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix get_jiffies_64 to work on voyager
@ 2004-01-06 16:04 James Bottomley
  2004-01-06 16:19 ` Andrew Morton
  2004-01-06 16:26 ` Linus Torvalds
  0 siblings, 2 replies; 16+ messages in thread
From: James Bottomley @ 2004-01-06 16:04 UTC (permalink / raw)
  To: Andrew Morton, johnstultz; +Cc: Linux Kernel

This patch


ChangeSet@1.1534.5.2, 2003-12-30 15:40:23-08:00, akpm@osdl.org
  [PATCH] ia32 jiffy wrapping fixes

Causes the voyager boot to hang.  The problem is this change:

--- a/arch/i386/kernel/timers/timer_tsc.c       Tue Jan  6 09:57:34 2004
+++ b/arch/i386/kernel/timers/timer_tsc.c       Tue Jan  6 09:57:34 2004
@@ -141,7 +140,7 @@
 #ifndef CONFIG_NUMA
        if (!use_tsc)
 #endif
-               return (unsigned long long)jiffies * (1000000000 / HZ);
+               return (unsigned long long)get_jiffies_64() *
(1000000000 / HZ);

Apart from the fact (that I've whined about before) that this
sched_clock() function should be one of the timer function pointers, so
there isn't this CONFIG_NUMA dependence (unless I can also add a
CONFIG_X86_VOYAGER dependence to it as well), the problem seems to be
some type of bad resonance between the jiffies_64 update and the
xtime_lock in get_jiffies_64().  I think this may indicate that HZ needs
to be reduced to 100 on voyager;  however, there is also no need to get
the xtime sequence lock every time we do a jiffies_64 read, since the
only unstable time is when we may be updating both halves of it
non-atomically.  Thus, we only need the sequence lock when the bottom
half is zero.  This should improve the fast path of get_jiffies_64() for
all x86 arch's.

James

===== kernel/time.c 1.18 vs edited =====
--- 1.18/kernel/time.c	Wed Oct 22 00:09:54 2003
+++ edited/kernel/time.c	Tue Jan  6 09:20:38 2004
@@ -422,13 +422,20 @@
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
 {
-	unsigned long seq;
-	u64 ret;
+	u64 ret = jiffies_64;
 
-	do {
-		seq = read_seqbegin(&xtime_lock);
+	/* We only have read problems when the lower 32 bits are zero
+	 * indicating that we may be in the process of updating the upper
+	 * 32 bits */
+	while (unlikely((jiffies_64 & 0xffffffffULL) == 0)) {
+		unsigned long seq = read_seqbegin(&xtime_lock);
+		
+		rmb();
 		ret = jiffies_64;
-	} while (read_seqretry(&xtime_lock, seq));
+		rmb();
+		if(!read_seqretry(&xtime_lock, seq))
+			break;
+	}
 	return ret;
 }
 
 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:04 [PATCH] fix get_jiffies_64 to work on voyager James Bottomley
@ 2004-01-06 16:19 ` Andrew Morton
  2004-01-06 16:26   ` James Bottomley
                     ` (2 more replies)
  2004-01-06 16:26 ` Linus Torvalds
  1 sibling, 3 replies; 16+ messages in thread
From: Andrew Morton @ 2004-01-06 16:19 UTC (permalink / raw)
  To: James Bottomley; +Cc: johnstultz, linux-kernel

James Bottomley <James.Bottomley@SteelEye.com> wrote:
>
> This patch
> 
> 
>  ChangeSet@1.1534.5.2, 2003-12-30 15:40:23-08:00, akpm@osdl.org
>    [PATCH] ia32 jiffy wrapping fixes
> 
>  Causes the voyager boot to hang.  The problem is this change:
> 
>  --- a/arch/i386/kernel/timers/timer_tsc.c       Tue Jan  6 09:57:34 2004
>  +++ b/arch/i386/kernel/timers/timer_tsc.c       Tue Jan  6 09:57:34 2004
>  @@ -141,7 +140,7 @@
>   #ifndef CONFIG_NUMA
>          if (!use_tsc)
>   #endif
>  -               return (unsigned long long)jiffies * (1000000000 / HZ);
>  +               return (unsigned long long)get_jiffies_64() *
>  (1000000000 / HZ);

Hm, OK.  I hit the same deadlock when running with the "don't require TSCs
to be synchronised in sched_clock()" patch from -mm.  The fix for that is
below.  I shall accelerate it.

--- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix	2003-12-30 00:45:09.000000000 -0800
+++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c	2003-12-30 00:45:09.000000000 -0800
@@ -140,7 +140,8 @@ unsigned long long sched_clock(void)
 #ifndef CONFIG_NUMA
 	if (!use_tsc)
 #endif
-		return (unsigned long long)get_jiffies_64() * (1000000000 / HZ);
+		/* jiffies might overflow but this is not a big deal here */
+		return (unsigned long long)jiffies * (1000000000 / HZ);
 
 	/* Read the Time Stamp Counter */
 	rdtscll(this_offset);

_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:19 ` Andrew Morton
@ 2004-01-06 16:26   ` James Bottomley
  2004-01-06 17:05     ` Andrew Morton
  2004-01-06 16:29   ` Linus Torvalds
  2004-01-06 16:30   ` Tim Schmielau
  2 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2004-01-06 16:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: johnstul, Linux Kernel

On Tue, 2004-01-06 at 10:19, Andrew Morton wrote:
> Hm, OK.  I hit the same deadlock when running with the "don't require TSCs
> to be synchronised in sched_clock()" patch from -mm.  The fix for that is
> below.  I shall accelerate it.

Actually, I think we need to know why this is happening, since the use
of these sequence locks is growing.  On voyager I just put it down to HZ
== 1000 being a bit much for my old pentium 66MHz processors, but if
you've seen it on a much faster processor, that would tend to indicate
there's some other problem at work here.

James



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:04 [PATCH] fix get_jiffies_64 to work on voyager James Bottomley
  2004-01-06 16:19 ` Andrew Morton
@ 2004-01-06 16:26 ` Linus Torvalds
  2004-01-06 18:29   ` Paulo Marques
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2004-01-06 16:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, johnstultz, Linux Kernel



On Tue, 6 Jan 2004, James Bottomley wrote:
>
>			 however, there is also no need to get
> the xtime sequence lock every time we do a jiffies_64 read, since the
> only unstable time is when we may be updating both halves of it
> non-atomically.  Thus, we only need the sequence lock when the bottom
> half is zero.  This should improve the fast path of get_jiffies_64() for
> all x86 arch's.

This is wrong. There is nothing that guarantees that the read has read the 
high bits first. 

It might have read the low bits first (and gotten 0xffffffff), and then on
another CPU the contents were updated, and now the high bits are one
bigger, and when you read the high bits, you get a value that is off by a
_lot_.

And it's not just 0 and 0xffffffff in the low bits that can be problematic 
either: if the CPU that does the "get_jiffies64()" gets an interrupt, the 
thing may be off by more than a count of one.

IF this optimization is really worth it, the code should be something like 
this:

	#define JIFFY_SLOP (3)

	u64 ret = jiffies_64;
	u32 low32 = ret;

	if ((low+JIFFY_SLOP) <= JIFFY_SLOP*2) {
		... do the seqlock thing ...
	}

instead.

Which still avoids the read-lock about 99.9999% of the time, so it may 
well be worth it. But somebody should double-check my logic.

				Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:19 ` Andrew Morton
  2004-01-06 16:26   ` James Bottomley
@ 2004-01-06 16:29   ` Linus Torvalds
  2004-01-06 17:05     ` Andrew Morton
  2004-01-06 17:11     ` Tim Schmielau
  2004-01-06 16:30   ` Tim Schmielau
  2 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2004-01-06 16:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James Bottomley, johnstultz, linux-kernel



On Tue, 6 Jan 2004, Andrew Morton wrote:
> 
> Hm, OK.  I hit the same deadlock when running with the "don't require TSCs
> to be synchronised in sched_clock()" patch from -mm.  The fix for that is
> below.  I shall accelerate it.
> 
> --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix	2003-12-30 00:45:09.000000000 -0800
> +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c	2003-12-30 00:45:09.000000000 -0800
> @@ -140,7 +140,8 @@ unsigned long long sched_clock(void)
>  #ifndef CONFIG_NUMA
>  	if (!use_tsc)
>  #endif
> -		return (unsigned long long)get_jiffies_64() * (1000000000 / HZ);
> +		/* jiffies might overflow but this is not a big deal here */
> +		return (unsigned long long)jiffies * (1000000000 / HZ);

Augh. If you cast it to "unsigned long long" anyway, why not just use the 
right value? It's "jiffies_64".

It has other problems, of course, but at least that makes them slightly 
less.

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:19 ` Andrew Morton
  2004-01-06 16:26   ` James Bottomley
  2004-01-06 16:29   ` Linus Torvalds
@ 2004-01-06 16:30   ` Tim Schmielau
  2 siblings, 0 replies; 16+ messages in thread
From: Tim Schmielau @ 2004-01-06 16:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James Bottomley, johnstultz, linux-kernel

On Tue, 6 Jan 2004, Andrew Morton wrote:

> James Bottomley <James.Bottomley@SteelEye.com> wrote:
> >
> > This patch
> > 
> > 
> >  ChangeSet@1.1534.5.2, 2003-12-30 15:40:23-08:00, akpm@osdl.org
> >    [PATCH] ia32 jiffy wrapping fixes
> > 
> >  Causes the voyager boot to hang.  The problem is this change:
> > 
> >  --- a/arch/i386/kernel/timers/timer_tsc.c       Tue Jan  6 09:57:34 2004
> >  +++ b/arch/i386/kernel/timers/timer_tsc.c       Tue Jan  6 09:57:34 2004
> >  @@ -141,7 +140,7 @@
> >   #ifndef CONFIG_NUMA
> >          if (!use_tsc)
> >   #endif
> >  -               return (unsigned long long)jiffies * (1000000000 / HZ);
> >  +               return (unsigned long long)get_jiffies_64() *
> >  (1000000000 / HZ);
> 
> Hm, OK.  I hit the same deadlock when running with the "don't require TSCs
> to be synchronised in sched_clock()" patch from -mm.  The fix for that is
> below.  I shall accelerate it.
> 
> --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix	2003-12-30 00:45:09.000000000 -0800
> +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c	2003-12-30 00:45:09.000000000 -0800
> @@ -140,7 +140,8 @@ unsigned long long sched_clock(void)
>  #ifndef CONFIG_NUMA
>  	if (!use_tsc)
>  #endif
> -		return (unsigned long long)get_jiffies_64() * (1000000000 / HZ);
> +		/* jiffies might overflow but this is not a big deal here */
> +		return (unsigned long long)jiffies * (1000000000 / HZ);
>  
>  	/* Read the Time Stamp Counter */
>  	rdtscll(this_offset);

Yes, please revert that change. When I tested it, I didn't realize I need 
NUMA or clear use_tsc to actually cover the change. And it's indeed not a 
big deal.

The proposed get_jiffies_64() change however looks quite fragile to me.

Sorry,
Tim

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:26   ` James Bottomley
@ 2004-01-06 17:05     ` Andrew Morton
  2004-01-06 18:53       ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2004-01-06 17:05 UTC (permalink / raw)
  To: James Bottomley; +Cc: johnstul, linux-kernel

James Bottomley <James.Bottomley@SteelEye.com> wrote:
>
> On Tue, 2004-01-06 at 10:19, Andrew Morton wrote:
> > Hm, OK.  I hit the same deadlock when running with the "don't require TSCs
> > to be synchronised in sched_clock()" patch from -mm.  The fix for that is
> > below.  I shall accelerate it.
> 
> Actually, I think we need to know why this is happening, since the use
> of these sequence locks is growing.

That would be nice.  Can you get a backtrace?

> On voyager I just put it down to HZ
> == 1000 being a bit much for my old pentium 66MHz processors, but if
> you've seen it on a much faster processor, that would tend to indicate
> there's some other problem at work here.

No, it was much simpler in my case: log_buf_len_setup() was accidentally
enabling interrupts early in boot and we were taking a timer interrupt
while holding a write lock on xtime_lock.  sched_clock() was requiring a
read lock and boom.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:29   ` Linus Torvalds
@ 2004-01-06 17:05     ` Andrew Morton
  2004-01-06 17:11     ` Tim Schmielau
  1 sibling, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2004-01-06 17:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James.Bottomley, johnstultz, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:
>
> 
> 
> On Tue, 6 Jan 2004, Andrew Morton wrote:
> > 
> > Hm, OK.  I hit the same deadlock when running with the "don't require TSCs
> > to be synchronised in sched_clock()" patch from -mm.  The fix for that is
> > below.  I shall accelerate it.
> > 
> > --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix	2003-12-30 00:45:09.000000000 -0800
> > +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c	2003-12-30 00:45:09.000000000 -0800
> > @@ -140,7 +140,8 @@ unsigned long long sched_clock(void)
> >  #ifndef CONFIG_NUMA
> >  	if (!use_tsc)
> >  #endif
> > -		return (unsigned long long)get_jiffies_64() * (1000000000 / HZ);
> > +		/* jiffies might overflow but this is not a big deal here */
> > +		return (unsigned long long)jiffies * (1000000000 / HZ);
> 
> Augh. If you cast it to "unsigned long long" anyway, why not just use the 
> right value? It's "jiffies_64".

Sounds sane.

> It has other problems, of course, but at least that makes them slightly 
> less.

Yes, it's slightly sleazy.

I haven't tested this...


From: Ingo Molnar <mingo@elte.hu>

Voyager is getting odd deadlocks due to the taking of xtime_lock() in
sched_clock()->get_jiffies_64().

I had this patch queued up to fix a different deadlock, which ocurs when we
relax the requirement that TSC's be synchronised across CPUs.  But it will
fix James' deadlock too.




 arch/i386/kernel/timers/timer_tsc.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletion(-)

diff -puN arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix arch/i386/kernel/timers/timer_tsc.c
--- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix	2004-01-06 08:56:36.000000000 -0800
+++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c	2004-01-06 08:58:37.000000000 -0800
@@ -140,7 +140,12 @@ unsigned long long sched_clock(void)
 #ifndef CONFIG_NUMA
 	if (!use_tsc)
 #endif
-		return (unsigned long long)get_jiffies_64() * (1000000000 / HZ);
+		/*
+		 * Avoid xtime_lock deadlocks by not locking the read of
+		 * jiffies_64.  Can possibly cause glitches on 32-bit rollovers
+		 * but the CPU scheduler will recover OK.
+		 */
+		return jiffies_64 * (1000000000 / HZ);
 
 	/* Read the Time Stamp Counter */
 	rdtscll(this_offset);

_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:29   ` Linus Torvalds
  2004-01-06 17:05     ` Andrew Morton
@ 2004-01-06 17:11     ` Tim Schmielau
  2004-01-06 17:22       ` Andrew Morton
  2004-01-06 18:02       ` Linus Torvalds
  1 sibling, 2 replies; 16+ messages in thread
From: Tim Schmielau @ 2004-01-06 17:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, James Bottomley, johnstul, lkml

On Tue, 6 Jan 2004, Linus Torvalds wrote:

> Augh. If you cast it to "unsigned long long" anyway, why not just use the 
> right value? It's "jiffies_64".

Lessons in straight thinking...

We then need to make jiffies_64 volatile, since we violate the rule to 
never read it.

Tim 


--- linux-2.6.1-rc1/arch/i386/kernel/timers/timer_tsc.c	2003-12-31 14:21:19.000000000 +0100
+++ linux-2.6.1-rc1-j64/arch/i386/kernel/timers/timer_tsc.c	2004-01-06 18:05:24.000000000 +0100
@@ -140,7 +140,8 @@
 #ifndef CONFIG_NUMA
 	if (!use_tsc)
 #endif
-		return (unsigned long long)get_jiffies_64() * (1000000000 / HZ);
+		/* no locking but a rare wrong value is not a big deal */
+		return (unsigned long long)jiffies_64 * (1000000000 / HZ);
 
 	/* Read the Time Stamp Counter */
 	rdtscll(this_offset);

--- linux-2.6.1-rc1/include/linux/jiffies.h	2003-12-31 14:22:05.000000000 +0100
+++ linux-2.6.1-rc1-j64/include/linux/jiffies.h	2004-01-06 17:56:15.000000000 +0100
@@ -9,11 +9,11 @@
 #include <asm/param.h>			/* for HZ */
 
 /*
- * The 64-bit value is not volatile - you MUST NOT read it
- * without sampling the sequence number in xtime_lock.
+ * NEVER EVER read jiffies_64 without sampling the sequence number
+ * in xtime_lock.
  * get_jiffies_64() will do this for you as appropriate.
  */
-extern u64 jiffies_64;
+extern u64 volatile jiffies_64;
 extern unsigned long volatile jiffies;
 
 #if (BITS_PER_LONG < 64)
@@ -21,7 +21,7 @@
 #else
 static inline u64 get_jiffies_64(void)
 {
-	return (u64)jiffies;
+	return jiffies_64;
 }
 #endif
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 17:11     ` Tim Schmielau
@ 2004-01-06 17:22       ` Andrew Morton
  2004-01-06 17:37         ` Tim Schmielau
  2004-01-06 18:02       ` Linus Torvalds
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2004-01-06 17:22 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: torvalds, James.Bottomley, johnstul, linux-kernel

Tim Schmielau <tim@physik3.uni-rostock.de> wrote:
>
>   #if (BITS_PER_LONG < 64)
>  @@ -21,7 +21,7 @@
>   #else
>   static inline u64 get_jiffies_64(void)
>   {
>  -	return (u64)jiffies;
>  +	return jiffies_64;
>   }
>   #endif

hm, why this change?  Are you sure that all 64-bit architectures alias
jiffies onto jiffies_64?  x86_64 seems not to.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 17:22       ` Andrew Morton
@ 2004-01-06 17:37         ` Tim Schmielau
  0 siblings, 0 replies; 16+ messages in thread
From: Tim Schmielau @ 2004-01-06 17:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, James.Bottomley, johnstul, linux-kernel

On Tue, 6 Jan 2004, Andrew Morton wrote:

> Tim Schmielau <tim@physik3.uni-rostock.de> wrote:
> >
> >   #if (BITS_PER_LONG < 64)
> >  @@ -21,7 +21,7 @@
> >   #else
> >   static inline u64 get_jiffies_64(void)
> >   {
> >  -	return (u64)jiffies;
> >  +	return jiffies_64;
> >   }
> >   #endif
> 
> hm, why this change?  Are you sure that all 64-bit architectures alias
> jiffies onto jiffies_64?  x86_64 seems not to.
> 

It was jiffies_64 in the first place (that's why the function is called
get_jiffies_64(), after all). 
I once made it jiffies because jiffies was volatile while jiffies_64 was
not. Now that jiffies_64 becomes volatile as well, I thought we could
re-straighten things.

But we can leave out this chunk, it really shouldn't make any difference.

Tim

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 17:11     ` Tim Schmielau
  2004-01-06 17:22       ` Andrew Morton
@ 2004-01-06 18:02       ` Linus Torvalds
  1 sibling, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2004-01-06 18:02 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: Andrew Morton, James Bottomley, johnstul, lkml


[ This is a big rant against using "volatile" on data structures. Feel 
  free to ignore it, but the fact is, I'm right. You should never EVER use
  "volatile" on a data structure. ]

On Tue, 6 Jan 2004, Tim Schmielau wrote:
> 
> We then need to make jiffies_64 volatile, since we violate the rule to 
> never read it.

No, we should _never_ make anything volatile. There just isn't any reason 
to. The compiler will never do any "better" with a volatile, it will only 
ever do worse.

If there are memory ordering constraints etc, the compiler won't be able 
to handle them anyway, and volatile will be a no-op. That's why we have 
"barrier()" and "mb()" and friends.

The _only_ acceptable use of "volatile" is basically:

 - in _code_ (not data structures), where we might mark a place as making 
   a special type of access. For example, in the PCI MMIO read functions, 
   rather than use inline assembly to force the particular access (which
   would work equally well), we can force the pointer to a volatile type.

   Similarly, we force this for "test_bit()" macros etc, because they are 
   documented to work on SMP-safe. But it's the _code_ that works that 
   way, not the data structures.

   And this is an important distinctions: there are specific pieces of 
   _code_ that may be SMP-safe (for whatever reasons the writer thinks). 
   Data structures themselves are never SMP safe.

   Ergo: never mark data structures "volatile". It's a sure sign of a bug 
   if the thing isn't a memory-mapped IO register (and even then it's 
   likely a bug, since you really should be using the proper functions).

   (Some driver writers use "volatile" for mailboxes that are updated by
   DMA from the hardware. It _can_ be correct there, but the fact is, you 
   might as well put the "volatile" in the code just out of principle).

That said, the "sure sign of a bug" case has one specific sub-case:

 - to paste over bugs that you really don't tink are worth fixing any 
   other way. This is why "jiffies" itself is declared volatile: just to 
   let people write code that does "while (time_before(xxx, jiffies))".

But the "jiffies" case is safe only _exactly_ because it's an atomic read. 
You always get a valid value - so it's actually "safe" to mark jiffies as 
baing volatile. It allows people to be sloppy (bad), but at least it 
allows people to be sloppy in a safe way.

In contrast, "jiffies_64" is _not_ an area where you can safely let the 
compiler read a unstable value, so "volatile" is fundamentally wrong for 
it. You need to have some locking, or to explicitly say "we don't care in 
this case", and in both cases it would be wrong to call the thing 
"volatile". With locking, it _isn't_ volatile, and with "we don't care", 
it had better not make any difference. In either case the "volatile" is 
wrong.

We had absolutely _tons_ of bugs in the original networking code, where 
clueless people thougth that "volatile" somehow means that you don't need 
locking. EVERY SINGLE CASE, and I mean EVERY ONE, was a bug.

There are some other cases where the "volatile" keyword is fine, notably
inline asm has a specific meaning that is pretty well-defined and very
useful. But in all other cases I'd suggest having the volatile be part of 
the code, possibly with an explanation _why_ it is ok to use volatile 
there unless it is obvious.

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 16:26 ` Linus Torvalds
@ 2004-01-06 18:29   ` Paulo Marques
  2004-01-06 18:46     ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Paulo Marques @ 2004-01-06 18:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Andrew Morton, johnstultz, Linux Kernel

Linus Torvalds wrote:

> 
> On Tue, 6 Jan 2004, James Bottomley wrote:
> 
>>			 however, there is also no need to get
>>the xtime sequence lock every time we do a jiffies_64 read, since the
>>only unstable time is when we may be updating both halves of it
>>non-atomically.  Thus, we only need the sequence lock when the bottom
>>half is zero.  This should improve the fast path of get_jiffies_64() for
>>all x86 arch's.
>>
> 
> This is wrong. There is nothing that guarantees that the read has read the 
> high bits first. 
> 


It seems to me that even if the the high bits get read first, the 64 value can 
go "backwards" in time, since it can read X from the high 32 bits, then the 
clock advance to X+1, and the low bits will read a value close to zero, 
resulting in ((X<<32)|0).


> It might have read the low bits first (and gotten 0xffffffff), and then on
> another CPU the contents were updated, and now the high bits are one
> bigger, and when you read the high bits, you get a value that is off by a
> _lot_.
> 
> And it's not just 0 and 0xffffffff in the low bits that can be problematic 
> either: if the CPU that does the "get_jiffies64()" gets an interrupt, the 
> thing may be off by more than a count of one.
> 
> IF this optimization is really worth it, the code should be something like 
> this:
> 
> 	#define JIFFY_SLOP (3)
> 
> 	u64 ret = jiffies_64;

> 	u32 low32 = ret;
> 
> 	if ((low+JIFFY_SLOP) <= JIFFY_SLOP*2) {
> 		... do the seqlock thing ...
> 	}
> 
> instead.
> 


What about this instead? I don't like very much this kind of construction, but 
it seems that it would prevent the lock altogether:


	u32 jiff_high1, jiff_high2, jiff_low

	do {
		jiff_high1 = ((volatile) jiffies_64) >> 32;
		jiff_low = ((volatile) jiffies_64);
		jiff_high2 = ((volatile) jiffies_64) >> 32;
	}
	while(jiff_high1 != jiff_high2);

	return (jiff_high1<<32) | jiff_low;

If there is anyway to avoid the volatiles there, it would be much cleaner.


Please don't beat me too hard if there is an obvious problem with this 
implementation.

This is an old trick that was used on a brain dead DOS API, where there were 2 
functions to read date and time, and there was hard to get a real timestamp :)


-- 
Paulo Marques - www.grupopie.com

"In a world without walls and fences who needs windows and gates?"


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 18:29   ` Paulo Marques
@ 2004-01-06 18:46     ` Linus Torvalds
  2004-01-06 19:04       ` Paulo Marques
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2004-01-06 18:46 UTC (permalink / raw)
  To: Paulo Marques; +Cc: James Bottomley, Andrew Morton, johnstultz, Linux Kernel



On Tue, 6 Jan 2004, Paulo Marques wrote:
> 
> What about this instead? I don't like very much this kind of construction, but 
> it seems that it would prevent the lock altogether:
> 
> 	u32 jiff_high1, jiff_high2, jiff_low
> 
> 	do {
> 		jiff_high1 = ((volatile) jiffies_64) >> 32;
> 		jiff_low = ((volatile) jiffies_64);
> 		jiff_high2 = ((volatile) jiffies_64) >> 32;
> 	}
> 	while(jiff_high1 != jiff_high2);

Yes, we used to do things like that at some point (and your code is 
buggy: by leaving out the size, the "volatile" cast casts to the implicit 
"int" size in C). 

It doesn't work in SMP environments, since the "ordering" by volatile is 
only a single-CPU ordering. This is just one of the reasons why "volatile" 
isn't very useful.

Also, the above still assumes an update ordering between low and high (it
assumes that both set of bits get written back simultaneously). Which is
not necessarily true on a software _or_ hardware ordering level.

So on the reader side you'd need to add an explicit "rmb()" between the
two reads of the high bits, and on the writer side you'd need to always 
make sure that you do an atomic 64-bt write.

Since the "rmb()" is as expensive as the sequence lock, the above wouldn't
much help.

> If there is anyway to avoid the volatiles there, it would be much cleaner.

Not only is there a need to avoid them, they don't help in the least,
since you need the explicit CPU ordering macro anyway. And that ordering 
macro is the true cost of "seq_read_lock()", so...

In contrast, the reason the simple "assume some values are stable" patch
actually _does_ help is that it doesn't depend on any ordering constraints
at all.  So it literally can be done lock-less, because it knows about
something much more fundamental: it depends on the _behaviour_ of the
values.

Basically, in the kernel, the expensive part about any lock is literally 
the ordering. There are other things that can be expensive too (if the 
lock is bouncing back and forth between CPU's, that becomes really 
exensive really quickly), but to a first order and especially on locks 
that aren't themselves fundamentally highly contended for, the CPU 
serialization implied in the lock is the expensive part.

So converting that serialization to a "lockless" algorithm that still
depends on ordering usually doesn't much help for those locks. The
"ordering" part is still serializing, and the main win ends up being on 
64-CPU systems etc where you can at least avoid the cacheline bounces.

Indeed, I think it was 64-CPU systems that caused the sequence lock stuff, 
not so much the "normal" 2- and 4-way boxes.

If you want to improve on the sequence lock, you need to take advantage of
inherent data knowledge (in this case the knowledge that you don't care
about the exact value, and that you know how the bits are updated).

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 17:05     ` Andrew Morton
@ 2004-01-06 18:53       ` James Bottomley
  0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2004-01-06 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: johnstul, Linux Kernel

On Tue, 2004-01-06 at 11:05, Andrew Morton wrote:
> No, it was much simpler in my case: log_buf_len_setup() was accidentally
> enabling interrupts early in boot and we were taking a timer interrupt
> while holding a write lock on xtime_lock.  sched_clock() was requiring a
> read lock and boom.

Voyager has no APIC local timer equivalent, so I rebroadcast the timer
tick to all CPUs.  However, the local tick is done in this thread with
the xtime_lock held as write and it can trigger the scheduler load
balancing, which needs to call sched_clock()....boom.

All in all, this does show that the xtime sequence lock is a bit too
fragile.  It also seems to show that we should redo the subarch timer
hooks if we want to make the sequence locks work.

I think there should be two hooks: one called holding the xtime write
lock for doing clock adjustment specific things and the other called
after we've released the xtime write lock.

James



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fix get_jiffies_64 to work on voyager
  2004-01-06 18:46     ` Linus Torvalds
@ 2004-01-06 19:04       ` Paulo Marques
  0 siblings, 0 replies; 16+ messages in thread
From: Paulo Marques @ 2004-01-06 19:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Andrew Morton, johnstultz, Linux Kernel

Linus Torvalds wrote:

> 
> On Tue, 6 Jan 2004, Paulo Marques wrote:
> 
>>	....
> 
> Yes, we used to do things like that at some point (and your code is 
> buggy: by leaving out the size, the "volatile" cast casts to the implicit 
> "int" size in C). 
> 


Ugh, stupid mistake. Just wrote the code to fast... :(


Anyway, thanks for your clarification. People like me that code a lot of 
user-space stuff, take a while to realize all the implications of SMP, ordering, 
preemption, etc., etc.

By the way, after double checking your logic, it looks good as long as there are 
no interrupts that take more than 3 jiffies to complete :)

-- 
Paulo Marques - www.grupopie.com

"In a world without walls and fences who needs windows and gates?"


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-01-06 19:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-06 16:04 [PATCH] fix get_jiffies_64 to work on voyager James Bottomley
2004-01-06 16:19 ` Andrew Morton
2004-01-06 16:26   ` James Bottomley
2004-01-06 17:05     ` Andrew Morton
2004-01-06 18:53       ` James Bottomley
2004-01-06 16:29   ` Linus Torvalds
2004-01-06 17:05     ` Andrew Morton
2004-01-06 17:11     ` Tim Schmielau
2004-01-06 17:22       ` Andrew Morton
2004-01-06 17:37         ` Tim Schmielau
2004-01-06 18:02       ` Linus Torvalds
2004-01-06 16:30   ` Tim Schmielau
2004-01-06 16:26 ` Linus Torvalds
2004-01-06 18:29   ` Paulo Marques
2004-01-06 18:46     ` Linus Torvalds
2004-01-06 19:04       ` Paulo Marques

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.