* [PATCH] fix get_jiffies_64 to work on voyager @ 2004-01-06 16:04 James Bottomley 2004-01-06 16:19 ` Andrew Morton 2004-01-06 16:26 ` Linus Torvalds 0 siblings, 2 replies; 16+ messages in thread From: James Bottomley @ 2004-01-06 16:04 UTC (permalink / raw) To: Andrew Morton, johnstultz; +Cc: Linux Kernel This patch ChangeSet@1.1534.5.2, 2003-12-30 15:40:23-08:00, akpm@osdl.org [PATCH] ia32 jiffy wrapping fixes Causes the voyager boot to hang. The problem is this change: --- a/arch/i386/kernel/timers/timer_tsc.c Tue Jan 6 09:57:34 2004 +++ b/arch/i386/kernel/timers/timer_tsc.c Tue Jan 6 09:57:34 2004 @@ -141,7 +140,7 @@ #ifndef CONFIG_NUMA if (!use_tsc) #endif - return (unsigned long long)jiffies * (1000000000 / HZ); + return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); Apart from the fact (that I've whined about before) that this sched_clock() function should be one of the timer function pointers, so there isn't this CONFIG_NUMA dependence (unless I can also add a CONFIG_X86_VOYAGER dependence to it as well), the problem seems to be some type of bad resonance between the jiffies_64 update and the xtime_lock in get_jiffies_64(). I think this may indicate that HZ needs to be reduced to 100 on voyager; however, there is also no need to get the xtime sequence lock every time we do a jiffies_64 read, since the only unstable time is when we may be updating both halves of it non-atomically. Thus, we only need the sequence lock when the bottom half is zero. This should improve the fast path of get_jiffies_64() for all x86 arch's. James ===== kernel/time.c 1.18 vs edited ===== --- 1.18/kernel/time.c Wed Oct 22 00:09:54 2003 +++ edited/kernel/time.c Tue Jan 6 09:20:38 2004 @@ -422,13 +422,20 @@ #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void) { - unsigned long seq; - u64 ret; + u64 ret = jiffies_64; - do { - seq = read_seqbegin(&xtime_lock); + /* We only have read problems when the lower 32 bits are zero + * indicating that we may be in the process of updating the upper + * 32 bits */ + while (unlikely((jiffies_64 & 0xffffffffULL) == 0)) { + unsigned long seq = read_seqbegin(&xtime_lock); + + rmb(); ret = jiffies_64; - } while (read_seqretry(&xtime_lock, seq)); + rmb(); + if(!read_seqretry(&xtime_lock, seq)) + break; + } return ret; } ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:04 [PATCH] fix get_jiffies_64 to work on voyager James Bottomley @ 2004-01-06 16:19 ` Andrew Morton 2004-01-06 16:26 ` James Bottomley ` (2 more replies) 2004-01-06 16:26 ` Linus Torvalds 1 sibling, 3 replies; 16+ messages in thread From: Andrew Morton @ 2004-01-06 16:19 UTC (permalink / raw) To: James Bottomley; +Cc: johnstultz, linux-kernel James Bottomley <James.Bottomley@SteelEye.com> wrote: > > This patch > > > ChangeSet@1.1534.5.2, 2003-12-30 15:40:23-08:00, akpm@osdl.org > [PATCH] ia32 jiffy wrapping fixes > > Causes the voyager boot to hang. The problem is this change: > > --- a/arch/i386/kernel/timers/timer_tsc.c Tue Jan 6 09:57:34 2004 > +++ b/arch/i386/kernel/timers/timer_tsc.c Tue Jan 6 09:57:34 2004 > @@ -141,7 +140,7 @@ > #ifndef CONFIG_NUMA > if (!use_tsc) > #endif > - return (unsigned long long)jiffies * (1000000000 / HZ); > + return (unsigned long long)get_jiffies_64() * > (1000000000 / HZ); Hm, OK. I hit the same deadlock when running with the "don't require TSCs to be synchronised in sched_clock()" patch from -mm. The fix for that is below. I shall accelerate it. --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix 2003-12-30 00:45:09.000000000 -0800 +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c 2003-12-30 00:45:09.000000000 -0800 @@ -140,7 +140,8 @@ unsigned long long sched_clock(void) #ifndef CONFIG_NUMA if (!use_tsc) #endif - return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); + /* jiffies might overflow but this is not a big deal here */ + return (unsigned long long)jiffies * (1000000000 / HZ); /* Read the Time Stamp Counter */ rdtscll(this_offset); _ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:19 ` Andrew Morton @ 2004-01-06 16:26 ` James Bottomley 2004-01-06 17:05 ` Andrew Morton 2004-01-06 16:29 ` Linus Torvalds 2004-01-06 16:30 ` Tim Schmielau 2 siblings, 1 reply; 16+ messages in thread From: James Bottomley @ 2004-01-06 16:26 UTC (permalink / raw) To: Andrew Morton; +Cc: johnstul, Linux Kernel On Tue, 2004-01-06 at 10:19, Andrew Morton wrote: > Hm, OK. I hit the same deadlock when running with the "don't require TSCs > to be synchronised in sched_clock()" patch from -mm. The fix for that is > below. I shall accelerate it. Actually, I think we need to know why this is happening, since the use of these sequence locks is growing. On voyager I just put it down to HZ == 1000 being a bit much for my old pentium 66MHz processors, but if you've seen it on a much faster processor, that would tend to indicate there's some other problem at work here. James ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:26 ` James Bottomley @ 2004-01-06 17:05 ` Andrew Morton 2004-01-06 18:53 ` James Bottomley 0 siblings, 1 reply; 16+ messages in thread From: Andrew Morton @ 2004-01-06 17:05 UTC (permalink / raw) To: James Bottomley; +Cc: johnstul, linux-kernel James Bottomley <James.Bottomley@SteelEye.com> wrote: > > On Tue, 2004-01-06 at 10:19, Andrew Morton wrote: > > Hm, OK. I hit the same deadlock when running with the "don't require TSCs > > to be synchronised in sched_clock()" patch from -mm. The fix for that is > > below. I shall accelerate it. > > Actually, I think we need to know why this is happening, since the use > of these sequence locks is growing. That would be nice. Can you get a backtrace? > On voyager I just put it down to HZ > == 1000 being a bit much for my old pentium 66MHz processors, but if > you've seen it on a much faster processor, that would tend to indicate > there's some other problem at work here. No, it was much simpler in my case: log_buf_len_setup() was accidentally enabling interrupts early in boot and we were taking a timer interrupt while holding a write lock on xtime_lock. sched_clock() was requiring a read lock and boom. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 17:05 ` Andrew Morton @ 2004-01-06 18:53 ` James Bottomley 0 siblings, 0 replies; 16+ messages in thread From: James Bottomley @ 2004-01-06 18:53 UTC (permalink / raw) To: Andrew Morton; +Cc: johnstul, Linux Kernel On Tue, 2004-01-06 at 11:05, Andrew Morton wrote: > No, it was much simpler in my case: log_buf_len_setup() was accidentally > enabling interrupts early in boot and we were taking a timer interrupt > while holding a write lock on xtime_lock. sched_clock() was requiring a > read lock and boom. Voyager has no APIC local timer equivalent, so I rebroadcast the timer tick to all CPUs. However, the local tick is done in this thread with the xtime_lock held as write and it can trigger the scheduler load balancing, which needs to call sched_clock()....boom. All in all, this does show that the xtime sequence lock is a bit too fragile. It also seems to show that we should redo the subarch timer hooks if we want to make the sequence locks work. I think there should be two hooks: one called holding the xtime write lock for doing clock adjustment specific things and the other called after we've released the xtime write lock. James ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:19 ` Andrew Morton 2004-01-06 16:26 ` James Bottomley @ 2004-01-06 16:29 ` Linus Torvalds 2004-01-06 17:05 ` Andrew Morton 2004-01-06 17:11 ` Tim Schmielau 2004-01-06 16:30 ` Tim Schmielau 2 siblings, 2 replies; 16+ messages in thread From: Linus Torvalds @ 2004-01-06 16:29 UTC (permalink / raw) To: Andrew Morton; +Cc: James Bottomley, johnstultz, linux-kernel On Tue, 6 Jan 2004, Andrew Morton wrote: > > Hm, OK. I hit the same deadlock when running with the "don't require TSCs > to be synchronised in sched_clock()" patch from -mm. The fix for that is > below. I shall accelerate it. > > --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix 2003-12-30 00:45:09.000000000 -0800 > +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c 2003-12-30 00:45:09.000000000 -0800 > @@ -140,7 +140,8 @@ unsigned long long sched_clock(void) > #ifndef CONFIG_NUMA > if (!use_tsc) > #endif > - return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); > + /* jiffies might overflow but this is not a big deal here */ > + return (unsigned long long)jiffies * (1000000000 / HZ); Augh. If you cast it to "unsigned long long" anyway, why not just use the right value? It's "jiffies_64". It has other problems, of course, but at least that makes them slightly less. Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:29 ` Linus Torvalds @ 2004-01-06 17:05 ` Andrew Morton 2004-01-06 17:11 ` Tim Schmielau 1 sibling, 0 replies; 16+ messages in thread From: Andrew Morton @ 2004-01-06 17:05 UTC (permalink / raw) To: Linus Torvalds; +Cc: James.Bottomley, johnstultz, linux-kernel Linus Torvalds <torvalds@osdl.org> wrote: > > > > On Tue, 6 Jan 2004, Andrew Morton wrote: > > > > Hm, OK. I hit the same deadlock when running with the "don't require TSCs > > to be synchronised in sched_clock()" patch from -mm. The fix for that is > > below. I shall accelerate it. > > > > --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix 2003-12-30 00:45:09.000000000 -0800 > > +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c 2003-12-30 00:45:09.000000000 -0800 > > @@ -140,7 +140,8 @@ unsigned long long sched_clock(void) > > #ifndef CONFIG_NUMA > > if (!use_tsc) > > #endif > > - return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); > > + /* jiffies might overflow but this is not a big deal here */ > > + return (unsigned long long)jiffies * (1000000000 / HZ); > > Augh. If you cast it to "unsigned long long" anyway, why not just use the > right value? It's "jiffies_64". Sounds sane. > It has other problems, of course, but at least that makes them slightly > less. Yes, it's slightly sleazy. I haven't tested this... From: Ingo Molnar <mingo@elte.hu> Voyager is getting odd deadlocks due to the taking of xtime_lock() in sched_clock()->get_jiffies_64(). I had this patch queued up to fix a different deadlock, which ocurs when we relax the requirement that TSC's be synchronised across CPUs. But it will fix James' deadlock too. arch/i386/kernel/timers/timer_tsc.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletion(-) diff -puN arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix arch/i386/kernel/timers/timer_tsc.c --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix 2004-01-06 08:56:36.000000000 -0800 +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c 2004-01-06 08:58:37.000000000 -0800 @@ -140,7 +140,12 @@ unsigned long long sched_clock(void) #ifndef CONFIG_NUMA if (!use_tsc) #endif - return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); + /* + * Avoid xtime_lock deadlocks by not locking the read of + * jiffies_64. Can possibly cause glitches on 32-bit rollovers + * but the CPU scheduler will recover OK. + */ + return jiffies_64 * (1000000000 / HZ); /* Read the Time Stamp Counter */ rdtscll(this_offset); _ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:29 ` Linus Torvalds 2004-01-06 17:05 ` Andrew Morton @ 2004-01-06 17:11 ` Tim Schmielau 2004-01-06 17:22 ` Andrew Morton 2004-01-06 18:02 ` Linus Torvalds 1 sibling, 2 replies; 16+ messages in thread From: Tim Schmielau @ 2004-01-06 17:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, James Bottomley, johnstul, lkml On Tue, 6 Jan 2004, Linus Torvalds wrote: > Augh. If you cast it to "unsigned long long" anyway, why not just use the > right value? It's "jiffies_64". Lessons in straight thinking... We then need to make jiffies_64 volatile, since we violate the rule to never read it. Tim --- linux-2.6.1-rc1/arch/i386/kernel/timers/timer_tsc.c 2003-12-31 14:21:19.000000000 +0100 +++ linux-2.6.1-rc1-j64/arch/i386/kernel/timers/timer_tsc.c 2004-01-06 18:05:24.000000000 +0100 @@ -140,7 +140,8 @@ #ifndef CONFIG_NUMA if (!use_tsc) #endif - return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); + /* no locking but a rare wrong value is not a big deal */ + return (unsigned long long)jiffies_64 * (1000000000 / HZ); /* Read the Time Stamp Counter */ rdtscll(this_offset); --- linux-2.6.1-rc1/include/linux/jiffies.h 2003-12-31 14:22:05.000000000 +0100 +++ linux-2.6.1-rc1-j64/include/linux/jiffies.h 2004-01-06 17:56:15.000000000 +0100 @@ -9,11 +9,11 @@ #include <asm/param.h> /* for HZ */ /* - * The 64-bit value is not volatile - you MUST NOT read it - * without sampling the sequence number in xtime_lock. + * NEVER EVER read jiffies_64 without sampling the sequence number + * in xtime_lock. * get_jiffies_64() will do this for you as appropriate. */ -extern u64 jiffies_64; +extern u64 volatile jiffies_64; extern unsigned long volatile jiffies; #if (BITS_PER_LONG < 64) @@ -21,7 +21,7 @@ #else static inline u64 get_jiffies_64(void) { - return (u64)jiffies; + return jiffies_64; } #endif ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 17:11 ` Tim Schmielau @ 2004-01-06 17:22 ` Andrew Morton 2004-01-06 17:37 ` Tim Schmielau 2004-01-06 18:02 ` Linus Torvalds 1 sibling, 1 reply; 16+ messages in thread From: Andrew Morton @ 2004-01-06 17:22 UTC (permalink / raw) To: Tim Schmielau; +Cc: torvalds, James.Bottomley, johnstul, linux-kernel Tim Schmielau <tim@physik3.uni-rostock.de> wrote: > > #if (BITS_PER_LONG < 64) > @@ -21,7 +21,7 @@ > #else > static inline u64 get_jiffies_64(void) > { > - return (u64)jiffies; > + return jiffies_64; > } > #endif hm, why this change? Are you sure that all 64-bit architectures alias jiffies onto jiffies_64? x86_64 seems not to. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 17:22 ` Andrew Morton @ 2004-01-06 17:37 ` Tim Schmielau 0 siblings, 0 replies; 16+ messages in thread From: Tim Schmielau @ 2004-01-06 17:37 UTC (permalink / raw) To: Andrew Morton; +Cc: torvalds, James.Bottomley, johnstul, linux-kernel On Tue, 6 Jan 2004, Andrew Morton wrote: > Tim Schmielau <tim@physik3.uni-rostock.de> wrote: > > > > #if (BITS_PER_LONG < 64) > > @@ -21,7 +21,7 @@ > > #else > > static inline u64 get_jiffies_64(void) > > { > > - return (u64)jiffies; > > + return jiffies_64; > > } > > #endif > > hm, why this change? Are you sure that all 64-bit architectures alias > jiffies onto jiffies_64? x86_64 seems not to. > It was jiffies_64 in the first place (that's why the function is called get_jiffies_64(), after all). I once made it jiffies because jiffies was volatile while jiffies_64 was not. Now that jiffies_64 becomes volatile as well, I thought we could re-straighten things. But we can leave out this chunk, it really shouldn't make any difference. Tim ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 17:11 ` Tim Schmielau 2004-01-06 17:22 ` Andrew Morton @ 2004-01-06 18:02 ` Linus Torvalds 1 sibling, 0 replies; 16+ messages in thread From: Linus Torvalds @ 2004-01-06 18:02 UTC (permalink / raw) To: Tim Schmielau; +Cc: Andrew Morton, James Bottomley, johnstul, lkml [ This is a big rant against using "volatile" on data structures. Feel free to ignore it, but the fact is, I'm right. You should never EVER use "volatile" on a data structure. ] On Tue, 6 Jan 2004, Tim Schmielau wrote: > > We then need to make jiffies_64 volatile, since we violate the rule to > never read it. No, we should _never_ make anything volatile. There just isn't any reason to. The compiler will never do any "better" with a volatile, it will only ever do worse. If there are memory ordering constraints etc, the compiler won't be able to handle them anyway, and volatile will be a no-op. That's why we have "barrier()" and "mb()" and friends. The _only_ acceptable use of "volatile" is basically: - in _code_ (not data structures), where we might mark a place as making a special type of access. For example, in the PCI MMIO read functions, rather than use inline assembly to force the particular access (which would work equally well), we can force the pointer to a volatile type. Similarly, we force this for "test_bit()" macros etc, because they are documented to work on SMP-safe. But it's the _code_ that works that way, not the data structures. And this is an important distinctions: there are specific pieces of _code_ that may be SMP-safe (for whatever reasons the writer thinks). Data structures themselves are never SMP safe. Ergo: never mark data structures "volatile". It's a sure sign of a bug if the thing isn't a memory-mapped IO register (and even then it's likely a bug, since you really should be using the proper functions). (Some driver writers use "volatile" for mailboxes that are updated by DMA from the hardware. It _can_ be correct there, but the fact is, you might as well put the "volatile" in the code just out of principle). That said, the "sure sign of a bug" case has one specific sub-case: - to paste over bugs that you really don't tink are worth fixing any other way. This is why "jiffies" itself is declared volatile: just to let people write code that does "while (time_before(xxx, jiffies))". But the "jiffies" case is safe only _exactly_ because it's an atomic read. You always get a valid value - so it's actually "safe" to mark jiffies as baing volatile. It allows people to be sloppy (bad), but at least it allows people to be sloppy in a safe way. In contrast, "jiffies_64" is _not_ an area where you can safely let the compiler read a unstable value, so "volatile" is fundamentally wrong for it. You need to have some locking, or to explicitly say "we don't care in this case", and in both cases it would be wrong to call the thing "volatile". With locking, it _isn't_ volatile, and with "we don't care", it had better not make any difference. In either case the "volatile" is wrong. We had absolutely _tons_ of bugs in the original networking code, where clueless people thougth that "volatile" somehow means that you don't need locking. EVERY SINGLE CASE, and I mean EVERY ONE, was a bug. There are some other cases where the "volatile" keyword is fine, notably inline asm has a specific meaning that is pretty well-defined and very useful. But in all other cases I'd suggest having the volatile be part of the code, possibly with an explanation _why_ it is ok to use volatile there unless it is obvious. Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:19 ` Andrew Morton 2004-01-06 16:26 ` James Bottomley 2004-01-06 16:29 ` Linus Torvalds @ 2004-01-06 16:30 ` Tim Schmielau 2 siblings, 0 replies; 16+ messages in thread From: Tim Schmielau @ 2004-01-06 16:30 UTC (permalink / raw) To: Andrew Morton; +Cc: James Bottomley, johnstultz, linux-kernel On Tue, 6 Jan 2004, Andrew Morton wrote: > James Bottomley <James.Bottomley@SteelEye.com> wrote: > > > > This patch > > > > > > ChangeSet@1.1534.5.2, 2003-12-30 15:40:23-08:00, akpm@osdl.org > > [PATCH] ia32 jiffy wrapping fixes > > > > Causes the voyager boot to hang. The problem is this change: > > > > --- a/arch/i386/kernel/timers/timer_tsc.c Tue Jan 6 09:57:34 2004 > > +++ b/arch/i386/kernel/timers/timer_tsc.c Tue Jan 6 09:57:34 2004 > > @@ -141,7 +140,7 @@ > > #ifndef CONFIG_NUMA > > if (!use_tsc) > > #endif > > - return (unsigned long long)jiffies * (1000000000 / HZ); > > + return (unsigned long long)get_jiffies_64() * > > (1000000000 / HZ); > > Hm, OK. I hit the same deadlock when running with the "don't require TSCs > to be synchronised in sched_clock()" patch from -mm. The fix for that is > below. I shall accelerate it. > > --- 25/arch/i386/kernel/timers/timer_tsc.c~sched_clock-2.6.0-A1-deadlock-fix 2003-12-30 00:45:09.000000000 -0800 > +++ 25-akpm/arch/i386/kernel/timers/timer_tsc.c 2003-12-30 00:45:09.000000000 -0800 > @@ -140,7 +140,8 @@ unsigned long long sched_clock(void) > #ifndef CONFIG_NUMA > if (!use_tsc) > #endif > - return (unsigned long long)get_jiffies_64() * (1000000000 / HZ); > + /* jiffies might overflow but this is not a big deal here */ > + return (unsigned long long)jiffies * (1000000000 / HZ); > > /* Read the Time Stamp Counter */ > rdtscll(this_offset); Yes, please revert that change. When I tested it, I didn't realize I need NUMA or clear use_tsc to actually cover the change. And it's indeed not a big deal. The proposed get_jiffies_64() change however looks quite fragile to me. Sorry, Tim ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:04 [PATCH] fix get_jiffies_64 to work on voyager James Bottomley 2004-01-06 16:19 ` Andrew Morton @ 2004-01-06 16:26 ` Linus Torvalds 2004-01-06 18:29 ` Paulo Marques 1 sibling, 1 reply; 16+ messages in thread From: Linus Torvalds @ 2004-01-06 16:26 UTC (permalink / raw) To: James Bottomley; +Cc: Andrew Morton, johnstultz, Linux Kernel On Tue, 6 Jan 2004, James Bottomley wrote: > > however, there is also no need to get > the xtime sequence lock every time we do a jiffies_64 read, since the > only unstable time is when we may be updating both halves of it > non-atomically. Thus, we only need the sequence lock when the bottom > half is zero. This should improve the fast path of get_jiffies_64() for > all x86 arch's. This is wrong. There is nothing that guarantees that the read has read the high bits first. It might have read the low bits first (and gotten 0xffffffff), and then on another CPU the contents were updated, and now the high bits are one bigger, and when you read the high bits, you get a value that is off by a _lot_. And it's not just 0 and 0xffffffff in the low bits that can be problematic either: if the CPU that does the "get_jiffies64()" gets an interrupt, the thing may be off by more than a count of one. IF this optimization is really worth it, the code should be something like this: #define JIFFY_SLOP (3) u64 ret = jiffies_64; u32 low32 = ret; if ((low+JIFFY_SLOP) <= JIFFY_SLOP*2) { ... do the seqlock thing ... } instead. Which still avoids the read-lock about 99.9999% of the time, so it may well be worth it. But somebody should double-check my logic. Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 16:26 ` Linus Torvalds @ 2004-01-06 18:29 ` Paulo Marques 2004-01-06 18:46 ` Linus Torvalds 0 siblings, 1 reply; 16+ messages in thread From: Paulo Marques @ 2004-01-06 18:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: James Bottomley, Andrew Morton, johnstultz, Linux Kernel Linus Torvalds wrote: > > On Tue, 6 Jan 2004, James Bottomley wrote: > >> however, there is also no need to get >>the xtime sequence lock every time we do a jiffies_64 read, since the >>only unstable time is when we may be updating both halves of it >>non-atomically. Thus, we only need the sequence lock when the bottom >>half is zero. This should improve the fast path of get_jiffies_64() for >>all x86 arch's. >> > > This is wrong. There is nothing that guarantees that the read has read the > high bits first. > It seems to me that even if the the high bits get read first, the 64 value can go "backwards" in time, since it can read X from the high 32 bits, then the clock advance to X+1, and the low bits will read a value close to zero, resulting in ((X<<32)|0). > It might have read the low bits first (and gotten 0xffffffff), and then on > another CPU the contents were updated, and now the high bits are one > bigger, and when you read the high bits, you get a value that is off by a > _lot_. > > And it's not just 0 and 0xffffffff in the low bits that can be problematic > either: if the CPU that does the "get_jiffies64()" gets an interrupt, the > thing may be off by more than a count of one. > > IF this optimization is really worth it, the code should be something like > this: > > #define JIFFY_SLOP (3) > > u64 ret = jiffies_64; > u32 low32 = ret; > > if ((low+JIFFY_SLOP) <= JIFFY_SLOP*2) { > ... do the seqlock thing ... > } > > instead. > What about this instead? I don't like very much this kind of construction, but it seems that it would prevent the lock altogether: u32 jiff_high1, jiff_high2, jiff_low do { jiff_high1 = ((volatile) jiffies_64) >> 32; jiff_low = ((volatile) jiffies_64); jiff_high2 = ((volatile) jiffies_64) >> 32; } while(jiff_high1 != jiff_high2); return (jiff_high1<<32) | jiff_low; If there is anyway to avoid the volatiles there, it would be much cleaner. Please don't beat me too hard if there is an obvious problem with this implementation. This is an old trick that was used on a brain dead DOS API, where there were 2 functions to read date and time, and there was hard to get a real timestamp :) -- Paulo Marques - www.grupopie.com "In a world without walls and fences who needs windows and gates?" ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 18:29 ` Paulo Marques @ 2004-01-06 18:46 ` Linus Torvalds 2004-01-06 19:04 ` Paulo Marques 0 siblings, 1 reply; 16+ messages in thread From: Linus Torvalds @ 2004-01-06 18:46 UTC (permalink / raw) To: Paulo Marques; +Cc: James Bottomley, Andrew Morton, johnstultz, Linux Kernel On Tue, 6 Jan 2004, Paulo Marques wrote: > > What about this instead? I don't like very much this kind of construction, but > it seems that it would prevent the lock altogether: > > u32 jiff_high1, jiff_high2, jiff_low > > do { > jiff_high1 = ((volatile) jiffies_64) >> 32; > jiff_low = ((volatile) jiffies_64); > jiff_high2 = ((volatile) jiffies_64) >> 32; > } > while(jiff_high1 != jiff_high2); Yes, we used to do things like that at some point (and your code is buggy: by leaving out the size, the "volatile" cast casts to the implicit "int" size in C). It doesn't work in SMP environments, since the "ordering" by volatile is only a single-CPU ordering. This is just one of the reasons why "volatile" isn't very useful. Also, the above still assumes an update ordering between low and high (it assumes that both set of bits get written back simultaneously). Which is not necessarily true on a software _or_ hardware ordering level. So on the reader side you'd need to add an explicit "rmb()" between the two reads of the high bits, and on the writer side you'd need to always make sure that you do an atomic 64-bt write. Since the "rmb()" is as expensive as the sequence lock, the above wouldn't much help. > If there is anyway to avoid the volatiles there, it would be much cleaner. Not only is there a need to avoid them, they don't help in the least, since you need the explicit CPU ordering macro anyway. And that ordering macro is the true cost of "seq_read_lock()", so... In contrast, the reason the simple "assume some values are stable" patch actually _does_ help is that it doesn't depend on any ordering constraints at all. So it literally can be done lock-less, because it knows about something much more fundamental: it depends on the _behaviour_ of the values. Basically, in the kernel, the expensive part about any lock is literally the ordering. There are other things that can be expensive too (if the lock is bouncing back and forth between CPU's, that becomes really exensive really quickly), but to a first order and especially on locks that aren't themselves fundamentally highly contended for, the CPU serialization implied in the lock is the expensive part. So converting that serialization to a "lockless" algorithm that still depends on ordering usually doesn't much help for those locks. The "ordering" part is still serializing, and the main win ends up being on 64-CPU systems etc where you can at least avoid the cacheline bounces. Indeed, I think it was 64-CPU systems that caused the sequence lock stuff, not so much the "normal" 2- and 4-way boxes. If you want to improve on the sequence lock, you need to take advantage of inherent data knowledge (in this case the knowledge that you don't care about the exact value, and that you know how the bits are updated). Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fix get_jiffies_64 to work on voyager 2004-01-06 18:46 ` Linus Torvalds @ 2004-01-06 19:04 ` Paulo Marques 0 siblings, 0 replies; 16+ messages in thread From: Paulo Marques @ 2004-01-06 19:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: James Bottomley, Andrew Morton, johnstultz, Linux Kernel Linus Torvalds wrote: > > On Tue, 6 Jan 2004, Paulo Marques wrote: > >> .... > > Yes, we used to do things like that at some point (and your code is > buggy: by leaving out the size, the "volatile" cast casts to the implicit > "int" size in C). > Ugh, stupid mistake. Just wrote the code to fast... :( Anyway, thanks for your clarification. People like me that code a lot of user-space stuff, take a while to realize all the implications of SMP, ordering, preemption, etc., etc. By the way, after double checking your logic, it looks good as long as there are no interrupts that take more than 3 jiffies to complete :) -- Paulo Marques - www.grupopie.com "In a world without walls and fences who needs windows and gates?" ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2004-01-06 19:05 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-01-06 16:04 [PATCH] fix get_jiffies_64 to work on voyager James Bottomley 2004-01-06 16:19 ` Andrew Morton 2004-01-06 16:26 ` James Bottomley 2004-01-06 17:05 ` Andrew Morton 2004-01-06 18:53 ` James Bottomley 2004-01-06 16:29 ` Linus Torvalds 2004-01-06 17:05 ` Andrew Morton 2004-01-06 17:11 ` Tim Schmielau 2004-01-06 17:22 ` Andrew Morton 2004-01-06 17:37 ` Tim Schmielau 2004-01-06 18:02 ` Linus Torvalds 2004-01-06 16:30 ` Tim Schmielau 2004-01-06 16:26 ` Linus Torvalds 2004-01-06 18:29 ` Paulo Marques 2004-01-06 18:46 ` Linus Torvalds 2004-01-06 19:04 ` Paulo Marques
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.