* Re: [PATCH_TAKE_2] now < last_tick problem
@ 2003-10-10  4:13 Ian Wienand
  2003-10-10 16:42 ` David Mosberger
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Ian Wienand @ 2003-10-10  4:13 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 635 bytes --]

On Fri, Oct 10, 2003 at 10:52:21AM +1000, Ian Wienand wrote:
> Suggested patch attached; note the fsyscall implementation does not
> appear to have this problem.

Sorry to reply to myself, but I noticed when using httperf I still got
one or two messages (literally, it was very much reduced); the first
patch misses the fact there is also a race between 'lost' being
calculated and the timer interrupt possibly catching jiffies up to
wall_jiffies.  I also added a comment as I get easily confused.

The attached patch lets me run httperf repeatedly with none of these
warnings, though there still might be a better way to do it.
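
To make the ordering issue concrete, here is a minimal user-space sketch.
It is not the kernel code: struct timer_model, timer_tick() and the two
offset_* helpers are hypothetical names, and the model assumes no lost
ticks (jiffies == wall_jiffies), so last_tick is just itm_next - itm_delta.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU timer state; not the kernel's struct. */
struct timer_model {
	unsigned long itc;       /* free-running cycle counter */
	unsigned long itm_next;  /* counter value at which the next tick fires */
	unsigned long itm_delta; /* cycles between ticks */
};

/* Model a timer interrupt: the counter reaches itm_next, which then advances. */
static void timer_tick(struct timer_model *t)
{
	assert(t->itc <= t->itm_next);
	t->itc = t->itm_next;
	t->itm_next += t->itm_delta;
}

/* Sample `now` first; a tick may slip in before itm_next is read. */
static long offset_now_first(struct timer_model *t, int tick_in_between)
{
	unsigned long now = t->itc;

	if (tick_in_between)
		timer_tick(t);
	unsigned long last_tick = t->itm_next - t->itm_delta;
	return (long)(now - last_tick);
}

/* Read itm_next first and sample `now` afterwards, as in the patch. */
static long offset_now_last(struct timer_model *t, int tick_in_between)
{
	unsigned long last_tick = t->itm_next - t->itm_delta;

	if (tick_in_between)
		timer_tick(t);
	unsigned long now = t->itc;
	return (long)(now - last_tick);
}
```

With itc = 1500, itm_next = 2000 and itm_delta = 1000, a tick between the
two reads makes offset_now_first() return a negative value, while
offset_now_last() stays non-negative.  The second ordering only avoids the
negative result; it does not by itself make the snapshot consistent.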

-i



[-- Attachment #2: time.c.take2.diff --]
[-- Type: text/plain, Size: 1444 bytes --]

===== arch/ia64/kernel/time.c 1.35 vs edited =====
--- 1.35/arch/ia64/kernel/time.c	Wed Oct  8 12:53:38 2003
+++ edited/arch/ia64/kernel/time.c	Fri Oct 10 14:08:15 2003
@@ -71,11 +71,32 @@
 unsigned long
 itc_get_offset (void)
 {
-	unsigned long elapsed_cycles, lost = jiffies - wall_jiffies;
-	unsigned long now = ia64_get_itc(), last_tick;
+	unsigned long elapsed_cycles;
+	unsigned long now, last_tick;
 
+	/* 
+	 * itm_next is the next timer tick
+	 * itm_delta is the time between timer ticks
+	 * wall_jiffies are timer ticks the timer interrupt hasn't 
+	 * added to jiffies yet.
+	 *
+	 *    itm_delta      itm_delta
+	 * |--------------|---------------|
+	 * jiffies        wall_jiffies    itm_next
+	 *
+	 * (wall_jiffies - jiffies)*itm_delta = ITC ticks between jiffies and wall_jiffies
+	 * itm_next - itm_delta = ITC at wall_jiffies
+	 * last_tick = ITC at wall_jiffies - ITC ticks between jiffies and wall_jiffies 
+	 * elapsed ITC ticks since jiffies updated = ITC now - last_tick
+	 */
 	last_tick = (cpu_data(TIME_KEEPER_ID)->itm_next
-		     - (lost + 1)*cpu_data(TIME_KEEPER_ID)->itm_delta);
+		     - (jiffies - wall_jiffies + 1)*cpu_data(TIME_KEEPER_ID)->itm_delta);
+	
+	/*
+	 * get now after last_tick to avoid race condition where
+	 * itm_next might be updated.
+	 */
+	now = ia64_get_itc();
 
 	if (unlikely((long) (now - last_tick) < 0)) {
 		printk(KERN_ERR "CPU %d: now < last_tick (now=0x%lx,last_tick=0x%lx)!\n",


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
@ 2003-10-10 16:42 ` David Mosberger
  2003-10-13  2:11 ` Ian Wienand
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: David Mosberger @ 2003-10-10 16:42 UTC (permalink / raw)
  To: linux-ia64

Ian,

Just a quick note (got to run): I think you correctly identified the
race causing the now < last_tick problem.  Purely from (bad) memory, I
think the problem was introduced when xtime_lock was converted from an
irq-safe spinlock to a seq-lock.  In theory, xtime_lock still protects
get_offset(), but the theory only holds as long as the seq-lock body
is "transactional" (no side-effects until read_seqretry() returns 0).
I think the source of the problem is that we consider the value
returned by get_offset() to be valid EVEN when read_seqretry() returns
1.  Because of that, we'll end up updating last_nsec_offset with a
potentially bad value.
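
A rough sketch of what a "transactional" reader body could look like, as a
single-threaded user-space model.  Nothing here is the kernel seqlock API:
seq_model, model_read_begin(), model_read_retry(), model_write() and
clock_model are hypothetical names, and the concurrent writer is simulated
by calling it in the middle of the read.

```c
#include <assert.h>

/* Hypothetical model of a sequence counter; not <linux/seqlock.h>. */
struct seq_model { unsigned seq; };

struct clock_model {
	struct seq_model lock;
	unsigned long offset;      /* data protected by the sequence counter */
	unsigned long last_offset; /* reader-side cache: the "side effect" */
};

static unsigned model_read_begin(const struct seq_model *s)
{
	return s->seq;
}

/* Nonzero if a writer was active at the start or ran since then. */
static int model_read_retry(const struct seq_model *s, unsigned start)
{
	return (start & 1) || s->seq != start;
}

static void model_write(struct clock_model *c, unsigned long v)
{
	assert((c->lock.seq & 1) == 0); /* no nested writers in this model */
	c->lock.seq++;                  /* odd: write in progress */
	c->offset = v;
	c->lock.seq++;                  /* even: write complete */
}

/* `interfere` is how many times the simulated writer fires mid-read. */
static unsigned long read_offset(struct clock_model *c, int interfere,
				 int *attempts)
{
	unsigned long snapshot;
	unsigned start;

	*attempts = 0;
	for (;;) {
		(*attempts)++;
		start = model_read_begin(&c->lock);
		snapshot = c->offset;   /* may be stale or inconsistent */
		if (interfere-- > 0)
			model_write(c, snapshot + 100);
		if (model_read_retry(&c->lock, start))
			continue;       /* discard snapshot, nothing committed */
		break;
	}
	c->last_offset = snapshot;      /* commit only a validated value */
	return snapshot;
}
```

The point of the sketch is the placement of the last_offset update: it sits
after the retry check, so a snapshot taken while the writer interfered is
thrown away without leaving any trace.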

	--david


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
  2003-10-10 16:42 ` David Mosberger
@ 2003-10-13  2:11 ` Ian Wienand
  2003-10-13 18:17 ` David Mosberger
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ian Wienand @ 2003-10-13  2:11 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

On Fri, Oct 10, 2003 at 09:42:35AM -0700, David Mosberger wrote:
> I think the source of the problem is that we consider the value
> returned by get_offset() to be valid EVEN when read_seqretry() returns
> 1.  Because of that, we'll end up updating last_nsec_offset with a
> potentially bad value.

Well, to my eyes the use of the xtime_lock in do_gettimeofday() looks
OK, but I guess what you are saying is that the message is moot --
xtime_lock protects everything itc_get_offset() needs, and
do_gettimeofday() has a read lock on xtime_lock and so reads the
offset again if something was updated underneath it.

-i


[-- Attachment #2: time.c.nomsg.diff --]
[-- Type: text/plain, Size: 1166 bytes --]

===== arch/ia64/kernel/time.c 1.35 vs edited =====
--- 1.35/arch/ia64/kernel/time.c	Wed Oct  8 12:53:38 2003
+++ edited/arch/ia64/kernel/time.c	Mon Oct 13 11:54:05 2003
@@ -65,8 +65,11 @@
 }
 
 /*
- * Return the number of nano-seconds that elapsed since the last update to jiffy.  The
- * xtime_lock must be at least read-locked when calling this routine.
+ * Return the number of nano-seconds that elapsed since the last
+ * update to jiffy.  It is quite possible that the timer interrupt
+ * will interrupt this and result in a race for any of jiffies,
+ * wall_jiffies or itm_next.  Thus, the xtime_lock must be at least
+ * read-locked when calling this routine.
  */
 unsigned long
 itc_get_offset (void)
@@ -77,11 +80,6 @@
 	last_tick = (cpu_data(TIME_KEEPER_ID)->itm_next
 		     - (lost + 1)*cpu_data(TIME_KEEPER_ID)->itm_delta);
 
-	if (unlikely((long) (now - last_tick) < 0)) {
-		printk(KERN_ERR "CPU %d: now < last_tick (now=0x%lx,last_tick=0x%lx)!\n",
-		       smp_processor_id(), now, last_tick);
-		return last_nsec_offset;
-	}
 	elapsed_cycles = now - last_tick;
 	return (elapsed_cycles*local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CYC_SHIFT;
 }


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
  2003-10-10 16:42 ` David Mosberger
  2003-10-13  2:11 ` Ian Wienand
@ 2003-10-13 18:17 ` David Mosberger
  2003-10-13 23:06 ` Ian Wienand
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: David Mosberger @ 2003-10-13 18:17 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Mon, 13 Oct 2003 12:11:52 +1000, Ian Wienand <ianw@gelato.unsw.edu.au> said:

  Ian> Well, to my eyes the use of the xtime_lock in do_gettimeofday() looks
  Ian> OK, but I guess what you are saying is that the message is moot --
  Ian> xtime_lock protects everything itc_get_offset() needs, and
  Ian> do_gettimeofday() has a read lock on xtime_lock and so reads the
  Ian> offset again if something was updated underneath it.

You do realize that xtime_lock is NOT a lock at all?  Seqlock is a
scheme for lock-free synchronization.  Readers and writers will run
concurrently and the only guarantee that you get is that if a writer
interfered with a reader, the reader will retry its operation (but of
course, this means that the reader occasionally will be seeing
inconsistent data).
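
As an illustration of a reader seeing inconsistent data, here is a small
single-threaded model.  The names time_model, writer_update() and
torn_read() are made up for this sketch, and the "concurrent" writer is
simply called between the reader's two loads.

```c
#include <assert.h>

/* Hypothetical model: two fields that the writer always updates together. */
struct time_model {
	unsigned seq;
	long sec;
	long nsec;
};

static void writer_update(struct time_model *t, long sec, long nsec)
{
	assert((t->seq & 1) == 0); /* no writer already active */
	t->seq++;                  /* odd: update in progress */
	t->sec = sec;
	t->nsec = nsec;
	t->seq++;                  /* even: update complete */
}

/*
 * Read sec, let the simulated writer run, then read nsec.
 * Returns nonzero when the snapshot has to be retried.
 */
static int torn_read(struct time_model *t, long *sec, long *nsec)
{
	unsigned start = t->seq;

	*sec = t->sec;
	writer_update(t, t->sec + 1, 0); /* simulated concurrent writer */
	*nsec = t->nsec;
	return (start & 1) || t->seq != start;
}
```

Starting from (sec = 10, nsec = 999999999), the reader ends up holding
(10, 0), a pair the writer never published.  The sequence check reports the
interference, so the reader knows to throw the pair away and retry; it does
not prevent the mixed pair from being read in the first place.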

	--david


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
                   ` (2 preceding siblings ...)
  2003-10-13 18:17 ` David Mosberger
@ 2003-10-13 23:06 ` Ian Wienand
  2003-10-14  5:23 ` David Mosberger
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ian Wienand @ 2003-10-13 23:06 UTC (permalink / raw)
  To: linux-ia64

On Mon, Oct 13, 2003 at 11:17:06AM -0700, David Mosberger wrote:
> You do realize that xtime_lock is NOT a lock at all?  Seqlock is a
> scheme for lock-free synchronization.

That was my point, sorry.

do_gettimeofday() does 

 while (1) {
                seq = read_seqbegin(&xtime_lock);
                {
                        old = last_nsec_offset;
                        offset = time_interpolator_get_offset();
                        sec = xtime.tv_sec;
                        nsec = xtime.tv_nsec;
                }
                if (unlikely(read_seqretry(&xtime_lock, seq)))
                        continue;
  ... and so on ...
 }

Previously, if that read_seqbegin was some kind of irq-safe lock, then
time_interpolator_get_offset() (which is itc_get_offset()) should
never have been interrupted (especially by the timer interrupt), and
the warning message (now < last_tick) meant something was wrong.
With lock-free synchronisation, it's probable that itc_get_offset()
will be interrupted every now and then, but do_gettimeofday() will keep
retrying it until read_seqretry() informs it that it read the right
values.

So the warning message in itc_get_offset isn't really needed?

-i


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
                   ` (3 preceding siblings ...)
  2003-10-13 23:06 ` Ian Wienand
@ 2003-10-14  5:23 ` David Mosberger
  2003-10-14  5:53 ` Ian Wienand
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: David Mosberger @ 2003-10-14  5:23 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Tue, 14 Oct 2003 09:06:21 +1000, Ian Wienand <ianw@gelato.unsw.edu.au> said:

  Ian> That was my point, sorry.

  Ian> do_gettimeofday() does

  Ian>  while (1) {
  Ian>                seq = read_seqbegin(&xtime_lock);
  Ian>                {
  Ian>                        old = last_nsec_offset;
  Ian>                        offset = time_interpolator_get_offset();
  Ian>                        sec = xtime.tv_sec;
  Ian>                        nsec = xtime.tv_nsec;
  Ian>                }
  Ian>                if (unlikely(read_seqretry(&xtime_lock, seq)))
  Ian>                        continue;
  Ian>  ... and so on ...
  Ian> }

  Ian> Previously, if that read_seqbegin was some kind of irq-safe lock, then
  Ian> time_interpolator_get_offset() (which is itc_get_offset()) should
  Ian> never have been interrupted (especially by the timer interrupt), and
  Ian> the warning message (now < last_tick) meant something was wrong.
  Ian> With lock-free synchronisation, it's probable that itc_get_offset()
  Ian> will be interrupted every now and then, but do_gettimeofday() will keep
  Ian> retrying it until read_seqretry() informs it that it read the right
  Ian> values.

  Ian> So the warning message in itc_get_offset isn't really needed?

Hmmh, I seem to have misremembered the code.  I thought we updated
last_nsec_offset _before_ read_seqretry(), but that's not the case.  I
think you may be right that we can simply delete the (bogus)
consistency-check.  May want to add a comment about that, though.
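
For reference, a user-space sketch of the two pieces of arithmetic involved:
the signed comparison used by the check, and the fixed-point
cycles-to-nanoseconds conversion that follows it.  The helper names are
hypothetical, and the shift of 30 is an assumed value chosen for the
example, not taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_CYC_SHIFT 30 /* assumed fixed-point shift for this sketch */

/* Scale factor: nanoseconds per cycle, as a fixed-point number. */
static uint64_t nsec_per_cyc_for(uint64_t hz)
{
	assert(hz > 0);
	return (UINT64_C(1000000000) << NSEC_PER_CYC_SHIFT) / hz;
}

/* Same shape as (elapsed_cycles * nsec_per_cyc) >> SHIFT in the patch. */
static uint64_t cycles_to_nsec(uint64_t cycles, uint64_t nsec_per_cyc)
{
	return (cycles * nsec_per_cyc) >> NSEC_PER_CYC_SHIFT;
}

/*
 * Signed view of the unsigned difference: true when `now` is behind
 * `last_tick`, and still correct if the counter wrapped in between.
 */
static int went_backwards(uint64_t now, uint64_t last_tick)
{
	return (int64_t)(now - last_tick) < 0;
}
```

Without the check, a snapshot with now < last_tick yields a very large
unsigned elapsed_cycles.  In the do_gettimeofday() loop quoted above, such a
value would come from a read that raced with the timer interrupt, and the
read_seqretry() test discards it before it is used.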

	--david


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
                   ` (4 preceding siblings ...)
  2003-10-14  5:23 ` David Mosberger
@ 2003-10-14  5:53 ` Ian Wienand
  2003-10-14 16:58 ` David Mosberger
  2003-10-14 23:05 ` Ian Wienand
  7 siblings, 0 replies; 9+ messages in thread
From: Ian Wienand @ 2003-10-14  5:53 UTC (permalink / raw)
  To: linux-ia64

On Mon, Oct 13, 2003 at 10:23:39PM -0700, David Mosberger wrote:
> consistency-check.  May want to add a comment about that, though.

A few messages up I added something like 

+ * Return the number of nano-seconds that elapsed since the last
+ * update to jiffy.  It is quite possible that the timer interrupt
+ * will interrupt this and result in a race for any of jiffies,
+ * wall_jiffies or itm_next.  Thus, the xtime_lock must be at least
+ * read-locked when calling this routine.

although read-locked should probably be 'read synchronised' or
something to avoid confusion.

-i


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
                   ` (5 preceding siblings ...)
  2003-10-14  5:53 ` Ian Wienand
@ 2003-10-14 16:58 ` David Mosberger
  2003-10-14 23:05 ` Ian Wienand
  7 siblings, 0 replies; 9+ messages in thread
From: David Mosberger @ 2003-10-14 16:58 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Tue, 14 Oct 2003 15:53:15 +1000, Ian Wienand <ianw@gelato.unsw.edu.au> said:

  Ian> On Mon, Oct 13, 2003 at 10:23:39PM -0700, David Mosberger
  Ian> wrote:
  >> consistency-check.  May want to add a comment about that, though.

  Ian> A few messages up I added something like

  Ian> + * Return the number of nano-seconds that elapsed since the last
  Ian> + * update to jiffy.  It is quite possible that the timer interrupt
  Ian> + * will interrupt this and result in a race for any of jiffies,
  Ian> + * wall_jiffies or itm_next.  Thus, the xtime_lock must be at least
  Ian> + * read-locked when calling this routine.

  Ian> although read-locked should probably be 'read synchronised' or
  Ian> something to avoid confusion.

Sounds fine to me.  Do you want to send me a complete patch so you'll
get the proper credit?

Thanks,

	--david


* Re: [PATCH_TAKE_2] now < last_tick problem
  2003-10-10  4:13 [PATCH_TAKE_2] now < last_tick problem Ian Wienand
                   ` (6 preceding siblings ...)
  2003-10-14 16:58 ` David Mosberger
@ 2003-10-14 23:05 ` Ian Wienand
  7 siblings, 0 replies; 9+ messages in thread
From: Ian Wienand @ 2003-10-14 23:05 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 148 bytes --]

On Tue, Oct 14, 2003 at 09:58:14AM -0700, David Mosberger wrote:
> Sounds fine to me.  Do you want to send me a complete patch

attached

thanks,
-i

[-- Attachment #2: time.c.take3.diff --]
[-- Type: text/plain, Size: 1221 bytes --]

===== arch/ia64/kernel/time.c 1.35 vs edited =====
--- 1.35/arch/ia64/kernel/time.c	Wed Oct  8 12:53:38 2003
+++ edited/arch/ia64/kernel/time.c	Wed Oct 15 08:54:31 2003
@@ -65,8 +65,12 @@
 }
 
 /*
- * Return the number of nano-seconds that elapsed since the last update to jiffy.  The
- * xtime_lock must be at least read-locked when calling this routine.
+ * Return the number of nano-seconds that elapsed since the last
+ * update to jiffy.  It is quite possible that the timer interrupt
+ * will interrupt this and result in a race for any of jiffies,
+ * wall_jiffies or itm_next.  Thus, the xtime_lock must be at least
+ * read synchronised when calling this routine (see do_gettimeofday()
+ * below for an example).
  */
 unsigned long
 itc_get_offset (void)
@@ -77,11 +81,6 @@
 	last_tick = (cpu_data(TIME_KEEPER_ID)->itm_next
 		     - (lost + 1)*cpu_data(TIME_KEEPER_ID)->itm_delta);
 
-	if (unlikely((long) (now - last_tick) < 0)) {
-		printk(KERN_ERR "CPU %d: now < last_tick (now=0x%lx,last_tick=0x%lx)!\n",
-		       smp_processor_id(), now, last_tick);
-		return last_nsec_offset;
-	}
 	elapsed_cycles = now - last_tick;
 	return (elapsed_cycles*local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CYC_SHIFT;
 }
