From: Eric Dumazet <eric.dumazet@gmail.com>
To: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	lkml <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@samba.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Anton Blanchard <anton@samba.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [RFC] time: xtime_lock is held too long
Date: Sat, 07 May 2011 01:00:35 +0200
Message-ID: <1304722835.2821.192.camel@edumazet-laptop>
In-Reply-To: <1304722000.20980.130.camel@work-vm>

On Friday, 06 May 2011 at 15:46 -0700, john stultz wrote:
> On Sat, 2011-05-07 at 00:30 +0200, Eric Dumazet wrote:
> > On Friday, 06 May 2011 at 13:24 -0700, john stultz wrote:
> > 
> > > So would the easier solution be to just break out timekeeper locking
> > > from the xtime_lock?
> > > 
> > > So basically we would just add a timekeeper.lock seqlock and use it to
> > > protect only the timekeeping code? We can still keep xtime_lock around
> > > for the tick/jiffies protection (well, until tglx kills jiffies :), but
> > > gettimeofday and friends wouldn't be blocked for so long.
> > > 
> > > That should be pretty straightforward now that the timekeeper data is
> > > completely static to timekeeping.c.
> > > 
> > 
> > Yes :)
> > 
> > I can see many CPUs entering tick_do_update_jiffies64() and all of
> > them calling write_seqlock(&xtime_lock);
> > 
> > Only the first one can perform the work, but all the others wait on
> > the spinlock, acquire it, change the seqcount, and realize they have
> > nothing to do...
> 
> Huh. So who is calling tick_do_update_jiffies64 in your case? I know
> tick_sched_timer and tick_nohz_handler check that tick_do_timer_cpu ==
> cpu to avoid exactly the thundering herd problem on the jiffies update.
> 
> There are other spots that call tick_do_update_jiffies64, but I thought
> those were rarer. So something else may be going wrong here.
> 


That I can answer:

echo function > /sys/kernel/debug/tracing/current_tracer

echo "do_timer calc_global_load second_overflow ktime_divns do_timestamp1 do_timestamp2 tick_do_update_jiffies64" >/sys/kernel/debug/tracing/set_ftrace_filter

echo 1 > /sys/kernel/debug/tracing/tracing_enabled 
usleep 2000000;
echo 0 > /sys/kernel/debug/tracing/tracing_enabled
cat /sys/kernel/debug/tracing/trace >TRACE


(I added do_timestamp1() just after write_seqlock() and do_timestamp2()
just before write_sequnlock().)
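
For reference, the instrumentation looks roughly like this (a sketch of
my local hack, not an exact patch; do_timestamp1()/do_timestamp2() are
empty dummies that exist only so the function tracer can mark the
boundaries of the locked region):

/* Empty markers: noinline and non-static so ftrace can trace them */
noinline void do_timestamp1(void) { }
noinline void do_timestamp2(void) { }

static void tick_do_update_jiffies64(ktime_t now)
{
	ktime_t delta;

	/* Quick check without holding xtime_lock */
	delta = ktime_sub(now, last_jiffies_update);
	if (delta.tv64 < tick_period.tv64)
		return;

	/* Reevaluate with xtime_lock held */
	write_seqlock(&xtime_lock);
	do_timestamp1();
	delta = ktime_sub(now, last_jiffies_update);
	if (delta.tv64 >= tick_period.tv64) {
		/* advance last_jiffies_update, call do_timer(), etc. */
	}
	do_timestamp2();
	write_sequnlock(&xtime_lock);
}

The resulting trace: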

          <idle>-0     [003]   920.355377: do_timestamp1 <-tick_do_update_jiffies64
          <idle>-0     [006]   920.355377: tick_do_update_jiffies64 <-tick_sched_timer
          <idle>-0     [003]   920.355378: do_timestamp2 <-tick_do_update_jiffies64
          <idle>-0     [000]   920.355657: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [000]   920.355660: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [006]   920.355663: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [006]   920.355666: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [003]   920.355781: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [003]   920.355783: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [004]   920.355796: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [004]   920.355797: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [000]   920.355808: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [004]   920.355933: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [000]   920.355934: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [004]   920.355935: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [000]   920.355935: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [006]   920.356093: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [006]   920.356095: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [006]   920.356198: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [000]   920.356202: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [000]   920.356211: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [002]   920.356319: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [002]   920.356321: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [000]   920.356372: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [004]   920.356372: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [006]   920.356372: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [002]   920.356372: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [003]   920.356372: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [004]   920.356373: do_timestamp1 <-tick_do_update_jiffies64
          <idle>-0     [004]   920.356373: do_timer <-tick_do_update_jiffies64
          <idle>-0     [004]   920.356374: calc_global_load <-do_timer
          <idle>-0     [004]   920.356374: do_timestamp2 <-tick_do_update_jiffies64
          <idle>-0     [006]   920.356375: do_timestamp1 <-tick_do_update_jiffies64
          <idle>-0     [006]   920.356376: do_timestamp2 <-tick_do_update_jiffies64
          <idle>-0     [000]   920.356377: do_timestamp1 <-tick_do_update_jiffies64
          <idle>-0     [000]   920.356377: do_timestamp2 <-tick_do_update_jiffies64
          <idle>-0     [002]   920.356379: do_timestamp1 <-tick_do_update_jiffies64
          <idle>-0     [002]   920.356379: do_timestamp2 <-tick_do_update_jiffies64
          <idle>-0     [003]   920.356380: do_timestamp1 <-tick_do_update_jiffies64
          <idle>-0     [003]   920.356380: do_timestamp2 <-tick_do_update_jiffies64
          <idle>-0     [002]   920.356515: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [002]   920.356517: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [002]   920.356607: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [002]   920.356608: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [000]   920.356716: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [000]   920.356718: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [006]   920.356720: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [006]   920.356723: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [004]   920.356808: tick_do_update_jiffies64 <-tick_check_idle
          <idle>-0     [004]   920.356810: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick


> 
> > Meanwhile, a reader must wait until all writers are finished, because
> > of the storm of seqcount changes.
> > 
> > The following patch helps. Of course, we might find out why so many
> > CPUs (on my 8-CPU machine!) are calling tick_do_update_jiffies64() at
> > the same time...
> > 
> > 
> > This is basically what I said in my first mail:
> > 
> > Separate logical sections to reduce windows where readers are blocked/spinning.
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index d5097c4..251b2fe 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -56,7 +56,7 @@ static void tick_do_update_jiffies64(ktime_t now)
> >  		return;
> > 
> >  	/* Reevalute with xtime_lock held */
> > -	write_seqlock(&xtime_lock);
> > +	spin_lock(&xtime_lock.lock);
> 
> Oof.. No, this is too ugly and really just abuses the seqlock structure.
> 

That was a hack/POC, of course, but we can clean up seqlock.h to provide
a clean abstraction: a seqlock_t should contain a seqcount_t plus a
spinlock_t.


diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e98cd2e..afc00f4 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -29,8 +29,12 @@
 #include <linux/spinlock.h>
 #include <linux/preempt.h>
 
-typedef struct {
+typedef struct seqcount {
 	unsigned sequence;
+} seqcount_t;
+
+typedef struct {
+	seqcount_t seq;
 	spinlock_t lock;
 } seqlock_t;
 
@@ -53,6 +57,22 @@ typedef struct {
 #define DEFINE_SEQLOCK(x) \
 		seqlock_t x = __SEQLOCK_UNLOCKED(x)
 
+/*
+ * Sequence counter only version assumes that callers are using their
+ * own mutexing.
+ */
+static inline void write_seqcount_begin(seqcount_t *s)
+{
+	s->sequence++;
+	smp_wmb();
+}
+
+static inline void write_seqcount_end(seqcount_t *s)
+{
+	smp_wmb();
+	s->sequence++;
+}
+
 /* Lock out other writers and update the count.
  * Acts like a normal spin_lock/unlock.
  * Don't need preempt_disable() because that is in the spin_lock already.
@@ -60,14 +80,12 @@ typedef struct {
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	++sl->sequence;
-	smp_wmb();
+	write_seqcount_begin(&sl->seq);
 }
 
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	smp_wmb();
-	sl->sequence++;
+	write_seqcount_end(&sl->seq);
 	spin_unlock(&sl->lock);
 }
 
@@ -75,10 +93,9 @@ static inline int write_tryseqlock(seqlock_t *sl)
 {
 	int ret = spin_trylock(&sl->lock);
 
-	if (ret) {
-		++sl->sequence;
-		smp_wmb();
-	}
+	if (ret)
+		write_seqcount_begin(&sl->seq);
+
 	return ret;
 }
 
@@ -88,7 +105,7 @@ static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
 	unsigned ret;
 
 repeat:
-	ret = sl->sequence;
+	ret = sl->seq.sequence;
 	smp_rmb();
 	if (unlikely(ret & 1)) {
 		cpu_relax();
@@ -107,7 +124,7 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	smp_rmb();
 
-	return unlikely(sl->sequence != start);
+	return unlikely(sl->seq.sequence != start);
 }
 
 
@@ -118,10 +135,6 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
  * after the write_seqcount_end().
  */
 
-typedef struct seqcount {
-	unsigned sequence;
-} seqcount_t;
-
 #define SEQCNT_ZERO { 0 }
 #define seqcount_init(x)	do { *(x) = (seqcount_t) SEQCNT_ZERO; } while (0)
 
@@ -204,22 +217,6 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
-static inline void write_seqcount_begin(seqcount_t *s)
-{
-	s->sequence++;
-	smp_wmb();
-}
-
-static inline void write_seqcount_end(seqcount_t *s)
-{
-	smp_wmb();
-	s->sequence++;
-}
-
 /**
  * write_seqcount_barrier - invalidate in-progress read-side seq operations
  * @s: pointer to seqcount_t


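With that split, the hack above becomes a legitimate pattern: a CPU that
loses the race takes the spinlock, sees there is nothing to do, and
releases it without ever touching the sequence count, so readers only
retry when jiffies really change. A rough sketch (simplified, dropping
the slow path for long timeouts, and assuming the seq/lock members are
exposed as in the patch):

static void tick_do_update_jiffies64(ktime_t now)
{
	ktime_t delta;

	spin_lock(&xtime_lock.lock);
	delta = ktime_sub(now, last_jiffies_update);
	if (delta.tv64 >= tick_period.tv64) {
		/* Only a real update perturbs readers */
		write_seqcount_begin(&xtime_lock.seq);
		last_jiffies_update = ktime_add(last_jiffies_update,
						tick_period);
		do_timer(1);
		write_seqcount_end(&xtime_lock.seq);
	}
	spin_unlock(&xtime_lock.lock);
}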

> If you really want to untangle what xtime_lock protects, you need to
> introduce a new lock (I suggest in the timekeeper structure) to protect
> the timekeeping data.
> 
> Then we can refine xtime_lock to also just protect the jiffies/tick
> management bits as well if needed.

For the moment I am trying to understand the code. It's quite complex
and uses a monolithic seqlock, defeating the power of seqlocks.

That was really my initial point, if you remember my mail...



