linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Bharata B Rao <bharata.rao@gmail.com>,
	Li Zefan <lizf@cn.fujitsu.com>, Ingo Molnar <mingo@elte.hu>,
	Paul Menage <menage@google.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] cpuacct: add a branch prediction
Date: Thu, 26 Feb 2009 08:45:09 -0800
Message-ID: <20090226164509.GB6634@linux.vnet.ibm.com>
In-Reply-To: <145d0010d65060bb089d5a87e06cbd0d.squirrel@webmail-b.css.fujitsu.com>

On Thu, Feb 26, 2009 at 09:06:24PM +0900, KAMEZAWA Hiroyuki wrote:
> Peter Zijlstra wrote:
> > On Thu, 2009-02-26 at 20:17 +0900, KAMEZAWA Hiroyuki wrote:
> >> Peter Zijlstra wrote:
> >> > On Thu, 2009-02-26 at 19:28 +0900, KAMEZAWA Hiroyuki wrote:
> >> >
> >> >> Taking hierarchy mutex while reading will make read-side stable.
> >> >
> >> > We're talking about scheduling here; taking a mutex to stop scheduling
> >> > won't work, nor will it be acceptable to use anything that will.
> >> >
> >> No mutex is necessary, anyway.
> >> The hierarchy-walker function works fine under the RCU read lock,
> >> if small jitter is allowed.
> >
> > Right, should be doable -- and looking at the code, we have this
> > horrible 32 bit exception in there that locks the rq in order to read
> > the 64bit value.
> >
> > Would be grand to get rid of that; how bad would it be for userspace to
> > get the occasional fubarred value?
> >
> From the point of view of user support, if a terribly broken value is
> reported, it will be a user incident and annoy me (us) ;)
> 
> I'd like to get rid of rq->lock, too.  Hmm, could some routine like
> atomic64_read() help here?  (But I don't want to use atomic_t here.)

atomic64_read() will not help you on a 32-bit machine.  Here is the
sequence of events that will cause the aforementioned user incidents and
consequent annoyance:

o	The value of the counter is (2^32)-1, or 0xffffffff.

o	CPU 0 reads the high-order 32 bits of the counter, getting zero.

o	CPU 1 increments the low-order 32 bits of the counter, resulting
	in zero, but notes that there is a carry out of this field.

o	CPU 0 reads the low-order 32 bits of the counter, getting zero.

o	CPU 1 increments the high-order 32 bits of the counter, so that
	the new value of the counter is 2^32, or 0x100000000.

So CPU 0 gets a value of zero, which is -way- off from both the old
value (0xffffffff) and the new one (0x100000000).
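
For illustration only, the interleaving above can be replayed in a single
thread.  This is a rough userspace sketch, not kernel code: struct
naive_counter and replay_torn_read() are made-up names, and the two "CPUs"
are simply statements placed in the order listed above.

```c
#include <stdint.h>

/* Hypothetical 64-bit counter kept as two 32-bit halves, as a 32-bit
 * machine would access it.  The name is illustrative only. */
struct naive_counter {
	uint32_t hi;
	uint32_t lo;
};

/* Replays the sequence of events in one thread and returns the value
 * that "CPU 0" assembles from its two reads. */
static uint64_t replay_torn_read(void)
{
	struct naive_counter c = { .hi = 0, .lo = 0xffffffffu };
	uint32_t seen_hi, seen_lo;

	seen_hi = c.hi;		/* CPU 0 reads the high half: 0 */
	c.lo++;			/* CPU 1 bumps the low half, which wraps to 0 */
	seen_lo = c.lo;		/* CPU 0 reads the low half: 0 */
	c.hi++;			/* CPU 1 propagates the carry: hi becomes 1 */

	return ((uint64_t)seen_hi << 32) | seen_lo;
}
```

replay_torn_read() returns 0, although the counter never held a value
below 0xffffffff.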

The usual trick is something like the following for counter read:

1.	Read the high-order 32 bits of the counter.

2.	Do a memory barrier, smp_mb().

3.	Read the low-order 32 bits of the counter.

4.	Do another memory barrier, again smp_mb().

5.	Read the high-order 32 bits of the counter again.

	If it is the same as the value obtained in step 1 (or the previous
	execution of step 5), then we are done.  (This works even in case
	of complete 64-bit overflow, though we should be very lucky to
	live that long!)  Otherwise, go to step 2.
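
A minimal sketch of the read steps above, assuming a userspace build
with C11 atomics.  atomic_thread_fence(memory_order_seq_cst) stands in
for smp_mb(), and struct split_counter and split_counter_read() are
hypothetical names, not anything from sched.c.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical split counter; the layout is an assumption. */
struct split_counter {
	_Atomic uint32_t hi;
	_Atomic uint32_t lo;
};

static uint64_t split_counter_read(struct split_counter *c)
{
	uint32_t hi, lo, hi_again;

	/* Step 1: read the high-order half. */
	hi_again = atomic_load_explicit(&c->hi, memory_order_relaxed);
	do {
		hi = hi_again;
		/* Step 2: full barrier (stand-in for smp_mb()). */
		atomic_thread_fence(memory_order_seq_cst);
		/* Step 3: read the low-order half. */
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		/* Step 4: another full barrier. */
		atomic_thread_fence(memory_order_seq_cst);
		/* Step 5: re-read the high-order half, retry on change. */
		hi_again = atomic_load_explicit(&c->hi, memory_order_relaxed);
	} while (hi_again != hi);

	return ((uint64_t)hi << 32) | lo;
}
```

On a retry the loop reuses the value from the previous step 5 rather
than re-reading, which matches the "go to step 2" wording.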

But it is also necessary to modify the counter update:

1.	Increment the low-order 32 bits of the counter.  If no overflow
	occurred, we are done; otherwise, continue through this sequence
	of steps.

2.	Do a memory barrier, smp_mb().

3.	Increment the high-order 32 bits of the counter.

How to detect overflow in step 1?  Well, if we are incrementing, we can
just test for the new value being zero.  Otherwise, if we are adding
a 32-bit number, overflow occurred if the new value of the low-order
32 bits of the counter is less than the old value (but make sure that
the comparison is unsigned!).
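
The update steps and both overflow tests might look roughly like this.
Again a userspace sketch with C11 atomics in place of smp_mb(); the
struct and function names are hypothetical, and the code assumes only
one updater touches a given counter at a time.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical split counter; the layout is an assumption. */
struct split_counter {
	_Atomic uint32_t hi;
	_Atomic uint32_t lo;
};

/* Steps 2 and 3: barrier, then bump the high-order half. */
static void split_counter_carry(struct split_counter *c)
{
	uint32_t hi;

	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
	atomic_store_explicit(&c->hi, hi + 1, memory_order_relaxed);
}

/* Increment: overflow occurred if the new low half is zero. */
static void split_counter_inc(struct split_counter *c)
{
	uint32_t lo = atomic_load_explicit(&c->lo, memory_order_relaxed) + 1;

	atomic_store_explicit(&c->lo, lo, memory_order_relaxed);
	if (lo == 0)
		split_counter_carry(c);
}

/* Add a 32-bit value: overflow occurred if the sum is less than the
 * old value.  Both operands are uint32_t, so the compare is unsigned. */
static void split_counter_add32(struct split_counter *c, uint32_t delta)
{
	uint32_t old = atomic_load_explicit(&c->lo, memory_order_relaxed);
	uint32_t sum = old + delta;	/* wraps modulo 2^32 */

	atomic_store_explicit(&c->lo, sum, memory_order_relaxed);
	if (sum < old)
		split_counter_carry(c);
}
```

One thing this sketch does not hide: a reader whose whole loop runs
between the low-half store and the carry sees the same high half twice,
and so returns the wrapped low half with the old high half.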

This all assumes that you are adding a 32-bit quantity to the counter.
Adding 64-bit values is not much harder:  add the low-order 32 bits as
above, then add the high-order 32 bits plus any carry.
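
One possible shape for the 64-bit case, under the same assumptions as
before (userspace C11 atomics instead of smp_mb(), a single updater,
hypothetical names): add the low halves, note the carry, then add the
high half of the delta plus that carry.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical split counter; the layout is an assumption. */
struct split_counter {
	_Atomic uint32_t hi;
	_Atomic uint32_t lo;
};

static void split_counter_add64(struct split_counter *c, uint64_t delta)
{
	uint32_t old = atomic_load_explicit(&c->lo, memory_order_relaxed);
	uint32_t lo = old + (uint32_t)delta;	/* wraps modulo 2^32 */
	uint32_t carry = lo < old;		/* unsigned compare */
	uint32_t hi_delta = (uint32_t)(delta >> 32) + carry;
	uint32_t hi;

	atomic_store_explicit(&c->lo, lo, memory_order_relaxed);
	if (hi_delta == 0)
		return;		/* high half unchanged modulo 2^32 */

	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
	atomic_store_explicit(&c->hi, hi + hi_delta, memory_order_relaxed);
}
```

Unlike the 32-bit case, the high half can now change by more than one
per update.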

Does this approach work for you?

							Thanx, Paul

> > But aside from that, the cpu controller itself is also summing directly
> > up the hierarchy, so cpuacct doing the same doesn't seem odd.
> >
> I'll post some idea if I can think of something reasonable.
> But I tend to hesitate to modify sched.c ;)
> 
> Thanks,
> -Kame
> 
