* [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
@ 2012-11-02 16:01 Shan Wei
  2012-11-02 17:46 ` Christoph Lameter
  2012-11-02 18:10 ` Paul E. McKenney
  0 siblings, 2 replies; 9+ messages in thread
From: Shan Wei @ 2012-11-02 16:01 UTC (permalink / raw)
  To: dipankar, paulmck, Kernel-Maillist, cl, Shan Wei

From: Shan Wei <davidshan@tencent.com>

Signed-off-by: Shan Wei <davidshan@tencent.com>
---
 kernel/rcutree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 74df86b..441b945 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	struct rcu_node *rnp_old = NULL;
 
 	/* Funnel through hierarchy to reduce memory contention. */
-	rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
+	rnp = __this_cpu_read(rsp->rda->mynode);
 	for (; rnp != NULL; rnp = rnp->parent) {
 		ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
 		      !raw_spin_trylock(&rnp->fqslock);
-- 
1.7.1




* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-02 16:01 [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id()) Shan Wei
@ 2012-11-02 17:46 ` Christoph Lameter
  2012-11-02 18:10 ` Paul E. McKenney
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2012-11-02 17:46 UTC (permalink / raw)
  To: Shan Wei; +Cc: dipankar, paulmck, Kernel-Maillist

On Sat, 3 Nov 2012, Shan Wei wrote:

>
>  	/* Funnel through hierarchy to reduce memory contention. */
> -	rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> +	rnp = __this_cpu_read(rsp->rda->mynode);
>  	for (; rnp != NULL; rnp = rnp->parent) {

Reviewed-by: Christoph Lameter <cl@linux.com>



* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-02 16:01 [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id()) Shan Wei
  2012-11-02 17:46 ` Christoph Lameter
@ 2012-11-02 18:10 ` Paul E. McKenney
  2012-11-02 20:19   ` Christoph Lameter
  1 sibling, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2012-11-02 18:10 UTC (permalink / raw)
  To: Shan Wei; +Cc: dipankar, Kernel-Maillist, cl

On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> From: Shan Wei <davidshan@tencent.com>
> 
> Signed-off-by: Shan Wei <davidshan@tencent.com>
> ---
>  kernel/rcutree.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 74df86b..441b945 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
>  	struct rcu_node *rnp_old = NULL;
> 
>  	/* Funnel through hierarchy to reduce memory contention. */
> -	rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> +	rnp = __this_cpu_read(rsp->rda->mynode);

OK, I'll bite...  Why this instead of:

	rnp = __this_cpu_read(rsp->rda)->mynode;

							Thanx, Paul

>  	for (; rnp != NULL; rnp = rnp->parent) {
>  		ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
>  		      !raw_spin_trylock(&rnp->fqslock);
> -- 
> 1.7.1



* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-02 18:10 ` Paul E. McKenney
@ 2012-11-02 20:19   ` Christoph Lameter
  2012-11-03  9:19     ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2012-11-02 20:19 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Shan Wei, dipankar, Kernel-Maillist

On Fri, 2 Nov 2012, Paul E. McKenney wrote:

> On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> > From: Shan Wei <davidshan@tencent.com>
> >
> > Signed-off-by: Shan Wei <davidshan@tencent.com>
> > ---
> >  kernel/rcutree.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 74df86b..441b945 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
> >  	struct rcu_node *rnp_old = NULL;
> >
> >  	/* Funnel through hierarchy to reduce memory contention. */
> > -	rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> > +	rnp = __this_cpu_read(rsp->rda->mynode);
>
> OK, I'll bite...  Why this instead of:
>
> 	rnp = __this_cpu_read(rsp->rda)->mynode;

Because this_cpu_read fetches a data word from an address. The address is
relocated using a segment prefix (which contains the offset of the
current per cpu area).

And the address needed here is the address of the mynode field
within a structure that has a per cpu address.
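
A userspace sketch of the relocation described above. It is an illustration
only, not kernel code: the per-cpu area is modelled as a plain array, the
per-cpu offset as an array stride rather than a segment register, and every
model_* name is hypothetical. It shows that relocating the address of the
mynode field and loading from it gives the same result as relocating the
structure base first and then dereferencing ->mynode.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4			/* assumption: four modelled CPUs */

struct model_node { int id; };		/* stand-in for struct rcu_node */

struct model_data {			/* stand-in for struct rcu_data */
	unsigned long flags;
	struct model_node *mynode;
};

/* One copy per modelled CPU; copy 0 plays the role of the per-cpu base. */
static struct model_data model_area[MODEL_NR_CPUS];
static struct model_node model_nodes[MODEL_NR_CPUS];

/* Stand-in for the per-cpu offset that %gs supplies on x86-64. */
static size_t model_cpu_offset(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (size_t)cpu * sizeof(struct model_data);
}

/* per_cpu_ptr() style: relocate the structure base, dereference later. */
static struct model_data *model_per_cpu_ptr(struct model_data *base, int cpu)
{
	return (struct model_data *)((char *)base + model_cpu_offset(cpu));
}

/*
 * __this_cpu_read() style: take the address of the field inside the
 * base copy, add the per-cpu offset, and load in a single step.
 */
static struct model_node *model_read_mynode(struct model_data *base, int cpu)
{
	char *field = (char *)base + offsetof(struct model_data, mynode);

	return *(struct model_node **)(field + model_cpu_offset(cpu));
}

/* Give every modelled CPU its own node so the copies are distinguishable. */
static void model_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_nodes[cpu].id = cpu * 10;
		model_area[cpu].flags = 0;
		model_area[cpu].mynode = &model_nodes[cpu];
	}
}
```

After model_init(), model_read_mynode(model_area, cpu) and
model_per_cpu_ptr(model_area, cpu)->mynode should return the same pointer
for every modelled CPU, because the field offset and the per-cpu offset are
simply added in a different order.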


* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-02 20:19   ` Christoph Lameter
@ 2012-11-03  9:19     ` Paul E. McKenney
  2012-11-04 10:38       ` Shan Wei
  2012-11-05 15:23       ` Christoph Lameter
  0 siblings, 2 replies; 9+ messages in thread
From: Paul E. McKenney @ 2012-11-03  9:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Shan Wei, dipankar, Kernel-Maillist

On Fri, Nov 02, 2012 at 08:19:04PM +0000, Christoph Lameter wrote:
> On Fri, 2 Nov 2012, Paul E. McKenney wrote:
> 
> > On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> > > From: Shan Wei <davidshan@tencent.com>
> > >
> > > Signed-off-by: Shan Wei <davidshan@tencent.com>
> > > ---
> > >  kernel/rcutree.c |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > index 74df86b..441b945 100644
> > > --- a/kernel/rcutree.c
> > > +++ b/kernel/rcutree.c
> > > @@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
> > >  	struct rcu_node *rnp_old = NULL;
> > >
> > >  	/* Funnel through hierarchy to reduce memory contention. */
> > > -	rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> > > +	rnp = __this_cpu_read(rsp->rda->mynode);
> >
> > OK, I'll bite...  Why this instead of:
> >
> > 	rnp = __this_cpu_read(rsp->rda)->mynode;
> 
> Because this_cpu_read fetches a data word from an address. The address is
> relocated using a segment prefix (which contains the offset of the
> current per cpu area).
> 
> And the address needed here is the address of the mynode field
> within a structure that has a per cpu address.

OK, I do understand why it happens to work.  My question is instead why
it is considered a good idea.  After all, it is the ->rda field that is
marked __percpu, not the ->mynode field.  So in the interest of
mechanical checking and general readability, it seems to me that it
would be way better to apply __this_cpu_read() to rsp->rda rather than
to rsp->rda->mynode.

							Thanx, Paul



* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-03  9:19     ` Paul E. McKenney
@ 2012-11-04 10:38       ` Shan Wei
  2012-11-05 14:49         ` Christoph Lameter
  2012-11-05 15:23       ` Christoph Lameter
  1 sibling, 1 reply; 9+ messages in thread
From: Shan Wei @ 2012-11-04 10:38 UTC (permalink / raw)
  To: paulmck; +Cc: Christoph Lameter, dipankar, Kernel-Maillist

Paul E. McKenney said, at 2012/11/3 17:19:
> OK, I do understand why it happens to work.  My question is instead why
> it is considered a good idea.  

Maybe objdump gives the answer.
Using __this_cpu_read to read a member of a per-cpu variable
saves two instructions on x86-64.


*test code:* 
struct eater_state {
        u32 state;
        struct eater __percpu *eater_info;
};

struct eater {
        char name[4];
        u32 age;
};

static u32 test_func(struct eater_state *tstas)
{
        struct eater *aeater;

        //aeater = __this_cpu_ptr(tstas->eater_info);   <-----------------1
        //return aeater->age;
        return  __this_cpu_read(tstas->eater_info->age); <-----------------2
}

static int __init demo_init(void)
{
        int ret = 0 ;
        int age;
        struct eater_state as;
        struct eater david;

        as.state = 1;
        as.eater_info = &david;

        age = test_func(&as);

        return ret;
}


__this_cpu_ptr <-----------------1
0000000000000000 <init_module>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   48 8d 45 f0             lea    -0x10(%rbp),%rax
   c:   65 48 03 04 25 00 00 00 00      add    %gs:0x0,%rax
  15:   31 c0                   xor    %eax,%eax
  17:   c9                      leaveq 
  18:   c3                      retq   


 __this_cpu_read<-----------------2
0000000000000000 <init_module>:
   0:   55                      push   %rbp
   1:   31 c0                   xor    %eax,%eax
   3:   48 89 e5                mov    %rsp,%rbp
   6:   48 83 ec 10             sub    $0x10,%rsp
   a:   c9                      leaveq 
   b:   c3                      retq   




* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-04 10:38       ` Shan Wei
@ 2012-11-05 14:49         ` Christoph Lameter
       [not found]           ` <CAPYxyx+eJUWxDJrbOHVRtCchszmj8+BgSkNhpH3gGBJK87OikA@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2012-11-05 14:49 UTC (permalink / raw)
  To: Shan Wei; +Cc: paulmck, dipankar, Kernel-Maillist

On Sun, 4 Nov 2012, Shan Wei wrote:

>  __this_cpu_read<-----------------2
> 0000000000000000 <init_module>:
>    0:   55                      push   %rbp
>    1:   31 c0                   xor    %eax,%eax
>    3:   48 89 e5                mov    %rsp,%rbp
>    6:   48 83 ec 10             sub    $0x10,%rsp
>    a:   c9                      leaveq
>    b:   c3                      retq

?? There should be an operation using gs: here. This does not look
like code that includes a __this_cpu_read().



* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
  2012-11-03  9:19     ` Paul E. McKenney
  2012-11-04 10:38       ` Shan Wei
@ 2012-11-05 15:23       ` Christoph Lameter
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2012-11-05 15:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Shan Wei, dipankar, Kernel-Maillist

On Sat, 3 Nov 2012, Paul E. McKenney wrote:

> OK, I do understand why it happens to work.  My question is instead why
> it is considered a good idea.  After all, it is the ->rda field that is
> marked __percpu, not the ->mynode field.  So in the interest of
> mechanical checking and general readability, it seems to me that it
> would be way better to apply __this_cpu_read() to rsp->rda rather than
> to rsp->rda->mynode.

mynode is part of the structure reached via rda.

Use on rsp->rda does not work since the offset of mynode must be added to
rda before a fetch relative to the current CPU's per-cpu address can be
done.

this_cpu_ptr relocates an address. this_cpu_read() relocates the address
and performs the fetch. If you want to operate on rda then you can only
use this_cpu_ptr. this_cpu_read() saves you instructions since it can
do the relocation and the fetch in one instruction.
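
The difference between the two helpers can be sketched in userspace C. This
is a model under stated assumptions, not the kernel implementation: the
sketch_* names are hypothetical, the structures borrow the eater layout from
the test code earlier in the thread, and a global variable stands in for the
per-cpu base that the gs segment supplies on x86-64. The pointer-style helper
only relocates and leaves the load to the caller; the read-style helper
relocates the address of one scalar field and loads it in the same step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 2		/* assumption: two modelled CPUs */

struct sketch_eater {
	char name[4];
	uint32_t age;
};

struct sketch_state {
	uint32_t state;
	struct sketch_eater *eater_info;	/* models a __percpu base pointer */
};

/* One copy per modelled CPU; copy 0 is the per-cpu base. */
static struct sketch_eater sketch_copies[SKETCH_NR_CPUS] = {
	{ "cp0", 30 },
	{ "cp1", 41 },
};

/* Stand-in for "which CPU am I on"; the kernel gets this from %gs. */
static int sketch_current_cpu;

static ptrdiff_t sketch_offset(void)
{
	assert(sketch_current_cpu >= 0 && sketch_current_cpu < SKETCH_NR_CPUS);
	return (ptrdiff_t)sketch_current_cpu *
	       (ptrdiff_t)sizeof(struct sketch_eater);
}

/* this_cpu_ptr() style: relocation only, the caller dereferences later. */
static struct sketch_eater *sketch_this_cpu_ptr(struct sketch_eater *base)
{
	return (struct sketch_eater *)((char *)base + sketch_offset());
}

/* this_cpu_read() style: relocate the field address and load at once. */
static uint32_t sketch_this_cpu_read_u32(uint32_t *field)
{
	return *(uint32_t *)((char *)field + sketch_offset());
}

/* Variant 1 from the test code: relocate the base, then read ->age. */
static uint32_t sketch_age_via_ptr(struct sketch_state *st)
{
	struct sketch_eater *e = sketch_this_cpu_ptr(st->eater_info);

	return e->age;
}

/* Variant 2 from the test code: hand the field address to the reader. */
static uint32_t sketch_age_via_read(struct sketch_state *st)
{
	return sketch_this_cpu_read_u32(&st->eater_info->age);
}
```

Both variants should return the same age for whichever CPU
sketch_current_cpu selects. In this model the read helper must be given the
address of the field inside the per-cpu structure; giving it the address of
st->eater_info itself would relocate an ordinary, non-per-cpu slot, which
matches the point above that the read form cannot be applied to rsp->rda.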



* Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())
       [not found]           ` <CAPYxyx+eJUWxDJrbOHVRtCchszmj8+BgSkNhpH3gGBJK87OikA@mail.gmail.com>
@ 2012-11-05 15:55             ` Christoph Lameter
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2012-11-05 15:55 UTC (permalink / raw)
  To: 单卫; +Cc: paulmck, dipankar, Kernel-Maillist

On Mon, 5 Nov 2012, 单卫 wrote:

> I guarantee that x86-64 doesn't use the gs register here. I ran the test again.
> Maybe there is some optimization for the __this_cpu_read call on x86-64; not
> sure.

There is no optimization that I know of unless the compiler eliminated the
__this_cpu_read completely. gs: is necessary to perform the implied
relocation in this_cpu_read().


