netdev.vger.kernel.org archive mirror
* [PATCH 0/9 v4] use efficient this_cpu_* helper
@ 2012-11-13  1:51 Shan Wei
  2012-11-15 14:19 ` Christoph Lameter
  0 siblings, 1 reply; 4+ messages in thread
From: Shan Wei @ 2012-11-13  1:51 UTC
  To: cl, David Miller, NetDev, Kernel-Maillist, Shan Wei

this_cpu_ptr() and this_cpu_read() are faster than per_cpu_ptr(p, smp_processor_id())
and reduce memory accesses.
The latter helper has to look up the offset for the current CPU,
which takes more assembler instructions, as the objdump output below shows.

this_cpu_ptr() relocates an address. this_cpu_read() relocates the address
and performs the fetch. If you want to operate on rda (defined as per_cpu),
then you can only use this_cpu_ptr(). this_cpu_read() saves more instructions,
since it can do the relocation and the fetch in one instruction.

per_cpu_ptr(p, smp_processor_id()):
  1e:   65 8b 04 25 00 00 00 00         mov    %gs:0x0,%eax
  26:   48 98                           cltq
  28:   31 f6                           xor    %esi,%esi
  2a:   48 c7 c7 00 00 00 00            mov    $0x0,%rdi
  31:   48 8b 04 c5 00 00 00 00         mov    0x0(,%rax,8),%rax
  39:   c7 44 10 04 14 00 00 00         movl   $0x14,0x4(%rax,%rdx,1)

this_cpu_ptr(p):
  1e:   65 48 03 14 25 00 00 00 00      add    %gs:0x0,%rdx
  27:   31 f6                           xor    %esi,%esi
  29:   c7 42 04 14 00 00 00            movl   $0x14,0x4(%rdx)
  30:   48 c7 c7 00 00 00 00            mov    $0x0,%rdi
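
The difference between the two listings can be modelled in plain userspace C.
The following is only a sketch, not kernel code: every model_* name is
hypothetical, the per-CPU areas are ordinary arrays, and two static variables
stand in for the values the kernel reads through %gs on x86-64. It shows that
the per_cpu_ptr() path first reads the CPU number and then indexes a table of
per-CPU bases, while the this_cpu_ptr() path adds a single per-CPU base, and
that both reach the same location.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for the model */
#define MODEL_SLOTS   8   /* ints per modelled per-CPU area */

/* One private area per modelled CPU. */
static int model_area[MODEL_NR_CPUS][MODEL_SLOTS];

/* Stand-in for the kernel's table of per-CPU offsets. */
static int *const model_base_table[MODEL_NR_CPUS] = {
    model_area[0], model_area[1], model_area[2], model_area[3],
};

/* Stand-ins for what the kernel reads through %gs on x86-64. */
static int model_cpu;                         /* current CPU number */
static int *model_this_base = model_area[0];  /* current CPU's base */

/* Pretend the task now runs on another CPU. */
static inline void model_switch_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_cpu = cpu;
    model_this_base = model_base_table[cpu];
}

static inline int model_smp_processor_id(void)
{
    return model_cpu;
}

/* Long path: take a CPU number, look up its base in the table, add the slot. */
static inline int *model_per_cpu_ptr(size_t slot, int cpu)
{
    assert(slot < MODEL_SLOTS && cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_base_table[cpu] + slot;
}

/* Short path: add the slot to the current CPU's base, no table lookup. */
static inline int *model_this_cpu_ptr(size_t slot)
{
    assert(slot < MODEL_SLOTS);
    return model_this_base + slot;
}

/* Relocation and fetch combined, in the spirit of this_cpu_read(). */
static inline int model_this_cpu_read(size_t slot)
{
    return *model_this_cpu_ptr(slot);
}
```

For example, after model_switch_cpu(2), the calls
model_per_cpu_ptr(1, model_smp_processor_id()) and model_this_cpu_ptr(1)
return the same address; the second form just skips the CPU-number read and
the table lookup.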



Changelog V4:
1. Read and write fields of struct rds_ib_cache_head using __this_cpu_* operations in the rds subsystem.
   See patch2.
2. Fix a bug in xfrm when reading a pointer. See patch3.
3. Avoid a type cast in patch7.

Changelog V3:
1. Use this_cpu_read to read members of per-cpu variables directly,
   dropping the this_cpu_ptr operation.
2. For the preemption-off and bottom-halves-off cases,
   use __this_cpu_read instead of this_cpu_read.

Changelog V2:
1. Use this_cpu_read directly instead of referencing a field of a per-cpu variable.
2. Patch5 about ftrace is dropped from this series.
3. Add new patch9 to replace the get_cpu; per_cpu_ptr; put_cpu sequence with a this_cpu_add operation.
4. For the preemption-disabled case, use __this_cpu_read instead.
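
Item 3 above replaces an open-coded get_cpu(); per_cpu_ptr(); put_cpu()
sequence with a single this_cpu_add(). A rough userspace model of the two
shapes follows. It is a sketch under assumptions: the sim_* names are
hypothetical, preemption is modelled as a plain counter, and nothing here is
taken from the patches themselves.

```c
#include <assert.h>

#define SIM_NR_CPUS 4                  /* hypothetical CPU count */

static long sim_counter[SIM_NR_CPUS];  /* one counter per modelled CPU */
static int sim_cpu;                    /* current CPU number */
static int sim_preempt_count;          /* > 0 means preemption is disabled */

/* Like get_cpu(): disable preemption, then return the CPU number. */
static inline int sim_get_cpu(void)
{
    sim_preempt_count++;
    return sim_cpu;
}

/* Like put_cpu(): re-enable preemption. */
static inline void sim_put_cpu(void)
{
    assert(sim_preempt_count > 0);
    sim_preempt_count--;
}

/* Old shape: three steps wrapped around one addition. */
static inline void sim_count_old(long n)
{
    int cpu = sim_get_cpu();
    sim_counter[cpu] += n;   /* per_cpu_ptr()-style access by CPU number */
    sim_put_cpu();
}

/*
 * New shape: one call standing in for this_cpu_add(). In the kernel the
 * operation is itself safe against preemption, so the caller needs no
 * get_cpu()/put_cpu() pair around it.
 */
static inline void sim_this_cpu_add(long n)
{
    sim_counter[sim_cpu] += n;
}
```

Both helpers update the current CPU's counter; the new shape simply has no
bracketing calls left for the caller to get wrong.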
  

$ git diff --stat d4185bbf62a5d8d777ee445db1581beb17882a07
 drivers/clocksource/arm_generic.c |    2 +-
 kernel/padata.c                   |    5 ++---
 kernel/rcutree.c                  |    2 +-
 kernel/trace/blktrace.c           |    2 +-
 kernel/trace/trace.c              |    5 +----
 net/batman-adv/main.h             |    4 +---
 net/core/flow.c                   |    4 +---
 net/openvswitch/datapath.c        |    4 ++--
 net/openvswitch/vport.c           |    5 ++---
 net/rds/ib.h                      |    2 +-
 net/rds/ib_recv.c                 |   24 +++++++++++++-----------
 net/xfrm/xfrm_ipcomp.c            |    8 +++-----
 12 files changed, 29 insertions(+), 38 deletions(-)


* Re: [PATCH 0/9 v4] use efficient this_cpu_* helper
  2012-11-13  1:51 [PATCH 0/9 v4] use efficient this_cpu_* helper Shan Wei
@ 2012-11-15 14:19 ` Christoph Lameter
  2012-11-15 14:53   ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Lameter @ 2012-11-15 14:19 UTC
  To: Tejun Heo; +Cc: David Miller, NetDev, Shan Wei, Kernel-Maillist

Tejun: Could you pick up this patchset?


* Re: [PATCH 0/9 v4] use efficient this_cpu_* helper
  2012-11-15 14:19 ` Christoph Lameter
@ 2012-11-15 14:53   ` Tejun Heo
  2012-11-16  8:30     ` Shan Wei
  0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2012-11-15 14:53 UTC
  To: Christoph Lameter; +Cc: David Miller, NetDev, Shan Wei, Kernel-Maillist

On Thu, Nov 15, 2012 at 02:19:38PM +0000, Christoph Lameter wrote:
> Tejun: Could you pick up this patchset?

Sure, but, Shan, when posting a patchset, please send the patches as
replies to the head message; otherwise, it's pretty difficult to track
what's going on with the patchset as a whole.  I see that some patches
are being picked up by respective subsystems.  If you have patches
left, please let me know.

Thanks.

-- 
tejun


* Re: [PATCH 0/9 v4] use efficient this_cpu_* helper
  2012-11-15 14:53   ` Tejun Heo
@ 2012-11-16  8:30     ` Shan Wei
  0 siblings, 0 replies; 4+ messages in thread
From: Shan Wei @ 2012-11-16  8:30 UTC
  To: Tejun Heo, David Miller, paulmck, rostedt
  Cc: Christoph Lameter, NetDev, Kernel-Maillist

Hi Tejun Heo:

Tejun Heo said, at 2012/11/15 22:53:
> On Thu, Nov 15, 2012 at 02:19:38PM +0000, Christoph Lameter wrote:
>> Tejun: Could you pick up this patchset?
> 
> Sure, but, Shan, when posting a patchset, please send the patches as
> replies to the head message; otherwise, it's pretty difficult to track
> what's going on with the patchset as a whole.  I see that some patches
> are being picked up by respective subsystems.  If you have patches
> left, please let me know.

OK, next time I will do as you suggest.

This patchset touches several subsystems, e.g. network, rcu, and trace.
The best way to avoid code conflicts is for the subsystem maintainers to pick
the patches up into their own trees. I will remind them on each patch that has
not yet been applied and add you to the recipient list.

Best Regards
Shan Wei

> 
> Thanks.
> 
