* [PATCH] core: fix the use of this_cpu_ptr
@ 2013-03-28  9:42 roy.qing.li
  2013-03-28 13:05 ` Eric Dumazet
  2013-03-29 19:13 ` [PATCH] core: fix the use " David Miller
  0 siblings, 2 replies; 30+ messages in thread
From: roy.qing.li @ 2013-03-28  9:42 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

flush_tasklet is not percpu var, and percpu is percpu var, and
	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
is not equal to
	&this_cpu_ptr(info->cache->percpu)->flush_tasklet

1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/core/flow.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/flow.c b/net/core/flow.c
index 7fae135..e8084b8 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -346,7 +346,7 @@ static void flow_cache_flush_per_cpu(void *data)
 	struct flow_flush_info *info = data;
 	struct tasklet_struct *tasklet;
 
-	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
+	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
 	tasklet->data = (unsigned long)info;
 	tasklet_schedule(tasklet);
 }
-- 
1.7.10.4


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28  9:42 [PATCH] core: fix the use of this_cpu_ptr roy.qing.li
@ 2013-03-28 13:05 ` Eric Dumazet
  2013-03-28 14:38   ` Christoph Lameter
  2013-03-29 19:13 ` [PATCH] core: fix the use " David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-03-28 13:05 UTC (permalink / raw)
  To: roy.qing.li, Shan Wei, Christoph Lameter; +Cc: netdev

On Thu, 2013-03-28 at 17:42 +0800, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
> 
> flush_tasklet is not percpu var, and percpu is percpu var, and
> 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
> is not equal to
> 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet
> 
> 1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
>  net/core/flow.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/flow.c b/net/core/flow.c
> index 7fae135..e8084b8 100644
> --- a/net/core/flow.c
> +++ b/net/core/flow.c
> @@ -346,7 +346,7 @@ static void flow_cache_flush_per_cpu(void *data)
>  	struct flow_flush_info *info = data;
>  	struct tasklet_struct *tasklet;
>  
> -	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
> +	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
>  	tasklet->data = (unsigned long)info;
>  	tasklet_schedule(tasklet);
>  }

Hi

Any reason you don't Cc Shan Wei & Christoph Lameter?

Christoph, could this kind of error be detected by the compiler or
sparse ?

Thanks


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 13:05 ` Eric Dumazet
@ 2013-03-28 14:38   ` Christoph Lameter
  2013-03-28 15:36     ` Eric Dumazet
  2013-03-29  1:24     ` RongQing Li
  0 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-03-28 14:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: roy.qing.li, Shan Wei, netdev

On Thu, 28 Mar 2013, Eric Dumazet wrote:

> > flush_tasklet is not percpu var, and percpu is percpu var, and
> > 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
> > is not equal to
> > 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet

&this_cpu_ptr is always an error since you are taking the address of an
address.

this_cpu_ptr(&structure) is the right way to get the address of the cpu
instance for this cpu for a per cpu structure.

> Christoph, could this kind of error be detected by the compiler or
> sparse ?

The per cpu variables are marked with __percpu. This should be detected by
sparse.


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 14:38   ` Christoph Lameter
@ 2013-03-28 15:36     ` Eric Dumazet
  2013-03-28 16:44       ` Christoph Lameter
  2013-03-29  1:24     ` RongQing Li
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-03-28 15:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: roy.qing.li, Shan Wei, netdev

On Thu, 2013-03-28 at 14:38 +0000, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Eric Dumazet wrote:

> 
> > Christoph, could this kind of error be detected by the compiler or
> > sparse ?
> 
> The per cpu variables are marked with __percpu. This should be detected by
> sparse.

make C=2 net/core/flow.o

  CHECK   net/core/flow.c

No warning.


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 15:36     ` Eric Dumazet
@ 2013-03-28 16:44       ` Christoph Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-03-28 16:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: roy.qing.li, Shan Wei, netdev

On Thu, 28 Mar 2013, Eric Dumazet wrote:

> On Thu, 2013-03-28 at 14:38 +0000, Christoph Lameter wrote:
> > On Thu, 28 Mar 2013, Eric Dumazet wrote:
>
> >
> > > Christoph, could this kind of error be detected by the compiler or
> > > sparse ?
> >
> > The per cpu variables are marked with __percpu. This should be detected by
> > sparse.
>
> make C=2 net/core/flow.o
>
>   CHECK   net/core/flow.c
>
> No warning.

huh?

this_cpu_ptr uses SHIFT_PERCPU_PTR


#ifndef SHIFT_PERCPU_PTR
/* Weird cast keeps both GCC and sparse happy. */
#define SHIFT_PERCPU_PTR(__p, __offset) ({                              \
        __verify_pcpu_ptr((__p));                                       \
        RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset)); \
})
#endif

This would mean that __verify_pcpu_ptr is broken.


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28 14:38   ` Christoph Lameter
  2013-03-28 15:36     ` Eric Dumazet
@ 2013-03-29  1:24     ` RongQing Li
  2013-04-01 15:21       ` Christoph Lameter
  1 sibling, 1 reply; 30+ messages in thread
From: RongQing Li @ 2013-03-29  1:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, Shan Wei, netdev

2013/3/28 Christoph Lameter <cl@linux.com>:
> On Thu, 28 Mar 2013, Eric Dumazet wrote:
>
>> > flush_tasklet is not percpu var, and percpu is percpu var, and
>> >     this_cpu_ptr(&info->cache->percpu->flush_tasklet)
>> > is not equal to
>> >     &this_cpu_ptr(info->cache->percpu)->flush_tasklet
>
> &this_cpu_ptr is always an error since you are taking the address of an
> address.
>

In &this_cpu_ptr()->flush_tasklet, "->" has higher precedence than "&",
so the result is the same as
 &(this_cpu_ptr()->flush_tasklet)
and it should not be an issue.

flush_tasklet is not a percpu var, it is a member of percpu var.
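
The precedence point can be checked in plain user-space C. This is only a
minimal sketch, not kernel code: struct tasklet_model,
struct flow_cache_percpu_model and member_address_forms_agree() are
hypothetical names made up for the illustration, and no per-cpu relocation
is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the layout matters. */
struct tasklet_model {
	unsigned long data;
};

struct flow_cache_percpu_model {
	int hash_count;
	struct tasklet_model flush_tasklet;
};

/*
 * Returns 1 when &p->member and &(p->member) name the same address,
 * and that address is p plus the member's offset.
 */
static int member_address_forms_agree(struct flow_cache_percpu_model *p)
{
	struct tasklet_model *a = &p->flush_tasklet;   /* "->" binds first */
	struct tasklet_model *b = &(p->flush_tasklet); /* explicit grouping */

	return a == b &&
	       (char *)a == (char *)p +
			offsetof(struct flow_cache_percpu_model, flush_tasklet);
}
```

Calling member_address_forms_agree() on any instance of the model struct
should return 1, since the unary "&" applies to the whole postfix
expression p->flush_tasklet.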

-Roy

> this_cpu_ptr(&structure) is the right way to get the address of the cpu
> instance for this cpu for a per cpu structure.
>
>> Christoph, could this kind of error be detected by the compiler or
>> sparse ?
>
> The per cpu variables are marked with __percpu. This should be detected by
> sparse.
>


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-28  9:42 [PATCH] core: fix the use of this_cpu_ptr roy.qing.li
  2013-03-28 13:05 ` Eric Dumazet
@ 2013-03-29 19:13 ` David Miller
  1 sibling, 0 replies; 30+ messages in thread
From: David Miller @ 2013-03-29 19:13 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev

From: roy.qing.li@gmail.com
Date: Thu, 28 Mar 2013 17:42:41 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> flush_tasklet is not percpu var, and percpu is percpu var, and
> 	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
> is not equal to
> 	&this_cpu_ptr(info->cache->percpu)->flush_tasklet
> 
> 1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Applied.


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-03-29  1:24     ` RongQing Li
@ 2013-04-01 15:21       ` Christoph Lameter
  2013-04-01 16:31         ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2013-04-01 15:21 UTC (permalink / raw)
  To: RongQing Li; +Cc: Eric Dumazet, Shan Wei, netdev

On Fri, 29 Mar 2013, RongQing Li wrote:

> > &this_cpu_ptr is always an error since you are taking the address of an
> > address.
> >
>
> In &this_cpu_ptr()->flush_tasklet, "->" has higher precedence than "&",
> so the result is the same as
>  &(this_cpu_ptr()->flush_tasklet)

Ok. This is the same as

	this_cpu_read(xxx.flush_tasklet)

Looks less confusing to me.

> flush_tasklet is not a percpu var, it is a member of percpu var.

Well then it would be best to use this_cpu_read() instead of this_cpu_ptr.
It also will generate better code.


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-04-01 15:21       ` Christoph Lameter
@ 2013-04-01 16:31         ` Eric Dumazet
  2013-04-01 18:15           ` Christoph Lameter
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Eric Dumazet @ 2013-04-01 16:31 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: RongQing Li, Shan Wei, netdev

On Mon, 2013-04-01 at 15:21 +0000, Christoph Lameter wrote:
> On Fri, 29 Mar 2013, RongQing Li wrote:

> > flush_tasklet is not a percpu var, it is a member of percpu var.
> 
> Well then it would be best to use this_cpu_read() instead of this_cpu_ptr.
> It also will generate better code.

I believe we already had this discussion in the past.

flush_tasklet is a structure, and we need its address, not read its
content.

You can not use this_cpu_read() to get its address, and following
code is fine.

tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;

Similar to this code in mm/page_alloc.c

pcp = &this_cpu_ptr(zone->pageset)->pcp;
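
The difference between reading an object's contents and taking its address
can be sketched in user-space C. This is a rough model under stated
assumptions, not the kernel API: model_read() and model_ptr() are
hypothetical stand-ins for a read-style and a pointer-style accessor, and a
single static object plays the role of this CPU's instance.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's tasklet structure. */
struct tasklet_model {
	unsigned long data;
};

/* Stand-in for the current CPU's instance of the per-cpu object. */
static struct tasklet_model model_this_cpu_tasklet;

/* Read-style access: the caller receives a copy of the contents. */
static struct tasklet_model model_read(void)
{
	return model_this_cpu_tasklet;
}

/* Pointer-style access: the caller receives the object's address. */
static struct tasklet_model *model_ptr(void)
{
	return &model_this_cpu_tasklet;
}

/*
 * Writes via_copy through a copy, records the stored value in *after_copy,
 * then writes via_ptr through the pointer and returns the stored value.
 */
static unsigned long demo_copy_vs_pointer(unsigned long via_copy,
					  unsigned long via_ptr,
					  unsigned long *after_copy)
{
	struct tasklet_model copy = model_read();

	copy.data = via_copy;          /* changes only the local copy */
	*after_copy = model_this_cpu_tasklet.data;

	model_ptr()->data = via_ptr;   /* changes the stored object */
	return model_this_cpu_tasklet.data;
}
```

In this model only the pointer-style accessor can be used to set
tasklet->data on the stored object, which is what the code in
flow_cache_flush_per_cpu() needs.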


* Re: [PATCH] core: fix the use of this_cpu_ptr
  2013-04-01 16:31         ` Eric Dumazet
@ 2013-04-01 18:15           ` Christoph Lameter
  2013-04-03 20:41           ` this cpu documentation Christoph Lameter
       [not found]           ` <alpine.DEB.2.02.1304031540110.3444@gentwo.org>
  2 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-01 18:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: RongQing Li, Shan Wei, netdev

On Mon, 1 Apr 2013, Eric Dumazet wrote:

> On Mon, 2013-04-01 at 15:21 +0000, Christoph Lameter wrote:
> > On Fri, 29 Mar 2013, RongQing Li wrote:
>
> > > flush_tasklet is not a percpu var, it is a member of percpu var.
> >
> > Well then it would be best to use this_cpu_read() instead of this_cpu_ptr.
> > It also will generate better code.
>
> I believe we already had this discussion in the past.
>
> flush_tasklet is a structure, and we need its address, not read its
> content.
>
> You can not use this_cpu_read() to get its address, and following
> code is fine.
>
> tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;

That is confusing.

tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet)

this_cpu_ptr performs an address relocation. The address is then the one
of flush_tasklet.

> Similar to this code in mm/page_alloc.c
>
> pcp = &this_cpu_ptr(zone->pageset)->pcp;

Yeah, that's my (early) code using these features.

	pcp = this_cpu_ptr(&zone->pageset->pcp)

I need to do a writeup on this one.


* this cpu documentation
  2013-04-01 16:31         ` Eric Dumazet
  2013-04-01 18:15           ` Christoph Lameter
@ 2013-04-03 20:41           ` Christoph Lameter
  2013-04-03 21:18             ` Tejun Heo
  2013-04-04  0:09             ` Randy Dunlap
       [not found]           ` <alpine.DEB.2.02.1304031540110.3444@gentwo.org>
  2 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-03 20:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: RongQing Li, Shan Wei, netdev, Tejun Heo, srostedt


From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation

Document the rationale and the way to use this_cpu operations.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-03 15:25:41.424846306 -0500
@@ -0,0 +1,194 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stored the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add an per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This mean there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMV instructions like
+inc/dec/cmpxchg without the lock prefix and the associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the basis of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the Os from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local\
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+		of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Lets say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--
+
+	z = pp->n++
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. F.e.
+
+	__this_cpu_inc(x)
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+an RMV operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with ().
+
+
+Christoph Lameter, April 3rd, 2013
+


* [PERCPU] Remove & in front of this_cpu_ptr
       [not found]           ` <alpine.DEB.2.02.1304031540110.3444@gentwo.org>
@ 2013-04-03 20:42             ` Christoph Lameter
  2013-04-03 21:24               ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2013-04-03 20:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: RongQing Li, Shan Wei, netdev, Eric Dumazet

Subject: percpu: Remove & in front of this_cpu_ptr

Both

	this_cpu_ptr(&percpu_pointer->field)


[Add Offset in percpu pointer to the field offset in the struct
and then add to the local percpu base]

as well as

	 &this_cpu_ptr(percpu_pointer)->field

[Add percpu variable offset to local percpu base to form an address
and then add the field offset to the address].

are correct. However, the latter looks a bit more complicated.
The first one is easier to understand. The second one may be
more difficult for the compiler to optimize as well.

Convert all of them to this_cpu_ptr(&percpu_pointer->field).

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/fs/gfs2/rgrp.c
===================================================================
--- linux.orig/fs/gfs2/rgrp.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/fs/gfs2/rgrp.c	2013-04-03 15:26:43.045773676 -0500
@@ -1726,7 +1726,7 @@ static bool gfs2_rgrp_congested(const st
 	s64 var;

 	preempt_disable();
-	st = &this_cpu_ptr(sdp->sd_lkstats)->lkstats[LM_TYPE_RGRP];
+	st = this_cpu_ptr(&sdp->sd_lkstats->lkstats[LM_TYPE_RGRP]);
 	r_srttb = st->stats[GFS2_LKS_SRTTB];
 	r_dcount = st->stats[GFS2_LKS_DCOUNT];
 	var = st->stats[GFS2_LKS_SRTTVARB] +
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/mm/page_alloc.c	2013-04-03 15:30:02.124769119 -0500
@@ -1342,7 +1342,7 @@ void free_hot_cold_page(struct page *pag
 		migratetype = MIGRATE_MOVABLE;
 	}

-	pcp = &this_cpu_ptr(zone->pageset)->pcp;
+	pcp = this_cpu_ptr(&zone->pageset->pcp);
 	if (cold)
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
@@ -1484,7 +1484,7 @@ again:
 		struct list_head *list;

 		local_irq_save(flags);
-		pcp = &this_cpu_ptr(zone->pageset)->pcp;
+		pcp = this_cpu_ptr(&zone->pageset->pcp);
 		list = &pcp->lists[migratetype];
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
Index: linux/net/core/flow.c
===================================================================
--- linux.orig/net/core/flow.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/net/core/flow.c	2013-04-03 15:26:43.045773676 -0500
@@ -328,7 +328,7 @@ static void flow_cache_flush_per_cpu(voi
 	struct flow_flush_info *info = data;
 	struct tasklet_struct *tasklet;

-	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
+	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
 	tasklet->data = (unsigned long)info;
 	tasklet_schedule(tasklet);
 }


* Re: this cpu documentation
  2013-04-03 20:41           ` this cpu documentation Christoph Lameter
@ 2013-04-03 21:18             ` Tejun Heo
  2013-04-04  0:09             ` Randy Dunlap
  1 sibling, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-03 21:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, srostedt

On Wed, Apr 03, 2013 at 08:41:32PM +0000, Christoph Lameter wrote:
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation
> 
> Document the rationale and the way to use this_cpu operations.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Applied to percpu/for-3.10 with the file renamed to this_cpu_ops.txt.

Thanks.

-- 
tejun


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-03 20:42             ` [PERCPU] Remove & in front of this_cpu_ptr Christoph Lameter
@ 2013-04-03 21:24               ` Tejun Heo
  2013-04-03 21:29                 ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2013-04-03 21:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: RongQing Li, Shan Wei, netdev, Eric Dumazet

Hello, Christoph.

On Wed, Apr 03, 2013 at 08:42:33PM +0000, Christoph Lameter wrote:
> Subject: percpu: Remove & in front of this_cpu_ptr
> 
> Both
> 
> 	this_cpu_ptr(&percpu_pointer->field)
> 
> 
> [Add Offset in percpu pointer to the field offset in the struct
> and then add to the local percpu base]
> 
> as well as
> 
> 	 &this_cpu_ptr(percpu_pointer)->field
> 
> [Add percpu variable offset to local percpu base to form an address
> and then add the field offset to the address].
> 
> are correct. However, the latter looks a bit more complicated.
> The first one is easier to understand. The second one may be
> more difficult for the compiler to optimize as well.

I don't know about this one.  I actually prefer the latter in that the
pointer being passed into this_cpu_ptr() is something which is the
actual percpu pointer either from variable declaration or the
allocator.  Sure, they both are just different expressions of the same
thing but the former requires an extra guarantee from percpu subsystem
that the accessors would work for pointers which aren't the exact
values defined or allocated.  I'd much prefer unifying things toward
the latter than the former.

Thanks.

-- 
tejun


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-03 21:24               ` Tejun Heo
@ 2013-04-03 21:29                 ` Eric Dumazet
  2013-04-04 13:52                   ` Christoph Lameter
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-04-03 21:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Christoph Lameter, RongQing Li, Shan Wei, netdev

On Wed, 2013-04-03 at 14:24 -0700, Tejun Heo wrote:

> I don't know about this one.  I actually prefer the latter in that the
> pointer being passed into this_cpu_ptr() is something which is the
> actual percpu pointer either from variable declaration or the
> allocator.  Sure, they both are just different expressions of the same
> thing but the former requires an extra guarantee from percpu subsystem
> that the accessors would work for pointers which aren't the exact
> values defined or allocated.  I'd much prefer unifying things toward
> the latter than the former.

I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field

The offset is added after getting the address of the (percpu) base
object.
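
The two orderings discussed in this thread can be compared in a user-space
model. This is a sketch under assumptions, not the kernel implementation:
it assumes relocation is a plain addition of a per-cpu offset to a per-CPU
base, and model_area, model_cpu_ptr(), form_reloc_then_member() and
form_member_then_reloc() are hypothetical names. It says nothing about
which form sparse or the percpu allocator is meant to accept.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS   4
#define MODEL_AREA_SIZE 256

/* Hypothetical stand-ins for the kernel types; only the layout matters. */
struct tasklet_model {
	unsigned long data;
};

struct flow_cache_percpu_model {
	int hash_count;
	struct tasklet_model flush_tasklet;
};

/* One private area per modeled CPU, aligned so structs can live in it. */
static _Alignas(max_align_t) unsigned char
	model_area[MODEL_NR_CPUS][MODEL_AREA_SIZE];

/* Relocation: turn a per-cpu offset into an address in one CPU's area. */
static void *model_cpu_ptr(size_t percpu_offset, int cpu)
{
	return &model_area[cpu][percpu_offset];
}

/* &cpu_ptr(p)->field: relocate the struct, then step to the member. */
static struct tasklet_model *form_reloc_then_member(size_t percpu_offset,
						    int cpu)
{
	struct flow_cache_percpu_model *fcp =
		model_cpu_ptr(percpu_offset, cpu);

	return &fcp->flush_tasklet;
}

/* cpu_ptr(&p->field): add the member offset first, then relocate. */
static struct tasklet_model *form_member_then_reloc(size_t percpu_offset,
						    int cpu)
{
	size_t member_offset = percpu_offset +
		offsetof(struct flow_cache_percpu_model, flush_tasklet);

	return model_cpu_ptr(member_offset, cpu);
}
```

Under the additive assumption both helpers return the same address for any
given CPU and offset, while different CPUs get different addresses. The
preference stated above is about which pointer value is handed to the
accessor, not about the resulting address.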


* Re: this cpu documentation
  2013-04-03 20:41           ` this cpu documentation Christoph Lameter
  2013-04-03 21:18             ` Tejun Heo
@ 2013-04-04  0:09             ` Randy Dunlap
  2013-04-04 14:41               ` Christoph Lameter
  1 sibling, 1 reply; 30+ messages in thread
From: Randy Dunlap @ 2013-04-04  0:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo, srostedt

On 04/03/13 13:41, Christoph Lameter wrote:
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation
> 
> Document the rationale and the way to use this_cpu operations.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-03 15:25:41.424846306 -0500
> @@ -0,0 +1,194 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add an per cpu variable offset to the processor

                           add a per

> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This mean there are no atomicity issues between the calculation

        means

> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV instructions like

                                                        RMW ??

> +inc/dec/cmpxchg without the lock prefix and the associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurency

                                                                concurrency

> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the basis of the per cpu area. It is

                                                           base

> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the Os from

                                                                     OS or O/S or kernel

> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local\

drop ending backslash

> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +		of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Lets say we have a percpu structure

   Let's

> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--

	add    ;

> +
> +	z = pp->n++

	add        ;

> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interupt safe. Some architecture do not support these per

                    interrupt

> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. F.e.

                                           E.g. or For example:


> +
> +	__this_cpu_inc(x)
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +an RMV operation in the same instruction.

      RMW ?

> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with ().
> +
> +
> +Christoph Lameter, April 3rd, 2013


-- 
~Randy


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-03 21:29                 ` Eric Dumazet
@ 2013-04-04 13:52                   ` Christoph Lameter
  2013-04-04 14:00                     ` Tejun Heo
  2013-04-04 14:29                     ` Eric Dumazet
  0 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-04 13:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, RongQing Li, Shan Wei, netdev

On Wed, 3 Apr 2013, Eric Dumazet wrote:

> I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field
>
> The offset is added after getting the address of the (percpu) base
> object.

There are two offsets being added! percpu_pointer is not a
pointer but an offset. this_cpu_ptr creates a pointer from the
percpu base of the current processor by adding the offset of the percpu
variable. The offset calculation had better be inside the parentheses.

The method that I proposed also conforms with the use of the other
this_cpu_ops. For example, in order to do a read one would need to do

x = this_cpu_read(percpu_pointer->field)




x = this_cpu_read(percpu_pointer)->field

does not work (and does not pass sparse).


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-04 13:52                   ` Christoph Lameter
@ 2013-04-04 14:00                     ` Tejun Heo
  2013-04-04 14:21                       ` Christoph Lameter
  2013-04-04 14:29                     ` Eric Dumazet
  1 sibling, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 14:00 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

Hello, Christoph.

On Thu, Apr 04, 2013 at 01:52:00PM +0000, Christoph Lameter wrote:
> The method that I proposed is also conforming with the use of other
> this_cpu_ops. F.e. In order to do a read one would need to do
> 
> x = this_cpu_read(percpu_pointer->field)
> 
> 
> 
> 
> x = this_cpu_read(percpu_pointer)->field
> 
> does not work (and does not pass sparse).

Right, this is true, and we *do* wanna support this_cpu ops other than
this_cpu_ptr on per-cpu struct fields.  The usage is still somewhat
unusual tho.  Can we please add documentation in the comments too?

Thanks.

-- 
tejun


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-04 14:00                     ` Tejun Heo
@ 2013-04-04 14:21                       ` Christoph Lameter
  2013-04-04 14:25                         ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2013-04-04 14:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, 4 Apr 2013, Tejun Heo wrote:

> Right, this is true, and we *do* wanna support this_cpu ops other than
> this_cpu_ptr on per-cpu struct fields.  The usage is still somewhat
> unusual tho.  Can we please add documentation in the comments too?

I posted a patch adding documentation yesterday and you took it.
???

Add comments where?


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-04 14:21                       ` Christoph Lameter
@ 2013-04-04 14:25                         ` Tejun Heo
  2013-04-04 15:02                           ` Christoph Lameter
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 14:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 04, 2013 at 02:21:57PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Tejun Heo wrote:
> 
> > Right, this is true, and we *do* wanna support this_cpu ops other than
> > this_cpu_ptr on per-cpu struct fields.  The usage is still somewhat
> > unusual tho.  Can we please add documentation in the comments too?
> 
> I posted a patch adding documentation yesterday and you took it.
> ???
> 
> Add comments where?

I was thinking above this_cpu_*() ops.  Let's make it as conspicuous
as reasonably possible.  It's a similar problem with declaring per-cpu
arrays - there are a couple ways to do it and there's no way to
automatically reject the one which isn't preferred.  I don't know.
Maybe all we can do is a periodic sweep through the source tree and fix
up the "wrong" ones.

Thanks.

-- 
tejun


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-04 13:52                   ` Christoph Lameter
  2013-04-04 14:00                     ` Tejun Heo
@ 2013-04-04 14:29                     ` Eric Dumazet
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2013-04-04 14:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Tejun Heo, RongQing Li, Shan Wei, netdev

On Thu, 2013-04-04 at 13:52 +0000, Christoph Lameter wrote:
> On Wed, 3 Apr 2013, Eric Dumazet wrote:
> 
> > I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field
> >
> > The offset is added after getting the address of the (percpu) base
> > object.
> 
> There are two offsets being added!


I was speaking of the offsetof(struct ..., field), not of the 'offset'
you are thinking of (the percpu one).

That's why I prefer &this_cpu_ptr(percpu_pointer)->field

It's clearer to me, but that's a very minor issue.


* Re: this cpu documentation
  2013-04-04  0:09             ` Randy Dunlap
@ 2013-04-04 14:41               ` Christoph Lameter
  2013-04-04 16:28                 ` Tejun Heo
  2013-04-04 17:19                 ` Randy Dunlap
  0 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-04 14:41 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation V2

Document the rationale and the way to use this_cpu operations.

V2: Improved after feedback from Randy Dunlap

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-04 09:40:06.431946280 -0500
@@ -0,0 +1,197 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stored the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add a per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This meanthere are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
+instructions like inc/dec/cmpxchg without the lock prefix and the
+associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurirency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the kernel from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+		of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--;
+
+	z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. For example.
+
+	__this_cpu_inc(x);
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+an Read-Modify-Write operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with (). The
+second form also is consistent with the way this_cpu_read() and friends
+are used.
+
+
+Christoph Lameter, April 3rd, 2013
+


* Re: [PERCPU] Remove & in front of this_cpu_ptr
  2013-04-04 14:25                         ` Tejun Heo
@ 2013-04-04 15:02                           ` Christoph Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-04 15:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, 4 Apr 2013, Tejun Heo wrote:

> I was thinking above this_cpu_*() ops.  Let's make it as conspicuous
> as reasonably possible.  It's a similar problem with declaring per-cpu
> arrays - there are a couple ways to do it and there's no way to
> automatically reject the one which isn't preferred.  I don't know.
> Maybe all we can do is a periodic sweep through the source tree and fix
> up the "wrong" ones.

Both ways are working just fine. I'd like to use more of these though and
would like to tighten things up a bit before doing sweeps through the
kernel.


* Re: this cpu documentation
  2013-04-04 14:41               ` Christoph Lameter
@ 2013-04-04 16:28                 ` Tejun Heo
  2013-04-04 17:19                 ` Randy Dunlap
  1 sibling, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 16:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Randy Dunlap, Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 04, 2013 at 02:41:08PM +0000, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V2
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2: Improved after feedback from Randy Dunlap
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Updated patch applied to wq/for-3.10.

Thanks.

-- 
tejun


* Re: this cpu documentation
  2013-04-04 14:41               ` Christoph Lameter
  2013-04-04 16:28                 ` Tejun Heo
@ 2013-04-04 17:19                 ` Randy Dunlap
  2013-04-04 17:26                   ` Tejun Heo
  2013-04-04 17:40                   ` Christoph Lameter
  1 sibling, 2 replies; 30+ messages in thread
From: Randy Dunlap @ 2013-04-04 17:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On 04/04/13 07:41, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V2
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2: Improved after feedback from Randy Dunlap

Thanks.  I have a few more corrections to V2 (please see below).


> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 09:40:06.431946280 -0500
> @@ -0,0 +1,197 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add a per cpu variable offset to the processor
> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This meanthere are no atomicity issues between the calculation

        means there

> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
> +instructions like inc/dec/cmpxchg without the lock prefix and the
> +associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency

                                                                concurrency

> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the kernel from
> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local
> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +		of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Let's say we have a percpu structure
> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--;
> +
> +	z = pp->n++;
> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interrupt safe. Some architecture do not support these per
> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. For example.
> +
> +	__this_cpu_inc(x);
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +an Read-Modify-Write operation in the same instruction.

   a

> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with (). The
> +second form also is consistent with the way this_cpu_read() and friends
> +are used.
> +
> +
> +Christoph Lameter, April 3rd, 2013



-- 
~Randy


* Re: this cpu documentation
  2013-04-04 17:19                 ` Randy Dunlap
@ 2013-04-04 17:26                   ` Tejun Heo
  2013-04-04 17:40                   ` Christoph Lameter
  1 sibling, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 17:26 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Christoph Lameter, Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 4, 2013 at 10:19 AM, Randy Dunlap <rdunlap@infradead.org> wrote:
> On 04/04/13 07:41, Christoph Lameter wrote:
>> From: Christoph Lameter <cl@linux.com>
>> Subject: this_cpu: Add documentation V2
>>
>> Document the rationale and the way to use this_cpu operations.
>>
>> V2: Improved after feedback from Randy Dunlap
>
> Thanks.  I have a few more corrections to V2 (please see below).

Updated the tree w/ v3. I also re-filled all the paragraphs to 75 columns.

Thanks.

--
tejun


* Re: this cpu documentation
  2013-04-04 17:19                 ` Randy Dunlap
  2013-04-04 17:26                   ` Tejun Heo
@ 2013-04-04 17:40                   ` Christoph Lameter
  2013-04-04 18:35                     ` Randy Dunlap
  2013-04-11 17:00                     ` Paul E. McKenney
  1 sibling, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2013-04-04 17:40 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On Thu, 4 Apr 2013, Randy Dunlap wrote:

> Thanks.  I have a few more corrections to V2 (please see below).

From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation V3

Document the rationale and the way to use this_cpu operations.

V2/V3: Improved after feedback from Randy Dunlap

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
@@ -0,0 +1,197 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stores the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add a per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This means there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMW (Read Modify Write)
+instructions like inc/dec/cmpxchg without the lock prefix and the
+associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurrency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such a sequence also required preempt disable/enable to prevent the kernel from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the values of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a per cpu variable.
+
+	&x is the *offset* of a per cpu variable. this_cpu_ptr() takes the offset
+		of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--;
+
+	z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architectures do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenables interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. For example:
+
+	__this_cpu_inc(x);
+
+Will increment x and will not fall back to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+a Read-Modify-Write operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with (). The
+second form also is consistent with the way this_cpu_read() and friends
+are used.
+
+
+Christoph Lameter, April 4th, 2013
+

* Re: this cpu documentation
  2013-04-04 17:40                   ` Christoph Lameter
@ 2013-04-04 18:35                     ` Randy Dunlap
  2013-04-04 18:52                       ` Tejun Heo
  2013-04-11 17:00                     ` Paul E. McKenney
  1 sibling, 1 reply; 30+ messages in thread
From: Randy Dunlap @ 2013-04-04 18:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On 04/04/13 10:40, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Randy Dunlap wrote:
> 
>> Thanks.  I have a few more corrections to V2 (please see below).
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V3
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2/V3: Improved after feedback from Randy Dunlap
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
> @@ -0,0 +1,197 @@
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency

                                                                concurrency
again.  but hopefully Tejun has already corrected that.

Thanks.

> +issues with other processors in the system.
> +


-- 
~Randy

* Re: this cpu documentation
  2013-04-04 18:35                     ` Randy Dunlap
@ 2013-04-04 18:52                       ` Tejun Heo
  0 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2013-04-04 18:52 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Christoph Lameter, Eric Dumazet, RongQing Li, Shan Wei, netdev

On Thu, Apr 04, 2013 at 11:35:55AM -0700, Randy Dunlap wrote:
> > +should be accessing that variable and therefore there are no concurirency
> 
>                                                                 concurrency
> again.  but hopefully Tejun has already corrected that.

Yeap, the committed version is at

  https://git.kernel.org/cgit/linux/kernel/git/tj/percpu.git/tree/Documentation/this_cpu_ops.txt?h=for-3.10

Thanks.

-- 
tejun

* Re: this cpu documentation
  2013-04-04 17:40                   ` Christoph Lameter
  2013-04-04 18:35                     ` Randy Dunlap
@ 2013-04-11 17:00                     ` Paul E. McKenney
  1 sibling, 0 replies; 30+ messages in thread
From: Paul E. McKenney @ 2013-04-11 17:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Randy Dunlap, Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo

On Thu, Apr 04, 2013 at 05:40:38PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Randy Dunlap wrote:
> 
> > Thanks.  I have a few more corrections to V2 (please see below).
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V3
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2/V3: Improved after feedback from Randy Dunlap
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Very good to see this!!!

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
> @@ -0,0 +1,197 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stores the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add a per cpu variable offset to the processor
> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This means there are no atomicity issues between the calculation
> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMW (Read Modify Write)
> +instructions like inc/dec/cmpxchg without the lock prefix and the
> +associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency
> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such a sequence also required preempt disable/enable to prevent the kernel from
> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the values of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
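
The trade-off described above (cheap local updates, a sum on the read side) can be sketched in plain user-space C. This is only a model under stated assumptions: COUNTER_MODEL_CPUS, struct model_counter, model_cpu_inc() and model_counter_sum() are hypothetical names, not kernel API, and the CPU number is passed in explicitly because user space has no per cpu base register.

```c
#include <assert.h>

#define COUNTER_MODEL_CPUS 4	/* hypothetical CPU count for this model */

/*
 * One counter slot per modelled CPU. Each slot is padded to 64 bytes
 * (a common cache line size) so that neighbouring slots do not share a
 * line when the array is suitably aligned.
 */
struct model_counter {
	long count;
	char pad[64 - sizeof(long)];
};

static struct model_counter hits[COUNTER_MODEL_CPUS];

/* Stand-in for this_cpu_inc(): touch only the slot of the given CPU. */
static void model_cpu_inc(int cpu)
{
	assert(cpu >= 0 && cpu < COUNTER_MODEL_CPUS);
	hits[cpu].count++;
}

/* The read side pays the price: walk every slot and add them up. */
static long model_counter_sum(void)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < COUNTER_MODEL_CPUS; cpu++)
		sum += hits[cpu].count;
	return sum;
}
```

After calling model_cpu_inc() for a few different CPU numbers, only model_counter_sum() gives a meaningful total; the individual slots just record where each increment happened to land.
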
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local
> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a per cpu variable.
> +
> +	&x is the *offset* of a per cpu variable. this_cpu_ptr() takes the offset
> +		of a per cpu variable which makes this look a bit strange.
> +
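
A user-space model may help with the "offset, not address" point. The sketch below assumes a hypothetical layout (struct model_area) in which every modelled CPU has its own copy of the area, and uses offsetof() to play the role of &x. model_per_cpu_ptr() is an illustrative stand-in, not the kernel's per_cpu_ptr().

```c
#include <assert.h>
#include <stddef.h>

#define AREA_MODEL_CPUS 2	/* hypothetical CPU count for this model */

/* Layout of one modelled per cpu area; every CPU gets its own copy. */
struct model_area {
	long other;	/* some unrelated per cpu data */
	int x;		/* the per cpu variable of interest */
};

static struct model_area areas[AREA_MODEL_CPUS];

/* In the model, "&x" is only an offset into an area, not an address. */
#define MODEL_X_OFFSET offsetof(struct model_area, x)

/*
 * Stand-in for per_cpu_ptr(): a valid address only exists once the
 * offset has been added to the base of one particular CPU's area.
 */
static void *model_per_cpu_ptr(size_t offset, int cpu)
{
	assert(cpu >= 0 && cpu < AREA_MODEL_CPUS);
	return (char *)&areas[cpu] + offset;
}
```

The same MODEL_X_OFFSET yields a different address for each CPU number, and using the bare offset as a pointer would be meaningless, which is the situation the text above describes for x and &x.
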
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Let's say we have a percpu structure
> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--;
> +
> +	z = pp->n++;
> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interrupt safe. Some architectures do not support these per
> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenables interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. For example:
> +
> +	__this_cpu_inc(x);
> +
> +Will increment x and will not fall back to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +a Read-Modify-Write operation in the same instruction.
> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with (). The
> +second form also is consistent with the way this_cpu_read() and friends
> +are used.
> +
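
Since the two forms were the starting point of this thread, here is a small user-space sketch of why both orders of "add field offset" and "relocate" end up at the same address. It is a model only: model_relocate(), model_form_one() and model_form_two() are hypothetical helpers, the CPU number is passed explicitly, and the field m is used instead of n because its offset within struct s is non-zero.

```c
#include <assert.h>
#include <stddef.h>

#define FORM_MODEL_CPUS 2	/* hypothetical CPU count for this model */

struct s {
	int n, m;
};

/* One instance of struct s per modelled CPU. */
static struct s area[FORM_MODEL_CPUS];

/* Stand-in for this_cpu_ptr(): relocate an offset into the area of "cpu". */
static char *model_relocate(size_t offset, int cpu)
{
	assert(cpu >= 0 && cpu < FORM_MODEL_CPUS);
	return (char *)&area[cpu] + offset;
}

/* First form, &this_cpu_ptr(pp)->m: relocate the struct, then add the field offset. */
static int *model_form_one(size_t pp_offset, int cpu)
{
	struct s *base = (struct s *)model_relocate(pp_offset, cpu);

	return &base->m;
}

/* Second form, this_cpu_ptr(&pp->m): add the field offset, then relocate. */
static int *model_form_two(size_t pp_offset, int cpu)
{
	return (int *)model_relocate(pp_offset + offsetof(struct s, m), cpu);
}
```

With pp_offset set to 0 (the struct sits at the start of the modelled area), both helpers return &area[cpu].m for any valid cpu, so in this model the choice between the two forms is one of style rather than of result.
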
> +
> +Christoph Lameter, April 4th, 2013
> +

Thread overview: 30+ messages
2013-03-28  9:42 [PATCH] core: fix the use of this_cpu_ptr roy.qing.li
2013-03-28 13:05 ` Eric Dumazet
2013-03-28 14:38   ` Christoph Lameter
2013-03-28 15:36     ` Eric Dumazet
2013-03-28 16:44       ` Christoph Lameter
2013-03-29  1:24     ` RongQing Li
2013-04-01 15:21       ` Christoph Lameter
2013-04-01 16:31         ` Eric Dumazet
2013-04-01 18:15           ` Christoph Lameter
2013-04-03 20:41           ` this cpu documentation Christoph Lameter
2013-04-03 21:18             ` Tejun Heo
2013-04-04  0:09             ` Randy Dunlap
2013-04-04 14:41               ` Christoph Lameter
2013-04-04 16:28                 ` Tejun Heo
2013-04-04 17:19                 ` Randy Dunlap
2013-04-04 17:26                   ` Tejun Heo
2013-04-04 17:40                   ` Christoph Lameter
2013-04-04 18:35                     ` Randy Dunlap
2013-04-04 18:52                       ` Tejun Heo
2013-04-11 17:00                     ` Paul E. McKenney
     [not found]           ` <alpine.DEB.2.02.1304031540110.3444@gentwo.org>
2013-04-03 20:42             ` [PERCPU] Remove & in front of this_cpu_ptr Christoph Lameter
2013-04-03 21:24               ` Tejun Heo
2013-04-03 21:29                 ` Eric Dumazet
2013-04-04 13:52                   ` Christoph Lameter
2013-04-04 14:00                     ` Tejun Heo
2013-04-04 14:21                       ` Christoph Lameter
2013-04-04 14:25                         ` Tejun Heo
2013-04-04 15:02                           ` Christoph Lameter
2013-04-04 14:29                     ` Eric Dumazet
2013-03-29 19:13 ` [PATCH] core: fix the use " David Miller
