* [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
@ 2013-08-23 19:01 ` Christoph Lameter
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2013-08-23 19:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel

__get_cpu_var() is used for multiple purposes in the kernel source. One of them is
address calculation via the form &__get_cpu_var(x). This calculates the address for
the instance of the percpu variable of the current processor based on an offset.

Other use cases are storing and retrieving data from the current processor's percpu area.
__get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.

__get_cpu_var() is defined as:


#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
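
To make the definition above concrete, here is a minimal userspace sketch
(not kernel code). It assumes a hypothetical model in which an array holds
one copy of the variable per CPU and a template object supplies the
"percpu address" that gets shifted by a per-CPU offset. All model_* names
are invented for illustration, and the real this_cpu_ptr() is arch specific.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the per-CPU section: one int per CPU. */
struct model_area {
	int y;
};

static struct model_area model_template;             /* supplies the "percpu address" */
static struct model_area model_areas[MODEL_NR_CPUS]; /* the actual per-CPU copies */
static int model_cpu;                                /* CPU we pretend to run on */

/* Distance from the template to the current CPU's copy. */
static uintptr_t model_cpu_offset(void)
{
	assert(model_cpu >= 0 && model_cpu < MODEL_NR_CPUS);
	return (uintptr_t)&model_areas[model_cpu] - (uintptr_t)&model_template;
}

/* Address calculation: percpu address plus the current CPU's offset. */
#define model_this_cpu_ptr(ptr) \
	((__typeof__(*(ptr)) *)((uintptr_t)(ptr) + model_cpu_offset()))

/* Same shape as the kernel macro: dereference the calculated address. */
#define model_get_cpu_var(var) (*model_this_cpu_ptr(&(var)))

/* Uses the macro as an lvalue, then reads the value back as an rvalue. */
static int model_store_and_load(int cpu, int value)
{
	model_cpu = cpu;
	model_get_cpu_var(model_template.y) = value;
	return model_get_cpu_var(model_template.y);
}
```

Calling model_store_and_load(2, 42) in this model stores into the copy that
belongs to CPU 2 only. Both the store and the load go through the full
address calculation, which is the cost the this_cpu operations can avoid.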



__get_cpu_var() always performs only an address determination. However, store and retrieve operations
can use a segment prefix (or global register on other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use
optimized assembly code to read and write per cpu variables.
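
A userspace analogy, offered only as a sketch: thread-local storage on
x86-64 Linux is also reached through a segment register, so it shows the
same split between a direct access and an address calculation. The tls_*
names are hypothetical and none of this is the kernel's percpu
implementation. The code a compiler generates depends on the target and
flags, so the comments say "typically".

```c
#include <assert.h>

/* Each thread has its own copy, much as each CPU has its own percpu copy. */
static _Thread_local int tls_counter;

/* Analogue of this_cpu_read(): typically a single segment-prefixed load. */
int tls_read(void)
{
	return tls_counter;
}

/* Analogue of this_cpu_write(): typically a single segment-prefixed store. */
void tls_write(int value)
{
	tls_counter = value;
}

/* Analogue of this_cpu_ptr(): the thread base has to be fetched and the
 * offset added before a usable pointer exists. */
int *tls_ptr(void)
{
	return &tls_counter;
}

/* Both access paths refer to the same per-thread object. */
void tls_demo(void)
{
	tls_write(5);
	assert(tls_read() == 5);
	*tls_ptr() += 1;
	assert(tls_read() == 6);
}
```

The functions are deliberately not static so that the output of something
like "gcc -O2 -S" can be inspected to compare tls_read() with tls_ptr().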

This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr()
or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided
and fewer registers are used when code is generated.

At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.

The patchset includes passes over all arches as well. Once these operations are used throughout,
specialized macros can be defined in non-x86 arches as well in order to optimize per cpu access by
e.g. using a global register that may be set to the per cpu base.

Transformations done to __get_cpu_var()


1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);


3. Retrieve the content of the current processor's instance of a per cpu variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y);

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));


5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y) = x;

   Converts to

	this_cpu_write(y, x);

6. Increment/decrement etc. of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++;

   Converts to

	this_cpu_inc(y);
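
The six cases above can be checked against each other in a small userspace
sketch. It is not kernel code: the model_* macros and the template/array
layout are hypothetical stand-ins, and the read/write/inc forms here are
plain C that only mirrors the API shape. For case 4 the data is copied out
of the percpu area, so the local variable is the memcpy() destination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 4

struct mystruct {
	int a;
	int b;
};

/* Hypothetical stand-in for the per-CPU section: y, arr and s each
 * exist once per modelled CPU. */
struct model_area {
	int y;
	int arr[20];
	struct mystruct s;
};

static struct model_area model_template;             /* supplies the "percpu addresses" */
static struct model_area model_areas[MODEL_NR_CPUS]; /* the actual per-CPU copies */
static int model_cpu;                                /* CPU we pretend to run on */

static uintptr_t model_cpu_offset(void)
{
	assert(model_cpu >= 0 && model_cpu < MODEL_NR_CPUS);
	return (uintptr_t)&model_areas[model_cpu] - (uintptr_t)&model_template;
}

/* Model accessors, all built on the same address calculation. */
#define model_this_cpu_ptr(ptr) \
	((__typeof__(*(ptr)) *)((uintptr_t)(ptr) + model_cpu_offset()))
#define model_get_cpu_var(var)       (*model_this_cpu_ptr(&(var)))
#define model_this_cpu_read(var)     (*model_this_cpu_ptr(&(var)))
#define model_this_cpu_write(var, v) (*model_this_cpu_ptr(&(var)) = (v))
#define model_this_cpu_inc(var)      ((*model_this_cpu_ptr(&(var)))++)

/* Returns 1 if each old form and its replacement agree, else 0. */
static int model_check_conversions(void)
{
	struct model_area *mine;
	struct mystruct old4, new4;

	model_cpu = 1;
	mine = &model_areas[model_cpu];

	/* 1. Address of a scalar. */
	if (&model_get_cpu_var(model_template.y) != &mine->y ||
	    model_this_cpu_ptr(&model_template.y) != &mine->y)
		return 0;

	/* 2. Address of an array: there is no '&' in the new form. */
	if (model_get_cpu_var(model_template.arr) != mine->arr ||
	    model_this_cpu_ptr(model_template.arr) != mine->arr)
		return 0;

	/* 5. Assignment, old form then new form. */
	model_get_cpu_var(model_template.y) = 7;
	if (mine->y != 7)
		return 0;
	model_this_cpu_write(model_template.y, 8);
	if (mine->y != 8)
		return 0;

	/* 3. Read. */
	if (model_get_cpu_var(model_template.y) != 8 ||
	    model_this_cpu_read(model_template.y) != 8)
		return 0;

	/* 6. Increment, once with each form. */
	model_get_cpu_var(model_template.y)++;
	model_this_cpu_inc(model_template.y);
	if (mine->y != 10)
		return 0;

	/* 4. Struct copy: the local is the memcpy() destination. */
	mine->s.a = 3;
	mine->s.b = 4;
	old4 = model_get_cpu_var(model_template.s);
	memcpy(&new4, model_this_cpu_ptr(&model_template.s), sizeof(new4));
	return old4.a == new4.a && old4.b == new4.b &&
	       new4.a == 3 && new4.b == 4;
}
```

The pointer macro casts through typeof(*(ptr)) rather than typeof(ptr) so
that the array form in case 2 compiles: an array type cannot be the target
of a cast, while a pointer to its element type can.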


These conversions throughout the kernel source lead to
some savings in terms of code size (784 bytes of text for the x86 bzImage measured below).

Before

size arch/x86/boot/bzImage
   text	   data	    bss	    dec	    hex	filename
3996624	      0	      0	3996624	 3cfbd0	arch/x86/boot/bzImage

After

size arch/x86/boot/bzImage
   text	   data	    bss	    dec	    hex	filename
3995840	      0	      0	3995840	 3cf8c0	arch/x86/boot/bzImage





* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
  2013-08-23 19:01 ` Christoph Lameter
  (?)
@ 2013-08-23 20:05 ` Tejun Heo
  2013-08-26 15:21   ` Christoph Lameter
  -1 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2013-08-23 20:05 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel

Hello, Christoph.

On Fri, Aug 23, 2013 at 07:01:56PM +0000, Christoph Lameter wrote:
> This patch converts __get_cpu_var into either an explicit address
> calculation using this_cpu_ptr() or into a use of this_cpu
> operations that use the offset. Thereby address calculations are
> avoided and fewer registers are used when code is generated.

Yeah, we should have done this long ago.  Eventually, I think we'd be
better off dropping all _var() accessors.  They were okay when we had
segregation between static and dynamic ones but are now just adding to
confusion.

On a cursory scan,

* Each patch probably needs a brief explanation of why this is
  happening, especially if these patches are gonna be routed
  separately.

* It would be a lot easier to route the patches if each had cc's to
  the maintainers of the affected subsystems.

* Dunno what's the convention around Coccinelle scripts but do we
  need to keep them around if the accessor being converted gets
  removed at the end of the series?

How do you want to route the patches?  I'm gonna apply the second
patch which updates __verify_pcpu_ptr() to the percpu tree right away
and push it to Linus early during the merge window so that pushing
other patches through different trees from there on isn't too painful.

Thanks.

-- 
tejun


* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
  2013-08-23 20:05 ` Tejun Heo
@ 2013-08-26 15:21   ` Christoph Lameter
  2013-08-26 18:18     ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2013-08-26 15:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel

On Fri, 23 Aug 2013, Tejun Heo wrote:

> * It would be a lot easier to route the patches if each had cc's to
>   the maintainers of the affected subsystems.

So the drivers patch needs to CC all driver maintainers?

There must be some easier way to get this done.

> * Dunno what's the convention around Coccinelle scripts but do we
>   need to keep them around if the accessor being converted gets
>   removed at the end of the series?
>
> How do you want to route the patches?  I'm gonna apply the second
> patch which updates __verify_pcpu_ptr() to the percpu tree right away
> and push it to Linus early during the merge window so that pushing
> other patches through different trees from there on isn't too painful.

Not sure how to do this. That's why it's an RFC. I CCed Andrew because he
usually knows how to deal with massive patches like this.



* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
  2013-08-26 15:21   ` Christoph Lameter
@ 2013-08-26 18:18     ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2013-08-26 18:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel

Hey, Christoph.

On Mon, Aug 26, 2013 at 03:21:56PM +0000, Christoph Lameter wrote:
> On Fri, 23 Aug 2013, Tejun Heo wrote:
> 
> > * It would be a lot easier to route the patches if each had cc's to
> >   the maintainers of the affected subsystems.
> 
> So the drivers patch needs to CC all driver maintainers?

There usually is a maintainer for a whole lot of similar drivers - one
for infiniband, one for v4l and so on, so the list usually isn't that
long.

> There must be some easier way to get this done.

It can be consolidated and pushed as a single series either through
the percpu tree or -mm but it still at least needs to inform the
people working on the affected code and get the confirmations where
possible.

> Not sure how to do this. That's why it's an RFC. I CCed Andrew because he
> usually knows how to deal with massive patches like this.

It can go two ways.

* Split further so that the patches can be merged through separate
  branches so that they converge on the next merge window where the
  leftovers can be taken care of and further dependent changes merged.

* Get acks from most maintainers and push the changes as a single
  series through either percpu tree or -mm.  The benefit of going
  through -mm is that -mm floats on top of all changes scheduled for
  the next merge window, so if the changes in question are likely to
  conflict with other changes in various subsystems scheduled for the
  next merge window, -mm is easier.  Given the nature of the changes,
  I don't think going through percpu or -mm would make much
  difference.

Thanks.

-- 
tejun


* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
  2013-08-23 19:01 ` Christoph Lameter
  (?)
  (?)
@ 2013-08-27  4:57 ` Stephen Rothwell
  2013-08-27 15:39   ` Christoph Lameter
  -1 siblings, 1 reply; 7+ messages in thread
From: Stephen Rothwell @ 2013-08-27  4:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, akpm, linux-arch, Steven Rostedt, linux-kernel


Hi Christoph,

On Fri, 23 Aug 2013 19:01:56 +0000 Christoph Lameter <cl@linux.com> wrote:
>
> At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.

However you get these to Linus, please do not do this last step until the
following merge window.  I can pretty much guarantee that new usages will
be added between the generation of the patch set and their integration.

It's also helpful (to me, at least) if these can be routed via various
subsystem trees to lower the number of conflicts ...
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au


* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
  2013-08-27  4:57 ` Stephen Rothwell
@ 2013-08-27 15:39   ` Christoph Lameter
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2013-08-27 15:39 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Tejun Heo, akpm, linux-arch, Steven Rostedt, linux-kernel

On Tue, 27 Aug 2013, Stephen Rothwell wrote:

> Hi Christoph,
>
> On Fri, 23 Aug 2013 19:01:56 +0000 Christoph Lameter <cl@linux.com> wrote:
> >
> > At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.
>
> However you get these to Linus, please do not do this last step until the
> following merge window.  I can pretty much guarantee that new usages will
> be added between the generation of the patch set and their integration.

OK, then let's hold this particular patch until 3.13.

> It's also helpful (to me, at least) if these can be routed via various
> subsystem trees to lower the number of conflicts ...

OK, will do.

