* [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
@ 2013-08-23 19:01 ` Christoph Lameter
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2013-08-23 19:01 UTC (permalink / raw)
To: Tejun Heo; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is
address calculation via the form &__get_cpu_var(x). This calculates the address for
the instance of the percpu variable of the current processor based on an offset.
Other use cases are for storing and retrieving data from the current processor's percpu area.
__get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.
__get_cpu_var() is defined as:
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store and retrieve operations
can use a segment prefix (or global register on other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use
optimized assembly code to read and write per cpu variables.
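The difference between the two access styles can be sketched in userspace C. This is only a mock under stated assumptions: mock_areas, mock_cpu, struct mock_layout and every mock_* function are hypothetical names invented for illustration, and an ordinary array lookup stands in for the segment prefix or global register. It shows the interface difference only (materialize an address first versus hand just an offset to the accessor), not the kernel implementation or the generated code.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_NR_CPUS 2

/* Hypothetical layout of one per-CPU area; a "variable" is just an offset. */
struct mock_layout {
	int counter;
	long stamp;
};

/* One area per mock CPU; mock_cpu plays the role of the current processor. */
struct mock_layout mock_areas[MOCK_NR_CPUS];
int mock_cpu;

/* Style 1: compute the full address (base + offset) and hand it back. */
void *mock_this_cpu_ptr(size_t offset)
{
	return (unsigned char *)&mock_areas[mock_cpu] + offset;
}

/* The caller then dereferences that address in a second step. */
int read_via_address(size_t offset)
{
	int v;

	memcpy(&v, mock_this_cpu_ptr(offset), sizeof(v));
	return v;
}

/*
 * Style 2: the accessor takes only the offset. Applying the base is part
 * of the access itself, so no pointer ever escapes to the caller.
 */
int mock_this_cpu_read_int(size_t offset)
{
	int v;

	memcpy(&v, (unsigned char *)&mock_areas[mock_cpu] + offset, sizeof(v));
	return v;
}

void mock_this_cpu_write_int(size_t offset, int v)
{
	memcpy((unsigned char *)&mock_areas[mock_cpu] + offset, &v, sizeof(v));
}
```

In this mock both styles do the same work. The point of the second interface is that an architecture is free to fold the base into the memory access, which the first interface cannot do once it has promised to return a pointer.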
This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr()
or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided
and fewer registers are used when code is generated.
At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.
The patchset includes passes over all arches as well. Once these operations are used throughout, then
specialized macros can be defined in non-x86 arches as well in order to optimize per cpu access by
e.g. using a global register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processor's instance of a per cpu variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y);
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y) = x;
Converts to
this_cpu_write(y, x);
6. Increment/decrement etc. of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++;
Converts to
this_cpu_inc(y);
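The transformations above can be tried out in a userspace mock. This is a sketch, not kernel code: mock_area, mock_cpu, struct mock_percpu, the members of struct mystruct and all mock_* macros are assumptions made up for illustration, with an array indexed by a pretend CPU number standing in for the real percpu areas. Each *_old function uses the dereference form that __get_cpu_var() expands to, each *_new function uses the converted form, and the two are expected to behave identically. The array case (#2) is left out to keep the mock small.

```c
#include <string.h>

#define MOCK_NR_CPUS 4

struct mystruct {
	int a;
	int b;
};

/* Hypothetical stand-in for the per-CPU areas: one slot per mock CPU. */
struct mock_percpu {
	int y;
	struct mystruct s;
};

struct mock_percpu mock_area[MOCK_NR_CPUS];
int mock_cpu;	/* plays the role of the current processor */

/* Mock accessors; the names only echo the kernel ones. */
#define mock_this_cpu_ptr(var)		(&mock_area[mock_cpu].var)
#define mock_get_cpu_var(var)		(*mock_this_cpu_ptr(var))
#define mock_this_cpu_read(var)		(mock_area[mock_cpu].var)
#define mock_this_cpu_write(var, val)	(mock_area[mock_cpu].var = (val))
#define mock_this_cpu_inc(var)		(mock_area[mock_cpu].var++)

/* 1. Address of the current CPU's instance. */
int *addr_old(void) { return &mock_get_cpu_var(y); }
int *addr_new(void) { return mock_this_cpu_ptr(y); }

/* 3. Read a scalar. */
int read_old(void) { return mock_get_cpu_var(y); }
int read_new(void) { return mock_this_cpu_read(y); }

/* 4. Copy a struct out of the per-CPU area. */
struct mystruct copy_old(void)
{
	struct mystruct x = mock_get_cpu_var(s);

	return x;
}

struct mystruct copy_new(void)
{
	struct mystruct x;

	/* Destination first: the local copy is filled from the per-CPU one. */
	memcpy(&x, mock_this_cpu_ptr(s), sizeof(x));
	return x;
}

/* 5. Assignment. */
void write_old(int v) { mock_get_cpu_var(y) = v; }
void write_new(int v) { mock_this_cpu_write(y, v); }

/* 6. Increment. */
void inc_old(void) { mock_get_cpu_var(y)++; }
void inc_new(void) { mock_this_cpu_inc(y); }
```

Setting mock_cpu to a slot, writing through one variant and reading back through the other is enough to see that each pair agrees. Case 4 is the one where the argument order matters, since memcpy() takes the destination first.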
These conversions throughout the kernel source lead to
some savings in terms of code size (784 bytes of text in the bzImage measurement below).
Before
size arch/x86/boot/bzImage
text data bss dec hex filename
3996624 0 0 3996624 3cfbd0 arch/x86/boot/bzImage
After
size arch/x86/boot/bzImage
text data bss dec hex filename
3995840 0 0 3995840 3cf8c0 arch/x86/boot/bzImage
* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
2013-08-23 19:01 ` Christoph Lameter
(?)
@ 2013-08-23 20:05 ` Tejun Heo
2013-08-26 15:21 ` Christoph Lameter
-1 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2013-08-23 20:05 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel
Hello, Christoph.
On Fri, Aug 23, 2013 at 07:01:56PM +0000, Christoph Lameter wrote:
> This patch converts __get_cpu_var into either an explicit address
> calculation using this_cpu_ptr() or into a use of this_cpu
> operations that use the offset. Thereby address calculations are
> avoided and fewer registers are used when code is generated.
Yeah, we should have done this long ago. Eventually, I think we'd be
better off dropping all _var() accessors. They were okay when we had
segregation between static and dynamic ones but are now just adding to
confusion.
On a cursory scan,
* Each patch probably needs a brief explanation of why this is
happening, especially if these patches are gonna be routed
separately.
* It would be a lot easier to route the patches if each had cc's to
the maintainers of the affected subsystems.
* Dunno what's the convention around coccinelle scripts but do we
need to keep them around if the accessor being converted gets
removed at the end of the series?
How do you want to route the patches? I'm gonna apply the second
patch which updates __verify_pcpu_ptr() to the percpu tree right away
and push it to Linus early during the merge window so that pushing
other patches through different trees from there on isn't too painful.
Thanks.
--
tejun
* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
2013-08-23 20:05 ` Tejun Heo
@ 2013-08-26 15:21 ` Christoph Lameter
2013-08-26 18:18 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2013-08-26 15:21 UTC (permalink / raw)
To: Tejun Heo; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel
On Fri, 23 Aug 2013, Tejun Heo wrote:
> * It would be a lot easier to route the patches if each had cc's to
> the maintainers of the affected subsystems.
So the drivers patch needs to CC all driver maintainers?
There must be some easier way to get this done.
> * Dunno what's the convention around coccinelle scripts but do we
> need to keep them around if the accessor being converted gets
> removed at the end of the series?
>
> How do you want to route the patches? I'm gonna apply the second
> patch which updates __verify_pcpu_ptr() to the percpu tree right away
> and push it to Linus early during the merge window so that pushing
> other patches through different trees from there on isn't too painful.
Not sure how to do this. That's why it's an RFC. I cc'ed Andrew because he
usually knows how to deal with massive patches like this.
* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
2013-08-26 15:21 ` Christoph Lameter
@ 2013-08-26 18:18 ` Tejun Heo
0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2013-08-26 18:18 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-arch, Steven Rostedt, linux-kernel
Hey, Christoph.
On Mon, Aug 26, 2013 at 03:21:56PM +0000, Christoph Lameter wrote:
> On Fri, 23 Aug 2013, Tejun Heo wrote:
>
> > * It would be a lot easier to route the patches if each had cc's to
> > the maintainers of the affected subsystems.
>
> So the drivers patch needs to CC all driver maintainers?
There usually is a maintainer for a whole lot of similar drivers - one
for infiniband, one for v4l and so on, so the list usually isn't that
long.
> There must be some easier way to get this done.
It can be consolidated and pushed as a single series either through
the percpu tree or -mm but it still at least needs to inform the
people working on the affected code and get the confirmations where
possible.
> Not sure how to do this. That's why it's an RFC. I cc'ed Andrew because he
> usually knows how to deal with massive patches like this.
It can go two ways.
* Split further so that the patches can be merged through separate
branches so that they converge on the next merge window where the
leftovers can be taken care of and further dependent changes merged.
* Get acks from most maintainers and push the changes as a single
series through either percpu tree or -mm. The benefit of going
through -mm is that -mm floats on top of all changes scheduled for
the next merge window, so if the changes in question are likely to
conflict with other changes in various subsystems scheduled for the
next merge window, -mm is easier. Given the nature of the changes,
I don't think going through percpu or -mm would make much
difference.
Thanks.
--
tejun
* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
2013-08-23 19:01 ` Christoph Lameter
(?)
(?)
@ 2013-08-27 4:57 ` Stephen Rothwell
2013-08-27 15:39 ` Christoph Lameter
-1 siblings, 1 reply; 7+ messages in thread
From: Stephen Rothwell @ 2013-08-27 4:57 UTC (permalink / raw)
To: Christoph Lameter
Cc: Tejun Heo, akpm, linux-arch, Steven Rostedt, linux-kernel
Hi Christoph,
On Fri, 23 Aug 2013 19:01:56 +0000 Christoph Lameter <cl@linux.com> wrote:
>
> At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.
However you get these to Linus, please do not do this last step until the
following merge window. I can pretty much guarantee that new usages will
be added between the generation of the patch set and their integration.
It's also helpful (to me, at least) if these can be routed via various
subsystem trees to lower the number of conflicts ...
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
* Re: [guv 00/16] [RFC] percpu: Replace __get_cpu_var uses throughout the kernel
2013-08-27 4:57 ` Stephen Rothwell
@ 2013-08-27 15:39 ` Christoph Lameter
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2013-08-27 15:39 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Tejun Heo, akpm, linux-arch, Steven Rostedt, linux-kernel
On Tue, 27 Aug 2013, Stephen Rothwell wrote:
> Hi Christoph,
>
> On Fri, 23 Aug 2013 19:01:56 +0000 Christoph Lameter <cl@linux.com> wrote:
> >
> > At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too.
>
> However you get these to Linus, please do not do this last step until the
> following merge window. I can pretty much guarantee that new usages will
> be added between the generation of the patch set and their integration.
OK, then let's hold this particular patch until 3.13.
> It's also helpful (to me, at least) if these can be routed via various
> subsystem trees to lower the number of conflicts ...
Ok will do.