* what is __get_cpu_var() ?
From: Murali N @ 2011-02-23  4:27 UTC
  To: kernelnewbies

Hi,
Can somebody explain to me what the "__get_cpu_var()" macro does?
I tried to understand this macro but I couldn't; its definition looks weird!

-- 
Regards,
Murali N

* what is __get_cpu_var() ?
From: Dave Hylands @ 2011-02-23  8:22 UTC
  To: kernelnewbies

Hi Murali,

On Tue, Feb 22, 2011 at 9:27 PM, Murali N <nalajala.murali@gmail.com> wrote:
> Hi,
> Can somebody explain to me what the "__get_cpu_var()" macro does?
> I tried to understand this macro but I couldn't; its definition looks weird!

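get_cpu_var gives you the current CPU's copy of a per-cpu variable. It
disables preemption first, so it has to be paired with put_cpu_var.

__get_cpu_var contains the actual machine-dependent implementation and
does not touch preemption. It looks like all of the architectures use
the one in asm-generic/percpu.h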

In general, all of the per-cpu data is gathered together into a
section. Multiple copies of that section are allocated (one per CPU). I
think that the address of the variable is really the offset within the
section, and each allocated copy is cache-line aligned. This offset is
then added to the "offset for my cpu" to come up with the final address
of the variable, which is then dereferenced as a pointer. There are
lots of extra doo-dads to get around warnings, and to hide the pointer
arithmetic from the compiler so that it doesn't make assumptions about
the address (since it looks like an access of a global variable, but
it's really just playing a game with the offset of the variable within
the section).

So you could think of it as a very fancy offsetof macro.
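
To make the "base plus offset" idea concrete, here is a minimal userspace
sketch. It is not the kernel's code: NR_CPUS, struct percpu_section,
percpu_area, cpu_base(), PER_CPU_PTR() and PER_CPU() are all names made up
for this illustration, and the real macros work on a linker section rather
than a plain struct. It also relies on the GNU C __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Stand-in for the per-cpu section: one field per per-cpu variable. */
struct percpu_section {
    int  timer_ticks;
    long bytes_sent;
};

/* One private copy of the section for each CPU. */
static struct percpu_section percpu_area[NR_CPUS];

/* Base address of one CPU's copy, i.e. the "offset for my cpu". */
static char *cpu_base(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return (char *)&percpu_area[cpu];
}

/* Final address = that CPU's base + offset of the variable in the section. */
#define PER_CPU_PTR(field, cpu) \
    ((__typeof__(((struct percpu_section *)0)->field) *) \
     (cpu_base(cpu) + offsetof(struct percpu_section, field)))

/* Dereference the computed address so the result can be read or assigned. */
#define PER_CPU(field, cpu) (*PER_CPU_PTR(field, cpu))
```

With this, PER_CPU(timer_ticks, 0) and PER_CPU(timer_ticks, 1) name two
different ints that sit at the same offset inside two different copies of
the section, which is the "fancy offsetof" idea in a nutshell.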

There are several other macros involved; perhaps you could be a bit
more specific about your request?

Dave Hylands

* what is __get_cpu_var() ?
From: Murali N @ 2011-02-23 17:34 UTC
  To: kernelnewbies

Hi Dave,
Thanks for your reply.

I have one more basic question.
Why would we need to maintain structures like this? Is there any
advantage we get here?
I saw that one architecture's timer code uses this macro throughout.

-- 
Regards,
Murali N

* what is __get_cpu_var() ?
From: Dave Hylands @ 2011-02-23 18:15 UTC
  To: kernelnewbies

Hi Murali,

On Wed, Feb 23, 2011 at 10:34 AM, Murali N <nalajala.murali@gmail.com> wrote:
> I have one more basic question.
> Why would we need to maintain structures like this? Is there any
> advantage we get here?

Primarily for performance reasons. For example, the kernel maintains
lots of stats on threads and processes (I haven't looked to see if
these are actually maintained on a per-cpu basis, but the concept
applies). These stats are updated frequently, but only accessed
occasionally. If you have a global "database" of stats, then each CPU
needs to lock the data, which creates lots of contention. By keeping
stuff per-cpu, the cpus don't need to acquire any locks (or at the
very least won't cause as much contention when acquiring per-cpu
locks). This becomes especially important when there are lots of cpus.

The query functions can then amalgamate the information and present it
as if it were maintained in a global database.
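
As a rough illustration of that pattern, here is a minimal userspace sketch
of a per-cpu counter with a summing query. The names NR_CPUS,
struct cpu_counter, packets_seen, count_packet() and total_packets() are
invented for this example, and the 64-byte cache line is an assumption. In
real kernel code the update would also have to run with preemption disabled
so the task cannot migrate to another CPU in the middle of it.

```c
#include <assert.h>

#define NR_CPUS    4    /* hypothetical CPU count for this sketch */
#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* One counter per CPU, padded so two CPUs never share a cache line. */
struct cpu_counter {
    unsigned long count;
    char pad[CACHE_LINE - sizeof(unsigned long)];
};

static struct cpu_counter packets_seen[NR_CPUS];

/* Hot path: a CPU only ever touches its own slot, so no lock is taken. */
static void count_packet(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    packets_seen[cpu].count++;
}

/* Slow path: the query adds up every slot and reports one total. */
static unsigned long total_packets(void)
{
    unsigned long sum = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += packets_seen[cpu].count;
    return sum;
}
```

The trade-off is that the total is only a snapshot: other CPUs may keep
counting while the sum is being computed, which is usually acceptable for
statistics.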

So if you have data which is updated frequently and only accessed
occasionally, or updated infrequently and accessed frequently, then
you might have a case for using per-cpu data. Of course you'd still
need to profile it and see if it makes sense.

Also keep in mind that something which might not seem to matter
much for, say, a dual-core could make a considerable difference
with, say, 32 cores.

Dave Hylands

* what is __get_cpu_var() ?
From: Murali N @ 2011-02-23 18:26 UTC
  To: kernelnewbies

Hi Dave,

On Wed, Feb 23, 2011 at 11:15 AM, Dave Hylands <dhylands@gmail.com> wrote:
> Also keep in mind that something which might not seem to matter
> much for, say, a dual-core could make a considerable difference
> with, say, 32 cores.

So it makes sense to use it if I am running on more cores (> 4)?

-- 
Regards,
Murali N

* what is __get_cpu_var() ?
From: Dave Hylands @ 2011-02-23 18:30 UTC
  To: kernelnewbies

Hi Murali,

On Wed, Feb 23, 2011 at 11:26 AM, Murali N <nalajala.murali@gmail.com> wrote:
> So it makes sense to use it if I am running on more cores (> 4)?

It really depends on the access patterns of the data. Whether it makes
sense or not is something you'll probably need to profile (i.e. with
and without using per-cpu variables).

Dave Hylands
