* [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
@ 2012-05-20 15:19 Vlad Zolotarov
  2012-05-21  9:06 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Zolotarov @ 2012-05-20 15:19 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin, Ingo Molnar
  Cc: Shai Fultheim (Shai@ScaleMP.com), Ido Yariv

Please consider applying this patch series.
It contains the following changes:
 - Adds two new macros DEFINE_EARLY_PER_CPU_READ_MOSTLY() and
   DECLARE_EARLY_PER_CPU_READ_MOSTLY().
 - Adds "read-mostly" qualifier to the following variables in smp.h:
  - cpu_sibling_map
  - cpu_core_map
  - cpu_llc_shared_map
  - cpu_llc_id
  - cpu_number
  - x86_cpu_to_apicid
  - x86_bios_cpu_apicid
  - x86_cpu_to_logical_apicid

Since all the variables above are written only during initialization,
this change is meant to prevent false sharing and improve
performance on large multiprocessor systems.
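
A rough userspace sketch of the idea behind a read-mostly section: variables
that are written once are grouped into their own linker section, so they do
not sit next to frequently written data. The section name "demo_read_mostly",
the demo_* identifiers and the values are assumptions made for illustration,
and the __start_/__stop_ boundary symbols rely on GNU ld (or lld) behaviour
for section names that are valid C identifiers. This is not the kernel's
implementation of __read_mostly or of the new macros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for a read-mostly marker. The section
 * name is an assumption; because it is a valid C identifier, GNU ld
 * provides the __start_/__stop_ symbols declared below.
 */
#define demo_read_mostly __attribute__((section("demo_read_mostly")))

/* Written once during init, then only read. */
static int demo_apicid demo_read_mostly = 7;
static int demo_llc_id demo_read_mostly = 1;

/* Written frequently at run time; stays in the ordinary data section. */
static unsigned long demo_hot_counter = 1;

/* Section boundaries generated by the linker. */
extern char __start_demo_read_mostly[];
extern char __stop_demo_read_mostly[];

/* Return 1 if p lies inside the demo_read_mostly section, else 0. */
static int demo_in_read_mostly(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__start_demo_read_mostly &&
	       a < (uintptr_t)__stop_demo_read_mostly;
}

/* Check that only the read-mostly variables were grouped together. */
static void demo_check_sections(void)
{
	assert(demo_in_read_mostly(&demo_apicid));
	assert(demo_in_read_mostly(&demo_llc_id));
	assert(!demo_in_read_mostly(&demo_hot_counter));
}
```

Calling demo_check_sections() from main() should pass silently when built
with GCC or Clang on an ELF target.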

v3 changes:
- Added the missing definitions of DEFINE_EARLY_PER_CPU_READ_MOSTLY()
and DECLARE_EARLY_PER_CPU_READ_MOSTLY() macros in the !CONFIG_SMP code
path in arch/x86/include/asm/percpu.h.


thanks,
vlad




* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-20 15:19 [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section Vlad Zolotarov
@ 2012-05-21  9:06 ` Ingo Molnar
  2012-05-21 10:14   ` Shai Fultheim (Shai@ScaleMP.com)
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21  9:06 UTC (permalink / raw)
  To: Vlad Zolotarov
  Cc: Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Shai Fultheim (Shai@ScaleMP.com),
	Ido Yariv


* Vlad Zolotarov <vlad@scalemp.com> wrote:

> Please consider applying this patch series.
> It contains the following changes:
>  - Adds two new macros DEFINE_EARLY_PER_CPU_READ_MOSTLY() and
>    DECLARE_EARLY_PER_CPU_READ_MOSTLY().
>  - Adds "read-mostly" qualifier to the following variables in smp.h:
>   - cpu_sibling_map
>   - cpu_core_map
>   - cpu_llc_shared_map
>   - cpu_llc_id
>   - cpu_number
>   - x86_cpu_to_apicid
>   - x86_bios_cpu_apicid
>   - x86_cpu_to_logical_apicid
> 
> Since all the variables above are written only during
> initialization, this change is meant to prevent false
> sharing and improve performance on large multiprocessor
> systems.

Mind explaining why it helps performance? Are these percpu 
variables also accessed from other CPUs, increasing the cost of 
false sharing of cache-lines?

Or is the problem that the 'cache line' size is 4096 bytes on 
vSMP, making even singular cache misses expensive and you'd like 
to move these next to each other?

So it would be nice to see some before/after PMU stats 
demonstrating the improvement. (perf works on vSMP, right?)

Thanks,

	Ingo


* RE: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-21  9:06 ` Ingo Molnar
@ 2012-05-21 10:14   ` Shai Fultheim (Shai@ScaleMP.com)
  2012-05-21 12:32     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Shai Fultheim (Shai@ScaleMP.com) @ 2012-05-21 10:14 UTC (permalink / raw)
  To: Ingo Molnar, Vlad Zolotarov
  Cc: Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin, Ido Yariv

Ingo,

The reason for this, as you pointed out, is the 'cache line' size (4096 bytes). We see significant false sharing if we do not move these next to each other.

We observed this for x86_cpu_to_apicid only, but per an earlier request on this thread we implemented it for the other read-only (RO) variables as well.
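
To make the line-size argument concrete, here is a minimal C sketch. The two
addresses are made up, and demo_shares_line() is a hypothetical helper, not a
kernel function. It shows that two variables 256 bytes apart never interfere
with 64-byte lines, but fall into the same unit once the effective line size
is 4096 bytes, so a write to one invalidates the other for every reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if both addresses fall in the same line of the given size. */
static int demo_shares_line(uintptr_t a, uintptr_t b, size_t line_size)
{
	return a / line_size == b / line_size;
}

/* Two variables 256 bytes apart; the addresses are invented. */
static void demo_line_check(void)
{
	const uintptr_t read_mostly_var = 0x1000;
	const uintptr_t write_hot_var = 0x1100;

	/* Conventional 64-byte line: the variables do not interfere. */
	assert(!demo_shares_line(read_mostly_var, write_hot_var, 64));
	/* 4096-byte unit, as described for vSMP: same line. */
	assert(demo_shares_line(read_mostly_var, write_hot_var, 4096));
}
```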

Regards,
Shai.




* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-21 10:14   ` Shai Fultheim (Shai@ScaleMP.com)
@ 2012-05-21 12:32     ` Ingo Molnar
  2012-05-21 13:54       ` Vlad Zolotarov
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 12:32 UTC (permalink / raw)
  To: Shai Fultheim (Shai@ScaleMP.com)
  Cc: Vlad Zolotarov, Thomas Gleixner, linux-kernel, Ingo Molnar,
	H. Peter Anvin, Ido Yariv


* Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:

> Ingo,
> 
> The reason for this, as you pointed out, is the 'cache line' 
> size (4096 bytes).  We see significant false sharing if we do
> not move this next to each other.

Which write-often variable caused the many cache flushes/fills? 
cpu_to_apicid is read mostly.

I.e. it might make more sense to identify the frequently
*modified* percpu variables, and move them to a separate 
section. I *think* most percpu variables are read mostly, so it 
would be more maintainable in the long run to figure out the 
frequently modified ones, not the frequently not modified ones.

Thanks,

	Ingo


* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-21 12:32     ` Ingo Molnar
@ 2012-05-21 13:54       ` Vlad Zolotarov
  2012-05-21 14:08         ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Zolotarov @ 2012-05-21 13:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Shai Fultheim (Shai@ScaleMP.com),
	Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Ido Yariv

On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > Ingo,
> > 
> > The reason for this, as you pointed out, is the 'cache line'
> > size (4096 bytes).  We see significant false sharing if we do
> > not move this next to each other.
> 
> Which write-often variable caused the many cache flushes/fills?
> cpu_to_apicid is read mostly.
> 
> I.e. it might make more sense to identify the frequently
> *modified* percpu variables, and move them to a separate
> section. I *think* most percpu variables are read mostly, so it
> would be more maintainable in the long run to figure out the
> frequently modified ones, not the frequently not modified ones.

I tend to disagree with the general claim that most per-CPU variables are
read-mostly: consider the per-CPU data structures used in lock-less algorithms,
like softnet_data used by NAPI. I'm not sure which is more common, read-only
or not-read-only per-CPU data, but surely there are both...

In this specific patch we deal with variables that are initialized once at
init time and then used as if they were constants, which makes them a clear
"__read_mostly" case.

Having said all that, I think the proposed solution, the one using the
__read_mostly infrastructure, is just fine in the long run. I also doubt that
we currently need to define an additional "frequently modified" section.

Please comment.

thanks,
vlad

> 
> Thanks,
> 
> 	Ingo


* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-21 13:54       ` Vlad Zolotarov
@ 2012-05-21 14:08         ` Ingo Molnar
  2012-05-21 14:56           ` Vlad Zolotarov
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 14:08 UTC (permalink / raw)
  To: Vlad Zolotarov
  Cc: Shai Fultheim (Shai@ScaleMP.com),
	Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Ido Yariv


* Vlad Zolotarov <vlad@scalemp.com> wrote:

> On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> > * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > > Ingo,
> > > 
> > > The reason for this, as you pointed out, is the 'cache line'
> > > size (4096 bytes).  We see significant false sharing if we do
> > > not move this next to each other.
> > 
> > Which write-often variable caused the many cache flushes/fills?
> > cpu_to_apicid is read mostly.
> > 
> > I.e. it might make more sense to identify the frequently
> > *modified* percpu variables, and move them to a separate 
> > section. I *think* most percpu variables are read mostly, so 
> > it would be more maintainable in the long run to figure out 
> > the frequently modified ones, not the frequently not 
> > modified ones.
> 
> I tend to disagree about the general claim that most per-CPU 
> variables are read-mostly: consider the per-CPU data 
> structures used in lock-less algorithms, like softnet_data used
> by NAPI. I'm not sure which is more common, read-only or
> not-read-only per-CPU data, but surely there are both...

Well, a quick tally of percpu variables on a 'make defconfig' 
kernel would tell us one way or another?

Here there's almost 200 percpu variables active in the 64-bit 
x86 defconfig, and a quick random sample suggests that most are 
read-mostly.

I have no fundamental preference for either approach, but the
direction taken should be justified explicitly, with numbers, 
arguments, etc. - also a short blurb somewhere in the headers 
that explains when they should be used, so that others can be 
aware of vSMP's special needs here.

Thanks,

	Ingo


* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-21 14:08         ` Ingo Molnar
@ 2012-05-21 14:56           ` Vlad Zolotarov
  2012-05-21 15:21             ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Zolotarov @ 2012-05-21 14:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Shai Fultheim (Shai@ScaleMP.com),
	Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Ido Yariv

On Monday, May 21, 2012 16:08:22 Ingo Molnar wrote:
> * Vlad Zolotarov <vlad@scalemp.com> wrote:
> > On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> > > * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > > > Ingo,
> > > > 
> > > > The reason for this, as you pointed out, is the 'cache line'
> > > > size (4096 bytes).  We see significant false sharing if we do
> > > > not move this next to each other.
> > > 
> > > Which write-often variable caused the many cache flushes/fills?
> > > cpu_to_apicid is read mostly.
> > > 
> > > I.e. it might make more sense to identify the frequently
> > > *modified* percpu variables, and move them to a separate
> > > section. I *think* most percpu variables are read mostly, so
> > > it would be more maintainable in the long run to figure out
> > > the frequently modified ones, not the frequently not
> > > modified ones.
> > 
> > I tend to disagree about the general claim that most per-CPU
> > variables are read-mostly: consider the per-CPU data
> > > structures used in lock-less algorithms, like softnet_data used
> > > by NAPI. I'm not sure which is more common, read-only or
> > > not-read-only per-CPU data, but surely there are both...
> 
> Well, a quick tally of percpu variables on a 'make defconfig'
> kernel would tell us one way or another?
> 
> Here there's almost 200 percpu variables active in the 64-bit
> x86 defconfig, and a quick random sample suggests that most are
> read-mostly.
> 
> I have no fundamental preference for either approach, but the
> direction taken should be justified explicitly, with numbers,
> arguments, etc. - also a short blurb somewhere in the headers
> that explains when they should be used, so that others can be
> aware of vSMP's special needs here.

There must be some misunderstanding: this patch is not vSMP Foundation
specific, as it simply marks read-mostly variables as __read_mostly. The
motivation for it is the same as in the non-vSMP Foundation case. It's true
that the performance gain this patch brings on vSMP Foundation is likely to be
more significant than on native Linux; however, even for native Linux it
would still be better code, as __read_mostly is not a vSMP Foundation
specific paradigm and, again, the variables modified are a clear read-mostly
case.

So the explanation you request above would be just the same as if I were to
explain when __read_mostly should be used in general.

I grepped the Documentation and haven't found any readme file with
explicit instructions on when the __read_mostly qualifier should be used, and
you are right, we'd better write one.

I can create an initial version of such a doc, but I think it would better come
as a separate patch.

May we proceed this way?

Please comment.

> 
> Thanks,
> 
> 	Ingo


* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
  2012-05-21 14:56           ` Vlad Zolotarov
@ 2012-05-21 15:21             ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 15:21 UTC (permalink / raw)
  To: Vlad Zolotarov
  Cc: Shai Fultheim (Shai@ScaleMP.com),
	Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
	Ido Yariv


* Vlad Zolotarov <vlad@scalemp.com> wrote:

> On Monday, May 21, 2012 16:08:22 Ingo Molnar wrote:
> > * Vlad Zolotarov <vlad@scalemp.com> wrote:
> > > On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> > > > * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > > > > Ingo,
> > > > > 
> > > > > The reason for this, as you pointed out, is the 'cache line'
> > > > > size (4096 bytes).  We see significant false sharing if we do
> > > > > not move this next to each other.
> > > > 
> > > > Which write-often variable caused the many cache flushes/fills?
> > > > cpu_to_apicid is read mostly.
> > > > 
> > > > I.e. it might make more sense to identify the frequently
> > > > *modified* percpu variables, and move them to a separate
> > > > section. I *think* most percpu variables are read mostly, so
> > > > it would be more maintainable in the long run to figure out
> > > > the frequently modified ones, not the frequently not
> > > > modified ones.
> > > 
> > > I tend to disagree about the general claim that most per-CPU
> > > variables are read-mostly: consider the per-CPU data
> > > > structures used in lock-less algorithms, like softnet_data used
> > > > by NAPI. I'm not sure which is more common, read-only or
> > > > not-read-only per-CPU data, but surely there are both...
> > 
> > Well, a quick tally of percpu variables on a 'make defconfig'
> > kernel would tell us one way or another?
> > 
> > Here there's almost 200 percpu variables active in the 64-bit
> > x86 defconfig, and a quick random sample suggests that most are
> > read-mostly.
> > 
> > I have no fundamental preference for either approach, but the
> > direction taken should be justified explicitly, with numbers,
> > arguments, etc. - also a short blurb somewhere in the headers
> > that explains when they should be used, so that others can be
> > aware of vSMP's special needs here.
> 
> There must be some misunderstanding: this patch is not vSMP
> Foundation specific, as it simply marks read-mostly variables
> as __read_mostly. The motivation for it is the same as in the
> non-vSMP Foundation case. It's true that the performance gain
> this patch brings on vSMP Foundation is likely to be more
> significant than on native Linux; however, even for native
> Linux it would still be better code, as __read_mostly is not a
> vSMP Foundation specific paradigm and, again, the variables
> modified are a clear read-mostly case.

(Could we please use 'vSMP' as a shortcut?)

I know that it's not vSMP specific - but the gains are largely 
concentrated on the vSMP side and in fact I suspect that they 
are important performance fixes for vSMP, while only 'nice to 
have' micro-optimizations on other systems, right?

As such it's useful to outline the justification and relevance 
of the patch.

> So the explanation you request above would be just the same as
> if I were to explain when __read_mostly should be used in
> general.
> 
> I grepped the Documentation and haven't found any readme file
> with explicit instructions on when the __read_mostly qualifier
> should be used, and you are right, we'd better write one.

Furthermore, this is a read_mostly per cpu variable, which is 
even less obvious than a read_mostly global variable.
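
One way to picture why the per-CPU case is less obvious: a per-CPU variable
is normally touched only by its owning CPU, so sharing problems appear only
when other CPUs read a remote copy (for example, looking up another CPU's
APIC ID). The C sketch below is a simplified userspace model of that
situation, not kernel code. The struct layout, the demo_* names, the 64-byte
line size and the APIC ID values are all assumptions for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4
#define DEMO_LINE 64 /* assumed cache-line size */

/* Hypothetical model of one CPU's per-CPU area. */
struct demo_percpu_area {
	/* Read-mostly group: set at init, read by any CPU afterwards. */
	alignas(DEMO_LINE) int apicid;
	int cpu_number;
	/* Hot group: rewritten constantly by the owning CPU. */
	alignas(DEMO_LINE) unsigned long softirq_count;
};

static struct demo_percpu_area demo_areas[DEMO_NR_CPUS];

/* Init-time setup; the APIC ID values are invented for the example. */
static void demo_init(void)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		demo_areas[cpu].apicid = cpu * 2;
		demo_areas[cpu].cpu_number = cpu;
	}
}

/* Remote read: any CPU may look up another CPU's APIC ID. */
static int demo_cpu_to_apicid(int cpu)
{
	return demo_areas[cpu].apicid;
}

/* Local write: only the owner updates its hot counter. */
static void demo_count_softirq(int cpu)
{
	demo_areas[cpu].softirq_count++;
}

/* Check that owner writes cannot dirty the line that remote readers use. */
static void demo_check_separation(void)
{
	assert(offsetof(struct demo_percpu_area, apicid) / DEMO_LINE !=
	       offsetof(struct demo_percpu_area, softirq_count) / DEMO_LINE);
}
```

With this layout, demo_count_softirq(3) dirties only CPU 3's hot line, while
demo_cpu_to_apicid(3) called from other CPUs keeps hitting a line that is
never written after demo_init().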

> I can create an initial version of such a doc but I think it 
> would better come as a separate patch.

Sure.

Thanks,

	Ingo
