* [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
@ 2012-05-20 15:19 Vlad Zolotarov
2012-05-21 9:06 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Vlad Zolotarov @ 2012-05-20 15:19 UTC (permalink / raw)
To: Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin, Ingo Molnar
Cc: Shai Fultheim (Shai@ScaleMP.com), Ido Yariv
Please consider applying this patch series.
It contains the following changes:
- Adds two new macros DEFINE_EARLY_PER_CPU_READ_MOSTLY() and
DECLARE_EARLY_PER_CPU_READ_MOSTLY().
- Adds "read-mostly" qualifier to the following variables in smp.h:
- cpu_sibling_map
- cpu_core_map
- cpu_llc_shared_map
- cpu_llc_id
- cpu_number
- x86_cpu_to_apicid
- x86_bios_cpu_apicid
- x86_cpu_to_logical_apicid
Since all the variables above are written only during initialization,
this change is meant to prevent false sharing and improve
performance on large multiprocessor systems.
v3 changes:
- Added the missing definitions of DEFINE_EARLY_PER_CPU_READ_MOSTLY()
and DECLARE_EARLY_PER_CPU_READ_MOSTLY() macros in the !CONFIG_SMP code
path in arch/x86/include/asm/percpu.h.
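For readers unfamiliar with the mechanism, here is a minimal user-space sketch of what a "read-mostly" annotation does: it asks the toolchain to group write-once data in its own section, away from frequently written data. This is not the kernel macro from the patch; DEMO_READ_MOSTLY, the demo_* names and the section name are hypothetical, and the attribute syntax assumes GCC or Clang targeting ELF (e.g. Linux).

```c
#include <stdint.h>

/* Hypothetical stand-in for a read-mostly marker (assumes GCC/Clang, ELF). */
#define DEMO_READ_MOSTLY __attribute__((__section__(".data.read_mostly")))

#define DEMO_NR_CPUS 8

/* Written once during init, then only read: grouped in its own section. */
static uint16_t demo_cpu_to_apicid[DEMO_NR_CPUS] DEMO_READ_MOSTLY = { 0 };

/* Written frequently: left in the ordinary data/bss area. */
static unsigned long demo_irq_count[DEMO_NR_CPUS];

/* One-time initialization; the values are made up for illustration. */
void demo_init_apicids(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        demo_cpu_to_apicid[cpu] = (uint16_t)(cpu * 2);
}

/* Hot read path: only touches the read-mostly data. */
uint16_t demo_apicid(int cpu)
{
    return demo_cpu_to_apicid[cpu];
}

/* Hot write path: only touches the frequently written data. */
void demo_count_irq(int cpu)
{
    demo_irq_count[cpu]++;
}

unsigned long demo_irqs(int cpu)
{
    return demo_irq_count[cpu];
}
```

The point of the grouping is that writes to demo_irq_count are less likely to land in the same coherence unit as demo_cpu_to_apicid; the exact layout is still up to the linker.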
thanks,
vlad
* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-20 15:19 [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section Vlad Zolotarov
@ 2012-05-21 9:06 ` Ingo Molnar
2012-05-21 10:14 ` Shai Fultheim (Shai@ScaleMP.com)
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 9:06 UTC (permalink / raw)
To: Vlad Zolotarov
Cc: Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
Shai Fultheim (Shai@ScaleMP.com),
Ido Yariv
* Vlad Zolotarov <vlad@scalemp.com> wrote:
> Pls., consider applying this patch series.
> It contains the following changes:
> - Adds two new macros DEFINE_EARLY_PER_CPU_READ_MOSTLY() and
> DECLARE_EARLY_PER_CPU_READ_MOSTLY().
> - Adds "read-mostly" qualifier to the following variables in smp.h:
> - cpu_sibling_map
> - cpu_core_map
> - cpu_llc_shared_map
> - cpu_llc_id
> - cpu_number
> - x86_cpu_to_apicid
> - x86_bios_cpu_apicid
> - x86_cpu_to_logical_apicid
>
> As long as all the variables above are only written during the
> initialization, this change is meant to prevent the false
> sharing and improve the performance on large multiprocessor
> systems.
Mind explaining why it helps performance? Are these percpu
variables also accessed from other CPUs, increasing the cost of
false sharing of cache-lines?
Or is the problem that the 'cache line' size is 4096 bytes on
vSMP, making even singular cache misses expensive and you'd like
to move these next to each other?
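The arithmetic behind this question can be sketched as follows: two addresses contend with each other whenever they fall into the same coherence unit, so a larger unit makes unrelated variables collide more often. The helper name demo_same_line and the sample addresses are made up for illustration; 64 bytes is assumed here as a typical native x86 cache-line size, and 4096 is the vSMP figure mentioned above.

```c
#include <stdint.h>

/* Returns 1 if addresses a and b fall into the same coherence unit
 * of line_size bytes (line_size must be non-zero), otherwise 0. */
int demo_same_line(uintptr_t a, uintptr_t b, uintptr_t line_size)
{
    return (a / line_size) == (b / line_size);
}
```

With the hypothetical addresses 0x1000 and 0x1080, the two variables sit in different 64-byte lines but in the same 4096-byte unit, so a write to one would disturb readers of the other only in the second case.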
So it would be nice to see some before/after PMU stats
demonstrating the improvement. (perf works on vSMP, right?)
Thanks,
Ingo
* RE: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-21 9:06 ` Ingo Molnar
@ 2012-05-21 10:14 ` Shai Fultheim (Shai@ScaleMP.com)
2012-05-21 12:32 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Shai Fultheim (Shai@ScaleMP.com) @ 2012-05-21 10:14 UTC (permalink / raw)
To: Ingo Molnar, Vlad Zolotarov
Cc: Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin, Ido Yariv
Ingo,
The reason for this, as you pointed out, is the 'cache line' size (4096 bytes). We see significant false sharing if we do not move these next to each other.
We observed this for x86_cpu_to_apicid only, but per an earlier request on this thread we implemented it for the other read-only (RO) variables as well.
Regards,
Shai.
* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-21 10:14 ` Shai Fultheim (Shai@ScaleMP.com)
@ 2012-05-21 12:32 ` Ingo Molnar
2012-05-21 13:54 ` Vlad Zolotarov
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 12:32 UTC (permalink / raw)
To: Shai Fultheim (Shai@ScaleMP.com)
Cc: Vlad Zolotarov, Thomas Gleixner, linux-kernel, Ingo Molnar,
H. Peter Anvin, Ido Yariv
* Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> Ingo,
>
> The reason for this, as you pointed out, is the 'cache line'
> size (4096 bytes). We see significant false sharing if we do
> not move these next to each other.
Which write-often variable caused the many cache flushes/fills?
cpu_to_apicid is read mostly.
I.e. it might make more sense to identify the frequently
*modified* percpu variables, and move them to a separate
section. I *think* most percpu variables are read mostly, so it
would be more maintainable in the long run to figure out the
frequently modified ones, not the frequently not modified ones.
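One way to picture the alternative suggested here is to isolate each frequently written variable in its own coherence unit, leaving everything else packed together. The sketch below is a user-space illustration under assumptions, not kernel code: DEMO_LINE_SIZE, struct demo_hot_counter and demo_hot_isolated() are hypothetical names, 4096 is taken from the vSMP figure quoted above, and the aligned attribute assumes GCC or Clang.

```c
#include <stdint.h>

/* Assumed coherence granularity (the 4096-byte figure from this thread). */
#define DEMO_LINE_SIZE 4096

/* A frequently written counter, padded and aligned so that each instance
 * occupies a whole coherence unit by itself. */
struct demo_hot_counter {
    unsigned long value;
} __attribute__((aligned(DEMO_LINE_SIZE)));

static struct demo_hot_counter demo_hot[4];

/* Returns 1 if neighbouring counters cannot share a coherence unit. */
int demo_hot_isolated(void)
{
    uintptr_t first = (uintptr_t)&demo_hot[0];
    uintptr_t second = (uintptr_t)&demo_hot[1];

    return sizeof(struct demo_hot_counter) == DEMO_LINE_SIZE &&
           (first % DEMO_LINE_SIZE) == 0 &&
           (second - first) == DEMO_LINE_SIZE;
}
```

The trade-off is memory: every isolated variable costs a full unit, which is one reason to keep the set of variables treated this way small.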
Thanks,
Ingo
* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-21 12:32 ` Ingo Molnar
@ 2012-05-21 13:54 ` Vlad Zolotarov
2012-05-21 14:08 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Vlad Zolotarov @ 2012-05-21 13:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Shai Fultheim (Shai@ScaleMP.com),
Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
Ido Yariv
On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > Ingo,
> >
> > The reason for this, as you pointed out, is the 'cache line'
> > size (4096 bytes). We see significant false sharing if we do
> > not move these next to each other.
>
> Which write-often variable caused the many cache flushes/fills?
> cpu_to_apicid is read mostly.
>
> I.e. it might make more sense to identify the frequently
> *modified* percpu variables, and move them to a separate
> section. I *think* most percpu variables are read mostly, so it
> would be more maintainable in the long run to figure out the
> frequently modified ones, not the frequently not modified ones.
I tend to disagree with the general claim that most per-CPU variables are
read-mostly: consider the per-CPU data structures used in lock-less algorithms,
like softnet_data used in NAPI. I'm not sure which is more common, read-only
or not-read-only per-CPU data, but surely there are both...
In this specific patch we deal with something that is initialized once at
init time and then used as if it were a constant, thus representing a clear
"__read_mostly" case.
Having said all that, I think that the proposed solution, the one using the
__read_mostly infrastructure, is just fine in the long run. I also doubt that
we are currently facing a need to define an additional "frequently modified"
section.
Please comment.
thanks,
vlad
* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-21 13:54 ` Vlad Zolotarov
@ 2012-05-21 14:08 ` Ingo Molnar
2012-05-21 14:56 ` Vlad Zolotarov
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 14:08 UTC (permalink / raw)
To: Vlad Zolotarov
Cc: Shai Fultheim (Shai@ScaleMP.com),
Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
Ido Yariv
* Vlad Zolotarov <vlad@scalemp.com> wrote:
> On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> > * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > > Ingo,
> > >
> > > The reason for this, as you pointed out, is the 'cache line'
> > > size (4096 bytes). We see significant false sharing if we do
> > > not move these next to each other.
> >
> > Which write-often variable caused the many cache flushes/fills?
> > cpu_to_apicid is read mostly.
> >
> > I.e. it might make more sense to identify the frequently
> > *modified* percpu variables, and move them to a separate
> > section. I *think* most percpu variables are read mostly, so
> > it would be more maintainable in the long run to figure out
> > the frequently modified ones, not the frequently not
> > modified ones.
>
> I tend to disagree about the general claim that most per-CPU
> variables are read-mostly: consider the per-CPU data
> structures used in lock-less algorithms like softnet_data used
> in NAPI. I'm not sure which is more common - read-only or
> not-read-only per-cpu data, but surely there are both...
Well, a quick tally of percpu variables on a 'make defconfig'
kernel would tell us one way or another?
Here there's almost 200 percpu variables active in the 64-bit
x86 defconfig, and a quick random sample suggests that most are
read-mostly.
I have no fundamental preference for either approach, but the
direction taken should be justified explicitly, with numbers,
arguments, etc. - also a short blurb somewhere in the headers
that explains when they should be used, so that others can be
aware of vSMP's special needs here.
Thanks,
Ingo
* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-21 14:08 ` Ingo Molnar
@ 2012-05-21 14:56 ` Vlad Zolotarov
2012-05-21 15:21 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Vlad Zolotarov @ 2012-05-21 14:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: Shai Fultheim (Shai@ScaleMP.com),
Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
Ido Yariv
On Monday, May 21, 2012 16:08:22 Ingo Molnar wrote:
> * Vlad Zolotarov <vlad@scalemp.com> wrote:
> > On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> > > * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > > > Ingo,
> > > >
> > > > The reason for this, as you pointed out, is the 'cache line'
> > > > size (4096 bytes). We see significant false sharing if we do
> > > > not move these next to each other.
> > >
> > > Which write-often variable caused the many cache flushes/fills?
> > > cpu_to_apicid is read mostly.
> > >
> > > I.e. it might make more sense to identify the frequently
> > > *modified* percpu variables, and move them to a separate
> > > section. I *think* most percpu variables are read mostly, so
> > > it would be more maintainable in the long run to figure out
> > > the frequently modified ones, not the frequently not
> > > modified ones.
> >
> > I tend to disagree about the general claim that most per-CPU
> > variables are read-mostly: consider the per-CPU data
> > structures used in lock-less algorithms like softnet_data used
> > in NAPI. I'm not sure which is more common - read-only or
> > not-read-only per-cpu data, but surely there are both...
>
> Well, a quick tally of percpu variables on a 'make defconfig'
> kernel would tell us one way or another?
>
> Here there's almost 200 percpu variables active in the 64-bit
> x86 defconfig, and a quick random sample suggests that most are
> read-mostly.
>
> I have no fundamental preference for either approach, but the
> direction taken should be justified explicitly, with numbers,
> arguments, etc. - also a short blurb somewhere in the headers
> that explains when they should be used, so that others can be
> aware of vSMP's special needs here.
There must be some misunderstanding: this patch is not vSMP Foundation
specific, as it defines read-mostly variables as __read_mostly. The motivation
for it is just the same as in the non-vSMP Foundation case. It's true that the
performance gain this patch introduces on vSMP Foundation is likely to be
more significant than on native Linux; however, even for native Linux it
would still be better code, as __read_mostly is not a vSMP Foundation
specific paradigm and, again, the variables modified are a clear read-mostly
case.
So the explanation you request above would be just the same as if I were to
explain when __read_mostly should be used in general.
I grep'ed the Documentation and haven't found any readme file with
explicit instructions on when the __read_mostly qualifier should be used, and
you are right, we'd better write one.
I can create an initial version of such a doc, but I think it had better come
as a separate patch.
May we proceed this way?
Please comment.
* Re: [PATCH v3 0/2] Move x86_cpu_to_apicid to the __read_mostly section
2012-05-21 14:56 ` Vlad Zolotarov
@ 2012-05-21 15:21 ` Ingo Molnar
0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2012-05-21 15:21 UTC (permalink / raw)
To: Vlad Zolotarov
Cc: Shai Fultheim (Shai@ScaleMP.com),
Thomas Gleixner, linux-kernel, Ingo Molnar, H. Peter Anvin,
Ido Yariv
* Vlad Zolotarov <vlad@scalemp.com> wrote:
> On Monday, May 21, 2012 16:08:22 Ingo Molnar wrote:
> > * Vlad Zolotarov <vlad@scalemp.com> wrote:
> > > On Monday, May 21, 2012 02:32:46 PM Ingo Molnar wrote:
> > > > * Shai Fultheim (Shai@ScaleMP.com) <Shai@ScaleMP.com> wrote:
> > > > > Ingo,
> > > > >
> > > > > The reason for this, as you pointed out, is the 'cache line'
> > > > > size (4096 bytes). We see significant false sharing if we do
> > > > > not move these next to each other.
> > > >
> > > > Which write-often variable caused the many cache flushes/fills?
> > > > cpu_to_apicid is read mostly.
> > > >
> > > > I.e. it might make more sense to identify the frequently
> > > > *modified* percpu variables, and move them to a separate
> > > > section. I *think* most percpu variables are read mostly, so
> > > > it would be more maintainable in the long run to figure out
> > > > the frequently modified ones, not the frequently not
> > > > modified ones.
> > >
> > > I tend to disagree about the general claim that most per-CPU
> > > variables are read-mostly: consider the per-CPU data
> > > structures used in lock-less algorithms like softnet_data used
> > > in NAPI. I'm not sure which is more common - read-only or
> > > not-read-only per-cpu data, but surely there are both...
> >
> > Well, a quick tally of percpu variables on a 'make defconfig'
> > kernel would tell us one way or another?
> >
> > Here there's almost 200 percpu variables active in the 64-bit
> > x86 defconfig, and a quick random sample suggests that most are
> > read-mostly.
> >
> > I have no fundamental preference for either approach, but the
> > direction taken should be justified explicitly, with numbers,
> > arguments, etc. - also a short blurb somewhere in the headers
> > that explains when they should be used, so that others can be
> > aware of vSMP's special needs here.
>
> There must be some misunderstanding: this patch is not vSMP
> Foundation specific, as it defines read-mostly variables as
> __read_mostly. The motivation for it is just the same as in the
> non-vSMP Foundation case. It's true that the performance gain
> this patch introduces on vSMP Foundation is likely to be
> more significant than on native Linux; however, even for
> native Linux it would still be better code, as __read_mostly
> is not a vSMP Foundation specific paradigm and, again, the
> variables modified are a clear read-mostly case.
(Could we please use 'vSMP' as a shortcut?)
I know that it's not vSMP specific - but the gains are largely
concentrated on the vSMP side and in fact I suspect that they
are important performance fixes for vSMP, while only 'nice to
have' micro-optimizations on other systems, right?
As such it's useful to outline the justification and relevance
of the patch.
> So the explanation you request above would be just the same as
> if I were to explain when __read_mostly should be used in
> general.
>
> I grep'ed the Documentation and haven't found any readme file
> with explicit instructions on when the __read_mostly qualifier
> should be used, and you are right, we'd better write one.
Furthermore, this is a read_mostly per cpu variable, which is
even less obvious than a read_mostly global variable.
> I can create an initial version of such a doc, but I think it
> had better come as a separate patch.
Sure.
Thanks,
Ingo