linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 2/3] add new macros to make percpu readmostly section correctly align
@ 2010-12-02  2:02 Shaohua Li
  2010-12-10 15:14 ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-02  2:02 UTC (permalink / raw)
  To: lkml; +Cc: hpa, Andrew Morton, sam, eric.dumazet

percpu readmostly section should start and end at address cachline aligned.
Idealy we should change PERCPU_VADDR/PERCPU, but I can't change all arch code, so
I add new macros for x86.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

---
 include/asm-generic/vmlinux.lds.h |   66 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h	2010-12-02 09:22:32.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h	2010-12-02 09:32:42.000000000 +0800
@@ -726,6 +726,72 @@
 		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
 	}
 
+/**
+ * PERCPU_VADDR_CACHEALIGNED - define output section for percpu area
+ * @vaddr: explicit base address (optional)
+ * @phdr: destination PHDR (optional)
+ * @cacheline: cachline size required by readmostly percpu data
+ *
+ * Macro which expands to output section for percpu area.  If @vaddr
+ * is not blank, it specifies explicit base address and all percpu
+ * symbols will be offset from the given address.  If blank, @vaddr
+ * always equals @laddr + LOAD_OFFSET.
+ *
+ * @phdr defines the output PHDR to use if not blank.  Be warned that
+ * output PHDR is sticky.  If @phdr is specified, the next output
+ * section in the linker script will go there too.  @phdr should have
+ * a leading colon.
+ *
+ * Note that this macros defines __per_cpu_load as an absolute symbol.
+ * If there is no need to put the percpu section at a predetermined
+ * address, use PERCPU_CACHEALIGNED().
+ */
+#define PERCPU_VADDR_CACHEALIGNED(vaddr, phdr, cacheline)		\
+	VMLINUX_SYMBOL(__per_cpu_load) = .;				\
+	.data..percpu vaddr : AT(VMLINUX_SYMBOL(__per_cpu_load)		\
+				- LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__per_cpu_start) = .;			\
+		*(.data..percpu..first)					\
+		. = ALIGN(PAGE_SIZE);					\
+		*(.data..percpu..page_aligned)				\
+		. = ALIGN(cacheline);					\
+		*(.data..percpu..readmostly)				\
+		. = ALIGN(cacheline);					\
+		*(.data..percpu)					\
+		*(.data..percpu..shared_aligned)			\
+		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
+	} phdr								\
+	. = VMLINUX_SYMBOL(__per_cpu_load) + SIZEOF(.data..percpu);
+
+/**
+ * PERCPU_CACHEALIGNED - define output section for percpu area, simple version
+ * @align: required alignment
+ * @cacheline: cachline size required by readmostly percpu data
+ *
+ * Align to @align and outputs output section for percpu area.  This
+ * macro doesn't maniuplate @vaddr or @phdr and __per_cpu_load and
+ * __per_cpu_start will be identical.
+ *
+ * This macro is equivalent to ALIGN(align); PERCPU_VADDR_CACHEALIGNED( , ) except
+ * that __per_cpu_load is defined as a relative symbol against
+ * .data..percpu which is required for relocatable x86_32
+ * configuration.
+ */
+#define PERCPU_CACHEALIGNED(align, cacheline)				\
+	. = ALIGN(align);						\
+	.data..percpu	: AT(ADDR(.data..percpu) - LOAD_OFFSET) {	\
+		VMLINUX_SYMBOL(__per_cpu_load) = .;			\
+		VMLINUX_SYMBOL(__per_cpu_start) = .;			\
+		*(.data..percpu..first)					\
+		. = ALIGN(PAGE_SIZE);					\
+		*(.data..percpu..page_aligned)				\
+		. = ALIGN(cacheline);					\
+		*(.data..percpu..readmostly)				\
+		. = ALIGN(cacheline);					\
+		*(.data..percpu)					\
+		*(.data..percpu..shared_aligned)			\
+		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
+	}
 
 /*
  * Definition of the high level *_SECTION macros



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-02  2:02 [patch 2/3] add new macros to make percpu readmostly section correctly align Shaohua Li
@ 2010-12-10 15:14 ` Tejun Heo
  2010-12-13  0:41   ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-10 15:14 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

Hello,

On 12/02/2010 03:02 AM, Shaohua Li wrote:
> percpu readmostly section should start and end at address cachline aligned.
> Idealy we should change PERCPU_VADDR/PERCPU, but I can't change all arch code, so
> I add new macros for x86.
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>

Why add another set of macros?  Can't you just make the default one
aligned?

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-10 15:14 ` Tejun Heo
@ 2010-12-13  0:41   ` Shaohua Li
  2010-12-13  9:47     ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-13  0:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

On Fri, 2010-12-10 at 23:14 +0800, Tejun Heo wrote:
> Hello,
> 
> On 12/02/2010 03:02 AM, Shaohua Li wrote:
> > percpu readmostly section should start and end at address cachline aligned.
> > Idealy we should change PERCPU_VADDR/PERCPU, but I can't change all arch code, so
> > I add new macros for x86.
> > 
> > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> 
> Why add another set of macros?  Can't you just make the default one
> aligned?
how to do it without changing all arch code? There isn't cache line size
macro for all arch which could be used in vmlinux.lds.h, IIRC.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-13  0:41   ` Shaohua Li
@ 2010-12-13  9:47     ` Tejun Heo
  2010-12-14  1:08       ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-13  9:47 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

Hello,

On 12/13/2010 01:41 AM, Shaohua Li wrote:
>> Why add another set of macros?  Can't you just make the default one
>> aligned?
> how to do it without changing all arch code? There isn't cache line size
> macro for all arch which could be used in vmlinux.lds.h, IIRC.

Can't we get around that with some ifdefferies?  And even if we'll
have to change lds on every arch, I think we better do it that way.
It's not like the problem exists only on x86, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-13  9:47     ` Tejun Heo
@ 2010-12-14  1:08       ` Shaohua Li
  2010-12-14  9:58         ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-14  1:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

On Mon, 2010-12-13 at 17:47 +0800, Tejun Heo wrote:
> Hello,
> 
> On 12/13/2010 01:41 AM, Shaohua Li wrote:
> >> Why add another set of macros?  Can't you just make the default one
> >> aligned?
> > how to do it without changing all arch code? There isn't cache line size
> > macro for all arch which could be used in vmlinux.lds.h, IIRC.
> 
> Can't we get around that with some ifdefferies?  And even if we'll
> have to change lds on every arch, I think we better do it that way.
I don't understand what you mean. defining a cachine line size macro for
all archs? There is such macro, but using it in vmlinux.ld.h always
report error. There is some other defines which can't be included in a
link script.
> It's not like the problem exists only on x86, right?
yes, though currently only x86 has it.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-14  1:08       ` Shaohua Li
@ 2010-12-14  9:58         ` Tejun Heo
  2010-12-15  1:57           ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-14  9:58 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

Hello,

On 12/14/2010 02:08 AM, Shaohua Li wrote:
> I don't understand what you mean. defining a cachine line size macro
> for all archs? There is such macro, but using it in vmlinux.ld.h
> always report error. There is some other defines which can't be
> included in a link script.

I haven't really looked through it but wouldn't it be possible to
ifdef it.  ie. if cacheline macro is available, align it to it;
otherwise, don't.  And, ultimately, the correct thing to do is making
it cacheline aligned on all archs.  There can be several different
ways to get there but it might just as well be making cacheline size
available in all archs first and then update the PERCPU macro.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-14  9:58         ` Tejun Heo
@ 2010-12-15  1:57           ` Shaohua Li
  2010-12-15 14:08             ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-15  1:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

On Tue, 2010-12-14 at 17:58 +0800, Tejun Heo wrote:
> Hello,
> 
> On 12/14/2010 02:08 AM, Shaohua Li wrote:
> > I don't understand what you mean. defining a cachine line size macro
> > for all archs? There is such macro, but using it in vmlinux.ld.h
> > always report error. There is some other defines which can't be
> > included in a link script.
> 
> I haven't really looked through it but wouldn't it be possible to
> ifdef it.  ie. if cacheline macro is available, align it to it;
> otherwise, don't.  And, ultimately, the correct thing to do is making
> it cacheline aligned on all archs.  There can be several different
> ways to get there but it might just as well be making cacheline size
> available in all archs first and then update the PERCPU macro.
How about this one?

Subject: Make x86 percpu readmostly section correctly aligned

percpu readmostly section should start and end at address cachline aligned to
avoid cache false sharing. For ARCHs care about the cache false sharing,
they should define INTERNODE_CACHE_BYTES. We use it to do the alignment.
Currently only x86 has percpu readmostly section, so only changed it so far.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

---
 arch/x86/kernel/vmlinux.lds.S     |    2 +-
 include/asm-generic/vmlinux.lds.h |   11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h	2010-12-15 09:27:26.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h	2010-12-15 09:55:22.000000000 +0800
@@ -665,6 +665,13 @@
 	*(.discard.*)							\
 	}
 
+#ifdef INTERNODE_CACHE_BYTES
+#define INTERNODE_CACHEALIGNED						\
+	. = ALIGN(INTERNODE_CACHE_BYTES);
+#else
+#define INTERNODE_CACHEALIGNED
+#endif
+
 /**
  * PERCPU_VADDR - define output section for percpu area
  * @vaddr: explicit base address (optional)
@@ -692,7 +699,9 @@
 		*(.data..percpu..first)					\
 		. = ALIGN(PAGE_SIZE);					\
 		*(.data..percpu..page_aligned)				\
+		INTERNODE_CACHEALIGNED					\
 		*(.data..percpu..readmostly)				\
+		INTERNODE_CACHEALIGNED					\
 		*(.data..percpu)					\
 		*(.data..percpu..shared_aligned)			\
 		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
@@ -720,7 +729,9 @@
 		*(.data..percpu..first)					\
 		. = ALIGN(PAGE_SIZE);					\
 		*(.data..percpu..page_aligned)				\
+		INTERNODE_CACHEALIGNED					\
 		*(.data..percpu..readmostly)				\
+		INTERNODE_CACHEALIGNED					\
 		*(.data..percpu)					\
 		*(.data..percpu..shared_aligned)			\
 		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
Index: linux/arch/x86/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:01.000000000 +0800
+++ linux/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:06.000000000 +0800
@@ -20,11 +20,11 @@
 #define LOAD_OFFSET __START_KERNEL_map
 #endif
 
-#include <asm-generic/vmlinux.lds.h>
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
 #include <asm/page_types.h>
 #include <asm/cache.h>
+#include <asm-generic/vmlinux.lds.h>
 #include <asm/boot.h>
 
 #undef i386     /* in case the preprocessor is a 32bit one */



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-15  1:57           ` Shaohua Li
@ 2010-12-15 14:08             ` Tejun Heo
  2010-12-15 14:49               ` Tejun Heo
  2010-12-16  0:53               ` Shaohua Li
  0 siblings, 2 replies; 20+ messages in thread
From: Tejun Heo @ 2010-12-15 14:08 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

Hello,

On 12/15/2010 02:57 AM, Shaohua Li wrote:
> How about this one?

Much better.  :-)

> +#ifdef INTERNODE_CACHE_BYTES
> +#define INTERNODE_CACHEALIGNED						\
> +	. = ALIGN(INTERNODE_CACHE_BYTES);
> +#else
> +#define INTERNODE_CACHEALIGNED
> +#endif

Yeah, this looks good.

> Index: linux/arch/x86/kernel/vmlinux.lds.S
> ===================================================================
> --- linux.orig/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:01.000000000 +0800
> +++ linux/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:06.000000000 +0800
> @@ -20,11 +20,11 @@
>  #define LOAD_OFFSET __START_KERNEL_map
>  #endif
>  
> -#include <asm-generic/vmlinux.lds.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/thread_info.h>
>  #include <asm/page_types.h>
>  #include <asm/cache.h>
> +#include <asm-generic/vmlinux.lds.h>
>  #include <asm/boot.h>

Why do we need this chunk?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-15 14:08             ` Tejun Heo
@ 2010-12-15 14:49               ` Tejun Heo
  2010-12-16  0:55                 ` Shaohua Li
  2010-12-16  0:53               ` Shaohua Li
  1 sibling, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-15 14:49 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

Hello,

On 12/15/2010 03:08 PM, Tejun Heo wrote:
>> +#ifdef INTERNODE_CACHE_BYTES
>> +#define INTERNODE_CACHEALIGNED						\
>> +	. = ALIGN(INTERNODE_CACHE_BYTES);
>> +#else
>> +#define INTERNODE_CACHEALIGNED
>> +#endif
> 
> Yeah, this looks good.

Just one more thing.  If you look at various arch linker scripts,
cache line size is always present.  After all, RW_DATA_SECTION() needs
it.  They're different define's or sometimes hardcoded values but it
would be nice if we can take this chance to unify them and use it for
this too.  Would you be interested in doing that?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-15 14:08             ` Tejun Heo
  2010-12-15 14:49               ` Tejun Heo
@ 2010-12-16  0:53               ` Shaohua Li
  2010-12-16  5:46                 ` Sam Ravnborg
  1 sibling, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-16  0:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

On Wed, 2010-12-15 at 22:08 +0800, Tejun Heo wrote:
> Hello,
> 
> On 12/15/2010 02:57 AM, Shaohua Li wrote:
> > How about this one?
> 
> Much better.  :-)
> 
> > +#ifdef INTERNODE_CACHE_BYTES
> > +#define INTERNODE_CACHEALIGNED						\
> > +	. = ALIGN(INTERNODE_CACHE_BYTES);
> > +#else
> > +#define INTERNODE_CACHEALIGNED
> > +#endif
> 
> Yeah, this looks good.
> 
> > Index: linux/arch/x86/kernel/vmlinux.lds.S
> > ===================================================================
> > --- linux.orig/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:01.000000000 +0800
> > +++ linux/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:06.000000000 +0800
> > @@ -20,11 +20,11 @@
> >  #define LOAD_OFFSET __START_KERNEL_map
> >  #endif
> >  
> > -#include <asm-generic/vmlinux.lds.h>
> >  #include <asm/asm-offsets.h>
> >  #include <asm/thread_info.h>
> >  #include <asm/page_types.h>
> >  #include <asm/cache.h>
> > +#include <asm-generic/vmlinux.lds.h>
> >  #include <asm/boot.h>
> 
> Why do we need this chunk?
the cache size is defined in cache.h, so I need move vmlinux.lds.h after
cache.h

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-15 14:49               ` Tejun Heo
@ 2010-12-16  0:55                 ` Shaohua Li
  0 siblings, 0 replies; 20+ messages in thread
From: Shaohua Li @ 2010-12-16  0:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, hpa, Andrew Morton, sam, eric.dumazet

On Wed, 2010-12-15 at 22:49 +0800, Tejun Heo wrote:
> Hello,
> 
> On 12/15/2010 03:08 PM, Tejun Heo wrote:
> >> +#ifdef INTERNODE_CACHE_BYTES
> >> +#define INTERNODE_CACHEALIGNED						\
> >> +	. = ALIGN(INTERNODE_CACHE_BYTES);
> >> +#else
> >> +#define INTERNODE_CACHEALIGNED
> >> +#endif
> > 
> > Yeah, this looks good.
> 
> Just one more thing.  If you look at various arch linker scripts,
> cache line size is always present.  After all, RW_DATA_SECTION() needs
> it.  They're different define's or sometimes hardcoded values but it
> would be nice if we can take this chance to unify them and use it for
> this too.  Would you be interested in doing that?
might not, sorry.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-16  0:53               ` Shaohua Li
@ 2010-12-16  5:46                 ` Sam Ravnborg
  2010-12-16  5:56                   ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Sam Ravnborg @ 2010-12-16  5:46 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, lkml, hpa, Andrew Morton, eric.dumazet

On Thu, Dec 16, 2010 at 08:53:37AM +0800, Shaohua Li wrote:
> On Wed, 2010-12-15 at 22:08 +0800, Tejun Heo wrote:
> > Hello,
> > 
> > On 12/15/2010 02:57 AM, Shaohua Li wrote:
> > > How about this one?
> > 
> > Much better.  :-)
> > 
> > > +#ifdef INTERNODE_CACHE_BYTES
> > > +#define INTERNODE_CACHEALIGNED						\
> > > +	. = ALIGN(INTERNODE_CACHE_BYTES);
> > > +#else
> > > +#define INTERNODE_CACHEALIGNED
> > > +#endif
> > 
> > Yeah, this looks good.
> > 
> > > Index: linux/arch/x86/kernel/vmlinux.lds.S
> > > ===================================================================
> > > --- linux.orig/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:01.000000000 +0800
> > > +++ linux/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:06.000000000 +0800
> > > @@ -20,11 +20,11 @@
> > >  #define LOAD_OFFSET __START_KERNEL_map
> > >  #endif
> > >  
> > > -#include <asm-generic/vmlinux.lds.h>
> > >  #include <asm/asm-offsets.h>
> > >  #include <asm/thread_info.h>
> > >  #include <asm/page_types.h>
> > >  #include <asm/cache.h>
> > > +#include <asm-generic/vmlinux.lds.h>
> > >  #include <asm/boot.h>
> > 
> > Why do we need this chunk?
> the cache size is defined in cache.h, so I need move vmlinux.lds.h after
> cache.h

The right fix is to move the inclusion of cache.h to asm-generic/vmlinux.lds.h.
A quick audit only found sparc that failed to guard non assembler stuff.

	Sam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-16  5:46                 ` Sam Ravnborg
@ 2010-12-16  5:56                   ` Shaohua Li
  2010-12-16  9:50                     ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-16  5:56 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Tejun Heo, lkml, hpa, Andrew Morton, eric.dumazet

On Thu, 2010-12-16 at 13:46 +0800, Sam Ravnborg wrote:
> On Thu, Dec 16, 2010 at 08:53:37AM +0800, Shaohua Li wrote:
> > On Wed, 2010-12-15 at 22:08 +0800, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On 12/15/2010 02:57 AM, Shaohua Li wrote:
> > > > How about this one?
> > > 
> > > Much better.  :-)
> > > 
> > > > +#ifdef INTERNODE_CACHE_BYTES
> > > > +#define INTERNODE_CACHEALIGNED						\
> > > > +	. = ALIGN(INTERNODE_CACHE_BYTES);
> > > > +#else
> > > > +#define INTERNODE_CACHEALIGNED
> > > > +#endif
> > > 
> > > Yeah, this looks good.
> > > 
> > > > Index: linux/arch/x86/kernel/vmlinux.lds.S
> > > > ===================================================================
> > > > --- linux.orig/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:01.000000000 +0800
> > > > +++ linux/arch/x86/kernel/vmlinux.lds.S	2010-12-15 09:37:06.000000000 +0800
> > > > @@ -20,11 +20,11 @@
> > > >  #define LOAD_OFFSET __START_KERNEL_map
> > > >  #endif
> > > >  
> > > > -#include <asm-generic/vmlinux.lds.h>
> > > >  #include <asm/asm-offsets.h>
> > > >  #include <asm/thread_info.h>
> > > >  #include <asm/page_types.h>
> > > >  #include <asm/cache.h>
> > > > +#include <asm-generic/vmlinux.lds.h>
> > > >  #include <asm/boot.h>
> > > 
> > > Why do we need this chunk?
> > the cache size is defined in cache.h, so I need move vmlinux.lds.h after
> > cache.h
> 
> The right fix is to move the inclusion of cache.h to asm-generic/vmlinux.lds.h.
> A quick audit only found sparc that failed to guard non assembler stuff.
with this, we need check every arch, at least doing a compile. I'm
afraid I can't, sorry.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-16  5:56                   ` Shaohua Li
@ 2010-12-16  9:50                     ` Tejun Heo
  2010-12-20  1:28                       ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-16  9:50 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Sam Ravnborg, lkml, hpa, Andrew Morton, eric.dumazet

Hello, Shaohua.

On 12/16/2010 06:56 AM, Shaohua Li wrote:
>>>>> -#include <asm-generic/vmlinux.lds.h>
>>>>>  #include <asm/asm-offsets.h>
>>>>>  #include <asm/thread_info.h>
>>>>>  #include <asm/page_types.h>
>>>>>  #include <asm/cache.h>
>>>>> +#include <asm-generic/vmlinux.lds.h>
>>>>>  #include <asm/boot.h>
>>>>
>>>> Why do we need this chunk?
>>> the cache size is defined in cache.h, so I need move vmlinux.lds.h after
>>> cache.h
>>
>> The right fix is to move the inclusion of cache.h to
>> asm-generic/vmlinux.lds.h.  A quick audit only found sparc that
>> failed to guard non assembler stuff.
>
> with this, we need check every arch, at least doing a compile. I'm
> afraid I can't, sorry.

Not being able to cross build every arch is okay but you at least need
to make an effort to make things easily applicable to other archs and
avoid adding subtle ugliness like the above.  Please at least try to
look at other arch codes and see how things can be made to work across
different archs.  Setting up cross compilers for the major archs, for
example, sparc, power and ia64 isn't that difficult either.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-16  9:50                     ` Tejun Heo
@ 2010-12-20  1:28                       ` Shaohua Li
  2010-12-20 15:55                         ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-20  1:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Sam Ravnborg, lkml, hpa, Andrew Morton, eric.dumazet

On Thu, 2010-12-16 at 17:50 +0800, Tejun Heo wrote:
> Hello, Shaohua.
> 
> On 12/16/2010 06:56 AM, Shaohua Li wrote:
> >>>>> -#include <asm-generic/vmlinux.lds.h>
> >>>>>  #include <asm/asm-offsets.h>
> >>>>>  #include <asm/thread_info.h>
> >>>>>  #include <asm/page_types.h>
> >>>>>  #include <asm/cache.h>
> >>>>> +#include <asm-generic/vmlinux.lds.h>
> >>>>>  #include <asm/boot.h>
> >>>>
> >>>> Why do we need this chunk?
> >>> the cache size is defined in cache.h, so I need move vmlinux.lds.h after
> >>> cache.h
> >>
> >> The right fix is to move the inclusion of cache.h to
> >> asm-generic/vmlinux.lds.h.  A quick audit only found sparc that
> >> failed to guard non assembler stuff.
> >
> > with this, we need check every arch, at least doing a compile. I'm
> > afraid I can't, sorry.
> 
> Not being able to cross build every arch is okay but you at least need
> to make an effort to make things easily applicable to other archs and
> avoid adding subtle ugliness like the above.  Please at least try to
> look at other arch codes and see how things can be made to work across
> different archs.  Setting up cross compilers for the major archs, for
> example, sparc, power and ia64 isn't that difficult either.
This still needs I fix every arch, for example, as Sam pointed out,
spark build will fail. I really have the bandwidth and capability to do
this. Increment changes are always preferred. My original patch is
trying to follow increment changes way.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-20  1:28                       ` Shaohua Li
@ 2010-12-20 15:55                         ` Tejun Heo
  2010-12-23  2:38                           ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-20 15:55 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Sam Ravnborg, lkml, hpa, Andrew Morton, eric.dumazet

Hello,

On 12/20/2010 02:28 AM, Shaohua Li wrote:
>> Not being able to cross build every arch is okay but you at least need
>> to make an effort to make things easily applicable to other archs and
>> avoid adding subtle ugliness like the above.  Please at least try to
>> look at other arch codes and see how things can be made to work across
>> different archs.  Setting up cross compilers for the major archs, for
>> example, sparc, power and ia64 isn't that difficult either.
>
> This still needs I fix every arch, for example, as Sam pointed out,
> spark build will fail. I really have the bandwidth and capability to do
> this. Increment changes are always preferred. My original patch is
> trying to follow increment changes way.

Yeah, well, in my experience, those approaches usually just end up
piling partial conversions on top of different set of partial
conversions.  It's not like we're talking about a major arch dependent
feature it's just a matter of looking through different arch codes,
maybe test building some, pushing things to linux-next and dealing
with fallouts if there is any.  I'll try to do it myself.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-20 15:55                         ` Tejun Heo
@ 2010-12-23  2:38                           ` Shaohua Li
  2010-12-27 12:14                             ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-23  2:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Sam Ravnborg, lkml, hpa, Andrew Morton, eric.dumazet

On Mon, 2010-12-20 at 23:55 +0800, Tejun Heo wrote:
> Hello,
> 
> On 12/20/2010 02:28 AM, Shaohua Li wrote:
> >> Not being able to cross build every arch is okay but you at least need
> >> to make an effort to make things easily applicable to other archs and
> >> avoid adding subtle ugliness like the above.  Please at least try to
> >> look at other arch codes and see how things can be made to work across
> >> different archs.  Setting up cross compilers for the major archs, for
> >> example, sparc, power and ia64 isn't that difficult either.
> >
> > This still needs I fix every arch, for example, as Sam pointed out,
> > spark build will fail. I really have the bandwidth and capability to do
> > this. Increment changes are always preferred. My original patch is
> > trying to follow increment changes way.
> 
> Yeah, well, in my experience, those approaches usually just end up
> piling partial conversions on top of different set of partial
> conversions.  It's not like we're talking about a major arch dependent
> feature it's just a matter of looking through different arch codes,
> maybe test building some, pushing things to linux-next and dealing
> with fallouts if there is any.  I'll try to do it myself.
Thanks for doing it and sorry I didn't do it.
The first patch in the series isn't related to this issue, please
consider merge it.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-23  2:38                           ` Shaohua Li
@ 2010-12-27 12:14                             ` Tejun Heo
  2010-12-28  0:26                               ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-12-27 12:14 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Sam Ravnborg, lkml, hpa, Andrew Morton, eric.dumazet

On Thu, Dec 23, 2010 at 10:38:34AM +0800, Shaohua Li wrote:
> The first patch in the series isn't related to this issue, please
> consider merge it.

Can you please re-post it?  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-27 12:14                             ` Tejun Heo
@ 2010-12-28  0:26                               ` Shaohua Li
  2010-12-28 11:13                                 ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2010-12-28  0:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Sam Ravnborg, lkml, hpa, Andrew Morton, eric.dumazet

On Mon, 2010-12-27 at 20:14 +0800, Tejun Heo wrote:
> On Thu, Dec 23, 2010 at 10:38:34AM +0800, Shaohua Li wrote:
> > The first patch in the series isn't related to this issue, please
> > consider merge it.
> 
> Can you please re-post it?  Thanks.
here it is.

Subject: make readmostly section correctly align

readmostly section should end at cache line aligned address, otherwise the last
several data might share cachline with other data and make the readmostly data
still have cache bounce.
For example, in ia64, secpath_cachep is the last readmostly data, and it shares
cacheline with init_uts_ns.
a000000100e80480 d secpath_cachep
a000000100e80488 D init_uts_ns

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

---
 include/asm-generic/vmlinux.lds.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h	2010-12-01 16:49:48.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h	2010-12-02 09:22:32.000000000 +0800
@@ -192,7 +192,8 @@
 
 #define READ_MOSTLY_DATA(align)						\
 	. = ALIGN(align);						\
-	*(.data..read_mostly)
+	*(.data..read_mostly)						\
+	. = ALIGN(align);
 
 #define CACHELINE_ALIGNED_DATA(align)					\
 	. = ALIGN(align);						\



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/3] add new macros to make percpu readmostly section correctly align
  2010-12-28  0:26                               ` Shaohua Li
@ 2010-12-28 11:13                                 ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2010-12-28 11:13 UTC (permalink / raw)
  To: Shaohua Li, Andrew Morton; +Cc: Sam Ravnborg, lkml, hpa, eric.dumazet

Hello,

On Tue, Dec 28, 2010 at 08:26:57AM +0800, Shaohua Li wrote:
> Subject: make readmostly section correctly align
> 
> readmostly section should end at cache line aligned address, otherwise the last
> several data might share cachline with other data and make the readmostly data
> still have cache bounce.
> For example, in ia64, secpath_cachep is the last readmostly data, and it shares
> cacheline with init_uts_ns.
> a000000100e80480 d secpath_cachep
> a000000100e80488 D init_uts_ns
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>

Yeap, this looks good to me.

 Acked-by: Tejun Heo <tj@kernel.org>

I can route this through percpu tree but think -mm is the better fit.
Andrew, can you please take this one?

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-12-28 11:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-02  2:02 [patch 2/3] add new macros to make percpu readmostly section correctly align Shaohua Li
2010-12-10 15:14 ` Tejun Heo
2010-12-13  0:41   ` Shaohua Li
2010-12-13  9:47     ` Tejun Heo
2010-12-14  1:08       ` Shaohua Li
2010-12-14  9:58         ` Tejun Heo
2010-12-15  1:57           ` Shaohua Li
2010-12-15 14:08             ` Tejun Heo
2010-12-15 14:49               ` Tejun Heo
2010-12-16  0:55                 ` Shaohua Li
2010-12-16  0:53               ` Shaohua Li
2010-12-16  5:46                 ` Sam Ravnborg
2010-12-16  5:56                   ` Shaohua Li
2010-12-16  9:50                     ` Tejun Heo
2010-12-20  1:28                       ` Shaohua Li
2010-12-20 15:55                         ` Tejun Heo
2010-12-23  2:38                           ` Shaohua Li
2010-12-27 12:14                             ` Tejun Heo
2010-12-28  0:26                               ` Shaohua Li
2010-12-28 11:13                                 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).