* [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue
@ 2011-08-05 13:15 Borislav Petkov
  2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Andrew Morton
  Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

Hi,

a small refinement of the patchset from yesterday per hpa's comments:

* put mask and flags into a single cacheline and make it __read_mostly

* change the alignment computation back to clearing bits [14:12] so that
a mask of 0x0 has no effect on the address (see the sketch below).
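
A minimal user-space sketch of that property (the 0x7000 mask below is
an assumed F15h value, not queried from hardware):

#include <stdio.h>

/* Model of the alignment step: round up, then clear the masked
 * bits. A mask of 0x0 makes both operations no-ops. */
static unsigned long align(unsigned long addr, unsigned long mask)
{
	return (addr + mask) & ~mask;
}

int main(void)
{
	printf("%#lx\n", align(0x7f0123456000UL, 0x7000UL)); /* aligned up */
	printf("%#lx\n", align(0x7f0123456000UL, 0x0UL));    /* unchanged  */
	return 0;
}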

Please take a look and apply, if no objections.

Thanks.

---
Changelog:

v3:
here's an updated and revised patchset addressing all comments from last
time:

* saturate bits [14:12] instead of clearing them
* calculate the mask from the CPUID 0x8000_0005 IC identifier instead of
hardcoding it (see the sketch below)
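
For reference, a user-space sketch of that derivation - CPUID
0x8000_0005 EDX[31:24] is the L1I size in KB, EDX[23:16] its
associativity; the 64K/2-way figures below are assumed Bulldozer
values, not read from a real CPU:

#include <stdio.h>

#define PAGE_MASK	(~0xfffUL)

int main(void)
{
	unsigned int edx       = (64U << 24) | (2U << 16); /* fake CPUID EDX */
	unsigned int assoc     = (edx >> 16) & 0xff;
	unsigned long upperbit = ((unsigned long)(edx >> 24) << 10) / assoc;
	unsigned long mask     = (upperbit - 1) & PAGE_MASK;

	/* prints: way size 0x8000, mask 0x7000, i.e. bits [14:12] */
	printf("way size: %#lx, va_align.mask: %#lx\n", upperbit, mask);
	return 0;
}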

v2:
here's the second version of this patch, which actually turned into a
small patchset. As Ingo suggested, the initial patch stays first to ease
backporting, and the following 3 patches address (hopefully) all review
comments from the initial submission. The patchset has been tested with
Debian's old stable lenny (i.e. 5.0) distro in a 32-bit environment and
everything worked as expected.

Below is some performance data showing that the changeset introduces no
noticeable performance degradation.

So please, do take a look again and let me know.

Thanks.

VA alignment enabled
====================

 Performance counter stats for './build.sh' (10 runs):

    3187047.935990 task-clock                #   24.001 CPUs utilized            ( +-  1.37% )
           510,888 context-switches          #    0.000 M/sec                    ( +-  0.44% )
            60,712 CPU-migrations            #    0.000 M/sec                    ( +-  0.51% )
        26,046,891 page-faults               #    0.008 M/sec                    ( +-  0.00% )
 1,841,068,123,735 cycles                    #    0.578 GHz                      ( +-  1.10% ) [63.39%]
   560,044,437,348 stalled-cycles-frontend   #   30.42% frontend cycles idle     ( +-  1.13% ) [64.65%]
   436,165,228,465 stalled-cycles-backend    #   23.69% backend  cycles idle     ( +-  1.19% ) [67.21%]
 1,461,854,088,667 instructions              #    0.79  insns per cycle
                                             #    0.38  stalled cycles per insn  ( +-  0.77% ) [70.31%]
   334,169,452,362 branches                  #  104.852 M/sec                    ( +-  1.20% ) [69.43%]
    21,485,007,982 branch-misses             #    6.43% of all branches          ( +-  0.68% ) [65.01%]

     132.787483539 seconds time elapsed                                          ( +-  1.37% )


VA alignment disabled
=====================

Performance counter stats for './build.sh' (10 runs):

    3173688.887193 task-clock                #   24.001 CPUs utilized            ( +-  1.37% )
           511,425 context-switches          #    0.000 M/sec                    ( +-  0.28% )
            60,522 CPU-migrations            #    0.000 M/sec                    ( +-  0.60% )
        26,046,902 page-faults               #    0.008 M/sec                    ( +-  0.00% )
 1,832,825,813,094 cycles                    #    0.578 GHz                      ( +-  0.96% ) [63.60%]
   563,123,451,900 stalled-cycles-frontend   #   30.72% frontend cycles idle     ( +-  0.96% ) [63.97%]
   439,565,070,106 stalled-cycles-backend    #   23.98% backend  cycles idle     ( +-  1.23% ) [66.69%]
 1,465,314,643,020 instructions              #    0.80  insns per cycle
                                             #    0.38  stalled cycles per insn  ( +-  0.74% ) [70.11%]
   332,416,669,982 branches                  #  104.741 M/sec                    ( +-  0.85% ) [69.71%]
    21,181,821,204 branch-misses             #    6.37% of all branches          ( +-  0.97% ) [65.93%]

     132.230903628 seconds time elapsed                                          ( +-  1.37% )

stock 3.0
=========

 Performance counter stats for './build.sh' (10 runs):

    3369707.240439 task-clock                #   24.001 CPUs utilized            ( +-  1.18% )
           510,450 context-switches          #    0.000 M/sec                    ( +-  0.29% )
            58,906 CPU-migrations            #    0.000 M/sec                    ( +-  0.35% )
        26,057,272 page-faults               #    0.008 M/sec                    ( +-  0.00% )
 1,836,326,075,063 cycles                    #    0.545 GHz                      ( +-  1.05% ) [63.51%]
   561,850,647,545 stalled-cycles-frontend   #   30.60% frontend cycles idle     ( +-  1.03% ) [64.17%]
   439,923,021,200 stalled-cycles-backend    #   23.96% backend  cycles idle     ( +-  1.10% ) [66.64%]
 1,467,236,934,265 instructions              #    0.80  insns per cycle
                                             #    0.38  stalled cycles per insn  ( +-  0.87% ) [70.06%]
   331,937,054,120 branches                  #   98.506 M/sec                    ( +-  0.81% ) [69.83%]
    21,228,553,080 branch-misses             #    6.40% of all branches          ( +-  0.87% ) [65.79%]

     140.398317711 seconds time elapsed                                          ( +-  1.18% )




* [PATCH -v3.1 1/3] x86, AMD: Correct F15h IC aliasing issue
  2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
@ 2011-08-05 13:15 ` Borislav Petkov
  2011-08-05 22:58   ` [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h tip-bot for Borislav Petkov
  2011-08-05 13:15 ` [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper Borislav Petkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Andrew Morton
  Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.

This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits to be the same for all mappings of a single shared
library across processes and, in doing so, avoiding instruction cache
aliases.
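
As an illustration - addresses below are made up - two cross-process
mappings of the same file that differ in bits [14:12] select different
I$ indexes:

#include <stdio.h>

int main(void)
{
	unsigned long va1 = 0x7f0000001000UL;	/* mapping in process A */
	unsigned long va2 = 0x7f0000004000UL;	/* same file, process B */

	/* 1 vs 4: the same cache line lands in two different index slots */
	printf("index slice A: %#lx\n", (va1 >> 12) & 0x7);
	printf("index slice B: %#lx\n", (va2 >> 12) & 0x7);
	return 0;
}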

It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or both, or be completely disabled.
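
For example, forcing it off would look like this on the kernel command
line (the image name and root device are illustrative):

	vmlinuz-3.0 root=/dev/sda1 ro align_va_addr=off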

This change leaves virtual region address allocation on other families
and/or vendors unaffected.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 Documentation/kernel-parameters.txt |   13 ++++++
 arch/x86/include/asm/elf.h          |   31 +++++++++++++
 arch/x86/kernel/cpu/amd.c           |   13 ++++++
 arch/x86/kernel/sys_x86_64.c        |   81 +++++++++++++++++++++++++++++++++-
 arch/x86/mm/mmap.c                  |   15 ------
 arch/x86/vdso/vma.c                 |    9 ++++
 6 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index aa47be7..af73c03 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -299,6 +299,19 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			behaviour to be specified.  Bit 0 enables warnings,
 			bit 1 enables fixups, and bit 2 sends a segfault.
 
+	align_va_addr=	[X86-64]
+			Align virtual addresses by clearing slice [14:12] when
+			allocating a VMA at process creation time. This option
+			gives you up to 3% performance improvement on AMD F15h
+			machines (where it is enabled by default) for a
+			CPU-intensive style benchmark, and it can vary highly in
+			a microbenchmark depending on workload and compiler.
+
+			32: only for 32-bit processes
+			64: only for 64-bit processes
+			on: enable for both 32- and 64-bit processes
+			off: disable for both 32- and 64-bit processes
+
 	amd_iommu=	[HW,X86-84]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index f2ad216..5f962df 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -4,6 +4,7 @@
 /*
  * ELF register definitions..
  */
+#include <linux/thread_info.h>
 
 #include <asm/ptrace.h>
 #include <asm/user.h>
@@ -320,4 +321,34 @@ extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
 extern unsigned long arch_randomize_brk(struct mm_struct *mm);
 #define arch_randomize_brk arch_randomize_brk
 
+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+#ifdef CONFIG_X86_32
+	return 1;
+#endif
+#ifdef CONFIG_IA32_EMULATION
+	if (test_thread_flag(TIF_IA32))
+		return 1;
+#endif
+	return 0;
+}
+
+/* The first two values are special, do not change. See align_addr() */
+enum align_flags {
+	ALIGN_VA_32	= BIT(0),
+	ALIGN_VA_64	= BIT(1),
+	ALIGN_VDSO	= BIT(2),
+	ALIGN_TOPDOWN	= BIT(3),
+};
+
+struct va_alignment {
+	int flags;
+	unsigned long mask;
+} ____cacheline_aligned;
+
+extern struct va_alignment va_align;
+extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
 #endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b13ed39..b0234bc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 					"with P0 frequency!\n");
 		}
 	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid	 = cpuid_edx(0x80000005);
+		assoc	 = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
+		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
+
+	}
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index ff14a50..aaa8d09 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,6 +18,72 @@
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
+struct __read_mostly va_alignment va_align = {
+	.flags = -1,
+};
+
+/*
+ * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
+ *
+ * @flags denotes the allocation direction - bottomup or topdown -
+ * or vDSO; see call sites below.
+ */
+unsigned long align_addr(unsigned long addr, struct file *filp,
+			 enum align_flags flags)
+{
+	unsigned long tmp_addr;
+
+	/* handle 32- and 64-bit case with a single conditional */
+	if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
+		return addr;
+
+	if (!(current->flags & PF_RANDOMIZE))
+		return addr;
+
+	if (!((flags & ALIGN_VDSO) || filp))
+		return addr;
+
+	tmp_addr = addr;
+
+	/*
+	 * We need an address which is <= the original
+	 * one only when in topdown direction.
+	 */
+	if (!(flags & ALIGN_TOPDOWN))
+		tmp_addr += va_align.mask;
+
+	tmp_addr &= ~va_align.mask;
+
+	return tmp_addr;
+}
+
+static int __init control_va_addr_alignment(char *str)
+{
+	/* guard against enabling this on other CPU families */
+	if (va_align.flags < 0)
+		return 1;
+
+	if (*str == 0)
+		return 1;
+
+	if (*str == '=')
+		str++;
+
+	if (!strcmp(str, "32"))
+		va_align.flags = ALIGN_VA_32;
+	else if (!strcmp(str, "64"))
+		va_align.flags = ALIGN_VA_64;
+	else if (!strcmp(str, "off"))
+		va_align.flags = 0;
+	else if (!strcmp(str, "on"))
+		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+	else
+		return 0;
+
+	return 1;
+}
+__setup("align_va_addr", control_va_addr_alignment);
+
 SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 		unsigned long, prot, unsigned long, flags,
 		unsigned long, fd, unsigned long, off)
@@ -92,6 +158,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	start_addr = addr;
 
 full_search:
+
+	addr = align_addr(addr, filp, 0);
+
 	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
 		/* At this point:  (!vma || addr < vma->vm_end). */
 		if (end - len < addr) {
@@ -117,6 +186,7 @@ full_search:
 			mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = vma->vm_end;
+		addr = align_addr(addr, filp, 0);
 	}
 }
 
@@ -161,10 +231,13 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	/* make sure it can fit in the remaining address space */
 	if (addr > len) {
-		vma = find_vma(mm, addr-len);
-		if (!vma || addr <= vma->vm_start)
+		unsigned long tmp_addr = align_addr(addr - len, filp,
+						    ALIGN_TOPDOWN);
+
+		vma = find_vma(mm, tmp_addr);
+		if (!vma || tmp_addr + len <= vma->vm_start)
 			/* remember the address as a hint for next time */
-			return mm->free_area_cache = addr-len;
+			return mm->free_area_cache = tmp_addr;
 	}
 
 	if (mm->mmap_base < len)
@@ -173,6 +246,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	addr = mm->mmap_base-len;
 
 	do {
+		addr = align_addr(addr, filp, ALIGN_TOPDOWN);
+
 		/*
 		 * Lookup failure means no vma is above this address,
 		 * else if new region fits below vma->vm_start,
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1dab519..d4c0736 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -51,21 +51,6 @@ static unsigned int stack_maxrandom_size(void)
 #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
 #define MAX_GAP (TASK_SIZE/6*5)
 
-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static int mmap_is_ia32(void)
-{
-#ifdef CONFIG_X86_32
-	return 1;
-#endif
-#ifdef CONFIG_IA32_EMULATION
-	if (test_thread_flag(TIF_IA32))
-		return 1;
-#endif
-	return 0;
-}
-
 static int mmap_is_legacy(void)
 {
 	if (current->personality & ADDR_COMPAT_LAYOUT)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 7abd2be..caa42ce 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -69,6 +69,15 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 	addr = start + (offset << PAGE_SHIFT);
 	if (addr >= end)
 		addr = end;
+
+	/*
+	 * page-align it here so that get_unmapped_area doesn't
+	 * align it wrongfully again to the next page. addr can come in 4K
+	 * unaligned here as a result of stack start randomization.
+	 */
+	addr = PAGE_ALIGN(addr);
+	addr = align_addr(addr, NULL, ALIGN_VDSO);
+
 	return addr;
 }
 
-- 
1.7.4.rc2



* [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper
  2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
  2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
@ 2011-08-05 13:15 ` Borislav Petkov
  2011-08-05 13:15 ` [PATCH -v3.1 3/3] x86, AMD: Move BSP code to " Borislav Petkov
  2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
  3 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Andrew Morton
  Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

Add a function ptr to struct x86_cpuinit_ops which is destined to be run
only once on the BSP during boot.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/x86_init.h |    1 +
 arch/x86/kernel/cpu/common.c    |    2 ++
 arch/x86/kernel/x86_init.c      |    1 +
 3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index d3d8590..08994a0 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -147,6 +147,7 @@ struct x86_init_ops {
  */
 struct x86_cpuinit_ops {
 	void (*setup_percpu_clockev)(void);
+	void (*run_on_bsp)(void);
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..465e633 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	filter_cpuid_features(c, false);
 
 	setup_smep(c);
+
+	x86_cpuinit.run_on_bsp();
 }
 
 void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f164bd..76b37ed 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -90,6 +90,7 @@ struct x86_init_ops x86_init __initdata = {
 
 struct x86_cpuinit_ops x86_cpuinit __cpuinitdata = {
 	.setup_percpu_clockev		= setup_secondary_APIC_clock,
+	.run_on_bsp			= x86_init_noop,
 };
 
 static void default_nmi_init(void) { };
-- 
1.7.4.rc2



* [PATCH -v3.1 3/3] x86, AMD: Move BSP code to cpuinit helper
  2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
  2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
  2011-08-05 13:15 ` [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper Borislav Petkov
@ 2011-08-05 13:15 ` Borislav Petkov
  2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
  3 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Andrew Morton
  Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

Move code which is run once on the BSP during boot into the cpuinit
helper.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/amd.c |   60 +++++++++++++++++++++++---------------------
 1 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..16939b8 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,36 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
 #endif
 }
 
+static void __cpuinit amd_run_on_bsp(void)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+
+		if (c->x86 > 0x10 ||
+		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+			u64 val;
+
+			rdmsrl(MSR_K7_HWCR, val);
+			if (!(val & BIT(24)))
+				printk(KERN_WARNING FW_BUG "TSC doesn't count "
+					"with P0 frequency!\n");
+		}
+	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid	 = cpuid_edx(0x80000005);
+		assoc	 = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
+		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
+	}
+}
+
 static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 {
 	early_init_amd_mc(c);
@@ -442,35 +472,7 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 	}
 #endif
 
-	/* We need to do the following only once */
-	if (c != &boot_cpu_data)
-		return;
-
-	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
-		if (c->x86 > 0x10 ||
-		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
-			u64 val;
-
-			rdmsrl(MSR_K7_HWCR, val);
-			if (!(val & BIT(24)))
-				printk(KERN_WARNING FW_BUG "TSC doesn't count "
-					"with P0 frequency!\n");
-		}
-	}
-
-	if (c->x86 == 0x15) {
-		unsigned long upperbit;
-		u32 cpuid, assoc;
-
-		cpuid	 = cpuid_edx(0x80000005);
-		assoc	 = cpuid >> 16 & 0xff;
-		upperbit = ((cpuid >> 24) << 10) / assoc;
-
-		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
-		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
-
-	}
+	x86_cpuinit.run_on_bsp = amd_run_on_bsp;
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
-- 
1.7.4.rc2



* Re: [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue
  2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
                   ` (2 preceding siblings ...)
  2011-08-05 13:15 ` [PATCH -v3.1 3/3] x86, AMD: Move BSP code to " Borislav Petkov
@ 2011-08-05 17:10 ` H. Peter Anvin
  2011-08-05 17:55   ` Borislav Petkov
  3 siblings, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-05 17:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Avi Kivity, Andre Przywara, Martin Pohlack, LKML,
	Borislav Petkov

On 08/05/2011 06:15 AM, Borislav Petkov wrote:
> From: Borislav Petkov <borislav.petkov@amd.com>
> 
> Hi,
> 
> a small refinement of the patchset from yesterday per hpa's comments:
> 
> * put mask and flags into a single cacheline and make it __read_mostly
> 
> * change alignment computation back to clearing bits [14:12] so that a
> mask of 0x0 can have no effect on the address.
> 
> Please take a look and apply, if no objections.
> 

Patch 1 looks good now.

Patch 2 I'm going to object to because it puts your run_on_bsp method
into a different structure from the one where all the existing methods
for this already live, in a way that looks totally gratuitous to me.
Why not just have a c_bsp_init on struct cpu_dev like all the other
methods?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue
  2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
@ 2011-08-05 17:55   ` Borislav Petkov
  2011-08-05 18:01     ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
  2011-08-05 18:04     ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
  0 siblings, 2 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 17:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML

On Fri, Aug 05, 2011 at 01:10:29PM -0400, H. Peter Anvin wrote:
> Patch 2 I'm going to object to because it puts your run_on_bsp method
> into a different structure from the one where all the existing methods
> for this already live, in a way that looks totally gratuitous to me.
> Why not just have a c_bsp_init on struct cpu_dev like all the other
> methods?

Indeed, cpu_dev is actually the only logical place to put them, thanks.
I'm sending updated versions as a reply to this message.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper
  2011-08-05 17:55   ` Borislav Petkov
@ 2011-08-05 18:01     ` Borislav Petkov
  2011-08-05 22:58       ` [tip:x86/cpu] " tip-bot for Borislav Petkov
  2011-08-05 18:04     ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
  1 sibling, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 18:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML

Add a function ptr to struct cpu_dev which is destined to be run only
once on the BSP during boot.
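
A user-space model of the hook - struct and names mirror the patch, the
rest is scaffolding to make it runnable:

#include <stdio.h>

struct cpuinfo_x86 { int x86; };

struct cpu_dev {
	void (*c_early_init)(struct cpuinfo_x86 *);
	void (*c_bsp_init)(struct cpuinfo_x86 *);
};

static void bsp_init_amd(struct cpuinfo_x86 *c)
{
	printf("BSP-only setup, family %#x\n", c->x86);
}

static const struct cpu_dev amd_cpu_dev = {
	.c_bsp_init = bsp_init_amd,	/* vendors may leave this NULL */
};

int main(void)
{
	struct cpuinfo_x86 boot_cpu_data = { .x86 = 0x15 };
	const struct cpu_dev *this_cpu = &amd_cpu_dev;

	/* mirrors the early_identify_cpu() hunk: guard, then call once */
	if (this_cpu->c_bsp_init)
		this_cpu->c_bsp_init(&boot_cpu_data);
	return 0;
}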

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/common.c |    3 +++
 arch/x86/kernel/cpu/cpu.h    |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..8ed394a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	filter_cpuid_features(c, false);
 
 	setup_smep(c);
+
+	if (this_cpu->c_bsp_init)
+		this_cpu->c_bsp_init(c);
 }
 
 void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index e765633..1b22dcc 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -18,6 +18,7 @@ struct cpu_dev {
 	struct		cpu_model_info c_models[4];
 
 	void            (*c_early_init)(struct cpuinfo_x86 *);
+	void		(*c_bsp_init)(struct cpuinfo_x86 *);
 	void		(*c_init)(struct cpuinfo_x86 *);
 	void		(*c_identify)(struct cpuinfo_x86 *);
 	unsigned int	(*c_size_cache)(struct cpuinfo_x86 *, unsigned int);
-- 
1.7.4.rc2


* [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
  2011-08-05 17:55   ` Borislav Petkov
  2011-08-05 18:01     ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
@ 2011-08-05 18:04     ` Borislav Petkov
  2011-08-05 20:07       ` H. Peter Anvin
  2011-08-05 22:59       ` [tip:x86/cpu] x86, amd: " tip-bot for Borislav Petkov
  1 sibling, 2 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 18:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML

Move code which is run once on the BSP during boot into the cpu_dev
helper.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/amd.c |   59 ++++++++++++++++++++++-----------------------
 1 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..53d96f5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,34 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
 #endif
 }
 
+static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
+{
+	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+
+		if (c->x86 > 0x10 ||
+		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+			u64 val;
+
+			rdmsrl(MSR_K7_HWCR, val);
+			if (!(val & BIT(24)))
+				printk(KERN_WARNING FW_BUG "TSC doesn't count "
+					"with P0 frequency!\n");
+		}
+	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid	 = cpuid_edx(0x80000005);
+		assoc	 = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
+		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
+	}
+}
+
 static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 {
 	early_init_amd_mc(c);
@@ -441,36 +469,6 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 			set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
 	}
 #endif
-
-	/* We need to do the following only once */
-	if (c != &boot_cpu_data)
-		return;
-
-	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
-		if (c->x86 > 0x10 ||
-		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
-			u64 val;
-
-			rdmsrl(MSR_K7_HWCR, val);
-			if (!(val & BIT(24)))
-				printk(KERN_WARNING FW_BUG "TSC doesn't count "
-					"with P0 frequency!\n");
-		}
-	}
-
-	if (c->x86 == 0x15) {
-		unsigned long upperbit;
-		u32 cpuid, assoc;
-
-		cpuid	 = cpuid_edx(0x80000005);
-		assoc	 = cpuid >> 16 & 0xff;
-		upperbit = ((cpuid >> 24) << 10) / assoc;
-
-		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
-		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
-
-	}
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
@@ -692,6 +690,7 @@ static const struct cpu_dev __cpuinitconst amd_cpu_dev = {
 	.c_size_cache	= amd_size_cache,
 #endif
 	.c_early_init   = early_init_amd,
+	.c_bsp_init	= bsp_init_amd,
 	.c_init		= init_amd,
 	.c_x86_vendor	= X86_VENDOR_AMD,
 };
-- 
1.7.4.rc2



* Re: [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
  2011-08-05 18:04     ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
@ 2011-08-05 20:07       ` H. Peter Anvin
  2011-08-05 22:52         ` Borislav Petkov
  2011-08-05 22:59       ` [tip:x86/cpu] x86, amd: " tip-bot for Borislav Petkov
  1 sibling, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-05 20:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML

On 08/05/2011 11:04 AM, Borislav Petkov wrote:
> Move code which is run once on the BSP during boot into the cpu_dev
> helper.
> +static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
> +{
> +	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +

You can't use static_cpu_has() here, since this code runs before
alternatives -- it will always be false.  Furthermore, for code that
only runs once, it is never a win to do patching.

Arguably bsp_init should be __init and not __cpuinit, but I don't know
how to make that work with the machinery, and it is something that can
be fixed anyway.
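
To model that ordering in user space - a static_cpu_has()-style flag
only turns true once the (here simulated) alternatives patching has
run, so a check in BSP init code never sees it set:

#include <stdbool.h>
#include <stdio.h>

static bool patched_flag;		/* stands in for static_cpu_has() */
static bool cpuid_flag = true;		/* stands in for cpu_has(c, ...)  */

static void bsp_init(void)
{
	printf("static-style check: %d\n", patched_flag);	/* 0: too early */
	printf("dynamic check:      %d\n", cpuid_flag);		/* 1: correct   */
}

int main(void)
{
	bsp_init();			/* early boot: before patching */
	patched_flag = cpuid_flag;	/* apply_alternatives() runs later */
	return 0;
}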

	-hpa



* Re: [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
  2011-08-05 20:07       ` H. Peter Anvin
@ 2011-08-05 22:52         ` Borislav Petkov
  2011-08-05 22:56           ` H. Peter Anvin
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 22:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Andrew Morton, Avi Kivity, Przywara, Andre, Pohlack, Martin,
	LKML

On Fri, Aug 05, 2011 at 04:07:40PM -0400, H. Peter Anvin wrote:
> On 08/05/2011 11:04 AM, Borislav Petkov wrote:
> > Move code which is run once on the BSP during boot into the cpu_dev
> > helper.
> > +static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
> > +{
> > +	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> > +
> 
> You can't use static_cpu_has() here, since this code runs before
> alternatives -- it will always be false.  Furthermore, for code that
> only runs once, it is never a win to do patching.

Oh crap, this is a leftover from when run_on_bsp was a struct
x86_cpuinit_ops member with no args. And I f*cked it up even then,
although I went and got myself a pointer to boot_cpu_data:

+static void __cpuinit amd_run_on_bsp(void)
+{
+       struct cpuinfo_x86 *c = &boot_cpu_data;
+
+       if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {

but forgot to use it. Good catch, will fix it tomorrow.

> Arguably bsp_init should be __init and not __cpuinit, but I don't know
> how to make that work with the machinery, and is something that can be
> fixed anyway.

Yeah, how do we do that? struct cpu_dev is __cpuinitconst,
x86_cpuinit_ops is __cpuinitdata.

We could add it to identify_boot_cpu() - there's already some per-vendor
stuff like init_amd_e400_c1e_mask() which wouldn't hurt to be behind a
vendor check. early_identify_cpu() already does the vendor check with
get_cpu_vendor(), so later, in identify_cpu(), we could add a run_on_bsp()
which is __init and does a switch/case on ->x86_vendor inside.

Then we can collect all the run-once-on-the-BSP code in there.

Hmmm..

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
  2011-08-05 22:52         ` Borislav Petkov
@ 2011-08-05 22:56           ` H. Peter Anvin
  0 siblings, 0 replies; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-05 22:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML

On 08/05/2011 03:52 PM, Borislav Petkov wrote:
> On Fri, Aug 05, 2011 at 04:07:40PM -0400, H. Peter Anvin wrote:
>> On 08/05/2011 11:04 AM, Borislav Petkov wrote:
>>> Move code which is run once on the BSP during boot into the cpu_dev
>>> helper.
>>> +static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
>>> +{
>>> +	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
>>> +
>>
>> You can't use static_cpu_has() here, since this code runs before
>> alternatives -- it will always be false.  Furthermore, for code that
>> only runs once, it is never a win to do patching.
> 
> Oh crap, this is a leftover from when run_on_bsp was struct
> x86_cpuinit_ops member with no args. And I f*cked it up even then
> although I went and got myself a pointer to boot_cpu_data:
> 

I fixed it up directly.

> 
>> Arguably bsp_init should be __init and not __cpuinit, but I don't know
>> how to make that work with the machinery, and is something that can be
>> fixed anyway.
> 
> Yeah, how do we do that? struct cpu_dev is __cpuinitconst,
> x86_cpuinit_ops is __cpuinitdata.
> 
> We could add it to identify_boot_cpu() - there's already some per-vendor
> stuff like init_amd_e400_c1e_mask() which wouldn't hurt to be behind a
> vendor check. early_identify_cpu() does already the vendor check with
> get_cpu_vendor() so later, in identify_cpu() we could add a run_on_bsp()
> which is __init and switch/case on the ->x86_vendor inside.
> 
> Then we can collect all the run-once-on-the-BSP code in there.
> 
> Hmmm..
> 

As I said, we can fix this up incrementally.

	-hpa


* [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h
  2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
@ 2011-08-05 22:58   ` tip-bot for Borislav Petkov
  2011-08-06  0:10     ` H. Peter Anvin
  0 siblings, 1 reply; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-05 22:58 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, hpa, borislav.petkov

Commit-ID:  dfb09f9b7ab03fd367740e541a5caf830ed56726
Gitweb:     http://git.kernel.org/tip/dfb09f9b7ab03fd367740e541a5caf830ed56726
Author:     Borislav Petkov <borislav.petkov@amd.com>
AuthorDate: Fri, 5 Aug 2011 15:15:08 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 5 Aug 2011 12:26:44 -0700

x86, amd: Avoid cache aliasing penalties on AMD family 15h

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.

This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits to be the same for all mappings of a single shared
library across processes and, in doing so, avoiding instruction cache
aliases.

It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or both, or be completely disabled.

This change leaves virtual region address allocation on other families
and/or vendors unaffected.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 Documentation/kernel-parameters.txt |   13 ++++++
 arch/x86/include/asm/elf.h          |   31 +++++++++++++
 arch/x86/kernel/cpu/amd.c           |   13 ++++++
 arch/x86/kernel/sys_x86_64.c        |   81 +++++++++++++++++++++++++++++++++-
 arch/x86/mm/mmap.c                  |   15 ------
 arch/x86/vdso/vma.c                 |    9 ++++
 6 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index aa47be7..af73c03 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -299,6 +299,19 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			behaviour to be specified.  Bit 0 enables warnings,
 			bit 1 enables fixups, and bit 2 sends a segfault.
 
+	align_va_addr=	[X86-64]
+			Align virtual addresses by clearing slice [14:12] when
+			allocating a VMA at process creation time. This option
+			gives you up to 3% performance improvement on AMD F15h
+			machines (where it is enabled by default) for a
+			CPU-intensive style benchmark, and it can vary highly in
+			a microbenchmark depending on workload and compiler.
+
+			32: only for 32-bit processes
+			64: only for 64-bit processes
+			on: enable for both 32- and 64-bit processes
+			off: disable for both 32- and 64-bit processes
+
 	amd_iommu=	[HW,X86-84]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index f2ad216..5f962df 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -4,6 +4,7 @@
 /*
  * ELF register definitions..
  */
+#include <linux/thread_info.h>
 
 #include <asm/ptrace.h>
 #include <asm/user.h>
@@ -320,4 +321,34 @@ extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
 extern unsigned long arch_randomize_brk(struct mm_struct *mm);
 #define arch_randomize_brk arch_randomize_brk
 
+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+#ifdef CONFIG_X86_32
+	return 1;
+#endif
+#ifdef CONFIG_IA32_EMULATION
+	if (test_thread_flag(TIF_IA32))
+		return 1;
+#endif
+	return 0;
+}
+
+/* The first two values are special, do not change. See align_addr() */
+enum align_flags {
+	ALIGN_VA_32	= BIT(0),
+	ALIGN_VA_64	= BIT(1),
+	ALIGN_VDSO	= BIT(2),
+	ALIGN_TOPDOWN	= BIT(3),
+};
+
+struct va_alignment {
+	int flags;
+	unsigned long mask;
+} ____cacheline_aligned;
+
+extern struct va_alignment va_align;
+extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
 #endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b13ed39..b0234bc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 					"with P0 frequency!\n");
 		}
 	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid	 = cpuid_edx(0x80000005);
+		assoc	 = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
+		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
+
+	}
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index ff14a50..aaa8d09 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,6 +18,72 @@
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
+struct __read_mostly va_alignment va_align = {
+	.flags = -1,
+};
+
+/*
+ * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
+ *
+ * @flags denotes the allocation direction - bottomup or topdown -
+ * or vDSO; see call sites below.
+ */
+unsigned long align_addr(unsigned long addr, struct file *filp,
+			 enum align_flags flags)
+{
+	unsigned long tmp_addr;
+
+	/* handle 32- and 64-bit case with a single conditional */
+	if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
+		return addr;
+
+	if (!(current->flags & PF_RANDOMIZE))
+		return addr;
+
+	if (!((flags & ALIGN_VDSO) || filp))
+		return addr;
+
+	tmp_addr = addr;
+
+	/*
+	 * We need an address which is <= the original
+	 * one only when in topdown direction.
+	 */
+	if (!(flags & ALIGN_TOPDOWN))
+		tmp_addr += va_align.mask;
+
+	tmp_addr &= ~va_align.mask;
+
+	return tmp_addr;
+}
+
+static int __init control_va_addr_alignment(char *str)
+{
+	/* guard against enabling this on other CPU families */
+	if (va_align.flags < 0)
+		return 1;
+
+	if (*str == 0)
+		return 1;
+
+	if (*str == '=')
+		str++;
+
+	if (!strcmp(str, "32"))
+		va_align.flags = ALIGN_VA_32;
+	else if (!strcmp(str, "64"))
+		va_align.flags = ALIGN_VA_64;
+	else if (!strcmp(str, "off"))
+		va_align.flags = 0;
+	else if (!strcmp(str, "on"))
+		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+	else
+		return 0;
+
+	return 1;
+}
+__setup("align_va_addr", control_va_addr_alignment);
+
 SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 		unsigned long, prot, unsigned long, flags,
 		unsigned long, fd, unsigned long, off)
@@ -92,6 +158,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	start_addr = addr;
 
 full_search:
+
+	addr = align_addr(addr, filp, 0);
+
 	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
 		/* At this point:  (!vma || addr < vma->vm_end). */
 		if (end - len < addr) {
@@ -117,6 +186,7 @@ full_search:
 			mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = vma->vm_end;
+		addr = align_addr(addr, filp, 0);
 	}
 }
 
@@ -161,10 +231,13 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	/* make sure it can fit in the remaining address space */
 	if (addr > len) {
-		vma = find_vma(mm, addr-len);
-		if (!vma || addr <= vma->vm_start)
+		unsigned long tmp_addr = align_addr(addr - len, filp,
+						    ALIGN_TOPDOWN);
+
+		vma = find_vma(mm, tmp_addr);
+		if (!vma || tmp_addr + len <= vma->vm_start)
 			/* remember the address as a hint for next time */
-			return mm->free_area_cache = addr-len;
+			return mm->free_area_cache = tmp_addr;
 	}
 
 	if (mm->mmap_base < len)
@@ -173,6 +246,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	addr = mm->mmap_base-len;
 
 	do {
+		addr = align_addr(addr, filp, ALIGN_TOPDOWN);
+
 		/*
 		 * Lookup failure means no vma is above this address,
 		 * else if new region fits below vma->vm_start,
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1dab519..d4c0736 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -51,21 +51,6 @@ static unsigned int stack_maxrandom_size(void)
 #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
 #define MAX_GAP (TASK_SIZE/6*5)
 
-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static int mmap_is_ia32(void)
-{
-#ifdef CONFIG_X86_32
-	return 1;
-#endif
-#ifdef CONFIG_IA32_EMULATION
-	if (test_thread_flag(TIF_IA32))
-		return 1;
-#endif
-	return 0;
-}
-
 static int mmap_is_legacy(void)
 {
 	if (current->personality & ADDR_COMPAT_LAYOUT)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 7abd2be..caa42ce 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -69,6 +69,15 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 	addr = start + (offset << PAGE_SHIFT);
 	if (addr >= end)
 		addr = end;
+
+	/*
+	 * page-align it here so that get_unmapped_area doesn't
+	 * align it wrongfully again to the next page. addr can come in 4K
+	 * unaligned here as a result of stack start randomization.
+	 */
+	addr = PAGE_ALIGN(addr);
+	addr = align_addr(addr, NULL, ALIGN_VDSO);
+
 	return addr;
 }
 


* [tip:x86/cpu] x86: Add a BSP cpu_dev helper
  2011-08-05 18:01     ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
@ 2011-08-05 22:58       ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-05 22:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, bp, borislav.petkov

Commit-ID:  a110b5ec7371592eac856ac5c22dc7b518952d44
Gitweb:     http://git.kernel.org/tip/a110b5ec7371592eac856ac5c22dc7b518952d44
Author:     Borislav Petkov <bp@amd64.org>
AuthorDate: Fri, 5 Aug 2011 20:01:16 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 5 Aug 2011 12:26:49 -0700

x86: Add a BSP cpu_dev helper

Add a function ptr to struct cpu_dev which is destined to be run only
once on the BSP during boot.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180116.GB26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c |    3 +++
 arch/x86/kernel/cpu/cpu.h    |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..8ed394a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	filter_cpuid_features(c, false);
 
 	setup_smep(c);
+
+	if (this_cpu->c_bsp_init)
+		this_cpu->c_bsp_init(c);
 }
 
 void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index e765633..1b22dcc 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -18,6 +18,7 @@ struct cpu_dev {
 	struct		cpu_model_info c_models[4];
 
 	void            (*c_early_init)(struct cpuinfo_x86 *);
+	void		(*c_bsp_init)(struct cpuinfo_x86 *);
 	void		(*c_init)(struct cpuinfo_x86 *);
 	void		(*c_identify)(struct cpuinfo_x86 *);
 	unsigned int	(*c_size_cache)(struct cpuinfo_x86 *, unsigned int);


* [tip:x86/cpu] x86, amd: Move BSP code to cpu_dev helper
  2011-08-05 18:04     ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
  2011-08-05 20:07       ` H. Peter Anvin
@ 2011-08-05 22:59       ` tip-bot for Borislav Petkov
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-05 22:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, bp, borislav.petkov

Commit-ID:  8fa8b035085e7320c15875c1f6b03b290ca2dd66
Gitweb:     http://git.kernel.org/tip/8fa8b035085e7320c15875c1f6b03b290ca2dd66
Author:     Borislav Petkov <bp@amd64.org>
AuthorDate: Fri, 5 Aug 2011 20:04:09 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 5 Aug 2011 12:32:33 -0700

x86, amd: Move BSP code to cpu_dev helper

Move code which is run once on the BSP during boot into the cpu_dev
helper.

[ hpa: removed bogus cpu_has -> static_cpu_has conversion ]

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180409.GC26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/cpu/amd.c |   59 ++++++++++++++++++++++-----------------------
 1 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..b6e3e87 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,34 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
 #endif
 }
 
+static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+		if (c->x86 > 0x10 ||
+		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+			u64 val;
+
+			rdmsrl(MSR_K7_HWCR, val);
+			if (!(val & BIT(24)))
+				printk(KERN_WARNING FW_BUG "TSC doesn't count "
+					"with P0 frequency!\n");
+		}
+	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid	 = cpuid_edx(0x80000005);
+		assoc	 = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
+		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
+	}
+}
+
 static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 {
 	early_init_amd_mc(c);
@@ -441,36 +469,6 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 			set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
 	}
 #endif
-
-	/* We need to do the following only once */
-	if (c != &boot_cpu_data)
-		return;
-
-	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
-		if (c->x86 > 0x10 ||
-		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
-			u64 val;
-
-			rdmsrl(MSR_K7_HWCR, val);
-			if (!(val & BIT(24)))
-				printk(KERN_WARNING FW_BUG "TSC doesn't count "
-					"with P0 frequency!\n");
-		}
-	}
-
-	if (c->x86 == 0x15) {
-		unsigned long upperbit;
-		u32 cpuid, assoc;
-
-		cpuid	 = cpuid_edx(0x80000005);
-		assoc	 = cpuid >> 16 & 0xff;
-		upperbit = ((cpuid >> 24) << 10) / assoc;
-
-		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
-		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
-
-	}
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
@@ -692,6 +690,7 @@ static const struct cpu_dev __cpuinitconst amd_cpu_dev = {
 	.c_size_cache	= amd_size_cache,
 #endif
 	.c_early_init   = early_init_amd,
+	.c_bsp_init	= bsp_init_amd,
 	.c_init		= init_amd,
 	.c_x86_vendor	= X86_VENDOR_AMD,
 };


* Re: [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h
  2011-08-05 22:58   ` [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h tip-bot for Borislav Petkov
@ 2011-08-06  0:10     ` H. Peter Anvin
  2011-08-06 12:31       ` [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-06  0:10 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, tglx, hpa, borislav.petkov; +Cc: linux-tip-commits

On 08/05/2011 03:58 PM, tip-bot for Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index b13ed39..b0234bc 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
>  					"with P0 frequency!\n");
>  		}
>  	}
> +
> +	if (c->x86 == 0x15) {
> +		unsigned long upperbit;
> +		u32 cpuid, assoc;
> +
> +		cpuid	 = cpuid_edx(0x80000005);
> +		assoc	 = cpuid >> 16 & 0xff;
> +		upperbit = ((cpuid >> 24) << 10) / assoc;
> +
> +		va_align.mask	  = (upperbit - 1) & PAGE_MASK;
> +		va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;
> +
> +	}
>  }
>  
>  static void __cpuinit init_amd(struct cpuinfo_x86 *c)

Breaks all i386 builds:

/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'

[the line numbers refer to the entire patchset]

	-hpa


* [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch
  2011-08-06  0:10     ` H. Peter Anvin
@ 2011-08-06 12:31       ` Borislav Petkov
  2011-08-06 23:22         ` [tip:x86/cpu] x86-32, amd: Move va_align definition to unbreak 32-bit build tip-bot for Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-06 12:31 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Andrew Morton
  Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

hpa reported that dfb09f9b7ab03fd367740e541a5caf830ed56726 breaks 32-bit
builds with the following error message:

/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'

This is due to the fact that va_align is a global in a 64-bit only
compilation unit. Move it to mmap.c where it is visible to both
subarches.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/sys_x86_64.c |    4 ----
 arch/x86/mm/mmap.c           |    5 ++++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index aaa8d09..fe7d2da 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,10 +18,6 @@
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
-struct __read_mostly va_alignment va_align = {
-	.flags = -1,
-};
-
 /*
  * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
  *
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d4c0736..4b5ba85 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,10 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+struct __read_mostly va_alignment va_align = {
+	.flags = -1,
+};
+
 static unsigned int stack_maxrandom_size(void)
 {
 	unsigned int max = 0;
@@ -42,7 +46,6 @@ static unsigned int stack_maxrandom_size(void)
 	return max;
 }
 
-
 /*
  * Top of mmap area (just below the process stack).
  *
-- 
1.7.4.rc2



* [tip:x86/cpu] x86-32, amd: Move va_align definition to unbreak 32-bit build
  2011-08-06 12:31       ` [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch Borislav Petkov
@ 2011-08-06 23:22         ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-06 23:22 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, borislav.petkov

Commit-ID:  9387f774d61b01ab71bade85e6d0bfab0b3419bd
Gitweb:     http://git.kernel.org/tip/9387f774d61b01ab71bade85e6d0bfab0b3419bd
Author:     Borislav Petkov <borislav.petkov@amd.com>
AuthorDate: Sat, 6 Aug 2011 14:31:38 +0200
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Sat, 6 Aug 2011 11:44:57 -0700

x86-32, amd: Move va_align definition to unbreak 32-bit build

hpa reported that dfb09f9b7ab03fd367740e541a5caf830ed56726 breaks 32-bit
builds with the following error message:

/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'

This is due to the fact that va_align is a global in a 64-bit only
compilation unit. Move it to mmap.c where it is visible to both
subarches.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312633899-1131-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/sys_x86_64.c |    4 ----
 arch/x86/mm/mmap.c           |    5 ++++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index aaa8d09..fe7d2da 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,10 +18,6 @@
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
-struct __read_mostly va_alignment va_align = {
-	.flags = -1,
-};
-
 /*
  * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
  *
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d4c0736..4b5ba85 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,10 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+struct __read_mostly va_alignment va_align = {
+	.flags = -1,
+};
+
 static unsigned int stack_maxrandom_size(void)
 {
 	unsigned int max = 0;
@@ -42,7 +46,6 @@ static unsigned int stack_maxrandom_size(void)
 	return max;
 }
 
-
 /*
  * Top of mmap area (just below the process stack).
  *

