* [PATCH 2/2] x86-64: use RIP-relative addressing for most per-CPU accesses
@ 2014-11-04 8:50 Jan Beulich
2014-11-04 19:45 ` [tip:x86/boot] x86-64: Use " tip-bot for Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2014-11-04 8:50 UTC (permalink / raw)
To: mingo, tglx, hpa; +Cc: linux-kernel
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16Mb), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception are the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).
With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
arch/x86/include/asm/percpu.h | 59 +++++++++++++++++++++++++++++++-----------
arch/x86/kernel/vmlinux.lds.S | 2 +
2 files changed, 46 insertions(+), 15 deletions(-)
--- 3.18-rc3-x86_64-PCrel-per-CPU.orig/arch/x86/include/asm/percpu.h
+++ 3.18-rc3-x86_64-PCrel-per-CPU/arch/x86/include/asm/percpu.h
@@ -64,7 +64,7 @@
#define __percpu_prefix ""
#endif
-#define __percpu_arg(x) __percpu_prefix "%P" #x
+#define __percpu_arg(x) __percpu_prefix "%" #x
/*
* Initialized pointers to per-cpu variables needed for the boot
@@ -179,29 +179,58 @@ do { \
} \
} while (0)
-#define percpu_from_op(op, var, constraint) \
+#define percpu_from_op(op, var) \
({ \
typeof(var) pfo_ret__; \
switch (sizeof(var)) { \
case 1: \
asm(op "b "__percpu_arg(1)",%0" \
: "=q" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
break; \
case 2: \
asm(op "w "__percpu_arg(1)",%0" \
: "=r" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
break; \
case 4: \
asm(op "l "__percpu_arg(1)",%0" \
: "=r" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
break; \
case 8: \
asm(op "q "__percpu_arg(1)",%0" \
: "=r" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
+ break; \
+ default: __bad_percpu_size(); \
+ } \
+ pfo_ret__; \
+})
+
+#define percpu_stable_op(op, var) \
+({ \
+ typeof(var) pfo_ret__; \
+ switch (sizeof(var)) { \
+ case 1: \
+ asm(op "b "__percpu_arg(P1)",%0" \
+ : "=q" (pfo_ret__) \
+ : "p" (&(var))); \
+ break; \
+ case 2: \
+ asm(op "w "__percpu_arg(P1)",%0" \
+ : "=r" (pfo_ret__) \
+ : "p" (&(var))); \
+ break; \
+ case 4: \
+ asm(op "l "__percpu_arg(P1)",%0" \
+ : "=r" (pfo_ret__) \
+ : "p" (&(var))); \
+ break; \
+ case 8: \
+ asm(op "q "__percpu_arg(P1)",%0" \
+ : "=r" (pfo_ret__) \
+ : "p" (&(var))); \
break; \
default: __bad_percpu_size(); \
} \
@@ -359,11 +388,11 @@ do { \
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
-#define this_cpu_read_stable(var) percpu_from_op("mov", var, "p" (&(var)))
+#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
-#define raw_cpu_read_1(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define raw_cpu_read_2(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define raw_cpu_read_4(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define raw_cpu_read_1(pcp) percpu_from_op("mov", pcp)
+#define raw_cpu_read_2(pcp) percpu_from_op("mov", pcp)
+#define raw_cpu_read_4(pcp) percpu_from_op("mov", pcp)
#define raw_cpu_write_1(pcp, val) percpu_to_op("mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op("mov", (pcp), val)
@@ -381,9 +410,9 @@ do { \
#define raw_cpu_xchg_2(pcp, val) percpu_xchg_op(pcp, val)
#define raw_cpu_xchg_4(pcp, val) percpu_xchg_op(pcp, val)
-#define this_cpu_read_1(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define this_cpu_read_2(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define this_cpu_read_4(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define this_cpu_read_1(pcp) percpu_from_op("mov", pcp)
+#define this_cpu_read_2(pcp) percpu_from_op("mov", pcp)
+#define this_cpu_read_4(pcp) percpu_from_op("mov", pcp)
#define this_cpu_write_1(pcp, val) percpu_to_op("mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op("mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op("mov", (pcp), val)
@@ -435,7 +464,7 @@ do { \
* 32 bit must fall back to generic operations.
*/
#ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define raw_cpu_read_8(pcp) percpu_from_op("mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op("mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op((pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op("and", (pcp), val)
@@ -444,7 +473,7 @@ do { \
#define raw_cpu_xchg_8(pcp, nval) percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval)
-#define this_cpu_read_8(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define this_cpu_read_8(pcp) percpu_from_op("mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op("mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op((pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op("and", (pcp), val)
--- 3.18-rc3-x86_64-PCrel-per-CPU.orig/arch/x86/kernel/vmlinux.lds.S
+++ 3.18-rc3-x86_64-PCrel-per-CPU/arch/x86/kernel/vmlinux.lds.S
@@ -186,6 +186,8 @@ SECTIONS
* start another segment - init.
*/
PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+ ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
+ "per-CPU data too large - increase CONFIG_PHYSICAL_START")
#endif
INIT_TEXT_SECTION(PAGE_SIZE)
^ permalink raw reply [flat|nested] 4+ messages in thread
* [tip:x86/boot] x86-64: Use RIP-relative addressing for most per-CPU accesses
2014-11-04 8:50 [PATCH 2/2] x86-64: use RIP-relative addressing for most per-CPU accesses Jan Beulich
@ 2014-11-04 19:45 ` tip-bot for Jan Beulich
2014-11-04 20:10 ` H. Peter Anvin
0 siblings, 1 reply; 4+ messages in thread
From: tip-bot for Jan Beulich @ 2014-11-04 19:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: jbeulich, JBeulich, tglx, hpa, linux-kernel, mingo
Commit-ID: 97b67ae559947f1e208439a1bf6a734da3087006
Gitweb: http://git.kernel.org/tip/97b67ae559947f1e208439a1bf6a734da3087006
Author: Jan Beulich <JBeulich@suse.com>
AuthorDate: Tue, 4 Nov 2014 08:50:48 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 4 Nov 2014 20:43:14 +0100
x86-64: Use RIP-relative addressing for most per-CPU accesses
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16Mb), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception are the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).
With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/percpu.h | 59 ++++++++++++++++++++++++++++++++-----------
arch/x86/kernel/vmlinux.lds.S | 2 ++
2 files changed, 46 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 74da317..e0ba66c 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -64,7 +64,7 @@
#define __percpu_prefix ""
#endif
-#define __percpu_arg(x) __percpu_prefix "%P" #x
+#define __percpu_arg(x) __percpu_prefix "%" #x
/*
* Initialized pointers to per-cpu variables needed for the boot
@@ -179,29 +179,58 @@ do { \
} \
} while (0)
-#define percpu_from_op(op, var, constraint) \
+#define percpu_from_op(op, var) \
({ \
typeof(var) pfo_ret__; \
switch (sizeof(var)) { \
case 1: \
asm(op "b "__percpu_arg(1)",%0" \
: "=q" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
break; \
case 2: \
asm(op "w "__percpu_arg(1)",%0" \
: "=r" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
break; \
case 4: \
asm(op "l "__percpu_arg(1)",%0" \
: "=r" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
break; \
case 8: \
asm(op "q "__percpu_arg(1)",%0" \
: "=r" (pfo_ret__) \
- : constraint); \
+ : "m" (var)); \
+ break; \
+ default: __bad_percpu_size(); \
+ } \
+ pfo_ret__; \
+})
+
+#define percpu_stable_op(op, var) \
+({ \
+ typeof(var) pfo_ret__; \
+ switch (sizeof(var)) { \
+ case 1: \
+ asm(op "b "__percpu_arg(P1)",%0" \
+ : "=q" (pfo_ret__) \
+ : "p" (&(var))); \
+ break; \
+ case 2: \
+ asm(op "w "__percpu_arg(P1)",%0" \
+ : "=r" (pfo_ret__) \
+ : "p" (&(var))); \
+ break; \
+ case 4: \
+ asm(op "l "__percpu_arg(P1)",%0" \
+ : "=r" (pfo_ret__) \
+ : "p" (&(var))); \
+ break; \
+ case 8: \
+ asm(op "q "__percpu_arg(P1)",%0" \
+ : "=r" (pfo_ret__) \
+ : "p" (&(var))); \
break; \
default: __bad_percpu_size(); \
} \
@@ -359,11 +388,11 @@ do { \
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
-#define this_cpu_read_stable(var) percpu_from_op("mov", var, "p" (&(var)))
+#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
-#define raw_cpu_read_1(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define raw_cpu_read_2(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define raw_cpu_read_4(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define raw_cpu_read_1(pcp) percpu_from_op("mov", pcp)
+#define raw_cpu_read_2(pcp) percpu_from_op("mov", pcp)
+#define raw_cpu_read_4(pcp) percpu_from_op("mov", pcp)
#define raw_cpu_write_1(pcp, val) percpu_to_op("mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op("mov", (pcp), val)
@@ -381,9 +410,9 @@ do { \
#define raw_cpu_xchg_2(pcp, val) percpu_xchg_op(pcp, val)
#define raw_cpu_xchg_4(pcp, val) percpu_xchg_op(pcp, val)
-#define this_cpu_read_1(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define this_cpu_read_2(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
-#define this_cpu_read_4(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define this_cpu_read_1(pcp) percpu_from_op("mov", pcp)
+#define this_cpu_read_2(pcp) percpu_from_op("mov", pcp)
+#define this_cpu_read_4(pcp) percpu_from_op("mov", pcp)
#define this_cpu_write_1(pcp, val) percpu_to_op("mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op("mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op("mov", (pcp), val)
@@ -435,7 +464,7 @@ do { \
* 32 bit must fall back to generic operations.
*/
#ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define raw_cpu_read_8(pcp) percpu_from_op("mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op("mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op((pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op("and", (pcp), val)
@@ -444,7 +473,7 @@ do { \
#define raw_cpu_xchg_8(pcp, nval) percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval)
-#define this_cpu_read_8(pcp) percpu_from_op("mov", (pcp), "m"(pcp))
+#define this_cpu_read_8(pcp) percpu_from_op("mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op("mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op((pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op("and", (pcp), val)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 49edf2d..00bf300 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -186,6 +186,8 @@ SECTIONS
* start another segment - init.
*/
PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+ ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
+ "per-CPU data too large - increase CONFIG_PHYSICAL_START")
#endif
INIT_TEXT_SECTION(PAGE_SIZE)
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [tip:x86/boot] x86-64: Use RIP-relative addressing for most per-CPU accesses
2014-11-04 19:45 ` [tip:x86/boot] x86-64: Use " tip-bot for Jan Beulich
@ 2014-11-04 20:10 ` H. Peter Anvin
2014-11-05 17:04 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: H. Peter Anvin @ 2014-11-04 20:10 UTC (permalink / raw)
To: mingo, linux-kernel, JBeulich, tglx, linux-tip-commits
On 11/04/2014 11:45 AM, tip-bot for Jan Beulich wrote:
> Commit-ID: 97b67ae559947f1e208439a1bf6a734da3087006
> Gitweb: http://git.kernel.org/tip/97b67ae559947f1e208439a1bf6a734da3087006
> Author: Jan Beulich <JBeulich@suse.com>
> AuthorDate: Tue, 4 Nov 2014 08:50:48 +0000
> Committer: Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Tue, 4 Nov 2014 20:43:14 +0100
>
> x86-64: Use RIP-relative addressing for most per-CPU accesses
>
> Observing that per-CPU data (in the SMP case) is reachable by
> exploiting 64-bit address wraparound (building on the default kernel
> load address being at 16Mb), the one byte shorter RIP-relative
> addressing form can be used for most per-CPU accesses. The one
> exception are the "stable" reads, where the use of the "P" operand
> modifier prevents the compiler from using RIP-relative addressing, but
> is unavoidable due to the use of the "p" constraint (side note: with
> gcc 4.9.x the intended effect of this isn't being achieved anymore,
> see gcc bug 63637).
>
> With the dependency on the minimum kernel load address, arbitrarily
> low values for CONFIG_PHYSICAL_START are now no longer possible. A
> link time assertion is being added, directing to the need to increase
> that value when it triggers.
>
This description makes very little sense. What matters here is the
virtual address, which is always >= -2 GB.
-hpa
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [tip:x86/boot] x86-64: Use RIP-relative addressing for most per-CPU accesses
2014-11-04 20:10 ` H. Peter Anvin
@ 2014-11-05 17:04 ` Jan Beulich
0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2014-11-05 17:04 UTC (permalink / raw)
To: mingo, tglx, linux-kernel, linux-tip-commits, hpa
>>> "H. Peter Anvin" <hpa@zytor.com> 11/04/14 9:11 PM >>>
>On 11/04/2014 11:45 AM, tip-bot for Jan Beulich wrote:
>> x86-64: Use RIP-relative addressing for most per-CPU accesses
>>
>> Observing that per-CPU data (in the SMP case) is reachable by
>> exploiting 64-bit address wraparound (building on the default kernel
>> load address being at 16Mb), the one byte shorter RIP-relative
>> addressing form can be used for most per-CPU accesses. The one
>> exception are the "stable" reads, where the use of the "P" operand
>> modifier prevents the compiler from using RIP-relative addressing, but
>> is unavoidable due to the use of the "p" constraint (side note: with
>> gcc 4.9.x the intended effect of this isn't being achieved anymore,
>> see gcc bug 63637).
>>
>> With the dependency on the minimum kernel load address, arbitrarily
>> low values for CONFIG_PHYSICAL_START are now no longer possible. A
>> link time assertion is being added, directing to the need to increase
>> that value when it triggers.
>>
>
>This description makes very little sense. What matters here is the
>virtual address, which is always >= -2 GB.
But code at or very close to -2Gb won't be able to reach addresses far
enough from address zero; it indeed matters that the lowest code address
is far enough away from -2Gb.
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-11-05 17:04 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-04 8:50 [PATCH 2/2] x86-64: use RIP-relative addressing for most per-CPU accesses Jan Beulich
2014-11-04 19:45 ` [tip:x86/boot] x86-64: Use " tip-bot for Jan Beulich
2014-11-04 20:10 ` H. Peter Anvin
2014-11-05 17:04 ` Jan Beulich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).