* [GIT PULL] x86/alternatives padding
From: Borislav Petkov @ 2015-03-03 17:06 UTC
To: x86-ml; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Andy Lutomirski, lkml
Hi guys,
so this one has been long in the making and has been passing testing
on a bunch of boxes and bitness here so maybe we should try to put it
into the wider tip mix and see what happens. If all is well, great, if
there's trouble which I haven't managed to trigger in my testing, we can
remove it from tip/master until all issues are fixed.
Btw, the last three patches adjust and improve perf bench a little: it
includes memcpy/memset_64.S directly, so this patchset would otherwise
break it.
Please pull,
thanks.
---
The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:
Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git tags/alternatives_padding
for you to fetch changes up to dfecb95cdfeaf7872d83a96bec3a606e9cd95c8d:
perf/bench: Add -r all so that you can run all mem* routines (2015-03-03 18:01:58 +0100)
----------------------------------------------------------------
A more involved rework of the alternatives framework so that it can pad
instructions. This makes the alternatives macros more straightforward to
use: instead of us having to figure out old and new instruction sizes,
the toolchain figures that out for us.
Furthermore, it optimizes the JMPs used: where possible, they are
replaced with smaller versions, which relieves fetch and decode.
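To illustrate the JMP part, here is a minimal sketch of picking the shortest encoding for a given displacement. It is not the code from the patchset: encode_jmp() and its calling convention are made up for this example, and it only relies on the two well-known x86 encodings, JMP rel8 (opcode 0xeb, 2 bytes) and JMP rel32 (opcode 0xe9, 5 bytes).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * disp is the displacement as a 5-byte JMP rel32 would carry it, i.e.
 * measured from the end of that 5-byte instruction. A 2-byte JMP rel8
 * at the same address ends 3 bytes earlier, so its displacement to the
 * same target is disp + 3.
 *
 * Writes the chosen encoding into buf (at least 5 bytes) and returns
 * the instruction length.
 */
static size_t encode_jmp(uint8_t *buf, int32_t disp)
{
	int64_t d8 = (int64_t)disp + 3;
	uint32_t u = (uint32_t)disp;

	if (d8 >= INT8_MIN && d8 <= INT8_MAX) {
		buf[0] = 0xeb;			/* JMP rel8 */
		buf[1] = (uint8_t)(int8_t)d8;
		return 2;
	}

	buf[0] = 0xe9;				/* JMP rel32 */
	buf[1] = u & 0xff;			/* little-endian rel32 */
	buf[2] = (u >> 8) & 0xff;
	buf[3] = (u >> 16) & 0xff;
	buf[4] = (u >> 24) & 0xff;
	return 5;
}
```

In a padded site, the 3 bytes saved by the short form would then be filled with NOPs so that the site keeps its size.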
Some stats:
x86_64 defconfig:
Alternatives sites total: 2478
Total padding added (in Bytes): 6051
The padding is currently done for:
X86_FEATURE_ALWAYS
X86_FEATURE_ERMS
X86_FEATURE_LFENCE_RDTSC
X86_FEATURE_MFENCE_RDTSC
X86_FEATURE_SMAP
This is with the latest version of the patchset. Of course, on each
machine the alternatives sites actually being patched are a proper
subset of the total number.
----------------------------------------------------------------
Borislav Petkov (18):
x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
x86/alternatives: Cleanup DPRINTK macro
x86/alternatives: Add instruction padding
x86/alternatives: Make JMPs more robust
x86/alternatives: Use optimized NOPs for padding
x86/lib/copy_page_64.S: Use generic ALTERNATIVE macro
x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2
x86/smap: Use ALTERNATIVE macro
x86/entry_32: Convert X86_INVD_BUG to ALTERNATIVE macro
x86/lib/clear_page_64.S: Convert to ALTERNATIVE_2 macro
x86/asm: Use alternative_2() in rdtsc_barrier()
x86/asm: Cleanup prefetch primitives
x86/lib/memset_64.S: Convert to ALTERNATIVE_2 macro
x86/lib/memmove_64.S: Convert memmove() to ALTERNATIVE macro
x86/lib/memcpy_64.S: Convert memcpy to ALTERNATIVE_2 macro
perf/bench: Fix mem* routines usage after alternatives change
perf/bench: Carve out mem routine benchmarking
perf/bench: Add -r all so that you can run all mem* routines
arch/x86/include/asm/alternative-asm.h | 43 ++++++-
arch/x86/include/asm/alternative.h | 65 +++++++----
arch/x86/include/asm/apic.h | 2 +-
arch/x86/include/asm/barrier.h | 6 +-
arch/x86/include/asm/cpufeature.h | 30 ++---
arch/x86/include/asm/processor.h | 16 ++-
arch/x86/include/asm/smap.h | 30 ++---
arch/x86/kernel/alternative.c | 158 ++++++++++++++++++++++----
arch/x86/kernel/cpu/amd.c | 5 +
arch/x86/kernel/entry_32.S | 12 +-
arch/x86/lib/clear_page_64.S | 66 +++++------
arch/x86/lib/copy_page_64.S | 37 ++----
arch/x86/lib/copy_user_64.S | 46 ++------
arch/x86/lib/memcpy_64.S | 68 ++++-------
arch/x86/lib/memmove_64.S | 19 +---
arch/x86/lib/memset_64.S | 61 ++++------
arch/x86/um/asm/barrier.h | 4 +-
tools/perf/bench/mem-memcpy-x86-64-asm-def.h | 6 +-
tools/perf/bench/mem-memcpy-x86-64-asm.S | 2 -
tools/perf/bench/mem-memcpy.c | 128 +++++++++++----------
tools/perf/bench/mem-memset-x86-64-asm-def.h | 6 +-
tools/perf/bench/mem-memset-x86-64-asm.S | 2 -
tools/perf/util/include/asm/alternative-asm.h | 1 +
23 files changed, 433 insertions(+), 380 deletions(-)
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [GIT PULL] x86/alternatives padding
From: Ingo Molnar @ 2015-03-04 7:32 UTC
To: Borislav Petkov
Cc: x86-ml, Peter Zijlstra, Arnaldo Carvalho de Melo,
Andy Lutomirski, lkml, Linus Torvalds
* Borislav Petkov <bp@alien8.de> wrote:
> Hi guys,
>
> so this one has been long in the making and has been passing testing
> on a bunch of boxes and bitness here so maybe we should try to put it
> into the wider tip mix and see what happens. If all is well, great, if
> there's trouble which I haven't managed to trigger in my testing, we can
> remove it from tip/master until all issues are fixed.
>
> Btw, the last three patches adjust and improve perf bench a little: it
> includes memcpy/memset_64.S directly, so this patchset would otherwise
> break it.
>
> Please pull,
> thanks.
>
> ---
> The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:
>
> Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git tags/alternatives_padding
>
> for you to fetch changes up to dfecb95cdfeaf7872d83a96bec3a606e9cd95c8d:
>
> perf/bench: Add -r all so that you can run all mem* routines (2015-03-03 18:01:58 +0100)
>
> ----------------------------------------------------------------
> A more involved rework of the alternatives framework so that it can pad
> instructions. This makes the alternatives macros more straightforward to
> use: instead of us having to figure out old and new instruction sizes,
> the toolchain figures that out for us.
>
> Furthermore, it optimizes the JMPs used: where possible, they are
> replaced with smaller versions, which relieves fetch and decode.
>
> Some stats:
>
> x86_64 defconfig:
>
> Alternatives sites total: 2478
> Total padding added (in Bytes): 6051
Just curious: did the kernel image size change before/after these
changes? I.e. were any of the existing sites using alternative
instructions coded sub-optimally, with a larger maximum instruction
size allocated than strictly needed?
At least some of your improvements made things more optimal -
wondering about the total win, beyond the significant maintainability
win, that is.
> The padding is currently done for:
>
> X86_FEATURE_ALWAYS
> X86_FEATURE_ERMS
> X86_FEATURE_LFENCE_RDTSC
> X86_FEATURE_MFENCE_RDTSC
> X86_FEATURE_SMAP
>
> This is with the latest version of the patchset. Of course, on each
> machine the alternatives sites actually being patched are a proper
> subset of the total number.
>
> ----------------------------------------------------------------
> Borislav Petkov (18):
> x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
> x86/alternatives: Cleanup DPRINTK macro
> x86/alternatives: Add instruction padding
> x86/alternatives: Make JMPs more robust
> x86/alternatives: Use optimized NOPs for padding
> x86/lib/copy_page_64.S: Use generic ALTERNATIVE macro
> x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2
> x86/smap: Use ALTERNATIVE macro
> x86/entry_32: Convert X86_INVD_BUG to ALTERNATIVE macro
> x86/lib/clear_page_64.S: Convert to ALTERNATIVE_2 macro
> x86/asm: Use alternative_2() in rdtsc_barrier()
> x86/asm: Cleanup prefetch primitives
> x86/lib/memset_64.S: Convert to ALTERNATIVE_2 macro
> x86/lib/memmove_64.S: Convert memmove() to ALTERNATIVE macro
> x86/lib/memcpy_64.S: Convert memcpy to ALTERNATIVE_2 macro
> perf/bench: Fix mem* routines usage after alternatives change
> perf/bench: Carve out mem routine benchmarking
> perf/bench: Add -r all so that you can run all mem* routines
>
> arch/x86/include/asm/alternative-asm.h | 43 ++++++-
> arch/x86/include/asm/alternative.h | 65 +++++++----
> arch/x86/include/asm/apic.h | 2 +-
> arch/x86/include/asm/barrier.h | 6 +-
> arch/x86/include/asm/cpufeature.h | 30 ++---
> arch/x86/include/asm/processor.h | 16 ++-
> arch/x86/include/asm/smap.h | 30 ++---
> arch/x86/kernel/alternative.c | 158 ++++++++++++++++++++++----
> arch/x86/kernel/cpu/amd.c | 5 +
> arch/x86/kernel/entry_32.S | 12 +-
> arch/x86/lib/clear_page_64.S | 66 +++++------
> arch/x86/lib/copy_page_64.S | 37 ++----
> arch/x86/lib/copy_user_64.S | 46 ++------
> arch/x86/lib/memcpy_64.S | 68 ++++-------
> arch/x86/lib/memmove_64.S | 19 +---
> arch/x86/lib/memset_64.S | 61 ++++------
> arch/x86/um/asm/barrier.h | 4 +-
> tools/perf/bench/mem-memcpy-x86-64-asm-def.h | 6 +-
> tools/perf/bench/mem-memcpy-x86-64-asm.S | 2 -
> tools/perf/bench/mem-memcpy.c | 128 +++++++++++----------
> tools/perf/bench/mem-memset-x86-64-asm-def.h | 6 +-
> tools/perf/bench/mem-memset-x86-64-asm.S | 2 -
> tools/perf/util/include/asm/alternative-asm.h | 1 +
> 23 files changed, 433 insertions(+), 380 deletions(-)
Pulled into tip:x86/asm, thanks Boris!
(I made a few comments as replies to the patches themselves, none
affected the quality of this tree so I pulled it.)
Thanks,
Ingo
* Re: [GIT PULL] x86/alternatives padding
From: Borislav Petkov @ 2015-03-04 11:22 UTC
To: Ingo Molnar
Cc: x86-ml, Peter Zijlstra, Arnaldo Carvalho de Melo,
Andy Lutomirski, lkml, Linus Torvalds
On Wed, Mar 04, 2015 at 08:32:21AM +0100, Ingo Molnar wrote:
> Just curious: did the kernel image size change before/after these
> changes? I.e. were any of the existing sites using alternative
> instructions coded sub-optimally, with a larger maximum instruction
> size allocated than strictly needed?
>
> At least some of your improvements made things more optimal -
> wondering about the total win, beyond the significant maintainability
> win, that is.
Well, kernel image doesn't change while vmlinux shows only a very small
.text increase of about 2K. I'm not sure yet why that happens though
because it shouldn't be the padding. Because we will have to do it
anyway, this patchset makes it automatic instead of by-hand, so to
speak.
Let me bisect it and see which patch adds the increase.
4.0-rc1 with alternatives patchset:
===================================
Setup is 15644 bytes (padded to 15872 bytes).
System is 5855 kB
CRC f2669897
Kernel: arch/x86/boot/bzImage is ready (#1)
text data bss dec hex filename
12292971 1595264 1085440 14973675 e47aeb vmlinux
plain 4.0-rc1:
==============
Setup is 15644 bytes (padded to 15872 bytes).
System is 5855 kB
CRC 7200607a
Kernel: arch/x86/boot/bzImage is ready (#1)
text data bss dec hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [GIT PULL] x86/alternatives padding
From: Borislav Petkov @ 2015-03-04 11:41 UTC
To: Ingo Molnar
Cc: x86-ml, Peter Zijlstra, Arnaldo Carvalho de Melo,
Andy Lutomirski, lkml, Linus Torvalds
On Wed, Mar 04, 2015 at 12:22:06PM +0100, Borislav Petkov wrote:
> Well, kernel image doesn't change while vmlinux shows only a very small
> .text increase of about 2K. I'm not sure yet why that happens though
> because it shouldn't be the padding. Because we will have to do it
> anyway, this patchset makes it automatic instead of by-hand, so to
> speak.
>
> Let me bisect it and see which patch adds the increase.
Doh, of course. I've added u8 padlen to the alternative instruction
entry struct. For 2Kish alt sites in total, this explains the almost
exact same increase in text size:
text data bss dec hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux
338ea55579d1... x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
text data bss dec hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux
db477a3386de... x86/alternatives: Cleanup DPRINTK macro
text data bss dec hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux
4332195c5615... x86/alternatives: Add instruction padding
text data bss dec hex filename
12293030 1595264 1085440 14973734 e47b26 vmlinux
^^^^^^^
Ok, that's sorted out now.
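The arithmetic can be sanity-checked with a small user-space sketch. Only padlen is named in this thread; the other field names and the packed layout are assumptions taken from how the entry struct looked in mainline around that time, and the _old/_new struct names and padlen_overhead() are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout before the patchset (field names are assumptions). */
struct alt_instr_old {
	int32_t  instr_offset;		/* original instruction */
	int32_t  repl_offset;		/* replacement instruction */
	uint16_t cpuid;			/* feature bit selecting the replacement */
	uint8_t  instrlen;		/* length of original instruction */
	uint8_t  replacementlen;	/* length of replacement */
} __attribute__((packed));		/* GCC/Clang extension */

/* Same entry with the u8 padlen mentioned above appended. */
struct alt_instr_new {
	int32_t  instr_offset;
	int32_t  repl_offset;
	uint16_t cpuid;
	uint8_t  instrlen;
	uint8_t  replacementlen;
	uint8_t  padlen;		/* length of build-time padding */
} __attribute__((packed));

/* Extra table bytes caused by the new field for nr_sites entries. */
static size_t padlen_overhead(size_t nr_sites)
{
	return nr_sites * (sizeof(struct alt_instr_new) -
			   sizeof(struct alt_instr_old));
}
```

With the 2478 sites from the stats in the pull request, padlen_overhead(2478) gives 2478 bytes, against a measured text growth of 12293030 - 12290539 = 2491 bytes.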
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [GIT PULL] x86/alternatives padding
From: Ingo Molnar @ 2015-03-04 20:22 UTC
To: Borislav Petkov
Cc: x86-ml, Peter Zijlstra, Arnaldo Carvalho de Melo,
Andy Lutomirski, lkml, Linus Torvalds
* Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Mar 04, 2015 at 12:22:06PM +0100, Borislav Petkov wrote:
> > Well, kernel image doesn't change while vmlinux shows only a very small
> > .text increase of about 2K. I'm not sure yet why that happens though
> > because it shouldn't be the padding. Because we will have to do it
> > anyway, this patchset makes it automatic instead of by-hand, so to
> > speak.
> >
> > Let me bisect it and see which patch adds the increase.
>
> Doh, of course. I've added u8 padlen to the alternative instruction
> entry struct. For 2Kish alt sites in total, this explains the almost
> exact same increase in text size:
>
> text data bss dec hex filename
> 12290539 1595264 1085440 14971243 e4716b vmlinux
>
> 338ea55579d1... x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
> text data bss dec hex filename
> 12290539 1595264 1085440 14971243 e4716b vmlinux
>
> db477a3386de... x86/alternatives: Cleanup DPRINTK macro
> text data bss dec hex filename
> 12290539 1595264 1085440 14971243 e4716b vmlinux
>
> 4332195c5615... x86/alternatives: Add instruction padding
> text data bss dec hex filename
> 12293030 1595264 1085440 14973734 e47b26 vmlinux
> ^^^^^^^
So you could have a look at the detailed section dump itself via:
objdump -h vmlinux
there .text will be the raw text and .alt* will be listed separately.
The 'size' tool will add up executable sections IIRC, mixing these
sections.
.alt* is freed after init, so it's not really a kernel image size
increase, right?
Thanks,
Ingo
* Re: [GIT PULL] x86/alternatives padding
From: Borislav Petkov @ 2015-03-04 21:02 UTC
To: Ingo Molnar
Cc: x86-ml, Peter Zijlstra, Arnaldo Carvalho de Melo,
Andy Lutomirski, lkml, Linus Torvalds
On Wed, Mar 04, 2015 at 09:22:27PM +0100, Ingo Molnar wrote:
> So you could have a look at the detailed section dump itself via:
>
> objdump -h vmlinux
>
> there .text will be the raw text and .alt* will be listed separately.
> The 'size' tool will add up executable sections IIRC, mixing these
> sections.
Right.
> .alt* is freed after init, so it's not really a kernel image size
> increase, right?
Exactly:
void free_initmem(void)
{
	free_init_pages("unused kernel",
			(unsigned long)(&__init_begin),
			(unsigned long)(&__init_end));
}
which are:
69708: ffffffff81ee9000 0 NOTYPE GLOBAL DEFAULT 16 __init_begin
72679: ffffffff81ff9000 0 NOTYPE GLOBAL DEFAULT 25 __init_end
and there's a bunch of stuff between ffffffff81ee9000 and ffffffff81ff9000:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [17] .init.text        PROGBITS         ffffffff81efe000  012fe000
       0000000000066c98  0000000000000000  AX       0     0     16
  [18] .init.data        PROGBITS         ffffffff81f65000  01365000
       0000000000086d18  0000000000000000  WA       0     0     4096
  [19] .x86_cpu_dev.init PROGBITS         ffffffff81febd18  013ebd18
       0000000000000018  0000000000000000   A       0     0     8
  [20] .altinstructions  PROGBITS         ffffffff81febd30  013ebd30
       0000000000007e4b  0000000000000000   A       0     0     1
  [21] .altinstr_replace PROGBITS         ffffffff81ff3b7b  013f3b7b
       0000000000002044  0000000000000000  AX       0     0     1
  [22] .iommu_table      PROGBITS         ffffffff81ff5bc0  013f5bc0
       00000000000000c8  0000000000000000   A       0     0     8
  [23] .apicdrivers      PROGBITS         ffffffff81ff5c88  013f5c88
       0000000000000010  0000000000000000  WA       0     0     8
  [24] .exit.text        PROGBITS         ffffffff81ff5c98  013f5c98
       0000000000002412  0000000000000000  AX       0     0     1
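For anyone who wants to double-check the containment, a minimal sketch: the addresses and sizes are copied from the symbol and section dumps above, while in_init_region() and the macro names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the dumps above. */
#define INIT_BEGIN		0xffffffff81ee9000ULL	/* __init_begin */
#define INIT_END		0xffffffff81ff9000ULL	/* __init_end */
#define ALTINSTR_ADDR		0xffffffff81febd30ULL	/* .altinstructions */
#define ALTINSTR_SIZE		0x7e4bULL
#define ALTREPL_ADDR		0xffffffff81ff3b7bULL	/* .altinstr_replace */
#define ALTREPL_SIZE		0x2044ULL

/* Hypothetical helper: true if [addr, addr + size) lies in [begin, end). */
static bool in_init_region(uint64_t addr, uint64_t size,
			   uint64_t begin, uint64_t end)
{
	return addr >= begin && addr <= end && size <= end - addr;
}
```

Both in_init_region(ALTINSTR_ADDR, ALTINSTR_SIZE, INIT_BEGIN, INIT_END) and the same call for ALTREPL_ADDR/ALTREPL_SIZE return true, so both sections sit inside the range handed to free_init_pages().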
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--