linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] x86/asm changes for v5.6
@ 2020-01-28 16:59 Ingo Molnar
  2020-01-28 19:51 ` Linus Torvalds
  2020-01-28 21:15 ` pr-tracker-bot
  0 siblings, 2 replies; 29+ messages in thread
From: Ingo Molnar @ 2020-01-28 16:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest x86-asm-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus

   # HEAD: 183ef7adf4ed638ac0fb0c3c9a71fc00e8512b61 x86/boot: Simplify calculation of output address

Misc updates: 

 - Remove last remaining calls to exception_enter/exception_exit() and 
   simplify the entry code some more.

 - Remove force_iret()

 - Add support for "Fast Short Rep Mov", which is available starting with 
   Ice Lake Intel CPUs - and make the x86 assembly version of memmove() 
   use REP MOV for all sizes when FSRM is available.

 - Micro-optimize/simplify the 32-bit boot code a bit.

 - Use a more future-proof SYSRET instruction mnemonic

 Thanks,

	Ingo

------------------>
Arvind Sankar (1):
      x86/boot: Simplify calculation of output address

Brian Gerst (1):
      x86: Remove force_iret()

Frederic Weisbecker (2):
      x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
      x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault

Jan Beulich (1):
      x86/entry/64: Add instruction suffix to SYSRET

Tony Luck (1):
      x86/cpufeatures: Add support for fast short REP; MOVSB


 arch/x86/boot/compressed/head_32.S |  8 +++-----
 arch/x86/entry/entry_64.S          |  2 +-
 arch/x86/ia32/ia32_signal.c        |  2 --
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/ptrace.h      | 16 ----------------
 arch/x86/include/asm/thread_info.h |  9 ---------
 arch/x86/kernel/kvm.c              |  4 ----
 arch/x86/kernel/process_32.c       |  1 -
 arch/x86/kernel/process_64.c       |  1 -
 arch/x86/kernel/signal.c           |  2 --
 arch/x86/kernel/vm86_32.c          |  1 -
 arch/x86/lib/memmove_64.S          |  7 ++++---
 arch/x86/mm/fault.c                | 39 ++++++++++++--------------------------
 13 files changed, 21 insertions(+), 72 deletions(-)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 16:59 [GIT PULL] x86/asm changes for v5.6 Ingo Molnar
@ 2020-01-28 19:51 ` Linus Torvalds
  2020-01-28 20:06   ` Linus Torvalds
  2020-01-28 21:15 ` pr-tracker-bot
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2020-01-28 19:51 UTC (permalink / raw)
  To: Ingo Molnar, Tony Luck, Borislav Petkov
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1428 bytes --]

On Tue, Jan 28, 2020 at 8:59 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>  - Add support for "Fast Short Rep Mov", which is available starting with
>    Ice Lake Intel CPUs - and make the x86 assembly version of memmove()
>    use REP MOV for all sizes when FSRM is available.

Pulled. However, this seems rather non-optimal:

        ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
        ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS

in that it leaves unnecessary NOPs there as alternatives.

We have "ALTERNATIVE_2", so we can do the above in one thing, _and_
move the rep-movsq testing code into there too:

        ALTERNATIVE_2 \
                "cmp  $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
                "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
                "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb;
retq", X86_FEATURE_ERMS

which avoids unnecessary nops.

I dunno.  It doesn't much matter, but we _do_ have things to do for
all three cases, and it actually makes sense to move all three
"use rep movs" cases into the ALTERNATIVE. No?

UNTESTED patch attached, but visually it seems to generate better code
and fewer unnecessary nops (I get just two bytes of nop with this for
the non-FSRM/ERMS case).

If somebody tests this out and commits it and writes a commit message,
they can claim authorship. Or add my sign-off.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1186 bytes --]

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 7ff00ea64e4f..e42bf35b9b62 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
 	cmp %rdi, %r8
 	jg 2f
 
-	/* FSRM implies ERMS => no length checks, do the copy directly */
+	/*
+	 * Three rep-string alternatives:
+	 *  - go to "movsq" for large regions where source and dest are
+	 *    mutually aligned (same in low 8 bits). "label 4"
+	 *  - plain rep-movsb for FSRM
+	 *  - rep-movs for > 32 byte for ERMS.
+	 */
 .Lmemmove_begin_forward:
-	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
+	ALTERNATIVE_2 \
+		"cmp  $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
+		"movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
+		"cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
 
-	/*
-	 * movsq instruction have many startup latency
-	 * so we handle small size by general register.
-	 */
-	cmp  $680, %rdx
-	jb	3f
-	/*
-	 * movsq instruction is only good for aligned case.
-	 */
-
-	cmpb %dil, %sil
-	je 4f
 3:
 	sub $0x20, %rdx
 	/*

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 19:51 ` Linus Torvalds
@ 2020-01-28 20:06   ` Linus Torvalds
  2020-01-28 20:14     ` Ingo Molnar
                       ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Linus Torvalds @ 2020-01-28 20:06 UTC (permalink / raw)
  To: Ingo Molnar, Tony Luck, Borislav Petkov
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>         ALTERNATIVE_2 \
>                 "cmp  $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
>                 "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
>                 "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS

Note the UNTESTED part.

In particular, I didn't check what the priority for the alternatives
is. Since FSRM being set always implies ERMS being set too, it may be
that the ERMS case is always picked with the above code.

So maybe the FSRM and ERMS lines need to be switched around, and
somebody should add a comment to the ALTERNATIVE_2 macro about the
priority rules for feature1 vs feature2 when both are set..

IOW, testing most definitely required for that patch suggestion of mine..

                Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 20:06   ` Linus Torvalds
@ 2020-01-28 20:14     ` Ingo Molnar
  2020-01-28 20:41     ` Luck, Tony
  2020-01-29 13:26     ` Borislav Petkov
  2 siblings, 0 replies; 29+ messages in thread
From: Ingo Molnar @ 2020-01-28 20:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Borislav Petkov, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >         ALTERNATIVE_2 \
> >                 "cmp  $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> >                 "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
> >                 "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> 
> Note the UNTESTED part.
> 
> In particular, I didn't check what the priority for the alternatives
> is. Since FSRM being set always implies ERMS being set too, it may be
> that the ERMS case is always picked with the above code.
> 
> So maybe the FSRM and ERMS lines need to be switched around, and
> somebody should add a comment to the ALTERNATIVE_2 macro about the
> priority rules for feature1 vs feature2 when both are set..
> 
> IOW, testing most definitely required for that patch suggestion of mine..

Understood, thanks!

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 20:06   ` Linus Torvalds
  2020-01-28 20:14     ` Ingo Molnar
@ 2020-01-28 20:41     ` Luck, Tony
  2020-01-28 21:04       ` Linus Torvalds
  2020-01-29 13:26     ` Borislav Petkov
  2 siblings, 1 reply; 29+ messages in thread
From: Luck, Tony @ 2020-01-28 20:41 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, Borislav Petkov
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

> In particular, I didn't check what the priority for the alternatives
> is. Since FSRM being set always implies ERMS being set too, it may be
> that the ERMS case is always picked with the above code.

Is there still some easy way to get gdb to disassemble from /dev/kmem
to see what we ended up with after all the patching?  With address space
randomization and other security bits I'm not certain what still works.

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 20:41     ` Luck, Tony
@ 2020-01-28 21:04       ` Linus Torvalds
  2020-01-28 22:31         ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2020-01-28 21:04 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Ingo Molnar, Borislav Petkov, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Tue, Jan 28, 2020 at 12:41 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> Is there still some easy way to get gdb to disassemble from /dev/kmem
> to see what we ended up with after all the patching?

Hmm. No, I think it's all gone.

It _used_ to be easy to just do "objdump --disassemble /proc/kcore" as
root, but I think we've broken that long ago.

               Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 16:59 [GIT PULL] x86/asm changes for v5.6 Ingo Molnar
  2020-01-28 19:51 ` Linus Torvalds
@ 2020-01-28 21:15 ` pr-tracker-bot
  1 sibling, 0 replies; 29+ messages in thread
From: pr-tracker-bot @ 2020-01-28 21:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

The pull request you sent on Tue, 28 Jan 2020 17:59:06 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/bcc8aff6af53907ecb60e5aa8b34fbd429408a7a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 21:04       ` Linus Torvalds
@ 2020-01-28 22:31         ` Borislav Petkov
  2020-01-29 18:00           ` Josh Poimboeuf
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-28 22:31 UTC (permalink / raw)
  To: Linus Torvalds, Luck, Tony
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Tue, Jan 28, 2020 at 01:04:19PM -0800, Linus Torvalds wrote:
> On Tue, Jan 28, 2020 at 12:41 PM Luck, Tony <tony.luck@intel.com> wrote:
> >
> > Is there still some easy way to get gdb to disassemble from /dev/kmem
> > to see what we ended up with after all the patching?
> 
> Hmm. No, I think it's all gone.
> 
> It _used_ to be easy to just do "objdump --disassemble /proc/kcore" as
> root, but I think we've broken that long ago.

Either booting with "debug-alternative" on baremetal or starting a guest
and stopping it with gdb and examining the patched memory is what I've
been using to hack on the alternatives in past years. Guest won't help
you a whole lot with FSRM but you could "force it", for example.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 20:06   ` Linus Torvalds
  2020-01-28 20:14     ` Ingo Molnar
  2020-01-28 20:41     ` Luck, Tony
@ 2020-01-29 13:26     ` Borislav Petkov
  2020-01-29 17:07       ` Luck, Tony
  2 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-29 13:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Tony Luck, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Tue, Jan 28, 2020 at 12:06:53PM -0800, Linus Torvalds wrote:
> On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >         ALTERNATIVE_2 \
> >                 "cmp  $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> >                 "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
> >                 "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> 
> Note the UNTESTED part.
> 
> In particular, I didn't check what the priority for the alternatives
> is. Since FSRM being set always implies ERMS being set too, it may be
> that the ERMS case is always picked with the above code.
> 
> So maybe the FSRM and ERMS lines need to be switched around, and
> somebody should add a comment to the ALTERNATIVE_2 macro about the
> priority rules for feature1 vs feature2 when both are set..
> 
> IOW, testing most definitely required for that patch suggestion of mine..

So this is what's there now, before your patch (I've forced both
X86_FEATURE_FSRM and X86_FEATURE_ERMS on a BDW guest).

[    4.238160] apply_alternatives: feat: 18*32+4, old: (__memmove+0x17/0x1a0 (ffffffff817d90d7) len: 10), repl: (ffffffff8251dbbb, len: 0), pad: 0
[    4.239503] ffffffff817d90d7: old_insn: 48 83 fa 20 0f 82 f5 00 00 00

That's what's in vmlinux:

ffffffff817d90d7:       48 83 fa 20             cmp $0x20,%rdx
ffffffff817d90db:       0f 82 f5 00 00 00       jb ffffffff817d91d6

which is 10 bytes.

It gets replaced to:

[    4.240194] ffffffff817d90d7: final_insn: 0f 1f 84 00 00 00 00 00 66 90

ffffffff817d90d7:       0f 1f 84 00 00 00 00    nopl 0x0(%rax,%rax,1)
ffffffff817d90de:       00
ffffffff817d90df:       66 90                   xchg %ax,%ax

I.e., NOPed out.

ERMS replaces the bytes *after* these 10 bytes, note the VA:

0xffffffff817d90d7 + 0xa = 0xffffffff817d90e1

[    4.240917] apply_alternatives: feat: 9*32+9, old: (__memmove+0x21/0x1a0 (ffffffff817d90e1) len: 6), repl: (ffffffff8251dbbb, len: 6), pad: 6
[    4.242209] ffffffff817d90e1: old_insn: 90 90 90 90 90 90
[    4.242823] ffffffff8251dbbb: rpl_insn: 48 89 d1 f3 a4 c3
[    4.243503] ffffffff817d90e1: final_insn: 48 89 d1 f3 a4 c3

which turns into

ffffffff817d90e1:       48 89 d1                mov %rdx,%rcx
ffffffff817d90e4:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff817d90e6:       c3                      retq

as expected.

And yes, your idea makes sense to use ALTERNATIVE_2 but as it is, it
triple-faults my guest. I'll debug it more later to find out why, when I
get a chance.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 13:26     ` Borislav Petkov
@ 2020-01-29 17:07       ` Luck, Tony
  2020-01-29 17:40         ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Luck, Tony @ 2020-01-29 17:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 02:26:18PM +0100, Borislav Petkov wrote:
> On Tue, Jan 28, 2020 at 12:06:53PM -0800, Linus Torvalds wrote:
> > On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > >         ALTERNATIVE_2 \
> > >                 "cmp  $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> > >                 "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
> > >                 "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> > 
> > Note the UNTESTED part.
> > 
> > In particular, I didn't check what the priority for the alternatives
> > is. Since FSRM being set always implies ERMS being set too, it may be
> > that the ERMS case is always picked with the above code.

So I wrote a tiny function to test (rather than wrestle with trying
to disassemble the post-alternative patched binary of the running system):

        .globl  feature
        .type   feature, @function
feature:
        .cfi_startproc
       ALTERNATIVE_2 \
               "movl    $1, %eax", \
               "movl    $2, %eax", X86_FEATURE_FSRM, \
               "movl    $3, %eax", X86_FEATURE_ERMS

        ret

This returns "3" ... not what we want. But swapping the ERMS/FSRM order
gets the correct version.

> And yes, your idea makes sense to use ALTERNATIVE_2 but as it is, it
> triple-faults my guest. I'll debug it more later to find out why, when I
> get a chance.

Triple fault is a surprise.  As long as you have ERMS, it shouldn't
hurt to take the FSRM code path.

Does the code that performs the patch use memmove() to copy the alternate
version into place? That could get ugly!

I'm not in the same city as my test machine, so I'm going to defer testing
Linus' patch (with FSRM/ERMS swapped) until I'm near enough to poke it
if it breaks.

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 17:07       ` Luck, Tony
@ 2020-01-29 17:40         ` Linus Torvalds
  2020-01-29 18:34           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2020-01-29 17:40 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> This returns "3" ... not what we want. But swapping the ERMS/FSRM order
> gets the correct version.

That actually makes sense, and is what I suspected (after I wrote the
patch) what would happen. It would just be good to have it explicitly
documented at the macro.

> > And yes, your idea makes sense to use ALTERNATIVE_2 but as it is, it
> > triple-faults my guest. I'll debug it more later to find out why, when I
> > get a chance.
>
> Triple fault is a surprise.  As long as you have ERMS, it shouldn't
> hurt to take the FSRM code path.
>
> Does the code that performs the patch use memmove() to copy the alternate
> version into place? That could get ugly!

That would be bad indeed, but I don't think it should matter
particularly for this case - it would have been bad before too.

I suspect there is some other issue going on, like the NOP padding
logic being confused.

In particular, look here, this is the alt_instruction entries:

   altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
   altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b

where the arguments are "orig alt feature orig_len alt_len pad_len" in
that order.
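
For reference, each of those entries ends up as a struct alt_instr record
in .altinstructions, which - if I remember the layout right - is roughly:

	struct alt_instr {
		s32 instr_offset;	/* orig: offset to original insn */
		s32 repl_offset;	/* alt: offset to replacement insn */
		u16 cpuid;		/* feature: cpuid bit to check */
		u8  instrlen;		/* orig_len */
		u8  replacementlen;	/* alt_len */
		u8  padlen;		/* pad_len (build-time padding) */
	} __packed;

so the six macro arguments map 1:1 onto those fields.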

Notice how "pad_len" in both cases is the padding from the _original_
instruction (142b-141b).

So what happens when all three alternatives are different sizes?
In particular, we have

 (a) first 'orig' with 'orig_pad'

 (b) then we do the first feature replacement, and we use that correct
original padding

 (c) then we do the _second_ feature replacement - except now
'orig_len, pad_len' doesn't actually match what the code is, because
we've done that first replacement.

I still don't see why it would be a problem, really, because the two
replacement sequences don't actually care about the padding (they both
end with a 'ret', so you don't need to get the padding nops right),
but I wonder if this causes confusion.

I do note that all the existing uses of ALTERNATIVE_2 in asm code that
might have this issue (REP_GOOD vs ERMS) have an empty instruction in
the middle, and the final instruction is the same size as the
original. So _if_ there is some padding confusion in alternatives
handling, it might easily have been hidden by that.

So I'm just hand-waving. Maybe there was some simpler explanation
(like me just picking the wrong instructions when I did the rough
conversion and simply breaking things with some stupid bug).

               Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 22:31         ` Borislav Petkov
@ 2020-01-29 18:00           ` Josh Poimboeuf
  0 siblings, 0 replies; 29+ messages in thread
From: Josh Poimboeuf @ 2020-01-29 18:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Tue, Jan 28, 2020 at 11:31:10PM +0100, Borislav Petkov wrote:
> On Tue, Jan 28, 2020 at 01:04:19PM -0800, Linus Torvalds wrote:
> > On Tue, Jan 28, 2020 at 12:41 PM Luck, Tony <tony.luck@intel.com> wrote:
> > >
> > > Is there still some easy way to get gdb to disassemble from /dev/kmem
> > > to see what we ended up with after all the patching?
> > 
> > Hmm. No, I think it's all gone.
> > 
> > It _used_ to be easy to just do "objdump --disassemble /proc/kcore" as
> > root, but I think we've broken that long ago.
> 
> Either booting with "debug-alternative" on baremetal or starting a guest
> and stopping it with gdb and examining the patched memory is what I've
> been using to hack on the alternatives in past years. Guest won't help
> you a whole lot with FSRM but you could "force it", for example.

FWIW, I can still do 'objdump -d' and gdb on /proc/kcore just fine
(though you have to give gdb addresses because no symbols).

-- 
Josh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 17:40         ` Linus Torvalds
@ 2020-01-29 18:34           ` Borislav Petkov
  2020-01-29 18:56             ` Linus Torvalds
                               ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Borislav Petkov @ 2020-01-29 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote:
> On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@intel.com> wrote:
> >
> > This returns "3" ... not what we want. But swapping the ERMS/FSRM order
> > gets the correct version.
> 
> That actually makes sense, and is what I suspected (after I wrote the
> patch) what would happen. It would just be good to have it explicitly
> documented at the macro.

Like this?

---
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 13adca37c99a..d94bad03bcb4 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -164,6 +164,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	ALTINSTR_REPLACEMENT(newinstr, feature, 1)			\
 	".popsection\n"
 
+/*
+ * The patching happens in the natural order of first newinstr1 and then
+ * newinstr2, iff the respective feature bits are set. See apply_alternatives()
+ * for details.
+ */
 #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
 	OLDINSTR_2(oldinstr, 1, 2)					\
 	".pushsection .altinstructions,\"a\"\n"				\

> That would be bad indeed, but I don't think it should matter
> particularly for this case - it would have been bad before too.
> 
> I suspect there is some other issue going on, like the NOP padding
> logic being confused.

Or the cmp $0x20 test missing in the default case, see below.

> In particular, look here, this is the alt_instruction entries:
> 
>    altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
>    altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b
> 
> where the arguments are "orig alt feature orig_len alt_len pad_len" in
> that order.
> 
> Notice how "pad_len" in both cases is the padding from the _original_
> instruction (142b-141b).

<snip this which I'll take a look later so that we can sort out the
issue at hand first>

> So I'm just hand-waving. Maybe there was some simpler explanation
> (like me just picking the wrong instructions when I did the rough
> conversion and simply breaking things with some stupid bug).

Looks like it. So I did this:

---
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 7ff00ea64e4f..a670d01570df 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
 	cmp %rdi, %r8
 	jg 2f
 
-	/* FSRM implies ERMS => no length checks, do the copy directly */
-.Lmemmove_begin_forward:
-	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
-	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
-
 	/*
-	 * movsq instruction have many startup latency
-	 * so we handle small size by general register.
-	 */
-	cmp  $680, %rdx
-	jb	3f
-	/*
-	 * movsq instruction is only good for aligned case.
+	 * Three rep-string alternatives:
+	 *  - go to "movsq" for large regions where source and dest are
+	 *    mutually aligned (same in low 8 bits). "label 4"
+	 *  - plain rep-movsb for FSRM
+	 *  - rep-movs for > 32 byte for ERMS.
 	 */
+.Lmemmove_begin_forward:
+	ALTERNATIVE_2 \
+		"cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
+		"cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
+		"movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
 
-	cmpb %dil, %sil
-	je 4f
 3:
 	sub $0x20, %rdx
 	/*

---

Notice how the *first* option of the alternative, which is the default
one, has now gotten that additional "cmp $0x20, %rdx; jb 1f" test which
sends us down to the code for copy lengths below 32 bytes.

Your original version didn't have it and here's what I saw:

So I stopped the guest just before that code and the call trace looked
like this:

#0  memmove () at arch/x86/lib/memmove_64.S:43
#1  0xffffffff824448c2 in memblock_insert_region (type=0xffffffff824a8298 <memblock+56>, idx=<optimized out>, base=0, 
    size=4096, nid=2, flags=MEMBLOCK_NONE) at mm/memblock.c:553
#2  0xffffffff824454f0 in memblock_add_range (type=0xffffffff824a8298 <memblock+56>, base=0, size=<optimized out>, 
    nid=73400320, flags=<optimized out>) at mm/memblock.c:641
#3  0xffffffff82445627 in memblock_reserve (base=0, size=4096) at mm/memblock.c:830
#4  0xffffffff823ff399 in setup_arch (cmdline_p=0xffffffff82003f28) at arch/x86/kernel/setup.c:798
#5  0xffffffff823f9ae1 in start_kernel () at init/main.c:598
#6  0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:242
#7  0x0000000000000000 in ?? ()

and count in rdx was:

rdx            0x18     24

Without that "cmp $0x20" test above, we do the "cmp $680, %rdx; jb 3f;" test
and we run into the following asm at label 3:

3:
        sub $0x20, %rdx
        /*
         * We gobble 32 bytes forward in each loop.
         */

<--- right here %rdx is:

rdx            0xfffffffffffffff8       -8

and yeeehaaw, we're in the weeds and then end up triplefaulting at some
unmapped source address in %rsi or so.

So now I'm going to play all three variants with pen and paper to make
sure we're still sane.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 18:34           ` Borislav Petkov
@ 2020-01-29 18:56             ` Linus Torvalds
  2020-01-30  8:51               ` Borislav Petkov
  2020-01-29 19:42             ` Luck, Tony
  2020-01-30  5:47             ` Damian Tometzki
  2 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2020-01-29 18:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 10:34 AM Borislav Petkov <bp@suse.de> wrote:
>
> On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote:
>
> > So I'm just hand-waving. Maybe there was some simpler explanation
> > (like me just picking the wrong instructions when I did the rough
> > conversion and simply breaking things with some stupid bug).
>
> Looks like it. So I did this:

Ahh, yeah, good spotting.

And I wonder if we should just make that

        movq %rdx, %rcx

unconditional, because all the cases basically want it anyway (the
"label 4" case does it after the jump).

It might even make sense to move it to the top of __memmove, depending
on how well those early instructions decode.

                     Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 18:34           ` Borislav Petkov
  2020-01-29 18:56             ` Linus Torvalds
@ 2020-01-29 19:42             ` Luck, Tony
  2020-01-30  5:47             ` Damian Tometzki
  2 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2020-01-29 19:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 07:34:04PM +0100, Borislav Petkov wrote:
> > So I'm just hand-waving. Maybe there was some simpler explanation
> > (like me just picking the wrong instructions when I did the rough
> > conversion and simply breaking things with some stupid bug).
> 
> Looks like it. So I did this:
> 

That looked plausible enough to risk on my remote machine. See below.

> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 7ff00ea64e4f..a670d01570df 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
>  	cmp %rdi, %r8
>  	jg 2f
>  
> -	/* FSRM implies ERMS => no length checks, do the copy directly */
> -.Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> -
>  	/*
> -	 * movsq instruction have many startup latency
> -	 * so we handle small size by general register.
> -	 */
> -	cmp  $680, %rdx
> -	jb	3f
> -	/*
> -	 * movsq instruction is only good for aligned case.
> +	 * Three rep-string alternatives:
> +	 *  - go to "movsq" for large regions where source and dest are
> +	 *    mutually aligned (same in low 8 bits). "label 4"
> +	 *  - plain rep-movsb for FSRM
> +	 *  - rep-movs for > 32 byte for ERMS.

Maybe list the code paths in the same order that they appear in the
code below? So ERMS, then FSRM.

>  	 */
> +.Lmemmove_begin_forward:
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> +		"cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
> +		"movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
>  
> -	cmpb %dil, %sil
> -	je 4f
>  3:
>  	sub $0x20, %rdx
>  	/*
> 
> ---
> 

So the kernel booted. I grabbed the address of memmove from
/proc/kallsyms and used:

   # objdump -d --start-address=0x{blah blah} /proc/kcore

and things look OK in there. It picked the FSRM option to simply
use "rep movsb". Seems to be padded with various NOPs after the
retq. Then the unpatched area starts with the "sub $0x20,%rdx".


ffffffff95946840:       48 89 f8                mov    %rdi,%rax
ffffffff95946843:       48 39 fe                cmp    %rdi,%rsi
ffffffff95946846:       7d 0f                   jge    0xffffffff95946857
ffffffff95946848:       49 89 f0                mov    %rsi,%r8
ffffffff9594684b:       49 01 d0                add    %rdx,%r8
ffffffff9594684e:       49 39 f8                cmp    %rdi,%r8
ffffffff95946851:       0f 8f a9 00 00 00       jg     0xffffffff95946900
ffffffff95946857:       48 89 d1                mov    %rdx,%rcx
ffffffff9594685a:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff9594685c:       c3                      retq
ffffffff9594685d:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
ffffffff95946864:       00
ffffffff95946865:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
ffffffff9594686c:       00
ffffffff9594686d:       66 90                   xchg   %ax,%ax
ffffffff9594686f:       48 83 ea 20             sub    $0x20,%rdx


So:

Tested-by: Tony Luck <tony.luck@intel.com>

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 18:34           ` Borislav Petkov
  2020-01-29 18:56             ` Linus Torvalds
  2020-01-29 19:42             ` Luck, Tony
@ 2020-01-30  5:47             ` Damian Tometzki
  2020-01-30  7:55               ` Borislav Petkov
  2 siblings, 1 reply; 29+ messages in thread
From: Damian Tometzki @ 2020-01-30  5:47 UTC (permalink / raw)
  To: Borislav Petkov, Linus Torvalds
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton



On 29.01.20 at 19:34, Borislav Petkov wrote:
> On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote:
>> On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@intel.com> wrote:
>>>
>>> This returns "3" ... not what we want. But swapping the ERMS/FSRM order
>>> gets the correct version.
>>
>> That actually makes sense, and is what I suspected (after I wrote the
>> patch) what would happen. It would just be good to have it explicitly
>> documented at the macro.
> 
> Like this?
> 
> ---
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 13adca37c99a..d94bad03bcb4 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -164,6 +164,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
>   	ALTINSTR_REPLACEMENT(newinstr, feature, 1)			\
>   	".popsection\n"
>   
> +/*
> + * The patching happens in the natural order of first newinstr1 and then
> + * newinstr2, iff the respective feature bits are set. See apply_alternatives()
> + * for details.
> + */
>   #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
>   	OLDINSTR_2(oldinstr, 1, 2)					\
>   	".pushsection .altinstructions,\"a\"\n"				\
> 
>> That would be bad indeed, but I don't think it should matter
>> particularly for this case - it would have been bad before too.
>>
>> I suspect there is some other issue going on, like the NOP padding
>> logic being confused.
> 
> Or the cmp $0x20 test missing in the default case, see below.
> 
>> In particular, look here, this is the alt_instruction entries:
>>
>>     altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
>>     altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b
>>
>> where the arguments are "orig alt feature orig_len alt_len pad_len" in
>> that order.
>>
>> Notice how "pad_len" in both cases is the padding from the _original_
>> instruction (142b-141b).
> 
> <snip this which I'll take a look later so that we can sort out the
> issue at hand first>
> 
>> So I'm just hand-waving. Maybe there was some simpler explanation
>> (like me just picking the wrong instructions when I did the rough
>> conversion and simply breaking things with some stupid bug).
> 
> Looks like it. So I did this:
> 
> ---
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 7ff00ea64e4f..a670d01570df 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
>   	cmp %rdi, %r8
>   	jg 2f
>   
> -	/* FSRM implies ERMS => no length checks, do the copy directly */
> -.Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> -
>   	/*
> -	 * movsq instruction have many startup latency
> -	 * so we handle small size by general register.
> -	 */
> -	cmp  $680, %rdx
> -	jb	3f
> -	/*
> -	 * movsq instruction is only good for aligned case.
> +	 * Three rep-string alternatives:
> +	 *  - go to "movsq" for large regions where source and dest are
> +	 *    mutually aligned (same in low 8 bits). "label 4"
> +	 *  - plain rep-movsb for FSRM
> +	 *  - rep-movs for > 32 byte for ERMS.
>   	 */
> +.Lmemmove_begin_forward:
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> +		"cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
> +		"movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
>   
> -	cmpb %dil, %sil
> -	je 4f
>   3:
>   	sub $0x20, %rdx
>   	/*
> 
> ---
> 
> Notice how the *first* option of the alternative, which is the default
> one, has now gotten that additional "cmp $0x20, %rdx; jb 1f" test which
> sends us down to the code for copy lengths below 32 bytes.
> 
> Your original version didn't have it and here's what I saw:
> 
> So I stopped the guest just before that code and the call trace looked
> like this:
> 
> #0  memmove () at arch/x86/lib/memmove_64.S:43
> #1  0xffffffff824448c2 in memblock_insert_region (type=0xffffffff824a8298 <memblock+56>, idx=<optimized out>, base=0,
>      size=4096, nid=2, flags=MEMBLOCK_NONE) at mm/memblock.c:553
> #2  0xffffffff824454f0 in memblock_add_range (type=0xffffffff824a8298 <memblock+56>, base=0, size=<optimized out>,
>      nid=73400320, flags=<optimized out>) at mm/memblock.c:641
> #3  0xffffffff82445627 in memblock_reserve (base=0, size=4096) at mm/memblock.c:830
> #4  0xffffffff823ff399 in setup_arch (cmdline_p=0xffffffff82003f28) at arch/x86/kernel/setup.c:798
> #5  0xffffffff823f9ae1 in start_kernel () at init/main.c:598
> #6  0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:242
> #7  0x0000000000000000 in ?? ()
> 
> and count in rdx was:
> 
> rdx            0x18     24
> 
> Without that "cmp $0x20" test above, we do the "cmp $680, %rdx; jb 3f;" test
> and we run into the following asm at label 3:
> 
> 3:
>          sub $0x20, %rdx
>          /*
>           * We gobble 32 bytes forward in each loop.
>           */
> 
> <--- right here %rdx is:
> 
> rdx            0xfffffffffffffff8       -8
> 
> and yeeehaaw, we're in the weeds and then end up triplefaulting at some
> unmapped source address in %rsi or so.
> 
> So now I'm going to play all three variants with pen and paper to make
> sure we're still sane.
> 
> Thx.
> 
Hello all,

in my qemu env the system isn't coming up. I tried both with and without 
the changes from Borislav.

Best regards
Damian


0.605193] ------------[ cut here ]------------
[    0.605933] General protection fault in user access. Non-canonical 
address?
[    0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77 
ex_handler_uaccess+0x48/0x50
[    0.606931] Modules linked in:
[    0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
[    0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[    0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50
[    0.606931] Code: 83 c4 08 b8 01 00 00 00 5b c3 80 3d 78 75 50 01 00 
75 dc 48 c7 c7 00 ea 1e 82 48 89 34 24 c6 05 64 75 50 01 01 e8 9e fd 00 
00 <0f> 0b 48 8b 34 24 eb bd 80 3d 4f 75 50 01 00 55 48 89 fd 53 49
[    0.606931] RSP: 0000:ffffc900000dbc30 EFLAGS: 00010286
[    0.606931] RAX: 000000000000003f RBX: ffffffff82339d6c RCX: 
0000000000000000
[    0.606931] RDX: 0000000000000007 RSI: ffffffff8316dc5f RDI: 
ffffffff8316e05f
[    0.606931] RBP: 000000000000000d R08: ffffffff8316dc20 R09: 
0000000000029840
[    0.606931] R10: 000000010196fab4 R11: 0000000000000001 R12: 
0000000000000000
[    0.606931] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
[    0.606931] FS:  0000000000000000(0000) GS:ffff88800f800000(0000) 
knlGS:0000000000000000
[    0.606931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.606931] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 
00000000000006f0
[    0.606931] Call Trace:
[    0.606931]  fixup_exception+0x3e/0x51
[    0.606931]  do_general_protection+0x9c/0x260
[    0.606931]  general_protection+0x28/0x30
[    0.606931] RIP: 0010:copy_user_generic_string+0x2c/0x40
[    0.606931] Code: 00 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 
d9 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 
07 <f3> 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f
[    0.606931] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010246
[    0.606931] RAX: 0000000000000000 RBX: ffff88801e488000 RCX: 
0000000000000001
[    0.606931] RDX: 0000000000000000 RSI: 009896808086a201 RDI: 
ffffc900000dbdc0
[    0.606931] RBP: ffffffffffffffff R08: ffff88800f82d280 R09: 
0000000000000073
[    0.606931] R10: 0000000000000000 R11: ffffc900000dbd68 R12: 
0000000000000dc0
[    0.606931] R13: 00000000000000fe R14: ffffffff81252d01 R15: 
009896808086a201
[    0.606931]  ? __register_sysctl_table+0x361/0x500
[    0.606931]  __probe_kernel_read+0x2e/0x60
[    0.606931]  __kmalloc+0x10b/0x230
[    0.606931]  __register_sysctl_table+0x361/0x500
[    0.606931]  ? kmem_cache_alloc_trace+0xee/0x1e0
[    0.606931]  __register_sysctl_paths+0x186/0x1b0
[    0.606931]  ? iomap_init+0x1b/0x1b
[    0.606931]  ? do_early_param+0x89/0x89
[    0.606931]  dquot_init+0x23/0x117
[    0.606931]  ? iomap_init+0x1b/0x1b
[    0.606931]  do_one_initcall+0x31/0x1b0
[    0.606931]  ? do_early_param+0x89/0x89
[    0.606931]  ? do_early_param+0x89/0x89
[    0.606931]  kernel_init_freeable+0x15b/0x1b3
[    0.606931]  ? rest_init+0x9a/0x9a
[    0.606931]  kernel_init+0x5/0xf6
[    0.606931]  ret_from_fork+0x35/0x40
[    0.606931] ---[ end trace 8ee2a58282a5eb54 ]---
[    0.650539] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC PTI

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30  5:47             ` Damian Tometzki
@ 2020-01-30  7:55               ` Borislav Petkov
  2020-01-30 11:10                 ` Mike Rapoport
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30  7:55 UTC (permalink / raw)
  To: Damian Tometzki
  Cc: Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

Hello Damian,

On Thu, Jan 30, 2020 at 06:47:14AM +0100, Damian Tometzki wrote:
> in my qemu env the system isn't coming up. I tried both with and without the
> changes from Borislav.

in the future, please do not hijack the thread like that but start a new
one or open a bug on bugzilla.kernel.org. Your issue is something else.

> 0.605193] ------------[ cut here ]------------
> [    0.605933] General protection fault in user access. Non-canonical
> address?

There it is.

> [    0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77
> ex_handler_uaccess+0x48/0x50
> [    0.606931] Modules linked in:
> [    0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
> [    0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> [    0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50
> [    0.606931] Code: 83 c4 08 b8 01 00 00 00 5b c3 80 3d 78 75 50 01 00 75
> dc 48 c7 c7 00 ea 1e 82 48 89 34 24 c6 05 64 75 50 01 01 e8 9e fd 00 00 <0f>
> 0b 48 8b 34 24 eb bd 80 3d 4f 75 50 01 00 55 48 89 fd 53 49
> [    0.606931] RSP: 0000:ffffc900000dbc30 EFLAGS: 00010286
> [    0.606931] RAX: 000000000000003f RBX: ffffffff82339d6c RCX:
> 0000000000000000
> [    0.606931] RDX: 0000000000000007 RSI: ffffffff8316dc5f RDI:
> ffffffff8316e05f
> [    0.606931] RBP: 000000000000000d R08: ffffffff8316dc20 R09:
> 0000000000029840
> [    0.606931] R10: 000000010196fab4 R11: 0000000000000001 R12:
> 0000000000000000
> [    0.606931] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [    0.606931] FS:  0000000000000000(0000) GS:ffff88800f800000(0000)
> knlGS:0000000000000000
> [    0.606931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.606931] CR2: 0000000000000000 CR3: 000000000240a000 CR4:
> 00000000000006f0
> [    0.606931] Call Trace:
> [    0.606931]  fixup_exception+0x3e/0x51
> [    0.606931]  do_general_protection+0x9c/0x260
> [    0.606931]  general_protection+0x28/0x30
> [    0.606931] RIP: 0010:copy_user_generic_string+0x2c/0x40
> [    0.606931] Code: 00 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9
> 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3>
> 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f
> [    0.606931] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010246
> [    0.606931] RAX: 0000000000000000 RBX: ffff88801e488000 RCX:
> 0000000000000001
> [    0.606931] RDX: 0000000000000000 RSI: 009896808086a201 RDI:
> ffffc900000dbdc0

It looks like dquot_init->register_sysctl_table-> ... does copy_to_user
at some point and it goes off into the weeds and %rsi becomes
non-canonical.

Please start a new thread or open a bug and upload your .config and
dmesg. We'll continue debugging that there.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 18:56             ` Linus Torvalds
@ 2020-01-30  8:51               ` Borislav Petkov
  2020-01-30 15:27                 ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30  8:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 10:56:43AM -0800, Linus Torvalds wrote:
> Ahh, yeah, good spotting.
> 
> And I wonder if we should just make that
> 
>         movq %rdx, %rcx
> 
> unconditional, because all the cases basically want it anyway (the
> "label 4" case does it after the jump).
> 
> It might even make sense to move it to the top of __memmove, depending
> on how well those early instructions decode.

Yah, that makes sense and it kinda works but let me sanity-check whether
what we're doing is still worth our time.

In one of my previous mails:

https://lkml.kernel.org/r/20200129132618.GA30979@zn.tnic

I showed that the total length of instructions we're talking about here
is 10 + 6 = 16 bytes.

The current version I have ATM (with "mov %rdx, %rcx" hoisted up in the
function):

        ALTERNATIVE_2 \
                "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx; jb 3f; cmpb %dil, %sil; je 4f", \
                "cmp $0x20, %rdx; jb 1f; rep movsb; retq", X86_FEATURE_ERMS, \
                "rep movsb; retq", X86_FEATURE_FSRM

has grown to:

[    4.182616] apply_alternatives: feat: 9*32+9, old: (__memmove+0x1a/0x1a0 (ffffffff817d90da) len: 24), repl: (ffffffff8251dbbb, len: 13), pad: 0
[    4.183915] ffffffff817d90da: old_insn: 48 83 fa 20 0f 82 f2 00 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3e
[    4.184982] ffffffff8251dbbb: rpl_insn: 48 83 fa 20 0f 82 11 b6 2b ff f3 a4 c3
[    4.185801] ffffffff817d90da: final_insn: 48 83 fa 20 0f 82 11 b6 2b ff f3 a4 c3 0f 1f 84 00 00 00 00 00 0f 1f 00

The default case is the longest:

ffffffff817d90da:       48 83 fa 20             cmp $0x20,%rdx
ffffffff817d90de:       0f 82 11 b6 2b ff       jb ffffffff80a946f5
ffffffff817d90e4:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff817d90e6:       c3                      retq 
ffffffff817d90e7:       0f 1f 84 00 00 00 00    nopl 0x0(%rax,%rax,1)
ffffffff817d90ee:       00 
ffffffff817d90ef:       0f 1f 00                nopl (%rax)

24 bytes which is already 8 bytes more.

The only argument I see for this is readability, as the single
ALTERNATIVE_2 statement shows straightforwardly what gets replaced where,
vs. the current version which needs a bit more staring.

However, this new version has hit another shortcoming of the
alternatives - check this out:

So I enabled only X86_FEATURE_ERMS in my guest first to see how

        ALTERNATIVE_2 \
                "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx; jb 3f; cmpb %dil, %sil; je 4f", \
                "cmp $0x20, %rdx; jb 1f; rep movsb; retq", X86_FEATURE_ERMS, \
^^^^^^^^^^^^^^^^^^^^^

that line would work.

                "rep movsb; retq", X86_FEATURE_FSRM

And it exploded, see the end of this mail. Stopping right before
widen_string() showed me this:

We call memmove with:

rdx            0x1      1
rsi            0xffffffff82543b62       -2108408990
rdi            0xffffffff82543b65       -2108408987

i.e., length is only one.

We run into

	"cmp $0x20, %rdx; jb 1f; rep movsb; retq"

and now look how that "jb 1f" looks in gdb:

...
rip            0xffffffff817d90de       0xffffffff817d90de <memmove+30>
...

=> 0xffffffff817d90de <memmove+30>:     jb     0xffffffff80a946f5
0xffffc90000013c48:     0xffffffff817cfee8      0xffffc90000013da8
0xffffc90000013c58:     0xffffffff82543f40      0xffffffff81eefabe
0xffffc90000013c68:     0xffffffff817d0306      0xffffffff82543b62
0xffffc90000013c78:     0xffffffff82543b62      0xffffffff82543f40
0xffffc90000013c88:     0xffffffff81eb892b      0xffffffff817d2f36
Dump of assembler code from 0xffffffff817d90de to 0xffffffff817d90f2:
=> 0xffffffff817d90de <memmove+30>:     0f 82 11 b6 2b ff       jb     0xffffffff80a946f5
   0xffffffff817d90e4 <memmove+36>:     f3 a4   rep movsb %ds:(%rsi),%es:(%rdi)
   0xffffffff817d90e6 <memmove+38>:     c3      retq   
   0xffffffff817d90e7 <memmove+39>:     0f 1f 84 00 00 00 00 00 nopl   0x0(%rax,%rax,1)
   0xffffffff817d90ef <memmove+47>:     0f 1f 00        nopl   (%rax)

RIP is 0xffffffff817d90de but the JMP absolute target is
0xffffffff80a946f5. And that's waaaay off. And then it dawned on me -
apply_alternatives() only fixes jumps when they're the first byte of the
replacement:

                if (a->replacementlen && is_jmp(replacement[0]))
                        recompute_jump(a, instr, replacement, insn_buff);

and otherwise the JMPs are wrong in .altinstr_replacement as there
they're relative to the next RIP of the insn there:

ffffffff8251dbbb:       48 83 fa 20             cmp    $0x20,%rdx
ffffffff8251dbbf:       0f 82 11 b6 2b ff       jb     ffffffff817d91d6 <__memmove+0x116>

i.e., that offset is from *this* section to the place where __memmove is
and hasn't been fixed up.
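
Doing the math with the bytes above makes it obvious: the rel32 in
"0f 82 11 b6 2b ff" is 0xff2bb611, i.e. -0xd449ef, taken relative to the
next RIP:

  in .altinstr_replacement:  0xffffffff8251dbbf + 6 - 0xd449ef = 0xffffffff817d91d6  (the real target)
  copied into __memmove:     0xffffffff817d90de + 6 - 0xd449ef = 0xffffffff80a946f5  (the weeds)

Same displacement bytes, different RIP, hence the bogus target.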

And this whole deal has popped up in the past but we've never decided
that it is worth implementing proper insn parsing in the alternatives so
that only the insn bytes which really are jumps get recomputed.

So I can put it on my TODO to fix that but it would require more staring
and experimenting before we can have JMP insns in arbitrary places in
the insn replacement. The good thing is we have an insn decoder in the
kernel now so using that should be relatively easy.
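
Completely untested sketch just to show the shape of it - walk the
replacement with the decoder and fix up every relative JMP/Jcc instead of
only a JMP at offset 0 (variable names here are illustrative only):

	u8 *p = insn_buff;

	while (p < insn_buff + a->replacementlen) {
		struct insn insn;

		kernel_insn_init(&insn, p, insn_buff + a->replacementlen - p);
		insn_get_length(&insn);

		/* JMP rel8/rel32 or two-byte Jcc rel32 */
		if (is_jmp(p[0]) ||
		    (p[0] == 0x0f && (p[1] & 0xf0) == 0x80))
			/* recompute the displacement for the target site,
			   like recompute_jump() does for the first insn */;

		p += insn.length;
	}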

Question is, do we really need it?

Thoughts?

[    4.391913] BUG: unable to handle page fault for address: ffffffff80a946f5
[    4.391913] #PF: supervisor instruction fetch in kernel mode
[    4.391913] #PF: error_code(0x0010) - not-present page
[    4.391913] PGD 200a067 P4D 200a067 PUD 200b063 PMD 0 
[    4.391913] Oops: 0010 [#1] PREEMPT SMP
[    4.391913] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0+ #19
[    4.391913] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[    4.391913] RIP: 0010:0xffffffff80a946f5
[    4.391913] Code: Bad RIP value.
[    4.391913] RSP: 0000:ffffc90000013c48 EFLAGS: 00010087
[    4.391913] RAX: ffffffff82543b65 RBX: 0000000000000003 RCX: 0000000000000001
[    4.391913] RDX: 0000000000000001 RSI: ffffffff82543b62 RDI: ffffffff82543b65
[    4.391913] RBP: ffffffff82543b62 R08: ffffffff82543b63 R09: 0000000000000001
[    4.391913] R10: ffffffff82543f40 R11: 0000000082543b61 R12: ffffffff82543b63
[    4.391913] R13: ffff0a0000000404 R14: 00000000000003e0 R15: ffffffff81eb892b
[    4.391913] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[    4.391913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.391913] CR2: ffffffff80a946cb CR3: 0000000002009000 CR4: 00000000003406f0
[    4.391913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.391913] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[    4.391913] Call Trace:
[    4.391913]  ? widen_string+0x98/0xb0
[    4.391913]  ? string+0x56/0x60
[    4.391913]  ? vsnprintf+0x256/0x4f0
[    4.391913]  ? vscnprintf+0x9/0x30
[    4.391913]  ? vprintk_store+0x4c/0x220
[    4.391913]  ? vprintk_emit+0x114/0x2b0
[    4.391913]  ? printk+0x48/0x4a
[    4.391913]  ? native_cpu_up.cold+0x9e/0xd4
[    4.391913]  ? cpus_read_trylock+0x50/0x50
[    4.391913]  ? bringup_cpu+0x3c/0x100
[    4.391913]  ? cpuhp_invoke_callback+0x95/0x5f0
[    4.391913]  ? _cpu_up+0xa4/0x140
[    4.391913]  ? do_cpu_up+0xa8/0xb0
[    4.391913]  ? smp_init+0x58/0xac
[    4.391913]  ? kernel_init_freeable+0xa0/0x183
[    4.391913]  ? rest_init+0xb9/0xb9
[    4.391913]  ? kernel_init+0xa/0xf7
[    4.391913]  ? ret_from_fork+0x35/0x40
[    4.391913] Modules linked in:
[    4.391913] CR2: ffffffff80a946f5
[    4.391913] ---[ end trace 2d921178836b2f4d ]---
[    4.391913] RIP: 0010:0xffffffff80a946f5
[    4.391913] Code: Bad RIP value.
[    4.391913] RSP: 0000:ffffc90000013c48 EFLAGS: 00010087
[    4.391913] RAX: ffffffff82543b65 RBX: 0000000000000003 RCX: 0000000000000001
[    4.391913] RDX: 0000000000000001 RSI: ffffffff82543b62 RDI: ffffffff82543b65
[    4.391913] RBP: ffffffff82543b62 R08: ffffffff82543b63 R09: 0000000000000001
[    4.391913] R10: ffffffff82543f40 R11: 0000000082543b61 R12: ffffffff82543b63
[    4.391913] R13: ffff0a0000000404 R14: 00000000000003e0 R15: ffffffff81eb892b
[    4.391913] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[    4.391913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.391913] CR2: ffffffff80a946cb CR3: 0000000002009000 CR4: 00000000003406f0
[    4.391913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.391913] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[    4.391913] note: swapper/0[1] exited with preempt_count 1
[    4.391913] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30  7:55               ` Borislav Petkov
@ 2020-01-30 11:10                 ` Mike Rapoport
  2020-01-30 11:53                   ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Rapoport @ 2020-01-30 11:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Damian Tometzki, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 08:55:49AM +0100, Borislav Petkov wrote:
> Hello Damian,
> 
> On Thu, Jan 30, 2020 at 06:47:14AM +0100, Damian Tometzki wrote:
> > in my qemu env the system isn't coming up. I tried both with and without the
> > changes from Borislav.
> 
> in the future, please do not hijack the thread like that but start a new
> one or open a bug on bugzilla.kernel.org. Your issue is something else.
> 
> > 0.605193] ------------[ cut here ]------------
> > [    0.605933] General protection fault in user access. Non-canonical
> > address?
> 
> There it is.
> 
> > [    0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77
> > ex_handler_uaccess+0x48/0x50
> > [    0.606931] Modules linked in:
> > [    0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
> > [    0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> > [    0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50

...
 
> It looks like dquot_init->register_sysctl_table-> ... does copy_to_user
> at some point and it goes off into the weeds and %rsi becomes
> non-canonical.
> 
> Please start a new thread or open a bug and upload your .config and
> dmesg. We'll continue debugging that there.

Maybe that won't be needed.

It seems that this is a random boot crash caused by 987f028b8637cfa7 ("char:
hpet: Use flexible-array member"), and a fix is on the way:

https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/
 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 11:10                 ` Mike Rapoport
@ 2020-01-30 11:53                   ` Borislav Petkov
  2020-01-30 11:59                     ` Mike Rapoport
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30 11:53 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Damian Tometzki, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 01:10:57PM +0200, Mike Rapoport wrote:
> It seems that this is a random boot crash caused by 987f028b8637cfa7 ("char:
> hpet: Use flexible-array member"), and a fix is on the way:
> 
> https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/

Hmm, I don't see the connection at first glance, except that both stack
traces lead somewhere down the bowels of k*alloc...

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 11:53                   ` Borislav Petkov
@ 2020-01-30 11:59                     ` Mike Rapoport
  2020-01-30 12:06                       ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Rapoport @ 2020-01-30 11:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Damian Tometzki, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 12:53:26PM +0100, Borislav Petkov wrote:
> On Thu, Jan 30, 2020 at 01:10:57PM +0200, Mike Rapoport wrote:
> > It seems that this is a random boot crash caused by 987f028b8637cfa7 ("char:
> > hpet: Use flexible-array member"), and a fix is on the way:
> > 
> > https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/
> 
> Hmm, I don't see the connection at first glance, except that both stack
> traces lead somewhere down the bowels of k*alloc...

I've seen a similar crash with my qemu/kvm and bisected it to that commit.

The hpet allocation is off by one and, as a result, hpet corrupts memory
somewhere in the slab.
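
For illustration, here is a minimal userspace sketch (hypothetical struct and
names, not the actual drivers/char/hpet.c code) of how a one-element-array to
flexible-array conversion ends up off by one when the old "- 1" sizing
survives:

#include <stdlib.h>

struct timer_dev { int id; };

struct timers {
        unsigned int nr;
        struct timer_dev dev[];         /* was: struct timer_dev dev[1]; */
};

static struct timers *alloc_timers(unsigned int nr)
{
        /*
         * Sizing kept over from the dev[1] days: with a flexible array the
         * "- 1" now under-allocates by one element, so filling in
         * dev[nr - 1] writes past the end of the allocation (in the kernel,
         * past the end of the slab object).
         */
        size_t sz = sizeof(struct timers) + (nr - 1) * sizeof(struct timer_dev);

        return calloc(1, sz);           /* kernel code would use kzalloc() */
}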
 
> -- 
> Regards/Gruss,
>     Boris.
> 
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 11:59                     ` Mike Rapoport
@ 2020-01-30 12:06                       ` Borislav Petkov
  2020-01-30 16:45                         ` Damian Tometzki
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30 12:06 UTC (permalink / raw)
  To: Mike Rapoport, Damian Tometzki
  Cc: Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 01:59:59PM +0200, Mike Rapoport wrote:
> I've seen a similar crash with my qemu/kvm and bisected it to that commit.
> 
> The hpet allocation is off by one and, as a result, hpet corrupts memory
> somewhere in the slab.

Oh wow, wonderful. Good that we have bisection in such cases - there's
no way in hell I'd have debugged it down to that. ;-\

Thanks for saving me a bunch of time.

Damian, can you test the patch at the bottom of that mail:

https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/

?

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30  8:51               ` Borislav Petkov
@ 2020-01-30 15:27                 ` Linus Torvalds
  2020-01-30 17:39                   ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2020-01-30 15:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 12:51 AM Borislav Petkov <bp@suse.de> wrote:
>
> However, this new version has hit another shortcoming of the
> alternatives - check this out:

[ Branches not getting fixed up ]

Fair enough. Let's not complicate things just to avoid a few nops.

That does make me wonder about RIP-relative addressing in alternatives
too. Particularly anything where we let gcc pick addressing modes. I
guess we don't have any, but maybe this branch issue and possible RIP
addressing is something that objtool could be taught to warn about?
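
Concretely, the kind of replacement that would silently break looks something
like this hand-written sketch (hypothetical symbol, with X86_FEATURE_ERMS used
purely as a placeholder feature bit; not code from the tree):

#include <asm/alternative.h>
#include <asm/cpufeatures.h>

static int hypothetical_var;            /* stand-in symbol for the sketch */

static inline int rip_relative_in_replacement(void)
{
        int val;

        /*
         * The replacement is assembled in .altinstr_replacement, so the
         * RIP-relative displacement to hypothetical_var is computed against
         * that section and is not adjusted when the bytes are copied over
         * the original site at patch time: the patched-in load reads from
         * the wrong address.
         */
        asm volatile(ALTERNATIVE("movl $0, %0",
                                 "movl hypothetical_var(%%rip), %0",
                                 X86_FEATURE_ERMS)
                     : "=r" (val));
        return val;
}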

                  Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 12:06                       ` Borislav Petkov
@ 2020-01-30 16:45                         ` Damian Tometzki
  2020-01-30 17:39                           ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Damian Tometzki @ 2020-01-30 16:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mike Rapoport, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

Hello Borislav,

The system now boots without errors.

Tested.
Damian

On Thu, Jan 30, 2020 at 1:06 PM Borislav Petkov <bp@suse.de> wrote:
>
> On Thu, Jan 30, 2020 at 01:59:59PM +0200, Mike Rapoport wrote:
> > I've seen a similar crash with my qemu/kvm and bisected it to that commit.
> >
> > The hpet allocation is off by one and, as a result, hpet corrupts memory
> > somewhere in the slab.
>
> Oh wow, wonderful. Good that we have bisection in such cases - there's
> no way in hell I'd have debugged it down to that. ;-\
>
> Thanks for saving me a bunch of time.
>
> Damian, can you test the patch at the bottom of that mail:
>
> https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/
>
> ?
>
> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 15:27                 ` Linus Torvalds
@ 2020-01-30 17:39                   ` Borislav Petkov
  2020-01-30 18:02                     ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30 17:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 07:27:28AM -0800, Linus Torvalds wrote:
> Fair enough. Let's not complicate things just to avoid a few nops.

Yeah, judging by how often this keeps popping up, we might end up
doing proper insn parsing for the alternatives soon. :)

> That does make me wonder about RIP-relative addressing in alternatives
> too. Particularly anything where we let gcc pick addressing modes. I
> guess we don't have any, but maybe this branch issue and possible RIP
> addressing is something that objtool could be taught to warn about?

Yeah, makes sense. It would help if, when one slaps a relative JMP in as
anything but the first insn in an alternatives replacement, the build warned
that this can't work. Lemme go stare at objtool.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 16:45                         ` Damian Tometzki
@ 2020-01-30 17:39                           ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30 17:39 UTC (permalink / raw)
  To: Damian Tometzki
  Cc: Mike Rapoport, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 05:45:04PM +0100, Damian Tometzki wrote:
> Hello Borislav,
> 
> The system now boots without errors.
> 
> Tested.

Thanks for testing!

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 17:39                   ` Borislav Petkov
@ 2020-01-30 18:02                     ` Linus Torvalds
  2020-01-31 14:27                       ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2020-01-30 18:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 9:39 AM Borislav Petkov <bp@suse.de> wrote:
>
> Yeah, makes sense. It would help if one slaps a relative JMP as *not*
> the first insn in an alternatives replacement and the build to warn that
> it can't work.

Maybe with the exception that a short conditional jump inside the
alternative code itself is fine.

Because a branch-over inside the alternative sequence (or a loop -
think inline cmpxchg loop or whatever) would be fine, since it's
unaffected by code placement.
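
For instance, in a sketch like the following (X86_FEATURE_ERMS chosen only as
an arbitrary feature bit; not code from the tree), the replacement's branch
targets a label inside the replacement itself, so it keeps working no matter
where the bytes get copied:

#include <asm/alternative.h>
#include <asm/cpufeatures.h>

static inline void alt_internal_loop(unsigned int spins)
{
        /*
         * The backward jnz and its 1: target both live inside the
         * replacement, so the relative displacement is still correct after
         * the bytes are patched in over the original location.  A branch to
         * a label outside the replacement (as anything other than a
         * fixed-up first jmp/call) is the problematic case above.
         */
        asm volatile(ALTERNATIVE("",
                                 "1: pause; decl %0; jnz 1b",
                                 X86_FEATURE_ERMS)
                     : "+r" (spins));
}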

              Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 18:02                     ` Linus Torvalds
@ 2020-01-31 14:27                       ` Borislav Petkov
  2020-01-31 16:05                         ` Josh Poimboeuf
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2020-01-31 14:27 UTC (permalink / raw)
  To: Linus Torvalds, Josh Poimboeuf
  Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

+ Josh.

On Thu, Jan 30, 2020 at 10:02:54AM -0800, Linus Torvalds wrote:
> Maybe with the exception that a short conditional jump inside the
> alternative code itself is fine.
> 
> Because a branch-over inside the alternative sequence (or a loop -
> think inline cmpxchg loop or whatever) would be fine, since it's
> unaffected by code placement.

Perhaps something like the below as a start.

The build becomes really noisy if the error case is hit, because aborting
in handle_group_alt() really messes up objtool's processing, but that is
perhaps OK since the idea is to *see* that something's wrong.

---
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4768d91c6d68..b4dfd625842d 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -756,7 +771,9 @@ static int handle_group_alt(struct objtool_file *file,
 	last_new_insn = NULL;
 	insn = *new_insn;
 	sec_for_each_insn_from(file, insn) {
-		if (insn->offset >= special_alt->new_off + special_alt->new_len)
+		unsigned long new_offset = special_alt->new_off + special_alt->new_len;
+
+		if (insn->offset >= new_offset)
 			break;
 
 		last_new_insn = insn;
@@ -765,8 +787,12 @@ static int handle_group_alt(struct objtool_file *file,
 		insn->func = orig_insn->func;
 
 		if (insn->type != INSN_JUMP_CONDITIONAL &&
-		    insn->type != INSN_JUMP_UNCONDITIONAL)
+		    insn->type != INSN_JUMP_UNCONDITIONAL) {
 			continue;
+		} else {
+			if (insn->offset > 0 && insn->offset + insn->len < new_offset)
+				ERROR("Subsequent JMP instructions are not alternatives-patched. Stopping.");
+		}
 
 		if (!insn->immediate)
 			continue;
diff --git a/tools/objtool/warn.h b/tools/objtool/warn.h
index cbb0a02b7480..652c7adc7650 100644
--- a/tools/objtool/warn.h
+++ b/tools/objtool/warn.h
@@ -40,6 +40,14 @@ static inline char *offstr(struct section *sec, unsigned long offset)
 	return str;
 }
 
+#define ERROR(format, ...)				\
+({							\
+	fprintf(stderr,					\
+		"%s: error: objtool: " format "\n",	\
+		objname, ##__VA_ARGS__);		\
+	abort();					\
+})
+
 #define WARN(format, ...)				\
 	fprintf(stderr,					\
 		"%s: warning: objtool: " format "\n",	\

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-31 14:27                       ` Borislav Petkov
@ 2020-01-31 16:05                         ` Josh Poimboeuf
  0 siblings, 0 replies; 29+ messages in thread
From: Josh Poimboeuf @ 2020-01-31 16:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton, Julien Thierry

On Fri, Jan 31, 2020 at 03:27:27PM +0100, Borislav Petkov wrote:
> + Josh.
> 
> On Thu, Jan 30, 2020 at 10:02:54AM -0800, Linus Torvalds wrote:
> > Maybe with the exception that a short conditional jump inside the
> > alternative code itself is fine.
> > 
> > Because a branch-over inside the alternative sequence (or a loop -
> > think inline cmpxchg loop or whatever) would be fine, since it's
> > unaffected by code placement.
> 
> Perhaps something like the below as a start.
> 
> The build becomes really noisy if the error case is hit because aborting
> in handle_group_alt() really messes up objtool processing but that is
> perhaps ok as the idea is to *see* that something's wrong.

[ Adding Julien as an FYI -- not sure how it affects arm64 ]

Boris, I made the check even broader than we discussed on IRC: it
disallows *any* relocation in the alternatives section unless the relocated
instruction is the first one in the group and it is a call or a jmp (a
sketch of the two shapes follows the diff below).

Untested; let me know if it works.

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index b6da413bcbd6..382a65363379 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -782,6 +782,20 @@ static int handle_group_alt(struct objtool_file *file,
 		insn->ignore = orig_insn->ignore_alts;
 		insn->func = orig_insn->func;
 
+		/*
+		 * The x86 alternatives code adjusts relocation targets only in
+		 * the case where the first instruction is either a call or a
+		 * jump.  Otherwise, relocations aren't supported by the
+		 * alternatives code (as there hasn't yet been a need for it).
+		 */
+		if (!(insn->offset == special_alt->new_off &&
+		      (insn->type == INSN_CALL || insn->type == INSN_JUMP_UNCONDITIONAL)) &&
+		    find_rela_by_dest_range(insn->sec, insn->offset, insn->len)) {
+			WARN_FUNC("relocations not allowed in alternatives section",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+
 		if (insn->type != INSN_JUMP_CONDITIONAL &&
 		    insn->type != INSN_JUMP_UNCONDITIONAL)
 			continue;
@@ -2509,8 +2523,14 @@ int check(const char *_objname, bool orc)
 out:
 	cleanup(&file);
 
-	/* ignore warnings for now until we get all the code cleaned up */
-	if (ret || warnings)
-		return 0;
+	if (ret < 0) {
+		/*
+		 *  Fatal error.  The binary is corrupt or otherwise broken in
+		 *  some way, or objtool is badly broken.  Fail the kernel
+		 *  build.
+		 */
+		return ret;
+	}
+
 	return 0;
 }
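
For reference, a sketch of the two shapes this check distinguishes, with
hypothetical helper names and X86_FEATURE_ERMS standing in for an arbitrary
feature bit (not code from the tree):

#include <asm/alternative.h>
#include <asm/cpufeatures.h>

void hypothetical_slow_path(void);      /* stand-in symbols for the sketch */
void hypothetical_helper(void);

static inline void alt_reloc_shapes(void)
{
        /*
         * Allowed: the only relocated instruction is the first one in the
         * replacement and it is a jmp, which the alternatives patching
         * code adjusts (per the comment in the first hunk above).
         */
        alternative("", "jmp hypothetical_slow_path", X86_FEATURE_ERMS);

        /*
         * Flagged: the call needs a relocation but is not the first
         * instruction of the replacement, so the copied-in bytes would
         * transfer control to the wrong address.
         */
        alternative("nop", "nop; call hypothetical_helper", X86_FEATURE_ERMS);
}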


^ permalink raw reply related	[flat|nested] 29+ messages in thread

Thread overview: 29+ messages
2020-01-28 16:59 [GIT PULL] x86/asm changes for v5.6 Ingo Molnar
2020-01-28 19:51 ` Linus Torvalds
2020-01-28 20:06   ` Linus Torvalds
2020-01-28 20:14     ` Ingo Molnar
2020-01-28 20:41     ` Luck, Tony
2020-01-28 21:04       ` Linus Torvalds
2020-01-28 22:31         ` Borislav Petkov
2020-01-29 18:00           ` Josh Poimboeuf
2020-01-29 13:26     ` Borislav Petkov
2020-01-29 17:07       ` Luck, Tony
2020-01-29 17:40         ` Linus Torvalds
2020-01-29 18:34           ` Borislav Petkov
2020-01-29 18:56             ` Linus Torvalds
2020-01-30  8:51               ` Borislav Petkov
2020-01-30 15:27                 ` Linus Torvalds
2020-01-30 17:39                   ` Borislav Petkov
2020-01-30 18:02                     ` Linus Torvalds
2020-01-31 14:27                       ` Borislav Petkov
2020-01-31 16:05                         ` Josh Poimboeuf
2020-01-29 19:42             ` Luck, Tony
2020-01-30  5:47             ` Damian Tometzki
2020-01-30  7:55               ` Borislav Petkov
2020-01-30 11:10                 ` Mike Rapoport
2020-01-30 11:53                   ` Borislav Petkov
2020-01-30 11:59                     ` Mike Rapoport
2020-01-30 12:06                       ` Borislav Petkov
2020-01-30 16:45                         ` Damian Tometzki
2020-01-30 17:39                           ` Borislav Petkov
2020-01-28 21:15 ` pr-tracker-bot
