All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>,
	tim.c.chen@linux.intel.com, pjt@google.com, jikos@kernel.org,
	gregkh@linux-foundation.org, dave.hansen@intel.com,
	mingo@kernel.org, riel@redhat.com, luto@amacapital.net,
	torvalds@linux-foundation.org, ak@linux.intel.com,
	keescook@google.com, peterz@infradead.org, tglx@linutronix.de,
	hpa@zytor.com, linux-kernel@vger.kernel.org,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:x86/pti] x86/retpoline: Fill return stack buffer on vmexit
Date: Thu, 25 Jan 2018 20:07:29 +0100	[thread overview]
Message-ID: <20180125190729.jxey2k44sptuk42t@pd.tnic> (raw)
In-Reply-To: <1516903463.30244.148.camel@infradead.org>

On Thu, Jan 25, 2018 at 06:04:23PM +0000, David Woodhouse wrote:
> Yep, I'll buy that. But first we need Josh to work out what he's having
> for lunch.
> 
> Although just another marker to tell objtool "ignore this whole
> function" might be sufficient to allow us to have an out-of-line RSB-
> stuffing function.

Ok, I think we solved it all on IRC.

I'm adding a conglomerate patch at the end. It builds and boots here in
a vm. Here's what I did:

The barrier:

static inline void indirect_branch_prediction_barrier(void)
{
        alternative_input("", "call __ibp_barrier", X86_FEATURE_IBPB, ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
}

with

void __ibp_barrier(void)
{
        wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
}

Alternatives applies correctly:

[    1.040965] apply_alternatives: feat: 7*32+16, old: (ffffffff8259d806 len: 5), repl: (ffffffff826aa88b, len: 5), pad: 5
[    1.042798] ffffffff8259d806: old_insn: 90 90 90 90 90
[    1.044001] ffffffff826aa88b: rpl_insn: e8 80 c3 9c fe
[    1.044938] apply_alternatives: Fix CALL offset: 0xfead9405, CALL 0xffffffff81076c10
[    1.046343] ffffffff8259d806: final_insn: e8 05 94 ad fe

with final_insn becoming:

ffffffff8259d806:       e8 80 c3 9c fe          callq  ffffffff81076c10 <__ibp_barrier>

When we look at the __ibp_barrier:

ffffffff81076c10 <__ibp_barrier>:
ffffffff81076c10:       e8 bb ac 98 00          callq  ffffffff81a018d0 <__fentry__>
ffffffff81076c15:       b9 49 00 00 00          mov    $0x49,%ecx
ffffffff81076c1a:       b8 01 00 00 00          mov    $0x1,%eax
ffffffff81076c1f:       31 d2                   xor    %edx,%edx
ffffffff81076c21:       0f 30                   wrmsr  
ffffffff81076c23:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffff81076c28:       c3                      retq   
ffffffff81076c29:       31 d2                   xor    %edx,%edx
ffffffff81076c2b:       be 01 00 00 00          mov    $0x1,%esi
ffffffff81076c30:       bf 49 00 00 00          mov    $0x49,%edi
ffffffff81076c35:       e9 76 64 3f 00          jmpq   ffffffff8146d0b0 <do_trace_write_msr>
ffffffff81076c3a:       90                      nop
ffffffff81076c3b:       90                      nop
ffffffff81076c3c:       90                      nop
ffffffff81076c3d:       90                      nop
ffffffff81076c3e:       90                      nop

It has the tracing shit too. We could replace that with __wrmsr() which
does exception handling. And I think we should do that instead.

/me does it.

The RSB filler became:

static inline void vmexit_fill_RSB(void)
{
#ifdef CONFIG_RETPOLINE
        alternative_input("", "call __fill_rsb_clobber_ax", X86_FEATURE_RETPOLINE, ASM_NO_INPUT_CLOBBER("memory"));
#endif
}

and __fill_rsb_clobber_ax looks like:

ENTRY(__fill_rsb_clobber_ax)
        ___FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, %_ASM_SP
END(__fill_rsb_clobber_ax)

where:

.macro  ___FILL_RETURN_BUFFER reg:req nr:req sp:req
        mov     (\nr / 2), \reg
        .align 16
771:
        call    772f
773:                                            /* speculation trap */
        pause
        lfence
        jmp     773b
        .align 16
772:
        call    774f
775:                                            /* speculation trap */
        pause
        lfence
        jmp     775b
        .align 16
774:
        dec     \reg
        jnz     771b
        add     (BITS_PER_LONG /8) * \nr, \sp
.endm

Please check the alignment directives are at the right places.

Application looks correct too:

[    0.428974] apply_alternatives: feat: 7*32+12, old: (ffffffff8105fe50 len: 5), repl: (ffffffff826aa69f, len: 5), pad: 5
[    0.432003] ffffffff8105fe50: old_insn: 90 90 90 90 90
[    0.432977] ffffffff826aa69f: rpl_insn: e8 5c 8b 55 ff
[    0.433943] apply_alternatives: Fix CALL offset: 0xba33ab, CALL 0xffffffff81c03200
[    0.435368] ffffffff8105fe50: final_insn: e8 ab 33 ba 00

final_insn becomes

ffffffff8105fe50:       e8 5c 8b 55 ff          callq  ffffffff81c03200 <__fill_rsb_clobber_ax>

and the __fill_rsb_clobber_ax asm look good to me too:

ffffffff81c03200 <__fill_rsb_clobber_ax>:
ffffffff81c03200:       48 8b 04 25 10 00 00    mov    0x10,%rax
ffffffff81c03207:       00 
ffffffff81c03208:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
ffffffff81c0320f:       00 
ffffffff81c03210:       e8 0b 00 00 00          callq  ffffffff81c03220 <__fill_rsb_clobber_ax+0x20>
ffffffff81c03215:       f3 90                   pause  
ffffffff81c03217:       0f ae e8                lfence 
ffffffff81c0321a:       eb f9                   jmp    ffffffff81c03215 <__fill_rsb_clobber_ax+0x15>
ffffffff81c0321c:       0f 1f 40 00             nopl   0x0(%rax)
ffffffff81c03220:       e8 0b 00 00 00          callq  ffffffff81c03230 <__fill_rsb_clobber_ax+0x30>
ffffffff81c03225:       f3 90                   pause  
ffffffff81c03227:       0f ae e8                lfence 
ffffffff81c0322a:       eb f9                   jmp    ffffffff81c03225 <__fill_rsb_clobber_ax+0x25>
ffffffff81c0322c:       0f 1f 40 00             nopl   0x0(%rax)
ffffffff81c03230:       48 ff c8                dec    %rax
ffffffff81c03233:       75 db                   jne    ffffffff81c03210 <__fill_rsb_clobber_ax+0x10>
ffffffff81c03235:       48 03 24 25 00 01 00    add    0x100,%rsp
ffffffff81c0323c:       00

with the respective alignment.

Btw, I've forced

	 setup_force_cpu_cap(X86_FEATURE_IBPB);

so that I can see it applies only - nothing else.

Here's the full diff I have so far:

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 67bbfaa1448b..a7c5602847b1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -210,6 +210,7 @@
 #define X86_FEATURE_AVX512_4VNNIW	( 7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
 
+#define X86_FEATURE_IBPB		( 7*32+16) /* Using Indirect Branch Prediction Barrier */
 #define X86_FEATURE_MBA			( 7*32+18) /* Memory Bandwidth Allocation */
 #define X86_FEATURE_RSB_CTXSW		( 7*32+19) /* Fill RSB on context switches */
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e7b983a35506..1c4e87b73789 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -39,6 +39,9 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_IA32_PRED_CMD              0x00000049 /* Prediction Command */
+#define PRED_CMD_IBPB                  (1 << 0)   /* Indirect Branch Prediction Barrier */
+
 #define MSR_PPIN_CTL			0x0000004e
 #define MSR_PPIN			0x0000004f
 
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4ad41087ce0e..3e4d4996374e 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
-#ifndef __NOSPEC_BRANCH_H__
-#define __NOSPEC_BRANCH_H__
+#ifndef __X86_ASM_NOSPEC_BRANCH_H__
+#define __X86_ASM_NOSPEC_BRANCH_H__
 
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
@@ -53,6 +53,9 @@
 
 #ifdef __ASSEMBLY__
 
+#include <asm/bitsperlong.h>
+
+
 /*
  * This should be used immediately before a retpoline alternative.  It tells
  * objtool where the retpolines are so that it can make sense of the control
@@ -121,6 +124,29 @@
 #endif
 .endm
 
+.macro  ___FILL_RETURN_BUFFER reg:req nr:req sp:req
+	mov	(\nr / 2), \reg
+	.align 16
+771:
+	call	772f
+773:						/* speculation trap */
+	pause
+	lfence
+	jmp	773b
+	.align 16
+772:
+	call	774f
+775:						/* speculation trap */
+	pause
+	lfence
+	jmp	775b
+	.align 16
+774:
+	dec	\reg
+	jnz	771b
+	add	(BITS_PER_LONG /8) * \nr, \sp
+.endm
+
  /*
   * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
   * monstrosity above, manually.
@@ -206,17 +232,14 @@ extern char __indirect_thunk_end[];
 static inline void vmexit_fill_RSB(void)
 {
 #ifdef CONFIG_RETPOLINE
-	unsigned long loops;
-
-	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
-		      ALTERNATIVE("jmp 910f",
-				  __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
-				  X86_FEATURE_RETPOLINE)
-		      "910:"
-		      : "=r" (loops), ASM_CALL_CONSTRAINT
-		      : : "memory" );
+	alternative_input("", "call __fill_rsb_clobber_ax", X86_FEATURE_RETPOLINE, ASM_NO_INPUT_CLOBBER("memory"));
 #endif
 }
 
+static inline void indirect_branch_prediction_barrier(void)
+{
+	alternative_input("", "call __ibp_barrier", X86_FEATURE_IBPB, ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+}
+
 #endif /* __ASSEMBLY__ */
-#endif /* __NOSPEC_BRANCH_H__ */
+#endif /* __X86_ASM_NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index d3a67fba200a..624bc37dc696 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -971,4 +971,9 @@ bool xen_set_default_idle(void);
 
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
+
+#ifdef CONFIG_RETPOLINE
+asmlinkage void __fill_rsb_clobber_ax(void);
+void __ibp_barrier(void);
+#endif
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc3d438..35f60e5b93bf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -249,6 +249,10 @@ static void __init spectre_v2_select_mitigation(void)
 		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 		pr_info("Filling RSB on context switch\n");
 	}
+
+	setup_force_cpu_cap(X86_FEATURE_IBPB);
+
+	indirect_branch_prediction_barrier();
 }
 
 #undef pr_fmt
@@ -281,3 +285,10 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 	return sprintf(buf, "%s\n", spectre_v2_strings[spectre_v2_enabled]);
 }
 #endif
+
+#ifdef CONFIG_RETPOLINE
+void __ibp_barrier(void)
+{
+	__wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
+}
+#endif
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f23934bbaf4e..69a473919260 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -27,6 +27,7 @@ lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 lib-$(CONFIG_RETPOLINE) += retpoline.o
+OBJECT_FILES_NON_STANDARD_retpoline.o :=y
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
 
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index dfb2ba91b670..62eaa8f80d30 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -47,3 +47,7 @@ GENERATE_THUNK(r13)
 GENERATE_THUNK(r14)
 GENERATE_THUNK(r15)
 #endif
+
+ENTRY(__fill_rsb_clobber_ax)
+	___FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, %_ASM_SP
+END(__fill_rsb_clobber_ax)
-- 
2.13.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

  parent reply	other threads:[~2018-01-25 19:07 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-11 21:46 [PATCH v8 00/12] Retpoline: Avoid speculative indirect calls in kernel David Woodhouse
2018-01-11 21:46 ` [PATCH v8 01/12] objtool: Detect jumps to retpoline thunks David Woodhouse
2018-01-11 23:22   ` [tip:x86/pti] " tip-bot for Josh Poimboeuf
2018-01-11 21:46 ` [PATCH v8 02/12] objtool: Allow alternatives to be ignored David Woodhouse
2018-01-11 23:22   ` [tip:x86/pti] " tip-bot for Josh Poimboeuf
2018-01-18 19:09   ` [v8,02/12] " Guenter Roeck
2018-01-18 19:33     ` Josh Poimboeuf
2018-01-18 19:41       ` Guenter Roeck
2018-01-22 19:34         ` David Woodhouse
2018-01-22 20:25           ` Guenter Roeck
2018-01-22 20:27             ` David Woodhouse
2018-01-28 21:06             ` Josh Poimboeuf
2018-01-29  1:26               ` Guenter Roeck
2018-01-29 17:15               ` Guenter Roeck
2018-01-29 17:30                 ` Josh Poimboeuf
2018-01-22 19:27       ` Guenter Roeck
2018-01-11 21:46 ` [PATCH v8 03/12] x86/retpoline: Add initial retpoline support David Woodhouse
2018-01-11 23:23   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 23:58   ` [PATCH v8 03/12] " Tom Lendacky
2018-01-12 10:28     ` David Woodhouse
2018-01-12 14:02       ` Tom Lendacky
2018-01-14 15:02   ` Borislav Petkov
2018-01-14 15:53     ` Josh Poimboeuf
2018-01-14 15:59       ` Borislav Petkov
2018-01-11 21:46 ` [PATCH v8 04/12] x86/spectre: Add boot time option to select Spectre v2 mitigation David Woodhouse
2018-01-11 23:23   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-23 22:40   ` [PATCH v8 04/12] " Borislav Petkov
2018-01-23 22:53     ` David Woodhouse
2018-01-23 23:05       ` Andi Kleen
2018-01-23 22:55     ` Jiri Kosina
2018-01-23 23:05       ` Borislav Petkov
2018-01-24  0:32         ` Kees Cook
2018-01-24  9:58           ` Borislav Petkov
2018-01-23 23:06       ` Jiri Kosina
2018-01-23 23:21       ` Andi Kleen
2018-01-23 23:24         ` Jiri Kosina
2018-01-23 23:45           ` Andi Kleen
2018-01-23 23:49             ` Jiri Kosina
2018-01-24  4:26               ` Greg Kroah-Hartman
2018-01-24  9:56                 ` Jiri Kosina
2018-01-24 13:58                   ` Greg Kroah-Hartman
2018-01-24 14:03                     ` Jiri Kosina
2018-01-24 14:22                       ` Greg Kroah-Hartman
2018-01-11 21:46 ` [PATCH v8 05/12] x86/retpoline/crypto: Convert crypto assembler indirect jumps David Woodhouse
2018-01-11 23:24   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 21:46 ` [PATCH v8 06/12] x86/retpoline/entry: Convert entry " David Woodhouse
2018-01-11 23:24   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 21:46 ` [PATCH v8 07/12] x86/retpoline/ftrace: Convert ftrace " David Woodhouse
2018-01-11 23:25   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 21:46 ` [PATCH v8 08/12] x86/retpoline/hyperv: Convert " David Woodhouse
2018-01-11 23:25   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 21:46 ` [PATCH v8 09/12] x86/retpoline/xen: Convert Xen hypercall " David Woodhouse
2018-01-11 23:25   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 21:46 ` [PATCH v8 10/12] x86/retpoline/checksum32: Convert assembler " David Woodhouse
2018-01-11 23:26   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 21:46 ` [PATCH v8 11/12] x86/retpoline/irq32: " David Woodhouse
2018-01-11 23:26   ` [tip:x86/pti] " tip-bot for Andi Kleen
2018-01-11 21:46 ` [PATCH v8 12/12] x86/retpoline: Fill return stack buffer on vmexit David Woodhouse
2018-01-11 23:27   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-11 23:51   ` [PATCH v8 12/12] " Andi Kleen
2018-01-12 11:11     ` [PATCH v8.1 " David Woodhouse
2018-01-12 11:15       ` Thomas Gleixner
2018-01-12 11:21         ` Woodhouse, David
2018-01-12 11:37       ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-01-14 14:50         ` Borislav Petkov
2018-01-14 15:28           ` Thomas Gleixner
2018-01-14 15:35         ` Borislav Petkov
2018-01-25 12:07         ` Borislav Petkov
2018-01-25 12:20           ` David Woodhouse
2018-01-25 12:45             ` Borislav Petkov
2018-01-25 15:10               ` Josh Poimboeuf
2018-01-25 15:51                 ` Borislav Petkov
2018-01-25 16:03                   ` David Woodhouse
2018-01-25 16:56                     ` Josh Poimboeuf
2018-01-25 17:00                       ` David Woodhouse
2018-01-25 17:05                         ` Andy Lutomirski
2018-01-25 17:44                           ` Josh Poimboeuf
2018-01-25 18:41                           ` Jiri Kosina
2018-01-25 17:10                         ` Thomas Gleixner
2018-01-25 17:32                         ` Josh Poimboeuf
2018-01-25 17:53                         ` Borislav Petkov
2018-01-25 18:04                           ` David Woodhouse
2018-01-25 18:32                             ` Josh Poimboeuf
2018-01-25 19:07                             ` Borislav Petkov [this message]
2018-01-25 19:10                               ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180125190729.jxey2k44sptuk42t@pd.tnic \
    --to=bp@alien8.de \
    --cc=ak@linux.intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dwmw2@infradead.org \
    --cc=gregkh@linux-foundation.org \
    --cc=hpa@zytor.com \
    --cc=jikos@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.