* [patch 00/38] x86/retbleed: Call depth tracking mitigation
@ 2022-07-16 23:17 Thomas Gleixner
  2022-07-16 23:17 ` [patch 01/38] x86/paravirt: Ensure proper alignment Thomas Gleixner
                   ` (42 more replies)
  0 siblings, 43 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

Folks!

Back in the good old spectre v2 days (2018) we decided to not use
IBRS. In hindsight this might have been the wrong decision because it did
not force people to come up with alternative approaches.

It was already discussed back then to try software based call depth
accounting and RSB stuffing on underflow for Intel SKL[-X] systems to avoid
the insane overhead of IBRS.

This was tried in 2018 and rejected due to the massive overhead and other
shortcomings of the approach of putting the accounting into each function
prologue:

  1) Text size increase which is inflicted on everyone.  While CPUs are
     good at ignoring NOPs, they still pollute the I-cache.

  2) Accounting in the prologue results in tail call over-accounting,
     which can be exploited.

Disabling tail calls is not an option either, and adding a 10 byte padding
in front of every direct call is even worse in terms of text size and
I-cache impact. We could also patch calls to jump past the accounting in the
function prologue, but that becomes a nightmare vs. ENDBR.

As IBRS is a performance horror show, Peter Zijlstra and I revisited the
call depth tracking approach and implemented it in a way which is hopefully
more palatable and avoids the downsides of the original attempt.

We both unsurprisingly hate the result with a passion...

The way we approached this is:

 1) objtool creates a list of function entry points and a list of direct
    call sites into new sections which can be discarded after init.

 2) On affected machines, use the new sections, allocate module memory
    and create a call thunk per function (16 bytes without
    debug/statistics). Then patch all direct calls to invoke the thunk,
    which does the call accounting and then jumps to the original call
    target.

 3) Utilize the retbleed return thunk mechanism by making the jump
    target run-time configurable. Add the accounting counterpart and
    stuff RSB on underflow in that alternate implementation.

This does not need a new compiler and avoids almost all overhead for
non-affected machines. It can be selected via 'retbleed=stuff' on the
kernel command line.
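
For illustration only, such a generated call thunk conceptually looks like
the sketch below. The per-CPU variable name and the exact accounting
operation are made up here and do not reflect the real thunk layout in the
patches:

__callthunk_foo:			# sketch only, not the real layout
	incq	%gs:__callthunk_depth	# account the call in per-CPU storage
	jmp	foo			# continue at the original call target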

The memory consumption is impressive. On an affected server with a Debian
config this results in about 1.8MB of call thunk memory and 2MB of btree
memory to keep track of the thunks. The call thunk memory is overcommitted
due to the way objtool collects symbols. This probably could be cut in half,
but we need to allocate a 2MB region anyway due to the following:

Our initial approach of just using module_alloc() turned out to create a
massive ITLB miss rate, which is not surprising as each call goes out to a
randomly distributed call thunk. Peter added a method to allocate the thunks
from a large 2MB mapping, which made that problem mostly go away and gave us
another nice performance gain. It obviously does not reduce the I-cache
overhead, nor does it cure the problem that the ITLB has one slot
permanently occupied by the call-thunk mapping.

The theory behind call depth tracking is:

RSB is a stack with depth 16 which is filled on every call. On the return
path speculation "pops" entries to speculate down the call chain. Once the
speculative RSB is empty it switches to other predictors, e.g. the Branch
History Buffer, which can be mistrained by user space and misguides the
speculation path to a disclosure gadget as described in the retbleed paper.

Call depth tracking is designed to break this speculation path by stuffing
speculation trap calls into the RSB which never get a corresponding return
executed. This stalls the prediction path until it gets resteered.
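
Conceptually, the accounting counterpart in the return thunk looks roughly
like the sketch below. The counter name, the underflow check and the
stuffing helper are illustrative only and differ from the real
implementation:

__return_thunk:				# sketch only
	decq	%gs:__callthunk_depth	# account the return
	jns	1f			# tracked depth still positive
	call	__stuff_rsb		# hypothetical helper: refill the RSB
					# with calls whose returns are never
					# executed, fix up the stack and
					# reset the tracked depth
1:
	ret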

The assumption is that stuffing at the 12th return is sufficient to break
the speculation before it hits the underflow and the fallback to the other
predictors. Testing confirms that it works. Johannes, one of the retbleed
researchers, tried to attack this approach and confirmed that it brings
the signal to noise ratio down to the crystal ball level.

There is obviously no scientific proof that this will withstand future
research progress, but all we can do right now is to speculate about that.

So much for the theory. Here are numbers for various tests I did myself and
FIO results provided by Tim Chen.

Just to make everyone feel as bad as I feel, this comes with two overhead
values: one against mitigations=off and one against retbleed=off.

hackbench	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	   24.41s			  0.00%		-15.26%
retbleed=off	   28.80s	 		+18.00%		  0.00%
retbleed=ibrs	   34.92s	 		+43.07%		+21.24%
retbleed=stuff	   31.95s	 		+30.91%		+10.94%
----------------------------------------------------------------------------
				
sys_time loop	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	   1.28s			  0.00%		-78.66%
retbleed=off	   5.98s		       +368.50%		  0.00%
retbleed=ibrs	   9.73s		       +662.81%		+62.82%
retbleed=stuff	   6.15s		       +381.93%		 +2.87%
----------------------------------------------------------------------------
				
kernel build	       		Baseline: mitigations=off       retbleed=off
----------------------------------------------------------------------------
mitigations=off	   197.55s			  0.00%		 -3.64%
retbleed=off	   205.02s	  		 +3.78%		  0.00%
retbleed=ibrs	   218.00s			+10.35%		 +6.33%
retbleed=stuff	   209.92s	  		 +6.26%		 +2.39%
----------------------------------------------------------------------------
				
microbench deep
callchain	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	   1.72s			  0.00%		-39.88%
retbleed=off	   2.86s	 	        +66.34%		  0.00%
retbleed=ibrs	   3.92s		       +128.23%		+37.20%
retbleed=stuff	   3.39s		        +97.05%		+18.46%
----------------------------------------------------------------------------

fio rand-read	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	  352.7 kIOPS			  0.00%		 +3.42%
retbleed=off	  341.0 kIOPS			 -3.31%		  0.00%
retbleed=ibrs	  242.0 kIOPS			-31.38%		-29.03%
retbleed=stuff	  288.3	kIOPS			-18.24%		-15.44%
----------------------------------------------------------------------------

fio 70/30	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	  349.0 kIOPS			  0.00%		+10.49%
retbleed=off	  315.9 kIOPS			 -9.49%		  0.00%
retbleed=ibrs	  241.5 kIOPS			-30.80%		-23.54%
retbleed=stuff	  276.2	kIOPS			-20.86%		-12.56%
----------------------------------------------------------------------------

fio write	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	  299.3 kIOPS			  0.00%		+10.49%
retbleed=off	  275.6 kIOPS			 -9.49%		  0.00%
retbleed=ibrs	  232.7 kIOPS			-22.27%		-15.60%
retbleed=stuff	  259.3	kIOPS			-13.36%		 -5.93%
----------------------------------------------------------------------------

Sockperf 14 byte
localhost	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	  4.495 MBps			  0.00%	        +33.19%
retbleed=off	  3.375	MBps		        -24.92%	  	  0.00%
retbleed=ibrs	  2.573	MBps		        -42.76%	        -23.76%
retbleed=stuff	  2.725	MBps		        -39.38%	        -19.26%
----------------------------------------------------------------------------
				
Sockperf 1472 byte
localhost	       		Baseline: mitigations=off	retbleed=off
----------------------------------------------------------------------------
mitigations=off	  425.494 MBps			  0.00%		+26.21%
retbleed=off	  337.132 MBps		        -20.77%		  0.00%
retbleed=ibrs	  261.243 MBps		        -38.60%		-22.51%
retbleed=stuff	  275.111 MBps		        -35.34%		-18.40%
----------------------------------------------------------------------------

The micro benchmark is just an artificial 30 calls deep call-chain through
a syscall with empty functions to measure the overhead. But it's also
interestingly close to the pathological FIO random read case; it just
emphasizes the overhead slightly more. I wrote it because I was not able to
take real FIO numbers myself, as every variant was close to the max-out
point of the not-so-silly-fast SSD/NVMe in my machine.

So the benefit varies depending on hardware and workload scenarios. At
least it does not get worse than IBeeRS.

The whole lot is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git depthtracking

The ftrace and BPF changes need some extra scrutiny from the respective
experts.

Peter and I went to great lengths to analyze the overhead. Unsurprisingly
the call thunks are contributing the largest amount after the actual
stuffing in the return thunk. The call thunks are creating trouble in
various ways:

 1) An extra ITLB entry, which was very prominent in our first experiments
    where we had 4k PTEs instead of the 2M mapping which we use now.
    This still causes ITLB misses in certain workloads.

 2) Icache footprint is obviously larger and the out of line call-thunks
    are not prefetcher friendly.

 3) The extra jump adds overhead for predictors and branch history

A potential solution for this is to let the compiler add a 16 byte padding
in front of every function. That allows bringing the call thunk into
ITLB/I-cache locality and avoids the extra jump.

Obviously this would only affect SKL variants and no other machine would go
through that code (which is just padding in the compiled binary). Contrary
to adding this to the prologue and NOPing it out, this avoids the ICache
overhead for the non SKL case completely.

The function alignment option does not work for that because it only
guarantees that the next function entry is aligned. The padding size then
depends on the position of the last instruction of the previous function,
which can obviously be anything between 0 and padsize-1, so it is not a
reliable place to put 10 bytes of accounting code.
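
In other words, the idea is guaranteed dead space directly in front of each
function entry, roughly like the sketch below. The padding size matches the
idea above, the fill byte is an assumption, and how the padding gets patched
is illustrative only:

	.align	16
	.skip	16, 0xcc	# compiler-emitted padding; rewritten into
				# the accounting thunk on affected CPUs,
				# dead space everywhere else
foo:
	...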

I hacked up GCC to emit such padding and from first experimentation it
brings quite some performance back.

           	      	 IBRS	    stuff       stuff(pad)
sockperf 14   bytes: 	 -23.76%    -19.26%     -14.31%
sockperf 1472 bytes: 	 -22.51%    -18.40%     -12.25%
microbench:   	     	 +37.20%    +18.46%     +15.47%    
hackbench:	     	 +21.24%    +10.94%     +10.12%

For FIO I don't have numbers yet, but I expect FIO to get a significant
gain too.

From a quick survey it seems to have no impact for the case where the
thunks are not used. But that really needs some deep investigation and
there is a potential conflict with the clang CFI efforts.

The kernel text size increases with a Debian config from 9.9M to 10.4M, so
about 5%. If the thunk is not 16 byte aligned, the text size increase is
about 3%, but it turned out that 16 byte aligned is slightly faster.

The 16 byte function alignment turned out to be beneficial in general even
without the thunks. Not much of an improvement, but measurable. We should
revisit this independent of these horrors.

The implementation falls back to the allocated thunks when padding is not
available. I'll send out the GCC patch and the required kernel patch as a
reply to this series after polishing it a bit.


We also evaluated another approach to solve this problem, which does not
involve call thunks and solely relies on return thunks. The idea is to have
a return thunk which is not reliably predictable. That option would be
"retbleed=confuse". It looks like this:

retthunk:
	test	$somebit, something
	jz	1f
	ret
	int3
1:
	test	$someotherbit, something
	jnz	2f
	ret
	int3
2:
	....

There are two questions to this approach:

 1) How does $something have enough entropy to be effective

    Ideally we can use a register for that, because the frequent memory
    operation in the return thunk obviously has a significant performance
    penalty.
    
    In combination with stack randomization the RSP looks like a feasible
    option, but for kernel centric workloads that does not create much
    entropy.

    Though keep in mind that the attacks rely on syscalls and observing
    their side effects.

    Which in turn means that entry stack randomization is a good entropy
    source. But it's easy enough to create a scenario where the attacking
    thread spins and relies on interrupts to cause the kernel entry.

    Interrupt entry from user-space does not do kernel task stack
    randomization today, but that's easy enough to fix.

    If we can't use RSP because it turns out to be too predictable, then we
    can still create entropy in a per CPU variable, but RSP is definitely
    faster than a memory access. A sketch of the RSP-keyed variant follows
    after this list.

 2) How "easy" is it to circumvent the "randomization"

    It makes it impossible for the occasional exploit writer like myself
    and probably for the vast majority of script kiddies, but we really
    need input from the researchers who seem to have way more nasty ideas
    than we can imagine.

    I asked some of those folks, but the answers were not conclusive at
    all. There are some legitimate concerns, but the question is whether
    they are real threats or purely academic problems (at the moment).

    That retthunk "randomization" can easily be extended to have several
    return thunks and they can be assigned during patching randomly. As
    that's a boot time decision it's possible to figure it out, but in the
    worst case we can randomly reassign them once in a while.
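
As a concrete illustration of the RSP-keyed variant used for the numbers
below, the first paths of such a thunk could look roughly like this. The
bit choices match the measurements; everything else is a sketch:

retthunk:
	testq	$0x08, %rsp	# bit 3 of RSP as cheap "entropy"
	jz	1f
	ret
	int3
1:
	testq	$0x40, %rsp	# bit 6, used for the 3 and 4 path variants
	jnz	2f
	ret
	int3
2:
	....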

But this approach also has quite some performance impact:

For 2 RET paths randomized with randomize_kstack_offset=y and RSP bit 3:

          	  	IBRS       stuff	 stuff(pad)    confuse
  microbench:	       	+37.20%	   +18.46%	 +15.47%       +6.76%	 
  sockperf 14   bytes: 	-23.76%	   -19.26% 	 -14.31%       -9.04%
  sockperf 1472 bytes: 	-22.51%	   -18.40% 	 -12.25%       -7.51%

For 3 RET paths randomized with randomize_kstack_offset=y and RSP bits 3, 6:

          	  	IBRS       stuff	 stuff(pad)    confuse
  microbench:	       	+37.20%	   +18.46%	 +15.47%       +7.12%	 
  sockperf 14   bytes: 	-23.76%	   -19.26% 	 -14.31%      -12.03%
  sockperf 1472 bytes: 	-22.51%	   -18.40% 	 -12.25%      -10.49%

For 4 RET paths randomized with randomize_kstack_offset=y and RSP bits 3, 6, 5:

          	  	IBRS       stuff	 stuff(pad)    confuse
  microbench:	       	+37.20%	   +18.46%	 +15.47%       +7.46%	 
  sockperf 14   bytes: 	-23.76%	   -19.26% 	 -14.31%      -16.80%
  sockperf 1472 bytes: 	-22.51%	   -18.40% 	 -12.25%      -15.95%

So for the more randomized variants sockperf tanks and is already slower
than stuffing with thunks in the compiler-provided padding space.

I will send out a patch in reply to this series which implements that
variant, but there needs to be input from the security researchers on how
protective this is. If we could get away with 2 RET paths (perhaps multiple
instances with different bits), that would be amazing.

Thanks go to Andrew Cooper for helpful discussions around this, Johannes
Wikner for spending the effort of trying to break the stuffing defense and
to Tim Chen for getting the FIO numbers to us.

And of course many thanks to Peter for working on this with me. We went
through quite some mispredicted branches and had to retire some "brilliant"
ideas on the way.

Thanks,

	tglx
---
 arch/x86/Kconfig                            |   37 +
 arch/x86/entry/entry_64.S                   |   26 
 arch/x86/entry/vdso/Makefile                |   11 
 arch/x86/include/asm/alternative.h          |   69 ++
 arch/x86/include/asm/cpufeatures.h          |    1 
 arch/x86/include/asm/disabled-features.h    |    9 
 arch/x86/include/asm/module.h               |    2 
 arch/x86/include/asm/nospec-branch.h        |  166 +++++
 arch/x86/include/asm/paravirt.h             |    5 
 arch/x86/include/asm/paravirt_types.h       |   21 
 arch/x86/include/asm/processor.h            |    1 
 arch/x86/include/asm/text-patching.h        |    2 
 arch/x86/kernel/Makefile                    |    2 
 arch/x86/kernel/alternative.c               |  114 +++
 arch/x86/kernel/callthunks.c                |  799 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/bugs.c                  |   32 +
 arch/x86/kernel/cpu/common.c                |   17 
 arch/x86/kernel/ftrace.c                    |   20 
 arch/x86/kernel/ftrace_64.S                 |   31 +
 arch/x86/kernel/kprobes/core.c              |    1 
 arch/x86/kernel/module.c                    |   86 ---
 arch/x86/kernel/static_call.c               |    3 
 arch/x86/kernel/unwind_orc.c                |   21 
 arch/x86/kernel/vmlinux.lds.S               |   14 
 arch/x86/lib/retpoline.S                    |  106 +++
 arch/x86/mm/Makefile                        |    2 
 arch/x86/mm/module_alloc.c                  |   68 ++
 arch/x86/net/bpf_jit_comp.c                 |   56 +
 include/linux/btree.h                       |    6 
 include/linux/kallsyms.h                    |   24 
 include/linux/module.h                      |   72 +-
 include/linux/static_call.h                 |    2 
 include/linux/vmalloc.h                     |    3 
 init/main.c                                 |    2 
 kernel/kallsyms.c                           |   23 
 kernel/kprobes.c                            |   52 +
 kernel/module/internal.h                    |    8 
 kernel/module/main.c                        |    6 
 kernel/module/tree_lookup.c                 |   17 
 kernel/static_call_inline.c                 |   23 
 kernel/trace/trace_selftest.c               |    5 
 lib/btree.c                                 |    8 
 mm/vmalloc.c                                |   33 -
 samples/ftrace/ftrace-direct-modify.c       |    2 
 samples/ftrace/ftrace-direct-multi-modify.c |    2 
 samples/ftrace/ftrace-direct-multi.c        |    1 
 samples/ftrace/ftrace-direct-too.c          |    1 
 samples/ftrace/ftrace-direct.c              |    1 
 scripts/Makefile.lib                        |    3 
 tools/objtool/arch/x86/decode.c             |   26 
 tools/objtool/builtin-check.c               |    7 
 tools/objtool/check.c                       |  160 ++++-
 tools/objtool/include/objtool/arch.h        |    4 
 tools/objtool/include/objtool/builtin.h     |    1 
 tools/objtool/include/objtool/check.h       |   20 
 tools/objtool/include/objtool/elf.h         |    2 
 tools/objtool/include/objtool/objtool.h     |    1 
 tools/objtool/objtool.c                     |    1 
 58 files changed, 1970 insertions(+), 268 deletions(-)


* [patch 01/38] x86/paravirt: Ensure proper alignment
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
                   ` (41 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross

The entries in the .parainstructions section are 8 byte aligned and the
corresponding C struct makes the array stride 16 bytes.

The pushed entries, however, use only 12 bytes, so __parainstructions_end
ends up 4 bytes short of the last 16 byte slot.

That works by chance because it's only used in a loop:

     for (p = start; p < end; p++)

But this falls flat when calculating the number of elements:

    n = end - start

That's obviously off by one: with e.g. three entries the last one occupies
only 12 bytes, so end - start covers 2 * 16 + 12 = 44 bytes and the pointer
difference truncates to 2 elements instead of 3.

Ensure that the gap is filled and the last entry occupies 16 bytes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h       |    1 +
 arch/x86/include/asm/paravirt_types.h |    1 +
 2 files changed, 2 insertions(+)

--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -743,6 +743,7 @@ extern void default_banner(void);
 	 word 771b;				\
 	 .byte ptype;				\
 	 .byte 772b-771b;			\
+	 _ASM_ALIGN;				\
 	.popsection
 
 
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -294,6 +294,7 @@ extern struct paravirt_patch_template pv
 	"  .byte " type "\n"				\
 	"  .byte 772b-771b\n"				\
 	"  .short " clobber "\n"			\
+	_ASM_ALIGN "\n"					\
 	".popsection\n"
 
 /* Generate patchable code, with the default asm parameters. */



* [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
  2022-07-16 23:17 ` [patch 01/38] x86/paravirt: Ensure proper alignment Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-17  0:22   ` Andrew Cooper
                     ` (4 more replies)
  2022-07-16 23:17 ` [patch 03/38] x86/modules: Set VM_FLUSH_RESET_PERMS in module_alloc() Thomas Gleixner
                   ` (40 subsequent siblings)
  42 siblings, 5 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

load_percpu_segment() uses wrmsrl(), which is paravirtualized. That's an
issue because the code sequence is:

        __loadsegment_simple(gs, 0);
	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));

So anything which uses a per CPU variable between setting GS to 0 and
writing GSBASE is going to end up in a NULL pointer dereference. That can
be triggered with instrumentation and is guaranteed to be triggered
with callthunks for call depth tracking.

Use native_wrmsrl() instead. XEN_PV will trap and emulate, but that's not a
hot path.

Also make it static and mark it noinstr so that neither kprobes, sanitizers
nor anything else can touch it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    1 -
 arch/x86/kernel/cpu/common.c     |   12 ++++++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -673,7 +673,6 @@ extern struct desc_ptr		early_gdt_descr;
 extern void switch_to_new_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
-extern void load_percpu_segment(int);
 extern void cpu_init(void);
 extern void cpu_init_secondary(void);
 extern void cpu_init_exception_handling(void);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -701,13 +701,21 @@ static const char *table_lookup_model(st
 __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
-void load_percpu_segment(int cpu)
+static noinstr void load_percpu_segment(int cpu)
 {
 #ifdef CONFIG_X86_32
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
+	/*
+	 * Because of the __loadsegment_simple(gs, 0) above, any GS-prefixed
+	 * instruction will explode right about here. As such, we must not have
+	 * any CALL-thunks using per-cpu data.
+	 *
+	 * Therefore, use native_wrmsrl() and have XenPV take the fault and
+	 * emulate.
+	 */
+	native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
 #endif
 }
 



* [patch 03/38] x86/modules: Set VM_FLUSH_RESET_PERMS in module_alloc()
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
  2022-07-16 23:17 ` [patch 01/38] x86/paravirt: Ensure proper alignment Thomas Gleixner
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 04/38] x86/vdso: Ensure all kernel code is seen by objtool Thomas Gleixner
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

Instead of resetting permissions all over the place when freeing module
memory, tell the vmalloc code to do so. This avoids the exercise for the
next upcoming user.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/ftrace.c       |    2 --
 arch/x86/kernel/kprobes/core.c |    1 -
 arch/x86/kernel/module.c       |    9 +++++----
 3 files changed, 5 insertions(+), 7 deletions(-)

--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -412,8 +412,6 @@ create_trampoline(struct ftrace_ops *ops
 	/* ALLOC_TRAMP flags lets us know we created it */
 	ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
 
-	set_vm_flush_reset_perms(trampoline);
-
 	if (likely(system_state != SYSTEM_BOOTING))
 		set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -416,7 +416,6 @@ void *alloc_insn_page(void)
 	if (!page)
 		return NULL;
 
-	set_vm_flush_reset_perms(page);
 	/*
 	 * First make the page read-only, and only then make it executable to
 	 * prevent it from being W+X in between.
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -74,10 +74,11 @@ void *module_alloc(unsigned long size)
 		return NULL;
 
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
-				    MODULES_VADDR + get_module_load_offset(),
-				    MODULES_END, gfp_mask,
-				    PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
-				    __builtin_return_address(0));
+				 MODULES_VADDR + get_module_load_offset(),
+				 MODULES_END, gfp_mask, PAGE_KERNEL,
+				 VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
+				 NUMA_NO_NODE, __builtin_return_address(0));
+
 	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
 		vfree(p);
 		return NULL;



* [patch 04/38] x86/vdso: Ensure all kernel code is seen by objtool
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (2 preceding siblings ...)
  2022-07-16 23:17 ` [patch 03/38] x86/modules: Set VM_FLUSH_RESET_PERMS in module_alloc() Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 05/38] btree: Initialize early when builtin Thomas Gleixner
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

extable.c is kernel code and not part of the VDSO, so make sure objtool
sees it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/vdso/Makefile |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -30,11 +30,12 @@ vobjs32-y += vdso32/vclock_gettime.o
 vobjs-$(CONFIG_X86_SGX)	+= vsgx.o
 
 # files to link into kernel
-obj-y				+= vma.o extable.o
-KASAN_SANITIZE_vma.o		:= y
-UBSAN_SANITIZE_vma.o		:= y
-KCSAN_SANITIZE_vma.o		:= y
-OBJECT_FILES_NON_STANDARD_vma.o	:= n
+obj-y					+= vma.o extable.o
+KASAN_SANITIZE_vma.o			:= y
+UBSAN_SANITIZE_vma.o			:= y
+KCSAN_SANITIZE_vma.o			:= y
+OBJECT_FILES_NON_STANDARD_vma.o		:= n
+OBJECT_FILES_NON_STANDARD_extable.o	:= n
 
 # vDSO images to build
 vdso_img-$(VDSO64-y)		+= 64



* [patch 05/38] btree: Initialize early when builtin
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (3 preceding siblings ...)
  2022-07-16 23:17 ` [patch 04/38] x86/vdso: Ensure all kernel code is seen by objtool Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 06/38] objtool: Allow GS relative relocs Thomas Gleixner
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

An upcoming user of btree needs it early on. Initialize it in
start_kernel().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/btree.h |    6 ++++++
 init/main.c           |    2 ++
 lib/btree.c           |    8 +++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

--- a/include/linux/btree.h
+++ b/include/linux/btree.h
@@ -5,6 +5,12 @@
 #include <linux/kernel.h>
 #include <linux/mempool.h>
 
+#if IS_BUILTIN(CONFIG_BTREE)
+extern void btree_cache_init(void);
+#else
+static inline void btree_cache_init(void) {}
+#endif
+
 /**
  * DOC: B+Tree basics
  *
--- a/init/main.c
+++ b/init/main.c
@@ -75,6 +75,7 @@
 #include <linux/signal.h>
 #include <linux/idr.h>
 #include <linux/kgdb.h>
+#include <linux/btree.h>
 #include <linux/ftrace.h>
 #include <linux/async.h>
 #include <linux/shmem_fs.h>
@@ -1125,6 +1126,7 @@ asmlinkage __visible void __init __no_sa
 	cgroup_init();
 	taskstats_init_early();
 	delayacct_init();
+	btree_cache_init();
 
 	poking_init();
 	check_bugs();
--- a/lib/btree.c
+++ b/lib/btree.c
@@ -787,15 +787,21 @@ static int __init btree_module_init(void
 	return 0;
 }
 
+#if IS_MODULE(CONFIG_BTREE)
 static void __exit btree_module_exit(void)
 {
 	kmem_cache_destroy(btree_cachep);
 }
 
-/* If core code starts using btree, initialization should happen even earlier */
 module_init(btree_module_init);
 module_exit(btree_module_exit);
 
 MODULE_AUTHOR("Joern Engel <joern@logfs.org>");
 MODULE_AUTHOR("Johannes Berg <johannes@sipsolutions.net>");
 MODULE_LICENSE("GPL");
+#else
+void __init btree_cache_init(void)
+{
+	BUG_ON(btree_module_init());
+}
+#endif



* [patch 06/38] objtool: Allow GS relative relocs
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (4 preceding siblings ...)
  2022-07-16 23:17 ` [patch 05/38] btree: Initialize early when builtin Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 07/38] objtool: Track init section Thomas Gleixner
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Objtool doesn't currently much like per-cpu usage in alternatives:

arch/x86/entry/entry_64.o: warning: objtool: .altinstr_replacement+0xf: unsupported relocation in alternatives section
  f:   65 c7 04 25 00 00 00 00 00 00 00 80     movl   $0x80000000,%gs:0x0      13: R_X86_64_32S        __x86_call_depth

Allow this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 tools/objtool/arch/x86/decode.c       |   26 +++++++++++++++++++++-----
 tools/objtool/check.c                 |    6 ++----
 tools/objtool/include/objtool/arch.h  |    4 +---
 tools/objtool/include/objtool/check.h |   20 +++++++++++---------
 4 files changed, 35 insertions(+), 21 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -103,24 +103,37 @@ unsigned long arch_jump_destination(stru
 #define rm_is_mem(reg)	(mod_is_mem() && !is_RIP() && rm_is(reg))
 #define rm_is_reg(reg)	(mod_is_reg() && modrm_rm == (reg))
 
-static bool has_notrack_prefix(struct insn *insn)
+static bool has_prefix(struct insn *insn, u8 prefix)
 {
 	int i;
 
 	for (i = 0; i < insn->prefixes.nbytes; i++) {
-		if (insn->prefixes.bytes[i] == 0x3e)
+		if (insn->prefixes.bytes[i] == prefix)
 			return true;
 	}
 
 	return false;
 }
 
+static bool has_notrack_prefix(struct insn *insn)
+{
+	return has_prefix(insn, 0x3e);
+}
+
+static bool has_gs_prefix(struct insn *insn)
+{
+	return has_prefix(insn, 0x65);
+}
+
 int arch_decode_instruction(struct objtool_file *file, const struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
-			    unsigned int *len, enum insn_type *type,
-			    unsigned long *immediate,
-			    struct list_head *ops_list)
+			    struct instruction *instruction)
 {
+	struct list_head *ops_list = &instruction->stack_ops;
+	unsigned long *immediate = &instruction->immediate;
+	enum insn_type *type = &instruction->type;
+	unsigned int *len = &instruction->len;
+
 	const struct elf *elf = file->elf;
 	struct insn insn;
 	int x86_64, ret;
@@ -149,6 +162,9 @@ int arch_decode_instruction(struct objto
 	if (insn.vex_prefix.nbytes)
 		return 0;
 
+	if (has_gs_prefix(&insn))
+		instruction->alt_reloc_safe = 1;
+
 	prefix = insn.prefixes.bytes[0];
 
 	op1 = insn.opcode.bytes[0];
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -396,9 +396,7 @@ static int decode_instructions(struct ob
 
 			ret = arch_decode_instruction(file, sec, offset,
 						      sec->sh.sh_size - offset,
-						      &insn->len, &insn->type,
-						      &insn->immediate,
-						      &insn->stack_ops);
+						      insn);
 			if (ret)
 				goto err;
 
@@ -1620,7 +1618,7 @@ static int handle_group_alt(struct objto
 		 * accordingly.
 		 */
 		alt_reloc = insn_reloc(file, insn);
-		if (alt_reloc &&
+		if (alt_reloc && !insn->alt_reloc_safe &&
 		    !arch_support_alt_relocation(special_alt, insn, alt_reloc)) {
 
 			WARN_FUNC("unsupported relocation in alternatives section",
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -73,9 +73,7 @@ void arch_initial_func_cfi_state(struct
 
 int arch_decode_instruction(struct objtool_file *file, const struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
-			    unsigned int *len, enum insn_type *type,
-			    unsigned long *immediate,
-			    struct list_head *ops_list);
+			    struct instruction *insn);
 
 bool arch_callee_saved_reg(unsigned char reg);
 
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -47,15 +47,17 @@ struct instruction {
 	unsigned long immediate;
 
 	u16 dead_end		: 1,
-	   ignore		: 1,
-	   ignore_alts		: 1,
-	   hint			: 1,
-	   save			: 1,
-	   restore		: 1,
-	   retpoline_safe	: 1,
-	   noendbr		: 1,
-	   entry		: 1;
-		/* 7 bit hole */
+	    ignore		: 1,
+	    ignore_alts		: 1,
+	    hint		: 1,
+	    save		: 1,
+	    restore		: 1,
+	    retpoline_safe	: 1,
+	    noendbr		: 1,
+	    entry		: 1,
+	    alt_reloc_safe	: 1;
+
+		/* 6 bit hole */
 
 	s8 instr;
 	u8 visited;



* [patch 07/38] objtool: Track init section
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (5 preceding siblings ...)
  2022-07-16 23:17 ` [patch 06/38] objtool: Allow GS relative relocs Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 08/38] objtool: Add .call_sites section Thomas Gleixner
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

For future usage of .init.text exclusion, track the init section in the
instruction decoder and use the result in retpoline validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 tools/objtool/check.c               |   17 ++++++++++-------
 tools/objtool/include/objtool/elf.h |    2 +-
 2 files changed, 11 insertions(+), 8 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -380,6 +380,15 @@ static int decode_instructions(struct ob
 		    !strncmp(sec->name, ".text.__x86.", 12))
 			sec->noinstr = true;
 
+		/*
+		 * .init.text code is ran before userspace and thus doesn't
+		 * strictly need retpolines, except for modules which are
+		 * loaded late, they very much do need retpoline in their
+		 * .init.text
+		 */
+		if (!strcmp(sec->name, ".init.text") && !opts.module)
+			sec->init = true;
+
 		for (offset = 0; offset < sec->sh.sh_size; offset += insn->len) {
 			insn = malloc(sizeof(*insn));
 			if (!insn) {
@@ -3720,13 +3729,7 @@ static int validate_retpoline(struct obj
 		if (insn->retpoline_safe)
 			continue;
 
-		/*
-		 * .init.text code is ran before userspace and thus doesn't
-		 * strictly need retpolines, except for modules which are
-		 * loaded late, they very much do need retpoline in their
-		 * .init.text
-		 */
-		if (!strcmp(insn->sec->name, ".init.text") && !opts.module)
+		if (insn->sec->init)
 			continue;
 
 		if (insn->type == INSN_RETURN) {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -38,7 +38,7 @@ struct section {
 	Elf_Data *data;
 	char *name;
 	int idx;
-	bool changed, text, rodata, noinstr;
+	bool changed, text, rodata, noinstr, init;
 };
 
 struct symbol {



* [patch 08/38] objtool: Add .call_sites section
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (6 preceding siblings ...)
  2022-07-16 23:17 ` [patch 07/38] objtool: Track init section Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 09/38] objtool: Add .sym_sites section Thomas Gleixner
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

In preparation for call depth tracking provide a section which collects all
direct calls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/vmlinux.lds.S           |    7 ++++
 tools/objtool/check.c                   |   51 ++++++++++++++++++++++++++++++++
 tools/objtool/include/objtool/objtool.h |    1 
 tools/objtool/objtool.c                 |    1 
 4 files changed, 60 insertions(+)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -290,6 +290,13 @@ SECTIONS
 		*(.return_sites)
 		__return_sites_end = .;
 	}
+
+	. = ALIGN(8);
+	.call_sites : AT(ADDR(.call_sites) - LOAD_OFFSET) {
+		__call_sites = .;
+		*(.call_sites)
+		__call_sites_end = .;
+	}
 #endif
 
 #ifdef CONFIG_X86_KERNEL_IBT
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -898,6 +898,49 @@ static int create_mcount_loc_sections(st
 	return 0;
 }
 
+static int create_direct_call_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	unsigned int *loc;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".call_sites");
+	if (sec) {
+		INIT_LIST_HEAD(&file->call_list);
+		WARN("file already has .call_sites section, skipping");
+		return 0;
+	}
+
+	if (list_empty(&file->call_list))
+		return 0;
+
+	idx = 0;
+	list_for_each_entry(insn, &file->call_list, call_node)
+		idx++;
+
+	sec = elf_create_section(file->elf, ".call_sites", 0, sizeof(unsigned int), idx);
+	if (!sec)
+		return -1;
+
+	idx = 0;
+	list_for_each_entry(insn, &file->call_list, call_node) {
+
+		loc = (unsigned int *)sec->data->d_buf + idx;
+		memset(loc, 0, sizeof(unsigned int));
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(unsigned int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset))
+			return -1;
+
+		idx++;
+	}
+
+	return 0;
+}
+
 /*
  * Warnings shouldn't be reported for ignored functions.
  */
@@ -1252,6 +1295,9 @@ static void annotate_call_site(struct ob
 		return;
 	}
 
+	if (insn->type == INSN_CALL && !insn->sec->init)
+		list_add_tail(&insn->call_node, &file->call_list);
+
 	if (!sibling && dead_end_function(file, sym))
 		insn->dead_end = true;
 }
@@ -4275,6 +4321,11 @@ int check(struct objtool_file *file)
 		if (ret < 0)
 			goto out;
 		warnings += ret;
+
+		ret = create_direct_call_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
 	}
 
 	if (opts.mcount) {
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -28,6 +28,7 @@ struct objtool_file {
 	struct list_head static_call_list;
 	struct list_head mcount_loc_list;
 	struct list_head endbr_list;
+	struct list_head call_list;
 	bool ignore_unreachables, hints, rodata;
 
 	unsigned int nr_endbr;
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -106,6 +106,7 @@ struct objtool_file *objtool_open_read(c
 	INIT_LIST_HEAD(&file.static_call_list);
 	INIT_LIST_HEAD(&file.mcount_loc_list);
 	INIT_LIST_HEAD(&file.endbr_list);
+	INIT_LIST_HEAD(&file.call_list);
 	file.ignore_unreachables = opts.no_unreachable;
 	file.hints = false;
 



* [patch 09/38] objtool: Add .sym_sites section
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (7 preceding siblings ...)
  2022-07-16 23:17 ` [patch 08/38] objtool: Add .call_sites section Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 10/38] objtool: Add --hacks=skylake Thomas Gleixner
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

In preparation for call depth tracking provide a section which collects
all !init symbols to generate thunks for.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/vmlinux.lds.S |    7 +++++
 tools/objtool/check.c         |   55 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -297,6 +297,13 @@ SECTIONS
 		*(.call_sites)
 		__call_sites_end = .;
 	}
+
+	. = ALIGN(8);
+	.sym_sites : AT(ADDR(.sym_sites) - LOAD_OFFSET) {
+		__sym_sites = .;
+		*(.sym_sites)
+		__sym_sites_end = .;
+	}
 #endif
 
 #ifdef CONFIG_X86_KERNEL_IBT
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -941,6 +941,56 @@ static int create_direct_call_sections(s
 	return 0;
 }
 
+static int create_sym_thunk_sections(struct objtool_file *file)
+{
+	struct section *sec, *s;
+	struct symbol *sym;
+	unsigned int *loc;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".sym_sites");
+	if (sec) {
+		INIT_LIST_HEAD(&file->call_list);
+		WARN("file already has .sym_sites section, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	for_each_sec(file, s) {
+		if (!s->text || s->init)
+			continue;
+
+		list_for_each_entry(sym, &s->symbol_list, list)
+			idx++;
+	}
+
+	sec = elf_create_section(file->elf, ".sym_sites", 0, sizeof(unsigned int), idx);
+	if (!sec)
+		return -1;
+
+	idx = 0;
+	for_each_sec(file, s) {
+		if (!s->text || s->init)
+			continue;
+
+		list_for_each_entry(sym, &s->symbol_list, list) {
+
+			loc = (unsigned int *)sec->data->d_buf + idx;
+			memset(loc, 0, sizeof(unsigned int));
+
+			if (elf_add_reloc_to_insn(file->elf, sec,
+						idx * sizeof(unsigned int),
+						R_X86_64_PC32,
+						s, sym->offset))
+				return -1;
+
+			idx++;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Warnings shouldn't be reported for ignored functions.
  */
@@ -4326,6 +4376,11 @@ int check(struct objtool_file *file)
 		if (ret < 0)
 			goto out;
 		warnings += ret;
+
+		ret = create_sym_thunk_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
 	}
 
 	if (opts.mcount) {



* [patch 10/38] objtool: Add --hacks=skylake
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (8 preceding siblings ...)
  2022-07-16 23:17 ` [patch 09/38] objtool: Add .sym_sites section Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 11/38] objtool: Allow STT_NOTYPE -> STT_FUNC+0 tail-calls Thomas Gleixner
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Make the call/func sections selectable via the --hacks option.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/Makefile.lib                    |    3 ++-
 tools/objtool/builtin-check.c           |    7 ++++++-
 tools/objtool/check.c                   |   18 ++++++++++--------
 tools/objtool/include/objtool/builtin.h |    1 +
 4 files changed, 19 insertions(+), 10 deletions(-)

--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -231,7 +231,8 @@ objtool := $(objtree)/tools/objtool/objt
 
 objtool_args =								\
 	$(if $(CONFIG_HAVE_JUMP_LABEL_HACK), --hacks=jump_label)	\
-	$(if $(CONFIG_HAVE_NOINSTR_HACK), --hacks=noinstr)		\
+	$(if $(CONFIG_HAVE_NOINSTR_HACK), --hacks=noinstr)              \
+	$(if $(CONFIG_CALL_DEPTH_TRACKING), --hacks=skylake)            \
 	$(if $(CONFIG_X86_KERNEL_IBT), --ibt)				\
 	$(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount)		\
 	$(if $(CONFIG_UNWINDER_ORC), --orc)				\
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -57,12 +57,17 @@ static int parse_hacks(const struct opti
 		found = true;
 	}
 
+	if (!str || strstr(str, "skylake")) {
+		opts.hack_skylake = true;
+		found = true;
+	}
+
 	return found ? 0 : -1;
 }
 
 const struct option check_options[] = {
 	OPT_GROUP("Actions:"),
-	OPT_CALLBACK_OPTARG('h', "hacks", NULL, NULL, "jump_label,noinstr", "patch toolchain bugs/limitations", parse_hacks),
+	OPT_CALLBACK_OPTARG('h', "hacks", NULL, NULL, "jump_label,noinstr,skylake", "patch toolchain bugs/limitations", parse_hacks),
 	OPT_BOOLEAN('i', "ibt", &opts.ibt, "validate and annotate IBT"),
 	OPT_BOOLEAN('m', "mcount", &opts.mcount, "annotate mcount/fentry calls for ftrace"),
 	OPT_BOOLEAN('n', "noinstr", &opts.noinstr, "validate noinstr rules"),
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -4372,15 +4372,17 @@ int check(struct objtool_file *file)
 			goto out;
 		warnings += ret;
 
-		ret = create_direct_call_sections(file);
-		if (ret < 0)
-			goto out;
-		warnings += ret;
+		if (opts.hack_skylake) {
+			ret = create_direct_call_sections(file);
+			if (ret < 0)
+				goto out;
+			warnings += ret;
 
-		ret = create_sym_thunk_sections(file);
-		if (ret < 0)
-			goto out;
-		warnings += ret;
+			ret = create_sym_thunk_sections(file);
+			if (ret < 0)
+				goto out;
+			warnings += ret;
+		}
 	}
 
 	if (opts.mcount) {
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -14,6 +14,7 @@ struct opts {
 	bool dump_orc;
 	bool hack_jump_label;
 	bool hack_noinstr;
+	bool hack_skylake;
 	bool ibt;
 	bool mcount;
 	bool noinstr;



* [patch 11/38] objtool: Allow STT_NOTYPE -> STT_FUNC+0 tail-calls
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (9 preceding siblings ...)
  2022-07-16 23:17 ` [patch 10/38] objtool: Add --hacks=skylake Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 12/38] x86/entry: Make sync_regs() invocation a tail call Thomas Gleixner
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

Allow STT_NOTYPE code to tail-call into STT_FUNC+0; per definition
STT_NOTYPE is not a sub-function of the STT_FUNC.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 tools/objtool/check.c |   29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1420,6 +1420,16 @@ static void add_return_call(struct objto
 
 static bool same_function(struct instruction *insn1, struct instruction *insn2)
 {
+	if (!insn1->func && !insn2->func)
+		return true;
+
+	/* Allow STT_NOTYPE -> STT_FUNC+0 tail-calls */
+	if (!insn1->func && insn1->func != insn2->func)
+		return false;
+
+	if (!insn2->func)
+		return true;
+
 	return insn1->func->pfunc == insn2->func->pfunc;
 }
 
@@ -1537,18 +1547,19 @@ static int add_jump_destinations(struct
 			    strstr(jump_dest->func->name, ".cold")) {
 				insn->func->cfunc = jump_dest->func;
 				jump_dest->func->pfunc = insn->func;
-
-			} else if (!same_function(insn, jump_dest) &&
-				   is_first_func_insn(file, jump_dest)) {
-				/*
-				 * Internal sibling call without reloc or with
-				 * STT_SECTION reloc.
-				 */
-				add_call_dest(file, insn, jump_dest->func, true);
-				continue;
 			}
 		}
 
+		if (!same_function(insn, jump_dest) &&
+		    is_first_func_insn(file, jump_dest)) {
+			/*
+			 * Internal sibling call without reloc or with
+			 * STT_SECTION reloc.
+			 */
+			add_call_dest(file, insn, jump_dest->func, true);
+			continue;
+		}
+
 		insn->jump_dest = jump_dest;
 	}
 



* [patch 12/38] x86/entry: Make sync_regs() invocation a tail call
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (10 preceding siblings ...)
  2022-07-16 23:17 ` [patch 11/38] objtool: Allow STT_NOTYPE -> STT_FUNC+0 tail-calls Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 13/38] x86/modules: Make module_alloc() generally available Thomas Gleixner
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

No point in having a call there. Spare the call/ret overhead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1060,11 +1060,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	UNTRAIN_RET
 
 	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
-.Lerror_entry_from_usermode_after_swapgs:
-
 	/* Put us onto the real thread stack. */
-	call	sync_regs
-	RET
+	jmp	sync_regs
 
 	/*
 	 * There are two places in the kernel that can potentially fault with
@@ -1122,7 +1119,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
 	call	fixup_bad_iret
 	mov	%rax, %rdi
-	jmp	.Lerror_entry_from_usermode_after_swapgs
+	jmp	sync_regs
 SYM_CODE_END(error_entry)
 
 SYM_CODE_START_LOCAL(error_return)



* [patch 13/38] x86/modules: Make module_alloc() generally available
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (11 preceding siblings ...)
  2022-07-16 23:17 ` [patch 12/38] x86/entry: Make sync_regs() invocation a tail call Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 14/38] x86/Kconfig: Add CONFIG_CALL_THUNKS Thomas Gleixner
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

module_alloc() allocates from the module address region, which exists even
when CONFIG_MODULES=n. Non-module builds should nevertheless be able to
allocate from that region, e.g. for creating call thunks.

Split the code out and make it possible to select for !MODULES builds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig           |    3 ++
 arch/x86/kernel/module.c   |   58 ------------------------------------------
 arch/x86/mm/Makefile       |    2 +
 arch/x86/mm/module_alloc.c |   62 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 58 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2236,6 +2236,9 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
 	  If unsure, leave at the default value.
 
+config MODULE_ALLOC
+	def_bool MODULES
+
 config HOTPLUG_CPU
 	def_bool y
 	depends on SMP
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -8,21 +8,14 @@
 
 #include <linux/moduleloader.h>
 #include <linux/elf.h>
-#include <linux/vmalloc.h>
 #include <linux/fs.h>
 #include <linux/string.h>
 #include <linux/kernel.h>
-#include <linux/kasan.h>
 #include <linux/bug.h>
-#include <linux/mm.h>
-#include <linux/gfp.h>
 #include <linux/jump_label.h>
-#include <linux/random.h>
 #include <linux/memory.h>
 
 #include <asm/text-patching.h>
-#include <asm/page.h>
-#include <asm/setup.h>
 #include <asm/unwind.h>
 
 #if 0
@@ -36,57 +29,6 @@ do {							\
 } while (0)
 #endif
 
-#ifdef CONFIG_RANDOMIZE_BASE
-static unsigned long module_load_offset;
-
-/* Mutex protects the module_load_offset. */
-static DEFINE_MUTEX(module_kaslr_mutex);
-
-static unsigned long int get_module_load_offset(void)
-{
-	if (kaslr_enabled()) {
-		mutex_lock(&module_kaslr_mutex);
-		/*
-		 * Calculate the module_load_offset the first time this
-		 * code is called. Once calculated it stays the same until
-		 * reboot.
-		 */
-		if (module_load_offset == 0)
-			module_load_offset =
-				(get_random_int() % 1024 + 1) * PAGE_SIZE;
-		mutex_unlock(&module_kaslr_mutex);
-	}
-	return module_load_offset;
-}
-#else
-static unsigned long int get_module_load_offset(void)
-{
-	return 0;
-}
-#endif
-
-void *module_alloc(unsigned long size)
-{
-	gfp_t gfp_mask = GFP_KERNEL;
-	void *p;
-
-	if (PAGE_ALIGN(size) > MODULES_LEN)
-		return NULL;
-
-	p = __vmalloc_node_range(size, MODULE_ALIGN,
-				 MODULES_VADDR + get_module_load_offset(),
-				 MODULES_END, gfp_mask, PAGE_KERNEL,
-				 VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
-				 NUMA_NO_NODE, __builtin_return_address(0));
-
-	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
-		vfree(p);
-		return NULL;
-	}
-
-	return p;
-}
-
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -59,3 +59,5 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_enc
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_MODULE_ALLOC)	+= module_alloc.o
--- /dev/null
+++ b/arch/x86/mm/module_alloc.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/kasan.h>
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/mutex.h>
+#include <linux/random.h>
+#include <linux/vmalloc.h>
+
+#include <asm/setup.h>
+#include <asm/page.h>
+
+#ifdef CONFIG_RANDOMIZE_BASE
+static unsigned long module_load_offset;
+
+/* Mutex protects the module_load_offset. */
+static DEFINE_MUTEX(module_kaslr_mutex);
+
+static unsigned long int get_module_load_offset(void)
+{
+	if (kaslr_enabled()) {
+		mutex_lock(&module_kaslr_mutex);
+		/*
+		 * Calculate the module_load_offset the first time this
+		 * code is called. Once calculated it stays the same until
+		 * reboot.
+		 */
+		if (module_load_offset == 0)
+			module_load_offset =
+				(get_random_int() % 1024 + 1) * PAGE_SIZE;
+		mutex_unlock(&module_kaslr_mutex);
+	}
+	return module_load_offset;
+}
+#else
+static unsigned long int get_module_load_offset(void)
+{
+	return 0;
+}
+#endif
+
+void *module_alloc(unsigned long size)
+{
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				 MODULES_VADDR + get_module_load_offset(),
+				 MODULES_END, gfp_mask, PAGE_KERNEL,
+				 VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
+				 NUMA_NO_NODE, __builtin_return_address(0));
+
+	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}



* [patch 14/38] x86/Kconfig: Add CONFIG_CALL_THUNKS
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (12 preceding siblings ...)
  2022-07-16 23:17 ` [patch 13/38] x86/modules: Make module_alloc() generally available Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 15/38] x86/retbleed: Add X86_FEATURE_CALL_DEPTH Thomas Gleixner
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

In preparation for mitigating the Intel SKL RSB underflow issue in
software, add a new configuration symbol which allows the required call
thunk infrastructure to be built conditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2439,6 +2439,14 @@ config CC_HAS_SLS
 config CC_HAS_RETURN_THUNK
 	def_bool $(cc-option,-mfunction-return=thunk-extern)
 
+config HAVE_CALL_THUNKS
+	def_bool y
+	depends on RETHUNK && OBJTOOL
+
+config CALL_THUNKS
+	def_bool n
+	select MODULE_ALLOC
+
 menuconfig SPECULATION_MITIGATIONS
 	bool "Mitigations for speculative execution vulnerabilities"
 	default y



* [patch 15/38] x86/retbleed: Add X86_FEATURE_CALL_DEPTH
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (13 preceding siblings ...)
  2022-07-16 23:17 ` [patch 14/38] x86/Kconfig: Add CONFIG_CALL_THUNKS Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 16/38] modules: Make struct module_layout unconditionally available Thomas Gleixner
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

Intel SKL CPUs fall back to other predictors when the RSB underflows. The
only microcode mitigation is IBRS, which is insanely expensive. It comes
with performance drops of up to 30% depending on the workload.

A way less expensive, but nevertheless horrible mitigation is to track the
call depth in software and overeagerly fill the RSB when returns underflow
the software counter.
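
A conceptual sketch in C of the accounting scheme (illustrative only: the
real implementation added later in this series lives in assembly thunks,
uses a per-CPU counter, and stuff_rsb() is a made-up placeholder here):

	#define RSB_DEPTH	16

	extern void stuff_rsb(void);	/* placeholder, not a real symbol */

	static int call_depth;		/* per CPU in reality */

	static void account_call(void)
	{
		if (call_depth < RSB_DEPTH)	/* saturating counter */
			call_depth++;
	}

	static void account_return(void)
	{
		if (call_depth > 0)
			call_depth--;
		else
			stuff_rsb();	/* refill RSB with benign entries */
	}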

Provide a configuration symbol and a CPU misfeature bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig                         |   13 +++++++++++++
 arch/x86/include/asm/cpufeatures.h       |    1 +
 arch/x86/include/asm/disabled-features.h |    9 ++++++++-
 3 files changed, 22 insertions(+), 1 deletion(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2498,6 +2498,19 @@ config CPU_UNRET_ENTRY
 	help
 	  Compile the kernel with support for the retbleed=unret mitigation.
 
+config CALL_DEPTH_TRACKING
+	bool "Mitigate RSB underflow with call depth tracking"
+	depends on CPU_SUP_INTEL && HAVE_CALL_THUNKS
+	select CALL_THUNKS
+	default y
+	help
+	  Compile the kernel with call depth tracking to mitigate the Intel
+	  SKL Return-Stack-Buffer (RSB) underflow issue. The mitigation
+	  is off by default and needs to be enabled on the kernel command line
+	  via the retbleed=stuff option. For non-affected systems the overhead
+	  of this option is marginal as the call depth tracking is using
+	  run-time generated call thunks and call patching.
+
 config CPU_IBPB_ENTRY
 	bool "Enable IBPB on kernel entry"
 	depends on CPU_SUP_AMD
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -302,6 +302,7 @@
 #define X86_FEATURE_RETPOLINE_LFENCE	(11*32+13) /* "" Use LFENCE for Spectre variant 2 */
 #define X86_FEATURE_RETHUNK		(11*32+14) /* "" Use REturn THUNK */
 #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB untrain return */
+#define X86_FEATURE_CALL_DEPTH		(11*32+16) /* "" Call depth tracking for RSB stuffing */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -69,6 +69,12 @@
 # define DISABLE_UNRET		(1 << (X86_FEATURE_UNRET & 31))
 #endif
 
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+# define DISABLE_CALL_DEPTH_TRACKING	0
+#else
+# define DISABLE_CALL_DEPTH_TRACKING	(1 << (X86_FEATURE_CALL_DEPTH & 31))
+#endif
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 # define DISABLE_ENQCMD		0
 #else
@@ -101,7 +107,8 @@
 #define DISABLED_MASK8	(DISABLE_TDX_GUEST)
 #define DISABLED_MASK9	(DISABLE_SGX)
 #define DISABLED_MASK10	0
-#define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET)
+#define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
+			 DISABLE_CALL_DEPTH_TRACKING)
 #define DISABLED_MASK12	0
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0



* [patch 16/38] modules: Make struct module_layout unconditionally available
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (14 preceding siblings ...)
  2022-07-16 23:17 ` [patch 15/38] x86/retbleed: Add X86_FEATURE_CALL_DEPTH Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 17/38] module: Add arch_data to module_layout Thomas Gleixner
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

To simplify the upcoming call thunk code it's desirable to expose struct
module_layout even on !MODULES builds. This spares conditionals and
#ifdeffery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/module.h |   44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -67,6 +67,28 @@ struct module_version_attribute {
 	const char *version;
 };
 
+struct mod_tree_node {
+	struct module *mod;
+	struct latch_tree_node node;
+};
+
+struct module_layout {
+	/* The actual code + data. */
+	void *base;
+	/* Total size. */
+	unsigned int size;
+	/* The size of the executable code.  */
+	unsigned int text_size;
+	/* Size of RO section of the module (text+rodata) */
+	unsigned int ro_size;
+	/* Size of RO after init section */
+	unsigned int ro_after_init_size;
+
+#ifdef CONFIG_MODULES_TREE_LOOKUP
+	struct mod_tree_node mtn;
+#endif
+};
+
 extern ssize_t __modver_version_show(struct module_attribute *,
 				     struct module_kobject *, char *);
 
@@ -316,28 +338,6 @@ enum module_state {
 	MODULE_STATE_UNFORMED,	/* Still setting it up. */
 };
 
-struct mod_tree_node {
-	struct module *mod;
-	struct latch_tree_node node;
-};
-
-struct module_layout {
-	/* The actual code + data. */
-	void *base;
-	/* Total size. */
-	unsigned int size;
-	/* The size of the executable code.  */
-	unsigned int text_size;
-	/* Size of RO section of the module (text+rodata) */
-	unsigned int ro_size;
-	/* Size of RO after init section */
-	unsigned int ro_after_init_size;
-
-#ifdef CONFIG_MODULES_TREE_LOOKUP
-	struct mod_tree_node mtn;
-#endif
-};
-
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 /* Only touch one cacheline for common rbtree-for-core-layout case. */
 #define __module_layout_align ____cacheline_aligned



* [patch 17/38] module: Add arch_data to module_layout
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (15 preceding siblings ...)
  2022-07-16 23:17 ` [patch 16/38] modules: Make struct module_layout unconditionally available Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 18/38] mm/vmalloc: Provide huge page mappings Thomas Gleixner
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

For the upcoming call depth tracking it's required to store extra
information in the module layout. Add a pointer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/module.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -87,6 +87,9 @@ struct module_layout {
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 	struct mod_tree_node mtn;
 #endif
+#ifdef CONFIG_CALL_THUNKS
+	void *arch_data;
+#endif
 };
 
 extern ssize_t __modver_version_show(struct module_attribute *,



* [patch 18/38] mm/vmalloc: Provide huge page mappings
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (16 preceding siblings ...)
  2022-07-16 23:17 ` [patch 17/38] module: Add arch_data to module_layout Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 19/38] x86/module: Provide __module_alloc() Thomas Gleixner
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Provide VM_HUGE_VMAP, which unconditionally tries to use huge
mappings. Unlike VM_ALLOW_HUGE_VMAP it doesn't care about the number
of NUMA nodes or the size of the allocation.

If the page allocator fails to provide huge pages, it will silently
fall back.
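
A hedged usage sketch of the new flag (illustrative; it mirrors what the
module allocator does with VM_HUGE_VMAP later in this series):

	void *p = __vmalloc_node_range(PMD_SIZE, PMD_SIZE,
				       MODULES_VADDR, MODULES_END,
				       GFP_KERNEL, PAGE_KERNEL,
				       VM_HUGE_VMAP | VM_FLUSH_RESET_PERMS,
				       NUMA_NO_NODE,
				       __builtin_return_address(0));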

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/vmalloc.h |    3 ++-
 mm/vmalloc.c            |   33 +++++++++++++++++++--------------
 2 files changed, 21 insertions(+), 15 deletions(-)

--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -27,10 +27,11 @@ struct notifier_block;		/* in notifier.h
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
 #define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
 #define VM_ALLOW_HUGE_VMAP	0x00000400      /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
+#define VM_HUGE_VMAP		0x00000800      /* Force for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
 
 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
 	!defined(CONFIG_KASAN_VMALLOC)
-#define VM_DEFER_KMEMLEAK	0x00000800	/* defer kmemleak object creation */
+#define VM_DEFER_KMEMLEAK	0x00001000	/* defer kmemleak object creation */
 #else
 #define VM_DEFER_KMEMLEAK	0
 #endif
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3099,23 +3099,28 @@ void *__vmalloc_node_range(unsigned long
 		return NULL;
 	}
 
-	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
-		unsigned long size_per_node;
+	if (vmap_allow_huge && (vm_flags & (VM_HUGE_VMAP|VM_ALLOW_HUGE_VMAP))) {
 
-		/*
-		 * Try huge pages. Only try for PAGE_KERNEL allocations,
-		 * others like modules don't yet expect huge pages in
-		 * their allocations due to apply_to_page_range not
-		 * supporting them.
-		 */
+		if (vm_flags & VM_ALLOW_HUGE_VMAP) {
+			unsigned long size_per_node;
 
-		size_per_node = size;
-		if (node == NUMA_NO_NODE)
-			size_per_node /= num_online_nodes();
-		if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
+			/*
+			 * Try huge pages. Only try for PAGE_KERNEL allocations,
+			 * others like modules don't yet expect huge pages in
+			 * their allocations due to apply_to_page_range not
+			 * supporting them.
+			 */
+
+			size_per_node = size;
+			if (node == NUMA_NO_NODE)
+				size_per_node /= num_online_nodes();
+			if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
+				shift = PMD_SHIFT;
+			else
+				shift = arch_vmap_pte_supported_shift(size_per_node);
+		} else {
 			shift = PMD_SHIFT;
-		else
-			shift = arch_vmap_pte_supported_shift(size_per_node);
+		}
 
 		align = max(real_align, 1UL << shift);
 		size = ALIGN(real_size, 1UL << shift);



* [patch 19/38] x86/module: Provide __module_alloc()
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (17 preceding siblings ...)
  2022-07-16 23:17 ` [patch 18/38] mm/vmalloc: Provide huge page mappings Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 20/38] x86/alternatives: Provide text_poke_[copy|set]_locked() Thomas Gleixner
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

Provide a function to allocate from the module space with huge pages so the
allocation can be mapped with large TLB entries. This is required for the
callthunks as otherwise ITLB pressure kills performance.
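
A caller which needs a huge mapping can then do, for instance (this is how
the callthunk allocator later in this series uses it):

	base = __module_alloc(size, VM_HUGE_VMAP);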

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/module.h |    2 ++
 arch/x86/mm/module_alloc.c    |   10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -13,4 +13,6 @@ struct mod_arch_specific {
 #endif
 };
 
+extern void *__module_alloc(unsigned long size, unsigned long vmflags);
+
 #endif /* _ASM_X86_MODULE_H */
--- a/arch/x86/mm/module_alloc.c
+++ b/arch/x86/mm/module_alloc.c
@@ -39,7 +39,7 @@ static unsigned long int get_module_load
 }
 #endif
 
-void *module_alloc(unsigned long size)
+void *__module_alloc(unsigned long size, unsigned long vmflags)
 {
 	gfp_t gfp_mask = GFP_KERNEL;
 	void *p;
@@ -47,10 +47,11 @@ void *module_alloc(unsigned long size)
 	if (PAGE_ALIGN(size) > MODULES_LEN)
 		return NULL;
 
+	vmflags |= VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK;
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
 				 MODULES_VADDR + get_module_load_offset(),
 				 MODULES_END, gfp_mask, PAGE_KERNEL,
-				 VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
+				 vmflags,
 				 NUMA_NO_NODE, __builtin_return_address(0));
 
 	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
@@ -60,3 +61,8 @@ void *module_alloc(unsigned long size)
 
 	return p;
 }
+
+void *module_alloc(unsigned long size)
+{
+	return __module_alloc(size, 0);
+}



* [patch 20/38] x86/alternatives: Provide text_poke_[copy|set]_locked()
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (18 preceding siblings ...)
  2022-07-16 23:17 ` [patch 19/38] x86/module: Provide __module_alloc() Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 21/38] x86/entry: Make some entry symbols global Thomas Gleixner
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

The upcoming call thunk patching runs with text_mutex held and therefore
cannot use text_poke_copy() and text_poke_set(), which take text_mutex
themselves.

Provide _locked suffixed variants which expose the inner workings and expect
the caller to hold text_mutex.
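
A minimal sketch of the intended calling pattern (function and variable
names here are illustrative, not part of the patch):

	static void patch_thunk_text(void *thunk, const void *insns, size_t len)
	{
		mutex_lock(&text_mutex);
		/* Both _locked variants expect text_mutex to be held. */
		text_poke_copy_locked(thunk, insns, len);
		/* Pad the rest of the page with INT3. */
		text_poke_set_locked(thunk + len, 0xcc, PAGE_SIZE - len);
		mutex_unlock(&text_mutex);
	}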

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/text-patching.h |    2 +
 arch/x86/kernel/alternative.c        |   48 +++++++++++++++++++++--------------
 2 files changed, 32 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -45,6 +45,8 @@ extern void *text_poke(void *addr, const
 extern void text_poke_sync(void);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
+extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len);
+extern void *text_poke_set_locked(void *addr, int c, size_t len);
 extern void *text_poke_set(void *addr, int c, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1225,6 +1225,26 @@ void *text_poke_kgdb(void *addr, const v
 	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
+void *text_poke_copy_locked(void *addr, const void *opcode, size_t len)
+{
+	unsigned long start = (unsigned long)addr;
+	size_t patched = 0;
+
+	if (WARN_ON_ONCE(core_kernel_text(start)))
+		return NULL;
+
+	while (patched < len) {
+		unsigned long ptr = start + patched;
+		size_t s;
+
+		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
+
+		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
+		patched += s;
+	}
+	return addr;
+}
+
 /**
  * text_poke_copy - Copy instructions into (an unused part of) RX memory
  * @addr: address to modify
@@ -1239,23 +1259,29 @@ void *text_poke_kgdb(void *addr, const v
  */
 void *text_poke_copy(void *addr, const void *opcode, size_t len)
 {
+	mutex_lock(&text_mutex);
+	addr = text_poke_copy_locked(addr, opcode, len);
+	mutex_unlock(&text_mutex);
+	return addr;
+}
+
+void *text_poke_set_locked(void *addr, int c, size_t len)
+{
 	unsigned long start = (unsigned long)addr;
 	size_t patched = 0;
 
 	if (WARN_ON_ONCE(core_kernel_text(start)))
 		return NULL;
 
-	mutex_lock(&text_mutex);
 	while (patched < len) {
 		unsigned long ptr = start + patched;
 		size_t s;
 
 		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
 
-		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
+		__text_poke(text_poke_memset, (void *)ptr, (void *)&c, s);
 		patched += s;
 	}
-	mutex_unlock(&text_mutex);
 	return addr;
 }
 
@@ -1270,22 +1296,8 @@ void *text_poke_copy(void *addr, const v
  */
 void *text_poke_set(void *addr, int c, size_t len)
 {
-	unsigned long start = (unsigned long)addr;
-	size_t patched = 0;
-
-	if (WARN_ON_ONCE(core_kernel_text(start)))
-		return NULL;
-
 	mutex_lock(&text_mutex);
-	while (patched < len) {
-		unsigned long ptr = start + patched;
-		size_t s;
-
-		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
-
-		__text_poke(text_poke_memset, (void *)ptr, (void *)&c, s);
-		patched += s;
-	}
+	addr = text_poke_set_locked(addr, c, len);
 	mutex_unlock(&text_mutex);
 	return addr;
 }



* [patch 21/38] x86/entry: Make some entry symbols global
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (19 preceding siblings ...)
  2022-07-16 23:17 ` [patch 20/38] x86/alternatives: Provide text_poke_[copy|set]_locked() Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 22/38] x86/paravirt: Make struct paravirt_call_site unconditionally available Thomas Gleixner
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

paranoid_entry(), error_entry() and xen_error_entry() have to be
exempted from call accounting by thunk patching because they are
invoked before UNTRAIN_RET.

Expose them so they are available in the alternative code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -326,7 +326,8 @@ SYM_CODE_END(ret_from_fork)
 #endif
 .endm
 
-SYM_CODE_START_LOCAL(xen_error_entry)
+SYM_CODE_START(xen_error_entry)
+	ANNOTATE_NOENDBR
 	UNWIND_HINT_FUNC
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
@@ -904,7 +905,8 @@ SYM_CODE_END(xen_failsafe_callback)
  * R14 - old CR3
  * R15 - old SPEC_CTRL
  */
-SYM_CODE_START_LOCAL(paranoid_entry)
+SYM_CODE_START(paranoid_entry)
+	ANNOTATE_NOENDBR
 	UNWIND_HINT_FUNC
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
@@ -1039,7 +1041,8 @@ SYM_CODE_END(paranoid_exit)
 /*
  * Switch GS and CR3 if needed.
  */
-SYM_CODE_START_LOCAL(error_entry)
+SYM_CODE_START(error_entry)
+	ANNOTATE_NOENDBR
 	UNWIND_HINT_FUNC
 
 	PUSH_AND_CLEAR_REGS save_ret=1



* [patch 22/38] x86/paravirt: Make struct paravirt_call_site unconditionally available
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (20 preceding siblings ...)
  2022-07-16 23:17 ` [patch 21/38] x86/entry: Make some entry symbols global Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 23/38] x86/callthunks: Add call patching for call depth tracking Thomas Gleixner
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

For the upcoming call thunk patching there is less #ifdeffery when the data
structure is unconditionally available. The code can then be trivially
fenced off with IS_ENABLED().
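
For illustration, a hypothetical consumer can now compile with and without
CONFIG_PARAVIRT and fence itself off with IS_ENABLED() instead of #ifdef
(function name made up here):

	static void walk_pv_sites(struct paravirt_patch_site *start,
				  struct paravirt_patch_site *end)
	{
		struct paravirt_patch_site *p;

		if (!IS_ENABLED(CONFIG_PARAVIRT))
			return;

		for (p = start; p < end; p++)
			pr_debug("pv site %px type %u len %u\n",
				 p->instr, p->type, p->len);
	}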

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/paravirt.h       |    4 ++--
 arch/x86/include/asm/paravirt_types.h |   20 ++++++++++++--------
 2 files changed, 14 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -4,13 +4,13 @@
 /* Various instructions on x86 need to be replaced for
  * para-virtualization: those hooks are defined here. */
 
+#include <asm/paravirt_types.h>
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/pgtable_types.h>
 #include <asm/asm.h>
 #include <asm/nospec-branch.h>
 
-#include <asm/paravirt_types.h>
-
 #ifndef __ASSEMBLY__
 #include <linux/bug.h>
 #include <linux/types.h>
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -2,6 +2,17 @@
 #ifndef _ASM_X86_PARAVIRT_TYPES_H
 #define _ASM_X86_PARAVIRT_TYPES_H
 
+#ifndef __ASSEMBLY__
+/* These all sit in the .parainstructions section to tell us what to patch. */
+struct paravirt_patch_site {
+	u8 *instr;		/* original instructions */
+	u8 type;		/* type of this instruction */
+	u8 len;			/* length of original instruction */
+};
+#endif
+
+#ifdef CONFIG_PARAVIRT
+
 /* Bitmask of what can be clobbered: usually at least eax. */
 #define CLBR_EAX  (1 << 0)
 #define CLBR_ECX  (1 << 1)
@@ -584,16 +595,9 @@ unsigned long paravirt_ret0(void);
 
 #define paravirt_nop	((void *)_paravirt_nop)
 
-/* These all sit in the .parainstructions section to tell us what to patch. */
-struct paravirt_patch_site {
-	u8 *instr;		/* original instructions */
-	u8 type;		/* type of this instruction */
-	u8 len;			/* length of original instruction */
-};
-
 extern struct paravirt_patch_site __parainstructions[],
 	__parainstructions_end[];
 
 #endif	/* __ASSEMBLY__ */
-
+#endif  /* CONFIG_PARAVIRT */
 #endif	/* _ASM_X86_PARAVIRT_TYPES_H */



* [patch 23/38] x86/callthunks: Add call patching for call depth tracking
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (21 preceding siblings ...)
  2022-07-16 23:17 ` [patch 22/38] x86/paravirt: Make struct paravirt_call_site unconditionally available Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 24/38] module: Add layout for callthunks tracking Thomas Gleixner
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

Mitigating the Intel SKL RSB underflow issue in software requires tracking
the call depth. This could be done with the help of the compiler by adding
at least 7 bytes of NOPs before every direct call, which amounts to a 15+
percent text size increase for a vmlinux built with a Debian kernel config.
While CPUs are quite efficient at ignoring NOPs, this is still a massive
I-cache penalty for all CPUs which do not have this issue.

Inflict the pain only on SKL CPUs by creating call thunks for each function
and patching the calls to invoke the thunks instead.

The thunks are created in module memory to stay within the 32bit
displacement boundary. The thunk then does:

	 ACCOUNT_DEPTH
	 JMP function

The function and call site lists are generated by objtool. The memory
requirement is 16 bytes per call thunk and btree memory for keeping track
of them. For a Debian distro config this amounts to ~1.6MB thunk memory and
2MB btree storage. This is only required when the call depth tracking is
enabled on the kernel command line. So the burden is solely on SKL[-X].

The thunks are all stored in one 2MB memory region which is mapped with a
large TLB to prevent ITLB pressure.

The thunks are generated from a template and the btree is used to store
them by destination address. The actual call patching retrieves the thunks
from the btree and replaces the original function call by a call to the
thunk.
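
Distilled to its essence, the patching step looks roughly like this
(illustrative sketch; the real patch_call() below also handles init text,
decoding failures and paravirt sites):

	static void patch_one_call(void *addr, void *dest)
	{
		void *thunk = btree_lookup64(&call_thunks, (unsigned long)dest);
		u8 bytes[CALL_INSN_SIZE];

		if (!thunk)
			return;

		__text_gen_insn(bytes, CALL_INSN_OPCODE, addr, thunk,
				CALL_INSN_SIZE);
		text_poke_early(addr, bytes, CALL_INSN_SIZE);
	}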

Module handling and the actual thunk code for SKL will be added in
subsequent steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig                   |   13 +
 arch/x86/include/asm/alternative.h |   13 +
 arch/x86/kernel/Makefile           |    2 
 arch/x86/kernel/alternative.c      |    6 
 arch/x86/kernel/callthunks.c       |  459 +++++++++++++++++++++++++++++++++++++
 5 files changed, 493 insertions(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,6 +125,7 @@ config X86
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
+	select BTREE				if CALL_DEPTH_TRACKING
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
@@ -2511,6 +2512,18 @@ config CALL_DEPTH_TRACKING
 	  of this option is marginal as the call depth tracking is using
 	  run-time generated call thunks and call patching.
 
+config CALL_THUNKS_DEBUG
+	bool "Enable call thunks and call depth tracking debugging"
+	depends on CALL_DEPTH_TRACKING
+	default n
+	help
+	  Enable call/ret counters for imbalance detection and build in
+	  noisy dmesg output about callthunk generation and call patching
+	  for troubleshooting. The debug prints need to be enabled on the
+	  kernel command line with 'debug-callthunks'.
+	  Only enable this when you are debugging call thunks as this
+	  creates a noticeable runtime overhead. If unsure say N.
+
 config CPU_IBPB_ENTRY
 	bool "Enable IBPB on kernel entry"
 	depends on CPU_SUP_AMD
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -80,6 +80,19 @@ extern void apply_returns(s32 *start, s3
 extern void apply_ibt_endbr(s32 *start, s32 *end);
 
 struct module;
+struct paravirt_patch_site;
+
+struct callthunk_sites {
+	s32				*syms_start, *syms_end;
+	s32				*call_start, *call_end;
+	struct paravirt_patch_site	*pv_start, *pv_end;
+};
+
+#ifdef CONFIG_CALL_THUNKS
+extern void callthunks_patch_builtin_calls(void);
+#else
+static __always_inline void callthunks_patch_builtin_calls(void) {}
+#endif
 
 #ifdef CONFIG_SMP
 extern void alternatives_smp_module_add(struct module *mod, char *name,
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -141,6 +141,8 @@ obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
 
+obj-$(CONFIG_CALL_THUNKS)		+= callthunks.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -936,6 +936,12 @@ void __init alternative_instructions(voi
 	 */
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
+	/*
+	 * Now all calls are established. Apply the call thunks if
+	 * required.
+	 */
+	callthunks_patch_builtin_calls();
+
 	apply_ibt_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end);
 
 #ifdef CONFIG_SMP
--- /dev/null
+++ b/arch/x86/kernel/callthunks.c
@@ -0,0 +1,459 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "callthunks: " fmt
+
+#include <linux/btree.h>
+#include <linux/memory.h>
+#include <linux/moduleloader.h>
+#include <linux/set_memory.h>
+#include <linux/vmalloc.h>
+
+#include <asm/alternative.h>
+#include <asm/insn.h>
+#include <asm/nospec-branch.h>
+#include <asm/paravirt.h>
+#include <asm/sections.h>
+#include <asm/switch_to.h>
+#include <asm/sync_core.h>
+#include <asm/text-patching.h>
+
+#ifdef CONFIG_CALL_THUNKS_DEBUG
+static int __initdata_or_module debug_callthunks;
+
+#define prdbg(fmt, args...)					\
+do {								\
+	if (debug_callthunks)					\
+		printk(KERN_DEBUG pr_fmt(fmt), ##args);	\
+} while(0)
+
+static int __init debug_thunks(char *str)
+{
+	debug_callthunks = 1;
+	return 1;
+}
+__setup("debug-callthunks", debug_thunks);
+#else
+#define prdbg(fmt, args...)	do { } while(0)
+#endif
+
+extern s32 __call_sites[], __call_sites_end[];
+extern s32 __sym_sites[], __sym_sites_end[];
+
+static struct btree_head64 call_thunks;
+
+static bool thunks_initialized __ro_after_init;
+static struct module_layout builtin_layout __ro_after_init;
+
+struct thunk_desc {
+	void		*template;
+	unsigned int	template_size;
+	unsigned int	thunk_size;
+};
+
+static struct thunk_desc callthunk_desc __ro_after_init;
+
+struct thunk_mem {
+	void			*base;
+	unsigned int		size;
+	unsigned int		nthunks;
+	bool			is_rx;
+	struct list_head	list;
+	unsigned long		map[0];
+};
+
+struct thunk_mem_area {
+	struct thunk_mem	*tmem;
+	unsigned long		start;
+	unsigned long		nthunks;
+};
+
+static LIST_HEAD(thunk_mem_list);
+
+extern void error_entry(void);
+extern void xen_error_entry(void);
+extern void paranoid_entry(void);
+
+static inline bool is_inittext(struct module_layout *layout, void *addr)
+{
+	if (!layout->mtn.mod)
+		return is_kernel_inittext((unsigned long)addr);
+
+	return within_module_init((unsigned long)addr, layout->mtn.mod);
+}
+
+static __init_or_module bool skip_addr(void *dest)
+{
+	if (dest == error_entry)
+		return true;
+	if (dest == paranoid_entry)
+		return true;
+	if (dest == xen_error_entry)
+		return true;
+	/* Does FILL_RSB... */
+	if (dest == __switch_to_asm)
+		return true;
+	/* Accounts directly */
+	if (dest == ret_from_fork)
+		return true;
+#ifdef CONFIG_FUNCTION_TRACER
+	if (dest == __fentry__)
+		return true;
+#endif
+	return false;
+}
+
+static __init_or_module void *call_get_dest(void *addr)
+{
+	struct insn insn;
+	void *dest;
+	int ret;
+
+	ret = insn_decode_kernel(&insn, addr);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* Patched out call? */
+	if (insn.opcode.bytes[0] != CALL_INSN_OPCODE)
+		return NULL;
+
+	dest = addr + insn.length + insn.immediate.value;
+	if (skip_addr(dest))
+		return NULL;
+	return dest;
+}
+
+static void *jump_get_dest(void *addr)
+{
+	struct insn insn;
+	int ret;
+
+	ret = insn_decode_kernel(&insn, addr);
+	if (WARN_ON_ONCE(ret))
+		return NULL;
+
+	if (insn.opcode.bytes[0] != JMP32_INSN_OPCODE) {
+		WARN_ON_ONCE(insn.opcode.bytes[0] != INT3_INSN_OPCODE);
+		return NULL;
+	}
+
+	return addr + insn.length + insn.immediate.value;
+}
+
+static __init_or_module void callthunk_free(struct thunk_mem_area *area,
+					    bool set_int3)
+{
+	struct thunk_mem *tmem = area->tmem;
+	unsigned int i, size;
+	u8 *thunk, *tp;
+
+	lockdep_assert_held(&text_mutex);
+
+	prdbg("Freeing tmem %px %px %lu %lu\n", tmem->base,
+	      tmem->base + area->start * callthunk_desc.thunk_size,
+	      area->start, area->nthunks);
+
+	/* Jump starts right after the template */
+	thunk = tmem->base + area->start * callthunk_desc.thunk_size;
+	tp = thunk + callthunk_desc.template_size;
+
+	for (i = 0; i < area->nthunks; i++) {
+		void *dest = jump_get_dest(tp);
+
+		if (dest)
+			btree_remove64(&call_thunks, (unsigned long)dest);
+		tp += callthunk_desc.thunk_size;
+	}
+	bitmap_clear(tmem->map, area->start, area->nthunks);
+
+	if (bitmap_empty(tmem->map, tmem->nthunks)) {
+		list_del(&tmem->list);
+		prdbg("Freeing empty tmem: %px %u %u\n", tmem->base,
+		      tmem->size, tmem->nthunks);
+		vfree(tmem->base);
+		kfree(tmem);
+	} else if (set_int3) {
+		size = area->nthunks * callthunk_desc.thunk_size;
+		text_poke_set_locked(thunk, 0xcc, size);
+	}
+	kfree(area);
+}
+
+static __init_or_module
+int callthunk_setup_one(void *dest, u8 *thunk, u8 *buffer,
+			struct module_layout *layout)
+{
+	unsigned long key = (unsigned long)dest;
+	u8 *jmp;
+
+	if (is_inittext(layout, dest)) {
+		prdbg("Ignoring init dest: %pS %px\n", dest, dest);
+		return 0;
+	}
+
+	/* Multiple symbols can have the same location. */
+	if (btree_lookup64(&call_thunks, key)) {
+		prdbg("Ignoring duplicate dest: %pS %px\n", dest, dest);
+		return 0;
+	}
+
+	memcpy(buffer, callthunk_desc.template, callthunk_desc.template_size);
+	jmp = thunk + callthunk_desc.template_size;
+	buffer += callthunk_desc.template_size;
+	__text_gen_insn(buffer, JMP32_INSN_OPCODE, jmp, dest, JMP32_INSN_SIZE);
+
+	return btree_insert64(&call_thunks, key, (void *)thunk, GFP_KERNEL) ? : 1;
+}
+
+static __always_inline char *layout_getname(struct module_layout *layout)
+{
+#ifdef CONFIG_MODULES
+	if (layout->mtn.mod)
+		return layout->mtn.mod->name;
+#endif
+	return "builtin";
+}
+
+static __init_or_module void patch_call(void *addr, struct module_layout *layout)
+{
+	void *thunk, *dest;
+	unsigned long key;
+	u8 bytes[8];
+
+	if (is_inittext(layout, addr))
+		return;
+
+	dest = call_get_dest(addr);
+	if (!dest || WARN_ON_ONCE(IS_ERR(dest)))
+		return;
+
+	key = (unsigned long)dest;
+	thunk = btree_lookup64(&call_thunks, key);
+
+	if (!thunk) {
+		WARN_ONCE(!is_inittext(layout, dest),
+			  "Lookup %s thunk for %pS -> %pS %016lx failed\n",
+			  layout_getname(layout), addr, dest, key);
+		return;
+	}
+
+	__text_gen_insn(bytes, CALL_INSN_OPCODE, addr, thunk, CALL_INSN_SIZE);
+	text_poke_early(addr, bytes, CALL_INSN_SIZE);
+}
+
+static __init_or_module void patch_call_sites(s32 *start, s32 *end,
+					      struct module_layout *layout)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++)
+		patch_call((void *)s + *s, layout);
+}
+
+static __init_or_module void
+patch_paravirt_call_sites(struct paravirt_patch_site *start,
+			  struct paravirt_patch_site *end,
+			  struct module_layout *layout)
+{
+	struct paravirt_patch_site *p;
+
+	for (p = start; p < end; p++)
+		patch_call(p->instr, layout);
+}
+
+static struct thunk_mem_area *callthunks_alloc(unsigned int nthunks)
+{
+	struct thunk_mem_area *area;
+	unsigned int size, mapsize;
+	struct thunk_mem *tmem;
+
+	area = kzalloc(sizeof(*area), GFP_KERNEL);
+	if (!area)
+		return NULL;
+
+	list_for_each_entry(tmem, &thunk_mem_list, list) {
+		unsigned long start;
+
+		start = bitmap_find_next_zero_area(tmem->map, tmem->nthunks,
+						   0, nthunks, 0);
+		if (start >= tmem->nthunks)
+			continue;
+		area->tmem = tmem;
+		area->start = start;
+		prdbg("Using tmem %px %px %lu %u\n", tmem->base,
+		      tmem->base + start * callthunk_desc.thunk_size,
+		      start, nthunks);
+		return area;
+	}
+
+	size = nthunks * callthunk_desc.thunk_size;
+	size = round_up(size, PMD_SIZE);
+	nthunks = size / callthunk_desc.thunk_size;
+	mapsize = nthunks / 8;
+
+	tmem = kzalloc(sizeof(*tmem) + mapsize, GFP_KERNEL);
+	if (!tmem)
+		goto free_area;
+	INIT_LIST_HEAD(&tmem->list);
+
+	tmem->base = __module_alloc(size, VM_HUGE_VMAP);
+	if (!tmem->base)
+		goto free_tmem;
+	memset(tmem->base, INT3_INSN_OPCODE, size);
+	tmem->size = size;
+	tmem->nthunks = nthunks;
+	list_add(&tmem->list, &thunk_mem_list);
+
+	area->tmem = tmem;
+	area->start = 0;
+	prdbg("Allocated tmem %px %x %u\n", tmem->base, size, nthunks);
+	return area;
+
+free_tmem:
+	kfree(tmem);
+free_area:
+	kfree(area);
+	return NULL;
+}
+
+static __init_or_module void callthunk_area_set_rx(struct thunk_mem_area *area)
+{
+	unsigned long base, size;
+
+	base = (unsigned long)area->tmem->base;
+	size = area->tmem->size / PAGE_SIZE;
+
+	prdbg("Set RX: %016lx %lx\n", base, size);
+	set_memory_ro(base, size);
+	set_memory_x(base, size);
+
+	area->tmem->is_rx = true;
+}
+
+static __init_or_module int callthunks_setup(struct callthunk_sites *cs,
+					     struct module_layout *layout)
+{
+	u8 *tp, *thunk, *buffer, *vbuf = NULL;
+	unsigned int nthunks, bitpos;
+	struct thunk_mem_area *area;
+	int ret, text_size, size;
+	s32 *s;
+
+	lockdep_assert_held(&text_mutex);
+
+	prdbg("Setup %s\n", layout_getname(layout));
+	/* Calculate the number of thunks required */
+	nthunks = cs->syms_end - cs->syms_start;
+
+	/*
+	 * thunk_size can be 0 when there are no intra module calls,
+	 * but there might be still sites to patch.
+	 */
+	if (!nthunks)
+		goto patch;
+
+	area = callthunks_alloc(nthunks);
+	if (!area)
+		return -ENOMEM;
+
+	bitpos = area->start;
+	thunk = area->tmem->base + bitpos * callthunk_desc.thunk_size;
+	tp = thunk;
+
+	prdbg("Thunk %px\n", tp);
+	/*
+	 * If the memory area is already RX, use a temporary
+	 * buffer. Otherwise just copy into the unused area
+	 */
+	if (!area->tmem->is_rx) {
+		prdbg("Using thunk direct\n");
+		buffer = thunk;
+	} else {
+		size = nthunks * callthunk_desc.thunk_size;
+		vbuf = vmalloc(size);
+		if (!vbuf) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+		memset(vbuf, INT3_INSN_OPCODE, size);
+		buffer = vbuf;
+		prdbg("Using thunk vbuf %px\n", vbuf);
+	}
+
+	for (s = cs->syms_start; s < cs->syms_end; s++, bitpos++) {
+		void *dest = (void *)s + *s;
+
+		ret = callthunk_setup_one(dest, tp, buffer, layout);
+		if (ret)
+			goto fail;
+		buffer += callthunk_desc.thunk_size;
+		tp += callthunk_desc.thunk_size;
+		bitmap_set(area->tmem->map, bitpos, 1);
+		area->nthunks++;
+	}
+
+	text_size = tp - thunk;
+	prdbg("Thunk %px .. %px 0x%x\n", thunk, tp, text_size);
+
+	/*
+	 * If thunk memory is already RX, poke the buffer into it.
+	 * Otherwise make the memory RX.
+	 */
+	if (vbuf)
+		text_poke_copy_locked(thunk, vbuf, text_size);
+	else
+		callthunk_area_set_rx(area);
+	sync_core();
+
+	layout->base = thunk;
+	layout->size = text_size;
+	layout->text_size = text_size;
+	layout->arch_data = area;
+
+	vfree(vbuf);
+
+patch:
+	prdbg("Patching call sites %s\n", layout_getname(layout));
+	patch_call_sites(cs->call_start, cs->call_end, layout);
+	patch_paravirt_call_sites(cs->pv_start, cs->pv_end, layout);
+	prdbg("Patching call sites done%s\n", layout_getname(layout));
+	return 0;
+
+fail:
+	WARN_ON_ONCE(ret);
+	callthunk_free(area, false);
+	vfree(vbuf);
+	return ret;
+}
+
+static __init noinline void callthunks_init(struct callthunk_sites *cs)
+{
+	int ret;
+
+	if (!callthunk_desc.template)
+		return;
+
+	if (WARN_ON_ONCE(btree_init64(&call_thunks)))
+		return;
+
+	ret = callthunks_setup(cs, &builtin_layout);
+	if (WARN_ON_ONCE(ret))
+		return;
+
+	thunks_initialized = true;
+}
+
+void __init callthunks_patch_builtin_calls(void)
+{
+	struct callthunk_sites cs = {
+		.syms_start	= __sym_sites,
+		.syms_end	= __sym_sites_end,
+		.call_start	= __call_sites,
+		.call_end	= __call_sites_end,
+		.pv_start	= __parainstructions,
+		.pv_end		= __parainstructions_end
+	};
+
+	mutex_lock(&text_mutex);
+	callthunks_init(&cs);
+	mutex_unlock(&text_mutex);
+}



* [patch 24/38] module: Add layout for callthunks tracking
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (22 preceding siblings ...)
  2022-07-16 23:17 ` [patch 23/38] x86/callthunks: Add call patching for call depth tracking Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 25/38] x86/modules: Add call thunk patching Thomas Gleixner
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Various things will need to be able to tell if a specific address is a
callthunk or not (ORC, BPF, static_call). In order to answer this
question in the face of modules it is necessary to (quickly) find the
module associated with a specific (callthunk) address.

Extend the __module_address() infrastructure with knowledge of the
(per module) callthunk range.
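
An illustrative consumer sketch (the function name is made up here; the
real ORC/BPF/static_call users arrive in later patches):

	static bool addr_is_callthunk(unsigned long addr)
	{
		struct module *mod;
		bool ret;

		/* __module_address() requires preemption to be disabled */
		preempt_disable();
		mod = __module_address(addr);
		ret = mod && within_module_thunk(addr, mod);
		preempt_enable();

		return ret;
	}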

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/module.h      |   21 +++++++++++++++++++--
 kernel/module/internal.h    |    8 ++++++++
 kernel/module/main.c        |    6 ++++++
 kernel/module/tree_lookup.c |   17 ++++++++++++++++-
 4 files changed, 49 insertions(+), 3 deletions(-)

--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -424,6 +424,9 @@ struct module {
 	/* Core layout: rbtree is accessed frequently, so keep together. */
 	struct module_layout core_layout __module_layout_align;
 	struct module_layout init_layout;
+#ifdef CONFIG_CALL_THUNKS
+	struct module_layout thunk_layout;
+#endif
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	struct module_layout data_layout;
 #endif
@@ -590,9 +593,23 @@ static inline bool within_module_init(un
 	       addr < (unsigned long)mod->init_layout.base + mod->init_layout.size;
 }
 
-static inline bool within_module(unsigned long addr, const struct module *mod)
+static inline bool within_module_thunk(unsigned long addr,
+				       const struct module *mod)
+{
+#ifdef CONFIG_CALL_THUNKS
+	return (unsigned long)mod->thunk_layout.base <= addr &&
+	       addr < (unsigned long)mod->thunk_layout.base + mod->thunk_layout.size;
+#else
+	return false;
+#endif
+}
+
+static inline bool within_module(unsigned long addr,
+				 const struct module *mod)
 {
-	return within_module_init(addr, mod) || within_module_core(addr, mod);
+	return within_module_core(addr, mod)  ||
+	       within_module_thunk(addr, mod) ||
+	       within_module_init(addr, mod);
 }
 
 /* Search for module by name: must be in a RCU-sched critical section. */
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -219,6 +219,14 @@ static inline struct module *mod_find(un
 }
 #endif /* CONFIG_MODULES_TREE_LOOKUP */
 
+#if defined(CONFIG_MODULES_TREE_LOOKUP) && defined(CONFIG_CALL_THUNKS)
+void mod_tree_insert_thunk(struct module *mod);
+void mod_tree_remove_thunk(struct module *mod);
+#else
+static inline void mod_tree_insert_thunk(struct module *mod) { }
+static inline void mod_tree_remove_thunk(struct module *mod) { }
+#endif
+
 void module_enable_ro(const struct module *mod, bool after_init);
 void module_enable_nx(const struct module *mod);
 void module_enable_x(const struct module *mod);
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1154,6 +1154,7 @@ static void free_module(struct module *m
 	 */
 	mutex_lock(&module_mutex);
 	mod->state = MODULE_STATE_UNFORMED;
+	mod_tree_remove_thunk(mod);
 	mutex_unlock(&module_mutex);
 
 	/* Remove dynamic debug info */
@@ -2770,6 +2771,10 @@ static int load_module(struct load_info
 	if (err < 0)
 		goto free_modinfo;
 
+	mutex_lock(&module_mutex);
+	mod_tree_insert_thunk(mod);
+	mutex_unlock(&module_mutex);
+
 	flush_module_icache(mod);
 
 	/* Setup CFI for the module. */
@@ -2859,6 +2864,7 @@ static int load_module(struct load_info
 	mutex_lock(&module_mutex);
 	/* Unlink carefully: kallsyms could be walking list. */
 	list_del_rcu(&mod->list);
+	mod_tree_remove_thunk(mod);
 	mod_tree_remove(mod);
 	wake_up_all(&module_wq);
 	/* Wait for RCU-sched synchronizing before releasing mod->list. */
--- a/kernel/module/tree_lookup.c
+++ b/kernel/module/tree_lookup.c
@@ -66,11 +66,26 @@ static noinline void __mod_tree_insert(s
 	latch_tree_insert(&node->node, &tree->root, &mod_tree_ops);
 }
 
-static void __mod_tree_remove(struct mod_tree_node *node, struct mod_tree_root *tree)
+static noinline void __mod_tree_remove(struct mod_tree_node *node, struct mod_tree_root *tree)
 {
 	latch_tree_erase(&node->node, &tree->root, &mod_tree_ops);
 }
 
+#ifdef CONFIG_CALL_THUNKS
+void mod_tree_insert_thunk(struct module *mod)
+{
+	mod->thunk_layout.mtn.mod = mod;
+	if (mod->thunk_layout.size)
+		__mod_tree_insert(&mod->thunk_layout.mtn, &mod_tree);
+}
+
+void mod_tree_remove_thunk(struct module *mod)
+{
+	if (mod->thunk_layout.size)
+		__mod_tree_remove(&mod->thunk_layout.mtn, &mod_tree);
+}
+#endif
+
 /*
  * These modifications: insert, remove_init and remove; are serialized by the
  * module_mutex.



* [patch 25/38] x86/modules: Add call thunk patching
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (23 preceding siblings ...)
  2022-07-16 23:17 ` [patch 24/38] module: Add layout for callthunks tracking Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 26/38] x86/returnthunk: Allow different return thunks Thomas Gleixner
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

As for the builtins, create call thunks and patch the call sites to call the
thunk on Intel SKL CPUs for retbleed mitigation.

Note that module init functions are ignored for the sake of simplicity because
loading modules is not something which is done in high-frequency loops and
the attacker does not really have a handle on when this happens in order to
launch a matching attack. The depth tracking will still work for calls into
the builtins, and because the call is not accounted it will underflow faster
and overstuff, but that's mitigated by the saturating counter and the side
effect is only temporary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/alternative.h |    7 +++++
 arch/x86/kernel/callthunks.c       |   49 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/module.c           |   29 +++++++++++++++++++++
 include/linux/module.h             |    4 +++
 4 files changed, 88 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -90,8 +90,15 @@ struct callthunk_sites {
 
 #ifdef CONFIG_CALL_THUNKS
 extern void callthunks_patch_builtin_calls(void);
+extern void callthunks_patch_module_calls(struct callthunk_sites *sites,
+					  struct module *mod);
+extern void callthunks_module_free(struct module *mod);
 #else
 static __always_inline void callthunks_patch_builtin_calls(void) {}
+static __always_inline void
+callthunks_patch_module_calls(struct callthunk_sites *sites,
+			      struct module *mod) {}
+static __always_inline void callthunks_module_free(struct module *mod) { }
 #endif
 
 #ifdef CONFIG_SMP
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -329,6 +329,20 @@ static __init_or_module void callthunk_a
 	area->tmem->is_rx = true;
 }
 
+static __init_or_module int callthunk_set_modname(struct module_layout *layout)
+{
+#ifdef CONFIG_MODULES
+	struct module *mod = layout->mtn.mod;
+
+	if (mod) {
+		mod->callthunk_name = kasprintf(GFP_KERNEL, "callthunk:%s", mod->name);
+		if (!mod->callthunk_name)
+			return -ENOMEM;
+	}
+#endif
+	return 0;
+}
+
 static __init_or_module int callthunks_setup(struct callthunk_sites *cs,
 					     struct module_layout *layout)
 {
@@ -404,6 +418,10 @@ static __init_or_module int callthunks_s
 		callthunk_area_set_rx(area);
 	sync_core();
 
+	ret = callthunk_set_modname(layout);
+	if (ret)
+		goto fail;
+
 	layout->base = thunk;
 	layout->size = text_size;
 	layout->text_size = text_size;
@@ -457,3 +475,34 @@ void __init callthunks_patch_builtin_cal
 	callthunks_init(&cs);
 	mutex_unlock(&text_mutex);
 }
+
+#ifdef CONFIG_MODULES
+void noinline callthunks_patch_module_calls(struct callthunk_sites *cs,
+					    struct module *mod)
+{
+	struct module_layout *layout = &mod->thunk_layout;
+
+	if (!thunks_initialized)
+		return;
+
+	layout->mtn.mod = mod;
+	mutex_lock(&text_mutex);
+	WARN_ON_ONCE(callthunks_setup(cs, layout));
+	mutex_unlock(&text_mutex);
+}
+
+void callthunks_module_free(struct module *mod)
+{
+	struct module_layout *layout = &mod->thunk_layout;
+	struct thunk_mem_area *area = layout->arch_data;
+
+	if (!thunks_initialized || !area)
+		return;
+
+	prdbg("Free %s\n", layout_getname(layout));
+	layout->arch_data = NULL;
+	mutex_lock(&text_mutex);
+	callthunk_free(area, true);
+	mutex_unlock(&text_mutex);
+}
+#endif /* CONFIG_MODULES */
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -196,7 +196,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
 		*para = NULL, *orc = NULL, *orc_ip = NULL,
-		*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL;
+		*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL,
+		*syms = NULL, *calls = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -216,6 +217,10 @@ int module_finalize(const Elf_Ehdr *hdr,
 			retpolines = s;
 		if (!strcmp(".return_sites", secstrings + s->sh_name))
 			returns = s;
+		if (!strcmp(".sym_sites", secstrings + s->sh_name))
+			syms = s;
+		if (!strcmp(".call_sites", secstrings + s->sh_name))
+			calls = s;
 		if (!strcmp(".ibt_endbr_seal", secstrings + s->sh_name))
 			ibt_endbr = s;
 	}
@@ -241,10 +246,31 @@ int module_finalize(const Elf_Ehdr *hdr,
 		void *aseg = (void *)alt->sh_addr;
 		apply_alternatives(aseg, aseg + alt->sh_size);
 	}
+	if (calls || syms || para) {
+		struct callthunk_sites cs = {};
+
+		if (syms) {
+			cs.syms_start = (void *)syms->sh_addr;
+			cs.syms_end = (void *)syms->sh_addr + syms->sh_size;
+		}
+
+		if (calls) {
+			cs.call_start = (void *)calls->sh_addr;
+			cs.call_end = (void *)calls->sh_addr + calls->sh_size;
+		}
+
+		if (para) {
+			cs.pv_start = (void *)para->sh_addr;
+			cs.pv_end = (void *)para->sh_addr + para->sh_size;
+		}
+
+		callthunks_patch_module_calls(&cs, me);
+	}
 	if (ibt_endbr) {
 		void *iseg = (void *)ibt_endbr->sh_addr;
 		apply_ibt_endbr(iseg, iseg + ibt_endbr->sh_size);
 	}
+
 	if (locks && text) {
 		void *lseg = (void *)locks->sh_addr;
 		void *tseg = (void *)text->sh_addr;
@@ -266,4 +292,5 @@ int module_finalize(const Elf_Ehdr *hdr,
 void module_arch_cleanup(struct module *mod)
 {
 	alternatives_smp_module_del(mod);
+	callthunks_module_free(mod);
 }
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -525,6 +525,10 @@ struct module {
 	struct pi_entry **printk_index_start;
 #endif
 
+#ifdef CONFIG_CALL_THUNKS
+	char *callthunk_name;
+#endif
+
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
 	struct list_head source_list;



* [patch 26/38] x86/returnthunk: Allow different return thunks
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (24 preceding siblings ...)
  2022-07-16 23:17 ` [patch 25/38] x86/modules: Add call thunk patching Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 27/38] x86/asm: Provide ALTERNATIVE_3 Thomas Gleixner
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

In preparation for call depth tracking on Intel SKL CPUs, make it possible
to patch in a SKL specific return thunk.
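
The mitigation selection code in a later patch can then redirect the return
thunk, roughly along these lines (sketch only; the SKL thunk itself is not
part of this patch and __x86_return_skl is used as a placeholder name):

	static void __init select_return_thunk(void)
	{
		if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH))
			x86_return_thunk = &__x86_return_skl;
	}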

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/nospec-branch.h |    6 ++++++
 arch/x86/kernel/alternative.c        |   19 ++++++++++++++-----
 arch/x86/kernel/ftrace.c             |    2 +-
 arch/x86/kernel/static_call.c        |    2 +-
 arch/x86/net/bpf_jit_comp.c          |    2 +-
 5 files changed, 23 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -168,6 +168,12 @@ extern void __x86_return_thunk(void);
 extern void zen_untrain_ret(void);
 extern void entry_ibpb(void);
 
+#ifdef CONFIG_CALL_THUNKS
+extern void (*x86_return_thunk)(void);
+#else
+#define x86_return_thunk	(&__x86_return_thunk)
+#endif
+
 #ifdef CONFIG_RETPOLINE
 
 #define GEN(reg) \
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -509,6 +509,11 @@ void __init_or_module noinline apply_ret
 }
 
 #ifdef CONFIG_RETHUNK
+
+#ifdef CONFIG_CALL_THUNKS
+void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk;
+#endif
+
 /*
  * Rewrite the compiler generated return thunk tail-calls.
  *
@@ -524,14 +529,18 @@ static int patch_return(void *addr, stru
 {
 	int i = 0;
 
-	if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
-		return -1;
-
-	bytes[i++] = RET_INSN_OPCODE;
+	if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) {
+		if (x86_return_thunk == __x86_return_thunk)
+			return -1;
+
+		i = JMP32_INSN_SIZE;
+		__text_gen_insn(bytes, JMP32_INSN_OPCODE, addr, x86_return_thunk, i);
+	} else {
+		bytes[i++] = RET_INSN_OPCODE;
+	}
 
 	for (; i < insn->length;)
 		bytes[i++] = INT3_INSN_OPCODE;
-
 	return i;
 }
 
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -358,7 +358,7 @@ create_trampoline(struct ftrace_ops *ops
 
 	ip = trampoline + size;
 	if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
-		__text_gen_insn(ip, JMP32_INSN_OPCODE, ip, &__x86_return_thunk, JMP32_INSN_SIZE);
+		__text_gen_insn(ip, JMP32_INSN_OPCODE, ip, x86_return_thunk, JMP32_INSN_SIZE);
 	else
 		memcpy(ip, retq, sizeof(retq));
 
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -52,7 +52,7 @@ static void __ref __static_call_transfor
 
 	case RET:
 		if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
-			code = text_gen_insn(JMP32_INSN_OPCODE, insn, &__x86_return_thunk);
+			code = text_gen_insn(JMP32_INSN_OPCODE, insn, x86_return_thunk);
 		else
 			code = &retinsn;
 		break;
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -430,7 +430,7 @@ static void emit_return(u8 **pprog, u8 *
 	u8 *prog = *pprog;
 
 	if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) {
-		emit_jump(&prog, &__x86_return_thunk, ip);
+		emit_jump(&prog, x86_return_thunk, ip);
 	} else {
 		EMIT1(0xC3);		/* ret */
 		if (IS_ENABLED(CONFIG_SLS))


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 27/38] x86/asm: Provide ALTERNATIVE_3
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (25 preceding siblings ...)
  2022-07-16 23:17 ` [patch 26/38] x86/returnthunk: Allow different return thunks Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 28/38] x86/retbleed: Add SKL return thunk Thomas Gleixner
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Fairly straightforward adaptation/extension of ALTERNATIVE_2.

Required for call depth tracking.
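
For reference, the branchless max() this builds on can be sanity checked
in plain C (user-space sketch, not kernel code; gas needs the extra '-'
because its "true" value is -1, whereas in C the comparison yields 1):

  #include <assert.h>

  /* C equivalent of the gas macros added below */
  #define alt_max_2(a, b)    ((a) ^ (((a) ^ (b)) & -((a) < (b))))
  #define alt_max_3(a, b, c) (alt_max_2(alt_max_2(a, b), c))

  int main(void)
  {
          assert(alt_max_2(3, 7) == 7);
          assert(alt_max_2(7, 3) == 7);
          assert(alt_max_3(1, 9, 5) == 9);
          assert(alt_max_3(4, 4, 4) == 4);
          return 0;
  }

ALTERNATIVE_3 then NOP-pads the original instruction to the maximum of the
three replacement lengths, exactly like ALTERNATIVE_2 does for two.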

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/alternative.h |   33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -367,6 +367,7 @@ static inline int alternatives_text_rese
 #define old_len			141b-140b
 #define new_len1		144f-143f
 #define new_len2		145f-144f
+#define new_len3		146f-145f
 
 /*
  * gas compatible max based on the idea from:
@@ -374,7 +375,8 @@ static inline int alternatives_text_rese
  *
  * The additional "-" is needed because gas uses a "true" value of -1.
  */
-#define alt_max_short(a, b)	((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
+#define alt_max_2(a, b)		((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
+#define alt_max_3(a, b, c)	(alt_max_2(alt_max_2(a, b), c))
 
 
 /*
@@ -386,8 +388,8 @@ static inline int alternatives_text_rese
 140:
 	\oldinstr
 141:
-	.skip -((alt_max_short(new_len1, new_len2) - (old_len)) > 0) * \
-		(alt_max_short(new_len1, new_len2) - (old_len)),0x90
+	.skip -((alt_max_2(new_len1, new_len2) - (old_len)) > 0) * \
+		(alt_max_2(new_len1, new_len2) - (old_len)),0x90
 142:
 
 	.pushsection .altinstructions,"a"
@@ -404,6 +406,31 @@ static inline int alternatives_text_rese
 	.popsection
 .endm
 
+.macro ALTERNATIVE_3 oldinstr, newinstr1, feature1, newinstr2, feature2, newinstr3, feature3
+140:
+	\oldinstr
+141:
+	.skip -((alt_max_3(new_len1, new_len2, new_len3) - (old_len)) > 0) * \
+		(alt_max_3(new_len1, new_len2, new_len3) - (old_len)),0x90
+142:
+
+	.pushsection .altinstructions,"a"
+	altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f
+	altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f
+	altinstruction_entry 140b,145f,\feature3,142b-140b,146f-145f
+	.popsection
+
+	.pushsection .altinstr_replacement,"ax"
+143:
+	\newinstr1
+144:
+	\newinstr2
+145:
+	\newinstr3
+146:
+	.popsection
+.endm
+
 /* If @feature is set, patch in @newinstr_yes, otherwise @newinstr_no. */
 #define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) \
 	ALTERNATIVE_2 oldinstr, newinstr_no, X86_FEATURE_ALWAYS,	\


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 28/38] x86/retbleed: Add SKL return thunk
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (26 preceding siblings ...)
  2022-07-16 23:17 ` [patch 27/38] x86/asm: Provide ALTERNATIVE_3 Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 29/38] x86/retpoline: Add SKL retthunk retpolines Thomas Gleixner
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

To address the Intel SKL RSB underflow issue in software it's required to
do call depth tracking.

Provide a return thunk for call depth tracking on Intel SKL CPUs.

The tracking does not use a counter. It uses arithmetic shift
right on call entry and logical shift left on return.

The depth tracking variable is initialized to 0x8000.... when the call
depth is zero. The arithmetic shift right sign extends the MSB and
saturates after the 12th call. The shift count is 5 so the tracking covers
12 nested calls. On return the variable is shifted left logically so it
becomes zero again.

 CALL	 	   	RET
 0: 0x8000000000000000	0x0000000000000000
 1: 0xfc00000000000000	0xf000000000000000
...
11: 0xfffffffffffffff8	0xfffffffffffffc00
12: 0xffffffffffffffff	0xffffffffffffffe0

After a return buffer fill the depth is credited 12 calls before the next
stuffing has to take place.

There is an inaccuracy for situations like this:

   10 calls
    5 returns
    3 calls
    4 returns
    3 calls
    ....

The shift count might cause this to be off by one in either direction, but
there is still a cushion vs. the RSB depth. The algorithm does not claim to
be perfect, but it should obfuscate the problem enough to make exploitation
extremely difficult.

The theory behind this is:

RSB is a stack with depth 16 which is filled on every call. On the return
path speculation "pops" entries to speculate down the call chain. Once the
speculative RSB is empty it switches to other predictors, e.g. the Branch
History Buffer, which can be mistrained by user space and misguide the
speculation path to a gadget.

Call depth tracking is designed to break this speculation path by stuffing
speculation trap calls into the RSB, which never get a corresponding
return executed. This stalls the prediction path until it gets resteered.

The assumption is that stuffing at the 12th return is sufficient to break
the speculation before it hits the underflow and the fallback to the other
predictors. Testing confirms that it works. Johannes, one of the retbleed
researchers, tried to attack this approach but failed.

There is obviously no scientific proof that this will withstand future
research progress, but all we can do right now is to speculate about it.

The SAR/SHL usage was suggested by Andi Kleen.
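
As a plain-C illustration of the arithmetic above (user-space sketch, not
kernel code; it mimics the INCREMENT_CALL_DEPTH / CREDIT_CALL_DEPTH macros
and the return thunk added below, and assumes '>>' on a signed value is an
arithmetic shift, as GCC provides on x86):

  #include <stdio.h>
  #include <stdint.h>

  #define RET_DEPTH_SHIFT   5
  #define RET_DEPTH_INIT    0x8000000000000000ULL
  #define RET_DEPTH_CREDIT  0xffffffffffffffffULL

  static uint64_t depth = RET_DEPTH_INIT;

  static void on_call(void)              /* INCREMENT_CALL_DEPTH */
  {
          depth = (uint64_t)((int64_t)depth >> RET_DEPTH_SHIFT);
  }

  static int on_return(void)             /* __x86_return_skl hot path */
  {
          depth <<= RET_DEPTH_SHIFT;
          if (depth)
                  return 0;              /* plain return, no stuffing */
          depth = RET_DEPTH_CREDIT;      /* CREDIT_CALL_DEPTH after stuffing */
          return 1;
  }

  int main(void)
  {
          int rets = 0;

          for (int i = 0; i < 16; i++)   /* deep call chain saturates the word */
                  on_call();

          for (int i = 0; i < 40; i++) { /* long unwind, returns only */
                  rets++;
                  if (on_return()) {
                          printf("RSB stuffed after %d returns\n", rets);
                          rets = 0;
                  }
          }
          return 0;
  }

Once the call chain unwinds, this reports a stuffing event roughly every
12-13 returns, which is the cushion versus the RSB depth of 16 mentioned
above.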

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/entry_64.S            |   10 +--
 arch/x86/include/asm/nospec-branch.h |  114 +++++++++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/common.c         |    5 +
 arch/x86/lib/retpoline.S             |   30 +++++++++
 4 files changed, 149 insertions(+), 10 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -287,6 +287,7 @@ SYM_FUNC_END(__switch_to_asm)
 SYM_CODE_START(ret_from_fork)
 	UNWIND_HINT_EMPTY
 	ANNOTATE_NOENDBR // copy_thread
+	CALL_DEPTH_ACCOUNT
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
@@ -331,7 +332,7 @@ SYM_CODE_START(xen_error_entry)
 	UNWIND_HINT_FUNC
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
-	UNTRAIN_RET
+	UNTRAIN_RET_FROM_CALL
 	RET
 SYM_CODE_END(xen_error_entry)
 
@@ -975,7 +976,7 @@ SYM_CODE_START(paranoid_entry)
 	 * CR3 above, keep the old value in a callee saved register.
 	 */
 	IBRS_ENTER save_reg=%r15
-	UNTRAIN_RET
+	UNTRAIN_RET_FROM_CALL
 
 	RET
 SYM_CODE_END(paranoid_entry)
@@ -1060,7 +1061,7 @@ SYM_CODE_START(error_entry)
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 	IBRS_ENTER
-	UNTRAIN_RET
+	UNTRAIN_RET_FROM_CALL
 
 	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
 	/* Put us onto the real thread stack. */
@@ -1095,6 +1096,7 @@ SYM_CODE_START(error_entry)
 	 */
 .Lerror_entry_done_lfence:
 	FENCE_SWAPGS_KERNEL_ENTRY
+	CALL_DEPTH_ACCOUNT
 	leaq	8(%rsp), %rax			/* return pt_regs pointer */
 	ANNOTATE_UNRET_END
 	RET
@@ -1113,7 +1115,7 @@ SYM_CODE_START(error_entry)
 	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 	IBRS_ENTER
-	UNTRAIN_RET
+	UNTRAIN_RET_FROM_CALL
 
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -11,8 +11,53 @@
 #include <asm/cpufeatures.h>
 #include <asm/msr-index.h>
 #include <asm/unwind_hints.h>
+#include <asm/percpu.h>
 
 #define RETPOLINE_THUNK_SIZE	32
+#define RSB_CLEAR_LOOPS		32	/* To forcibly overwrite all entries */
+
+/*
+ * Call depth tracking for Intel SKL CPUs to address the RSB underflow
+ * issue in software.
+ *
+ * The tracking does not use a counter. It uses arithmetic shift
+ * right on call entry and logical shift left on return.
+ *
+ * The depth tracking variable is initialized to 0x8000.... when the call
+ * depth is zero. The arithmetic shift right sign extends the MSB and
+ * saturates after the 12th call. The shift count is 5 for both directions
+ * so the tracking covers 12 nested calls.
+ *
+ *  Call
+ *  0: 0x8000000000000000	0x0000000000000000
+ *  1: 0xfc00000000000000	0xf000000000000000
+ * ...
+ * 11: 0xfffffffffffffff8	0xfffffffffffffc00
+ * 12: 0xffffffffffffffff	0xffffffffffffffe0
+ *
+ * After a return buffer fill the depth is credited 12 calls before the
+ * next stuffing has to take place.
+ *
+ * There is an inaccuracy for situations like this:
+ *
+ *  10 calls
+ *   5 returns
+ *   3 calls
+ *   4 returns
+ *   3 calls
+ *   ....
+ *
+ * The shift count might cause this to be off by one in either direction,
+ * but there is still a cushion vs. the RSB depth. The algorithm does not
+ * claim to be perfect and it can be speculated around by the CPU, but it
+ * is considered that it obfuscates the problem enough to make exploitation
+ * extremely difficult.
+ */
+#define RET_DEPTH_SHIFT			5
+#define RSB_RET_STUFF_LOOPS		16
+#define RET_DEPTH_INIT			0x8000000000000000ULL
+#define RET_DEPTH_INIT_FROM_CALL	0xfc00000000000000ULL
+#define RET_DEPTH_CREDIT		0xffffffffffffffffULL
 
 /*
  * Fill the CPU return stack buffer.
@@ -31,7 +76,28 @@
  * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
  */
 
-#define RSB_CLEAR_LOOPS		32	/* To forcibly overwrite all entries */
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+#define CREDIT_CALL_DEPTH					\
+	movq	$-1, PER_CPU_VAR(__x86_call_depth);
+
+#define RESET_CALL_DEPTH					\
+	mov	$0x80, %rax;					\
+	shl	$56, %rax;					\
+	movq	%rax, PER_CPU_VAR(__x86_call_depth);
+
+#define RESET_CALL_DEPTH_FROM_CALL				\
+	mov	$0xfc, %rax;					\
+	shl	$56, %rax;					\
+	movq	%rax, PER_CPU_VAR(__x86_call_depth);
+
+#define INCREMENT_CALL_DEPTH					\
+	sarq	$5, %gs:__x86_call_depth
+#else
+#define CREDIT_CALL_DEPTH
+#define RESET_CALL_DEPTH
+#define INCREMENT_CALL_DEPTH
+#define RESET_CALL_DEPTH_FROM_CALL
+#endif
 
 /*
  * Google experimented with loop-unrolling and this turned out to be
@@ -59,7 +125,9 @@
 774:						\
 	add	$(BITS_PER_LONG/8) * 2, sp;	\
 	dec	reg;				\
-	jnz	771b;
+	jnz	771b;				\
+						\
+	CREDIT_CALL_DEPTH
 
 #ifdef __ASSEMBLY__
 
@@ -145,11 +213,32 @@
  * where we have a stack but before any RET instruction.
  */
 .macro UNTRAIN_RET
-#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY)
+#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
+	defined(CONFIG_X86_FEATURE_CALL_DEPTH)
 	ANNOTATE_UNRET_END
-	ALTERNATIVE_2 "",						\
-	              CALL_ZEN_UNTRAIN_RET, X86_FEATURE_UNRET,		\
-		      "call entry_ibpb", X86_FEATURE_ENTRY_IBPB
+	ALTERNATIVE_3 "",						\
+		      CALL_ZEN_UNTRAIN_RET, X86_FEATURE_UNRET,		\
+		      "call entry_ibpb", X86_FEATURE_ENTRY_IBPB,	\
+		      __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
+#endif
+.endm
+
+.macro UNTRAIN_RET_FROM_CALL
+#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
+	defined(CONFIG_X86_FEATURE_CALL_DEPTH)
+	ANNOTATE_UNRET_END
+	ALTERNATIVE_3 "",						\
+		      CALL_ZEN_UNTRAIN_RET, X86_FEATURE_UNRET,		\
+		      "call entry_ibpb", X86_FEATURE_ENTRY_IBPB,	\
+		      __stringify(RESET_CALL_DEPTH_FROM_CALL), X86_FEATURE_CALL_DEPTH
+#endif
+.endm
+
+
+.macro CALL_DEPTH_ACCOUNT
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+	ALTERNATIVE "",							\
+		    __stringify(INCREMENT_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
 #endif
 .endm
 
@@ -174,6 +263,19 @@ extern void (*x86_return_thunk)(void);
 #define x86_return_thunk	(&__x86_return_thunk)
 #endif
 
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+extern void __x86_return_skl(void);
+
+static inline void x86_set_skl_return_thunk(void)
+{
+	x86_return_thunk = &__x86_return_skl;
+}
+
+DECLARE_PER_CPU(u64, __x86_call_depth);
+#else
+static inline void x86_set_skl_return_thunk(void) {}
+#endif
+
 #ifdef CONFIG_RETPOLINE
 
 #define GEN(reg) \
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2002,6 +2002,11 @@ EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
 DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
 
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+DEFINE_PER_CPU(u64, __x86_call_depth);
+EXPORT_PER_CPU_SYMBOL_GPL(__x86_call_depth);
+#endif
+
 static void wrmsrl_cstar(unsigned long val)
 {
 	/*
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -8,6 +8,7 @@
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
 #include <asm/unwind_hints.h>
+#include <asm/percpu.h>
 #include <asm/frame.h>
 
 	.section .text.__x86.indirect_thunk
@@ -140,3 +141,32 @@ SYM_FUNC_END(zen_untrain_ret)
 EXPORT_SYMBOL(__x86_return_thunk)
 
 #endif /* CONFIG_RETHUNK */
+
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+
+	.align 64
+SYM_FUNC_START(__x86_return_skl)
+	ANNOTATE_NOENDBR
+	/* Keep the hotpath in a 16byte I-fetch */
+	shlq	$5, PER_CPU_VAR(__x86_call_depth)
+	jz	1f
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+1:
+	.rept	16
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call	2f
+	int3
+2:
+	.endr
+	add	$(8*16), %rsp
+
+	CREDIT_CALL_DEPTH
+
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+SYM_FUNC_END(__x86_return_skl)
+
+#endif /* CONFIG_CALL_DEPTH_TRACKING */


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 29/38] x86/retpoline: Add SKL retthunk retpolines
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (27 preceding siblings ...)
  2022-07-16 23:17 ` [patch 28/38] x86/retbleed: Add SKL return thunk Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:17 ` [patch 30/38] x86/retbleed: Add SKL call thunk Thomas Gleixner
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Ensure that retpolines do the proper call accounting so that the return
accounting works correctly.

Specifically: retpolines are used to replace both 'jmp *%reg' and
'call *%reg'; however, these two cases do not have the same accounting
requirements. Therefore split things up and provide two different
retpoline arrays for SKL.

The 'jmp *%reg' case needs no accounting, the
__x86_indirect_jump_thunk_array[] covers this. The retpoline is
changed to not use the return thunk; it's a simple call;ret construct.

[ strictly speaking it should do:
	andq $(~0x1f), PER_CPU_VAR(__x86_call_depth)
  but we can argue this can be covered by the fuzz we already have
  in the accounting depth (12) vs the RSB depth (16) ]

The 'call *%reg' case does need accounting, the
__x86_indirect_call_thunk_array[] covers this. Again, this retpoline
avoids the use of the return-thunk, in this case to avoid double
accounting.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/nospec-branch.h |   12 +++++
 arch/x86/kernel/alternative.c        |   43 +++++++++++++++++++--
 arch/x86/lib/retpoline.S             |   71 +++++++++++++++++++++++++++++++----
 arch/x86/net/bpf_jit_comp.c          |    5 +-
 4 files changed, 119 insertions(+), 12 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -252,6 +252,8 @@
 
 typedef u8 retpoline_thunk_t[RETPOLINE_THUNK_SIZE];
 extern retpoline_thunk_t __x86_indirect_thunk_array[];
+extern retpoline_thunk_t __x86_indirect_call_thunk_array[];
+extern retpoline_thunk_t __x86_indirect_jump_thunk_array[];
 
 extern void __x86_return_thunk(void);
 extern void zen_untrain_ret(void);
@@ -283,6 +285,16 @@ static inline void x86_set_skl_return_th
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+#define GEN(reg)						\
+	extern retpoline_thunk_t __x86_indirect_call_thunk_ ## reg;
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
+#define GEN(reg)						\
+	extern retpoline_thunk_t __x86_indirect_jump_thunk_ ## reg;
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
 #ifdef CONFIG_X86_64
 
 /*
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -377,6 +377,38 @@ static int emit_indirect(int op, int reg
 	return i;
 }
 
+static int emit_call_track_retpoline(void *addr, struct insn *insn, int reg, u8 *bytes)
+{
+	u8 op = insn->opcode.bytes[0];
+	int i = 0;
+
+	if (insn->length == 6)
+		bytes[i++] = 0x2e; /* CS-prefix */
+
+	switch (op) {
+	case CALL_INSN_OPCODE:
+		__text_gen_insn(bytes+i, op, addr+i,
+				__x86_indirect_call_thunk_array[reg],
+				CALL_INSN_SIZE);
+		i += CALL_INSN_SIZE;
+		break;
+
+	case JMP32_INSN_OPCODE:
+		__text_gen_insn(bytes+i, op, addr+i,
+				__x86_indirect_jump_thunk_array[reg],
+				JMP32_INSN_SIZE);
+		i += JMP32_INSN_SIZE;
+		break;
+
+	default:
+		BUG();
+	}
+
+	WARN_ON_ONCE(i != insn->length);
+
+	return i;
+}
+
 /*
  * Rewrite the compiler generated retpoline thunk calls.
  *
@@ -408,11 +440,16 @@ static int patch_retpoline(void *addr, s
 	/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
 	BUG_ON(reg == 4);
 
+	op = insn->opcode.bytes[0];
+
 	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
-	    !cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE))
+	    !cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE)) {
+		if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH)) {
+			i += emit_call_track_retpoline(addr, insn, reg, bytes);
+			return i;
+		}
 		return -1;
-
-	op = insn->opcode.bytes[0];
+	}
 
 	/*
 	 * Convert:
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -13,17 +13,18 @@
 
 	.section .text.__x86.indirect_thunk
 
-.macro RETPOLINE reg
+
+.macro POLINE reg
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call    .Ldo_rop_\@
-.Lspec_trap_\@:
-	UNWIND_HINT_EMPTY
-	pause
-	lfence
-	jmp .Lspec_trap_\@
+	int3
 .Ldo_rop_\@:
 	mov     %\reg, (%_ASM_SP)
 	UNWIND_HINT_FUNC
+.endm
+
+.macro RETPOLINE reg
+	POLINE \reg
 	RET
 .endm
 
@@ -53,7 +54,6 @@ SYM_INNER_LABEL(__x86_indirect_thunk_\re
  */
 
 #define __EXPORT_THUNK(sym)	_ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
-#define EXPORT_THUNK(reg)	__EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
 
 	.align RETPOLINE_THUNK_SIZE
 SYM_CODE_START(__x86_indirect_thunk_array)
@@ -65,10 +65,65 @@ SYM_CODE_START(__x86_indirect_thunk_arra
 	.align RETPOLINE_THUNK_SIZE
 SYM_CODE_END(__x86_indirect_thunk_array)
 
-#define GEN(reg) EXPORT_THUNK(reg)
+#define GEN(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
+#ifdef CONFIG_CALL_DEPTH_TRACKING
+.macro CALL_THUNK reg
+	.align RETPOLINE_THUNK_SIZE
+
+SYM_INNER_LABEL(__x86_indirect_call_thunk_\reg, SYM_L_GLOBAL)
+	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
+
+	CALL_DEPTH_ACCOUNT
+	POLINE \reg
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+.endm
+
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_START(__x86_indirect_call_thunk_array)
+
+#define GEN(reg) CALL_THUNK reg
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_END(__x86_indirect_call_thunk_array)
+
+#define GEN(reg) __EXPORT_THUNK(__x86_indirect_call_thunk_ ## reg)
 #include <asm/GEN-for-each-reg.h>
 #undef GEN
 
+.macro JUMP_THUNK reg
+	.align RETPOLINE_THUNK_SIZE
+
+SYM_INNER_LABEL(__x86_indirect_jump_thunk_\reg, SYM_L_GLOBAL)
+	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
+	POLINE \reg
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+.endm
+
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_START(__x86_indirect_jump_thunk_array)
+
+#define GEN(reg) JUMP_THUNK reg
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
+	.align RETPOLINE_THUNK_SIZE
+SYM_CODE_END(__x86_indirect_jump_thunk_array)
+
+#define GEN(reg) __EXPORT_THUNK(__x86_indirect_jump_thunk_ ## reg)
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+#endif
 /*
  * This function name is magical and is used by -mfunction-return=thunk-extern
  * for the compiler to generate JMPs to it.
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -417,7 +417,10 @@ static void emit_indirect_jump(u8 **ppro
 		EMIT2(0xFF, 0xE0 + reg);
 	} else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
 		OPTIMIZER_HIDE_VAR(reg);
-		emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+		if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH))
+			emit_jump(&prog, &__x86_indirect_jump_thunk_array[reg], ip);
+		else
+			emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
 	} else {
 		EMIT2(0xFF, 0xE0 + reg);
 	}


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 30/38] x86/retbleed: Add SKL call thunk
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (28 preceding siblings ...)
  2022-07-16 23:17 ` [patch 29/38] x86/retpoline: Add SKL retthunk retpolines Thomas Gleixner
@ 2022-07-16 23:17 ` Thomas Gleixner
  2022-07-16 23:18 ` [patch 31/38] x86/calldepth: Add ret/call counting for debug Thomas Gleixner
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

Add the actual SKL call thunk for call depth accounting.
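
The layout of one thunk slot follows from the size macros in the patch:
the copied accounting template, a direct JMP back to the patched function,
an INT3 and padding up to a power of two. A quick plain-C sketch of that
arithmetic (the template size of 10 bytes is an assumption for
illustration only; the real value is measured from the assembled
skl_call_thunk_template, and roundup_pow_of_two() here is a trivial
stand-in for the kernel helper):

  #include <stdio.h>

  #define JMP32_INSN_SIZE 5
  #define INT3_INSN_SIZE  1

  static unsigned int roundup_pow_of_two(unsigned int x)
  {
          unsigned int r = 1;

          while (r < x)
                  r <<= 1;
          return r;
  }

  int main(void)
  {
          unsigned int tmpl = 10;   /* assumed: sarq $5, %gs:__x86_call_depth */
          unsigned int code = tmpl + JMP32_INSN_SIZE + INT3_INSN_SIZE;

          printf("code %u bytes, thunk slot %u bytes\n",
                 code, roundup_pow_of_two(code));
          return 0;
  }

The power-of-two slot size is what later allows mapping any address inside
a thunk back to its start by masking with (thunk_size - 1).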

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/callthunks.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -52,6 +52,24 @@ struct thunk_desc {
 
 static struct thunk_desc callthunk_desc __ro_after_init;
 
+asm (
+	".pushsection .rodata				\n"
+	".global skl_call_thunk_template		\n"
+	"skl_call_thunk_template:			\n"
+		__stringify(INCREMENT_CALL_DEPTH)"	\n"
+	".global skl_call_thunk_tail			\n"
+	"skl_call_thunk_tail:				\n"
+	".popsection					\n"
+);
+
+extern u8 skl_call_thunk_template[];
+extern u8 skl_call_thunk_tail[];
+
+#define SKL_TMPL_SIZE \
+	((unsigned int)(skl_call_thunk_tail - skl_call_thunk_template))
+#define SKL_CALLTHUNK_CODE_SIZE	(SKL_TMPL_SIZE + JMP32_INSN_SIZE + INT3_INSN_SIZE)
+#define SKL_CALLTHUNK_SIZE	roundup_pow_of_two(SKL_CALLTHUNK_CODE_SIZE)
+
 struct thunk_mem {
 	void			*base;
 	unsigned int		size;
@@ -447,6 +465,12 @@ static __init noinline void callthunks_i
 {
 	int ret;
 
+	if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH)) {
+		callthunk_desc.template = skl_call_thunk_template;
+		callthunk_desc.template_size = SKL_TMPL_SIZE;
+		callthunk_desc.thunk_size = SKL_CALLTHUNK_SIZE;
+	}
+
 	if (!callthunk_desc.template)
 		return;
 


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 31/38] x86/calldepth: Add ret/call counting for debug
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (29 preceding siblings ...)
  2022-07-16 23:17 ` [patch 30/38] x86/retbleed: Add SKL call thunk Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-16 23:18 ` [patch 32/38] static_call: Add call depth tracking support Thomas Gleixner
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

Add a debugfs mechanism to validate the accounting, e.g. the call/ret
balance, and to gather statistics about the stuffing to call ratio.
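
With this enabled, each possible CPU gets a file under debugfs; reading it
produces one line in the format of the seq_printf() below, e.g.
(hypothetical numbers):

  # cat /sys/kernel/debug/callthunks/cpu0
  C:         12345678 R:         12345601 S:            10240 X:             4096

where C/R are the accounted calls/returns, S the number of RSB stuffing
events and X the number of context switches.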

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/nospec-branch.h |   32 +++++++++++++++++++--
 arch/x86/kernel/callthunks.c         |   51 +++++++++++++++++++++++++++++++++++
 arch/x86/lib/retpoline.S             |    7 ++++
 3 files changed, 86 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -77,6 +77,23 @@
  */
 
 #ifdef CONFIG_CALL_DEPTH_TRACKING
+
+#ifdef CONFIG_CALL_THUNKS_DEBUG
+# define CALL_THUNKS_DEBUG_INC_CALLS				\
+	incq	%gs:__x86_call_count;
+# define CALL_THUNKS_DEBUG_INC_RETS				\
+	incq	%gs:__x86_ret_count;
+# define CALL_THUNKS_DEBUG_INC_STUFFS				\
+	incq	%gs:__x86_stuffs_count;
+# define CALL_THUNKS_DEBUG_INC_CTXSW				\
+	incq	%gs:__x86_ctxsw_count;
+#else
+# define CALL_THUNKS_DEBUG_INC_CALLS
+# define CALL_THUNKS_DEBUG_INC_RETS
+# define CALL_THUNKS_DEBUG_INC_STUFFS
+# define CALL_THUNKS_DEBUG_INC_CTXSW
+#endif
+
 #define CREDIT_CALL_DEPTH					\
 	movq	$-1, PER_CPU_VAR(__x86_call_depth);
 
@@ -88,10 +105,12 @@
 #define RESET_CALL_DEPTH_FROM_CALL				\
 	mov	$0xfc, %rax;					\
 	shl	$56, %rax;					\
-	movq	%rax, PER_CPU_VAR(__x86_call_depth);
+	movq	%rax, PER_CPU_VAR(__x86_call_depth);		\
+	CALL_THUNKS_DEBUG_INC_CALLS
 
 #define INCREMENT_CALL_DEPTH					\
-	sarq	$5, %gs:__x86_call_depth
+	sarq	$5, %gs:__x86_call_depth;			\
+	CALL_THUNKS_DEBUG_INC_CALLS
 #else
 #define CREDIT_CALL_DEPTH
 #define RESET_CALL_DEPTH
@@ -127,7 +146,8 @@
 	dec	reg;				\
 	jnz	771b;				\
 						\
-	CREDIT_CALL_DEPTH
+	CREDIT_CALL_DEPTH			\
+	CALL_THUNKS_DEBUG_INC_CTXSW
 
 #ifdef __ASSEMBLY__
 
@@ -274,6 +294,12 @@ static inline void x86_set_skl_return_th
 }
 
 DECLARE_PER_CPU(u64, __x86_call_depth);
+#ifdef CONFIG_CALL_THUNKS_DEBUG
+DECLARE_PER_CPU(u64, __x86_call_count);
+DECLARE_PER_CPU(u64, __x86_ret_count);
+DECLARE_PER_CPU(u64, __x86_stuffs_count);
+DECLARE_PER_CPU(u64, __x86_ctxsw_count);
+#endif
 #else
 static inline void x86_set_skl_return_thunk(void) {}
 #endif
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -3,6 +3,7 @@
 #define pr_fmt(fmt) "callthunks: " fmt
 
 #include <linux/btree.h>
+#include <linux/debugfs.h>
 #include <linux/memory.h>
 #include <linux/moduleloader.h>
 #include <linux/set_memory.h>
@@ -32,6 +33,13 @@ static int __init debug_thunks(char *str
 	return 1;
 }
 __setup("debug-callthunks", debug_thunks);
+
+DEFINE_PER_CPU(u64, __x86_call_count);
+DEFINE_PER_CPU(u64, __x86_ret_count);
+DEFINE_PER_CPU(u64, __x86_stuffs_count);
+DEFINE_PER_CPU(u64, __x86_ctxsw_count);
+EXPORT_SYMBOL_GPL(__x86_ctxsw_count);
+
 #else
 #define prdbg(fmt, args...)	do { } while(0)
 #endif
@@ -530,3 +538,46 @@ void callthunks_module_free(struct modul
 	mutex_unlock(&text_mutex);
 }
 #endif /* CONFIG_MODULES */
+
+#if defined(CONFIG_CALL_THUNKS_DEBUG) && defined(CONFIG_DEBUG_FS)
+static int callthunks_debug_show(struct seq_file *m, void *p)
+{
+	unsigned long cpu = (unsigned long)m->private;
+
+	seq_printf(m, "C: %16llu R: %16llu S: %16llu X: %16llu\n,",
+		   per_cpu(__x86_call_count, cpu),
+		   per_cpu(__x86_ret_count, cpu),
+		   per_cpu(__x86_stuffs_count, cpu),
+		   per_cpu(__x86_ctxsw_count, cpu));
+	return 0;
+}
+
+static int callthunks_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, callthunks_debug_show, inode->i_private);
+}
+
+static const struct file_operations dfs_ops = {
+	.open		= callthunks_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __init callthunks_debugfs_init(void)
+{
+	struct dentry *dir;
+	unsigned long cpu;
+
+	dir = debugfs_create_dir("callthunks", NULL);
+	for_each_possible_cpu(cpu) {
+		void *arg = (void *)cpu;
+		char name [10];
+
+		sprintf(name, "cpu%lu", cpu);
+		debugfs_create_file(name, 0644, dir, arg, &dfs_ops);
+	}
+	return 0;
+}
+__initcall(callthunks_debugfs_init);
+#endif
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -202,13 +202,18 @@ EXPORT_SYMBOL(__x86_return_thunk)
 	.align 64
 SYM_FUNC_START(__x86_return_skl)
 	ANNOTATE_NOENDBR
-	/* Keep the hotpath in a 16byte I-fetch */
+	/*
+	 * Keep the hotpath in a 16byte I-fetch for the non-debug
+	 * case.
+	 */
+	CALL_THUNKS_DEBUG_INC_RETS
 	shlq	$5, PER_CPU_VAR(__x86_call_depth)
 	jz	1f
 	ANNOTATE_UNRET_SAFE
 	ret
 	int3
 1:
+	CALL_THUNKS_DEBUG_INC_STUFFS
 	.rept	16
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call	2f


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 32/38] static_call: Add call depth tracking support
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (30 preceding siblings ...)
  2022-07-16 23:18 ` [patch 31/38] x86/calldepth: Add ret/call counting for debug Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-16 23:18 ` [patch 33/38] kallsyms: Take callthunks into account Thomas Gleixner
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

When indirect calls are switched to direct calls, it has to be ensured
that the call target is not the function itself but its call thunk when
call depth tracking is enabled. Static calls, however, are available
before call thunks have been set up.

Ensure a second run through the static call patching code after call thunks
have been created. When call thunks are not enabled this has no side
effects.
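
The gate in static_call_init() thereby becomes a tiny state machine: 0
means not yet initialized, 1 means initialized, anything above 1 means a
forced re-run is pending. A plain-C sketch of that logic (not kernel code,
names shortened):

  #include <stdio.h>

  static int initialized;

  static void force_reinit(void)
  {
          if (initialized)
                  initialized++;        /* request one more pass */
  }

  static void do_init(void)
  {
          if (initialized == 1)         /* done, no re-run requested */
                  return;
          puts("patch all static call sites");
          if (!initialized)
                  puts("register module notifier (first pass only)");
          initialized = 1;
  }

  int main(void)
  {
          do_init();        /* early boot pass                      */
          force_reinit();   /* callthunks_init() requests a re-run  */
          do_init();        /* second pass picks up the call thunks */
          do_init();        /* further invocations are no-ops       */
          return 0;
  }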

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/alternative.h |    5 +++++
 arch/x86/kernel/callthunks.c       |   37 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/static_call.c      |    1 +
 include/linux/static_call.h        |    2 ++
 kernel/static_call_inline.c        |   23 ++++++++++++++++++-----
 5 files changed, 63 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -93,12 +93,17 @@ extern void callthunks_patch_builtin_cal
 extern void callthunks_patch_module_calls(struct callthunk_sites *sites,
 					  struct module *mod);
 extern void callthunks_module_free(struct module *mod);
+extern void *callthunks_translate_call_dest(void *dest);
 #else
 static __always_inline void callthunks_patch_builtin_calls(void) {}
 static __always_inline void
 callthunks_patch_module_calls(struct callthunk_sites *sites,
 			      struct module *mod) {}
 static __always_inline void callthunks_module_free(struct module *mod) { }
+static __always_inline void *callthunks_translate_call_dest(void *dest)
+{
+	return dest;
+}
 #endif
 
 #ifdef CONFIG_SMP
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -7,6 +7,7 @@
 #include <linux/memory.h>
 #include <linux/moduleloader.h>
 #include <linux/set_memory.h>
+#include <linux/static_call.h>
 #include <linux/vmalloc.h>
 
 #include <asm/alternative.h>
@@ -492,6 +493,7 @@ static __init noinline void callthunks_i
 	if (WARN_ON_ONCE(ret))
 		return;
 
+	static_call_force_reinit();
 	thunks_initialized = true;
 }
 
@@ -511,6 +513,41 @@ void __init callthunks_patch_builtin_cal
 	mutex_unlock(&text_mutex);
 }
 
+static bool is_module_init_dest(void *dest)
+{
+	bool ret = false;
+
+#ifdef CONFIG_MODULES
+	struct module *mod;
+
+	preempt_disable();
+	mod = __module_address((unsigned long)dest);
+	if (mod && within_module_init((unsigned long)dest, mod))
+		ret = true;
+	preempt_enable();
+#endif
+	return ret;
+}
+
+void *callthunks_translate_call_dest(void *dest)
+{
+	void *thunk;
+
+	lockdep_assert_held(&text_mutex);
+
+	if (!thunks_initialized || skip_addr(dest))
+		return dest;
+
+	thunk = btree_lookup64(&call_thunks, (unsigned long)dest);
+
+	if (thunk)
+		return thunk;
+
+	WARN_ON_ONCE(!is_kernel_inittext((unsigned long)dest) &&
+		     !is_module_init_dest(dest));
+	return dest;
+}
+
 #ifdef CONFIG_MODULES
 void noinline callthunks_patch_module_calls(struct callthunk_sites *cs,
 					    struct module *mod)
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -34,6 +34,7 @@ static void __ref __static_call_transfor
 
 	switch (type) {
 	case CALL:
+		func = callthunks_translate_call_dest(func);
 		code = text_gen_insn(CALL_INSN_OPCODE, insn, func);
 		if (func == &__static_call_return0) {
 			emulate = code;
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -162,6 +162,8 @@ extern void arch_static_call_transform(v
 
 extern int __init static_call_init(void);
 
+extern void static_call_force_reinit(void);
+
 struct static_call_mod {
 	struct static_call_mod *next;
 	struct module *mod; /* for vmlinux, mod == NULL */
--- a/kernel/static_call_inline.c
+++ b/kernel/static_call_inline.c
@@ -15,7 +15,18 @@ extern struct static_call_site __start_s
 extern struct static_call_tramp_key __start_static_call_tramp_key[],
 				    __stop_static_call_tramp_key[];
 
-static bool static_call_initialized;
+static int static_call_initialized;
+
+/*
+ * Must be called before early_initcall() to be effective.
+ */
+void static_call_force_reinit(void)
+{
+	if (WARN_ON_ONCE(!static_call_initialized))
+		return;
+
+	static_call_initialized++;
+}
 
 /* mutex to protect key modules/sites */
 static DEFINE_MUTEX(static_call_mutex);
@@ -475,7 +486,8 @@ int __init static_call_init(void)
 {
 	int ret;
 
-	if (static_call_initialized)
+	/* See static_call_force_reinit(). */
+	if (static_call_initialized == 1)
 		return 0;
 
 	cpus_read_lock();
@@ -490,11 +502,12 @@ int __init static_call_init(void)
 		BUG();
 	}
 
-	static_call_initialized = true;
-
 #ifdef CONFIG_MODULES
-	register_module_notifier(&static_call_module_nb);
+	if (!static_call_initialized)
+		register_module_notifier(&static_call_module_nb);
 #endif
+
+	static_call_initialized = 1;
 	return 0;
 }
 early_initcall(static_call_init);


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 33/38] kallsyms: Take callthunks into account
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (31 preceding siblings ...)
  2022-07-16 23:18 ` [patch 32/38] static_call: Add call depth tracking support Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-16 23:18 ` [patch 34/38] x86/orc: Make it callthunk aware Thomas Gleixner
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel),
	Masami Hiramatsu

From: Peter Zijlstra <peterz@infradead.org>

Similar to ftrace and bpf, call thunks create symbols which are
interesting for things like printing stack-traces, perf, live-patching
and things like that.

Add the required functionality to the core and implement it in x86.

Callthunks will report the same function name as their target, but
their module name will be "callthunk" or "callthunk:${modname}" for
modules.
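
For example, a thunk for a built-in function and one for a module function
would show up in /proc/kallsyms roughly like this (addresses and symbol
names are made up):

  ffffffffc0a00010 t vfs_read	[callthunk]
  ffffffffc0a01040 t e1000_xmit_frame	[callthunk:e1000e]

i.e. the target's own name combined with the synthetic module name
described above.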

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/callthunks.c |  155 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/kallsyms.h     |   24 ++++++
 kernel/kallsyms.c            |   23 ++++++
 3 files changed, 202 insertions(+)

--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -4,6 +4,7 @@
 
 #include <linux/btree.h>
 #include <linux/debugfs.h>
+#include <linux/kallsyms.h>
 #include <linux/memory.h>
 #include <linux/moduleloader.h>
 #include <linux/set_memory.h>
@@ -548,6 +549,160 @@ void *callthunks_translate_call_dest(voi
 	return dest;
 }
 
+static bool is_module_callthunk(void *addr)
+{
+	bool ret = false;
+
+#ifdef CONFIG_MODULES
+	struct module *mod;
+
+	preempt_disable();
+	mod = __module_address((unsigned long)addr);
+	if (mod && within_module_thunk((unsigned long)addr, mod))
+		ret = true;
+	preempt_enable();
+#endif
+	return ret;
+}
+
+static bool is_callthunk(void *addr)
+{
+	if (builtin_layout.base <= addr &&
+	    addr < builtin_layout.base + builtin_layout.size)
+		return true;
+	return is_module_callthunk(addr);
+}
+
+static void *__callthunk_dest(void *addr)
+{
+	unsigned long mask = callthunk_desc.thunk_size - 1;
+	void *thunk;
+
+	thunk = (void *)((unsigned long)addr & ~mask);
+	thunk += callthunk_desc.template_size;
+	return jump_get_dest(thunk);
+}
+
+static void *callthunk_dest(void *addr)
+{
+	if (!is_callthunk(addr))
+		return NULL;
+	return __callthunk_dest(addr);
+}
+
+static void set_modname(char **modname, unsigned long addr)
+{
+	if (!modname || !IS_ENABLED(CONFIG_MODULES))
+		*modname = "callthunk";
+
+#ifdef CONFIG_MODULES
+	else {
+		struct module * mod;
+
+		preempt_disable();
+		mod = __module_address(addr);
+		*modname = mod->callthunk_name;
+		preempt_enable();
+	}
+#endif
+}
+
+const char *
+callthunk_address_lookup(unsigned long addr, unsigned long *size,
+			 unsigned long *off, char **modname, char *sym)
+{
+	unsigned long dest, mask = callthunk_desc.thunk_size - 1;
+	const char *ret;
+
+	if (!thunks_initialized)
+		return NULL;
+
+	dest = (unsigned long)callthunk_dest((void *)addr);
+	if (!dest)
+		return NULL;
+
+	ret = kallsyms_lookup(dest, size, off, modname, sym);
+	if (!ret)
+		return NULL;
+
+	*off = addr & mask;
+	*size = callthunk_desc.thunk_size;
+
+	set_modname(modname, addr);
+	return ret;
+}
+
+static int get_module_thunk(char **modname, struct module_layout **layoutp,
+			    unsigned int symthunk)
+{
+#ifdef CONFIG_MODULES
+	extern struct list_head modules;
+	struct module *mod;
+	unsigned int size;
+
+	symthunk -= (*layoutp)->text_size;
+	list_for_each_entry_rcu(mod, &modules, list) {
+		if (mod->state == MODULE_STATE_UNFORMED)
+			continue;
+
+		*layoutp = &mod->thunk_layout;
+		size = mod->thunk_layout.text_size;
+
+		if (symthunk >= size) {
+			symthunk -= size;
+			continue;
+		}
+		*modname = mod->callthunk_name;
+		return symthunk;
+	}
+#endif
+	return -ERANGE;
+}
+
+int callthunk_get_kallsym(unsigned int symnum, unsigned long *value,
+			  char *type, char *name, char *module_name,
+			  int *exported)
+{
+	int symthunk = symnum * callthunk_desc.thunk_size;
+	struct module_layout *layout = &builtin_layout;
+	char *modname = "callthunk";
+	void *thunk, *dest;
+	int ret = -ERANGE;
+
+	if (!thunks_initialized)
+		return -ERANGE;
+
+	preempt_disable();
+
+	if (symthunk >= layout->text_size) {
+		symthunk = get_module_thunk(&modname, &layout, symthunk);
+		if (symthunk < 0)
+			goto out;
+	}
+
+	thunk = layout->base + symthunk;
+	dest = __callthunk_dest(thunk);
+
+	if (!dest) {
+		strlcpy(name, "(unknown callthunk)", KSYM_NAME_LEN);
+		ret = 0;
+		goto out;
+	}
+
+	ret = lookup_symbol_name((unsigned long)dest, name);
+	if (ret)
+		goto out;
+
+	*value = (unsigned long)thunk;
+	*exported = 0;
+	*type = 't';
+	strlcpy(module_name, modname, MODULE_NAME_LEN);
+
+out:
+	preempt_enable();
+	return ret;
+}
+
 #ifdef CONFIG_MODULES
 void noinline callthunks_patch_module_calls(struct callthunk_sites *cs,
 					    struct module *mod)
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -65,6 +65,30 @@ static inline void *dereference_symbol_d
 	return ptr;
 }
 
+#ifdef CONFIG_CALL_THUNKS
+extern const char *
+callthunk_address_lookup(unsigned long addr, unsigned long *size,
+			 unsigned long *off, char **modname, char *sym);
+extern int callthunk_get_kallsym(unsigned int symnum, unsigned long *value,
+				 char *type, char *name, char *module_name,
+				 int *exported);
+#else
+static inline const char *
+callthunk_address_lookup(unsigned long addr, unsigned long *size,
+			 unsigned long *off, char **modname, char *sym)
+{
+	return NULL;
+}
+
+static inline
+int callthunk_get_kallsym(unsigned int symnum, unsigned long *value,
+			  char *type, char *name, char *module_name,
+			  int *exported)
+{
+	return -1;
+}
+#endif
+
 #ifdef CONFIG_KALLSYMS
 int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 				      unsigned long),
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -365,6 +365,10 @@ static const char *kallsyms_lookup_build
 		ret = ftrace_mod_address_lookup(addr, symbolsize,
 						offset, modname, namebuf);
 
+	if (!ret)
+		ret = callthunk_address_lookup(addr, symbolsize,
+					       offset, modname, namebuf);
+
 found:
 	cleanup_symbol_name(namebuf);
 	return ret;
@@ -578,6 +582,7 @@ struct kallsym_iter {
 	loff_t pos_mod_end;
 	loff_t pos_ftrace_mod_end;
 	loff_t pos_bpf_end;
+	loff_t pos_callthunk_end;
 	unsigned long value;
 	unsigned int nameoff; /* If iterating in core kernel symbols. */
 	char type;
@@ -657,6 +662,20 @@ static int get_ksymbol_bpf(struct kallsy
 	return 1;
 }
 
+static int get_ksymbol_callthunk(struct kallsym_iter *iter)
+{
+	int ret = callthunk_get_kallsym(iter->pos - iter->pos_bpf_end,
+					&iter->value, &iter->type,
+					iter->name, iter->module_name,
+					&iter->exported);
+	if (ret < 0) {
+		iter->pos_callthunk_end = iter->pos;
+		return 0;
+	}
+
+	return 1;
+}
+
 /*
  * This uses "__builtin__kprobes" as a module name for symbols for pages
  * allocated for kprobes' purposes, even though "__builtin__kprobes" is not a
@@ -724,6 +743,10 @@ static int update_iter_mod(struct kallsy
 	    get_ksymbol_bpf(iter))
 		return 1;
 
+	if ((!iter->pos_callthunk_end || iter->pos_callthunk_end > pos) &&
+	    get_ksymbol_callthunk(iter))
+		return 1;
+
 	return get_ksymbol_kprobe(iter);
 }
 


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 34/38] x86/orc: Make it callthunk aware
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (32 preceding siblings ...)
  2022-07-16 23:18 ` [patch 33/38] kallsyms: Take callthunks into account Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-16 23:18 ` [patch 35/38] kprobes: Add callthunk blacklisting Thomas Gleixner
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Callthunk addresses on the stack would confuse the ORC unwinder. Handle
them correctly and tell ORC to proceed further down the stack.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/include/asm/alternative.h |    5 +++++
 arch/x86/kernel/callthunks.c       |    2 +-
 arch/x86/kernel/unwind_orc.c       |   21 ++++++++++++++++++++-
 3 files changed, 26 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -94,6 +94,7 @@ extern void callthunks_patch_module_call
 					  struct module *mod);
 extern void callthunks_module_free(struct module *mod);
 extern void *callthunks_translate_call_dest(void *dest);
+extern bool is_callthunk(void *addr);
 #else
 static __always_inline void callthunks_patch_builtin_calls(void) {}
 static __always_inline void
@@ -104,6 +105,10 @@ static __always_inline void *callthunks_
 {
 	return dest;
 }
+static __always_inline bool is_callthunk(void *addr)
+{
+	return false;
+}
 #endif
 
 #ifdef CONFIG_SMP
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -565,7 +565,7 @@ static bool is_module_callthunk(void *ad
 	return ret;
 }
 
-static bool is_callthunk(void *addr)
+bool is_callthunk(void *addr)
 {
 	if (builtin_layout.base <= addr &&
 	    addr < builtin_layout.base + builtin_layout.size)
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -131,6 +131,21 @@ static struct orc_entry null_orc_entry =
 	.type = UNWIND_HINT_TYPE_CALL
 };
 
+#ifdef CONFIG_CALL_THUNKS
+static struct orc_entry *orc_callthunk_find(unsigned long ip)
+{
+	if (!is_callthunk((void *)ip))
+		return NULL;
+
+	return &null_orc_entry;
+}
+#else
+static struct orc_entry *orc_callthunk_find(unsigned long ip)
+{
+	return NULL;
+}
+#endif
+
 /* Fake frame pointer entry -- used as a fallback for generated code */
 static struct orc_entry orc_fp_entry = {
 	.type		= UNWIND_HINT_TYPE_CALL,
@@ -184,7 +199,11 @@ static struct orc_entry *orc_find(unsign
 	if (orc)
 		return orc;
 
-	return orc_ftrace_find(ip);
+	orc =  orc_ftrace_find(ip);
+	if (orc)
+		return orc;
+
+	return orc_callthunk_find(ip);
 }
 
 #ifdef CONFIG_MODULES


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 35/38] kprobes: Add callthunk blacklisting
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (33 preceding siblings ...)
  2022-07-16 23:18 ` [patch 34/38] x86/orc: Make it callthunk aware Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-16 23:18 ` [patch 36/38] x86/ftrace: Make it call depth tracking aware Thomas Gleixner
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel),
	Masami Hiramatsu

From: Peter Zijlstra <peterz@infradead.org>

Callthunks are not safe for probing. Add them to the kprobes blacklisted
areas.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/callthunks.c |    5 ++++
 kernel/kprobes.c             |   52 +++++++++++++++++++++++++++----------------
 2 files changed, 38 insertions(+), 19 deletions(-)

--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -6,6 +6,7 @@
 #include <linux/debugfs.h>
 #include <linux/kallsyms.h>
 #include <linux/memory.h>
+#include <linux/kprobes.h>
 #include <linux/moduleloader.h>
 #include <linux/set_memory.h>
 #include <linux/static_call.h>
@@ -476,6 +477,7 @@ static __init_or_module int callthunks_s
 
 static __init noinline void callthunks_init(struct callthunk_sites *cs)
 {
+	unsigned long base, size;
 	int ret;
 
 	if (cpu_feature_enabled(X86_FEATURE_CALL_DEPTH)) {
@@ -494,6 +496,9 @@ static __init noinline void callthunks_i
 	if (WARN_ON_ONCE(ret))
 		return;
 
+	base = (unsigned long)builtin_layout.base;
+	size = builtin_layout.size;
+	kprobe_add_area_blacklist(base, base + size);
 	static_call_force_reinit();
 	thunks_initialized = true;
 }
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2439,40 +2439,38 @@ void dump_kprobe(struct kprobe *kp)
 }
 NOKPROBE_SYMBOL(dump_kprobe);
 
-int kprobe_add_ksym_blacklist(unsigned long entry)
+static int __kprobe_add_ksym_blacklist(unsigned long start, unsigned long end)
 {
 	struct kprobe_blacklist_entry *ent;
-	unsigned long offset = 0, size = 0;
-
-	if (!kernel_text_address(entry) ||
-	    !kallsyms_lookup_size_offset(entry, &size, &offset))
-		return -EINVAL;
 
 	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
 	if (!ent)
 		return -ENOMEM;
-	ent->start_addr = entry;
-	ent->end_addr = entry + size;
+	ent->start_addr = start;
+	ent->end_addr = end;
 	INIT_LIST_HEAD(&ent->list);
 	list_add_tail(&ent->list, &kprobe_blacklist);
 
-	return (int)size;
+	return (int)(end - start);
+}
+
+int kprobe_add_ksym_blacklist(unsigned long entry)
+{
+	unsigned long offset = 0, size = 0;
+
+	if (!kernel_text_address(entry) ||
+	    !kallsyms_lookup_size_offset(entry, &size, &offset))
+		return -EINVAL;
+
+	return __kprobe_add_ksym_blacklist(entry, entry + size);
 }
 
 /* Add all symbols in given area into kprobe blacklist */
 int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
 {
-	unsigned long entry;
-	int ret = 0;
+	int ret = __kprobe_add_ksym_blacklist(start, end);
 
-	for (entry = start; entry < end; entry += ret) {
-		ret = kprobe_add_ksym_blacklist(entry);
-		if (ret < 0)
-			return ret;
-		if (ret == 0)	/* In case of alias symbol */
-			ret = 1;
-	}
-	return 0;
+	return ret < 0 ? ret : 0;
 }
 
 /* Remove all symbols in given area from kprobe blacklist */
@@ -2578,6 +2576,14 @@ static void add_module_kprobe_blacklist(
 		end = start + mod->noinstr_text_size;
 		kprobe_add_area_blacklist(start, end);
 	}
+
+#ifdef CONFIG_CALL_THUNKS
+	start = (unsigned long)mod->thunk_layout.base;
+	if (start) {
+		end = start + mod->thunk_layout.size;
+		kprobe_remove_area_blacklist(start, end);
+	}
+#endif
 }
 
 static void remove_module_kprobe_blacklist(struct module *mod)
@@ -2601,6 +2607,14 @@ static void remove_module_kprobe_blackli
 		end = start + mod->noinstr_text_size;
 		kprobe_remove_area_blacklist(start, end);
 	}
+
+#ifdef CONFIG_CALL_THUNKS
+	start = (unsigned long)mod->thunk_layout.base;
+	if (start) {
+		end = start + mod->thunk_layout.size;
+		kprobe_remove_area_blacklist(start, end);
+	}
+#endif
 }
 
 /* Module notifier call back, checking kprobes on the module */


^ permalink raw reply	[flat|nested] 142+ messages in thread

* [patch 36/38] x86/ftrace: Make it call depth tracking aware
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (34 preceding siblings ...)
  2022-07-16 23:18 ` [patch 35/38] kprobes: Add callthunk blacklisting Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-18 21:01   ` Steven Rostedt
  2022-07-16 23:18 ` [patch 37/38] x86/bpf: Emit call depth accounting if required Thomas Gleixner
                   ` (6 subsequent siblings)
  42 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Peter Zijlstra (Intel)

From: Peter Zijlstra <peterz@infradead.org>

Since ftrace has trampolines, don't use thunks for the __fentry__ site
but instead require that every function called from there includes
accounting. This very much includes all the direct-call functions.

Additionally, ftrace uses ROP tricks in two places:

 - return_to_handler(), and
 - ftrace_regs_caller() when pt_regs->orig_ax is set by a direct-call.

return_to_handler() already uses a retpoline to replace an
indirect jump to defeat IBT. Since this is a jump-type retpoline, make
sure there is no accounting done and ALTERNATIVE the RET into a ret.

ftrace_regs_caller() does much the same, but currently causes an RSB
imbalance by effectively doing a PUSH+RET combo; rebalance it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/nospec-branch.h        |    8 +++++++
 arch/x86/kernel/ftrace.c                    |   16 ++++++++++----
 arch/x86/kernel/ftrace_64.S                 |   31 ++++++++++++++++++++++++++--
 arch/x86/net/bpf_jit_comp.c                 |    6 +++++
 kernel/trace/trace_selftest.c               |    5 +++-
 samples/ftrace/ftrace-direct-modify.c       |    2 +
 samples/ftrace/ftrace-direct-multi-modify.c |    2 +
 samples/ftrace/ftrace-direct-multi.c        |    1 
 samples/ftrace/ftrace-direct-too.c          |    1 
 samples/ftrace/ftrace-direct.c              |    1 
 10 files changed, 66 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -293,6 +293,11 @@ static inline void x86_set_skl_return_th
 	x86_return_thunk = &__x86_return_skl;
 }
 
+#define CALL_DEPTH_ACCOUNT					\
+	ALTERNATIVE("",						\
+		    __stringify(INCREMENT_CALL_DEPTH),		\
+		    X86_FEATURE_CALL_DEPTH)
+
 DECLARE_PER_CPU(u64, __x86_call_depth);
 #ifdef CONFIG_CALL_THUNKS_DEBUG
 DECLARE_PER_CPU(u64, __x86_call_count);
@@ -302,6 +307,9 @@ DECLARE_PER_CPU(u64, __x86_ctxsw_count);
 #endif
 #else
 static inline void x86_set_skl_return_thunk(void) {}
+
+#define CALL_DEPTH_ACCOUNT
+
 #endif
 
 #ifdef CONFIG_RETPOLINE
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -69,6 +69,10 @@ static const char *ftrace_nop_replace(vo
 
 static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 {
+	/*
+	 * No need to translate into a callthunk. The trampoline does
+	 * the depth accounting itself.
+	 */
 	return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
 }
 
@@ -316,7 +320,7 @@ create_trampoline(struct ftrace_ops *ops
 	unsigned long size;
 	unsigned long *ptr;
 	void *trampoline;
-	void *ip;
+	void *ip, *dest;
 	/* 48 8b 15 <offset> is movq <offset>(%rip), %rdx */
 	unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
 	unsigned const char retq[] = { RET_INSN_OPCODE, INT3_INSN_OPCODE };
@@ -403,10 +407,14 @@ create_trampoline(struct ftrace_ops *ops
 	/* put in the call to the function */
 	mutex_lock(&text_mutex);
 	call_offset -= start_offset;
+	/*
+	 * No need to translate into a callthunk. The trampoline does
+	 * the depth accounting before the call already.
+	 */
+	dest = ftrace_ops_get_func(ops);
 	memcpy(trampoline + call_offset,
-	       text_gen_insn(CALL_INSN_OPCODE,
-			     trampoline + call_offset,
-			     ftrace_ops_get_func(ops)), CALL_INSN_SIZE);
+	       text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
+	       CALL_INSN_SIZE);
 	mutex_unlock(&text_mutex);
 
 	/* ALLOC_TRAMP flags lets us know we created it */
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -132,6 +132,7 @@
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 SYM_FUNC_START(__fentry__)
+	CALL_DEPTH_ACCOUNT
 	RET
 SYM_FUNC_END(__fentry__)
 EXPORT_SYMBOL(__fentry__)
@@ -140,6 +141,8 @@ SYM_FUNC_START(ftrace_caller)
 	/* save_mcount_regs fills in first two parameters */
 	save_mcount_regs
 
+	CALL_DEPTH_ACCOUNT
+
 	/* Stack - skipping return address of ftrace_caller */
 	leaq MCOUNT_REG_SIZE+8(%rsp), %rcx
 	movq %rcx, RSP(%rsp)
@@ -155,6 +158,9 @@ SYM_INNER_LABEL(ftrace_caller_op_ptr, SY
 	/* Only ops with REGS flag set should have CS register set */
 	movq $0, CS(%rsp)
 
+	/* Account for the function call below */
+	CALL_DEPTH_ACCOUNT
+
 SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 	call ftrace_stub
@@ -195,6 +201,8 @@ SYM_FUNC_START(ftrace_regs_caller)
 	save_mcount_regs 8
 	/* save_mcount_regs fills in first two parameters */
 
+	CALL_DEPTH_ACCOUNT
+
 SYM_INNER_LABEL(ftrace_regs_caller_op_ptr, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 	/* Load the ftrace_ops into the 3rd parameter */
@@ -225,6 +233,9 @@ SYM_INNER_LABEL(ftrace_regs_caller_op_pt
 	/* regs go into 4th parameter */
 	leaq (%rsp), %rcx
 
+	/* Account for the function call below */
+	CALL_DEPTH_ACCOUNT
+
 SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 	call ftrace_stub
@@ -280,7 +291,19 @@ SYM_INNER_LABEL(ftrace_regs_caller_end,
 	/* Restore flags */
 	popfq
 	UNWIND_HINT_FUNC
-	jmp	ftrace_epilogue
+
+	/*
+	 * Since we're effectively emulating a tail-call with PUSH;RET
+	 * make sure we don't unbalance the RSB and mess up accounting.
+	 */
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call	2f
+	int3
+2:
+	add	$8, %rsp
+	ALTERNATIVE __stringify(RET), \
+		    __stringify(ANNOTATE_UNRET_SAFE; ret; int3), \
+		    X86_FEATURE_CALL_DEPTH
 
 SYM_FUNC_END(ftrace_regs_caller)
 STACK_FRAME_NON_STANDARD_FP(ftrace_regs_caller)
@@ -289,6 +312,8 @@ STACK_FRAME_NON_STANDARD_FP(ftrace_regs_
 #else /* ! CONFIG_DYNAMIC_FTRACE */
 
 SYM_FUNC_START(__fentry__)
+	CALL_DEPTH_ACCOUNT
+
 	cmpq $ftrace_stub, ftrace_trace_function
 	jnz trace
 
@@ -345,6 +370,8 @@ SYM_CODE_START(return_to_handler)
 	int3
 .Ldo_rop:
 	mov %rdi, (%rsp)
-	RET
+	ALTERNATIVE __stringify(RET), \
+		    __stringify(ANNOTATE_UNRET_SAFE; ret; int3), \
+		    X86_FEATURE_CALL_DEPTH
 SYM_CODE_END(return_to_handler)
 #endif
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@
 #include <linux/memory.h>
 #include <linux/sort.h>
 #include <asm/extable.h>
+#include <asm/ftrace.h>
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
 #include <asm/text-patching.h>
@@ -2090,6 +2091,11 @@ int arch_prepare_bpf_trampoline(struct b
 	prog = image;
 
 	EMIT_ENDBR();
+	/*
+	 * This is the direct-call trampoline, as such it needs accounting
+	 * for the __fentry__ call.
+	 */
+	x86_call_depth_emit_accounting(&prog, __fentry__);
 	EMIT1(0x55);		 /* push rbp */
 	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
 	EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -785,7 +785,10 @@ static struct fgraph_ops fgraph_ops __in
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
-noinline __noclone static void trace_direct_tramp(void) { }
+noinline __noclone static void trace_direct_tramp(void)
+{
+	asm(CALL_DEPTH_ACCOUNT);
+}
 #endif
 
 /*
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -34,6 +34,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	call my_direct_func1\n"
 "	leave\n"
 "	.size		my_tramp1, .-my_tramp1\n"
@@ -45,6 +46,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	call my_direct_func2\n"
 "	leave\n"
 	ASM_RET
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -32,6 +32,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	pushq %rdi\n"
 "	movq 8(%rbp), %rdi\n"
 "	call my_direct_func1\n"
@@ -46,6 +47,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	pushq %rdi\n"
 "	movq 8(%rbp), %rdi\n"
 "	call my_direct_func2\n"
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -27,6 +27,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	pushq %rdi\n"
 "	movq 8(%rbp), %rdi\n"
 "	call my_direct_func\n"
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -29,6 +29,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	pushq %rdi\n"
 "	pushq %rsi\n"
 "	pushq %rdx\n"
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -26,6 +26,7 @@ asm (
 	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
+	CALL_DEPTH_ACCOUNT
 "	pushq %rdi\n"
 "	call my_direct_func\n"
 "	popq %rdi\n"



* [patch 37/38] x86/bpf: Emit call depth accounting if required
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (35 preceding siblings ...)
  2022-07-16 23:18 ` [patch 36/38] x86/ftrace: Make it call depth tracking aware Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-19  5:30   ` Alexei Starovoitov
  2022-07-16 23:18 ` [patch 38/38] x86/retbleed: Add call depth tracking mitigation Thomas Gleixner
                   ` (5 subsequent siblings)
  42 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Alexei Starovoitov,
	Daniel Borkmann

Ensure that calls in BPF jitted programs emit call depth accounting when the
mitigation is enabled, to keep calls and returns balanced. The return thunk
jump is already injected due to the earlier retbleed mitigations.
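
With the mitigation enabled, each emitted call site then conceptually
becomes (sketch only; the accounting template is the same sequence the
call thunks use):

	<call depth accounting template>	# copied from callthunk_desc
	call	<target>

Without the mitigation, x86_call_depth_emit_accounting() emits nothing and
only the call instruction remains.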

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 arch/x86/include/asm/alternative.h |    6 +++++
 arch/x86/kernel/callthunks.c       |   19 ++++++++++++++++
 arch/x86/net/bpf_jit_comp.c        |   43 ++++++++++++++++++++++++-------------
 3 files changed, 53 insertions(+), 15 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -95,6 +95,7 @@ extern void callthunks_patch_module_call
 extern void callthunks_module_free(struct module *mod);
 extern void *callthunks_translate_call_dest(void *dest);
 extern bool is_callthunk(void *addr);
+extern int x86_call_depth_emit_accounting(u8 **pprog, void *func);
 #else
 static __always_inline void callthunks_patch_builtin_calls(void) {}
 static __always_inline void
@@ -109,6 +110,11 @@ static __always_inline bool is_callthunk
 {
 	return false;
 }
+static __always_inline int x86_call_depth_emit_accounting(u8 **pprog,
+							  void *func)
+{
+	return 0;
+}
 #endif
 
 #ifdef CONFIG_SMP
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -706,6 +706,25 @@ int callthunk_get_kallsym(unsigned int s
 	return ret;
 }
 
+#ifdef CONFIG_BPF_JIT
+int x86_call_depth_emit_accounting(u8 **pprog, void *func)
+{
+	unsigned int tmpl_size = callthunk_desc.template_size;
+	void *tmpl = callthunk_desc.template;
+
+	if (!thunks_initialized)
+		return 0;
+
+	/* Is function call target a thunk? */
+	if (is_callthunk(func))
+		return 0;
+
+	memcpy(*pprog, tmpl, tmpl_size);
+	*pprog += tmpl_size;
+	return tmpl_size;
+}
+#endif
+
 #ifdef CONFIG_MODULES
 void noinline callthunks_patch_module_calls(struct callthunk_sites *cs,
 					    struct module *mod)
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -340,6 +340,12 @@ static int emit_call(u8 **pprog, void *f
 	return emit_patch(pprog, func, ip, 0xE8);
 }
 
+static int emit_rsb_call(u8 **pprog, void *func, void *ip)
+{
+	x86_call_depth_emit_accounting(pprog, func);
+	return emit_patch(pprog, func, ip, 0xE8);
+}
+
 static int emit_jump(u8 **pprog, void *func, void *ip)
 {
 	return emit_patch(pprog, func, ip, 0xE9);
@@ -1431,19 +1437,26 @@ st:			if (is_imm8(insn->off))
 			break;
 
 			/* call */
-		case BPF_JMP | BPF_CALL:
+		case BPF_JMP | BPF_CALL: {
+			int offs;
+
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
 				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
 				EMIT3_off32(0x48, 0x8B, 0x85,
 					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
-				if (!imm32 || emit_call(&prog, func, image + addrs[i - 1] + 7))
+				if (!imm32)
 					return -EINVAL;
+				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
 			} else {
-				if (!imm32 || emit_call(&prog, func, image + addrs[i - 1]))
+				if (!imm32)
 					return -EINVAL;
+				offs = x86_call_depth_emit_accounting(&prog, func);
 			}
+			if (emit_call(&prog, func, image + addrs[i - 1] + offs))
+				return -EINVAL;
 			break;
+		}
 
 		case BPF_JMP | BPF_TAIL_CALL:
 			if (imm32)
@@ -1808,10 +1821,10 @@ static int invoke_bpf_prog(const struct
 	/* arg2: lea rsi, [rbp - ctx_cookie_off] */
 	EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);
 
-	if (emit_call(&prog,
-		      p->aux->sleepable ? __bpf_prog_enter_sleepable :
-		      __bpf_prog_enter, prog))
-			return -EINVAL;
+	if (emit_rsb_call(&prog,
+			  p->aux->sleepable ? __bpf_prog_enter_sleepable :
+			  __bpf_prog_enter, prog))
+		return -EINVAL;
 	/* remember prog start time returned by __bpf_prog_enter */
 	emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
 
@@ -1831,7 +1844,7 @@ static int invoke_bpf_prog(const struct
 			       (long) p->insnsi >> 32,
 			       (u32) (long) p->insnsi);
 	/* call JITed bpf program or interpreter */
-	if (emit_call(&prog, p->bpf_func, prog))
+	if (emit_rsb_call(&prog, p->bpf_func, prog))
 		return -EINVAL;
 
 	/*
@@ -1855,10 +1868,10 @@ static int invoke_bpf_prog(const struct
 	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
 	/* arg3: lea rdx, [rbp - run_ctx_off] */
 	EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
-	if (emit_call(&prog,
-		      p->aux->sleepable ? __bpf_prog_exit_sleepable :
-		      __bpf_prog_exit, prog))
-			return -EINVAL;
+	if (emit_rsb_call(&prog,
+			  p->aux->sleepable ? __bpf_prog_exit_sleepable :
+			  __bpf_prog_exit, prog))
+		return -EINVAL;
 
 	*pprog = prog;
 	return 0;
@@ -2123,7 +2136,7 @@ int arch_prepare_bpf_trampoline(struct b
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		/* arg1: mov rdi, im */
 		emit_mov_imm64(&prog, BPF_REG_1, (long) im >> 32, (u32) (long) im);
-		if (emit_call(&prog, __bpf_tramp_enter, prog)) {
+		if (emit_rsb_call(&prog, __bpf_tramp_enter, prog)) {
 			ret = -EINVAL;
 			goto cleanup;
 		}
@@ -2151,7 +2164,7 @@ int arch_prepare_bpf_trampoline(struct b
 		restore_regs(m, &prog, nr_args, regs_off);
 
 		/* call original function */
-		if (emit_call(&prog, orig_call, prog)) {
+		if (emit_rsb_call(&prog, orig_call, prog)) {
 			ret = -EINVAL;
 			goto cleanup;
 		}
@@ -2194,7 +2207,7 @@ int arch_prepare_bpf_trampoline(struct b
 		im->ip_epilogue = prog;
 		/* arg1: mov rdi, im */
 		emit_mov_imm64(&prog, BPF_REG_1, (long) im >> 32, (u32) (long) im);
-		if (emit_call(&prog, __bpf_tramp_exit, prog)) {
+		if (emit_rsb_call(&prog, __bpf_tramp_exit, prog)) {
 			ret = -EINVAL;
 			goto cleanup;
 		}



* [patch 38/38] x86/retbleed: Add call depth tracking mitigation
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (36 preceding siblings ...)
  2022-07-16 23:18 ` [patch 37/38] x86/bpf: Emit call depth accounting if required Thomas Gleixner
@ 2022-07-16 23:18 ` Thomas Gleixner
  2022-07-17  9:45 ` [patch 00/38] x86/retbleed: Call " David Laight
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-16 23:18 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

The fully secure mitigation for RSB underflow on Intel SKL CPUs is IBRS,
which inflicts up to a 30% penalty for pathological, syscall-heavy workloads.

Software based call depth tracking and RSB refill is not perfect, but it
reduces the attack surface massively. The penalty for the pathological case
is about 8%, which is still annoying but definitely more palatable than IBRS.

Add a retbleed=stuff command line option to enable the call depth tracking
and software refill of the RSB.

This gives admins a choice. IBeeRS are safe and cause headaches, call depth
tracking is considered to be s(t)ufficiently safe.
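
For example, on an affected machine this can be selected by adding the
following to the kernel command line (retpolines have to be the selected
spectre_v2 mitigation, see below):

	spectre_v2=retpoline retbleed=stuff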

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/bugs.c |   32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -784,6 +784,7 @@ enum retbleed_mitigation {
 	RETBLEED_MITIGATION_IBPB,
 	RETBLEED_MITIGATION_IBRS,
 	RETBLEED_MITIGATION_EIBRS,
+	RETBLEED_MITIGATION_STUFF,
 };
 
 enum retbleed_mitigation_cmd {
@@ -791,6 +792,7 @@ enum retbleed_mitigation_cmd {
 	RETBLEED_CMD_AUTO,
 	RETBLEED_CMD_UNRET,
 	RETBLEED_CMD_IBPB,
+	RETBLEED_CMD_STUFF,
 };
 
 const char * const retbleed_strings[] = {
@@ -799,6 +801,7 @@ const char * const retbleed_strings[] =
 	[RETBLEED_MITIGATION_IBPB]	= "Mitigation: IBPB",
 	[RETBLEED_MITIGATION_IBRS]	= "Mitigation: IBRS",
 	[RETBLEED_MITIGATION_EIBRS]	= "Mitigation: Enhanced IBRS",
+	[RETBLEED_MITIGATION_STUFF]	= "Mitigation: Stuffing",
 };
 
 static enum retbleed_mitigation retbleed_mitigation __ro_after_init =
@@ -828,6 +831,8 @@ static int __init retbleed_parse_cmdline
 			retbleed_cmd = RETBLEED_CMD_UNRET;
 		} else if (!strcmp(str, "ibpb")) {
 			retbleed_cmd = RETBLEED_CMD_IBPB;
+		} else if (!strcmp(str, "stuff")) {
+			retbleed_cmd = RETBLEED_CMD_STUFF;
 		} else if (!strcmp(str, "nosmt")) {
 			retbleed_nosmt = true;
 		} else {
@@ -876,6 +881,21 @@ static void __init retbleed_select_mitig
 		}
 		break;
 
+	case RETBLEED_CMD_STUFF:
+		if (IS_ENABLED(CONFIG_CALL_DEPTH_TRACKING) &&
+		    spectre_v2_enabled == SPECTRE_V2_RETPOLINE) {
+			retbleed_mitigation = RETBLEED_MITIGATION_STUFF;
+
+		} else {
+			if (IS_ENABLED(CONFIG_CALL_DEPTH_TRACKING))
+				pr_err("WARNING: retbleed=stuff depends on spectre_v2=retpoline\n");
+			else
+				pr_err("WARNING: kernel not compiled with CALL_DEPTH_TRACKING.\n");
+
+			goto do_cmd_auto;
+		}
+		break;
+
 do_cmd_auto:
 	case RETBLEED_CMD_AUTO:
 	default:
@@ -913,6 +933,12 @@ static void __init retbleed_select_mitig
 		mitigate_smt = true;
 		break;
 
+	case RETBLEED_MITIGATION_STUFF:
+		setup_force_cpu_cap(X86_FEATURE_RETHUNK);
+		setup_force_cpu_cap(X86_FEATURE_CALL_DEPTH);
+		x86_set_skl_return_thunk();
+		break;
+
 	default:
 		break;
 	}
@@ -923,7 +949,7 @@ static void __init retbleed_select_mitig
 
 	/*
 	 * Let IBRS trump all on Intel without affecting the effects of the
-	 * retbleed= cmdline option.
+	 * retbleed= cmdline option except for call depth based stuffing
 	 */
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
 		switch (spectre_v2_enabled) {
@@ -936,7 +962,8 @@ static void __init retbleed_select_mitig
 			retbleed_mitigation = RETBLEED_MITIGATION_EIBRS;
 			break;
 		default:
-			pr_err(RETBLEED_INTEL_MSG);
+			if (retbleed_mitigation != RETBLEED_MITIGATION_STUFF)
+				pr_err(RETBLEED_INTEL_MSG);
 		}
 	}
 
@@ -1361,6 +1388,7 @@ static void __init spectre_v2_select_mit
 		if (IS_ENABLED(CONFIG_CPU_IBRS_ENTRY) &&
 		    boot_cpu_has_bug(X86_BUG_RETBLEED) &&
 		    retbleed_cmd != RETBLEED_CMD_OFF &&
+		    retbleed_cmd != RETBLEED_CMD_STUFF &&
 		    boot_cpu_has(X86_FEATURE_IBRS) &&
 		    boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
 			mode = SPECTRE_V2_IBRS;



* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
@ 2022-07-17  0:22   ` Andrew Cooper
  2022-07-17 15:20     ` Linus Torvalds
  2022-07-17 19:08     ` Thomas Gleixner
  2022-07-18 17:52   ` [patch 0/3] x86/cpu: Sanitize switch_to_new_gdt() Thomas Gleixner
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 142+ messages in thread
From: Andrew Cooper @ 2022-07-17  0:22 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On 17/07/2022 00:17, Thomas Gleixner wrote:
> load_percpu_segment() is using wrmsr() which is paravirtualized. That's an
> issue because the code sequence is:
>
>         __loadsegment_simple(gs, 0);
> 	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>
> So anything which uses a per CPU variable between setting GS to 0 and
> writing GSBASE is going to end up in a NULL pointer dereference. That's
> can be triggered with instrumentation and is guaranteed to be triggered
> with callthunks for call depth tracking.
>
> Use native_wrmsrl() instead. XEN_PV will trap and emulate, but that's not a
> hot path.
>
> Also make it static and mark it noinstr so neither kprobes, sanitizers or
> whatever can touch it.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/include/asm/processor.h |    1 -
>  arch/x86/kernel/cpu/common.c     |   12 ++++++++++--
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -673,7 +673,6 @@ extern struct desc_ptr		early_gdt_descr;
>  extern void switch_to_new_gdt(int);
>  extern void load_direct_gdt(int);
>  extern void load_fixmap_gdt(int);
> -extern void load_percpu_segment(int);
>  extern void cpu_init(void);
>  extern void cpu_init_secondary(void);
>  extern void cpu_init_exception_handling(void);
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -701,13 +701,21 @@ static const char *table_lookup_model(st
>  __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>  __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>  
> -void load_percpu_segment(int cpu)
> +static noinstr void load_percpu_segment(int cpu)
>  {
>  #ifdef CONFIG_X86_32
>  	loadsegment(fs, __KERNEL_PERCPU);
>  #else
>  	__loadsegment_simple(gs, 0);
> -	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
> +	/*
> +	 * Because of the __loadsegment_simple(gs, 0) above, any GS-prefixed
> +	 * instruction will explode right about here. As such, we must not have
> +	 * any CALL-thunks using per-cpu data.
> +	 *
> +	 * Therefore, use native_wrmsrl() and have XenPV take the fault and
> +	 * emulate.
> +	 */
> +	native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>  #endif

Lovely :-/

But I still don't see how that works, because __loadsegment_simple() is
a memory clobber and cpu_kernelmode_gs_base() has a per-cpu lookup in it.

That said, this has only a single caller, and in context, it's bogus for
64bit.  Can't we fix all the problems by just doing this:

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 736262a76a12..6f393bc9d89d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -701,16 +701,6 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
 __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
-void load_percpu_segment(int cpu)
-{
-#ifdef CONFIG_X86_32
-       loadsegment(fs, __KERNEL_PERCPU);
-#else
-       __loadsegment_simple(gs, 0);
-       wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
-#endif
-}
-
 #ifdef CONFIG_X86_32
 /* The 32-bit entry code needs to find cpu_entry_area. */
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
@@ -742,12 +732,15 @@ EXPORT_SYMBOL_GPL(load_fixmap_gdt);
  * Current gdt points %fs at the "master" per-cpu area: after this,
  * it's on the real one.
  */
-void switch_to_new_gdt(int cpu)
+void __noinstr switch_to_new_gdt(int cpu)
 {
        /* Load the original GDT */
        load_direct_gdt(cpu);
+
+#ifdef CONFIG_X86_32
        /* Reload the per-cpu base */
-       load_percpu_segment(cpu);
+       loadsegment(fs, __KERNEL_PERCPU);
+#endif
 }
 
 static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};


It's only 32bit where the percpu pointer is tied to the GDT.  On 64bit,
gsbase is good before this, and remains good after.

With this change,

# Make sure load_percpu_segment has no stackprotector
CFLAGS_common.o         := -fno-stack-protector

comes up for re-evaluation too.

~Andrew


* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (37 preceding siblings ...)
  2022-07-16 23:18 ` [patch 38/38] x86/retbleed: Add call depth tracking mitigation Thomas Gleixner
@ 2022-07-17  9:45 ` David Laight
  2022-07-17 15:07   ` Thomas Gleixner
  2022-07-18 19:29 ` Thomas Gleixner
                   ` (3 subsequent siblings)
  42 siblings, 1 reply; 142+ messages in thread
From: David Laight @ 2022-07-17  9:45 UTC (permalink / raw)
  To: 'Thomas Gleixner', LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

From: Thomas Gleixner
> Sent: 17 July 2022 00:17
> Folks!
> 
> Back in the good old spectre v2 days (2018) we decided to not use
> IBRS. In hindsight this might have been the wrong decision because it did
> not force people to come up with alternative approaches.
> 
> It was already discussed back then to try software based call depth
> accounting and RSB stuffing on underflow for Intel SKL[-X] systems to avoid
> the insane overhead of IBRS.
> 
> This has been tried in 2018 and was rejected due to the massive overhead
> and other shortcomings of the approach to put the accounting into each
> function prologue:
> 
>   1) Text size increase which is inflicted on everyone.  While CPUs are
>      good in ignoring NOPs they still pollute the I-cache.
> 
>   2) That results in tail call over-accounting which can be exploited.
> 
> Disabling tail calls is not an option either and adding a 10 byte padding
> in front of every direct call is even worse in terms of text size and
> I-cache impact. We also could patch calls past the accounting in the
> function prologue but that becomes a nightmare vs. ENDBR.
> 
> As IBRS is a performance horror show, Peter Zijstra and me revisited the
> call depth tracking approach and implemented it in a way which is hopefully
> more palatable and avoids the downsides of the original attempt.
> 
> We both unsurprisingly hate the result with a passion...
> 
> The way we approached this is:
> 
>  1) objtool creates a list of function entry points and a list of direct
>     call sites into new sections which can be discarded after init.
> 
>  2) On affected machines, use the new sections, allocate module memory
>     and create a call thunk per function (16 bytes without
>     debug/statistics). Then patch all direct calls to invoke the thunk,
>     which does the call accounting and then jumps to the original call
>     site.
> 
>  3) Utilize the retbleed return thunk mechanism by making the jump
>     target run-time configurable. Add the accounting counterpart and
>     stuff RSB on underflow in that alternate implementation.

What happens to indirect calls?
The above would imply that they miss the function entry thunk, but
get the return one.
Won't this lead to mis-counting of the RSB?

I also thought that retpolines would trash the return stack?
Using a single retpoline thunk would pretty much ensure that
they are never correctly predicted from the BTB, but it only
gives a single BTB entry that needs 'setting up' to get mis-
prediction.

I'm also sure I managed to infer from a document of instruction
timings and architectures that some x86 cpu actually used the BTB
for normal conditional jumps?
Possibly to avoid passing the full %ip value all down the
cpu pipeline.

	David




* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-17  9:45 ` [patch 00/38] x86/retbleed: Call " David Laight
@ 2022-07-17 15:07   ` Thomas Gleixner
  2022-07-17 17:56     ` David Laight
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-17 15:07 UTC (permalink / raw)
  To: David Laight, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Sun, Jul 17 2022 at 09:45, David Laight wrote:
> From: Thomas Gleixner
>> 
>>  3) Utilize the retbleed return thunk mechanism by making the jump
>>     target run-time configurable. Add the accounting counterpart and
>>     stuff RSB on underflow in that alternate implementation.
>
> What happens to indirect calls?
> The above would imply that they miss the function entry thunk, but
> get the return one.
> Won't this lead to mis-counting of the RSB?

That's accounted in the indirect call thunk. This mitigation requires
retpolines enabled.
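
Conceptually (sketch only, not the exact thunk code) the indirect call
thunk gets the same accounting prepended:

	__x86_indirect_thunk_\reg:
		CALL_DEPTH_ACCOUNT		/* patched in when enabled */
		/* usual retpoline sequence for \reg */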

> I also thought that retpolines would trash the return stack?

No. They prevent the CPU from misspeculating an indirect call due to a
mistrained BTB.

> Using a single retpoline thunk would pretty much ensure that
> they are never correctly predicted from the BTB, but it only
> gives a single BTB entry that needs 'setting up' to get mis-
> prediction.

  BTB != RSB

The intra-function call in the retpoline of course adds an RSB entry
which points to the speculation trap, but that gets popped immediately
afterwards by the return which goes to the called function.

But that does not prevent the RSB underflow problem. As I described, the
RSB is a stack with depth 16. Call pushes, ret pops. So if speculation is
ahead and has emptied the RSB while speculating down the rets, then the next
speculated RET will fall back to other prediction mechanisms, which is
what the SKL specific retbleed variant exploits via BHB mistraining.
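
Very roughly, ignoring that the real thunks do this in asm on a per-CPU
counter, the accounting amounts to the sketch below. stuff_rsb() stands in
for the actual RSB fill sequence and the names are purely illustrative:

	DEFINE_PER_CPU(int, rsb_depth) = 16;	/* believed-valid RSB entries */

	static void call_thunk_account(void)	/* run for every tracked call */
	{
		int d = this_cpu_read(rsb_depth);

		if (d < 16)
			this_cpu_write(rsb_depth, d + 1);
	}

	static void return_thunk_account(void)	/* run in the return thunk */
	{
		if (this_cpu_dec_return(rsb_depth) <= 0) {
			stuff_rsb();			/* refill all 16 entries */
			this_cpu_write(rsb_depth, 16);
		}
	}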

> I'm also sure I managed to infer from a document of instruction
> timings and architectures that some x86 cpu actually used the BTB
> for normal conditional jumps?

That's relevant to the problem at hand in which way?

Thanks,

        tglx


* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-17  0:22   ` Andrew Cooper
@ 2022-07-17 15:20     ` Linus Torvalds
  2022-07-17 19:08     ` Thomas Gleixner
  1 sibling, 0 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-17 15:20 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Thomas Gleixner, LKML, x86, Tim Chen, Josh Poimboeuf,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On Sat, Jul 16, 2022 at 5:22 PM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>
> It's only 32bit where the percpu pointer is tied to the GDT.  On 64bit,
> gsbase is good before this, and remains good after.

That sounds sensible to me, but somebody should check that there's
nothing that accidentally relied on the MSR_GS_BASE setting (or the
segment selector clearing, for that matter).

Not that I can necessarily see how anything could work with it wrong, but..

           Linus


* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-17 15:07   ` Thomas Gleixner
@ 2022-07-17 17:56     ` David Laight
  2022-07-17 19:15       ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: David Laight @ 2022-07-17 17:56 UTC (permalink / raw)
  To: 'Thomas Gleixner', LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

From: Thomas Gleixner
> Sent: 17 July 2022 16:07
> 
> On Sun, Jul 17 2022 at 09:45, David Laight wrote:
> > From: Thomas Gleixner
> >>
> >>  3) Utilize the retbleed return thunk mechanism by making the jump
> >>     target run-time configurable. Add the accounting counterpart and
> >>     stuff RSB on underflow in that alternate implementation.
> >
> > What happens to indirect calls?
> > The above would imply that they miss the function entry thunk, but
> > get the return one.
> > Won't this lead to mis-counting of the RSB?
> 
> That's accounted in the indirect call thunk. This mitigation requires
> retpolines enabled.

Thanks, that wasn't in the summary.

> > I also thought that retpolines would trash the return stack?
> 
> No. They prevent the CPU from misspeculating an indirect call due to a
> mistrained BTB.
> 
> > Using a single retpoline thunk would pretty much ensure that
> > they are never correctly predicted from the BTB, but it only
> > gives a single BTB entry that needs 'setting up' to get mis-
> > prediction.
> 
>   BTB != RSB

I was thinking about what happens after the RSB has underflowed.
Which is when (I presume) the BTB based speculation happens.

> The intra-function call in the retpoline of course adds an RSB entry
> which points to the speculation trap, but that gets popped immediately
> afterwards by the return which goes to the called function.

I'm remembering the 'active' instructions in a retpoline being 'push; ret'.
Which is an RSB imbalance.

...
> > I'm also sure I managed to infer from a document of instruction
> > timings and architectures that some x86 cpu actually used the BTB
> > for normal conditional jumps?
> 
> That's relevant to the problem at hand in which way?

The next problem :-)

	David




* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-17  0:22   ` Andrew Cooper
  2022-07-17 15:20     ` Linus Torvalds
@ 2022-07-17 19:08     ` Thomas Gleixner
  2022-07-17 20:08       ` Thomas Gleixner
  1 sibling, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-17 19:08 UTC (permalink / raw)
  To: Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On Sun, Jul 17 2022 at 00:22, Andrew Cooper wrote:
>> -void load_percpu_segment(int cpu)
>> +static noinstr void load_percpu_segment(int cpu)
>>  {
>>  #ifdef CONFIG_X86_32
>>  	loadsegment(fs, __KERNEL_PERCPU);
>>  #else
>>  	__loadsegment_simple(gs, 0);
>> -	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>> +	/*
>> +	 * Because of the __loadsegment_simple(gs, 0) above, any GS-prefixed
>> +	 * instruction will explode right about here. As such, we must not have
>> +	 * any CALL-thunks using per-cpu data.
>> +	 *
>> +	 * Therefore, use native_wrmsrl() and have XenPV take the fault and
>> +	 * emulate.
>> +	 */
>> +	native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>>  #endif
>
> Lovely :-/
>
> But I still don't see how that works, because __loadsegment_simple() is
> a memory clobber and cpu_kernelmode_gs_base() has a per-cpu lookup in
> it.

No. It uses an array lookup :)

> That said, this only has a sole caller, and in context, it's bogus for
> 64bit.  Can't we fix all the problems by just doing this:
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 736262a76a12..6f393bc9d89d 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -701,16 +701,6 @@ static const char *table_lookup_model(struct
> cpuinfo_x86 *c)
>  __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned
> long));
>  __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>  
> -void load_percpu_segment(int cpu)
> -{
> -#ifdef CONFIG_X86_32
> -       loadsegment(fs, __KERNEL_PERCPU);
> -#else
> -       __loadsegment_simple(gs, 0);
> -       wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
> -#endif
> -}
> -
>  #ifdef CONFIG_X86_32
>  /* The 32-bit entry code needs to find cpu_entry_area. */
>  DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
> @@ -742,12 +732,15 @@ EXPORT_SYMBOL_GPL(load_fixmap_gdt);
>   * Current gdt points %fs at the "master" per-cpu area: after this,
>   * it's on the real one.
>   */
> -void switch_to_new_gdt(int cpu)
> +void __noinstr switch_to_new_gdt(int cpu)
>  {
>         /* Load the original GDT */
>         load_direct_gdt(cpu);
> +
> +#ifdef CONFIG_X86_32
>         /* Reload the per-cpu base */
> -       load_percpu_segment(cpu);
> +       loadsegment(fs, __KERNEL_PERCPU);
> +#endif
>  }
>  
>  static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>
>
> It's only 32bit where the percpu pointer is tied to the GDT.  On 64bit,
> gsbase is good before this, and remains good after.
>
> With this change,
>
> # Make sure load_percpu_segment has no stackprotector
> CFLAGS_common.o         := -fno-stack-protector
>
> comes up for re-evaluation too.

Good point. Let me stare at it some more.

Thanks,

        tglx


* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-17 17:56     ` David Laight
@ 2022-07-17 19:15       ` Thomas Gleixner
  0 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-17 19:15 UTC (permalink / raw)
  To: David Laight, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Sun, Jul 17 2022 at 17:56, David Laight wrote:
> From: Thomas Gleixner
>> On Sun, Jul 17 2022 at 09:45, David Laight wrote:
> I was thinking about what happens after the RSB has underflowed.
> Which is when (I presume) the BTB based speculation happens.
>
>> The intra function call in the retpoline is of course adding a RSB entry
>> which points to the speculation trap, but that gets popped immediately
>> after that by the return which goes to the called function.
>
> I'm remembering the 'active' instructions in a retpoline being 'push; ret'.
> Which is an RSB imbalance.

Looking at the code might help to remember correctly:

        call   1f
        speculation trap
1:      mov     %reg, (%rsp)
        ret

Thanks,

        tglx


* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-17 19:08     ` Thomas Gleixner
@ 2022-07-17 20:08       ` Thomas Gleixner
  2022-07-17 20:13         ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-17 20:08 UTC (permalink / raw)
  To: Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On Sun, Jul 17 2022 at 21:08, Thomas Gleixner wrote:
> On Sun, Jul 17 2022 at 00:22, Andrew Cooper wrote:
>>  #ifdef CONFIG_X86_32
>>  /* The 32-bit entry code needs to find cpu_entry_area. */
>>  DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
>> @@ -742,12 +732,15 @@ EXPORT_SYMBOL_GPL(load_fixmap_gdt);
>>   * Current gdt points %fs at the "master" per-cpu area: after this,
>>   * it's on the real one.
>>   */
>> -void switch_to_new_gdt(int cpu)
>> +void __noinstr switch_to_new_gdt(int cpu)
>>  {
>>         /* Load the original GDT */
>>         load_direct_gdt(cpu);
>> +
>> +#ifdef CONFIG_X86_32
>>         /* Reload the per-cpu base */
>> -       load_percpu_segment(cpu);
>> +       loadsegment(fs, __KERNEL_PERCPU);
>> +#endif
>>  }
>>  
>>  static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>>
>>
>> It's only 32bit where the percpu pointer is tied to the GDT.  On 64bit,
>> gsbase is good before this, and remains good after.
>>
>> With this change,
>>
>> # Make sure load_percpu_segment has no stackprotector
>> CFLAGS_common.o         := -fno-stack-protector
>>
>> comes up for re-evaluation too.
>
> Good point. Let me stare at it some more.

If only it were that simple :)

loadsegment_simple() was a red herring. The gs segment is already zero.

So what explodes here is the early boot when switching from early per
CPU to the real per CPU area.

start_kernel()
        .....
        setup_per_cpu_areas();
        smp_prepare_boot_cpu()
          switch_to_new_gdt()
       	     load_direct_gdt(cpu);
          load_percpu_segment(cpu)
            GS: 0
            GS_BASE: 0xffffffff829d0000 (early PERCPU) 
            wrmsrl()
            GS_BASE: 0xffff888237c00000 (real PERCPU)

So the explosion happens when accessing a per CPU variable after loading
the GDT and before GS_BASE is fixed up.

That's the only case AFAICT where this matters. In all other invocations
GS_BASE is already correct.

Let me fix this properly.

Thanks,

        tglx


* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-17 20:08       ` Thomas Gleixner
@ 2022-07-17 20:13         ` Thomas Gleixner
  2022-07-17 21:54           ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-17 20:13 UTC (permalink / raw)
  To: Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On Sun, Jul 17 2022 at 22:08, Thomas Gleixner wrote:
> On Sun, Jul 17 2022 at 21:08, Thomas Gleixner wrote:
> loadsegment_simple() was a red herring. The gs segment is already zero.
>
> So what explodes here is the early boot when switching from early per
> CPU to the real per CPU area.
>
> start_kernel()
>         .....
>         setup_per_cpu_areas();
>         smp_prepare_boot_cpu()

Bah. switch_to_new_gdt() is already invoked from setup_per_cpu_areas()
and then again in smp_prepare_boot_cpu() and once more in cpu_init().

What a mess.


* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-17 20:13         ` Thomas Gleixner
@ 2022-07-17 21:54           ` Thomas Gleixner
  2022-07-18  5:11             ` Juergen Gross
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-17 21:54 UTC (permalink / raw)
  To: Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross

On Sun, Jul 17 2022 at 22:13, Thomas Gleixner wrote:
> On Sun, Jul 17 2022 at 22:08, Thomas Gleixner wrote:
>> On Sun, Jul 17 2022 at 21:08, Thomas Gleixner wrote:
>> loadsegment_simple() was a red herring. The gs segment is already zero.
>>
>> So what explodes here is the early boot when switching from early per
>> CPU to the real per CPU area.
>>
>> start_kernel()
>>         .....
>>         setup_per_cpu_areas();
>>         smp_prepare_boot_cpu()
>
> Bah. switch_to_new_gdt() is already invoked from setup_per_cpu_areas()
> and then again in smp_prepare_boot_cpu() and once more in cpu_init(),
>
> What a mess.

So the below builds and boots at least on 64bit. I'll stare at it some
more tomorrow. I have no idea whether native_load_gdt() works with
XEN_PV. It should, but what do I know.

Thanks,

        tglx
---
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -205,7 +205,7 @@ static inline void native_set_ldt(const
 	}
 }
 
-static inline void native_load_gdt(const struct desc_ptr *dtr)
+static __always_inline void native_load_gdt(const struct desc_ptr *dtr)
 {
 	asm volatile("lgdt %0"::"m" (*dtr));
 }
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -670,10 +670,9 @@ extern int sysenter_setup(void);
 /* Defined in head.S */
 extern struct desc_ptr		early_gdt_descr;
 
-extern void switch_to_new_gdt(int);
+extern void switch_to_real_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
-extern void load_percpu_segment(int);
 extern void cpu_init(void);
 extern void cpu_init_secondary(void);
 extern void cpu_init_exception_handling(void);
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -7,20 +7,24 @@
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_common.o = -pg
 CFLAGS_REMOVE_perf_event.o = -pg
+CFLAGS_REMOVE_switch_gdt.o = -pg
 endif
 
 # If these files are instrumented, boot hangs during the first second.
 KCOV_INSTRUMENT_common.o := n
 KCOV_INSTRUMENT_perf_event.o := n
+KCOV_INSTRUMENT_switch_gdt.o := n
 
 # As above, instrumenting secondary CPU boot code causes boot hangs.
 KCSAN_SANITIZE_common.o := n
+KCSAN_SANITIZE_switch_gdt.o := n
 
-# Make sure load_percpu_segment has no stackprotector
-CFLAGS_common.o		:= -fno-stack-protector
+# Make sure that switching the GDT and the per CPU segment
+# does not have stack protector enabled.
+CFLAGS_switch_gdt.o	:= -fno-stack-protector
 
 obj-y			:= cacheinfo.o scattered.o topology.o
-obj-y			+= common.o
+obj-y			+= common.o switch_gdt.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
 obj-y			+= bugs.o
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -701,16 +701,6 @@ static const char *table_lookup_model(st
 __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
-void load_percpu_segment(int cpu)
-{
-#ifdef CONFIG_X86_32
-	loadsegment(fs, __KERNEL_PERCPU);
-#else
-	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
-#endif
-}
-
 #ifdef CONFIG_X86_32
 /* The 32-bit entry code needs to find cpu_entry_area. */
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
@@ -738,18 +728,6 @@ void load_fixmap_gdt(int cpu)
 }
 EXPORT_SYMBOL_GPL(load_fixmap_gdt);
 
-/*
- * Current gdt points %fs at the "master" per-cpu area: after this,
- * it's on the real one.
- */
-void switch_to_new_gdt(int cpu)
-{
-	/* Load the original GDT */
-	load_direct_gdt(cpu);
-	/* Reload the per-cpu base */
-	load_percpu_segment(cpu);
-}
-
 static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 
 static void get_model_name(struct cpuinfo_x86 *c)
@@ -2228,12 +2206,6 @@ void cpu_init(void)
 	    boot_cpu_has(X86_FEATURE_TSC) || boot_cpu_has(X86_FEATURE_DE))
 		cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
 
-	/*
-	 * Initialize the per-CPU GDT with the boot GDT,
-	 * and set up the GDT descriptor:
-	 */
-	switch_to_new_gdt(cpu);
-
 	if (IS_ENABLED(CONFIG_X86_64)) {
 		loadsegment(fs, 0);
 		memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
--- /dev/null
+++ b/arch/x86/kernel/cpu/switch_gdt.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/processor.h>
+#include <asm/segment.h>
+#include <asm/desc.h>
+
+/*
+ * Invoked during early boot to switch from early GDT and early per CPU
+ * (%fs on 32bit, GS_BASE on 64bit) to the real GDT and the runtime per CPU
+ * area.
+ *
+ * This has to be done atomic because after switching from early GDT to
+ * the real one any per cpu variable access is going to fault because
+ * %fs resp. GS_BASE is not yet pointing to the real per CPU data.
+ *
+ * As a consequence this uses the native variants of load_gdt() and
+ * wrmsrl(). So XEN_PV has to take the fault and emulate.
+ */
+void __init switch_to_real_gdt(int cpu)
+{
+	struct desc_ptr gdt_descr;
+
+	gdt_descr.address = (long)get_cpu_gdt_rw(cpu);
+	gdt_descr.size = GDT_SIZE - 1;
+	native_load_gdt(&gdt_descr);
+
+#ifdef CONFIG_X86_32
+	loadsegment(fs, __KERNEL_PERCPU);
+#else
+	native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
+#endif
+}
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -211,7 +211,7 @@ void __init setup_per_cpu_areas(void)
 		 * area.  Reload any changed state for the boot CPU.
 		 */
 		if (!cpu)
-			switch_to_new_gdt(cpu);
+			switch_to_real_gdt(cpu);
 	}
 
 	/* indicate the early static arrays will soon be gone */
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1457,7 +1457,11 @@ void arch_thaw_secondary_cpus_end(void)
 void __init native_smp_prepare_boot_cpu(void)
 {
 	int me = smp_processor_id();
-	switch_to_new_gdt(me);
+
+	/* SMP invokes this from setup_per_cpu_areas() */
+	if (!IS_ENABLED(CONFIG_SMP))
+		switch_to_real_gdt(me);
+
 	/* already set me in cpu_online_mask in boot_cpu_init() */
 	cpumask_set_cpu(me, cpu_callout_mask);
 	cpu_set_state_online(me);
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1164,7 +1164,7 @@ static void __init xen_setup_gdt(int cpu
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
 
-	switch_to_new_gdt(cpu);
+	switch_to_real_gdt(cpu);
 
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
 	pv_ops.cpu.load_gdt = xen_load_gdt;



* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-17 21:54           ` Thomas Gleixner
@ 2022-07-18  5:11             ` Juergen Gross
  2022-07-18  6:54               ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Juergen Gross @ 2022-07-18  5:11 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt



On 17.07.22 23:54, Thomas Gleixner wrote:
> On Sun, Jul 17 2022 at 22:13, Thomas Gleixner wrote:
>> On Sun, Jul 17 2022 at 22:08, Thomas Gleixner wrote:
>>> On Sun, Jul 17 2022 at 21:08, Thomas Gleixner wrote:
>>> loadsegment_simple() was a red herring. The gs segment is already zero.
>>>
>>> So what explodes here is the early boot when switching from early per
>>> CPU to the real per CPU area.
>>>
>>> start_kernel()
>>>          .....
>>>          setup_per_cpu_areas();
>>>          smp_prepare_boot_cpu()
>>
>> Bah. switch_to_new_gdt() is already invoked from setup_per_cpu_areas()
>> and then again in smp_prepare_boot_cpu() and once more in cpu_init(),
>>
>> What a mess.
> 
> So the below builds and boots at least on 64bit. I'll stare at it some
> more tomorrow. I have no idea whether native_load_gdt() works with
> XEN_PV. It should, but what do I know.

No, shouldn't work. But ...

> 
> Thanks,
> 
>          tglx
> ---
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -205,7 +205,7 @@ static inline void native_set_ldt(const
>   	}
>   }
>   
> -static inline void native_load_gdt(const struct desc_ptr *dtr)
> +static __always_inline void native_load_gdt(const struct desc_ptr *dtr)
>   {
>   	asm volatile("lgdt %0"::"m" (*dtr));
>   }
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -670,10 +670,9 @@ extern int sysenter_setup(void);
>   /* Defined in head.S */
>   extern struct desc_ptr		early_gdt_descr;
>   
> -extern void switch_to_new_gdt(int);
> +extern void switch_to_real_gdt(int);
>   extern void load_direct_gdt(int);
>   extern void load_fixmap_gdt(int);
> -extern void load_percpu_segment(int);
>   extern void cpu_init(void);
>   extern void cpu_init_secondary(void);
>   extern void cpu_init_exception_handling(void);
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -7,20 +7,24 @@
>   ifdef CONFIG_FUNCTION_TRACER
>   CFLAGS_REMOVE_common.o = -pg
>   CFLAGS_REMOVE_perf_event.o = -pg
> +CFLAGS_REMOVE_switch_gdt.o = -pg
>   endif
>   
>   # If these files are instrumented, boot hangs during the first second.
>   KCOV_INSTRUMENT_common.o := n
>   KCOV_INSTRUMENT_perf_event.o := n
> +KCOV_INSTRUMENT_switch_gdt.o := n
>   
>   # As above, instrumenting secondary CPU boot code causes boot hangs.
>   KCSAN_SANITIZE_common.o := n
> +KCSAN_SANITIZE_switch_gdt.o := n
>   
> -# Make sure load_percpu_segment has no stackprotector
> -CFLAGS_common.o		:= -fno-stack-protector
> +# Make sure that switching the GDT and the per CPU segment
> +# does not have stack protector enabled.
> +CFLAGS_switch_gdt.o	:= -fno-stack-protector
>   
>   obj-y			:= cacheinfo.o scattered.o topology.o
> -obj-y			+= common.o
> +obj-y			+= common.o switch_gdt.o
>   obj-y			+= rdrand.o
>   obj-y			+= match.o
>   obj-y			+= bugs.o
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -701,16 +701,6 @@ static const char *table_lookup_model(st
>   __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>   __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>   
> -void load_percpu_segment(int cpu)
> -{
> -#ifdef CONFIG_X86_32
> -	loadsegment(fs, __KERNEL_PERCPU);
> -#else
> -	__loadsegment_simple(gs, 0);
> -	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
> -#endif
> -}
> -
>   #ifdef CONFIG_X86_32
>   /* The 32-bit entry code needs to find cpu_entry_area. */
>   DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
> @@ -738,18 +728,6 @@ void load_fixmap_gdt(int cpu)
>   }
>   EXPORT_SYMBOL_GPL(load_fixmap_gdt);
>   
> -/*
> - * Current gdt points %fs at the "master" per-cpu area: after this,
> - * it's on the real one.
> - */
> -void switch_to_new_gdt(int cpu)
> -{
> -	/* Load the original GDT */
> -	load_direct_gdt(cpu);
> -	/* Reload the per-cpu base */
> -	load_percpu_segment(cpu);
> -}
> -
>   static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>   
>   static void get_model_name(struct cpuinfo_x86 *c)
> @@ -2228,12 +2206,6 @@ void cpu_init(void)
>   	    boot_cpu_has(X86_FEATURE_TSC) || boot_cpu_has(X86_FEATURE_DE))
>   		cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
>   
> -	/*
> -	 * Initialize the per-CPU GDT with the boot GDT,
> -	 * and set up the GDT descriptor:
> -	 */
> -	switch_to_new_gdt(cpu);
> -
>   	if (IS_ENABLED(CONFIG_X86_64)) {
>   		loadsegment(fs, 0);
>   		memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/switch_gdt.c
> @@ -0,0 +1,31 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <asm/processor.h>
> +#include <asm/segment.h>
> +#include <asm/desc.h>
> +
> +/*
> + * Invoked during early boot to switch from early GDT and early per CPU
> + * (%fs on 32bit, GS_BASE on 64bit) to the real GDT and the runtime per CPU
> + * area.
> + *
> + * This has to be done atomic because after switching from early GDT to
> + * the real one any per cpu variable access is going to fault because
> + * %fs resp. GS_BASE is not yet pointing to the real per CPU data.
> + *
> + * As a consequence this uses the native variants of load_gdt() and
> + * wrmsrl(). So XEN_PV has to take the fault and emulate.
> + */
> +void __init switch_to_real_gdt(int cpu)
> +{
> +	struct desc_ptr gdt_descr;
> +
> +	gdt_descr.address = (long)get_cpu_gdt_rw(cpu);
> +	gdt_descr.size = GDT_SIZE - 1;
> +	native_load_gdt(&gdt_descr);
> +
> +#ifdef CONFIG_X86_32
> +	loadsegment(fs, __KERNEL_PERCPU);
> +#else
> +	native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
> +#endif
> +}
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -211,7 +211,7 @@ void __init setup_per_cpu_areas(void)
>   		 * area.  Reload any changed state for the boot CPU.
>   		 */
>   		if (!cpu)
> -			switch_to_new_gdt(cpu);
> +			switch_to_real_gdt(cpu);
>   	}
>   
>   	/* indicate the early static arrays will soon be gone */
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1457,7 +1457,11 @@ void arch_thaw_secondary_cpus_end(void)
>   void __init native_smp_prepare_boot_cpu(void)
>   {
>   	int me = smp_processor_id();
> -	switch_to_new_gdt(me);
> +
> +	/* SMP invokes this from setup_per_cpu_areas() */
> +	if (!IS_ENABLED(CONFIG_SMP))
> +		switch_to_real_gdt(me);
> +
>   	/* already set me in cpu_online_mask in boot_cpu_init() */
>   	cpumask_set_cpu(me, cpu_callout_mask);
>   	cpu_set_state_online(me);
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1164,7 +1164,7 @@ static void __init xen_setup_gdt(int cpu
>   	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
>   	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
>   
> -	switch_to_new_gdt(cpu);
> +	switch_to_real_gdt(cpu);

... can't you use the paravirt variant of load_gdt in switch_to_real_gdt() ?

>   
>   	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
>   	pv_ops.cpu.load_gdt = xen_load_gdt;
> 

Juergen



* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-18  5:11             ` Juergen Gross
@ 2022-07-18  6:54               ` Thomas Gleixner
  2022-07-18  8:55                 ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18  6:54 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On Mon, Jul 18 2022 at 07:11, Juergen Gross wrote:
> On 17.07.22 23:54, Thomas Gleixner wrote:
>>   void __init native_smp_prepare_boot_cpu(void)
>>   {
>>   	int me = smp_processor_id();
>> -	switch_to_new_gdt(me);
>> +
>> +	/* SMP invokes this from setup_per_cpu_areas() */
>> +	if (!IS_ENABLED(CONFIG_SMP))
>> +		switch_to_real_gdt(me);
>> +
>>   	/* already set me in cpu_online_mask in boot_cpu_init() */
>>   	cpumask_set_cpu(me, cpu_callout_mask);
>>   	cpu_set_state_online(me);
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -1164,7 +1164,7 @@ static void __init xen_setup_gdt(int cpu
>>   	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
>>   	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
>>   
>> -	switch_to_new_gdt(cpu);
>> +	switch_to_real_gdt(cpu);
>
> ... can't you use the paravirt variant of load_gdt in switch_to_real_gdt() ?

That does not solve the problem of having a disagreement between GDT and
GS_BASE. Let me dig into this some more.

Thanks,

        tglx


* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-18  6:54               ` Thomas Gleixner
@ 2022-07-18  8:55                 ` Thomas Gleixner
  2022-07-18  9:31                   ` Peter Zijlstra
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18  8:55 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt

On Mon, Jul 18 2022 at 08:54, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 07:11, Juergen Gross wrote:
>>> -	switch_to_new_gdt(cpu);
>>> +	switch_to_real_gdt(cpu);
>>
>> ... can't you use the paravirt variant of load_gdt in switch_to_real_gdt() ?
>
> That does not solve the problem of having a disagreement between GDT and
> GS_BASE. Let me dig into this some more.

Bah. The real problem is __loadsegment_simple(gs, 0). After that GS_BASE
is 0. So any per CPU access before setting MSR_GS_BASE back to working
state is going into lala land.

So it's not the GDT. It's the mov 0, %gs which makes stuff go south, but
as %gs is already 0, we can keep the paravirt load_gdt() and use
native_write_msr() and everything should be happy.
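
IOW, keep the switch_to_real_gdt() draft from before, but let the GDT load
go through the paravirt path. Roughly (sketch only, untested):

	void __init switch_to_real_gdt(int cpu)
	{
		struct desc_ptr gdt_descr;

		gdt_descr.address = (long)get_cpu_gdt_rw(cpu);
		gdt_descr.size = GDT_SIZE - 1;
		load_gdt(&gdt_descr);		/* paravirt variant is fine */

	#ifdef CONFIG_X86_32
		loadsegment(fs, __KERNEL_PERCPU);
	#else
		/* %gs is already 0. Only the base has to be switched over. */
		native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
	#endif
	}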

Thanks,

        tglx



* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-18  8:55                 ` Thomas Gleixner
@ 2022-07-18  9:31                   ` Peter Zijlstra
  2022-07-18 10:33                     ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-18  9:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juergen Gross, Andrew Cooper, LKML, x86, Linus Torvalds,
	Tim Chen, Josh Poimboeuf, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt

On Mon, Jul 18, 2022 at 10:55:29AM +0200, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 08:54, Thomas Gleixner wrote:
> > On Mon, Jul 18 2022 at 07:11, Juergen Gross wrote:
> >>> -	switch_to_new_gdt(cpu);
> >>> +	switch_to_real_gdt(cpu);
> >>
> >> ... can't you use the paravirt variant of load_gdt in switch_to_real_gdt() ?
> >
> > That does not solve the problem of having a disagreement between GDT and
> > GS_BASE. Let me dig into this some more.
> 
> Bah. The real problem is __loadsegment_simple(gs, 0). After that GS_BASE
> is 0. So any per CPU access before setting MSR_GS_BASE back to working
> state is going into lala land.
> 
> So it's not the GDT. It's the mov 0, %gs which makes stuff go south, but
> as %gs is already 0, we can keep the paravirt load_gdt() and use
> native_write_msr() and everything should be happy.

How is the ret from xen_load_gdt() not going to explode?


* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-18  9:31                   ` Peter Zijlstra
@ 2022-07-18 10:33                     ` Thomas Gleixner
  2022-07-18 11:42                       ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 10:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juergen Gross, Andrew Cooper, LKML, x86, Linus Torvalds,
	Tim Chen, Josh Poimboeuf, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt

On Mon, Jul 18 2022 at 11:31, Peter Zijlstra wrote:
> On Mon, Jul 18, 2022 at 10:55:29AM +0200, Thomas Gleixner wrote:
>> On Mon, Jul 18 2022 at 08:54, Thomas Gleixner wrote:
>> > On Mon, Jul 18 2022 at 07:11, Juergen Gross wrote:
>> >>> -	switch_to_new_gdt(cpu);
>> >>> +	switch_to_real_gdt(cpu);
>> >>
>> >> ... can't you use the paravirt variant of load_gdt in switch_to_real_gdt() ?
>> >
>> > That does not solve the problem of having a disagreement between GDT and
>> > GS_BASE. Let me dig into this some more.
>> 
>> Bah. The real problem is __loadsegment_simple(gs, 0). After that GS_BASE
>> is 0. So any per CPU access before setting MSR_GS_BASE back to working
>> state is going into lala land.
>> 
>> So it's not the GDT. It's the mov 0, %gs which makes stuff go south, but
>> as %gs is already 0, we can keep the paravirt load_gdt() and use
>> native_write_msr() and everything should be happy.
>
> How is the ret from xen_load_gdt() not going to explode?

This is only for the early boot _before_ all the patching happens. So
that goes through the default retthunk.

Secondary CPUs do not need that as they set up GDT and GS_BASE in the
low level asm code before coming out to C.

I'm still trying to figure out how this works on XENPV and on 32bit.

Sigh...



* Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
  2022-07-18 10:33                     ` Thomas Gleixner
@ 2022-07-18 11:42                       ` Thomas Gleixner
  0 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 11:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juergen Gross, Andrew Cooper, LKML, x86, Linus Torvalds,
	Tim Chen, Josh Poimboeuf, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt

On Mon, Jul 18 2022 at 12:33, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 11:31, Peter Zijlstra wrote:
>> On Mon, Jul 18, 2022 at 10:55:29AM +0200, Thomas Gleixner wrote:
>>> On Mon, Jul 18 2022 at 08:54, Thomas Gleixner wrote:
>>> > On Mon, Jul 18 2022 at 07:11, Juergen Gross wrote:
>>> >>> -	switch_to_new_gdt(cpu);
>>> >>> +	switch_to_real_gdt(cpu);
>>> >>
>>> >> ... can't you use the paravirt variant of load_gdt in switch_to_real_gdt() ?
>>> >
>>> > That does not solve the problem of having a disagreement between GDT and
>>> > GS_BASE. Let me dig into this some more.
>>> 
>>> Bah. The real problem is __loadsegment_simple(gs, 0). After that GS_BASE
>>> is 0. So any per CPU access before setting MSR_GS_BASE back to working
>>> state is going into lala land.
>>> 
>>> So it's not the GDT. It's the mov 0, %gs which makes stuff go south, but
>>> as %gs is already 0, we can keep the paravirt load_gdt() and use
>>> native_write_msr() and everything should be happy.
>>
>> How is the ret from xen_load_gdt() not going to explode?
>
> This is only for the early boot _before_ all the patching happens. So
> that goes through the default retthunk.
>
> Secondary CPUs do not need that as they set up GDT and GS_BASE in the
> low level asm code before coming out to C.
>
> I'm still trying to figure out how this works on XENPV and on 32bit.

On 32bit the CPU comes out with GDT and FS correctly set too.

For XEN_PV it looks like cpu_initialize_context() hands down GDT and
GSBASE info to the hypercall which kicks the CPU so we should be
good there as well. Emphasis on should. Jürgen?
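
A quick look suggests the relevant bits are roughly this (abridged from
memory, field names per the Xen vcpu_guest_context ABI; double check
against arch/x86/xen/smp_pv.c):

/*
 * Abridged sketch: the pieces which hand the GDT and the kernel GS_BASE
 * to the hypervisor before the vCPU is kicked. Error handling and the
 * GDT page permission fiddling are omitted.
 */
static int cpu_initialize_context_sketch(unsigned int cpu)
{
	struct vcpu_guest_context *ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
	int ret;

	if (!ctxt)
		return -ENOMEM;

	/* GDT of the target CPU: machine frame plus entry count */
	ctxt->gdt_frames[0] = arbitrary_virt_to_mfn(get_cpu_gdt_rw(cpu));
	ctxt->gdt_ents	    = GDT_ENTRIES;

	/* Kernel GS_BASE points at the target CPU's per CPU area */
	ctxt->gs_base_kernel = per_cpu_offset(cpu);

	/* Kick the vCPU with GDT and GS_BASE already in place */
	ret = HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt);
	kfree(ctxt);
	return ret;
}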

Thanks,

        tglx


* [patch 0/3] x86/cpu: Sanitize switch_to_new_gdt()
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
  2022-07-17  0:22   ` Andrew Cooper
@ 2022-07-18 17:52   ` Thomas Gleixner
  2022-07-18 17:52   ` [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt() Thomas Gleixner
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 17:52 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

This series is a replacement for patch 2/38 of the call depth stuffing
series as a follow up to the review feedback.

Thanks,

	tglx



* [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt()
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
  2022-07-17  0:22   ` Andrew Cooper
  2022-07-18 17:52   ` [patch 0/3] x86/cpu: Sanitize switch_to_new_gdt() Thomas Gleixner
@ 2022-07-18 17:52   ` Thomas Gleixner
  2022-07-18 18:43     ` Linus Torvalds
  2022-07-18 17:52   ` [patch 2/3] x86/cpu: Get rid of redundant switch_to_new_gdt() invocations Thomas Gleixner
  2022-07-18 17:52   ` [patch 3/3] x86/cpu: Re-enable stackprotector Thomas Gleixner
  4 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 17:52 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On 32bit FS and on 64bit GS segments are already set up correctly, but
load_percpu_segment() still sets [FG]S after switching from the early GDT
to the direct GDT.

For 32bit the segment load has no side effects, but on 64bit it causes
GSBASE to become 0, which means that any per CPU access before GSBASE is
set to the new value is going to fault. That's the reason why the whole
file containing this code has stackprotector removed.

But that's a pointless exercise for both 32 and 64 bit as the relevant
segment selector is already correct. Loading the new GDT does not change
that.

Remove the segment loads and inline load_percpu_segment() into the only
caller. Add comments while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    1 
 arch/x86/kernel/cpu/common.c     |   42 ++++++++++++++++++++++++---------------
 2 files changed, 26 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -673,7 +673,6 @@ extern struct desc_ptr		early_gdt_descr;
 extern void switch_to_new_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
-extern void load_percpu_segment(int);
 extern void cpu_init(void);
 extern void cpu_init_secondary(void);
 extern void cpu_init_exception_handling(void);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -701,16 +701,6 @@ static const char *table_lookup_model(st
 __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
-void load_percpu_segment(int cpu)
-{
-#ifdef CONFIG_X86_32
-	loadsegment(fs, __KERNEL_PERCPU);
-#else
-	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
-#endif
-}
-
 #ifdef CONFIG_X86_32
 /* The 32-bit entry code needs to find cpu_entry_area. */
 DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
@@ -738,16 +728,36 @@ void load_fixmap_gdt(int cpu)
 }
 EXPORT_SYMBOL_GPL(load_fixmap_gdt);
 
-/*
- * Current gdt points %fs at the "master" per-cpu area: after this,
- * it's on the real one.
+/**
+ * switch_to_new_gdt - Switch from early GDT to the direct one
+ * @cpu:	The CPU number for which this is invoked
+ *
+ * Invoked during early boot to switch from early GDT and early per CPU
+ * (%fs on 32bit, GS_BASE on 64bit) to the direct GDT and the runtime per
+ * CPU area.
  */
 void switch_to_new_gdt(int cpu)
 {
-	/* Load the original GDT */
 	load_direct_gdt(cpu);
-	/* Reload the per-cpu base */
-	load_percpu_segment(cpu);
+
+	/*
+	 * No need to load the %gs (%fs for 32bit) segment. It is already
+	 * correct, %gs is 0 on 64bit and %fs is __KERNEL_PERCPU on 32 bit.
+	 *
+	 * Writing %gs on 64bit would zero GSBASE which would make any per
+	 * CPU operation up to the point of the wrmsrl() fault.
+	 *
+	 * 64bit requires to point GSBASE to the new offset. Until the
+	 * wrmsrl() happens the early mapping is still valid. That means
+	 * the GSBASE update will lose any prior per CPU data which was
+	 * not copied over in setup_per_cpu_areas().
+	 *
+	 * For secondary CPUs this is not a problem because they start
+	 * already with the direct GDT and the real GSBASE. This invocation
+	 * is pointless and will be removed in a subsequent step.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64))
+		wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
 }
 
 static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};



* [patch 2/3] x86/cpu: Get rid of redundant switch_to_new_gdt() invocations
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
                     ` (2 preceding siblings ...)
  2022-07-18 17:52   ` [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt() Thomas Gleixner
@ 2022-07-18 17:52   ` Thomas Gleixner
  2022-07-18 17:52   ` [patch 3/3] x86/cpu: Re-enable stackprotector Thomas Gleixner
  4 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 17:52 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

The only place where switch_to_new_gdt() is required is early boot to
switch from the early GDT to the direct GDT. Any other invocation is
completely redundant because it does not change anything.

Secondary CPUs come out of the ASM code with GDT and GSBASE correctly set
up. The same is true for XEN_PV.

Remove all the voodoo invocations which are left overs from the ancient
past, rename the function to switch_to_direct_gdt() and mark it init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    2 +-
 arch/x86/kernel/cpu/common.c     |   14 ++------------
 arch/x86/kernel/setup_percpu.c   |    2 +-
 arch/x86/kernel/smpboot.c        |    6 +++++-
 arch/x86/xen/enlighten_pv.c      |    2 +-
 5 files changed, 10 insertions(+), 16 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -670,7 +670,7 @@ extern int sysenter_setup(void);
 /* Defined in head.S */
 extern struct desc_ptr		early_gdt_descr;
 
-extern void switch_to_new_gdt(int);
+extern void switch_to_direct_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
 extern void cpu_init(void);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -729,14 +729,14 @@ void load_fixmap_gdt(int cpu)
 EXPORT_SYMBOL_GPL(load_fixmap_gdt);
 
 /**
- * switch_to_new_gdt - Switch from early GDT to the direct one
+ * switch_to_direct_gdt - Switch from early GDT to the direct one
  * @cpu:	The CPU number for which this is invoked
  *
  * Invoked during early boot to switch from early GDT and early per CPU
  * (%fs on 32bit, GS_BASE on 64bit) to the direct GDT and the runtime per
  * CPU area.
  */
-void switch_to_new_gdt(int cpu)
+void __init switch_to_direct_gdt(int cpu)
 {
 	load_direct_gdt(cpu);
 
@@ -751,10 +751,6 @@ void switch_to_new_gdt(int cpu)
 	 * wrmsrl() happens the early mapping is still valid. That means
 	 * the GSBASE update will lose any prior per CPU data which was
 	 * not copied over in setup_per_cpu_areas().
-	 *
-	 * For secondary CPUs this is not a problem because they start
-	 * already with the direct GDT and the real GSBASE. This invocation
-	 * is pointless and will be removed in a subsequent step.
 	 */
 	if (IS_ENABLED(CONFIG_X86_64))
 		wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
@@ -2238,12 +2234,6 @@ void cpu_init(void)
 	    boot_cpu_has(X86_FEATURE_TSC) || boot_cpu_has(X86_FEATURE_DE))
 		cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
 
-	/*
-	 * Initialize the per-CPU GDT with the boot GDT,
-	 * and set up the GDT descriptor:
-	 */
-	switch_to_new_gdt(cpu);
-
 	if (IS_ENABLED(CONFIG_X86_64)) {
 		loadsegment(fs, 0);
 		memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -211,7 +211,7 @@ void __init setup_per_cpu_areas(void)
 		 * area.  Reload any changed state for the boot CPU.
 		 */
 		if (!cpu)
-			switch_to_new_gdt(cpu);
+			switch_to_direct_gdt(cpu);
 	}
 
 	/* indicate the early static arrays will soon be gone */
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1457,7 +1457,11 @@ void arch_thaw_secondary_cpus_end(void)
 void __init native_smp_prepare_boot_cpu(void)
 {
 	int me = smp_processor_id();
-	switch_to_new_gdt(me);
+
+	/* SMP invokes this from setup_per_cpu_areas() */
+	if (!IS_ENABLED(CONFIG_SMP))
+		switch_to_direct_gdt(me);
+
 	/* already set me in cpu_online_mask in boot_cpu_init() */
 	cpumask_set_cpu(me, cpu_callout_mask);
 	cpu_set_state_online(me);
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1164,7 +1164,7 @@ static void __init xen_setup_gdt(int cpu
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_ops.cpu.load_gdt = xen_load_gdt_boot;
 
-	switch_to_new_gdt(cpu);
+	switch_to_direct_gdt(cpu);
 
 	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
 	pv_ops.cpu.load_gdt = xen_load_gdt;



* [patch 3/3] x86/cpu: Re-enable stackprotector
  2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
                     ` (3 preceding siblings ...)
  2022-07-18 17:52   ` [patch 2/3] x86/cpu: Get rid of redundant switch_to_new_gdt() invocations Thomas Gleixner
@ 2022-07-18 17:52   ` Thomas Gleixner
  4 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 17:52 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

Commit 5416c2663517 ("x86: make sure load_percpu_segment has no
stackprotector") disabled the stackprotector for cpu/common.c because of
load_percpu_segment(). Back then the boot stack canary was initialized very
early in start_kernel(). Switching the per CPU area by loading the GDT
caused the stackprotector to fail with paravirt enabled kernels as the
GSBASE was not updated yet. In hindsight a wrong change because it would
have been sufficient to ensure that the canary is the same in both per CPU
areas.

Commit d55535232c3d ("random: move rand_initialize() earlier") moved the
stack canary initialization to a later point in the init sequence. As a
consequence the per CPU stack canary is 0 when switching the per CPU areas,
so there is no requirement anymore to exclude this file.
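
To illustrate why that is sufficient, a conceptual C model of the
compiler emitted check (illustrative only, not the in-tree code):

extern void __stack_chk_fail(void);

struct pcpu { unsigned long stack_canary; };

static struct pcpu early_area  = { .stack_canary = 0 };
static struct pcpu direct_area = { .stack_canary = 0 };
static struct pcpu *gs_base = &early_area;	/* stands in for GS_BASE */

static void gdt_switch_with_canary_check(void)
{
	/* prologue: load the canary through the per CPU base */
	unsigned long canary = gs_base->stack_canary;

	/* the GS_BASE switch to the runtime per CPU area happens here */
	gs_base = &direct_area;

	/* epilogue: compare against the (possibly new) per CPU copy */
	if (canary != gs_base->stack_canary)
		__stack_chk_fail();	/* never fires: both canaries are 0 */
}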

Add a comment to switch_to_direct_gdt().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/Makefile |    3 ---
 arch/x86/kernel/cpu/common.c |    3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -16,9 +16,6 @@ KCOV_INSTRUMENT_perf_event.o := n
 # As above, instrumenting secondary CPU boot code causes boot hangs.
 KCSAN_SANITIZE_common.o := n
 
-# Make sure load_percpu_segment has no stackprotector
-CFLAGS_common.o		:= -fno-stack-protector
-
 obj-y			:= cacheinfo.o scattered.o topology.o
 obj-y			+= common.o
 obj-y			+= rdrand.o
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -751,6 +751,9 @@ void __init switch_to_direct_gdt(int cpu
 	 * wrmsrl() happens the early mapping is still valid. That means
 	 * the GSBASE update will lose any prior per CPU data which was
 	 * not copied over in setup_per_cpu_areas().
+	 *
+	 * This works even with stackprotector enabled because the
+	 * per CPU stack canary is 0 in both per CPU areas.
 	 */
 	if (IS_ENABLED(CONFIG_X86_64))
 		wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));



* Re: [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt()
  2022-07-18 17:52   ` [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt() Thomas Gleixner
@ 2022-07-18 18:43     ` Linus Torvalds
  2022-07-18 18:55       ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 18:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

So I appreciate the added big comments in this code, but looking at this patch:

On Mon, Jul 18, 2022 at 10:52 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> +/**
> + * switch_to_new_gdt - Switch from early GDT to the direct one
> + * @cpu:       The CPU number for which this is invoked
> + *
> + * Invoked during early boot to switch from early GDT and early per CPU
> + * (%fs on 32bit, GS_BASE on 64bit) to the direct GDT and the runtime per
> + * CPU area.
>   */
>  void switch_to_new_gdt(int cpu)
>  {
> -       /* Load the original GDT */
>         load_direct_gdt(cpu);
> -       /* Reload the per-cpu base */
> -       load_percpu_segment(cpu);
> +
> +       /*
> +        * No need to load the %gs (%fs for 32bit) segment. It is already
> +        * correct, %gs is 0 on 64bit and %fs is __KERNEL_PERCPU on 32 bit.
> +        *
> +        * Writing %gs on 64bit would zero GSBASE which would make any per
> +        * CPU operation up to the point of the wrmsrl() fault.
> +        *
> +        * 64bit requires to point GSBASE to the new offset. Until the
> +        * wrmsrl() happens the early mapping is still valid. That means
> +        * the GSBASE update will lose any prior per CPU data which was
> +        * not copied over in setup_per_cpu_areas().
> +        *
> +        * For secondary CPUs this is not a problem because they start
> +        * already with the direct GDT and the real GSBASE. This invocation
> +        * is pointless and will be removed in a subsequent step.
> +        */
> +       if (IS_ENABLED(CONFIG_X86_64))
> +               wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>  }

... while those comments are nice and all, I do think this retains the
basic insanity of having "switch_to_new_gdt()" do magical things on
x86-64 that don't really match the name.

So honestly, I'd be happier if that whole

       if (IS_ENABLED(CONFIG_X86_64))
               wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));

was migrated to the callers instead. There aren't *that* many callers.

I expect that it is then quite possible that several of the call-sites
would go "GS_BASE is already correct here, I can remove this".

But even if every single caller keeps that wrmsrl() around, at least
it wouldn't be hidden behind a function call that has a name that
implies something completely different is happening.

And no, I don't care *that* deeply, so this is just a suggestion.

But wouldn't it be nice if this function was actually named by what it
does, rather than by what it used to do back in the i386 days when the
GDT affected the segment bases?

                  Linus


* Re: [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt()
  2022-07-18 18:43     ` Linus Torvalds
@ 2022-07-18 18:55       ` Thomas Gleixner
  0 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 18:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18 2022 at 11:43, Linus Torvalds wrote:
> So I appreciate the added big comments in this code, but looking at this patch:
> On Mon, Jul 18, 2022 at 10:52 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> +        * For secondary CPUs this is not a problem because they start
>> +        * already with the direct GDT and the real GSBASE. This invocation
>> +        * is pointless and will be removed in a subsequent step.
>> +        */
>> +       if (IS_ENABLED(CONFIG_X86_64))
>> +               wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>>  }
>
> ... while those comments are nice and all, I do think this retains the
> basic insanity of having "switch_to_new_gdt()" do magical things on
> x86-64 that don't really match the name.
>
> So honestly, I'd be happier if that whole
>
>        if (IS_ENABLED(CONFIG_X86_64))
>                wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
>
> was migrated to the callers instead. There aren't *that* many callers.
>
> I expect that it is then quite possible that several of the call-sites
> would go "GS_BASE is already correct here, I can remove this".

With the next patch we have only two left. The SMP and the UP case. Let
me look whether the UP needs it at all.

> But even if every single caller keeps that wrmsrl() around, at least
> it wouldn't be hidden behind a function call that has a name that
> implies something completely different is happening.
>
> And no, I don't care *that* deeply, so this is just a suggestion.
>
> But wouldn't it be nice if this function was actually named by what it
> does, rather than by what it used to do back in the i386 days when the
> GDT affected the segment bases?

Yes. Let me come up with a sensible name.

Thanks,

        tglx


* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (38 preceding siblings ...)
  2022-07-17  9:45 ` [patch 00/38] x86/retbleed: Call " David Laight
@ 2022-07-18 19:29 ` Thomas Gleixner
  2022-07-18 19:30   ` Thomas Gleixner
  2022-07-18 19:55 ` Thomas Gleixner
                   ` (2 subsequent siblings)
  42 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 19:29 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Sun, Jul 17 2022 at 01:17, Thomas Gleixner wrote:
> The function alignment option does not work for that because it just
> guarantees that the next function entry is aligned, but the padding size
> depends on the position of the last instruction of the previous function
> which might be anything between 0 and padsize-1 obviously, which is not a
> good starting point to put 10 bytes of accounting code into it reliably.
>
> I hacked up GCC to emit such padding and from first experimentation it
> brings quite some performance back.
>
>            	      	 IBRS	    stuff       stuff(pad)
> sockperf 14   bytes: 	 -23.76%    -19.26%     -14.31%
> sockperf 1472 bytes: 	 -22.51%    -18.40%     -12.25%
> microbench:   	     	 +37.20%    +18.46%     +15.47%    
> hackbench:	     	 +21.24%    +10.94%     +10.12%
>
> For FIO I don't have numbers yet, but I expect FIO to get a significant
> gain too.
>
> From a quick survey it seems to have no impact for the case where the
> thunks are not used. But that really needs some deep investigation and
> there is a potential conflict with the clang CFI efforts.
>
> The kernel text size increases with a Debian config from 9.9M to 10.4M, so
> about 5%. If the thunk is not 16 byte aligned, the text size increase is
> about 3%, but it turned out that 16 byte aligned is slightly faster.
>
> The 16 byte function alignment turned out to be beneficial in general even
> without the thunks. Not much of an improvement, but measurable. We should
> revisit this independent of these horrors.
>
> The implementation falls back to the allocated thunks when padding is not
> available. I'll send out the GCC patch and the required kernel patch as a
> reply to this series after polishing it a bit.

Here it goes. GCC hackery first.

---
Subject: gcc: Add padding in front of function entry points
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 15 Jul 2022 14:37:53 +0200

For testing purposes:

Add a 16 byte padding filled with int3 in front of each function entry
so the kernel can put call depth accounting into it.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 gcc/config/i386/i386.cc  |   11 +++++++++++
 gcc/config/i386/i386.h   |    7 +++++++
 gcc/config/i386/i386.opt |    4 ++++
 gcc/doc/invoke.texi      |    6 ++++++
 4 files changed, 28 insertions(+)

--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6182,6 +6182,17 @@ ix86_code_end (void)
     file_end_indicate_split_stack ();
 }
 
+void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+				const char *fnname ATTRIBUTE_UNUSED)
+{
+  if (flag_force_function_padding)
+    {
+      fprintf (asm_out_file, "\t.align 16\n");
+      fprintf (asm_out_file, "\t.skip 16,0xcc\n");
+    }
+}
+
 /* Emit code for the SET_GOT patterns.  */
 
 const char *
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2860,6 +2860,13 @@ extern enum attr_cpu ix86_schedule;
 #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
 #endif
 
+#include <stdio.h>
+extern void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+				const char *fnname ATTRIBUTE_UNUSED);
+#undef ASM_OUTPUT_FUNCTION_PREFIX
+#define ASM_OUTPUT_FUNCTION_PREFIX x86_asm_output_function_prefix
+
 /*
 Local variables:
 version-control: t
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1064,6 +1064,10 @@ mindirect-branch=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
 Convert indirect call and jump to call and return thunks.
 
+mforce-function-padding
+Target Var(flag_force_function_padding) Init(0)
+Put a 16 byte padding area before each function
+
 mfunction-return=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
 Convert function return to call and return thunk.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1451,6 +1451,7 @@ See RS/6000 and PowerPC Options.
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mharden-sls=@var{choice} @gol
 -mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access}
+-mforce-function-padding @gol
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -32849,6 +32850,11 @@ Force all calls to functions to be indir
 when using Intel Processor Trace where it generates more precise timing
 information for function calls.
 
+@item -mforce-function-padding
+@opindex -mforce-function-padding
+Force a 16 byte padding area before each function which allows run-time
+code patching to put a special prologue before the function entry.
+
 @item -mmanual-endbr
 @opindex mmanual-endbr
 Insert ENDBR instruction at function entry only via the @code{cf_check}


* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 19:29 ` Thomas Gleixner
@ 2022-07-18 19:30   ` Thomas Gleixner
  2022-07-18 19:51     ` Linus Torvalds
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 19:30 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18 2022 at 21:29, Thomas Gleixner wrote:
>> The implementation falls back to the allocated thunks when padding is not
>> available. I'll send out the GCC patch and the required kernel patch as a
>> reply to this series after polishing it a bit.
>
> Here it goes. GCC hackery first.

And the kernel counterpart.

---
Subject: x86/callthunks: Put thunks into compiler provided padding area
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 15 Jul 2022 16:12:47 +0200

      - NOT FOR INCLUSION -

Let the compiler add a 16 byte padding in front of each function entry
point and put the call depth accounting there. That avoids calling out
into the module area and reduces ITLB pressure.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig             |   14 ++++++
 arch/x86/Makefile            |    4 +
 arch/x86/kernel/callthunks.c |   99 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 115 insertions(+), 2 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2440,6 +2440,9 @@ config CC_HAS_SLS
 config CC_HAS_RETURN_THUNK
 	def_bool $(cc-option,-mfunction-return=thunk-extern)
 
+config CC_HAS_PADDING
+	def_bool $(cc-option,-mforce-function-padding)
+
 config HAVE_CALL_THUNKS
 	def_bool y
 	depends on RETHUNK && OBJTOOL
@@ -2512,6 +2515,17 @@ config CALL_DEPTH_TRACKING
 	  of this option is marginal as the call depth tracking is using
 	  run-time generated call thunks and call patching.
 
+config CALL_THUNKS_IN_PADDING
+	bool "Put call depth into padding area before function"
+	depends on CALL_DEPTH_TRACKING && CC_HAS_PADDING
+	default n
+	help
+	  Put the call depth accounting into a padding area before the
+	  function entry. This padding area is generated by the
+	  compiler. This increases text size by ~5%. For non affected
+	  systems this space is unused. On affected SKL systems this
+	  results in a significant performance gain.
+
 config CALL_THUNKS_DEBUG
 	bool "Enable call thunks and call depth tracking debugging"
 	depends on CALL_DEPTH_TRACKING
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -197,6 +197,10 @@ ifdef CONFIG_SLS
   KBUILD_CFLAGS += -mharden-sls=all
 endif
 
+ifdef CONFIG_CALL_THUNKS_IN_PADDING
+  KBUILD_CFLAGS += -mforce-function-padding
+endif
+
 KBUILD_LDFLAGS += -m elf_$(UTS_MACHINE)
 
 ifdef CONFIG_LTO_CLANG
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -92,6 +92,7 @@ struct thunk_mem {
 
 struct thunk_mem_area {
 	struct thunk_mem	*tmem;
+	unsigned long		*dests;
 	unsigned long		start;
 	unsigned long		nthunks;
 };
@@ -181,6 +182,16 @@ static __init_or_module void callthunk_f
 	      tmem->base + area->start * callthunk_desc.thunk_size,
 	      area->start, area->nthunks);
 
+	/* Remove thunks in the padding area */
+	for (i = 0; area->dests && i < area->nthunks; i++) {
+		void *dest = (void *)area->dests[i];
+
+		if (!dest)
+			continue;
+		pr_info("Remove %px at index %u\n", dest, i);
+		btree_remove64(&call_thunks, (unsigned long)dest);
+	}
+
 	/* Jump starts right after the template */
 	thunk = tmem->base + area->start * callthunk_desc.thunk_size;
 	tp = thunk + callthunk_desc.template_size;
@@ -204,6 +215,7 @@ static __init_or_module void callthunk_f
 		size = area->nthunks * callthunk_desc.thunk_size;
 		text_poke_set_locked(thunk, 0xcc, size);
 	}
+	vfree(area->dests);
 	kfree(area);
 }
 
@@ -289,7 +301,8 @@ patch_paravirt_call_sites(struct paravir
 		patch_call(p->instr, layout);
 }
 
-static struct thunk_mem_area *callthunks_alloc(unsigned int nthunks)
+static struct thunk_mem_area *callthunks_alloc(unsigned int nthunks,
+					       bool module)
 {
 	struct thunk_mem_area *area;
 	unsigned int size, mapsize;
@@ -299,6 +312,13 @@ static struct thunk_mem_area *callthunks
 	if (!area)
 		return NULL;
 
+	if (module) {
+		area->dests = vzalloc(nthunks * sizeof(unsigned long));
+		if (!area->dests)
+			goto free_area;
+		pr_info("Allocated dests array: %px\n", area->dests);
+	}
+
 	list_for_each_entry(tmem, &thunk_mem_list, list) {
 		unsigned long start;
 
@@ -340,6 +360,7 @@ static struct thunk_mem_area *callthunks
 free_tmem:
 	kfree(tmem);
 free_area:
+	vfree(area->dests);
 	kfree(area);
 	return NULL;
 }
@@ -372,6 +393,73 @@ static __init_or_module int callthunk_se
 	return 0;
 }
 
+int setup_padding_thunks(s32 *start, s32 *end, struct thunk_mem_area *area,
+			 struct module_layout *layout)
+{
+	int nthunks = 0, idx = 0;
+	s32 *s;
+
+	if (callthunk_desc.template_size > 16)
+		return 0;
+
+	for (s = start; s < end; s++) {
+		void *thunk, *tp, *dest = (void *)s + *s;
+		unsigned long key = (unsigned long)dest;
+		int fail, i;
+		u8 opcode;
+
+		if (is_inittext(layout, dest)) {
+			prdbg("Ignoring init dest: %pS %px\n", dest, dest);
+			return 0;
+		}
+
+		/* Multiple symbols can have the same location. */
+		if (btree_lookup64(&call_thunks, key)) {
+			prdbg("Ignoring duplicate dest: %pS %px\n", dest, dest);
+			continue;
+		}
+
+		thunk = tp = dest - 16;
+		prdbg("Probing dest: %pS %px at %px\n", dest, dest, tp);
+		pagefault_disable();
+		fail = 0;
+		for (i = 0; !fail && i < 16; i++) {
+			if (get_kernel_nofault(opcode, tp + i)) {
+				fail = 1;
+			} else if (opcode != 0xcc) {
+				fail = 2;
+			}
+		}
+		pagefault_enable();
+		switch (fail) {
+		case 1:
+			prdbg("Faulted for dest: %pS %px\n", dest, dest);
+			nthunks++;
+			continue;
+		case 2:
+			prdbg("No padding for dest: %pS %px\n", dest, dest);
+			nthunks++;
+			continue;
+		}
+
+		prdbg("Thunk for dest: %pS %px at %px\n", dest, dest, tp);
+		memcpy(tp, callthunk_desc.template, callthunk_desc.template_size);
+		tp += callthunk_desc.template_size;
+		memcpy(tp, x86_nops[6], 6);
+
+		if (area->dests) {
+			pr_info("Insert %px at index %d\n", dest, idx);
+			area->dests[idx++] = key;
+		}
+
+		fail = btree_insert64(&call_thunks, key, (void *)thunk, GFP_KERNEL);
+		if (fail)
+			return fail;
+	}
+	prdbg("%d external thunks required\n", nthunks);
+	return 0;
+}
+
 static __init_or_module int callthunks_setup(struct callthunk_sites *cs,
 					     struct module_layout *layout)
 {
@@ -394,7 +482,7 @@ static __init_or_module int callthunks_s
 	if (!nthunks)
 		goto patch;
 
-	area = callthunks_alloc(nthunks);
+	area = callthunks_alloc(nthunks, !!layout->mtn.mod);
 	if (!area)
 		return -ENOMEM;
 
@@ -420,6 +508,13 @@ static __init_or_module int callthunks_s
 		prdbg("Using thunk vbuf %px\n", vbuf);
 	}
 
+	if (IS_ENABLED(CONFIG_CALL_THUNKS_IN_PADDING)) {
+		ret = setup_padding_thunks(cs->syms_start, cs->syms_end,
+					   area, layout);
+		if (ret < 0)
+			goto fail;
+	}
+
 	for (s = cs->syms_start; s < cs->syms_end; s++) {
 		void *dest = (void *)s + *s;
 


* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 19:30   ` Thomas Gleixner
@ 2022-07-18 19:51     ` Linus Torvalds
  2022-07-18 20:44       ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 19:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 12:30 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Let the compiler add a 16 byte padding in front of each function entry
> point and put the call depth accounting there. That avoids calling out
> into the module area and reduces ITLB pressure.

Ooh.

I actually like this a lot better.

Could we just say "use this instead if you have SKL and care about the issue?"

I don't hate your module thunk trick, but this does seem *so* much
simpler, and if it performs better anyway, it really does seem like
the better approach.

And people and distros who care would have an easy time adding that
simple compiler patch instead.

I do think that for generality, the "-mforce-function-padding" option
should perhaps take as an argument how much padding (and how much
alignment) to force:

    -mforce-function-padding=5:16

would force 5 bytes of minimum padding, and align functions to 16
bytes. It should be easy to generate (no more complexity than your
current one) by just making the output do

        .skip 5,0xcc
        .p2align 4,0xcc

and now you can specify that you only need X bytes of padding, for example.

                        Linus


* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (39 preceding siblings ...)
  2022-07-18 19:29 ` Thomas Gleixner
@ 2022-07-18 19:55 ` Thomas Gleixner
  2022-07-19 10:24 ` Virt " Andrew Cooper
  2022-07-20 16:57 ` [patch 00/38] x86/retbleed: " Steven Rostedt
  42 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 19:55 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Sun, Jul 17 2022 at 01:17, Thomas Gleixner wrote:
> For 4 RET paths randomized with randomize_kstack_offset=y and RSP bit 3, 6, 5:
>
>           	  	IBRS       stuff	 stuff(pad)    confuse
>   microbench:	       	+37.20%	   +18.46%	 +15.47%       +7.46%	 
>   sockperf 14   bytes: 	-23.76%	   -19.26% 	 -14.31%      -16.80%
>   sockperf 1472 bytes: 	-22.51%	   -18.40% 	 -12.25%      -15.95%
>
> So for the more randomized variant sockperf tanks and is already slower
> than stuffing with thunks in the compiler provided padding space.
>
> I send out a patch in reply to this series which implements that variant,
> but there needs to be input from the security researchers how protective
> this is. If we could get away with 2 RET paths (perhaps multiple instances
> with different bits), that would be amazing.

Here it goes.
---

Subject: x86/retbleed: Add confusion mitigation
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 15 Jul 2022 11:41:05 +0200

- NOT FOR INCLUSION -

Experimental option to confuse the return path by randomization.

The following command line options enable this:

    retbleed=confuse		4 return paths
    retbleed=confuse,4		4 return paths
    retbleed=confuse,3		3 return paths
    retbleed=confuse,2		2 return paths

This needs scrutiny by security researchers.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig                     |   12 ++++++
 arch/x86/include/asm/nospec-branch.h |   23 +++++++++++
 arch/x86/kernel/cpu/bugs.c           |   41 +++++++++++++++++++++
 arch/x86/lib/retpoline.S             |   68 +++++++++++++++++++++++++++++++++++
 include/linux/randomize_kstack.h     |    6 +++
 kernel/entry/common.c                |    3 +
 6 files changed, 153 insertions(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2538,6 +2538,18 @@ config CALL_THUNKS_DEBUG
 	  Only enable this, when you are debugging call thunks as this
	  creates a noticeable runtime overhead. If unsure say N.
 
+config RETURN_CONFUSION
+	bool "Mitigate RSB underflow with return confusion"
+	depends on CPU_SUP_INTEL && RETHUNK && RANDOMIZE_KSTACK_OFFSET
+	default y
+	help
+	  Compile the kernel with return path confusion to mitigate the
+	  Intel SKL Return-Speculation-Buffer (RSB) underflow issue. The
+	  mitigation is off by default and needs to be enabled on the
+	  kernel command line via the retbleed=confuse option. For
+	  non-affected systems the overhead of this option is marginal as
+	  the return thunk jumps are patched to direct ret instructions.
+
 config CPU_IBPB_ENTRY
 	bool "Enable IBPB on kernel entry"
 	depends on CPU_SUP_AMD
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -312,6 +312,29 @@ static inline void x86_set_skl_return_th
 
 #endif
 
+#ifdef CONFIG_RETURN_CONFUSION
+extern void __x86_return_confused_skl2(void);
+extern void __x86_return_confused_skl3(void);
+extern void __x86_return_confused_skl4(void);
+
+static inline void x86_set_skl_confused_return_thunk(int which)
+{
+	switch (which) {
+	case 2:
+		x86_return_thunk = &__x86_return_confused_skl2;
+		break;
+	case 3:
+		x86_return_thunk = &__x86_return_confused_skl3;
+		break;
+	case 4:
+		x86_return_thunk = &__x86_return_confused_skl4;
+		break;
+	}
+}
+#else
+static inline void x86_set_skl_confused_return_thunk(int which) { }
+#endif
+
 #ifdef CONFIG_RETPOLINE
 
 #define GEN(reg) \
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -14,6 +14,7 @@
 #include <linux/module.h>
 #include <linux/nospec.h>
 #include <linux/prctl.h>
+#include <linux/randomize_kstack.h>
 #include <linux/sched/smt.h>
 #include <linux/pgtable.h>
 #include <linux/bpf.h>
@@ -785,6 +786,7 @@ enum retbleed_mitigation {
 	RETBLEED_MITIGATION_IBRS,
 	RETBLEED_MITIGATION_EIBRS,
 	RETBLEED_MITIGATION_STUFF,
+	RETBLEED_MITIGATION_CONFUSE,
 };
 
 enum retbleed_mitigation_cmd {
@@ -793,6 +795,7 @@ enum retbleed_mitigation_cmd {
 	RETBLEED_CMD_UNRET,
 	RETBLEED_CMD_IBPB,
 	RETBLEED_CMD_STUFF,
+	RETBLEED_CMD_CONFUSE,
 };
 
 const char * const retbleed_strings[] = {
@@ -802,6 +805,7 @@ const char * const retbleed_strings[] =
 	[RETBLEED_MITIGATION_IBRS]	= "Mitigation: IBRS",
 	[RETBLEED_MITIGATION_EIBRS]	= "Mitigation: Enhanced IBRS",
 	[RETBLEED_MITIGATION_STUFF]	= "Mitigation: Stuffing",
+	[RETBLEED_MITIGATION_CONFUSE]	= "Mitigation: Return confusion",
 };
 
 static enum retbleed_mitigation retbleed_mitigation __ro_after_init =
@@ -810,6 +814,7 @@ static enum retbleed_mitigation_cmd retb
 	RETBLEED_CMD_AUTO;
 
 static int __ro_after_init retbleed_nosmt = false;
+static int __ro_after_init rethunk_confuse_skl = 4;
 
 static int __init retbleed_parse_cmdline(char *str)
 {
@@ -833,8 +838,19 @@ static int __init retbleed_parse_cmdline
 			retbleed_cmd = RETBLEED_CMD_IBPB;
 		} else if (!strcmp(str, "stuff")) {
 			retbleed_cmd = RETBLEED_CMD_STUFF;
+		} else if (!strcmp(str, "confuse")) {
+			retbleed_cmd = RETBLEED_CMD_CONFUSE;
 		} else if (!strcmp(str, "nosmt")) {
 			retbleed_nosmt = true;
+		} else if (retbleed_cmd == RETBLEED_CMD_CONFUSE &&
+			   !kstrtouint(str, 10, &rethunk_confuse_skl)) {
+
+			if (rethunk_confuse_skl < 2 ||
+			    rethunk_confuse_skl > 4) {
+				pr_err("Ignoring out-of-bound stuff count (%d).",
+				       rethunk_confuse_skl);
+				rethunk_confuse_skl = 4;
+			}
 		} else {
 			pr_err("Ignoring unknown retbleed option (%s).", str);
 		}
@@ -896,6 +912,25 @@ static void __init retbleed_select_mitig
 		}
 		break;
 
+	case RETBLEED_CMD_CONFUSE:
+		if (IS_ENABLED(CONFIG_RETURN_CONFUSION) &&
+		    spectre_v2_enabled == SPECTRE_V2_RETPOLINE &&
+		    random_kstack_offset_enabled()) {
+			retbleed_mitigation = RETBLEED_MITIGATION_CONFUSE;
+		} else {
+			if (IS_ENABLED(CONFIG_RETURN_CONFUSION) &&
+			    random_kstack_offset_enabled())
+				pr_err("WARNING: retbleed=confuse depends on randomize_kstack_offset=y\n");
+			else if (IS_ENABLED(CONFIG_RETURN_CONFUSION) &&
+				 spectre_v2_enabled != SPECTRE_V2_RETPOLINE)
+				pr_err("WARNING: retbleed=confuse depends on spectre_v2=retpoline\n");
+			else
+				pr_err("WARNING: kernel not compiled with RETURN_CONFUSION.\n");
+
+			goto do_cmd_auto;
+		}
+		break;
+
 do_cmd_auto:
 	case RETBLEED_CMD_AUTO:
 	default:
@@ -939,6 +974,11 @@ static void __init retbleed_select_mitig
 		x86_set_skl_return_thunk();
 		break;
 
+	case RETBLEED_MITIGATION_CONFUSE:
+		setup_force_cpu_cap(X86_FEATURE_RETHUNK);
+		x86_set_skl_confused_return_thunk(rethunk_confuse_skl);
+		break;
+
 	default:
 		break;
 	}
@@ -1389,6 +1429,7 @@ static void __init spectre_v2_select_mit
 		    boot_cpu_has_bug(X86_BUG_RETBLEED) &&
 		    retbleed_cmd != RETBLEED_CMD_OFF &&
 		    retbleed_cmd != RETBLEED_CMD_STUFF &&
+		    retbleed_cmd != RETBLEED_CMD_CONFUSE &&
 		    boot_cpu_has(X86_FEATURE_IBRS) &&
 		    boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
 			mode = SPECTRE_V2_IBRS;
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -230,3 +230,71 @@ SYM_FUNC_START(__x86_return_skl)
 SYM_FUNC_END(__x86_return_skl)
 
 #endif /* CONFIG_CALL_DEPTH_TRACKING */
+
+#ifdef CONFIG_RETURN_CONFUSION
+	.align 64
+SYM_FUNC_START(__x86_return_confused_skl4)
+	ANNOTATE_NOENDBR
+	testq	$3, %rsp
+	jz	1f
+
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+1:
+	testq	$6, %rsp
+	jz	2f
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+
+2:
+	testq	$5, %rsp
+	jz	3f
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+3:
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+SYM_FUNC_END(__x86_return_confused_skl4)
+
+	.align 64
+SYM_FUNC_START(__x86_return_confused_skl3)
+	ANNOTATE_NOENDBR
+	testq	$3, %rsp
+	jz	1f
+
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+1:
+	testq	$6, %rsp
+	jz	2f
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+
+2:
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+SYM_FUNC_END(__x86_return_confused_skl3)
+
+	.align 64
+SYM_FUNC_START(__x86_return_confused_skl2)
+	ANNOTATE_NOENDBR
+	testq	$3, %rsp
+	jz	1f
+
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+1:
+	ANNOTATE_UNRET_SAFE
+	ret
+	int3
+SYM_FUNC_END(__x86_return_confused_skl2)
+
+#endif /* CONFIG_RETURN_CONFUSION */
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -84,9 +84,15 @@ DECLARE_PER_CPU(u32, kstack_offset);
 		raw_cpu_write(kstack_offset, offset);			\
 	}								\
 } while (0)
+
+#define random_kstack_offset_enabled()					\
+	static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)
+
 #else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
 #define add_random_kstack_offset()		do { } while (0)
 #define choose_random_kstack_offset(rand)	do { } while (0)
+#define random_kstack_offset_enabled()		false
 #endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
 
 #endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -298,6 +298,7 @@ void syscall_exit_to_user_mode_work(stru
 
 noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
 {
+	add_random_kstack_offset();
 	__enter_from_user_mode(regs);
 }
 
@@ -444,6 +445,8 @@ irqentry_state_t noinstr irqentry_nmi_en
 {
 	irqentry_state_t irq_state;
 
+	if (user_mode(regs))
+		add_random_kstack_offset();
 	irq_state.lockdep = lockdep_hardirqs_enabled();
 
 	__nmi_enter();


* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 19:51     ` Linus Torvalds
@ 2022-07-18 20:44       ` Thomas Gleixner
  2022-07-18 21:01         ` Linus Torvalds
                           ` (2 more replies)
  0 siblings, 3 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 20:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18 2022 at 12:51, Linus Torvalds wrote:
> On Mon, Jul 18, 2022 at 12:30 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Let the compiler add a 16 byte padding in front of each function entry
>> point and put the call depth accounting there. That avoids calling out
>> into the module area and reduces ITLB pressure.
>
> Ooh.
>
> I actually like this a lot better.
>
> Could we just say "use this instead if you have SKL and care about the issue?"
>
> I don't hate your module thunk trick, but this does seem *so* much
> simpler, and if it performs better anyway, it really does seem like
> the better approach.

Yes, Peter and I started out from avoiding a new compiler and the overhead
for everyone of putting the padding into the code. We realized only when
staring at the perf data that this padding in front of the function
might be an acceptable solution. I did some more tests today on different
machines with mitigations=off with kernels compiled with and without
that padding. I couldn't find a single test case where the result was
outside of the usual noise. But then my tests are definitely incomplete.

> And people and distros who care would have an easy time adding that
> simple compiler patch instead.
>
> I do think that for generality, the "-mforce-function-padding" option
> should perhaps take as an argument how much padding (and how much
> alignment) to force:
>
>     -mforce-function-padding=5:16
>
> would force 5 bytes of minimum padding, and align functions to 16
> bytes. It should be easy to generate (no more complexity than your
> current one) by just making the output do
>
>         .skip 5,0xcc
>         .p2align 4,0xcc
>
> and now you can specify that you only need X bytes of padding, for example.

Yes, I know. But it was horrible enough to find the right spot in that
gcc maze. Then I was happy that I figured how to add the boolean
option. I let real compiler people take care of the rest. HJL???

And we need input from the Clang folks because their CFI work also puts
stuff in front of the function entry, which nicely collides.

Thanks,

        tglx


* Re: [patch 36/38] x86/ftrace: Make it call depth tracking aware
  2022-07-16 23:18 ` [patch 36/38] x86/ftrace: Make it call depth tracking aware Thomas Gleixner
@ 2022-07-18 21:01   ` Steven Rostedt
  2022-07-19  8:46     ` Peter Zijlstra
  0 siblings, 1 reply; 142+ messages in thread
From: Steven Rostedt @ 2022-07-18 21:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Peter Zijlstra (Intel)

On Sun, 17 Jul 2022 01:18:08 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> @@ -280,7 +291,19 @@ SYM_INNER_LABEL(ftrace_regs_caller_end,
>  	/* Restore flags */
>  	popfq
>  	UNWIND_HINT_FUNC
> -	jmp	ftrace_epilogue
> +
> +	/*
> +	 * Since we're effectively emulating a tail-call with PUSH;RET
> +	 * make sure we don't unbalance the RSB and mess up accounting.
> +	 */
> +	ANNOTATE_INTRA_FUNCTION_CALL
> +	call	2f
> +	int3
> +2:
> +	add	$8, %rsp
> +	ALTERNATIVE __stringify(RET), \
> +		    __stringify(ANNOTATE_UNRET_SAFE; ret; int3), \
> +		    X86_FEATURE_CALL_DEPTH
>  
>  SYM_FUNC_END(ftrace_regs_caller)

Would this code be simpler if we nuked the ftrace_epilogue altogether?

After commit 0c0593b45c9b ("x86/ftrace: Make function graph use ftrace
directly"), the ftrace_epilogue is no longer needed. That was there to make
sure all the trampolines would call the function graph tracer. But now that
function graph tracing is just another ftrace caller, it's not needed
anymore.

Something like the below. It booted and passed the ftrace kselftests.

Feel free to include this in your series.

-- Steve

From 533f10bd48ffbc4ee5d2a07f0a7fe99aeb1c823a Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Date: Mon, 18 Jul 2022 16:01:07 -0400
Subject: [PATCH] ftrace/x86: Remove jumps to ftrace_epilogue

The jumps to ftrace_epilogue were done as a way to make sure all the
function tracing trampolines ended at the function graph trampoline, as
the ftrace_epilogue was the location that handled that.

With the advent of function graph tracer now being just one of the
callbacks of the function tracer there is no more requirement that all
trampolines go to a single location.

Remove the jumps to the ftrace_epilogue and replace them with return
statements.

Note, the ftrace_epilogue can probably be renamed to ftrace_stub and the
weak logic for that could probably be removed. But let's leave that as a
separate change.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace_64.S | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index dfeb227de561..8f225fafa5fb 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -173,7 +173,9 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 SYM_INNER_LABEL(ftrace_caller_end, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 
-	jmp ftrace_epilogue
+	UNWIND_HINT_FUNC
+	ENDBR
+	RET
 SYM_FUNC_END(ftrace_caller);
 STACK_FRAME_NON_STANDARD_FP(ftrace_caller)
 
@@ -261,15 +263,9 @@ SYM_INNER_LABEL(ftrace_regs_caller_jmp, SYM_L_GLOBAL)
 	/* Restore flags */
 	popfq
 
-	/*
-	 * As this jmp to ftrace_epilogue can be a short jump
-	 * it must not be copied into the trampoline.
-	 * The trampoline will add the code to jump
-	 * to the return.
-	 */
 SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
-	jmp ftrace_epilogue
+	jmp 2f
 
 	/* Swap the flags with orig_rax */
 1:	movq MCOUNT_REG_SIZE(%rsp), %rdi
@@ -279,8 +275,10 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
 	restore_mcount_regs 8
 	/* Restore flags */
 	popfq
+2:
 	UNWIND_HINT_FUNC
-	jmp	ftrace_epilogue
+	ENDBR
+	RET
 
 SYM_FUNC_END(ftrace_regs_caller)
 STACK_FRAME_NON_STANDARD_FP(ftrace_regs_caller)
-- 
2.35.1



* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 20:44       ` Thomas Gleixner
@ 2022-07-18 21:01         ` Linus Torvalds
  2022-07-18 21:43           ` Peter Zijlstra
  2022-07-18 21:18         ` Peter Zijlstra
  2022-07-22 20:11         ` Tim Chen
  2 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 21:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 1:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Yes, Peter and I came from avoiding a new compiler and the overhead for
> everyone when putting the padding into the code. We realized only when
> staring at the perf data that this padding in front of the function
> might be an acceptable solution. I did some more tests today on different
> machines with mitigations=off with kernels compiled with and without
> that padding. I couldn't find a single test case where the result was
> outside of the usual noise. But then my tests are definitely incomplete.

Well, it sounds like it most definitely isn't a huge and obvious problem.

> Yes, I know. But it was horrible enough to find the right spot in that
> gcc maze. Then I was happy that I figured how to add the boolean
> option. I let real compiler people take care of the rest. HJL???
>
> And we need input from the Clang folks because their CFI work also puts
> stuff in front of the function entry, which nicely collides.

Yeah, looking at the gcc sources (I have them locally because it helps
with the gcc bug reports I've done over the years), that
ASM_OUTPUT_FUNCTION_PREFIX is very convenient, but it's too late to do
any inter-function alignment for, because it's already after the usual
function-alignment output.

So I guess the padding thing is largely tied together with alignment
of the function start, so that idea of having different padding and
alignment bytes doesn't work that well.

At least not in that ASM_OUTPUT_FUNCTION_PREFIX model, which is how
the gcc patch ends up being so small.

               Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 20:44       ` Thomas Gleixner
  2022-07-18 21:01         ` Linus Torvalds
@ 2022-07-18 21:18         ` Peter Zijlstra
  2022-07-18 22:22           ` Thomas Gleixner
  2022-07-18 22:48           ` Sami Tolvanen
  2022-07-22 20:11         ` Tim Chen
  2 siblings, 2 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-18 21:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann, samitolvanen

On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:

> > I do think that for generality, the "-mforce-function-padding" option
> > should perhaps take as an argument how much padding (and how much
> > alignment) to force:
> >
> >     -mforce-function-padding=5:16
> >
> > would force 5 bytes of minimum padding, and align functions to 16
> > bytes. It should be easy to generate (no more complexity than your
> > current one) by just making the output do
> >
> >         .skip 5,0xcc
> >         .palign 4,0xcc
> >
> > and now you can specify that you only need X bytes of padding, for example.
> 
> Yes, I know. But it was horrible enough to find the right spot in that
> gcc maze. Then I was happy that I figured how to add the boolean
> option. I let real compiler people take care of the rest. HJL???
> 
> And we need input from the Clang folks because their CFI work also puts
> stuff in front of the function entry, which nicely collides.

Right, I need to go look at the latest kCFI patches, that sorta got
side-tracked for working on all the retbleed muck :/

Basically kCFI wants to preface every (indirect callable) function with:

__cfi_\func:
	int3
        movl $0x12345678, %rax
        int3
        int3
\func:
        endbr
\func_direct:

Ofc, we can still put the whole:

	sarq	$5, PER_CPU_VAR(__x86_call_depth);
	jmp	\func_direct

thing in front of that. But it does somewhat destroy the version I had
that only needs the 10 bytes padding for the sarq.
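
Laid out (purely as a sketch, reusing the snippets above, with the
32-bit form of the movl and the offsets being illustrative), the
combined per-function text would be something like:

	sarq	$5, PER_CPU_VAR(__x86_call_depth)	# call depth accounting
	jmp	\func_direct				# skip over the kCFI preamble
__cfi_\func:
	int3
	movl	$0x12345678, %eax			# kCFI hash, checked at indirect call sites
	int3
	int3
\func:
	endbr
\func_direct:

where direct calls get patched to land on the sarq and the kCFI hash
stays at \func-6.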

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 21:01         ` Linus Torvalds
@ 2022-07-18 21:43           ` Peter Zijlstra
  2022-07-18 22:34             ` Linus Torvalds
  0 siblings, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-18 21:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 02:01:43PM -0700, Linus Torvalds wrote:
> On Mon, Jul 18, 2022 at 1:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Yes, Peter and I came from avoiding a new compiler and the overhead for
> > everyone when putting the padding into the code. We realized only when
> > staring at the perf data that this padding in front of the function
> > might be an acceptable solution. I did some more tests today on different
> > machines with mitigations=off with kernels compiled with and without
> > that padding. I couldn't find a single test case where the result was
> > outside of the usual noise. But then my tests are definitely incomplete.
> 
> Well, it sounds like it most definitely isn't a huge and obvious problem.
> 
> > Yes, I know. But it was horrible enough to find the right spot in that
> > gcc maze. Then I was happy that I figured how to add the boolean
> > option. I let real compiler people take care of the rest. HJL???
> >
> > And we need input from the Clang folks because their CFI work also puts
> > stuff in front of the function entry, which nicely collides.
> 
> Yeah, looking at the gcc sources (I have them locally because it helps
> with the gcc bug reports I've done over the years), that
> ASM_OUTPUT_FUNCTION_PREFIX is very convenient, but it's too late to do
> any inter-function alignment for, because it's already after the usual
> function-alignment output.
> 
> So I guess the padding thing is largely tied together with alignment
> of the function start, so that idea of having different padding and
> alignment bytes doesn't work that well.
> 
> At least not in that ASM_OUTPUT_FUNCTION_PREFIX model, which is how
> the gcc patch ends up being so small.

FWIW, when I was poking at this last week, I found that -falign-functions
only seems to apply to the normal .text section and not to random other
sections with text we create.

Or rather, I was seeing a lot of unaligned functions that all had custom
sections despite explicitly using the (what I thought was a global)
function alignment toggle.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 21:18         ` Peter Zijlstra
@ 2022-07-18 22:22           ` Thomas Gleixner
  2022-07-18 22:47             ` Joao Moreira
  2022-07-18 22:48           ` Sami Tolvanen
  1 sibling, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 22:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann, samitolvanen

On Mon, Jul 18 2022 at 23:18, Peter Zijlstra wrote:
> On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
>> And we need input from the Clang folks because their CFI work also puts
>> stuff in front of the function entry, which nicely collides.
>
> Right, I need to go look at the latest kCFI patches, that sorta got
> side-tracked for working on all the retbleed muck :/
>
> Basically kCFI wants to preface every (indirect callable) function with:
>
> __cfi_\func:
> 	int3
>         movl $0x12345678, %rax
>         int3
>         int3
> \func:
>         endbr
> \func_direct:
>
> Ofc, we can still put the whole:
>
> 	sarq	$5, PER_CPU_VAR(__x86_call_depth);
> 	jmp	\func_direct
>
> thing in front of that. But it does somewhat destroy the version I had
> that only needs the 10 bytes padding for the sarq.

Right, because it needs the jump. I was just chatting with Joao about
that over IRC.

The jump slows things down. Joao has ideas and will reply soonish.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 21:43           ` Peter Zijlstra
@ 2022-07-18 22:34             ` Linus Torvalds
  2022-07-18 23:52               ` Peter Zijlstra
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 22:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 2:43 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> FWIW, when I was poking at this last week, I found that -falign-functions
> only seems to apply to the normal .text section and not to random other
> sections with text we create.
>
> Or rather, I was seeing a lot of unaligned functions that all had custom
> sections despite explicitly using the (what I thought was a global)
> function alignment toggle.

Hmm. This triggers a memory..

I think we may have two different issues at play.

One is that I think our linker script only aligns code sections to 8
bytes by default. Grep for ALIGN_FUNCTION.

And I think that any .align directive (or .p2align) only aligns
relative to that section, so if the section itself wasn't aligned, it
doesn't help to have some alignment within the section.

I may be wrong.

But I can definitely see gcc not aligning functions too, and doing a

    nm vmlinux | grep ' t ' | grep -v '0 t ' | grep -v '\.cold$' | sort

shows a _lot_ of them for me.

I think the main cause is that the ACPI code builds with

    ccflags-y                       := -Os -D_LINUX -DBUILDING_ACPICA

and that '-Os' will disable all function alignment. I think there's a
few other places that do that too.

I don't see the same effect in my clang build, so I think that -Os
behavior is likely gcc-specific.

In my clang build, I do see a few unaligned function symbols, but they
seem to be all our own assembler ones (eg "nested_nmi") and they seem
to be intentional (ie that "nested_nmi" thing is in the middle of the
"asm_exc_nmi" function, which is the real thing and which _is_
aligned).

                Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:22           ` Thomas Gleixner
@ 2022-07-18 22:47             ` Joao Moreira
  2022-07-18 22:55               ` Sami Tolvanen
  0 siblings, 1 reply; 142+ messages in thread
From: Joao Moreira @ 2022-07-18 22:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Torvalds, Linus, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Cooper, Andrew, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu, Moreira,
	Joao, Nuzman, Joseph, Steven Rostedt, Gross, Jurgen,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	samitolvanen

On 2022-07-18 15:22, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 23:18, Peter Zijlstra wrote:
>> On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
>>> And we need input from the Clang folks because their CFI work also 
>>> puts
>>> stuff in front of the function entry, which nicely collides.
>> 
>> Right, I need to go look at the latest kCFI patches, that sorta got
>> side-tracked for working on all the retbleed muck :/
>> 
>> Basically kCFI wants to preface every (indirect callable) function 
>> with:
>> 
>> __cfi_\func:
>> 	int3
>>         movl $0x12345678, %rax
>>         int3
>>         int3
>> \func:
>>         endbr
>> \func_direct:
>> 
>> Ofc, we can still put the whole:
>> 
>> 	sarq	$5, PER_CPU_VAR(__x86_call_depth);
>> 	jmp	\func_direct
>> 
>> thing in front of that. But it does somewhat destroy the version I had
>> that only needs the 10 bytes padding for the sarq.
> 
> Right, because it needs the jump. I was just chatting with Joao about
> that over IRC.
> 
> The jump slows things down. Joao has ideas and will reply soonish.

So, IIRC, kCFI will do something like this to validate call targets 
based on the hash, as described in Peter's e-mail:

func_whatever:
	...
	cmpl $0x\hash, -6(%rax)
	je 1f
	ud2
1:
	call *%rax
	...

Thus the hash will be 6 bytes before the function entry point. Then we 
can get the compiler to emit a padding area before the __cfi_\func 
snippet and, during boot, if the CPU needs the call depth tracking 
mitigation, we:
- move the __cfi_func into the padding area
- patch the call depth tracking snippet ahead of it (overwriting the old 
__cfi_\func:)
- fix the cmpl offset in the caller

func_whatever:
	...
	cmpl $0x\hash, -FIXED_OFFSET(%rax)
	je 1f
	ud2
1:
	call *%rax
	...

This approach is very similar to what we discussed in the past for 
replacing kCFI with FineIBT if CET is available. Also, it would avoid
the need for any jump and would keep the additional padding area at 10
bytes.
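
Roughly, in callee-side terms (sizes, padding fill and exact byte
placement being illustrative, and reusing the sarq snippet from earlier
in the thread), the boot-time rewrite would turn

	.skip	10, 0xcc		# compiler-emitted padding area
__cfi_\func:
	int3
	movl	$0x\hash, %eax
	int3
	int3
\func:
	endbr

into

__cfi_\func:				# preamble moved up into the padding
	int3
	movl	$0x\hash, %eax
	int3
	int3
	sarq	$5, PER_CPU_VAR(__x86_call_depth)	# accounting takes the old preamble spot
\func:
	endbr

with direct calls patched to land on the sarq and falling straight
through into \func.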

Tks,
Joao



^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 21:18         ` Peter Zijlstra
  2022-07-18 22:22           ` Thomas Gleixner
@ 2022-07-18 22:48           ` Sami Tolvanen
  2022-07-18 22:59             ` Thomas Gleixner
  2022-07-18 23:51             ` Peter Zijlstra
  1 sibling, 2 replies; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-18 22:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linus Torvalds, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
> > And we need input from the Clang folks because their CFI work also puts
> > stuff in front of the function entry, which nicely collides.
>
> Right, I need to go look at the latest kCFI patches, that sorta got
> side-tracked for working on all the retbleed muck :/
>
> Basically kCFI wants to preface every (indirect callable) function with:
>
> __cfi_\func:
>         int3
>         movl $0x12345678, %rax
>         int3
>         int3
> \func:

Yes, and in order to avoid scattering the code with call target
gadgets, the preamble should remain immediately before the function.

> Ofc, we can still put the whole:
>
>         sarq    $5, PER_CPU_VAR(__x86_call_depth);
>         jmp     \func_direct
>
> thing in front of that.

Sure, that would work.

> But it does somewhat destroy the version I had that only needs the
> 10 bytes padding for the sarq.

There's also the question of how function alignment should work in the
KCFI case. Currently, the __cfi_ preamble is 16-byte aligned, which
obviously means the function itself isn't.
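
For illustration, assuming the 8 byte preamble quoted above (and the
32-bit form of the movl):

	.align	16
__cfi_\func:				# on a 16-byte boundary
	int3				# 1 byte
	movl	$0x12345678, %eax	# 5 bytes
	int3				# 1 byte
	int3				# 1 byte
\func:					# boundary + 8, so not 16-byte aligned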

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:47             ` Joao Moreira
@ 2022-07-18 22:55               ` Sami Tolvanen
  2022-07-18 23:08                 ` Joao Moreira
  2022-07-18 23:19                 ` Thomas Gleixner
  0 siblings, 2 replies; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-18 22:55 UTC (permalink / raw)
  To: Joao Moreira
  Cc: Thomas Gleixner, Peter Zijlstra, Torvalds, Linus, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 3:47 PM Joao Moreira <joao@overdrivepizza.com> wrote:
> Thus the hash will be 6 bytes before the function entry point. Then we
> can get the compiler to emit a padding area before the __cfi_\func
> snippet and, during boot, if the CPU needs the call depth tracking
> mitigation, we:
> - move the __cfi_func into the padding area
> - patch the call depth tracking snippet ahead of it (overwriting the old
> __cfi_\func:)
> - fix the cmpl offset in the caller
>
> func_whatever:
>         ...
>         cmpl $0x\hash, -FIXED_OFFSET(%rax)
>         je 1f
>         ud2
> 1:
>         call *%rax
>         ...

The problem with this is that the cmpl instruction contains the full
type hash, which means that any instruction that's FIXED_OFFSET from
the cmpl is a valid indirect call target as far as KCFI is concerned.
-6 was chosen specifically to make the ud2 the only possible target.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:48           ` Sami Tolvanen
@ 2022-07-18 22:59             ` Thomas Gleixner
  2022-07-18 23:10               ` Sami Tolvanen
  2022-07-18 23:39               ` Linus Torvalds
  2022-07-18 23:51             ` Peter Zijlstra
  1 sibling, 2 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 22:59 UTC (permalink / raw)
  To: Sami Tolvanen, Peter Zijlstra
  Cc: Linus Torvalds, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18 2022 at 15:48, Sami Tolvanen wrote:
> On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
>> > And we need input from the Clang folks because their CFI work also puts
>> > stuff in front of the function entry, which nicely collides.
>>
>> Right, I need to go look at the latest kCFI patches, that sorta got
>> side-tracked for working on all the retbleed muck :/
>>
>> Basically kCFI wants to preface every (indirect callable) function with:
>>
>> __cfi_\func:
>>         int3
>>         movl $0x12345678, %rax
>>         int3
>>         int3
>> \func:
>
> Yes, and in order to avoid scattering the code with call target
> gadgets, the preamble should remain immediately before the function.
>
>> Ofc, we can still put the whole:
>>
>>         sarq    $5, PER_CPU_VAR(__x86_call_depth);
>>         jmp     \func_direct
>>
>> thing in front of that.
>
> Sure, that would work.
>
>> But it does somewhat destroy the version I had that only needs the
>> 10 bytes padding for the sarq.
>
> There's also the question of how function alignment should work in the
> KCFI case. Currently, the __cfi_ preamble is 16-byte aligned, which
> obviously means the function itself isn't.

That's bad. The function entry should be 16 byte aligned, and as I just
learned, for AMD the ideal alignment would possibly be 32 byte as that's
their I-fetch width. But my experiments show that 16 byte alignment
independent of the padding muck is beneficial for both AMD and Intel
over the 4 byte alignment we have right now.

This really needs a lot of thought and performance analysis before we
commit to anything here. Peter's and my investigations have shown how
sensitive this is.

We can't just add stuff without taking the whole picture into account
(independent of the proposed padding muck).

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:55               ` Sami Tolvanen
@ 2022-07-18 23:08                 ` Joao Moreira
  2022-07-18 23:19                 ` Thomas Gleixner
  1 sibling, 0 replies; 142+ messages in thread
From: Joao Moreira @ 2022-07-18 23:08 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Thomas Gleixner, Peter Zijlstra, Torvalds, Linus, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

> The problem with this is that the cmpl instruction contains the full
> type hash, which means that any instruction that's FIXED_OFFSET from
> the cmpl is a valid indirect call target as far as KCFI is concerned.
> -6 was chosen specifically to make the ud2 the only possible target.

Ugh. The bitter truth. I'll think a bit further.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:59             ` Thomas Gleixner
@ 2022-07-18 23:10               ` Sami Tolvanen
  2022-07-18 23:39               ` Linus Torvalds
  1 sibling, 0 replies; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-18 23:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Linus Torvalds, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 3:59 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, Jul 18 2022 at 15:48, Sami Tolvanen wrote:
> > On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
> >> > And we need input from the Clang folks because their CFI work also puts
> >> > stuff in front of the function entry, which nicely collides.
> >>
> >> Right, I need to go look at the latest kCFI patches, that sorta got
> >> side-tracked for working on all the retbleed muck :/
> >>
> >> Basically kCFI wants to preface every (indirect callable) function with:
> >>
> >> __cfi_\func:
> >>         int3
> >>         movl $0x12345678, %rax
> >>         int3
> >>         int3
> >> \func:
> >
> > Yes, and in order to avoid scattering the code with call target
> > gadgets, the preamble should remain immediately before the function.
> >
> >> Ofc, we can still put the whole:
> >>
> >>         sarq    $5, PER_CPU_VAR(__x86_call_depth);
> >>         jmp     \func_direct
> >>
> >> thing in front of that.
> >
> > Sure, that would work.
> >
> >> But it does somewhat destroy the version I had that only needs the
> >> 10 bytes padding for the sarq.
> >
> > There's also the question of how function alignment should work in the
> > KCFI case. Currently, the __cfi_ preamble is 16-byte aligned, which
> > obviously means the function itself isn't.
>
> That's bad. The function entry should be 16 byte aligned, and as I just
> learned, for AMD the ideal alignment would possibly be 32 byte as that's
> their I-fetch width. But my experiments show that 16 byte alignment
> independent of the padding muck is beneficial for both AMD and Intel
> over the 4 byte alignment we have right now.

OK, that's what I thought. KCFI hasn't landed in Clang yet, so it
shouldn't be a problem to fix this.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:55               ` Sami Tolvanen
  2022-07-18 23:08                 ` Joao Moreira
@ 2022-07-18 23:19                 ` Thomas Gleixner
  2022-07-18 23:42                   ` Linus Torvalds
  1 sibling, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-18 23:19 UTC (permalink / raw)
  To: Sami Tolvanen, Joao Moreira
  Cc: Peter Zijlstra, Torvalds, Linus, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Cooper, Andrew, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu, Moreira,
	Joao, Nuzman, Joseph, Steven Rostedt, Gross, Jurgen,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18 2022 at 15:55, Sami Tolvanen wrote:
> On Mon, Jul 18, 2022 at 3:47 PM Joao Moreira <joao@overdrivepizza.com> wrote:
>> Thus the hash will be 6 bytes before the function entry point. Then we
>> can get the compiler to emit a padding area before the __cfi_\func
>> snippet and, during boot, if the CPU needs the call depth tracking
>> mitigation, we:
>> - move the __cfi_func into the padding area
>> - patch the call depth tracking snippet ahead of it (overwriting the old
>> __cfi_\func:)
>> - fix the cmpl offset in the caller
>>
>> func_whatever:
>>         ...
>>         cmpl $0x\hash, -FIXED_OFFSET(%rax)
>>         je 1f
>>         ud2
>> 1:
>>         call *%rax
>>         ...
>
> The problem with this is that the cmpl instruction contains the full
> type hash, which means that any instruction that's FIXED_OFFSET from
> the cmpl is a valid indirect call target as far as KCFI is concerned.
> -6 was chosen specifically to make the ud2 the only possible target.

But that's an implementation detail, right? Whatever we put in between
will still be a fixed offset, no? It's a different offset, but that's
what patching can deal with.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:59             ` Thomas Gleixner
  2022-07-18 23:10               ` Sami Tolvanen
@ 2022-07-18 23:39               ` Linus Torvalds
  1 sibling, 0 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 23:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sami Tolvanen, Peter Zijlstra, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 3:59 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> That's bad. The function entry should be 16 byte aligned and as I just
> learned for AMD the ideal alignment would be possibly 32 byte as that's
> their I-fetch width.

In general, the L1 cache line size is likely the only "ideal" alignment.

Even if (I think) intel fetches just 16 bytes per cycle from the L1 I$
when decoding, being cacheline aligned still means that the L2->L1
transfer for the beginning of the function starts out better.

But Intel's current optimization manual actually ends up having special rules:

   When executing code from the Decoded Icache, direct branches that
are mostly taken should have all their instruction bytes in a 64B
cache line and nearer the end of that cache line. Their targets should
be at or near the beginning of a 64B cache line.

   When executing code from the legacy decode pipeline, direct
branches that are mostly taken should have all their instruction bytes
in a 16B aligned chunk of memory and nearer the end of that 16B
aligned chunk. Their targets should be at or near the beginning of a
16B aligned chunk of memory.

So note how the branch itself should be at the end of the chunk, but
the branch _target_ should be at the beginning of the chunk. And the
chunk size is 16 bytes for decoding new instructions, and 64 bytes for
predecoded.

I suspect that for the kernel, and for the beginning of the function
(ie the target case), the 16-byte thing is still the main thing.
Because the L0 I$ ("uop cache", "Decoded Icache", whatever you want to
call it) is probably too small for a lot of kernel loads where user
space has flushed things in between system calls or faults.

Older versions of the intel code just said "All branch targets should
be 16-byte aligned".

So I think 16 bytes for function alignment ends up being what we
generally want, but yes, it's slowly changing. AMD fetches 32-byte
chunks, and Intel has that 64-byte predecode thing.

           Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:19                 ` Thomas Gleixner
@ 2022-07-18 23:42                   ` Linus Torvalds
  2022-07-18 23:52                     ` Linus Torvalds
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 23:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sami Tolvanen, Joao Moreira, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 4:19 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> But that's an implementation detail, right? Whatever we put in between
> will still be a fixed offset, no? It's a different offset, but that's
> what patching can deal with.

No, what Sami is saying is that because the "cmpl" *inside* the function
that checks the hash value will have that same (valid) hash value
encoded as part of it, then you actually have *two* valid markers with
that hash value.

You have the "real" marker before the function.

But you also have the "false" marker that is part of the hash check
that is *inside* the function.

The "real marker + 6" points to the function head itself, and so is ok
as a target (normal operation).

The "false marker + 6" points to the "UD2", and so is *also* ok as a
target (bad guy trying to mis-use the false marker gets trapped by
UD2).

               Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:48           ` Sami Tolvanen
  2022-07-18 22:59             ` Thomas Gleixner
@ 2022-07-18 23:51             ` Peter Zijlstra
  2022-07-20  9:00               ` Thomas Gleixner
                                 ` (2 more replies)
  1 sibling, 3 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-18 23:51 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Thomas Gleixner, Linus Torvalds, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 03:48:04PM -0700, Sami Tolvanen wrote:
> On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
> > > And we need input from the Clang folks because their CFI work also puts
> > > stuff in front of the function entry, which nicely collides.
> >
> > Right, I need to go look at the latest kCFI patches, that sorta got
> > side-tracked for working on all the retbleed muck :/
> >
> > Basically kCFI wants to preface every (indirect callable) function with:
> >
> > __cfi_\func:
> >         int3
> >         movl $0x12345678, %rax
> >         int3
> >         int3
> > \func:
> 
> Yes, and in order to avoid scattering the code with call target
> gadgets, the preamble should remain immediately before the function.

I think we have a little room, but yes, -6 is just right to hit the UD2.

> > Ofc, we can still put the whole:
> >
> >         sarq    $5, PER_CPU_VAR(__x86_call_depth);
> >         jmp     \func_direct
> >
> > thing in front of that.
> 
> Sure, that would work.

So if we assume \func starts with ENDBR, and further assume we've fixed
up every direct jmp/call to land at +4, we can overwrite the ENDBR with
part of the SARQ; that leaves us 6 more bytes, placing the immediate at
-10 if I'm not mis-counting.

Now, the call sites are:

41 81 7b fa 78 56 34 12		cmpl	$0x12345678, -6(%r11)
74 02				je	1f
0f 0b				ud2
e8 00 00 00 00		1:	call	__x86_indirect_thunk_r11

That means the offset of +10 lands in the middle of the CALL
instruction, and since we only have 16 thunks there is a limited number
of byte patterns available there.

This really isn't as nice as the -6 but might just work well enough,
hmm?

> > But it does somewhat destroy the version I had that only needs the
> > 10 bytes padding for the sarq.
> 
> There's also the question of how function alignment should work in the
> KCFI case. Currently, the __cfi_ preamble is 16-byte aligned, which
> obviously means the function itself isn't.

That seems unfortunate, at least the intel parts have a 16 byte i-fetch
window (IIRC) so aligning the actual instructions at 16 bytes gets you
the best bang for the buck wrt ifetch (functions are random access and
not sequential).

Also, since we're talking at least 4 bytes more padding over the 7 that
are required by the kCFI scheme, the FineIBT alternative gets a little
more room to breathe. I'm thinking we can have the FineIBT landing site
at -16.

__fineibt_\func:
	endbr64				# 4
	xorl	$0x12345678, %r10d	# 7
	je	\func+4			# 2
	ud2				# 2

\func:
	nop4
	...

With the callsite looking like:

	nop7
	movl	$0x12345678, %r10d	# 7
	call	*%r11			# 3

or something like that (everything having IBT has eIBRS at the very
least).

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:42                   ` Linus Torvalds
@ 2022-07-18 23:52                     ` Linus Torvalds
  2022-07-18 23:57                       ` Peter Zijlstra
  2022-07-19  0:01                       ` Linus Torvalds
  0 siblings, 2 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-18 23:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sami Tolvanen, Joao Moreira, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 4:42 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> You have the "real" marker before the function.
>
> But you also have the "false" marker that is part of the hash check
> that is *inside* the function.
>
> The "real marker + 6" points to the function head itself, and so is ok
> as a target (normal operation).

Of course, one fix for that is to make the hash be only 24 bits, and
make the int3 byte part of the value you check, and not have the same
pattern in the checking code at all.

Honestly, I think that would be a better model - yes, you lose 8 bits
of hash, but considering that apparently the current KCFI code
*guarantees* that the hash pattern will exist even outside the actual
target pattern, I think it's still a better model.

I also happen to believe that the kCFI code should have entirely
different targets for direct jumps and for indirect jumps, but that's
a separate issue. Maybe it already does that?

                 Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 22:34             ` Linus Torvalds
@ 2022-07-18 23:52               ` Peter Zijlstra
  0 siblings, 0 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-18 23:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 03:34:51PM -0700, Linus Torvalds wrote:

> I think the main cause is that the ACPI code builds with
> 
>     ccflags-y                       := -Os -D_LINUX -DBUILDING_ACPICA
> 
> and that '-Os' will disable all function alignment. I think there's a
> few other places that do that too.

Urgh, that's -Ostupid, right?

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:52                     ` Linus Torvalds
@ 2022-07-18 23:57                       ` Peter Zijlstra
  2022-07-19  0:03                         ` Linus Torvalds
  2022-07-19  0:01                       ` Linus Torvalds
  1 sibling, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-18 23:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Sami Tolvanen, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 04:52:09PM -0700, Linus Torvalds wrote:
> I also happen to believe that the kCFI code should have entirely
> different targets for direct jumps and for indirect jumps, but that's
> a separate issue. Maybe it already does that?

kCFI is purely about indirect calls.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:52                     ` Linus Torvalds
  2022-07-18 23:57                       ` Peter Zijlstra
@ 2022-07-19  0:01                       ` Linus Torvalds
  2022-07-19  0:19                         ` Joao Moreira
  2022-07-19  8:26                         ` David Laight
  1 sibling, 2 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-19  0:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sami Tolvanen, Joao Moreira, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 4:52 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Honestly, I think that would be a better model - yes, you lose 8 bits
> of hash, but considering that apparently the current KCFI code
> *guarantees* that the hash pattern will exist even outside the actual
> target pattern,

Gaah, I'm being stupid. You still get the value collision, since the
int3 byte pattern would just be part of the compare pattern.

You'd have to use some multi-instruction compare to avoid having the
pattern in the instruction stream. Probably with another register.
Like

        movl -FIXED_OFFSET(%rax),%edx
        addl $ANTI_PATTERN,%edx
        je ok

so that the "compare" wouldn't use the same pattern value, but be an
add with the negated pattern value instead.

The extra instruction is likely less of a problem than the extra register used.
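
Spelled out as a full call site (register choice and the label purely
illustrative, with ANTI_PATTERN being the negated hash so that a valid
target sums to zero):

	movl	-FIXED_OFFSET(%rax), %edx	# hash stored in front of the target
	addl	$ANTI_PATTERN, %edx		# == 0 only for the right hash
	je	1f
	ud2					# mismatch: trap
1:	call	*%rax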

             Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:57                       ` Peter Zijlstra
@ 2022-07-19  0:03                         ` Linus Torvalds
  2022-07-19  0:11                           ` Linus Torvalds
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-19  0:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Sami Tolvanen, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 4:58 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jul 18, 2022 at 04:52:09PM -0700, Linus Torvalds wrote:
> > I also happen to believe that the kCFI code should have entirely
> > different targets for direct jumps and for indirect jumps, but that's
> > a separate issue. Maybe it already does that?
>
> kCFI is purely about indirect calls.

So it already only adds the pattern to things that have their address
taken, not all functions?

If so, that's simple enough to sort out: don't do any RSB stack
adjustment for those thunks AT ALL.

Because they should just then end up with a jump to the "real" target,
and that real target will do the RSB stack thing.

               Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:03                         ` Linus Torvalds
@ 2022-07-19  0:11                           ` Linus Torvalds
  2022-07-19  0:23                             ` Peter Zijlstra
  2022-07-19 17:19                             ` Sami Tolvanen
  0 siblings, 2 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-19  0:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Sami Tolvanen, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 5:03 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So it already only adds the pattern to things that have their address
> taken, not all functions?
>
> If so, that's simple enough to sort out: don't do any RSB stack
> adjustment for those thunks AT ALL.
>
> Because they should just then end up with a jump to the "real" target,
> and that real target will do the RSB stack thing.

Put another way, let's say that you have a function that looks like this:

  int silly(void)
  {
       return 0;
  }

and now you have two cases:

 - the "direct callable version" of that function looks exactly the
way it always has looked, and gets the 16 bytes of padding for it, and
the RSB counting can happen in that padding

 - the "somebody took the address of this function" creates code that
has the hash marker before it, and has the hash check, and then does a
"jmp silly" to actually jump to the real code.

So what the RSB counting does is just ignore that second case entirely
as far as the RSB code generation goes. No need to have any padding
for it at all, it has that (completely different) kCFI padding
instead.

Instead, only the "real" silly function gets that RSB code, and the
"jmp silly" from the kCFI thunk needs to be updated to point to the
RSB thunk in front of it.

Yes, yes, it makes indirect calls slightly more expensive than direct
calls (because that kCFI thing can't just fall through to the real
thing), but considering all the *other* costs of indirect calls, the
cost of having that one "jmp" instruction doesn't really seem to
matter, does it?
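
As a rough sketch (labels hypothetical, reusing the preamble and sarq
snippets from earlier in the thread, and with the kCFI thunk's jmp
already retargeted at the accounting code):

__cfi_silly:					# the address-taken version
	int3
	movl	$0x12345678, %eax		# kCFI hash, checked at indirect call sites
	int3
	int3
	jmp	.Lsilly_rsb			# the one extra jmp for indirect calls

	...					# padding in front of the real function
.Lsilly_rsb:
	sarq	$5, PER_CPU_VAR(__x86_call_depth)
silly:						# the "real" function; direct calls get
	...					# patched to land on the accounting above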

                    Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:01                       ` Linus Torvalds
@ 2022-07-19  0:19                         ` Joao Moreira
  2022-07-19 17:21                           ` Sami Tolvanen
  2022-07-19  8:26                         ` David Laight
  1 sibling, 1 reply; 142+ messages in thread
From: Joao Moreira @ 2022-07-19  0:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Sami Tolvanen, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

> The extra instruction is likely less of a problem than the extra 
> register used.
> 
FWIW, per the ABI, R11 is a scratch reg and should be usable without hard 
consequences in this scenario.

Joao

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:11                           ` Linus Torvalds
@ 2022-07-19  0:23                             ` Peter Zijlstra
  2022-07-19  1:02                               ` Linus Torvalds
  2022-07-19 17:19                             ` Sami Tolvanen
  1 sibling, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-19  0:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Sami Tolvanen, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 05:11:27PM -0700, Linus Torvalds wrote:
> On Mon, Jul 18, 2022 at 5:03 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So it already only adds the pattern to things that have their address
> > taken, not all functions?
> >
> > If so, that's simple enough to sort out: don't do any RSB stack
> > adjustment for those thunks AT ALL.
> >
> > Because they should just then end up with a jump to the "real" target,
> > and that real target will do the RSB stack thing.
> 
> Put another way, let's say that you have a function that looks like this:
> 
>   int silly(void)
>   {
>        return 0;
>   }
> 
> and now you have two cases:
> 
>  - the "direct callable version" of that function looks exactly the
> way it always has looked, and gets the 16 bytes of padding for it, and
> the RSB counting can happen in that padding
> 
>  - the "somebody took the address of this function" creates code that
> has the hash marker before it, and has the hash check, and then does a
> "jmp silly" to actually jump to the real code.
> 
> So what the RSB counting does is just ignore that second case entirely
> as far as the RSB code generation goes. No need to have any padding
> for it at all, it has that (completely different) kCFI padding
> instead.
> 
> Instead, only the "real" silly function gets that RSB code, and the
> "jmp silly" from the kCFI thunk needs to be updated to point to the
> RSB thunk in front of it.
> 
> Yes, yes, it makes indirect calls slightly more expensive than direct
> calls (because that kCFI thing can't just fall through to the real
> thing), but considering all the *other* costs of indirect calls, the
> cost of having that one "jmp" instruction doesn't really seem to
> matter, does it?

So it's like 2:15 am here, so I might not be following things right, but
doesn't the above mean you have to play funny games with what a function
pointer is?

That is, the content of a function pointer (address taken) no longer
matches the actual function? That gives grief with things like
static_call(), ftrace and other things that write call instructions
instead of doing indirect calls.



^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:23                             ` Peter Zijlstra
@ 2022-07-19  1:02                               ` Linus Torvalds
  0 siblings, 0 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-19  1:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Sami Tolvanen, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 5:23 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> So it's like 2:15 am here, so I might not be following things right, but
> doesn't the above mean you have to play funny games with what a function
> pointer is?

Yes, but probably no more than compilers already do.

On x86, function pointers are simple, and just pointers to the first
instruction of the function.

But that's actually not true in general, and several other
architectures have *much* more complicated function pointers, where
they are pointers to special "function descriptor blocks" etc.

So I bet gcc has all that infrastructure in place anyway.

And the whole "use a different address for a direct call than for an
indirect call" is still much simpler than having an actual separate
function descriptor thing.

At worst, you'd actually always generate the thunk for the indirect
call case, and let the linker kill unused cases. The compiler wouldn't
even have to know about the two cases, except to use a different name
for the direct call case.

Do I claim it would be *pretty*? No. But I bet the existing CFI
patches already do things like this anyway.

(I have llvm sources on my machine too, because I used to build my own
clang from source back when I was testing the asm goto stuff. But
unlike gcc, I've never really *looked* at llvm, so I'm not familiar
with it at all, and I'm not going to try to figure out what the CFI
code actually does, and instead just handwave widely while saying "I
bet it already does this".)

               Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 37/38] x86/bpf: Emit call depth accounting if required
  2022-07-16 23:18 ` [patch 37/38] x86/bpf: Emit call depth accounting if required Thomas Gleixner
@ 2022-07-19  5:30   ` Alexei Starovoitov
  2022-07-19  8:34     ` Peter Zijlstra
  0 siblings, 1 reply; 142+ messages in thread
From: Alexei Starovoitov @ 2022-07-19  5:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Alexei Starovoitov, Daniel Borkmann

On Sat, Jul 16, 2022 at 4:18 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Ensure that calls in BPF jitted programs are emitting call depth accounting
> when enabled to keep the call/return balanced. The return thunk jump is
> already injected due to the earlier retbleed mitigations.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  arch/x86/include/asm/alternative.h |    6 +++++
>  arch/x86/kernel/callthunks.c       |   19 ++++++++++++++++
>  arch/x86/net/bpf_jit_comp.c        |   43 ++++++++++++++++++++++++-------------
>  3 files changed, 53 insertions(+), 15 deletions(-)
>
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -95,6 +95,7 @@ extern void callthunks_patch_module_call
>  extern void callthunks_module_free(struct module *mod);
>  extern void *callthunks_translate_call_dest(void *dest);
>  extern bool is_callthunk(void *addr);
> +extern int x86_call_depth_emit_accounting(u8 **pprog, void *func);
>  #else
>  static __always_inline void callthunks_patch_builtin_calls(void) {}
>  static __always_inline void
> @@ -109,6 +110,11 @@ static __always_inline bool is_callthunk
>  {
>         return false;
>  }
> +static __always_inline int x86_call_depth_emit_accounting(u8 **pprog,
> +                                                         void *func)
> +{
> +       return 0;
> +}
>  #endif
>
>  #ifdef CONFIG_SMP
> --- a/arch/x86/kernel/callthunks.c
> +++ b/arch/x86/kernel/callthunks.c
> @@ -706,6 +706,25 @@ int callthunk_get_kallsym(unsigned int s
>         return ret;
>  }
>
> +#ifdef CONFIG_BPF_JIT
> +int x86_call_depth_emit_accounting(u8 **pprog, void *func)
> +{
> +       unsigned int tmpl_size = callthunk_desc.template_size;
> +       void *tmpl = callthunk_desc.template;
> +
> +       if (!thunks_initialized)
> +               return 0;
> +
> +       /* Is function call target a thunk? */
> +       if (is_callthunk(func))
> +               return 0;
> +
> +       memcpy(*pprog, tmpl, tmpl_size);
> +       *pprog += tmpl_size;
> +       return tmpl_size;
> +}
> +#endif
> +
>  #ifdef CONFIG_MODULES
>  void noinline callthunks_patch_module_calls(struct callthunk_sites *cs,
>                                             struct module *mod)
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -340,6 +340,12 @@ static int emit_call(u8 **pprog, void *f
>         return emit_patch(pprog, func, ip, 0xE8);
>  }
>
> +static int emit_rsb_call(u8 **pprog, void *func, void *ip)
> +{
> +       x86_call_depth_emit_accounting(pprog, func);
> +       return emit_patch(pprog, func, ip, 0xE8);
> +}
> +
>  static int emit_jump(u8 **pprog, void *func, void *ip)
>  {
>         return emit_patch(pprog, func, ip, 0xE9);
> @@ -1431,19 +1437,26 @@ st:                     if (is_imm8(insn->off))
>                         break;
>
>                         /* call */
> -               case BPF_JMP | BPF_CALL:
> +               case BPF_JMP | BPF_CALL: {
> +                       int offs;
> +
>                         func = (u8 *) __bpf_call_base + imm32;
>                         if (tail_call_reachable) {
>                                 /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
>                                 EMIT3_off32(0x48, 0x8B, 0x85,
>                                             -round_up(bpf_prog->aux->stack_depth, 8) - 8);
> -                               if (!imm32 || emit_call(&prog, func, image + addrs[i - 1] + 7))
> +                               if (!imm32)
>                                         return -EINVAL;
> +                               offs = 7 + x86_call_depth_emit_accounting(&prog, func);

It's a bit hard to read all the macro magic in patches 28-30, but I
suspect the asm inside callthunk_desc.template that will be emitted
here before the call will do some math on %rax and then a
movq %rax, PER_CPU_VAR(__x86_call_depth).

Only the %rax register is scratched by the callthunk_desc, right?
If so, it's ok for all cases except this one.
See the comment few lines above
after if (tail_call_reachable)
and commit ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT")
We use %rax to keep the tail_call count.
The callthunk_desc would need to preserve %rax.
I guess extra push %rax/pop %rax would do it.

>                         } else {
> -                               if (!imm32 || emit_call(&prog, func, image + addrs[i - 1]))
> +                               if (!imm32)
>                                         return -EINVAL;
> +                               offs = x86_call_depth_emit_accounting(&prog, func);
>                         }
> +                       if (emit_call(&prog, func, image + addrs[i - 1] + offs))
> +                               return -EINVAL;
>                         break;
> +               }

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:01                       ` Linus Torvalds
  2022-07-19  0:19                         ` Joao Moreira
@ 2022-07-19  8:26                         ` David Laight
  2022-07-19 16:27                           ` Linus Torvalds
  1 sibling, 1 reply; 142+ messages in thread
From: David Laight @ 2022-07-19  8:26 UTC (permalink / raw)
  To: 'Linus Torvalds', Thomas Gleixner
  Cc: Sami Tolvanen, Joao Moreira, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

From: Linus Torvalds
> Sent: 19 July 2022 01:02
> 
> On Mon, Jul 18, 2022 at 4:52 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Honestly, I think that would be a better model - yes, you lose 8 bits
> > of hash, but considering that apparently the current KCFI code
> > *guarantees* that the hash pattern will exist even outside the actual
> > target pattern,
> 
> Gaah, I'm being stupid. You still get the value collision, since the
> int3 byte pattern would just be part of the compare pattern.
> 
> You'd have to use some multi-instruction compare to avoid having the
> pattern in the instruction stream. Probably with another register.
> Like
> 
>         movl -FIXED_OFFSET(%rax),%edx
>         addl $ANTI_PATTERN,%edx
>         je ok
> 
> so that the "compare" wouldn't use the same pattern value, but be an
> add with the negated pattern value instead.
> 
> The extra instruction is likely less of a problem than the extra register used.

Shouldn't it be testing the value the caller supplied?

The extra instruction is likely to be one clock - I doubt it will
sensibly run in parallel with code later in the function.

The larger costs are (probably) polluting the D$ with I addresses
(already done by the caller) and the likely mispredicted 'je ok'.
Unless the function has been recently called, the 'je ok' gets
static prediction.
While traditionally that would predict a forwards branch as 'not taken',
ISTR more recent Intel CPUs just use the predictor output - ie random.
Not at all sure about AMD CPUs though.

	David


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 37/38] x86/bpf: Emit call depth accounting if required
  2022-07-19  5:30   ` Alexei Starovoitov
@ 2022-07-19  8:34     ` Peter Zijlstra
  0 siblings, 0 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-19  8:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Gleixner, LKML, X86 ML, Linus Torvalds, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 18, 2022 at 10:30:01PM -0700, Alexei Starovoitov wrote:
> On Sat, Jul 16, 2022 at 4:18 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > @@ -1431,19 +1437,26 @@ st:                     if (is_imm8(insn->off))
> >                         break;
> >
> >                         /* call */
> > -               case BPF_JMP | BPF_CALL:
> > +               case BPF_JMP | BPF_CALL: {
> > +                       int offs;
> > +
> >                         func = (u8 *) __bpf_call_base + imm32;
> >                         if (tail_call_reachable) {
> >                                 /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> >                                 EMIT3_off32(0x48, 0x8B, 0x85,
> >                                             -round_up(bpf_prog->aux->stack_depth, 8) - 8);
> > -                               if (!imm32 || emit_call(&prog, func, image + addrs[i - 1] + 7))
> > +                               if (!imm32)
> >                                         return -EINVAL;
> > +                               offs = 7 + x86_call_depth_emit_accounting(&prog, func);
> 
> It's a bit hard to read all the macro magic in patches 28-30,
> but I suspect the asm inside
> callthunk_desc.template
> that will be emitted here before the call
> will do
> some math on %rax
> movq %rax, PER_CPU_VAR(__x86_call_depth).
> 
> Only %rax register is scratched by the callthunk_desc, right?
> If so, it's ok for all cases except this one.
> See the comment few lines above
> after if (tail_call_reachable)
> and commit ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall
> handling in JIT")
> We use %rax to keep the tail_call count.
> The callthunk_desc would need to preserve %rax.
> I guess extra push %rax/pop %rax would do it.

The accounting template is basically:

	sarq $5, PER_CPU_VAR(__x86_call_depth)

No registers used (with debugging on it's a few more memops).
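
Concretely, in instruction terms, what ends up in front of a BPF helper
call is then roughly the following (bpf_helper_fn is a stand-in for the
real target).  Since the accounting is a single read-modify-write on a
per-CPU variable, the tail_call count in %rax survives untouched:

	sarq	$5, PER_CPU_VAR(__x86_call_depth)	/* accounting, no GPRs touched */
	call	bpf_helper_fn				/* patched direct call */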

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 36/38] x86/ftrace: Make it call depth tracking aware
  2022-07-18 21:01   ` Steven Rostedt
@ 2022-07-19  8:46     ` Peter Zijlstra
  2022-07-19 13:06       ` Steven Rostedt
  0 siblings, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-19  8:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, x86, Linus Torvalds, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman

On Mon, Jul 18, 2022 at 05:01:23PM -0400, Steven Rostedt wrote:

> From 533f10bd48ffbc4ee5d2a07f0a7fe99aeb1c823a Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> Date: Mon, 18 Jul 2022 16:01:07 -0400
> Subject: [PATCH] ftrace/x86: Remove jumps to ftrace_epilogue
> 
> The jumps to ftrace_epilogue were done as a way to make sure all the
> function tracing trampolines ended at the function graph trampoline, as
> the ftrace_epilogue was the location that it would handle that.
> 
> With the advent of function graph tracer now being just one of the
> callbacks of the function tracer there is no more requirement that all
> trampolines go to a single location.
> 
> Remove the jumps to the ftrace_epilogue and replace them with return
> statements.
> 
> Note, the ftrace_epilogue can probably be renamed to ftrace_stub and the
> weak logic for that could probably be removed. But lets leave that as a
> separate change.
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  arch/x86/kernel/ftrace_64.S | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
> index dfeb227de561..8f225fafa5fb 100644
> --- a/arch/x86/kernel/ftrace_64.S
> +++ b/arch/x86/kernel/ftrace_64.S
> @@ -173,7 +173,9 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>  SYM_INNER_LABEL(ftrace_caller_end, SYM_L_GLOBAL)
>  	ANNOTATE_NOENDBR
>  
> -	jmp ftrace_epilogue
> +	UNWIND_HINT_FUNC
> +	ENDBR

Only the RET should do I think; you definitely don't need an ENDBR here
nor do you need to override the unwind hint. Lemme try..

Yeah, the below is sufficient:

diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index dfeb227de561..d6679b65b6f2 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -172,8 +172,7 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	 */
 SYM_INNER_LABEL(ftrace_caller_end, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
-
-	jmp ftrace_epilogue
+	RET
 SYM_FUNC_END(ftrace_caller);
 STACK_FRAME_NON_STANDARD_FP(ftrace_caller)
 
@@ -269,7 +268,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_jmp, SYM_L_GLOBAL)
 	 */
 SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
-	jmp ftrace_epilogue
+	RET
 
 	/* Swap the flags with orig_rax */
 1:	movq MCOUNT_REG_SIZE(%rsp), %rdi
@@ -280,7 +279,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
 	/* Restore flags */
 	popfq
 	UNWIND_HINT_FUNC
-	jmp	ftrace_epilogue
+	RET
 
 SYM_FUNC_END(ftrace_regs_caller)
 STACK_FRAME_NON_STANDARD_FP(ftrace_regs_caller)

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Virt Call depth tracking mitigation
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (40 preceding siblings ...)
  2022-07-18 19:55 ` Thomas Gleixner
@ 2022-07-19 10:24 ` Andrew Cooper
  2022-07-19 14:13   ` Thomas Gleixner
  2022-07-19 14:45   ` Michael Kelley (LINUX)
  2022-07-20 16:57 ` [patch 00/38] x86/retbleed: " Steven Rostedt
  42 siblings, 2 replies; 142+ messages in thread
From: Andrew Cooper @ 2022-07-19 10:24 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann, kys,
	haiyangz, Stephen Hemminger, Wei Liu, decui, Michael Kelley

On 17/07/2022 00:17, Thomas Gleixner wrote:
> As IBRS is a performance horror show, Peter Zijstra and me revisited the
> call depth tracking approach and implemented it in a way which is hopefully
> more palatable and avoids the downsides of the original attempt.
>
> We both unsurprisingly hate the result with a passion...

And I hate to add more problems, but here we go.

Under virt, it's not just SMIs which might run behind your back.
Regular interrupts/etc can probably be hand-waved away in the same way
that SMIs are.

Hypercalls however are a different matter.

Xen and HyperV both have hypercall pages, where the hypervisor provides
some executable code for the guest kernel to use.

Under the current scheme, the calls into the hypercall pages get
accounted, as objtool can see them, but the rets don't.  This imbalance
is exacerbated because some hypercalls are called in loops.

Worse however, it opens a hole where branch history is calculable and
the ret can reliably underflow.  This occurs when there's a minimal call
depth in Linux to get to the hypercall, and then a call depth of >16 in
the hypervisor.

The only variable in these cases is how much user control there is of
the registers, and I for one am not feeling lucky in face of the current
research.

The only solution I see here is for Linux to ret-thunk the hypercall
page too.  Under Xen, the hypercall page is mutable by the guest and
there is room to turn every ret into a jmp, but obviously none of this
is covered by any formal ABI, and this probably needs more careful
consideration than the short time I've put towards it.
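
As a rough sketch (the stub shape and the free space per slot are
assumptions on my part, not ABI), patching one Xen hypercall stub could
look like:

	/* original stub, roughly: */
	mov	$__HYPERVISOR_sched_op, %eax	/* hypercall number, illustrative */
	vmcall					/* vmmcall on AMD */
	ret
	/* guest-patched variant, so the return goes through the return thunk: */
	mov	$__HYPERVISOR_sched_op, %eax
	vmcall
	jmp	__x86_return_thunk		/* 5 bytes where the 1-byte ret was */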

That said, after a return from the hypervisor, Linux has no idea what
state the RSB is in, so the only safe course of action is to re-stuff.

CC'ing the HyperV folk for input on their side.

~Andrew

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 36/38] x86/ftrace: Make it call depth tracking aware
  2022-07-19  8:46     ` Peter Zijlstra
@ 2022-07-19 13:06       ` Steven Rostedt
  0 siblings, 0 replies; 142+ messages in thread
From: Steven Rostedt @ 2022-07-19 13:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, x86, Linus Torvalds, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman

On Tue, 19 Jul 2022 10:46:48 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Only the RET should do I think; you definitely don't need an ENDBR here
> nor do you need to override the unwind hint. Lemme try..

I'll replace with the RET and resend v2.

-- Steve

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: Virt Call depth tracking mitigation
  2022-07-19 10:24 ` Virt " Andrew Cooper
@ 2022-07-19 14:13   ` Thomas Gleixner
  2022-07-19 16:23     ` Andrew Cooper
  2022-07-19 14:45   ` Michael Kelley (LINUX)
  1 sibling, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-19 14:13 UTC (permalink / raw)
  To: Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann, kys,
	haiyangz, Stephen Hemminger, Wei Liu, decui, Michael Kelley

On Tue, Jul 19 2022 at 10:24, Andrew Cooper wrote:
> On 17/07/2022 00:17, Thomas Gleixner wrote:
>> As IBRS is a performance horror show, Peter Zijstra and me revisited the
>> call depth tracking approach and implemented it in a way which is hopefully
>> more palatable and avoids the downsides of the original attempt.
>>
>> We both unsurprisingly hate the result with a passion...
>
> And I hate to add more problems, but here we go.
>
> Under virt, it's not just SMI's which might run behind your back. 
> Regular interrupts/etc can probably be hand-waved away in the same way
> that SMIs are.

You mean host side interrupts, right?

> Hypercalls however are a different matter.
>
> Xen and HyperV both have hypercall pages, where the hypervisor provides
> some executable code for the guest kernel to use.
>
> Under the current scheme, the calls into the hypercall pages get
> accounted, as objtool can see them, but the ret's don't.  This imbalance
> is exasperated because some hypercalls are called in loops.

Bah.

> Worse however, it opens a hole where branch history is calculable and
> the ret can reliably underflow.  This occurs when there's a minimal call
> depth in Linux to get to the hypercall, and then a call depth of >16 in
> the hypervisor.
>
> The only variable in these cases is how much user control there is of
> the registers, and I for one am not feeling lucky in face of the current
> research.
>
> The only solution I see here is for Linux to ret-thunk the hypercall
> page too.  Under Xen, the hypercall page is mutable by the guest and
> there is room to turn every ret into a jmp, but obviously none of this
> is covered by any formal ABI, and this probably needs more careful
> consideration than the short time I've put towards it.

Well, that makes the guest side "safe", but isn't a deep hypercall > 16
already underflowing in the hypervisor code before it returns to the
guest?

> That said, after a return from the hypervisor, Linux has no idea what
> state the RSB is in, so the only safe course of action is to re-stuff.

Indeed.

Another proof for my claim that virt creates more problems than it
solves.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: Virt Call depth tracking mitigation
  2022-07-19 10:24 ` Virt " Andrew Cooper
  2022-07-19 14:13   ` Thomas Gleixner
@ 2022-07-19 14:45   ` Michael Kelley (LINUX)
  2022-07-19 20:16     ` Peter Zijlstra
  1 sibling, 1 reply; 142+ messages in thread
From: Michael Kelley (LINUX) @ 2022-07-19 14:45 UTC (permalink / raw)
  To: Andrew Cooper, Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui

From: Andrew Cooper <Andrew.Cooper3@citrix.com> Sent: Tuesday, July 19, 2022 3:25 AM
> 
> On 17/07/2022 00:17, Thomas Gleixner wrote:
> > As IBRS is a performance horror show, Peter Zijstra and me revisited the
> > call depth tracking approach and implemented it in a way which is hopefully
> > more palatable and avoids the downsides of the original attempt.
> >
> > We both unsurprisingly hate the result with a passion...
> 
> And I hate to add more problems, but here we go.
> 
> Under virt, it's not just SMI's which might run behind your back.
> Regular interrupts/etc can probably be hand-waved away in the same way
> that SMIs are.
> 
> Hypercalls however are a different matter.
> 
> Xen and HyperV both have hypercall pages, where the hypervisor provides
> some executable code for the guest kernel to use.
> 
> Under the current scheme, the calls into the hypercall pages get
> accounted, as objtool can see them, but the ret's don't.  This imbalance
> is exasperated because some hypercalls are called in loops.
> 
> Worse however, it opens a hole where branch history is calculable and
> the ret can reliably underflow.  This occurs when there's a minimal call
> depth in Linux to get to the hypercall, and then a call depth of >16 in
> the hypervisor.
> 
> The only variable in these cases is how much user control there is of
> the registers, and I for one am not feeling lucky in face of the current
> research.
> 
> The only solution I see here is for Linux to ret-thunk the hypercall
> page too.  Under Xen, the hypercall page is mutable by the guest and
> there is room to turn every ret into a jmp, but obviously none of this
> is covered by any formal ABI, and this probably needs more careful
> consideration than the short time I've put towards it.
> 
> That said, after a return from the hypervisor, Linux has no idea what
> state the RSB is in, so the only safe course of action is to re-stuff.
> 
> CC'ing the HyperV folk for input on their side.

In Hyper-V, the hypercall page is *not* writable by the guest.  Quoting
from Section 3.13 in the Hyper-V TLFS:

    The hypercall page appears as an "overlay" to the GPA space; that is,
    it covers whatever else is mapped to the GPA range. Its contents are
    readable and executable by the guest. Attempts to write to the
    hypercall page will result in a protection (#GP) exception.

And:

    After the interface has been established, the guest can initiate a
    hypercall. To do so, it populates the registers per the hypercall protocol
    and issues a CALL to the beginning of the hypercall page. The guest
    should assume the hypercall page performs the equivalent of a near
    return (0xC3) to return to the caller.  As such, the hypercall must be
    invoked with a valid stack.
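
A minimal sketch of that sequence from the guest's point of view
(hv_hypercall_pg is a hypothetical pointer to the mapped overlay page,
and the register setup is abbreviated):

	/* guest side: */
	mov	%rdi, %rcx		/* hypercall control word per the protocol */
	call	*hv_hypercall_pg(%rip)	/* CALL to the start of the overlay page */

	/* inside the overlay page (hypervisor-provided, read/execute only): */
	vmcall				/* or vmmcall, depending on CPU vendor */
	ret				/* the near return (0xC3) described above */

That ret lives in hypervisor-provided text, so it is never rewritten
into the kernel's return thunk and never balances the accounting done
for the call - which is the imbalance Andrew describes.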

Michael

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: Virt Call depth tracking mitigation
  2022-07-19 14:13   ` Thomas Gleixner
@ 2022-07-19 16:23     ` Andrew Cooper
  2022-07-19 21:17       ` Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Andrew Cooper @ 2022-07-19 16:23 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann, kys,
	haiyangz, Stephen Hemminger, Wei Liu, decui, Michael Kelley

On 19/07/2022 15:13, Thomas Gleixner wrote:
> On Tue, Jul 19 2022 at 10:24, Andrew Cooper wrote:
>> On 17/07/2022 00:17, Thomas Gleixner wrote:
>>> As IBRS is a performance horror show, Peter Zijstra and me revisited the
>>> call depth tracking approach and implemented it in a way which is hopefully
>>> more palatable and avoids the downsides of the original attempt.
>>>
>>> We both unsurprisingly hate the result with a passion...
>> And I hate to add more problems, but here we go.
>>
>> Under virt, it's not just SMI's which might run behind your back. 
>> Regular interrupts/etc can probably be hand-waved away in the same way
>> that SMIs are.
> You mean host side interrupts, right?

Yes.

>
>> Hypercalls however are a different matter.
>>
>> Xen and HyperV both have hypercall pages, where the hypervisor provides
>> some executable code for the guest kernel to use.
>>
>> Under the current scheme, the calls into the hypercall pages get
>> accounted, as objtool can see them, but the ret's don't.  This imbalance
>> is exasperated because some hypercalls are called in loops.
> Bah.
>
>> Worse however, it opens a hole where branch history is calculable and
>> the ret can reliably underflow.  This occurs when there's a minimal call
>> depth in Linux to get to the hypercall, and then a call depth of >16 in
>> the hypervisor.
>>
>> The only variable in these cases is how much user control there is of
>> the registers, and I for one am not feeling lucky in face of the current
>> research.
>>
>> The only solution I see here is for Linux to ret-thunk the hypercall
>> page too.  Under Xen, the hypercall page is mutable by the guest and
>> there is room to turn every ret into a jmp, but obviously none of this
>> is covered by any formal ABI, and this probably needs more careful
>> consideration than the short time I've put towards it.
> Well, that makes the guest side "safe", but isn't a deep hypercall > 16
> already underflowing in the hypervisor code before it returns to the
> guest?

Yeah, but that's the hypervisor's problem to deal with, in whatever
manner it sees fit.

And if the hypervisor is using IBeeRS then the first ret in guest
context will underflow.

>> That said, after a return from the hypervisor, Linux has no idea what
>> state the RSB is in, so the only safe course of action is to re-stuff.
> Indeed.
>
> Another proof for my claim that virt creates more problems than it
> solves.

So how did you like debugging the gsbase crash on native hardware? :)

~Andrew

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  8:26                         ` David Laight
@ 2022-07-19 16:27                           ` Linus Torvalds
  2022-07-19 17:23                             ` Sami Tolvanen
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-19 16:27 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, Sami Tolvanen, Joao Moreira, Peter Zijlstra,
	LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Tue, Jul 19, 2022 at 1:26 AM David Laight <David.Laight@aculab.com> wrote:
>
> Shouldn't it be testing the value the caller supplied?

Actually, I'm just all confused.

All that verification code is *in* the caller, before the call - to
verify that the target looks fine.

I think I was confused by the hash thunk above the function also being
generated with a "cmpl $hash". And I don't even know why that is, and
why it wasn't just the bare constant.

            Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:11                           ` Linus Torvalds
  2022-07-19  0:23                             ` Peter Zijlstra
@ 2022-07-19 17:19                             ` Sami Tolvanen
  2022-07-20 21:13                               ` Peter Zijlstra
  1 sibling, 1 reply; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-19 17:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Mon, Jul 18, 2022 at 05:11:27PM -0700, Linus Torvalds wrote:
> On Mon, Jul 18, 2022 at 5:03 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So it already only adds the pattern to things that have their address
> > taken, not all functions?

The preamble is added to address-taken static functions and all global
functions, because those might be indirectly called from other
translation units. With LTO, we could prune unnecessary preambles from
non-address-taken globals too.

> > If so, that's simple enough to sort out: don't do any RSB stack
> > adjustment for those thunks AT ALL.
> >
> > Because they should just then end up with a jump to the "real" target,
> > and that real target will do the RSB stack thing.
> 
> Put another way, let's say that you have a function that looks like this:
> 
>   int silly(void)
>   {
>        return 0;
>   }
> 
> and now you have two cases:
> 
>  - the "direct callable version" of that function looks exactly the
> way it always has looked, and gets the 16 bytes of padding for it, and
> the RSB counting can happen in that padding
> 
>  - the "somebody took the address of this function" creates code that
> has the hash marker before it, and has the hash check, and then does a
> "jmp silly" to actually jump to the real code.

Clang's current CFI implementation is somewhat similar to this. It
creates separate thunks for address-taken functions and changes
function addresses in C code to point to the thunks instead.

While this works, it creates painful situations when interacting with
assembly (e.g. a function address taken in assembly cannot be used
for indirect calls in C as it doesn't point to the thunk) and needs
unpleasant hacks when we want to take the actual function address in C
(i.e. scattering the code with function_nocfi() calls).

I have to agree with Peter on this, I would rather avoid messing with
function pointers in KCFI to avoid these issues.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19  0:19                         ` Joao Moreira
@ 2022-07-19 17:21                           ` Sami Tolvanen
  2022-07-19 17:58                             ` Joao Moreira
  0 siblings, 1 reply; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-19 17:21 UTC (permalink / raw)
  To: Joao Moreira
  Cc: Linus Torvalds, Thomas Gleixner, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Mon, Jul 18, 2022 at 05:19:13PM -0700, Joao Moreira wrote:
> > The extra instruction is likely less of a problem than the extra
> > register used.
> > 
> FWIIW, per-ABI, R11 is a scratch-reg and should be usable without hard
> consequences in this scenario.

Clang always uses r11 for the indirect call with retpolines, so we'd
need to use another register. Nevertheless, splitting the constant into
two instructions would solve the call target gadget issue.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19 16:27                           ` Linus Torvalds
@ 2022-07-19 17:23                             ` Sami Tolvanen
  2022-07-19 17:27                               ` Linus Torvalds
  0 siblings, 1 reply; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-19 17:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Laight, Thomas Gleixner, Joao Moreira, Peter Zijlstra,
	LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Tue, Jul 19, 2022 at 09:27:02AM -0700, Linus Torvalds wrote:
> On Tue, Jul 19, 2022 at 1:26 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > Shouldn't it be testing the value the caller supplied?
> 
> Actually, I'm just all confused.
> 
> All that verification code is *in* the caller, before the call - to
> verify that the target looks fine.
> 
> I think I was confused by the hash thunk above the function also being
> generated with a "cmpl $hash". And I don't even know why that is, and
> why it wasn't just the bare constant.

The preamble hash is encoded into an instruction just to avoid special
casing objtool, which would otherwise get confused about the random
bytes. On arm64, we just emit a bare constant before the function.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19 17:23                             ` Sami Tolvanen
@ 2022-07-19 17:27                               ` Linus Torvalds
  2022-07-19 18:06                                 ` Sami Tolvanen
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-19 17:27 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: David Laight, Thomas Gleixner, Joao Moreira, Peter Zijlstra,
	LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Tue, Jul 19, 2022 at 10:23 AM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> The preamble hash is encoded into an instruction just to avoid special
> casing objtool, which would otherwise get confused about the random
> bytes. On arm64, we just emit a bare constant before the function.

Ahh.

I think objtool would want to understand about kCFI anyway, so I think
in the long run that hack isn't a good idea.

But I get why you'd do it as a "do this as just a compiler thing and
hide it from objtool" as a development strategy.

                Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19 17:21                           ` Sami Tolvanen
@ 2022-07-19 17:58                             ` Joao Moreira
  0 siblings, 0 replies; 142+ messages in thread
From: Joao Moreira @ 2022-07-19 17:58 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linus Torvalds, Thomas Gleixner, Peter Zijlstra, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

> Clang always uses r11 for the indirect call with retpolines, so we'd
> need to use another register. Nevertheless, splitting the constant into
> two instructions would solve the call target gadget issue.

Yeah, it clicked later yesterday. But, FWIW, R10 is also considered a
scratch register, although it is used for passing static chain pointers,
which I think is not a thing in kernel context. Worst case, we can
always do liveness analysis, and I doubt we'll have a significant (if
any) number of spills.

If we are comparing through registers, I would suggest using a sub
instruction instead of a cmp, as this will destroy the contents of the
register and prevent it from being re-used on further unprotected
indirect branches, if any exist.
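
Combining both ideas, a sketch of a caller-side check that never embeds
the full hash as a single immediate and burns the scratch register in
the process (hash value and preamble offset purely illustrative):

	movl	-6(%r11), %r10d		/* load the hash from the target's preamble */
	subl	$0x12340000, %r10d	/* subtract the hash in two halves, so the  */
	subl	$0x00005678, %r10d	/* complete constant never appears in text  */
	jz	1f
	ud2				/* mismatch: trap */
1:	call	__x86_indirect_thunk_r11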

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19 17:27                               ` Linus Torvalds
@ 2022-07-19 18:06                                 ` Sami Tolvanen
  2022-07-19 20:10                                   ` Peter Zijlstra
  0 siblings, 1 reply; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-19 18:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Laight, Thomas Gleixner, Joao Moreira, Peter Zijlstra,
	LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Tue, Jul 19, 2022 at 10:27:00AM -0700, Linus Torvalds wrote:
> On Tue, Jul 19, 2022 at 10:23 AM Sami Tolvanen <samitolvanen@google.com> wrote:
> >
> > The preamble hash is encoded into an instruction just to avoid special
> > casing objtool, which would otherwise get confused about the random
> > bytes. On arm64, we just emit a bare constant before the function.
> 
> Ahh.
> 
> I think objtool would want to understand about kCFI anyway, so I think
> in the long run that hack isn't a goog idea.
> 
> But I get why you'd do it as a "do this as just a compiler thing and
> hide it from objtool" as a development strategy.

I believe it was actually Peter's idea to use an instruction. :) In
earlier revisions of KCFI, I did teach objtool about the preambles, but
that was just so it can ignore them.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19 18:06                                 ` Sami Tolvanen
@ 2022-07-19 20:10                                   ` Peter Zijlstra
  0 siblings, 0 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-19 20:10 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linus Torvalds, David Laight, Thomas Gleixner, Joao Moreira,
	LKML, the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Tue, Jul 19, 2022 at 11:06:40AM -0700, Sami Tolvanen wrote:
> On Tue, Jul 19, 2022 at 10:27:00AM -0700, Linus Torvalds wrote:
> > On Tue, Jul 19, 2022 at 10:23 AM Sami Tolvanen <samitolvanen@google.com> wrote:
> > >
> > > The preamble hash is encoded into an instruction just to avoid special
> > > casing objtool, which would otherwise get confused about the random
> > > bytes. On arm64, we just emit a bare constant before the function.
> > 
> > Ahh.
> > 
> > I think objtool would want to understand about kCFI anyway, so I think
> > in the long run that hack isn't a goog idea.
> > 
> > But I get why you'd do it as a "do this as just a compiler thing and
> > hide it from objtool" as a development strategy.
> 
> I believe it was actually Peter's idea to use an instruction. :) In
> earlier revisions of KCFI, I did teach objtool about the preambles, but
> that was just so it can ignore them.

Right; even if we teach objtool about kCFI, having text be actual
instructions makes things much nicer. Objdump and friends also shit
their pants if you put random bytes in. It only costs a single byte to
encode the immediate, so why not.

Specifically, the encoding used is:

	movl $0x12345678, %eax

and that is 0xb8 followed by the constant, but there are plenty of other
single-byte ops that could be used.
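
Putting the two halves together with the call-site sequence quoted
elsewhere in the thread, a sketch of preamble plus check (hash value
illustrative; the two int3s are just one way to make the -6 offset of
the check line up with the immediate):

__cfi_silly:				/* hypothetical preamble label */
	movl	$0x12345678, %eax	/* 0xb8 <imm32>: the hash as a harmless insn */
	int3
	int3
silly:
	endbr64
	ret

	/* indirect call site, target address in %r11: */
	cmpl	$0x12345678, -6(%r11)	/* compare against the preamble immediate */
	je	1f
	ud2
1:	call	__x86_indirect_thunk_r11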

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: Virt Call depth tracking mitigation
  2022-07-19 14:45   ` Michael Kelley (LINUX)
@ 2022-07-19 20:16     ` Peter Zijlstra
  0 siblings, 0 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-19 20:16 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: Andrew Cooper, Thomas Gleixner, LKML, x86, Linus Torvalds,
	Tim Chen, Josh Poimboeuf, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Wei Liu, Dexuan Cui

On Tue, Jul 19, 2022 at 02:45:40PM +0000, Michael Kelley (LINUX) wrote:

> In Hyper-V, the hypercall page is *not* writable by the guest.  Quoting
> from Section 3.13 in the Hyper-V TLFS:
> 
>     The hypercall page appears as an "overlay" to the GPA space; that is,
>     it covers whatever else is mapped to the GPA range. Its contents are
>     readable and executable by the guest. Attempts to write to the
>     hypercall page will result in a protection (#GP) exception.
> 
> And:
> 
>     After the interface has been established, the guest can initiate a
>     hypercall. To do so, it populates the registers per the hypercall protocol
>     and issues a CALL to the beginning of the hypercall page. The guest
>     should assume the hypercall page performs the equivalent of a near
>     return (0xC3) to return to the caller.  As such, the hypercall must be
>     invoked with a valid stack.

I'm hoping that these days you're following that 0xc3 with a 0xcc at the
very least?

IIRC the whole hyper-v thing is negotiated using (virtual) MSRs; would
it be possible to write the address of a return thunk into an MSR and
have the hypervisor rewrite the hypercall page accordingly?

This is needed for the AMD jmp2ret thing anyway. Or you get to eat an
IBPB before every hypercall, which I'm guessing your performance people
aren't keen on.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: Virt Call depth tracking mitigation
  2022-07-19 16:23     ` Andrew Cooper
@ 2022-07-19 21:17       ` Thomas Gleixner
  0 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-19 21:17 UTC (permalink / raw)
  To: Andrew Cooper, LKML
  Cc: x86, Linus Torvalds, Tim Chen, Josh Poimboeuf, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann, kys,
	haiyangz, Stephen Hemminger, Wei Liu, decui, Michael Kelley

On Tue, Jul 19 2022 at 16:23, Andrew Cooper wrote:
> On 19/07/2022 15:13, Thomas Gleixner wrote:
>> Well, that makes the guest side "safe", but isn't a deep hypercall > 16
>> already underflowing in the hypervisor code before it returns to the
>> guest?
>
> Yeah, but that's the hypervisor's problem to deal with, in whatever
> manner it sees fit.
>
> And if the hypervisor is using IBeeRS then the first ret in guest
> context will underflow.

I'll have a look tomorrow.

>>> That said, after a return from the hypervisor, Linux has no idea what
>>> state the RSB is in, so the only safe course of action is to re-stuff.
>> Indeed.
>>
>> Another proof for my claim that virt creates more problems than it
>> solves.
>
> So how did you like debugging the gsbase crash on native hardware. :)

First of all I said it's creating more problems than it solves, which
means it solves some problems.

But more important, I'm not a wimp.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:51             ` Peter Zijlstra
@ 2022-07-20  9:00               ` Thomas Gleixner
  2022-07-20 16:55               ` Sami Tolvanen
  2022-07-20 19:42               ` Sami Tolvanen
  2 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-20  9:00 UTC (permalink / raw)
  To: Peter Zijlstra, Sami Tolvanen
  Cc: Linus Torvalds, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Steven Rostedt, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Tue, Jul 19 2022 at 01:51, Peter Zijlstra wrote:
> On Mon, Jul 18, 2022 at 03:48:04PM -0700, Sami Tolvanen wrote:
>> On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>> > Ofc, we can still put the whole:
>> >
>> >         sarq    $5, PER_CPU_VAR(__x86_call_depth);
>> >         jmp     \func_direct
>> >
>> > thing in front of that.
>> 
>> Sure, that would work.
>
> So if we assume \func starts with ENDBR, and further assume we've fixed
> up every direct jmp/call to land at +4, we can overwrite the ENDBR with
> part of the SARQ, that leaves us 6 more byte, placing the immediate at
> -10 if I'm not mis-counting.
>
> Now, the call sites are:
>
> 41 81 7b fa 78 56 34 12		cmpl	$0x12345678, -6(%r11)
> 74 02				je	1f
> 0f 0b				ud2
> e8 00 00 00 00		1:	call	__x86_indirect_thunk_r11
>
> That means the offset of +10 lands in the middle of the CALL
> instruction, and since we only have 16 thunks there is a limited number
> of byte patterns available there.
>
> This really isn't as nice as the -6 but might just work well enough,
> hmm?

So I added 32 bytes of padding and put the thunk at the start:

        sarq    $5, PER_CPU_VAR(__x86_call_depth);
        jmp     \func_direct

For sockperf that costs about 1% performance vs. the 16 byte
variant. For mitigations=off it's a ~0.5% drop.

That's on a SKL. Did not check on other systems yet.
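
Layout-wise (labels hypothetical), that puts the per-function thunk into
the padding directly in front of the function:

	.balign	32
__pad_silly:					/* start of the 32 bytes of padding */
	sarq	$5, PER_CPU_VAR(__x86_call_depth)	/* depth accounting */
	jmp	silly					/* then into the function proper */
	int3						/* remaining pad is filler */
silly:
	endbr64
	/* function body */
	ret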

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:51             ` Peter Zijlstra
  2022-07-20  9:00               ` Thomas Gleixner
@ 2022-07-20 16:55               ` Sami Tolvanen
  2022-07-20 19:42               ` Sami Tolvanen
  2 siblings, 0 replies; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-20 16:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linus Torvalds, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Tue, Jul 19, 2022 at 01:51:14AM +0200, Peter Zijlstra wrote:
> So if we assume \func starts with ENDBR, and further assume we've fixed
> up every direct jmp/call to land at +4, we can overwrite the ENDBR with
> part of the SARQ, that leaves us 6 more byte, placing the immediate at
> -10 if I'm not mis-counting.
> 
> Now, the call sites are:
> 
> 41 81 7b fa 78 56 34 12		cmpl	$0x12345678, -6(%r11)
> 74 02				je	1f
> 0f 0b				ud2
> e8 00 00 00 00		1:	call	__x86_indirect_thunk_r11
> 
> That means the offset of +10 lands in the middle of the CALL
> instruction, and since we only have 16 thunks there is a limited number
> of byte patterns available there.
> 
> This really isn't as nice as the -6 but might just work well enough,
> hmm?

I agree, this is probably fine, or at least low enough risk.

Did you have thoughts about changing the check instruction sequence
to split the hash into multiple instructions and thus avoiding this
issue altogether?

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
                   ` (41 preceding siblings ...)
  2022-07-19 10:24 ` Virt " Andrew Cooper
@ 2022-07-20 16:57 ` Steven Rostedt
  2022-07-20 17:09   ` Linus Torvalds
  42 siblings, 1 reply; 142+ messages in thread
From: Steven Rostedt @ 2022-07-20 16:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

[-- Attachment #1: Type: text/plain, Size: 4526 bytes --]


I just ran my ftrace test suite against v5.19-rc7 to see if anything pops
up. And one of my boot tests failed with:

[    2.459713] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[    2.460712] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    2.461712] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    2.462713] Spectre V2 : Mitigation: Retpolines
[    2.464712] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    2.465712] Speculative Store Bypass: Vulnerable
[    2.466713] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    2.467712] SRBDS: Vulnerable: No microcode
[    2.488002] ------------[ cut here ]------------
[    2.488712] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:558 apply_returns+0xa3/0x1ec
[    2.489712] Modules linked in:
[    2.490712] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc7-test #65
[    2.491712] Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
[    2.492712] RIP: 0010:apply_returns+0xa3/0x1ec
[    2.493712] Code: 0f b6 5d a2 48 63 45 88 48 01 c3 4c 01 e3 48 89 da 4c 89 e7 e8 c2 25 00 00 84 c0 0f 85 1f 01 00 00 48 81 fb c0 49 40 a1 74 07 <0f> 0b e9 0f 01 00 00 83 3d a1 83 4f 02 00 74 24 0f b6 55 a2 48 63
[    2.494712] RSP: 0000:ffffffffa1e03df0 EFLAGS: 00010206
[    2.495711] RAX: 0000000000000000 RBX: ffffffffa173e8ad RCX: 0000000000000001
[    2.496711] RDX: 0000000000000003 RSI: ffffffffa161b20c RDI: ffffffffa173e8ad
[    2.497711] RBP: ffffffffa1e03ea8 R08: 00000000fffffff1 R09: 000000000000000f
[    2.498711] R10: ffffffffa1e03db8 R11: 0000000000000b03 R12: ffffffffa173e8a8
[    2.499711] R13: ffffffffa2550d30 R14: 0000000000000000 R15: ffffffffa1e32138
[    2.500712] FS:  0000000000000000(0000) GS:ffff8e8796400000(0000) knlGS:0000000000000000
[    2.501711] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.502711] CR2: ffff8e879edff000 CR3: 0000000014e2a001 CR4: 00000000001706f0
[    2.503712] Call Trace:
[    2.504712]  <TASK>
[    2.505721]  alternative_instructions+0x39/0xe9
[    2.506712]  check_bugs+0x310/0x330
[    2.507712]  start_kernel+0x605/0x63e
[    2.508715]  x86_64_start_reservations+0x24/0x2a
[    2.509712]  x86_64_start_kernel+0x8d/0x97
[    2.510713]  secondary_startup_64_no_verify+0xe0/0xeb
[    2.511719]  </TASK>
[    2.512712] irq event stamp: 142170
[    2.513711] hardirqs last  enabled at (142180): [<ffffffffa014427c>] __up_console_sem+0x4b/0x53
[    2.514711] hardirqs last disabled at (142189): [<ffffffffa014425c>] __up_console_sem+0x2b/0x53
[    2.515712] softirqs last  enabled at (8013): [<ffffffffa1400389>] __do_softirq+0x389/0x3c8
[    2.516711] softirqs last disabled at (8006): [<ffffffffa00e2daa>] __irq_exit_rcu+0x72/0xcb
[    2.517711] ---[ end trace 0000000000000000 ]---
[    2.529074] Freeing SMP alternatives memory: 44K
[    2.633924] smpboot: CPU0: Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz (family: 0x6, model: 0x3c, stepping: 0x3)
[    2.636420] cblist_init_generic: Setting adjustable number of callback queues.
[    2.636712] cblist_init_generic: Setting shift to 2 and lim to 1.
[    2.637821] cblist_init_generic: Setting shift to 2 and lim to 1.
[    2.638822] Running RCU-tasks wait API self tests
[    2.742759] Performance Events: PEBS fmt2+, Haswell events, 16-deep LBR, full-width counters, Intel PMU driver.
[    2.743718] ... version:                3
[    2.744712] ... bit width:              48
[    2.745712] ... generic registers:      4
[    2.746712] ... value mask:             0000ffffffffffff
[    2.747712] ... max period:             00007fffffffffff
[    2.748712] ... fixed-purpose events:   3
[    2.749712] ... event mask:             000000070000000f
[    2.751038] Estimated ratio of average max frequency by base frequency (times 1024): 1024
[    2.751848] rcu: Hierarchical SRCU implementation.
[    2.755405] smp: Bringing up secondary CPUs ...
[    2.756470] x86: Booting SMP configuration:
[    2.757713] .... node  #0, CPUs:      #1
[    1.309155] numa_add_cpu cpu 1 node 0: mask now 0-1
[    2.767842]  #2
[    1.309155] numa_add_cpu cpu 2 node 0: mask now 0-2
[    2.774816] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    2.777343]  #3
[    1.309155] numa_add_cpu cpu 3 node 0: mask now 0-3
[    2.784230] smp: Brought up 1 node, 4 CPUs
[    2.784713] smpboot: Max logical packages: 1


Attached is the config.

-- Steve

[-- Attachment #2: config-bad --]
[-- Type: application/octet-stream, Size: 142423 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.19.0-rc7 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 11.3.0-3) 11.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=110300
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# CONFIG_BPF_JIT is not set
# end of BPF subsystem

CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
# CONFIG_SCHED_CORE is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=m
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_NUMA_BALANCING is not set
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_MISC is not set
CONFIG_CGROUP_DEBUG=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
# CONFIG_FHANDLE is not set
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_GUEST_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_NR_GPIO=1024
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_AUDIT_ARCH=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
# CONFIG_X86_CPU_RESCTRL is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_HYPERVISOR_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_PROCESSOR_SELECT=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=256
CONFIG_SCHED_CLUSTER=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCELOG_LEGACY is not set
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=y

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_PERF_EVENTS_INTEL_RAPL=y
CONFIG_PERF_EVENTS_INTEL_CSTATE=y
# CONFIG_PERF_EVENTS_AMD_POWER is not set
CONFIG_PERF_EVENTS_AMD_UNCORE=y
# CONFIG_PERF_EVENTS_AMD_BRS is not set
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_X86_IOPL_IOPERM=y
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
# CONFIG_MICROCODE_AMD is not set
# CONFIG_MICROCODE_LATE_LOADING is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_5LEVEL=y
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
# CONFIG_AMD_MEM_ENCRYPT is not set
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
# CONFIG_X86_PMEM_LEGACY is not set
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_UMIP=y
CONFIG_CC_HAS_IBT=y
# CONFIG_X86_KERNEL_IBT is not set
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
# CONFIG_X86_SGX is not set
CONFIG_EFI=y
# CONFIG_EFI_STUB is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
# CONFIG_KEXEC_FILE is not set
CONFIG_CRASH_DUMP=y
# CONFIG_KEXEC_JUMP is not set
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_DYNAMIC_MEMORY_LAYOUT=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
CONFIG_HOTPLUG_CPU=y
# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
CONFIG_COMPAT_VDSO=y
CONFIG_LEGACY_VSYSCALL_XONLY=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
# CONFIG_STRICT_SIGALTSTACK_SIZE is not set
CONFIG_HAVE_LIVEPATCH=y
# CONFIG_LIVEPATCH is not set
# end of Processor type and features

CONFIG_CC_HAS_SLS=y
CONFIG_CC_HAS_RETURN_THUNK=y
CONFIG_SPECULATION_MITIGATIONS=y
CONFIG_PAGE_TABLE_ISOLATION=y
CONFIG_RETPOLINE=y
CONFIG_RETHUNK=y
CONFIG_CPU_UNRET_ENTRY=y
CONFIG_CPU_IBPB_ENTRY=y
CONFIG_CPU_IBRS_ENTRY=y
# CONFIG_SLS is not set
CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_SUSPEND_SKIP_SYNC is not set
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_AUTOSLEEP=y
CONFIG_PM_WAKELOCKS=y
CONFIG_PM_WAKELOCKS_LIMIT=100
CONFIG_PM_WAKELOCKS_GC=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
CONFIG_PM_TEST_SUSPEND=y
CONFIG_PM_SLEEP_DEBUG=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_PM_CLK=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
# CONFIG_ENERGY_MODEL is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
# CONFIG_ACPI_FPDT is not set
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
# CONFIG_ACPI_TAD is not set
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_PLATFORM_PROFILE=y
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_HOTPLUG_MEMORY is not set
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=y
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_CUSTOM_METHOD is not set
# CONFIG_ACPI_BGRT is not set
# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
# CONFIG_ACPI_NFIT is not set
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_HMAT is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
# CONFIG_ACPI_APEI is not set
# CONFIG_ACPI_DPTF is not set
# CONFIG_ACPI_EXTLOG is not set
# CONFIG_ACPI_CONFIGFS is not set
# CONFIG_ACPI_PFRUT is not set
CONFIG_ACPI_PCC=y
# CONFIG_PMIC_OPREGION is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_PRMT=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
# CONFIG_X86_AMD_PSTATE is not set
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_X86_ACPI_CPUFREQ_CPB=y
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_AMD_FREQ_SENSITIVITY is not set
CONFIG_X86_SPEEDSTEP_CENTRINO=y
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
# end of CPU Idle

# CONFIG_INTEL_IDLE is not set
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_MMCONF_FAM10H=y
# CONFIG_PCI_CNB20LE_QUIRK is not set
# CONFIG_ISA_BUS is not set
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32_ABI is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
# end of Binary Emulations

CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_PFNCACHE=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_DIRTY_RING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_HAVE_KVM_IRQ_BYPASS=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_KVM_XFER_TO_GUEST_WORK=y
CONFIG_HAVE_KVM_PM_NOTIFIER=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
CONFIG_KVM_WERROR=y
CONFIG_KVM_INTEL=y
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_XEN is not set
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HOTPLUG_SMT=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_KRETPROBE_ON_RETHOOK=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_HAVE_ARCH_HUGE_VMALLOC=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_OBJTOOL=y
CONFIG_HAVE_JUMP_LABEL_HACK=y
CONFIG_HAVE_NOINSTR_HACK=y
CONFIG_HAVE_NOINSTR_VALIDATION=y
CONFIG_HAVE_UACCESS_VALIDATION=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_PAGE_TABLE_CHECK=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
CONFIG_DYNAMIC_SIGFRAME=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=y
# CONFIG_BLK_DEV_ZONED is not set
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
# CONFIG_BLK_WBT is not set
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
# CONFIG_BLK_CGROUP_IOPRIO is not set
CONFIG_BLK_DEBUG_FS=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_PM=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
# CONFIG_IOSCHED_BFQ is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SWAP=y
# CONFIG_ZSWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SLUB_STATS=y
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
CONFIG_COMPAT_BRK=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_MHP_MEMMAP_ON_MEMORY=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_WANTS_THP_SWAP=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
# CONFIG_CMA is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
# CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ARCH_HAS_VM_GET_PAGE_PROT=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ARCH_HAS_ZONE_DMA_SET=y
CONFIG_ZONE_DMA=y
CONFIG_ZONE_DMA32=y
# CONFIG_ZONE_DEVICE is not set
CONFIG_VMAP_PFN=y
CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
CONFIG_ARCH_HAS_PKEYS=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_DIAG is not set
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
# CONFIG_UNIX_DIAG is not set
# CONFIG_TLS is not set
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_USER_COMPAT is not set
# CONFIG_XFRM_INTERFACE is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_AH=y
CONFIG_XFRM_ESP=y
CONFIG_XFRM_IPCOMP=y
CONFIG_NET_KEY=y
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_NET_IP_TUNNEL=y
# CONFIG_SYN_COOKIES is not set
# CONFIG_NET_IPVTI is not set
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
# CONFIG_INET_ESP_OFFLOAD is not set
# CONFIG_INET_ESPINTCP is not set
CONFIG_INET_IPCOMP=y
CONFIG_INET_XFRM_TUNNEL=y
CONFIG_INET_TUNNEL=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_INET_UDP_DIAG is not set
# CONFIG_INET_RAW_DIAG is not set
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=y
CONFIG_TCP_CONG_HSTCP=y
CONFIG_TCP_CONG_HYBLA=y
CONFIG_TCP_CONG_VEGAS=y
# CONFIG_TCP_CONG_NV is not set
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=y
CONFIG_TCP_CONG_YEAH=y
CONFIG_TCP_CONG_ILLINOIS=y
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
# CONFIG_TCP_CONG_BBR is not set
# CONFIG_DEFAULT_BIC is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_HYBLA is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_VENO is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_IPV6_ILA is not set
# CONFIG_IPV6_VTI is not set
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_MPTCP is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_EGRESS=y
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NETFILTER_NETLINK_OSF=y
CONFIG_NF_CONNTRACK=y
# CONFIG_NF_LOG_SYSLOG is not set
CONFIG_NETFILTER_CONNCOUNT=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
# CONFIG_NF_CONNTRACK_TIMEOUT is not set
# CONFIG_NF_CONNTRACK_TIMESTAMP is not set
# CONFIG_NF_CONNTRACK_LABELS is not set
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_H323=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_BROADCAST=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=y
# CONFIG_NF_CONNTRACK_SNMP is not set
CONFIG_NF_CONNTRACK_PPTP=y
CONFIG_NF_CONNTRACK_SANE=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CONNTRACK_TFTP=y
CONFIG_NF_CT_NETLINK=y
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
CONFIG_NF_NAT=y
CONFIG_NF_NAT_AMANDA=y
CONFIG_NF_NAT_FTP=y
CONFIG_NF_NAT_IRC=y
CONFIG_NF_NAT_SIP=y
CONFIG_NF_NAT_TFTP=y
# CONFIG_NF_TABLES is not set
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XTABLES_COMPAT=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_CONNMARK=y

#
# Xtables targets
#
# CONFIG_NETFILTER_XT_TARGET_AUDIT is not set
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
CONFIG_NETFILTER_XT_TARGET_CT=y
CONFIG_NETFILTER_XT_TARGET_DSCP=y
CONFIG_NETFILTER_XT_TARGET_HL=y
# CONFIG_NETFILTER_XT_TARGET_HMARK is not set
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
CONFIG_NETFILTER_XT_TARGET_LED=y
# CONFIG_NETFILTER_XT_TARGET_LOG is not set
CONFIG_NETFILTER_XT_TARGET_MARK=y
# CONFIG_NETFILTER_XT_NAT is not set
# CONFIG_NETFILTER_XT_TARGET_NETMAP is not set
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_TARGET_NOTRACK=y
CONFIG_NETFILTER_XT_TARGET_RATEEST=y
# CONFIG_NETFILTER_XT_TARGET_REDIRECT is not set
# CONFIG_NETFILTER_XT_TARGET_MASQUERADE is not set
CONFIG_NETFILTER_XT_TARGET_TEE=y
CONFIG_NETFILTER_XT_TARGET_TPROXY=y
CONFIG_NETFILTER_XT_TARGET_TRACE=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=y

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_ADDRTYPE is not set
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=y
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=y
# CONFIG_NETFILTER_XT_MATCH_CONNLABEL is not set
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_CPU=y
CONFIG_NETFILTER_XT_MATCH_DCCP=y
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
CONFIG_NETFILTER_XT_MATCH_DSCP=y
CONFIG_NETFILTER_XT_MATCH_ECN=y
CONFIG_NETFILTER_XT_MATCH_ESP=y
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
CONFIG_NETFILTER_XT_MATCH_HELPER=y
CONFIG_NETFILTER_XT_MATCH_HL=y
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=y
CONFIG_NETFILTER_XT_MATCH_LIMIT=y
CONFIG_NETFILTER_XT_MATCH_MAC=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
CONFIG_NETFILTER_XT_MATCH_OSF=y
CONFIG_NETFILTER_XT_MATCH_OWNER=y
CONFIG_NETFILTER_XT_MATCH_POLICY=y
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=y
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_RATEEST=y
CONFIG_NETFILTER_XT_MATCH_REALM=y
CONFIG_NETFILTER_XT_MATCH_RECENT=y
CONFIG_NETFILTER_XT_MATCH_SCTP=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
CONFIG_NETFILTER_XT_MATCH_STRING=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_TIME=y
CONFIG_NETFILTER_XT_MATCH_U32=y
# end of Core Netfilter Configuration

# CONFIG_IP_SET is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_SOCKET_IPV4=y
CONFIG_NF_TPROXY_IPV4=y
CONFIG_NF_DUP_IPV4=y
# CONFIG_NF_LOG_ARP is not set
# CONFIG_NF_LOG_IPV4 is not set
CONFIG_NF_REJECT_IPV4=y
CONFIG_NF_NAT_PPTP=y
CONFIG_NF_NAT_H323=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_AH=y
CONFIG_IP_NF_MATCH_ECN=y
# CONFIG_IP_NF_MATCH_RPFILTER is not set
CONFIG_IP_NF_MATCH_TTL=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
# CONFIG_IP_NF_TARGET_SYNPROXY is not set
# CONFIG_IP_NF_NAT is not set
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_CLUSTERIP=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_IP_NF_TARGET_TTL=y
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_ARPTABLES=y
CONFIG_IP_NF_ARPFILTER=y
CONFIG_IP_NF_ARP_MANGLE=y
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_SOCKET_IPV6 is not set
# CONFIG_NF_TPROXY_IPV6 is not set
# CONFIG_NF_DUP_IPV6 is not set
# CONFIG_NF_REJECT_IPV6 is not set
# CONFIG_NF_LOG_IPV6 is not set
# CONFIG_IP6_NF_IPTABLES is not set
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=y
# CONFIG_NF_CONNTRACK_BRIDGE is not set
CONFIG_BRIDGE_NF_EBTABLES=y
CONFIG_BRIDGE_EBT_BROUTE=y
CONFIG_BRIDGE_EBT_T_FILTER=y
CONFIG_BRIDGE_EBT_T_NAT=y
CONFIG_BRIDGE_EBT_802_3=y
CONFIG_BRIDGE_EBT_AMONG=y
CONFIG_BRIDGE_EBT_ARP=y
CONFIG_BRIDGE_EBT_IP=y
# CONFIG_BRIDGE_EBT_IP6 is not set
CONFIG_BRIDGE_EBT_LIMIT=y
CONFIG_BRIDGE_EBT_MARK=y
CONFIG_BRIDGE_EBT_PKTTYPE=y
CONFIG_BRIDGE_EBT_STP=y
CONFIG_BRIDGE_EBT_VLAN=y
CONFIG_BRIDGE_EBT_ARPREPLY=y
CONFIG_BRIDGE_EBT_DNAT=y
CONFIG_BRIDGE_EBT_MARK_T=y
CONFIG_BRIDGE_EBT_REDIRECT=y
CONFIG_BRIDGE_EBT_SNAT=y
CONFIG_BRIDGE_EBT_LOG=y
CONFIG_BRIDGE_EBT_NFLOG=y
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
CONFIG_STP=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_MRP is not set
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFB is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
CONFIG_NET_SCH_NETEM=y
# CONFIG_NET_SCH_DRR is not set
# CONFIG_NET_SCH_MQPRIO is not set
# CONFIG_NET_SCH_SKBPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set
# CONFIG_NET_SCH_QFQ is not set
# CONFIG_NET_SCH_CODEL is not set
# CONFIG_NET_SCH_FQ_CODEL is not set
# CONFIG_NET_SCH_CAKE is not set
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
# CONFIG_NET_SCH_PLUG is not set
# CONFIG_NET_SCH_ETS is not set
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
# CONFIG_NET_CLS_CGROUP is not set
# CONFIG_NET_CLS_BPF is not set
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_CLS_MATCHALL is not set
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
# CONFIG_VSOCKETS is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
# CONFIG_NET_NSH is not set
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
# CONFIG_NET_L3_MASTER_DEV is not set
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
# CONFIG_MCTP is not set
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set

#
# CFG80211 needs to be enabled for MAC80211
#
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
# CONFIG_PSAMPLE is not set
# CONFIG_NET_IFE is not set
# CONFIG_LWTUNNEL is not set
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_NET_SELFTESTS=y
CONFIG_FAILOVER=y
CONFIG_ETHTOOL_NETLINK=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
CONFIG_PCIEAER=y
# CONFIG_PCIEAER_INJECT is not set
# CONFIG_PCIE_ECRC is not set
# CONFIG_PCIEASPM is not set
CONFIG_PCIE_PME=y
# CONFIG_PCIE_DPC is not set
# CONFIG_PCIE_PTM is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_PCI_LOCKLESS_CONFIG=y
# CONFIG_PCI_IOV is not set
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set
CONFIG_PCI_LABEL=y
# CONFIG_PCIE_BUS_TUNE_OFF is not set
CONFIG_PCIE_BUS_DEFAULT=y
# CONFIG_PCIE_BUS_SAFE is not set
# CONFIG_PCIE_BUS_PERFORMANCE is not set
# CONFIG_PCIE_BUS_PEER2PEER is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set

#
# PCI controller drivers
#
# CONFIG_VMD is not set

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
CONFIG_PCCARD=y
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
# CONFIG_YENTA is not set
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_DEVTMPFS_SAFE is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_ISCSI_IBFT is not set
# CONFIG_FW_CFG_SYSFS is not set
# CONFIG_SYSFB_SIMPLEFB is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# EFI (Extensible Firmware Interface) Support
#
# CONFIG_EFI_VARS is not set
CONFIG_EFI_ESRT=y
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
# CONFIG_EFI_RCI2_TABLE is not set
# CONFIG_EFI_DISABLE_PCI_DMA is not set
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y
# CONFIG_EFI_DISABLE_RUNTIME is not set
# CONFIG_EFI_COCO_SECRET is not set
# end of EFI (Extensible Firmware Interface) Support

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
# CONFIG_PARPORT_PC is not set
# CONFIG_PARPORT_AX88796 is not set
# CONFIG_PARPORT_1284 is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_FD_RAWCMD is not set
CONFIG_CDROM=y
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=65536
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=128
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_VIRTIO_BLK=y
# CONFIG_BLK_DEV_RBD is not set

#
# NVME Support
#
# CONFIG_BLK_DEV_NVME is not set
# CONFIG_NVME_FC is not set
# CONFIG_NVME_TCP is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
# CONFIG_ISL29020 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1770 is not set
# CONFIG_SENSORS_APDS990X is not set
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_EEPROM_IDT_89HPESX is not set
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

# CONFIG_SENSORS_LIS3_I2C is not set
# CONFIG_ALTERA_STAPL is not set
# CONFIG_INTEL_MEI is not set
# CONFIG_INTEL_MEI_ME is not set
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_INTEL_MEI_HDCP is not set
# CONFIG_INTEL_MEI_PXP is not set
# CONFIG_VMWARE_VMCI is not set
# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
# CONFIG_MISC_RTSX_PCI is not set
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_HABANA_AI is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=y
CONFIG_SCSI_COMMON=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_BLK_DEV_SR is not set
# CONFIG_CHR_DEV_SG is not set
CONFIG_BLK_DEV_BSG=y
# CONFIG_CHR_DEV_SCH is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y
CONFIG_SCSI_SAS_LIBSAS=y
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SRP_ATTRS is not set
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
CONFIG_ISCSI_BOOT_SYSFS=y
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
CONFIG_SCSI_AACRAID=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=5000
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_AIC79XX_REG_PRETTY_PRINT=y
CONFIG_SCSI_AIC94XX=y
CONFIG_AIC94XX_DEBUG=y
CONFIG_SCSI_MVSAS=y
CONFIG_SCSI_MVSAS_DEBUG=y
# CONFIG_SCSI_MVSAS_TASKLET is not set
# CONFIG_SCSI_MVUMI is not set
CONFIG_SCSI_DPT_I2O=y
CONFIG_SCSI_ADVANSYS=y
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=y
CONFIG_MEGARAID_MAILBOX=y
CONFIG_MEGARAID_LEGACY=y
CONFIG_MEGARAID_SAS=y
CONFIG_SCSI_MPT3SAS=y
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS=y
# CONFIG_SCSI_MPI3MR is not set
# CONFIG_SCSI_SMARTPQI is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_SCSI_MYRS is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
CONFIG_SCSI_ISCI=y
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=y
CONFIG_SCSI_QLA_FC=y
CONFIG_SCSI_QLA_ISCSI=y
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=y
# CONFIG_SCSI_CHELSIO_FCOE is not set
# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
# CONFIG_SCSI_DH is not set
# end of SCSI device support

CONFIG_ATA=y
CONFIG_SATA_HOST=y
CONFIG_PATA_TIMINGS=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_ACPI=y
# CONFIG_SATA_ZPODD is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=y
CONFIG_SATA_MOBILE_LPM_POLICY=0
# CONFIG_SATA_AHCI_PLATFORM is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=y
# CONFIG_SATA_DWC is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_PCMCIA is not set
CONFIG_PATA_PLATFORM=y
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=y
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=y
CONFIG_MD_MULTIPATH=y
CONFIG_MD_FAULTY=y
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=y
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
# CONFIG_DM_UNSTRIPED is not set
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
# CONFIG_DM_THIN_PROVISIONING is not set
# CONFIG_DM_CACHE is not set
# CONFIG_DM_WRITECACHE is not set
# CONFIG_DM_EBS is not set
# CONFIG_DM_ERA is not set
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=y
CONFIG_DM_LOG_USERSPACE=y
# CONFIG_DM_RAID is not set
CONFIG_DM_ZERO=y
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=y
CONFIG_DM_MULTIPATH_ST=y
# CONFIG_DM_MULTIPATH_HST is not set
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=y
# CONFIG_DM_DUST is not set
# CONFIG_DM_INIT is not set
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
# CONFIG_DM_SWITCH is not set
# CONFIG_DM_LOG_WRITES is not set
# CONFIG_DM_INTEGRITY is not set
# CONFIG_DM_AUDIT is not set
# CONFIG_TARGET_CORE is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=y
CONFIG_FUSION_FC=y
CONFIG_FUSION_SAS=y
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=y
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_MACINTOSH_DRIVERS=y
# CONFIG_MAC_EMUMOUSEBTN is not set
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
CONFIG_DUMMY=y
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_FC is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
CONFIG_NETCONSOLE=y
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_TUN=y
# CONFIG_TUN_VNET_CROSS_LE is not set
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=y
# CONFIG_NLMON is not set
# CONFIG_ARCNET is not set
CONFIG_ETHERNET=y
CONFIG_MDIO=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_PCMCIA_NMCLAN is not set
# CONFIG_AMD_XGBE is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ASIX=y
CONFIG_NET_VENDOR_ATHEROS=y
CONFIG_ATL2=y
CONFIG_ATL1=y
CONFIG_ATL1E=y
CONFIG_ATL1C=y
# CONFIG_ALX is not set
# CONFIG_CX_ECAT is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
CONFIG_BNX2=y
CONFIG_CNIC=y
CONFIG_TIGON3=y
CONFIG_TIGON3_HWMON=y
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
# CONFIG_CAVIUM_PTP is not set
# CONFIG_LIQUIDIO is not set
# CONFIG_LIQUIDIO_VF is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
CONFIG_NET_VENDOR_DAVICOM=y
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
# CONFIG_TULIP is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
# CONFIG_ULI526X is not set
# CONFIG_PCMCIA_XIRCOM is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_ENGLEDER=y
# CONFIG_TSNEP is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_FUJITSU=y
# CONFIG_PCMCIA_FMVJ18X is not set
CONFIG_NET_VENDOR_FUNGIBLE=y
# CONFIG_FUN_ETH is not set
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
CONFIG_E100=y
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_E1000E_HWTS=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
CONFIG_IGB_DCA=y
CONFIG_IGBVF=y
CONFIG_IXGB=y
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
CONFIG_IXGBE_DCA=y
# CONFIG_IXGBEVF is not set
# CONFIG_I40E is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
# CONFIG_IGC is not set
CONFIG_JME=y
CONFIG_NET_VENDOR_LITEX=y
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
CONFIG_SKGE=y
# CONFIG_SKGE_DEBUG is not set
# CONFIG_SKGE_GENESIS is not set
CONFIG_SKY2=y
# CONFIG_SKY2_DEBUG is not set
# CONFIG_OCTEON_EP is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MICROSOFT=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_PCMCIA_PCNET is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
CONFIG_R8169=y
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
CONFIG_SIS190=y
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
# CONFIG_SFC_SIENA is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_PCMCIA_SMC91C92 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VERTEXCOM=y
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
CONFIG_VIA_VELOCITY=y
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_EMACLITE is not set
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
CONFIG_NET_VENDOR_XIRCOM=y
# CONFIG_PCMCIA_XIRC2PS is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y
CONFIG_SWPHY=y
# CONFIG_LED_TRIGGER_PHY is not set
CONFIG_FIXED_PHY=y

#
# MII PHY device drivers
#
# CONFIG_AMD_PHY is not set
# CONFIG_ADIN_PHY is not set
# CONFIG_ADIN1100_PHY is not set
# CONFIG_AQUANTIA_PHY is not set
CONFIG_AX88796B_PHY=y
CONFIG_BROADCOM_PHY=y
# CONFIG_BCM54140_PHY is not set
# CONFIG_BCM7XXX_PHY is not set
# CONFIG_BCM84881_PHY is not set
# CONFIG_BCM87XX_PHY is not set
CONFIG_BCM_NET_PHYLIB=y
CONFIG_CICADA_PHY=y
# CONFIG_CORTINA_PHY is not set
CONFIG_DAVICOM_PHY=y
CONFIG_ICPLUS_PHY=y
CONFIG_LXT_PHY=y
# CONFIG_INTEL_XWAY_PHY is not set
# CONFIG_LSI_ET1011C_PHY is not set
CONFIG_MARVELL_PHY=y
# CONFIG_MARVELL_10G_PHY is not set
# CONFIG_MARVELL_88X2222_PHY is not set
# CONFIG_MAXLINEAR_GPHY is not set
# CONFIG_MEDIATEK_GE_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
# CONFIG_MICROSEMI_PHY is not set
# CONFIG_MOTORCOMM_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_C45_TJA11XX_PHY is not set
# CONFIG_NXP_TJA11XX_PHY is not set
CONFIG_QSEMI_PHY=y
CONFIG_REALTEK_PHY=y
# CONFIG_RENESAS_PHY is not set
# CONFIG_ROCKCHIP_PHY is not set
CONFIG_SMSC_PHY=y
# CONFIG_STE10XP is not set
# CONFIG_TERANETICS_PHY is not set
# CONFIG_DP83822_PHY is not set
# CONFIG_DP83TC811_PHY is not set
# CONFIG_DP83848_PHY is not set
# CONFIG_DP83867_PHY is not set
# CONFIG_DP83869_PHY is not set
# CONFIG_DP83TD510_PHY is not set
CONFIG_VITESSE_PHY=y
# CONFIG_XILINX_GMII2RGMII is not set
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_FWNODE_MDIO=y
CONFIG_ACPI_MDIO=y
CONFIG_MDIO_DEVRES=y
# CONFIG_MDIO_BITBANG is not set
# CONFIG_MDIO_BCM_UNIMAC is not set
# CONFIG_MDIO_MVUSB is not set
# CONFIG_MDIO_THUNDER is not set

#
# MDIO Multiplexers
#

#
# PCS device drivers
#
# CONFIG_PCS_XPCS is not set
# end of PCS device drivers

# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
CONFIG_USB_NET_DRIVERS=y
CONFIG_USB_CATC=y
CONFIG_USB_KAWETH=y
CONFIG_USB_PEGASUS=y
CONFIG_USB_RTL8150=y
# CONFIG_USB_RTL8152 is not set
# CONFIG_USB_LAN78XX is not set
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
CONFIG_USB_NET_CDCETHER=y
CONFIG_USB_NET_CDC_EEM=y
CONFIG_USB_NET_CDC_NCM=y
# CONFIG_USB_NET_HUAWEI_CDC_NCM is not set
# CONFIG_USB_NET_CDC_MBIM is not set
CONFIG_USB_NET_DM9601=y
# CONFIG_USB_NET_SR9700 is not set
# CONFIG_USB_NET_SR9800 is not set
CONFIG_USB_NET_SMSC75XX=y
CONFIG_USB_NET_SMSC95XX=y
CONFIG_USB_NET_GL620A=y
CONFIG_USB_NET_NET1080=y
CONFIG_USB_NET_PLUSB=y
CONFIG_USB_NET_MCS7830=y
CONFIG_USB_NET_RNDIS_HOST=y
CONFIG_USB_NET_CDC_SUBSET_ENABLE=y
CONFIG_USB_NET_CDC_SUBSET=y
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
CONFIG_USB_KC2190=y
CONFIG_USB_NET_ZAURUS=y
# CONFIG_USB_NET_CX82310_ETH is not set
# CONFIG_USB_NET_KALMIA is not set
# CONFIG_USB_NET_QMI_WWAN is not set
CONFIG_USB_NET_INT51X1=y
# CONFIG_USB_IPHETH is not set
# CONFIG_USB_SIERRA_NET is not set
# CONFIG_USB_VL600 is not set
# CONFIG_USB_NET_CH9200 is not set
# CONFIG_USB_NET_AQC111 is not set
# CONFIG_USB_RTL8153_ECM is not set
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K_PCI is not set
CONFIG_WLAN_VENDOR_ATMEL=y
CONFIG_WLAN_VENDOR_BROADCOM=y
CONFIG_WLAN_VENDOR_CISCO=y
CONFIG_WLAN_VENDOR_INTEL=y
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
CONFIG_WLAN_VENDOR_MARVELL=y
CONFIG_WLAN_VENDOR_MEDIATEK=y
CONFIG_WLAN_VENDOR_MICROCHIP=y
CONFIG_WLAN_VENDOR_PURELIFI=y
CONFIG_WLAN_VENDOR_RALINK=y
CONFIG_WLAN_VENDOR_REALTEK=y
CONFIG_WLAN_VENDOR_RSI=y
CONFIG_WLAN_VENDOR_SILABS=y
CONFIG_WLAN_VENDOR_ST=y
CONFIG_WLAN_VENDOR_TI=y
CONFIG_WLAN_VENDOR_ZYDAS=y
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_PCMCIA_RAYCS is not set
# CONFIG_WAN is not set

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=y
# CONFIG_ISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_SPARSEKMAP=y
# CONFIG_INPUT_MATRIXKMAP is not set
CONFIG_INPUT_VIVALDIFMAP=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
CONFIG_KEYBOARD_NEWTON=y
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
CONFIG_KEYBOARD_XTKBD=y
# CONFIG_KEYBOARD_CYPRESS_SF is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_SENTELIC is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_SMBUS=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_CYAPA is not set
# CONFIG_MOUSE_ELAN_I2C is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_AD714X is not set
# CONFIG_INPUT_BMA150 is not set
# CONFIG_INPUT_E3X0_BUTTON is not set
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_MMA8450 is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_KXTJ9 is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_UINPUT=y
# CONFIG_INPUT_PCF8574 is not set
# CONFIG_INPUT_DA7280_HAPTICS is not set
# CONFIG_INPUT_ADXL34X is not set
# CONFIG_INPUT_IMS_PCU is not set
# CONFIG_INPUT_IQS269A is not set
# CONFIG_INPUT_IQS626A is not set
# CONFIG_INPUT_IQS7222 is not set
# CONFIG_INPUT_CMA3000 is not set
# CONFIG_INPUT_IDEAPAD_SLIDEBAR is not set
# CONFIG_INPUT_DRV2665_HAPTICS is not set
# CONFIG_INPUT_DRV2667_HAPTICS is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_SERIO_ARC_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_8250_DWLIB=y
# CONFIG_SERIAL_8250_DW is not set
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y
CONFIG_SERIAL_8250_PERICOM=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_KGDB_NMI is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_CONSOLE_POLL=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_N_GSM is not set
CONFIG_NOZOMI=y
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_TTY_PRINTK is not set
# CONFIG_PRINTER is not set
# CONFIG_PPDEV is not set
CONFIG_VIRTIO_CONSOLE=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=y
# CONFIG_HW_RANDOM_AMD is not set
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIA=y
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
# CONFIG_SCR24X is not set
# CONFIG_IPWIRELESS is not set
# end of PCMCIA character devices

# CONFIG_MWAVE is not set
CONFIG_DEVMEM=y
CONFIG_NVRAM=y
CONFIG_DEVPORT=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_HPET_MMAP_DEFAULT=y
CONFIG_HANGCHECK_TIMER=y
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
# CONFIG_TCG_TIS is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
# CONFIG_TCG_TIS_I2C_ATMEL is not set
# CONFIG_TCG_TIS_I2C_INFINEON is not set
# CONFIG_TCG_TIS_I2C_NUVOTON is not set
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TCG_CRB is not set
# CONFIG_TCG_VTPM_PROXY is not set
# CONFIG_TCG_TIS_ST33ZP24_I2C is not set
# CONFIG_TELCLOCK is not set
# CONFIG_XILLYBUS is not set
# CONFIG_XILLYUSB is not set
CONFIG_RANDOM_TRUST_CPU=y
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_AMD_MP2 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_ISMT is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_DESIGNWARE_PLATFORM is not set
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_CP2615 is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_MLXCPLD is not set
# CONFIG_I2C_VIRTIO is not set
# end of I2C Hardware Bus support

# CONFIG_I2C_STUB is not set
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_LDISC is not set
# CONFIG_PPS_CLIENT_PARPORT is not set
# CONFIG_PPS_CLIENT_GPIO is not set

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# CONFIG_PTP_1588_CLOCK_IDT82P33 is not set
# CONFIG_PTP_1588_CLOCK_IDTCM is not set
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_IP5XXX_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_CHARGER_ADP5061 is not set
# CONFIG_BATTERY_CW2015 is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SAMSUNG_SDI is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_LTC4162L is not set
# CONFIG_CHARGER_MAX77976 is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
# CONFIG_BATTERY_GOLDFISH is not set
# CONFIG_BATTERY_RT5033 is not set
# CONFIG_CHARGER_BD99954 is not set
# CONFIG_BATTERY_UG3105 is not set
CONFIG_HWMON=y
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM1177 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7410 is not set
# CONFIG_SENSORS_ADT7411 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_AHT10 is not set
# CONFIG_SENSORS_AQUACOMPUTER_D5NEXT is not set
# CONFIG_SENSORS_AS370 is not set
# CONFIG_SENSORS_ASC7621 is not set
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_K10TEMP is not set
# CONFIG_SENSORS_FAM15H_POWER is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ASPEED is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_CORSAIR_CPRO is not set
# CONFIG_SENSORS_CORSAIR_PSU is not set
# CONFIG_SENSORS_DRIVETEMP is not set
# CONFIG_SENSORS_DS620 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_DELL_SMM is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_FTSTEUTATES is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_HIH6130 is not set
# CONFIG_SENSORS_I5500 is not set
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_JC42 is not set
# CONFIG_SENSORS_POWR1220 is not set
# CONFIG_SENSORS_LINEAGE is not set
# CONFIG_SENSORS_LTC2945 is not set
# CONFIG_SENSORS_LTC2947_I2C is not set
# CONFIG_SENSORS_LTC2990 is not set
# CONFIG_SENSORS_LTC4151 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4222 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4260 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_MAX127 is not set
# CONFIG_SENSORS_MAX16065 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX1668 is not set
# CONFIG_SENSORS_MAX197 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6620 is not set
# CONFIG_SENSORS_MAX6621 is not set
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6642 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_MAX6697 is not set
# CONFIG_SENSORS_MAX31790 is not set
# CONFIG_SENSORS_MCP3021 is not set
# CONFIG_SENSORS_TC654 is not set
# CONFIG_SENSORS_TPS23861 is not set
# CONFIG_SENSORS_MR75203 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM73 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LM95234 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_LM95245 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_NCT6683 is not set
# CONFIG_SENSORS_NCT6775 is not set
# CONFIG_SENSORS_NCT6775_I2C is not set
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NCT7904 is not set
# CONFIG_SENSORS_NPCM7XX is not set
# CONFIG_SENSORS_NZXT_KRAKEN2 is not set
# CONFIG_SENSORS_NZXT_SMART2 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_PMBUS is not set
# CONFIG_SENSORS_SBTSI is not set
# CONFIG_SENSORS_SBRMI is not set
# CONFIG_SENSORS_SHT21 is not set
# CONFIG_SENSORS_SHT3x is not set
# CONFIG_SENSORS_SHT4x is not set
# CONFIG_SENSORS_SHTC1 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SY7636A is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_EMC1403 is not set
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_EMC6W201 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SCH5627 is not set
# CONFIG_SENSORS_SCH5636 is not set
# CONFIG_SENSORS_STTS751 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_ADC128D818 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_AMC6821 is not set
# CONFIG_SENSORS_INA209 is not set
# CONFIG_SENSORS_INA2XX is not set
# CONFIG_SENSORS_INA238 is not set
# CONFIG_SENSORS_INA3221 is not set
# CONFIG_SENSORS_TC74 is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_TMP102 is not set
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_TMP464 is not set
# CONFIG_SENSORS_TMP513 is not set
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83773G is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_XGENE is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ASUS_WMI is not set
# CONFIG_SENSORS_ASUS_WMI_EC is not set
# CONFIG_SENSORS_ASUS_EC is not set
CONFIG_THERMAL=y
# CONFIG_THERMAL_NETLINK is not set
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
# CONFIG_THERMAL_GOV_FAIR_SHARE is not set
CONFIG_THERMAL_GOV_STEP_WISE=y
# CONFIG_THERMAL_GOV_BANG_BANG is not set
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set

#
# Intel thermal drivers
#
# CONFIG_INTEL_POWERCLAMP is not set
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_PKG_TEMP_THERMAL=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
# CONFIG_INT340X_THERMAL is not set
# end of ACPI INT340X thermal drivers

# CONFIG_INTEL_PCH_THERMAL is not set
# CONFIG_INTEL_TCC_COOLING is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_INTEL_HFI_THERMAL is not set
# end of Intel thermal drivers

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
# CONFIG_WATCHDOG_SYSFS is not set
# CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT is not set

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=y
# CONFIG_WDAT_WDT is not set
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_EBC_C384_WDT is not set
# CONFIG_F71808E_WDT is not set
# CONFIG_SP5100_TCO is not set
# CONFIG_SBC_FITPC2_WATCHDOG is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_IBMASR is not set
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=y
# CONFIG_IE6XX_WDT is not set
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_NV_TCO is not set
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_TQMX86_WDT is not set
# CONFIG_VIA_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_W83977F_WDT is not set
# CONFIG_MACHZ_WDT is not set
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
# CONFIG_NI903X_WDT is not set
# CONFIG_NIC7018_WDT is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
# CONFIG_SSB_PCMCIAHOST is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_AS3711 is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_BCM590XX is not set
# CONFIG_MFD_BD9571MWV is not set
# CONFIG_MFD_AXP20X_I2C is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_I2C is not set
# CONFIG_MFD_DA9055 is not set
# CONFIG_MFD_DA9062 is not set
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
# CONFIG_MFD_DLN2 is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_MFD_MP2629 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
CONFIG_LPC_ICH=y
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_INTEL_LPSS_ACPI is not set
# CONFIG_MFD_INTEL_LPSS_PCI is not set
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_MAX14577 is not set
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_MFD_VIPERBOARD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RT4831 is not set
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SI476X_CORE is not set
# CONFIG_MFD_SIMPLE_MFD_I2C is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SKY81452 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65086 is not set
# CONFIG_MFD_TPS65090 is not set
# CONFIG_MFD_TI_LP873X is not set
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TQMX86 is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_ATC260X_I2C is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
# CONFIG_AGP_AMD64 is not set
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
CONFIG_INTEL_GTT=y
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_MIPI_DSI=y
# CONFIG_DRM_DEBUG_MM is not set
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=y
# CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS is not set
CONFIG_DRM_DEBUG_MODESET_LOCK=y
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
# CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM is not set
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_DISPLAY_HELPER=y
CONFIG_DRM_DISPLAY_DP_HELPER=y
CONFIG_DRM_DISPLAY_HDCP_HELPER=y
CONFIG_DRM_DISPLAY_HDMI_HELPER=y
# CONFIG_DRM_DP_AUX_CHARDEV is not set
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=y
CONFIG_DRM_BUDDY=y

#
# I2C encoder or helper chips
#
# CONFIG_DRM_I2C_CH7006 is not set
# CONFIG_DRM_I2C_SIL164 is not set
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_I2C_NXP_TDA9950 is not set
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_FORCE_PROBE=""
CONFIG_DRM_I915_CAPTURE_ERROR=y
CONFIG_DRM_I915_COMPRESS_ERROR=y
CONFIG_DRM_I915_USERPTR=y

#
# drm/i915 Debugging
#
# CONFIG_DRM_I915_WERROR is not set
# CONFIG_DRM_I915_DEBUG is not set
# CONFIG_DRM_I915_DEBUG_MMIO is not set
# CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS is not set
# CONFIG_DRM_I915_SW_FENCE_CHECK_DAG is not set
# CONFIG_DRM_I915_DEBUG_GUC is not set
# CONFIG_DRM_I915_SELFTEST is not set
# CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS is not set
# CONFIG_DRM_I915_DEBUG_VBLANK_EVADE is not set
# CONFIG_DRM_I915_DEBUG_RUNTIME_PM is not set
# end of drm/i915 Debugging

#
# drm/i915 Profile Guided Optimisation
#
CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
# end of drm/i915 Profile Guided Optimisation

# CONFIG_DRM_VGEM is not set
# CONFIG_DRM_VKMS is not set
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_GMA500 is not set
# CONFIG_DRM_UDL is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
# CONFIG_DRM_QXL is not set
# CONFIG_DRM_VIRTIO_GPU is not set
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# end of Display Interface Bridges

# CONFIG_DRM_ETNAVIV is not set
# CONFIG_DRM_BOCHS is not set
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_DRM_GM12U320 is not set
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_GUD is not set
# CONFIG_DRM_SSD130X is not set
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
CONFIG_DRM_NOMODESET=y
CONFIG_DRM_PRIVACY_SCREEN=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_MODE_HELPERS=y
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_VESA is not set
# CONFIG_FB_EFI is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=y
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_APPLE is not set
# CONFIG_BACKLIGHT_QCOM_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3639 is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION is not set
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

# CONFIG_LOGO is not set
# end of Graphics support

CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_SEQ_DEVICE=y
CONFIG_SND_JACK=y
CONFIG_SND_JACK_INPUT_DEV=y
# CONFIG_SND_OSSEMUL is not set
CONFIG_SND_PCM_TIMER=y
CONFIG_SND_HRTIMER=y
CONFIG_SND_DYNAMIC_MINORS=y
CONFIG_SND_MAX_CARDS=32
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_PROC_FS=y
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
CONFIG_SND_DEBUG_VERBOSE=y
CONFIG_SND_PCM_XRUN_DEBUG=y
# CONFIG_SND_CTL_VALIDATION is not set
# CONFIG_SND_JACK_INJECTION_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_DMA_SGBUF=y
CONFIG_SND_CTL_LED=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
CONFIG_SND_DRIVERS=y
CONFIG_SND_PCSP=m
# CONFIG_SND_DUMMY is not set
# CONFIG_SND_ALOOP is not set
# CONFIG_SND_VIRMIDI is not set
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_MTS64 is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set
# CONFIG_SND_PORTMAN2X4 is not set
CONFIG_SND_PCI=y
# CONFIG_SND_AD1889 is not set
# CONFIG_SND_ALS300 is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ASIHPI is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_ATIIXP_MODEM is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AW2 is not set
# CONFIG_SND_AZT3328 is not set
# CONFIG_SND_BT87X is not set
# CONFIG_SND_CA0106 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_OXYGEN is not set
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_CS46XX is not set
# CONFIG_SND_CTXFI is not set
# CONFIG_SND_DARLA20 is not set
# CONFIG_SND_GINA20 is not set
# CONFIG_SND_LAYLA20 is not set
# CONFIG_SND_DARLA24 is not set
# CONFIG_SND_GINA24 is not set
# CONFIG_SND_LAYLA24 is not set
# CONFIG_SND_MONA is not set
# CONFIG_SND_MIA is not set
# CONFIG_SND_ECHO3G is not set
# CONFIG_SND_INDIGO is not set
# CONFIG_SND_INDIGOIO is not set
# CONFIG_SND_INDIGODJ is not set
# CONFIG_SND_INDIGOIOX is not set
# CONFIG_SND_INDIGODJX is not set
# CONFIG_SND_EMU10K1 is not set
# CONFIG_SND_EMU10K1X is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_FM801 is not set
# CONFIG_SND_HDSP is not set
# CONFIG_SND_HDSPM is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
# CONFIG_SND_INTEL8X0 is not set
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_LOLA is not set
# CONFIG_SND_LX6464ES is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_PCXHR is not set
# CONFIG_SND_RIPTIDE is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_SE6X is not set
# CONFIG_SND_SONICVIBES is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VIRTUOSO is not set
# CONFIG_SND_VX222 is not set
# CONFIG_SND_YMFPCI is not set

#
# HD-Audio
#
CONFIG_SND_HDA=y
CONFIG_SND_HDA_GENERIC_LEDS=y
CONFIG_SND_HDA_INTEL=y
CONFIG_SND_HDA_HWDEP=y
CONFIG_SND_HDA_RECONFIG=y
CONFIG_SND_HDA_INPUT_BEEP=y
CONFIG_SND_HDA_INPUT_BEEP_MODE=1
CONFIG_SND_HDA_PATCH_LOADER=y
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_HDMI=y
CONFIG_SND_HDA_CODEC_CIRRUS=y
# CONFIG_SND_HDA_CODEC_CS8409 is not set
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CA0110=y
CONFIG_SND_HDA_CODEC_CA0132=y
CONFIG_SND_HDA_CODEC_CA0132_DSP=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
# CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM is not set
# end of HD-Audio

CONFIG_SND_HDA_CORE=y
CONFIG_SND_HDA_DSP_LOADER=y
CONFIG_SND_HDA_COMPONENT=y
CONFIG_SND_HDA_I915=y
CONFIG_SND_HDA_PREALLOC_SIZE=0
CONFIG_SND_INTEL_NHLT=y
CONFIG_SND_INTEL_DSP_CONFIG=y
CONFIG_SND_INTEL_SOUNDWIRE_ACPI=y
# CONFIG_SND_USB is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_PDAUDIOCF is not set
# CONFIG_SND_SOC is not set
CONFIG_SND_X86=y
# CONFIG_HDMI_LPE_AUDIO is not set
# CONFIG_SND_VIRTIO is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
# CONFIG_HIDRAW is not set
# CONFIG_UHID is not set
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
# CONFIG_HID_ACCUTOUCH is not set
# CONFIG_HID_ACRUX is not set
CONFIG_HID_APPLE=y
# CONFIG_HID_APPLEIR is not set
# CONFIG_HID_ASUS is not set
# CONFIG_HID_AUREAL is not set
CONFIG_HID_BELKIN=y
# CONFIG_HID_BETOP_FF is not set
# CONFIG_HID_BIGBEN_FF is not set
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
# CONFIG_HID_CORSAIR is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
# CONFIG_HID_PRODIKEYS is not set
# CONFIG_HID_CMEDIA is not set
# CONFIG_HID_CREATIVE_SB0540 is not set
CONFIG_HID_CYPRESS=y
# CONFIG_HID_DRAGONRISE is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELAN is not set
# CONFIG_HID_ELECOM is not set
# CONFIG_HID_ELO is not set
CONFIG_HID_EZKEY=y
# CONFIG_HID_GEMBIRD is not set
# CONFIG_HID_GFRM is not set
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_VIVALDI is not set
# CONFIG_HID_GT683R is not set
# CONFIG_HID_KEYTOUCH is not set
# CONFIG_HID_KYE is not set
# CONFIG_HID_UCLOGIC is not set
# CONFIG_HID_WALTOP is not set
# CONFIG_HID_VIEWSONIC is not set
# CONFIG_HID_XIAOMI is not set
# CONFIG_HID_GYRATION is not set
# CONFIG_HID_ICADE is not set
# CONFIG_HID_ITE is not set
# CONFIG_HID_JABRA is not set
# CONFIG_HID_TWINHAN is not set
CONFIG_HID_KENSINGTON=y
# CONFIG_HID_LCPOWER is not set
# CONFIG_HID_LED is not set
# CONFIG_HID_LENOVO is not set
# CONFIG_HID_LETSKETCH is not set
CONFIG_HID_LOGITECH=y
# CONFIG_HID_LOGITECH_HIDPP is not set
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
# CONFIG_HID_MAGICMOUSE is not set
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_MEGAWORLD_FF is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
# CONFIG_HID_MULTITOUCH is not set
# CONFIG_HID_NINTENDO is not set
# CONFIG_HID_NTI is not set
# CONFIG_HID_NTRIG is not set
# CONFIG_HID_ORTEK is not set
# CONFIG_HID_PANTHERLORD is not set
# CONFIG_HID_PENMOUNT is not set
# CONFIG_HID_PETALYNX is not set
# CONFIG_HID_PICOLCD is not set
# CONFIG_HID_PLANTRONICS is not set
# CONFIG_HID_RAZER is not set
# CONFIG_HID_PRIMAX is not set
# CONFIG_HID_RETRODE is not set
# CONFIG_HID_ROCCAT is not set
# CONFIG_HID_SAITEK is not set
# CONFIG_HID_SAMSUNG is not set
# CONFIG_HID_SEMITEK is not set
# CONFIG_HID_SIGMAMICRO is not set
# CONFIG_HID_SONY is not set
# CONFIG_HID_SPEEDLINK is not set
# CONFIG_HID_STEAM is not set
# CONFIG_HID_STEELSERIES is not set
# CONFIG_HID_SUNPLUS is not set
# CONFIG_HID_RMI is not set
# CONFIG_HID_GREENASIA is not set
# CONFIG_HID_SMARTJOYPLUS is not set
# CONFIG_HID_TIVO is not set
# CONFIG_HID_TOPSEED is not set
# CONFIG_HID_THINGM is not set
# CONFIG_HID_THRUSTMASTER is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_U2FZERO is not set
# CONFIG_HID_WACOM is not set
# CONFIG_HID_WIIMOTE is not set
# CONFIG_HID_XINMO is not set
# CONFIG_HID_ZEROPLUS is not set
# CONFIG_HID_ZYDACRON is not set
# CONFIG_HID_SENSOR_HUB is not set
# CONFIG_HID_ALPS is not set
# end of Special HID drivers

#
# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
CONFIG_USB_HIDDEV=y
# end of USB HID support

#
# I2C HID support
#
# CONFIG_I2C_HID_ACPI is not set
# end of I2C HID support

#
# Intel ISH HID support
#
# CONFIG_INTEL_ISH_HID is not set
# end of Intel ISH HID support

#
# AMD SFH HID Support
#
# CONFIG_AMD_SFH_HID is not set
# end of AMD SFH HID Support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_PCI=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
# CONFIG_USB_FEW_INIT_RETRIES is not set
CONFIG_USB_DYNAMIC_MINORS=y
# CONFIG_USB_OTG is not set
# CONFIG_USB_OTG_PRODUCTLIST is not set
# CONFIG_USB_OTG_DISABLE_EXTERNAL_HUB is not set
# CONFIG_USB_LEDS_TRIGGER_USBPORT is not set
CONFIG_USB_AUTOSUSPEND_DELAY=2
# CONFIG_USB_MON is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_EHCI_FSL is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_OHCI_HCD is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_SSB is not set
# CONFIG_USB_HCD_TEST_MODE is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
CONFIG_USB_STORAGE_ONETOUCH=y
CONFIG_USB_STORAGE_KARMA=y
CONFIG_USB_STORAGE_CYPRESS_ATACB=y
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_USB_CDNS_SUPPORT is not set
# CONFIG_USB_MUSB_HDRC is not set
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
# CONFIG_USB_CHIPIDEA is not set
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_SIMPLE is not set
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
CONFIG_USB_SERIAL_BELKIN=y
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_F81232 is not set
# CONFIG_USB_SERIAL_F8153X is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
CONFIG_USB_SERIAL_MCT_U232=y
# CONFIG_USB_SERIAL_METRO is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MXUPORT is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_XSENS_MT is not set
# CONFIG_USB_SERIAL_WISHBONE is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_QT2 is not set
# CONFIG_USB_SERIAL_UPD78F0730 is not set
# CONFIG_USB_SERIAL_XR is not set
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_APPLE_MFI_FASTCHARGE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
CONFIG_USB_TEST=y
# CONFIG_USB_EHSET_TEST_FIXTURE is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_ISP1301 is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
# CONFIG_TYPEC is not set
# CONFIG_USB_ROLE_SWITCH is not set
# CONFIG_MMC is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_APU is not set
# CONFIG_LEDS_LM3530 is not set
# CONFIG_LEDS_LM3532 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_LP3944 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA963X is not set
# CONFIG_LEDS_BD2802 is not set
# CONFIG_LEDS_INTEL_SS4200 is not set
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
# CONFIG_LEDS_BLINKM is not set
# CONFIG_LEDS_MLXCPLD is not set
# CONFIG_LEDS_MLXREG is not set
# CONFIG_LEDS_USER is not set
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set

#
# Flash and Torch LED drivers
#

#
# RGB LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
# CONFIG_LEDS_TRIGGER_TIMER is not set
# CONFIG_LEDS_TRIGGER_ONESHOT is not set
# CONFIG_LEDS_TRIGGER_DISK is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_ACTIVITY is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_LEDS_TRIGGER_TRANSIENT is not set
# CONFIG_LEDS_TRIGGER_CAMERA is not set
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=y
# CONFIG_LEDS_TRIGGER_TTY is not set

#
# Simple LED drivers
#
CONFIG_ACCESSIBILITY=y
# CONFIG_A11Y_BRAILLE_CONSOLE is not set

#
# Speakup console speech
#
# CONFIG_SPEAKUP is not set
# end of Speakup console speech

# CONFIG_INFINIBAND is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=y
# CONFIG_EDAC_AMD64 is not set
# CONFIG_EDAC_E752X is not set
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_I3200 is not set
# CONFIG_EDAC_IE31200 is not set
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5400 is not set
# CONFIG_EDAC_I7CORE is not set
# CONFIG_EDAC_I5000 is not set
# CONFIG_EDAC_I5100 is not set
# CONFIG_EDAC_I7300 is not set
# CONFIG_EDAC_SBRIDGE is not set
# CONFIG_EDAC_SKX is not set
# CONFIG_EDAC_I10NM is not set
# CONFIG_EDAC_PND2 is not set
# CONFIG_EDAC_IGEN6 is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_SYSTOHC=y
CONFIG_RTC_SYSTOHC_DEVICE="rtc0"
# CONFIG_RTC_DEBUG is not set
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
CONFIG_RTC_DRV_TEST=y

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_ABB5ZES3 is not set
# CONFIG_RTC_DRV_ABEOZ9 is not set
# CONFIG_RTC_DRV_ABX80X is not set
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_ISL12022 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8523 is not set
# CONFIG_RTC_DRV_PCF85063 is not set
# CONFIG_RTC_DRV_PCF85363 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_BQ32K is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8010 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set
# CONFIG_RTC_DRV_EM3027 is not set
# CONFIG_RTC_DRV_RV3028 is not set
# CONFIG_RTC_DRV_RV3032 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
# CONFIG_RTC_DRV_DS3232 is not set
# CONFIG_RTC_DRV_PCF2127 is not set
# CONFIG_RTC_DRV_RV3029C2 is not set
# CONFIG_RTC_DRV_RX6110 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_DS2404 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set

#
# HID Sensor RTC drivers
#
# CONFIG_RTC_DRV_GOLDFISH is not set
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
# CONFIG_INTEL_IDMA64 is not set
# CONFIG_INTEL_IDXD_COMPAT is not set
CONFIG_INTEL_IOATDMA=y
# CONFIG_PLX_DMA is not set
# CONFIG_AMD_PTDMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
# CONFIG_DW_DMAC is not set
# CONFIG_DW_DMAC_PCI is not set
# CONFIG_DW_EDMA is not set
# CONFIG_DW_EDMA_PCIE is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set
# CONFIG_INTEL_LDMA is not set

#
# DMA Clients
#
# CONFIG_ASYNC_TX_DMA is not set
# CONFIG_DMATEST is not set
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
CONFIG_DMABUF_DEBUG=y
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# CONFIG_DMABUF_SYSFS_STATS is not set
# end of DMABUF options

CONFIG_DCA=y
CONFIG_AUXDISPLAY=y
# CONFIG_IMG_ASCII_LCD is not set
# CONFIG_HT16K33 is not set
# CONFIG_LCD2S is not set
# CONFIG_PARPORT_PANEL is not set
# CONFIG_CHARLCD_BL_OFF is not set
# CONFIG_CHARLCD_BL_ON is not set
CONFIG_CHARLCD_BL_FLASH=y
# CONFIG_PANEL is not set
CONFIG_UIO=y
# CONFIG_UIO_CIF is not set
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_DMEM_GENIRQ is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_UIO_PCI_GENERIC is not set
# CONFIG_UIO_NETX is not set
# CONFIG_UIO_PRUSS is not set
# CONFIG_UIO_MF624 is not set
# CONFIG_VFIO is not set
CONFIG_IRQ_BYPASS_MANAGER=y
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=y
CONFIG_VIRTIO_PCI_LIB_LEGACY=y
CONFIG_VIRTIO_MENU=y
# CONFIG_VIRTIO_HARDEN_NOTIFICATION is not set
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=y
# CONFIG_VIRTIO_MEM is not set
# CONFIG_VIRTIO_INPUT is not set
CONFIG_VIRTIO_MMIO=y
# CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=y
CONFIG_VHOST=y
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=y
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
CONFIG_STAGING=y
# CONFIG_RTL8192U is not set
# CONFIG_RTLLIB is not set
# CONFIG_RTS5208 is not set
# CONFIG_FB_SM750 is not set
# CONFIG_STAGING_MEDIA is not set
# CONFIG_LTE_GDM724X is not set
# CONFIG_FIELDBUS_DEV is not set
# CONFIG_QLGE is not set

#
# VME Device Drivers
#
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=y
CONFIG_WMI_BMOF=y
# CONFIG_HUAWEI_WMI is not set
# CONFIG_MXM_WMI is not set
# CONFIG_PEAQ_WMI is not set
# CONFIG_NVIDIA_WMI_EC_BACKLIGHT is not set
# CONFIG_XIAOMI_WMI is not set
# CONFIG_GIGABYTE_WMI is not set
# CONFIG_YOGABOOK_WMI is not set
# CONFIG_ACERHDF is not set
# CONFIG_ACER_WIRELESS is not set
CONFIG_ACER_WMI=y
# CONFIG_AMD_PMC is not set
# CONFIG_AMD_HSMP is not set
# CONFIG_ADV_SWBUTTON is not set
# CONFIG_APPLE_GMUX is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_ASUS_WIRELESS is not set
# CONFIG_ASUS_WMI is not set
CONFIG_EEEPC_LAPTOP=y
# CONFIG_X86_PLATFORM_DRIVERS_DELL is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_FUJITSU_TABLET is not set
# CONFIG_GPD_POCKET_FAN is not set
# CONFIG_HP_ACCEL is not set
# CONFIG_WIRELESS_HOTKEY is not set
CONFIG_HP_WMI=y
# CONFIG_IBM_RTL is not set
# CONFIG_SENSORS_HDAPS is not set
CONFIG_THINKPAD_ACPI=y
CONFIG_THINKPAD_ACPI_ALSA_SUPPORT=y
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_THINKPAD_LMI is not set
# CONFIG_INTEL_ATOMISP2_PM is not set
# CONFIG_INTEL_SAR_INT1092 is not set
# CONFIG_INTEL_PMC_CORE is not set

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

# CONFIG_INTEL_WMI_SBL_FW_UPDATE is not set
# CONFIG_INTEL_WMI_THUNDERBOLT is not set

#
# Intel Uncore Frequency Control
#
# CONFIG_INTEL_UNCORE_FREQ_CONTROL is not set
# end of Intel Uncore Frequency Control

# CONFIG_INTEL_HID_EVENT is not set
# CONFIG_INTEL_VBTN is not set
# CONFIG_INTEL_PUNIT_IPC is not set
# CONFIG_INTEL_RST is not set
# CONFIG_INTEL_SMARTCONNECT is not set
# CONFIG_INTEL_TURBO_MAX_3 is not set
# CONFIG_INTEL_VSEC is not set
# CONFIG_MSI_WMI is not set
# CONFIG_SAMSUNG_LAPTOP is not set
# CONFIG_SAMSUNG_Q10 is not set
# CONFIG_TOSHIBA_BT_RFKILL is not set
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_TOSHIBA_WMI is not set
# CONFIG_ACPI_CMPC is not set
# CONFIG_LG_LAPTOP is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_SYSTEM76_ACPI is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_MLX_PLATFORM is not set
# CONFIG_INTEL_IPS is not set
# CONFIG_INTEL_SCU_PCI is not set
# CONFIG_INTEL_SCU_PLATFORM is not set
# CONFIG_SIEMENS_SIMATIC_IPC is not set
# CONFIG_WINMATE_FM07_KEYS is not set
CONFIG_PMC_ATOM=y
# CONFIG_CHROME_PLATFORMS is not set
# CONFIG_MELLANOX_PLATFORM is not set
CONFIG_SURFACE_PLATFORMS=y
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_GPE is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_CDCE706 is not set
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_XILINX_VCU is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
# CONFIG_ALTERA_MBOX is not set
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_AMD_IOMMU is not set
# CONFIG_INTEL_IOMMU is not set
# CONFIG_IRQ_REMAP is not set
# CONFIG_VIRTIO_IOMMU is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_QCOM_GLINK_RPM is not set
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_USB_LGM_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_RAS_CEC is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

# CONFIG_LIBNVDIMM is not set
# CONFIG_DAX is not set
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
# CONFIG_NVMEM_RMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
# CONFIG_REISERFS_FS_XATTR is not set
CONFIG_JFS_FS=y
# CONFIG_JFS_POSIX_ACL is not set
# CONFIG_JFS_SECURITY is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_XFS_FS=y
CONFIG_XFS_SUPPORT_V4=y
# CONFIG_XFS_QUOTA is not set
CONFIG_XFS_POSIX_ACL=y
# CONFIG_XFS_RT is not set
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=y
# CONFIG_BTRFS_FS_POSIX_ACL is not set
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
CONFIG_NILFS2_FS=y
# CONFIG_F2FS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
# CONFIG_QFMT_V1 is not set
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS4_FS is not set
# CONFIG_AUTOFS_FS is not set
CONFIG_FUSE_FS=y
# CONFIG_CUSE is not set
# CONFIG_VIRTIO_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
# CONFIG_JOLIET is not set
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
# CONFIG_PROC_VMCORE_DEVICE_DUMP is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_TMPFS_XATTR is not set
# CONFIG_TMPFS_INODE64 is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
# CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON is not set
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=m
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=y
CONFIG_CRAMFS_BLOCKDEV=y
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V2=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
# CONFIG_NFS_SWAP is not set
# CONFIG_NFS_V4_1 is not set
CONFIG_ROOT_NFS=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
CONFIG_NFSD=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
# CONFIG_NFSD_SCSILAYOUT is not set
# CONFIG_NFSD_FLEXFILELAYOUT is not set
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
# CONFIG_SUNRPC_DEBUG is not set
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=y
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
# CONFIG_CIFS_UPCALL is not set
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG=y
CONFIG_CIFS_DEBUG2=y
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
# CONFIG_CIFS_DFS_UPCALL is not set
# CONFIG_CIFS_SWN_UPCALL is not set
# CONFIG_CIFS_ROOT is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=y
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=y
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_TRUSTED_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
CONFIG_SECURITYFS=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=y
CONFIG_ASYNC_MEMCPY=y
CONFIG_ASYNC_XOR=y
CONFIG_ASYNC_PQ=y
CONFIG_ASYNC_RAID6_RECOV=y
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
# CONFIG_CRYPTO_PCRYPT is not set
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set

#
# Public-key cryptography
#
# CONFIG_CRYPTO_RSA is not set
# CONFIG_CRYPTO_DH is not set
# CONFIG_CRYPTO_ECDH is not set
# CONFIG_CRYPTO_ECDSA is not set
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_SM2 is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# CONFIG_CRYPTO_CURVE25519_X86 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
# CONFIG_CRYPTO_CHACHA20POLY1305 is not set
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_AEGIS128_AESNI_SSE2 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_CFB is not set
CONFIG_CRYPTO_CTR=y
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_OFB is not set
# CONFIG_CRYPTO_PCBC is not set
CONFIG_CRYPTO_XTS=y
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_NHPOLY1305_SSE2 is not set
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=y
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_CRC32 is not set
# CONFIG_CRYPTO_CRC32_PCLMUL is not set
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
# CONFIG_CRYPTO_BLAKE2S is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
CONFIG_CRYPTO_CRCT10DIF=y
# CONFIG_CRYPTO_CRCT10DIF_PCLMUL is not set
CONFIG_CRYPTO_CRC64_ROCKSOFT=y
CONFIG_CRYPTO_GHASH=y
# CONFIG_CRYPTO_POLY1305 is not set
# CONFIG_CRYPTO_POLY1305_X86_64 is not set
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD160 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA1_SSSE3 is not set
# CONFIG_CRYPTO_SHA256_SSSE3 is not set
# CONFIG_CRYPTO_SHA512_SSSE3 is not set
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
# CONFIG_CRYPTO_SHA3 is not set
# CONFIG_CRYPTO_SM3_GENERIC is not set
# CONFIG_CRYPTO_SM3_AVX_X86_64 is not set
# CONFIG_CRYPTO_STREEBOG is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_BLOWFISH_X86_64 is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAMELLIA_X86_64 is not set
# CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64 is not set
# CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64 is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST5_AVX_X86_64 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_CAST6_AVX_X86_64 is not set
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_DES3_EDE_X86_64 is not set
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_CHACHA20 is not set
# CONFIG_CRYPTO_CHACHA20_X86_64 is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_SERPENT_SSE2_X86_64 is not set
# CONFIG_CRYPTO_SERPENT_AVX_X86_64 is not set
# CONFIG_CRYPTO_SERPENT_AVX2_X86_64 is not set
# CONFIG_CRYPTO_SM4_GENERIC is not set
# CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64 is not set
# CONFIG_CRYPTO_SM4_AESNI_AVX2_X86_64 is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set
# CONFIG_CRYPTO_TWOFISH_X86_64_3WAY is not set
# CONFIG_CRYPTO_TWOFISH_AVX_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_LZO is not set
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set
# CONFIG_CRYPTO_ZSTD is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_CRYPTO_DEV_ATMEL_ECC is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA204A is not set
# CONFIG_CRYPTO_DEV_CCP is not set
# CONFIG_CRYPTO_DEV_QAT_DH895xCC is not set
# CONFIG_CRYPTO_DEV_QAT_C3XXX is not set
# CONFIG_CRYPTO_DEV_QAT_C62X is not set
# CONFIG_CRYPTO_DEV_QAT_4XXX is not set
# CONFIG_CRYPTO_DEV_QAT_DH895xCCVF is not set
# CONFIG_CRYPTO_DEV_QAT_C3XXXVF is not set
# CONFIG_CRYPTO_DEV_QAT_C62XVF is not set
# CONFIG_CRYPTO_DEV_NITROX_CNN55XX is not set
# CONFIG_CRYPTO_DEV_VIRTIO is not set
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
# CONFIG_ASYMMETRIC_KEY_TYPE is not set

#
# Certificates for signature checking
#
# CONFIG_SYSTEM_BLACKLIST_KEYRING is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
# CONFIG_CORDIC is not set
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=y
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_LIB_MEMNEQ=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC64_ROCKSOFT=y
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
# CONFIG_CRC4 is not set
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
# CONFIG_CRC8 is not set
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=y
CONFIG_TEXTSEARCH_BM=y
CONFIG_TEXTSEARCH_FSM=y
CONFIG_BTREE=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_SWIOTLB=y
CONFIG_DMA_API_DEBUG=y
# CONFIG_DMA_API_DEBUG_SG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
# CONFIG_CPUMASK_OFFSTACK is not set
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_IRQ_POLL=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACKDEPOT_ALWAYS_INIT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_BOOT_PRINTK_DELAY=y
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DYNAMIC_DEBUG_CORE is not set
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO_NONE=y
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
# CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B is not set
CONFIG_FRAME_POINTER=y
CONFIG_OBJTOOL=y
# CONFIG_STACK_VALIDATION is not set
# CONFIG_VMLINUX_MAP is not set
CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
CONFIG_KGDB_HONOUR_BLOCKLIST=y
CONFIG_KGDB_SERIAL_CONSOLE=y
# CONFIG_KGDB_TESTS is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
CONFIG_ARCH_HAS_EARLY_DEBUG=y
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# CONFIG_KCSAN is not set
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
CONFIG_DEBUG_PAGEALLOC=y
# CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT is not set
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_TABLE_CHECK is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
# CONFIG_DEBUG_OBJECTS_FREE is not set
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
# CONFIG_DEBUG_OBJECTS_WORK is not set
# CONFIG_DEBUG_OBJECTS_RCU_HEAD is not set
# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
# CONFIG_DEBUG_VM_RB is not set
# CONFIG_DEBUG_VM_PGFLAGS is not set
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP=y
# CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
# CONFIG_SOFTLOCKUP_DETECTOR is not set
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
# CONFIG_HARDLOCKUP_DETECTOR is not set
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set
CONFIG_DEBUG_PREEMPT=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
CONFIG_TRACE_IRQFLAGS_NMI=y
# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
CONFIG_DEBUG_KOBJECT=y

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=y
# CONFIG_RCU_SCALE_TEST is not set
CONFIG_RCU_TORTURE_TEST=y
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
CONFIG_RCU_TRACE=y
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_RETHOOK=y
CONFIG_RETHOOK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y
CONFIG_BUILDTIME_MCOUNT_SORT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
# CONFIG_FPROBE is not set
CONFIG_FUNCTION_PROFILER=y
# CONFIG_STACK_TRACER is not set
CONFIG_IRQSOFF_TRACER=y
# CONFIG_PREEMPT_TRACER is not set
CONFIG_SCHED_TRACER=y
# CONFIG_HWLAT_TRACER is not set
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
# CONFIG_MMIOTRACE is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
# CONFIG_SYNTH_EVENTS is not set
# CONFIG_HIST_TRIGGERS is not set
# CONFIG_TRACE_EVENT_INJECT is not set
CONFIG_TRACEPOINT_BENCHMARK=y
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
CONFIG_FTRACE_SELFTEST=y
CONFIG_FTRACE_STARTUP_TEST=y
CONFIG_EVENT_TRACE_STARTUP_TEST=y
# CONFIG_EVENT_TRACE_TEST_SYSCALLS is not set
# CONFIG_FTRACE_SORT_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
# CONFIG_EARLY_PRINTK_USB_XDBC is not set
# CONFIG_EFI_PGT_DUMP is not set
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
# CONFIG_X86_DECODER_SELFTEST is not set
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_DEBUG_ENTRY is not set
CONFIG_DEBUG_NMI_SELFTEST=y
CONFIG_X86_DEBUG_FPU=y
# CONFIG_PUNIT_ATOM_DEBUG is not set
# CONFIG_UNWINDER_ORC is not set
CONFIG_UNWINDER_FRAME_POINTER=y
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
CONFIG_FAIL_PAGE_ALLOC=y
# CONFIG_FAULT_INJECTION_USERCOPY is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
# CONFIG_FAIL_FUNCTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=y
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_DIV64 is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_STRING_SELFTEST is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_SCANF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_SIPHASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
# CONFIG_TEST_BPF is not set
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_DEBUG_VIRTUAL is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_FREE_PAGES is not set
# CONFIG_TEST_FPU is not set
# CONFIG_TEST_CLOCKSOURCE_WATCHDOG is not set
CONFIG_ARCH_USE_MEMTEST=y
CONFIG_MEMTEST=y
# end of Kernel Testing and Coverage
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 16:57 ` [patch 00/38] x86/retbleed: " Steven Rostedt
@ 2022-07-20 17:09   ` Linus Torvalds
  2022-07-20 17:24     ` Peter Zijlstra
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-20 17:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, the arch/x86 maintainers, Tim Chen,
	Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
	Alyssa Milburn, Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 20, 2022 at 9:57 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> [    2.488712] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:558 apply_returns+0xa3/0x1ec

That warning is kind of annoying, in how it doesn't actually give any
information about where the problem is.

I do note that we only fix up JMP32_INSN_OPCODE, and I wonder if we
have a "jmp __x86_return_thunk" that is close enough to the return
thunk that it actually uses a byte offset?

But that WARN_ON_ONCE() should probably be changed to actually give
some information about where the problem is.
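
As a rough illustration of the kind of diagnostic being asked for here
(names are made up; the actual fix referenced later in the thread may
differ), assuming the fixup loop already has the patch address and the
original instruction bytes at hand:

	/* Hypothetical sketch: report the site and dump the raw bytes
	 * instead of a bare WARN_ON_ONCE(1), so the warning is actionable. */
	WARN_ONCE(1, "missing return thunk: %pS-%pS: %*ph\n",
		  addr, addr + insn_len, insn_len, insn_bytes);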

The silly thing is, there's even debug output in that function that
you could enable, but it will enable output for the *normal* case, not
for the WARN_ON_ONCE() case or the "we didn't do anything" case. That
seems a bit backwards.

               Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 17:09   ` Linus Torvalds
@ 2022-07-20 17:24     ` Peter Zijlstra
  2022-07-20 17:50       ` Steven Rostedt
  0 siblings, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-20 17:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Thomas Gleixner, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 20, 2022 at 10:09:37AM -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2022 at 9:57 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > [    2.488712] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:558 apply_returns+0xa3/0x1ec
> 
> That warning is kind of annoying, in how it doesn't actually give any
> information about where the problem is.
> 
> I do note that we only fix up JMP32_INSN_OPCODE, and I wonder if we
> have a "jmp __x86_return_thunk" that is close enough to the return
> thunk that it actually uses a byte offset?
> 
> But that WARN_ON_ONCE() should probably be changed to actually give
> some information about where the problem is.

There's a patch for that:

  https://lkml.kernel.org/r/20220713213819.460771-1-keescook@chromium.org

it seems to have gotten lost, let me go queue that.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 17:24     ` Peter Zijlstra
@ 2022-07-20 17:50       ` Steven Rostedt
  2022-07-20 18:07         ` Linus Torvalds
  0 siblings, 1 reply; 142+ messages in thread
From: Steven Rostedt @ 2022-07-20 17:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Thomas Gleixner, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Wed, 20 Jul 2022 19:24:32 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> There's a patch for that:
> 
>   https://lkml.kernel.org/r/20220713213819.460771-1-keescook@chromium.org
> 
> it seems to have gotten lost, let me go queue that.

With this applied, I now have:

[    2.442118] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    2.443117] SRBDS: Vulnerable: No microcode
[    2.463339] ------------[ cut here ]------------
[    2.464117] missing return thunk: lkdtm_rodata_do_nothing+0x0/0x8-lkdtm_rodata_do_nothing+0x5/0x8: e9 00 00 00 00
[    2.464128] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:558 apply_returns+0xcb/0x219
[    2.466117] Modules linked in:
[    2.467117] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc7-test+ #66
[    2.468117] Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
[    2.469117] RIP: 0010:apply_returns+0xcb/0x219
[    2.470117] Code: 80 3d d1 32 06 02 00 75 59 49 89 d8 b9 05 00 00 00 4c 89 e2 48 89 de 48 c7 c7 63 98 b4 ae c6 05 b3 32 06 02 01 e8 f8 4f fb 00 <0f> 0b eb 34 44 0f b6 65 a2 31 c0 48 8d 55 c1 c6 45 c0 c3 48 89 d7
[    2.471117] RSP: 0000:ffffffffaee03df0 EFLAGS: 00010286
[    2.472117] RAX: 0000000000000000 RBX: ffffffffae73e8a8 RCX: ffffffffae056641
[    2.473117] RDX: ffffffffaee03d78 RSI: 0000000000000001 RDI: 00000000ffffffff
[    2.474117] RBP: ffffffffaee03ea8 R08: 0000000000000000 R09: 00000000ffffffea
[    2.475117] R10: 000000000000001f R11: ffffffffaee03a4d R12: ffffffffae73e8ad
[    2.476117] R13: ffffffffaf550d30 R14: 0000000000000000 R15: ffffffffaee32138
[    2.477117] FS:  0000000000000000(0000) GS:ffff9d71d6400000(0000) knlGS:0000000000000000
[    2.478117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033


-- Steve

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 17:50       ` Steven Rostedt
@ 2022-07-20 18:07         ` Linus Torvalds
  2022-07-20 18:31           ` Steven Rostedt
  2022-07-20 19:36           ` Kees Cook
  0 siblings, 2 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-20 18:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 20, 2022 at 10:50 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> [    2.464117] missing return thunk: lkdtm_rodata_do_nothing+0x0/0x8-lkdtm_rodata_do_nothing+0x5/0x8: e9 00 00 00 00

Well, that looks like a "jmp" instruction that has never been relocated.

The 'e9' is 'jmp', the four zeros after it are either "I'm jumping to
the next instruction" or "I haven't been filled in".

I'm assuming it's the second case.
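
To make the decoding concrete, a minimal sketch (not the kernel's actual
decoder; kernel u8/s32 types assumed, and 'insn' points at the bytes in
the dump above):

static void *jmp32_target(u8 *insn)
{
	/* 0xe9 is a 5-byte "jmp rel32": target = next instruction + signed
	 * 32-bit displacement.  With the displacement still 00 00 00 00,
	 * the jump goes nowhere useful - it was simply never filled in. */
	s32 disp = *(s32 *)(insn + 1);

	return insn + 5 + disp;
}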

That lkdtm_rodata_do_nothing thing is odd, and does

    OBJCOPYFLAGS_rodata_objcopy.o   := \
                            --rename-section
.noinstr.text=.rodata,alloc,readonly,load,contents

to put the code in an odd section. I'm assuming this hackery is
related to it then not getting relocated.

                    Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 18:07         ` Linus Torvalds
@ 2022-07-20 18:31           ` Steven Rostedt
  2022-07-20 18:43             ` Linus Torvalds
  2022-07-20 19:36           ` Kees Cook
  1 sibling, 1 reply; 142+ messages in thread
From: Steven Rostedt @ 2022-07-20 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Wed, 20 Jul 2022 11:07:26 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Jul 20, 2022 at 10:50 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > [    2.464117] missing return thunk: lkdtm_rodata_do_nothing+0x0/0x8-lkdtm_rodata_do_nothing+0x5/0x8: e9 00 00 00 00  
> 
> Well, that looks like a "jmp" instruction that has never been relocated.
> 
> The 'e9' is 'jmp', the four zeros after it are either "I'm jumping to
> the next instruction" or "I haven't been filled in".
> 
> I'm assuming it's the second case.
> 
> That lkdtm_rodata_do_nothing thing is odd, and does
> 
>     OBJCOPYFLAGS_rodata_objcopy.o   := \
>                             --rename-section
> .noinstr.text=.rodata,alloc,readonly,load,contents
> 
> to put the code in an odd section. I'm assuming this hackery is
> related to it then not getting relocated.
> 

Right, because this looks to be some magic being done for testing purposes:

static void lkdtm_EXEC_RODATA(void)
{
        execute_location(dereference_function_descriptor(lkdtm_rodata_do_nothing),
                         CODE_AS_IS);
}

static void *setup_function_descriptor(func_desc_t *fdesc, void *dst)
{
        if (!have_function_descriptors())
                return dst;

        memcpy(fdesc, do_nothing, sizeof(*fdesc));
        fdesc->addr = (unsigned long)dst;
        barrier();

        return fdesc;
}

static noinline void execute_location(void *dst, bool write)
{
        void (*func)(void);
        func_desc_t fdesc;
        void *do_nothing_text = dereference_function_descriptor(do_nothing);

        pr_info("attempting ok execution at %px\n", do_nothing_text);
        do_nothing();

        if (write == CODE_WRITE) {
                memcpy(dst, do_nothing_text, EXEC_SIZE);
                flush_icache_range((unsigned long)dst,
                                   (unsigned long)dst + EXEC_SIZE);
        }
        pr_info("attempting bad execution at %px\n", dst);
        func = setup_function_descriptor(&fdesc, dst);
        func();
        pr_err("FAIL: func returned\n");
}

And it appears that it is meant to crash, as the code is located in readonly
data.

OBJCOPYFLAGS_rodata_objcopy.o   := \
                        --rename-section .noinstr.text=.rodata,alloc,readonly,load,contents

And because the alternatives fixup tries to write to it, and fails due to
it being readonly, I'm guessing we get this warning.

Thus, is there a way to keep this file from being entered into the
return_sites section?

-- Steve

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 18:31           ` Steven Rostedt
@ 2022-07-20 18:43             ` Linus Torvalds
  2022-07-20 19:11               ` Steven Rostedt
  0 siblings, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-20 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

[-- Attachment #1: Type: text/plain, Size: 1878 bytes --]

On Wed, Jul 20, 2022 at 11:31 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Thus, is there a way to keep this file from being entered into the
> return_sites section?

I think the whole concept is broken.

Testing known-broken code on the expectation that "this won't work
anyway, so we can jump off to code that is broken" is not acceptable.

*If* the test were to fail, it would start executing random code that
hasn't been relocated or fixed up properly.

So I think the whole concept is broken. It relies on the compiler
generating code that can work in a read-only data section, and it's
not clear that that is even physically possible (ie the data section
might be far enough away from a code section that any relocation just
fundamentally cannot happen).

I think it worked purely by mistake, because the code was simple
enough that it didn't need any relocation at all before. But even
without RETHUNK, that was never guaranteed, because any random tracing
or debug code or whatever could have made even that empty function
have code in it that just fundamentally wouldn't work in a non-text
section.

So honestly, I think that test should be removed as a "we used this,
it happened to work almost by mistake, but it doesn't work any more
and it is unfixably broken".

Maybe somebody can come up with an entirely different way to do that
test that isn't so broken, but if so, I think it's going to be using
some other machinery (eg bpf and explicitly marking it read-only and
non-executable), and removing this broken model is the right thing
regardless.

So unless somebody has some one-liner workaround, I really suspect the
fix is to remove all this. The amount of hackery to make it work in
the first place is kind of disgusting anyway.

Since this was a WARN_ONCE(), can you make sure that with this case
removed, nothing else triggers?

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2820 bytes --]

 drivers/misc/lkdtm/Makefile | 11 -----------
 drivers/misc/lkdtm/lkdtm.h  |  3 ---
 drivers/misc/lkdtm/perms.c  |  7 -------
 drivers/misc/lkdtm/rodata.c | 11 -----------
 4 files changed, 32 deletions(-)

diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile
index 2e0aa74ac185..4f1059f0cae9 100644
--- a/drivers/misc/lkdtm/Makefile
+++ b/drivers/misc/lkdtm/Makefile
@@ -6,21 +6,10 @@ lkdtm-$(CONFIG_LKDTM)		+= bugs.o
 lkdtm-$(CONFIG_LKDTM)		+= heap.o
 lkdtm-$(CONFIG_LKDTM)		+= perms.o
 lkdtm-$(CONFIG_LKDTM)		+= refcount.o
-lkdtm-$(CONFIG_LKDTM)		+= rodata_objcopy.o
 lkdtm-$(CONFIG_LKDTM)		+= usercopy.o
 lkdtm-$(CONFIG_LKDTM)		+= stackleak.o
 lkdtm-$(CONFIG_LKDTM)		+= cfi.o
 lkdtm-$(CONFIG_LKDTM)		+= fortify.o
 lkdtm-$(CONFIG_PPC_64S_HASH_MMU)	+= powerpc.o
 
-KASAN_SANITIZE_rodata.o		:= n
 KASAN_SANITIZE_stackleak.o	:= n
-KCOV_INSTRUMENT_rodata.o	:= n
-CFLAGS_REMOVE_rodata.o		+= $(CC_FLAGS_LTO)
-
-OBJCOPYFLAGS :=
-OBJCOPYFLAGS_rodata_objcopy.o	:= \
-			--rename-section .noinstr.text=.rodata,alloc,readonly,load,contents
-targets += rodata.o rodata_objcopy.o
-$(obj)/rodata_objcopy.o: $(obj)/rodata.o FORCE
-	$(call if_changed,objcopy)
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 015e0484026b..e58f69077fcd 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -94,7 +94,4 @@ void __init lkdtm_perms_init(void);
 void __init lkdtm_usercopy_init(void);
 void __exit lkdtm_usercopy_exit(void);
 
-/* Special declaration for function-in-rodata. */
-void lkdtm_rodata_do_nothing(void);
-
 #endif
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index b93404d65650..d1a69ef865c2 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -191,12 +191,6 @@ static void lkdtm_EXEC_VMALLOC(void)
 	vfree(vmalloc_area);
 }
 
-static void lkdtm_EXEC_RODATA(void)
-{
-	execute_location(dereference_function_descriptor(lkdtm_rodata_do_nothing),
-			 CODE_AS_IS);
-}
-
 static void lkdtm_EXEC_USERSPACE(void)
 {
 	unsigned long user_addr;
@@ -280,7 +274,6 @@ static struct crashtype crashtypes[] = {
 	CRASHTYPE(EXEC_STACK),
 	CRASHTYPE(EXEC_KMALLOC),
 	CRASHTYPE(EXEC_VMALLOC),
-	CRASHTYPE(EXEC_RODATA),
 	CRASHTYPE(EXEC_USERSPACE),
 	CRASHTYPE(EXEC_NULL),
 	CRASHTYPE(ACCESS_USERSPACE),
diff --git a/drivers/misc/lkdtm/rodata.c b/drivers/misc/lkdtm/rodata.c
deleted file mode 100644
index baacb876d1d9..000000000000
--- a/drivers/misc/lkdtm/rodata.c
+++ /dev/null
@@ -1,11 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * This includes functions that are meant to live entirely in .rodata
- * (via objcopy tricks), to validate the non-executability of .rodata.
- */
-#include "lkdtm.h"
-
-void noinstr lkdtm_rodata_do_nothing(void)
-{
-	/* Does nothing. We just want an architecture agnostic "return". */
-}

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 18:43             ` Linus Torvalds
@ 2022-07-20 19:11               ` Steven Rostedt
  0 siblings, 0 replies; 142+ messages in thread
From: Steven Rostedt @ 2022-07-20 19:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Juergen Gross, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann

On Wed, 20 Jul 2022 11:43:37 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So unless somebody has some one-liner workaround, I really suspect the
> fix is to remove all this. The amount of hackery to make it work in
> the first place is kind of disgusting anyway.
> 
> Since this was a WARN_ONCE(), can you make sure that with this case
> removed, nothing else triggers?

Actually, this fixes it too:

(and this config boots to completion without warnings).

I'll add this to my full test suite and see if it finishes.

-- Steve

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1f40dad30d50..2dd61d8594f4 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -25,6 +25,7 @@ endif
 ifdef CONFIG_RETHUNK
 RETHUNK_CFLAGS		:= -mfunction-return=thunk-extern
 RETPOLINE_CFLAGS	+= $(RETHUNK_CFLAGS)
+export RETHUNK_CFLAGS
 endif
 
 export RETPOLINE_CFLAGS
diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile
index 2e0aa74ac185..fd96ac1617f7 100644
--- a/drivers/misc/lkdtm/Makefile
+++ b/drivers/misc/lkdtm/Makefile
@@ -16,7 +16,7 @@ lkdtm-$(CONFIG_PPC_64S_HASH_MMU)	+= powerpc.o
 KASAN_SANITIZE_rodata.o		:= n
 KASAN_SANITIZE_stackleak.o	:= n
 KCOV_INSTRUMENT_rodata.o	:= n
-CFLAGS_REMOVE_rodata.o		+= $(CC_FLAGS_LTO)
+CFLAGS_REMOVE_rodata.o		+= $(CC_FLAGS_LTO) $(RETHUNK_CFLAGS)
 
 OBJCOPYFLAGS :=
 OBJCOPYFLAGS_rodata_objcopy.o	:= \

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 18:07         ` Linus Torvalds
  2022-07-20 18:31           ` Steven Rostedt
@ 2022-07-20 19:36           ` Kees Cook
  2022-07-20 19:43             ` Steven Rostedt
  2022-07-20 21:36             ` Peter Zijlstra
  1 sibling, 2 replies; 142+ messages in thread
From: Kees Cook @ 2022-07-20 19:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Peter Zijlstra, Thomas Gleixner, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 20, 2022 at 11:07:26AM -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2022 at 10:50 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > [    2.464117] missing return thunk: lkdtm_rodata_do_nothing+0x0/0x8-lkdtm_rodata_do_nothing+0x5/0x8: e9 00 00 00 00
> 
> Well, that looks like a "jmp" instruction that has never been relocated.

Peter, Josh, and I drilled down into this recently[1] and discussed
some solutions[2].

This test is doing what's expected: it needed an arch-agnostic way to do
a "return", and when the way to do that changed, it also changed (which
would normally be good, but in this case broke it). It's been happily
being used as part of the per-section architectural behavior testing[3]
of execution-vs-expected-memory-permissions for quite a long while now.

I'd rather not remove it (or do it dynamically) since the point is to
test what has been generated by the toolchain/build process and stuffed
into the .rodata section. i.e. making sure gadgets there can't be
executed, that the boot-time section permission-setting works correctly,
etc. Before the retbleed mitigation, this test worked for all
architectures; I'd hate to regress it. :(

-Kees

[1] https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
[2] https://lore.kernel.org/lkml/20220713213133.455599-1-keescook@chromium.org/
[3] e.g. https://linux.kernelci.org/test/plan/id/62d61ee8ef31e0f0faa39bff/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 23:51             ` Peter Zijlstra
  2022-07-20  9:00               ` Thomas Gleixner
  2022-07-20 16:55               ` Sami Tolvanen
@ 2022-07-20 19:42               ` Sami Tolvanen
  2 siblings, 0 replies; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-20 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linus Torvalds, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Andrew Cooper, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne

On Tue, Jul 19, 2022 at 01:51:14AM +0200, Peter Zijlstra wrote:
> That means the offset of +10 lands in the middle of the CALL
> instruction, and since we only have 16 thunks there is a limited number
> of byte patterns available there.
> 
> This really isn't as nice as the -6 but might just work well enough,
> hmm?

pcc pointed out that we can also just add two more ud2 instructions to
the check sequence if we want to be safe, at the cost of an extra four
bytes per callsite.

> Also, since we're talking at least 4 bytes more padding over the 7 that
> are required by the kCFI scheme, the FineIBT alternative gets a little
> more room to breathe. I'm thinking we can have the FineIBT landing site
> at -16.
> 
> __fineibt_\func:
> 	endbr64				# 4
> 	xorl	$0x12345678, %r10d	# 7
> 	je	\func+4			# 2
> 	ud2				# 2
> 
> \func:
> 	nop4
> 	...

I assume this means the preamble must also be 16-byte aligned to avoid
performance issues with the FineIBT alternative? Which means we'll have
a 16-byte preamble preceded by the usual nop padding.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 19:36           ` Kees Cook
@ 2022-07-20 19:43             ` Steven Rostedt
  2022-07-20 21:36             ` Peter Zijlstra
  1 sibling, 0 replies; 142+ messages in thread
From: Steven Rostedt @ 2022-07-20 19:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Peter Zijlstra, Thomas Gleixner, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Wed, 20 Jul 2022 12:36:38 -0700
Kees Cook <keescook@chromium.org> wrote:

> I'd rather not remove it (or do it dynamically) since the point is to
> test what has been generated by the toolchain/build process and stuffed
> into the .rodata section. i.e. making sure gadgets there can't be
> executed, that the boot-time section permission-setting works correctly,
> etc. Before the retbleed mitigation, this test worked for all
> architectures; I'd hate to regress it. :(

If you haven't noticed my reply, I wasn't able to come up with a one line
workaround, but I was able to come up with a two line workaround. Hopefully
that will be good enough to keep your little feature.

  https://lore.kernel.org/all/20220720151123.0e5bf61e@gandalf.local.home/

I'm currently running it under my entire ftrace test suite. If it passes,
I'll submit a formal patch.

-- Steve

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-19 17:19                             ` Sami Tolvanen
@ 2022-07-20 21:13                               ` Peter Zijlstra
  2022-07-21  8:21                                 ` David Laight
  2022-07-21 15:54                                 ` Peter Zijlstra
  0 siblings, 2 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-20 21:13 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Tue, Jul 19, 2022 at 10:19:18AM -0700, Sami Tolvanen wrote:

> Clang's current CFI implementation is somewhat similar to this. It
> creates separate thunks for address-taken functions and changes
> function addresses in C code to point to the thunks instead.
> 
> While this works, it creates painful situations when interacting with
> assembly (e.g. a function address taken in assembly cannot be used
> for indirect calls in C as it doesn't point to the thunk) and needs
> unpleasant hacks when we want take the actual function address in C
> (i.e. scattering the code with function_nocfi() calls).
> 
> I have to agree with Peter on this, I would rather avoid messing with
> function pointers in KCFI to avoid these issues.

It is either this; and I think I can avoid the worst of it (see below);
or grow the indirect_callsites to obscure the immediate (as Linus
suggested), there's around ~16k indirect callsites in a defconfig-ish
kernel, so growing it isn't too horrible, but it isn't nice either.

The prettiest option to obscure the immediate at the callsite I could
conjure up is something like:

kcfi_caller_linus:
	movl	$0x12345600, %r10d
	movb	$0x78, %r10b
	cmpl	%r10d, -OFFSET(%r11)
	je	1f
	ud2
1:	call	__x86_thunk_indirect_r11

Which comes to around 22 bytes (+5 over the original).
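
(Back-of-the-envelope, not a number from the thread: with the ~16k
indirect call sites mentioned above each growing by those 5 bytes, that
is roughly 16,000 * 5 = 80,000 bytes, i.e. about 80 KB of extra text.)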

Joao suggested putting part of that in the retpoline thunk like:

kcfi_caller_joao:
	movl	$0x12345600, %r10d
	movb	$0x78, %r10b
	call	__x86_thunk_indirect_cfi

__x86_thunk_indirect_cfi:
	cmpl    %r10d, -OFFSET(%r11)
	je      1f
	ud2
1:
	call    1f
	int3
1:
	mov     %r11, (%rsp)
	ret
	int3

The only down-side there is that eIBRS hardware doesn't need retpolines
(given we currently default to ignoring Spectre-BHB) and as such this
doesn't really work nicely (we don't want to re-introduce funneling).


The other option I came up with, alluded to above, is below, and having
written it out, I'm pretty sure I faviour just growing the indirect
callsite as per Linus' option above.

Suppose:

indirect_callsite:
	cmpl	$0x12345678, -6(%r11)		# 8
	je	1f				# 2
	ud2					# 2
	call	__x86_indirect_thunk_r11	# 5	(-> .retpoline_sites)


__cfi_\func:
	movl	$0x12345678, %eax		# 5
	int3					# 1
	int3					# 1
\func:			# aligned 16
	endbr					# 4
	nop12					# 12
	call __fentry__				# 5
	...


And for functions that do not get their address taken:


\func:			# aligned 16
	nop16					# 16
	call __fentry__				# 5
	...



Instead, extend the objtool .call_sites to also include tail-calls and
for:

 - regular (!SKL, !IBT) systems;
   * patch all direct calls/jmps to +16		(.call_sites)
   * static_call/ftrace/etc.. can trivially add the +16
   * retpolines can do +16 for the indirect calls
   * return thunks are patched to ret;int3	(.return_sites)

   (indirect calls for eIBRS which don't use retpoline
    simply eat the nops)


 - SKL systems;
   * patch the first 16 bytes into:

	nop6
	sarq	$5, PER_CPU_VAR(__x86_call_depth)

   * patch all direct calls to +6		(.call_sites)
   * patch all direct jumps to +16		(.call_sites)
   * static_call/ftrace adjust to +6/+16 depending on instruction type
   * retpolines are split between call/jmp and do +6/+16 resp.
   * return thunks are patched to x86_return_skl (.return_sites)


 - IBT systems;
   * patch the first 16 bytes to:

	endbr					# 4
	xorl	$0x12345678, %r10d		# 7
	je	1f				# 2
	ud2					# 2
	nop					# 1
1:

   * patch the callsites to:			(.retpoline_sites)

	movl	$0x12345678, %r10d		# 7
	call	*%r11				# 3
	nop7					# 7

   * patch all the direct calls/jmps to +16	(.call_sites)
   * static_call/ftrace/etc.. add +16
   * return thunks are patched to ret;int3	(.return_sites)


Yes, frobbing the address for static_call/ftrace/etc.. is a bit
horrible, but at least &sym remains exactly that address and not
something magical.

Note: It is possible to shift the __fentry__ call, but that would mean
that we lose alignment or get to carry .call_sites at runtime (and it
is *huge*)



^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 19:36           ` Kees Cook
  2022-07-20 19:43             ` Steven Rostedt
@ 2022-07-20 21:36             ` Peter Zijlstra
  1 sibling, 0 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-20 21:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Steven Rostedt, Thomas Gleixner, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Juergen Gross,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Wed, Jul 20, 2022 at 12:36:38PM -0700, Kees Cook wrote:
> On Wed, Jul 20, 2022 at 11:07:26AM -0700, Linus Torvalds wrote:
> > On Wed, Jul 20, 2022 at 10:50 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > [    2.464117] missing return thunk: lkdtm_rodata_do_nothing+0x0/0x8-lkdtm_rodata_do_nothing+0x5/0x8: e9 00 00 00 00
> > 
> > Well, that looks like a "jmp" instruction that has never been relocated.
> 
> Peter, Josh, and I drilled down into this recently[1] and discussed
> some solutions[2].
> 
> This test is doing what's expected: it needed an arch-agnostic way to do
> a "return", and when the way to do that changed, it also changed (which
> would normally be good, but in this case broke it). It's been happily
> being used as part of the per-section architectural behavior testing[3]
> of execution-vs-expected-memory-permissions for quite a long while now.
> 
> I'd rather not remove it (or do it dynamically) since the point is to
> test what has been generated by the toolchain/build process and stuffed
> into the .rodata section. i.e. making sure gadgets there can't be
> executed, that the boot-time section permission-setting works correctly,
> etc. Before the retbleed mitigation, this test worked for all
> architectures; I'd hate to regress it. :(
> 
> -Kees
> 
> [1] https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
> [2] https://lore.kernel.org/lkml/20220713213133.455599-1-keescook@chromium.org/
> [3] e.g. https://linux.kernelci.org/test/plan/id/62d61ee8ef31e0f0faa39bff/

Josh posted this:

  https://lkml.kernel.org/r/8ec0039712f252693049c70ed3891d39a2357112.1658155446.git.jpoimboe@kernel.org

which I picked up today; barring robot fail I'll push it to x86/urgent
tomorrow.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 21:13                               ` Peter Zijlstra
@ 2022-07-21  8:21                                 ` David Laight
  2022-07-21 10:56                                   ` David Laight
  2022-07-21 15:54                                 ` Peter Zijlstra
  1 sibling, 1 reply; 142+ messages in thread
From: David Laight @ 2022-07-21  8:21 UTC (permalink / raw)
  To: 'Peter Zijlstra', Sami Tolvanen
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

From: Peter Zijlstra
> Sent: 20 July 2022 22:13
...
> The prettiest option to obscure the immediate at the callsite I could
> conjure up is something like:
> 
> kcfi_caller_linus:
> 	movl	$0x12345600, %r10d
> 	movb	$0x78, %r10b
> 	cmpl	%r10d, -OFFSET(%r11)
> 	je	1f
> 	ud2
> 1:	call	__x86_thunk_indirect_r11
> 
> Which comes to around 22 bytes (+5 over the original).

You'd be better doing:
	movl $0x12345678-0xaa, %r10d
	addl $0xaa, %r10d
so that the immediate is obscured even if the low bits are zero.
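
A tiny sketch (hypothetical hash value, not from the mail) of why the
sub/add split is safer than the movl+movb split when the hash's low byte
happens to be zero:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t hash = 0x12345600;		/* low byte zero: worst case */

	/* movl+movb split: the movl immediate is the full hash itself. */
	uint32_t movl_imm = hash & ~0xffu;	/* == 0x12345600 == hash     */
	assert(movl_imm == hash);		/* full hash visible in text */

	/* sub/add split: neither immediate equals the hash... */
	uint32_t sub_imm = hash - 0xaa;		/* 0x12345556                */
	assert(sub_imm != hash && 0xaa != hash);
	/* ...but the sum still reconstructs it at run time. */
	assert(sub_imm + 0xaa == hash);
	return 0;
}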

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21  8:21                                 ` David Laight
@ 2022-07-21 10:56                                   ` David Laight
  0 siblings, 0 replies; 142+ messages in thread
From: David Laight @ 2022-07-21 10:56 UTC (permalink / raw)
  To: 'Peter Zijlstra', 'Sami Tolvanen'
  Cc: 'Linus Torvalds', 'Thomas Gleixner',
	'Joao Moreira', 'LKML',
	'the arch/x86 maintainers', 'Tim Chen',
	'Josh Poimboeuf', 'Cooper, Andrew',
	'Pawan Gupta', 'Johannes Wikner',
	'Alyssa Milburn', 'Jann Horn', 'H.J. Lu',
	'Moreira, Joao', 'Nuzman, Joseph',
	'Steven Rostedt', 'Gross, Jurgen',
	'Masami Hiramatsu', 'Alexei Starovoitov',
	'Daniel Borkmann', 'Peter Collingbourne',
	'Kees Cook'

From: David Laight
> Sent: 21 July 2022 09:22
> 
> From: Peter Zijlstra
> > Sent: 20 July 2022 22:13
> ...
> > The prettiest option to obscure the immediate at the callsite I could
> > conjure up is something like:
> >
> > kcfi_caller_linus:
> > 	movl	$0x12345600, %r10d
> > 	movb	$0x78, %r10b
> > 	cmpl	%r10d, -OFFSET(%r11)
> > 	je	1f
> > 	ud2
> > 1:	call	__x86_thunk_indirect_r11
> >
> > Which comes to around 22 bytes (+5 over the original).
> 
> You'd be better doing:
> 	movl $0x12345678-0xaa, %r10d
> 	addl $0xaa, %r10d
> so that the immediate is obscured even if the low bits are zero.

Actually, can't you use %eax instead of %r10d?
IIRC it is only used for the number of FP registers in a varargs
call - and that isn't used in the kernel.
That removes the 3 'REX' prefixes and lets you use the
2-byte 04-xx instruction to add to %al.

Although I'm sure I remember something about a penalty for
accessing %al just after the full register.
So the 3-byte sign extending add may be better.
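
For the byte counting, a sketch of the encodings being compared (0xaa is
a stand-in byte; sizes follow from the instruction formats):

/* Sketch only: the three 'add imm8' variants as raw bytes.  Using %al
 * gets the 2-byte form; %eax needs the sign-extended imm8 form; %r10d
 * additionally needs a REX prefix. */
static const unsigned char add_al_imm8[]   = { 0x04, 0xaa };             /* add $0xaa,%al   (2 bytes) */
static const unsigned char add_eax_imm8[]  = { 0x83, 0xc0, 0xaa };       /* add $0xaa,%eax  (3 bytes) */
static const unsigned char add_r10d_imm8[] = { 0x41, 0x83, 0xc2, 0xaa }; /* add $0xaa,%r10d (4 bytes) */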

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-20 21:13                               ` Peter Zijlstra
  2022-07-21  8:21                                 ` David Laight
@ 2022-07-21 15:54                                 ` Peter Zijlstra
  2022-07-21 17:55                                   ` Peter Zijlstra
  2022-07-23  9:50                                   ` Thomas Gleixner
  1 sibling, 2 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-21 15:54 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Wed, Jul 20, 2022 at 11:13:16PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 19, 2022 at 10:19:18AM -0700, Sami Tolvanen wrote:
> 
> > Clang's current CFI implementation is somewhat similar to this. It
> > creates separate thunks for address-taken functions and changes
> > function addresses in C code to point to the thunks instead.
> > 
> > While this works, it creates painful situations when interacting with
> > assembly (e.g. a function address taken in assembly cannot be used
> > for indirect calls in C as it doesn't point to the thunk) and needs
> > unpleasant hacks when we want take the actual function address in C
> > (i.e. scattering the code with function_nocfi() calls).
> > 
> > I have to agree with Peter on this, I would rather avoid messing with
> > function pointers in KCFI to avoid these issues.
> 
> It is either this; and I think I can avoid the worst of it (see below);
> or grow the indirect_callsites to obscure the immediate (as Linus
> suggested), there's around ~16k indirect callsites in a defconfig-ish
> kernel, so growing it isn't too horrible, but it isn't nice either.
> 
> The prettiest option to obscure the immediate at the callsite I could
> conjure up is something like:
> 
> kcfi_caller_linus:
> 	movl	$0x12345600, %r10d
> 	movb	$0x78, %r10b
> 	cmpl	%r10d, -OFFSET(%r11)
> 	je	1f
> 	ud2
> 1:	call	__x86_thunk_indirect_r11
> 
> Which comes to around 22 bytes (+5 over the original).

My very firstest LLVM patch; except it explodes at runtime and I'm not
sure where to start looking...

On top of sami-llvm/kcfi

If I comment out the orl and cmpl it compiles stuff, put either one back
and it explodes in some very unhelpful message.

The idea is the above call thunk, which makes any offset work by not
having the full hash as a single immediate, and allows kCFI to be used
together with -fpatchable-function-entry=N,M by including M in the offset.

Specifically, I meant to use -fpatchable-function-entry=16,16, but alas,
I never got that far.

Help ?

---

diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index 5e011d409ee8..ffdb95324da7 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -124,23 +124,12 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF,
     OutStreamer->emitSymbolAttribute(FnSym, MCSA_ELF_TypeFunction);
   OutStreamer->emitLabel(FnSym);
 
-  // Emit int3 padding to allow runtime patching of the preamble.
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri)
                               .addReg(X86::EAX)
                               .addImm(Type->getZExtValue()));
 
-  // The type hash is encoded in the last four bytes of the X86::MOV32ri
-  // instruction. Emit additional X86::INT3 padding to ensure the hash is
-  // at offset -6 from the function start to avoid potential call target
-  // gadgets in checks emitted by X86AsmPrinter::LowerKCFI_CHECK.
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-
   if (MAI->hasDotTypeDotSizeDirective()) {
     MCSymbol *EndSym = OutContext.createTempSymbol("cfi_func_end");
     OutStreamer->emitLabel(EndSym);
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index 16c4d2e45970..d72e82f4f63a 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -1340,22 +1340,34 @@ void X86AsmPrinter::LowerKCFI_CHECK(const MachineInstr &MI) {
   assert(std::next(MI.getIterator())->isCall() &&
          "KCFI_CHECK not followed by a call instruction");
 
-  // The type hash is encoded in the last four bytes of the X86::CMP32mi
-  // instruction. If we decided to place the hash immediately before
-  // indirect call targets (offset -4), the X86::JCC_1 instruction we'll
-  // emit next would be a potential indirect call target as it's preceded
-  // by a valid type hash.
-  //
-  // To avoid generating useful gadgets, X86AsmPrinter::emitKCFITypeId
-  // emits the type hash prefix at offset -6, which makes X86::TRAP the
-  // only possible target in this instruction sequence.
-  EmitAndCountInstruction(MCInstBuilder(X86::CMP32mi)
+  const Function &F = MF->getFunction();
+  unsigned Imm = MI.getOperand(1).getImm();
+  unsigned Num;
+
+  if (F.hasFnAttribute("patchable-function-prefix")) {
+    if (F.getFnAttribute("patchable-function-prefix")
+            .getValueAsString()
+            .getAsInteger(10, Num))
+      Num = 0;
+  }
+
+  // movl $0x12345600, %r10d
+  EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri)
+		  .addReg(X86::R10)
+		  .addImm(Imm & ~0xff));
+
+  // orl $0x78, %r10d
+  EmitAndCountInstruction(MCInstBuilder(X86::OR32ri8)
+		  .addReg(X86::R10)
+		  .addImm(Imm & 0xff));
+
+  // cmpl %r10, -off(%r11)
+  EmitAndCountInstruction(MCInstBuilder(X86::CMP32rm)
+                              .addReg(X86::R10)
                               .addReg(MI.getOperand(0).getReg())
                               .addImm(1)
                               .addReg(X86::NoRegister)
-                              .addImm(-6)
-                              .addReg(X86::NoRegister)
-                              .addImm(MI.getOperand(1).getImm()));
+                              .addImm(-(Num + 4)));
 
   MCSymbol *Pass = OutContext.createTempSymbol();
   EmitAndCountInstruction(

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 15:54                                 ` Peter Zijlstra
@ 2022-07-21 17:55                                   ` Peter Zijlstra
  2022-07-21 18:06                                     ` Linus Torvalds
  2022-07-23  9:50                                   ` Thomas Gleixner
  1 sibling, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-21 17:55 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 05:54:38PM +0200, Peter Zijlstra wrote:
> My very firstest LLVM patch; except it explodes at runtime and I'm not
> sure where to start looking...
> 
> On top of sami-llvm/kcfi

Thanks Sami!

this seems to work, let me go hack the kernel..

---
 clang/lib/Driver/SanitizerArgs.cpp     | 12 ---------
 llvm/lib/Target/X86/X86AsmPrinter.cpp  | 11 --------
 llvm/lib/Target/X86/X86MCInstLower.cpp | 47 ++++++++++++++++++++++------------
 3 files changed, 31 insertions(+), 39 deletions(-)

diff --git a/clang/lib/Driver/SanitizerArgs.cpp b/clang/lib/Driver/SanitizerArgs.cpp
index 373a74399df0..b6ebc8ad1842 100644
--- a/clang/lib/Driver/SanitizerArgs.cpp
+++ b/clang/lib/Driver/SanitizerArgs.cpp
@@ -719,18 +719,6 @@ SanitizerArgs::SanitizerArgs(const ToolChain &TC,
       D.Diag(diag::err_drv_argument_not_allowed_with)
           << "-fsanitize=kcfi"
           << lastArgumentForMask(D, Args, SanitizerKind::CFI);
-
-    if (Arg *A = Args.getLastArg(options::OPT_fpatchable_function_entry_EQ)) {
-      StringRef S = A->getValue();
-      unsigned N, M;
-      // With -fpatchable-function-entry=N,M, where M > 0,
-      // llvm::AsmPrinter::emitFunctionHeader injects nops before the
-      // KCFI type identifier, which is currently unsupported.
-      if (!S.consumeInteger(10, N) && S.consume_front(",") &&
-          !S.consumeInteger(10, M) && M > 0)
-        D.Diag(diag::err_drv_argument_not_allowed_with)
-            << "-fsanitize=kcfi" << A->getAsString(Args);
-    }
   }
 
   Stats = Args.hasFlag(options::OPT_fsanitize_stats,
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index 5e011d409ee8..ffdb95324da7 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -124,23 +124,12 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF,
     OutStreamer->emitSymbolAttribute(FnSym, MCSA_ELF_TypeFunction);
   OutStreamer->emitLabel(FnSym);
 
-  // Emit int3 padding to allow runtime patching of the preamble.
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-
   // Embed the type hash in the X86::MOV32ri instruction to avoid special
   // casing object file parsers.
   EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri)
                               .addReg(X86::EAX)
                               .addImm(Type->getZExtValue()));
 
-  // The type hash is encoded in the last four bytes of the X86::MOV32ri
-  // instruction. Emit additional X86::INT3 padding to ensure the hash is
-  // at offset -6 from the function start to avoid potential call target
-  // gadgets in checks emitted by X86AsmPrinter::LowerKCFI_CHECK.
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-  EmitAndCountInstruction(MCInstBuilder(X86::INT3));
-
   if (MAI->hasDotTypeDotSizeDirective()) {
     MCSymbol *EndSym = OutContext.createTempSymbol("cfi_func_end");
     OutStreamer->emitLabel(EndSym);
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index 16c4d2e45970..4ed23348aa7c 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -1340,22 +1340,37 @@ void X86AsmPrinter::LowerKCFI_CHECK(const MachineInstr &MI) {
   assert(std::next(MI.getIterator())->isCall() &&
          "KCFI_CHECK not followed by a call instruction");
 
-  // The type hash is encoded in the last four bytes of the X86::CMP32mi
-  // instruction. If we decided to place the hash immediately before
-  // indirect call targets (offset -4), the X86::JCC_1 instruction we'll
-  // emit next would be a potential indirect call target as it's preceded
-  // by a valid type hash.
-  //
-  // To avoid generating useful gadgets, X86AsmPrinter::emitKCFITypeId
-  // emits the type hash prefix at offset -6, which makes X86::TRAP the
-  // only possible target in this instruction sequence.
-  EmitAndCountInstruction(MCInstBuilder(X86::CMP32mi)
-                              .addReg(MI.getOperand(0).getReg())
-                              .addImm(1)
-                              .addReg(X86::NoRegister)
-                              .addImm(-6)
-                              .addReg(X86::NoRegister)
-                              .addImm(MI.getOperand(1).getImm()));
+  const Function &F = MF->getFunction();
+  unsigned Imm = MI.getOperand(1).getImm();
+  unsigned Num = 0;
+
+  if (F.hasFnAttribute("patchable-function-prefix")) {
+    if (F.getFnAttribute("patchable-function-prefix")
+            .getValueAsString()
+            .getAsInteger(10, Num))
+      Num = 0;
+  } 
+
+  // movl $(~0x12345678), %r10d
+  EmitAndCountInstruction(MCInstBuilder(X86::MOV32ri)
+		  .addReg(X86::R10D) // dst
+		  .addImm(~Imm));
+
+  // negl %r10d
+  EmitAndCountInstruction(MCInstBuilder(X86::NEG32r)
+		  .addReg(X86::R10D) // dst
+		  .addReg(X86::R10D) // src
+		  );
+
+  // cmpl %r10d, -off(%r11)
+  EmitAndCountInstruction(MCInstBuilder(X86::CMP32mr)
+                              .addReg(MI.getOperand(0).getReg()) // base
+                              .addImm(0) // scale
+                              .addReg(0) // index
+                              .addImm(-(Num+4)) // offset
+                              .addReg(0) // segment
+                              .addReg(X86::R10D) // reg
+			      );
 
   MCSymbol *Pass = OutContext.createTempSymbol();
   EmitAndCountInstruction(

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 17:55                                   ` Peter Zijlstra
@ 2022-07-21 18:06                                     ` Linus Torvalds
  2022-07-21 18:27                                       ` Peter Zijlstra
  2022-07-21 22:01                                       ` David Laight
  0 siblings, 2 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-21 18:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 10:56 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> this seems to work, let me go hack the kernel..

Am I missing something?

Isn't this generating

        movl $~IMM,%r10d
        negl %r10d
        cmpl %r10d,-4(%calldest)

for the sequence?

That seems bogus for two reasons:

 (a) 'neg' is not the opposite of '~'. Did you mean 'notl' or did you mean '-'?

     Or am I missing something entirely?

 (b) since you have that r10 use anyway, why can't you just generate the simpler

        movl $-IMM,%r10d
        addl -4(%calldest),%r10d

     instead? You only need ZF anyway.

     Maybe you need to add some "r10 is clobbered" thing, I don't know.

But again: I don't know llvm, so the above is basically me just doing
the "pattern matching monkey" thing.

             Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 18:06                                     ` Linus Torvalds
@ 2022-07-21 18:27                                       ` Peter Zijlstra
  2022-07-21 18:32                                         ` Linus Torvalds
  2022-07-22  0:16                                         ` Sami Tolvanen
  2022-07-21 22:01                                       ` David Laight
  1 sibling, 2 replies; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-21 18:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 11:06:42AM -0700, Linus Torvalds wrote:
> On Thu, Jul 21, 2022 at 10:56 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > this seems to work, let me go hack the kernel..
> 
> Am I missing something?
> 
> Isn't this generating
> 
>         movl $~IMM,%r10d
>         negl %r10d
>         cmpl %r10d,-4(%calldest)
> 
> for the sequence?
> 
> That seems bogus for two reasons:
> 
>  (a) 'neg' is not the opposite of '~'. Did you mean 'notl' or did you mean '-'?
> 
>      Or am I missing something entirely?

No, you're right, I'm being daft again.

>  (b) since you have that r10 use anyway, why can't you just generate the simpler
> 
>         movl $-IMM,%r10d
>         addl -4(%calldest),%r10d
> 
>      instead? You only need ZF anyway.

Right, lemme see if I can wrangle llvm to generate that.

>      Maybe you need to add some "r10 is clobbered" thing, I don't know.

R10,R11 are caller-saved, and since this is the actual call site, the
caller must already have saved them or marked them clobbered.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 18:27                                       ` Peter Zijlstra
@ 2022-07-21 18:32                                         ` Linus Torvalds
  2022-07-21 20:22                                           ` Joao Moreira
  2022-07-22  0:16                                         ` Sami Tolvanen
  1 sibling, 1 reply; 142+ messages in thread
From: Linus Torvalds @ 2022-07-21 18:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> R10,R11 are caller-saved, and since this is the actual call site, the
> caller must already have saved them or marked them clobbered.

Ok. I don't know the context, but I was thinking along the lines of
the same hash value perhaps being used multiple times because it has
the same function type.  Then using the "addl" trick means that the
hash value in %r10 will be changing and cannot be re-used.
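
I.e. what I was imagining is that with the cmpl form the constant could,
at least in theory, stay live across several checks against two different
targets of the same type (calldest1/calldest2 being placeholders here):

	movl	$IMM, %r10d
	cmpl	%r10d, -4(%calldest1)
	...
	cmpl	%r10d, -4(%calldest2)

while the addl variant eats %r10d at the very first check.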

But I guess this is *much* too late for those kinds of optimizations,
as it literally just outputs the raw instruction sequence, and so the
(negated) hash value will always be re-generated anyway, no re-use
possible.

             Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 18:32                                         ` Linus Torvalds
@ 2022-07-21 20:22                                           ` Joao Moreira
  0 siblings, 0 replies; 142+ messages in thread
From: Joao Moreira @ 2022-07-21 20:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Sami Tolvanen, Thomas Gleixner, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

> Ok. I don't know the context, but I was thinking along the lines of
> the same hash value perhaps being used multiple times because it has
> the same function type.  Then using the "addl" trick means that the
> hash value in %r10 will be changing and cannot be re-used.

Fwiw, even if the %r10 value was not being destroyed by the "addl", the
call right after the check implies that you cannot trust the contents of
%r10 anymore (it may have been messed up within the called function).

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 18:06                                     ` Linus Torvalds
  2022-07-21 18:27                                       ` Peter Zijlstra
@ 2022-07-21 22:01                                       ` David Laight
  2022-07-22 11:03                                         ` Peter Zijlstra
  1 sibling, 1 reply; 142+ messages in thread
From: David Laight @ 2022-07-21 22:01 UTC (permalink / raw)
  To: 'Linus Torvalds', Peter Zijlstra
  Cc: Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

From: Linus Torvalds
> Sent: 21 July 2022 19:07
...
>  (b) since you have that r10 use anyway, why can't you just generate the simpler
> 
>         movl $-IMM,%r10d
>         addl -4(%calldest),%r10d
> 
>      instead? You only need ZF anyway.
> 
>      Maybe you need to add some "r10 is clobbered" thing, I don't know.
> 
> But again: I don't know llvm, so the above is basically me just doing
> the "pattern matching monkey" thing.
> 
>              Linus

Since: "If the callee is a variadic function, then the number of floating
point arguments passed to the function in vector registers must be provided
by the caller in the AL register."

And since that never happens in the kernel you can use %eax instead
of %r10d.

Even in userspace %al can be set non-zero after the signature check.

If you are willing to cut the signature down to 26 bits and
then ensure that one of the bytes of -IMM (or ~IMM if you
use xor) is 0xcc and jump back to that on error the check
becomes:
	movl	$-IMM,%eax
1:	addl	-4(%calldest),%eax
	jnz	1b-1	// or -2, -3, -4
	add	$num_fp_args,%eax	// If needed non-zero
	call	%calldest

I think that adds 10 bytes to the call site.
Although with retpoline thunks (and no fp varargs calls)
all but the initial movl can go into the thunk.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 18:27                                       ` Peter Zijlstra
  2022-07-21 18:32                                         ` Linus Torvalds
@ 2022-07-22  0:16                                         ` Sami Tolvanen
  2022-07-22 10:23                                           ` Peter Zijlstra
  1 sibling, 1 reply; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-22  0:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 08:27:12PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 21, 2022 at 11:06:42AM -0700, Linus Torvalds wrote:
> > On Thu, Jul 21, 2022 at 10:56 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > this seems to work, let me go hack the kernel..
> > 
> > Am I missing something?
> > 
> > Isn't this generating
> > 
> >         movl $~IMM,%r10d
> >         negl %r10d
> >         cmpl %r10d,-4(%calldest)
> > 
> > for the sequence?
> > 
> > That seems bogus for two reasons:
> > 
> >  (a) 'neg' is not the opposite of '~'. Did you mean 'notl' or did you mean '-'?
> > 
> >      Or am I missing something entirely?
> 
> No, you're right, I'm being daft again.
> 
> >  (b) since you have that r10 use anyway, why can't you just generate the simpler
> > 
> >         movl $-IMM,%r10d
> >         addl -4(%calldest),%r10d
> > 
> >      instead? You only need ZF anyway.
> 
> Right, lemme see if I can wrangle llvm to generate that.

That looks good to me. I updated my LLVM tree to generate this code
for the checks:

https://github.com/samitolvanen/llvm-project/commits/kcfi
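
I.e. per call site the check now comes out roughly as (a sketch; the
actual codegen is in the branch above):

	movl	$-HASH, %r10d
	addl	-4(%calldest), %r10d
	je	.Lpass
	ud2
.Lpass: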

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-22  0:16                                         ` Sami Tolvanen
@ 2022-07-22 10:23                                           ` Peter Zijlstra
  2022-07-22 15:38                                             ` Sami Tolvanen
  0 siblings, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-22 10:23 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 05:16:14PM -0700, Sami Tolvanen wrote:

> That looks good to me. I updated my LLVM tree to generate this code
> for the checks:
> 
> https://github.com/samitolvanen/llvm-project/commits/kcfi

Thanks!

The alignment thing you added:

  // Emit int3 padding before the type information to maintain alignment.
  // The X86::MOV32ri instruction we emit is 5 bytes long.
  uint64_t Padding = offsetToAlignment(5, MF.getAlignment());
  while (Padding--)
    EmitAndCountInstruction(MCInstBuilder(X86::INT3));

Doesn't seem to quite do what we want though.

When I use -fpatchable-function-entry=16,16 we effectively get a 32 byte
prefix on every function:

0000000000000000 <__cfi___traceiter_sched_kthread_stop>:
       0:       cc                      int3
       1:       cc                      int3
       2:       cc                      int3
       3:       cc                      int3
       4:       cc                      int3
       5:       cc                      int3
       6:       cc                      int3
       7:       cc                      int3
       8:       cc                      int3
       9:       cc                      int3
       a:       cc                      int3
       b:       b8 26 b1 df 98          mov    $0x98dfb126,%eax
      10:       90                      nop
      11:       90                      nop
      12:       90                      nop
      13:       90                      nop
      14:       90                      nop
      15:       90                      nop
      16:       90                      nop
      17:       90                      nop
      18:       90                      nop
      19:       90                      nop
      1a:       90                      nop
      1b:       90                      nop
      1c:       90                      nop
      1d:       90                      nop
      1e:       90                      nop
      1f:       90                      nop

And given the parameters, that's indeed the only option. However, given
I can scribble the type thing just fine when moving to FineIBT and the
whole Skylake depth tracking only needs 10 bytes, I figured I'd try:
-fpatchable-function-entry=11,11 instead. But that resulted in
unalignment:

0000000000000000 <__cfi___traceiter_sched_kthread_stop>:
       0:       cc                      int3
       1:       cc                      int3
       2:       cc                      int3
       3:       cc                      int3
       4:       cc                      int3
       5:       cc                      int3
       6:       cc                      int3
       7:       cc                      int3
       8:       cc                      int3
       9:       cc                      int3
       a:       cc                      int3
       b:       b8 26 b1 df 98          mov    $0x98dfb126,%eax
      10:       90                      nop
      11:       90                      nop
      12:       90                      nop
      13:       90                      nop
      14:       90                      nop
      15:       90                      nop
      16:       90                      nop
      17:       90                      nop
      18:       90                      nop
      19:       90                      nop
      1a:       90                      nop

000000000000001b <__traceiter_sched_kthread_stop>:

However, if I change clang like so:

 llvm/lib/Target/X86/X86AsmPrinter.cpp | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index 789597f8ef1a..6c94313a197d 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -124,9 +124,15 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF,
     OutStreamer->emitSymbolAttribute(FnSym, MCSA_ELF_TypeFunction);
   OutStreamer->emitLabel(FnSym);
 
+  int64_t PrefixNops = 0;
+  (void)MF.getFunction()
+      .getFnAttribute("patchable-function-prefix")
+      .getValueAsString()
+      .getAsInteger(10, PrefixNops);
+
   // Emit int3 padding before the type information to maintain alignment.
   // The X86::MOV32ri instruction we emit is 5 bytes long.
-  uint64_t Padding = offsetToAlignment(5, MF.getAlignment());
+  uint64_t Padding = offsetToAlignment(5+PrefixNops, MF.getAlignment());
   while (Padding--)
     EmitAndCountInstruction(MCInstBuilder(X86::INT3));
 

Then it becomes:

0000000000000000 <__cfi___traceiter_sched_kthread_stop>:
       0:       b8 26 b1 df 98          mov    $0x98dfb126,%eax
       5:       90                      nop
       6:       90                      nop
       7:       90                      nop
       8:       90                      nop
       9:       90                      nop
       a:       90                      nop
       b:       90                      nop
       c:       90                      nop
       d:       90                      nop
       e:       90                      nop
       f:       90                      nop

0000000000000010 <__traceiter_sched_kthread_stop>:

and things are 'good' again, except for functions that don't get a kcfi
preamble, those are unaligned... I couldn't find where the
patchable-function-prefix nops are generated to fix this up :/


Also; could you perhaps add a switch to supress ENDBR for functions with
a kCFI preamble ?

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 22:01                                       ` David Laight
@ 2022-07-22 11:03                                         ` Peter Zijlstra
  2022-07-22 13:27                                           ` David Laight
  0 siblings, 1 reply; 142+ messages in thread
From: Peter Zijlstra @ 2022-07-22 11:03 UTC (permalink / raw)
  To: David Laight
  Cc: 'Linus Torvalds',
	Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21, 2022 at 10:01:12PM +0000, David Laight wrote:

> Since: "If the callee is a variadic function, then the number of floating
> point arguments passed to the function in vector registers must be provided
> by the caller in the AL register."
> 
> And since that never happens in the kernel you can use %eax instead
> of %r10d.

Except there's the AMD BTC thing and we should (compiler patch seems
MIA) have an unconditional: 'xor %eax,%eax' in front of every function
call.
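
That is, ideally every call site ends up shaped like (sketch, 'foo' being
a placeholder):

	xorl	%eax, %eax
	call	foo

with the zeroing unconditional and immediately in front of the call, which
is why %eax is already spoken for here.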

(The official mitigation strategy was CALL; LFENCE IIRC, but that's so
horrible nobody is actually considering that)

Yes, the suggested sequence ends with rax being zero, but since we start
the speculation before that result is computed that's not good enough I
suspect.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-22 11:03                                         ` Peter Zijlstra
@ 2022-07-22 13:27                                           ` David Laight
  0 siblings, 0 replies; 142+ messages in thread
From: David Laight @ 2022-07-22 13:27 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Linus Torvalds',
	Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

From: Peter Zijlstra
> Sent: 22 July 2022 12:03
> 
> On Thu, Jul 21, 2022 at 10:01:12PM +0000, David Laight wrote:
> 
> > Since: "If the callee is a variadic function, then the number of floating
> > point arguments passed to the function in vector registers must be provided
> > by the caller in the AL register."
> >
> > And since that never happens in the kernel you can use %eax instead
> > of %r10d.
> 
> Except there's the AMD BTC thing and we should (compiler patch seems
> MIA) have an unconditional: 'xor %eax,%eax' in front of every function
> call.

I've just read https://www.amd.com/system/files/documents/technical-guidance-for-mitigating-branch-type-confusion_v7_20220712.pdf

It doesn't seem to suggest clearing registers except as a vague
'might help' before a function return (to limit what the speculated
code can do).

The only advantage I can think of for 'xor ax,ax' is that it is done as
a register rename - and isn't dependent on older instructions.
So it might reduce some pipeline stalls.

I'm guessing that someone might find a 'gadget' that depends on %eax
and it may be possible to find somewhere that leaves an arbitrary
value in it.
It is also about the only register that isn't live!

> (The official mitigation strategy was CALL; LFENCE IIRC, but that's so
> horrible nobody is actually considering that)
> 
> Yes, the suggested sequence ends with rax being zero, but since we start
> the speculation before that result is computed that's not good enough I
> suspect.

The speculated code can't use the 'wrong' %eax value.
The only problem is that reading from -4(%r11) is likely to be a
D$ miss giving plenty of time for the cpu to execute 'crap'.
But I'm not sure a later 'xor ax,ax' helps.
(OTOH this is all horrid and makes my brain hurt.)

AFAICT with BTC you 'just lose'.
I thought it was bad enough that some cpu used the BTB for predicted
conditional jumps - but using it to decide 'this must be a branch
instruction' seems especially broken.

Seems the best thing to do with those cpu is to run an embedded
system with a busybox+buildroot userspace where almost everything
runs as root :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-22 10:23                                           ` Peter Zijlstra
@ 2022-07-22 15:38                                             ` Sami Tolvanen
  0 siblings, 0 replies; 142+ messages in thread
From: Sami Tolvanen @ 2022-07-22 15:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Thomas Gleixner, Joao Moreira, LKML,
	the arch/x86 maintainers, Tim Chen, Josh Poimboeuf, Cooper,
	Andrew, Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn,
	H.J. Lu, Moreira, Joao, Nuzman, Joseph, Steven Rostedt, Gross,
	Jurgen, Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Fri, Jul 22, 2022 at 12:23:30PM +0200, Peter Zijlstra wrote:
> and things are 'good' again, except for functions that don't get a kcfi
> preamble, those are unaligned...

One way to fix this would be to just emit an empty KCFI preamble for
non-address-taken functions when patchable-function-prefix > 0, so all
the functions end up with the same alignment.

Note that Clang doesn't keep the function entry aligned with
-fpatchable-function-entry=N,M, where M>0. It generates .p2align 4,
0x90 before the nops, but if you want to maintain alignment for the
entry, you just have to tell it to generate the correct number of
prefix nops.

> I couldn't find where the patchable-function-prefix nops are generated
> to fix this up :/

It's all in AsmPrinter::emitFunctionHeader, look for emitNops.
 
> Also; could you perhaps add a switch to supress ENDBR for functions with
> a kCFI preamble ?

I'm planning to do that in a follow-up patch. I would rather not add
features that are not critical to the initial patch to avoid further
delays in getting the compiler changes accepted.

Sami

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-18 20:44       ` Thomas Gleixner
  2022-07-18 21:01         ` Linus Torvalds
  2022-07-18 21:18         ` Peter Zijlstra
@ 2022-07-22 20:11         ` Tim Chen
  2022-07-22 22:18           ` Linus Torvalds
  2 siblings, 1 reply; 142+ messages in thread
From: Tim Chen @ 2022-07-22 20:11 UTC (permalink / raw)
  To: Thomas Gleixner, Linus Torvalds
  Cc: LKML, the arch/x86 maintainers, Josh Poimboeuf, Andrew Cooper,
	Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu,
	Joao Moreira, Joseph Nuzman, Steven Rostedt, Juergen Gross,
	Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Mon, 2022-07-18 at 22:44 +0200, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 12:51, Linus Torvalds wrote:
> > On Mon, Jul 18, 2022 at 12:30 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Let the compiler add a 16 byte padding in front of each function entry
> > > point and put the call depth accounting there. That avoids calling out
> > > into the module area and reduces ITLB pressure.
> > 
> > Ooh.
> > 
> > I actually like this a lot better.
> > 
> > Could we just say "use this instead if you have SKL and care about the issue?"
> > 
> > I don't hate your module thunk trick, but this does seem *so* much
> > simpler, and if it performs better anyway, it really does seem like
> > the better approach.
> 
> Yes, Peter and I came from avoiding a new compiler and the overhead for
> everyone when putting the padding into the code. We realized only when
> staring at the perf data that this padding in front of the function
> might be an acceptable solution. I did some more tests today on different
> machines with mitigations=off with kernels compiled with and without
> that padding. I couldn't find a single test case where the result was
> outside of the usual noise. But then my tests are definitely incomplete.
> 

Here are some performance numbers for FIO running on a SKX server with
Intel Cold Stream SSD. Padding improves performance significantly.

Tested latest depth tracking code from Thomas:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/log/?h=depthtracking
(SHA1 714d29e3e7e3faac27142424ae2533163ddd3a46)
latest gcc patch from Thomas is included at the end.


					Baseline	Baseline
read (kIOPs)		Mean	stdev	mitigations=off	retbleed=off	CPU util
================================================================================
mitigations=off		356.33	6.35	0.00%		7.11%		98.93%
retbleed=off		332.67	5.51	-6.64%		0.00%		99.16%
retbleed=ibrs		242.00	5.57	-32.09%		-27.25%		99.41%
retbleed=stuff (nopad)	281.67	4.62	-20.95%		-15.33%		99.35%
retbleed=stuff (pad)	310.67	0.58	-12.82%		-6.61%		99.29%
					
read/write 				Baseline 	Baseline 
70/30 (kIOPs)		Mean	stdev	mitigations=off	retbleed=off	CPU util
================================================================================
mitigations=off		340.60	8.12	0.00%		4.01%		96.80%
retbleed=off		327.47	8.03	-3.86%		0.00%		97.06%
retbleed=ibrs		239.47	0.75	-29.69%		-26.87%		98.23%
retbleed=stuff (nopad)	275.20	0.69	-19.20%		-15.96%		97.86%
retbleed=stuff (pad)	296.60	2.03	-12.92%		-9.43%		97.14%
					
					Baseline 	Baseline 
write (kIOPs)		Mean	stdev	mitigations=off	retbleed=off	CPU util
================================================================================
mitigations=off		299.33	4.04	0.00%		7.16%		93.51%
retbleed=off		279.33	7.51	-6.68%		0.00%		94.30%
retbleed=ibrs		231.33	0.58	-22.72%		-17.18%		95.84%
retbleed=stuff (nopad)	257.67	0.58	-13.92%		-7.76%		94.96%
retbleed=stuff (pad)	274.67	1.53	-8.24%		-1.67%		94.31%


Tim

gcc patch from Thomas:


---
 gcc/config/i386/i386.cc  |   13 +++++++++++++
 gcc/config/i386/i386.h   |    7 +++++++
 gcc/config/i386/i386.opt |    4 ++++
 gcc/doc/invoke.texi      |    6 ++++++
 4 files changed, 30 insertions(+)

--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6182,6 +6182,19 @@ ix86_code_end (void)
     file_end_indicate_split_stack ();
 }
 
+void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+				const char *fnname ATTRIBUTE_UNUSED)
+{
+  if (force_function_padding)
+    {
+      fprintf (asm_out_file, "\t.align %d\n",
+	       1 << force_function_padding);
+      fprintf (asm_out_file, "\t.skip %d,0xcc\n",
+	       1 << force_function_padding);
+    }
+}
+
 /* Emit code for the SET_GOT patterns.  */
 
 const char *
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2860,6 +2860,13 @@ extern enum attr_cpu ix86_schedule;
 #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
 #endif
 
+#include <stdio.h>
+extern void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+				const char *fnname ATTRIBUTE_UNUSED);
+#undef ASM_OUTPUT_FUNCTION_PREFIX
+#define ASM_OUTPUT_FUNCTION_PREFIX x86_asm_output_function_prefix
+
 /*
 Local variables:
 version-control: t
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1064,6 +1064,10 @@ mindirect-branch=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
 Convert indirect call and jump to call and return thunks.
 
+mforce-function-padding=
+Target Joined UInteger Var(force_function_padding) Init(0) IntegerRange(0, 6)
+Put a 2^$N byte padding area before each function
+
 mfunction-return=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
 Convert function return to call and return thunk.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1451,6 +1451,7 @@ See RS/6000 and PowerPC Options.
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mharden-sls=@var{choice} @gol
 -mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access}
+-mforce-function-padding @gol
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -32849,6 +32850,11 @@ Force all calls to functions to be indir
 when using Intel Processor Trace where it generates more precise timing
 information for function calls.
 
+@item -mforce-function-padding
+@opindex mforce-function-padding
+Force a 16 byte padding area before each function, which allows run-time
+code patching to put a special prologue before the function entry.
+
 @item -mmanual-endbr
 @opindex mmanual-endbr
 Insert ENDBR instruction at function entry only via the @code{cf_check}




^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-22 20:11         ` Tim Chen
@ 2022-07-22 22:18           ` Linus Torvalds
  0 siblings, 0 replies; 142+ messages in thread
From: Linus Torvalds @ 2022-07-22 22:18 UTC (permalink / raw)
  To: Tim Chen
  Cc: Thomas Gleixner, LKML, the arch/x86 maintainers, Josh Poimboeuf,
	Andrew Cooper, Pawan Gupta, Johannes Wikner, Alyssa Milburn,
	Jann Horn, H.J. Lu, Joao Moreira, Joseph Nuzman, Steven Rostedt,
	Juergen Gross, Peter Zijlstra (Intel),
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann

On Fri, Jul 22, 2022 at 1:11 PM Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> Here are some performance numbers for FIO running on a SKX server with
> Intel Cold Stream SSD. Padding improves performance significantly.

That certainly looks oh-so-much better than those disgusting ibrs numbers.

One thing that I wonder about - gcc already knows about leaf functions
for other reasons (stack setup is often different anyway), and I
wonder if it might be worth looking at making leaf functions avoid the
whole depth counting, and just rely on a regular call/ret.

The whole call chain thing is already not entirely exact and is
counting to a smaller value than the real RSB size.

And leaf functions are generally the smallest and most often called,
so it might be noticeable on profiles and performance numbers to just
say "we already know this is a leaf, there's no reason to increment
the depth for this only to decrement it when it returns".

The main issue is obviously that the return instruction has to be a
non-decrementing one too for the leaf function case, so it's not just
"don't do the counting on entry", it's also a "don't do the usual
rethunk on exit".

So I just wanted to raise this, not because it's hugely important, but
just to make people think about it - I have these very clear memories
of the whole "don't make leaf functions create a stack frame" having
been a surprisingly big deal for some loads.

Of course, sometimes when I have clear memories, they turn out to be
just some drug-induced confusion in the end. But I know people
experimented with "-fno-omit-frame-pointer -momit-leaf-frame-pointer"
and that it made a big difference (but caused some issue with pvops
hidden in asm so that gcc incorrectly thought they were leaf functions
when they weren't).

                    Linus

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
  2022-07-21 15:54                                 ` Peter Zijlstra
  2022-07-21 17:55                                   ` Peter Zijlstra
@ 2022-07-23  9:50                                   ` Thomas Gleixner
  1 sibling, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2022-07-23  9:50 UTC (permalink / raw)
  To: Peter Zijlstra, Sami Tolvanen
  Cc: Linus Torvalds, Joao Moreira, LKML, the arch/x86 maintainers,
	Tim Chen, Josh Poimboeuf, Cooper, Andrew, Pawan Gupta,
	Johannes Wikner, Alyssa Milburn, Jann Horn, H.J. Lu, Moreira,
	Joao, Nuzman, Joseph, Steven Rostedt, Gross, Jurgen,
	Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann,
	Peter Collingbourne, Kees Cook

On Thu, Jul 21 2022 at 17:54, Peter Zijlstra wrote:
> On Wed, Jul 20, 2022 at 11:13:16PM +0200, Peter Zijlstra wrote:
> The idea is the above callthunk, which makes any offset work by not having
> the full hash as an immediate, allowing kCFI together with
> -fpatchable-function-entry=N,M by including M in the offset.
>
> Specifically, I meant to use -fpatchable-function-entry=16,16, but alas,
> I never got that far.

That's embarrassing. I missed that option in the maze of gcc
options. That would have spared me to stare at gcc code :)

^ permalink raw reply	[flat|nested] 142+ messages in thread

end of thread, other threads:[~2022-07-23  9:51 UTC | newest]

Thread overview: 142+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-16 23:17 [patch 00/38] x86/retbleed: Call depth tracking mitigation Thomas Gleixner
2022-07-16 23:17 ` [patch 01/38] x86/paravirt: Ensure proper alignment Thomas Gleixner
2022-07-16 23:17 ` [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() Thomas Gleixner
2022-07-17  0:22   ` Andrew Cooper
2022-07-17 15:20     ` Linus Torvalds
2022-07-17 19:08     ` Thomas Gleixner
2022-07-17 20:08       ` Thomas Gleixner
2022-07-17 20:13         ` Thomas Gleixner
2022-07-17 21:54           ` Thomas Gleixner
2022-07-18  5:11             ` Juergen Gross
2022-07-18  6:54               ` Thomas Gleixner
2022-07-18  8:55                 ` Thomas Gleixner
2022-07-18  9:31                   ` Peter Zijlstra
2022-07-18 10:33                     ` Thomas Gleixner
2022-07-18 11:42                       ` Thomas Gleixner
2022-07-18 17:52   ` [patch 0/3] x86/cpu: Sanitize switch_to_new_gdt() Thomas Gleixner
2022-07-18 17:52   ` [patch 1/3] x86/cpu: Remove segment load from switch_to_new_gdt() Thomas Gleixner
2022-07-18 18:43     ` Linus Torvalds
2022-07-18 18:55       ` Thomas Gleixner
2022-07-18 17:52   ` [patch 2/3] x86/cpu: Get rid of redundant switch_to_new_gdt() invocations Thomas Gleixner
2022-07-18 17:52   ` [patch 3/3] x86/cpu: Re-enable stackprotector Thomas Gleixner
2022-07-16 23:17 ` [patch 03/38] x86/modules: Set VM_FLUSH_RESET_PERMS in module_alloc() Thomas Gleixner
2022-07-16 23:17 ` [patch 04/38] x86/vdso: Ensure all kernel code is seen by objtool Thomas Gleixner
2022-07-16 23:17 ` [patch 05/38] btree: Initialize early when builtin Thomas Gleixner
2022-07-16 23:17 ` [patch 06/38] objtool: Allow GS relative relocs Thomas Gleixner
2022-07-16 23:17 ` [patch 07/38] objtool: Track init section Thomas Gleixner
2022-07-16 23:17 ` [patch 08/38] objtool: Add .call_sites section Thomas Gleixner
2022-07-16 23:17 ` [patch 09/38] objtool: Add .sym_sites section Thomas Gleixner
2022-07-16 23:17 ` [patch 10/38] objtool: Add --hacks=skylake Thomas Gleixner
2022-07-16 23:17 ` [patch 11/38] objtool: Allow STT_NOTYPE -> STT_FUNC+0 tail-calls Thomas Gleixner
2022-07-16 23:17 ` [patch 12/38] x86/entry: Make sync_regs() invocation a tail call Thomas Gleixner
2022-07-16 23:17 ` [patch 13/38] x86/modules: Make module_alloc() generally available Thomas Gleixner
2022-07-16 23:17 ` [patch 14/38] x86/Kconfig: Add CONFIG_CALL_THUNKS Thomas Gleixner
2022-07-16 23:17 ` [patch 15/38] x86/retbleed: Add X86_FEATURE_CALL_DEPTH Thomas Gleixner
2022-07-16 23:17 ` [patch 16/38] modules: Make struct module_layout unconditionally available Thomas Gleixner
2022-07-16 23:17 ` [patch 17/38] module: Add arch_data to module_layout Thomas Gleixner
2022-07-16 23:17 ` [patch 18/38] mm/vmalloc: Provide huge page mappings Thomas Gleixner
2022-07-16 23:17 ` [patch 19/38] x86/module: Provide __module_alloc() Thomas Gleixner
2022-07-16 23:17 ` [patch 20/38] x86/alternatives: Provide text_poke_[copy|set]_locked() Thomas Gleixner
2022-07-16 23:17 ` [patch 21/38] x86/entry: Make some entry symbols global Thomas Gleixner
2022-07-16 23:17 ` [patch 22/38] x86/paravirt: Make struct paravirt_call_site unconditionally available Thomas Gleixner
2022-07-16 23:17 ` [patch 23/38] x86/callthunks: Add call patching for call depth tracking Thomas Gleixner
2022-07-16 23:17 ` [patch 24/38] module: Add layout for callthunks tracking Thomas Gleixner
2022-07-16 23:17 ` [patch 25/38] x86/modules: Add call thunk patching Thomas Gleixner
2022-07-16 23:17 ` [patch 26/38] x86/returnthunk: Allow different return thunks Thomas Gleixner
2022-07-16 23:17 ` [patch 27/38] x86/asm: Provide ALTERNATIVE_3 Thomas Gleixner
2022-07-16 23:17 ` [patch 28/38] x86/retbleed: Add SKL return thunk Thomas Gleixner
2022-07-16 23:17 ` [patch 29/38] x86/retpoline: Add SKL retthunk retpolines Thomas Gleixner
2022-07-16 23:17 ` [patch 30/38] x86/retbleed: Add SKL call thunk Thomas Gleixner
2022-07-16 23:18 ` [patch 31/38] x86/calldepth: Add ret/call counting for debug Thomas Gleixner
2022-07-16 23:18 ` [patch 32/38] static_call: Add call depth tracking support Thomas Gleixner
2022-07-16 23:18 ` [patch 33/38] kallsyms: Take callthunks into account Thomas Gleixner
2022-07-16 23:18 ` [patch 34/38] x86/orc: Make it callthunk aware Thomas Gleixner
2022-07-16 23:18 ` [patch 35/38] kprobes: Add callthunk blacklisting Thomas Gleixner
2022-07-16 23:18 ` [patch 36/38] x86/ftrace: Make it call depth tracking aware Thomas Gleixner
2022-07-18 21:01   ` Steven Rostedt
2022-07-19  8:46     ` Peter Zijlstra
2022-07-19 13:06       ` Steven Rostedt
2022-07-16 23:18 ` [patch 37/38] x86/bpf: Emit call depth accounting if required Thomas Gleixner
2022-07-19  5:30   ` Alexei Starovoitov
2022-07-19  8:34     ` Peter Zijlstra
2022-07-16 23:18 ` [patch 38/38] x86/retbleed: Add call depth tracking mitigation Thomas Gleixner
2022-07-17  9:45 ` [patch 00/38] x86/retbleed: Call " David Laight
2022-07-17 15:07   ` Thomas Gleixner
2022-07-17 17:56     ` David Laight
2022-07-17 19:15       ` Thomas Gleixner
2022-07-18 19:29 ` Thomas Gleixner
2022-07-18 19:30   ` Thomas Gleixner
2022-07-18 19:51     ` Linus Torvalds
2022-07-18 20:44       ` Thomas Gleixner
2022-07-18 21:01         ` Linus Torvalds
2022-07-18 21:43           ` Peter Zijlstra
2022-07-18 22:34             ` Linus Torvalds
2022-07-18 23:52               ` Peter Zijlstra
2022-07-18 21:18         ` Peter Zijlstra
2022-07-18 22:22           ` Thomas Gleixner
2022-07-18 22:47             ` Joao Moreira
2022-07-18 22:55               ` Sami Tolvanen
2022-07-18 23:08                 ` Joao Moreira
2022-07-18 23:19                 ` Thomas Gleixner
2022-07-18 23:42                   ` Linus Torvalds
2022-07-18 23:52                     ` Linus Torvalds
2022-07-18 23:57                       ` Peter Zijlstra
2022-07-19  0:03                         ` Linus Torvalds
2022-07-19  0:11                           ` Linus Torvalds
2022-07-19  0:23                             ` Peter Zijlstra
2022-07-19  1:02                               ` Linus Torvalds
2022-07-19 17:19                             ` Sami Tolvanen
2022-07-20 21:13                               ` Peter Zijlstra
2022-07-21  8:21                                 ` David Laight
2022-07-21 10:56                                   ` David Laight
2022-07-21 15:54                                 ` Peter Zijlstra
2022-07-21 17:55                                   ` Peter Zijlstra
2022-07-21 18:06                                     ` Linus Torvalds
2022-07-21 18:27                                       ` Peter Zijlstra
2022-07-21 18:32                                         ` Linus Torvalds
2022-07-21 20:22                                           ` Joao Moreira
2022-07-22  0:16                                         ` Sami Tolvanen
2022-07-22 10:23                                           ` Peter Zijlstra
2022-07-22 15:38                                             ` Sami Tolvanen
2022-07-21 22:01                                       ` David Laight
2022-07-22 11:03                                         ` Peter Zijlstra
2022-07-22 13:27                                           ` David Laight
2022-07-23  9:50                                   ` Thomas Gleixner
2022-07-19  0:01                       ` Linus Torvalds
2022-07-19  0:19                         ` Joao Moreira
2022-07-19 17:21                           ` Sami Tolvanen
2022-07-19 17:58                             ` Joao Moreira
2022-07-19  8:26                         ` David Laight
2022-07-19 16:27                           ` Linus Torvalds
2022-07-19 17:23                             ` Sami Tolvanen
2022-07-19 17:27                               ` Linus Torvalds
2022-07-19 18:06                                 ` Sami Tolvanen
2022-07-19 20:10                                   ` Peter Zijlstra
2022-07-18 22:48           ` Sami Tolvanen
2022-07-18 22:59             ` Thomas Gleixner
2022-07-18 23:10               ` Sami Tolvanen
2022-07-18 23:39               ` Linus Torvalds
2022-07-18 23:51             ` Peter Zijlstra
2022-07-20  9:00               ` Thomas Gleixner
2022-07-20 16:55               ` Sami Tolvanen
2022-07-20 19:42               ` Sami Tolvanen
2022-07-22 20:11         ` Tim Chen
2022-07-22 22:18           ` Linus Torvalds
2022-07-18 19:55 ` Thomas Gleixner
2022-07-19 10:24 ` Virt " Andrew Cooper
2022-07-19 14:13   ` Thomas Gleixner
2022-07-19 16:23     ` Andrew Cooper
2022-07-19 21:17       ` Thomas Gleixner
2022-07-19 14:45   ` Michael Kelley (LINUX)
2022-07-19 20:16     ` Peter Zijlstra
2022-07-20 16:57 ` [patch 00/38] x86/retbleed: " Steven Rostedt
2022-07-20 17:09   ` Linus Torvalds
2022-07-20 17:24     ` Peter Zijlstra
2022-07-20 17:50       ` Steven Rostedt
2022-07-20 18:07         ` Linus Torvalds
2022-07-20 18:31           ` Steven Rostedt
2022-07-20 18:43             ` Linus Torvalds
2022-07-20 19:11               ` Steven Rostedt
2022-07-20 19:36           ` Kees Cook
2022-07-20 19:43             ` Steven Rostedt
2022-07-20 21:36             ` Peter Zijlstra
