linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] Add ability to disable ia32 at boot time
@ 2023-06-07  7:29 Nikolay Borisov
  2023-06-07  7:29 ` [PATCH 1/3] x86: Introduce ia32_disabled boot parameter Nikolay Borisov
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-07  7:29 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

We at SUSE would like to have the ability to disable IA32 compat layer and to
give users the ability to override this decision. The motivation behind this is
the fact that the compat layer is not as thoroughly tested/exercised as the native
64bit one. At the same time there are environments where 32bit is still in use
and we'd like to cater to those as well.

As a first step this patchset introduces the 'ia32_disabled' boot time parameter
which breaks 32bit support. This is achieved mainly by setting the user32 cs in the
GDT as not present (P bit set to 0) and making the int 0x80 call gate also not
present. It also re-uses the existing code which makes sysenter defunct when
IA32_EMULATION is not selected. Finally, it also ensures that 32bit processes
can't be loaded by the compat elf loader.

I'm sending now to solicit opinions whether this is an acceptable solution, in the
future likely the mechanism for enabling this would be changed. I.e instead of a
boot time parameter to disable I think we'd ideally introduce a new Kconfig option
which in the distro case might default to "ia32_disabled" whilst the upstream would
retain the current behavior. But before getting into this discussion I'd like to
get confirmation that what I'm doing w.r.t to x86 architecture is not completely
bogus.

Nikolay Borisov (3):
  x86: Introduce ia32_disabled boot parameter
  x86/entry: Disable IA32 syscalls in the presence of ia32_disabled
  x86: Disable running 32bit processes if ia32_disabled is passed

 arch/x86/entry/common.c      | 12 ++++++++++++
 arch/x86/entry/entry_64.S    |  2 --
 arch/x86/include/asm/desc.h  |  5 +++++
 arch/x86/include/asm/elf.h   |  5 +++--
 arch/x86/include/asm/traps.h |  4 ++++
 arch/x86/kernel/cpu/common.c | 37 +++++++++++++++++++++++++-----------
 6 files changed, 50 insertions(+), 15 deletions(-)

--
2.34.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] x86: Introduce ia32_disabled boot parameter
  2023-06-07  7:29 [RFC PATCH 0/3] Add ability to disable ia32 at boot time Nikolay Borisov
@ 2023-06-07  7:29 ` Nikolay Borisov
  2023-06-07  8:55   ` Thomas Gleixner
  2023-06-07  7:29 ` [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled Nikolay Borisov
  2023-06-07  7:29 ` [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed Nikolay Borisov
  2 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-07  7:29 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

Distributions would like to reduce their attack surface as much as
possible but at the same time they have to cater to a wide variety of
legacy software. One such avenue where distros have to strike a balance
is the support for 32bit syscalls on a 64bit kernel. Ideally we'd have
the ability to disable the the compat support at boot time. This would
allow the decision whether it should be disabled/enabled can be
delegated to system administrators.

This patch simply introduces ia32_disable boot parameter which aims at
disabling 32 bit process support even if CONFIG_IA32_EMULATION has been
selected at build time.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
---
 arch/x86/entry/common.c      | 12 ++++++++++++
 arch/x86/include/asm/traps.h |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b33..817518768ba2 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
 #include <linux/nospec.h>
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
+#include <linux/init.h>
 
 #ifdef CONFIG_XEN_PV
 #include <xen/xen-ops.h>
@@ -96,6 +97,17 @@ static __always_inline int syscall_32_enter(struct pt_regs *regs)
 	return (int)regs->orig_ax;
 }
 
+#ifdef CONFIG_IA32_EMULATION
+bool ia32_disabled = false;
+
+static int ia32_disabled_cmdline(char *arg)
+{
+	ia32_disabled = true;
+	return 1;
+}
+__setup("ia32_disabled", ia32_disabled_cmdline);
+#endif
+
 /*
  * Invoke a 32-bit syscall.  Called with IRQs on in CONTEXT_KERNEL.
  */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..dd93aac3718b 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -20,6 +20,10 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *e
 
 extern bool ibt_selftest(void);
 
+#ifdef CONFIG_IA32_EMULATION
+extern bool ia32_disabled;
+#endif
+
 #ifdef CONFIG_X86_F00F_BUG
 /* For handling the FOOF bug */
 void handle_invalid_op(struct pt_regs *regs);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled
  2023-06-07  7:29 [RFC PATCH 0/3] Add ability to disable ia32 at boot time Nikolay Borisov
  2023-06-07  7:29 ` [PATCH 1/3] x86: Introduce ia32_disabled boot parameter Nikolay Borisov
@ 2023-06-07  7:29 ` Nikolay Borisov
  2023-06-07  9:11   ` Thomas Gleixner
  2023-06-07  7:29 ` [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed Nikolay Borisov
  2 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-07  7:29 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

First stage of disabling ia32 compat layer is to disable 32bit syscall
entry points. Legacy int 0x80 vector is disabled by setting its gate
descriptor to "not present" and the sysenter vector is disabled by
re-using the existing code in case IA32_EMULATION is disabled.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
---
 arch/x86/entry/entry_64.S    |  2 --
 arch/x86/include/asm/desc.h  |  5 +++++
 arch/x86/kernel/cpu/common.c | 29 ++++++++++++++++++-----------
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f31e286c2977..5e0e8a5e05ca 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1514,7 +1514,6 @@ SYM_CODE_START(asm_exc_nmi)
 	iretq
 SYM_CODE_END(asm_exc_nmi)
 
-#ifndef CONFIG_IA32_EMULATION
 /*
  * This handles SYSCALL from 32-bit code.  There is no way to program
  * MSRs to fully disable 32-bit SYSCALL.
@@ -1525,7 +1524,6 @@ SYM_CODE_START(ignore_sysret)
 	mov	$-ENOSYS, %eax
 	sysretl
 SYM_CODE_END(ignore_sysret)
-#endif
 
 .pushsection .text, "ax"
 	__FUNC_ALIGN
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index ab97b22ac04a..618b428586d1 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -8,6 +8,7 @@
 #include <asm/fixmap.h>
 #include <asm/irq_vectors.h>
 #include <asm/cpu_entry_area.h>
+#include <asm/traps.h>
 
 #include <linux/debug_locks.h>
 #include <linux/smp.h>
@@ -429,6 +430,10 @@ static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
 	gate->offset_high	= (u32) (addr >> 32);
 	gate->reserved		= 0;
 #endif
+#ifdef CONFIG_IA32_EMULATION
+	if (ia32_disabled && d->vector == IA32_SYSCALL_VECTOR)
+		gate->bits.p = 0;
+#endif
 }
 
 extern unsigned long system_vectors[];
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 80710a68ef7d..71f8b55f70c9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2054,17 +2054,24 @@ void syscall_init(void)
 	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
 
 #ifdef CONFIG_IA32_EMULATION
-	wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
-	/*
-	 * This only works on Intel CPUs.
-	 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
-	 * This does not cause SYSENTER to jump to the wrong location, because
-	 * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
-	 */
-	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
-	wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
-		    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
-	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
+	if (ia32_disabled) {
+		wrmsrl_cstar((unsigned long)ignore_sysret);
+		wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
+		wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+		wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+	} else {
+		wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
+		/*
+		 * This only works on Intel CPUs.
+		 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
+		 * This does not cause SYSENTER to jump to the wrong location, because
+		 * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
+		 */
+		wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
+		wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
+			    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
+		wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
+	}
 #else
 	wrmsrl_cstar((unsigned long)ignore_sysret);
 	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07  7:29 [RFC PATCH 0/3] Add ability to disable ia32 at boot time Nikolay Borisov
  2023-06-07  7:29 ` [PATCH 1/3] x86: Introduce ia32_disabled boot parameter Nikolay Borisov
  2023-06-07  7:29 ` [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled Nikolay Borisov
@ 2023-06-07  7:29 ` Nikolay Borisov
  2023-06-07 12:01   ` Thomas Gleixner
  2023-06-08  4:37   ` Brian Gerst
  2 siblings, 2 replies; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-07  7:29 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

In addition to disabling 32bit syscall interface let's also disable the
ability to run 32bit processes altogether. This is achieved by setting
the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would
cause 32 bit processes to trap with a #NP exception. Furthermore,
forbid loading compat processes as well.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
---
 arch/x86/include/asm/elf.h   | 5 +++--
 arch/x86/kernel/cpu/common.c | 8 ++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 18fd06f7936a..406245bc0fb0 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -148,9 +148,10 @@ do {						\
 #define elf_check_arch(x)			\
 	((x)->e_machine == EM_X86_64)
 
+extern bool ia32_disabled;
 #define compat_elf_check_arch(x)					\
-	(elf_check_arch_ia32(x) ||					\
-	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
+	(!ia32_disabled && (elf_check_arch_ia32(x) ||			\
+	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)))
 
 static inline void elf_common_init(struct thread_struct *t,
 				   struct pt_regs *regs, const u16 ds)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 71f8b55f70c9..ddc301c09419 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info)
 }
 #endif
 
+static void remove_user32cs_from_gdt(void * __unused)
+{
+	get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0;
+}
+
 /*
  * Invoked from core CPU hotplug code after hotplug operations
  */
@@ -2368,4 +2373,7 @@ void arch_smt_update(void)
 	cpu_bugs_smt_update();
 	/* Check whether IPI broadcasting can be enabled */
 	apic_smt_update();
+	if (ia32_disabled)
+		on_each_cpu(remove_user32cs_from_gdt, NULL, 1);
+
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] x86: Introduce ia32_disabled boot parameter
  2023-06-07  7:29 ` [PATCH 1/3] x86: Introduce ia32_disabled boot parameter Nikolay Borisov
@ 2023-06-07  8:55   ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07  8:55 UTC (permalink / raw)
  To: Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote:
> Distributions would like to reduce their attack surface as much as
> possible but at the same time they have to cater to a wide variety of
> legacy software. One such avenue where distros have to strike a balance
> is the support for 32bit syscalls on a 64bit kernel. Ideally we'd have
> the ability to disable the the compat support at boot time. This would
> allow the decision whether it should be disabled/enabled can be
> delegated to system administrators.
>
> This patch simply introduces ia32_disable boot parameter which aims at

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

> disabling 32 bit process support even if CONFIG_IA32_EMULATION has been
> selected at build time.
>
> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
>  arch/x86/entry/common.c      | 12 ++++++++++++
>  arch/x86/include/asm/traps.h |  4 ++++

New command line parameters require documentation.

https://www.kernel.org/doc/html/latest/process/

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled
  2023-06-07  7:29 ` [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled Nikolay Borisov
@ 2023-06-07  9:11   ` Thomas Gleixner
  2023-06-08  3:18     ` Brian Gerst
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07  9:11 UTC (permalink / raw)
  To: Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote:
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index ab97b22ac04a..618b428586d1 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -8,6 +8,7 @@
>  #include <asm/fixmap.h>
>  #include <asm/irq_vectors.h>
>  #include <asm/cpu_entry_area.h>
> +#include <asm/traps.h>
>  
>  #include <linux/debug_locks.h>
>  #include <linux/smp.h>
> @@ -429,6 +430,10 @@ static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
>  	gate->offset_high	= (u32) (addr >> 32);
>  	gate->reserved		= 0;
>  #endif
> +#ifdef CONFIG_IA32_EMULATION
> +	if (ia32_disabled && d->vector == IA32_SYSCALL_VECTOR)
> +		gate->bits.p = 0;
> +#endif

Why installing the IDT vector in the first place? This is completely
inconsistent with the CONFIG_IA32_EMULATION=n behaviour.

Just slapping this conditional into some random place is really not
cutting it.

The obvious solution is to remove IA32_SYSCALL_VECTOR from def_idts[]
and handle it separately.

>  extern unsigned long system_vectors[];
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 80710a68ef7d..71f8b55f70c9 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2054,17 +2054,24 @@ void syscall_init(void)
>  	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
>  
>  #ifdef CONFIG_IA32_EMULATION
> -	wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
> -	/*
> -	 * This only works on Intel CPUs.
> -	 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
> -	 * This does not cause SYSENTER to jump to the wrong location, because
> -	 * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
> -	 */
> -	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
> -	wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
> -		    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
> -	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
> +	if (ia32_disabled) {
> +		wrmsrl_cstar((unsigned long)ignore_sysret);
> +		wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
> +		wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
> +		wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
> +	} else {
> +		wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
> +		/*
> +		 * This only works on Intel CPUs.
> +		 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
> +		 * This does not cause SYSENTER to jump to the wrong location, because
> +		 * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
> +		 */
> +		wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
> +		wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
> +			    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
> +		wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
> +	}
>  #else
>  	wrmsrl_cstar((unsigned long)ignore_sysret);
>  	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);

So this ends up with two copies of the same code for invalidating
compat. Why?

   if (IS_ENABLED(CONFIG_IA32_EMULATION) && !ia32_disabled)) {
	wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
        ...
   } else {
	wrmsrl_cstar((unsigned long)ignore_sysret);
        ...
   }

All it requires is

#ifdef CONFIG_IA32_EMULATION
void entry_SYSCALL_compat(void);
#else
#define entry_SYSCALL_compat NULL
#endif

in the header which declares entry_SYSCALL_compat.

No?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07  7:29 ` [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed Nikolay Borisov
@ 2023-06-07 12:01   ` Thomas Gleixner
  2023-06-07 12:19     ` Nikolay Borisov
  2023-06-07 12:55     ` Thomas Gleixner
  2023-06-08  4:37   ` Brian Gerst
  1 sibling, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07 12:01 UTC (permalink / raw)
  To: Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote:
> In addition to disabling 32bit syscall interface let's also disable the
> ability to run 32bit processes altogether. This is achieved by setting
> the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would
> cause 32 bit processes to trap with a #NP exception. Furthermore,
> forbid loading compat processes as well.

This is obviously the wrong order of things. Prevent loading of compat
processes is the first step, no?

>  
> +extern bool ia32_disabled;
>  #define compat_elf_check_arch(x)					\

So in patch 1 you add the declaration with #ifdef guards and now you add
another one without. Fortunately this is the last patch otherwise we'd
might end up with another incarnation in the next header file.

> -	(elf_check_arch_ia32(x) ||					\
> -	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
> +	(!ia32_disabled && (elf_check_arch_ia32(x) ||			\
> +	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)))

If I'm reading this correctly then ia32_disabled also prevents binaries
with X32 ABI to be loaded.

That might be intentional but I'm failing to find any explanation for
this in the changelog.

X32_ABI != IA32_EMULATION

>  static inline void elf_common_init(struct thread_struct *t,
>  				   struct pt_regs *regs, const u16 ds)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 71f8b55f70c9..ddc301c09419 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info)
>  }
>  #endif
>  
> +static void remove_user32cs_from_gdt(void * __unused)
> +{
> +	get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0;
> +}
> +
>  /*
>   * Invoked from core CPU hotplug code after hotplug operations
>   */
> @@ -2368,4 +2373,7 @@ void arch_smt_update(void)
>  	cpu_bugs_smt_update();
>  	/* Check whether IPI broadcasting can be enabled */
>  	apic_smt_update();
> +	if (ia32_disabled)
> +		on_each_cpu(remove_user32cs_from_gdt, NULL, 1);
> +
>  }

This issues a SMP function call on all online CPUs to set these entries
to 0 on _every_ CPU hotplug operation.

I'm sure there is a reason why these bits need to be cleared over and
over. It's just not obvious to me why clearing them _ONCE_ per boot is
not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS -
1) times, but for the last CPU it's enough to do it once.

That aside, what's the justification for doing this in the first place
and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n?

The changelog is full of void in that aspect.

Thanks,

        tglx
        

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 12:01   ` Thomas Gleixner
@ 2023-06-07 12:19     ` Nikolay Borisov
  2023-06-07 12:53       ` Thomas Gleixner
  2023-06-07 12:55     ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-07 12:19 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: linux-kernel, mhocko, jslaby



On 7.06.23 г. 15:01 ч., Thomas Gleixner wrote:
> On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote:
>> In addition to disabling 32bit syscall interface let's also disable the
>> ability to run 32bit processes altogether. This is achieved by setting
>> the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would
>> cause 32 bit processes to trap with a #NP exception. Furthermore,
>> forbid loading compat processes as well.
> 
> This is obviously the wrong order of things. Prevent loading of compat
> processes is the first step, no?

You mean to change the sequence in which those things are mentioned in 
the log?

> 
>>   
>> +extern bool ia32_disabled;
>>   #define compat_elf_check_arch(x)					\
> 
> So in patch 1 you add the declaration with #ifdef guards and now you add
> another one without. Fortunately this is the last patch otherwise we'd
> might end up with another incarnation in the next header file.

My bad, will fix it.

> 
>> -	(elf_check_arch_ia32(x) ||					\
>> -	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
>> +	(!ia32_disabled && (elf_check_arch_ia32(x) ||			\
>> +	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)))
> 
> If I'm reading this correctly then ia32_disabled also prevents binaries
> with X32 ABI to be loaded.
> 
> That might be intentional but I'm failing to find any explanation for
> this in the changelog.
> 
> X32_ABI != IA32_EMULATION

Right, however given the other changes (i.e disabling sysenter/int 0x80) 
can we really have a working X32 abi when ia32_disabled is true? Now I'm 
thinking can we really have IA32_EMULATION && X32_ABI && ia32_disabled, 
I guess the answer is no?

> 
>>   static inline void elf_common_init(struct thread_struct *t,
>>   				   struct pt_regs *regs, const u16 ds)
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index 71f8b55f70c9..ddc301c09419 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info)
>>   }
>>   #endif
>>   
>> +static void remove_user32cs_from_gdt(void * __unused)
>> +{
>> +	get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0;
>> +}
>> +
>>   /*
>>    * Invoked from core CPU hotplug code after hotplug operations
>>    */
>> @@ -2368,4 +2373,7 @@ void arch_smt_update(void)
>>   	cpu_bugs_smt_update();
>>   	/* Check whether IPI broadcasting can be enabled */
>>   	apic_smt_update();
>> +	if (ia32_disabled)
>> +		on_each_cpu(remove_user32cs_from_gdt, NULL, 1);
>> +
>>   }
> 
> This issues a SMP function call on all online CPUs to set these entries
> to 0 on _every_ CPU hotplug operation.
> 
> I'm sure there is a reason why these bits need to be cleared over and
> over. It's just not obvious to me why clearing them _ONCE_ per boot is
> not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS -
> 1) times, but for the last CPU it's enough to do it once.

Actually clearing them once per-cpu is perfectly fine. Looking around 
the code i saw arch_smt_update() to be the only place where a 
on_each_cpu() call is being made hence I put the code there. Another 
aspect I was thinking of is what if a cpu gets onlined at a later stage 
and a 32bit process is scheduled on that cpu, if the gdt entry wasn't 
cleared on that CPU then it would be possible to run 32bit processes on 
it. I guess a better alternative is to use arch_initcall() ?

> 
> That aside, what's the justification for doing this in the first place
> and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n?

I'll put it under an ifdef CONFIG_IA32_EMULATION, unfortunately the 
traps.h header can't be included in elf.h without causing build breakage.

> 
> The changelog is full of void in that aspect.
> 
> Thanks,
> 
>          tglx
>          

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 12:19     ` Nikolay Borisov
@ 2023-06-07 12:53       ` Thomas Gleixner
  2023-06-07 13:38         ` Nikolay Borisov
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07 12:53 UTC (permalink / raw)
  To: Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On Wed, Jun 07 2023 at 15:19, Nikolay Borisov wrote:
> On 7.06.23 г. 15:01 ч., Thomas Gleixner wrote:
>> 
>>> -	(elf_check_arch_ia32(x) ||					\
>>> -	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
>>> +	(!ia32_disabled && (elf_check_arch_ia32(x) ||			\
>>> +	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)))
>> 
>> If I'm reading this correctly then ia32_disabled also prevents binaries
>> with X32 ABI to be loaded.
>> 
>> That might be intentional but I'm failing to find any explanation for
>> this in the changelog.
>> 
>> X32_ABI != IA32_EMULATION
>
> Right, however given the other changes (i.e disabling sysenter/int 0x80) 
> can we really have a working X32 abi when ia32_disabled is true? Now I'm 
> thinking can we really have IA32_EMULATION && X32_ABI && ia32_disabled, 
> I guess the answer is no?

X32_ABI is completely _independent_ from IA32_EMULATION.

It just shares some of the required compat code, but it does not use
sysenter/int 0x80 at all. It uses the regular 64bit system call.

You can build a working kernel with X32_ABI=y and IA32_EMULATION=n.

So why would boottime disabling of IA32_EMULATION affect X32_ABI in any
way?

>> 
>> This issues a SMP function call on all online CPUs to set these entries
>> to 0 on _every_ CPU hotplug operation.
>> 
>> I'm sure there is a reason why these bits need to be cleared over and
>> over. It's just not obvious to me why clearing them _ONCE_ per boot is
>> not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS -
>> 1) times, but for the last CPU it's enough to do it once.
>
> Actually clearing them once per-cpu is perfectly fine. Looking around 
> the code i saw arch_smt_update() to be the only place where a 
> on_each_cpu() call is being made hence I put the code there. Another 
> aspect I was thinking of is what if a cpu gets onlined at a later stage 
> and a 32bit process is scheduled on that cpu, if the gdt entry wasn't 
> cleared on that CPU then it would be possible to run 32bit processes on 
> it. I guess a better alternative is to use arch_initcall() ?

Why do you need an on_each_cpu() function call at all? Why would you
need an extra arch_initcall()?

The obvious place to clear this is when a CPU is initialized, no?

>> That aside, what's the justification for doing this in the first place
>> and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n?
>
> I'll put it under an ifdef CONFIG_IA32_EMULATION, unfortunately the 
> traps.h header can't be included in elf.h without causing build breakage.

You are not answering my question at all and neither the elf nor the
traps header have anything to do with it. I'm happy to rephrase it:

  1) What is the justification for setting the 'present' bit of
     GDT_ENTRY_DEFAULT_USER32_CS to 0?

  2) Why is #1 inconsistent with CONFIG_IA32_EMULATION=n?

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 12:01   ` Thomas Gleixner
  2023-06-07 12:19     ` Nikolay Borisov
@ 2023-06-07 12:55     ` Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07 12:55 UTC (permalink / raw)
  To: Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby, Nikolay Borisov

On Wed, Jun 07 2023 at 14:01, Thomas Gleixner wrote:
> On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote:
>> @@ -2368,4 +2373,7 @@ void arch_smt_update(void)
>>  	cpu_bugs_smt_update();
>>  	/* Check whether IPI broadcasting can be enabled */
>>  	apic_smt_update();
>> +	if (ia32_disabled)
>> +		on_each_cpu(remove_user32cs_from_gdt, NULL, 1);

My brain based compiler tells me, that this breaks the 32bit build and
the 64bit build with CONFIG_IA32_EMULATION=n. I'm pretty confident that
a real compiler will agree.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 12:53       ` Thomas Gleixner
@ 2023-06-07 13:38         ` Nikolay Borisov
  2023-06-07 14:49           ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-07 13:38 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: linux-kernel, mhocko, jslaby



On 7.06.23 г. 15:53 ч., Thomas Gleixner wrote:
> On Wed, Jun 07 2023 at 15:19, Nikolay Borisov wrote:
>> On 7.06.23 г. 15:01 ч., Thomas Gleixner wrote:
>>>
>>>> -	(elf_check_arch_ia32(x) ||					\
>>>> -	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
>>>> +	(!ia32_disabled && (elf_check_arch_ia32(x) ||			\
>>>> +	 (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)))
>>>
>>> If I'm reading this correctly then ia32_disabled also prevents binaries
>>> with X32 ABI to be loaded.
>>>
>>> That might be intentional but I'm failing to find any explanation for
>>> this in the changelog.
>>>
>>> X32_ABI != IA32_EMULATION
>>
>> Right, however given the other changes (i.e disabling sysenter/int 0x80)
>> can we really have a working X32 abi when ia32_disabled is true? Now I'm
>> thinking can we really have IA32_EMULATION && X32_ABI && ia32_disabled,
>> I guess the answer is no?
> 
> X32_ABI is completely _independent_ from IA32_EMULATION.
> 
> It just shares some of the required compat code, but it does not use
> sysenter/int 0x80 at all. It uses the regular 64bit system call.
> 
> You can build a working kernel with X32_ABI=y and IA32_EMULATION=n.
> 
> So why would boottime disabling of IA32_EMULATION affect X32_ABI in any
> way?

In this case it shouldn't affect it and the check should be

((elf_check_arch_ia32(x) && !ia32_disabled) || 
(IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)).

> 
>>>
>>> This issues a SMP function call on all online CPUs to set these entries
>>> to 0 on _every_ CPU hotplug operation.
>>>
>>> I'm sure there is a reason why these bits need to be cleared over and
>>> over. It's just not obvious to me why clearing them _ONCE_ per boot is
>>> not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS -
>>> 1) times, but for the last CPU it's enough to do it once.
>>
>> Actually clearing them once per-cpu is perfectly fine. Looking around
>> the code i saw arch_smt_update() to be the only place where a
>> on_each_cpu() call is being made hence I put the code there. Another
>> aspect I was thinking of is what if a cpu gets onlined at a later stage
>> and a 32bit process is scheduled on that cpu, if the gdt entry wasn't
>> cleared on that CPU then it would be possible to run 32bit processes on
>> it. I guess a better alternative is to use arch_initcall() ?
> 
> Why do you need an on_each_cpu() function call at all? Why would you
> need an extra arch_initcall()?
> 
> The obvious place to clear this is when a CPU is initialized, no?
> 
>>> That aside, what's the justification for doing this in the first place
>>> and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n?
>>
>> I'll put it under an ifdef CONFIG_IA32_EMULATION, unfortunately the
>> traps.h header can't be included in elf.h without causing build breakage.
> 
> You are not answering my question at all and neither the elf nor the
> traps header have anything to do with it. I'm happy to rephrase it:
> 
>    1) What is the justification for setting the 'present' bit of
>       GDT_ENTRY_DEFAULT_USER32_CS to 0?

This was something which was suggested by Andrew Cooper on irc, to my 
understanding the idea is that by not having a 32bit capable descriptor 
it's impossible to run a 32bit code. I guess the scenario where it might 
be relevant if someone starts a 64bit process and with inline assembly 
tries to run 32bit code somehow, though it might be a far fetched 
example and also the fact that the compat_elf_check_arch() forbids 
loading 32bit processes might be sufficient.

> 
>    2) Why is #1 inconsistent with CONFIG_IA32_EMULATION=n?

Because I forgot doing it.

> 
> Thanks,
> 
>          tglx
> 
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 13:38         ` Nikolay Borisov
@ 2023-06-07 14:49           ` Thomas Gleixner
  2023-06-07 17:25             ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07 14:49 UTC (permalink / raw)
  To: Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby, Andrew Cooper

On Wed, Jun 07 2023 at 16:38, Nikolay Borisov wrote:
> On 7.06.23 г. 15:53 ч., Thomas Gleixner wrote:
>> So why would boottime disabling of IA32_EMULATION affect X32_ABI in any
>> way?
>
> In this case it shouldn't affect it and the check should be
>
> ((elf_check_arch_ia32(x) && !ia32_disabled) || 
> (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)).

Correct.

>>    1) What is the justification for setting the 'present' bit of
>>       GDT_ENTRY_DEFAULT_USER32_CS to 0?
>
> This was something which was suggested by Andrew Cooper on irc, to my 
> understanding the idea is that by not having a 32bit capable descriptor 
> it's impossible to run a 32bit code.

Right, but that's a completely separate change. If it is agreed on then
it needs to be consistent and not depend on this command line parameter.

> I guess the scenario where it might be relevant if someone starts a
> 64bit process and with inline assembly tries to run 32bit code
> somehow, though it might be a far fetched example and also the fact
> that the compat_elf_check_arch() forbids loading 32bit processes might
> be sufficient.

Guesswork is not helpful. Facts matter.

Fact is that today a 64bit application can do a far jump of far call
into a 32bit code segment based on the default descriptors or by setting
them up via LDT. That 32bit code obviously can't do compat syscalls if
IA32_EMULATION is disabled, but other than that it just works.

Whether that makes sense or not is a completely different question.

Ergo fact is that clearing the present bit is a user space visible
change, which can't be done nilly willy and burried into a patch
which is about making CONFIG_IA32_EMULATION a boot time switch.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 14:49           ` Thomas Gleixner
@ 2023-06-07 17:25             ` Andrew Cooper
  2023-06-07 21:52               ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-06-07 17:25 UTC (permalink / raw)
  To: Thomas Gleixner, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On 07/06/2023 3:49 pm, Thomas Gleixner wrote:
> On Wed, Jun 07 2023 at 16:38, Nikolay Borisov wrote:
>> On 7.06.23 г. 15:53 ч., Thomas Gleixner wrote:
>>>    1) What is the justification for setting the 'present' bit of
>>>       GDT_ENTRY_DEFAULT_USER32_CS to 0?
>> This was something which was suggested by Andrew Cooper on irc, to my 
>> understanding the idea is that by not having a 32bit capable descriptor 
>> it's impossible to run a 32bit code.
> Right, but that's a completely separate change. If it is agreed on then
> it needs to be consistent and not depend on this command line parameter.

And the recommendation was only for debugging purposes.

Segments are not like pagetables.  The present bit is the last of the
checks, not the first, so you'll e.g. get #GP (bad segment type) ahead
of getting #NP[sel].

If you're looking to block 32bit, you should zero the entire GDT entry. 
That way you get #GP to convert into a signal for userspace, rather than
starting down a segment-"paged out" looking route.

>> I guess the scenario where it might be relevant if someone starts a
>> 64bit process and with inline assembly tries to run 32bit code
>> somehow, though it might be a far fetched example and also the fact
>> that the compat_elf_check_arch() forbids loading 32bit processes might
>> be sufficient.
> Guesswork is not helpful. Facts matter.
>
> Fact is that today a 64bit application can do a far jump of far call
> into a 32bit code segment based on the default descriptors or by setting
> them up via LDT. That 32bit code obviously can't do compat syscalls if
> IA32_EMULATION is disabled, but other than that it just works.
>
> Whether that makes sense or not is a completely different question.
>
> Ergo fact is that clearing the present bit is a user space visible
> change, which can't be done nilly willy and burried into a patch
> which is about making CONFIG_IA32_EMULATION a boot time switch.

Removing GDT_ENTRY_DEFAULT_USER32_CS is necessary but not sufficient to
block userspace getting into 32bit mode.

You also have to block Linux from taking any SYSRETL or SYSEXITL path
out of the kernel, as these will load fixed 32bit mode attributes into
%cs without reference to the GDT.

And you need to prevent any userspace use of the LDT, which might be as
simple as just blocking SYS_modify_ldt, but it's been a while since I
last looked.

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 17:25             ` Andrew Cooper
@ 2023-06-07 21:52               ` Thomas Gleixner
  2023-06-07 23:43                 ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-07 21:52 UTC (permalink / raw)
  To: Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On Wed, Jun 07 2023 at 18:25, Andrew Cooper wrote:
> On 07/06/2023 3:49 pm, Thomas Gleixner wrote:
>> Ergo fact is that clearing the present bit is a user space visible
>> change, which can't be done nilly willy and burried into a patch
>> which is about making CONFIG_IA32_EMULATION a boot time switch.
>
> Removing GDT_ENTRY_DEFAULT_USER32_CS is necessary but not sufficient to
> block userspace getting into 32bit mode.

Correct.

> You also have to block Linux from taking any SYSRETL or SYSEXITL path
> out of the kernel, as these will load fixed 32bit mode attributes into
> %cs without reference to the GDT.

That's non-trivial as there is no way to disable 32bit SYSCALL on AMD
(Intel does not support 32bit syscall and you get #UD if CS.L != 1). So
to be safe you'd need to make ignore_sysret() kill the process w/o
returning to user space.

Though arguably if GDT does not have USER32_CS and LDT is disabled (or
the creation of code segments is blocked) then invoking SYSCALL from
compat mode requires quite some advanced magic (assumed there are no CPU
and kernel bugs), no?

> And you need to prevent any userspace use of the LDT, which might be as
> simple as just blocking SYS_modify_ldt, but it's been a while since I
> last looked.

CONFIG_MODIFY_LDT_SYSCALL=n is the only in kernel option right now, but
that could be made boottime disabled trivially. Extending LDT to reject
the creation of code segments is not rocket science either.

Though the real question is:

       What is the benefit of such a change?

So far I haven't seen any argument for that. Maybe there is none :)

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 21:52               ` Thomas Gleixner
@ 2023-06-07 23:43                 ` Andrew Cooper
  2023-06-08  0:25                   ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2023-06-07 23:43 UTC (permalink / raw)
  To: Thomas Gleixner, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On 07/06/2023 10:52 pm, Thomas Gleixner wrote:
> On Wed, Jun 07 2023 at 18:25, Andrew Cooper wrote:
>> You also have to block Linux from taking any SYSRETL or SYSEXITL path
>> out of the kernel, as these will load fixed 32bit mode attributes into
>> %cs without reference to the GDT.
> That's non-trivial as there is no way to disable 32bit SYSCALL on AMD
> (Intel does not support 32bit syscall and you get #UD if CS.L != 1). So
> to be safe you'd need to make ignore_sysret() kill the process w/o
> returning to user space.

ignore_sysret() desperately needs renaming to entry_SYSCALL32_ignore()
or similar.

And yes, wiring this into SIGSEGV/etc would be a sensible move.

The same applies to 32bit SYSENTER if configured. (Linux doesn't, but
other OSes do.)

> Though arguably if GDT does not have USER32_CS and LDT is disabled (or
> the creation of code segments is blocked) then invoking SYSCALL from
> compat mode requires quite some advanced magic (assumed there are no CPU
> and kernel bugs), no?

Plenty of arcane magic exists.  Rowhammer the GDT, or exploit a VMM, SMM
or ACM bug (all 3 of which can load segments behind the kernel's back),
or Red-Unlock mode which can write the segments registers directly, or
you could play with the AVX512 brownout erratum some more - the
descriptor L bit is only a few bits along from DPL...

But if we're assuming no bugs, then no - I'm not aware of any way of
doing this.  There are only 4 instructions which can reduce privilege:
LRET, IRET, SYSEXIT and SYSRET.

>> And you need to prevent any userspace use of the LDT, which might be as
>> simple as just blocking SYS_modify_ldt, but it's been a while since I
>> last looked.
> CONFIG_MODIFY_LDT_SYSCALL=n is the only in kernel option right now, but
> that could be made boottime disabled trivially. Extending LDT to reject
> the creation of code segments is not rocket science either.
>
> Though the real question is:
>
>        What is the benefit of such a change?
>
> So far I haven't seen any argument for that. Maybe there is none :)

Hardening.  The general purpose distros definitely won't care, but
special purpose ones will.

An x86 bytestream is decoded differently in different modes, and malware
can hide in the differences.  Standard tooling can't cope with
multi-mode binaries, and if it happens by accident you tend get very
obscure crash to diagnose.

Furthermore, despite CET-SS explicitly trying to account for and protect
against accidental mismatches, there are errata in some parts which let
userspace forge legal return addresses on the shadow stack by dropping
into 32bit mode because, there's a #GP check missing in a microflow.

For usecases where there ought not to be any 32bit code at all (and
there absolutely are), it would be lovely if this could be enforced,
rather than relying on blind hope that it doesn't happen.

Thanks,

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07 23:43                 ` Andrew Cooper
@ 2023-06-08  0:25                   ` Thomas Gleixner
  2023-06-08  6:16                     ` Jiri Slaby
                                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-08  0:25 UTC (permalink / raw)
  To: Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On Thu, Jun 08 2023 at 00:43, Andrew Cooper wrote:
> On 07/06/2023 10:52 pm, Thomas Gleixner wrote:
>> On Wed, Jun 07 2023 at 18:25, Andrew Cooper wrote:
>>> You also have to block Linux from taking any SYSRETL or SYSEXITL path
>>> out of the kernel, as these will load fixed 32bit mode attributes into
>>> %cs without reference to the GDT.
>> That's non-trivial as there is no way to disable 32bit SYSCALL on AMD
>> (Intel does not support 32bit syscall and you get #UD if CS.L != 1). So
>> to be safe you'd need to make ignore_sysret() kill the process w/o
>> returning to user space.
>
> ignore_sysret() desperately needs renaming to entry_SYSCALL32_ignore()
> or similar.

No argument about that.

> And yes, wiring this into SIGSEGV/etc would be a sensible move.

The only option is to wire it into die_hard_crash_and_burn(). There is
no sane way to deliver a signal to the process which managed to run into
this. Appropriate info to parent or ptracer will still be delivered.

> The same applies to 32bit SYSENTER if configured. (Linux doesn't, but
> other OSes do.)

Why the heck are they doing that?

I really wish that we could disable syscall32 reliably on AMD and make
it raise #UD as it does on Intal.

>> Though the real question is:
>>
>>        What is the benefit of such a change?
>>
>> So far I haven't seen any argument for that. Maybe there is none :)
>
> Hardening.  The general purpose distros definitely won't care, but
> special purpose ones will.
>
> An x86 bytestream is decoded differently in different modes, and malware
> can hide in the differences.  Standard tooling can't cope with
> multi-mode binaries, and if it happens by accident you tend get very
> obscure crash to diagnose.

IOW, you are talking about defense in depth, right? I can buy that one.

> Furthermore, despite CET-SS explicitly trying to account for and protect
> against accidental mismatches, there are errata in some parts which let
> userspace forge legal return addresses on the shadow stack by dropping
> into 32bit mode because, there's a #GP check missing in a microflow.

Didn't we assume that there are no CPU bugs? :)

> For usecases where there ought not to be any 32bit code at all (and
> there absolutely are), it would be lovely if this could be enforced,
> rather than relying on blind hope that it doesn't happen.

I think it's rather clear what needs to be done here to achieve that,
but that's completely orthogonal to the intent of the patch series in
question which aims to make CONFIG_IA32_EMULATION a boot time decision.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled
  2023-06-07  9:11   ` Thomas Gleixner
@ 2023-06-08  3:18     ` Brian Gerst
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Gerst @ 2023-06-08  3:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Nikolay Borisov, x86, linux-kernel, mhocko, jslaby

On Wed, Jun 7, 2023 at 5:23 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote:
> > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> > index ab97b22ac04a..618b428586d1 100644
> > --- a/arch/x86/include/asm/desc.h
> > +++ b/arch/x86/include/asm/desc.h
> > @@ -8,6 +8,7 @@
> >  #include <asm/fixmap.h>
> >  #include <asm/irq_vectors.h>
> >  #include <asm/cpu_entry_area.h>
> > +#include <asm/traps.h>
> >
> >  #include <linux/debug_locks.h>
> >  #include <linux/smp.h>
> > @@ -429,6 +430,10 @@ static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
> >       gate->offset_high       = (u32) (addr >> 32);
> >       gate->reserved          = 0;
> >  #endif
> > +#ifdef CONFIG_IA32_EMULATION
> > +     if (ia32_disabled && d->vector == IA32_SYSCALL_VECTOR)
> > +             gate->bits.p = 0;
> > +#endif
>
> Why installing the IDT vector in the first place? This is completely
> inconsistent with the CONFIG_IA32_EMULATION=n behaviour.
>
> Just slapping this conditional into some random place is really not
> cutting it.
>
> The obvious solution is to remove IA32_SYSCALL_VECTOR from def_idts[]
> and handle it separately.
>
> >  extern unsigned long system_vectors[];
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 80710a68ef7d..71f8b55f70c9 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -2054,17 +2054,24 @@ void syscall_init(void)
> >       wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
> >
> >  #ifdef CONFIG_IA32_EMULATION
> > -     wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
> > -     /*
> > -      * This only works on Intel CPUs.
> > -      * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
> > -      * This does not cause SYSENTER to jump to the wrong location, because
> > -      * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
> > -      */
> > -     wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
> > -     wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
> > -                 (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
> > -     wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
> > +     if (ia32_disabled) {
> > +             wrmsrl_cstar((unsigned long)ignore_sysret);
> > +             wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
> > +             wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
> > +             wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
> > +     } else {
> > +             wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
> > +             /*
> > +              * This only works on Intel CPUs.
> > +              * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
> > +              * This does not cause SYSENTER to jump to the wrong location, because
> > +              * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
> > +              */
> > +             wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
> > +             wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
> > +                         (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
> > +             wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
> > +     }
> >  #else
> >       wrmsrl_cstar((unsigned long)ignore_sysret);
> >       wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
>
> So this ends up with two copies of the same code for invalidating
> compat. Why?
>
>    if (IS_ENABLED(CONFIG_IA32_EMULATION) && !ia32_disabled)) {
>         wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
>         ...
>    } else {
>         wrmsrl_cstar((unsigned long)ignore_sysret);
>         ...
>    }
>
> All it requires is
>
> #ifdef CONFIG_IA32_EMULATION
> void entry_SYSCALL_compat(void);
> #else
> #define entry_SYSCALL_compat NULL
> #endif
>
> in the header which declares entry_SYSCALL_compat.
>
> No?

SYSCALL from 32-bit mode can't be disabled like that.  That's why
ignore_sysret() exists for the !CONFIG_IA32_EMULATION case.

Brian Gerst

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-07  7:29 ` [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed Nikolay Borisov
  2023-06-07 12:01   ` Thomas Gleixner
@ 2023-06-08  4:37   ` Brian Gerst
  1 sibling, 0 replies; 26+ messages in thread
From: Brian Gerst @ 2023-06-08  4:37 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: x86, linux-kernel, mhocko, jslaby

On Wed, Jun 7, 2023 at 3:41 AM Nikolay Borisov <nik.borisov@suse.com> wrote:
>
> In addition to disabling 32bit syscall interface let's also disable the
> ability to run 32bit processes altogether. This is achieved by setting
> the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would
> cause 32 bit processes to trap with a #NP exception. Furthermore,
> forbid loading compat processes as well.
>
> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
>  arch/x86/include/asm/elf.h   | 5 +++--
>  arch/x86/kernel/cpu/common.c | 8 ++++++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 18fd06f7936a..406245bc0fb0 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -148,9 +148,10 @@ do {                                               \
>  #define elf_check_arch(x)                      \
>         ((x)->e_machine == EM_X86_64)
>
> +extern bool ia32_disabled;
>  #define compat_elf_check_arch(x)                                       \
> -       (elf_check_arch_ia32(x) ||                                      \
> -        (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))
> +       (!ia32_disabled && (elf_check_arch_ia32(x) ||                   \
> +        (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)))
>
>  static inline void elf_common_init(struct thread_struct *t,
>                                    struct pt_regs *regs, const u16 ds)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 71f8b55f70c9..ddc301c09419 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info)
>  }
>  #endif
>
> +static void remove_user32cs_from_gdt(void * __unused)
> +{
> +       get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0;
> +}
> +
>  /*
>   * Invoked from core CPU hotplug code after hotplug operations
>   */
> @@ -2368,4 +2373,7 @@ void arch_smt_update(void)
>         cpu_bugs_smt_update();
>         /* Check whether IPI broadcasting can be enabled */
>         apic_smt_update();
> +       if (ia32_disabled)
> +               on_each_cpu(remove_user32cs_from_gdt, NULL, 1);
> +
>  }

Disabling USER32_CS isn't really necessary, as simply running 32-bit
user code (in a normally 64-bit process) doesn't pose much of a threat
to the kernel with 32-bit syscalls disabled.

Wine, for example, is moving to a model where the main code runs in
64-bit mode, uses only 64-bit unix libraries, and the 32->64 bit
transitions are done entirely in userspace.  It will still need the
ability to use 32-bit code segments, but won't need 32-bit unix
libraries or syscalls to run 32-bit Windows code.

Brian Gerst

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08  0:25                   ` Thomas Gleixner
@ 2023-06-08  6:16                     ` Jiri Slaby
  2023-06-08  6:36                       ` Jiri Slaby
                                         ` (2 more replies)
  2023-06-08  6:29                     ` Jiri Slaby
  2023-06-08 11:25                     ` Andrew Cooper
  2 siblings, 3 replies; 26+ messages in thread
From: Jiri Slaby @ 2023-06-08  6:16 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko

On 08. 06. 23, 2:25, Thomas Gleixner wrote:
> I really wish that we could disable syscall32 reliably on AMD and make
> it raise #UD as it does on Intal.

Sorry, I am likely missing something, but why is not #GP enough when we 
set CSTAR = 0? It's of course not as good as Intel's *defined* #UD, but 
why is not the above sufficient/reliable?

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08  0:25                   ` Thomas Gleixner
  2023-06-08  6:16                     ` Jiri Slaby
@ 2023-06-08  6:29                     ` Jiri Slaby
  2023-06-08 11:25                     ` Andrew Cooper
  2 siblings, 0 replies; 26+ messages in thread
From: Jiri Slaby @ 2023-06-08  6:29 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko

On 08. 06. 23, 2:25, Thomas Gleixner wrote:
>> For usecases where there ought not to be any 32bit code at all (and
>> there absolutely are), it would be lovely if this could be enforced,
>> rather than relying on blind hope that it doesn't happen.
> 
> I think it's rather clear what needs to be done here to achieve that,
> but that's completely orthogonal to the intent of the patch series in
> question which aims to make CONFIG_IA32_EMULATION a boot time decision.

Agreed. The original intent was only to disable the 32bit paths in the 
kernel. I.e. set CONFIG_IA32_EMULATION=n by a runtime switch.

Then we came up with idea to disallow loading 32bit binaries to WARN 
people as the bins won't (mostly) work anyway. (We are/were aware this 
doesn't forbid running 32bit code.)

But now, when we are doing that, I think we should disable 32 bits 
completely by the switch. I.e. don't provide 32bit segments and 
whatever. And make that clear and documented in the series. Because as 
it stands now, it's half-way. Agreed? This whole attempt served as a 
call for opinions which we got and which is perfect.

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08  6:16                     ` Jiri Slaby
@ 2023-06-08  6:36                       ` Jiri Slaby
  2023-06-08 15:30                       ` Thomas Gleixner
  2023-06-08 15:32                       ` Andrew Cooper
  2 siblings, 0 replies; 26+ messages in thread
From: Jiri Slaby @ 2023-06-08  6:36 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko

On 08. 06. 23, 8:16, Jiri Slaby wrote:
> On 08. 06. 23, 2:25, Thomas Gleixner wrote:
>> I really wish that we could disable syscall32 reliably on AMD and make
>> it raise #UD as it does on Intal.
> 
> Sorry, I am likely missing something, but why is not #GP enough when we 
> set CSTAR = 0?

Or rather to some hole (to avoid mappings when mmap_min_addr=0) or to 
something like entry_SYSCALL32_kill which you suggested elsewhere.

But that is maybe what you consider not being "reliable".

>  It's of course not as good as Intel's *defined* #UD, but 
> why is not the above sufficient/reliable?
> 
> thanks,

-- 
js
suse labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08  0:25                   ` Thomas Gleixner
  2023-06-08  6:16                     ` Jiri Slaby
  2023-06-08  6:29                     ` Jiri Slaby
@ 2023-06-08 11:25                     ` Andrew Cooper
  2023-06-08 15:56                       ` Thomas Gleixner
  2023-06-08 21:29                       ` Nikolay Borisov
  2 siblings, 2 replies; 26+ messages in thread
From: Andrew Cooper @ 2023-06-08 11:25 UTC (permalink / raw)
  To: Thomas Gleixner, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On 08/06/2023 1:25 am, Thomas Gleixner wrote:
> On Thu, Jun 08 2023 at 00:43, Andrew Cooper wrote:
>> And yes, wiring this into SIGSEGV/etc would be a sensible move.
> The only option is to wire it into die_hard_crash_and_burn(). There is
> no sane way to deliver a signal to the process which managed to run into
> this. Appropriate info to parent or ptracer will still be delivered.

Hmm yeah.  Something has already gone seriously wrong to end up here, so
terminating it is probably best.

> I really wish that we could disable syscall32 reliably on AMD and make
> it raise #UD as it does on Intal.

You know that's changing in FRED, and will follow the AMD model?

The SYSCALL instruction is lower latency if it doesn't need to check %cs
to conditionally #UD.

>> Furthermore, despite CET-SS explicitly trying to account for and protect
>> against accidental mismatches, there are errata in some parts which let
>> userspace forge legal return addresses on the shadow stack by dropping
>> into 32bit mode because, there's a #GP check missing in a microflow.
> Didn't we assume that there are no CPU bugs? :)

Tell me, is such a CPU delivered with or without a complimentary unicorn? :)

>> For usecases where there ought not to be any 32bit code at all (and
>> there absolutely are), it would be lovely if this could be enforced,
>> rather than relying on blind hope that it doesn't happen.
> I think it's rather clear what needs to be done here to achieve that,
> but that's completely orthogonal to the intent of the patch series in
> question which aims to make CONFIG_IA32_EMULATION a boot time decision.

Fully inhibiting 32bit mode is stronger, because it impacts code which
would otherwise operate in CONFIG_IA32_EMULATION=n configuration.

Which is fine, but I agree that it doesn't want to be confused with
being a "runtime CONFIG_IA32_EMULATION" knob.

If the runtime form could also come with an equivalent Kconfig form,
that would be awesome too.

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08  6:16                     ` Jiri Slaby
  2023-06-08  6:36                       ` Jiri Slaby
@ 2023-06-08 15:30                       ` Thomas Gleixner
  2023-06-08 15:32                       ` Andrew Cooper
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-08 15:30 UTC (permalink / raw)
  To: Jiri Slaby, Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko

On Thu, Jun 08 2023 at 08:16, Jiri Slaby wrote:
> On 08. 06. 23, 2:25, Thomas Gleixner wrote:
>> I really wish that we could disable syscall32 reliably on AMD and make
>> it raise #UD as it does on Intal.
>
> Sorry, I am likely missing something, but why is not #GP enough when we 
> set CSTAR = 0?

Because you are not getting a #GP.

It will try to execute from virtual address 0 in CPL 0 and with RSP
still pointing to the user space stack. So you have several
possibilities:

1) 0 is mapped in user space and SMEP/SMAP is off.

   Attacker won

2) 0 is not mapped or SMEP is on.

   You get #PF from CPL0 and RSP is still pointing to the user space
   stack. If SMAP is on this results in #DF

   If SMAP is off, kernel uses an attacker controlled stack...

Similar sillies when you set it to a valid kernel address which is not
mapped or lacks X or contains invalid opcode ....

So no. CSTAR _must_ be a valid kernel text address which handles the
32bit syscall. Right now all it does is SYSRETL when IA32_EMULATION is
disabled.

So the only way to handle that is to have proper entry code which
switches to kernel context and then runs "syscall32_kill_myself()" which
kills the process hard and it exits without the chance to attempt a
return to user.

Anything else wont work.

Bah. Was it really necessary to bring this up so I hade to page in the
gory details of this hardware insanity again?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08  6:16                     ` Jiri Slaby
  2023-06-08  6:36                       ` Jiri Slaby
  2023-06-08 15:30                       ` Thomas Gleixner
@ 2023-06-08 15:32                       ` Andrew Cooper
  2 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2023-06-08 15:32 UTC (permalink / raw)
  To: Jiri Slaby, Thomas Gleixner, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko

On 08/06/2023 7:16 am, Jiri Slaby wrote:
> On 08. 06. 23, 2:25, Thomas Gleixner wrote:
>> I really wish that we could disable syscall32 reliably on AMD and make
>> it raise #UD as it does on Intal.
>
> Sorry, I am likely missing something, but why is not #GP enough when
> we set CSTAR = 0?

Yeah, don't be setting CSTAR to 0.

If you set CSTAR to 0, and userspace has mapped something at 0, then the
CPU will start executing from 0 in kernel mode.

If you've got SMEP active, this doesn't help.  Instead of executing from
0, you'll take #PF.  Except you were already in kernel mode and #PF
isn't an IST vector, so you'll then start executing the #PF handler on
the same stack as before... which is the user stack, and it can still
hijack execution by hooking a return address.

If you've got (just) SMAP active, then this doesn't help.  The hijacked
execution doesn't need to touch the stack to execute STAC and re-permit
user data accesses.

If you've got SMEP, SMAP, *and* FMASK configured to clear AC
automatically on syscall, then you end up in #DF from a SMEP violation
trying to fetch the code, and a SMAP violation while trying to push the
SMEP violation's #PF IRET frame.


It's almost as if not switching the stack was a terrible terrible idea...

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08 11:25                     ` Andrew Cooper
@ 2023-06-08 15:56                       ` Thomas Gleixner
  2023-06-08 21:29                       ` Nikolay Borisov
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-06-08 15:56 UTC (permalink / raw)
  To: Andrew Cooper, Nikolay Borisov, x86; +Cc: linux-kernel, mhocko, jslaby

On Thu, Jun 08 2023 at 12:25, Andrew Cooper wrote:
> On 08/06/2023 1:25 am, Thomas Gleixner wrote:
>> I really wish that we could disable syscall32 reliably on AMD and make
>> it raise #UD as it does on Intal.
>
> You know that's changing in FRED, and will follow the AMD model?
>
> The SYSCALL instruction is lower latency if it doesn't need to check %cs
> to conditionally #UD.

Yes, but with FRED the CPL0 context is fully consistent. There are no
CPL3 leftovers.

>> Didn't we assume that there are no CPU bugs? :)
>
> Tell me, is such a CPU delivered with or without a complimentary unicorn? :)

It's deliverd by a fairytale princess :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed
  2023-06-08 11:25                     ` Andrew Cooper
  2023-06-08 15:56                       ` Thomas Gleixner
@ 2023-06-08 21:29                       ` Nikolay Borisov
  1 sibling, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2023-06-08 21:29 UTC (permalink / raw)
  To: Andrew Cooper, Thomas Gleixner, x86; +Cc: linux-kernel, mhocko, jslaby



On 8.06.23 г. 14:25 ч., Andrew Cooper wrote:
> On 08/06/2023 1:25 am, Thomas Gleixner wrote:
>> On Thu, Jun 08 2023 at 00:43, Andrew Cooper wrote:
>>> And yes, wiring this into SIGSEGV/etc would be a sensible move.
>> The only option is to wire it into die_hard_crash_and_burn(). There is
>> no sane way to deliver a signal to the process which managed to run into
>> this. Appropriate info to parent or ptracer will still be delivered.
> 
> Hmm yeah.  Something has already gone seriously wrong to end up here, so
> terminating it is probably best.
> 
>> I really wish that we could disable syscall32 reliably on AMD and make
>> it raise #UD as it does on Intal.
> 
> You know that's changing in FRED, and will follow the AMD model?
> 
> The SYSCALL instruction is lower latency if it doesn't need to check %cs
> to conditionally #UD.
> 
>>> Furthermore, despite CET-SS explicitly trying to account for and protect
>>> against accidental mismatches, there are errata in some parts which let
>>> userspace forge legal return addresses on the shadow stack by dropping
>>> into 32bit mode because, there's a #GP check missing in a microflow.
>> Didn't we assume that there are no CPU bugs? :)
> 
> Tell me, is such a CPU delivered with or without a complimentary unicorn? :)
> 
>>> For usecases where there ought not to be any 32bit code at all (and
>>> there absolutely are), it would be lovely if this could be enforced,
>>> rather than relying on blind hope that it doesn't happen.
>> I think it's rather clear what needs to be done here to achieve that,
>> but that's completely orthogonal to the intent of the patch series in
>> question which aims to make CONFIG_IA32_EMULATION a boot time decision.
> 
> Fully inhibiting 32bit mode is stronger, because it impacts code which
> would otherwise operate in CONFIG_IA32_EMULATION=n configuration.
> 
> Which is fine, but I agree that it doesn't want to be confused with
> being a "runtime CONFIG_IA32_EMULATION" knob.
> 
> If the runtime form could also come with an equivalent Kconfig form,
> that would be awesome too.


Actually that's something I'm working on. I.e be able to set the default 
state of IA32_EMULATION at compile time (i.e disabled or enabled) and 
provide a boottime switch to override this. That way upstream can retain 
the current behavior, while distros can set the "default disabled" 
config-time switch and finally users will have the ability to override 
the config-switch at boottime.


> 
> ~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-06-08 21:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-07  7:29 [RFC PATCH 0/3] Add ability to disable ia32 at boot time Nikolay Borisov
2023-06-07  7:29 ` [PATCH 1/3] x86: Introduce ia32_disabled boot parameter Nikolay Borisov
2023-06-07  8:55   ` Thomas Gleixner
2023-06-07  7:29 ` [PATCH 2/3] x86/entry: Disable IA32 syscalls in the presence of ia32_disabled Nikolay Borisov
2023-06-07  9:11   ` Thomas Gleixner
2023-06-08  3:18     ` Brian Gerst
2023-06-07  7:29 ` [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed Nikolay Borisov
2023-06-07 12:01   ` Thomas Gleixner
2023-06-07 12:19     ` Nikolay Borisov
2023-06-07 12:53       ` Thomas Gleixner
2023-06-07 13:38         ` Nikolay Borisov
2023-06-07 14:49           ` Thomas Gleixner
2023-06-07 17:25             ` Andrew Cooper
2023-06-07 21:52               ` Thomas Gleixner
2023-06-07 23:43                 ` Andrew Cooper
2023-06-08  0:25                   ` Thomas Gleixner
2023-06-08  6:16                     ` Jiri Slaby
2023-06-08  6:36                       ` Jiri Slaby
2023-06-08 15:30                       ` Thomas Gleixner
2023-06-08 15:32                       ` Andrew Cooper
2023-06-08  6:29                     ` Jiri Slaby
2023-06-08 11:25                     ` Andrew Cooper
2023-06-08 15:56                       ` Thomas Gleixner
2023-06-08 21:29                       ` Nikolay Borisov
2023-06-07 12:55     ` Thomas Gleixner
2023-06-08  4:37   ` Brian Gerst

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).