linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
       [not found] <20210106064807.253112-1-Sonicadvance1@gmail.com>
@ 2021-01-15  7:02 ` sonicadvance1
  2021-01-15  7:17   ` [PATCH v2] " sonicadvance1
  2021-01-15 20:01   ` [PATCH] " David Laight
  0 siblings, 2 replies; 6+ messages in thread
From: sonicadvance1 @ 2021-01-15  7:02 UTC (permalink / raw)
  Cc: linux-xtensa, Rich Felker, Jan Kara, Miklos Szeredi,
	Dominik Brodowski, linux-kernel, James E.J. Bottomley,
	Max Filippov, Paul Mackerras, H. Peter Anvin, sparclinux,
	linux-ia64, Christian Brauner, Vincenzo Frascino, Will Deacon,
	linux-arch, Stephen Rothwell, Arnd Bergmann, Yoshinori Sato,
	linux-sh, Helge Deller, x86, YueHaibing, Russell King,
	Christian Borntraeger, Ingo Molnar, Geert Uytterhoeven,
	linux-arm-kernel, Catalin Marinas, Matt Turner, linux-mips,
	Fenghua Yu, Arnaldo Carvalho de Melo, Vasily Gorbik, Ryan Houdek,
	Brian Gerst, Heiko Carstens, David Rientjes, Willem de Bruijn,
	Nicholas Piggin, Suren Baghdasaryan, Aleksa Sarai,
	Thomas Bogendoerfer, Ivan Kokshaysky, Alexander Viro,
	Andy Lutomirski, Thomas Gleixner, Xiaoming Ni, Vlastimil Babka,
	Richard Henderson, Chris Zankel, Michal Simek, Tony Luck,
	linux-parisc, linux-m68k, linux-s390, Oleg Nesterov, Minchan Kim,
	Eric W. Biederman, linux-alpha, linux-fsdevel, Borislav Petkov,
	Andrew Morton, linux-api, linuxppc-dev, David S. Miller

From: Ryan Houdek <Sonicadvance1@gmail.com>

Problem presented:
A backwards compatibility layer that allows running x86-64 and x86
processes inside of an AArch64 process.
  - CPU is emulated
  - Syscall interface is mostly passthrough
  - Some syscalls require patching or emulation depending on behaviour
  - Not viable from the emulator design to use an AArch32 host process

x86-64 and x86 userspace emulator source:
https://github.com/FEX-Emu/FEX
Usage of ioctl32 is currently in a downstream fork. This will be the
first user of the syscall.

Cross documentation:
https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes#ioctl---54

ioctls are opaque from the emulator perspective and the data wants to be
passed through a syscall as unimpeded as possible.
Sadly due to ioctl struct differences between x86 and x86-64, we need a
syscall that exposes the compatibility ioctl handler to userspace in a
64bit process.

This is necessary behaves of the behaviour differences that occur
between an x86 process doing an ioctl and an x86-64 process doing an
ioctl.

Both of which are captured and passed through the AArch64 ioctl space.
This is implementing a new ioctl32 syscall that allows us to pass 32bit
x86 ioctls through to the kernel with zero or minimal manipulation.

The only supported hosts where we care about this currently is AArch64
and x86-64 (For testing purposes).
PPC64LE, MIPS64LE, and RISC-V64 might be interesting to support in the
future; But I don't have any platforms that get anywhere near Cortex-A77
performance in those architectures. Nor do I have the time to bring up
the emulator on them.
x86-64 can get to the compatibility ioctl through the int $0x80 handler.

This does not solve the following problems:
1) compat_alloc_user_space inside ioctl
2) ioctls that check task mode instead of entry point for behaviour
3) ioctls allocating memory
4) struct packing problems between architectures

Workarounds for the problems presented:
1a) Do a stack pivot to the lower 32bits from userspace
  - Forces host 64bit process to have its thread stacks to live in 32bit
  space. Not ideal.
  - Only do a stack pivot on ioctl to save previous 32bit VA space
1b) Teach kernel that compat_alloc_userspace can return a 64bit pointer
  - x86-64 truncates stack from this function
  - AArch64 returns the full stack pointer
  - Only ~29 users. Validating all of them support a 64bit stack is
  trivial?

2a) Any application using these can be checked for compatibility in
userspace and put on a block list.
2b) Fix any ioctls doing broken behaviour based on task mode rather than
ioctl entry point

3a) Userspace consumes all VA space above 32bit. Forcing allocations to
occur in lower 32bits
  - This is the current implementation
3b) Ensure any allocation in the ioctl handles ioctl entrypoint rather
than just allow generic memory allocations in full VA space
  - This is hard to guarantee

4a) Blocklist any application using ioctls that have different struct
packing across the boundary
  - Can happen when struct packing of 32bit x86 application goes down
  the aarch64 compat_ioctl path
  - Userspace is a AArch64 process passing 32bit x86 ioctl structures
  through the compat_ioctl path which is typically for AArch32 processes
  - None currently identified
4b) Work with upstream kernel and userspace projects to evaluate and fix
  - Identify the problem ioctls
  - Implement a new ioctl with more sane struct packing that matches
  cross-arch
  - Implement new ioctl while maintaining backwards compatibility with
  previous ioctl handler
  - Change upstream project to use the new compatibility ioctl
  - ioctl deprecation will be case by case per device and project
4b) Userspace implements a full ioctl emulation layer
  - Parses the full ioctl tree
  - Either passes through ioctls that it doesn't understand or
  transforms ioctls that it knows are trouble
  - Has the downside that it can still run in to edge cases that will
  fail
  - Performance of additional tracking is a concern
  - Prone to failure keeping the kernel ioctl and userspace ioctl
  handling in sync
  - Really want to have it in the kernel space as much as possible

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 fs/ioctl.c                                  | 18 ++++++++++++++++--
 include/linux/syscalls.h                    |  4 ++++
 include/uapi/asm-generic/unistd.h           |  9 ++++++++-
 kernel/sys_ni.c                             |  3 +++
 tools/include/uapi/asm-generic/unistd.h     |  9 ++++++++-
 23 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index a6617067dbe6..81e70fd241d7 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -481,3 +481,4 @@
 549	common	faccessat2			sys_faccessat2
 550	common	process_madvise			sys_process_madvise
 551	common	epoll_pwait2			sys_epoll_pwait2
+552	common	ioctl32			sys_ni_syscall
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 20e1170e2e0a..98fbf1af1169 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -455,3 +455,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 86a9d7b3eabe..949788f5ba40 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		442
+#define __NR_compat_syscalls		443
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index cccfbbefbf95..35e3bc83dbdc 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -891,6 +891,8 @@ __SYSCALL(__NR_faccessat2, sys_faccessat2)
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SYSCALL(__NR_epoll_pwait2, compat_sys_epoll_pwait2)
+#define __NR_ioctl32 442
+__SYSCALL(__NR_ioctl32, compat_sys_ioctl)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index bfc00f2bd437..087fc9627357 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -362,3 +362,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	sys_ioctl32			sys_ioctl32
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 7fe4e45c864c..502b2f87ab60 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index a522adf194ab..e69be6c836d2 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -447,3 +447,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 0f03ad223f33..ba395218446f 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -380,3 +380,4 @@
 439	n32	faccessat2			sys_faccessat2
 440	n32	process_madvise			sys_process_madvise
 441	n32	epoll_pwait2			compat_sys_epoll_pwait2
+442	n32	ioctl32			sys_ni_syscall
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 91649690b52f..f42f939702e2 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -356,3 +356,5 @@
 439	n64	faccessat2			sys_faccessat2
 440	n64	process_madvise			sys_process_madvise
 441	n64	epoll_pwait2			sys_epoll_pwait2
+441	n64	epoll_pwait2			sys_epoll_pwait2
+442	n64	ioctl32			sys_ioctl32
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 4bad0c40aed6..b08ff6066f06 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -429,3 +429,4 @@
 439	o32	faccessat2			sys_faccessat2
 440	o32	process_madvise			sys_process_madvise
 441	o32	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	o32	ioctl32			sys_ni_syscall
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 6bcc31966b44..84d2b88d92fa 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	ioctl32			sys_ioctl32
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index f744eb5cba88..9f04d73cf649 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -531,3 +531,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	sys_ioctl32				sys_ioctl32
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index d443423495e5..2c90c0ecb5c7 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -444,3 +444,4 @@
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
 440  common	process_madvise		sys_process_madvise		sys_process_madvise
 441  common	epoll_pwait2		sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	sys_ioctl32			sys_ni_syscall
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 9df40ac0ebc0..1e02a13fa049 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -444,3 +444,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 40d8c7cd8298..f7d24678d0b1 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -487,3 +487,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	sys_ioctl32			sys_ioctl32
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 874aeacde2dd..b1a3461e1e20 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -446,3 +446,4 @@
 439	i386	faccessat2		sys_faccessat2
 440	i386	process_madvise		sys_process_madvise
 441	i386	epoll_pwait2		sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	i386	ioctl32		sys_ni_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78672124d28b..0250a04df0df 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -363,6 +363,7 @@
 439	common	faccessat2		sys_faccessat2
 440	common	process_madvise		sys_process_madvise
 441	common	epoll_pwait2		sys_epoll_pwait2
+442	64	ioctl32		sys_ioctl32
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 46116a28eeed..34b653b36b7b 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -412,3 +412,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 4e6cc0a7d69c..7b324a21a257 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -790,8 +790,8 @@ long compat_ptr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 }
 EXPORT_SYMBOL(compat_ptr_ioctl);
 
-COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
-		       compat_ulong_t, arg)
+long do_ioctl32(unsigned int fd, unsigned int cmd,
+			compat_ulong_t arg)
 {
 	struct fd f = fdget(fd);
 	int error;
@@ -850,4 +850,18 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 
 	return error;
 }
+
+COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
+			compat_ulong_t, arg)
+{
+	return do_ioctl32(fd, cmd, arg);
+}
+
+#if BITS_PER_LONG == 64
+SYSCALL_DEFINE3(ioctl32, unsigned int, fd, unsigned int, cmd,
+			compat_ulong_t, arg)
+{
+	return do_ioctl32(fd, cmd, arg);
+}
+#endif
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f3929aff39cf..fb7bac17167a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -386,6 +386,10 @@ asmlinkage long sys_inotify_rm_watch(int fd, __s32 wd);
 /* fs/ioctl.c */
 asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd,
 				unsigned long arg);
+#if defined(CONFIG_COMPAT) && BITS_PER_LONG == 64
+asmlinkage long sys_ioctl32(unsigned int fd, unsigned int cmd,
+				compat_ulong_t arg);
+#endif
 
 /* fs/ioprio.c */
 asmlinkage long sys_ioprio_set(int which, int who, int ioprio);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 728752917785..18279e5b7b4f 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -862,8 +862,15 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2)
 
+#define __NR_ioctl32 442
+#ifdef CONFIG_COMPAT
+__SC_COMP(__NR_ioctl32, sys_ioctl32, compat_sys_ioctl)
+#else
+__SC_COMP(__NR_ioctl32, sys_ni_syscall, sys_ni_syscall)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 442
+#define __NR_syscalls 443
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 19aa806890d5..5a2f25eb341c 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -302,6 +302,9 @@ COND_SYSCALL(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time64);
 
+COND_SYSCALL(ioctl32);
+COND_SYSCALL_COMPAT(ioctl32);
+
 /*
  * Architecture specific syscalls: see further below
  */
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 728752917785..18279e5b7b4f 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -862,8 +862,15 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2)
 
+#define __NR_ioctl32 442
+#ifdef CONFIG_COMPAT
+__SC_COMP(__NR_ioctl32, sys_ioctl32, compat_sys_ioctl)
+#else
+__SC_COMP(__NR_ioctl32, sys_ni_syscall, sys_ni_syscall)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 442
+#define __NR_syscalls 443
 
 /*
  * 32 bit systems traditionally used different
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2] Adds a new ioctl32 syscall for backwards compatibility layers
  2021-01-15  7:02 ` [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers sonicadvance1
@ 2021-01-15  7:17   ` sonicadvance1
  2021-01-15 20:01   ` [PATCH] " David Laight
  1 sibling, 0 replies; 6+ messages in thread
From: sonicadvance1 @ 2021-01-15  7:17 UTC (permalink / raw)
  Cc: linux-xtensa, Rich Felker, linux-ia64, Miklos Szeredi,
	Dominik Brodowski, linux-mips, James E.J. Bottomley,
	Max Filippov, Paul Mackerras, H. Peter Anvin, sparclinux,
	Christian Brauner, Vincenzo Frascino, Will Deacon, linux-arch,
	Stephen Rothwell, Arnd Bergmann, Yoshinori Sato, linux-sh,
	Helge Deller, x86, YueHaibing, Russell King,
	Christian Borntraeger, Ingo Molnar, Geert Uytterhoeven,
	linux-arm-kernel, Catalin Marinas, Matt Turner, Fenghua Yu,
	Arnaldo Carvalho de Melo, Vasily Gorbik, Ryan Houdek,
	Brian Gerst, Heiko Carstens, David Rientjes, Willem de Bruijn,
	Nicholas Piggin, Suren Baghdasaryan, linux-fsdevel, Aleksa Sarai,
	Thomas Bogendoerfer, Ivan Kokshaysky, Alexander Viro,
	Andy Lutomirski, Thomas Gleixner, Xiaoming Ni, Vlastimil Babka,
	Richard Henderson, Chris Zankel, Michal Simek, Tony Luck,
	linux-parisc, linux-m68k, linux-s390, linux-kernel, Minchan Kim,
	Eric W. Biederman, linux-alpha, Joe Perches, Borislav Petkov,
	Andrew Morton, linux-api, linuxppc-dev, David S. Miller

From: Ryan Houdek <Sonicadvance1@gmail.com>

Problem presented:
A backwards compatibility layer that allows running x86-64 and x86
processes inside of an AArch64 process.
  - CPU is emulated
  - Syscall interface is mostly passthrough
  - Some syscalls require patching or emulation depending on behaviour
  - Not viable from the emulator design to use an AArch32 host process

x86-64 and x86 userspace emulator source:
https://github.com/FEX-Emu/FEX
Usage of ioctl32 is currently in a downstream fork. This will be the
first user of the syscall.

Cross documentation:
https://github.com/FEX-Emu/FEX/wiki/32Bit-x86-Woes#ioctl---54

ioctls are opaque from the emulator perspective and the data wants to be
passed through a syscall as unimpeded as possible.
Sadly due to ioctl struct differences between x86 and x86-64, we need a
syscall that exposes the compatibility ioctl handler to userspace in a
64bit process.

This is necessary behaves of the behaviour differences that occur
between an x86 process doing an ioctl and an x86-64 process doing an
ioctl.

Both of which are captured and passed through the AArch64 ioctl space.
This is implementing a new ioctl32 syscall that allows us to pass 32bit
x86 ioctls through to the kernel with zero or minimal manipulation.

The only supported hosts where we care about this currently is AArch64
and x86-64 (For testing purposes).
PPC64LE, MIPS64LE, and RISC-V64 might be interesting to support in the
future; But I don't have any platforms that get anywhere near Cortex-A77
performance in those architectures. Nor do I have the time to bring up
the emulator on them.
x86-64 can get to the compatibility ioctl through the int $0x80 handler.

This does not solve the following problems:
1) compat_alloc_user_space inside ioctl
2) ioctls that check task mode instead of entry point for behaviour
3) ioctls allocating memory
4) struct packing problems between architectures

Workarounds for the problems presented:
1a) Do a stack pivot to the lower 32bits from userspace
  - Forces host 64bit process to have its thread stacks to live in 32bit
  space. Not ideal.
  - Only do a stack pivot on ioctl to save previous 32bit VA space
1b) Teach kernel that compat_alloc_userspace can return a 64bit pointer
  - x86-64 truncates stack from this function
  - AArch64 returns the full stack pointer
  - Only ~29 users. Validating all of them support a 64bit stack is
  trivial?

2a) Any application using these can be checked for compatibility in
userspace and put on a block list.
2b) Fix any ioctls doing broken behaviour based on task mode rather than
ioctl entry point

3a) Userspace consumes all VA space above 32bit. Forcing allocations to
occur in lower 32bits
  - This is the current implementation
3b) Ensure any allocation in the ioctl handles ioctl entrypoint rather
than just allow generic memory allocations in full VA space
  - This is hard to guarantee

4a) Blocklist any application using ioctls that have different struct
packing across the boundary
  - Can happen when struct packing of 32bit x86 application goes down
  the aarch64 compat_ioctl path
  - Userspace is a AArch64 process passing 32bit x86 ioctl structures
  through the compat_ioctl path which is typically for AArch32 processes
  - None currently identified
4b) Work with upstream kernel and userspace projects to evaluate and fix
  - Identify the problem ioctls
  - Implement a new ioctl with more sane struct packing that matches
  cross-arch
  - Implement new ioctl while maintaining backwards compatibility with
  previous ioctl handler
  - Change upstream project to use the new compatibility ioctl
  - ioctl deprecation will be case by case per device and project
4b) Userspace implements a full ioctl emulation layer
  - Parses the full ioctl tree
  - Either passes through ioctls that it doesn't understand or
  transforms ioctls that it knows are trouble
  - Has the downside that it can still run in to edge cases that will
  fail
  - Performance of additional tracking is a concern
  - Prone to failure keeping the kernel ioctl and userspace ioctl
  handling in sync
  - Really want to have it in the kernel space as much as possible

Changes in v2:
- Added the syscall to all architecture tables
- Disabled on 32bit and BE platforms. They can call ioctl directly.
- Disabled on x86-64 as well since you can call this from ia32 or x32
dispatch tables

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 fs/ioctl.c                                  | 18 ++++++++++++++++--
 include/linux/syscalls.h                    |  4 ++++
 include/uapi/asm-generic/unistd.h           |  9 ++++++++-
 kernel/sys_ni.c                             |  3 +++
 tools/include/uapi/asm-generic/unistd.h     |  9 ++++++++-
 23 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index a6617067dbe6..81e70fd241d7 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -481,3 +481,4 @@
 549	common	faccessat2			sys_faccessat2
 550	common	process_madvise			sys_process_madvise
 551	common	epoll_pwait2			sys_epoll_pwait2
+552	common	ioctl32			sys_ni_syscall
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 20e1170e2e0a..98fbf1af1169 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -455,3 +455,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 86a9d7b3eabe..949788f5ba40 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		442
+#define __NR_compat_syscalls		443
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index cccfbbefbf95..35e3bc83dbdc 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -891,6 +891,8 @@ __SYSCALL(__NR_faccessat2, sys_faccessat2)
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SYSCALL(__NR_epoll_pwait2, compat_sys_epoll_pwait2)
+#define __NR_ioctl32 442
+__SYSCALL(__NR_ioctl32, compat_sys_ioctl)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index bfc00f2bd437..087fc9627357 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -362,3 +362,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	sys_ioctl32			sys_ioctl32
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 7fe4e45c864c..502b2f87ab60 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index a522adf194ab..e69be6c836d2 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -447,3 +447,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 0f03ad223f33..ba395218446f 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -380,3 +380,4 @@
 439	n32	faccessat2			sys_faccessat2
 440	n32	process_madvise			sys_process_madvise
 441	n32	epoll_pwait2			compat_sys_epoll_pwait2
+442	n32	ioctl32			sys_ni_syscall
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 91649690b52f..f42f939702e2 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -356,3 +356,5 @@
 439	n64	faccessat2			sys_faccessat2
 440	n64	process_madvise			sys_process_madvise
 441	n64	epoll_pwait2			sys_epoll_pwait2
+441	n64	epoll_pwait2			sys_epoll_pwait2
+442	n64	ioctl32			sys_ioctl32
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 4bad0c40aed6..b08ff6066f06 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -429,3 +429,4 @@
 439	o32	faccessat2			sys_faccessat2
 440	o32	process_madvise			sys_process_madvise
 441	o32	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	o32	ioctl32			sys_ni_syscall
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 6bcc31966b44..84d2b88d92fa 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	ioctl32			sys_ioctl32
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index f744eb5cba88..9f04d73cf649 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -531,3 +531,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	sys_ioctl32				sys_ioctl32
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index d443423495e5..2c90c0ecb5c7 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -444,3 +444,4 @@
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
 440  common	process_madvise		sys_process_madvise		sys_process_madvise
 441  common	epoll_pwait2		sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	sys_ioctl32			sys_ni_syscall
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 9df40ac0ebc0..1e02a13fa049 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -444,3 +444,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 40d8c7cd8298..f7d24678d0b1 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -487,3 +487,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	64	sys_ioctl32			sys_ioctl32
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 874aeacde2dd..b1a3461e1e20 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -446,3 +446,4 @@
 439	i386	faccessat2		sys_faccessat2
 440	i386	process_madvise		sys_process_madvise
 441	i386	epoll_pwait2		sys_epoll_pwait2		compat_sys_epoll_pwait2
+442	i386	ioctl32		sys_ni_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78672124d28b..0250a04df0df 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -363,6 +363,7 @@
 439	common	faccessat2		sys_faccessat2
 440	common	process_madvise		sys_process_madvise
 441	common	epoll_pwait2		sys_epoll_pwait2
+442	64	ioctl32		sys_ioctl32
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 46116a28eeed..34b653b36b7b 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -412,3 +412,4 @@
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
 441	common	epoll_pwait2			sys_epoll_pwait2
+442	common	ioctl32			sys_ni_syscall
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 4e6cc0a7d69c..7b324a21a257 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -790,8 +790,8 @@ long compat_ptr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 }
 EXPORT_SYMBOL(compat_ptr_ioctl);
 
-COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
-		       compat_ulong_t, arg)
+long do_ioctl32(unsigned int fd, unsigned int cmd,
+			compat_ulong_t arg)
 {
 	struct fd f = fdget(fd);
 	int error;
@@ -850,4 +850,18 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 
 	return error;
 }
+
+COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
+			compat_ulong_t, arg)
+{
+	return do_ioctl32(fd, cmd, arg);
+}
+
+#if BITS_PER_LONG == 64
+SYSCALL_DEFINE3(ioctl32, unsigned int, fd, unsigned int, cmd,
+			compat_ulong_t, arg)
+{
+	return do_ioctl32(fd, cmd, arg);
+}
+#endif
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f3929aff39cf..fb7bac17167a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -386,6 +386,10 @@ asmlinkage long sys_inotify_rm_watch(int fd, __s32 wd);
 /* fs/ioctl.c */
 asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd,
 				unsigned long arg);
+#if defined(CONFIG_COMPAT) && BITS_PER_LONG == 64
+asmlinkage long sys_ioctl32(unsigned int fd, unsigned int cmd,
+				compat_ulong_t arg);
+#endif
 
 /* fs/ioprio.c */
 asmlinkage long sys_ioprio_set(int which, int who, int ioprio);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 728752917785..18279e5b7b4f 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -862,8 +862,15 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2)
 
+#define __NR_ioctl32 442
+#ifdef CONFIG_COMPAT
+__SC_COMP(__NR_ioctl32, sys_ioctl32, compat_sys_ioctl)
+#else
+__SC_COMP(__NR_ioctl32, sys_ni_syscall, sys_ni_syscall)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 442
+#define __NR_syscalls 443
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 19aa806890d5..5a2f25eb341c 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -302,6 +302,9 @@ COND_SYSCALL(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time64);
 
+COND_SYSCALL(ioctl32);
+COND_SYSCALL_COMPAT(ioctl32);
+
 /*
  * Architecture specific syscalls: see further below
  */
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 728752917785..18279e5b7b4f 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -862,8 +862,15 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise)
 #define __NR_epoll_pwait2 441
 __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2)
 
+#define __NR_ioctl32 442
+#ifdef CONFIG_COMPAT
+__SC_COMP(__NR_ioctl32, sys_ioctl32, compat_sys_ioctl)
+#else
+__SC_COMP(__NR_ioctl32, sys_ni_syscall, sys_ni_syscall)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 442
+#define __NR_syscalls 443
 
 /*
  * 32 bit systems traditionally used different
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
  2021-01-15  7:02 ` [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers sonicadvance1
  2021-01-15  7:17   ` [PATCH v2] " sonicadvance1
@ 2021-01-15 20:01   ` David Laight
  2021-01-15 22:17     ` Arnd Bergmann
  1 sibling, 1 reply; 6+ messages in thread
From: David Laight @ 2021-01-15 20:01 UTC (permalink / raw)
  To: 'sonicadvance1@gmail.com'
  Cc: linux-xtensa, Rich Felker, Jan Kara, Miklos Szeredi,
	Dominik Brodowski, linux-kernel, James E.J. Bottomley,
	Max Filippov, Paul Mackerras, H. Peter Anvin, sparclinux,
	linux-ia64, Christian Brauner, Vincenzo Frascino, Will Deacon,
	linux-arch, Stephen Rothwell, Arnd Bergmann, Yoshinori Sato,
	linux-sh, Helge Deller, x86, YueHaibing, Russell King,
	Christian Borntraeger, Ingo Molnar, Geert Uytterhoeven,
	linux-arm-kernel, Catalin Marinas, Matt Turner, linux-mips,
	Fenghua Yu, Arnaldo Carvalho de Melo, Vasily Gorbik, linux-s390,
	Brian Gerst, Heiko Carstens, David Rientjes, Willem de Bruijn,
	Nicholas Piggin, Suren Baghdasaryan, Aleksa Sarai,
	Thomas Bogendoerfer, Ivan Kokshaysky, Alexander Viro,
	Andy Lutomirski, Thomas Gleixner, Xiaoming Ni, Vlastimil Babka,
	Richard Henderson, Chris Zankel, Michal Simek, Tony Luck,
	linux-parisc, linux-m68k, linux-api, Oleg Nesterov, Minchan Kim,
	Eric W. Biederman, linux-alpha, linux-fsdevel, Borislav Petkov,
	Andrew Morton, linuxppc-dev, David S. Miller

From: sonicadvance1@gmail.com
> Sent: 15 January 2021 07:03
> Problem presented:
> A backwards compatibility layer that allows running x86-64 and x86
> processes inside of an AArch64 process.
>   - CPU is emulated
>   - Syscall interface is mostly passthrough
>   - Some syscalls require patching or emulation depending on behaviour
>   - Not viable from the emulator design to use an AArch32 host process
> 

You are going to need to add all the x86 compatibility code into
your arm64 kernel.
This is likely to be different from the 32bit arm compatibility
because 64bit items are only aligned on 32bit boundaries.
The x86 x32 compatibility will be more like the 32bit arm 'compat'
code - I'm pretty sure arm32 64bit aligned 64bit data.

You'll then need to remember how the process entered the kernel
to work out which compatibility code to invoke.
This is what x86 does.
It allows a single process to do all three types of system call.

Trying to 'patch up' structures outside the kernel, or in the
syscall interface code will always cause grief somewhere.
The only sane place is in the code that uses the structures.
Which, for ioctls, means inside the driver that parses them.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
  2021-01-15 20:01   ` [PATCH] " David Laight
@ 2021-01-15 22:17     ` Arnd Bergmann
  2021-01-15 22:23       ` Rich Felker
  2021-01-15 22:39       ` David Laight
  0 siblings, 2 replies; 6+ messages in thread
From: Arnd Bergmann @ 2021-01-15 22:17 UTC (permalink / raw)
  To: David Laight
  Cc: linux-xtensa, Rich Felker, Jan Kara, Miklos Szeredi,
	Dominik Brodowski, linux-kernel, James E.J. Bottomley,
	Max Filippov, Paul Mackerras, H. Peter Anvin, sparclinux,
	linux-ia64, Christian Brauner, Vincenzo Frascino, Will Deacon,
	linux-arch, Stephen Rothwell, Arnd Bergmann, Yoshinori Sato,
	linux-sh, Helge Deller, x86, YueHaibing, Russell King,
	Christian Borntraeger, Ingo Molnar, Geert Uytterhoeven,
	linux-arm-kernel, Catalin Marinas, Matt Turner, linux-mips,
	Fenghua Yu, Arnaldo Carvalho de Melo, Vasily Gorbik,
	sonicadvance1, Brian Gerst, Heiko Carstens, David Rientjes,
	Willem de Bruijn, Nicholas Piggin, Suren Baghdasaryan,
	Aleksa Sarai, Thomas Bogendoerfer, Ivan Kokshaysky,
	Alexander Viro, Andy Lutomirski, Thomas Gleixner, Xiaoming Ni,
	Vlastimil Babka, Richard Henderson, Chris Zankel, Michal Simek,
	Tony Luck, linux-parisc, linux-m68k, linux-s390, Oleg Nesterov,
	Minchan Kim, Eric W. Biederman, linux-alpha, linux-fsdevel,
	Borislav Petkov, Andrew Morton, linux-api, linuxppc-dev,
	David S. Miller

On Fri, Jan 15, 2021 at 9:01 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: sonicadvance1@gmail.com
> > Sent: 15 January 2021 07:03
> > Problem presented:
> > A backwards compatibility layer that allows running x86-64 and x86
> > processes inside of an AArch64 process.
> >   - CPU is emulated
> >   - Syscall interface is mostly passthrough
> >   - Some syscalls require patching or emulation depending on behaviour
> >   - Not viable from the emulator design to use an AArch32 host process
> >
>
> You are going to need to add all the x86 compatibility code into
> your arm64 kernel.
> This is likely to be different from the 32bit arm compatibility
> because 64bit items are only aligned on 32bit boundaries.
> The x86 x32 compatibility will be more like the 32bit arm 'compat'
> code - I'm pretty sure arm32 64bit aligned 64bit data.

All other architectures that have both 32-bit and 64-bit variants
use the same alignment for all types, except for x86.

There are additional differences though, especially if one
were to try to generalize the interface to all architectures.
A subset of the issues includes

- x32 has 64-bit types in places of some types that are
  32 bit everywhere else (time_t, ino_t, off_t, clock_t, ...)

- m68k aligns struct members to at most 16 bits

- uid_t/gid_t/ino_t/dev_t/... are

> You'll then need to remember how the process entered the kernel
> to work out which compatibility code to invoke.
> This is what x86 does.
> It allows a single process to do all three types of system call.
>
> Trying to 'patch up' structures outside the kernel, or in the
> syscall interface code will always cause grief somewhere.
> The only sane place is in the code that uses the structures.
> Which, for ioctls, means inside the driver that parses them.

He's already doing the system call emulation for all the system
calls other than ioctl in user space though. In my experience,
there are actually fairly few ioctl commands that are different
between architectures -- most of them have no misaligned
or architecture-defined struct members at all.

Once you have conversion functions to deal with the 32/64-bit
interface differences and architecture specifics of sockets,
sysvipc, signals, stat, and input_event, handling the
x86-32 specific ioctl commands is comparably easy.

         Arnd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
  2021-01-15 22:17     ` Arnd Bergmann
@ 2021-01-15 22:23       ` Rich Felker
  2021-01-15 22:39       ` David Laight
  1 sibling, 0 replies; 6+ messages in thread
From: Rich Felker @ 2021-01-15 22:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-xtensa, Jan Kara, Miklos Szeredi, Dominik Brodowski,
	linux-kernel, James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, linux-ia64, Christian Brauner,
	Vincenzo Frascino, Will Deacon, linux-arch, Stephen Rothwell,
	Arnd Bergmann, Yoshinori Sato, linux-sh, Helge Deller, x86,
	YueHaibing, Russell King, Christian Borntraeger, Ingo Molnar,
	Geert Uytterhoeven, linux-arm-kernel, Catalin Marinas,
	Matt Turner, linux-mips, Fenghua Yu, Arnaldo Carvalho de Melo,
	Vasily Gorbik, sonicadvance1, Brian Gerst, Heiko Carstens,
	David Rientjes, Willem de Bruijn, Nicholas Piggin,
	Suren Baghdasaryan, Aleksa Sarai, Thomas Bogendoerfer,
	Ivan Kokshaysky, Alexander Viro, Andy Lutomirski,
	Thomas Gleixner, Xiaoming Ni, Vlastimil Babka, Richard Henderson,
	Chris Zankel, Michal Simek, Tony Luck, linux-parisc, linux-m68k,
	linux-s390, Oleg Nesterov, Minchan Kim, David Laight,
	Eric W. Biederman, linux-alpha, linux-fsdevel, Borislav Petkov,
	Andrew Morton, linux-api, linuxppc-dev, David S. Miller

On Fri, Jan 15, 2021 at 11:17:09PM +0100, Arnd Bergmann wrote:
> On Fri, Jan 15, 2021 at 9:01 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: sonicadvance1@gmail.com
> > > Sent: 15 January 2021 07:03
> > > Problem presented:
> > > A backwards compatibility layer that allows running x86-64 and x86
> > > processes inside of an AArch64 process.
> > >   - CPU is emulated
> > >   - Syscall interface is mostly passthrough
> > >   - Some syscalls require patching or emulation depending on behaviour
> > >   - Not viable from the emulator design to use an AArch32 host process
> > >
> >
> > You are going to need to add all the x86 compatibility code into
> > your arm64 kernel.
> > This is likely to be different from the 32bit arm compatibility
> > because 64bit items are only aligned on 32bit boundaries.
> > The x86 x32 compatibility will be more like the 32bit arm 'compat'
> > code - I'm pretty sure arm32 64bit aligned 64bit data.
> 
> All other architectures that have both 32-bit and 64-bit variants
> use the same alignment for all types, except for x86.
> 
> There are additional differences though, especially if one
> were to try to generalize the interface to all architectures.
> A subset of the issues includes
> 
> - x32 has 64-bit types in places of some types that are
>   32 bit everywhere else (time_t, ino_t, off_t, clock_t, ...)
> 
> - m68k aligns struct members to at most 16 bits
> 
> - uid_t/gid_t/ino_t/dev_t/... are
> 
> > You'll then need to remember how the process entered the kernel
> > to work out which compatibility code to invoke.
> > This is what x86 does.
> > It allows a single process to do all three types of system call.
> >
> > Trying to 'patch up' structures outside the kernel, or in the
> > syscall interface code will always cause grief somewhere.
> > The only sane place is in the code that uses the structures.
> > Which, for ioctls, means inside the driver that parses them.
> 
> He's already doing the system call emulation for all the system
> calls other than ioctl in user space though. In my experience,
> there are actually fairly few ioctl commands that are different
> between architectures -- most of them have no misaligned
> or architecture-defined struct members at all.
> 
> Once you have conversion functions to deal with the 32/64-bit
> interface differences and architecture specifics of sockets,
> sysvipc, signals, stat, and input_event, handling the
> x86-32 specific ioctl commands is comparably easy.

Indeed, all of this should just be done in userspace. Note (as you of
course know, but others on CC probably don't) that we did this in musl
libc for the sake of being able to run a time64 userspace on a
pre-time64 kernel, with translation from the new time64 ioctl
structures to the versions needed by the old ioctls and back using a
fairly simple table:

https://git.musl-libc.org/cgit/musl/tree/src/misc/ioctl.c?id=v1.2.2

I imagine there's a fair bit more to be done for 32-/64-bit mismatch
in size/long/pointer types and different alignments, but the problem
is almost certainly tractable, and much easier than what they already
have to be doing for syscalls.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers
  2021-01-15 22:17     ` Arnd Bergmann
  2021-01-15 22:23       ` Rich Felker
@ 2021-01-15 22:39       ` David Laight
  1 sibling, 0 replies; 6+ messages in thread
From: David Laight @ 2021-01-15 22:39 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-xtensa, Rich Felker, Jan Kara, Miklos Szeredi,
	Dominik Brodowski, linux-kernel, James E.J. Bottomley,
	Max Filippov, Paul Mackerras, H. Peter Anvin, sparclinux,
	linux-ia64, Christian Brauner, Vincenzo Frascino, Will Deacon,
	linux-arch, Stephen Rothwell, Arnd Bergmann, Yoshinori Sato,
	linux-sh, Helge Deller, x86, YueHaibing, Russell King,
	Christian Borntraeger, Ingo Molnar, Geert Uytterhoeven,
	linux-arm-kernel, Catalin Marinas, Matt Turner, linux-mips,
	Fenghua Yu, Arnaldo Carvalho de Melo, Vasily Gorbik,
	sonicadvance1, Brian Gerst, Heiko Carstens, David Rientjes,
	Willem de Bruijn, Nicholas Piggin, Suren Baghdasaryan,
	Aleksa Sarai, Thomas Bogendoerfer, Ivan Kokshaysky,
	Alexander Viro, Andy Lutomirski, Thomas Gleixner, Xiaoming Ni,
	Vlastimil Babka, Richard Henderson, Chris Zankel, Michal Simek,
	Tony Luck, linux-parisc, linux-m68k, linux-s390, Oleg Nesterov,
	Minchan Kim, Eric W. Biederman, linux-alpha, linux-fsdevel,
	Borislav Petkov, Andrew Morton, linux-api, linuxppc-dev,
	David S. Miller

...
> He's already doing the system call emulation for all the system
> calls other than ioctl in user space though. In my experience,
> there are actually fairly few ioctl commands that are different
> between architectures -- most of them have no misaligned
> or architecture-defined struct members at all.

Aren't there also some intractable issues with socket options?
IIRC the kernel code that tried to change them to 64bit was
horribly broken in some obscure cases.

Pushing the conversion down the stack not only identified the
issues, it also made them easier to fix.

If you change the kernel so a 64bit process can execute 32bit
system calls then a lot of the problems do go away.
This is probably easiest done by setting a high bit on the
system call number - as x86_64 does for x32 calls.

You still have to solve the different alignment of 64bit data
on i386.

Of course the system call numbers are different - but that is
just a lookup.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-01-16 22:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210106064807.253112-1-Sonicadvance1@gmail.com>
2021-01-15  7:02 ` [PATCH] Adds a new ioctl32 syscall for backwards compatibility layers sonicadvance1
2021-01-15  7:17   ` [PATCH v2] " sonicadvance1
2021-01-15 20:01   ` [PATCH] " David Laight
2021-01-15 22:17     ` Arnd Bergmann
2021-01-15 22:23       ` Rich Felker
2021-01-15 22:39       ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).