linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] x86: sigcontext fixes, again
@ 2015-10-26  1:25 Andy Lutomirski
  2015-10-26  1:25 ` [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs Andy Lutomirski
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-26  1:25 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Stas Sergeev, Andy Lutomirski

This is take 2 at fixing x86 64-bit signals wrt SS.  After a lot of
thought, this is not controlled by any flags -- I would much prefer
to avoid opt-in behavior.  Instead, it just tries hard to avoid
triggering the cases that break DOSEMU.

Stas, this now seems to pass the test you sent me.  It works with
stock dosemu2 (I haven't tested classic dosemu because I can't get it
to work regardless).  It also works with a patched dosemu2 that bypasses
the userspace trampoline:

https://github.com/amluto/dosemu2/commit/571b4d08dc885b7a133e444a2ad23e0d21366206

With this applied, all of the x86 selftests pass on x86_64.  That
wasn't the case before -- ldt_gdt_64 was broken.

This is a bit risky, and another option would be to do nothing at
all.  Then we'd disable the problematic self-tests (sigh), and
DOSEMU and similar tools will be stuck using gross hacks even on new
kernels.

Changes from v1:
 - Comment fixes
 - Fix screwed up uaccess that broke things

Andy Lutomirski (4):
  x86/signal/64: Add a comment about sigcontext->fs and gs
  x86/signal/64: Fix SS if needed when delivering a 64-bit signal
  x86/signal/64: Re-add support for SS in the 64-bit signal context
  selftests/x86: Add tests for UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SS

 arch/x86/include/asm/desc_defs.h        |  23 +++
 arch/x86/include/asm/sigcontext.h       |   2 +-
 arch/x86/include/asm/sighandling.h      |   1 -
 arch/x86/include/uapi/asm/sigcontext.h  |  23 ++-
 arch/x86/include/uapi/asm/ucontext.h    |  43 +++++-
 arch/x86/kernel/signal.c                | 114 ++++++++++++---
 tools/testing/selftests/x86/Makefile    |   4 +-
 tools/testing/selftests/x86/sigreturn.c | 240 ++++++++++++++++++++++++++++----
 8 files changed, 391 insertions(+), 59 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs
  2015-10-26  1:25 [PATCH v2 0/4] x86: sigcontext fixes, again Andy Lutomirski
@ 2015-10-26  1:25 ` Andy Lutomirski
  2015-10-31 15:25   ` Stas Sergeev
  2015-10-26  1:25 ` [PATCH v2 2/4] x86/signal/64: Fix SS if needed when delivering a 64-bit signal Andy Lutomirski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-26  1:25 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Stas Sergeev, Andy Lutomirski

These fields have a strange history.  This tries to document it.

This borrows from 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext"), which was reverted by ed596cde9425 ("Revert x86
sigcontext cleanups").

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/uapi/asm/sigcontext.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 40836a9a7250..827c6ed0e26a 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -177,6 +177,24 @@ struct sigcontext {
 	__u64 rip;
 	__u64 eflags;		/* RFLAGS */
 	__u16 cs;
+
+	/*
+	 * Prior to 2.5.64 ("[PATCH] x86-64 updates for 2.5.64-bk3"),
+	 * Linux saved and restored fs and gs in these slots.  This
+	 * was counterproductive, as fsbase and gsbase were never
+	 * saved, so arch_prctl was presumably unreliable.
+	 *
+	 * If these slots are ever needed for any other purpose, there
+	 * is some risk that very old 64-bit binaries could get
+	 * confused.  I doubt that many such binaries still work,
+	 * though, since the same patch in 2.5.64 also removed the
+	 * 64-bit set_thread_area syscall, so it appears that there is
+	 * no TLS API that works in both pre- and post-2.5.64 kernels.
+	 *
+	 * There is at least one additional concern if these slots are
+	 * recycled for another purpose: some DOSEMU versions stash fs
+	 * and gs in these slots manually.
+	 */
 	__u16 gs;
 	__u16 fs;
 	__u16 __pad0;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/4] x86/signal/64: Fix SS if needed when delivering a 64-bit signal
  2015-10-26  1:25 [PATCH v2 0/4] x86: sigcontext fixes, again Andy Lutomirski
  2015-10-26  1:25 ` [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs Andy Lutomirski
@ 2015-10-26  1:25 ` Andy Lutomirski
  2015-10-26  1:25 ` [PATCH v2 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context Andy Lutomirski
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-26  1:25 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Stas Sergeev, Andy Lutomirski

Signals are always delivered to 64-bit tasks with CS set to a long
mode segment.  In long mode, SS doesn't matter, as long as it's a
present writable segment.

If SS starts out invalid (this can happen if the signal was caused
by an IRET fault or was delivered on the way out of set_thread_area
or modify_ldt), then IRET to the signal handler can fail, eventually
killing the task.

The straightforward fix would be to simply reset SS when delivering
a signal.  That breaks DOSEMU, though: 64-bit builds of DOSEMU rely
on SS being set to the faulting SS when signals are delivered.

As a compromise, this patch leaves SS alone so long as it's valid.

The net effect should be that the behavior of successfully delivered
signals is unchanged.  Some signals that would previously have
failed to be delivered will now be delivered successfully.

This has no effect for x32 or 32-bit tasks: their signal handlers
were already called with SS == __USER_DS.

(On Xen, there's a slight hole: if a task sets SS to a writable
 *kernel* data segment, then we will fail to identify it as invalid
 and we'll still kill the task.  If anyone cares, this could be fixed
 with a new paravirt hook.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc_defs.h | 23 ++++++++++++++++++
 arch/x86/kernel/signal.c         | 51 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
index 278441f39856..00971705a16d 100644
--- a/arch/x86/include/asm/desc_defs.h
+++ b/arch/x86/include/asm/desc_defs.h
@@ -98,4 +98,27 @@ struct desc_ptr {
 
 #endif /* !__ASSEMBLY__ */
 
+/* Access rights as returned by LAR */
+#define AR_TYPE_RODATA		(0 * (1 << 9))
+#define AR_TYPE_RWDATA		(1 * (1 << 9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1 << 9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1 << 9))
+#define AR_TYPE_XOCODE		(4 * (1 << 9))
+#define AR_TYPE_XRCODE		(5 * (1 << 9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1 << 9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1 << 9))
+#define AR_TYPE_MASK		(7 * (1 << 9))
+
+#define AR_DPL0			(0 * (1 << 13))
+#define AR_DPL3			(3 * (1 << 13))
+#define AR_DPL_MASK		(3 * (1 << 13))
+
+#define AR_A			(1 << 8)	/* A means "accessed" */
+#define AR_S			(1 << 12)	/* S means "not system" */
+#define AR_P			(1 << 15)	/* P means "present" */
+#define AR_AVL			(1 << 20)	/* AVL does nothing */
+#define AR_L			(1 << 21)	/* L means "long mode" */
+#define AR_DB			(1 << 22)	/* D or B, depending on type */
+#define AR_G			(1 << 23)	/* G means "limit in pages" */
+
 #endif /* _ASM_X86_DESC_DEFS_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index d87ce92d3404..ca8975096728 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -61,6 +61,35 @@
 	regs->seg = GET_SEG(seg) | 3;			\
 } while (0)
 
+#ifdef CONFIG_X86_64
+/*
+ * If regs->ss will cause an IRET fault, change it.  Otherwise leave it
+ * alone.  Using this generally makes no sense unless
+ * user_64bit_mode(regs) would return true.
+ */
+static void force_valid_ss(struct pt_regs *regs)
+{
+	u32 ar;
+	asm volatile ("lar %[old_ss], %[ar]\n\t"
+		      "jz 1f\n\t"		/* If invalid: */
+		      "xorl %[ar], %[ar]\n\t"	/* set ar = 0 */
+		      "1:"
+		      : [ar] "=r" (ar)
+		      : [old_ss] "rm" ((u16)regs->ss));
+
+	/*
+	 * For a valid 64-bit user context, we need DPL 3, type
+	 * read-write data or read-write exp-down data, and S and P
+	 * set.  We can't use VERW because VERW doesn't check the
+	 * P bit.
+	 */
+	ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;
+	if (ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) &&
+	    ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN))
+		regs->ss = __USER_DS;
+}
+#endif
+
 int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
 {
 	void __user *buf;
@@ -457,10 +486,28 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 
 	regs->sp = (unsigned long)frame;
 
-	/* Set up the CS register to run signal handlers in 64-bit mode,
-	   even if the handler happens to be interrupting 32-bit code. */
+	/*
+	 * Set up the CS and SS registers to run signal handlers in
+	 * 64-bit mode, even if the handler happens to be interrupting
+	 * 32-bit or 16-bit code.
+	 *
+	 * SS is subtle.  In 64-bit mode, we don't need any particular
+	 * SS descriptor, but we do need SS to be valid.  It's possible
+	 * that the old SS is entirely bogus -- this can happen if the
+	 * signal we're trying to deliver is #GP or #SS caused by a bad
+	 * SS value.  We also have a compatbility issue here: DOSEMU
+	 * relies on the contents of the SS register indicating the
+	 * SS value at the time of the signal, even though that code in
+	 * DOSEMU predates sigreturn's ability to restore SS.  (DOSEMU
+	 * avoids relying on sigreturn to restore SS; instead it uses
+	 * a trampoline.)  So we do our best: if the old SS was valid,
+	 * we keep it.  Otherwise we replace it.
+	 */
 	regs->cs = __USER_CS;
 
+	if (unlikely(regs->ss != __USER_DS))
+		force_valid_ss(regs);
+
 	return 0;
 }
 #endif /* CONFIG_X86_32 */
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context
  2015-10-26  1:25 [PATCH v2 0/4] x86: sigcontext fixes, again Andy Lutomirski
  2015-10-26  1:25 ` [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs Andy Lutomirski
  2015-10-26  1:25 ` [PATCH v2 2/4] x86/signal/64: Fix SS if needed when delivering a 64-bit signal Andy Lutomirski
@ 2015-10-26  1:25 ` Andy Lutomirski
  2015-10-31 15:18   ` Stas Sergeev
  2015-10-26  1:25 ` [PATCH v2 4/4] selftests/x86: Add tests for UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SS Andy Lutomirski
  2015-10-26 11:45 ` [PATCH v2 0/4] x86: sigcontext fixes, again Stas Sergeev
  4 siblings, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-26  1:25 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Stas Sergeev, Andy Lutomirski, Cyrill Gorcunov, Pavel Emelyanov

This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").

This adds two new uc_flags flags.  UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32).  It indicates that the saved SS
field is valid and that the kernel supports the new behavior.

The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).

This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to load or even restore SS:ESP.
(DOSEMU has a monstrous hack to partially work around this.)

If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.

Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.

This patch is a bit more complicated, and it tries to balance a
bunch of goals.  It makes signal delivery due to a bogus SS reliable
without having to set any flags (64-bit signals will always be
delivered to a valid SS), and it makes most cases of changing
ucontext->ss during signal handling work as expected.

I do this by special-casing the interesting case.  On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).

For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.

To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.

The nitty-gritty details are documented in the header file.

Cc: Stas Sergeev <stsp@list.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/sigcontext.h      |  2 +-
 arch/x86/include/asm/sighandling.h     |  1 -
 arch/x86/include/uapi/asm/sigcontext.h |  5 ++-
 arch/x86/include/uapi/asm/ucontext.h   | 43 ++++++++++++++++++++---
 arch/x86/kernel/signal.c               | 63 ++++++++++++++++++++++++----------
 tools/testing/selftests/x86/Makefile   |  4 +--
 6 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/sigcontext.h b/arch/x86/include/asm/sigcontext.h
index 9dfce4e0417d..9789402b99b6 100644
--- a/arch/x86/include/asm/sigcontext.h
+++ b/arch/x86/include/asm/sigcontext.h
@@ -59,7 +59,7 @@ struct sigcontext {
 	unsigned short cs;
 	unsigned short gs;
 	unsigned short fs;
-	unsigned short __pad0;
+	unsigned short ss;	/* Only restored if UC_RESTORE_SS is set. */
 	unsigned long err;
 	unsigned long trapno;
 	unsigned long oldmask;
diff --git a/arch/x86/include/asm/sighandling.h b/arch/x86/include/asm/sighandling.h
index 89db46752a8f..452c88b8ad06 100644
--- a/arch/x86/include/asm/sighandling.h
+++ b/arch/x86/include/asm/sighandling.h
@@ -13,7 +13,6 @@
 			 X86_EFLAGS_CF | X86_EFLAGS_RF)
 
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where);
-int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc);
 int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 		     struct pt_regs *regs, unsigned long mask);
 
diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 827c6ed0e26a..ad02ae519179 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -197,7 +197,10 @@ struct sigcontext {
 	 */
 	__u16 gs;
 	__u16 fs;
-	__u16 __pad0;
+	union {
+		__u16 ss;	/* If UC_SAVED_SS */
+		__u16 __pad0;	/* If !UC_SAVED_SS */
+	};
 	__u64 err;
 	__u64 trapno;
 	__u64 oldmask;
diff --git a/arch/x86/include/uapi/asm/ucontext.h b/arch/x86/include/uapi/asm/ucontext.h
index b7c29c8017f2..a5c1718ce65b 100644
--- a/arch/x86/include/uapi/asm/ucontext.h
+++ b/arch/x86/include/uapi/asm/ucontext.h
@@ -1,11 +1,44 @@
 #ifndef _ASM_X86_UCONTEXT_H
 #define _ASM_X86_UCONTEXT_H
 
-#define UC_FP_XSTATE	0x1	/* indicates the presence of extended state
-				 * information in the memory layout pointed
-				 * by the fpstate pointer in the ucontext's
-				 * sigcontext struct (uc_mcontext).
-				 */
+/*
+ * indicates the presence of extended state
+ * information in the memory layout pointed
+ * by the fpstate pointer in the ucontext's
+ * sigcontext struct (uc_mcontext).
+ */
+#define UC_FP_XSTATE	0x1
+
+#ifdef __x86_64__
+/*
+ * UC_SIGCONTEXT_SS will be set when delivering 64-bit or x32 signals on
+ * kernels that save SS in the sigcontext.  All kernels that set
+ * UC_SIGCONTEXT_SS will correctly restore at least the low 32 bits of esp
+ * regardless of SS (i.e. they implement espfix).
+ *
+ * Kernels that set UC_SIGCONTEXT_SS will also set UC_STRICT_RESTORE_SS
+ * when delivering a signal that came from 64-bit code.
+ *
+ * Sigreturn modifies its behavior depending on the
+ * UC_STRICT_RESTORE_SS flag.  If UC_STRICT_RESTORE_SS is set, or if
+ * the CS value in the signal context does not refer to a 64-bit
+ * code segment, then the SS value in the signal context is restored
+ * verbatim.  If UC_STRICT_RESTORE_SS is not set, the CS value in
+ * the signal context refers to a 64-bit code segment, and the
+ * signal context's SS value is invalid, then SS it will be replaced
+ * with a flat 32-bit selector.
+
+ * This behavior serves two purposes.  It ensures that older programs
+ * that are unaware of the signal context's SS slot and either construct
+ * a signal context from scratch or that catch signals from segmented
+ * contexts and change CS to a 64-bit selector won't crash due to a bad
+ * SS value.  It also ensures that signal handlers that do not modify
+ * the signal context at all return back to the exact CS and SS state
+ * that they came from.
+ */
+#define UC_SIGCONTEXT_SS	0x2
+#define UC_STRICT_RESTORE_SS	0x4
+#endif
 
 #include <asm-generic/ucontext.h>
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index ca8975096728..24c92cc188d0 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -90,7 +90,9 @@ static void force_valid_ss(struct pt_regs *regs)
 }
 #endif
 
-int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
+static int restore_sigcontext(struct pt_regs *regs,
+			      struct sigcontext __user *sc,
+			      unsigned long uc_flags)
 {
 	void __user *buf;
 	unsigned int tmpflags;
@@ -122,15 +124,18 @@ int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
 		COPY(r15);
 #endif /* CONFIG_X86_64 */
 
-#ifdef CONFIG_X86_32
 		COPY_SEG_CPL3(cs);
 		COPY_SEG_CPL3(ss);
-#else /* !CONFIG_X86_32 */
-		/* Kernel saves and restores only the CS segment register on signals,
-		 * which is the bare minimum needed to allow mixed 32/64-bit code.
-		 * App's signal handler can save/restore other segments if needed. */
-		COPY_SEG_CPL3(cs);
-#endif /* CONFIG_X86_32 */
+
+#ifdef CONFIG_X86_64
+		/*
+		 * Fix up SS if needed for the benefit of old DOSEMU and
+		 * CRIU.
+		 */
+		if (unlikely(!(uc_flags & UC_STRICT_RESTORE_SS) &&
+			     user_64bit_mode(regs)))
+			force_valid_ss(regs);
+#endif
 
 		get_user_ex(tmpflags, &sc->flags);
 		regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
@@ -192,6 +197,7 @@ int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 		put_user_ex(regs->cs, &sc->cs);
 		put_user_ex(0, &sc->gs);
 		put_user_ex(0, &sc->fs);
+		put_user_ex(regs->ss, &sc->ss);
 #endif /* CONFIG_X86_32 */
 
 		put_user_ex(fpstate, &sc->fpstate);
@@ -430,6 +436,21 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 	return 0;
 }
 #else /* !CONFIG_X86_32 */
+static unsigned long frame_uc_flags(struct pt_regs *regs)
+{
+	unsigned long flags;
+
+	if (cpu_has_xsave)
+		flags = UC_FP_XSTATE | UC_SIGCONTEXT_SS;
+	else
+		flags = UC_SIGCONTEXT_SS;
+
+	if (likely(user_64bit_mode(regs)))
+		flags |= UC_STRICT_RESTORE_SS;
+
+	return flags;
+}
+
 static int __setup_rt_frame(int sig, struct ksignal *ksig,
 			    sigset_t *set, struct pt_regs *regs)
 {
@@ -449,10 +470,7 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 
 	put_user_try {
 		/* Create the ucontext.  */
-		if (cpu_has_xsave)
-			put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
-		else
-			put_user_ex(0, &frame->uc.uc_flags);
+		put_user_ex(frame_uc_flags(regs), &frame->uc.uc_flags);
 		put_user_ex(0, &frame->uc.uc_link);
 		save_altstack_ex(&frame->uc.uc_stack, regs->sp);
 
@@ -534,10 +552,7 @@ static int x32_setup_rt_frame(struct ksignal *ksig,
 
 	put_user_try {
 		/* Create the ucontext.  */
-		if (cpu_has_xsave)
-			put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
-		else
-			put_user_ex(0, &frame->uc.uc_flags);
+		put_user_ex(frame_uc_flags(regs), &frame->uc.uc_flags);
 		put_user_ex(0, &frame->uc.uc_link);
 		compat_save_altstack_ex(&frame->uc.uc_stack, regs->sp);
 		put_user_ex(0, &frame->uc.uc__pad0);
@@ -599,7 +614,11 @@ asmlinkage unsigned long sys_sigreturn(void)
 
 	set_current_blocked(&set);
 
-	if (restore_sigcontext(regs, &frame->sc))
+	/*
+	 * x86_32 has no uc_flags bits relevant to restore_sigcontext.
+	 * Save a few cycles by skipping the __get_user.
+	 */
+	if (restore_sigcontext(regs, &frame->sc, 0))
 		goto badframe;
 	return regs->ax;
 
@@ -615,16 +634,19 @@ asmlinkage long sys_rt_sigreturn(void)
 	struct pt_regs *regs = current_pt_regs();
 	struct rt_sigframe __user *frame;
 	sigset_t set;
+	unsigned long uc_flags;
 
 	frame = (struct rt_sigframe __user *)(regs->sp - sizeof(long));
 	if (!access_ok(VERIFY_READ, frame, sizeof(*frame)))
 		goto badframe;
 	if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
 		goto badframe;
+	if (__get_user(uc_flags, &frame->uc.uc_flags))
+		goto badframe;
 
 	set_current_blocked(&set);
 
-	if (restore_sigcontext(regs, &frame->uc.uc_mcontext))
+	if (restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
 		goto badframe;
 
 	if (restore_altstack(&frame->uc.uc_stack))
@@ -805,6 +827,7 @@ asmlinkage long sys32_x32_rt_sigreturn(void)
 	struct pt_regs *regs = current_pt_regs();
 	struct rt_sigframe_x32 __user *frame;
 	sigset_t set;
+	unsigned long uc_flags;
 
 	frame = (struct rt_sigframe_x32 __user *)(regs->sp - 8);
 
@@ -812,10 +835,12 @@ asmlinkage long sys32_x32_rt_sigreturn(void)
 		goto badframe;
 	if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
 		goto badframe;
+	if (__get_user(uc_flags, &frame->uc.uc_flags))
+		goto badframe;
 
 	set_current_blocked(&set);
 
-	if (restore_sigcontext(regs, &frame->uc.uc_mcontext))
+	if (restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
 		goto badframe;
 
 	if (compat_restore_altstack(&frame->uc.uc_stack))
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 389701f59940..c2221786d835 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,8 +4,8 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs ldt_gdt syscall_nt ptrace_syscall
-TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall_vdso unwind_vdso
+TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs ldt_gdt syscall_nt ptrace_syscall sigreturn
+TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 BINARIES_32 := $(TARGETS_C_32BIT_ALL:%=%_32)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/4] selftests/x86: Add tests for UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SS
  2015-10-26  1:25 [PATCH v2 0/4] x86: sigcontext fixes, again Andy Lutomirski
                   ` (2 preceding siblings ...)
  2015-10-26  1:25 ` [PATCH v2 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context Andy Lutomirski
@ 2015-10-26  1:25 ` Andy Lutomirski
  2015-10-26 11:45 ` [PATCH v2 0/4] x86: sigcontext fixes, again Stas Sergeev
  4 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-26  1:25 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Stas Sergeev, Andy Lutomirski

This tests the two ABI-preserving cases that DOSEMU cares about, and
it also explicitly tests the new UC_SIGCONTEXT_SS and
UC_STRICT_RESTORE_SS flags.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/sigreturn.c | 240 ++++++++++++++++++++++++++++----
 1 file changed, 212 insertions(+), 28 deletions(-)

diff --git a/tools/testing/selftests/x86/sigreturn.c b/tools/testing/selftests/x86/sigreturn.c
index b5aa1bab7416..43e840470e32 100644
--- a/tools/testing/selftests/x86/sigreturn.c
+++ b/tools/testing/selftests/x86/sigreturn.c
@@ -55,6 +55,47 @@
 #include <sys/user.h>
 
 /*
+ * Copied from asm/ucontext.h, as asm/ucontext.h conflicts badly with the glibc
+ * headers.
+ */
+#ifdef __x86_64__
+/*
+ * UC_SAVED_SS will be set when delivering 64-bit or x32 signals on
+ * kernels that save SS in the sigcontext.  Kernels that set UC_SAVED_SS
+ * allow signal handlers to set UC_RESTORE_SS; if UC_RESTORE_SS is set,
+ * then sigreturn will restore SS.
+ *
+ * For compatibility with old programs, the kernel will *not* set
+ * UC_RESTORE_SS when delivering signals.
+ */
+#define UC_SIGCONTEXT_SS       0x2
+#define UC_STRICT_RESTORE_SS   0x4
+#endif
+
+/* Access rights as returned by LAR */
+#define AR_TYPE_RODATA		(0 * (1 << 9))
+#define AR_TYPE_RWDATA		(1 * (1 << 9))
+#define AR_TYPE_RODATA_EXPDOWN	(2 * (1 << 9))
+#define AR_TYPE_RWDATA_EXPDOWN	(3 * (1 << 9))
+#define AR_TYPE_XOCODE		(4 * (1 << 9))
+#define AR_TYPE_XRCODE		(5 * (1 << 9))
+#define AR_TYPE_XOCODE_CONF	(6 * (1 << 9))
+#define AR_TYPE_XRCODE_CONF	(7 * (1 << 9))
+#define AR_TYPE_MASK		(7 * (1 << 9))
+
+#define AR_DPL0			(0 * (1 << 13))
+#define AR_DPL3			(3 * (1 << 13))
+#define AR_DPL_MASK		(3 * (1 << 13))
+
+#define AR_A			(1 << 8)	/* A means "accessed" */
+#define AR_S			(1 << 12)	/* S means "not system" */
+#define AR_P			(1 << 15)	/* P means "present" */
+#define AR_AVL			(1 << 20)	/* AVL does nothing */
+#define AR_L			(1 << 21)	/* L means "long mode" */
+#define AR_DB			(1 << 22)	/* D or B, depending on type */
+#define AR_G			(1 << 23)	/* G means "limit in pages" */
+
+/*
  * In principle, this test can run on Linux emulation layers (e.g.
  * Illumos "LX branded zones").  Solaris-based kernels reserve LDT
  * entries 0-5 for their own internal purposes, so start our LDT
@@ -267,6 +308,9 @@ static gregset_t initial_regs, requested_regs, resulting_regs;
 /* Instructions for the SIGUSR1 handler. */
 static volatile unsigned short sig_cs, sig_ss;
 static volatile sig_atomic_t sig_trapped, sig_err, sig_trapno;
+#ifdef __x86_64__
+static volatile sig_atomic_t sig_corrupt_final_ss;
+#endif
 
 /* Abstractions for some 32-bit vs 64-bit differences. */
 #ifdef __x86_64__
@@ -305,9 +349,105 @@ static greg_t *csptr(ucontext_t *ctx)
 }
 #endif
 
+/*
+ * Checks a given selector for its code bitness or returns -1 if it's not
+ * a usable code segment selector.
+ */
+int cs_bitness(unsigned short cs)
+{
+	uint32_t valid = 0, ar;
+	asm ("lar %[cs], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "mov $1, %[valid]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [valid] "+rm" (valid)
+	     : [cs] "r" (cs));
+
+	if (!valid)
+		return -1;
+
+	bool db = (ar & (1 << 22));
+	bool l = (ar & (1 << 21));
+
+	if (!(ar & (1<<11)))
+	    return -1;	/* Not code. */
+
+	if (l && !db)
+		return 64;
+	else if (!l && db)
+		return 32;
+	else if (!l && !db)
+		return 16;
+	else
+		return -1;	/* Unknown bitness. */
+}
+
+/*
+ * Checks a given selector for its code bitness or returns -1 if it's not
+ * a usable code segment selector.
+ */
+bool is_valid_ss(unsigned short cs)
+{
+	uint32_t valid = 0, ar;
+	asm ("lar %[cs], %[ar]\n\t"
+	     "jnz 1f\n\t"
+	     "mov $1, %[valid]\n\t"
+	     "1:"
+	     : [ar] "=r" (ar), [valid] "+rm" (valid)
+	     : [cs] "r" (cs));
+
+	if (!valid)
+		return false;
+
+	if ((ar & AR_TYPE_MASK) != AR_TYPE_RWDATA &&
+	    (ar & AR_TYPE_MASK) != AR_TYPE_RWDATA_EXPDOWN)
+		return false;
+
+	return (ar & AR_P);
+}
+
 /* Number of errors in the current test case. */
 static volatile sig_atomic_t nerrs;
 
+static void validate_signal_ss(int sig, ucontext_t *ctx)
+{
+#ifdef __x86_64__
+	bool was_64bit = (cs_bitness(*csptr(ctx)) == 64);
+
+	if (!(ctx->uc_flags & UC_SIGCONTEXT_SS)) {
+		printf("[FAIL]\tUC_SIGCONTEXT_SS was not set\n");
+		nerrs++;
+
+		/*
+		 * This happens on Linux 4.1.  The rest will fail, too, so
+		 * return now to reduce the noise.
+		 */
+		return;
+	}
+
+	/* UC_STRICT_RESTORE_SS is set iff we came from 64-bit mode. */
+	if (!!(ctx->uc_flags & UC_STRICT_RESTORE_SS) != was_64bit) {
+		printf("[FAIL]\tUC_STRICT_RESTORE_SS was wrong in signal %d\n",
+		       sig);
+		nerrs++;
+	}
+
+	if (is_valid_ss(*ssptr(ctx))) {
+		/*
+		 * DOSEMU was written before 64-bit sigcontext had SS, and
+		 * it tries to figure out the signal source SS by looking at
+		 * the physical register.  Make sure that keeps working.
+		 */
+		unsigned short hw_ss;
+		asm ("mov %%ss, %0" : "=rm" (hw_ss));
+		if (hw_ss != *ssptr(ctx)) {
+			printf("[FAIL]\tHW SS didn't match saved SS\n");
+			nerrs++;
+		}
+	}
+#endif
+}
+
 /*
  * SIGUSR1 handler.  Sets CS and SS as requested and points IP to the
  * int3 trampoline.  Sets SP to a large known value so that we can see
@@ -317,6 +457,8 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 {
 	ucontext_t *ctx = (ucontext_t*)ctx_void;
 
+	validate_signal_ss(sig, ctx);
+
 	memcpy(&initial_regs, &ctx->uc_mcontext.gregs, sizeof(gregset_t));
 
 	*csptr(ctx) = sig_cs;
@@ -334,13 +476,16 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 }
 
 /*
- * Called after a successful sigreturn.  Restores our state so that
- * the original raise(SIGUSR1) returns.
+ * Called after a successful sigreturn (via int3) or from a failed
+ * sigreturn (directly by kernel).  Restores our state so that the
+ * original raise(SIGUSR1) returns.
  */
 static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
 {
 	ucontext_t *ctx = (ucontext_t*)ctx_void;
 
+	validate_signal_ss(sig, ctx);
+
 	sig_err = ctx->uc_mcontext.gregs[REG_ERR];
 	sig_trapno = ctx->uc_mcontext.gregs[REG_TRAPNO];
 
@@ -358,41 +503,62 @@ static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
 	memcpy(&resulting_regs, &ctx->uc_mcontext.gregs, sizeof(gregset_t));
 	memcpy(&ctx->uc_mcontext.gregs, &initial_regs, sizeof(gregset_t));
 
+#ifdef __x86_64__
+	if (sig_corrupt_final_ss) {
+		if (ctx->uc_flags & UC_STRICT_RESTORE_SS) {
+			printf("[FAIL]\tUC_STRICT_RESTORE_SS was set inappropriately\n");
+			nerrs++;
+		} else {
+			/*
+			 * DOSEMU transitions from 32-bit to 64-bit mode by
+			 * adjusting sigcontext, and it requires that this work
+			 * even if the saved SS is bogus.
+			 */
+			printf("\tCorrupting SS on return to 64-bit mode\n");
+			*ssptr(ctx) = 0;
+		}
+	}
+#endif
+
 	sig_trapped = sig;
 }
 
-/*
- * Checks a given selector for its code bitness or returns -1 if it's not
- * a usable code segment selector.
- */
-int cs_bitness(unsigned short cs)
+#ifdef __x86_64__
+/* Tests recovery if !UC_STRICT_RESTORE_SS */
+static void sigusr2(int sig, siginfo_t *info, void *ctx_void)
 {
-	uint32_t valid = 0, ar;
-	asm ("lar %[cs], %[ar]\n\t"
-	     "jnz 1f\n\t"
-	     "mov $1, %[valid]\n\t"
-	     "1:"
-	     : [ar] "=r" (ar), [valid] "+rm" (valid)
-	     : [cs] "r" (cs));
+	ucontext_t *ctx = (ucontext_t*)ctx_void;
 
-	if (!valid)
-		return -1;
+	if (!(ctx->uc_flags & UC_STRICT_RESTORE_SS)) {
+		printf("[FAIL]\traise(2) didn't set UC_STRICT_RESTORE_SS\n");
+		nerrs++;
+		return;  /* We can't do the rest. */
+	}
 
-	bool db = (ar & (1 << 22));
-	bool l = (ar & (1 << 21));
+	ctx->uc_flags &= ~UC_STRICT_RESTORE_SS;
+	*ssptr(ctx) = 0;
 
-	if (!(ar & (1<<11)))
-	    return -1;	/* Not code. */
+	/* Return.  The kernel should recover without sending another signal. */
+}
 
-	if (l && !db)
-		return 64;
-	else if (!l && db)
-		return 32;
-	else if (!l && !db)
-		return 16;
-	else
-		return -1;	/* Unknown bitness. */
+static int test_nonstrict_ss(void)
+{
+	clearhandler(SIGUSR1);
+	clearhandler(SIGTRAP);
+	clearhandler(SIGSEGV);
+	clearhandler(SIGILL);
+	sethandler(SIGUSR2, sigusr2, 0);
+
+	nerrs = 0;
+
+	printf("[RUN]\tClear UC_STRICT_RESTORE_SS and corrupt SS\n");
+	raise(SIGUSR2);
+	if (!nerrs)
+		printf("[OK]\tIt worked\n");
+
+	return nerrs;
 }
+#endif
 
 /* Finds a usable code segment of the requested bitness. */
 int find_cs(int bitness)
@@ -576,6 +742,12 @@ static int test_bad_iret(int cs_bits, unsigned short ss, int force_cs)
 		       errdesc, strsignal(sig_trapped));
 		return 0;
 	} else {
+		/*
+		 * This also implicitly tests UC_STRICT_RESTORE_SS:
+		 * We check that these signals set UC_STRICT_RESTORE_SS and,
+		 * if UC_STRICT_RESTORE_SS doesn't cause strict behavior,
+		 * then we won't get SIGSEGV.
+		 */
 		printf("[FAIL]\tDid not get SIGSEGV\n");
 		return 1;
 	}
@@ -632,6 +804,14 @@ int main()
 						    GDT3(gdt_data16_idx));
 	}
 
+#ifdef __x86_64__
+	/* Nasty ABI case: check SS corruption handling. */
+	sig_corrupt_final_ss = 1;
+	total_nerrs += test_valid_sigreturn(32, false, -1);
+	total_nerrs += test_valid_sigreturn(32, true, -1);
+	sig_corrupt_final_ss = 0;
+#endif
+
 	/*
 	 * We're done testing valid sigreturn cases.  Now we test states
 	 * for which sigreturn itself will succeed but the subsequent
@@ -680,5 +860,9 @@ int main()
 	if (gdt_npdata32_idx)
 		test_bad_iret(32, GDT3(gdt_npdata32_idx), -1);
 
+#ifdef __x86_64__
+	total_nerrs += test_nonstrict_ss();
+#endif
+
 	return total_nerrs ? 1 : 0;
 }
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-26  1:25 [PATCH v2 0/4] x86: sigcontext fixes, again Andy Lutomirski
                   ` (3 preceding siblings ...)
  2015-10-26  1:25 ` [PATCH v2 4/4] selftests/x86: Add tests for UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SS Andy Lutomirski
@ 2015-10-26 11:45 ` Stas Sergeev
  2015-10-27  0:52   ` Andy Lutomirski
  4 siblings, 1 reply; 22+ messages in thread
From: Stas Sergeev @ 2015-10-26 11:45 UTC (permalink / raw)
  To: Andy Lutomirski, x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Stas Sergeev

26.10.2015 04:25, Andy Lutomirski пишет:
> This is take 2 at fixing x86 64-bit signals wrt SS.  After a lot of
> thought, this is not controlled by any flags -- I would much prefer
> to avoid opt-in behavior.  Instead, it just tries hard to avoid
> triggering the cases that break DOSEMU.
> 
> Stas, this now seems to pass the test you sent me.  It works with
> stock dosemu2 (I haven't tested classic dosemu because I can't get it
> to work regardless).
I'll test it myself then.
But this will have to wait till a week-end I am afraid.
In a mean time you can test vm86() - last time I tried,
I got oops and hard lockup.

> It also works with a patched dosemu2 that bypasses
> the userspace trampoline:
> 
> https://github.com/amluto/dosemu2/commit/571b4d08dc885b7a133e444a2ad23e0d21366206
> 
> With this applied, all of the x86 selftests pass on x86_64.  That
> wasn't the case before -- ldt_gdt_64 was broken.
> 
> This is a bit risky, and another option would be to do nothing at
> all.  Then we'd disable the problematic self-tests (sigh), and
No, another option is a new SA_ flag.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-26 11:45 ` [PATCH v2 0/4] x86: sigcontext fixes, again Stas Sergeev
@ 2015-10-27  0:52   ` Andy Lutomirski
  2015-10-27 14:05     ` Stas Sergeev
  0 siblings, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-27  0:52 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst,
	Denys Vlasenko, Linus Torvalds, Borislav Petkov, Stas Sergeev

On Mon, Oct 26, 2015 at 4:45 AM, Stas Sergeev <stsp@list.ru> wrote:
> 26.10.2015 04:25, Andy Lutomirski пишет:
>> This is take 2 at fixing x86 64-bit signals wrt SS.  After a lot of
>> thought, this is not controlled by any flags -- I would much prefer
>> to avoid opt-in behavior.  Instead, it just tries hard to avoid
>> triggering the cases that break DOSEMU.
>>
>> Stas, this now seems to pass the test you sent me.  It works with
>> stock dosemu2 (I haven't tested classic dosemu because I can't get it
>> to work regardless).
> I'll test it myself then.
> But this will have to wait till a week-end I am afraid.
> In a mean time you can test vm86() - last time I tried,
> I got oops and hard lockup.

Can you tell me exactly what kernel version (release by Linus or
commit hash) oopses and, if it's easy, post a screenshot or log?

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-27  0:52   ` Andy Lutomirski
@ 2015-10-27 14:05     ` Stas Sergeev
  2015-10-27 22:37       ` Linus Torvalds
  2015-10-30 23:50       ` Andy Lutomirski
  0 siblings, 2 replies; 22+ messages in thread
From: Stas Sergeev @ 2015-10-27 14:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst,
	Denys Vlasenko, Linus Torvalds, Borislav Petkov, Stas Sergeev

[-- Attachment #1: Type: text/plain, Size: 1316 bytes --]

27.10.2015 03:52, Andy Lutomirski пишет:
> On Mon, Oct 26, 2015 at 4:45 AM, Stas Sergeev <stsp@list.ru> wrote:
>> 26.10.2015 04:25, Andy Lutomirski пишет:
>>> This is take 2 at fixing x86 64-bit signals wrt SS.  After a lot of
>>> thought, this is not controlled by any flags -- I would much prefer
>>> to avoid opt-in behavior.  Instead, it just tries hard to avoid
>>> triggering the cases that break DOSEMU.
>>>
>>> Stas, this now seems to pass the test you sent me.  It works with
>>> stock dosemu2 (I haven't tested classic dosemu because I can't get it
>>> to work regardless).
>> I'll test it myself then.
>> But this will have to wait till a week-end I am afraid.
>> In a mean time you can test vm86() - last time I tried,
>> I got oops and hard lockup.
> 
> Can you tell me exactly what kernel version (release by Linus or
> commit hash) oopses and, if it's easy, post a screenshot or log?
I archived my config and git hash.
I can't easily post an Oops: under X it doesn't even appear -
machine freezes immediately, and under non-KMS console it is
possible to get one, but difficult to screen-shot (using bare
metal, not VM). Also the Oops was seemingly unrelated.
And if you run "dosemu -s" under non-KMS console, you'll also
reproduce this one:
https://bugzilla.kernel.org/show_bug.cgi?id=97321

[-- Attachment #2: cfg.tar.gz --]
[-- Type: application/x-gzip, Size: 32362 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-27 14:05     ` Stas Sergeev
@ 2015-10-27 22:37       ` Linus Torvalds
  2015-10-28  0:04         ` Toshi Kani
  2015-10-30 23:50       ` Andy Lutomirski
  1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2015-10-27 22:37 UTC (permalink / raw)
  To: Stas Sergeev, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru> wrote:
>
> I can't easily post an Oops: under X it doesn't even appear -
> machine freezes immediately, and under non-KMS console it is
> possible to get one, but difficult to screen-shot (using bare
> metal, not VM). Also the Oops was seemingly unrelated.
> And if you run "dosemu -s" under non-KMS console, you'll also
> reproduce this one:
> https://bugzilla.kernel.org/show_bug.cgi?id=97321

Hmm. Andrew Morton responded to that initially, but then nothing
happened, and now it's been another six months. Andrew?

The arch/x86/mm/pat.c error handling does seem to be suspect. This is
all code several years old, so none of this is new, and I think Suresh
is gone.  Adding a few other people with recent sign-offs to that
file, in the hope that somebody feels like they own it..

               Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-27 22:37       ` Linus Torvalds
@ 2015-10-28  0:04         ` Toshi Kani
  2015-10-28  9:53           ` Stas Sergeev
  0 siblings, 1 reply; 22+ messages in thread
From: Toshi Kani @ 2015-10-28  0:04 UTC (permalink / raw)
  To: Linus Torvalds, Stas Sergeev, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

On Wed, 2015-10-28 at 07:37 +0900, Linus Torvalds wrote:
> On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru> wrote:
> > 
> > I can't easily post an Oops: under X it doesn't even appear -
> > machine freezes immediately, and under non-KMS console it is
> > possible to get one, but difficult to screen-shot (using bare
> > metal, not VM). Also the Oops was seemingly unrelated.
> > And if you run "dosemu -s" under non-KMS console, you'll also
> > reproduce this one:
> > https://bugzilla.kernel.org/show_bug.cgi?id=97321
> 
> Hmm. Andrew Morton responded to that initially, but then nothing
> happened, and now it's been another six months. Andrew?
> 
> The arch/x86/mm/pat.c error handling does seem to be suspect. This is
> all code several years old, so none of this is new, and I think Suresh
> is gone.  Adding a few other people with recent sign-offs to that
> file, in the hope that somebody feels like they own it..

In the case of PFNMAP, the range should always be mapped.  So, I wonder why
follow_phys() failed with the !pte_present() check.

Stas, do you have a test program that can reproduce 97321?

-Toshi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-28  0:04         ` Toshi Kani
@ 2015-10-28  9:53           ` Stas Sergeev
  2015-10-28 16:34             ` Toshi Kani
  0 siblings, 1 reply; 22+ messages in thread
From: Stas Sergeev @ 2015-10-28  9:53 UTC (permalink / raw)
  To: Toshi Kani, Linus Torvalds, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

28.10.2015 03:04, Toshi Kani пишет:
> On Wed, 2015-10-28 at 07:37 +0900, Linus Torvalds wrote:
>> On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru> wrote:
>>>
>>> I can't easily post an Oops: under X it doesn't even appear -
>>> machine freezes immediately, and under non-KMS console it is
>>> possible to get one, but difficult to screen-shot (using bare
>>> metal, not VM). Also the Oops was seemingly unrelated.
>>> And if you run "dosemu -s" under non-KMS console, you'll also
>>> reproduce this one:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=97321
>>
>> Hmm. Andrew Morton responded to that initially, but then nothing
>> happened, and now it's been another six months. Andrew?
>>
>> The arch/x86/mm/pat.c error handling does seem to be suspect. This is
>> all code several years old, so none of this is new, and I think Suresh
>> is gone.  Adding a few other people with recent sign-offs to that
>> file, in the hope that somebody feels like they own it..
> 
> In the case of PFNMAP, the range should always be mapped.  So, I wonder why
> follow_phys() failed with the !pte_present() check.
> 
> Stas, do you have a test program that can reproduce 97321?
Get dosemu2 from here:
https://github.com/stsp/dosemu2/releases
or from git, or get dosemu1.
Then boot your kernel with "nomodeset=1" to get a text console.
Run

dosemu -s

and you'll get the bug.
You don't even need to mess with FreeDOS or whatever, because
the problem happens on a very start. And since it is WARN_OCNCE,
you'll need to reboot to get it again.
You will probably need to adjust /etc/sudoers because -s means
that dosemu (a wrapper script for dosemu.bin) will try sudo so
that dosemu.bin can grab /dev/mem as root.

If it is too complicated, let me know and I'll try to code up
a reduced test-case (but can't promise to get to that quickly).

Thanks to everyone (esp Linus of course :-) who got involved in
finally supporting dosemu again - it was rather ill for the last
years.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-28  9:53           ` Stas Sergeev
@ 2015-10-28 16:34             ` Toshi Kani
  2015-10-28 19:22               ` Toshi Kani
  0 siblings, 1 reply; 22+ messages in thread
From: Toshi Kani @ 2015-10-28 16:34 UTC (permalink / raw)
  To: Stas Sergeev, Linus Torvalds, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

On Wed, 2015-10-28 at 12:53 +0300, Stas Sergeev wrote:
> 28.10.2015 03:04, Toshi Kani пишет:
> > On Wed, 2015-10-28 at 07:37 +0900, Linus Torvalds wrote:
> > > On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru> wrote:
> > > > 
> > > > I can't easily post an Oops: under X it doesn't even appear -
> > > > machine freezes immediately, and under non-KMS console it is
> > > > possible to get one, but difficult to screen-shot (using bare
> > > > metal, not VM). Also the Oops was seemingly unrelated.
> > > > And if you run "dosemu -s" under non-KMS console, you'll also
> > > > reproduce this one:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=97321
> > > 
> > > Hmm. Andrew Morton responded to that initially, but then nothing
> > > happened, and now it's been another six months. Andrew?
> > > 
> > > The arch/x86/mm/pat.c error handling does seem to be suspect. This is
> > > all code several years old, so none of this is new, and I think
> > > Suresh
> > > is gone.  Adding a few other people with recent sign-offs to that
> > > file, in the hope that somebody feels like they own it..
> > 
> > In the case of PFNMAP, the range should always be mapped.  So, I wonder
> > why
> > follow_phys() failed with the !pte_present() check.
> > 
> > Stas, do you have a test program that can reproduce 97321?
> Get dosemu2 from here:
> https://github.com/stsp/dosemu2/releases
> or from git, or get dosemu1.
> Then boot your kernel with "nomodeset=1" to get a text console.
> Run
> 
> dosemu -s
> 
> and you'll get the bug.
> You don't even need to mess with FreeDOS or whatever, because
> the problem happens on a very start. And since it is WARN_OCNCE,
> you'll need to reboot to get it again.
> You will probably need to adjust /etc/sudoers because -s means
> that dosemu (a wrapper script for dosemu.bin) will try sudo so
> that dosemu.bin can grab /dev/mem as root.
> 
> If it is too complicated, let me know and I'll try to code up
> a reduced test-case (but can't promise to get to that quickly).
> 
> Thanks to everyone (esp Linus of course :-) who got involved in
> finally supporting dosemu again - it was rather ill for the last
> years.

I compiled "pre4" dosemu2, and ran on 4.3-rc7 kernel booted with the
"nomodeset=1" option.  I may be missing something, but I am not able to
reproduce your problem because dosemu fails with the segfault below on my
system...

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000421fdb in video_init () at video.c:242
242       if (Video->priv_init)

(gdb) bt
#0  0x0000000000421fdb in video_init () at video.c:242
#1  video_config_init () at video.c:391
#2  0x00000000004b5693 in device_init () at init.c:237
#3  0x000000000041ea3a in main (argc=3, argv=0x7fff70851d28) at emu.c:356

(gdb) p Video
$1 = (struct video_system *) 0x0

Thanks,
-Toshi
 





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-28 16:34             ` Toshi Kani
@ 2015-10-28 19:22               ` Toshi Kani
  2015-10-28 22:51                 ` Toshi Kani
  0 siblings, 1 reply; 22+ messages in thread
From: Toshi Kani @ 2015-10-28 19:22 UTC (permalink / raw)
  To: Stas Sergeev, Linus Torvalds, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

On Wed, 2015-10-28 at 10:34 -0600, Toshi Kani wrote:
> On Wed, 2015-10-28 at 12:53 +0300, Stas Sergeev wrote:
> > 28.10.2015 03:04, Toshi Kani пишет:
> > > On Wed, 2015-10-28 at 07:37 +0900, Linus Torvalds wrote:
> > > > On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru>
> > > > wrote:
> > > > > 
> > > > > I can't easily post an Oops: under X it doesn't even appear -
> > > > > machine freezes immediately, and under non-KMS console it is
> > > > > possible to get one, but difficult to screen-shot (using bare
> > > > > metal, not VM). Also the Oops was seemingly unrelated.
> > > > > And if you run "dosemu -s" under non-KMS console, you'll also
> > > > > reproduce this one:
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=97321
> > > > 
> > > > Hmm. Andrew Morton responded to that initially, but then nothing
> > > > happened, and now it's been another six months. Andrew?
> > > > 
> > > > The arch/x86/mm/pat.c error handling does seem to be suspect. This 
> > > > is all code several years old, so none of this is new, and I think
> > > > Suresh is gone.  Adding a few other people with recent sign-offs to 
> > > > that file, in the hope that somebody feels like they own it..
> > > 
> > > In the case of PFNMAP, the range should always be mapped.  So, I 
> > > wonder why follow_phys() failed with the !pte_present() check.
> > > 
> > > Stas, do you have a test program that can reproduce 97321?
> > Get dosemu2 from here:
> > https://github.com/stsp/dosemu2/releases
> > or from git, or get dosemu1.
> > Then boot your kernel with "nomodeset=1" to get a text console.
> > Run
> > 
> > dosemu -s
> > 
> > and you'll get the bug.

I looked at the dosemu code and was able to reproduce the issue with a test
program.  This problem happens when mremap() to /dev/mem (or PFNMAP) is
called with MREMAP_FIXED.

In this case, mremap calls move_vma(), which first calls move_page_tables()
to remap the translation and then calls do_munmap() to remove the original
mapping.  Hence, when untrack_pfn() is called from do_munmap(), the
original map is already removed, and follow_phys() fails with the
 !pte_present() check.

I think there are a couple of issues:
 - If track_pfn() ignores an error from follow_phys() and skips
free_pfn_range(), PAT continues to track the original map that is removed.
 - track_pfn() calls free_pfn_range() to untrack a given free range. 
 However, rbt_memtype_erase() requires the free range match exactly to the
tracked range.  This does not support mremap, which needs to free up part
of the tracked range.
 - PAT does not track a new translation specified by mremap() with MREMAP_F
IXED.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-28 19:22               ` Toshi Kani
@ 2015-10-28 22:51                 ` Toshi Kani
  2015-10-31 11:58                   ` Stas Sergeev
  0 siblings, 1 reply; 22+ messages in thread
From: Toshi Kani @ 2015-10-28 22:51 UTC (permalink / raw)
  To: Stas Sergeev, Linus Torvalds, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

On Wed, 2015-10-28 at 13:22 -0600, Toshi Kani wrote:
> On Wed, 2015-10-28 at 10:34 -0600, Toshi Kani wrote:
> > On Wed, 2015-10-28 at 12:53 +0300, Stas Sergeev wrote:
> > > 28.10.2015 03:04, Toshi Kani пишет:
> > > > On Wed, 2015-10-28 at 07:37 +0900, Linus Torvalds wrote:
> > > > > On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru>
> > > > > wrote:
> > > > > > 
> > > > > > I can't easily post an Oops: under X it doesn't even appear -
> > > > > > machine freezes immediately, and under non-KMS console it is
> > > > > > possible to get one, but difficult to screen-shot (using bare
> > > > > > metal, not VM). Also the Oops was seemingly unrelated.
> > > > > > And if you run "dosemu -s" under non-KMS console, you'll also
> > > > > > reproduce this one:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=97321
> > > > > 
> > > > > Hmm. Andrew Morton responded to that initially, but then nothing
> > > > > happened, and now it's been another six months. Andrew?
> > > > > 
> > > > > The arch/x86/mm/pat.c error handling does seem to be suspect. This 
> > > > > is all code several years old, so none of this is new, and I think
> > > > > Suresh is gone.  Adding a few other people with recent sign-offs to 
> > > > > that file, in the hope that somebody feels like they own it..
> > > > 
> > > > In the case of PFNMAP, the range should always be mapped.  So, I 
> > > > wonder why follow_phys() failed with the !pte_present() check.
> > > > 
> > > > Stas, do you have a test program that can reproduce 97321?
> > > Get dosemu2 from here:
> > > https://github.com/stsp/dosemu2/releases
> > > or from git, or get dosemu1.
> > > Then boot your kernel with "nomodeset=1" to get a text console.
> > > Run
> > > 
> > > dosemu -s
> > > 
> > > and you'll get the bug.
> 
> I looked at the dosemu code and was able to reproduce the issue with a test
> program.  This problem happens when mremap() to /dev/mem (or PFNMAP) is
> called with MREMAP_FIXED.
> 
> In this case, mremap calls move_vma(), which first calls move_page_tables()
> to remap the translation and then calls do_munmap() to remove the original
> mapping.  Hence, when untrack_pfn() is called from do_munmap(), the
> original map is already removed, and follow_phys() fails with the
>  !pte_present() check.
> 
> I think there are a couple of issues:
>  - If untrack_pfn() ignores an error from follow_phys() and skips
> free_pfn_range(), PAT continues to track the original map that is removed.
>  - untrack_pfn() calls free_pfn_range() to untrack a given free range. 
>  However, rbt_memtype_erase() requires the free range match exactly to the
> tracked range.  This does not support mremap, which needs to free up part
> of the tracked range.
>  - PAT does not track a new translation specified by mremap() with MREMAP_F
> IXED.

Thinking further, I think the 1st and 3rd items are non-issues.  mremap remaps
virtual address, but keeps the same cache type and pfns.  So, PAT does not have
to change the tracked pfns in this case.  The 2nd item is still a problem,
though. 

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-27 14:05     ` Stas Sergeev
  2015-10-27 22:37       ` Linus Torvalds
@ 2015-10-30 23:50       ` Andy Lutomirski
  1 sibling, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2015-10-30 23:50 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Brian Gerst,
	Denys Vlasenko, Linus Torvalds, Borislav Petkov, Stas Sergeev

On Tue, Oct 27, 2015 at 7:05 AM, Stas Sergeev <stsp@list.ru> wrote:
> 27.10.2015 03:52, Andy Lutomirski пишет:
>> On Mon, Oct 26, 2015 at 4:45 AM, Stas Sergeev <stsp@list.ru> wrote:
>>> 26.10.2015 04:25, Andy Lutomirski пишет:
>>>> This is take 2 at fixing x86 64-bit signals wrt SS.  After a lot of
>>>> thought, this is not controlled by any flags -- I would much prefer
>>>> to avoid opt-in behavior.  Instead, it just tries hard to avoid
>>>> triggering the cases that break DOSEMU.
>>>>
>>>> Stas, this now seems to pass the test you sent me.  It works with
>>>> stock dosemu2 (I haven't tested classic dosemu because I can't get it
>>>> to work regardless).
>>> I'll test it myself then.
>>> But this will have to wait till a week-end I am afraid.
>>> In a mean time you can test vm86() - last time I tried,
>>> I got oops and hard lockup.
>>
>> Can you tell me exactly what kernel version (release by Linus or
>> commit hash) oopses and, if it's easy, post a screenshot or log?
> I archived my config and git hash.
> I can't easily post an Oops: under X it doesn't even appear -
> machine freezes immediately, and under non-KMS console it is
> possible to get one, but difficult to screen-shot (using bare
> metal, not VM). Also the Oops was seemingly unrelated.
> And if you run "dosemu -s" under non-KMS console, you'll also
> reproduce this one:
> https://bugzilla.kernel.org/show_bug.cgi?id=97321

Like this?

[  288.221786] BUG: unable to handle kernel paging request at ffffffb9
[  288.222475] IP: [<c169bf48>] snd_seq_delete_port+0x48/0xd0
[  288.222743] *pde = 01c8c067 *pte = 00000000
[  288.222743] Oops: 0000 [#1] SMP
[  288.222743] Modules linked in:
[  288.222743] CPU: 0 PID: 5480 Comm: dosemu.bin Not tainted 4.3.0-rc7+ #345
[  288.222743] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[  288.222743] task: c7006b40 ti: c7bb4000 task.ti: c7bb4000
[  288.222743] EIP: 0060:[<c169bf48>] EFLAGS: 00010082 CPU: 0
[  288.222743] EIP is at snd_seq_delete_port+0x48/0xd0
[  288.222743] EAX: 00000000 EBX: ffffffb8 ECX: c707c67c EDX: 00000001
[  288.222743] ESI: c707c600 EDI: c707c684 EBP: c7bb5d60 ESP: c7bb5d48
[  288.222743]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  288.222743] CR0: 80050033 CR2: ffffffb9 CR3: 07b00000 CR4: 000406d0
[  288.222743] Stack:
[  288.222743]  00000001 00000246 c707c68c c707c600 40a45321 c7bb5ee0
c7bb5e14 c16965cb
[  288.222743]  0000010f 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[  288.222743]  00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[  288.222743] Call Trace:
[  288.222743]  [<c16965cb>] snd_seq_ioctl_delete_port+0x3b/0x90
[  288.222743]  [<c1696c65>] snd_seq_do_ioctl+0x85/0x90
[  288.222743]  [<c1696ca3>] snd_seq_kernel_client_ctl+0x33/0x50
[  288.222743]  [<c169b78b>] snd_seq_event_port_detach+0x3b/0x50
[  288.222743]  [<c169d6a2>] delete_port+0x12/0x30
[  288.222743]  [<c169dbc1>] snd_seq_oss_release+0x41/0x50
[  288.222743]  [<c169d406>] odev_release+0x26/0x40
[  288.222743]  [<c11a46a3>] __fput+0xc3/0x1d0
[  288.222743]  [<c11a47e8>] ____fput+0x8/0x10
[  288.222743]  [<c10b924f>] task_work_run+0x6f/0x90
[  288.222743]  [<c10017e5>] prepare_exit_to_usermode+0xd5/0x100
[  288.222743]  [<c1001841>] syscall_return_slowpath+0x31/0x120
[  288.222743]  [<c11bd094>] ? __close_fd+0x54/0x70
[  288.222743]  [<c188b372>] syscall_exit_work+0x7/0xc
[  288.222743] Code: 5f d0 1e 00 89 f8 e8 68 f0 1e 00 89 45 ec 8b 46
7c 8d 4e 7c 39 c1 74 25 8d 58 b8 0f b6 40 b9 8b 55 e8 39 d0 75 0d eb
3b 8d 76 00 <0f> b6 40 b9 39 d0 74 30 8b 43 48 39 c1 8d 58 b8 75 ee 8b
55 ec
[  288.222743] EIP: [<c169bf48>] snd_seq_delete_port+0x48/0xd0 SS:ESP
0068:c7bb5d48
[  288.222743] CR2: 00000000ffffffb9
[  288.222743] ---[ end trace f216bf40eb9b39d6 ]---

I'll try to narrow that down a little bit and email the appropriate maintainer.

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-28 22:51                 ` Toshi Kani
@ 2015-10-31 11:58                   ` Stas Sergeev
  2015-11-02 17:01                     ` Toshi Kani
  0 siblings, 1 reply; 22+ messages in thread
From: Stas Sergeev @ 2015-10-31 11:58 UTC (permalink / raw)
  To: Toshi Kani, Linus Torvalds, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

29.10.2015 01:51, Toshi Kani пишет:
> On Wed, 2015-10-28 at 13:22 -0600, Toshi Kani wrote:
>> On Wed, 2015-10-28 at 10:34 -0600, Toshi Kani wrote:
>>> On Wed, 2015-10-28 at 12:53 +0300, Stas Sergeev wrote:
>>>> 28.10.2015 03:04, Toshi Kani пишет:
>>>>> On Wed, 2015-10-28 at 07:37 +0900, Linus Torvalds wrote:
>>>>>> On Tue, Oct 27, 2015 at 11:05 PM, Stas Sergeev <stsp@list.ru>
>>>>>> wrote:
>>>>>>> I can't easily post an Oops: under X it doesn't even appear -
>>>>>>> machine freezes immediately, and under non-KMS console it is
>>>>>>> possible to get one, but difficult to screen-shot (using bare
>>>>>>> metal, not VM). Also the Oops was seemingly unrelated.
>>>>>>> And if you run "dosemu -s" under non-KMS console, you'll also
>>>>>>> reproduce this one:
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=97321
>>>>>> Hmm. Andrew Morton responded to that initially, but then nothing
>>>>>> happened, and now it's been another six months. Andrew?
>>>>>>
>>>>>> The arch/x86/mm/pat.c error handling does seem to be suspect. This
>>>>>> is all code several years old, so none of this is new, and I think
>>>>>> Suresh is gone.  Adding a few other people with recent sign-offs to
>>>>>> that file, in the hope that somebody feels like they own it..
>>>>> In the case of PFNMAP, the range should always be mapped.  So, I
>>>>> wonder why follow_phys() failed with the !pte_present() check.
>>>>>
>>>>> Stas, do you have a test program that can reproduce 97321?
>>>> Get dosemu2 from here:
>>>> https://github.com/stsp/dosemu2/releases
>>>> or from git, or get dosemu1.
>>>> Then boot your kernel with "nomodeset=1" to get a text console.
>>>> Run
>>>>
>>>> dosemu -s
>>>>
>>>> and you'll get the bug.
>> I looked at the dosemu code and was able to reproduce the issue with a test
>> program.  This problem happens when mremap() to /dev/mem (or PFNMAP) is
>> called with MREMAP_FIXED.
>>
>> In this case, mremap calls move_vma(), which first calls move_page_tables()
>> to remap the translation and then calls do_munmap() to remove the original
>> mapping.  Hence, when untrack_pfn() is called from do_munmap(), the
>> original map is already removed, and follow_phys() fails with the
>>   !pte_present() check.
>>
>> I think there are a couple of issues:
>>   - If untrack_pfn() ignores an error from follow_phys() and skips
>> free_pfn_range(), PAT continues to track the original map that is removed.
>>   - untrack_pfn() calls free_pfn_range() to untrack a given free range.
>>   However, rbt_memtype_erase() requires the free range match exactly to the
>> tracked range.  This does not support mremap, which needs to free up part
>> of the tracked range.
>>   - PAT does not track a new translation specified by mremap() with MREMAP_F
>> IXED.
> Thinking further, I think the 1st and 3rd items are non-issues.  mremap remaps
> virtual address, but keeps the same cache type and pfns.  So, PAT does not have
> to change the tracked pfns in this case.  The 2nd item is still a problem,
> though.
Hello Toshi, thanks for your analysis.
Now as you do not seem to be preparing a fix, how
about attaching your test-case to the bug-report for
others to re-use? Or maybe you can even make it a
part of the kernel's test suit - I suppose this will help.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context
  2015-10-26  1:25 ` [PATCH v2 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context Andy Lutomirski
@ 2015-10-31 15:18   ` Stas Sergeev
  0 siblings, 0 replies; 22+ messages in thread
From: Stas Sergeev @ 2015-10-31 15:18 UTC (permalink / raw)
  To: Andy Lutomirski, x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov,
	Cyrill Gorcunov, Pavel Emelyanov

26.10.2015 04:25, Andy Lutomirski пишет:
> This is a second attempt to make the improvements from c6f2062935c8
> ("x86/signal/64: Fix SS handling for signals delivered to 64-bit
> programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
> support for SS in the 64-bit signal context").
>
> This adds two new uc_flags flags.  UC_SIGCONTEXT_SS will be set for
> all 64-bit signals (including x32).  It indicates that the saved SS
> field is valid and that the kernel supports the new behavior.
>
> The goal is to fix a problems with signal handling in 64-bit tasks:
> SS wasn't saved in the 64-bit signal context, making it awkward to
> determine what SS was at the time of signal delivery and making it
> impossible to return to a non-flat SS (as calling sigreturn clobbers
> SS).
>
> This also made it extremely difficult for 64-bit tasks to return to
> fully-defined 16-bit contexts, because only the kernel can easily do
> espfix64, but sigreturn was unable to load or even restore SS:ESP.
> (DOSEMU has a monstrous hack to partially work around this.)
>
> If we could go back in time, the correct fix would be to make 64-bit
> signals work just like 32-bit signals with respect to SS: save it
> in signal context, reset it when delivering a signal, and restore
> it in sigreturn.
>
> Unfortunately, doing that (as I tried originally) breaks DOSEMU:
> DOSEMU wouldn't reset the signal context's SS when clearing the LDT
> and changing the saved CS to 64-bit mode, since it predates the SS
> context field existing in the first place.
>
> This patch is a bit more complicated, and it tries to balance a
> bunch of goals.  It makes signal delivery due to a bogus SS reliable
> without having to set any flags (64-bit signals will always be
> delivered to a valid SS), and it makes most cases of changing
> ucontext->ss during signal handling work as expected.
>
> I do this by special-casing the interesting case.  On sigreturn,
> ucontext->ss will be honored by default, unless the ucontext was
> created from scratch by an old program and had a 64-bit CS
> (unfortunately, CRIU can do this) or was the result of changing a
> 32-bit signal context to 64-bit without resetting SS (as DOSEMU
> does).
>
> For the benefit of new 64-bit software that uses segmentation (new
> versions of DOSEMU might), the new behavior can be detected with a
> new ucontext flag UC_SIGCONTEXT_SS.
>
> To avoid compilation issues, __pad0 is left as an alias for ss in
> ucontext.
>
> The nitty-gritty details are documented in the header file.
>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Stas Sergeev <stsp@list.ru>

Both dosemu2 and 1 work fine.
Still as this approach seems very risky, I'd appreciate if some
of the explanations from this msg:
https://lkml.org/lkml/2015/10/14/726
go into the code comments.
What I find especially useful are your notes about
siglongjmp(), threads, and the fact that running with bad
SS is now impossible on the recent kernels. Without this
info at hands, no one can trust this code perhaps.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs
  2015-10-26  1:25 ` [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs Andy Lutomirski
@ 2015-10-31 15:25   ` Stas Sergeev
  2015-12-07 23:23     ` Andy Lutomirski
  0 siblings, 1 reply; 22+ messages in thread
From: Stas Sergeev @ 2015-10-31 15:25 UTC (permalink / raw)
  To: Andy Lutomirski, x86, linux-kernel
  Cc: Brian Gerst, Denys Vlasenko, Linus Torvalds, Borislav Petkov

26.10.2015 04:25, Andy Lutomirski пишет:
> These fields have a strange history.  This tries to document it.
>
> This borrows from 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
> from sigcontext"), which was reverted by ed596cde9425 ("Revert x86
> sigcontext cleanups").
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
Now the strategy about SS is to always save it to the sigcontext.
This is good because the syscall can clobber it, so the app had
to be very careful in the past trying to save it by hands.
How about saving also fs and gs? (without restoring yet)
If you do, you'll save dosemu a headache of doing any "pre-syscall"
work. Pre-syscall work is very nasty.
I haven't checked if the syscall clobber also these or only SS,
but perhaps saving them by the kernel won't hurt?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/4] x86: sigcontext fixes, again
  2015-10-31 11:58                   ` Stas Sergeev
@ 2015-11-02 17:01                     ` Toshi Kani
  0 siblings, 0 replies; 22+ messages in thread
From: Toshi Kani @ 2015-11-02 17:01 UTC (permalink / raw)
  To: Stas Sergeev, Linus Torvalds, Andrew Morton, Toshi Kani, Ingo Molnar
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, linux-kernel,
	Brian Gerst, Denys Vlasenko, Borislav Petkov, Stas Sergeev

On Sat, 2015-10-31 at 14:58 +0300, Stas Sergeev wrote:
> 29.10.2015 01:51, Toshi Kani пишет:
> > On Wed, 2015-10-28 at 13:22 -0600, Toshi Kani wrote:
> > > On Wed, 2015-10-28 at 10:34 -0600, Toshi Kani wrote:

:

> > > I looked at the dosemu code and was able to reproduce the issue with a
> > > test program.  This problem happens when mremap() to /dev/mem (or PFNMAP)
> > > is called with MREMAP_FIXED.
> > > 
> > > In this case, mremap calls move_vma(), which first calls 
> > > move_page_tables() to remap the translation and then calls do_munmap() to 
> > > remove the original mapping.  Hence, when untrack_pfn() is called from 
> > > do_munmap(), the original map is already removed, and follow_phys() fails 
> > > with the !pte_present() check.
> > > 
> > > I think there are a couple of issues:
> > >   - If untrack_pfn() ignores an error from follow_phys() and skips
> > > free_pfn_range(), PAT continues to track the original map that is removed.
> > >   - untrack_pfn() calls free_pfn_range() to untrack a given free range.
> > >   However, rbt_memtype_erase() requires the free range match exactly to 
> > > the tracked range.  This does not support mremap, which needs to free up
> > > part of the tracked range.
> > >   - PAT does not track a new translation specified by mremap() with 
> > > MREMAP_FIXED.
> > Thinking further, I think the 1st and 3rd items are non-issues.  mremap 
> > remaps virtual address, but keeps the same cache type and pfns.  So, PAT 
> > does not have to change the tracked pfns in this case.  The 2nd item is 
> > still a problem, though.
> Hello Toshi, thanks for your analysis.
> Now as you do not seem to be preparing a fix, how
> about attaching your test-case to the bug-report for
> others to re-use? Or maybe you can even make it a
> part of the kernel's test suit - I suppose this will help.

I can work on the fix, but I did not think we needed to rush on it since this
issue exists for a long time.  Can we target it to 4.4-rcX and then cc to
stables?

As for the test, it might be tricky to make it run on any platforms since it
needs to mmap/mremap a non-RAM range.  It could just map a reserved range
randomly, but I chose a range manually which I know is safe to map & read/write
on my platform in my test.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs
  2015-10-31 15:25   ` Stas Sergeev
@ 2015-12-07 23:23     ` Andy Lutomirski
  2015-12-29 12:24       ` Stas Sergeev
  0 siblings, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2015-12-07 23:23 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Denys Vlasenko, Borislav Petkov, Brian Gerst, linux-kernel,
	X86 ML, Linus Torvalds

On Oct 31, 2015 8:25 AM, "Stas Sergeev" <stsp@list.ru> wrote:
>
> 26.10.2015 04:25, Andy Lutomirski пишет:
>
>> These fields have a strange history.  This tries to document it.
>>
>> This borrows from 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
>> from sigcontext"), which was reverted by ed596cde9425 ("Revert x86
>> sigcontext cleanups").
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>
> Now the strategy about SS is to always save it to the sigcontext.
> This is good because the syscall can clobber it, so the app had
> to be very careful in the past trying to save it by hands.
> How about saving also fs and gs? (without restoring yet)
> If you do, you'll save dosemu a headache of doing any "pre-syscall"
> work. Pre-syscall work is very nasty.
> I haven't checked if the syscall clobber also these or only SS,
> but perhaps saving them by the kernel won't hurt?

There's a bunch of ongoing work about FS and GS.  I want to wait and see.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs
  2015-12-07 23:23     ` Andy Lutomirski
@ 2015-12-29 12:24       ` Stas Sergeev
  2015-12-29 12:31         ` Andy Lutomirski
  0 siblings, 1 reply; 22+ messages in thread
From: Stas Sergeev @ 2015-12-29 12:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Borislav Petkov, Brian Gerst, linux-kernel,
	X86 ML, Linus Torvalds

08.12.2015 02:23, Andy Lutomirski пишет:
> On Oct 31, 2015 8:25 AM, "Stas Sergeev" <stsp@list.ru> wrote:
>>
>> 26.10.2015 04:25, Andy Lutomirski пишет:
>>
>>> These fields have a strange history.  This tries to document it.
>>>
>>> This borrows from 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
>>> from sigcontext"), which was reverted by ed596cde9425 ("Revert x86
>>> sigcontext cleanups").
>>>
>>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>>
>> Now the strategy about SS is to always save it to the sigcontext.
>> This is good because the syscall can clobber it, so the app had
>> to be very careful in the past trying to save it by hands.
>> How about saving also fs and gs? (without restoring yet)
>> If you do, you'll save dosemu a headache of doing any "pre-syscall"
>> work. Pre-syscall work is very nasty.
>> I haven't checked if the syscall clobber also these or only SS,
>> but perhaps saving them by the kernel won't hurt?
> 
> There's a bunch of ongoing work about FS and GS.  I want to wait and see.
Andy, have you postponed only this, or also the sigreturn patch?
Long time no news...
sigreturn patch would really be nice to have.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs
  2015-12-29 12:24       ` Stas Sergeev
@ 2015-12-29 12:31         ` Andy Lutomirski
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2015-12-29 12:31 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Denys Vlasenko, Borislav Petkov, Brian Gerst, linux-kernel,
	X86 ML, Linus Torvalds

On Tue, Dec 29, 2015 at 4:24 AM, Stas Sergeev <stsp@list.ru> wrote:
> 08.12.2015 02:23, Andy Lutomirski пишет:
>> On Oct 31, 2015 8:25 AM, "Stas Sergeev" <stsp@list.ru> wrote:
>>>
>>> 26.10.2015 04:25, Andy Lutomirski пишет:
>>>
>>>> These fields have a strange history.  This tries to document it.
>>>>
>>>> This borrows from 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
>>>> from sigcontext"), which was reverted by ed596cde9425 ("Revert x86
>>>> sigcontext cleanups").
>>>>
>>>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>>>
>>> Now the strategy about SS is to always save it to the sigcontext.
>>> This is good because the syscall can clobber it, so the app had
>>> to be very careful in the past trying to save it by hands.
>>> How about saving also fs and gs? (without restoring yet)
>>> If you do, you'll save dosemu a headache of doing any "pre-syscall"
>>> work. Pre-syscall work is very nasty.
>>> I haven't checked if the syscall clobber also these or only SS,
>>> but perhaps saving them by the kernel won't hurt?
>>
>> There's a bunch of ongoing work about FS and GS.  I want to wait and see.
> Andy, have you postponed only this, or also the sigreturn patch?
> Long time no news...
> sigreturn patch would really be nice to have.

Sorry, I got caught up with a bunch of other stuff, and everyone being
slow with the holiday season.  There's some slight chance we'll get it
in for 4.5, but 4.6 is more likely.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-12-29 12:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-26  1:25 [PATCH v2 0/4] x86: sigcontext fixes, again Andy Lutomirski
2015-10-26  1:25 ` [PATCH v2 1/4] x86/signal/64: Add a comment about sigcontext->fs and gs Andy Lutomirski
2015-10-31 15:25   ` Stas Sergeev
2015-12-07 23:23     ` Andy Lutomirski
2015-12-29 12:24       ` Stas Sergeev
2015-12-29 12:31         ` Andy Lutomirski
2015-10-26  1:25 ` [PATCH v2 2/4] x86/signal/64: Fix SS if needed when delivering a 64-bit signal Andy Lutomirski
2015-10-26  1:25 ` [PATCH v2 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context Andy Lutomirski
2015-10-31 15:18   ` Stas Sergeev
2015-10-26  1:25 ` [PATCH v2 4/4] selftests/x86: Add tests for UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SS Andy Lutomirski
2015-10-26 11:45 ` [PATCH v2 0/4] x86: sigcontext fixes, again Stas Sergeev
2015-10-27  0:52   ` Andy Lutomirski
2015-10-27 14:05     ` Stas Sergeev
2015-10-27 22:37       ` Linus Torvalds
2015-10-28  0:04         ` Toshi Kani
2015-10-28  9:53           ` Stas Sergeev
2015-10-28 16:34             ` Toshi Kani
2015-10-28 19:22               ` Toshi Kani
2015-10-28 22:51                 ` Toshi Kani
2015-10-31 11:58                   ` Stas Sergeev
2015-11-02 17:01                     ` Toshi Kani
2015-10-30 23:50       ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).