linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] x86: 32-bit compatible C/R on x86_64
@ 2016-06-01 13:11 Dmitry Safonov
  2016-06-01 13:11 ` [PATCH 1/6] x86/vdso: unmap vdso blob on vvar mapping failure Dmitry Safonov
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov

This patches set is an attempt to add checkpoint/restore
for 32-bit tasks in compatibility mode on x86_64 hosts.

Restore in CRIU starts from one root restoring process, which
reads info for all threads being restored from images files.
This information is used further to to find out which processes
share some resources. Later shared resources are restored only
by one process and all other inherit them.
After that it calls clone() and new threads restore their
properties in parallel. Those threads inherit all parent's
mappings and fetch properties from those mappings
(and do clone themself, if they have children/subthreads). [1]
Then starts restorer blob's play, it's PIE binary, which
unmaps all unneeded for restoring VMAs, maps new VMAs and
finalize restoring with sigreturn syscall. [2]

To restore of 32-bit task we need three things to do in running
x86_64 restorer blob:
a) set code selector to __USER32_CS (to run 32-bit code);
b) remap vdso blob from 64-bit to 32-bit
   This is primary needed because restore may happen on a different
   kernel, which has different vDSO image than we had on dump.
c) if 32-bit vDSO differ to dumped image, move it on free place
   and add jump trampolines to that place.
d) switch TIF_IA32 flag, so kernel would know that it deals with
   compatible 32-bit application.

>From all this:
a) setting CS may be done from userspace, no patches needed;
b) patches 1-3 add ability to map different vDSO blobs on x86 kernel;
c) for remapping/moving 32-bit vDSO blob patches have been send earlier
   and seems to be accepted [3]
d) and for swapping TIF_IA32 flag discussion with Andy ended in conclusion
   that it's better to remove this flag completely.
   Patches 4-6 deletes usage of TIF_IA32 from ptrace, signal and coredump
   code. This is rework/resend of RFC [4]

[1] https://criu.org/Checkpoint/Restore#Restore
[2] https://criu.org/Restorer_context
[3] https://lkml.org/lkml/2016/5/17/243
[4] https://lkml.org/lkml/2016/4/25/650

Dmitry Safonov (6):
  x86/vdso: unmap vdso blob on vvar mapping failure
  x86/vdso: introduce do_map_vdso() and vdso_type enum
  x86/arch_prctl/vdso: add ARCH_MAP_VDSO_*
  x86/coredump: use core regs, rather that TIF_IA32 flag
  x86/ptrace: down with test_thread_flag(TIF_IA32)
  x86/signal: add SA_{X32,IA32}_ABI sa_flags

 arch/x86/entry/vdso/vma.c          | 72 +++++++++++++++++++++-----------------
 arch/x86/ia32/ia32_signal.c        |  2 +-
 arch/x86/include/asm/compat.h      |  8 ++---
 arch/x86/include/asm/fpu/signal.h  |  6 ++++
 arch/x86/include/asm/vdso.h        |  4 +++
 arch/x86/include/uapi/asm/prctl.h  |  6 ++++
 arch/x86/include/uapi/asm/signal.h |  6 +++-
 arch/x86/kernel/process_64.c       | 10 ++++++
 arch/x86/kernel/ptrace.c           |  2 +-
 arch/x86/kernel/signal.c           | 19 +++++-----
 arch/x86/kernel/signal_compat.c    | 30 ++++++++++++++--
 fs/binfmt_elf.c                    | 18 +++++-----
 kernel/signal.c                    |  5 +++
 13 files changed, 129 insertions(+), 59 deletions(-)

-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/6] x86/vdso: unmap vdso blob on vvar mapping failure
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
@ 2016-06-01 13:11 ` Dmitry Safonov
  2016-06-01 13:11 ` [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum Dmitry Safonov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov, Andy Lutomirski

If remapping of vDSO blob failed on vvar mapping,
we need to unmap previously mapped vDSO blob.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/entry/vdso/vma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 3329844e3c43..387028e6755d 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -238,7 +238,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
-		goto up_fail;
+		do_munmap(mm, text_start, image->size);
 	}
 
 up_fail:
-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
  2016-06-01 13:11 ` [PATCH 1/6] x86/vdso: unmap vdso blob on vvar mapping failure Dmitry Safonov
@ 2016-06-01 13:11 ` Dmitry Safonov
  2016-06-03  9:50   ` Cyrill Gorcunov
  2016-06-01 13:11 ` [PATCH 3/6] x86/arch_prctl/vdso: add ARCH_MAP_VDSO_* Dmitry Safonov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov, Andy Lutomirski

Make in-kernel API to map vDSO blobs on x86.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/entry/vdso/vma.c   | 70 +++++++++++++++++++++++++--------------------
 arch/x86/include/asm/vdso.h |  4 +++
 2 files changed, 43 insertions(+), 31 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 387028e6755d..4017b60eed33 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -176,11 +176,18 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 	return VM_FAULT_SIGBUS;
 }
 
-static int map_vdso(const struct vdso_image *image, bool calculate_addr)
+/*
+ * Add vdso and vvar mappings to current process.
+ * @image          - blob to map
+ * @addr           - request a specific address (zero to map at free addr)
+ * @calculate_addr - turn on aslr (@addr will be ignored)
+ */
+static int map_vdso(const struct vdso_image *image,
+		unsigned long addr, bool calculate_addr)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
-	unsigned long addr, text_start;
+	unsigned long text_start;
 	int ret = 0;
 
 	static const struct vm_special_mapping vdso_mapping = {
@@ -193,12 +200,9 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 		.fault = vvar_fault,
 	};
 
-	if (calculate_addr) {
+	if (calculate_addr)
 		addr = vdso_addr(current->mm->start_stack,
 				 image->size - image->sym_vvar_start);
-	} else {
-		addr = 0;
-	}
 
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
@@ -249,48 +253,52 @@ up_fail:
 	return ret;
 }
 
-#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
-static int load_vdso32(void)
+int do_map_vdso(vdso_type type, unsigned long addr, bool randomize_addr)
 {
-	if (vdso32_enabled != 1)  /* Other values all mean "disabled" */
-		return 0;
-
-	return map_vdso(&vdso_image_32, false);
-}
+	switch (type) {
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+	case VDSO_32:
+		if (vdso32_enabled != 1)  /* Other values all mean "disabled" */
+			return 0;
+		/* vDSO aslr turned off for i386 vDSO */
+		return map_vdso(&vdso_image_32, addr, false);
+#endif
+#ifdef CONFIG_X86_64
+	case VDSO_64:
+		if (!vdso64_enabled)
+			return 0;
+		return map_vdso(&vdso_image_64, addr, randomize_addr);
+#endif
+#ifdef CONFIG_X86_X32_ABI
+	case VDSO_X32:
+		if (!vdso64_enabled)
+			return 0;
+		return map_vdso(&vdso_image_x32, addr, randomize_addr);
 #endif
+	default:
+		return -EINVAL;
+	}
+}
 
 #ifdef CONFIG_X86_64
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
-	if (!vdso64_enabled)
-		return 0;
-
-	return map_vdso(&vdso_image_64, true);
+	return do_map_vdso(VDSO_64, 0, true);
 }
 
 #ifdef CONFIG_COMPAT
 int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
 				       int uses_interp)
 {
-#ifdef CONFIG_X86_X32_ABI
-	if (test_thread_flag(TIF_X32)) {
-		if (!vdso64_enabled)
-			return 0;
-
-		return map_vdso(&vdso_image_x32, true);
-	}
-#endif
-#ifdef CONFIG_IA32_EMULATION
-	return load_vdso32();
-#else
-	return 0;
-#endif
+	if (test_thread_flag(TIF_X32))
+		return do_map_vdso(VDSO_X32, 0, true);
+	return do_map_vdso(VDSO_32, 0, false);
 }
 #endif
 #else
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
-	return load_vdso32();
+	return do_map_vdso(VDSO_32, 0, false);
 }
 #endif
 
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 43dc55be524e..2be137897842 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -41,6 +41,10 @@ extern const struct vdso_image vdso_image_32;
 
 extern void __init init_vdso_image(const struct vdso_image *image);
 
+typedef enum { VDSO_32, VDSO_64, VDSO_X32 } vdso_type;
+
+extern int do_map_vdso(vdso_type type, unsigned long addr, bool randomize_addr);
+
 #endif /* __ASSEMBLER__ */
 
 #endif /* _ASM_X86_VDSO_H */
-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/6] x86/arch_prctl/vdso: add ARCH_MAP_VDSO_*
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
  2016-06-01 13:11 ` [PATCH 1/6] x86/vdso: unmap vdso blob on vvar mapping failure Dmitry Safonov
  2016-06-01 13:11 ` [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum Dmitry Safonov
@ 2016-06-01 13:11 ` Dmitry Safonov
  2016-06-01 13:11 ` [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag Dmitry Safonov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov, Andy Lutomirski

Add API to change vdso blob type with arch_prctl.
As this is useful yet only by needs of CRIU, expose
this interface under CONFIG_CHECKPOINT_RESTORE.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/include/uapi/asm/prctl.h |  6 ++++++
 arch/x86/kernel/process_64.c      | 10 ++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 3ac5032fae09..ae135de547f5 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -6,4 +6,10 @@
 #define ARCH_GET_FS 0x1003
 #define ARCH_GET_GS 0x1004
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
+# define ARCH_MAP_VDSO_X32	0x2001
+# define ARCH_MAP_VDSO_32	0x2002
+# define ARCH_MAP_VDSO_64	0x2003
+#endif
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6e789ca1f841..64459c88b3d9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -49,6 +49,7 @@
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/vdso.h>
 
 asmlinkage extern void ret_from_fork(void);
 
@@ -577,6 +578,15 @@ long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
 		break;
 	}
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
+	case ARCH_MAP_VDSO_X32:
+		return do_map_vdso(VDSO_X32, addr, false);
+	case ARCH_MAP_VDSO_32:
+		return do_map_vdso(VDSO_32, addr, false);
+	case ARCH_MAP_VDSO_64:
+		return do_map_vdso(VDSO_64, addr, false);
+#endif
+
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
                   ` (2 preceding siblings ...)
  2016-06-01 13:11 ` [PATCH 3/6] x86/arch_prctl/vdso: add ARCH_MAP_VDSO_* Dmitry Safonov
@ 2016-06-01 13:11 ` Dmitry Safonov
  2016-06-03  9:51   ` Cyrill Gorcunov
  2016-06-06 20:45   ` Oleg Nesterov
  2016-06-01 13:11 ` [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32) Dmitry Safonov
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov, Andy Lutomirski, Alexander Viro,
	linux-fsdevel

As we have here core registers, use them to determine application's
mode and sizes of register set and elf_prstatus instead of TIF_IA32
flag.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/include/asm/compat.h |  8 ++++----
 fs/binfmt_elf.c               | 18 ++++++++++--------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 5a3b2c119ed0..d0b517fc77ff 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -264,10 +264,10 @@ struct compat_shmid64_ds {
 #ifdef CONFIG_X86_X32_ABI
 typedef struct user_regs_struct compat_elf_gregset_t;
 
-#define PR_REG_SIZE(S) (test_thread_flag(TIF_IA32) ? 68 : 216)
-#define PRSTATUS_SIZE(S) (test_thread_flag(TIF_IA32) ? 144 : 296)
-#define SET_PR_FPVALID(S,V) \
-  do { *(int *) (((void *) &((S)->pr_reg)) + PR_REG_SIZE(0)) = (V); } \
+#define PR_REG_SIZE(S, R) (!user_64bit_mode(R) ? 68 : 216)
+#define PRSTATUS_SIZE(S, R) (!user_64bit_mode(R) ? 144 : 296)
+#define SET_PR_FPVALID(S, V, R) \
+  do { *(int *) (((void *) &((S)->pr_reg)) + PR_REG_SIZE(0, R)) = (V); } \
   while (0)
 
 #define COMPAT_USE_64BIT_TIME \
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index e158b22ef32f..3876382edc72 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1623,11 +1623,11 @@ static void do_thread_regset_writeback(struct task_struct *task,
 }
 
 #ifndef PR_REG_SIZE
-#define PR_REG_SIZE(S) sizeof(S)
+#define PR_REG_SIZE(S, R) sizeof(S)
 #endif
 
 #ifndef PRSTATUS_SIZE
-#define PRSTATUS_SIZE(S) sizeof(S)
+#define PRSTATUS_SIZE(S, R) sizeof(S)
 #endif
 
 #ifndef PR_REG_PTR
@@ -1635,12 +1635,13 @@ static void do_thread_regset_writeback(struct task_struct *task,
 #endif
 
 #ifndef SET_PR_FPVALID
-#define SET_PR_FPVALID(S, V) ((S)->pr_fpvalid = (V))
+#define SET_PR_FPVALID(S, V, R) ((S)->pr_fpvalid = (V))
 #endif
 
 static int fill_thread_core_info(struct elf_thread_core_info *t,
 				 const struct user_regset_view *view,
-				 long signr, size_t *total)
+				 long signr, size_t *total,
+				 struct pt_regs *regs __maybe_unused)
 {
 	unsigned int i;
 
@@ -1652,11 +1653,11 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
 	 */
 	fill_prstatus(&t->prstatus, t->task, signr);
 	(void) view->regsets[0].get(t->task, &view->regsets[0],
-				    0, PR_REG_SIZE(t->prstatus.pr_reg),
+				    0, PR_REG_SIZE(t->prstatus.pr_reg, regs),
 				    PR_REG_PTR(&t->prstatus), NULL);
 
 	fill_note(&t->notes[0], "CORE", NT_PRSTATUS,
-		  PRSTATUS_SIZE(t->prstatus), &t->prstatus);
+		  PRSTATUS_SIZE(t->prstatus, regs), &t->prstatus);
 	*total += notesize(&t->notes[0]);
 
 	do_thread_regset_writeback(t->task, &view->regsets[0]);
@@ -1686,7 +1687,7 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
 						  regset->core_note_type,
 						  size, data);
 				else {
-					SET_PR_FPVALID(&t->prstatus, 1);
+					SET_PR_FPVALID(&t->prstatus, 1, regs);
 					fill_note(&t->notes[i], "CORE",
 						  NT_PRFPREG, size, data);
 				}
@@ -1772,7 +1773,8 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
 	 * Now fill in each thread's information.
 	 */
 	for (t = info->thread; t != NULL; t = t->next)
-		if (!fill_thread_core_info(t, view, siginfo->si_signo, &info->size))
+		if (!fill_thread_core_info(t, view, siginfo->si_signo,
+					&info->size, regs))
 			return 0;
 
 	/*
-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
                   ` (3 preceding siblings ...)
  2016-06-01 13:11 ` [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag Dmitry Safonov
@ 2016-06-01 13:11 ` Dmitry Safonov
  2016-06-03  9:57   ` Cyrill Gorcunov
  2016-06-06 21:19   ` Oleg Nesterov
  2016-06-01 13:11 ` [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags Dmitry Safonov
  2016-06-01 13:15 ` [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
  6 siblings, 2 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov, Andy Lutomirski

As the task isn't executing at the moment of {GET,SET}REGS,
return regset that corresponds to code selector, rather than
value of TIF_IA32 flag.
I.e. if we ptrace i386 elf binary that has just changed it's
code selector to __USER_CS, than GET_REGS will return
full x86_64 register set.

Note, that this will work only if application has changed it's CS.
If the application does 32-bit syscall with __USER_CS, ptrace
will still return 64-bit register set. Which might be still confusing
for tools that expect TS_COMPACT to be exposed [1, 2].

So this this change should make PTRACE_GETREGSET more reliable and
this will be another step to drop TIF_{IA32,X32} flags.

[1]: https://sourceforge.net/p/strace/mailman/message/30471411/
[2]: https://lkml.org/lkml/2012/1/18/320

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 600edd225e81..a2612d06cf4b 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1355,7 +1355,7 @@ void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
 const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 {
 #ifdef CONFIG_IA32_EMULATION
-	if (test_tsk_thread_flag(task, TIF_IA32))
+	if (!user_64bit_mode(task_pt_regs(task)))
 #endif
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
 		return &user_x86_32_view;
-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
                   ` (4 preceding siblings ...)
  2016-06-01 13:11 ` [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32) Dmitry Safonov
@ 2016-06-01 13:11 ` Dmitry Safonov
  2016-06-04  5:08   ` Andy Lutomirski
  2016-06-01 13:15 ` [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
  6 siblings, 1 reply; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:11 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul,
	khorenko, Dmitry Safonov, Andy Lutomirski

Introduce new flags that defines which ABI to use on creating sigframe.
Those flags kernel will set according to sigaction syscall ABI,
which set handler for the signal being delivered.

So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver.
Those flags will be used only under CONFIG_COMPAT.

Similar way ARM uses sa_flags to differ in which mode deliver signal
for 26-bit applications (look at SA_THIRYTWO).

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/ia32/ia32_signal.c        |  2 +-
 arch/x86/include/asm/fpu/signal.h  |  6 ++++++
 arch/x86/include/uapi/asm/signal.h |  6 +++++-
 arch/x86/kernel/signal.c           | 19 ++++++++++---------
 arch/x86/kernel/signal_compat.c    | 30 +++++++++++++++++++++++++++---
 kernel/signal.c                    |  5 +++++
 6 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 2f29f4e407c3..cb13c0564ea7 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -378,7 +378,7 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
 		put_user_ex(*((u64 *)&code), (u64 __user *)frame->retcode);
 	} put_user_catch(err);
 
-	err |= copy_siginfo_to_user32(&frame->info, &ksig->info);
+	err |= __copy_siginfo_to_user32(&frame->info, &ksig->info, false);
 	err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
 				     regs, set->sig[0]);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
index 0e970d00dfcd..20a1fbf7fe4e 100644
--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -19,6 +19,12 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 # define ia32_setup_rt_frame	__setup_rt_frame
 #endif
 
+#ifdef CONFIG_COMPAT
+int __copy_siginfo_to_user32(compat_siginfo_t __user *to,
+		const siginfo_t *from, bool x32_ABI);
+#endif
+
+
 extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
 			      struct task_struct *tsk);
 extern void convert_to_fxsr(struct task_struct *tsk,
diff --git a/arch/x86/include/uapi/asm/signal.h b/arch/x86/include/uapi/asm/signal.h
index 8264f47cf53e..9c663b6fc023 100644
--- a/arch/x86/include/uapi/asm/signal.h
+++ b/arch/x86/include/uapi/asm/signal.h
@@ -70,6 +70,8 @@ typedef unsigned long sigset_t;
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
+ * SA_IA32_ABI/SA_X32_ABI indicates ABI for a signal frame,
+ *   if neither is set, the kernel will set them according to a syscall ABI
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -85,7 +87,9 @@ typedef unsigned long sigset_t;
 #define SA_NOMASK	SA_NODEFER
 #define SA_ONESHOT	SA_RESETHAND
 
-#define SA_RESTORER	0x04000000
+#define SA_RESTORER	0x04000000u
+#define SA_IA32_ABI	0x02000000u
+#define SA_X32_ABI	0x01000000u
 
 #define MINSIGSTKSZ	2048
 #define SIGSTKSZ	8192
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 22cc2f9f8aec..55f37b37148f 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -547,7 +547,7 @@ static int x32_setup_rt_frame(struct ksignal *ksig,
 		return -EFAULT;
 
 	if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
-		if (copy_siginfo_to_user32(&frame->info, &ksig->info))
+		if (__copy_siginfo_to_user32(&frame->info, &ksig->info, true))
 			return -EFAULT;
 	}
 
@@ -660,20 +660,21 @@ badframe:
 	return 0;
 }
 
-static inline int is_ia32_compat_frame(void)
+static inline int is_ia32_compat_frame(struct ksignal *ksig)
 {
 	return config_enabled(CONFIG_IA32_EMULATION) &&
-	       test_thread_flag(TIF_IA32);
+		ksig->ka.sa.sa_flags & SA_IA32_ABI;
 }
 
-static inline int is_ia32_frame(void)
+static inline int is_ia32_frame(struct ksignal *ksig)
 {
-	return config_enabled(CONFIG_X86_32) || is_ia32_compat_frame();
+	return config_enabled(CONFIG_X86_32) || is_ia32_compat_frame(ksig);
 }
 
-static inline int is_x32_frame(void)
+static inline int is_x32_frame(struct ksignal *ksig)
 {
-	return config_enabled(CONFIG_X86_X32_ABI) && test_thread_flag(TIF_X32);
+	return config_enabled(CONFIG_X86_X32_ABI) &&
+		ksig->ka.sa.sa_flags & SA_X32_ABI;
 }
 
 static int
@@ -684,12 +685,12 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	compat_sigset_t *cset = (compat_sigset_t *) set;
 
 	/* Set up the stack frame */
-	if (is_ia32_frame()) {
+	if (is_ia32_frame(ksig)) {
 		if (ksig->ka.sa.sa_flags & SA_SIGINFO)
 			return ia32_setup_rt_frame(usig, ksig, cset, regs);
 		else
 			return ia32_setup_frame(usig, ksig, cset, regs);
-	} else if (is_x32_frame()) {
+	} else if (is_x32_frame(ksig)) {
 		return x32_setup_rt_frame(ksig, cset, regs);
 	} else {
 		return __setup_rt_frame(ksig->sig, ksig, set, regs);
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index dc3c0b1c816f..9e7da5f402dd 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -1,10 +1,28 @@
 #include <linux/compat.h>
 #include <linux/uaccess.h>
+#include <linux/ptrace.h>
 
-int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
+void sigaction_compat_abi(struct k_sigaction *act)
+{
+	if (!act)
+		return;
+
+	/* don't let flags to be set from userspace */
+	act->sa.sa_flags &= ~(SA_IA32_ABI | SA_X32_ABI);
+
+	if (user_64bit_mode(current_pt_regs()))
+		return;
+
+	if (in_ia32_syscall())
+		act->sa.sa_flags |= SA_IA32_ABI;
+	if (in_x32_syscall())
+		act->sa.sa_flags |= SA_X32_ABI;
+}
+
+int __copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from,
+		bool x32_ABI)
 {
 	int err = 0;
-	bool ia32 = test_thread_flag(TIF_IA32);
 
 	if (!access_ok(VERIFY_WRITE, to, sizeof(compat_siginfo_t)))
 		return -EFAULT;
@@ -38,7 +56,7 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
 				put_user_ex(from->si_arch, &to->si_arch);
 				break;
 			case __SI_CHLD >> 16:
-				if (ia32) {
+				if (!x32_ABI) {
 					put_user_ex(from->si_utime, &to->si_utime);
 					put_user_ex(from->si_stime, &to->si_stime);
 				} else {
@@ -72,6 +90,12 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
 	return err;
 }
 
+/* from syscall's path, where we know the ABI */
+int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
+{
+	return __copy_siginfo_to_user32(to, from, in_x32_syscall());
+}
+
 int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
 {
 	int err = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index 96e9bc40667f..d452fd846c35 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3048,6 +3048,10 @@ void kernel_sigaction(int sig, __sighandler_t action)
 }
 EXPORT_SYMBOL(kernel_sigaction);
 
+void __weak sigaction_compat_abi(struct k_sigaction *act)
+{
+}
+
 int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 {
 	struct task_struct *p = current, *t;
@@ -3058,6 +3062,7 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 		return -EINVAL;
 
 	k = &p->sighand->action[sig-1];
+	sigaction_compat_abi(act);
 
 	spin_lock_irq(&p->sighand->siglock);
 	if (oact)
-- 
2.8.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/6] x86: 32-bit compatible C/R on x86_64
  2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
                   ` (5 preceding siblings ...)
  2016-06-01 13:11 ` [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags Dmitry Safonov
@ 2016-06-01 13:15 ` Dmitry Safonov
  6 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-01 13:15 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, gorcunov, xemul, khorenko

On 06/01/2016 04:11 PM, Dmitry Safonov wrote:
> This patches set is an attempt to add checkpoint/restore
> for 32-bit tasks in compatibility mode on x86_64 hosts.
>
> Restore in CRIU starts from one root restoring process, which
> reads info for all threads being restored from images files.
> This information is used further to to find out which processes
> share some resources. Later shared resources are restored only
> by one process and all other inherit them.
> After that it calls clone() and new threads restore their
> properties in parallel. Those threads inherit all parent's
> mappings and fetch properties from those mappings
> (and do clone themself, if they have children/subthreads). [1]
> Then starts restorer blob's play, it's PIE binary, which
> unmaps all unneeded for restoring VMAs, maps new VMAs and
> finalize restoring with sigreturn syscall. [2]
>
> To restore of 32-bit task we need three things to do in running
> x86_64 restorer blob:
> a) set code selector to __USER32_CS (to run 32-bit code);
> b) remap vdso blob from 64-bit to 32-bit
>    This is primary needed because restore may happen on a different
>    kernel, which has different vDSO image than we had on dump.
> c) if 32-bit vDSO differ to dumped image, move it on free place
>    and add jump trampolines to that place.
> d) switch TIF_IA32 flag, so kernel would know that it deals with
>    compatible 32-bit application.
>
> From all this:
> a) setting CS may be done from userspace, no patches needed;
> b) patches 1-3 add ability to map different vDSO blobs on x86 kernel;
> c) for remapping/moving 32-bit vDSO blob patches have been send earlier
>    and seems to be accepted [3]
> d) and for swapping TIF_IA32 flag discussion with Andy ended in conclusion
>    that it's better to remove this flag completely.
>    Patches 4-6 deletes usage of TIF_IA32 from ptrace, signal and coredump
>    code. This is rework/resend of RFC [4]
>
> [1] https://criu.org/Checkpoint/Restore#Restore
> [2] https://criu.org/Restorer_context
> [3] https://lkml.org/lkml/2016/5/17/243
> [4] https://lkml.org/lkml/2016/4/25/650
>
> Dmitry Safonov (6):
>   x86/vdso: unmap vdso blob on vvar mapping failure
>   x86/vdso: introduce do_map_vdso() and vdso_type enum
>   x86/arch_prctl/vdso: add ARCH_MAP_VDSO_*
>   x86/coredump: use core regs, rather that TIF_IA32 flag
>   x86/ptrace: down with test_thread_flag(TIF_IA32)
>   x86/signal: add SA_{X32,IA32}_ABI sa_flags

Oh, forgot to mention: this patches set is based on [3],
so kernel build bot may be not happy about applying them.
Anyway, I hope for the next revisions, the base-patches
will be applied to linux-next and build-bot will test them
just fine.

>
>  arch/x86/entry/vdso/vma.c          | 72 +++++++++++++++++++++-----------------
>  arch/x86/ia32/ia32_signal.c        |  2 +-
>  arch/x86/include/asm/compat.h      |  8 ++---
>  arch/x86/include/asm/fpu/signal.h  |  6 ++++
>  arch/x86/include/asm/vdso.h        |  4 +++
>  arch/x86/include/uapi/asm/prctl.h  |  6 ++++
>  arch/x86/include/uapi/asm/signal.h |  6 +++-
>  arch/x86/kernel/process_64.c       | 10 ++++++
>  arch/x86/kernel/ptrace.c           |  2 +-
>  arch/x86/kernel/signal.c           | 19 +++++-----
>  arch/x86/kernel/signal_compat.c    | 30 ++++++++++++++--
>  fs/binfmt_elf.c                    | 18 +++++-----
>  kernel/signal.c                    |  5 +++
>  13 files changed, 129 insertions(+), 59 deletions(-)
>


-- 
Regards,
Dmitry Safonov

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum
  2016-06-01 13:11 ` [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum Dmitry Safonov
@ 2016-06-03  9:50   ` Cyrill Gorcunov
  2016-06-03 10:03     ` Dmitry Safonov
  0 siblings, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2016-06-03  9:50 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski

On Wed, Jun 01, 2016 at 04:11:33PM +0300, Dmitry Safonov wrote:
> Make in-kernel API to map vDSO blobs on x86.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>

Hi! Sorry for delay in reply. Dima, here is a moment I somehow missing:
when do_map_vdso is called it is implied that old vdso is unmapped already,
or both can be present in the system?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-01 13:11 ` [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag Dmitry Safonov
@ 2016-06-03  9:51   ` Cyrill Gorcunov
  2016-06-03  9:56     ` Dmitry Safonov
  2016-06-06 20:45   ` Oleg Nesterov
  1 sibling, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2016-06-03  9:51 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski, Alexander Viro, linux-fsdevel

On Wed, Jun 01, 2016 at 04:11:35PM +0300, Dmitry Safonov wrote:
> As we have here core registers, use them to determine application's
> mode and sizes of register set and elf_prstatus instead of TIF_IA32
> flag.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>

This can be used unrelated to vdso re-mapping idea, right?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-03  9:51   ` Cyrill Gorcunov
@ 2016-06-03  9:56     ` Dmitry Safonov
  2016-06-03 10:01       ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-03  9:56 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski, Alexander Viro, linux-fsdevel

On 06/03/2016 12:51 PM, Cyrill Gorcunov wrote:
> On Wed, Jun 01, 2016 at 04:11:35PM +0300, Dmitry Safonov wrote:
>> As we have here core registers, use them to determine application's
>> mode and sizes of register set and elf_prstatus instead of TIF_IA32
>> flag.
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Oleg Nesterov <oleg@redhat.com>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> Cc: linux-fsdevel@vger.kernel.org
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>
> This can be used unrelated to vdso re-mapping idea, right?
>

Right -- 4,5,6 are pathes for dropping TIF_IA32 flag.
I put them in this patches set because of their goal,
to make tasks independent of presence of TIF_IA32.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-01 13:11 ` [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32) Dmitry Safonov
@ 2016-06-03  9:57   ` Cyrill Gorcunov
  2016-06-03 10:27     ` Dmitry Safonov
  2016-06-06 21:19   ` Oleg Nesterov
  1 sibling, 1 reply; 30+ messages in thread
From: Cyrill Gorcunov @ 2016-06-03  9:57 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski

On Wed, Jun 01, 2016 at 04:11:36PM +0300, Dmitry Safonov wrote:
> As the task isn't executing at the moment of {GET,SET}REGS,
> return regset that corresponds to code selector, rather than
> value of TIF_IA32 flag.
> I.e. if we ptrace i386 elf binary that has just changed it's
> code selector to __USER_CS, than GET_REGS will return
> full x86_64 register set.
> 
> Note, that this will work only if application has changed it's CS.
> If the application does 32-bit syscall with __USER_CS, ptrace
> will still return 64-bit register set. Which might be still confusing
> for tools that expect TS_COMPACT to be exposed [1, 2].
> 
> So this this change should make PTRACE_GETREGSET more reliable and
> this will be another step to drop TIF_{IA32,X32} flags.
> 
> [1]: https://sourceforge.net/p/strace/mailman/message/30471411/
> [2]: https://lkml.org/lkml/2012/1/18/320
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>

Looks reasonable! Still if cs has been changed to non-compat
selector and we now return 64bit registers set, won't it
cause problems for old tools? I suspect it should not but
still.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-03  9:56     ` Dmitry Safonov
@ 2016-06-03 10:01       ` Cyrill Gorcunov
  0 siblings, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2016-06-03 10:01 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski, Alexander Viro, linux-fsdevel

On Fri, Jun 03, 2016 at 12:56:30PM +0300, Dmitry Safonov wrote:
> > 
> > This can be used unrelated to vdso re-mapping idea, right?
> > 
> 
> Right -- 4,5,6 are pathes for dropping TIF_IA32 flag.
> I put them in this patches set because of their goal,
> to make tasks independent of presence of TIF_IA32.

Ah, great. So this is continues Andy's idea of
ripping off this flag. Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum
  2016-06-03  9:50   ` Cyrill Gorcunov
@ 2016-06-03 10:03     ` Dmitry Safonov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-03 10:03 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski

On 06/03/2016 12:50 PM, Cyrill Gorcunov wrote:
> On Wed, Jun 01, 2016 at 04:11:33PM +0300, Dmitry Safonov wrote:
>> Make in-kernel API to map vDSO blobs on x86.
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>
> Hi! Sorry for delay in reply. Dima, here is a moment I somehow missing:
> when do_map_vdso is called it is implied that old vdso is unmapped already,
> or both can be present in the system?
>

Both can be present. There are some limitations: if first vDSO was
32-bit, syscalls through it wouldn't work, only on the last mapped.
It's because of mm->context.vdso pointer, which for 32-bit vDSO
should point on it. (and this code make it point to a new mapped vDSO)

On RFC I did patch that unmaps previous vDSO, but I have though
it's better to simplify this code and drop unmapping.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-03  9:57   ` Cyrill Gorcunov
@ 2016-06-03 10:27     ` Dmitry Safonov
  2016-06-03 10:41       ` Cyrill Gorcunov
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-03 10:27 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski

On 06/03/2016 12:57 PM, Cyrill Gorcunov wrote:
> On Wed, Jun 01, 2016 at 04:11:36PM +0300, Dmitry Safonov wrote:
>> As the task isn't executing at the moment of {GET,SET}REGS,
>> return regset that corresponds to code selector, rather than
>> value of TIF_IA32 flag.
>> I.e. if we ptrace i386 elf binary that has just changed it's
>> code selector to __USER_CS, than GET_REGS will return
>> full x86_64 register set.
>>
>> Note, that this will work only if application has changed it's CS.
>> If the application does 32-bit syscall with __USER_CS, ptrace
>> will still return 64-bit register set. Which might be still confusing
>> for tools that expect TS_COMPACT to be exposed [1, 2].
>>
>> So this this change should make PTRACE_GETREGSET more reliable and
>> this will be another step to drop TIF_{IA32,X32} flags.
>>
>> [1]: https://sourceforge.net/p/strace/mailman/message/30471411/
>> [2]: https://lkml.org/lkml/2012/1/18/320
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Oleg Nesterov <oleg@redhat.com>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>
> Looks reasonable! Still if cs has been changed to non-compat
> selector and we now return 64bit registers set, won't it
> cause problems for old tools? I suspect it should not but
> still.
>

Thanks! Hmm, strace works fine - I'll check gdb to be sure.
What else could be bothered by this?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-03 10:27     ` Dmitry Safonov
@ 2016-06-03 10:41       ` Cyrill Gorcunov
  0 siblings, 0 replies; 30+ messages in thread
From: Cyrill Gorcunov @ 2016-06-03 10:41 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	xemul, khorenko, Andy Lutomirski

On Fri, Jun 03, 2016 at 01:27:33PM +0300, Dmitry Safonov wrote:
> > 
> > Looks reasonable! Still if cs has been changed to non-compat
> > selector and we now return 64bit registers set, won't it
> > cause problems for old tools? I suspect it should not but
> > still.
> > 
> 
> Thanks! Hmm, strace works fine - I'll check gdb to be sure.
> What else could be bothered by this?

None I know of at the moment.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags
  2016-06-01 13:11 ` [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags Dmitry Safonov
@ 2016-06-04  5:08   ` Andy Lutomirski
  2016-06-04 15:57     ` Dmitry Safonov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2016-06-04  5:08 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: H. Peter Anvin, Dmitry Safonov, khorenko, linux-kernel,
	Thomas Gleixner, Cyrill Gorcunov, Oleg Nesterov, xemul, X86 ML,
	Ingo Molnar

On Jun 1, 2016 6:13 AM, "Dmitry Safonov" <dsafonov@virtuozzo.com> wrote:
>
> Introduce new flags that defines which ABI to use on creating sigframe.
> Those flags kernel will set according to sigaction syscall ABI,
> which set handler for the signal being delivered.
>
> So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver.
> Those flags will be used only under CONFIG_COMPAT.
>
> Similar way ARM uses sa_flags to differ in which mode deliver signal
> for 26-bit applications (look at SA_THIRYTWO).
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
> ---
>  arch/x86/ia32/ia32_signal.c        |  2 +-
>  arch/x86/include/asm/fpu/signal.h  |  6 ++++++
>  arch/x86/include/uapi/asm/signal.h |  6 +++++-
>  arch/x86/kernel/signal.c           | 19 ++++++++++---------
>  arch/x86/kernel/signal_compat.c    | 30 +++++++++++++++++++++++++++---
>  kernel/signal.c                    |  5 +++++
>  6 files changed, 54 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
> index 2f29f4e407c3..cb13c0564ea7 100644
> --- a/arch/x86/ia32/ia32_signal.c
> +++ b/arch/x86/ia32/ia32_signal.c
> @@ -378,7 +378,7 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
>                 put_user_ex(*((u64 *)&code), (u64 __user *)frame->retcode);
>         } put_user_catch(err);
>
> -       err |= copy_siginfo_to_user32(&frame->info, &ksig->info);
> +       err |= __copy_siginfo_to_user32(&frame->info, &ksig->info, false);
>         err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
>                                      regs, set->sig[0]);
>         err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
> diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
> index 0e970d00dfcd..20a1fbf7fe4e 100644
> --- a/arch/x86/include/asm/fpu/signal.h
> +++ b/arch/x86/include/asm/fpu/signal.h
> @@ -19,6 +19,12 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
>  # define ia32_setup_rt_frame   __setup_rt_frame
>  #endif
>
> +#ifdef CONFIG_COMPAT
> +int __copy_siginfo_to_user32(compat_siginfo_t __user *to,
> +               const siginfo_t *from, bool x32_ABI);
> +#endif
> +
> +
>  extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
>                               struct task_struct *tsk);
>  extern void convert_to_fxsr(struct task_struct *tsk,
> diff --git a/arch/x86/include/uapi/asm/signal.h b/arch/x86/include/uapi/asm/signal.h
> index 8264f47cf53e..9c663b6fc023 100644
> --- a/arch/x86/include/uapi/asm/signal.h
> +++ b/arch/x86/include/uapi/asm/signal.h
> @@ -70,6 +70,8 @@ typedef unsigned long sigset_t;
>   * SA_RESETHAND clears the handler when the signal is delivered.
>   * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
>   * SA_NODEFER prevents the current signal from being masked in the handler.
> + * SA_IA32_ABI/SA_X32_ABI indicates ABI for a signal frame,
> + *   if neither is set, the kernel will set them according to a syscall ABI

I would prefer if these weren't in the UAPI header.  If you want a
foreign signal frame type, do a foreign syscall.  This also means that
you should strip these flags if a user sets them directly.

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags
  2016-06-04  5:08   ` Andy Lutomirski
@ 2016-06-04 15:57     ` Dmitry Safonov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-04 15:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, H. Peter Anvin, khorenko, linux-kernel,
	Thomas Gleixner, Cyrill Gorcunov, Oleg Nesterov, xemul, X86 ML,
	Ingo Molnar

2016-06-04 8:08 GMT+03:00 Andy Lutomirski <luto@amacapital.net>:
> On Jun 1, 2016 6:13 AM, "Dmitry Safonov" <dsafonov@virtuozzo.com> wrote:

>> --- a/arch/x86/include/uapi/asm/signal.h
>> +++ b/arch/x86/include/uapi/asm/signal.h
>> @@ -70,6 +70,8 @@ typedef unsigned long sigset_t;
>>   * SA_RESETHAND clears the handler when the signal is delivered.
>>   * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
>>   * SA_NODEFER prevents the current signal from being masked in the handler.
>> + * SA_IA32_ABI/SA_X32_ABI indicates ABI for a signal frame,
>> + *   if neither is set, the kernel will set them according to a syscall ABI
>
> I would prefer if these weren't in the UAPI header.  If you want a
> foreign signal frame type, do a foreign syscall.  This also means that
> you should strip these flags if a user sets them directly.

Oh, thanks Andy!
I remember you have said it to the previous RFC.
So I did it -- please, check sigaction_compat_abi().
It seems like, I forget to correct this comment accordingly (will do to v2).
Shouldn't I mark in uapi header that those flag bits are reserved,
defining them there?

-- 
Regards,
Safonov Dmitry.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-01 13:11 ` [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag Dmitry Safonov
  2016-06-03  9:51   ` Cyrill Gorcunov
@ 2016-06-06 20:45   ` Oleg Nesterov
  2016-06-06 22:43     ` Oleg Nesterov
  1 sibling, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2016-06-06 20:45 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, gorcunov,
	xemul, khorenko, Andy Lutomirski, Alexander Viro, linux-fsdevel

On 06/01, Dmitry Safonov wrote:
>
>  static int fill_thread_core_info(struct elf_thread_core_info *t,
>  				 const struct user_regset_view *view,
> -				 long signr, size_t *total)
> +				 long signr, size_t *total,
> +				 struct pt_regs *regs __maybe_unused)
>  {
>  	unsigned int i;
>  
> @@ -1652,11 +1653,11 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
>  	 */
>  	fill_prstatus(&t->prstatus, t->task, signr);
>  	(void) view->regsets[0].get(t->task, &view->regsets[0],
> -				    0, PR_REG_SIZE(t->prstatus.pr_reg),
> +				    0, PR_REG_SIZE(t->prstatus.pr_reg, regs),

Hmm. I don't understand this... Note that this "regs" argument has nothing
to do with t->task if the process is multithreaded,

> @@ -1772,7 +1773,8 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
>  	 * Now fill in each thread's information.
>  	 */
>  	for (t = info->thread; t != NULL; t = t->next)
> -		if (!fill_thread_core_info(t, view, siginfo->si_signo, &info->size))
> +		if (!fill_thread_core_info(t, view, siginfo->si_signo,
> +					&info->size, regs))

fill_note_info(..., args) is called with args = task_pt_regs(dumper_thread).

Oleg.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-01 13:11 ` [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32) Dmitry Safonov
  2016-06-03  9:57   ` Cyrill Gorcunov
@ 2016-06-06 21:19   ` Oleg Nesterov
  2016-06-07 11:38     ` Dmitry Safonov
  2016-06-09 17:21     ` Andy Lutomirski
  1 sibling, 2 replies; 30+ messages in thread
From: Oleg Nesterov @ 2016-06-06 21:19 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, gorcunov,
	xemul, khorenko, Andy Lutomirski

On 06/01, Dmitry Safonov wrote:
>
> Note, that this will work only if application has changed it's CS.

So, suppose it changes it's CS and crashes,

> If the application does 32-bit syscall with __USER_CS, ptrace
> @@ -1355,7 +1355,7 @@ void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
>  const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>  {
>  #ifdef CONFIG_IA32_EMULATION
> -	if (test_tsk_thread_flag(task, TIF_IA32))
> +	if (!user_64bit_mode(task_pt_regs(task)))
>  #endif

then coredump will do fill_elf_header(view->e_machine) and use EM_X86_64
instead of EM_386, or vice versa...

I simply can't understand is this better or worse, I guess gdb or any
other tool which looks at this coredump will be confused anyway.

Oleg.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-06 20:45   ` Oleg Nesterov
@ 2016-06-06 22:43     ` Oleg Nesterov
  2016-06-08 13:28       ` Dmitry Safonov
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2016-06-06 22:43 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, gorcunov,
	xemul, khorenko, Andy Lutomirski, Alexander Viro, linux-fsdevel

On 06/06, Oleg Nesterov wrote:
>
> On 06/01, Dmitry Safonov wrote:
> >
> >  static int fill_thread_core_info(struct elf_thread_core_info *t,
> >  				 const struct user_regset_view *view,
> > -				 long signr, size_t *total)
> > +				 long signr, size_t *total,
> > +				 struct pt_regs *regs __maybe_unused)
> >  {
> >  	unsigned int i;
> >
> > @@ -1652,11 +1653,11 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
> >  	 */
> >  	fill_prstatus(&t->prstatus, t->task, signr);
> >  	(void) view->regsets[0].get(t->task, &view->regsets[0],
> > -				    0, PR_REG_SIZE(t->prstatus.pr_reg),
> > +				    0, PR_REG_SIZE(t->prstatus.pr_reg, regs),
>
> Hmm. I don't understand this... Note that this "regs" argument has nothing
> to do with t->task if the process is multithreaded,
>
> > @@ -1772,7 +1773,8 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
> >  	 * Now fill in each thread's information.
> >  	 */
> >  	for (t = info->thread; t != NULL; t = t->next)
> > -		if (!fill_thread_core_info(t, view, siginfo->si_signo, &info->size))
> > +		if (!fill_thread_core_info(t, view, siginfo->si_signo,
> > +					&info->size, regs))
>
> fill_note_info(..., args) is called with args = task_pt_regs(dumper_thread).

forgot to mention... yes, this matches the fact we use a single "view"
for all threads, and we get it via task_user_regset_view(dump_task).

But this change (imo) adds even more confusion, and without the next patch
the logic looks "obviously wrong", becauase PR_REG_SIZE/etc look at
dumper_thread->cs while task_user_regset_view() checks thread flags.

Anyway I fail to understand these macros... Say, PR_REG_SIZE(S). Can't we
kill it and use regsets[0].n * regsets[0].size instead ? These numbers
should match whatever we do, if we call ->get().

Oleg.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-06 21:19   ` Oleg Nesterov
@ 2016-06-07 11:38     ` Dmitry Safonov
  2016-06-09 17:21     ` Andy Lutomirski
  1 sibling, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-07 11:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, gorcunov,
	xemul, khorenko, Andy Lutomirski

On 06/07/2016 12:19 AM, Oleg Nesterov wrote:
> On 06/01, Dmitry Safonov wrote:
>>
>> Note, that this will work only if application has changed it's CS.
>
> So, suppose it changes it's CS and crashes,
>
>> If the application does 32-bit syscall with __USER_CS, ptrace
>> @@ -1355,7 +1355,7 @@ void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
>>  const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>>  {
>>  #ifdef CONFIG_IA32_EMULATION
>> -	if (test_tsk_thread_flag(task, TIF_IA32))
>> +	if (!user_64bit_mode(task_pt_regs(task)))
>>  #endif
>
> then coredump will do fill_elf_header(view->e_machine) and use EM_X86_64
> instead of EM_386, or vice versa...
>
> I simply can't understand is this better or worse, I guess gdb or any
> other tool which looks at this coredump will be confused anyway.

Oleg, thanks for your reply!

At least, gdb will have all registers at that moment.
Firstly, I did it for returning with GET_REGSET corresponding
register set. So ptrace-attaching to a task with 64-bit code
selector would return 64-bit register set and contrariwise.
(as task may use all 64-bit registers, that seems logical
for me even if it has been loaded as 32-bit ELF)

And for coredump -- I guess it will show full register set
on the crash moment and show proper disasm around that place.
And for my purpose -- that's really what I need, to generate
32-bit ELF core file on crash after changing CS.

I did it for C/R of 32-bit application with changing code
selector from 64-bit. So last thing restorer does - unmap
itself (it's pie blob) and sigreturn to application.
So if the application crashes after it - it will be good
to have 32-bit coredump as for the original application.

Thanks,
Dmitry Safonov

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag
  2016-06-06 22:43     ` Oleg Nesterov
@ 2016-06-08 13:28       ` Dmitry Safonov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-08 13:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, gorcunov,
	xemul, khorenko, Andy Lutomirski, Alexander Viro, linux-fsdevel

On 06/07/2016 01:43 AM, Oleg Nesterov wrote:
> On 06/06, Oleg Nesterov wrote:
>>
>> On 06/01, Dmitry Safonov wrote:
>>>
>>>  static int fill_thread_core_info(struct elf_thread_core_info *t,
>>>  				 const struct user_regset_view *view,
>>> -				 long signr, size_t *total)
>>> +				 long signr, size_t *total,
>>> +				 struct pt_regs *regs __maybe_unused)
>>>  {
>>>  	unsigned int i;
>>>
>>> @@ -1652,11 +1653,11 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
>>>  	 */
>>>  	fill_prstatus(&t->prstatus, t->task, signr);
>>>  	(void) view->regsets[0].get(t->task, &view->regsets[0],
>>> -				    0, PR_REG_SIZE(t->prstatus.pr_reg),
>>> +				    0, PR_REG_SIZE(t->prstatus.pr_reg, regs),
>>
>> Hmm. I don't understand this... Note that this "regs" argument has nothing
>> to do with t->task if the process is multithreaded,
>>
>>> @@ -1772,7 +1773,8 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
>>>  	 * Now fill in each thread's information.
>>>  	 */
>>>  	for (t = info->thread; t != NULL; t = t->next)
>>> -		if (!fill_thread_core_info(t, view, siginfo->si_signo, &info->size))
>>> +		if (!fill_thread_core_info(t, view, siginfo->si_signo,
>>> +					&info->size, regs))
>>
>> fill_note_info(..., args) is called with args = task_pt_regs(dumper_thread).
>
> forgot to mention... yes, this matches the fact we use a single "view"
> for all threads, and we get it via task_user_regset_view(dump_task).
>
> But this change (imo) adds even more confusion, and without the next patch
> the logic looks "obviously wrong", becauase PR_REG_SIZE/etc look at
> dumper_thread->cs while task_user_regset_view() checks thread flags.
>
> Anyway I fail to understand these macros... Say, PR_REG_SIZE(S). Can't we
> kill it and use regsets[0].n * regsets[0].size instead ? These numbers
> should match whatever we do, if we call ->get().
>

Thanks, the idea of dropping PR_REG_SIZE looks better than my patch!
I'll try to drop those macros for the next revision.

-- 
Regards,
Dmitry Safonov

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-06 21:19   ` Oleg Nesterov
  2016-06-07 11:38     ` Dmitry Safonov
@ 2016-06-09 17:21     ` Andy Lutomirski
  2016-06-10 20:07       ` Oleg Nesterov
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2016-06-09 17:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Safonov, H. Peter Anvin, Dmitry Safonov, khorenko,
	linux-kernel, Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML,
	Ingo Molnar

On Jun 6, 2016 3:21 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>
> On 06/01, Dmitry Safonov wrote:
> >
> > Note, that this will work only if application has changed it's CS.
>
> So, suppose it changes it's CS and crashes,
>
> > If the application does 32-bit syscall with __USER_CS, ptrace
> > @@ -1355,7 +1355,7 @@ void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
> >  const struct user_regset_view *task_user_regset_view(struct task_struct *task)
> >  {
> >  #ifdef CONFIG_IA32_EMULATION
> > -     if (test_tsk_thread_flag(task, TIF_IA32))
> > +     if (!user_64bit_mode(task_pt_regs(task)))
> >  #endif
>
> then coredump will do fill_elf_header(view->e_machine) and use EM_X86_64
> instead of EM_386, or vice versa...
>
> I simply can't understand is this better or worse, I guess gdb or any
> other tool which looks at this coredump will be confused anyway.
>

I think it's better. CRIU will change CS and someone will make the
restored process crash afterwards.

> Oleg.
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-09 17:21     ` Andy Lutomirski
@ 2016-06-10 20:07       ` Oleg Nesterov
  2016-06-10 20:14         ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2016-06-10 20:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, H. Peter Anvin, Dmitry Safonov, khorenko,
	linux-kernel, Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML,
	Ingo Molnar

On 06/09, Andy Lutomirski wrote:
>
> On Jun 6, 2016 3:21 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >
> > On 06/01, Dmitry Safonov wrote:
> > >
> > > Note, that this will work only if application has changed it's CS.
> >
> > So, suppose it changes it's CS and crashes,
> >
> > > If the application does 32-bit syscall with __USER_CS, ptrace
> > > @@ -1355,7 +1355,7 @@ void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
> > >  const struct user_regset_view *task_user_regset_view(struct task_struct *task)
> > >  {
> > >  #ifdef CONFIG_IA32_EMULATION
> > > -     if (test_tsk_thread_flag(task, TIF_IA32))
> > > +     if (!user_64bit_mode(task_pt_regs(task)))
> > >  #endif
> >
> > then coredump will do fill_elf_header(view->e_machine) and use EM_X86_64
> > instead of EM_386, or vice versa...
> >
> > I simply can't understand is this better or worse, I guess gdb or any
> > other tool which looks at this coredump will be confused anyway.
> >
>
> I think it's better.

and I tend to agree, I didn't try to argue with this change, but

> CRIU will change CS and someone will make the
> restored process crash afterwards.

I don't understand what do you mean... could you explain?

IIRC, CRIU can't c/r the 32-bit applications, or this is no longer true?

Oleg.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-10 20:07       ` Oleg Nesterov
@ 2016-06-10 20:14         ` Andy Lutomirski
  2016-06-13 13:50           ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2016-06-10 20:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Safonov, H. Peter Anvin, Dmitry Safonov, khorenko,
	linux-kernel, Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML,
	Ingo Molnar

On Fri, Jun 10, 2016 at 1:07 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 06/09, Andy Lutomirski wrote:
>>
>> On Jun 6, 2016 3:21 PM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>> >
>> > On 06/01, Dmitry Safonov wrote:
>> > >
>> > > Note, that this will work only if application has changed it's CS.
>> >
>> > So, suppose it changes it's CS and crashes,
>> >
>> > > If the application does 32-bit syscall with __USER_CS, ptrace
>> > > @@ -1355,7 +1355,7 @@ void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
>> > >  const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>> > >  {
>> > >  #ifdef CONFIG_IA32_EMULATION
>> > > -     if (test_tsk_thread_flag(task, TIF_IA32))
>> > > +     if (!user_64bit_mode(task_pt_regs(task)))
>> > >  #endif
>> >
>> > then coredump will do fill_elf_header(view->e_machine) and use EM_X86_64
>> > instead of EM_386, or vice versa...
>> >
>> > I simply can't understand is this better or worse, I guess gdb or any
>> > other tool which looks at this coredump will be confused anyway.
>> >
>>
>> I think it's better.
>
> and I tend to agree, I didn't try to argue with this change, but
>
>> CRIU will change CS and someone will make the
>> restored process crash afterwards.
>
> I don't understand what do you mean... could you explain?
>
> IIRC, CRIU can't c/r the 32-bit applications, or this is no longer true?
>

CRIU has a horrible, nasty, brilliant idea: it will start restoring
32-bit processes by treating them mostly like 64-bit processes.  The
restorer will start out 64-bit, set everything up, and long
jump/return/sigreturn/whatever back to 32-bit mode.  My proposal was
that, rather than coming up with nasty hacks to switch the kernel's
idea of the task bitness, we instead teach the kernel to respect that
actual bitness as indicated by CS and the syscalls used to the extent
possible.

So, yes, a restored 32-bit process that crashes should dump core as
though it's 32-bit even though it was 64-bit when execve was last
called :)

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-10 20:14         ` Andy Lutomirski
@ 2016-06-13 13:50           ` Oleg Nesterov
  2016-06-13 20:40             ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2016-06-13 13:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, H. Peter Anvin, Dmitry Safonov, khorenko,
	linux-kernel, Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML,
	Ingo Molnar

To avoid the confusion, let me first say that I am not going to argue
with these changes, I simply do not understand the problem space enough.

On 06/10, Andy Lutomirski wrote:
>
> On Fri, Jun 10, 2016 at 1:07 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > IIRC, CRIU can't c/r the 32-bit applications, or this is no longer true?
> >
>
> CRIU has a horrible, nasty, brilliant idea: it will start restoring
> 32-bit processes by treating them mostly like 64-bit processes.  The
> restorer will start out 64-bit, set everything up, and long
> jump/return/sigreturn/whatever back to 32-bit mode.

OK, I see,

> My proposal was
> that, rather than coming up with nasty hacks to switch the kernel's
> idea of the task bitness,

Well, I can't resist but to me SA_IA32_ABI/SA_X32_ABI looks like a hack
too.  We actually shift TIF_*32 into k_sigaction->flags, and the fact
that we do this per-signal looks, well, interesting ;)

And at first glance it would be very simple to change the task bitness,
CRIU can simply exec a dummy 32-bit application before anything else.
In this case (I think) we also do not need do_map_vdso/ARCH_MAP_VDSO_*
at least right now.

Yes, I guess this will complicate CRIU significantly.

> we instead teach the kernel to respect that
> actual bitness as indicated by CS and the syscalls used to the extent
> possible.

I am still not sure the idea to remove TIF_IA32/TIF_X32 is really good.
But again, I won't argue, I do not feel I understand pro/cons enough.

> So, yes, a restored 32-bit process that crashes should dump core as
> though it's 32-bit even though it was 64-bit when execve was last
> called :)

OK, thanks for you explanation Andy.

Oleg.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-13 13:50           ` Oleg Nesterov
@ 2016-06-13 20:40             ` Andy Lutomirski
  2016-06-14 14:34               ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2016-06-13 20:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Safonov, H. Peter Anvin, Dmitry Safonov, khorenko,
	linux-kernel, Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML,
	Ingo Molnar

On Mon, Jun 13, 2016 at 6:50 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> To avoid the confusion, let me first say that I am not going to argue
> with these changes, I simply do not understand the problem space enough.
>
> On 06/10, Andy Lutomirski wrote:
>>
>> On Fri, Jun 10, 2016 at 1:07 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >
>> > IIRC, CRIU can't c/r the 32-bit applications, or this is no longer true?
>> >
>>
>> CRIU has a horrible, nasty, brilliant idea: it will start restoring
>> 32-bit processes by treating them mostly like 64-bit processes.  The
>> restorer will start out 64-bit, set everything up, and long
>> jump/return/sigreturn/whatever back to 32-bit mode.
>
> OK, I see,
>
>> My proposal was
>> that, rather than coming up with nasty hacks to switch the kernel's
>> idea of the task bitness,
>
> Well, I can't resist but to me SA_IA32_ABI/SA_X32_ABI looks like a hack
> too.  We actually shift TIF_*32 into k_sigaction->flags, and the fact
> that we do this per-signal looks, well, interesting ;)

Is anything actually wrong with this, though?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-13 20:40             ` Andy Lutomirski
@ 2016-06-14 14:34               ` Oleg Nesterov
  2016-06-14 14:43                 ` Dmitry Safonov
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2016-06-14 14:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, H. Peter Anvin, Dmitry Safonov, khorenko,
	linux-kernel, Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML,
	Ingo Molnar

On 06/13, Andy Lutomirski wrote:
>
> On Mon, Jun 13, 2016 at 6:50 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Well, I can't resist but to me SA_IA32_ABI/SA_X32_ABI looks like a hack
> > too.  We actually shift TIF_*32 into k_sigaction->flags, and the fact
> > that we do this per-signal looks, well, interesting ;)
>
> Is anything actually wrong with this, though?

No, I think the patch is correct... It adds the user-visible change, but
I do not think it can break something.

Well, perhaps we should also remove SA_IA32_ABI|SA_X32_ABI from
oact->sa.sa_flags in do_sigaction() to ensure that these non-uapi flags won't
leak to user-space, but probably we do not really care.

Oleg.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32)
  2016-06-14 14:34               ` Oleg Nesterov
@ 2016-06-14 14:43                 ` Dmitry Safonov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Safonov @ 2016-06-14 14:43 UTC (permalink / raw)
  To: Oleg Nesterov, Andy Lutomirski
  Cc: H. Peter Anvin, Dmitry Safonov, khorenko, linux-kernel,
	Thomas Gleixner, Cyrill Gorcunov, xemul, X86 ML, Ingo Molnar

On 06/14/2016 05:34 PM, Oleg Nesterov wrote:

> Well, perhaps we should also remove SA_IA32_ABI|SA_X32_ABI from
> oact->sa.sa_flags in do_sigaction() to ensure that these non-uapi flags won't
> leak to user-space, but probably we do not really care.

Thanks, I missed that - will mask it out to v2.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-06-14 14:44 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-01 13:11 [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov
2016-06-01 13:11 ` [PATCH 1/6] x86/vdso: unmap vdso blob on vvar mapping failure Dmitry Safonov
2016-06-01 13:11 ` [PATCH 2/6] x86/vdso: introduce do_map_vdso() and vdso_type enum Dmitry Safonov
2016-06-03  9:50   ` Cyrill Gorcunov
2016-06-03 10:03     ` Dmitry Safonov
2016-06-01 13:11 ` [PATCH 3/6] x86/arch_prctl/vdso: add ARCH_MAP_VDSO_* Dmitry Safonov
2016-06-01 13:11 ` [PATCH 4/6] x86/coredump: use core regs, rather that TIF_IA32 flag Dmitry Safonov
2016-06-03  9:51   ` Cyrill Gorcunov
2016-06-03  9:56     ` Dmitry Safonov
2016-06-03 10:01       ` Cyrill Gorcunov
2016-06-06 20:45   ` Oleg Nesterov
2016-06-06 22:43     ` Oleg Nesterov
2016-06-08 13:28       ` Dmitry Safonov
2016-06-01 13:11 ` [PATCH 5/6] x86/ptrace: down with test_thread_flag(TIF_IA32) Dmitry Safonov
2016-06-03  9:57   ` Cyrill Gorcunov
2016-06-03 10:27     ` Dmitry Safonov
2016-06-03 10:41       ` Cyrill Gorcunov
2016-06-06 21:19   ` Oleg Nesterov
2016-06-07 11:38     ` Dmitry Safonov
2016-06-09 17:21     ` Andy Lutomirski
2016-06-10 20:07       ` Oleg Nesterov
2016-06-10 20:14         ` Andy Lutomirski
2016-06-13 13:50           ` Oleg Nesterov
2016-06-13 20:40             ` Andy Lutomirski
2016-06-14 14:34               ` Oleg Nesterov
2016-06-14 14:43                 ` Dmitry Safonov
2016-06-01 13:11 ` [PATCH 6/6] x86/signal: add SA_{X32,IA32}_ABI sa_flags Dmitry Safonov
2016-06-04  5:08   ` Andy Lutomirski
2016-06-04 15:57     ` Dmitry Safonov
2016-06-01 13:15 ` [PATCH 0/6] x86: 32-bit compatible C/R on x86_64 Dmitry Safonov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).