Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64
@ 2019-01-15 13:58 Julien Thierry
  2019-01-15 13:58 ` [PATCH v3 1/4] arm64: uaccess: Cleanup get/put_user() Julien Thierry
                   ` (5 more replies)
  0 siblings, 6 replies; 58+ messages in thread
From: Julien Thierry @ 2019-01-15 13:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Julien Thierry, peterz, catalin.marinas, will.deacon, mingo,
	james.morse, hpa, valentin.schneider

Hi,

The first version of this series[1] was briefly in linux-next but had to be
reverted due to a bug where schedule() would end up being called while a
user_access region was active[2].

After clarifications[3], rescheduling while in a user_access region is not
allowed.

* Patches 1 and 2 implement the unsafe accessors for arm64
* Patches 3 and 4 clarify this restriction in the API and attempt to
  check for violations of the restriction.

Changes since v2[4]:
- Rebase on v5.0-rc2
- access_ok() is now done in user_access_begin(), so rework accessors
  so access_ok() is not called in unsafe_get/put_user()
- Split addition of unsafe accessors and the user_access_region check
  into separate patches
- Avoid reading SCTLR_EL1 in user_access_region check
- Add build option for user_access_region checking
- Reword clarifications on unsafe accessors API

Changes since v1[1]:
- Add a way to detect code calling schedule within a user_access region
- Make sure put_user/get_user arguments are evaluated before disabling PAN

[1] https://www.spinics.net/lists/arm-kernel/msg674925.html
[2] https://patchwork.kernel.org/patch/10634783/
[3] https://lkml.org/lkml/2018/11/28/50
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/617080.html

Cheers,

Julien

-->

Julien Thierry (4):
  arm64: uaccess: Cleanup get/put_user()
  arm64: uaccess: Implement unsafe accessors
  uaccess: Check no rescheduling function is called in unsafe region
  arm64: uaccess: Implement user_access_region_active

 arch/arm64/include/asm/sysreg.h  |   2 +
 arch/arm64/include/asm/uaccess.h | 123 ++++++++++++++++++++++++++-------------
 include/linux/kernel.h           |  11 +++-
 include/linux/uaccess.h          |  13 +++++
 kernel/sched/core.c              |  22 +++++++
 lib/Kconfig.debug                |   8 +++
 6 files changed, 135 insertions(+), 44 deletions(-)

--
1.9.1


* [PATCH v3 1/4] arm64: uaccess: Cleanup get/put_user()
  2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
@ 2019-01-15 13:58 ` Julien Thierry
  2019-01-15 13:58 ` [PATCH v3 2/4] arm64: uaccess: Implement unsafe accessors Julien Thierry
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-01-15 13:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Julien Thierry, peterz, catalin.marinas, will.deacon, mingo,
	james.morse, hpa, valentin.schneider

The __get/put_user_check() macros are made to return a value, but that value
is never used. Get rid of them and directly use __get/put_user_error() as a
statement, reducing macro indirection.

Also, take this opportunity to rename __get/put_user_err(), as it gets a bit
confusing having it alongside __get/put_user_error().
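
In other words, after this patch a caller uses the error variant purely as a
statement, along these lines (hypothetical caller, for illustration only):

	unsigned int val;
	int err = 0;

	__get_user_error(val, uptr, err);	/* 'uptr' is a hypothetical __user pointer */
	if (err)
		return err;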

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>

---
 arch/arm64/include/asm/uaccess.h | 36 ++++++++++++------------------------
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 547d7a0..8e40808 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -268,7 +268,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	: "+r" (err), "=&r" (x)						\
 	: "r" (addr), "i" (-EFAULT))

-#define __get_user_err(x, ptr, err)					\
+#define __raw_get_user(x, ptr, err)					\
 do {									\
 	unsigned long __gu_val;						\
 	__chk_user_ptr(ptr);						\
@@ -297,28 +297,22 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
 } while (0)

-#define __get_user_check(x, ptr, err)					\
-({									\
+#define __get_user_error(x, ptr, err)					\
+do {									\
 	__typeof__(*(ptr)) __user *__p = (ptr);				\
 	might_fault();							\
 	if (access_ok(__p, sizeof(*__p))) {				\
 		__p = uaccess_mask_ptr(__p);				\
-		__get_user_err((x), __p, (err));			\
+		__raw_get_user((x), __p, (err));			\
 	} else {							\
 		(x) = 0; (err) = -EFAULT;				\
 	}								\
-})
-
-#define __get_user_error(x, ptr, err)					\
-({									\
-	__get_user_check((x), (ptr), (err));				\
-	(void)0;							\
-})
+} while (0)

 #define __get_user(x, ptr)						\
 ({									\
 	int __gu_err = 0;						\
-	__get_user_check((x), (ptr), __gu_err);				\
+	__get_user_error((x), (ptr), __gu_err);				\
 	__gu_err;							\
 })

@@ -338,7 +332,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	: "+r" (err)							\
 	: "r" (x), "r" (addr), "i" (-EFAULT))

-#define __put_user_err(x, ptr, err)					\
+#define __raw_put_user(x, ptr, err)					\
 do {									\
 	__typeof__(*(ptr)) __pu_val = (x);				\
 	__chk_user_ptr(ptr);						\
@@ -366,28 +360,22 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	uaccess_disable_not_uao();					\
 } while (0)

-#define __put_user_check(x, ptr, err)					\
-({									\
+#define __put_user_error(x, ptr, err)					\
+do {									\
 	__typeof__(*(ptr)) __user *__p = (ptr);				\
 	might_fault();							\
 	if (access_ok(__p, sizeof(*__p))) {				\
 		__p = uaccess_mask_ptr(__p);				\
-		__put_user_err((x), __p, (err));			\
+		__raw_put_user((x), __p, (err));			\
 	} else	{							\
 		(err) = -EFAULT;					\
 	}								\
-})
-
-#define __put_user_error(x, ptr, err)					\
-({									\
-	__put_user_check((x), (ptr), (err));				\
-	(void)0;							\
-})
+} while (0)

 #define __put_user(x, ptr)						\
 ({									\
 	int __pu_err = 0;						\
-	__put_user_check((x), (ptr), __pu_err);				\
+	__put_user_error((x), (ptr), __pu_err);				\
 	__pu_err;							\
 })

--
1.9.1


* [PATCH v3 2/4] arm64: uaccess: Implement unsafe accessors
  2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
  2019-01-15 13:58 ` [PATCH v3 1/4] arm64: uaccess: Cleanup get/put_user() Julien Thierry
@ 2019-01-15 13:58 ` Julien Thierry
  2019-01-15 13:58 ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Julien Thierry
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-01-15 13:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Julien Thierry, peterz, catalin.marinas, will.deacon, mingo,
	james.morse, hpa, valentin.schneider

The current implementation of unsafe_get/put_user() defaults to
get/put_user(), which toggle PAN before each access, despite the caller
having indicated that multiple accesses to user memory were about to happen.

Provide implementations for user_access_begin/end to turn PAN off/on and
implement unsafe accessors that assume PAN was already turned off.
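
The intended usage pattern then looks roughly like the sketch below
(hypothetical caller, not part of this patch; PAN is toggled once around
both accesses instead of once per access):

	static int read_pair(const unsigned int __user *uptr,
			     unsigned int *a, unsigned int *b)
	{
		if (!user_access_begin(uptr, 2 * sizeof(unsigned int)))
			return -EFAULT;

		unsafe_get_user(*a, &uptr[0], efault);
		unsafe_get_user(*b, &uptr[1], efault);

		user_access_end();
		return 0;

	efault:
		user_access_end();
		return -EFAULT;
	}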

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>

---
 arch/arm64/include/asm/uaccess.h | 79 ++++++++++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 8e40808..6a70c75 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -270,31 +270,26 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)

 #define __raw_get_user(x, ptr, err)					\
 do {									\
-	unsigned long __gu_val;						\
-	__chk_user_ptr(ptr);						\
-	uaccess_enable_not_uao();					\
 	switch (sizeof(*(ptr))) {					\
 	case 1:								\
-		__get_user_asm("ldrb", "ldtrb", "%w", __gu_val, (ptr),  \
+		__get_user_asm("ldrb", "ldtrb", "%w", (x), (ptr),	\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	case 2:								\
-		__get_user_asm("ldrh", "ldtrh", "%w", __gu_val, (ptr),  \
+		__get_user_asm("ldrh", "ldtrh", "%w", (x), (ptr),	\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	case 4:								\
-		__get_user_asm("ldr", "ldtr", "%w", __gu_val, (ptr),	\
+		__get_user_asm("ldr", "ldtr", "%w", (x), (ptr),		\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	case 8:								\
-		__get_user_asm("ldr", "ldtr", "%x",  __gu_val, (ptr),	\
+		__get_user_asm("ldr", "ldtr", "%x",  (x), (ptr),	\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
 	}								\
-	uaccess_disable_not_uao();					\
-	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
 } while (0)

 #define __get_user_error(x, ptr, err)					\
@@ -302,8 +297,13 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	__typeof__(*(ptr)) __user *__p = (ptr);				\
 	might_fault();							\
 	if (access_ok(__p, sizeof(*__p))) {				\
+		unsigned long __gu_val;					\
+		__chk_user_ptr(__p);					\
 		__p = uaccess_mask_ptr(__p);				\
-		__raw_get_user((x), __p, (err));			\
+		uaccess_enable_not_uao();				\
+		__raw_get_user(__gu_val, __p, (err));			\
+		uaccess_disable_not_uao();				\
+		(x) = (__force __typeof__(*__p)) __gu_val;		\
 	} else {							\
 		(x) = 0; (err) = -EFAULT;				\
 	}								\
@@ -334,30 +334,26 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)

 #define __raw_put_user(x, ptr, err)					\
 do {									\
-	__typeof__(*(ptr)) __pu_val = (x);				\
-	__chk_user_ptr(ptr);						\
-	uaccess_enable_not_uao();					\
 	switch (sizeof(*(ptr))) {					\
 	case 1:								\
-		__put_user_asm("strb", "sttrb", "%w", __pu_val, (ptr),	\
+		__put_user_asm("strb", "sttrb", "%w", (x), (ptr),	\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	case 2:								\
-		__put_user_asm("strh", "sttrh", "%w", __pu_val, (ptr),	\
+		__put_user_asm("strh", "sttrh", "%w", (x), (ptr),	\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	case 4:								\
-		__put_user_asm("str", "sttr", "%w", __pu_val, (ptr),	\
+		__put_user_asm("str", "sttr", "%w", (x), (ptr),		\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	case 8:								\
-		__put_user_asm("str", "sttr", "%x", __pu_val, (ptr),	\
+		__put_user_asm("str", "sttr", "%x", (x), (ptr),		\
 			       (err), ARM64_HAS_UAO);			\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
 	}								\
-	uaccess_disable_not_uao();					\
 } while (0)

 #define __put_user_error(x, ptr, err)					\
@@ -365,9 +361,13 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	__typeof__(*(ptr)) __user *__p = (ptr);				\
 	might_fault();							\
 	if (access_ok(__p, sizeof(*__p))) {				\
+		__typeof__(*(ptr)) __pu_val = (x);			\
+		__chk_user_ptr(__p);					\
 		__p = uaccess_mask_ptr(__p);				\
-		__raw_put_user((x), __p, (err));			\
-	} else	{							\
+		uaccess_enable_not_uao();				\
+		__raw_put_user(__pu_val, __p, (err));			\
+		uaccess_disable_not_uao();				\
+	} else {							\
 		(err) = -EFAULT;					\
 	}								\
 } while (0)
@@ -381,6 +381,45 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)

 #define put_user	__put_user

+static __must_check inline bool user_access_begin(const void __user *ptr,
+						  size_t len)
+{
+	if (unlikely(!access_ok(ptr, len)))
+		return false;
+
+	uaccess_enable_not_uao();
+	return true;
+}
+#define user_access_begin(ptr, len)	user_access_begin(ptr, len)
+#define user_access_end()		uaccess_disable_not_uao()
+
+#define unsafe_get_user(x, ptr, err)					\
+do {									\
+	__typeof__(*(ptr)) __user *__p = (ptr);				\
+	unsigned long __gu_val;						\
+	int __gu_err = 0;						\
+	might_fault();							\
+	__chk_user_ptr(__p);						\
+	__p = uaccess_mask_ptr(__p);					\
+	__raw_get_user(__gu_val, __p, __gu_err);			\
+	(x) = (__force __typeof__(*__p)) __gu_val;			\
+	if (__gu_err != 0)						\
+		goto err;						\
+} while (0)
+
+#define unsafe_put_user(x, ptr, err)					\
+do {									\
+	__typeof__(*(ptr)) __user *__p = (ptr);				\
+	__typeof__(*(ptr)) __pu_val = (x);				\
+	int __pu_err = 0;						\
+	might_fault();							\
+	__chk_user_ptr(__p);						\
+	__p = uaccess_mask_ptr(__p);					\
+	__raw_put_user(__pu_val, __p, __pu_err);			\
+	if (__pu_err != 0)						\
+		goto err;						\
+} while (0)
+
 extern unsigned long __must_check __arch_copy_from_user(void *to, const void __user *from, unsigned long n);
 #define raw_copy_from_user(to, from, n)					\
 ({									\
--
1.9.1


* [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
  2019-01-15 13:58 ` [PATCH v3 1/4] arm64: uaccess: Cleanup get/put_user() Julien Thierry
  2019-01-15 13:58 ` [PATCH v3 2/4] arm64: uaccess: Implement unsafe accessors Julien Thierry
@ 2019-01-15 13:58 ` Julien Thierry
  2019-01-30 16:58   ` Valentin Schneider
  2019-02-11 13:45   ` Ingo Molnar
  2019-01-15 13:58 ` [PATCH v3 4/4] arm64: uaccess: Implement user_access_region_active Julien Thierry
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 58+ messages in thread
From: Julien Thierry @ 2019-01-15 13:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Julien Thierry, peterz, catalin.marinas, will.deacon, mingo,
	james.morse, hpa, valentin.schneider

Rescheduling is not supported while a user_access region is active.
Add an overridable primitive to indicate whether a user_access region is
active, and check that this is not the case when calling rescheduling
functions.

These checks are only performed when DEBUG_UACCESS_SLEEP is selected.

Also, add a comment clarifying the behaviour of user_access regions.
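
As an illustration, the kind of pattern these checks are meant to flag looks
like this (hypothetical example):

	if (user_access_begin(ptr, size)) {
		[...]
		mutex_lock(&lock);	/* may sleep/schedule: not allowed here */
		[...]
		user_access_end();
	}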

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>

---
 include/linux/kernel.h  | 11 +++++++++--
 include/linux/uaccess.h | 13 +++++++++++++
 kernel/sched/core.c     | 22 ++++++++++++++++++++++
 lib/Kconfig.debug       |  8 ++++++++
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 8f0e68e..73f1f82 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -237,11 +237,18 @@
 struct pt_regs;
 struct user;

+#ifdef CONFIG_DEBUG_UACCESS_SLEEP
+extern void __might_resched(const char *file, int line);
+#else
+#define __might_resched(file, line) do { } while (0)
+#endif
+
 #ifdef CONFIG_PREEMPT_VOLUNTARY
 extern int _cond_resched(void);
-# define might_resched() _cond_resched()
+# define might_resched() \
+	do { __might_resched(__FILE__, __LINE__); _cond_resched(); } while (0)
 #else
-# define might_resched() do { } while (0)
+# define might_resched() __might_resched(__FILE__, __LINE__)
 #endif

 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 37b226e..2c0c39e 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -263,6 +263,15 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
 #define probe_kernel_address(addr, retval)		\
 	probe_kernel_read(&retval, addr, sizeof(retval))

+/*
+ * user_access_begin() and user_access_end() define a region where
+ * unsafe user accessors can be used. Exceptions and interrupt shall exit the
+ * user_access region and re-enter it when returning to the interrupted context.
+ *
+ * No sleeping function should get called during a user_access region - we rely
+ * on exception handling to take care of the user_access status for us, but that
+ * doesn't happen when directly calling schedule().
+ */
 #ifndef user_access_begin
 #define user_access_begin(ptr,len) access_ok(ptr, len)
 #define user_access_end() do { } while (0)
@@ -270,6 +279,10 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
 #define unsafe_put_user(x, ptr, err) do { if (unlikely(__put_user(x, ptr))) goto err; } while (0)
 #endif

+#ifndef unsafe_user_region_active
+#define unsafe_user_region_active()	false
+#endif
+
 #ifdef CONFIG_HARDENED_USERCOPY
 void usercopy_warn(const char *name, const char *detail, bool to_user,
 		   unsigned long offset, unsigned long len);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a674c7db..b1bb7e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
 		__schedule_bug(prev);
 		preempt_count_set(PREEMPT_DISABLED);
 	}
+
+	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
+	    unlikely(unsafe_user_region_active())) {
+		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
+		       prev->comm, prev->pid, preempt_count());
+		dump_stack();
+	}
+
 	rcu_sleep_check();

 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
@@ -6151,6 +6159,20 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 EXPORT_SYMBOL(___might_sleep);
 #endif

+#ifdef CONFIG_DEBUG_UACCESS_SLEEP
+void __might_resched(const char *file, int line)
+{
+	if (!unsafe_user_region_active())
+		return;
+
+	printk(KERN_ERR
+		"BUG: rescheduling function called from user access context at %s:%d\n",
+			file, line);
+	dump_stack();
+}
+EXPORT_SYMBOL(__might_resched);
+#endif
+
 #ifdef CONFIG_MAGIC_SYSRQ
 void normalize_rt_tasks(void)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d4df5b2..d030e31 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2069,6 +2069,14 @@ config IO_STRICT_DEVMEM

 	  If in doubt, say Y.

+config DEBUG_UACCESS_SLEEP
+	bool "Check sleep inside a user access region"
+	depends on DEBUG_KERNEL
+	help
+	  If you say Y here, various routines which may sleep will become very
+	  noisy if they are called inside a user access region (i.e. between
+	  a user_access_begin() and a user_access_end())
+
 source "arch/$(SRCARCH)/Kconfig.debug"

 endmenu # Kernel hacking
--
1.9.1


* [PATCH v3 4/4] arm64: uaccess: Implement user_access_region_active
  2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
                   ` (2 preceding siblings ...)
  2019-01-15 13:58 ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Julien Thierry
@ 2019-01-15 13:58 ` Julien Thierry
  2019-01-25 14:27 ` [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Catalin Marinas
  2019-01-30 16:17 ` Julien Thierry
  5 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-01-15 13:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Julien Thierry, peterz, catalin.marinas, will.deacon, mingo,
	james.morse, hpa, valentin.schneider

Provide an arm64 implementation of user_access_region_active.

Check the status of PAN or SW PAN, as needed.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/sysreg.h  |  2 ++
 arch/arm64/include/asm/uaccess.h | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 72dc4c0..f0b2f32 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -114,6 +114,8 @@
 #define SYS_DC_CSW			sys_insn(1, 0, 7, 10, 2)
 #define SYS_DC_CISW			sys_insn(1, 0, 7, 14, 2)

+#define SYS_PSTATE_PAN			sys_reg(3, 0, 4, 2, 3)
+
 #define SYS_OSDTRRX_EL1			sys_reg(2, 0, 0, 0, 2)
 #define SYS_MDCCINT_EL1			sys_reg(2, 0, 0, 2, 0)
 #define SYS_MDSCR_EL1			sys_reg(2, 0, 0, 2, 2)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 6a70c75..1a8acc9 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -225,6 +225,18 @@ static inline void uaccess_enable_not_uao(void)
 	__uaccess_enable(ARM64_ALT_PAN_NOT_UAO);
 }

+#define unsafe_user_region_active	uaccess_region_active
+static inline bool uaccess_region_active(void)
+{
+	if (system_uses_ttbr0_pan()) {
+		return read_sysreg(ttbr1_el1) & TTBR_ASID_MASK;
+	} else if (cpus_have_const_cap(ARM64_ALT_PAN_NOT_UAO) &&
+		   alternatives_applied) {
+		return !read_sysreg_s(SYS_PSTATE_PAN);
+	}
+	return false;
+}
+
 /*
  * Sanitise a uaccess pointer such that it becomes NULL if above the
  * current addr_limit.
--
1.9.1


* Re: [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64
  2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
                   ` (3 preceding siblings ...)
  2019-01-15 13:58 ` [PATCH v3 4/4] arm64: uaccess: Implement user_access_region_active Julien Thierry
@ 2019-01-25 14:27 ` Catalin Marinas
  2019-01-30 16:17 ` Julien Thierry
  5 siblings, 0 replies; 58+ messages in thread
From: Catalin Marinas @ 2019-01-25 14:27 UTC (permalink / raw)
  To: Julien Thierry
  Cc: peterz, will.deacon, linux-kernel, mingo, james.morse, hpa,
	valentin.schneider, linux-arm-kernel

Hi Julien,

On Tue, Jan 15, 2019 at 01:58:25PM +0000, Julien Thierry wrote:
> Julien Thierry (4):
>   arm64: uaccess: Cleanup get/put_user()
>   arm64: uaccess: Implement unsafe accessors
>   uaccess: Check no rescheduling function is called in unsafe region
>   arm64: uaccess: Implement user_access_region_active

I queued the first two patches for 5.1. It would be nice to upstream the
other two, but patch 3 needs an ack from (or to be merged by) the scheduler
folk.

Thanks.

-- 
Catalin


* Re: [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64
  2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
                   ` (4 preceding siblings ...)
  2019-01-25 14:27 ` [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Catalin Marinas
@ 2019-01-30 16:17 ` Julien Thierry
  5 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-01-30 16:17 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, mingo, peterz, hpa
  Cc: catalin.marinas, will.deacon, james.morse, valentin.schneider

Hi,

Gentle ping for patches 3 and 4.

Thanks,

Julien

On 15/01/2019 13:58, Julien Thierry wrote:
> Hi,
> 
> The first version of this series[1] was briefly in linux-next but had to be
> reverted due to a bug where schedule() would end up being called while a
> user_access region was active[2].
> 
> After clarifications[3], rescheduling while in a user_access region is not
> allowed.
> 
> * Patches 1 and 2 implement the unsafe accessors for arm64
> * Patches 3 and 4 clarify this restriction in the API and attempt to
>   check for violations of the restriction.
> 
> Changes since v2[4]:
> - Rebase on v5.0-rc2
> - access_ok() is now done in user_access_begin(), so rework accessors
>   so access_ok() is not called in unsafe_get/put_user()
> - Split addition of unsafe accessors and the user_access_region check
>   into separate patches
> - Avoid reading SCTLR_EL1 in user_access_region check
> - Add build option for user_access_region checking
> - Reword clarifications on unsafe accessors API
> 
> Changes since v1[1]:
> - Add a way to detect code calling schedule within a user_access region
> - Make sure put_user/get_user arguments are evaluated before disabling PAN
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg674925.html
> [2] https://patchwork.kernel.org/patch/10634783/
> [3] https://lkml.org/lkml/2018/11/28/50
> [4] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/617080.html
> 
> Cheers,
> 
> Julien
> 
> -->
> 
> Julien Thierry (4):
>   arm64: uaccess: Cleanup get/put_user()
>   arm64: uaccess: Implement unsafe accessors
>   uaccess: Check no rescheduling function is called in unsafe region
>   arm64: uaccess: Implement user_access_region_active
> 
>  arch/arm64/include/asm/sysreg.h  |   2 +
>  arch/arm64/include/asm/uaccess.h | 123 ++++++++++++++++++++++++++-------------
>  include/linux/kernel.h           |  11 +++-
>  include/linux/uaccess.h          |  13 +++++
>  kernel/sched/core.c              |  22 +++++++
>  lib/Kconfig.debug                |   8 +++
>  6 files changed, 135 insertions(+), 44 deletions(-)
> 
> --
> 1.9.1
> 

-- 
Julien Thierry


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-01-15 13:58 ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Julien Thierry
@ 2019-01-30 16:58   ` Valentin Schneider
  2019-02-04 13:27     ` Julien Thierry
  2019-02-11 13:45   ` Ingo Molnar
  1 sibling, 1 reply; 58+ messages in thread
From: Valentin Schneider @ 2019-01-30 16:58 UTC (permalink / raw)
  To: Julien Thierry, linux-kernel, linux-arm-kernel
  Cc: peterz, catalin.marinas, will.deacon, mingo, james.morse, hpa

On 15/01/2019 13:58, Julien Thierry wrote:
[...]> @@ -6151,6 +6159,20 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  EXPORT_SYMBOL(___might_sleep);
>  #endif
> 
> +#ifdef CONFIG_DEBUG_UACCESS_SLEEP
> +void __might_resched(const char *file, int line)
> +{
> +	if (!unsafe_user_region_active())
> +		return;
> +
> +	printk(KERN_ERR
> +		"BUG: rescheduling function called from user access context at %s:%d\n",
> +			file, line);
> +	dump_stack();

Since I've been staring intensely at ___might_sleep() lately, I was thinking
we could "copy" it a bit more closely (sorry for going back on what I said
earlier).

Coming back to the double warnings (__might_resched() + schedule_debug()),
it might be better to drop the schedule_debug() warning and just have the
one in __might_resched() - if something goes wrong, there'll already be a
"BUG" in the log.

> +}
> +EXPORT_SYMBOL(__might_resched);
> +#endif
> +
>  #ifdef CONFIG_MAGIC_SYSRQ
>  void normalize_rt_tasks(void)
>  {
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d4df5b2..d030e31 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2069,6 +2069,14 @@ config IO_STRICT_DEVMEM
> 
>  	  If in doubt, say Y.
> 
> +config DEBUG_UACCESS_SLEEP
> +	bool "Check sleep inside a user access region"
> +	depends on DEBUG_KERNEL
> +	help
> +	  If you say Y here, various routines which may sleep will become very
> +	  noisy if they are called inside a user access region (i.e. between
> +	  a user_access_begin() and a user_access_end())

If it does get noisy, we should go for some ratelimiting - it's probably
good practice even if it doesn't actually turn out to be noisy.

___might_sleep() has this:

  if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
  	return;
  prev_jiffy = jiffies;

> +
>  source "arch/$(SRCARCH)/Kconfig.debug"
> 
>  endmenu # Kernel hacking
> --
> 1.9.1
> 


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-01-30 16:58   ` Valentin Schneider
@ 2019-02-04 13:27     ` Julien Thierry
  0 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-02-04 13:27 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel, linux-arm-kernel
  Cc: peterz, catalin.marinas, will.deacon, mingo, james.morse, hpa



On 30/01/2019 16:58, Valentin Schneider wrote:
> On 15/01/2019 13:58, Julien Thierry wrote:
> [...]> @@ -6151,6 +6159,20 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>>  EXPORT_SYMBOL(___might_sleep);
>>  #endif
>>
>> +#ifdef CONFIG_DEBUG_UACCESS_SLEEP
>> +void __might_resched(const char *file, int line)
>> +{
>> +	if (!unsafe_user_region_active())
>> +		return;
>> +
>> +	printk(KERN_ERR
>> +		"BUG: rescheduling function called from user access context at %s:%d\n",
>> +			file, line);
>> +	dump_stack();
> 
> Since I've been staring intensely at ___might_sleep() lately, I was thinking
> we could "copy" it a bit more closely (sorry for going back on what I said
> earlier).
> 
> Coming back to the double warnings (__might_resched() + schedule_debug()),
> it might be better to drop the schedule_debug() warning and just have the
> one in __might_resched() - if something goes wrong, there'll already be a
> "BUG" in the log.
> 

My only worry with that approach is that if someone has a function that
does reschedule but does not include the __might_resched() annotation, we'd
miss the fact that something went wrong.

But that might be a lot of "if"s, since it assumes that something does
go wrong.

Could I have a maintainer's opinion on this, to know whether I should respin it?

>> +}
>> +EXPORT_SYMBOL(__might_resched);
>> +#endif
>> +
>>  #ifdef CONFIG_MAGIC_SYSRQ
>>  void normalize_rt_tasks(void)
>>  {
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index d4df5b2..d030e31 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -2069,6 +2069,14 @@ config IO_STRICT_DEVMEM
>>
>>  	  If in doubt, say Y.
>>
>> +config DEBUG_UACCESS_SLEEP
>> +	bool "Check sleep inside a user access region"
>> +	depends on DEBUG_KERNEL
>> +	help
>> +	  If you say Y here, various routines which may sleep will become very
>> +	  noisy if they are called inside a user access region (i.e. between
>> +	  a user_access_begin() and a user_access_end())
> 
> If it does get noisy, we should go for some ratelimiting - it's probably
> good practice even if it is not noisy actually.
> 
> ___might_sleep() has this:
> 
>   if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
>   	return;
>   prev_jiffy = jiffies;
> 

I guess the noisiness could depend on the arch-specific handling of user
accesses. So yes, I guess it might be a good idea to add this.
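
Something like this, then (untested sketch, mirroring the ___might_sleep()
ratelimiting you quoted above):

	void __might_resched(const char *file, int line)
	{
		/* ratelimiting, same scheme as ___might_sleep() */
		static unsigned long prev_jiffy;

		if (!unsafe_user_region_active())
			return;

		if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
			return;
		prev_jiffy = jiffies;

		printk(KERN_ERR
			"BUG: rescheduling function called from user access context at %s:%d\n",
				file, line);
		dump_stack();
	}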

Thanks,

-- 
Julien Thierry


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-01-15 13:58 ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Julien Thierry
  2019-01-30 16:58   ` Valentin Schneider
@ 2019-02-11 13:45   ` Ingo Molnar
  2019-02-11 13:51     ` Peter Zijlstra
  1 sibling, 1 reply; 58+ messages in thread
From: Ingo Molnar @ 2019-02-11 13:45 UTC (permalink / raw)
  To: Julien Thierry
  Cc: peterz, catalin.marinas, will.deacon, linux-kernel, mingo,
	james.morse, hpa, valentin.schneider, linux-arm-kernel


* Julien Thierry <julien.thierry@arm.com> wrote:

> Rescheduling is not supported while a user_access region is active.
> Add an overridable primitive to indicate whether a user_access region is
> active, and check that this is not the case when calling rescheduling
> functions.
> 
> These checks are only performed when DEBUG_UACCESS_SLEEP is selected.
> 
> Also, add a comment clarifying the behaviour of user_access regions.
> 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> 
> ---
>  include/linux/kernel.h  | 11 +++++++++--
>  include/linux/uaccess.h | 13 +++++++++++++
>  kernel/sched/core.c     | 22 ++++++++++++++++++++++
>  lib/Kconfig.debug       |  8 ++++++++
>  4 files changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 8f0e68e..73f1f82 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -237,11 +237,18 @@
>  struct pt_regs;
>  struct user;
> 
> +#ifdef CONFIG_DEBUG_UACCESS_SLEEP
> +extern void __might_resched(const char *file, int line);
> +#else
> +#define __might_resched(file, line) do { } while (0)
> +#endif
> +
>  #ifdef CONFIG_PREEMPT_VOLUNTARY
>  extern int _cond_resched(void);
> -# define might_resched() _cond_resched()
> +# define might_resched() \
> +	do { __might_resched(__FILE__, __LINE__); _cond_resched(); } while (0)
>  #else
> -# define might_resched() do { } while (0)
> +# define might_resched() __might_resched(__FILE__, __LINE__)
>  #endif
> 
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 37b226e..2c0c39e 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -263,6 +263,15 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
>  #define probe_kernel_address(addr, retval)		\
>  	probe_kernel_read(&retval, addr, sizeof(retval))
> 
> +/*
> + * user_access_begin() and user_access_end() define a region where
> + * unsafe user accessors can be used. Exceptions and interrupt shall exit the
> + * user_access region and re-enter it when returning to the interrupted context.
> + *
> + * No sleeping function should get called during a user_access region - we rely
> + * on exception handling to take care of the user_access status for us, but that
> + * doesn't happen when directly calling schedule().
> + */
>  #ifndef user_access_begin
>  #define user_access_begin(ptr,len) access_ok(ptr, len)
>  #define user_access_end() do { } while (0)
> @@ -270,6 +279,10 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
>  #define unsafe_put_user(x, ptr, err) do { if (unlikely(__put_user(x, ptr))) goto err; } while (0)
>  #endif
> 
> +#ifndef unsafe_user_region_active
> +#define unsafe_user_region_active()	false
> +#endif
> +
>  #ifdef CONFIG_HARDENED_USERCOPY
>  void usercopy_warn(const char *name, const char *detail, bool to_user,
>  		   unsigned long offset, unsigned long len);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a674c7db..b1bb7e9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
>  		__schedule_bug(prev);
>  		preempt_count_set(PREEMPT_DISABLED);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
> +	    unlikely(unsafe_user_region_active())) {
> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> +		       prev->comm, prev->pid, preempt_count());
> +		dump_stack();
> +	}
> +
>  	rcu_sleep_check();
> 
>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> @@ -6151,6 +6159,20 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  EXPORT_SYMBOL(___might_sleep);
>  #endif
> 
> +#ifdef CONFIG_DEBUG_UACCESS_SLEEP
> +void __might_resched(const char *file, int line)
> +{
> +	if (!unsafe_user_region_active())
> +		return;

Could you please more clearly explain why you want/need an exception from 
the __might_resched() debug warning?

Thanks,

	Ingo


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-11 13:45   ` Ingo Molnar
@ 2019-02-11 13:51     ` Peter Zijlstra
  2019-02-12  9:15       ` Julien Thierry
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-11 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Julien Thierry, catalin.marinas, will.deacon, linux-kernel,
	mingo, james.morse, hpa, valentin.schneider, linux-arm-kernel

On Mon, Feb 11, 2019 at 02:45:27PM +0100, Ingo Molnar wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a674c7db..b1bb7e9 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
> >  		__schedule_bug(prev);
> >  		preempt_count_set(PREEMPT_DISABLED);
> >  	}
> > +
> > +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
> > +	    unlikely(unsafe_user_region_active())) {
> > +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> > +		       prev->comm, prev->pid, preempt_count());
> > +		dump_stack();
> > +	}
> > +
> >  	rcu_sleep_check();
> > 
> >  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> > @@ -6151,6 +6159,20 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
> >  EXPORT_SYMBOL(___might_sleep);
> >  #endif
> > 
> > +#ifdef CONFIG_DEBUG_UACCESS_SLEEP
> > +void __might_resched(const char *file, int line)
> > +{
> > +	if (!unsafe_user_region_active())
> > +		return;
> 
> Could you please more clearly explain why you want/need an exception from 
> the __might_resched() debug warning?

More specifically: how is the addition in schedule_debug() not triggering on
PREEMPT=y kernels?

If code is preemptible, you can (get) schedule(d). If it is not
preemptible, you do not need these additional tests.


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-11 13:51     ` Peter Zijlstra
@ 2019-02-12  9:15       ` Julien Thierry
  2019-02-13  8:21         ` Ingo Molnar
  2019-02-13 10:35         ` Peter Zijlstra
  0 siblings, 2 replies; 58+ messages in thread
From: Julien Thierry @ 2019-02-12  9:15 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: catalin.marinas, will.deacon, linux-kernel, mingo, james.morse,
	hpa, valentin.schneider, linux-arm-kernel



On 11/02/2019 13:51, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 02:45:27PM +0100, Ingo Molnar wrote:
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index a674c7db..b1bb7e9 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
>>>  		__schedule_bug(prev);
>>>  		preempt_count_set(PREEMPT_DISABLED);
>>>  	}
>>> +
>>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
>>> +	    unlikely(unsafe_user_region_active())) {
>>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
>>> +		       prev->comm, prev->pid, preempt_count());
>>> +		dump_stack();
>>> +	}
>>> +
>>>  	rcu_sleep_check();
>>>
>>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
>>> @@ -6151,6 +6159,20 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>>>  EXPORT_SYMBOL(___might_sleep);
>>>  #endif
>>>
>>> +#ifdef CONFIG_DEBUG_UACCESS_SLEEP
>>> +void __might_resched(const char *file, int line)
>>> +{
>>> +	if (!unsafe_user_region_active())
>>> +		return;
>>
>> Could you please more clearly explain why you want/need an exception from 
>> the __might_resched() debug warning?

So, the scenarios I'm trying to avoid are of the following flavour:

	if (user_access_begin(ptr, size)) {

		[...]

		// Calling a function that might call schedule()

		[...]
		user_access_end();
	}


The thing is, as I understand it, not all functions that call schedule() are
annotated with might_resched(), and on the other hand, a function that calls
might_resched() will not necessarily call schedule() every time.
Now with Peter's remark I think I might have been overzealous.

> 
> In specific; how is the addition in schedule_debug() not triggering on
> PREEMPT=y kernels?
> 
> If code is preemptible, you can (get) schedule(d). If it is not
> preemptible; you do not need these additional tests.
> 

Yes, that sounds right: might_resched() only potentially reschedules if
in a suitable context, so in the best case I issue two warnings, and in the
worst case I actually warn when the caller took care to disable preemption or
interrupts before calling a might_resched().

I guess I got a bit confused with might_sleep(), which is "if you call
this in the wrong context I warn", whereas might_resched() is just "if
you call this in a preemptible context, let's resched".

I guess I'll drop the might_resched() part of this patch if that sounds
alright.

Thanks,

-- 
Julien Thierry


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-12  9:15       ` Julien Thierry
@ 2019-02-13  8:21         ` Ingo Molnar
  2019-02-13 10:35         ` Peter Zijlstra
  1 sibling, 0 replies; 58+ messages in thread
From: Ingo Molnar @ 2019-02-13  8:21 UTC (permalink / raw)
  To: Julien Thierry
  Cc: Peter Zijlstra, catalin.marinas, will.deacon, linux-kernel,
	mingo, james.morse, hpa, valentin.schneider, linux-arm-kernel


* Julien Thierry <julien.thierry@arm.com> wrote:

> I guess I'll drop the might_resched() part of this patch if that sounds
> alright.

That sounds perfect - that bit was pretty much the only problem I had 
with the series.

Thanks,

	Ingo


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-12  9:15       ` Julien Thierry
  2019-02-13  8:21         ` Ingo Molnar
@ 2019-02-13 10:35         ` Peter Zijlstra
  2019-02-13 10:50           ` Julien Thierry
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 10:35 UTC (permalink / raw)
  To: Julien Thierry
  Cc: catalin.marinas, will.deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel

On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote:

> >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>> index a674c7db..b1bb7e9 100644
> >>> --- a/kernel/sched/core.c
> >>> +++ b/kernel/sched/core.c
> >>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
> >>>  		__schedule_bug(prev);
> >>>  		preempt_count_set(PREEMPT_DISABLED);
> >>>  	}
> >>> +
> >>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
> >>> +	    unlikely(unsafe_user_region_active())) {
> >>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> >>> +		       prev->comm, prev->pid, preempt_count());
> >>> +		dump_stack();
> >>> +	}
> >>> +
> >>>  	rcu_sleep_check();
> >>>
> >>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));

> I guess I'll drop the might_resched() part of this patch if that sounds
> alright.

I'm still confused by the schedule_debug() part. How is that not broken?


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 10:35         ` Peter Zijlstra
@ 2019-02-13 10:50           ` Julien Thierry
  2019-02-13 13:17             ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Julien Thierry @ 2019-02-13 10:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: catalin.marinas, will.deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel



On 13/02/2019 10:35, Peter Zijlstra wrote:
> On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote:
> 
>>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>>> index a674c7db..b1bb7e9 100644
>>>>> --- a/kernel/sched/core.c
>>>>> +++ b/kernel/sched/core.c
>>>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
>>>>>  		__schedule_bug(prev);
>>>>>  		preempt_count_set(PREEMPT_DISABLED);
>>>>>  	}
>>>>> +
>>>>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
>>>>> +	    unlikely(unsafe_user_region_active())) {
>>>>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
>>>>> +		       prev->comm, prev->pid, preempt_count());
>>>>> +		dump_stack();
>>>>> +	}
>>>>> +
>>>>>  	rcu_sleep_check();
>>>>>
>>>>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> 
>> I guess I'll drop the might_resched() part of this patch if that sounds
>> alright.
> 
> I'm still confused by the schedule_debug() part. How is that not broken?

Hmmm, I am not exactly sure which part you expect to be broken, I guess
it's because of the nature of the uaccess unsafe accessor usage.

Basically, the following is a definite no:
	if (user_access_begin(ptr, size)) {

		[...]

		//something that calls schedule

		[...]

		user_access_end();
	}
	

However the following is fine:

- user_access_begin(ptr, size)
- taking irq/exception
- get preempted
- get resumed at some point in time
- restore state + eret
- user_access_end()

That's because exceptions/irqs implicitly "suspend" the user access
region (that's what I'm trying to clarify with the comment).
So, unsafe_user_region_active() should return false in an irq/exception
context.

Is this what you were concerned about? Or is there still something that
might be broken?

Thanks,

-- 
Julien Thierry


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 10:50           ` Julien Thierry
@ 2019-02-13 13:17             ` Peter Zijlstra
  2019-02-13 13:20               ` Peter Zijlstra
  2019-02-13 14:00               ` Will Deacon
  0 siblings, 2 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 13:17 UTC (permalink / raw)
  To: Julien Thierry
  Cc: catalin.marinas, will.deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 10:50:21AM +0000, Julien Thierry wrote:
> On 13/02/2019 10:35, Peter Zijlstra wrote:
> > On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote:
> > 
> >>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>>>> index a674c7db..b1bb7e9 100644
> >>>>> --- a/kernel/sched/core.c
> >>>>> +++ b/kernel/sched/core.c
> >>>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
> >>>>>  		__schedule_bug(prev);
> >>>>>  		preempt_count_set(PREEMPT_DISABLED);
> >>>>>  	}
> >>>>> +
> >>>>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
> >>>>> +	    unlikely(unsafe_user_region_active())) {
> >>>>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> >>>>> +		       prev->comm, prev->pid, preempt_count());
> >>>>> +		dump_stack();
> >>>>> +	}
> >>>>> +
> >>>>>  	rcu_sleep_check();
> >>>>>
> >>>>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> > 
> >> I guess I'll drop the might_resched() part of this patch if that sounds
> >> alright.
> > 
> > I'm still confused by the schedule_debug() part. How is that not broken?
> 
> Hmmm, I am not exactly sure which part you expect to be broken, I guess
> it's because of the nature of the uaccess unsafe accessor usage.
> 
> Basically, the following is a definite no:
> 	if (user_access_begin(ptr, size)) {
> 
> 		[...]
> 
> 		//something that calls schedule
> 
> 		[...]
> 
> 		user_access_end();
> 	}
> 	
> 
> However the following is fine:
> 
> - user_access_begin(ptr, size)
> - taking irq/exception
> - get preempted

This; how is getting preempted fundamentally different from scheduling
ourselves?

> - get resumed at some point in time
> - restore state + eret
> - user_access_end()
> 
> That's because exceptions/irq implicitly "suspend" the user access
> region. (That's what I'm trying to clarify with the comment)
> So, unsafe_user_region_active() should return false in a irq/exception
> context.
> 
> Is this what you were concerned about? Or there still something that
> might be broken?

I really hate the asymmetry introduced between being preemptible and being
able to schedule. Both end up calling __schedule() and there really should
not be a difference.


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 13:17             ` Peter Zijlstra
@ 2019-02-13 13:20               ` Peter Zijlstra
  2019-02-13 14:00               ` Will Deacon
  1 sibling, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 13:20 UTC (permalink / raw)
  To: Julien Thierry
  Cc: catalin.marinas, will.deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 02:17:20PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 10:50:21AM +0000, Julien Thierry wrote:
> > On 13/02/2019 10:35, Peter Zijlstra wrote:
> > > On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote:
> > > 
> > >>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > >>>>> index a674c7db..b1bb7e9 100644
> > >>>>> --- a/kernel/sched/core.c
> > >>>>> +++ b/kernel/sched/core.c
> > >>>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
> > >>>>>  		__schedule_bug(prev);
> > >>>>>  		preempt_count_set(PREEMPT_DISABLED);
> > >>>>>  	}
> > >>>>> +
> > >>>>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
> > >>>>> +	    unlikely(unsafe_user_region_active())) {
> > >>>>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> > >>>>> +		       prev->comm, prev->pid, preempt_count());
> > >>>>> +		dump_stack();
> > >>>>> +	}
> > >>>>> +
> > >>>>>  	rcu_sleep_check();
> > >>>>>
> > >>>>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> > > 
> > >> I guess I'll drop the might_resched() part of this patch if that sounds
> > >> alright.
> > > 
> > > I'm still confused by the schedule_debug() part. How is that not broken?
> > 
> > Hmmm, I am not exactly sure which part you expect to be broken, I guess
> > it's because of the nature of the uaccess unsafe accessor usage.
> > 
> > Basically, the following is a definite no:
> > 	if (user_access_begin(ptr, size)) {
> > 
> > 		[...]
> > 
> > 		//something that calls schedule
> > 
> > 		[...]
> > 
> > 		user_access_end();
> > 	}
> > 	
> > 
> > However the following is fine:
> > 
> > - user_access_begin(ptr, size)
> > - taking irq/exception
> > - get preempted
> 
> This; how is getting preempted fundamentally different from scheduling
> ourselves?

This is also the thing that PREEMPT_VOLUNTARY hinges on; it inserts
'random' reschedule points through might_sleep() and cond_resched().

If you're preemptible, you must be able to schedule, and vice versa.

You're breaking that.

> > - get resumed at some point in time
> > - restore state + eret
> > - user_access_end()
> > 
> > That's because exceptions/irq implicitly "suspend" the user access
> > region. (That's what I'm trying to clarify with the comment)
> > So, unsafe_user_region_active() should return false in a irq/exception
> > context.
> > 
> > Is this what you were concerned about? Or there still something that
> > might be broken?
> 
> I really hate the asymetry introduced between preemptible and being able
> to schedule. Both end up calling __schedule() and there really should
> not be a difference.


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 13:17             ` Peter Zijlstra
  2019-02-13 13:20               ` Peter Zijlstra
@ 2019-02-13 14:00               ` Will Deacon
  2019-02-13 14:07                 ` Julien Thierry
                                   ` (2 more replies)
  1 sibling, 3 replies; 58+ messages in thread
From: Will Deacon @ 2019-02-13 14:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Julien Thierry, catalin.marinas, linux-kernel,
	valentin.schneider, mingo, james.morse, hpa, Ingo Molnar,
	linux-arm-kernel

Hi Peter,

On Wed, Feb 13, 2019 at 02:17:20PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 10:50:21AM +0000, Julien Thierry wrote:
> > On 13/02/2019 10:35, Peter Zijlstra wrote:
> > > On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote:
> > > 
> > >>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > >>>>> index a674c7db..b1bb7e9 100644
> > >>>>> --- a/kernel/sched/core.c
> > >>>>> +++ b/kernel/sched/core.c
> > >>>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
> > >>>>>  		__schedule_bug(prev);
> > >>>>>  		preempt_count_set(PREEMPT_DISABLED);
> > >>>>>  	}
> > >>>>> +
> > >>>>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
> > >>>>> +	    unlikely(unsafe_user_region_active())) {
> > >>>>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
> > >>>>> +		       prev->comm, prev->pid, preempt_count());
> > >>>>> +		dump_stack();
> > >>>>> +	}
> > >>>>> +
> > >>>>>  	rcu_sleep_check();
> > >>>>>
> > >>>>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> > > 
> > >> I guess I'll drop the might_resched() part of this patch if that sounds
> > >> alright.
> > > 
> > > I'm still confused by the schedule_debug() part. How is that not broken?
> > 
> > Hmmm, I am not exactly sure which part you expect to be broken, I guess
> > it's because of the nature of the uaccess unsafe accessor usage.
> > 
> > Basically, the following is a definite no:
> > 	if (user_access_begin(ptr, size)) {
> > 
> > 		[...]
> > 
> > 		//something that calls schedule
> > 
> > 		[...]
> > 
> > 		user_access_end();
> > 	}
> > 	
> > 
> > However the following is fine:
> > 
> > - user_access_begin(ptr, size)
> > - taking irq/exception
> > - get preempted
> 
> This; how is getting preempted fundamentally different from scheduling
> ourselves?

The difference is that getting preempted in the sequence above is
triggered off the back of an interrupt. On arm64, and I think also on x86,
the user access state (SMAP or PAN) is saved and restored across exceptions
but not across a context switch. Consequently, taking an irq in a
user_access_{begin,end} section and then scheduling is fine, but calling
schedule() directly within such a section is not.

Julien -- please yell if I've missed some crucial detail, but I think that's
the gist of what we're trying to describe here.

Will


* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:00               ` Will Deacon
@ 2019-02-13 14:07                 ` Julien Thierry
  2019-02-13 14:17                 ` Peter Zijlstra
  2019-02-13 14:25                 ` Peter Zijlstra
  2 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-02-13 14:07 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra
  Cc: catalin.marinas, linux-kernel, valentin.schneider, mingo,
	james.morse, hpa, Ingo Molnar, linux-arm-kernel



On 13/02/2019 14:00, Will Deacon wrote:
> Hi Peter,
> 
> On Wed, Feb 13, 2019 at 02:17:20PM +0100, Peter Zijlstra wrote:
>> On Wed, Feb 13, 2019 at 10:50:21AM +0000, Julien Thierry wrote:
>>> On 13/02/2019 10:35, Peter Zijlstra wrote:
>>>> On Tue, Feb 12, 2019 at 09:15:13AM +0000, Julien Thierry wrote:
>>>>
>>>>>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>>>>>> index a674c7db..b1bb7e9 100644
>>>>>>>> --- a/kernel/sched/core.c
>>>>>>>> +++ b/kernel/sched/core.c
>>>>>>>> @@ -3289,6 +3289,14 @@ static inline void schedule_debug(struct task_struct *prev)
>>>>>>>>  		__schedule_bug(prev);
>>>>>>>>  		preempt_count_set(PREEMPT_DISABLED);
>>>>>>>>  	}
>>>>>>>> +
>>>>>>>> +	if (IS_ENABLED(CONFIG_DEBUG_UACCESS_SLEEP) &&
>>>>>>>> +	    unlikely(unsafe_user_region_active())) {
>>>>>>>> +		printk(KERN_ERR "BUG: scheduling while user_access enabled: %s/%d/0x%08x\n",
>>>>>>>> +		       prev->comm, prev->pid, preempt_count());
>>>>>>>> +		dump_stack();
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>>  	rcu_sleep_check();
>>>>>>>>
>>>>>>>>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
>>>>
>>>>> I guess I'll drop the might_resched() part of this patch if that sounds
>>>>> alright.
>>>>
>>>> I'm still confused by the schedule_debug() part. How is that not broken?
>>>
>>> Hmmm, I am not exactly sure which part you expect to be broken, I guess
>>> it's because of the nature of the uaccess unsafe accessor usage.
>>>
>>> Basically, the following is a definite no:
>>> 	if (user_access_begin(ptr, size)) {
>>>
>>> 		[...]
>>>
>>> 		//something that calls schedule
>>>
>>> 		[...]
>>>
>>> 		user_access_end();
>>> 	}
>>> 	
>>>
>>> However the following is fine:
>>>
>>> - user_access_begin(ptr, size)
>>> - taking irq/exception
>>> - get preempted
>>
>> This; how is getting preempted fundamentally different from scheduling
>> ourselves?
> 
> The difference is because getting preempted in the sequence above is
> triggered off the back of an interrupt. On arm64, and I think also on x86,
> the user access state (SMAP or PAN) is saved and restored across exceptions
> but not across context switch. Consequently, taking an irq in a
> user_access_{begin,end} section and then scheduling is fine, but calling
> schedule directly within such a section is not.
> 
> Julien -- please yell if I've missed some crucial detail, but I think that's
> the gist of what we're trying to describe here.
> 

Yes, this summarizes things correctly. Thanks!

I should also stress that this limitation already exists for x86
(and is in the arm64 patches picked up by Catalin for 5.1), as was
discussed here:

https://lkml.org/lkml/2018/11/23/430

So this patch is not introducing new semantics, it is only making
existing ones explicit.

If the current state is not good, we need to re-discuss the semantics of
user_access regions.

Thanks,

-- 
Julien Thierry


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:00               ` Will Deacon
  2019-02-13 14:07                 ` Julien Thierry
@ 2019-02-13 14:17                 ` Peter Zijlstra
  2019-02-13 14:24                   ` Julien Thierry
  2019-02-13 14:25                 ` Peter Zijlstra
  2 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 14:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: Julien Thierry, catalin.marinas, linux-kernel,
	valentin.schneider, mingo, james.morse, hpa, Ingo Molnar,
	linux-arm-kernel

On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
> > This; how is getting preempted fundamentally different from scheduling
> > ourselves?
> 
> The difference is because getting preempted in the sequence above is
> triggered off the back of an interrupt. On arm64, and I think also on x86,
> the user access state (SMAP or PAN) is saved and restored across exceptions
> but not across context switch. Consequently, taking an irq in a
> user_access_{begin,end} section and then scheduling is fine, but calling
> schedule directly within such a section is not.

So how's this then:

	if (user_access_begin()) {

		preempt_disable();

		<IRQ>
			set_need_resched();
		</IRQ no preempt>

		preempt_enable()
		  __schedule();

		user_access_end();
	}

That _should_ work just fine but explodes with the proposed nonsense.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:17                 ` Peter Zijlstra
@ 2019-02-13 14:24                   ` Julien Thierry
  2019-02-13 14:40                     ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Julien Thierry @ 2019-02-13 14:24 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon
  Cc: catalin.marinas, linux-kernel, valentin.schneider, mingo,
	james.morse, hpa, Ingo Molnar, linux-arm-kernel



On 13/02/2019 14:17, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
>>> This; how is getting preempted fundamentally different from scheduling
>>> ourselves?
>>
>> The difference is because getting preempted in the sequence above is
>> triggered off the back of an interrupt. On arm64, and I think also on x86,
>> the user access state (SMAP or PAN) is saved and restored across exceptions
>> but not across context switch. Consequently, taking an irq in a
>> user_access_{begin,end} section and then scheduling is fine, but calling
>> schedule directly within such a section is not.
> 
> So how's this then:
> 
> 	if (user_access_begin()) {
> 
> 		preempt_disable();
> 
> 		<IRQ>
> 			set_need_resched();
> 		</IRQ no preempt>
> 
> 		preempt_enable()
> 		  __schedule();
> 
> 		user_access_end();
> 	}
> 
> That _should_ work just fine but explodes with the proposed nonsense.

AFAICT, This does not work properly because when you schedule out this
task, you won't be saving the EFLAGS.AF/PSTATE.PAN bit on the stack, and
next time you schedule the task back in, it might no longer have the
correct flag value (so an unsafe_get/put_user() will fail even though
you haven't reached user_access_end()).

One solution is to deal with this in task switching code, but so far
I've been told that calling schedule() in such a context is not expected
to be supported.

Cheers,

-- 
Julien Thierry


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:00               ` Will Deacon
  2019-02-13 14:07                 ` Julien Thierry
  2019-02-13 14:17                 ` Peter Zijlstra
@ 2019-02-13 14:25                 ` Peter Zijlstra
  2019-02-13 14:39                   ` Julien Thierry
  2 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 14:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Julien Thierry, catalin.marinas, linux-kernel,
	valentin.schneider, mingo, james.morse, hpa, Ingo Molnar,
	linux-arm-kernel

On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
> The difference is because getting preempted in the sequence above is
> triggered off the back of an interrupt. On arm64, and I think also on x86,
> the user access state (SMAP or PAN) is saved and restored across exceptions
> but not across context switch.

A quick reading of the SDM seems to suggest the SMAP state is part of
EFLAGS, which is context switched just fine AFAIK.

SMAP {ab,re}uses the EFLAGS.AC bit.

> Consequently, taking an irq in a
> user_access_{begin,end} section and then scheduling is fine, but calling
> schedule directly within such a section is not.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:25                 ` Peter Zijlstra
@ 2019-02-13 14:39                   ` Julien Thierry
  2019-02-13 14:41                     ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Julien Thierry @ 2019-02-13 14:39 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon
  Cc: catalin.marinas, linux-kernel, valentin.schneider, mingo,
	james.morse, hpa, Ingo Molnar, linux-arm-kernel

Hi Peter,

On 13/02/2019 14:25, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
>> The difference is because getting preempted in the sequence above is
>> triggered off the back of an interrupt. On arm64, and I think also on x86,
>> the user access state (SMAP or PAN) is saved and restored across exceptions
>> but not across context switch.
> 
> A quick reading of the SDM seems to suggest the SMAP state is part of
> EFLAGS, which is context switched just fine AFAIK.
> 
I fail to see where this is happening when looking at the switch_to()
logic in x86_64.

And Peter A. didn't seem to suggest that this transfer of the eflags was
happening without them being saved on the stack through exception handling.

What am I missing?

Thanks,

-- 
Julien Thierry


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:24                   ` Julien Thierry
@ 2019-02-13 14:40                     ` Peter Zijlstra
  2019-02-13 15:08                       ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 14:40 UTC (permalink / raw)
  To: Julien Thierry
  Cc: catalin.marinas, Will Deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 02:24:34PM +0000, Julien Thierry wrote:
> On 13/02/2019 14:17, Peter Zijlstra wrote:
> > On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
> >>> This; how is getting preempted fundamentally different from scheduling
> >>> ourselves?
> >>
> >> The difference is because getting preempted in the sequence above is
> >> triggered off the back of an interrupt. On arm64, and I think also on x86,
> >> the user access state (SMAP or PAN) is saved and restored across exceptions
> >> but not across context switch. Consequently, taking an irq in a
> >> user_access_{begin,end} section and then scheduling is fine, but calling
> >> schedule directly within such a section is not.
> > 
> > So how's this then:
> > 
> > 	if (user_access_begin()) {
> > 
> > 		preempt_disable();
> > 
> > 		<IRQ>
> > 			set_need_resched();
> > 		</IRQ no preempt>
> > 
> > 		preempt_enable()
> > 		  __schedule();
> > 
> > 		user_access_end();
> > 	}
> > 
> > That _should_ work just fine but explodes with the proposed nonsense.
> 
> AFAICT, This does not work properly because when you schedule out this
> task, you won't be saving the EFLAGS.AF/PSTATE.PAN bit on the stack, and

EFLAGS.AC, but yes.

> next time you schedule the task back in, it might no longer have the
> correct flag value (so an unsafe_get/put_user() will fail even though
> you haven't reached user_access_end()).

/me looks at __switch_to_asm() and there is indeed a distinct lack of
pushing and popping EFLAGS :/

> One solution is to deal with this in task switching code, but so far
> I've been told that calling schedule() in such a context is not expected
> to be supported.

Well, per the above it breaks the preemption model. And I hates that.

And the added WARN doesn't even begin to cover it, since you'd have to
actually hit the preempt_enable() reschedule for it to trigger.

So far, all 6 in-tree users are indeed free of dodgy code, but *groan*.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:39                   ` Julien Thierry
@ 2019-02-13 14:41                     ` Peter Zijlstra
  2019-02-13 15:45                       ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 14:41 UTC (permalink / raw)
  To: Julien Thierry
  Cc: catalin.marinas, Will Deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 02:39:22PM +0000, Julien Thierry wrote:
> Hi Peter,
> 
> On 13/02/2019 14:25, Peter Zijlstra wrote:
> > On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
> >> The difference is because getting preempted in the sequence above is
> >> triggered off the back of an interrupt. On arm64, and I think also on x86,
> >> the user access state (SMAP or PAN) is saved and restored across exceptions
> >> but not across context switch.
> > 
> > A quick reading of the SDM seems to suggest the SMAP state is part of
> > EFLAGS, which is context switched just fine AFAIK.
> > 
> I fail to see where this is happening when looking at the switch_to()
> logic in x86_64.

Yeah, me too.. we obviously preserve EFLAGS for user context, but for
kernel-kernel switches we do not seem to preserve it :-(



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:40                     ` Peter Zijlstra
@ 2019-02-13 15:08                       ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 15:08 UTC (permalink / raw)
  To: Julien Thierry
  Cc: catalin.marinas, Will Deacon, linux-kernel, valentin.schneider,
	mingo, james.morse, hpa, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 03:40:00PM +0100, Peter Zijlstra wrote:

> So far, all 6 in-tree users are indeed free of dodgy code, but *groan*.

because of this, there must also not be tracepoints (even implicit ones
like function-trace) between user_access_{begin,end}().

And while that is unlikely with the current code; it is not guaranteed
afaict.

What a ff'ing mess.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 14:41                     ` Peter Zijlstra
@ 2019-02-13 15:45                       ` Peter Zijlstra
  2019-02-13 18:54                         ` Peter Zijlstra
                                           ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 15:45 UTC (permalink / raw)
  To: Julien Thierry
  Cc: dvlasenk, brgerst, catalin.marinas, jpoimboe, Will Deacon,
	linux-kernel, valentin.schneider, mingo, james.morse, luto, hpa,
	bp, tglx, torvalds, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 03:41:45PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 02:39:22PM +0000, Julien Thierry wrote:
> > Hi Peter,
> > 
> > On 13/02/2019 14:25, Peter Zijlstra wrote:
> > > On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
> > >> The difference is because getting preempted in the sequence above is
> > >> triggered off the back of an interrupt. On arm64, and I think also on x86,
> > >> the user access state (SMAP or PAN) is saved and restored across exceptions
> > >> but not across context switch.
> > > 
> > > A quick reading of the SDM seems to suggest the SMAP state is part of
> > > EFLAGS, which is context switched just fine AFAIK.
> > > 
> > I fail to see where this is happening when looking at the switch_to()
> > logic in x86_64.
> 
> Yeah, me too.. we obviously preserve EFLAGS for user context, but for
> kernel-kernel switches we do not seem to preserve it :-(

So I dug around the context switch code a little, and I think we lost it
here:

  0100301bfdf5 ("sched/x86: Rewrite the switch_to() code")

Before that, x86_64 switch_to() read like (much simplified):

	asm volatile ( /* do RSP twiddle */
	  : /* output */
	  : /* input */
	  : "memory", "cc", .... "flags");

(see __EXTRA_CLOBBER)

Which I suppose means that GCC generates the PUSHF/POPF to preserve the
EFLAGS, since we mark those explicitly clobbered.

Before that:

  f05e798ad4c0 ("Disintegrate asm/system.h for X86")

We had explicit PUSHF / POPF in SAVE_CONTEXT / RESTORE_CONTEXT resp.

Now I cannot see how the current code preserves EFLAGS (if indeed it
does), and the changelog doesn't mention this change _AT_ALL_.


For a little bit of context; it turns out that user_access_begin() /
user_access_end() sets EFLAGS.AC and scheduling in between there wrecks
that because we're apparently not saving that anymore.

Now, I'm tempted to add the PUSHF / POPF right back because of this, but
first I suppose we need to figure out if that change was on purpose and
why that went missing from the Changelog.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 15:45                       ` Peter Zijlstra
@ 2019-02-13 18:54                         ` Peter Zijlstra
       [not found]                         ` <D61C430D-4321-4114-AB85-671A3C7B8EAE@amacapital.net>
  2019-02-13 23:19                         ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Linus Torvalds
  2 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 18:54 UTC (permalink / raw)
  To: Julien Thierry
  Cc: dvlasenk, brgerst, catalin.marinas, jpoimboe, Will Deacon,
	linux-kernel, valentin.schneider, mingo, james.morse, luto, hpa,
	bp, tglx, torvalds, Ingo Molnar, linux-arm-kernel

On Wed, Feb 13, 2019 at 04:45:32PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 03:41:45PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 13, 2019 at 02:39:22PM +0000, Julien Thierry wrote:
> > > Hi Peter,
> > > 
> > > On 13/02/2019 14:25, Peter Zijlstra wrote:
> > > > On Wed, Feb 13, 2019 at 02:00:26PM +0000, Will Deacon wrote:
> > > >> The difference is because getting preempted in the sequence above is
> > > >> triggered off the back of an interrupt. On arm64, and I think also on x86,
> > > >> the user access state (SMAP or PAN) is saved and restored across exceptions
> > > >> but not across context switch.
> > > > 
> > > > A quick reading of the SDM seems to suggest the SMAP state is part of
> > > > EFLAGS, which is context switched just fine AFAIK.
> > > > 
> > > I fail to see where this is happening when looking at the switch_to()
> > > logic in x86_64.
> > 
> > Yeah, me too.. we obviously preserve EFLAGS for user context, but for
> > kernel-kernel switches we do not seem to preserve it :-(
> 
> So I dug around the context switch code a little, and I think we lost it
> here:
> 
>   0100301bfdf5 ("sched/x86: Rewrite the switch_to() code")
> 
> Before that, x86_64 switch_to() read like (much simplified):
> 
> 	asm volatile ( /* do RSP twiddle */
> 	  : /* output */
> 	  : /* input */
> 	  : "memory", "cc", .... "flags");
> 
> (see __EXTRA_CLOBBER)
> 
> Which I suppose means that GCC generates the PUSHF/POPF to preserve the
> EFLAGS, since we mark those explicitly clobbered.
> 
> Before that:
> 
>   f05e798ad4c0 ("Disintegrate asm/system.h for X86")
> 
> We had explicit PUSHF / POPF in SAVE_CONTEXT / RESTORE_CONTEXT resp.
> 
> Now I cannot see how the current code preserves EFLAGS (if indeed it
> does), and the changelog doesn't mention this change _AT_ALL_.
> 
> 
> For a little bit of context; it turns out that user_access_begin() /
> user_access_end() sets EFLAGS.AC and scheduling in between there wrecks
> that because we're apparently not saving that anymore.
> 
> Now, I'm tempted to add the PUSHF / POPF right back because of this, but
> first I suppose we need to figure out if that change was on purpose and
> why that went missing from the Changelog.

Just for giggles; the below patch seems to boot.

---
 arch/x86/entry/entry_32.S        | 2 ++
 arch/x86/entry/entry_64.S        | 2 ++
 arch/x86/include/asm/switch_to.h | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d309f30cf7af..5fc76b755510 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -650,6 +650,7 @@ ENTRY(__switch_to_asm)
 	pushl	%ebx
 	pushl	%edi
 	pushl	%esi
+	pushfl
 
 	/* switch stack */
 	movl	%esp, TASK_threadsp(%eax)
@@ -672,6 +673,7 @@ ENTRY(__switch_to_asm)
 #endif
 
 	/* restore callee-saved registers */
+	popfl
 	popl	%esi
 	popl	%edi
 	popl	%ebx
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..4fe27b67d7e2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -291,6 +291,7 @@ ENTRY(__switch_to_asm)
 	pushq	%r13
 	pushq	%r14
 	pushq	%r15
+	pushfq
 
 	/* switch stack */
 	movq	%rsp, TASK_threadsp(%rdi)
@@ -313,6 +314,7 @@ ENTRY(__switch_to_asm)
 #endif
 
 	/* restore callee-saved registers */
+	popfq
 	popq	%r15
 	popq	%r14
 	popq	%r13
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 7cf1a270d891..157149d4129c 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
  * order of the fields must match the code in __switch_to_asm().
  */
 struct inactive_task_frame {
+	unsigned long flags;
 #ifdef CONFIG_X86_64
 	unsigned long r15;
 	unsigned long r14;


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
       [not found]                         ` <D61C430D-4321-4114-AB85-671A3C7B8EAE@amacapital.net>
@ 2019-02-13 22:21                           ` Peter Zijlstra
  2019-02-13 22:49                             ` Andy Lutomirski
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-13 22:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dvlasenk, brgerst, Julien Thierry, catalin.marinas, jpoimboe,
	Will Deacon, linux-kernel, valentin.schneider, mingo,
	james.morse, luto, hpa, bp, tglx, torvalds, Ingo Molnar,
	linux-arm-kernel

On Wed, Feb 13, 2019 at 10:51:24AM -0800, Andy Lutomirski wrote:
> > On Feb 13, 2019, at 7:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> > Which I suppose means that GCC generates the PUSHF/POPF to preserve the
> > EFLAGS, since we mark those explicitly clobbered.
> > 
> 
> Not quite.  A flags clobber doesn’t save the control bits like AC
> except on certain rather buggy llvm compilers. The change you’re
> looking for is:
> 
> http://git.kernel.org/tip/2c7577a7583747c9b71f26dced7f696b739da745

Indeed, failed to find that.

> > For a little bit of context; it turns out that user_access_begin() /
> > user_access_end() sets EFLAGS.AC and scheduling in between there wrecks
> > that because we're apparently not saving that anymore.
> 
> But only explicit scheduling — preemption and sleepy page faults are
> fine because the interrupt frame saves flags.

No, like pointed out elsewhere in this thread, anything that does
preempt_disable() is utterly broken with this.

Because at that point the IRQ return path doesn't reschedule but
preempt_enable() will, and that doesn't preserve EFLAGS again.

> > Now, I'm tempted to add the PUSHF / POPF right back because of this, but
> > first I suppose we need to figure out if that change was on purpose and
> > why that went missing from the Changelog.
> 
> That’s certainly the easy solution. Or we could teach the might_sleep
> checks about this, but that could be a mess.

That's not enough, we'd have to teach preempt_disable(), but worse,
preempt_disable_notrace().

Anything that lands in ftrace, which _will_ use
preempt_disable_notrace(), will screw this thing up.
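
Spelled out, the chain being described looks like this (a rough sketch,
not actual code; it assumes CONFIG_PREEMPT and the function tracer
hooked into the callee):

	user_access_begin(...)			/* STAC: EFLAGS.AC = 1 */
	  some_out_of_line_helper()
	    __fentry__ -> ftrace callback
	      preempt_disable_notrace()
	      ...				/* IRQ sets TIF_NEED_RESCHED */
	      preempt_enable_notrace()
	        preempt_schedule_notrace()
	          __schedule()			/* switches task without saving
						   EFLAGS, so AC leaks/is lost */
	user_access_end()			/* CLAC */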


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 22:21                           ` Peter Zijlstra
@ 2019-02-13 22:49                             ` Andy Lutomirski
  2019-02-14 10:14                               ` [PATCH] sched/x86: Save [ER]FLAGS on context switch Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Andy Lutomirski @ 2019-02-13 22:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dvlasenk, brgerst, Julien Thierry, catalin.marinas, jpoimboe,
	Will Deacon, linux-kernel, valentin.schneider, mingo,
	james.morse, luto, hpa, bp, tglx, torvalds, Ingo Molnar,
	linux-arm-kernel



> On Feb 13, 2019, at 2:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Feb 13, 2019 at 10:51:24AM -0800, Andy Lutomirski wrote:
>>> On Feb 13, 2019, at 7:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>> Which I suppose means that GCC generates the PUSHF/POPF to preserve the
>>> EFLAGS, since we mark those explicitly clobbered.
>>> 
>> 
>> Not quite.  A flags clobber doesn’t save the control bits like AC
>> except on certain rather buggy llvm compilers. The change you’re
>> looking for is:
>> 
>> http://git.kernel.org/tip/2c7577a7583747c9b71f26dced7f696b739da745
> 
> Indeed, failed to find that.
> 
>>> For a little bit of context; it turns out that user_access_begin() /
>>> user_access_end() sets EFLAGS.AC and scheduling in between there wrecks
>>> that because we're apparently not saving that anymore.
>> 
>> But only explicit scheduling — preemption and sleepy page faults are
>> fine because the interrupt frame saves flags.
> 
> No, like pointed out elsewhere in this thread, anything that does
> preempt_disable() is utterly broken with this.
> 
> Because at that point the IRQ return path doesn't reschedule but
> preempt_enable() will, and that doesn't preserve EFLAGS again.
> 
>>> Now, I'm tempted to add the PUSHF / POPF right back because of this, but
>>> first I suppose we need to figure out if that change was on purpose and
>>> why that went missing from the Changelog.
>> 
>> That’s certainly the easy solution. Or we could teach the might_sleep
>> checks about this, but that could be a mess.
> 
> That's not enough, we'd have to teach preempt_disable(), but worse,
> preempt_disable_notrace().
> 
> Anything that lands in ftrace, which _will_ use
> preempt_disable_notrace(), will screw this thing up.

Ugh.  Consider your patch acked.  Do we need to backport this thing?  The problem can’t be too widespread or we would have heard of it before.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region
  2019-02-13 15:45                       ` Peter Zijlstra
  2019-02-13 18:54                         ` Peter Zijlstra
       [not found]                         ` <D61C430D-4321-4114-AB85-671A3C7B8EAE@amacapital.net>
@ 2019-02-13 23:19                         ` Linus Torvalds
  2 siblings, 0 replies; 58+ messages in thread
From: Linus Torvalds @ 2019-02-13 23:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dvlasenk, brgerst, Julien Thierry, Catalin Marinas,
	Josh Poimboeuf, Will Deacon, Linux List Kernel Mailing,
	valentin.schneider, Ingo Molnar, James Morse, Andrew Lutomirski,
	Peter Anvin, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
	linux-alpha

On Wed, Feb 13, 2019 at 7:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Before that, x86_64 switch_to() read like (much simplified):
>
>         asm volatile ( /* do RSP twiddle */
>           : /* output */
>           : /* input */
>           : "memory", "cc", .... "flags");
>
> (see __EXTRA_CLOBBER)
>
> Which I suppose means that GCC generates the PUSHF/POPF to preserve the
> EFLAGS, since we mark those explicitly clobbered.

No, it only means that gcc won't keep conditionals in the flags over
the asm. It doesn't make gcc save anything.

The push/pop got removed elsewhere as Andy says.

That said, I do agree that it's probably a good idea to save/restore
flags anyway when scheduling. It's not just AC, actually, now that I
look at it again I worry a bit about DF too.

We have the rule that we run with DF clear in the kernel, and all the
kernel entry points do clear it properly (so that memcpy etc don't
need to). But there are a few places that set DF temporarily because
they do something odd (backwards memmove), and those actually have the
*exact* same issue as stac/clac has: it's ok to take a trap or
interrupt, and schedule due to that (because the trap/irq will clear
DF), but it would be a horrible bug to have a synchronous scheduling
point there.
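
For concreteness, the kind of DF window being described looks roughly
like this (an illustrative inline-asm sketch, not any real kernel
routine; the function name is made up):

	static void copy_bytes_backwards(void *dst, const void *src,
					 unsigned long len)
	{
		void *d = dst + len - 1;
		const void *s = src + len - 1;

		asm volatile("std\n\t"		/* DF = 1: movsb walks downwards */
			     "rep movsb\n\t"
			     "cld"		/* DF = 0 before any C code runs */
			     : "+D" (d), "+S" (s), "+c" (len)
			     : : "memory", "cc");
	}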

Arguably the DF issue really isn't even remotely likely to actually be
a real issue (the code that sets DF really is very special and should
never do any kind of preemption), but it's conceptually quite
similar..

Of course, if DF is ever set, and we end up calling any C code at all,
I guess it would already be a huge problem. If the C code then does
memcpy or something, it would corrupt things quite badly.

So I guess save/restore isn't going to save us wrt DF. If we get
anywhere close to the scheduler with the DF bit set, we've already
lost.

But I still do kind of prefer saving flags. We have other sticky state
in there too, although none of it matters in the kernel currently (eg
iopl etc - only matters in user space, and user space will always
reload eflags on return).

                 Linus


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-13 22:49                             ` Andy Lutomirski
@ 2019-02-14 10:14                               ` Peter Zijlstra
  2019-02-14 16:18                                 ` Brian Gerst
                                                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-14 10:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dvlasenk, brgerst, Julien Thierry, catalin.marinas, jpoimboe,
	Will Deacon, linux-kernel, valentin.schneider, mingo,
	james.morse, luto, hpa, bp, tglx, torvalds, Ingo Molnar,
	linux-arm-kernel

On Wed, Feb 13, 2019 at 02:49:47PM -0800, Andy Lutomirski wrote:

> Do we need to backport this thing?

Possibly, just to be safe.

> The problem can’t be too widespread or we would have heard of it before.

Yes, so far we've been lucky.

---
Subject: sched/x86: Save [ER]FLAGS on context switch
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Feb 14 10:30:52 CET 2019

Effectively revert commit:

  2c7577a75837 ("sched/x86_64: Don't save flags on context switch")

Specifically because SMAP uses FLAGS.AC which invalidates the claim
that the kernel has clean flags.

In particular; while preemption from interrupt return is fine (the
IRET frame on the exception stack contains FLAGS) it breaks any code
that does synchronous scheduling, including preempt_enable().

This has become a significant issue ever since commit:

  5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")

provided for means of having 'normal' C code between STAC / CLAC,
exposing the FLAGS.AC state. So far this hasn't led to trouble,
however fix it before it comes apart.

Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_32.S        |    2 ++
 arch/x86/entry/entry_64.S        |    2 ++
 arch/x86/include/asm/switch_to.h |    1 +
 3 files changed, 5 insertions(+)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -650,6 +650,7 @@ ENTRY(__switch_to_asm)
 	pushl	%ebx
 	pushl	%edi
 	pushl	%esi
+	pushfl
 
 	/* switch stack */
 	movl	%esp, TASK_threadsp(%eax)
@@ -672,6 +673,7 @@ ENTRY(__switch_to_asm)
 #endif
 
 	/* restore callee-saved registers */
+	popfl
 	popl	%esi
 	popl	%edi
 	popl	%ebx
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -291,6 +291,7 @@ ENTRY(__switch_to_asm)
 	pushq	%r13
 	pushq	%r14
 	pushq	%r15
+	pushfq
 
 	/* switch stack */
 	movq	%rsp, TASK_threadsp(%rdi)
@@ -313,6 +314,7 @@ ENTRY(__switch_to_asm)
 #endif
 
 	/* restore callee-saved registers */
+	popfq
 	popq	%r15
 	popq	%r14
 	popq	%r13
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
  * order of the fields must match the code in __switch_to_asm().
  */
 struct inactive_task_frame {
+	unsigned long flags;
 #ifdef CONFIG_X86_64
 	unsigned long r15;
 	unsigned long r14;


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-14 10:14                               ` [PATCH] sched/x86: Save [ER]FLAGS on context switch Peter Zijlstra
@ 2019-02-14 16:18                                 ` Brian Gerst
  2019-02-14 19:34                                   ` Peter Zijlstra
  2019-02-16  4:06                                 ` hpa
  2019-02-18  9:03                                 ` [PATCH v2] " Peter Zijlstra
  2 siblings, 1 reply; 58+ messages in thread
From: Brian Gerst @ 2019-02-14 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Catalin Marinas,
	valentin.schneider, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, James Morse, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Thomas Gleixner, Linus Torvalds,
	Ingo Molnar, linux-arm-kernel

On Thu, Feb 14, 2019 at 5:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Feb 13, 2019 at 02:49:47PM -0800, Andy Lutomirski wrote:
>
> > Do we need to backport this thing?
>
> Possibly, just to be safe.
>
> > The problem can’t be too widespread or we would have heard of it before.
>
> Yes, so far we've been lucky.
>
> ---
> Subject: sched/x86: Save [ER]FLAGS on context switch
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu Feb 14 10:30:52 CET 2019
>
> Effectively revert commit:
>
>   2c7577a75837 ("sched/x86_64: Don't save flags on context switch")
>
> Specifically because SMAP uses FLAGS.AC which invalidates the claim
> that the kernel has clean flags.
>
> In particular; while preemption from interrupt return is fine (the
> IRET frame on the exception stack contains FLAGS) it breaks any code
> that does synchronous scheduling, including preempt_enable().
>
> This has become a significant issue ever since commit:
>
>   5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
>
> provided for means of having 'normal' C code between STAC / CLAC,
> exposing the FLAGS.AC state. So far this hasn't led to trouble,
> however fix it before it comes apart.
>
> Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
> Acked-by: Andy Lutomirski <luto@amacapital.net>
> Reported-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/entry/entry_32.S        |    2 ++
>  arch/x86/entry/entry_64.S        |    2 ++
>  arch/x86/include/asm/switch_to.h |    1 +
>  3 files changed, 5 insertions(+)
>
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -650,6 +650,7 @@ ENTRY(__switch_to_asm)
>         pushl   %ebx
>         pushl   %edi
>         pushl   %esi
> +       pushfl
>
>         /* switch stack */
>         movl    %esp, TASK_threadsp(%eax)
> @@ -672,6 +673,7 @@ ENTRY(__switch_to_asm)
>  #endif
>
>         /* restore callee-saved registers */
> +       popfl
>         popl    %esi
>         popl    %edi
>         popl    %ebx
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -291,6 +291,7 @@ ENTRY(__switch_to_asm)
>         pushq   %r13
>         pushq   %r14
>         pushq   %r15
> +       pushfq
>
>         /* switch stack */
>         movq    %rsp, TASK_threadsp(%rdi)
> @@ -313,6 +314,7 @@ ENTRY(__switch_to_asm)
>  #endif
>
>         /* restore callee-saved registers */
> +       popfq
>         popq    %r15
>         popq    %r14
>         popq    %r13
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
>   * order of the fields must match the code in __switch_to_asm().
>   */
>  struct inactive_task_frame {
> +       unsigned long flags;
>  #ifdef CONFIG_X86_64
>         unsigned long r15;
>         unsigned long r14;

flags should be initialized in copy_thread_tls().  I think the new
stack is zeroed already, but it would be better to explicitly set it.

--
Brian Gerst


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-14 16:18                                 ` Brian Gerst
@ 2019-02-14 19:34                                   ` Peter Zijlstra
  2019-02-15 14:34                                     ` Brian Gerst
  2019-02-15 17:18                                     ` Linus Torvalds
  0 siblings, 2 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-14 19:34 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Catalin Marinas,
	valentin.schneider, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, James Morse, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Thomas Gleixner, Linus Torvalds,
	Ingo Molnar, linux-arm-kernel

On Thu, Feb 14, 2019 at 11:18:01AM -0500, Brian Gerst wrote:

> > --- a/arch/x86/include/asm/switch_to.h
> > +++ b/arch/x86/include/asm/switch_to.h
> > @@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
> >   * order of the fields must match the code in __switch_to_asm().
> >   */
> >  struct inactive_task_frame {
> > +       unsigned long flags;
> >  #ifdef CONFIG_X86_64
> >         unsigned long r15;
> >         unsigned long r14;
> 
> flags should be initialized in copy_thread_tls().  I think the new
> stack is zeroed already, but it would be better to explicitly set it.

Ah indeed. I somehow misread that code and thought we got initialized
with a copy of current.

Something like the below, right?

---

--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -127,6 +127,7 @@ int copy_thread_tls(unsigned long clone_
 	struct task_struct *tsk;
 	int err;
 
+	frame->flags = 0;
 	frame->bp = 0;
 	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -392,6 +392,7 @@ int copy_thread_tls(unsigned long clone_
 	childregs = task_pt_regs(p);
 	fork_frame = container_of(childregs, struct fork_frame, regs);
 	frame = &fork_frame->frame;
+	frame->flags = 0;
 	frame->bp = 0;
 	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-14 19:34                                   ` Peter Zijlstra
@ 2019-02-15 14:34                                     ` Brian Gerst
  2019-02-15 17:18                                     ` Linus Torvalds
  1 sibling, 0 replies; 58+ messages in thread
From: Brian Gerst @ 2019-02-15 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Catalin Marinas,
	valentin.schneider, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, James Morse, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Thomas Gleixner, Linus Torvalds,
	Ingo Molnar, linux-arm-kernel

On Thu, Feb 14, 2019 at 2:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Feb 14, 2019 at 11:18:01AM -0500, Brian Gerst wrote:
>
> > > --- a/arch/x86/include/asm/switch_to.h
> > > +++ b/arch/x86/include/asm/switch_to.h
> > > @@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
> > >   * order of the fields must match the code in __switch_to_asm().
> > >   */
> > >  struct inactive_task_frame {
> > > +       unsigned long flags;
> > >  #ifdef CONFIG_X86_64
> > >         unsigned long r15;
> > >         unsigned long r14;
> >
> > flags should be initialized in copy_thread_tls().  I think the new
> > stack is zeroed already, but it would be better to explicitly set it.
>
> Ah indeed. I somehow misread that code and thought we got initialized
> with a copy of current.
>
> Something like the below, right?
>
> ---
>
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -127,6 +127,7 @@ int copy_thread_tls(unsigned long clone_
>         struct task_struct *tsk;
>         int err;
>
> +       frame->flags = 0;
>         frame->bp = 0;
>         frame->ret_addr = (unsigned long) ret_from_fork;
>         p->thread.sp = (unsigned long) fork_frame;
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -392,6 +392,7 @@ int copy_thread_tls(unsigned long clone_
>         childregs = task_pt_regs(p);
>         fork_frame = container_of(childregs, struct fork_frame, regs);
>         frame = &fork_frame->frame;
> +       frame->flags = 0;
>         frame->bp = 0;
>         frame->ret_addr = (unsigned long) ret_from_fork;
>         p->thread.sp = (unsigned long) fork_frame;

Yes, this looks good.

--
Brian Gerst


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-14 19:34                                   ` Peter Zijlstra
  2019-02-15 14:34                                     ` Brian Gerst
@ 2019-02-15 17:18                                     ` Linus Torvalds
  2019-02-15 17:40                                       ` Peter Zijlstra
  1 sibling, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2019-02-15 17:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Brian Gerst,
	H. Peter Anvin, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, valentin.schneider, Ingo Molnar, James Morse,
	Andy Lutomirski, Catalin Marinas, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Thu, Feb 14, 2019 at 11:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Something like the below, right?
>
> +       frame->flags = 0;
> +       frame->flags = 0;

Those are not valid flag values.

Can you popf them? Yes.

Do they make sense? No.

It has the IF flag clear, for example. Is that intentional? If it is,
it should likely be documented. Because IF clear means "interrupts
disabled". Are the places that load flags in irq disabled regions?
Maybe they are, maybe they aren't, but shouldn't this be something
that is made explicit, rather than "oh, let's initialize to zero like
all the other registers we don't care about".

Because a zero initializer for eflags really is odd.

             Linus


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-15 17:18                                     ` Linus Torvalds
@ 2019-02-15 17:40                                       ` Peter Zijlstra
  2019-02-15 18:28                                         ` Andy Lutomirski
  2019-02-15 23:34                                         ` Peter Zijlstra
  0 siblings, 2 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-15 17:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Brian Gerst,
	H. Peter Anvin, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, valentin.schneider, Ingo Molnar, James Morse,
	Andy Lutomirski, Catalin Marinas, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Fri, Feb 15, 2019 at 09:18:00AM -0800, Linus Torvalds wrote:
> On Thu, Feb 14, 2019 at 11:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Something like the below, right?
> >
> > +       frame->flags = 0;
> > +       frame->flags = 0;
> 
> Those are not valid flag values.
> 
> Can you popf them? Yes.
> 
> Do they make sense? No.
> 
> It has the IF flag clear, for example. Is that intentional? If it is,

Uhmm. yeah, that's bonkers. We should have interrupts disabled here.
I'll go read up on the eflags and figure out what they _should_ be right
about there.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-15 17:40                                       ` Peter Zijlstra
@ 2019-02-15 18:28                                         ` Andy Lutomirski
  2019-02-15 23:34                                         ` Peter Zijlstra
  1 sibling, 0 replies; 58+ messages in thread
From: Andy Lutomirski @ 2019-02-15 18:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Brian Gerst,
	H. Peter Anvin, Will Deacon, Linux Kernel Mailing List,
	valentin.schneider, Ingo Molnar, James Morse, Andy Lutomirski,
	Catalin Marinas, Borislav Petkov, Thomas Gleixner,
	Linus Torvalds, Ingo Molnar, linux-alpha



> On Feb 15, 2019, at 9:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Fri, Feb 15, 2019 at 09:18:00AM -0800, Linus Torvalds wrote:
>>> On Thu, Feb 14, 2019 at 11:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> Something like the below, right?
>>> 
>>> +       frame->flags = 0;
>>> +       frame->flags = 0;
>> 
>> Those are not valid flag values.
>> 
>> Can you popf them? Yes.
>> 
>> Do they make sense? No.
>> 
>> It has the IF flag clear, for example. Is that intentional? If it is,
> 
> Uhmm. yeah, that's bonkers. We should have interrupts disabled here.
> I'll go read up on the eflags and figure out what they _should_ be right
> about there.

And probably add a comment near the POPF explaining that it will keep IRQs off :)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-15 17:40                                       ` Peter Zijlstra
  2019-02-15 18:28                                         ` Andy Lutomirski
@ 2019-02-15 23:34                                         ` Peter Zijlstra
  2019-02-16  0:21                                           ` Linus Torvalds
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-15 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Brian Gerst,
	H. Peter Anvin, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, valentin.schneider, Ingo Molnar, James Morse,
	Andy Lutomirski, Catalin Marinas, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Fri, Feb 15, 2019 at 06:40:34PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 09:18:00AM -0800, Linus Torvalds wrote:
> > On Thu, Feb 14, 2019 at 11:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Something like the below, right?
> > >
> > > +       frame->flags = 0;
> > > +       frame->flags = 0;
> > 
> > Those are not valid flag values.
> > 
> > Can you popf them? Yes.
> > 
> > Do they make sense? No.
> > 
> > It has the IF flag clear, for example. Is that intentional? If it is,
> 
> Uhmm. yeah, that's bonkers. We should have interrupts disabled here.
> I'll go read up on the eflags and figure out what they _should_ be right
> about there.

I misread (I'm forever confused about which way around IF goes), but you
said it right; IF=0 is interrupts disabled and we very much have that in
the middle of context switch.

(just for giggles I set IF for the initial flags value; and it comes
unstuck _real_ quick)

Now, EFLAGS bit 1 is supposedly always 1, but it really doesn't seem to
matter for POPF.

I went through the other flags, and aside from VIP/VIF (I've no clue),
they look like 0 should be just fine.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-15 23:34                                         ` Peter Zijlstra
@ 2019-02-16  0:21                                           ` Linus Torvalds
  2019-02-16 10:32                                             ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Linus Torvalds @ 2019-02-16  0:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Brian Gerst,
	H. Peter Anvin, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, valentin.schneider, Ingo Molnar, James Morse,
	Andy Lutomirski, Catalin Marinas, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Fri, Feb 15, 2019 at 3:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Now, EFLAGS bit 1 is supposedly always 1, but it really doesn't seem to
> matter for POPF.

Correct, it's "read as 1", you can try to write it and it doesn't matter.

> I went through the other flags, and aside from VIP/VIF (I've no clue),
> they look like 0 should be just fine.

So 0 is a perfectly valid initializer in the sense that it _works_, I
just want it to be something that was thought about, not just a random
"initialize to zero" without thinking.

Even just a comment about it would be fine. But it might also be good
to show that it's an explicit eflags value and just use
X86_EFLAGS_FIXED as the initializer.
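
In other words, something like this sketch of the suggestion (not the
patch that was eventually posted; X86_EFLAGS_FIXED is the "bit 1 always
reads as 1" constant from the processor-flags header):

	/* in copy_thread_tls(), instead of frame->flags = 0: */
	frame->flags = X86_EFLAGS_FIXED;	/* IF clear: interrupts stay
						 * off across the popf in
						 * __switch_to_asm() */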

            Linus


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-14 10:14                               ` [PATCH] sched/x86: Save [ER]FLAGS on context switch Peter Zijlstra
  2019-02-14 16:18                                 ` Brian Gerst
@ 2019-02-16  4:06                                 ` hpa
  2019-02-16 10:30                                   ` Peter Zijlstra
  2019-02-18  9:03                                 ` [PATCH v2] " Peter Zijlstra
  2 siblings, 1 reply; 58+ messages in thread
From: hpa @ 2019-02-16  4:06 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: dvlasenk, Julien Thierry, catalin.marinas, jpoimboe, Will Deacon,
	linux-kernel, valentin.schneider, mingo, james.morse, luto,
	brgerst, bp, tglx, torvalds, Ingo Molnar, linux-arm-kernel

On February 14, 2019 2:14:29 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>On Wed, Feb 13, 2019 at 02:49:47PM -0800, Andy Lutomirski wrote:
>
>> Do we need to backport this thing?
>
>Possibly, just to be safe.
>
>> The problem can’t be too widespread or we would have heard of it
>before.
>
>Yes, so far we've been lucky.
>
>---
>Subject: sched/x86: Save [ER]FLAGS on context switch
>From: Peter Zijlstra <peterz@infradead.org>
>Date: Thu Feb 14 10:30:52 CET 2019
>
>Effectively revert commit:
>
>  2c7577a75837 ("sched/x86_64: Don't save flags on context switch")
>
>Specifically because SMAP uses FLAGS.AC which invalidates the claim
>that the kernel has clean flags.
>
>In particular; while preemption from interrupt return is fine (the
>IRET frame on the exception stack contains FLAGS) it breaks any code
>that does synchronous scheduling, including preempt_enable().
>
>This has become a significant issue ever since commit:
>
>5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
>accesses")
>
>provided for means of having 'normal' C code between STAC / CLAC,
>exposing the FLAGS.AC state. So far this hasn't led to trouble,
>however fix it before it comes apart.
>
>Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
>accesses")
>Acked-by: Andy Lutomirski <luto@amacapital.net>
>Reported-by: Julien Thierry <julien.thierry@arm.com>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> arch/x86/entry/entry_32.S        |    2 ++
> arch/x86/entry/entry_64.S        |    2 ++
> arch/x86/include/asm/switch_to.h |    1 +
> 3 files changed, 5 insertions(+)
>
>--- a/arch/x86/entry/entry_32.S
>+++ b/arch/x86/entry/entry_32.S
>@@ -650,6 +650,7 @@ ENTRY(__switch_to_asm)
> 	pushl	%ebx
> 	pushl	%edi
> 	pushl	%esi
>+	pushfl
> 
> 	/* switch stack */
> 	movl	%esp, TASK_threadsp(%eax)
>@@ -672,6 +673,7 @@ ENTRY(__switch_to_asm)
> #endif
> 
> 	/* restore callee-saved registers */
>+	popfl
> 	popl	%esi
> 	popl	%edi
> 	popl	%ebx
>--- a/arch/x86/entry/entry_64.S
>+++ b/arch/x86/entry/entry_64.S
>@@ -291,6 +291,7 @@ ENTRY(__switch_to_asm)
> 	pushq	%r13
> 	pushq	%r14
> 	pushq	%r15
>+	pushfq
> 
> 	/* switch stack */
> 	movq	%rsp, TASK_threadsp(%rdi)
>@@ -313,6 +314,7 @@ ENTRY(__switch_to_asm)
> #endif
> 
> 	/* restore callee-saved registers */
>+	popfq
> 	popq	%r15
> 	popq	%r14
> 	popq	%r13
>--- a/arch/x86/include/asm/switch_to.h
>+++ b/arch/x86/include/asm/switch_to.h
>@@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
>  * order of the fields must match the code in __switch_to_asm().
>  */
> struct inactive_task_frame {
>+	unsigned long flags;
> #ifdef CONFIG_X86_64
> 	unsigned long r15;
> 	unsigned long r14;

This implies we invoke schedule -- a restricted operation (consider may_sleep) during execution of STAC-enabled code, but *not* as an exception or interrupt, since those preserve the flags.

I have serious concerns about this. This is more or less saying that we have left an unlimited gap and have had AC escape.

Does *anyone* see a need to allow this? I got a question at LPC from someone about this, and what they were trying to do once all the layers had been unwound was so far down the wrong track for a root problem that actually has a very simple solution.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-16  4:06                                 ` hpa
@ 2019-02-16 10:30                                   ` Peter Zijlstra
  2019-02-18 22:30                                     ` H. Peter Anvin
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-16 10:30 UTC (permalink / raw)
  To: hpa
  Cc: dvlasenk, jpoimboe, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, brgerst, bp, tglx, torvalds,
	Ingo Molnar, linux-arm-kernel

On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@zytor.com wrote:
> This implies we invoke schedule -- a restricted operation (consider
> may_sleep) during execution of STAC-enabled code, but *not* as an
> exception or interrupt, since those preserve the flags.

Meet preempt_enable().

> I have serious concerns about this. This is more or less saying that
> we have left an unlimited gap and have had AC escape.

Yes; by allowing normal C in between STAC and CLAC this is so.

> Does *anyone* see a need to allow this? I got a question at LPC from
> someone about this, and what they were trying to do once all the
> layers had been unwound was so far down the wrong track for a root
> problem that actually has a very simple solution.

Have you read the rest of the thread?

All it takes for this to explode is a call to a C function that has
tracing on it in between the user_access_begin() and user_access_end()
things. That is a _very_ easy thing to do.

Heck, GCC is allowed to generate that broken code compiling
lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user (esp.
with CONFIG_OPTIMIZE_INLINING), and making that a function call would
get us fentry hooks and ftrace and *BOOM*.

>(Now, of course, it's a static function with a single caller, and GCC
isn't _THAT_ stupid, but it could, if it wanted to)

Since the bar is _that_ low for mistakes, I figure we'd better fix it.
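
To make the failure mode concrete, here is a minimal sketch of the
pattern under discussion (the function, struct and helper() names are
hypothetical, purely for illustration): any out-of-line call made while
the user-access window is open can pick up fentry/ftrace hooks, and
anything in that path that ends up in preempt_enable() can call
schedule() with EFLAGS.AC still set.

	int copy_example(struct ex __user *uptr, const struct ex *val)
	{
		if (!user_access_begin(uptr, sizeof(*uptr)))
			return -EFAULT;
		unsafe_put_user(val->a, &uptr->a, efault);
		/* out-of-line call: fentry/ftrace hooks, possible
		 * preempt_enable() -> schedule() while AC is set */
		helper(val);
		unsafe_put_user(val->b, &uptr->b, efault);
		user_access_end();
		return 0;
	efault:
		user_access_end();
		return -EFAULT;
	}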

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-16  0:21                                           ` Linus Torvalds
@ 2019-02-16 10:32                                             ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-16 10:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Brian Gerst,
	H. Peter Anvin, Will Deacon, Linux Kernel Mailing List,
	Andy Lutomirski, valentin.schneider, Ingo Molnar, James Morse,
	Andy Lutomirski, Catalin Marinas, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Fri, Feb 15, 2019 at 04:21:46PM -0800, Linus Torvalds wrote:
> Even just a comment about it would be fine. But it might also be good
> to show that it's an explicit eflags value and just use
> X86_EFLAGS_FIXED as the initializer.

That is in fact what I have now; I'll repost on Monday or so.
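
For reference, X86_EFLAGS_FIXED is the architecturally reserved,
always-set bit 1 of EFLAGS, defined in
arch/x86/include/uapi/asm/processor-flags.h along these lines, so
initializing frame->flags with it gives the new task a clean flags
image (AC, DF and IF all zero):

	#define X86_EFLAGS_FIXED_BIT	1	/* bit 1 is always set */
	#define X86_EFLAGS_FIXED	_BITUL(X86_EFLAGS_FIXED_BIT)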

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2] sched/x86: Save [ER]FLAGS on context switch
  2019-02-14 10:14                               ` [PATCH] sched/x86: Save [ER]FLAGS on context switch Peter Zijlstra
  2019-02-14 16:18                                 ` Brian Gerst
  2019-02-16  4:06                                 ` hpa
@ 2019-02-18  9:03                                 ` " Peter Zijlstra
  2 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-18  9:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dvlasenk, brgerst, Julien Thierry, catalin.marinas, jpoimboe,
	Will Deacon, linux-kernel, valentin.schneider, mingo,
	james.morse, luto, hpa, bp, tglx, torvalds, Ingo Molnar,
	linux-arm-kernel



Subject: sched/x86: Save [ER]FLAGS on context switch
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Feb 14 10:30:52 CET 2019

Effectively reverts commit:

  2c7577a75837 ("sched/x86_64: Don't save flags on context switch")

Specifically because SMAP uses FLAGS.AC which invalidates the claim
that the kernel has clean flags.

In particular, while preemption from interrupt return is fine (the
IRET frame on the exception stack contains FLAGS), it breaks any code
that does synchronous scheduling, including preempt_enable().

This has become a significant issue ever since commit:

  5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")

provided the means of having 'normal' C code between STAC / CLAC,
exposing the FLAGS.AC state. So far this hasn't led to trouble,
however, fix it before it comes apart.

Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_32.S        |    2 ++
 arch/x86/entry/entry_64.S        |    2 ++
 arch/x86/include/asm/switch_to.h |    1 +
 arch/x86/kernel/process_32.c     |    7 +++++++
 arch/x86/kernel/process_64.c     |    8 ++++++++
 5 files changed, 20 insertions(+)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -650,6 +650,7 @@ ENTRY(__switch_to_asm)
 	pushl	%ebx
 	pushl	%edi
 	pushl	%esi
+	pushfl
 
 	/* switch stack */
 	movl	%esp, TASK_threadsp(%eax)
@@ -672,6 +673,7 @@ ENTRY(__switch_to_asm)
 #endif
 
 	/* restore callee-saved registers */
+	popfl
 	popl	%esi
 	popl	%edi
 	popl	%ebx
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -291,6 +291,7 @@ ENTRY(__switch_to_asm)
 	pushq	%r13
 	pushq	%r14
 	pushq	%r15
+	pushfq
 
 	/* switch stack */
 	movq	%rsp, TASK_threadsp(%rdi)
@@ -313,6 +314,7 @@ ENTRY(__switch_to_asm)
 #endif
 
 	/* restore callee-saved registers */
+	popfq
 	popq	%r15
 	popq	%r14
 	popq	%r13
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void);
  * order of the fields must match the code in __switch_to_asm().
  */
 struct inactive_task_frame {
+	unsigned long flags;
 #ifdef CONFIG_X86_64
 	unsigned long r15;
 	unsigned long r14;
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -127,6 +127,13 @@ int copy_thread_tls(unsigned long clone_
 	struct task_struct *tsk;
 	int err;
 
+	/*
+	 * For a new task use the RESET flags value since there is no before.
+	 * All the status flags are zero; DF and all the system flags must also
+	 * be 0, specifically IF must be 0 because we context switch to the new
+	 * task with interrupts disabled.
+	 */
+	frame->flags = X86_EFLAGS_FIXED;
 	frame->bp = 0;
 	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -392,6 +392,14 @@ int copy_thread_tls(unsigned long clone_
 	childregs = task_pt_regs(p);
 	fork_frame = container_of(childregs, struct fork_frame, regs);
 	frame = &fork_frame->frame;
+
+	/*
+	 * For a new task use the RESET flags value since there is no before.
+	 * All the status flags are zero; DF and all the system flags must also
+	 * be 0, specifically IF must be 0 because we context switch to the new
+	 * task with interrupts disabled.
+	 */
+	frame->flags = X86_EFLAGS_FIXED;
 	frame->bp = 0;
 	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-16 10:30                                   ` Peter Zijlstra
@ 2019-02-18 22:30                                     ` H. Peter Anvin
  2019-02-19  0:24                                       ` Linus Torvalds
  2019-02-19  9:04                                       ` Peter Zijlstra
  0 siblings, 2 replies; 58+ messages in thread
From: H. Peter Anvin @ 2019-02-18 22:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dvlasenk, jpoimboe, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, brgerst, bp, tglx, torvalds,
	Ingo Molnar, linux-arm-kernel

On 2/16/19 2:30 AM, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@zytor.com wrote:
>> This implies we invoke schedule -- a restricted operation (consider
>> may_sleep) during execution of STAC-enabled code, but *not* as an
>> exception or interrupt, since those preserve the flags.
> 
> Meet preempt_enable().

I believe this falls under "doctor, it hurts when I do that." And it hurts for
very good reasons. See below.

>> I have serious concerns about this. This is more or less saying that
>> we have left an unlimited gap and have had AC escape.
> 
> Yes; by allowing normal C in between STAC and CLAC this is so.
> 
>> Does *anyone* see a need to allow this? I got a question at LPC from
>> someone about this, and what they were trying to do once all the
>> layers had been unwound was so far down the wrong track for a root
>> problem that actually has a very simple solution.
> 
> Have you read the rest of the thread?
> 
> All it takes for this to explode is a call to a C function that has
> tracing on it in between the user_access_begin() and user_access_end()
> things. That is a _very_ easy thing to do.
> 
> Heck, GCC is allowed to generate that broken code compiling
> lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user (esp.
> with CONFIG_OPTIMIZE_INLINING), and making that a function call would
> get us fentry hooks and ftrace and *BOOM*.
> 
> (Now, of course, its a static function with a single caller, and GCC
> isn't _THAT_ stupid, but it could, if it wanted to)
> 
> Since the bar is _that_ low for mistakes, I figure we'd better fix it.
> 

The question is what "fix it" means. I'm really concerned about AC escapes,
and everyone else should be, too.

For an issue specific to tracing, it would be more appropriate to do the
appropriate things in the tracing entry/exit than in schedule. Otherwise, we
don't want to silently paper over mistakes which could mean that we run a
large portion of the kernel without protection we thought we had.

In that sense, calling schedule() with AC set is in fact a great place to have
a WARN() or even BUG(), because such an event really could imply that the
kernel has been security compromised.

Does that make more sense?

	-hpa

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-18 22:30                                     ` H. Peter Anvin
@ 2019-02-19  0:24                                       ` Linus Torvalds
  2019-02-19  2:20                                         ` Andy Lutomirski
                                                           ` (2 more replies)
  2019-02-19  9:04                                       ` Peter Zijlstra
  1 sibling, 3 replies; 58+ messages in thread
From: Linus Torvalds @ 2019-02-19  0:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Josh Poimboeuf, Julien Thierry, Peter Zijlstra,
	Catalin Marinas, valentin.schneider, Will Deacon,
	Linux List Kernel Mailing, Andy Lutomirski, Ingo Molnar,
	James Morse, Andrew Lutomirski, Brian Gerst, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> The question is what "fix it" means. I'm really concerned about AC escapes,
> and everyone else should be, too.

I do think that it might be the right thing to do to add some kind of
WARN_ON_ONCE() for AC being set in various can-reschedule situations.

We'd just have to abstract it sanely. I'm sure arm64 has the exact
same issue with PAN - maybe it saves properly, but the same "we
wouldn't want to go through the scheduler with PAN clear".

On x86, we might as well check DF at the same time as AC.

              Linus
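
As a sketch of how such an assertion might be abstracted (the name,
call site and hook are illustrative only; patch 4/4 of this series
implements an arm64 user_access_region_active() along these lines,
while an x86 version could test EFLAGS.AC, plus DF as suggested above):

	/* invoked from the scheduler's debug checks */
	static inline void assert_no_user_access_region(void)
	{
		WARN_ON_ONCE(user_access_region_active());
	}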

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  0:24                                       ` Linus Torvalds
@ 2019-02-19  2:20                                         ` Andy Lutomirski
  2019-02-19  2:46                                           ` H. Peter Anvin
  2019-02-19  8:53                                         ` Julien Thierry
  2019-02-19  9:15                                         ` Peter Zijlstra
  2 siblings, 1 reply; 58+ messages in thread
From: Andy Lutomirski @ 2019-02-19  2:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Brian Gerst, Julien Thierry, Peter Zijlstra,
	Catalin Marinas, Josh Poimboeuf, Will Deacon,
	Linux List Kernel Mailing, valentin.schneider, Ingo Molnar,
	James Morse, Andrew Lutomirski, H. Peter Anvin, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha



> On Feb 18, 2019, at 4:24 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
>> 
>> The question is what "fix it" means. I'm really concerned about AC escapes,
>> and everyone else should be, too.
> 
> I do think that it might be the right thing to do to add some kind of
> WARN_ON_ONCE() for AC being set in various can-reschedule situations.
> 
> We'd just have to abstract it sanely. I'm sure arm64 has the exact
> same issue with PAN - maybe it saves properly, but the same "we
> wouldn't want to go through the scheduler with PAN clear".
> 
> On x86, we might as well check DF at the same time as AC.
> 
> 

hpa is right, though — calling into tracing code with AC set is not really so good.  And calling schedule() (via preempt_enable() or whatever) is also bad because it runs all the scheduler code with AC on.  Admittedly, the scheduler is not *that* interesting of an attack surface.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  2:20                                         ` Andy Lutomirski
@ 2019-02-19  2:46                                           ` H. Peter Anvin
  2019-02-19  9:07                                             ` Julien Thierry
  0 siblings, 1 reply; 58+ messages in thread
From: H. Peter Anvin @ 2019-02-19  2:46 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Denys Vlasenko, Julien Thierry, Peter Zijlstra, Catalin Marinas,
	Josh Poimboeuf, Will Deacon, Linux List Kernel Mailing,
	valentin.schneider, Ingo Molnar, James Morse, Andrew Lutomirski,
	Brian Gerst, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
	linux-alpha

On 2/18/19 6:20 PM, Andy Lutomirski wrote:
> 
> 
>> On Feb 18, 2019, at 4:24 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> The question is what "fix it" means. I'm really concerned about AC escapes,
>>> and everyone else should be, too.
>>
>> I do think that it might be the right thing to do to add some kind of
>> WARN_ON_ONCE() for AC being set in various can-reschedule situations.
>>
>> We'd just have to abstract it sanely. I'm sure arm64 has the exact
>> same issue with PAN - maybe it saves properly, but the same "we
>> wouldn't want to go through the scheduler with PAN clear".
>>
>> On x86, we might as well check DF at the same time as AC.
>>
> 
> hpa is right, though — calling into tracing code with AC set is not really so good.  And calling schedule() (via preempt_enable() or whatever) is also bad because it runs all the scheduler code with AC on.  Admittedly, the scheduler is not *that* interesting of an attack surface.
> 

Not just that, but the other question is just how much code we are running
with AC open. It really should only be done in some very small regions.

	-hpa


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  0:24                                       ` Linus Torvalds
  2019-02-19  2:20                                         ` Andy Lutomirski
@ 2019-02-19  8:53                                         ` Julien Thierry
  2019-02-19  9:15                                         ` Peter Zijlstra
  2 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-02-19  8:53 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin
  Cc: Denys Vlasenko, Josh Poimboeuf, Peter Zijlstra, Catalin Marinas,
	valentin.schneider, Will Deacon, Linux List Kernel Mailing,
	Andy Lutomirski, Ingo Molnar, James Morse, Andrew Lutomirski,
	Brian Gerst, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
	linux-alpha



On 19/02/2019 00:24, Linus Torvalds wrote:
> On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> The question is what "fix it" means. I'm really concerned about AC escapes,
>> and everyone else should be, too.
> 
> I do think that it might be the right thing to do to add some kind of
> WARN_ON_ONCE() for AC being set in various can-reschedule situations.
> 
> We'd just have to abstract it sanely. I'm sure arm64 has the exact
> same issue with PAN - maybe it saves properly, but the same "we
> wouldn't want to go through the scheduler with PAN clear".
> 

As of right now, we have the same issue on arm64 as on x86. We don't
currently save the PAN bit on task switch, but I have a patch to do that.

Unless we decide to go down the route of warning against uses of
schedule() inside user_access regions.

As for the abstraction, I had this patch[1] that added another primitive
for the user_access API (although this might not be suited for x86 if
you also want to check DF). However, an issue that appears is where to
perform the check to cover enough ground.

Checking inside the schedule() code, you only cover cases where things
have already gone wrong, not the use of functions that are unsafe to
call inside a user_access region.

[1]
http://lists.infradead.org/pipermail/linux-arm-kernel/2019-January/625385.html

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-18 22:30                                     ` H. Peter Anvin
  2019-02-19  0:24                                       ` Linus Torvalds
@ 2019-02-19  9:04                                       ` Peter Zijlstra
  2019-02-19  9:21                                         ` hpa
                                                           ` (2 more replies)
  1 sibling, 3 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-19  9:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dvlasenk, jpoimboe, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, brgerst, bp, tglx, torvalds,
	Ingo Molnar, linux-arm-kernel

On Mon, Feb 18, 2019 at 02:30:21PM -0800, H. Peter Anvin wrote:
> On 2/16/19 2:30 AM, Peter Zijlstra wrote:
> > On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@zytor.com wrote:
> >> This implies we invoke schedule -- a restricted operation (consider
> >> may_sleep) during execution of STAC-enabled code, but *not* as an
> >> exception or interrupt, since those preserve the flags.
> > 
> > Meet preempt_enable().
> 
> I believe this falls under "doctor, it hurts when I do that." And it hurts for
> very good reasons. See below.

I disagree; the basic rule is that if you're preemptible you must also
be able to schedule and vice-versa. These AC regions violate this.

And, as illustrated, it is _very_ easy to get all sorts of things inside
that AC crap.

> >> I have serious concerns about this. This is more or less saying that
> >> we have left an unlimited gap and have had AC escape.
> > 
> > Yes; by allowing normal C in between STAC and CLAC this is so.
> > 
> >> Does *anyone* see a need to allow this? I got a question at LPC from
> >> someone about this, and what they were trying to do once all the
> >> layers had been unwound was so far down the wrong track for a root
> >> problem that actually has a very simple solution.
> > 
> > Have you read the rest of the thread?
> > 
> > All it takes for this to explode is a call to a C function that has
> > tracing on it in between the user_access_begin() and user_access_end()
> > things. That is a _very_ easy thing to do.
> > 
> > Heck, GCC is allowed to generate that broken code compiling
> > lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user (esp.
> > with CONFIG_OPTIMIZE_INLINING), and making that a function call would
> > get us fentry hooks and ftrace and *BOOM*.
> > 
> > (Now, of course, its a static function with a single caller, and GCC
> > isn't _THAT_ stupid, but it could, if it wanted to)
> > 
> > Since the bar is _that_ low for mistakes, I figure we'd better fix it.
> > 
> 
> The question is what "fix it" means. 

It means that when we do schedule, the next task doesn't have AC set,
and when we schedule back, we'll restore our AC if we had it. Unlike
the current situation, where the next task will run with AC set.

IOW, it confines AC to the task context where it was set.

> I'm really concerned about AC escapes,
> and everyone else should be, too.

Then _that_ should be asserted.

> For an issue specific to tracing, it would be more appropriate to do the
> appropriate things in the tracing entry/exit than in schedule. Otherwise, we
> don't want to silently paper over mistakes which could mean that we run a
> large portion of the kernel without protection we thought we had.
> 
> In that sense, calling schedule() with AC set is in fact a great place to have
> a WARN() or even BUG(), because such an event really could imply that the
> kernel has been security compromised.

It is not specific to tracing; tracing is just one of the most trivial
and non-obvious ways to make it go splat.

There's a lot of fairly innocent stuff that does preempt_disable() /
preempt_enable(). And just a warning in schedule() isn't sufficient;
you'd have to actually trigger a reschedule before you know your code is
bad.

And like I said; the invariant is: if you're preemptible you can
schedule and v.v.

Now, clearly you don't want to mark these whole regions as !preemptible,
because then you can also not take faults, but somehow you're not
worried about the whole fault handler, but you are worried about the
scheduler ?!? How does that work? The fault handler can touch a _ton_
more code. Heck, the fault handler can schedule.

So either pre-fault, and run the whole AC crap with preemption disabled
and retry, or accept that we have to schedule.

> Does that make more sense?

It appears to me you're going about it backwards.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  2:46                                           ` H. Peter Anvin
@ 2019-02-19  9:07                                             ` Julien Thierry
  0 siblings, 0 replies; 58+ messages in thread
From: Julien Thierry @ 2019-02-19  9:07 UTC (permalink / raw)
  To: H. Peter Anvin, Andy Lutomirski, Linus Torvalds
  Cc: Denys Vlasenko, Peter Zijlstra, Catalin Marinas, Josh Poimboeuf,
	Will Deacon, Linux List Kernel Mailing, valentin.schneider,
	Ingo Molnar, James Morse, Andrew Lutomirski, Brian Gerst,
	Borislav Petkov, Thomas Gleixner, Ingo Molnar, linux-alpha



On 19/02/2019 02:46, H. Peter Anvin wrote:
> On 2/18/19 6:20 PM, Andy Lutomirski wrote:
>>
>>
>>> On Feb 18, 2019, at 4:24 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>
>>>> On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>>>
>>>> The question is what "fix it" means. I'm really concerned about AC escapes,
>>>> and everyone else should be, too.
>>>
>>> I do think that it might be the right thing to do to add some kind of
>>> WARN_ON_ONCE() for AC being set in various can-reschedule situations.
>>>
>>> We'd just have to abstract it sanely. I'm sure arm64 has the exact
>>> same issue with PAN - maybe it saves properly, but the same "we
>>> wouldn't want to go through the scheduler with PAN clear".
>>>
>>> On x86, we might as well check DF at the same time as AC.
>>>
>>
>> hpa is right, though — calling into tracing code with AC set is not really so good.  And calling schedule() (via preempt_enable() or whatever) is also bad because it runs all the scheduler code with AC on.  Admittedly, the scheduler is not *that* interesting of an attack surface.
>>
> 
> Not just that, but the other question is just how much code we are running
> with AC open. It really should only be done in some very small regions.

Yes, but we don't really have a way to enforce that, as far as I'm aware.

user_access_begin/end() is a generic API, meaning any arch is free to
implement it. If an arch doesn't have the same hardware behaviour as
x86/arm64, it might be that its interrupt/exception entry code will
run with user_access open until it reaches the entry code that closes it
(and entry code could potentially be a more interesting attack surface
than the scheduler). This could be the case for software-emulated PAN on
arm/arm64 (although arm, the non-64-bit port, doesn't currently
implement user_access_begin/end()).

So the whole "very small region" restriction sounds a bit
loose/arbitrary to me...

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  0:24                                       ` Linus Torvalds
  2019-02-19  2:20                                         ` Andy Lutomirski
  2019-02-19  8:53                                         ` Julien Thierry
@ 2019-02-19  9:15                                         ` Peter Zijlstra
  2019-02-19  9:19                                           ` Peter Zijlstra
  2 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-19  9:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Denys Vlasenko, Brian Gerst, Julien Thierry,
	Catalin Marinas, valentin.schneider, Will Deacon,
	Linux List Kernel Mailing, Andy Lutomirski, Ingo Molnar,
	James Morse, Andrew Lutomirski, H. Peter Anvin, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Mon, Feb 18, 2019 at 04:24:30PM -0800, Linus Torvalds wrote:
> On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > The question is what "fix it" means. I'm really concerned about AC escapes,
> > and everyone else should be, too.
> 
> I do think that it might be the right thing to do to add some kind of
> WARN_ON_ONCE() for AC being set in various can-reschedule situations.

So I disagree.

Either we set AC with preempt disabled, and then we don't need an extra
warning, because we already have a warning about scheduling with
preemption disabled, or we accept that the fault handler can run.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  9:15                                         ` Peter Zijlstra
@ 2019-02-19  9:19                                           ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-19  9:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Denys Vlasenko, Brian Gerst, Julien Thierry,
	Catalin Marinas, valentin.schneider, Will Deacon,
	Linux List Kernel Mailing, Andy Lutomirski, Ingo Molnar,
	James Morse, Andrew Lutomirski, H. Peter Anvin, Borislav Petkov,
	Thomas Gleixner, Ingo Molnar, linux-alpha

On Tue, Feb 19, 2019 at 10:15:25AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 18, 2019 at 04:24:30PM -0800, Linus Torvalds wrote:
> > On Mon, Feb 18, 2019 at 2:31 PM H. Peter Anvin <hpa@zytor.com> wrote:
> > >
> > > The question is what "fix it" means. I'm really concerned about AC escapes,
> > > and everyone else should be, too.
> > 
> > I do think that it might be the right thing to do to add some kind of
> > WARN_ON_ONCE() for AC being set in various can-reschedule situations.
> 
> So I disagree.
> 
> Either we set AC with preempt disabled, and then we don't need an extra
> warning, because we already have a warning about scheduling with
> preemption disabled, or we accept that the fault handler can run.

n/m about the faults, forgot the obvious :/

I still really dislike wrecking the preemption model over this.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  9:04                                       ` Peter Zijlstra
@ 2019-02-19  9:21                                         ` hpa
  2019-02-19  9:44                                         ` Peter Zijlstra
  2019-02-19 12:48                                         ` Will Deacon
  2 siblings, 0 replies; 58+ messages in thread
From: hpa @ 2019-02-19  9:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dvlasenk, jpoimboe, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, brgerst, bp, tglx, torvalds,
	Ingo Molnar, linux-arm-kernel

On February 19, 2019 1:04:09 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>On Mon, Feb 18, 2019 at 02:30:21PM -0800, H. Peter Anvin wrote:
>> On 2/16/19 2:30 AM, Peter Zijlstra wrote:
>> > On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@zytor.com wrote:
>> >> This implies we invoke schedule -- a restricted operation
>(consider
>> >> may_sleep) during execution of STAC-enabled code, but *not* as an
>> >> exception or interrupt, since those preserve the flags.
>> > 
>> > Meet preempt_enable().
>> 
>> I believe this falls under "doctor, it hurts when I do that." And it
>hurts for
>> very good reasons. See below.
>
>I disagree; the basic rule is that if you're preemptible you must also
>be able to schedule and vice-versa. These AC regions violate this.
>
>And, like illustrated, it is _very_ easy to get all sorts inside that
>AC
>crap.
>
>> >> I have serious concerns about this. This is more or less saying
>that
>> >> we have left an unlimited gap and have had AC escape.
>> > 
>> > Yes; by allowing normal C in between STAC and CLAC this is so.
>> > 
>> >> Does *anyone* see a need to allow this? I got a question at LPC
>from
>> >> someone about this, and what they were trying to do once all the
>> >> layers had been unwound was so far down the wrong track for a root
>> >> problem that actually has a very simple solution.
>> > 
>> > Have you read the rest of the thread?
>> > 
>> > All it takes for this to explode is a call to a C function that has
>> > tracing on it in between the user_access_begin() and
>user_access_end()
>> > things. That is a _very_ easy thing to do.
>> > 
>> > Heck, GCC is allowed to generate that broken code compiling
>> > lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user
>(esp.
>> > with CONFIG_OPTIMIZE_INLINING), and making that a function call
>would
>> > get us fentry hooks and ftrace and *BOOM*.
>> > 
>> > (Now, of course, its a static function with a single caller, and
>GCC
>> > isn't _THAT_ stupid, but it could, if it wanted to)
>> > 
>> > Since the bar is _that_ low for mistakes, I figure we'd better fix
>it.
>> > 
>> 
>> The question is what "fix it" means. 
>
>It means that when we do schedule, the next task doesn't have AC set,
>and when we schedule back, we'll restore our AC when we had it. Unlike
>the current situation, where the next task will run with AC set.
>
>IOW, it confines AC to the task context where it was set.
>
>> I'm really concerned about AC escapes,
>> and everyone else should be, too.
>
>Then _that_ should be asserted.
>
>> For an issue specific to tracing, it would be more appropriate to do
>the
>> appropriate things in the tracing entry/exit than in schedule.
>Otherwise, we
>> don't want to silently paper over mistakes which could mean that we
>run a
>> large portion of the kernel without protection we thought we had.
>> 
>> In that sense, calling schedule() with AC set is in fact a great
>place to have
>> a WARN() or even BUG(), because such an event really could imply that
>the
>> kernel has been security compromised.
>
>It is not specific to tracing, tracing is just one of the most trivial
>and non-obvious ways to make it go splat.
>
>There's lot of fairly innocent stuff that does preempt_disable() /
>preempt_enable(). And just a warning in schedule() isn't sufficient,
>you'd have to actually trigger a reschedule before you know your code
>is
>bad.
>
>And like I said; the invariant is: if you're preemptible you can
>schedule and v.v.
>
>Now, clearly you don't want to mark these whole regions as
>!preemptible,
>because then you can also not take faults, but somehow you're not
>worried about the whole fault handler, but you are worried about the
>scheduler ?!? How does that work? The fault handler can touch a _ton_
>more code. Heck, the fault handler can schedule.
>
>So either pre-fault, and run the whole AC crap with preemption disabled
>and retry, or accept that we have to schedule.
>
>> Does that make more sense?
>
>It appears to me you're going about it backwards.

I'm not worried about the fault handler, because AC is always preserved/disabled on exception entry; otherwise I would of course be very concerned.

To be clear: I'm not worried about the scheduler itself. I'm worried about how much code we have gone through to get there. Perhaps the scheduler itself is not the right point to probe for it, but we do need to catch things that have gone wrong, rather than just leaving the door wide open.

I would personally be far more comfortable saying you have to disable preemption in user access regions. It may be an unnecessary constraint for x86 and ARM64, but it is safe.

And Julien, yes, it is "somewhat arbitrary", but so are many engineering tradeoffs. Not everything has a very sharply definable line.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  9:04                                       ` Peter Zijlstra
  2019-02-19  9:21                                         ` hpa
@ 2019-02-19  9:44                                         ` Peter Zijlstra
  2019-02-19 11:38                                           ` Thomas Gleixner
  2019-02-19 12:48                                         ` Will Deacon
  2 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-19  9:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dvlasenk, jpoimboe, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, brgerst, bp, tglx, torvalds,
	Ingo Molnar, linux-arm-kernel

On Tue, Feb 19, 2019 at 10:04:09AM +0100, Peter Zijlstra wrote:
> > Does that make more sense?
> 
> It appears to me you're going about it backwards.

So how about you do a GCC plugin that verifies limits on code-gen
between user_access_begin/user_access_end() ?

 - No CALL/RET
   - implies user_access_end() happens
   - implies no fentry hooks
 - No __preempt_count frobbing
 - No tracepoints
 - ...

That way you put the burden on the special code, not on the rest of the
kernel.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  9:44                                         ` Peter Zijlstra
@ 2019-02-19 11:38                                           ` Thomas Gleixner
  2019-02-19 11:58                                             ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2019-02-19 11:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jpoimboe, dvlasenk, brgerst, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, H. Peter Anvin, bp, torvalds,
	Ingo Molnar, linux-arm-kernel

On Tue, 19 Feb 2019, Peter Zijlstra wrote:

> On Tue, Feb 19, 2019 at 10:04:09AM +0100, Peter Zijlstra wrote:
> > > Does that make more sense?
> > 
> > It appears to me you're going about it backwards.
> 
> So how about you do a GCC plugin that verifies limits on code-gen
> between user_access_begin/user_access_end() ?
> 
>  - No CALL/RET
>    - implies user_access_end() happens
>    - implies no fentry hooks
>  - No __preempt_count frobbing
>  - No tracepoints
>  - ...
> 
> That way you put the burden on the special code, not on the rest of the
> kernel.

And then you have kprobes ....

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19 11:38                                           ` Thomas Gleixner
@ 2019-02-19 11:58                                             ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2019-02-19 11:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: jpoimboe, dvlasenk, brgerst, Julien Thierry, catalin.marinas,
	valentin.schneider, Will Deacon, linux-kernel, Andy Lutomirski,
	mingo, james.morse, luto, H. Peter Anvin, bp, torvalds,
	Ingo Molnar, linux-arm-kernel

On Tue, Feb 19, 2019 at 12:38:42PM +0100, Thomas Gleixner wrote:
> On Tue, 19 Feb 2019, Peter Zijlstra wrote:
> 
> > On Tue, Feb 19, 2019 at 10:04:09AM +0100, Peter Zijlstra wrote:
> > > > Does that make more sense?
> > > 
> > > It appears to me you're going about it backwards.
> > 
> > So how about you do a GCC plugin that verifies limits on code-gen
> > between user_access_begin/user_access_end() ?
> > 
> >  - No CALL/RET
> >    - implies user_access_end() happens
> >    - implies no fentry hooks
> >  - No __preempt_count frobbing
> >  - No tracepoints
> >  - ...
> > 
> > That way you put the burden on the special code, not on the rest of the
> > kernel.
> 
> And then you have kprobes ....

They prod the INT3 byte and then take an exception, and exceptions are
'fine'.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch
  2019-02-19  9:04                                       ` Peter Zijlstra
  2019-02-19  9:21                                         ` hpa
  2019-02-19  9:44                                         ` Peter Zijlstra
@ 2019-02-19 12:48                                         ` Will Deacon
  2 siblings, 0 replies; 58+ messages in thread
From: Will Deacon @ 2019-02-19 12:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jpoimboe, dvlasenk, brgerst, Julien Thierry, catalin.marinas,
	valentin.schneider, linux-kernel, Andy Lutomirski, mingo,
	james.morse, luto, H. Peter Anvin, bp, tglx, torvalds,
	Ingo Molnar, linux-arm-kernel

On Tue, Feb 19, 2019 at 10:04:09AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 18, 2019 at 02:30:21PM -0800, H. Peter Anvin wrote:
> > On 2/16/19 2:30 AM, Peter Zijlstra wrote:
> > > On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@zytor.com wrote:
> > >> This implies we invoke schedule -- a restricted operation (consider
> > >> may_sleep) during execution of STAC-enabled code, but *not* as an
> > >> exception or interrupt, since those preserve the flags.
> > > 
> > > Meet preempt_enable().
> > 
> > I believe this falls under "doctor, it hurts when I do that." And it hurts for
> > very good reasons. See below.
> 
> I disagree; the basic rule is that if you're preemptible you must also
> be able to schedule and vice-versa. These AC regions violate this.
> 
> And, like illustrated, it is _very_ easy to get all sorts inside that AC
> crap.
> 
> > >> I have serious concerns about this. This is more or less saying that
> > >> we have left an unlimited gap and have had AC escape.
> > > 
> > > Yes; by allowing normal C in between STAC and CLAC this is so.
> > > 
> > >> Does *anyone* see a need to allow this? I got a question at LPC from
> > >> someone about this, and what they were trying to do once all the
> > >> layers had been unwound was so far down the wrong track for a root
> > >> problem that actually has a very simple solution.
> > > 
> > > Have you read the rest of the thread?
> > > 
> > > All it takes for this to explode is a call to a C function that has
> > > tracing on it in between the user_access_begin() and user_access_end()
> > > things. That is a _very_ easy thing to do.
> > > 
> > > Heck, GCC is allowed to generate that broken code compiling
> > > lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user (esp.
> > > with CONFIG_OPTIMIZE_INLINING), and making that a function call would
> > > get us fentry hooks and ftrace and *BOOM*.
> > > 
> > > (Now, of course, its a static function with a single caller, and GCC
> > > isn't _THAT_ stupid, but it could, if it wanted to)
> > > 
> > > Since the bar is _that_ low for mistakes, I figure we'd better fix it.
> > > 
> > 
> > The question is what "fix it" means. 
> 
> It means that when we do schedule, the next task doesn't have AC set,
> and when we schedule back, we'll restore our AC when we had it. Unlike
> the current situation, where the next task will run with AC set.
> 
> IOW, it confines AC to the task context where it was set.
> 
> > I'm really concerned about AC escapes,
> > and everyone else should be, too.
> 
> Then _that_ should be asserted.
> 
> > For an issue specific to tracing, it would be more appropriate to do the
> > appropriate things in the tracing entry/exit than in schedule. Otherwise, we
> > don't want to silently paper over mistakes which could mean that we run a
> > large portion of the kernel without protection we thought we had.
> > 
> > In that sense, calling schedule() with AC set is in fact a great place to have
> > a WARN() or even BUG(), because such an event really could imply that the
> > kernel has been security compromised.
> 
> It is not specific to tracing, tracing is just one of the most trivial
> and non-obvious ways to make it go splat.
> 
> There's lot of fairly innocent stuff that does preempt_disable() /
> preempt_enable(). And just a warning in schedule() isn't sufficient,
> you'd have to actually trigger a reschedule before you know your code is
> bad.
> 
> And like I said; the invariant is: if you're preemptible you can
> schedule and v.v.
> 
> Now, clearly you don't want to mark these whole regions as !preemptible,
> because then you can also not take faults, but somehow you're not
> worried about the whole fault handler, but you are worried about the
> scheduler ?!? How does that work? The fault handler can touch a _ton_
> more code. Heck, the fault handler can schedule.
> 
> So either pre-fault, and run the whole AC crap with preemption disabled
> and retry, or accept that we have to schedule.

I think you'll still hate this, but could we not disable preemption during
the uaccess-enabled region, re-enabling it on the fault path after we've
toggled uaccess off, and disabling it again when we return to the
uaccess-enabled region? Doesn't help with tracing, but it should at least
handle the common case.

Will
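
In rough code, the shape being suggested here looks something like the
following (a sketch only, with hypothetical names; the fault-path
re-enabling mentioned above is elided). With the window covered by
preempt_disable(), voluntarily scheduling from it would trip the
existing scheduling-while-atomic warning instead of silently carrying
AC/PAN state into the next task:

	static int get_val(int __user *uptr, int *res)
	{
		int ret = -EFAULT;

		preempt_disable();
		if (user_access_begin(uptr, sizeof(*uptr))) {
			unsafe_get_user(*res, uptr, out);
			ret = 0;
	out:
			user_access_end();
		}
		preempt_enable();
		return ret;
	}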

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, back to index

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-15 13:58 [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Julien Thierry
2019-01-15 13:58 ` [PATCH v3 1/4] arm64: uaccess: Cleanup get/put_user() Julien Thierry
2019-01-15 13:58 ` [PATCH v3 2/4] arm64: uaccess: Implement unsafe accessors Julien Thierry
2019-01-15 13:58 ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Julien Thierry
2019-01-30 16:58   ` Valentin Schneider
2019-02-04 13:27     ` Julien Thierry
2019-02-11 13:45   ` Ingo Molnar
2019-02-11 13:51     ` Peter Zijlstra
2019-02-12  9:15       ` Julien Thierry
2019-02-13  8:21         ` Ingo Molnar
2019-02-13 10:35         ` Peter Zijlstra
2019-02-13 10:50           ` Julien Thierry
2019-02-13 13:17             ` Peter Zijlstra
2019-02-13 13:20               ` Peter Zijlstra
2019-02-13 14:00               ` Will Deacon
2019-02-13 14:07                 ` Julien Thierry
2019-02-13 14:17                 ` Peter Zijlstra
2019-02-13 14:24                   ` Julien Thierry
2019-02-13 14:40                     ` Peter Zijlstra
2019-02-13 15:08                       ` Peter Zijlstra
2019-02-13 14:25                 ` Peter Zijlstra
2019-02-13 14:39                   ` Julien Thierry
2019-02-13 14:41                     ` Peter Zijlstra
2019-02-13 15:45                       ` Peter Zijlstra
2019-02-13 18:54                         ` Peter Zijlstra
     [not found]                         ` <D61C430D-4321-4114-AB85-671A3C7B8EAE@amacapital.net>
2019-02-13 22:21                           ` Peter Zijlstra
2019-02-13 22:49                             ` Andy Lutomirski
2019-02-14 10:14                               ` [PATCH] sched/x86: Save [ER]FLAGS on context switch Peter Zijlstra
2019-02-14 16:18                                 ` Brian Gerst
2019-02-14 19:34                                   ` Peter Zijlstra
2019-02-15 14:34                                     ` Brian Gerst
2019-02-15 17:18                                     ` Linus Torvalds
2019-02-15 17:40                                       ` Peter Zijlstra
2019-02-15 18:28                                         ` Andy Lutomirski
2019-02-15 23:34                                         ` Peter Zijlstra
2019-02-16  0:21                                           ` Linus Torvalds
2019-02-16 10:32                                             ` Peter Zijlstra
2019-02-16  4:06                                 ` hpa
2019-02-16 10:30                                   ` Peter Zijlstra
2019-02-18 22:30                                     ` H. Peter Anvin
2019-02-19  0:24                                       ` Linus Torvalds
2019-02-19  2:20                                         ` Andy Lutomirski
2019-02-19  2:46                                           ` H. Peter Anvin
2019-02-19  9:07                                             ` Julien Thierry
2019-02-19  8:53                                         ` Julien Thierry
2019-02-19  9:15                                         ` Peter Zijlstra
2019-02-19  9:19                                           ` Peter Zijlstra
2019-02-19  9:04                                       ` Peter Zijlstra
2019-02-19  9:21                                         ` hpa
2019-02-19  9:44                                         ` Peter Zijlstra
2019-02-19 11:38                                           ` Thomas Gleixner
2019-02-19 11:58                                             ` Peter Zijlstra
2019-02-19 12:48                                         ` Will Deacon
2019-02-18  9:03                                 ` [PATCH v2] " Peter Zijlstra
2019-02-13 23:19                         ` [PATCH v3 3/4] uaccess: Check no rescheduling function is called in unsafe region Linus Torvalds
2019-01-15 13:58 ` [PATCH v3 4/4] arm64: uaccess: Implement user_access_region_active Julien Thierry
2019-01-25 14:27 ` [PATCH v3 0/4] uaccess: Add unsafe accessors for arm64 Catalin Marinas
2019-01-30 16:17 ` Julien Thierry

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org infradead-linux-arm-kernel@archiver.kernel.org
	public-inbox-index linux-arm-kernel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox